agentfootprint 2.8.0 → 2.8.1

package/README.md CHANGED
@@ -1,8 +1,11 @@
1
1
  <p align="center">
2
- <h1 align="center">agentfootprint</h1>
3
- <p align="center">
4
- <strong>Context engineering, abstracted.</strong>
5
- </p>
2
+ <img width="220" alt="agentfootprint logo" src="https://github.com/user-attachments/assets/a47840f4-cc8b-4bea-b88d-d9753f59616b" />
3
+ </p>
4
+
5
+ <h1 align="center">agentfootprint</h1>
6
+
7
+ <p align="center">
8
+ <strong>Context engineering, abstracted.</strong>
6
9
  </p>
7
10
 
8
11
  <p align="center">
@@ -13,123 +16,100 @@
13
16
  <a href="https://github.com/footprintjs/agentfootprint/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT"></a>
14
17
  </p>
15
18
 
16
- <br>
17
-
18
- > **PyTorch's autograd abstracted gradient computation. Express abstracted the HTTP request loop. Prisma abstracted SQL CRUD. Kubernetes abstracted reconciliation. React abstracted the DOM.**
19
- >
20
- > Every load-bearing dev tool of the last decade is *the same kind of move* — abstract one specific kind of bookkeeping that practitioners were doing by hand, so they can spend their attention on intent instead of plumbing.
21
- >
22
- > **agentfootprint is that move applied to context engineering** — the discipline of deciding what content lands in which slot of an LLM call, when, and why. You describe injections declaratively. The framework evaluates every trigger every iteration, composes the `system` / `messages` / `tools` slots, observes every decision as a typed event, and persists checkpoints you can replay six months later. So you write the **intent**, not 200 lines of slot-management bookkeeping per agent.
19
+ ---
23
20
 
24
- | Framework | You write declaratively | The framework abstracts |
25
- |---|---|---|
26
- | **PyTorch (autograd)** | Forward computation graph | Gradient computation, backward pass, parameter bookkeeping |
27
- | **Express / Fastify** | Routes + handlers | HTTP request loop, middleware chain, response serialization |
28
- | **Prisma / SQLAlchemy** | Schema + query intent | SQL generation, connection pooling, migrations |
29
- | **Kubernetes** | Desired state (manifests) | Scheduling, health checks, reconciliation loop |
30
- | **React** | Components + state | DOM diffing, render path, event delegation |
31
- | **agentfootprint** | Injections (slot × trigger × cache) | Slot composition, iteration loop, prompt caching, observation, replay |
21
+ ## What is agentfootprint?
32
22
 
33
- The closest structural parallel is **autograd**: you describe the graph, the framework traverses it, and *because the framework owns the traversal it can record everything that happens for free*. Same idea here: you describe Injections, agentfootprint runs the iteration loop, and the typed-event stream + replayable checkpoints + provider-agnostic prompt caching are consequences, not extra features.
23
+ **A framework for building AI agents by treating context as a first-class runtime system.**
34
24
 
35
- ---
25
+ Most agent code becomes context plumbing: which instructions go in `system`, which messages get added after a tool returns, which tools should be exposed right now, which memory to load for this tenant, which parts of the prompt are stable enough to cache.
36
26
 
37
- ## Why it's shaped this way: two pillars
27
+ Without a framework, every agent hand-rolls this logic. Over time it becomes a fragile mix of prompt concatenation, tool routing, memory loading, cache markers, observability hooks, and retry logic.
38
28
 
39
- The abstraction lineage above tells you *what* this library is. The two pillars below explain *why* it's structured the way it is. Neither is decorative; both are operationalized in the runtime.
29
+ **agentfootprint abstracts that bookkeeping.** You declare what context to inject, where it lands, and when it activates. The framework owns the agent loop, recomposes the LLM call every iteration, records typed events, applies caching, and persists replayable checkpoints.
40
30
 
41
- ### THE WHY: connected data (the user-visible win)
31
+ > You write the intent. agentfootprint owns the context loop.
42
32
 
43
- Palantir's 2003 thesis: enterprise insight is bottlenecked by **data fragmentation**, not analyst skill. Connecting siloed data into one ontology collapses weeks of manual correlation into minutes.
33
+ ---
44
34
 
45
- LLM agents face the same fragmentation problem at *runtime*. Disconnected tool state, lost decision evidence, scattered execution context — the agent re-discovers relationships every iteration, burning tokens. agentfootprint connects four classes of agent data so the next token compounds the connection instead of paying for it again:
35
+ ## The lineage
46
36
 
47
- | Class | Mechanism |
48
- |---|---|
49
- | **State** | `TypedScope<S>` — single typed shared state, every read/write tracked |
50
- | **Decisions** | `decide()` evidence — every branch carries the inputs that triggered it |
51
- | **Execution** | `commitLog` + `runtimeStageId` — every state mutation keyed to its writing stage |
52
- | **Memory** | Causal memory — full footprintjs snapshots persisted, cosine-matched on follow-up runs |
37
+ Every load-bearing dev tool of the last decade made the same move:
53
38
 
54
- **Connected data → fewer iterations → fewer tokens.** Same arithmetic Palantir was attacking in 2003, different decade, different layer.
39
+ | Framework | You write | The framework abstracts |
40
+ |---|---|---|
41
+ | **PyTorch (autograd)** | Forward graph | Gradient computation, backward pass |
42
+ | **Express / Fastify** | Routes + handlers | HTTP loop, middleware chain |
43
+ | **Prisma** | Schema + query intent | SQL generation, migrations |
44
+ | **React** | Components + state | DOM diffing, render path |
45
+ | **agentfootprint** | Injections (slot × trigger × cache) | Slot composition, iteration loop, caching, observation, replay |
55
46
 
56
- ### THE HOW: modular boundaries (the engineering discipline)
47
+ The closest structural parallel is **autograd**: you describe the graph, the framework traverses it, and *because the framework owns the traversal it can record everything for free*. Same idea here: typed events, replayable checkpoints, and provider-agnostic prompt caching are consequences of owning the loop, not extra features.
57
48
 
58
- Liskov's ADT (1974) and LSP (1987) work gives a vocabulary for boundaries that don't leak. Every framework boundary in agentfootprint is an LSP-substitutable interface — `LLMProvider`, `ToolProvider`, `CacheStrategy`, `Recorder`, `MemoryStore` — so you can swap implementations without changing agent code. Subflows are CLU clusters with explicit input/output mappers; nothing leaks across the boundary.
49
+ ---
59
50
 
60
- Together: **clean modules + connected data = a runtime that's both fast (Palantir multiplier) and reasonable (Liskov locality).** Boundaries alone produce a clean but dumb library. Connections alone produce a fast but unmaintainable one.
51
+ ## The core idea
61
52
 
62
- Detailed write-ups: [`docs/inspiration/`](./docs/inspiration/) *"Connected Data — the Palantir lineage"* and *"Modularity — the Liskov lineage"*. Not required reading for using the library; required reading for extending or evaluating it.
53
+ Every LLM call has three slots:
63
54
 
64
- ---
55
+ ```text
56
+ system messages tools
57
+ ```
65
58
 
66
- ## The abstraction, concretely
59
+ Every agent feature — steering, instructions, skills, facts, memory, RAG, tool schemas — is content flowing into one of those slots. agentfootprint models all of them as one primitive:
67
60
 
68
- ### Without agentfootprint — context engineering by hand
61
+ ```text
62
+ Injection = slot × trigger × cache
63
+ ```
69
64
 
70
- ```typescript
71
- async function runAgentTurn(userMsg, state) {
72
- let systemPrompt = baseSystem;
73
- const messages = [...state.history, { role: 'user', content: userMsg }];
74
- let activeTools = [...baseTools];
75
-
76
- // 1. Apply always-on steering rules
77
- for (const rule of steeringRules) systemPrompt += '\n' + rule.text;
78
-
79
- // 2. Evaluate conditional instructions
80
- for (const inst of instructions) {
81
- if (inst.activeWhen(state)) systemPrompt += '\n' + inst.prompt;
82
- }
83
-
84
- // 3. Check on-tool-return triggers
85
- if (state.lastToolResult?.toolName === 'redact_pii') {
86
- messages.push({ role: 'system', content: 'Use redacted text only.' });
87
- }
88
-
89
- // 4. Resolve LLM-activated skills
90
- for (const id of state.activatedSkills) {
91
- systemPrompt += '\n' + skills[id].body;
92
- activeTools.push(...skills[id].tools);
93
- }
94
-
95
- // 5. Load + format memory for this tenant
96
- const memEntries = await store.list({ tenant, conversationId });
97
- messages.unshift({ role: 'system', content: formatMemory(memEntries.slice(-10)) });
98
-
99
- // 6. Decide what's cacheable; place provider-specific cache_control markers...
100
- // 7. Call LLM, route tool calls, loop, capture state for resume...
101
- // 8. Persist new turn back to memory tagged with identity...
102
- // 9. Wire SSE for streaming, attach observability hooks...
103
-
104
- // No replay. No audit trail. Per agent, hundreds of lines.
105
- // Every refactor risks a slot-ordering bug nobody catches until prod.
106
- }
65
+ An Injection answers three questions:
66
+
67
+ 1. **Where does this content land?** `system`, `messages`, or `tools`
68
+ 2. **When does it activate?** `always` · `rule` · `on-tool-return` · `llm-activated`
69
+ 3. **How is it cached?** `always` · `never` · `while-active` · predicate
70
+
71
+ That is the whole abstraction. Every named pattern in the agent literature — Reflexion, Tree-of-Thoughts, Skills, RAG, Constitutional AI — reduces to *which slot* + *which trigger*. You learn one model; the field's growth lands as new factories on the same primitive.
72
+
73
+ ```text
74
+ LLM call
75
+ ┌────────────────────────────────────┐
76
+ │ system messages tools │
77
+ │ ▲ ▲ ▲ │
78
+ └──────┼────────────┼────────────┼───┘
79
+ │ │ │
80
+ Injection Injection Injection
81
+
82
+
83
+ always · rule · on-tool-return · llm-activated
107
84
  ```
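The `slot × trigger × cache` model described above can be sketched as plain TypeScript types. This is a minimal illustration, not agentfootprint's actual API: every name here (`Injection`, `LoopContext`, `isActive`, `composeSlots`) is hypothetical, and the `cache` field is carried along but not evaluated.

```typescript
// Hypothetical model of the Injection primitive. These names are illustrative
// assumptions, not agentfootprint exports.
type Slot = 'system' | 'messages' | 'tools';

interface LoopContext {
  iteration: number;
  lastToolName?: string;      // tool that returned on the previous iteration
  activatedIds: Set<string>;  // injection ids the LLM has activated so far
}

type Trigger =
  | { kind: 'always' }
  | { kind: 'rule'; activeWhen: (ctx: LoopContext) => boolean }
  | { kind: 'on-tool-return'; toolName: string }
  | { kind: 'llm-activated' };

type CachePolicy =
  | 'always'
  | 'never'
  | 'while-active'
  | { until: (ctx: LoopContext) => boolean };

interface Injection {
  id: string;
  slot: Slot;
  trigger: Trigger;
  cache: CachePolicy; // carried for completeness; not evaluated in this sketch
  content: string;
}

// Decide whether one injection fires on this iteration.
function isActive(inj: Injection, ctx: LoopContext): boolean {
  const t = inj.trigger;
  switch (t.kind) {
    case 'always':
      return true;
    case 'rule':
      return t.activeWhen(ctx);
    case 'on-tool-return':
      return ctx.lastToolName === t.toolName;
    case 'llm-activated':
      return ctx.activatedIds.has(inj.id);
  }
}

// Recompose all three slots from scratch for the current iteration.
function composeSlots(injections: Injection[], ctx: LoopContext): Record<Slot, string[]> {
  const slots: Record<Slot, string[]> = { system: [], messages: [], tools: [] };
  for (const inj of injections) {
    if (isActive(inj, ctx)) slots[inj.slot].push(inj.content);
  }
  return slots;
}

const injections: Injection[] = [
  { id: 'tone', slot: 'system', trigger: { kind: 'always' }, cache: 'always', content: 'Be friendly and concise.' },
  { id: 'redacted', slot: 'messages', trigger: { kind: 'on-tool-return', toolName: 'redact_pii' }, cache: 'never', content: 'Use redacted text only.' },
  { id: 'billing', slot: 'tools', trigger: { kind: 'llm-activated' }, cache: 'while-active', content: 'refund' },
];
```

Because `composeSlots` is called fresh on every iteration, the same injection list yields different slot contents as `LoopContext` changes, which is the property the rest of the README builds on.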
108
85
 
109
- ### With agentfootprint — declarative
86
+ ---
110
87
 
111
- ```typescript
112
- const agent = Agent.create({ provider, model: 'claude-sonnet-4-5-20250929' })
113
- .system('You are a support assistant.')
114
- .steering(toneRule) // always-on
115
- .instruction(urgentRule) // rule-gated
116
- .skill(billingSkill) // LLM-activated
117
- .memory(conversationMemory) // cross-run, multi-tenant
118
- .tool(weather)
119
- .build();
88
+ ## Why this isn't just an ergonomics win — Dynamic ReAct
120
89
 
121
- await agent.run({ message: userInput, identity: { conversationId } });
90
+ Because the framework owns the loop, **all three slots recompose every iteration based on what just happened.**
91
+
92
+ - **LangChain** assembles prompts once per turn.
93
+ - **LangGraph** composes state per node, not per loop iteration.
94
+ - **agentfootprint** recomposes per iteration.
95
+
96
+ Per-iteration recomposition is what makes context engineering compositional instead of static. It's also the structural prerequisite for the cache layer — cache markers can't track active injections in lockstep without it.
122
97
 
123
- // Every iteration is a replayable typed event stream — for free.
124
- agent.on('agentfootprint.context.injected', (e) =>
125
- console.log(`[${e.payload.source}] landed in ${e.payload.slot}`));
98
+ ```text
99
+ Classic ReAct Dynamic ReAct
100
+ ─────────────── ─────────────
101
+ iter 1: 12 tools shown iter 1: 1 tool (read_skill)
102
+ iter 2: 12 tools shown iter 2: 5 tools (skill activated)
103
+ iter 3: 12 tools shown iter 3: 5 tools
126
104
  ```
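The tool-count progression above can be read as a pure function of which skills are currently activated. The sketch below illustrates that idea only. `Skill`, `toolsForIteration`, and the tool names are hypothetical, and the counts are chosen to mirror the table rather than measured from the library.

```typescript
// Hypothetical sketch of per-iteration tool exposure; not agentfootprint's API.
interface Skill {
  id: string;
  tools: string[];
}

// The exposed tool list depends only on the set of activated skill ids.
function toolsForIteration(skills: Skill[], activated: ReadonlySet<string>): string[] {
  const tools = ['read_skill']; // assumption: the activation tool is always exposed
  for (const skill of skills) {
    if (activated.has(skill.id)) tools.push(...skill.tools);
  }
  return tools;
}

const skills: Skill[] = [
  { id: 'billing', tools: ['get_invoice', 'refund', 'update_card', 'list_charges'] },
  { id: 'network', tools: ['get_port_errors', 'get_sfp_diag', 'get_flogi'] },
];

const activated = new Set<string>();
const iter1 = toolsForIteration(skills, activated); // only the activation tool
activated.add('billing');                           // the LLM called read_skill('billing')
const iter2 = toolsForIteration(skills, activated); // activation tool + billing tools
```

Tools belonging to skills that were never activated (`network` here) never reach the model, so the per-call payload tracks the active skill rather than the size of the registry.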
127
105
 
128
- Same agent. The hand-rolled version is ~80 lines and growing; the declarative version is ~8 and stable. **The framework owns the wiring**, which is exactly why it can observe, replay, audit, and cache it for you.
106
+ Use Dynamic ReAct when your tools have dependencies (one tool's output implies which tool to call next). Use Classic ReAct when all tools are independent and ordering doesn't matter.
107
+
108
+ > 📖 Deep dive: [Dynamic ReAct guide](https://footprintjs.github.io/agentfootprint/guides/dynamic-react/) · [Cache layer](https://footprintjs.github.io/agentfootprint/guides/caching/)
129
109
 
130
110
  ---
131
111
 
132
- ## In 30 seconds — runs offline, no API key
112
+ ## Quick start — runs offline, no API key
133
113
 
134
114
  ```bash
135
115
  npm install agentfootprint footprintjs
@@ -150,7 +130,7 @@ const weather = defineTool({
150
130
  });
151
131
 
152
132
  const agent = Agent.create({
153
- provider: mock({ reply: 'I checked: it is 72°F and sunny.' }), // ← deterministic, no API key
133
+ provider: mock({ reply: 'I checked: it is 72°F and sunny.' }),
154
134
  model: 'mock',
155
135
  })
156
136
  .system('You answer weather questions using the weather tool.')
@@ -165,370 +145,153 @@ Swap `mock(...)` for `anthropic(...)` / `openai(...)` / `bedrock(...)` / `ollama
165
145
 
166
146
  ---
167
147
 
168
- ## The mental model: three slots, four triggers, one Injection
169
-
170
- Every LLM call has three slots. **Every "agent feature" — Skill, Steering doc, Instruction, Fact, Memory replay, RAG chunk — is content flowing into one of them, under one of four triggers.** That's the entire abstraction.
171
-
172
- ```
173
- ┌─────────────────────────────────────┐
174
- │ │
175
- │ Your LLM call has 3 slots: │
176
- │ │
177
- │ system messages tools │
178
- │ ▲ ▲ ▲ │
179
- └───────┼──────────┼──────────┼───────┘
180
- │ │ │
181
- │ one │ one │
182
- │ Injection│ Injection│
183
- │ fires… │ fires… │
184
- │ │ │
185
- ┌──────────────┴────┐ ┌──┴───┐ ┌──┴────┐
186
- │ defineSteering │ │ ... │ │ ... │
187
- │ defineInstruction │ │ │ │ │
188
- │ defineSkill │ │ │ │ │
189
- │ defineFact │ │ │ │ │
190
- │ defineMemory(read) │ │ │ │ │
191
- │ defineRAG │ │ │ │ │
192
- │ …your next idea │ │ │ │ │
193
- └────────────────────┘ └──────┘ └───────┘
194
-
195
- …under one of:
196
- always · rule · on-tool-return · llm-activated
197
- ```
198
-
199
- There's no fourth slot. There won't be. Every named pattern in the agent literature — Reflexion, Tree-of-Thoughts, Skills, RAG, Constitutional AI — reduces to *which slot* + *which trigger*. **You learn one model; the field's growth lands as new factories on the same primitive.**
200
-
201
- ### The four triggers — *who decides* this injection is needed right now?
202
-
203
- | Trigger | Who decides | Fires when | Real-world example |
204
- |---|---|---|---|
205
- | `always` | nobody (always on) | Every iteration, every turn | *"Be friendly and concise."* — `defineSteering` |
206
- | `rule` | **you**, via predicate | A `(ctx) => boolean` you wrote returns true | *"If user wrote 'urgent', prioritize fastest path."* — `defineInstruction({ activeWhen })` |
207
- | `on-tool-return` | **the system** | A specific tool just returned (recency-first injection on the next iteration) | *"After `redact_pii` ran, use redacted text only."* — Dynamic ReAct |
208
- | `llm-activated` | **the LLM** | The LLM called your activation tool (e.g. `read_skill('billing')`) | Skill body + unlocked tools land next iteration — `defineSkill` |
209
-
210
- Why exactly four? Because *who decides activation* is a closed axis: nobody / the developer / the system / the LLM. Together those four exhaust the meaningful "when does this content matter?" cases. A fifth would require introducing a new agent of decision — and there isn't one. That's why the primitive surface stays this small even as named patterns proliferate above it.
211
-
212
- ---
213
-
214
- ## Why this isn't just an ergonomics win — Dynamic ReAct
215
-
216
- The React parallel goes one layer deeper than "less code." Because the framework owns the loop, **the framework recomposes the prompt + tool list every iteration based on what just happened.** That's what we call **Dynamic ReAct** — and it's the thing other agent frameworks don't do.
217
-
218
- | You write declaratively | The framework does for you, **every iteration** |
219
- |---|---|
220
- | `.steering(rule)` | Evaluates every iteration, composes into `system` slot |
221
- | `.instruction(activeWhen, prompt)` | Re-evaluates predicate per iteration; routes to `system` or `messages` for attention positioning |
222
- | `.skill(billing)` | Auto-attaches `read_skill` tool; LLM activates by id; body + unlocked tools land in next iteration |
223
- | `.memory(causal)` | Persists footprintjs decision-evidence snapshots; embeds queries; cosine-matches on follow-up runs |
224
- | `.tool(weather)` | Schemas to LLM, dispatches calls, captures args/results, gates by permission policy |
225
- | `.attach(recorder)` | Subscribes to typed events across many domains as the chart traverses |
226
- | `agent.run({...})` | Captures every decision, every commit, every tool call as a JSON checkpoint that's replayable cross-server |
227
-
228
- LangChain assembles prompts once per turn. LangGraph composes state per node, not per loop iteration. CrewAI's Agent is tool-aware but not iteration-aware. **Per-iteration recomposition of all three slots based on the latest tool result + accumulated state is structurally distinct.** Frameworks that compose state per-node rather than per-loop-iteration can't recompute cache markers in lockstep with the active injection set — the structural prerequisite for the cache layer below.
229
-
230
- ### What "every iteration" makes possible
231
-
232
- | Use case | The mechanism |
233
- |---|---|
234
- | **Tool-by-tool LLM steering** — agent called `redact_pii` → next iter, system prompt gets *"use redacted text, don't paraphrase original"* | `defineInstruction({ activeWhen: (ctx) => ctx.lastToolResult?.toolName === 'redact_pii' })` |
235
- | **Adaptive tool exposure** — agent activated `billing` skill → next iter, tool list switches to billing-only set (3× context-budget reduction) | `defineSkill({...})` + LLM-activated trigger |
236
- | **Cost guardrails** — accumulated cost > threshold → next iter, system prompt adds *"be concise"* | `defineInstruction({ activeWhen: (ctx) => ctx.accumulatedCostUsd > 0.50 })` |
237
- | **Iterative format refinement** — iter 1 emitted JSON → iter 2 prompt adds *"continue this format"*; iter 5 prompt drops it | predicate over `ctx.iteration` + `ctx.history` |
238
- | **Failure adaptation** — tool X returned an error → next iter, prompt adds *"don't try X again; use Y"* | `on-tool-return` predicate inspecting `ctx.lastToolResult` for error markers |
239
- | **Few-shot evolution** — iter 1 prompt has example for the rare case → iter 2 drops it because example is consumed | predicate that tracks which examples have already fired |
240
-
241
- The framework owns the loop. The framework re-evaluates triggers every iteration. Tool results reshape the next iteration's prompt. **That's what makes context engineering compositional instead of static.**
242
-
243
- **The flowchart-pattern substrate** ([footprintjs](https://github.com/footprintjs/footPrint)) is what makes the observation automatic. Every stage execution is a typed event during one DFS traversal — no instrumentation, no post-processing. Same way React DevTools shows you the component tree because React owns the render path, agentfootprint shows you the slot composition because agentfootprint owns the prompt path.
244
-
245
- ### When to use Dynamic ReAct
246
-
247
- Use it when **your tools have dependencies** — when one tool's output implies which tool to call next.
248
-
249
- A skill body like *"if `get_port_errors` reports CRC > 0, call `get_sfp_diag` next; if it reports `signal_loss`, call `get_flogi` next"* IS a dependency graph. The skill encodes the workflow; Dynamic ReAct gates the tool surface to that workflow at runtime.
250
-
251
- If your tools are independent (the LLM can call any of them at any time, ordering doesn't matter), Classic ReAct is fine and simpler — don't reach for Skills.
252
-
253
- ### Side-by-side example
254
-
255
- [`examples/dynamic-react/`](./examples/dynamic-react/) ships two mock-backed scripts solving the same task. Per-iteration tool-count progression makes the shape clear:
256
-
257
- ```
258
- Classic ReAct Dynamic ReAct
259
- ─────────────── ─────────────
260
- iter 1: 12 tools shown iter 1: 1 tool (read_skill)
261
- iter 2: 12 tools shown iter 2: 5 tools (skill activated)
262
- iter 3: 12 tools shown iter 3: 5 tools
263
- iter 4: 12 tools shown iter 4: 5 tools
264
- iter 5: 5 tools (final answer)
265
- ```
266
-
267
- The unactivated skills' tools never enter the LLM context. Classic ReAct has no equivalent — every registered tool ships on every call.
268
-
269
- What Dynamic gives you that Classic doesn't:
270
-
271
- 1. **Constant per-call payload** bounded by active-skill size, not registry size. Scales to 50+ tool catalogs.
272
- 2. **Deterministic routing** — `read_skill` forces scope before data tools fire. LLM can't drift to off-topic tools.
273
- 3. **Auditability** — each iteration's tool list is a pure function of `activatedInjectionIds`. Recorded, replayable, diff-able across runs.
274
- 4. **Less hallucination** — fewer tools per call = more in-distribution on the active task.
275
-
276
- > **Compounds with the cache layer (next section).** Because the framework owns both the per-iteration slot recomposition AND the cache marker placement, cache invalidation tracks the live skill state — when a skill deactivates, only its prefix invalidates; the rest of the cached system prompt stays warm.
277
-
278
- Run it:
279
-
280
- ```sh
281
- TSX_TSCONFIG_PATH=examples/runtime.tsconfig.json npx tsx examples/dynamic-react/01-classic-react.ts
282
- TSX_TSCONFIG_PATH=examples/runtime.tsconfig.json npx tsx examples/dynamic-react/02-dynamic-react.ts
283
- ```
284
-
285
- ---
286
-
287
- ## The cache layer — provider-agnostic prompt caching
288
-
289
- Anthropic gives you `cache_control` blocks. OpenAI auto-caches. Bedrock has its own format. Each provider's docs are 30+ pages, the wire formats are different, and the right cache placement depends on what's stable across iterations vs what's volatile.
290
-
291
- agentfootprint gives you **one declarative API across all three** (and a `NoOp` wildcard for the rest). You annotate intent at the injection level; the framework computes the cacheable boundary every iteration; per-provider strategies translate to the right wire format.
292
-
293
- ### Declarative cache directives
294
-
295
- Every injection factory has a `cache:` field. Four forms:
296
-
297
- | Policy | Meaning |
298
- |---|---|
299
- | `'always'` | Cache whenever this injection is in `activeInjections`. |
300
- | `'never'` | Never cache — volatile content (timestamps, per-request IDs). |
301
- | `'while-active'` | Cache while the injection is active; invalidates the moment it becomes inactive. |
302
- | `{ until: ctx => boolean }` | Predicate-driven invalidation (Turing-complete escape hatch). |
303
-
304
- **Smart defaults per factory** — most consumers never write `cache:` explicitly:
148
+ ## A real agent in 8 lines
305
149
 
306
150
  ```typescript
307
- defineSteering({ id: 'tone', prompt: '...' }); // default: 'always'
308
- defineFact({ id: 'profile', data: '...' }); // default: 'always'
309
- defineSkill({ id: 'billing', body: '...', tools: [...] }); // default: 'while-active'
310
- defineInstruction({ id: 'urgent', activeWhen: ..., prompt: '...' }); // default: 'never'
311
- defineMemory({ id: 'causal', type: MEMORY_TYPES.CAUSAL, ... }); // default: 'while-active'
312
- ```
313
-
314
- For composition beyond the four sentinels, use the predicate form:
315
-
316
- ```typescript
317
- // Stable for the first 5 iterations, then flush:
318
- defineSteering({ id: 'examples', prompt: '...', cache: { until: ctx => ctx.iteration > 5 } });
151
+ const agent = Agent.create({ provider, model: 'claude-sonnet-4-5-20250929' })
152
+ .system('You are a support assistant.')
153
+ .steering(toneRule) // always-on
154
+ .instruction(urgentRule) // rule-gated
155
+ .skill(billingSkill) // LLM-activated
156
+ .memory(conversationMemory) // cross-run, multi-tenant
157
+ .tool(weather)
158
+ .build();
319
159
 
320
- // Invalidate when cumulative spend exceeds budget:
321
- defineFact({ id: 'rules', data: '...', cache: { until: ctx => ctx.cumulativeInputTokens > 50_000 } });
160
+ await agent.run({ message: userInput, identity: { conversationId } });
322
161
  ```
323
162
 
324
- ### What the framework does every iteration
163
+ The hand-rolled equivalent is ~80 lines of slot management, trigger evaluation, memory loading, and cache marker placement — and growing with every feature. The declarative version stays at 8.
325
164
 
326
- 1. **`CacheDecisionSubflow`** walks `activeInjections`, evaluates each one's cache directive, and emits provider-independent `CacheMarker[]`.
327
- 2. **`CacheGate decider`** uses footprintjs `decide()` with three rules — kill switch, hit-rate floor (skip when recent hit-rate < 0.3), skill-churn (skip when ≥3 unique skills in the last 5 iters). Decision evidence captured for free.
328
- 3. **The active provider strategy** (registered automatically per `LLMProvider.name`) translates markers to wire format:
329
- - `AnthropicCacheStrategy` → `cache_control` on system blocks (4-marker clamp)
330
- - `OpenAICacheStrategy` → no-op writes (auto-cached); extracts metrics from `prompt_tokens_details.cached_tokens`
331
- - `BedrockCacheStrategy` → model-aware (Anthropic-style for Claude, pass-through else)
332
- - `NoOpCacheStrategy` → wildcard fallback
333
- 4. **`cacheRecorder`** emits typed events: hit rate, fresh-input tokens, cache-read tokens, cache-write tokens, markers applied. Same observability surface as every other event domain.
165
+ > 📖 Compare: [hand-rolled vs declarative](https://footprintjs.github.io/agentfootprint/getting-started/why/) · [migration from LangChain / CrewAI / LangGraph](https://footprintjs.github.io/agentfootprint/getting-started/vs/)
334
166
 
335
- For the per-iteration cache invalidation walkthrough and the full benchmark numbers, see [`docs/guides/caching.md`](./docs/guides/caching.md).
336
-
337
- ### When to use it
167
+ ---
338
168
 
339
- Always: it's on by default. The smart defaults handle 80% of cases.
169
+ ## The differentiator: the trace is a cache of the agent's thinking
340
170
 
341
- To audit it:
171
+ Other agent frameworks remember *what was said*. agentfootprint's causal memory records the **decision evidence** — every value the flowchart captured during the run, persisted as a JSON-portable snapshot.
342
172
 
343
- ```typescript
344
- import { cacheRecorder } from 'agentfootprint';
173
+ That changes the cost structure of everything that happens after the agent runs:
345
174
 
346
- agent.attach(cacheRecorder({ onTurnEnd: (m) => console.log(m) }));
347
- // { hitRate: 0.71, freshInput: 1240, cacheRead: 9180, cacheWrite: 0, markersApplied: 2 }
348
- ```
175
+ 1. **Audit / explain** — six months later, "why was loan #42 rejected?" answers from the original evidence (creditScore=580, threshold=600), not reconstruction.
176
+ 2. **Cheap-model triage**: a trace from Sonnet is good *input* for Haiku to answer follow-up questions about that run. Memoization for agent reasoning.
177
+ 3. **Training data** — every successful production run is a labeled trajectory for SFT/DPO/process-RL, no separate data-collection phase.
349
178
 
350
- To opt out globally for a specific run:
179
+ One recording, three downstream consumers, no extra instrumentation.
351
180
 
352
- ```typescript
353
- const agent = Agent.create({ provider, caching: 'off', ... }).build();
354
- ```
181
+ > 📖 Deep dive: [Causal memory guide](https://footprintjs.github.io/agentfootprint/guides/causal-memory/)
355
182
 
356
183
  ---
357
184
 
358
185
  ## What you can build
359
186
 
360
- Three example shapes, all runnable end-to-end with `npm run example examples/<file>.ts`.
361
-
362
- ### Customer support agent (skills + memory + audit trail + cache)
363
-
364
187
  ```typescript
365
- const agent = Agent.create({ provider, model: 'claude-sonnet-4-5-20250929' })
188
+ // Customer support: skills + memory + audit + cache
189
+ const agent = Agent.create({ provider, model })
366
190
  .system('You are a friendly support assistant.')
367
- .skill(billingSkill) // LLM activates with read_skill('billing'); cached while active
368
- .steering(toneGuidelines) // always-on; cached forever
369
- .memory(conversationMemory) // remembers across .run() calls, per-tenant
191
+ .skill(billingSkill)
192
+ .steering(toneGuidelines)
193
+ .memory(conversationMemory)
370
194
  .build();
371
- ```
372
-
373
- → [`examples/context-engineering/06-mixed-flavors.ts`](examples/context-engineering/06-mixed-flavors.ts)
374
195
 
375
- ### Research pipeline (multi-agent fan-out + merge)
376
-
377
- ```typescript
196
+ // Research pipeline: multi-agent fan-out + merge
378
197
  const research = Parallel.create()
379
198
  .branch(optimist).branch(skeptic).branch(historian)
380
199
  .merge(synthesizer)
381
200
  .build();
382
201
 
383
- await research.run({ message: 'Should we adopt microservices?' });
202
+ // Streaming chat: token-by-token to a browser via SSE
203
+ agent.on('agentfootprint.stream.token', (e) => res.write(toSSE(e)));
204
+ await agent.run({ message: req.query.message });
384
205
  ```
385
206
 
386
- → [`examples/patterns/05-tot.ts`](examples/patterns/05-tot.ts) (Tree-of-Thoughts) · [`examples/patterns/01-self-consistency.ts`](examples/patterns/01-self-consistency.ts)
387
-
388
- ### Streaming chat agent (token-by-token to a browser)
389
-
390
- ```typescript
391
- import express from 'express';
392
- import { toSSE } from 'agentfootprint';
393
-
394
- app.get('/chat', async (req, res) => {
395
- res.setHeader('Content-Type', 'text/event-stream');
396
- agent.on('agentfootprint.stream.token', (e) => res.write(toSSE(e)));
397
- agent.on('agentfootprint.stream.tool_start', (e) => res.write(toSSE(e)));
398
- agent.on('agentfootprint.stream.tool_end', (e) => res.write(toSSE(e)));
399
- await agent.run({ message: req.query.message as string });
400
- res.end();
401
- });
402
- ```
207
+ > 📖 Full examples: [examples gallery](https://github.com/footprintjs/agentfootprint/tree/main/examples) · every example is also a CI test.
403
208
 
404
209
  ---
405
210
 
406
- ## The differentiator: the trace is a cache of the agent's thinking
407
-
408
- Other agent frameworks' memory remembers *what was said*. agentfootprint's `defineMemory({ type: CAUSAL })` records the **decision evidence** — every value the agent's flowchart captured during the run, persisted as a JSON-portable snapshot.
211
+ ## Mocks first, production second
409
212
 
410
- That changes the cost structure of *everything that happens after the agent runs.* The expensive thinking happened once; the recorded trace makes consuming that thinking cheap, three different ways:
213
+ Build the entire app against in-memory mocks with **zero API cost**, then swap real infrastructure one boundary at a time.
411
214
 
412
- ### 1. Audit / explain: cross-run, six months later, exact past facts
413
-
414
- ```typescript
415
- const causal = defineMemory({
416
- id: 'causal',
417
- type: MEMORY_TYPES.CAUSAL,
418
- strategy: { kind: MEMORY_STRATEGIES.TOP_K, topK: 1, threshold: 0.7, embedder },
419
- store,
420
- projection: SNAPSHOT_PROJECTIONS.DECISIONS, // inject "why" only, not "what"
421
- });
422
-
423
- // Monday: agent decides loan #42 should be rejected (creditScore=580, threshold=600).
424
- // Friday: user asks "Why was my application rejected?"
425
- // → Causal memory loads the exact decision evidence from Monday.
426
- // → LLM answers from the SOURCE, not reconstruction.
427
- ```
215
+ | Boundary | Dev | Prod |
216
+ |---|---|---|
217
+ | LLM provider | `mock(...)` | `anthropic()` · `openai()` · `bedrock()` · `ollama()` |
218
+ | Memory store | `InMemoryStore` | `RedisStore` · `AgentCoreStore` · DynamoDB / Postgres / Pinecone |
219
+ | MCP | `mockMcpClient(...)` | `mcpClient({ transport })` |
220
+ | Cache strategy | `NoOpCacheStrategy` | auto-selected per provider |
428
221
 
429
- [`examples/memory/06-causal-snapshot.ts`](examples/memory/06-causal-snapshot.ts) runs end-to-end with mock embedder, ~50 lines.
222
+ The flowchart, recorders, and tests don't change between dev and prod.
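
The boundary swap in the table above can be illustrated with a minimal sketch. It assumes a hypothetical `ChatProvider` interface and a hypothetical `scriptedMock` helper; neither is agentfootprint's actual API, and the library's real `mock(...)` and `LLMProvider` shapes may differ. The point is only that application logic written against the boundary does not change when the implementation behind it does.

```typescript
// Hypothetical boundary type for illustration; not agentfootprint's real interface.
interface ChatProvider {
  complete(prompt: string): Promise<string>;
}

// Dev implementation: scripted replies, no network, no API cost.
// Once the script runs out, the last reply is repeated.
function scriptedMock(replies: string[]): ChatProvider {
  let turn = 0;
  return {
    async complete(_prompt: string): Promise<string> {
      const reply = replies[Math.min(turn, replies.length - 1)] ?? '';
      turn += 1;
      return reply;
    },
  };
}

// Application logic depends only on the boundary, so the same function
// runs unchanged against a mock in tests and a real provider in production.
async function triage(
  provider: ChatProvider,
  ticket: string,
): Promise<'urgent' | 'normal'> {
  const answer = await provider.complete(`Is this ticket urgent? ${ticket}`);
  return answer.toLowerCase().includes('yes') ? 'urgent' : 'normal';
}
```

In a test, `triage(scriptedMock(['Yes, urgent']), 'db down')` resolves to `'urgent'` without any network call; in production, only the object passed as `provider` would change.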
430
223
 
431
- ### 2. Cheap-model triage — the trace *is* the reasoning
224
+ ---
432
225
 
433
- A trace recorded from your expensive production model (Sonnet-4, GPT-4) is a perfectly good *input* for a small, fast, cheap model (Haiku, GPT-4o-mini) answering follow-up questions about that run. The expensive model already did the work; the cheap model just **reads what's in the trace**.
226
+ ## What ships today
434
227
 
435
- Reading recorded decision evidence is structurally simpler than re-deriving the answer from first principles, so a smaller model is enough. You can compose the routing yourself: when causal memory injected a snapshot on the next turn, send that turn to a cheaper provider.
228
+ - **2 primitives** — `LLMCall`, `Agent` (the ReAct loop)
229
+ - **4 compositions** — `Sequence`, `Parallel`, `Conditional`, `Loop`
230
+ - **7 LLM providers** — Anthropic · OpenAI · Bedrock · Ollama · Browser-Anthropic · Browser-OpenAI · Mock
231
+ - **One Injection primitive** — `defineSkill` / `defineSteering` / `defineInstruction` / `defineFact`
232
+ - **One Memory factory** — 4 types × 7 strategies including **Causal**
233
+ - **Provider-agnostic prompt caching** — declarative per-injection, per-iteration marker recomputation
234
+ - **RAG · MCP · Memory store adapters** — InMemory · Redis · AgentCore
235
+ - **48+ typed observability events** across context · stream · agent · cost · skill · permission · eval · memory · cache · embedding · error
236
+ - **Pause / resume** — JSON-serializable checkpoints; resume hours later on a different server
237
+ - **Resilience** — `withRetry`, `withFallback`, `resilientProvider`
238
+ - **AI-coding-tool support** — Claude Code · Cursor · Windsurf · Cline · Kiro · Copilot
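
As a rough illustration of the retry and fallback ideas behind the resilience helpers listed above, here is a self-contained sketch. The names `retrying` and `fallingBack` and the `Complete` type are hypothetical and chosen for this example; they are not the signatures of the library's `withRetry`, `withFallback`, or `resilientProvider`, which may take different arguments and add backoff or error classification.

```typescript
// Hypothetical call shape for illustration only.
type Complete = (prompt: string) => Promise<string>;

// Retry a call up to `attempts` times, rethrowing the last error if all fail.
function retrying(call: Complete, attempts: number): Complete {
  return async (prompt) => {
    let lastError: unknown = new Error('attempts must be >= 1');
    for (let i = 0; i < attempts; i += 1) {
      try {
        return await call(prompt);
      } catch (err) {
        lastError = err; // remember the failure and try again
      }
    }
    throw lastError;
  };
}

// Try the primary call; if it throws, delegate to the secondary.
function fallingBack(primary: Complete, secondary: Complete): Complete {
  return async (prompt) => {
    try {
      return await primary(prompt);
    } catch {
      return secondary(prompt);
    }
  };
}
```

The two wrappers compose: `fallingBack(retrying(primary, 3), secondary)` retries the primary three times before handing the prompt to the secondary.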
436
239
 
437
- ```typescript
438
- const heavy = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
439
- const cheap = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
240
+ > 📖 [Full feature list & API reference](https://footprintjs.github.io/agentfootprint/reference/) · [CHANGELOG](./CHANGELOG.md)
440
241
 
441
- // Production turn — heavy model, full reasoning, snapshot persisted.
442
- const productionAgent = Agent.create({ provider: heavy, model: 'claude-sonnet-4-5-20250929' })
443
- .memory(causal)
444
- .build();
445
- await productionAgent.run({ message: 'Should we approve loan #42?', identity });
242
+ ---
446
243
 
447
- // Follow-up turn — cheaper model reads the snapshot, lower cost per turn.
448
- const followUpAgent = Agent.create({ provider: cheap, model: 'claude-haiku-4-5-20251001' })
449
- .memory(causal)
450
- .build();
451
- await followUpAgent.run({ message: 'Why was loan #42 rejected?', identity });
452
- ```
244
+ ## Roadmap
453
245
 
454
- This is memoization for agent reasoning — do the expensive work once, serve many queries from the cached result. Across a production system that handles audit / explain / "why did the agent do X?" traffic, this is real money.
246
+ | Theme | Focus |
247
+ |---|---|
248
+ | Reliability | Circuit breaker, output fallback, auto-resume-on-error |
249
+ | Causal exports | `causalMemory.exportForTraining({ format: 'sft' \| 'dpo' \| 'process' })` |
250
+ | Governance | Policies, budget tracking, production memory adapters |
251
+ | Cache v2 | Gemini handle-based caching, cost attribution |
252
+ | Deep agents | Planning-before-execution, A2A protocol, Lens UI |
455
253
 
456
- ### 3. Training data: every successful run becomes a labeled trajectory
254
+ Roadmap items are *not* current API claims. If a feature isn't in `npm install agentfootprint` today, it's listed here, not in the docs.
457
255
 
458
- The same snapshot data shape is the input to SFT / DPO / process-RL training pipelines (`causalMemory.exportForTraining({ format: 'sft' | 'dpo' | 'process' })` is on the roadmap). You don't run a separate data-collection phase — **your production traffic IS your training set.** Every successful customer interaction is a positive trajectory; every escalation or override is a counter-example.
256
+ ---
459
257
 
460
- The same JSON shape that powered the audit trail and the cheap-model follow-up is the training payload. One recording, three downstream consumers, no extra instrumentation.
258
+ ## Design philosophy
461
259
 
462
- ---
260
+ Two principles shape the runtime:
463
261
 
464
- ## Mocks first, prod second
262
+ **Connected data (Palantir, 2003).** Enterprise insight is bottlenecked by data fragmentation, not analyst skill. Agents face the same problem at runtime — disconnected tool state, lost decision evidence, scattered execution context. agentfootprint connects state, decisions, execution, and memory into one runtime footprint so the next iteration compounds the connection instead of paying for it again.
465
263
 
466
- Generative AI development is expensive when every iteration hits a paid API. agentfootprint is designed so you build the entire app (agent, context engineering, memory, RAG) against in-memory mocks, prove the logic end-to-end with **zero API cost**, then swap real infrastructure in one boundary at a time.
264
+ **Modular boundaries (Liskov, 1974).** Every framework boundary (`LLMProvider`, `ToolProvider`, `CacheStrategy`, `Recorder`, `MemoryStore`) is an LSP-substitutable interface. Swap implementations without changing agent code.
467
265
 
468
- | Boundary | Dev (mock) | Prod (swap one line) |
469
- |---|---|---|
470
- | LLM provider | `mock({ reply })` · `mock({ replies })` for scripted multi-turn | `anthropic()` · `openai()` · `bedrock()` · `ollama()` |
471
- | Embedder | `mockEmbedder()` | OpenAI / Cohere / Bedrock embedder (factories on roadmap) |
472
- | Memory store | `InMemoryStore` | `RedisStore` (`agentfootprint/memory-redis`) · `AgentCoreStore` (`agentfootprint/memory-agentcore`) · DynamoDB / Postgres / Pinecone (planned) |
473
- | MCP server | `mockMcpClient({ tools })` — in-memory, no SDK | `mcpClient({ transport })` to a real server |
474
- | Tool execution | inline closure | real implementation |
475
- | Cache strategy | `NoOpCacheStrategy` (when `mock` provider) | Auto-selected by provider: `AnthropicCacheStrategy` / `OpenAICacheStrategy` / `BedrockCacheStrategy` |
266
+ Connected data alone is fast but unmaintainable. Modular boundaries alone are clean but dumb. Together: a runtime that's both fast and reasonable.
476
267
 
477
- The flowchart, recorders, narrative, and tests don't change between dev and prod. **Ship the patterns first; pay for tokens last.**
268
+ > 📖 Long-form: [the Palantir lineage](https://footprintjs.github.io/agentfootprint/inspiration/connected-data/) · [the Liskov lineage](https://footprintjs.github.io/agentfootprint/inspiration/modularity/)
478
269
 
479
270
  ---
480
271
 
481
- ## Pick your starting door
272
+ ## Where to next
482
273
 
483
- | If you are... | Start here |
274
+ | If you are... | Go here |
484
275
  |---|---|
485
- | 🎓 **New to agents** | [5-minute Quick Start](https://footprintjs.github.io/agentfootprint/getting-started/quick-start/) → first agent runs offline |
486
- | 🛠️ **A LangChain / CrewAI / LangGraph user** | [Migration sketch](https://footprintjs.github.io/agentfootprint/getting-started/vs/) — same patterns, fewer classes |
487
- | 🏗️ **Architecting an enterprise rollout** | [Production guide](https://footprintjs.github.io/agentfootprint/guides/deployment/) — multi-tenant identity, audit trails, redaction, OTel |
488
- | 🏛️ **Doing production due diligence** | [Architecture page](https://footprintjs.github.io/agentfootprint/architecture/dependency-graph/) — 8-layer stack, hexagonal ports, the conventions SSOT |
489
- | 💡 **Curious about the design philosophy** | [Inspiration](./docs/inspiration/) — Palantir-style connected data + Liskov-style modular boundaries |
490
- | 🔬 **Researcher / extending the framework** | [Extension guide](https://footprintjs.github.io/agentfootprint/contributing/extension-guide/) — add a new flavor in 50 lines |
276
+ | New to agents | [5-minute quick start](https://footprintjs.github.io/agentfootprint/getting-started/quick-start/) |
277
+ | Coming from LangChain / CrewAI / LangGraph | [Migration guide](https://footprintjs.github.io/agentfootprint/getting-started/vs/) |
278
+ | Architecting an enterprise rollout | [Production guide](https://footprintjs.github.io/agentfootprint/guides/deployment/) |
279
+ | Doing due diligence | [Architecture overview](https://footprintjs.github.io/agentfootprint/architecture/) |
280
+ | Researcher / extending | [Extension guide](https://footprintjs.github.io/agentfootprint/contributing/extension-guide/) |
281
+ | Curious about design | [Inspiration docs](https://footprintjs.github.io/agentfootprint/inspiration/) |
491
282
 
492
- Every code snippet on the docs site is imported from a real, runnable file in [`examples/`](examples/) — every example is also an end-to-end test in CI. There is no docs-only code in this repo.
283
+ Or jump into the [examples gallery](https://github.com/footprintjs/agentfootprint/tree/main/examples) — every example is also an end-to-end CI test.
493
284
 
494
285
  ---
495
286
 
496
- ## What ships today
497
-
498
- - **2 primitives** — `LLMCall`, `Agent` (the ReAct loop)
499
- - **4 compositions** — `Sequence`, `Parallel`, `Conditional`, `Loop`
500
- - **7 LLM providers** — Anthropic · OpenAI · Bedrock · Ollama · Browser-Anthropic · Browser-OpenAI · Mock (with `mock({ replies })` for scripted multi-turn)
501
- - **One Injection primitive** — `defineSkill` / `defineSteering` / `defineInstruction` / `defineFact` (one engine, four typed factories, all reduce to `{ trigger, slot, cache }`)
502
- - **One Memory factory** — `defineMemory({ type, strategy, store })` — 4 types × 7 strategies including **Causal**
503
- - **Provider-agnostic prompt caching** — declarative `cache:` field per injection · per-iteration marker recomputation via `CacheDecisionSubflow` · registered strategies for Anthropic / OpenAI / Bedrock with `NoOp` wildcard fallback · `cacheRecorder` for hit-rate observability
504
- - **RAG** — `defineRAG()` + `indexDocuments()` (sugar over Semantic + TopK)
505
- - **MCP** — `mcpClient({ transport })` for real servers · `mockMcpClient({ tools })` for in-memory development
506
- - **Memory store adapters** — `InMemoryStore` · `RedisStore` (subpath `agentfootprint/memory-redis`) · `AgentCoreStore` (subpath `agentfootprint/memory-agentcore`)
507
- - **48+ typed observability events** across context · stream · agent · cost · skill · permission · eval · memory · cache · embedding · error · …
508
- - **Chat-bubble status surface** — `agent.enable.thinking({ onStatus })` for one-callback Claude-Code-style updates · `agentfootprint/status` subpath (`selectThinkingState` · `renderThinkingLine` · `defaultThinkingTemplates`) for custom UIs with per-tool template overrides + locale switching — see [`examples/features/06-status-subpath.md`](./examples/features/06-status-subpath.md)
509
- - **Pause / resume** — JSON-serializable checkpoints; pause via `askHuman` / `pauseHere`, resume hours later on a different server
510
- - **Resilience** — `withRetry`, `withFallback`, `resilientProvider`
511
- - **AI-coding-tool support** — bundled instructions for Claude Code · Cursor · Windsurf · Cline · Kiro · Copilot (see `ai-instructions/`)
512
- - **Runnable examples** organized by DNA layer (core · core-flow · patterns · context-engineering · memory · features) — every example is also an end-to-end CI test
287
+ ## Built on
513
288
 
514
- ## What's next (clearly marked roadmap)
289
+ [footprintjs](https://github.com/footprintjs/footPrint) — the flowchart pattern for backend code. The decision-evidence capture, narrative recording, and time-travel checkpointing this library uses are footprintjs primitives. The same way autograd's forward-pass traversal is what makes gradient inspection automatic, footprintjs's flowchart traversal is what makes agentfootprint's typed-event stream and replayable traces automatic.
515
290
 
516
- | Theme | Focus |
517
- |---|---|
518
- | **Reliability subsystem** | `CircuitBreaker` · 3-tier output fallback · auto-resume-on-error · Skills upgrades (`surfaceMode`, `refreshPolicy`) · `MockEnvironment` composer |
519
- | **Causal training-data exports** | `causalMemory.exportForTraining({ format: 'sft' \| 'dpo' \| 'process' })` — production traffic becomes labeled SFT / DPO / process-RL trajectories |
520
- | **Governance** | `Policy` · `BudgetTracker` · DynamoDB / Postgres / Pinecone memory adapters · production embedder factories |
521
- | **Cache layer v2** | Gemini handle-based caching · automatic provider routing based on causal-memory state · `cacheRecorder` cost-attribution |
522
- | **Deep Agents · A2A protocol** | Planning-before-execution · agent-to-agent protocol · Lens UI deep-link |
523
-
524
- For shipped features per release see [CHANGELOG.md](./CHANGELOG.md). Roadmap items are *not* claims about the current API — if a feature isn't in `npm install agentfootprint` today, it's listed here, not in the documentation.
291
+ You don't need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, [start there](https://footprintjs.github.io/footPrint/).
525
292
 
526
293
  ---
527
294
 
528
- ## Built on
529
-
530
- [footprintjs](https://github.com/footprintjs/footPrint) — the flowchart pattern for backend code. The decision-evidence capture, narrative recording, and time-travel checkpointing this library uses are footprintjs primitives. The same way autograd's forward-pass traversal is what makes gradient inspection automatic, footprintjs's flowchart traversal is what makes agentfootprint's typed-event stream and replayable traces automatic. You don't need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, [start there](https://footprintjs.github.io/footPrint/).
531
-
532
295
  ## License
533
296
 
534
297
  [MIT](./LICENSE) © [Sanjay Krishna Anbalagan](https://github.com/sanjay1909)