@takk/racs 1.0.0
- package/CHANGELOG.md +70 -0
- package/LICENSE +190 -0
- package/NOTICE +40 -0
- package/README.md +381 -0
- package/SECURITY.md +57 -0
- package/dist/cli/index.js +3016 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/edge/index.cjs +2000 -0
- package/dist/edge/index.cjs.map +1 -0
- package/dist/edge/index.d.cts +598 -0
- package/dist/edge/index.d.ts +598 -0
- package/dist/edge/index.js +1987 -0
- package/dist/edge/index.js.map +1 -0
- package/dist/index.cjs +2071 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +39 -0
- package/dist/index.d.ts +39 -0
- package/dist/index.js +2057 -0
- package/dist/index.js.map +1 -0
- package/dist/integrations/index.cjs +123 -0
- package/dist/integrations/index.cjs.map +1 -0
- package/dist/integrations/index.d.cts +285 -0
- package/dist/integrations/index.d.ts +285 -0
- package/dist/integrations/index.js +117 -0
- package/dist/integrations/index.js.map +1 -0
- package/dist/otel/index.cjs +93 -0
- package/dist/otel/index.cjs.map +1 -0
- package/dist/otel/index.d.cts +105 -0
- package/dist/otel/index.d.ts +105 -0
- package/dist/otel/index.js +91 -0
- package/dist/otel/index.js.map +1 -0
- package/dist/types-DQ7-9sk3.d.cts +758 -0
- package/dist/types-DQ7-9sk3.d.ts +758 -0
- package/dist/vercel/index.cjs +209 -0
- package/dist/vercel/index.cjs.map +1 -0
- package/dist/vercel/index.d.cts +210 -0
- package/dist/vercel/index.d.ts +210 -0
- package/dist/vercel/index.js +206 -0
- package/dist/vercel/index.js.map +1 -0
- package/dist/web/index.cjs +2000 -0
- package/dist/web/index.cjs.map +1 -0
- package/dist/web/index.d.cts +2 -0
- package/dist/web/index.d.ts +2 -0
- package/dist/web/index.js +1987 -0
- package/dist/web/index.js.map +1 -0
- package/package.json +189 -0
package/README.md
ADDED
|
@@ -0,0 +1,381 @@
|
|
|
1
|
+
# RACS NPM
|
|
2
|
+
|
|
3
|
+
[](https://www.npmjs.com/package/@takk/racs)
|
|
4
|
+
[](./LICENSE)
|
|
5
|
+
[](./SPEC.md)
|
|
6
|
+
[](./SPEC.md#12-the-planning-benchmark-p1-p10)
|
|
7
|
+
[](./SPEC.md)
|
|
8
|
+
[](./package.json)
|
|
9
|
+
|
|
10
|
+
<p align="center">
|
|
11
|
+
<img src="https://raw.githubusercontent.com/davccavalcante/racs/main/assets/racs.png" alt="RACS" width="500">
|
|
12
|
+
</p>
|
|
13
|
+
|
|
14
|
+
RACS (Remote Agent Context Store) plans provider-faithful prefix-cache directives for Massive Intelligence (IM) agent workloads, and it never calls a provider API. That is the product invariant: zero credentials, zero network, the host stays in full control of its own calls, retries, and transport. RACS tells you where to place `cache_control` breakpoints, which `prompt_cache_key` to send, when to create, refresh, or delete a Gemini `cachedContent` resource, and when a cache write would lose money. You apply the directives to the call you were already making, then report the usage counters you already received, and RACS accounts for every cached token.
|
|
15
|
+
|
|
16
|
+
Why it exists: prefix caching saves 41 to 80 percent of input spend in measured agent workloads (arXiv 2601.06007, January 2026), and the documented production failure is volatile content silently busting the cache. One OpenClaw issue measured ten times the expected cost from timestamps interpolated into a system prompt. Providers also disagree on semantics: Anthropic wants explicit breakpoints, OpenAI caches automatically behind routing keys with no write counter, Gemini bills server-side cached content by the token-hour. As of June 2026 we found no shipping npm package that combines stability linting, multi-provider directive planning, drift detection, persistence, and savings analytics, and the Hermes Agent ecosystem has multiple open issues asking for exactly these capabilities.
|
|
17
|
+
|
|
18
|
+
What ships: 16 provider profiles over 4 adapter families, 9 lint codes catching the documented cache-killers, deterministic prefix keys (FNV-1a 64), break-even math, TTL keep-warm scheduling, prefix drift detection, hit-ratio and USD savings analytics with user-supplied pricing, memory, file, and KV persistence, a Vercel AI SDK middleware, OpenTelemetry GenAI ingestion, bridges for the @takk family, and a CLI with a hardened HTTP bridge. Zero runtime dependencies. 187 tests across 13 suites on Node 22 and Node 24.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## See it run
|
|
23
|
+
|
|
24
|
+
This is the literal output of the deterministic simulation that ships in the CLI. Same flags, same bytes, every run:
|
|
25
|
+
|
|
26
|
+
```console
|
|
27
|
+
$ racs simulate --calls 400 --seed 7
|
|
28
|
+
racs simulate: 400 calls, seed 7, interval 60s, provider anthropic
|
|
29
|
+
structured lint: clean
|
|
30
|
+
naive lint:
|
|
31
|
+
LINT warning segment-order naive-turn Volatile segment 'naive-turn' precedes stable segment 'naive-tools'. Prefix caches are left-anchored, so every token after 'naive-turn' is unreachable for the cache. Reorder stable-first: move 'naive-tools' and every other stable segment ahead of 'naive-turn'.
|
|
32
|
+
LINT warning timestamp-in-stable naive-system Segment 'naive-system' is declared stable but contains an ISO-8601 datetime (digest 9d6f8366), and the words 'today' or 'current time' near digits. A timestamp changes the prefix on every call and silently defeats the cache. Move live time values into a volatile segment at the prompt tail.
|
|
33
|
+
drift naive: 217dbd595cc63a93 -> 7dfee9d9ed5102b9, segments [naive-system], 3000 tokens invalidated (call 2)
|
|
34
|
+
drift naive: 7dfee9d9ed5102b9 -> 44338bff92d176f9, segments [naive-system], 3000 tokens invalidated (call 3)
|
|
35
|
+
drift naive: 44338bff92d176f9 -> afa57f324ffa6032, segments [naive-system], 3000 tokens invalidated (call 4)
|
|
36
|
+
[... 399 lines hidden: 396 further drift reports (calls 5 through 400) and the progress lines at calls 100, 200, and 300 ...]
|
|
37
|
+
progress: 400/400 calls, structured hits 399, naive hits 0
|
|
38
|
+
--- summary ---
|
|
39
|
+
calls: 400
|
|
40
|
+
structured: hit ratio 0.96, net savings 9.87 USD
|
|
41
|
+
naive: hit ratio 0.00, write-premium loss 1.50 USD
|
|
42
|
+
structured prompt saves $11.37 (88.1%) versus naive
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
Two prompts with identical content run side by side. The structured one orders stable segments first and hits the cache on 399 of 400 calls. The naive one interpolates a timestamp into its "stable" system prompt, so the prefix key drifts on every call, it never reads a single cached token, and it pays the write premium 400 times. RACS catches the bug at lint time, names the segment, and quantifies the loss.
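Why the naive key drifts can be seen with a few lines of hashing. The sketch below is a plain FNV-1a 64 implementation (the hash family named above for prefix keys) applied to an ordered list of segment strings. How RACS actually frames segments before hashing is not documented in this README, so the `prefixKeySketch` helper and its `\u0000` separator are assumptions for illustration only.

```ts
// FNV-1a 64-bit over UTF-8 bytes, using the standard published constants.
const FNV_OFFSET_BASIS = BigInt('0xcbf29ce484222325');
const FNV_PRIME = BigInt('0x100000001b3');

function fnv1a64(text: string): string {
  const bytes = new TextEncoder().encode(text);
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i += 1) {
    // XOR the byte in first, then multiply, truncating to 64 bits.
    hash = BigInt.asUintN(64, (hash ^ BigInt(bytes[i])) * FNV_PRIME);
  }
  return hash.toString(16).padStart(16, '0');
}

// Hypothetical helper: key an ordered prefix by joining its segment texts.
function prefixKeySketch(segments: readonly string[]): string {
  return fnv1a64(segments.join('\u0000'));
}

const toolsJson = '[{"name":"search"}]';
const stableKeyA = prefixKeySketch(['You are a support agent.', toolsJson]);
const stableKeyB = prefixKeySketch(['You are a support agent.', toolsJson]);
const naiveKeyA = prefixKeySketch(['Now: 2026-06-01T10:00:00Z. You are a support agent.', toolsJson]);
const naiveKeyB = prefixKeySketch(['Now: 2026-06-01T10:01:00Z. You are a support agent.', toolsJson]);

console.log(stableKeyA === stableKeyB); // true: same bytes, same key
console.log(naiveKeyA === naiveKeyB); // false: one changed minute, new key
```

A single changed byte anywhere in the prefix yields a different key, which is the property that turns an interpolated timestamp into a cache miss on every call.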
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
## Quickstart
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
pnpm add @takk/racs
|
|
53
|
+
# or: npm install @takk/racs
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
Plan, apply, record. The whole integration is this loop:
|
|
57
|
+
|
|
58
|
+
```ts
|
|
59
|
+
import { createRACS } from '@takk/racs';
|
|
60
|
+
|
|
61
|
+
const racs = createRACS({
|
|
62
|
+
// Pricing is always user-supplied; without it you still get every
|
|
63
|
+
// token-denominated statistic, just no USD figures.
|
|
64
|
+
pricing: {
|
|
65
|
+
'claude-sonnet-4-5': {
|
|
66
|
+
inputPerMTok: 3,
|
|
67
|
+
cacheReadPerMTok: 0.3,
|
|
68
|
+
cacheWrite5mPerMTok: 3.75,
|
|
69
|
+
cacheWrite1hPerMTok: 6,
|
|
70
|
+
},
|
|
71
|
+
},
|
|
72
|
+
});
|
|
73
|
+
|
|
74
|
+
const plan = racs.plan({
|
|
75
|
+
agentId: 'support-agent',
|
|
76
|
+
provider: 'anthropic',
|
|
77
|
+
model: 'claude-sonnet-4-5',
|
|
78
|
+
segments: [
|
|
79
|
+
{ id: 'system', role: 'system', stability: 'stable', content: SYSTEM_PROMPT },
|
|
80
|
+
{ id: 'tools', role: 'tools', stability: 'stable', content: TOOLS_JSON },
|
|
81
|
+
{ id: 'history', role: 'history', stability: 'semi', content: historyText },
|
|
82
|
+
{ id: 'turn', role: 'dynamic', stability: 'volatile', content: userTurn },
|
|
83
|
+
],
|
|
84
|
+
reuse: { intervalSeconds: 60 },
|
|
85
|
+
});
|
|
86
|
+
|
|
87
|
+
// Gate on the lint pass: an error-severity finding means the prompt as
|
|
88
|
+
// declared cannot achieve cache hits.
|
|
89
|
+
const fatal = plan.findings.find((finding) => finding.severity === 'error');
|
|
90
|
+
if (fatal !== undefined) throw new Error(`${fatal.code}: ${fatal.message}`);
|
|
91
|
+
|
|
92
|
+
// plan.directives for this input:
|
|
93
|
+
// [{ kind: 'breakpoint', segmentId: 'system', ttl: '5m' },
|
|
94
|
+
// { kind: 'breakpoint', segmentId: 'tools', ttl: '5m' },
|
|
95
|
+
// { kind: 'breakpoint', segmentId: 'history', ttl: '5m' }]
|
|
96
|
+
// Apply them to your own Anthropic call: a cache_control marker on the
|
|
97
|
+
// last content block of each named segment. RACS never makes that call.
|
|
98
|
+
|
|
99
|
+
// After your call returns, report the usage counters it already carried:
|
|
100
|
+
racs.record({
|
|
101
|
+
provider: 'anthropic',
|
|
102
|
+
model: 'claude-sonnet-4-5',
|
|
103
|
+
prefixKey: plan.prefixKey,
|
|
104
|
+
inputTokens: 5200,
|
|
105
|
+
cacheReadTokens: 4600,
|
|
106
|
+
});
|
|
107
|
+
|
|
108
|
+
const stats = racs.stats();
|
|
109
|
+
console.log(stats.hitRatio, stats.savedUsd, stats.netUsd);
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
`CacheUsage.inputTokens` is the all-in billed input (fresh input + cached reads + cache writes); the otel and vercel adapters normalize raw exclusive provider counts to this convention automatically.
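For hosts that call `record` by hand, the all-in convention can be sketched as follows. This assumes the raw Anthropic usage shape, where `input_tokens` excludes cached reads and cache writes, and the OpenAI shape, where `prompt_tokens` already includes `cached_tokens`. The `cacheWriteTokens` field name is a guess; this README only shows `inputTokens` and `cacheReadTokens`.

```ts
interface NormalizedUsage {
  inputTokens: number; // all-in: fresh input + cached reads + cache writes
  cacheReadTokens: number;
  cacheWriteTokens: number; // hypothetical field name
}

// Anthropic reports the three counters exclusively, so the all-in total is their sum.
function fromAnthropicUsage(usage: {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}): NormalizedUsage {
  const read = usage.cache_read_input_tokens ?? 0;
  const write = usage.cache_creation_input_tokens ?? 0;
  return {
    inputTokens: usage.input_tokens + read + write,
    cacheReadTokens: read,
    cacheWriteTokens: write,
  };
}

// OpenAI's prompt_tokens is already all-in, and there is no write counter.
function fromOpenAIUsage(usage: {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}): NormalizedUsage {
  return {
    inputTokens: usage.prompt_tokens,
    cacheReadTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
    cacheWriteTokens: 0,
  };
}

// Hit ratio as cached reads over all-in input.
function hitRatioOf(usage: NormalizedUsage): number {
  return usage.inputTokens === 0 ? 0 : usage.cacheReadTokens / usage.inputTokens;
}

const normalized = fromAnthropicUsage({ input_tokens: 600, cache_read_input_tokens: 4600 });
console.log(normalized.inputTokens); // 5200
```

The example reproduces the quickstart's figures: 600 fresh tokens plus 4600 cached reads is the 5200 all-in input passed to `racs.record`.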
|
|
113
|
+
|
|
114
|
+
The full engine surface is `plan`, `lint`, `record`, `stats`, `schedule`, `markRefreshed`, `drifts`, `invalidate`, `profileOf`, `on`, `flush`, and `close`, specified in [SPEC.md](./SPEC.md).
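The "apply" step of the loop is left to the host. A minimal sketch of it for Anthropic text blocks follows, assuming the host keeps a map from segment id to the content blocks it built for that segment. The directive shape mirrors the `plan.directives` comment in the quickstart; `applyBreakpoints` and the map are hypothetical host-side helpers, not part of the package.

```ts
type Ttl = '5m' | '1h';

interface BreakpointDirective {
  kind: 'breakpoint';
  segmentId: string;
  ttl: Ttl;
}

// Shape of an Anthropic text content block with an optional cache marker.
interface TextBlock {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral'; ttl?: Ttl };
}

// Hypothetical helper: mark the last content block of each named segment.
function applyBreakpoints(
  blocksBySegment: ReadonlyMap<string, TextBlock[]>,
  directives: readonly BreakpointDirective[],
): void {
  for (const directive of directives) {
    const blocks = blocksBySegment.get(directive.segmentId);
    if (blocks === undefined || blocks.length === 0) continue; // unknown segment: skip
    const last = blocks[blocks.length - 1];
    if (last !== undefined) last.cache_control = { type: 'ephemeral', ttl: directive.ttl };
  }
}

const systemBlocks: TextBlock[] = [
  { type: 'text', text: 'You are a support agent.' },
  { type: 'text', text: 'Policy: be concise.' },
];
const bySegment = new Map<string, TextBlock[]>([['system', systemBlocks]]);
const directives: BreakpointDirective[] = [{ kind: 'breakpoint', segmentId: 'system', ttl: '5m' }];

applyBreakpoints(bySegment, directives);
console.log(JSON.stringify(systemBlocks[1]?.cache_control)); // {"type":"ephemeral","ttl":"5m"}
```

Tool definitions carry their marker on the tool object rather than on a text block, so a real host would need a second branch for the `tools` segment.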
|
|
115
|
+
|
|
116
|
+
---
|
|
117
|
+
|
|
118
|
+
## Provider matrix
|
|
119
|
+
|
|
120
|
+
Every named provider is a thin profile over exactly one of four adapter families, which is how 16 providers ship without 16 code paths. All numbers document provider semantics as researched in June 2026 and are overridable per engine instance through `options.profiles`.
|
|
121
|
+
|
|
122
|
+
| Provider | Family | Mechanism |
|
|
123
|
+
|---|---|---|
|
|
124
|
+
| `anthropic` | breakpoint | Explicit `cache_control` markers, up to 4 per request, 5m and 1h TTL tiers, 1.25x and 2x write premiums, 0.1x reads |
|
|
125
|
+
| `bedrock` | breakpoint | `cachePoint` blocks on the Converse API, Anthropic-equivalent breakpoint semantics and multipliers |
|
|
126
|
+
| `hermes` | breakpoint | Hermes Agent's fixed system_and_3 layout rides `cache_control`; RACS plans superior layouts for it; 1024-token cacheable minimum as on Anthropic |
|
|
127
|
+
| `microsoft-foundry` | breakpoint | Claude models on Microsoft Foundry honor `cache_control` unchanged |
|
|
128
|
+
| `openai` | routing-key | Automatic server-side caching in 128-token increments above 1024, `prompt_cache_key` stickiness, optional 24h retention, no write counter |
|
|
129
|
+
| `xai` | routing-key | Automatic prefix caching, steerable via the `x-grok-conv-id` header and `prompt_cache_key` |
|
|
130
|
+
| `mistral` | routing-key | Automatic caching in 64-token blocks with `prompt_cache_key` routing |
|
|
131
|
+
| `moonshot` | routing-key | Kimi platform caching through the OpenAI-compatible surface, conservative defaults |
|
|
132
|
+
| `openrouter` | routing-key | Normalizes `cache_control` passthrough and `cached_tokens` reporting across upstream providers |
|
|
133
|
+
| `google` | resource | `cachedContent` lifecycle: create, reuse, refresh, delete, caller-set TTL, per-token-hour storage billing |
|
|
134
|
+
| `groq` | passive | Automatic on gpt-oss models, no control surface, entries expire after roughly 2 hours idle |
|
|
135
|
+
| `deepseek` | passive | Disk-based automatic context caching with hit and miss token reporting |
|
|
136
|
+
| `ollama` | passive | Local runtime KV reuse, no billing dimension, analytics measure latency-motivated reuse |
|
|
137
|
+
| `lmstudio` | passive | Local runtime KV reuse, same posture as Ollama |
|
|
138
|
+
| `huggingface` | passive | Inference Endpoints expose no public prefix-cache controls as of June 2026 |
|
|
139
|
+
| `custom` | passive (default) | Escape hatch, fully caller-defined through `options.profiles` |
|
|
140
|
+
|
|
141
|
+
Passive providers still get the full value of segment ordering, linting, and usage accounting; the ordering itself is the optimization.
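The block-granular rows in the table imply a simple estimate of how much of a prefix an automatic cache can match. The sketch below encodes the `openai` row (1024-token minimum, 128-token increments); `cacheableTokens` is an illustrative helper, not a package export, and real provider behavior may differ from this arithmetic.

```ts
// Tokens a block-granular automatic cache could match for a stable prefix.
function cacheableTokens(prefixTokens: number, minimum: number, blockSize: number): number {
  if (prefixTokens < minimum) return 0; // below the minimum nothing is cached
  const extraBlocks = Math.floor((prefixTokens - minimum) / blockSize);
  return minimum + extraBlocks * blockSize;
}

console.log(cacheableTokens(1000, 1024, 128)); // 0
console.log(cacheableTokens(1100, 1024, 128)); // 1024
console.log(cacheableTokens(1300, 1024, 128)); // 1280
```

The gap between the prefix length and the returned value is the tail that is always billed as fresh input, which is one more reason to keep the stable prefix long and the volatile tail short.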
|
|
142
|
+
|
|
143
|
+
---
|
|
144
|
+
|
|
145
|
+
## Lint codes
|
|
146
|
+
|
|
147
|
+
Nine codes, each a documented production cache-killer the analyzer detects from structure alone. Errors defeat caching, warnings degrade it, info advises.
|
|
148
|
+
|
|
149
|
+
| Code | What it catches | Severity |
|
|
150
|
+
|---|---|---|
|
|
151
|
+
| `volatile-early` | A volatile segment in the first half of the prompt before any breakpoint-eligible boundary; nothing after it can ever be cached | error |
|
|
152
|
+
| `unstable-tools` | A tools segment declared semi or volatile; almost always a serialization bug (key order, timestamps in descriptions) | error |
|
|
153
|
+
| `breakpoint-after-volatile` | A breakpoint would land after a volatile segment; the written span could never be read back | error |
|
|
154
|
+
| `timestamp-in-stable` | ISO-8601 datetimes, unix epochs, or "today"/"current time" near digits inside a stable or semi segment | warning |
|
|
155
|
+
| `identifier-in-stable` | UUID v4 shapes, long hex runs, or base64-like runs inside a stable segment (session ids, request ids) | warning |
|
|
156
|
+
| `write-premium-trap` | Declared reuse does not repay the cache write premium inside the TTL window; caching this prefix loses money | warning |
|
|
157
|
+
| `segment-order` | Segments are not ordered stable-first; reordering would lengthen the cacheable prefix without semantic change | warning |
|
|
158
|
+
| `below-minimum` | The stable prefix is shorter than the provider minimum; the provider would silently cache nothing | info |
|
|
159
|
+
| `missing-stability` | A contradictory or unusable stability declaration (guards untyped JavaScript callers) | info |
|
|
160
|
+
|
|
161
|
+
Run the lint pass standalone with `racs.lint(input)` or gate prompt changes in CI with `racs analyze --input prompts.json`, which exits 1 on any error-severity finding.
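To show what two of these checks look at, here is a deliberately small sketch of `segment-order` and `timestamp-in-stable`. It is not the package's analyzer: the regular expression covers only one ISO-8601 shape, the types are local to the example, and the real lint also handles epochs, digests, and the other seven codes.

```ts
type Stability = 'stable' | 'semi' | 'volatile';

interface Segment {
  id: string;
  stability: Stability;
  content: string;
}

interface Finding {
  code: 'segment-order' | 'timestamp-in-stable';
  severity: 'warning';
  segmentId: string;
}

// Simplified ISO-8601 datetime shape, e.g. 2026-06-01T10:00:00Z.
const ISO_DATETIME = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?/;

function lintSketch(segments: readonly Segment[]): Finding[] {
  const findings: Finding[] = [];
  let firstVolatile: Segment | undefined;
  let orderReported = false;
  for (const segment of segments) {
    if (segment.stability === 'volatile') {
      if (firstVolatile === undefined) firstVolatile = segment;
      continue;
    }
    // A stable segment after a volatile one is unreachable for a left-anchored cache.
    if (firstVolatile !== undefined && segment.stability === 'stable' && !orderReported) {
      findings.push({ code: 'segment-order', severity: 'warning', segmentId: firstVolatile.id });
      orderReported = true;
    }
    // A datetime inside a stable or semi segment changes the prefix on every call.
    if (ISO_DATETIME.test(segment.content)) {
      findings.push({ code: 'timestamp-in-stable', severity: 'warning', segmentId: segment.id });
    }
  }
  return findings;
}

const naive: Segment[] = [
  { id: 'naive-system', stability: 'stable', content: 'Now: 2026-06-01T10:00:00Z. You are helpful.' },
  { id: 'naive-turn', stability: 'volatile', content: 'Where is my order?' },
  { id: 'naive-tools', stability: 'stable', content: '[{"name":"search"}]' },
];

for (const finding of lintSketch(naive)) {
  console.log(finding.severity, finding.code, finding.segmentId);
}
```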
|
|
162
|
+
|
|
163
|
+
---
|
|
164
|
+
|
|
165
|
+
## Break-even math
|
|
166
|
+
|
|
167
|
+
Cache writes cost a premium, and RACS refuses to recommend a write that will not pay for itself. The math is stated in base-input-token equivalents because the multipliers are price-relative, so it holds with or without a pricing table.
|
|
168
|
+
|
|
169
|
+
Worked example, Anthropic 5-minute tier, 4000 stable tokens:
|
|
170
|
+
|
|
171
|
+
```text
|
|
172
|
+
write multiplier 1.25 (5m tier)
|
|
173
|
+
read multiplier 0.1
|
|
174
|
+
|
|
175
|
+
writePremiumTokens = 4000 * (1.25 - 1) = 1000 token equivalents
|
|
176
|
+
savingsPerReuse = 4000 * (1 - 0.1) = 3600 token equivalents
|
|
177
|
+
minReusesToProfit = ceil(1000 / 3600) = 1
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
One read inside the window already repays the write. The same prefix on the 1-hour tier (2x write multiplier) costs a 4000-token premium and needs 2 reuses. When the declared `reuse` pattern cannot reach `minReusesToProfit` inside the TTL window, the plan carries a `write-premium-trap` warning, and where caching can only lose money the directive is an explicit `{ kind: 'none', reason }` instead of a trap.
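The worked example translates directly into code. The sketch below restates the three formulas and checks both tiers; `breakEvenSketch` is a local helper written for this README, not the engine's `breakEven` field, and the zero-savings guard is an added assumption.

```ts
interface BreakEvenSketch {
  writePremiumTokens: number; // extra cost of the write, in base-input-token equivalents
  savingsPerReuse: number; // saved on each cached read, same unit
  minReusesToProfit: number;
}

function breakEvenSketch(
  stableTokens: number,
  writeMultiplier: number,
  readMultiplier: number,
): BreakEvenSketch {
  const writePremiumTokens = stableTokens * (writeMultiplier - 1);
  const savingsPerReuse = stableTokens * (1 - readMultiplier);
  let minReusesToProfit: number;
  if (writePremiumTokens <= 0) {
    minReusesToProfit = 0; // free writes: nothing to repay
  } else if (savingsPerReuse <= 0) {
    minReusesToProfit = Infinity; // reads save nothing: the write can never pay off
  } else {
    minReusesToProfit = Math.ceil(writePremiumTokens / savingsPerReuse);
  }
  return { writePremiumTokens, savingsPerReuse, minReusesToProfit };
}

console.log(breakEvenSketch(4000, 1.25, 0.1).minReusesToProfit); // 1 (5m tier)
console.log(breakEvenSketch(4000, 2, 0.1).minReusesToProfit); // 2 (1h tier)
```

Because every term scales with `stableTokens`, the reuse count depends only on the two multipliers, which is why the result holds without a pricing table.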
|
|
181
|
+
|
|
182
|
+
For resource-family providers the same logic runs in USD against per-token-hour storage: below roughly one reuse per hour, keeping a Gemini `cachedContent` alive costs more in storage than the reads save, and RACS says so.
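The storage trade-off reduces to one ratio. The sketch below is an assumption-laden illustration: it ignores the creation cost, `storagePerMTokHour` is a hypothetical field name, and the prices are round placeholders chosen to make the arithmetic obvious, not any provider's real rates.

```ts
interface ResourcePricingSketch {
  inputPerMTok: number;
  cacheReadPerMTok: number;
  storagePerMTokHour: number; // hypothetical field name
}

// Reuses per hour at which read savings exactly cover storage.
function breakEvenReusesPerHour(pricing: ResourcePricingSketch): number {
  const savedPerMTok = pricing.inputPerMTok - pricing.cacheReadPerMTok;
  return savedPerMTok <= 0 ? Infinity : pricing.storagePerMTokHour / savedPerMTok;
}

// Net USD per hour of keeping `tokens` cached at a given reuse rate.
function netUsdPerHour(tokens: number, reusesPerHour: number, pricing: ResourcePricingSketch): number {
  const mtok = tokens / 1e6;
  const savedPerMTok = pricing.inputPerMTok - pricing.cacheReadPerMTok;
  return mtok * (reusesPerHour * savedPerMTok - pricing.storagePerMTokHour);
}

// Placeholder prices only.
const placeholder: ResourcePricingSketch = {
  inputPerMTok: 2,
  cacheReadPerMTok: 0.5,
  storagePerMTokHour: 1.5,
};

console.log(breakEvenReusesPerHour(placeholder)); // 1
console.log(netUsdPerHour(1_000_000, 0.5, placeholder) < 0); // true: storage outweighs savings
```

The token count cancels out of the break-even rate, so the decision to keep a resource alive depends on reuse density and the price ratios alone.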
|
|
183
|
+
|
|
184
|
+
### Keep-warm scheduling
|
|
185
|
+
|
|
186
|
+
Breakpoint and resource caches expire on a TTL, and a touch shortly before expiry keeps them warm at read price instead of paying the write premium again. `racs.schedule()` returns every prefix whose refresh is due, scheduled at 90 percent of the TTL window after the last write, early enough to absorb timer jitter, late enough not to waste reads. The host runs the timer and the warming call, then reports it with `racs.markRefreshed(prefixKey)`.
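The 90 percent rule can be sketched in a few lines. The entry shape and the `dueForRefresh` helper below are hypothetical, and skipping already-expired entries is a choice made for this example; the return shape of the real `racs.schedule()` is specified in SPEC.md, not here.

```ts
interface WarmEntry {
  prefixKey: string;
  lastWriteMs: number; // epoch milliseconds of the last write or refresh
  ttlSeconds: number;
}

// Refresh falls due at 90 percent of the TTL window after the last write.
function refreshDueAtMs(entry: WarmEntry): number {
  return entry.lastWriteMs + (entry.ttlSeconds * 1000 * 9) / 10;
}

// Prefixes whose refresh is due but whose cache entry has not yet expired.
function dueForRefresh(entries: readonly WarmEntry[], nowMs: number): string[] {
  return entries
    .filter((entry) => {
      const expiresAtMs = entry.lastWriteMs + entry.ttlSeconds * 1000;
      return nowMs >= refreshDueAtMs(entry) && nowMs < expiresAtMs;
    })
    .map((entry) => entry.prefixKey);
}

const entries: WarmEntry[] = [
  { prefixKey: 'a', lastWriteMs: 0, ttlSeconds: 300 }, // due at 270 s, expires at 300 s
  { prefixKey: 'b', lastWriteMs: 100_000, ttlSeconds: 300 }, // due at 370 s
];

console.log(dueForRefresh(entries, 280_000)); // [ 'a' ]
```

On the 5-minute tier this leaves a 30-second margin for the host's timer and the warming call to land before the entry expires.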
|
|
187
|
+
|
|
188
|
+
### Caching MCP tool descriptions
|
|
189
|
+
|
|
190
|
+
An MCP server's tool list is the ideal cache prefix: tool schemas and descriptions routinely run thousands of tokens, the list is byte-stable between calls, and the agent replays it on every request. Serialize the `tools/list` response into a `'tools'` segment and let the planner place the marker:
|
|
191
|
+
|
|
192
|
+
```ts
|
|
193
|
+
const tools = JSON.stringify(toolListResponse.tools); // serialized MCP tools/list result
|
|
194
|
+
const plan = racs.plan({
|
|
195
|
+
provider: 'anthropic',
|
|
196
|
+
model: 'claude-sonnet-4-5',
|
|
197
|
+
segments: [{ id: 'mcp-tools', role: 'tools', stability: 'stable', content: tools }],
|
|
198
|
+
});
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
The `'tools'` role carries the highest breakpoint placement weight, so the marker lands exactly where the provider hashes first. Keep the serialization deterministic (same key order, no timestamps in descriptions), or the `unstable-tools` lint will name the bug. The runnable version, a literal tool-list shape with no MCP SDK import, is [examples/mcp-tools-segment.ts](./examples/mcp-tools-segment.ts).
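Deterministic serialization is the part hosts most often get wrong, because `JSON.stringify` preserves insertion order and two builds of the same tool list can emit different bytes. One way to pin the bytes is a key-sorting serializer; `stableStringify` below is an illustrative helper written for this README, not a package export.

```ts
// JSON serialization with object keys sorted at every depth.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order is meaningful and is preserved.
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const members = Object.keys(record)
      .sort()
      .filter((key) => record[key] !== undefined) // mirror JSON.stringify: drop undefined
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${members.join(',')}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

const buildA = [{ name: 'search', description: 'Find docs', inputSchema: { type: 'object' } }];
const buildB = [{ inputSchema: { type: 'object' }, description: 'Find docs', name: 'search' }];

console.log(JSON.stringify(buildA) === JSON.stringify(buildB)); // false
console.log(stableStringify(buildA) === stableStringify(buildB)); // true
```

Pass the output of such a serializer as the `content` of the `'tools'` segment so that the prefix key stays fixed across process restarts and SDK upgrades.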
|
|
202
|
+
|
|
203
|
+
---
|
|
204
|
+
|
|
205
|
+
## Persistence
|
|
206
|
+
|
|
207
|
+
State backends persist fingerprints, schedules, the resource registry, and ledger aggregates, never prompt content. Three ship in the box: `memoryState` (default), `fileState` (Node), and `kvState`, which wraps any string key-value client in one line:
|
|
208
|
+
|
|
209
|
+
```ts
|
|
210
|
+
import { createRACS, kvState } from '@takk/racs';
|
|
211
|
+
|
|
212
|
+
// Any Redis client (ioredis, node-redis), one line:
|
|
213
|
+
const state = kvState({
|
|
214
|
+
get: (k) => redis.get(k),
|
|
215
|
+
set: (k, v) => redis.set(k, v),
|
|
216
|
+
delete: (k) => redis.del(k),
|
|
217
|
+
});
|
|
218
|
+
|
|
219
|
+
// Upstash Redis: get: (k) => upstash.get<string>(k), set: (k, v) => upstash.set(k, v), delete: (k) => upstash.del(k)
|
|
220
|
+
// Cloudflare KV: get: (k) => env.RACS_KV.get(k), set: (k, v) => env.RACS_KV.put(k, v), delete: (k) => env.RACS_KV.delete(k)
|
|
221
|
+
|
|
222
|
+
const racs = createRACS({ state });
|
|
223
|
+
// ... plan and record as usual ...
|
|
224
|
+
await racs.flush(); // snapshot saved; a new engine with the same backend restores it
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
RACS never constructs the client and never sees connection credentials; the host passes a ready object.
|
|
228
|
+
|
|
229
|
+
---
|
|
230
|
+
|
|
231
|
+
## Vercel AI SDK middleware
|
|
232
|
+
|
|
233
|
+
One middleware object plans before each call, applies directives through `providerOptions`, and records the usage the provider reports back, including streamed calls via `wrapStream`:
|
|
234
|
+
|
|
235
|
+
```ts
|
|
236
|
+
import { anthropic } from '@ai-sdk/anthropic';
|
|
237
|
+
import { wrapLanguageModel } from 'ai';
|
|
238
|
+
import { createRACS } from '@takk/racs';
|
|
239
|
+
import { racsMiddleware } from '@takk/racs/vercel';
|
|
240
|
+
|
|
241
|
+
const racs = createRACS();
|
|
242
|
+
const model = wrapLanguageModel({
|
|
243
|
+
model: anthropic('claude-sonnet-4-5'),
|
|
244
|
+
middleware: racsMiddleware(racs, { provider: 'anthropic', model: 'claude-sonnet-4-5' }),
|
|
245
|
+
});
|
|
246
|
+
|
|
247
|
+
// Use `model` with generateText or streamText as usual, then:
|
|
248
|
+
const { hitRatio, savedUsd } = racs.stats();
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
The middleware is structural: it matches the `LanguageModelV3Middleware` contract without importing the `ai` package, so the zero-dependency invariant survives. A custom `segmenter` lets the host declare its own prompt anatomy when the default segmentation is not enough.
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
## OpenTelemetry ingestion
|
|
256
|
+
|
|
257
|
+
If your stack already emits GenAI spans, RACS ingests them directly. `usageFromSpan` reads token counters and identity only, never `gen_ai.prompt`, `gen_ai.completion`, or any content attribute:
|
|
258
|
+
|
|
259
|
+
```ts
|
|
260
|
+
import { createRACS } from '@takk/racs';
|
|
261
|
+
import { usageFromSpan, type GenAISpanLike } from '@takk/racs/otel';
|
|
262
|
+
|
|
263
|
+
const racs = createRACS();
|
|
264
|
+
|
|
265
|
+
// Inside a span processor's onEnd, an OTLP collector hook, or wherever
|
|
266
|
+
// finished spans surface in your host:
|
|
267
|
+
function onSpanEnd(span: GenAISpanLike): void {
|
|
268
|
+
const usage = usageFromSpan(span, { provider: 'anthropic' });
|
|
269
|
+
if (usage !== undefined) racs.record(usage);
|
|
270
|
+
}
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
It tolerates every attribute spelling in circulation (Anthropic-flavored, OpenLLMetry lineage, and the newer semantic-convention draft) and works with the spans Vercel AI SDK telemetry emits under `experimental_telemetry`.
|
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
277
|
+
## CLI and the serve bridge
|
|
278
|
+
|
|
279
|
+
The `racs` binary ships six commands: `help`, `version`, `analyze` (the CI lint gate), `simulate` (the deterministic demonstration above), `inspect` (print a saved snapshot, `--watch` for a live redraw), and `serve`.
|
|
280
|
+
|
|
281
|
+
`racs serve` wraps one engine in a hardened local HTTP bridge so non-JavaScript hosts can plan, lint, record, and read statistics. Endpoints: `POST /plan`, `POST /lint`, `POST /usage`, `GET /stats`, `GET /schedule`, `POST /refreshed`, `POST /invalidate`, and a `GET /healthz` that never requires the bearer. It binds loopback by default, refuses non-loopback hosts without a bearer token, rejects non-loopback Host headers in tokenless mode with 403 (the DNS-rebinding defense, `/healthz` included for consistency), compares tokens in constant time, and gates bodies with 415 and 413 responses.
|
|
282
|
+
|
|
283
|
+
```bash
|
|
284
|
+
racs serve --port 4378 --token "$RACS_TOKEN" --state .racs/state.json
|
|
285
|
+
|
|
286
|
+
curl -s -X POST http://127.0.0.1:4378/plan \
|
|
287
|
+
-H "authorization: Bearer $RACS_TOKEN" \
|
|
288
|
+
-H "content-type: application/json" \
|
|
289
|
+
-d '{
|
|
290
|
+
"provider": "anthropic",
|
|
291
|
+
"model": "claude-sonnet-4-5",
|
|
292
|
+
"segments": [
|
|
293
|
+
{ "id": "system", "role": "system", "stability": "stable", "contentHash": "sys-v1", "tokens": 3000 },
|
|
294
|
+
{ "id": "turn", "role": "dynamic", "stability": "volatile", "contentHash": "turn-1", "tokens": 200 }
|
|
295
|
+
],
|
|
296
|
+
"reuse": { "intervalSeconds": 60 }
|
|
297
|
+
}'
|
|
298
|
+
```
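The tokenless-mode Host check described for `racs serve` is easy to get subtly wrong, so here is a sketch of the idea: strip the port, tolerate IPv6 brackets, and accept only loopback names. This is an illustration of the defense, not the bridge's source; `hostnameOf` and `isLoopbackHost` are hypothetical names.

```ts
const LOOPBACK_HOSTS = new Set(['localhost', '127.0.0.1', '::1']);

// Extract the hostname from a Host header value: port stripped, brackets removed.
function hostnameOf(hostHeader: string): string {
  const value = hostHeader.trim().toLowerCase();
  if (value.startsWith('[')) {
    const end = value.indexOf(']');
    return end === -1 ? value : value.slice(1, end);
  }
  const firstColon = value.indexOf(':');
  if (firstColon === -1) return value;
  // More than one colon without brackets: treat as a bare IPv6 literal.
  if (value.indexOf(':', firstColon + 1) !== -1) return value;
  return value.slice(0, firstColon);
}

// A missing or non-loopback Host header would be answered with 403.
function isLoopbackHost(hostHeader: string | undefined): boolean {
  return hostHeader !== undefined && LOOPBACK_HOSTS.has(hostnameOf(hostHeader));
}

console.log(isLoopbackHost('127.0.0.1:4378')); // true
console.log(isLoopbackHost('[::1]:4378')); // true
console.log(isLoopbackHost('evil.example:4378')); // false
```

The check compares the whole hostname against an allow-list rather than testing a prefix, so a rebinding name such as `127.0.0.1.evil.example` is rejected.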
|
|
299
|
+
|
|
300
|
+
Hermes Agent note: a Hermes deployment can call `/plan` before each provider call and `/usage` after it, getting planned breakpoint layouts, drift detection, and savings analytics without touching its own transport. The full recipe, including the shell hook and the honest limits of an out-of-process bridge, is in [examples/hermes-bridge.md](./examples/hermes-bridge.md).
|
|
301
|
+
|
|
302
|
+
---
|
|
303
|
+
|
|
304
|
+
## Package surface
|
|
305
|
+
|
|
306
|
+
| Entry | What it is | Brotli size |
|
|
307
|
+
|---|---|---|
|
|
308
|
+
| `@takk/racs` | The engine, profiles, state backends, hashing | 10.72 kB ESM / 10.86 kB CJS |
|
|
309
|
+
| `@takk/racs/otel` | GenAI span ingestion | 604 B |
|
|
310
|
+
| `@takk/racs/vercel` | Vercel AI SDK middleware | 1.15 kB |
|
|
311
|
+
| `@takk/racs/integrations` | The four family bridges | 666 B |
|
|
312
|
+
| `@takk/racs/web` | Browser surface, no Node imports | 10.23 kB |
|
|
313
|
+
| `@takk/racs/edge` | Edge-runtime surface, no Node imports | 10.23 kB |
|
|
314
|
+
|
|
315
|
+
Zero runtime dependencies on every entry. The published tarball carries 46 files.
|
|
316
|
+
|
|
317
|
+
---
|
|
318
|
+
|
|
319
|
+
## The family stack
|
|
320
|
+
|
|
321
|
+
RACS is one layer of a five-package stack for production agents, each independent, each one line to bridge:
|
|
322
|
+
|
|
323
|
+
- Route models with [@takk/modelchain](https://www.npmjs.com/package/@takk/modelchain): `modelchainBridge` plans a cache per routed model, because provider caches are per-model.
|
|
324
|
+
- Rotate credentials with [@takk/keymesh](https://www.npmjs.com/package/@takk/keymesh): `keymeshBridge` invalidates provider-scoped cache state on `key.rotated` and `circuit.open`, since cached resources may be scoped to the credential that created them.
|
|
325
|
+
- Observe behavior with [@takk/behavioralai](https://www.npmjs.com/package/@takk/behavioralai): `behavioralaiBridge` turns the cache itself into a behaviorally observed agent, so a hit-ratio collapse surfaces as behavioral drift.
|
|
326
|
+
- Tune parameters with [@takk/noeticos](https://www.npmjs.com/package/@takk/noeticos): `noeticosBridge` freezes parameter tuning when a prefix drifts (the reward landscape moved) and releases after 3 stable plans.
|
|
327
|
+
- Cache context with RACS.
|
|
328
|
+
|
|
329
|
+
All four bridges live in `@takk/racs/integrations`, are structural (no sibling package is imported at runtime), and the siblings stay optional peers.
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
## FAQ
|
|
334
|
+
|
|
335
|
+
**Does RACS ever make a network call?**
|
|
336
|
+
No, never. RACS plans directives and normalizes usage reports; the host makes every provider call with its own credentials and transport. This is the product invariant, and it is why the package has zero runtime dependencies, never sees an API key, and runs identically in Node, browsers, and edge runtimes.
|
|
337
|
+
|
|
338
|
+
**Where do prices come from?**
|
|
339
|
+
You supply them, per model, in `options.pricing`. RACS never hardcodes prices because providers change terms without notice, and a stale hardcoded number is worse than none. Without a pricing table you still get every token-denominated statistic, just no USD figures.
|
|
340
|
+
|
|
341
|
+
**What happens below provider minimums?**
|
|
342
|
+
Providers silently cache nothing below their minimum prefix length (1024 tokens on most Anthropic and OpenAI models as of June 2026), with no error and no signal. RACS fires the `below-minimum` lint before that happens and emits an explicit `none` directive with the reason instead of a marker that would buy nothing.
|
|
343
|
+
|
|
344
|
+
**Why did my plan say `none` when my prompt is cacheable?**
|
|
345
|
+
Probably the write-premium trap. If your declared reuse can never repay the write premium, neither inside the refresh-extended TTL window (reads refresh breakpoint-family TTLs at no cost, so any reuse interval that fits the window keeps the cache alive indefinitely) nor through keep-warm touches, caching loses money and RACS refuses: it emits an explicit `none` plus the `write-premium-trap` finding. The `breakEven` field on the plan shows the exact derivation; raise reuse density, choose a longer TTL tier, or accept the no-op.
|
|
346
|
+
|
|
347
|
+
**What is the difference between drift and volatile churn?**
|
|
348
|
+
Volatile segments are declared to differ on every call, so their churn is expected behavior and never drift. Drift is when a segment you declared stable or semi quietly changes, which invalidates the entire left-anchored prefix behind it. RACS fingerprints stable and semi segments per agent lineage, names exactly which segments changed, and quantifies the invalidated tokens.
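The distinction can be made concrete with a small sketch that compares two consecutive calls in hash-only form. The types and `detectDriftSketch` are local to this example, and the token accounting (everything from the first changed segment up to the first volatile segment) is an assumption about what "invalidated" means, not the engine's documented algorithm.

```ts
type Stability = 'stable' | 'semi' | 'volatile';

interface HashedSegment {
  id: string;
  stability: Stability;
  contentHash: string;
  tokens: number;
}

interface DriftSketch {
  segments: string[]; // ids of stable or semi segments whose hash changed
  invalidatedTokens: number;
}

function detectDriftSketch(
  previous: readonly HashedSegment[],
  current: readonly HashedSegment[],
): DriftSketch | undefined {
  const before = new Map<string, string>();
  for (const segment of previous) before.set(segment.id, segment.contentHash);

  const changed: string[] = [];
  let invalidatedTokens = 0;
  let prefixBroken = false;
  for (const segment of current) {
    // Volatile churn is expected, and it also ends the cacheable prefix.
    if (segment.stability === 'volatile') break;
    const oldHash = before.get(segment.id);
    if (oldHash !== undefined && oldHash !== segment.contentHash) {
      changed.push(segment.id);
      prefixBroken = true;
    }
    // Left-anchored: once one segment changes, everything behind it is lost too.
    if (prefixBroken) invalidatedTokens += segment.tokens;
  }
  return changed.length === 0 ? undefined : { segments: changed, invalidatedTokens };
}

const call1: HashedSegment[] = [
  { id: 'system', stability: 'stable', contentHash: 'sys-v1', tokens: 3000 },
  { id: 'tools', stability: 'stable', contentHash: 'tools-v1', tokens: 1000 },
  { id: 'turn', stability: 'volatile', contentHash: 'turn-1', tokens: 200 },
];
const call2: HashedSegment[] = [
  { id: 'system', stability: 'stable', contentHash: 'sys-v2', tokens: 3000 },
  { id: 'tools', stability: 'stable', contentHash: 'tools-v1', tokens: 1000 },
  { id: 'turn', stability: 'volatile', contentHash: 'turn-2', tokens: 200 },
];

console.log(detectDriftSketch(call1, call2)); // { segments: [ 'system' ], invalidatedTokens: 4000 }
```

Only `system` is named as drifted, yet the unchanged `tools` segment is counted as invalidated because it sits behind the change in the prefix; the `turn` segment changed too and is correctly ignored.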
|
|
349
|
+
|
|
350
|
+
**Can I use RACS without ever showing it my prompts?**
|
|
351
|
+
Yes, hash-only mode. Pass `contentHash` (any stable digest you compute) plus `tokens` instead of `content`, and RACS never sees or stores the text. Plans, drift reports, persisted snapshots, and telemetry then carry hashes and counts only. Content-shape lints such as `timestamp-in-stable` are skipped for those segments by design, because there is nothing to scan.
|
|
352
|
+
|
|
353
|
+
**How does RACS behave across multiple replicas?**
|
|
354
|
+
Each engine learns per process: fingerprints, schedules, and aggregates live in one engine instance. A shared KV backend shares state across restarts and replicas, but it is persistence, not coordination: the snapshot is last-writer-wins. For exact multi-replica aggregation, run one engine behind `racs serve` as a sidecar, or aggregate usage centrally before recording.
|
|
355
|
+
|
|
356
|
+
**How is this different from response-caching gateways like LiteLLM or Helicone?**
|
|
357
|
+
Those products sit on the wire and translate requests or cache whole responses. RACS does neither: it plans the structure of your prompt so the provider's own prefix cache hits, and accounts for what the provider reports. The two are complementary; a gateway cannot fix a timestamp in your system prompt, and RACS will not serve you a cached response.
|
|
358
|
+
|
|
359
|
+
**OpenAI reports no cache write counter. How do analytics work there?**
|
|
360
|
+
The routing-key family has no write premium and no write counter, so the ledger accounts reads against total input tokens (`cached_tokens` over `prompt_tokens`), which is exactly the normalized hit ratio. Write-tier fields stay empty for those providers, and the break-even question never arises because writes are free.
|
|
361
|
+
|
|
362
|
+
**Does it work on edge runtimes and in the browser?**
|
|
363
|
+
Yes. `@takk/racs/web` and `@takk/racs/edge` export the full surface minus the Node-only file state backend. Nothing in the engine touches sockets, the filesystem, or platform globals; persist through `kvState` over Cloudflare KV or Upstash.
|
|
364
|
+
|
|
365
|
+
---
|
|
366
|
+
|
|
367
|
+
## Author
|
|
368
|
+
|
|
369
|
+
RACS is built and maintained by David C Cavalcante, Takk Innovate Studio, who researches Massive Intelligence (IM) and non-human entities at [takk.ag](https://takk.ag).
|
|
370
|
+
|
|
371
|
+
- Email: [davcavalcante@proton.me](mailto:davcavalcante@proton.me)
|
|
372
|
+
- GitHub: [github.com/davccavalcante](https://github.com/davccavalcante)
|
|
373
|
+
- Project site: [https://racs.takk.ag/](https://racs.takk.ag/)
|
|
374
|
+
|
|
375
|
+
## Sponsors
|
|
376
|
+
|
|
377
|
+
If RACS saves your agent fleet real money, consider sponsoring continued maintenance through the channels in [.github/FUNDING.yml](./.github/FUNDING.yml). Sponsorship never buys roadmap priority; it buys maintenance time.
|
|
378
|
+
|
|
379
|
+
## License
|
|
380
|
+
|
|
381
|
+
[Apache-2.0](./LICENSE). Copyright 2026 David C Cavalcante, Takk Innovate Studio. See [NOTICE](./NOTICE).
|
package/SECURITY.md
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# Security Policy
|
|
2
|
+
|
|
3
|
+
`@takk/racs` is a stable (1.0.0) library and CLI for provider-faithful prefix-cache planning and analytics. We take security reports seriously and aim to acknowledge each one within two business days.
|
|
4
|
+
|
|
5
|
+
## Supported versions
|
|
6
|
+
|
|
7
|
+
Each published version follows strict SemVer (see [SPEC.md](./SPEC.md) section 11 and [.github/RELEASING.md](./.github/RELEASING.md)). Only the latest minor of the current major receives security patches; an older major receives critical-CVE fixes for 6 months after the next major lands.
|
|
8
|
+
|
|
9
|
+
| Package | Supported |
|
|
10
|
+
|---|---|
|
|
11
|
+
| `@takk/racs` | current `latest` dist-tag |
|
|
12
|
+
|
|
13
|
+
## Reporting a vulnerability
|
|
14
|
+
|
|
15
|
+
**Do not file public GitHub issues for security problems.** Send reports to **davcavalcante@proton.me** (preferred) or **say@takk.ag** (Takk relay), with the subject line beginning `[SECURITY]`.
|
|
16
|
+
|
|
17
|
+
Include, at minimum:
|
|
18
|
+
|
|
19
|
+
- Affected version (`npm ls @takk/racs`).
|
|
20
|
+
- Reproduction steps or a minimal proof of concept.
|
|
21
|
+
- Impact assessment (what an attacker can achieve).
|
|
22
|
+
- Any suggested mitigation.
|
|
23
|
+
|
|
24
|
+
PGP or signed reports are welcome but not required. If you need an out-of-band channel, ask in the first message and we will propose one.
|
|
25
|
+
|
|
26
|
+
## Response process
|
|
27
|
+
|
|
28
|
+
1. Acknowledgement within **2 business days**.
|
|
29
|
+
2. Triage and severity assignment within **7 days**.
|
|
30
|
+
3. Fix targeted for the next release; critical issues ship as an out-of-band patch on the affected minor.
|
|
31
|
+
4. Coordinated disclosure: the reporter is credited in the changelog and advisory unless they request anonymity.
|
|
32
|
+
|
|
33
|
+
## Threat model: in scope
|
|
34
|
+
|
|
35
|
+
- **Credential handling: there is none, and any appearance of it is a bug.** RACS never handles API keys, by product invariant. The `KvLike` and bridge interfaces receive ready client objects; any path that causes a connection string, token, or key to be read, stored, logged, or persisted by this package is in scope and treated as a vulnerability.
|
|
36
|
+
- **Prompt content leakage.** Segment content must never reach a persisted snapshot, a telemetry event, a lint message (matches are referenced by digest only), or the `/otel` ingestion path (which must never read content attributes). Any counterexample is in scope.
|
|
37
|
+
- **The serve bridge.** `racs serve` binds loopback by default and refuses non-loopback hosts without a token. The bearer comparison is constant-time over SHA-256 digests. POST bodies are gated by content type (415) and a 1 MB cap (413). CORS headers are emitted only when both `--token` and `--cors-origin` are configured. Tokenless instances additionally validate the Host header as the DNS-rebinding defense: any request whose hostname (port stripped, IPv6 brackets tolerated) is not loopback (`localhost`, `127.0.0.1`, `::1`) is answered 403 `forbidden host`. `/healthz` is host-checked too, favoring consistency over the argument that the endpoint leaks nothing. With a token configured, the bearer is the gate and the Host header is not consulted. Any bypass of these gates (auth bypass, timing oracle on the token, body-cap evasion, Host-validation bypass in tokenless mode, CORS leak without the double opt-in) is in scope.
|
|
38
|
+
- **State snapshot handling.** Snapshots are validated by version (`version: 1`, rejected otherwise with `ERR_STATE_VERSION`) and restored defensively section by section. A crafted snapshot that crashes the engine, escapes the defensive restore, or smuggles content into memory it should not reach is in scope. Path traversal in the file backend's write path is likewise in scope.
|
|
39
|
+
- **Forged usage, as a documented boundary.** `record()` and the authenticated `/usage` endpoint trust the operator: fabricated `CacheUsage` skews hit ratios and USD analytics. By design this can never alter directives, plans, or schedules, only the analytics. A forged usage record that influences planning output would cross the documented boundary and is in scope as a vulnerability. Skewed analytics from a trusted-but-lying feeder is not in scope; that is the operator trust boundary.
|
|
40
|
+
- **Misuse-resistance of cache keys.** FNV-1a 64 prefix keys are non-cryptographic, predictable, and collision-constructible, and the package must never use them for authentication, authorization, or integrity decisions. Any internal code path that does is in scope.
|
|
41
|
+
- **Supply chain.** Tarball contamination, compromised npm scope, or a published artifact whose provenance attestation does not match the source commit.
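
For reporters probing the serve bridge, the two gates described above can be sketched roughly as follows. This is a minimal illustration of the stated behavior, not the package's actual code: the function names `hostnameOf`, `isLoopbackHost`, and `tokenMatches` are hypothetical, and the sketch assumes a Node.js runtime with `node:crypto`.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: extract the hostname from a Host header value,
// stripping the port and tolerating IPv6 brackets.
function hostnameOf(hostHeader: string): string {
  const h = hostHeader.trim().toLowerCase();
  if (h.startsWith("[")) {
    // Bracketed IPv6 literal, e.g. "[::1]:8787" -> "::1".
    const end = h.indexOf("]");
    return end === -1 ? h : h.slice(1, end);
  }
  const first = h.indexOf(":");
  const last = h.lastIndexOf(":");
  // Exactly one colon means "host:port"; more than one is a bare IPv6 literal.
  if (first !== -1 && first === last) return h.slice(0, last);
  return h;
}

// The loopback names listed in the policy.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "::1"]);

// Tokenless mode: a missing or non-loopback Host header is rejected
// (the policy answers such requests with 403 "forbidden host").
function isLoopbackHost(hostHeader: string | undefined): boolean {
  return hostHeader !== undefined && LOOPBACK_HOSTS.has(hostnameOf(hostHeader));
}

// Token mode: hash both values first so the compared buffers always have
// equal length (32 bytes), then compare them in constant time.
function tokenMatches(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

Hashing before comparing matters because `timingSafeEqual` throws on buffers of different lengths; comparing fixed-size digests avoids both that error and a length oracle on the token. A finding that makes the real implementation diverge from this shape (for example, a Host value that is accepted without being loopback) is the kind of bypass the list above treats as in scope.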
|
|
42
|
+
|
|
43
|
+
## Out of scope
|
|
44
|
+
|
|
45
|
+
- The security of upstream provider APIs and the accuracy of the usage counters they report.
|
|
46
|
+
- Custody of your prompts, pricing tables, and provider credentials before anything reaches RACS; that is the operator's responsibility (RACS never receives the credentials at all).
|
|
47
|
+
- Analytics skew caused by an operator feeding the engine false usage within their own trust domain (see the documented boundary above).
|
|
48
|
+
- Theoretical attacks against FNV-1a as a hash; it is declared non-cryptographic and is never used for security decisions. Report a violation of that rule, not a weakness of the hash itself.
|
|
49
|
+
- Denial of service through unbounded inputs to your own embedding application; the serve bridge's own caps are in scope, your application's are yours.
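
To illustrate why FNV-1a is excluded above, here is a minimal sketch of the standard 64-bit FNV-1a algorithm. It is not RACS's own key-derivation code, which is not shown here; `fnv1a64` is a hypothetical name, and hashing the UTF-8 bytes of a string is an assumption made for the example.

```typescript
// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET_BASIS = BigInt("0xcbf29ce484222325");
const FNV_PRIME = BigInt("0x100000001b3");
const MASK_64 = BigInt("0xffffffffffffffff");

// Hypothetical helper: FNV-1a 64 over the UTF-8 bytes of the input.
function fnv1a64(input: string): bigint {
  const bytes = new TextEncoder().encode(input);
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i]);              // fold in one byte
    hash = (hash * FNV_PRIME) & MASK_64;   // multiply, keep the low 64 bits
  }
  return hash;
}
```

The function takes no secret and uses only XOR and multiplication, so anyone who knows the input can recompute the key, and colliding inputs can be constructed deliberately. That is acceptable for cache bucketing and unacceptable for authentication, authorization, or integrity checks, which is why only a code path that relies on these keys for a security decision counts as a finding.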
|
|
50
|
+
|
|
51
|
+
## Supply-chain assurances
|
|
52
|
+
|
|
53
|
+
- **Zero runtime dependencies.** The transitive attack surface of the published package is the package itself. Sibling bridges are structural; the optional peers are never imported at runtime.
|
|
54
|
+
- **Provenance.** Every release is published with `npm publish --provenance` (SLSA attestation from GitHub Actions). Verify with `npm view @takk/racs@<version> --json | jq .dist.attestations`.
|
|
55
|
+
- **Files allowlist.** `package.json#files` enumerates exactly what ships (`dist`, `README.md`, `LICENSE`, `NOTICE`, `CHANGELOG.md`, `SECURITY.md`); nothing else can leak into the tarball. The published artifact carries 46 files.
|
|
56
|
+
- **Frozen lockfile.** `pnpm-lock.yaml` is committed, and CI installs with `--frozen-lockfile`, so builds are reproducible and dependency swaps cannot ride a CI run.
|
|
57
|
+
- **Two-step release.** A reviewable GitHub Release precedes every npm publish (see [.github/RELEASING.md](./.github/RELEASING.md)).
|