@possumtech/rummy 2.2.1 → 2.3.1

Files changed (50)
  1. package/package.json +14 -6
  2. package/service.js +18 -10
  3. package/src/agent/AgentLoop.js +2 -11
  4. package/src/agent/ContextAssembler.js +34 -3
  5. package/src/agent/Entries.js +16 -89
  6. package/src/agent/ProjectAgent.js +1 -16
  7. package/src/agent/TurnExecutor.js +12 -52
  8. package/src/agent/XmlParser.js +30 -117
  9. package/src/agent/errors.js +3 -22
  10. package/src/agent/materializeContext.js +3 -11
  11. package/src/hooks/Hooks.js +0 -29
  12. package/src/lib/hedberg/hedberg.js +4 -14
  13. package/src/lib/hedberg/marker.js +15 -59
  14. package/src/llm/LlmProvider.js +13 -26
  15. package/src/llm/errors.js +3 -11
  16. package/src/llm/openaiStream.js +6 -46
  17. package/src/plugins/ask_user/ask_user.js +12 -17
  18. package/src/plugins/budget/README.md +46 -8
  19. package/src/plugins/budget/budget.js +23 -42
  20. package/src/plugins/cp/cp.js +28 -18
  21. package/src/plugins/env/env.js +11 -7
  22. package/src/plugins/error/error.js +8 -37
  23. package/src/plugins/get/get.js +42 -24
  24. package/src/plugins/google/google.js +23 -3
  25. package/src/plugins/helpers.js +34 -50
  26. package/src/plugins/instructions/README.md +2 -2
  27. package/src/plugins/instructions/instructions-user.md +1 -1
  28. package/src/plugins/instructions/instructions.js +19 -6
  29. package/src/plugins/known/known.js +1 -8
  30. package/src/plugins/log/log.js +15 -1
  31. package/src/plugins/mv/mv.js +29 -19
  32. package/src/plugins/persona/persona.js +4 -4
  33. package/src/plugins/prompt/README.md +1 -1
  34. package/src/plugins/prompt/prompt.js +1 -1
  35. package/src/plugins/rm/rm.js +26 -15
  36. package/src/plugins/rm/rmDoc.md +0 -2
  37. package/src/plugins/set/set.js +37 -84
  38. package/src/plugins/set/setDoc.md +16 -16
  39. package/src/plugins/sh/sh.js +10 -8
  40. package/src/plugins/skill/skillDoc.md +1 -1
  41. package/src/plugins/unknown/README.md +1 -1
  42. package/src/plugins/unknown/unknown.js +2 -6
  43. package/src/plugins/update/update.js +3 -2
  44. package/src/plugins/update/updateDoc.md +1 -1
  45. package/.env.example +0 -152
  46. package/.xai.key +0 -1
  47. package/PLUGINS.md +0 -962
  48. package/SPEC.md +0 -1897
  49. package/biome/no-fallbacks.grit +0 -50
  50. package/gemini.key +0 -1
package/SPEC.md DELETED
@@ -1,1897 +0,0 @@
1
- # RUMMY: Architecture Specification
2
-
3
- The authoritative reference for Rummy's design. The instructions
4
- plugin (`instructions-system.md` + `instructions-user.md` + tool
5
- docs) and the persona plugin (`persona/default.md`) define model-
6
- facing behavior. This document defines everything else.
7
-
8
- ---
9
-
10
- ## Glossary {#glossary}
11
-
12
- Canonical meanings. When a doc, comment, test name, or commit message
13
- uses one of these words, it should mean exactly what's written here.
14
-
15
- | Term | Meaning |
16
- |---|---|
17
- | **run** | The alias-keyed lifetime of one project-agent invocation. Begins on `set run://{alias}` with a prompt; ends at terminal status (200/204/422/499/500). One run per alias; aliases are unique per project. |
18
- | **loop** | One `ask` or `act` invocation and all its continuation turns until terminal `<update>`, abandonment, or abort. A run can contain multiple loops if a fresh prompt arrives on an existing run. |
19
- | **turn** | One round-trip with the LLM: one assembled prompt sent, one response parsed. A loop is a sequence of turns. |
20
- | **mode** | `ask` (read-only — no proposals, no `<sh>`, no edits) or `act` (full tool surface). Per loop, set at the entry point. |
21
- | **phase** | The RECORD→DISPATCH split within a single turn (see [dispatch_path](#dispatch_path)). AGENTS.md "Phase 1 / Phase 2 / ..." entries refer to project-development milestones; that's a separate use of the word. The model-facing workflow lives in `persona/default.md` as the 7D ladder — Draft → Decompose → Discover → Distill → Define → Determine → Deliver — a persona convention, not a status-keyed engine state. |
22
- | **proposal** | A tool-call entry at status 202 awaiting client resolution (accept/reject). Side-effecting actions (`<sh>`, `<env>`, file `<set>`, file `<rm>`/`<mv>`/`<cp>`, `<ask_user>`) emit proposals. YOLO mode auto-accepts. |
23
- | **verdict** | The end-of-turn ruling from `hooks.turn.verdict.filter` — a generic filter chain. Returns `{continue, status, reason}`. The error plugin is the canonical subscriber today; future plugins (cycle-detection, budget-overflow termination) can join the chain to vote without touching error.js or AgentLoop. Decides whether the loop continues to another turn or terminates. |
24
- | **strike** | A turn whose verdict counts toward `MAX_STRIKES`. A strike fires when `turnErrors > 0` (any `error.log` entry that turn) or when cycle detection trips silently. The streak counter resets on a clean turn (no errors, no cycle); reaches `MAX_STRIKES` → loop abandons at 499. |
25
- | **resolution** | Client's accept/reject of a proposal via `run/resolve` RPC. |
26
- | **dispatch** | The DISPATCH phase of a turn — actually executing recorded action entries. |
27
-
28
- **Hierarchy:** project ⊃ run ⊃ loop ⊃ turn. A turn is the smallest
29
- unit of model interaction. A strike is a per-turn property that
30
- accumulates across turns within a loop.
31
-
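The strike rule from the glossary above could be sketched as a small pure function. This is an illustration only, not Rummy's code: `applyVerdict`, the `turn` object shape, and passing `maxStrikes` as a parameter are assumptions; the spec only says a strike fires on `turnErrors > 0` or a cycle, the streak resets on a clean turn, and reaching `MAX_STRIKES` abandons the loop at 499.

```js
// Hypothetical helper: returns the updated streak and whether the loop
// should abandon (the spec maps abandonment to status 499).
function applyVerdict(streak, turn, maxStrikes) {
  // A strike is any turn with logged errors or a detected cycle.
  const isStrike = turn.turnErrors > 0 || turn.cycleDetected;
  // A clean turn resets the streak; a strike extends it.
  const next = isStrike ? streak + 1 : 0;
  return { streak: next, abandon: next >= maxStrikes };
}
```

Keeping the streak as an explicit input makes the reset-on-clean-turn behaviour easy to test in isolation.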
32
- ---
33
-
34
- ## The Contract
35
-
36
- Rummy has one contract. Every actor speaks it.
37
-
38
- ### Entries {#entries}
39
-
40
- An entry is the sole unit of state the contract names. Every entry
41
- carries:
42
-
43
- | Field | Meaning |
44
- |-------|---------|
45
- | **path** | Identity. `scheme://locator` or bare filepath. |
46
- | **body** | Content (text). |
47
- | **attributes** | JSON bag of structured metadata. |
48
- | **visibility** | `visible \| summarized \| archived`. What the model sees of this entry next turn. |
49
- | **state** | `proposed \| streaming \| resolved \| failed \| cancelled`. Where the entry is in its lifecycle. |
50
- | **outcome** | Short reason string when state ∈ {failed, cancelled}. Opaque to most callers; a few plugins parse it. |
51
- | **writer** | Which tier wrote it last. |
52
- | **scope** | `run:N \| project:N \| global`. Determines namespace and readership. |
53
-
54
- Visibility and state are independent axes. An entry can be `state=resolved,
55
- visibility=archived` (complete and hidden) or `state=streaming,
56
- visibility=summarized` (in-flight, shown as summary) or `state=proposed,
57
- visibility=visible` (visible, awaiting resolution).
58
-
59
- ### Six Primitives {#primitives}
60
-
61
- The entire grammar for changing entries:
62
-
63
- | Verb | Effect |
64
- |------|--------|
65
- | **set** | Create or update an entry. Writes content, state, visibility, attributes. |
66
- | **get** | Promote an entry to `visibility=visible`. The read-with-side-effect. |
67
- | **rm** | Remove an entry from the caller's view (or delete it when scope permits). |
68
- | **cp** | Copy an entry to a new path. |
69
- | **mv** | Rename an entry to a new path. |
70
- | **update** | Record a turn's continuation or terminal signal. |
71
-
72
- Every tool in rummy (`<sh>`, `<ask_user>`, `<search>`, `<env>`, `<think>`,
73
- `<known>`, `<unknown>`, …) is a **plugin that composes the six
74
- primitives**. A `<sh>` invocation becomes a `set` that creates a
75
- proposed entry; on user accept, a stream plugin drives body appends
76
- via `set` and eventually a state transition to `resolved`. The
77
- primitives are the atoms; tools are the molecules.
78
-
79
- ### Three Surfaces, One Grammar {#surfaces}
80
-
81
- | Actor | Syntax |
82
- |-------|--------|
83
- | **Model** | XML tags: `<set path="..." />` |
84
- | **Plugin** | RummyContext methods: `rummy.set({...})` |
85
- | **Client** | JSON-RPC: `{"method":"set","params":{...}}` |
86
-
87
- Syntactic skins over the same semantics. A plugin calling
88
- `rummy.set(...)`, a client sending `{"method":"set",...}`, and a model
89
- emitting `<set/>` are the same event at the store layer, authorized by
90
- the respective writer identity against the scheme's permissions.
91
-
92
- ### Four Writer Tiers {#writer_tiers}
93
-
94
- A strict hierarchy of writer identities. Each tier is a superset of
95
- what's below it:
96
-
97
- | Tier | Access |
98
- |------|--------|
99
- | **system** | Internal plumbing (TurnExecutor, AgentLoop audit writes — `instructions://`, `reasoning://`, message schemes). |
100
- | **plugin** | Declares schemes, registers hooks and filters, calls store methods directly. Everything below plus plugin-scope infrastructure. |
101
- | **client** | RPC surface. Writes to client-writable schemes (`run://`, proposed-entry state transitions, config) and reads via subscribed notifications. |
102
- | **model** | XML-tag surface. Writes to model-writable schemes (`known://`, `unknown://`, `update://`, tool-result schemes) as restricted by the active run's capability set. |
103
-
104
- Every scheme declares `writable_by` as a subset of `{system, plugin,
105
- client, model}`. A write from an identity outside that subset rejects
106
- with state=failed, outcome="permission:403".
107
-
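The `writable_by` rule could be sketched roughly as below. The registry contents mirror rows from the scheme table in this spec, but `SCHEMES`, `checkWrite`, and the returned object shape are hypothetical names chosen for illustration; only the tier names and the `permission:403` outcome come from the text.

```js
// Hypothetical registry: each scheme lists the writer tiers allowed to write it.
const SCHEMES = {
  known: { writableBy: ["model", "plugin"] },
  prompt: { writableBy: ["plugin"] },
  tool: { writableBy: ["system"] },
};

// Returns the state/outcome a write would resolve to for a given writer tier.
function checkWrite(scheme, writer) {
  const registration = SCHEMES[scheme];
  if (!registration) throw new Error(`unregistered scheme: ${scheme}`);
  if (registration.writableBy.includes(writer)) {
    return { state: "resolved", outcome: null };
  }
  // Writers outside the declared subset are rejected.
  return { state: "failed", outcome: "permission:403" };
}
```

For example, `checkWrite("prompt", "model")` yields a failed state, matching the rule that `prompt://` is never written by the model.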
108
- ### Runs Are Entries {#runs_are_entries}
109
-
110
- Starting a run is not a separate API — it is a `set` to
111
- `run://{alias}` with a prompt body and attributes carrying model,
112
- restrictions, and resolution strategy. A run plugin observes `run://`
113
- entry writes and starts the turn loop. Cancelling is a state
114
- transition to `cancelled` on the same path. Resolving a proposed entry
115
- is a state transition on that entry's path.
116
-
117
- The lifecycle API is the entry grammar. No parallel verb set.
118
-
119
- ### Events & Filters {#events_and_filters}
120
-
121
- Between the primitive-write layer and the actual work, rummy is a
122
- hooks-and-filters system. Plugins subscribe to events (fire-and-forget
123
- side effects) and filters (transformation chains that thread a value
124
- through subscribers in priority order).
125
-
126
- **Every `<tag>` the model sees is a plugin.** `<summary>` /
127
- `<visible>` → known plugin. `<unknowns>` → unknown plugin. `<log>`
128
- → log plugin. `<instructions>` → instructions plugin. `<prompt>` →
129
- prompt plugin. `<budget>` → budget plugin. No monolithic assembler
130
- decides what goes where. Each plugin filters for its own data from
131
- the shared row set, renders its section, returns.
132
-
133
- **Plugins compose, they don't coordinate.** A plugin subscribes to a
134
- filter at a priority, receives the accumulator value, appends its
135
- contribution, returns. It doesn't know what other plugins exist.
136
- Priority determines ordering. Lower numbers run first.
137
-
138
- **The core is a filter chain invocation.** `ContextAssembler` computes
139
- `loopStartTurn` from the latest prompt entry's `source_turn`, then
140
- calls `assembly.system.filter(systemPrompt, ctx)` and
141
- `assembly.user.filter("", ctx)`. Everything else is plugins.
142
-
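A priority-ordered filter chain of the kind described above might look like the following sketch. `FilterChain` and its method names are assumptions made for illustration, and the sketch is synchronous for brevity; the spec does not show how `assembly.system.filter` or `assembly.user.filter` are actually implemented.

```js
// Hypothetical filter chain: subscribers are applied in ascending priority order.
class FilterChain {
  #subscribers = [];

  // Register a callback; lower priority numbers run first.
  add(callback, priority = 10) {
    this.#subscribers.push({ callback, priority });
    this.#subscribers.sort((a, b) => a.priority - b.priority);
  }

  // Thread the accumulator through every subscriber and return the result.
  filter(value, ctx) {
    let accumulator = value;
    for (const { callback } of this.#subscribers) {
      accumulator = callback(accumulator, ctx);
    }
    return accumulator;
  }
}
```

Each subscriber only sees the accumulator and the context, which is what lets plugins compose without knowing about each other.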
143
- ### Physical Layout
144
-
145
- The contract is realized across two tables plus a compat view:
146
-
147
- - **`entries`** — content layer. `(scope, path)` unique. Body,
148
- attributes, hash, tokens.
149
- - **`run_views`** — per-run projection. Visibility, state, outcome,
150
- turn, loop. A run sees an entry only if it has a view row.
151
- - **`known_entries`** — compatibility VIEW joining the two for legacy
152
- SELECT queries. Not writable.
153
-
154
- Server-side bookkeeping (runs, loops, turns, projects, models,
155
- schemes, file_constraints, turn_context, rpc_log) exists to support
156
- the contract; the contract's actors never address these tables
157
- directly.
158
-
159
- ---
160
-
161
- ## The Known Store {#known_store}
162
-
163
- All model-facing state is stored across two tables joined via the
164
- `known_entries` compatibility VIEW. Files, knowledge, tool results,
165
- skills, audit — everything is a keyed entry with a URI path, body,
166
- attributes, per-run status, and per-run visibility.
167
-
168
- ### Schema {#schema}
169
-
170
- **Content layer** — `entries` (shared, scope-owned):
171
-
172
- ```sql
173
- entries (
174
- id, scope, path, scheme, body, attributes,
175
- hash, created_at, updated_at,
176
- UNIQUE (scope, path)
177
- )
178
- ```
179
-
180
- | Column | Purpose |
181
- |--------|---------|
182
- | `scope` | `global`, `project:N`, or `run:N`. Determines who can read; per-scheme `writable_by` determines who can write. |
183
- | `path` | Entry identity within scope. Bare paths (`src/app.js`) or URIs (`known://auth`). Max 2048 chars. |
184
- | `scheme` | GENERATED from `schemeOf(path)`. Drives dispatch and view routing. |
185
- | `body` | Content. File text, tool output, skill docs. |
186
- | `attributes` | Tag attributes as JSON. `CHECK (json_valid)`. |
187
- | `hash` | SHA-256 for file change detection. |
188
-
189
- Tokens are not stored on entries. See [token_accounting](#token_accounting) — token cost is a property of the materialized packet, computed during assembly, never persisted.
190
-
191
- **View layer** — `run_views` (per-run projection):
192
-
193
- ```sql
194
- run_views (
195
- id, run_id, entry_id, loop_id, turn,
196
- status INTEGER, visibility TEXT,
197
- write_count, refs, created_at, updated_at,
198
- UNIQUE (run_id, entry_id)
199
- )
200
- ```
201
-
202
- | Column | Purpose |
203
- |--------|---------|
204
- | `run_id`, `entry_id` | (run, entry) unique pair. Absent view = not in context. |
205
- | `loop_id`, `turn` | Freshness — when this run last touched the entry. |
206
- | `status` | HTTP status code — outcome of the run's last operation on this entry. |
207
- | `visibility` | `visible` \| `summarized` \| `archived`. The run's relationship to the entry. |
208
- | `write_count` | How many times this run has written this entry. |
209
-
210
- **Compatibility view** — `known_entries` joins the two tables so
211
- legacy SELECT queries keep working. Not writable; new write code must
212
- target `entries` + `run_views` directly (see [upsert_semantics](#upsert_semantics)).
213
-
214
- **No shadowing.** A run cannot override a global (or project-scoped)
215
- entry with a run-scoped copy of the same path. Scope is resolved from
216
- the scheme's declared `default_scope` at write time; if the writer's
217
- permission doesn't allow the target scope, the write is rejected
218
- (403 + `error://`). Paths are unique within a scope, but different
219
- scopes use independent namespaces — `known://plan` is always run-
220
- scoped; `wiki://...` (hypothetical) would always be global. The
221
- scheme plugin owns the decision; the model doesn't juggle scopes.
222
-
223
- **Forks copy views, not content.** `store.forkEntries(parent, child)`
224
- inserts new `run_views` rows referencing the parent's `entries`
225
- rows — no body copies, O(row-count) rather than O(body-bytes).
226
- A forked child's subsequent writes diverge by creating new entries
227
- at the child's scope; the parent's entries stay untouched.
228
-
229
- ### Schemes, Status & Visibility {#schemes_status_visibility}
230
-
231
- Every entry has two independent dimensions: **status** (HTTP integer —
232
- view-side) and **visibility** (what the model sees — view-side). These
233
- are separate concerns.
234
-
235
- **Status** (operation outcome): 200 (OK), 202 (proposed), 400 (bad
236
- request), 403 (permission denied), 404 (not found), 409 (conflict),
237
- 413 (too large), 499 (aborted), 500 (error).
238
-
239
- **Visibility** (the model's view in the run's context): `visible` (body
240
- shown), `summarized` (path + attrs shown, body hidden or condensed;
241
- promote via `<get>`), `archived` (invisible; retrievable via pattern
242
- search).
243
-
244
- Lifecycle events (budget Turn Demotion, fork copy) change `visibility`
245
- but never `status` — status stays truthful about the last body
246
- operation. See `demote_turn_entries` in `known_store.sql`.
247
-
248
- Paths use URI scheme syntax. Bare paths (no `://`) are files, stored
249
- with `scheme IS NULL` (JOINs treat NULL as `'file'` via COALESCE).
250
-
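The bare-path rule could be illustrated like this. The body of `schemeOf` here is a guess at the behaviour the spec implies (text before `://`, otherwise no scheme), and `effectiveScheme` is a hypothetical helper that mirrors the `COALESCE(scheme, 'file')` used in JOINs.

```js
// Assumed behaviour: the text before "://", or null for bare file paths.
function schemeOf(path) {
  const index = path.indexOf("://");
  return index > 0 ? path.slice(0, index) : null;
}

// Hypothetical helper mirroring COALESCE(scheme, 'file').
function effectiveScheme(path) {
  return schemeOf(path) ?? "file";
}
```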
251
- Every entry plays one of four roles:
252
-
253
- | Role | Category | Section | Description |
254
- |------|----------|---------|-------------|
255
- | **Data** | `data` | `<summary>` + `<visible>` | Entries the model works with — persistent state and captured payload. Summary line in `<summary>` for visible+summarized tiers; full body in `<visible>` only when promoted. |
256
- | **Logging** | `logging` | `<log>` | Records of what happened — tool results, lifecycle signals |
257
- | **Unknowns** | `unknown` | `<unknowns>` | Open questions the model is tracking |
258
- | **Prompt** | `prompt` | `<prompt>` | The task driving the loop |
259
-
260
- `logging` is the default category. Plugins opt into `data` explicitly.
261
-
262
- | Scheme | Category | `writable_by` | Description |
263
- |--------|----------|---------------|-------------|
264
- | `NULL` (bare path) | data | `model, plugin` | File content. JOINs via `COALESCE(scheme, 'file')`. |
265
- | `known://` | data | `model, plugin` | Model-registered knowledge. One fact per entry. |
266
- | `skill://` | data | `model, plugin` | Skill docs. Rendered in system message. |
267
- | `http://`, `https://` | data | `model, plugin` | Web content. |
268
- | `sh://`, `env://` | data | `model, plugin` | Streaming-producer payload — stdout/stderr channel entries from shell/env commands. **Channels only**; the action audit record lives in `log://`. See [scheme_category_split](#scheme_category_split). |
269
- | `unknown://` | unknown | `model, plugin` | Unresolved questions. |
270
- | `prompt://` | prompt | `plugin` | User prompt with `mode` attribute. Written by prompt plugin, never by model. |
271
- | `log://` | logging | `system, plugin, model` | Unified audit record namespace for all tool actions. One entry per action at `log://turn_N/{action}/{slug}`. |
272
- | `update://` | logging | `model, plugin` | Lifecycle signal. Status attr classifies terminal (200/204/422) vs continuation (102). |
273
- | `error://` | logging | `model, plugin` | Runtime errors — policy rejection, budget overflow (status 413), dispatch crashes, protocol violations. Unified channel via `hooks.error.log.emit`. |
274
- | `tool://` | audit | `system` | Internal plugin metadata. `model_visible = 0`. |
275
- | `instructions://`, `system://`, `reasoning://`, `model://`, `user://`, `assistant://`, `content://` | audit | `system` | Audit entries. `model_visible = 0`. Written only by server-level code. |
276
-
277
- ### Scheme / Category Split {#scheme_category_split}
278
-
279
- **Scheme determines category.** Every entry's category is looked up
280
- from its scheme registration; entries of the same scheme always share a
281
- category. Data and logging never share a scheme.
282
-
283
- Streaming producers (sh, env, and future fetch/search/tail/watch) split
284
- across two namespaces as a direct consequence:
285
-
286
- - **Action audit record** lives in `log://turn_N/{action}/{slug}` —
287
- scheme=`log`, category=`logging`. Renders in `<log>`.
288
- - **Payload channels** live in `{action}://turn_N/{slug}_N` —
289
- scheme=`{action}` (registered as `category: "data"`). Render in
290
- `<summary>` (always, while tracked) and `<visible>` (when
291
- promoted).
292
-
293
- This keeps `<log>` a terse audit trail (what happened, exit code,
294
- paths) while `<visible>` carries the actual streamed bytes the model
295
- reads. Conflating the two — e.g., writing channels under `log://...` —
296
- mislabels payload as audit and pollutes the logging section with
297
- multi-line command output. See [streaming_entries](#streaming_entries).
298
-
299
- ### Scheme Registry {#scheme_registry}
300
-
301
- The `schemes` table is a bootstrap registry — rows of
302
- `(name, model_visible, category, default_scope, writable_by)`.
303
- Plugins register their scheme via `core.registerScheme({name, category,
304
- scope, writableBy})` in the constructor. Defaults:
305
- `scope = "run"`, `writableBy = ["model", "plugin"]`.
306
-
307
- - `model_visible` — whether entries appear in `v_model_context` (`0`
308
- hides audit schemes from the model).
309
- - `default_scope` — `run` \| `project` \| `global`. Resolved to a
310
- concrete scope string at write time (`run:N`, `project:N`, `global`).
311
- Project-scoped writes require `projectId` on the call; `Entries.set`
312
- throws if it's missing.
313
- - `writable_by` — JSON array of allowed writer types
314
- (`model` \| `plugin` \| `system` \| `client`). `Entries.set` throws
315
- `PermissionError` when the caller's writer isn't in the list.
316
-
317
- ### UPSERT Semantics {#upsert_semantics}
318
-
319
- Writes go through `Entries.set({runId, path, body, state?, visibility?,
320
- attributes?, outcome?, turn?, loopId?, writer?, projectId?, ...})`
321
- — two-prep flow:
322
-
323
- 1. `upsert_entry` — INSERT OR UPDATE on `(scope, path)`. Scope comes
324
- from scheme's `default_scope`. Returns the `entry_id`.
325
- 2. `upsert_run_view` — INSERT OR UPDATE on `(run_id, entry_id)`.
326
- Increments `write_count` on conflict.
327
-
328
- Blank body is valid. Deletion uses `<rm>`, which removes the
329
- `run_views` row; the shared `entries` row is left for now (GC is a
330
- future concern).
331
-
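The two-prep flow can be pictured with in-memory maps standing in for the two tables. Everything named here (`entryTable`, `viewTable`, `upsertEntry`, `upsertRunView`, `setEntry`) is hypothetical; passing `scope` explicitly and starting `writeCount` at 1 are also assumptions, since the spec resolves scope from the scheme's `default_scope` and does not state the initial count.

```js
// Hypothetical in-memory stand-ins for the `entries` and `run_views` tables.
const entryTable = new Map(); // key: scope + path
const viewTable = new Map(); // key: runId + entryId
let nextEntryId = 1;

// Step 1: insert or update on (scope, path); returns the entry id.
function upsertEntry(scope, path, body) {
  const key = `${scope}\n${path}`;
  const existing = entryTable.get(key);
  if (existing) {
    existing.body = body;
    return existing.id;
  }
  const id = nextEntryId++;
  entryTable.set(key, { id, scope, path, body });
  return id;
}

// Step 2: insert or update on (runId, entryId); bump writeCount on conflict.
function upsertRunView(runId, entryId) {
  const key = `${runId}:${entryId}`;
  const existing = viewTable.get(key);
  if (existing) {
    existing.writeCount += 1;
    return existing;
  }
  const view = { runId, entryId, writeCount: 1 }; // initial count is assumed
  viewTable.set(key, view);
  return view;
}

// Blank bodies are valid, so body defaults to the empty string.
function setEntry({ runId, scope, path, body = "" }) {
  const entryId = upsertEntry(scope, path, body);
  return { entryId, view: upsertRunView(runId, entryId) };
}
```

Writing the same path twice from one run reuses the entry row and only bumps the view's write count.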
332
- ---
333
-
334
- ## Relational Tables
335
-
336
- The K/V store is the memory. Relational tables are the skeleton.
337
-
338
- ```sql
339
- projects (id, name UNIQUE, project_root, config_path, created_at)
340
- models (id, alias UNIQUE, actual, context_length, created_at)
341
- runs (id, project_id, parent_run_id, model, alias UNIQUE,
342
- status INTEGER, temperature, persona, context_limit,
343
- next_turn, next_loop, created_at)
344
- loops (id, run_id, sequence, mode, model, prompt, status INTEGER,
345
- config JSON, result JSON, created_at)
346
- turns (id, run_id, loop_id, sequence, context_tokens,
347
- reasoning_content, prompt_tokens, cached_tokens,
348
- completion_tokens, reasoning_tokens, total_tokens, cost,
349
- created_at)
350
-
351
- file_constraints (id, project_id, pattern, visibility, created_at)
352
- -- Project-level config. NOT tool dispatch. See [file_constraints](#file_constraints).
353
- turn_context (id, run_id, loop_id, turn, ordinal, path, scheme,
354
- status, visibility, body, tokens, attributes,
355
- category, source_turn)
356
- rpc_log (id, project_id, method, rpc_id, params, result, error)
357
- ```
358
-
359
- **No sessions.** Runs belong to projects. Any client that knows the project
360
- name can access any run. Temperature, persona, and context_limit are per-run.
361
-
362
- **Models** are bootstrapped from `RUMMY_MODEL_*` env vars at startup (upsert).
363
- Clients can add/remove models at runtime via RPC. No default model — the
364
- client picks for every run.
365
-
366
- ### Run State Machine {#run_state_machine}
367
-
368
- All status fields are HTTP integer codes. `runs.status` transitions
369
- are enforced by `trg_run_state_transition` (see initial migration):
370
-
371
- ```
372
- 100 queued → 102 running, 499 aborted
373
- 102 running → 200 completed, 202 proposed, 500 failed, 499 aborted
374
- 202 proposed → 102 running, 200 completed, 499 aborted
375
- 200 completed → 102 running, 499 aborted
376
- 500 failed → 102 running, 499 aborted
377
- 499 aborted → 102 running
378
- ```
379
-
380
- All terminal states (200/500/499) allow transition back to running.
381
- Runs are long-lived.
382
-
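The transition table above can be transcribed into a lookup for quick checks. This is only a JavaScript mirror for illustration; per the spec the real enforcement lives in the `trg_run_state_transition` trigger, and `RUN_TRANSITIONS` and `canTransition` are hypothetical names.

```js
// Allowed runs.status transitions, transcribed from the table above.
const RUN_TRANSITIONS = {
  100: [102, 499],
  102: [200, 202, 500, 499],
  202: [102, 200, 499],
  200: [102, 499],
  500: [102, 499],
  499: [102],
};

// True when the spec's table allows moving from one status to another.
function canTransition(from, to) {
  return (RUN_TRANSITIONS[from] ?? []).includes(to);
}
```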
383
- ### Loops Table {#loops_table}
384
-
385
- The loops table IS the prompt queue. Each `ask`/`act` creates a loop.
386
- FIFO per run (ordered by sequence). One active at a time. Abort stops
387
- the current loop; pending loops survive. Projects > runs > loops > turns.
388
-
389
- ### File Constraints {#file_constraints}
390
-
391
- The `file_constraints` table is project-level configuration — it
392
- defines which files a project cares about. This is backbone, not tool
393
- dispatch. Constraint type governs **membership** and **write
394
- permission**, not in-context visibility. In-context visibility
395
- (`visible` / `summarized` / `archived`) is per-entry and model-
396
- controlled — files default to `archived` on ingestion; the model
397
- promotes via `<get>` / `<set visibility=…>`.
398
-
399
- - `add` — file is part of the project; ingested as an entry; model
400
- may write. Default for `setConstraint`.
401
- - `readonly` — same ingestion; `<set>` is vetoed at the proposal-
402
- accept gate.
403
- - `ignore` — excluded from scans entirely. The file remains on disk
404
- for `<sh>` / `<env>` invocation but is not present as an entry.
405
-
406
- **Boundary:** Setting a constraint (`File.setConstraint`) is a
407
- project-config write. Promoting/demoting the matching entries is tool
408
- dispatch that goes through the handler chain with budget enforcement.
409
- These are separate operations: constraint persists across runs, entry
410
- visibility is scoped to a run and subject to the same budget rules as
411
- a model `<get>`.
412
-
413
- `store` RPC manages constraints directly — it is not a model tool.
414
- `get` RPC with `persist` sets the constraint AND dispatches promotion.
415
-
416
- ---
417
-
418
- ## Entry-Driven Dispatch
419
-
420
- ### Unified API {#unified_api}
421
-
422
- Three callers share a tool vocabulary. The invocation shape is
423
- per-tier; params shape is not uniform across tiers.
424
-
425
- | Tier | Transport | Invocation |
426
- |------|-----------|-----------|
427
- | Model | XML tags | `<rm path="file.txt"/>` |
428
- | Client | JSON-RPC | `{ method: "rm", params: { path: "file.txt" } }` |
429
- | Plugin | RummyContext verbs | `rummy.rm("file.txt")` (each verb takes what's natural — see `src/hooks/RummyContext.js`) |
430
-
431
- | Method | Model | Client | Plugin |
432
- |--------|-------|--------|--------|
433
- | `think`, `get`, `set`, `rm`, `mv`, `cp`, `sh`, `env`, `search` | ✓ | ✓ | ✓ |
434
- | `ask_user`, `update` | ✓ | ✓ | ✓ |
435
- | `ask`, `act`, `resolve`, `abort`, `startRun` | — | ✓ | ✓ |
436
- | `getRuns`, `getModels`, `getEntries` | — | ✓ | ✓ |
437
- | `on()`, `filter()`, db/store access | — | — | ✓ |
438
-
439
- Model tier restrictions enforced by unified `resolveForLoop(mode, flags)`.
440
- Ask mode excludes `sh`. Flags: `noInteraction` excludes `ask_user`,
441
- `noWeb` excludes `search`, `noProposals` excludes `ask_user`/`env`/`sh`.
442
- 11 model tools: think, get, set, env, sh, rm, cp, mv, ask_user, update,
443
- search. The model writes `known` and `unknown` entries via
444
- `<set path="known://...">` and `<set path="unknown://...">`; those
445
- plugins don't advertise their own tag name — they render and filter.
446
- Client tier requires project init. Plugin tier has no restrictions.
447
-
448
- ### Dispatch Path {#dispatch_path}
449
-
450
- Each tier feeds into the shared tool handler chain, but through a
451
- different entry point:
452
-
453
- ```
454
- Model: XmlParser → { name, path, ... } → TurnExecutor.#record()
455
- → hooks.tools.dispatch(scheme, entry, rummy)
456
- Client: JSON-RPC → rpc.js dispatchTool(hooks, rummy, scheme, ...)
457
- → hooks.tools.dispatch(scheme, entry, rummy)
458
- Plugin: rummy.set({path, body, ...}) / rummy.rm(path) / etc.
459
- → direct entries.* store calls (bypasses the handler chain)
460
- ```
461
-
462
- Model and client tiers both land in `hooks.tools.dispatch`, which
463
- invokes the scheme's registered handler. Model-tier additionally
464
- passes through `TurnExecutor.#record()` (adds turn-scoped recording,
465
- policy filtering, abort cascade). Plugin-tier convenience verbs
466
- (`rummy.rm`, `rummy.set`, ...) are thin wrappers over the store — they
467
- don't invoke the handler chain. Plugin code that wants full handler
468
- semantics calls `hooks.tools.dispatch` directly.
469
-
470
- **Two-phase turn execution.** Model output flows through
471
- `TurnExecutor.execute` in strict order:
472
-
473
- 1. **RECORD** — every parsed command is materialized as a
474
- `log://turn_N/action/slug` audit entry via `#record()`. Each
475
- tool's parser shape surfaces exactly one of `path` / `command` /
476
- `question` as its addressable target; absent fields are treated
477
- as empty so the validation gate catches bad shapes rather than
478
- letting `undefined` propagate. Targets longer than 512 chars or
479
- containing control characters are rejected as likely reasoning
480
- bleed (the model's chain-of-thought leaking into a tool path).
481
- Plugins can validate or transform via the `entry.recording`
482
- filter before the row is committed.
483
- 2. **DISPATCH** — recorded entries fire sequentially via
484
- `hooks.tools.dispatch`. Each tool runs to completion before the
485
- next starts. A failed entry sets `abortAfter`; subsequent
486
- entries record as `outcome="aborted"`. Crashes inside dispatch
487
- route through `hooks.error.log` at status 500 and trigger the
488
- same abort cascade. After each entry, `proposal.prepare` lets
489
- plugins materialize pending 202 proposals (e.g. `set`'s
490
- search/replace revisions) from the just-recorded entry.
491
-
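The RECORD-phase target check could be sketched as below. The 512-character limit comes from the text; the exact set of control characters (C0 range plus DEL) and the names `MAX_TARGET_LENGTH`, `CONTROL_CHARS`, and `isPlausibleTarget` are assumptions.

```js
const MAX_TARGET_LENGTH = 512;
// Assumed definition of "control characters": C0 range plus DEL.
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/;

// Rejects targets that look like reasoning bleed: too long, or containing
// control characters such as newlines.
function isPlausibleTarget(target) {
  return target.length <= MAX_TARGET_LENGTH && !CONTROL_CHARS.test(target);
}
```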
492
- Narration outside tags is fine when the turn also emitted at least
493
- one command — "OK", "Let me check:", reasoning prefixes are natural
494
- and don't trigger the no-actionable-tags error path.
495
-
496
- **Tool dispatch:** Commands are dispatched sequentially in the order
497
- the model emitted them. Each tool either succeeds (200), fails (400+),
498
- or proposes (202). On failure, all remaining tools are aborted. On
499
- proposal, dispatch pauses, a notification is pushed to the client
500
- (same WebSocket push pattern as `run/state`), the client resolves
501
- (accept/reject), and dispatch resumes — the proposal becomes 200 or
502
- 400+ like any other tool. The `ask`/`act` RPC response is only sent
503
- when all tools have completed. Proposals are NOT batched — each is
504
- sent and resolved inline during dispatch. The model controls tool
505
- ordering; the system respects it.
506
-
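The sequential dispatch and abort cascade might be sketched like this. It is a simplified, synchronous illustration: `dispatchAll` and the handler signature are hypothetical, and the pause-and-resume handling of 202 proposals is left out.

```js
// Hypothetical dispatcher: runs the handler on each entry in order. After the
// first failure (status >= 400), remaining entries are marked aborted unrun.
function dispatchAll(recorded, handler) {
  let abortAfter = false;
  const results = [];
  for (const entry of recorded) {
    if (abortAfter) {
      results.push({ entry, outcome: "aborted" });
      continue;
    }
    let status;
    try {
      status = handler(entry);
    } catch {
      status = 500; // the spec routes dispatch crashes through status 500
    }
    if (status >= 400) abortAfter = true;
    results.push({ entry, status });
  }
  return results;
}
```

Because the loop never reorders entries, the model's emission order is preserved in the results.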
507
- If the model sends `<update status="200">` (terminal) but a preceding
508
- action in the same turn failed, the terminal assertion is overridden
509
- to a continuation (the model's claim of doneness is false); the update
510
- plugin resolves the update entry to 409 and surfaces it to the next
511
- turn as a continuation. Multiple `<update>` tags → last signal wins.
512
-
513
- **Post-dispatch budget check:** After all tools dispatch, TurnExecutor
514
- emits `turn.dispatched`; the budget plugin subscribes, re-materializes
515
- context, and checks the ceiling. If context exceeds the ceiling, Turn
516
- Demotion fires — all `visible` `run_views` rows for the current turn
517
- have their `visibility` flipped to `summarized`, and an `error://` entry at status 413 is
518
- written. Status is NOT touched (see [schemes_status_visibility](#schemes_status_visibility)). The tools already ran;
519
- their outcomes are settled.
520
-
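Turn Demotion as described above can be pictured as a pure transformation over view rows. `demoteTurn` and the row shape are hypothetical; the actual logic is the `demote_turn_entries` statement the spec refers to in `known_store.sql`.

```js
// Hypothetical sketch: flip this turn's visible rows to summarized.
// Status is copied through unchanged, as the spec requires.
function demoteTurn(views, turn) {
  return views.map((view) =>
    view.turn === turn && view.visibility === "visible"
      ? { ...view, visibility: "summarized" }
      : view,
  );
}
```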
521
- ### Plugin Convention {#plugin_convention}
522
-
523
- A plugin is an instantiated class. The class name matches the file name.
524
- The constructor receives `core` (a PluginContext) — the plugin's
525
- complete interface with the system.
526
-
527
- ```js
528
- export default class Rm {
529
- #core;
530
-
531
- constructor(core) {
532
- this.#core = core;
533
- core.ensureTool();
534
- core.registerScheme({ category: "logging" });
535
- core.on("handler", this.handler.bind(this));
536
- core.on("visible", this.full.bind(this));
537
- core.on("summarized", this.summary.bind(this));
538
- }
539
-
540
- async handler(entry, rummy) {
541
- // rummy here is per-turn RummyContext (not the startup PluginContext)
542
- }
543
-
544
- full(entry) { return `# rm ${entry.attributes.path}`; }
545
- summary(entry) { return ""; }
546
- }
547
- ```
548
-
549
- **Registration verbs on PluginContext:**
550
- - `"handler"` — tool handler (dispatches when a matching entry is recorded).
551
- - `"visible"` / `"summarized"` — visibility view projections. Return the
552
- projected body string for the given visibility level.
553
- - Any hook name (e.g. `"turn.started"`, `"entry.created"`) — subscribes
554
- to that event.
555
- - `core.filter(name, callback, priority)` — subscribes to a filter chain.
556
-
557
- **Two objects:**
558
- - `this.#core` — PluginContext (startup). For registration: `on()`, `filter()`.
559
- - `rummy` argument — RummyContext (per-turn). For runtime: tool verbs, queries.
560
-
561
- **Plugin types:**
562
- - **Tool plugins**: register `handler` + `visible`/`summarized`. Model-invokable.
563
- - **Assembly plugins**: register `core.filter("assembly.system"|"assembly.user", ...)`. Own a packet tag.
564
- - **Infrastructure plugins**: subscribe to lifecycle events
565
- (`turn.started`, `turn.response`, `turn.completed`, `entry.created`,
566
- `loop.started`, etc.). Background work.
567
-
568
- A plugin can be multiple types. Known is a tool AND an assembly plugin.
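A minimal sketch of a plugin that is both an assembly plugin and an infrastructure plugin. Only `core.on()` and `core.filter(name, callback, priority)` come from the text above. `StubCore`, its `emit`/`applyFilters` helpers, the `Notes` plugin, priority 120, and the `(content) => content` callback shape are all assumptions made so the example runs on its own.

```javascript
// Stand-in for PluginContext. Only on() and filter() mirror the documented verbs;
// emit() and applyFilters() are hypothetical helpers for this sketch.
class StubCore {
  constructor() {
    this.listeners = new Map(); // hook name -> callbacks
    this.filters = []; // { name, callback, priority }
  }
  on(name, callback) {
    const list = this.listeners.get(name) ?? [];
    list.push(callback);
    this.listeners.set(name, list);
  }
  filter(name, callback, priority) {
    this.filters.push({ name, callback, priority });
  }
  emit(name, payload) {
    for (const callback of this.listeners.get(name) ?? []) callback(payload);
  }
  // Assumption: a filter chain runs in ascending priority order.
  applyFilters(name, initial) {
    return this.filters
      .filter((f) => f.name === name)
      .sort((a, b) => a.priority - b.priority)
      .reduce((value, f) => f.callback(value), initial);
  }
}

// Hypothetical plugin: counts turns (infrastructure) and owns a <notes> tag (assembly).
class Notes {
  #turns = 0;

  constructor(core) {
    core.filter("assembly.user", this.assemble.bind(this), 120);
    core.on("turn.started", this.onTurnStarted.bind(this));
  }

  onTurnStarted() {
    this.#turns += 1;
  }

  // Assumed filter shape: receive the content so far, return the extended content.
  assemble(content) {
    return `${content}<notes turns="${this.#turns}"/>\n`;
  }
}
```

Registering both verbs in the constructor keeps the plugin's whole surface in one place, which matches the `Rm` example above.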
569
-
570
- ### Failure Reporting {#failure_reporting}
571
-
572
- **The action entry IS its outcome.** Every action plugin's handler
573
- finalizes the action's own log entry (`log://turn_N/{action}/{slug}`)
574
- with body, state, and outcome. Success and failure are two values of
575
- the same shape — only the field values change. The model sees both
576
- through the same channel, rendered under the action's scheme.
577
-
578
- ```
579
- <get path="src/x.js" status="200">…file body…</get> # success
580
- <get path="src/x.js" state="failed" outcome="not_found"> # failure
581
- src/x.js not found
582
- </get>
583
- ```
584
-
585
- State + outcome label the verdict; body is the result — file content
586
- on success, failure message on failure. No separate error entry is
587
- written for action-level failures; the model finds the failure exactly
588
- where it would find the success: at the action's own log path.
589
-
590
- **Strike attribution.** `error.js#verdict` looks up the post-handler
591
- state of every recorded entry on each turn. Any `state="failed"`
592
- result counts as a strike. Plugin authors write their action entry
593
- once with the right state; the strike machinery follows. They never
594
- call `error.log.emit` for action-level failures.
595
-
596
- **`error.log.emit` is for actionless failures** — failures that have
597
- no corresponding action entry to attach to:
598
-
599
- - Dispatch crash — the framework caught an exception thrown from inside
600
- a handler before the handler had a chance to write its own entry.
601
- - Parser-level failures — malformed XML warnings, no-actionable-tags
602
- responses, fired before any action entry could be recorded.
603
- - Runtime watchdog firings — `ContextExceededError`, RPC timeout,
604
- stream timeout — not bound to a specific action.
605
- - Budget overflow — pre-dispatch rejection.
606
-
607
- `error.log.emit` writes a `log://turn_N/error/<slug>` entry and
608
- increments `state.turnErrors`, which also feeds strike accumulation.
609
- Both channels (action-entry state=failed and `error.log.emit`)
610
- contribute to the strike streak; either path advances it.
611
-
612
- **Recording-filter rejection.** Plugins on the `entry.recording` filter
613
- chain (e.g. `policy`) can return an entry with `state="failed"`. The
614
- framework writes that entry to the store before returning from
615
- `#record`, and dispatch skips it. The model sees the rejection at the
616
- action's own log path, exactly like any other action-level failure.
617
-
618
- Cycle detection is **silent** — it does not call `error.log.emit`.
619
- The strike accumulates internally via `state.turnErrors++`; on
620
- `MAX_STRIKES` the run abandons at 499 with a telemetry-side reason.
621
- The model sees no special signal, because telling the model "you're
622
- looping" invites superficial evasion (vary an attribute to bust the
623
- fingerprint) without addressing the underlying confusion.
624
-
625
- **Plugin author contract.** Your handler does one job: finalize the
626
- action's own log entry with the right body/state/outcome. That's the
627
- whole API for failure reporting. You do not call `error.log.emit`.
628
- If your handler throws, the framework catches and routes through
629
- `error.log.emit` at status 500 — that's the only situation where the
630
- framework writes on your behalf.
631
-
632
- ### Mode Enforcement {#mode_enforcement}
633
-
634
- Two mechanisms, operating at different layers:
635
-
636
- 1. **Tool-list exclusion** — `hooks.tools.resolveForLoop(mode, flags)`
637
- computes the active tool set at loop start. Ask mode excludes `sh`.
638
- Flag-driven exclusions: `noInteraction` removes `ask_user`; `noWeb`
639
- removes `search`; `noProposals` removes `ask_user`/`env`/`sh`. The
640
- excluded tools don't appear in the system prompt's tool list.
641
- 2. **Per-invocation filtering** — the `policy` plugin subscribes to
642
- `entry.recording` and inspects individual emissions for ask-mode
643
- violations that the tool-list alone can't catch (file-scheme `<set>`
644
- edits, file `<rm>`, file-destination `<mv>`/`<cp>`). Rejects by
645
- marking the action entry `state="failed"`, `outcome="permission"`
646
- with a body describing the rejection. Per the failure-reporting
647
- contract — see [failure_reporting](#failure_reporting). The tool
648
- remains advertised; the specific invocation is blocked.
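The tool-list exclusion rules in item 1 can be sketched as a pure function. This is an illustration of the rules as listed, not the body of `hooks.tools.resolveForLoop`; the function name and the shape of the `allTools` array are assumptions.

```javascript
// Hypothetical sketch of the exclusion rules described for resolveForLoop(mode, flags).
function resolveToolsSketch(allTools, mode, flags = {}) {
  const excluded = new Set();
  if (mode === "ask") excluded.add("sh"); // ask mode excludes sh
  if (flags.noInteraction) excluded.add("ask_user");
  if (flags.noWeb) excluded.add("search");
  if (flags.noProposals) {
    for (const name of ["ask_user", "env", "sh"]) excluded.add(name);
  }
  // Excluded tools never reach the system prompt's tool list.
  return allTools.filter((name) => !excluded.has(name));
}
```

Per-invocation filtering (item 2) is deliberately absent here: it acts on recorded entries, not on the tool list.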
649
-
650
- ### YOLO Mode {#yolo_mode}
651
-
652
- When a run is started with the `yolo: true` attribute (parallel to
653
- `noRepo`/`noWeb`/`noInteraction`/`noProposals`), the server fully
654
- emulates a connected headless client: every proposal auto-accepts and
655
- every sh/env command spawns server-side, streaming output to the
656
- existing data-channel entries. No client involvement; no human
657
- approval required.
658
-
659
- **Plumbing.** The `yolo` attribute flows through the same path as
660
- `noProposals`: `set run://` → `attributes.yolo` → AgentLoop loop config
661
- JSON → RummyContext.yolo getter. The yolo plugin reads `rummy.yolo`
662
- off the proposal-pending event payload and engages only when set.
663
-
664
- **Behavior on yolo runs:**
665
-
666
- 1. **Auto-accept every proposal.** The yolo plugin listens to
667
- `proposal.pending`, replicates AgentLoop.resolve()'s accept path
668
- inline (`proposal.accepting` filter for veto, `proposal.content`
669
- filter for body, `entries.set state="resolved"`,
670
- `proposal.accepted` event for plugin side effects). The
671
- `entries.waitForResolution` blocking call wakes immediately; the
672
- loop continues without RPC roundtrip.
673
- 2. **Server-side sh/env execution.** For proposals on
674
- `log://turn_N/sh/...` or `log://turn_N/env/...`, the yolo plugin
675
- spawns the command in `projectRoot`, streams stdout/stderr to
676
- `{dataBase}_1`/`{dataBase}_2` via `entries.set append=true`, and
677
- transitions channels to terminal state on exit (200 / 500 mirror
678
- of the existing `stream/completed` RPC contract). Done in-process,
679
- no RPC roundtrip.
680
- 3. **Non-yolo runs unaffected.** Without `yolo: true`, the plugin's
681
- `proposal.pending` listener returns early. Existing client-driven
682
- resolution (rummy.nvim, AuditClient's file-edit auto-accept) works
683
- exactly as before.
684
-
685
- **Use cases.** E2E tests, benchmarks, CI, headless usage. The pattern
686
- is opt-in per run; rummy.nvim does not set `yolo: true` because
687
- human-in-the-loop control is the user-facing flow.
688
-
689
- **Architectural placement.** The yolo plugin owns its flag handling
690
- end-to-end — backbone files (TurnExecutor, AgentLoop) carry only the
691
- plumbing for the attribute and the rummy-context payload enrichment
692
- on `proposal.pending`. Feature logic stays in
693
- `src/plugins/yolo/yolo.js`.
694
-
695
- ### Project Manifest {#project_manifest}
696
-
697
- The `rummy.repo` plugin writes a single `log://turn_0/repo/manifest` entry
698
- once per run — a flat snapshot of every project file with its token
699
- cost. It gives the model orientation at run start without burning
700
- prefix-cache on a turn-keyed regeneration. Files themselves default
701
- to `archived` so a 5000-file repo doesn't dump hundreds of thousands
702
- of tokens into context before any work happens.
703
-
704
- **Entry contract.**
705
-
706
- - Path: `log://turn_0/repo/manifest` (log scheme; turn-0 marks "before
707
- any model turn"). One entry per run, written once.
708
- - Visibility: `visible` at write; demotable like any log entry.
709
- - Body: a flat list of `* <relative-path> - <N> tokens` lines, one
710
- per file, sorted by path. No headers, no directory aggregation, no
711
- constraints, no navigation legend — those are the model's business
712
- to derive from the list itself or from tooldocs.
713
-
714
- **Stale by design.** The manifest is a turn-0 snapshot; it does not
715
- update mid-run. Authoritative current state lives in the per-file
716
- entries (mtime/hash-driven, change-only writes). The model can
717
- `<get path="**" preview/>` for a fresh listing if it suspects
718
- staleness.
719
-
720
- **File default visibility flip.**
721
-
722
- `FileScanner` registers each tracked file at `archived` by default
723
- (was `summarized`). Files with `constraint=active` still register at
724
- `visible`. The model uses the manifest to discover paths, then
725
- promotes individual files via `<get path=...>` (visible, full body)
726
- or whole subtrees via `<set path=".../**" visibility="summarized"/>`
727
- (skim mode, symbols only).
728
-
729
- **Disabled when noRepo.** Setting `noRepo: true` on a run skips the
730
- scan entirely; no manifest is created and no file entries are
731
- registered. Behaviour identical to pre-plugin runs.
732
-
733
- ### Streaming Entries {#streaming_entries}
734
-
735
- Producers that generate output over time (shell commands, web fetches,
736
- log tails, file watches) use the streaming-entry pattern. Entry
737
- lifecycle extends beyond the synchronous 202→200/400+ flow.
738
-
739
- **Lifecycle:**
740
-
741
- ```
742
- 202 Proposal (user decision pending)
743
- → accept → 200 (log entry: action complete) + 102 data entries
744
- → reject → 403
745
- ```
746
-
747
- **Entry shape for a streaming producer** — two namespaces per
748
- invocation, one for the audit record, one for the payload (see
749
- [scheme_category_split](#scheme_category_split)):
750
-
751
- ```
752
- log://turn_N/{action}/{slug} scheme=log category=logging status=202→200
753
- body: "ran 'command', exit=0, Output: {paths}"
754
- (renders in <log>)
755
-
756
- {action}://turn_N/{slug}_1 scheme={action} category=data status=102 → 200/500
757
- body: primary stream (stdout for shell)
758
- tags="{command}" visibility=summarized
759
- (line in <summary>; full body in
760
- <visible> when promoted)
761
-
762
- {action}://turn_N/{slug}_2 scheme={action} category=data status=102 → 200/500
763
- body: alt stream (stderr for shell)
764
- (line in <summary>; full body in
765
- <visible> when promoted, often empty)
766
- ```
767
-
768
- `{action}` is the producer plugin's name (`sh`, `env`, future: `search`,
769
- `fetch`, ...). The stream RPC accepts the **log-entry path** and derives
770
- the data base internally via `logPathToDataBase` — see
771
- [stream_plugin](#stream_plugin).
772
-
773
- **Channel numbering follows Unix file descriptor convention.** Channel
774
- 1 is primary output (stdout for shell); channel 2 is alternate/error
775
- output (stderr); higher numbers for additional producer-specific
776
- channels. Non-process producers (search, fetch) map their streams onto
777
- the same numeric space: `_1` for the primary data stream, `_2` for
778
- anomalies/errors, `_3`+ for auxiliary streams.
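A sketch of how a channel path could be derived from a log-entry path, based only on the entry shapes shown above (`log://turn_N/{action}/{slug}` and `{action}://turn_N/{slug}_1`). The real `logPathToDataBase` may differ; both function names and the regular expression are assumptions.

```javascript
// Hypothetical derivation of the data base from a log-entry path.
function sketchLogPathToDataBase(logPath) {
  const match = /^log:\/\/(turn_\d+)\/([^/]+)\/(.+)$/.exec(logPath);
  if (!match) throw new Error(`not a log-entry path: ${logPath}`);
  const [, turn, action, slug] = match;
  return `${action}://${turn}/${slug}`;
}

// Channel 1 is the primary stream, 2 the alternate/error stream, 3+ auxiliary.
function channelPath(logPath, channel) {
  if (!Number.isInteger(channel) || channel < 1) {
    throw new RangeError("channel must be a positive integer");
  }
  return `${sketchLogPathToDataBase(logPath)}_${channel}`;
}
```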
779
-
780
- **Search prefetch.** The `search` producer (provided by `rummy.web`
781
- when wired) may prefetch its result URLs as separate `<https>` data
782
- entries before the model emits any `<get>`. The model sees those
783
- pages as already-summarized data without having explicitly loaded
784
- them. Auditors reading dumps should be aware: the absence of a
785
- corresponding `log://turn_N/get/` for a URL does **not** mean the
786
- URL wasn't loaded — it may have arrived via search prefetch. The
787
- prefetch policy is the search plugin's implementation detail; the
788
- data entries themselves obey the streaming-producer shape above.
789
-
790
- **Status 102 ("Processing") marks an entry in mid-stream:** body is
791
- partial, will change; tokens grow as chunks arrive. Agents reading a
792
- 102 entry use `<get>` with `line`/`limit` (including negative `line`
793
- for tail) to sample without promoting full body.
794
-
795
- **Status transition on completion** is terminal: 200 (exit_code=0 or
796
- N/A for non-process producers), 500 (non-zero exit), or 499 (client
797
- aborted via `stream/aborted`). The log entry is rewritten with final
798
- stats (exit code, duration, channel sizes, or abort reason).
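The terminal transition can be summarised as a small mapping. The function name and argument shape are assumptions; the status values are the ones listed above, with `exitCode: null` standing in for non-process producers.

```javascript
// Hypothetical helper: map a finished stream to its terminal status.
function terminalStatus({ exitCode = null, aborted = false } = {}) {
  if (aborted) return 499; // client aborted via stream/aborted
  if (exitCode === null || exitCode === 0) return 200; // success, or N/A for non-process producers
  return 500; // non-zero exit
}
```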
799
-
800
- **Budget demotion preserves status.** A 102 entry demoted by Turn
801
- Demotion stays at 102 — status reflects operation outcome, visibility
802
- reflects what is rendered to the model. See [schemes_status_visibility](#schemes_status_visibility) for the status-vs-visibility separation.
803
-
804
- **Stream plugin ([plugin_system](#plugin_system)) owns the append and completion RPCs.** Producer
805
- plugins (sh, env) create the proposal and data entries; the stream
806
- plugin handles the subsequent growth and terminal transitions.
807
-
808
- ---
809
-
810
- ## Message Structure {#message_structure}
811
-
812
- Two messages per turn. System = stable truth. User = active task.
813
-
814
- ### Packet Structure {#packet_structure}
815
-
816
- ```
817
- [system message]
818
- instructions-system.md text (with [%TOOLS%] / [%TOOLDOCS%]
819
- expansions) + persona body. Resolved by the instructions
820
- plugin's hooks.instructions.resolveSystemPrompt — single-owner,
821
- cache-stable across all turns within a run. The assembly.system
822
- filter chain exists but currently has no subscribers; the
823
- system message is the resolved system prompt verbatim.
824
- [user message] (sandwich ordering — see below)
825
- <prompt tokenUsage="N" tokensFree="M">user prompt</prompt>
826
- (prompt.js, assembly.user priority 30 — front, cacheable
827
- across the run within a loop)
828
- <summary>
829
- one entry per category=data entry whose visibility is visible
830
- or summarized. Each entry renders under its scheme tag with
831
- its summarized projection as the tag body — the compact-but-
832
- informative view produced by the plugin's summarized() hook
833
- (truncated knowns, code symbols for files, page abstracts
834
- for URLs). Identity-keyed, slow-mutating: only grows when a
835
- new entry lands. Archived entries — including prompts —
836
- are filtered out uniformly. There is no instruction-side
837
- guard against archiving the active prompt — if the model
838
- archives it, the next turn renders without a <prompt> tag
839
- and visibly fails (paradigm purity over silent rescue;
840
- action-gate is the principled future fix per
841
- src/plugins/prompt/README.md).
842
- (known.js, assembly.user priority 50)
843
- </summary>
844
- <visible>
845
- each category=data entry whose visibility is visible, rendered
846
- under its scheme tag with its visible projection as the tag
847
- body (full body per the plugin's visible() hook). Working-set:
848
- append on promote, remove on demote. A visible entry exists in
849
- BOTH blocks — summary projection up top, full body below.
850
- (known.js, assembly.user priority 75)
851
- </visible>
852
- <log>
853
- action history — all logging-category entries (log:// audit
854
- records, error://, update://) plus pre-latest prompt://
855
- entries (the active prompt is extracted to <prompt>).
856
- (log.js, assembly.user priority 100)
857
- </log>
858
- <unknowns>
859
- open questions at category=unknown, rendered under <unknown>
860
- children with their bodies as questions. (unknown.js,
861
- assembly.user priority 150)
862
- </unknowns>
863
- <instructions>
864
- instructions-user.md text. Per-turn imperative reminders.
865
- Same bytes every turn — no phase keying, no status-driven
866
- selection. (instructions.js, assembly.user priority 165)
867
- </instructions>
868
- <budget tokenUsage="N" tokensFree="M">…breakdown table…</budget>
869
- (budget.js, assembly.user priority 175 — last, recency for
870
- the live accounting at the action site)
871
- ```
872
-
873
- **System** = stable world state the model operates within (identity,
874
- tools, tool docs, persona). Stable across turns within a run, which
875
- keeps prompt caching intact. **User** = active work: the project's
876
- data surface, history, open questions, current task, and live
877
- accounting. The user message changes turn-to-turn so it sits outside
878
- the prefix-cacheable region; both `<instructions>` and the codebase
879
- blocks (`<summary>` / `<visible>`) live here because they mutate at
880
- turn cadence — putting mutable state in system would invalidate the
881
- cache on every promote.
882
-
883
- **Sandwich ordering.** User-message blocks are arranged
884
- `<prompt>` (30, front) → `<summary>` (50) → `<visible>` (75) →
885
- `<log>` (100) → `<unknowns>` (150) → `<instructions>` (165) →
886
- `<budget>` (175, last). The prompt sits at the front (cacheable
887
- across turns of a loop, since it doesn't change within a loop); the
888
- instructions and budget sit at the tail so the rules and live
889
- accounting have recency at the action site. An earlier front-loaded
890
- ordering (instructions first for max cache) regressed terminal-
891
- `<update>` discipline in e2e — the model lost the rule when it sat
892
- 3K tokens upstream of the action. Recency at the action site beats
893
- cache savings when the action depends on remembering a rule.
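The ordering above follows directly from sorting blocks by filter priority. A small sketch, assuming ascending priority means earlier in the message; the `blocks` array and `orderBlocks` are illustrative names, while the tags and priorities are the ones listed.

```javascript
// Blocks listed out of order on purpose; priorities taken from the ordering above.
const blocks = [
  { tag: "budget", priority: 175 },
  { tag: "log", priority: 100 },
  { tag: "prompt", priority: 30 },
  { tag: "instructions", priority: 165 },
  { tag: "visible", priority: 75 },
  { tag: "unknowns", priority: 150 },
  { tag: "summary", priority: 50 },
];

// Sort a copy so registration order does not matter.
function orderBlocks(list) {
  return [...list].sort((a, b) => a.priority - b.priority).map((b) => b.tag);
}
```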
894
-
895
- **Why two blocks instead of one `<context>`.** Promote/demote is the
896
- dominant intra-loop operation. A single-block render would
897
- invalidate the entire data surface on every promote. With the split,
898
- `<summary>` mutates only when a new entry lands (slow); `<visible>`
899
- mutates on every promote/demote (fast). Ordering slow-above-fast
900
- preserves the prefix cache for `<summary>` across the common case.
901
- Cognitively: `<summary>` is "what I know exists" (identity);
902
- `<visible>` is "what I'm reading right now" (working memory).
903
-
904
- The `<prompt>` tag is present on every turn — first turn and
905
- continuations alike. The model always sees its task. The
906
- `tokenUsage` / `tokensFree` attributes also appear on `<budget>` so
907
- the model can do budget arithmetic at both ends of the user message.
908
-
909
- ### Loops and Cross-Loop Continuity {#loops_previous_performed}
910
-
911
- A **loop** is one `ask` or `act` invocation and all its continuation
912
- turns until `<update status="200">`, failure, or abort. A run may
913
- contain many loops; pending loops queue FIFO via the loops table.
914
-
915
- Cross-loop continuity is carried by the entry store itself:
916
-
917
- - **Knowns, files, unknowns** persist across loop boundaries with
918
- whatever visibility the model left them at. They render in
919
- `<summary>` / `<visible>` per visibility, regardless of which
920
- loop wrote them.
921
- - **Log entries** (action audit, errors, updates) accumulate at
922
- `log://turn_N/...` for every turn of every loop; `log.js`
923
- renders all logging-category entries plus pre-latest prompts in
924
- `<log>` in chronological order.
925
- - **The active prompt** is extracted from its chronological
926
- position and rendered as `<prompt>` at priority 30 (front);
927
- prior prompts render in `<log>` like any other logging entry.
928
-
929
- When a new prompt arrives on an existing run, the prior loop's
930
- `prompt://N` entry stays in the store; on the next assembly it
931
- falls out of `<prompt>` (replaced by the new prompt) and into
932
- `<log>` — visibility-driven re-rendering of the same entry rows.
933
-
934
- ### Key Entries {#key_entries}
935
-
936
- | Path | Lifetime | Body | Attributes |
937
- |------|----------|------|-----------|
938
- | `instructions://system` | One per run (mutable) | Empty (projection builds from `instructions.md` + tool docs + optional persona) | `{ persona, toolSet }` |
939
- | `system://N` | Audit, one per turn | Full assembled system message | — |
940
- | `user://N` | Audit, one per turn | Full assembled user message | — |
941
- | `assistant://N` | Audit, one per turn | Model's raw response | — |
942
-
943
- `instructions://system` is the only mutable entry in this group. The
944
- framework auto-populates `toolDescriptions` from tool registrations
945
- that include `docs`. The instructions projection assembles the final
946
- text from body + attributes.
947
-
948
- ### Materialization {#materialization}
949
-
950
- Each turn:
951
-
952
- 1. Write `instructions://system` (empty body, attributes = { persona, toolSet })
953
- 2. Emit `turn.started` — plugins write prompt/instructions entries
954
- 3. Resolve the instructions system prompt
955
- (`hooks.instructions.resolveSystemPrompt` — single-owner; see
956
- AGENTS.md "Architectural exceptions"). Returns
957
- `instructions-system.md` with `[%TOOLS%]` / `[%TOOLDOCS%]`
958
- expanded, persona body appended.
959
- 4. Query `v_model_context` VIEW → visible entries (joined from
960
- `run_views` + `entries` + `schemes`)
961
- 5. Project each entry through its scheme's `visible`/`summarized` projection
962
- 6. Insert projected rows into `turn_context`
963
- 7. Invoke `assembly.system` filter chain — currently no
964
- subscribers, so the system message is the resolved system
965
- prompt verbatim.
966
- 8. Invoke `assembly.user` filter chain (empty string as base):
967
- - Prompt plugin (priority 30) → `<prompt>` element (carries
968
- `tokenUsage` / `tokensFree` attrs)
969
- - Known plugin (priority 50) → `<summary>` section
970
- - Known plugin (priority 75) → `<visible>` section
971
- - Log plugin (priority 100) → `<log>` section
972
- - Unknown plugin (priority 150) → `<unknowns>` section
973
- - Instructions plugin (priority 165) → `<instructions>` section
974
- (renders `instructions-user.md`)
975
- - Budget plugin (priority 175) → `<budget>` element (carries
976
- `tokenUsage` / `tokensFree` and per-scheme breakdown)
977
- 9. Store as `system://N` and `user://N` audit entries (telemetry plugin)
978
-
979
- The VIEW determines what the model sees from `visibility`, `status`, and `model_visible`:
980
- - `visibility = 'visible'` → full body visible in `<visible>` (data) or `<log>` (logging).
981
- - `visibility = 'summarized'` → summarized projection visible (typically path +
982
- summary attr). Promote with `<get>` to expand.
983
- - `visibility = 'archived'` → invisible. Discoverable via pattern search
984
- (`<get path="known://*">keyword</get>`); promote to bring back into view.
985
- - `status = 202` → invisible (proposed, pending client resolution).
986
- - `model_visible = 0` → invisible (audit schemes: instructions, system,
987
- reasoning, model, user, assistant, content, tool).
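The rules above can be read as a single decision function. This is a sketch over a plain row object, not the SQL behind `v_model_context`; the function name and return labels are assumptions, while the field names are the ones used in the list.

```javascript
// Hypothetical restatement of the view's visibility rules for one row.
function projectionFor(row) {
  if (row.model_visible === 0) return "hidden"; // audit schemes
  if (row.status === 202) return "hidden"; // proposed, pending client resolution
  if (row.visibility === "archived") return "hidden";
  return row.visibility === "visible" ? "visible" : "summarized";
}
```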
988
-
989
- **Partial read:** `<get path="..." line="N" limit="M"/>` returns lines N
990
- through N+M−1 of the entry body as the log item without changing
991
- visibility or promoting the entry to context. Use after reading a
992
- demoted entry (which shows path + summary) to target a specific slice.
993
- Single-path only — glob or body filter with `line`/`limit` is a 400 error.
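A sketch of the slice a partial read returns, assuming `line` is 1-indexed and positive. Negative `line` values (tail reads) are not modelled here, and `sliceLines` is a hypothetical name.

```javascript
// Return lines N through N+M-1 of a body, with N assumed to be 1-indexed.
function sliceLines(body, line, limit) {
  if (!Number.isInteger(line) || line < 1) throw new RangeError("line must be >= 1 in this sketch");
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be >= 1");
  const start = line - 1;
  return body.split("\n").slice(start, start + limit).join("\n");
}
```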
994
-
995
- Model controls visibility via `<set>` attributes:
996
- `visibility="archived|summarized|visible"`. The `summary="..."` attribute
997
- attaches a description (≤ 80 chars) that persists across visibility
998
- changes.
999
-
1000
- ### Filesystem Freshness {#filesystem_freshness}
1001
-
1002
- After any mutation of a file or scheme entry, the next turn's
1003
- assembled context reflects the post-mutation body AND visibility,
1004
- without the model needing a fresh `<get>` to recover its own
1005
- changes. The model's view of the entry store is always a faithful
1006
- projection of current state — there is no read-after-write skew.
1007
-
1008
- The invariant has two parts:
1009
-
1010
- 1. **Body freshness** — a write that changes the entry body shows
1011
- the new body on the next assembly's `<visible>` (when visible)
1012
- or under `<get>` (when summarized/archived).
1013
- 2. **Visibility freshness** — a write that explicitly sets
1014
- `visibility=...` honors the requested level on the next
1015
- assembly. Edit-path side effects (e.g., a SEARCH/REPLACE accept
1016
- silently downgrading visibility) violate the invariant; the
1017
- model would answer the next turn from memory of pre-edit state
1018
- while the new body sits invisible.
1019
-
1020
- Enforcement: `test/integration/file_freshness.test.js` exercises
1021
- write-through for both file and scheme entries.
1022
-
1023
- ### Token Accounting {#token_accounting}
1024
-
1025
- Tokens are a property of the materialized packet, not of stored entries.
1026
- They are computed during assembly, exposed on the materialization records,
1027
- and consumed by the budget plugin for the model-facing `<budget>` table.
1028
- Nothing else in the system has its own opinion of "what an entry costs."
1029
-
1030
- **Per-entry materialization records** carry three token measures:
1031
-
1032
- | Field | Meaning |
1033
- |---|---|
1034
- | `vTokens` | Wire cost when the entry is fully visible. The body rendered through the scheme's `visible` view, wrapped in its envelope tag, tokenized. |
1035
- | `sTokens` | Wire cost when the entry is summarized. The body rendered through the scheme's `summarized` view (typically a projection or 500-char preview), wrapped in its envelope tag, tokenized. |
1036
- | `aTokens` | `vTokens − sTokens`. The promotion premium — the marginal cost of the entry being visible rather than summarized. The only token measure exposed to the model on per-entry tags. |
1037
-
1038
- The model sees `tokens="N"` on each entry tag. That `N` is `aTokens`. It
1039
- means: *demoting this entry frees `N` tokens; promoting this entry from
1040
- summarized to visible costs `N` tokens.* The number is a pure lever — no
1041
- body-vs-wire ambiguity, no envelope overhead surprise.
1042
-
1043
- **Floor and premium.** A run's packet decomposes into:
1044
-
1045
- - **Summarized floor** = sum of `sTokens` for all non-archived entries.
1046
- Paid whether an entry is `visible` or `summarized`: a visible entry's
1047
- `vTokens` already contains its summarized-projection cost, so that
1048
- share is counted here and only the remainder (`aTokens`) counts as
1049
- premium.
1050
- - **Visibility premium** = sum of `aTokens` for currently-visible entries.
1051
- The active cost of visibility decisions. The model's lever.
1052
- - **System overhead** = system prompt + tool definition tokens. Constant
1053
- per turn, not addressable by the model.
1054
-
1055
- `tokenUsage = floor + premium + system`. `tokensFree = ceiling − tokenUsage`.
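A worked sketch of the decomposition, assuming each materialization record exposes `visibility`, `vTokens`, and `sTokens` as plain fields. `accountTokens` and the record shape are illustrative, not the budget plugin's code.

```javascript
// Compute floor, premium, tokenUsage and tokensFree from per-entry records.
function accountTokens(records, systemTokens, ceiling) {
  let floor = 0;
  let premium = 0;
  for (const record of records) {
    if (record.visibility === "archived") continue; // archived entries cost nothing
    floor += record.sTokens; // paid by visible and summarized entries alike
    if (record.visibility === "visible") {
      premium += record.vTokens - record.sTokens; // aTokens, the promotion premium
    }
  }
  const tokenUsage = floor + premium + systemTokens;
  return { floor, premium, tokenUsage, tokensFree: ceiling - tokenUsage };
}
```

With one visible entry (100 / 20), one summarized entry (50 / 10), one archived entry, 300 system tokens, and a ceiling of 1000, this gives a floor of 30, a premium of 80, `tokenUsage` of 410, and `tokensFree` of 590.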
1056
-
1057
- **`<budget>` rendered shape** (last block of the user message, after
1058
- `<instructions>`; priority 175):
1059
-
1060
- ```
1061
- <budget tokenUsage="N" tokensFree="M">
1062
- | scheme | visible | tokens | % |
1063
- |---|---|---|---|
1064
- | <scheme> | <count> | <sum-of-aTokens> | <%-of-ceiling> |
1065
- ... rows for visible-scheme breakdown, sorted desc by tokens ...
1066
-
1067
- Summarized: <count> entries, <sum-of-sTokens> tokens (<%>% of budget).
1068
- System: <token-count> tokens (<%>% of budget).
1069
- Total: <visible-count> visible + <summarized-count> summarized entries; tokenUsage <N> / ceiling <C>. <M> tokens free.
1070
- </budget>
1071
- ```
1072
-
1073
- **Why the table only contains visible scheme rows.** The `tokens` column
1074
- in the table is `aTokens` — the action lever. Per-entry visibility of
1075
- summarized entries is intentionally not surfaced; surgical pruning of
1076
- individual high-signal summaries is the wrong action shape. The
1077
- summarized aggregate line below the table is the only signal for that
1078
- class — actionable via glob (`<set path="known://oldsession/*"
1079
- visibility="archived"/>`), not per-entry.
1080
-
1081
- **Where the math is computed.** Materialization (the assembly path
1082
- through `materializeContext.js` and `ContextAssembler.js` plus per-scheme
1083
- view handlers) renders each entry's visible and summarized projections,
1084
- wraps them in their envelope, and tokenizes both. The resulting per-entry
1085
- record carries `vTokens`/`sTokens`/`aTokens` alongside the projected
1086
- text. The budget plugin's `assembleBudget` filter consumes this; no other
1087
- caller measures tokens.
1088
-
1089
- **Body-size gates** (e.g. `known.js` MAX_ENTRY_TOKENS) compute
1090
- `countTokens(body)` inline at write time. They check intrinsic body
1091
- size, not wire cost — the materialization record doesn't yet exist when
1092
- an entry is being written.
1093
-
1094
- ### Budget Enforcement {#budget_enforcement}
1095
-
1096
- The model owns its context. The system enforces a hard ceiling and
1097
- surfaces the numbers. Auto-demotion is reserved for the 413 budget
1098
- grinder, which only fires in response to actual overflow — never
1099
- helpfully or speculatively.
1100
-
1101
- **Ceiling.** `ceiling = floor(contextSize × RUMMY_BUDGET_CEILING)`
1102
- (default `RUMMY_BUDGET_CEILING = 0.9`, i.e. 10% headroom). All budget
1103
- decisions compare `assembledTokens` against `ceiling`, never against
1104
- `contextSize` directly.
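The ceiling formula as a one-line sketch. `budgetCeiling` and `fitsCeiling` are hypothetical names; the default ratio is the documented `RUMMY_BUDGET_CEILING` default.

```javascript
// ceiling = floor(contextSize × RUMMY_BUDGET_CEILING), default ratio 0.9.
function budgetCeiling(contextSize, ratio = 0.9) {
  return Math.floor(contextSize * ratio);
}

// Budget decisions compare against the ceiling, never against contextSize.
function fitsCeiling(assembledTokens, contextSize, ratio = 0.9) {
  return assembledTokens <= budgetCeiling(contextSize, ratio);
}
```

For a 32768-token context the ceiling is 29491, so a 30000-token packet is over budget even though it is below `contextSize`.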
1105
-
1106
- **Pre-LLM grinder** (`hooks.turn.beforeDispatch.filter`, in
1107
- TurnExecutor before the LLM call; budget is the canonical
1108
- subscriber). A four-step ladder. Each step demotes a strictly smaller
1109
- scope and rechecks. The first step that fits the ceiling proceeds to
1110
- the LLM; if step 4 fires, AgentLoop exits the loop with 413.
1111
-
1112
- 1. **Check budget.** Measure `assembledTokens` (using
1113
- `turns.context_tokens` from the prior turn when available, the
1114
- materialized packet estimate as a first-turn fallback). If
1115
- `assembledTokens ≤ ceiling`, proceed to the LLM.
1116
- 2. **Soft 413 — previous-turn demotion.** Flip every `run_views`
1117
- row where `turn = current_turn - 1 AND visibility = visible` to
1118
- `summarized` (status preserved — see
1119
- [schemes_status_visibility](#schemes_status_visibility)). All
1120
- schemes participate; no exemption for knowns / unknowns /
1121
- files. Re-materialize, re-check.
1122
- 3. **Soft 413 — current-prompt demotion.** Flip the incoming
1123
- `prompt://N` entry to `summarized`. Re-materialize, re-check.
1124
- Step 3 exists because the prompt is stamped at `current_turn`,
1125
- not the previous turn — step 2's filter never sees it. Without
1126
- step 3, an oversized first-turn prompt has no path to fit.
1127
- 4. **Hard 413.** Emit a 413 `error://` entry via
1128
- `hooks.error.log.emit` with the descriptive body (what was
1129
- demoted across steps 2-3, the ceiling, the residual overflow).
1130
- AgentLoop exits the loop with 413.
1131
-
1132
- Steps 2 and 3 also emit 413 `error://` entries when they fire
1133
- (distinct from step 4 in that the run keeps going). The model reads
1134
- those next turn and learns what got auto-demoted. Status of the
1135
- turn that proceeded after a soft 413 is unaffected.
1136
-
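The ladder above can be read as a short control flow. The following is a minimal sketch, not the budget plugin's code: `computeCeiling`, `runBudgetLadder`, and the callback names are hypothetical, and the real steps also emit `error://` entries, which this sketch omits.

```js
// Hypothetical sketch of the four-step pre-LLM ladder described above.
// ceiling = floor(contextSize × RUMMY_BUDGET_CEILING), default ratio 0.9.
function computeCeiling(contextSize, ceilingRatio = 0.9) {
  return Math.floor(contextSize * ceilingRatio);
}

// `measure` returns the current assembledTokens. Each demotion callback is
// assumed to flip visibility and re-materialize before the next measurement.
function runBudgetLadder({ contextSize, measure, demotePreviousTurn, demotePrompt }) {
  const ceiling = computeCeiling(contextSize);
  const demoted = [];

  // Step 1: check budget.
  if (measure() <= ceiling) return { status: 200, ceiling, demoted };

  // Step 2: soft 413, previous-turn demotion, then re-check.
  demotePreviousTurn();
  demoted.push("previous-turn");
  if (measure() <= ceiling) return { status: 200, ceiling, demoted };

  // Step 3: soft 413, current-prompt demotion, then re-check.
  demotePrompt();
  demoted.push("prompt");
  if (measure() <= ceiling) return { status: 200, ceiling, demoted };

  // Step 4: hard 413 with the residual overflow.
  return { status: 413, ceiling, demoted, overflow: measure() - ceiling };
}
```

Comparisons are always against `ceiling`, never against `contextSize`, which mirrors the rule stated in the Ceiling paragraph.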
1137
- **Trunks and forks are treated identically.** A forked run inherits
1138
- the parent's `run_views` rows verbatim — each entry keeps its
1139
- original `turn`. There is no fork-event restamping. The grinder's
1140
- `current_turn - 1` rule applies the same way in both cases. For
1141
- the rule to point at meaningful inherited content on a fork's first
1142
- dispatch, the child run inherits the parent's `next_turn` so turn
1143
- numbering is absolute across the lineage; sibling forks share the
1144
- same prior history at lower turn numbers and only diverge at
1145
- fork-time.
1146
-
1147
- **LLM-reported context exceeded.** If the LLM rejects the request
1148
- with a "context too long" error (detected via the regex in
1149
- `src/llm/errors.js`), the LlmProvider raises `ContextExceededError`
1150
- which TurnExecutor catches and emits a 413 error through the same
1151
- channel.
1152
-
1153
- **Known-scheme size gate** (in the `known` plugin). Writes to
1154
- `known://` entries exceeding `RUMMY_MAX_ENTRY_TOKENS` (default 512)
1155
- are rejected at the handler with an instructive error message. Forces
1156
- atomic entries instead of dumping transcripts into a single `known://`.
1157
-
1158
- **Advisory feedback.** The model reads `tokensFree` / `tokenUsage`
1159
- attributes on `<budget>` every turn and self-regulates. The full
1160
- breakdown (per-scheme visible cost, summarized aggregate, system
1161
- overhead) lives in the same tag — see [token_accounting](#token_accounting)
1162
- for the rendered shape and the contract for what each number means.
1163
- No threshold-based warnings. When the ceiling is actually breached the
1164
- 413 `error://` entry is the feedback.
1165
-
1166
- **Token math:** `Math.ceil(text.length / RUMMY_TOKEN_DIVISOR)`. One
1167
- formula, one file (`src/agent/tokens.js`), env-configurable. No
1168
- external dependencies. All costs surfaced to the model and the budget
1169
- guard come through materialization (see [token_accounting](#token_accounting));
1170
- the budget guard's pre-LLM check uses the actual API tokens
1171
- (`turns.context_tokens` from the prior turn) when available, falling
1172
- back to the materialized packet estimate on turn 1.
1173
-
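As a minimal sketch of the token formula above: the function name `estimateTokens` is hypothetical, and the divisor is passed explicitly because this section does not state the default value of `RUMMY_TOKEN_DIVISOR`.

```js
// Sketch of Math.ceil(text.length / RUMMY_TOKEN_DIVISOR).
// `divisor` stands in for the env-configured value; no default is assumed.
function estimateTokens(text, divisor) {
  if (!(divisor > 0)) throw new RangeError("divisor must be a positive number");
  return Math.ceil(text.length / divisor);
}
```

For example, an 11-character string with a divisor of 4 yields `Math.ceil(11 / 4)`, which is 3.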
1174
- **`context_tokens` vs `prompt_tokens` in step telemetry:**
1175
- - `context_tokens` in the step JSON = `turns.context_tokens` for that turn =
1176
- per-turn actual input tokens from the LLM API (e.g. 7900 tokens sent this turn)
1177
- - `prompt_tokens` in the step JSON = `SUM(turns.prompt_tokens)` for the run =
1178
- **cumulative** total across all turns (cost tracking, not a context size)
1179
-
1180
- These two will diverge rapidly on any multi-turn run. A run at turn 50 might show
1181
- `context_tokens: 8000` (context under control) and `prompt_tokens: 400000`
1182
- (total input tokens billed across the whole run). They are measuring orthogonal things.
1183
-
1184
-
1185
- ---
1186
-
1187
- ## RPC Protocol
1188
-
1189
- JSON-RPC 2.0 over WebSocket. `discover` returns the live catalog.
1190
-
1191
- ### Methods {#rpc_methods}
1192
-
1193
- #### Protocol
1194
-
1195
- | Method | Params |
1196
- |--------|--------|
1197
- | `ping` | — |
1198
- | `discover` | — |
1199
- | `init` | `{ name, projectRoot, configPath? }` |
1200
-
1201
- #### Models
1202
-
1203
- | Method | Params |
1204
- |--------|--------|
1205
- | `getModels` | `{ limit?, offset? }` |
1206
- | `addModel` | `{ alias, actual, contextLength? }` |
1207
- | `removeModel` | `{ alias }` |
1208
-
1209
- #### Entry Operations (dispatched through handler chain)
1210
-
1211
- | Method | Params |
1212
- |--------|--------|
1213
- | `get` | `{ path, run, persist?, readonly? }` |
1214
- | `set` | `{ run, path, body?, attributes? }` |
1215
- | `rm` | `{ run, path }` |
1216
- | `mv` | `{ run, path, to }` |
1217
- | `cp` | `{ run, path, to }` |
1218
- | `store` | `{ path, run?, persist?, ignore?, clear? }` |
1219
- | `getEntries` | `{ pattern?, body?, run?, limit?, offset? }` |
1220
-
1221
- All entry operations dispatch through the handler chain. `persist`
1222
- on `get` also sets a project-level file constraint (operator privilege).
1223
- `store` manages file constraints — not a model tool.
1224
-
1225
- #### Runs
1226
-
1227
- | Method | Params |
1228
- |--------|--------|
1229
- | `startRun` | `{ model, temperature?, persona?, contextLimit?, yolo? }` |
1230
- | `ask` | `{ prompt, model, run?, temperature?, persona?, contextLimit?, noRepo?, noInteraction?, noWeb?, noProposals?, yolo?, fork? }` |
1231
- | `act` | `{ prompt, model, run?, temperature?, persona?, contextLimit?, noRepo?, noInteraction?, noWeb?, noProposals?, yolo?, fork? }` |
1232
- | `run/resolve` | `{ run, resolution: { path, action, output? } }` |
1233
- | `run/abort` | `{ run }` |
1234
- | `run/rename` | `{ run, name }` |
1235
- | `run/inject` | `{ run, message }` |
1236
- | `run/config` | `{ run, temperature?, persona?, contextLimit?, model? }` |
1237
-
1238
- `model` is required on `ask`, `act`, and `startRun`. No default.
1239
- `noRepo` disables default project/repo file scanning (files can still
1240
- be added explicitly by the client).
1241
- `noInteraction` removes `ask_user` from the tool list.
1242
- `noWeb` removes `search` from the tool list.
1243
- `noProposals` removes `ask_user` / `env` / `sh` from the tool list
1244
- (no proposals at all).
1245
- `yolo` opts the run into server-side proposal auto-accept and
1246
- in-process sh/env execution — see [yolo_mode](#yolo_mode).
1247
-
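To make the request shape concrete, here is a hedged sketch of building an `ask` call as a JSON-RPC 2.0 envelope. The helper names `buildRequest` and `buildAsk` are hypothetical, as is the id counter; only the method name and parameter names come from the tables above.

```js
// Hypothetical client-side helpers; not part of rummy itself.
let nextId = 1;

// Standard JSON-RPC 2.0 request envelope.
function buildRequest(method, params = {}) {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// `model` is required on ask / act / startRun and has no default,
// so the sketch rejects a missing model before anything is sent.
function buildAsk({ prompt, model, ...options }) {
  if (!model) throw new Error("model is required on ask");
  return buildRequest("ask", { prompt, model, ...options });
}
```

A caller would serialize the result with `JSON.stringify` and send it over the WebSocket; optional flags such as `noWeb` or `yolo` pass through unchanged in `params`.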
1248
- #### Streaming (see [streaming_entries](#streaming_entries))
1249
-
1250
- | Method | Params |
1251
- |--------|--------|
1252
- | `stream` | `{ run, path, channel, chunk }` |
1253
- | `stream/completed` | `{ run, path, exit_code?, duration? }` |
1254
- | `stream/aborted` | `{ run, path, reason?, duration? }` |
1255
- | `stream/cancel` | `{ run, path, reason? }` |
1256
-
1257
- Producer-agnostic RPC for streaming output into data entries created by
1258
- any plugin (sh/env today; search/fetch/watch as future consumers). The
1259
- `stream` method appends `chunk` to `{path}_{channel}`; `stream/completed`
1260
- transitions all `{path}_*` channels to terminal status (200/500) and
1261
- finalizes the log entry body; `stream/aborted` is the client-initiated
1262
- cancellation counterpart, transitioning channels to **499** (Client
1263
- Closed Request); `stream/cancel` is the server-initiated counterpart
1264
- (transitions to 499 and pushes `stream/cancelled` notification to
1265
- connected clients). `stream/cancel` also handles stale 102 cleanup.
1266
-
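The channel lifecycle described above can be sketched with an in-memory map. This is an illustration under stated assumptions, not the server's implementation: `createStreamStore` is a hypothetical name, in-flight channels are assumed to sit at status 102, and mapping `exit_code === 0` to 200 (anything else to 500) is an assumption about how the terminal status is chosen.

```js
// Hypothetical in-memory model of streaming channels keyed by `${path}_${channel}`.
function createStreamStore() {
  const channels = new Map();
  const keysFor = (path) =>
    [...channels.keys()].filter((key) => key.startsWith(`${path}_`));

  return {
    // stream: append chunk to {path}_{channel}, creating the channel on first use.
    stream({ path, channel, chunk }) {
      const key = `${path}_${channel}`;
      const entry = channels.get(key) ?? { body: "", status: 102 };
      entry.body += chunk;
      channels.set(key, entry);
    },
    // stream/completed: move every {path}_* channel to a terminal status.
    completed({ path, exit_code = 0 }) {
      for (const key of keysFor(path)) {
        channels.get(key).status = exit_code === 0 ? 200 : 500;
      }
    },
    // stream/aborted: client-initiated cancellation, 499 on every channel.
    aborted({ path }) {
      for (const key of keysFor(path)) channels.get(key).status = 499;
    },
    get: (key) => channels.get(key),
  };
}
```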
1267
- #### Queries
1268
-
1269
- | Method | Params |
1270
- |--------|--------|
1271
- | `getRuns` | `{ limit?, offset? }` |
1272
- | `getRun` | `{ run }` |
1273
-
1274
- #### Skills & Personas
1275
-
1276
- Both attach to a run via the entry grammar.
1277
-
1278
- - **Skills** — model emits `<skill path="[path-or-url]"/>`.
1279
- Handler walks local file/folder/`.zip` (via `yauzl-promise`) or
1280
- fetches a URL. Single `.md` registers as `skill://<name>`
1281
- (summarized); folder/zip registers root `index.md` summarized,
1282
- rest archived; `foo/index.md` collapses to `skill://<name>/foo`.
1283
- Re-emit overwrites. Authors link with absolute `skill://...` URIs.
1284
- - **Personas** — `ask` / `act` / `startRun` accept `persona` as a
1285
- run attribute. The persona plugin renders the persona body inside
1286
- the system prompt (below tooldocs) on first turn; if no `persona`
1287
- is passed, `AgentLoop.ensureRun` defaults to
1288
- `src/plugins/persona/default.md`. 1:1 run:persona, immutable for
1289
- the run's lifetime.
1290
-
1291
- ### Notifications {#notifications}
1292
-
1293
- | Notification | Scoped by | Purpose |
1294
- |-------------|-----------|---------|
1295
- | `rummy/hello` | connection | Server greeting on client connect. Carries `rummyVersion` (semver). Clients check MAJOR and refuse on mismatch. |
1296
- | `run/state` | projectId | Turn state snapshot (status, history, unknowns, telemetry). Fires per command dispatch (incremental 102), at turn conclusion (verdict status), and at terminal run close. |
1297
- | `run/progress` | projectId | Transient turn activity (`thinking` / `processing` / `retrying`). |
1298
- | `run/proposal` | projectId | A 202 entry is awaiting resolution. |
1299
- | `stream/cancelled` | projectId | Server-initiated streaming cancellation. |
1300
- | `ui/render` | projectId | Streaming UI output (e.g. tool progress). |
1301
- | `ui/notify` | projectId | Toast notification. |
1302
-
1303
- **`run/state` payload shape** — the unified contract for both the
1304
- notification and `getRun` RPC:
1305
-
1306
- ```jsonc
1307
- {
1308
- "run": "gemma_1234567890",
1309
- "turn": 4,
1310
- "status": 102, // numeric HTTP status
1311
- "summary": "…", // latest <update status="200"> body, or ""
1312
- "history": [ // chronological per-entry log
1313
- {
1314
- "tool": "set",
1315
- "path": "known://president/current",
1316
- "status": 200,
1317
- "body": "Donald Trump is the 47th president…",
1318
- "turn": 4,
1319
- "attributes": "{\"summary\":\"president,current,trump\",\"visibility\":\"visible\"}"
1320
- }
1321
- ],
1322
- "unknowns": [{ "path": "unknown://…", "body": "…" }],
1323
- "telemetry": null | { /* final end-of-turn usage; null on mid-turn emissions */ }
1324
- }
1325
- ```
1326
-
1327
- `history` includes every entry the model has touched this run in
1328
- timeline order — prompt entries, unknowns, tool results. `attributes`
1329
- is raw JSON; parse client-side. Mid-turn emissions have `telemetry:
1330
- null`; the final emission of each turn includes the full telemetry
1331
- block (token usage, context distribution, cost).
1332
-
1333
- **Telemetry completeness guarantee.** Every `run/state` emission
1334
- computes a real budget from real numbers — never undefined, never
1335
- synthesized. When no fresh turn result is available
1336
- (abort/max-turns/crash paths fire before any turn executed, or after
1337
- a turn that produced no tokens), `AgentLoop.#emitRunState` reads the
1338
- last turn's `context_tokens` from the DB. Absent means no turn ran
1339
- yet; zero is the truth, not a fallback. The shape and the math are
1340
- the same on every code path so the client's renderer never needs to
1341
- discriminate by emission cause.
1342
-
1343
- `stream/cancelled` payload: `{ run, path, reason }`. Server has
1344
- already transitioned the entries to 499 (`Client Closed Request`);
1345
- client should stop sending `stream` chunks for that path.
1346
-
1347
- ### Resolution {#resolution}
1348
-
1349
- | Resolution | Model signal | Outcome |
1350
- |-----------|-------------|---------|
1351
- | reject | any | `completed` — rejection stops the bus |
1352
- | accept | `<update status="102">` | `running` — model has more work |
1353
- | accept | `<update status="200\|204\|422">` | `completed` — terminal |
1354
- | accept | neither | `running` — healer decides |
1355
- | error | any | `running` — error state, model retries |
1356
-
1357
- **RPC ack vs run terminal status.** `resolve` and `inject` return the
1358
- *current* run status (typically 102 mid-run), not 200. The client's
1359
- dispatch handler must distinguish the synchronous RPC ack from the
1360
- asynchronous `run/state` notification that carries real terminal
1361
- state at end-of-turn — otherwise an HTTP-style 200 ack on a
1362
- successful resolve would prematurely close the document.
1363
-
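A client-side sketch of the ack-versus-notification distinction follows. `createDispatcher` is a hypothetical name, and treating every `run/state` status other than 102 as terminal is a simplifying assumption made for this sketch only.

```js
// Hypothetical client dispatcher: RPC acks never close the document;
// only a run/state notification with a non-102 status does.
function createDispatcher() {
  const state = { closed: false, lastAck: null, lastStatus: null };

  function handle(message) {
    // A JSON-RPC response carries an id and no method: this is the synchronous ack.
    if ("id" in message && !("method" in message)) {
      state.lastAck = message.result ?? message.error ?? null;
      return state;
    }
    // A notification carries a method and no id: run/state holds the real status.
    if (message.method === "run/state") {
      state.lastStatus = message.params.status;
      if (message.params.status !== 102) state.closed = true;
    }
    return state;
  }

  return { handle, state };
}
```

Keeping the two paths separate means an ack that happens to carry a 200 cannot close the document early.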
1364
- **Proposal hook chain.** Resolution flows through three filter/event
1365
- hooks plugins can subscribe to:
1366
-
1367
- - `proposal.accepting` (filter) — first plugin to return
1368
- `{ allow: false, outcome, body }` vetoes acceptance. The entry
1369
- resolves to `state="failed"` with the plugin-supplied outcome and
1370
- body. Used by `policy` for read-only enforcement and similar
1371
- guards. First veto wins; later filters don't run.
1372
- - `proposal.content` (filter) — when acceptance proceeds, plugins
1373
- override the resolved body. Default is `output ?? ""`. The `set`
1374
- plugin uses this to prefer the proposed body it already staged
1375
- on the audit entry over whatever literal body the client passed
1376
- through `resolve`.
1377
- - `proposal.accepted` / `proposal.rejected` (events) — fired after
1378
- the resolution is committed; plugins side-effect on either
1379
- outcome.
1380
-
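The first-veto-wins behaviour of the hook chain can be sketched as below. `resolveProposal` and the filter arrays are hypothetical, and the `"resolved"` state label on the accept path is an assumption; the section only names `state="failed"` for vetoes.

```js
// Hypothetical sketch of proposal.accepting followed by proposal.content.
function resolveProposal({ output, acceptingFilters = [], contentFilters = [] }) {
  // proposal.accepting: the first filter returning { allow: false } vetoes,
  // and later filters do not run.
  for (const filter of acceptingFilters) {
    const verdict = filter();
    if (verdict && verdict.allow === false) {
      return { state: "failed", outcome: verdict.outcome, body: verdict.body };
    }
  }
  // proposal.content: default body is `output ?? ""`; plugins may override it.
  let body = output ?? "";
  for (const filter of contentFilters) body = filter(body);
  return { state: "resolved", body };
}
```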
1381
- ---
1382
-
1383
- ## Plugin System {#plugin_system}
1384
-
1385
- See [PLUGINS.md](PLUGINS.md) for the full plugin development guide,
1386
- including the RummyContext API, tool registration, handler chains,
1387
- projections, events, filters, and hedberg pattern library.
1388
-
1389
- Each plugin has its own README at `src/plugins/{name}/README.md`.
1390
-
1391
- ---
1392
-
1393
- ## Tool Documentation Design {#tool_documentation}
1394
-
1395
- Tool docs are the most carefully designed text in rummy. Every line
1396
- simultaneously teaches syntax, implies workflow priority, demonstrates
1397
- pattern capabilities, and constrains misuse. Each letter earns its place.
1398
-
1399
- ### Principles
1400
-
1401
- **Show, don't tell.** Examples ARE the documentation. A model learns
1402
- `<get path="known://*">auth</get>` from seeing it, not from being told
1403
- "you can filter known entries by keyword." Examples are ordered from
1404
- simple to powerful — weak models learn from examples 1-2, strong models
1405
- pick up the pattern from example 3.
1406
-
1407
- **Lifecycle continuity.** Examples weave stories across tools. The get
1408
- docs demonstrate `<get path="known://*">keyword</get>` for pattern recall
1409
- and `<get path="..." line="N" limit="M"/>` for partial reads that don't
1410
- promote. The known docs reference `<get path="known://*">keyword</get>`
1411
- for recall. The unknown docs reference `<set path="unknown://..."
1412
- visibility="archived"/>` for retiring resolved questions, `<get/>` for
1413
- investigation. A model reading the full tool docs encounters a coherent
1414
- workflow: discover → load → reason → edit → archive → recall.
1415
-
1416
- **RFC 2119 semantics.** Constraint bullets use YOU MUST, YOU MUST NOT,
1417
- YOU SHOULD, YOU MAY from RFC 2119. Every LLM has extensive pretraining
1418
- on RFC documents where these keywords carry precise semantic weight.
1419
- MUST is absolute. SHOULD is strong advisory. MAY is permissive. This
1420
- is not decorative — it's leveraging the model's existing understanding
1421
- of requirement levels.
1422
-
1423
- **Consistent structure.** Every tool doc follows: header (syntax), 2+
1424
- examples, 2+ constraint bullets. Inconsistent formatting reads as
1425
- inconsistent importance. A tool with 5 examples and dense bullets feels
1426
- complex; a tool with 1 line feels disposable. Both are wrong — every
1427
- tool is equally real, each doc is proportional to the tool's surface area.
1428
-
1429
- ### Format
1430
-
1431
- Tool docs live in `*Doc.js` files as annotated line arrays:
1432
-
1433
- ```js
1434
- const LINES = [
1435
- ["* Body text filters results by content match",
1436
- "Generalizes examples 2-3. Body = filter, not just path."],
1437
- ];
1438
- export default LINES.map(([text]) => text).join("\n");
1439
- ```
1440
-
1441
- The first element is the model-facing text. The second is the rationale —
1442
- visible only in source. Changing any line requires reading all rationales
1443
- first. This prevents well-intentioned edits from breaking subtle behavioral
1444
- guarantees that adjacent lines depend on.
1445
-
1446
- ### Tool Display Order
1447
-
1448
- Tools are presented gather → reason → act → communicate. Position in
1449
- the list implies priority. `get` is first. `ask_user` is last. The
1450
- order is defined in `ToolRegistry.TOOL_ORDER` and applied by
1451
- `resolveForLoop()`. The same method handles all tool exclusions —
1452
- mode restrictions, `noInteraction`, `noWeb`, `noProposals` — through
1453
- one unified mechanism.
1454
-
1455
- ### Pattern Distribution
1456
-
1457
- Hedbergian pattern matching (globs, body filters, manifest) is taught
1458
- across multiple tools, not concentrated in one. `get` shows content
1459
- filtering. `cp` shows glob batch operations. `rm` shows manifest safety.
1460
- Each tool reinforces the pattern vocabulary from a different angle.
1461
- A model that sees `path="known://*"` in get, `path="known://plan_*"` in
1462
- cp, and `path="known://temp_*" manifest` in rm learns that patterns
1463
- are universal — not a feature of any single tool.
1464
-
1465
- ---
1466
-
1467
- ## Edit Syntax
1468
-
1469
- The model expresses entry writes through `<set path="..."><body></set>`.
1470
- The body shape determines the operation. All shaped operations use a
1471
- bash-heredoc-flavored marker family.
1472
-
1473
- ### Marker Grammar
1474
-
1475
- <<IDENT
1476
- body content
1477
- IDENT
1478
-
1479
- Where `IDENT` matches `[A-Z][A-Za-z0-9_]*`. The leading keyword of
1480
- `IDENT` selects the operation; any trailing alphanumeric suffix is
1481
- opaque to operation routing and exists to disambiguate nested markers
1482
- or avoid collisions when the body literally contains the bare keyword
1483
- (same convention as bash heredoc `<<EOF1` vs `<<EOF`).
1484
-
1485
- The opener `<<IDENT` must be preceded by start-of-body, whitespace,
1486
- or `>` (so `vec<<SEARCH` mid-token does not false-trigger). The
1487
- closer is bare `IDENT` with whitespace boundaries on both sides.
1488
-
1489
- Newline-tolerant: the multi-line shape above and the single-line
1490
- `<<IDENT body IDENT` form parse identically.
1491
-
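The grammar above is small enough to sketch with two regular expressions. This is an illustrative parser, not `parseMarkerBody` itself: the name `parseMarkers` is hypothetical, the content is trimmed for convenience, and accepting end-of-body as a closing boundary is an assumption.

```js
// Opener: <<IDENT preceded by start-of-body, whitespace, or ">".
const OPENER = /(^|[\s>])<<([A-Z][A-Za-z0-9_]*)/g;

function parseMarkers(body) {
  const blocks = [];
  OPENER.lastIndex = 0;
  let open;
  while ((open = OPENER.exec(body)) !== null) {
    const ident = open[2];
    const contentStart = open.index + open[0].length;
    // Closer: bare IDENT after whitespace, followed by whitespace or end-of-body.
    const closer = new RegExp(`\\s${ident}(?=\\s|$)`, "g");
    closer.lastIndex = contentStart;
    const close = closer.exec(body);
    if (close === null) throw new Error(`Unclosed marker <<${ident}`);
    blocks.push({ ident, content: body.slice(contentStart, close.index).trim() });
    // Resume scanning after the closer so later markers are picked up in order.
    OPENER.lastIndex = close.index + close[0].length;
  }
  return blocks;
}
```

Because the closer must equal the full `IDENT`, a `<<SEARCH1 ... SEARCH1` block is not closed by a bare `SEARCH` inside its content, and `vec<<SEARCH` never opens a block.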
1492
- ### Distinct from Packet Rendering
1493
-
1494
- The engine renders entry bodies in context using a different marker
1495
- shape: `<<:::path...:::path` (see `plugins/helpers.js`). Edit syntax
1496
- is the bare `<<IDENT` form; packet rendering keeps the `:::` sentinel.
1497
- The two grammars are visibly distinct so model emissions and engine
1498
- renderings can never be confused. A `<set>` body echoing the packet
1499
- shape is NOT treated as edit syntax — it falls through to plain-body
1500
- REPLACE with the markers preserved as literal content.
1501
-
1502
- ### Operations
1503
-
1504
- | IDENT prefix | Effect |
1505
- |---|---|
1506
- | `NEW` | Create the entry. Behaves identically to `REPLACE` on existing entries — named separately to align with model intent. |
1507
- | `PREPEND` | Prepend body content to the existing entry. Creates the entry if it doesn't exist. |
1508
- | `APPEND` | Append body content to the existing entry. Creates the entry if it doesn't exist. |
1509
- | `REPLACE` | Replace the entire entry body with the marker content. Standalone (not preceded by `SEARCH`). |
1510
- | `DELETE` | Remove a literal-matching region from the existing entry body. The marker content is the region to remove. |
1511
- | `SEARCH` | Match a literal region in the existing entry body. Must be immediately followed by a `REPLACE` block; the pair is an in-place edit. |
1512
-
1513
- ### SEARCH / REPLACE Pairs
1514
-
1515
- Surgical in-place edits. `SEARCH` must be immediately followed by
1516
- `REPLACE` (no intervening operation):
1517
-
1518
- <set path="src/main.go"><<SEARCH
1519
- old line
1520
- SEARCH
1521
- <<REPLACE
1522
- new line
1523
- REPLACE</set>
1524
-
1525
- Multiple pairs in one `<set>` body apply in order against the
1526
- progressively-edited body.
1527
-
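As a sketch of "apply in order against the progressively-edited body": the helper `applyPairs` below is hypothetical and uses exact substring matching, whereas the real matching is delegated to the Hedberg library. What happens to earlier pairs when a later one conflicts is not specified here; the sketch simply stops and reports the conflict.

```js
// Hypothetical exact-match version of sequential SEARCH / REPLACE application.
function applyPairs(body, pairs) {
  let current = body;
  for (const { search, replace } of pairs) {
    const at = current.indexOf(search);
    // SEARCH content not found: soft conflict.
    if (at === -1) return { ok: false, conflict: search };
    // Each pair sees the result of the pairs before it.
    current = current.slice(0, at) + replace + current.slice(at + search.length);
  }
  return { ok: true, body: current };
}
```

The second pair can match text that only exists because the first pair ran, which is what "progressively-edited" means in practice.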
1528
- ### Suffix for Body Collisions
1529
-
1530
- When the body content literally contains a marker keyword (`SEARCH`
1531
- in prose, `<<` in code), the model appends a digit or alphanumeric
1532
- suffix to the IDENT so the inner literal does not prematurely close
1533
- the outer marker:
1534
-
1535
- <set path="docs/grammar.md"><<DOC1
1536
- The opener is <<SEARCH and the closer is bare SEARCH alone on
1537
- a line. Use <<SEARCH1 ... SEARCH1 if your body contains literal
1538
- SEARCH or <<SEARCH tokens.
1539
- DOC1</set>
1540
-
1541
- ### Errors
1542
-
1543
- | Condition | Outcome |
1544
- |---|---|
1545
- | `SEARCH` content not found in current body | conflict (soft) |
1546
- | `DELETE` content not found in current body | conflict (soft) |
1547
- | Lone `SEARCH` (no following `REPLACE`) | parse error |
1548
- | Unclosed marker (opener with no matching `IDENT` closer) | parse error |
1549
- | Non-keyword `IDENT` (e.g. `<<EOF`, `<<DOC`) | routes to REPLACE — inner content becomes the new body |
1550
- | `<set>` body with no `<<IDENT` markers at all | full-body REPLACE (tolerated; not demonstrated to models) |
1551
-
1552
- ### Pattern Matching
1553
-
1554
- The literal-match semantics used by `SEARCH` and `DELETE` are
1555
- delegated to the Hedberg pattern library — see [hedberg](#hedberg).
1556
- Matching is fuzzy on whitespace and indentation; an exact-byte match
1557
- is not required.
1558
-
1559
- ---
1560
-
1561
- ## Hedberg Pattern Library {#hedberg}
1562
-
1563
- The pattern library exposed to every plugin through `core.hooks.hedberg`.
1564
- Used internally by the Edit Syntax (above) for `SEARCH` / `DELETE`
1565
- matching and by `<get>` / `<rm>` for path globs.
1566
-
1567
- | Function | Purpose |
1568
- |---|---|
1569
- | `match(pattern, string)` | Full-string match — paths, equality. |
1570
- | `search(pattern, string)` | Substring search — content filtering. |
1571
- | `replace(text, search, replace, options)` | Patch application; fuzzy on whitespace and indentation. |
1572
- | `generatePatch(path, oldBody, newBody)` | Unified-diff rendering for telemetry. |
1573
-
1574
- Pattern types: glob (picomatch-backed, with `**` cross-slash and
1575
- `!()` negation), regex (`/pattern/flags`), xpath, jsonpath. Detection
1576
- is by syntactic shape — see `src/lib/hedberg/patterns.js`.
1577
-
1578
- ---
1579
-
1580
- ## Response Healing {#response_healing}
1581
-
1582
- The server never throws on model output. "Model behavior" is never an
1583
- acceptable explanation. Recovery order:
1584
-
1585
- 1. Can we recover? Extract the data and continue.
1586
- 2. Can we warn? Log structured warnings.
1587
- 3. Did our structure cause this? Check formatting, prompts.
1588
-
1589
- Termination protocol:
1590
- - `<update status="200|204|422">` → run terminates
1591
- - `<update status="200">` + failed actions → overridden to continuation
1592
- (the claim of doneness is refuted by the failures)
1593
- - `<update status="102">` → run continues
1594
- - Multiple `<update>` → last one wins
1595
- - No `<update>` + action-only tools → healer infers terminal from body
1596
- - No `<update>` + plain text → healer infers terminal from body
1597
- - Repeated turn fingerprints (commands, attributes, or empty turns) →
1598
- cycle detection (`RUMMY_MIN_CYCLES`, `RUMMY_MAX_CYCLE_PERIOD`); after
1599
- detection, strikes accumulate up to `RUMMY_MAX_STRIKES` then close 499.
1600
- - Hard ceiling: `RUMMY_MAX_LOOP_TURNS` caps turns within a single loop,
1601
- regardless of any other guard. There is no per-run cap; a run may
1602
- comprise many loops.
1603
-
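The `<update>` rules in the termination protocol can be condensed into one decision function. This is a sketch only: `resolveTurn` is a hypothetical name, cycle detection and the loop-turn ceiling are left out, and deferring unlisted statuses to the healer is an assumption.

```js
// Hypothetical reduction of the <update> rules listed above.
// `updates` is the ordered list of <update> statuses emitted in one turn.
function resolveTurn(updates, failedActions = 0) {
  // No <update>: the healer infers the outcome from the body.
  if (updates.length === 0) return { decision: "healer" };

  // Multiple <update>: the last one wins.
  const status = updates[updates.length - 1];

  if (status === 102) return { decision: "continue" };
  // A claim of doneness is refuted by failed actions.
  if (status === 200 && failedActions > 0) {
    return { decision: "continue", overridden: true };
  }
  if ([200, 204, 422].includes(status)) return { decision: "terminate", status };
  return { decision: "healer" };
}
```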
1604
- Format normalization:
1605
- - Gemma `\`\`\`tool_code` fences → stripped before parsing
1606
- - Qwen `<|tool_call>` format → normalized to XML
1607
- - OpenAI function_call JSON → normalized to XML
1608
- - Mistral `[TOOL_CALLS]` → normalized to XML
1609
- - Sed alternate delimiters (`s|old|new|`) → parsed like `s/old/new/`
1610
-
1611
- ### XML Parser {#xml_parser}
1612
-
1613
- `src/agent/XmlParser.js` is the syntax layer between raw model output
1614
- and the dispatch pipeline. Models routinely emit malformed XML —
1615
- unclosed tags, missing slashes, mismatched closes, unterminated
1616
- attribute values, embedded code-fences, training-format tool calls.
1617
- The parser's contract is: never throw, never silently drop a tool
1618
- call, surface every recovery as a warning so error.log can route it.
1619
-
1620
- **Pre-flight repair pipeline** (order is load-bearing):
1621
-
1622
- 1. `#normalizeToolCalls` — translate native training formats (gemma
1623
- `\`\`\`tool_code\n<xml>\n\`\`\``, Qwen `<|tool_call>call:NAME{...}`,
1624
- OpenAI `{"name":"...","arguments":{...}}`, Anthropic
1625
- `<tool_use><name>...</name><input>{...}</input></tool_use>`,
1626
- Mistral `[TOOL_CALLS] [{...}]`, harmony role/channel pseudo-tags
1627
- `<|channel>` / `<channel|>`). Catch-all malformed `<|tool_call>`
1628
- tokens become `<error>` blocks (in prose — never with literal
1629
- `<get>`/`<set>`/etc. tags, which would re-enter the parser as
1630
- phantom tool calls).
1631
- 2. `#neutralizeCodeSpans` — entity-encode tag brackets inside
1632
- backtick spans (`` `<get/>` `` → `` `&lt;get/&gt;` ``). Models
1633
- quote instructions; quoted tool names must not parse.
1634
- 3. `#correctMismatchedCloses` — at outer tool depth (stack=1),
1635
- rewrite `</WRONG>` to `</RIGHT>`. htmlparser2 silently drops
1636
- unmatched closes, which would make the explicit recovery path
1637
- unreachable and absorb every sibling command as body text.
1638
- Conservative: only outermost depth; nested mismatches inside
1639
- tool bodies are left alone (bodies are opaque, see below).
1640
- 4. `#balanceAttrQuotes` — close `ATTR="..."` values that never
1641
- quote-close before the next tag. Without this repair,
1642
- htmlparser2 consumes the rest of input as one giant attribute
1643
- value and silently drops every subsequent tool call. Triggers
1644
- only when the value contains no quote, no `>`, and is followed
1645
- by another tag opening or close.
1646
-
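Step 2 of the pipeline is the easiest to illustrate in isolation. The sketch below is an assumption-laden stand-in for `#neutralizeCodeSpans`: it handles only single-backtick spans on one line and ignores fenced blocks, which the real parser may treat differently.

```js
// Hypothetical stand-in: entity-encode tag brackets inside `inline code` spans
// so quoted tool names do not parse as tool calls.
function neutralizeCodeSpans(text) {
  return text.replace(/`[^`\n]*`/g, (span) =>
    span.replace(/</g, "&lt;").replace(/>/g, "&gt;"),
  );
}
```

Tags outside backticks are left untouched, so a real `<get/>` next to a quoted one still reaches the parser.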
1647
- **Body opacity.** Tool bodies are opaque text, not nested XML. The
1648
- model writing a plan with `<get/>` examples in it, SEARCH/REPLACE
1649
- markers in `<set>`, or XML samples in `<known>` all need to survive
1650
- intact. Nested tag opens push onto a per-tool stack; matching closes
1651
- pop. Orphan closes that don't match the stack but match a known tool
1652
- name are treated as recovery (likely typo); unknown orphan closes
1653
- are kept as body text.
1654
-
1655
- **Empty-body recovery.** A new tool tag opens while the current tool
1656
- has no body content yet — the model meant the current tag to
1657
- self-close but typed it paired, or emitted a mismatched close that
1658
- htmlparser2 dropped. Close current, open new, emit recovery warning.
1659
-
1660
- **Per-tool attr-vs-body resolution** (`resolveCommand`). Tools accept
1661
- attributes on the open tag *and* body text inside the tag. If the
1662
- canonical attribute is missing, the body silently fills it. The
1663
- shape per tool:
1664
- - `set` — body parsed via `parseMarkerBody` (see "Edit Syntax"
1665
- above): `<<IDENT ... IDENT` markers route to `operations`
1666
- list; bodies without markers are plain-body REPLACE.
1667
- - `update` — body fills `body`, status defaults to 102 if absent.
1668
- - `get` / `rm` — attr `path` or body fills target. Spread `a` so
1669
- `line` / `limit` / `visibility` / future attrs reach the handler.
1670
- - `search` — attr `path` or body fills target; `results` numeric.
1671
- - `mv` / `cp` — attr `path` (source); attr `to` or body fills dest.
1672
- Spread `a` so `visibility` reaches the handler for batch
1673
- visibility changes.
1674
- - `sh` / `env` — attr `command` or body fills the command.
1675
- - `ask_user` — attr `question`; attr `options` or body for options.
1676
-
1677
- **Tool-call cap.** `RUMMY_MAX_COMMANDS` caps the number of tool
1678
- calls per turn. When hit, remaining commands drop with a warning;
1679
- the model sees one structured error so it can adjust on the next
1680
- turn rather than rediscovering silent truncation.
1681
-
1682
- ---
1683
-
1684
- ## Testing
1685
-
1686
- | Tier | Location | LLM? |
1687
- |------|----------|------|
1688
- | Unit | `src/**/*.test.js` | No |
1689
- | Integration | `test/integration/` | No |
1690
- | Live | `test/live/` | Yes |
1691
- | E2E | `test/e2e/` | Yes |
1692
-
1693
- E2E tests must NEVER mock the LLM. Environment cascade:
1694
- `.env.example` → `.env` → `.env.test`. Always use `npm run test:*`.
1695
-
1696
- ### Spec-Anchored Testing
1697
-
1698
- Integration and e2e tests MUST be anchored to SPEC.md's snake_case
1699
- anchor system. The rule is bidirectional:
1700
-
1701
- 1. **Every SPEC.md heading with a `{#snake_case_id}` anchor has at
1702
- least one integration or e2e test that references it.** The
1703
- reference is literal: an `@snake_case_id` token appearing in the
1704
- test file (suite name, test name, or comment). A heading without
1705
- a test reference is a spec with no verified guarantee.
1706
- 2. **Every integration or e2e test is attributed to at least one
1707
- `@`-reference.** A test describing behavior that isn't in SPEC
1708
- either adds the behavior to SPEC or isn't under the integration
1709
- / e2e tiers.
1710
-
1711
- Enforcement: `npm run test:spec` parses SPEC.md's `{#id}` anchors
1712
- and greps `test/integration/` + `test/e2e/` for `@id` references.
1713
- Missing references fail the script. The check runs in CI and blocks
1714
- merges.
1715
-
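The first direction of the rule (every anchored heading has a test reference) can be sketched as two small functions. These are hypothetical helpers, not the `test:spec` script: they operate on strings already read from disk and do not check the reverse direction.

```js
// Collect {#snake_case_id} anchors from Markdown headings.
function extractAnchors(markdown) {
  const anchors = [];
  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s.*\{#([a-z0-9_]+)\}\s*$/.exec(line);
    if (match) anchors.push(match[1]);
  }
  return anchors;
}

// Return anchors with no literal @id token in any test source.
function findUncovered(anchors, testSources) {
  return anchors.filter((id) => {
    // The lookahead stops @resolution_extra from counting as @resolution.
    const reference = new RegExp(`@${id}(?![a-z0-9_])`);
    return !testSources.some((source) => reference.test(source));
  });
}
```

A non-empty result from `findUncovered` corresponds to the "missing references fail the script" case.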
1716
- Unit tests (`src/**/*.test.js`) are exempt — they verify
1717
- implementation details, not spec guarantees.
1718
-
1719
- **Why snake_case, not numeric `§X.Y`:** slugs are stable identifiers
1720
- independent of section ordering. Numbering required a rewrite of
1721
- every test reference whenever SPEC.md reorganized. Slugs never
1722
- churn — rename a section's text, leave the anchor, no tests break.
1723
-
1724
- **Anchor naming rules:**
1725
- - Lowercase `[a-z0-9_]`, underscores for word separation.
1726
- - Unique across the whole document.
1727
- - Stable once published: treat as a permanent identifier; renames
1728
- are a breaking change requiring a test sweep.
1729
- - Short and semantic (`entries`, not `section_0_1_the_entry_contract`).
1730
-
1731
- **When a section doesn't get an anchor:** umbrella sections (parents
1732
- of testable subsections, like "The Contract" or "RPC Protocol") and
1733
- pure-documentation sections (env var listings, debugging procedures,
1734
- this section itself) stay as plain headings. The anchor *implies
1735
- testability* — if there's nothing observable to verify, adding an
1736
- anchor creates a permanent false obligation.
1737
-
1738
- **PLUGINS.md and `src/plugins/*/README.md`** participate in the
1739
- same coverage gate as SPEC.md. `npm run test:spec` scans all three
1740
- sources for `{#snake_case_id}` anchors and requires each one to
1741
- have an integration or e2e test that references it. Anchors must
1742
- be unique across the whole doc set — the script errors on
1743
- collision. Conventional prefixes keep namespaces clean: SPEC uses
1744
- bare slugs (`entries`, `primitives`), PLUGINS uses `plugins_*`,
1745
- plugin READMEs use `<plugin>_plugin`.
1746
-
1747
- **Untestable plugin docs (LLM providers, quickstart tutorials,
1748
- loader-level behavior verified only in `test/live/`)** stay as
1749
- plain headings without anchors. Anchors are a commitment to
1750
- verification; skipping the anchor is the honest declaration that
1751
- no integration test exists or is feasible.
1752
-
1753
- ---
1754
-
1755
- ## SQL Functions {#sql_functions}
1756
-
1757
- | Function | Purpose |
1758
- |----------|---------|
1759
- | `schemeOf(path)` | Extract URI scheme |
1760
- | `countTokens(text)` | Token count (`ceil(len / RUMMY_TOKEN_DIVISOR)`) |
1761
- | `hedmatch(pattern, string)` | Full-string pattern match (paths, equality) |
1762
- | `hedsearch(pattern, string)` | Substring pattern search (content filtering) |
1763
- | `hedreplace(pattern, replacement, string)` | Pattern-based replacement |
1764
- | `slugify(text)` | URI-encoded slug, max 80 chars |
1765
-
1766
- See [PLUGINS.md](PLUGINS.md) for the hedberg pattern type reference.
1767
-
1768
- ---
1769
-
1770
- ## Debugging: E2E and Benchmark Results
1771
-
1772
- ### E2E test failures
1773
-
1774
- E2E tests use a temp DB at `/tmp/rummy_test_<timestamp>_<random>.db` (cleaned up after).
1775
- On failure, `AuditClient.assertRun` calls `dumpRun`, which prints a full turn-by-turn audit
1776
- to stdout. That output is in the background task log:
1777
-
1778
- ```
1779
- /tmp/claude-1000/-home-hyzen-repo-rummy-main/<session-id>/tasks/<task-id>.output
1780
- ```
1781
-
1782
- If oversized, the harness saves to:
1783
- ```
1784
- /home/hyzen/.claude/projects/-home-hyzen-repo-rummy-main/<session-id>/tool-results/<id>.txt
1785
- ```
1786
-
1787
- The dump format is: `scheme:state path {attributes}\n body (120 chars)` grouped by turn.
1788
-
1789
- Key things to look for in a dump:
1790
- - **202**: unresolved proposals — model issued `<sh>`, `<rm>`, or `<mv>` that needs approval
1791
- - **413**: budget overflow — assembled context exceeded ceiling (see [budget_enforcement](#budget_enforcement))
1792
- - **403**: policy rejection (ask-mode file writes) or permission denial (writer ∉ `writable_by`)
1793
- - **`error://` entries at status 413**: Turn Demotion fired — model received a directive to demote promotions next turn
1794
- - **`error://` entries at other statuses**: runtime errors (422 parser warnings, 429 cycle detection, 403 policy rejections, 500 dispatch crashes)
1795
- - **`<sh>` in ask mode**: the policy plugin rejected it; check for the corresponding `error://` entry
1796
-
1797
- ### MAB benchmark
1798
-
1799
- Results live in `test/mab/results/<ISO-timestamp>/mab.db`. Latest run = most recent dir.
1800
-
1801
- ```js
1802
- // Query a MAB result DB directly:
1803
- import { DatabaseSync } from 'node:sqlite';
1804
- const db = new DatabaseSync('test/mab/results/<timestamp>/mab.db');
1805
- db.prepare('SELECT * FROM questions').all(); // all questions + scores
1806
- db.prepare('SELECT * FROM runs').all(); // individual model runs
1807
- ```
1808
-
1809
- Run with: `npm run test:mab`
1810
-
1811
- ### LME benchmark
1812
-
1813
- Results live in `test/lme/results/<ISO-timestamp>/lme.db`. Same structure.
1814
-
1815
- Run with: `npm run test:lme`
1816
-
1817
- ---
1818
-
1819
- ## Configuration
1820
-
1821
- Full reference is `.env.example` — these are the load-bearing vars.
1822
-
1823
- **Runtime:**
1824
-
1825
- | Var | Default | Purpose |
1826
- |-----|---------|---------|
1827
- | `PORT` | 3044 | WebSocket port |
1828
- | `RUMMY_HOME` | `~/.rummy` | Local config root. Used by telemetry; available for future per-user state. |
1829
- | `RUMMY_DB_PATH` | `rummy.db` | SQLite path |
1830
- | `RUMMY_MMAP_MB` | 0 | SQLite mmap hint (MB; 0 disables) |
1831
- | `RUMMY_DEBUG` | false | Verbose logging |
1832
-
1833
- **Budget & token math:**
1834
-
1835
- | Var | Default | Purpose |
1836
- |-----|---------|---------|
1837
- | `RUMMY_BUDGET_CEILING` | 0.9 | Fraction of `contextSize` used as ceiling |
1838
- | `RUMMY_MAX_ENTRY_TOKENS` | 512 | `known://` write rejection threshold |
1839
- | `RUMMY_TOKEN_DIVISOR` | 2 | `ceil(chars/N)` token estimate divisor |
1840
-
1841
- **Loop controls:**
1842
-
1843
- | Var | Default | Purpose |
1844
- |-----|---------|---------|
1845
- | `RUMMY_MAX_LOOP_TURNS` | 99 | Per-loop turn cap (no per-run cap) |
1846
- | `RUMMY_MAX_COMMANDS` | 99 | Max parsed tool calls per turn |
1847
- | `RUMMY_MAX_STRIKES` | 3 | Strikes (errors or detected cycles) before close at 499 |
1848
- | `RUMMY_MIN_CYCLES` | 3 | Consecutive repetitions to trigger cycle detection |
1849
- | `RUMMY_MAX_CYCLE_PERIOD` | 4 | Max cycle period checked by healer |
1850
- | `RUMMY_RETENTION_DAYS` | 31 | Days of completed/aborted runs kept |
1851
- | `RUMMY_THINK` | 0 | Reasoning request flag forwarded to LLM provider |
1852
- | `RUMMY_TEMPERATURE` | 0.1 | Default LLM temperature |
1853
- | `RUMMY_RPC_TIMEOUT` | 30000 | RPC timeout (ms) |
1854
- | `RUMMY_FETCH_TIMEOUT` | 300000 | LLM HTTP timeout (ms) |
1855
- | `RUMMY_LLM_DEADLINE` | 600000 | LLM transient-retry deadline (ms). Used as the budget for `warmup` and `rate_limit` categories in `src/llm/retry.js#retryClassified`; gateway/server categories have shorter hardcoded deadlines (30s / 60s). |
1856
- | `RUMMY_LLM_MAX_BACKOFF` | 30000 | Max single backoff between retry attempts (ms) for warmup/rate_limit categories. |
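
To make `RUMMY_MIN_CYCLES` and `RUMMY_MAX_CYCLE_PERIOD` concrete, here is one way a repeating-tail check could work. It is a generic sketch under stated assumptions, not the healer's actual algorithm: `detectCycle` and its `history` argument (an array of comparable strings, one per turn) are hypothetical names, and the defaults mirror the table values 3 and 4.

```javascript
// Hypothetical sketch: return the period of a repeating tail, or 0 if none.
// A cycle of period p is reported when the last p items repeat minCycles times in a row.
function detectCycle(history, minCycles = 3, maxPeriod = 4) {
  for (let period = 1; period <= maxPeriod; period++) {
    const needed = period * minCycles;
    if (history.length < needed) break; // longer periods need even more history
    const tail = history.slice(-needed);
    let repeats = true;
    for (let i = period; i < needed; i++) {
      // Each item must equal the item one period earlier.
      if (tail[i] !== tail[i - period]) {
        repeats = false;
        break;
      }
    }
    if (repeats) return period;
  }
  return 0;
}

console.log(detectCycle(["a", "b", "a", "b", "a", "b"])); // period-2 pattern repeated 3 times
```

Checking shorter periods first reports the tightest repetition, and capping the period bounds the comparison work per turn.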
1857
-
1858
- **LLM providers** (plugin-scoped; a provider with no config is inert):
1859
-
1860
- | Var | Purpose |
1861
- |-----|---------|
1862
- | `OPENROUTER_BASE_URL` / `OPENROUTER_API_KEY` | OpenRouter |
1863
- | `OPENAI_BASE_URL` / `OPENAI_API_KEY` | OpenAI-compatible (llama.cpp, OpenAI, etc.) |
1864
- | `OLLAMA_BASE_URL` | Ollama |
1865
- | `XAI_BASE_URL` / `XAI_API_KEY` | xAI |
1866
- | `RUMMY_HTTP_REFERER` / `RUMMY_X_TITLE` | OpenRouter attribution headers |
1867
-
1868
- **Model aliases:**
1869
-
1870
- `RUMMY_MODEL_{alias}={provider/model}` or `{provider/publisher/model}` —
1871
- seeded into `models` table at startup. First path segment picks the
1872
- provider plugin; the rest is the provider's own model identifier. E.g.
1873
- `RUMMY_MODEL_gpt4=openai/gpt-4`, `RUMMY_MODEL_claude=openrouter/anthropic/claude-3-opus`.
1874
- Optional companion: `RUMMY_CONTEXT_{alias}={tokens}` overrides the
1875
- auto-discovered context length.
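
The alias format described above can be illustrated with a small parser. This is a sketch under assumptions, not the package's startup code: `parseModelAliases` and the shape of the returned objects are hypothetical, and the function takes a plain object rather than reading `process.env` directly so it can be exercised in isolation.

```javascript
// Hypothetical sketch: turn RUMMY_MODEL_{alias} variables into alias records.
function parseModelAliases(env) {
  const prefix = "RUMMY_MODEL_";
  const aliases = {};
  for (const [key, value] of Object.entries(env)) {
    if (!key.startsWith(prefix) || !value) continue;
    const alias = key.slice(prefix.length);
    const slash = value.indexOf("/");
    if (slash <= 0) continue; // no provider segment, skip
    const contextRaw = env[`RUMMY_CONTEXT_${alias}`];
    aliases[alias] = {
      provider: value.slice(0, slash), // first path segment picks the provider plugin
      model: value.slice(slash + 1),   // the rest is the provider's own model identifier
      contextLength: contextRaw ? Number(contextRaw) : null, // optional override
    };
  }
  return aliases;
}

const parsed = parseModelAliases({
  RUMMY_MODEL_gpt4: "openai/gpt-4",
  RUMMY_MODEL_claude: "openrouter/anthropic/claude-3-opus",
  RUMMY_CONTEXT_claude: "200000",
});
console.log(parsed.claude.provider, parsed.claude.model);
```

Splitting only on the first slash is what lets `{provider/publisher/model}` values keep the publisher inside the model identifier.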
1876
-
1877
- **External plugins:**
1878
-
1879
- `RUMMY_PLUGIN_{name}={path or npm package}` loads an external plugin
1880
- at startup. Absolute path or published package name (resolved via
1881
- local `node_modules` then global).
1882
-
1883
- **Search:**
1884
-
1885
- | Var | Purpose |
1886
- |-----|---------|
1887
- | `RUMMY_WEB_SEARXNG_URL` | SearXNG instance URL (SearXNG federates Brave / DuckDuckGo / Wikipedia / etc. upstream and normalizes the responses) |
1888
- | `RUMMY_WEB_FETCH_TIMEOUT` | Playwright `page.goto` timeout (ms) |
1889
- | `RUMMY_WEB_PLAYWRIGHT_WS` | Optional CDP endpoint for shared chromium |
1890
- | `RUMMY_WEB_NO_SANDBOX` | `1` to drop chromium's user-namespace sandbox |
1891
- | `RUMMY_WEB_CHROMIUM_HEAP_MB` | Cap chromium's V8 heap (MB) |
1892
-
1893
- **Testing:**
1894
-
1895
- | Var | Purpose |
1896
- |-----|---------|
1897
- | `RUMMY_TEST_MODEL` | Model alias used by test/live/e2e runners |