instar 0.28.77 → 0.28.78
- package/dashboard/index.html +184 -4
- package/dist/commands/server.d.ts.map +1 -1
- package/dist/commands/server.js +46 -2
- package/dist/commands/server.js.map +1 -1
- package/dist/monitoring/TokenLedger.d.ts +39 -0
- package/dist/monitoring/TokenLedger.d.ts.map +1 -1
- package/dist/monitoring/TokenLedger.js +110 -13
- package/dist/monitoring/TokenLedger.js.map +1 -1
- package/dist/monitoring/TokenLedgerPoller.d.ts.map +1 -1
- package/dist/monitoring/TokenLedgerPoller.js +8 -8
- package/dist/monitoring/TokenLedgerPoller.js.map +1 -1
- package/dist/server/AgentServer.d.ts +4 -0
- package/dist/server/AgentServer.d.ts.map +1 -1
- package/dist/server/AgentServer.js +14 -1
- package/dist/server/AgentServer.js.map +1 -1
- package/dist/server/routes.d.ts +8 -1
- package/dist/server/routes.d.ts.map +1 -1
- package/dist/server/routes.js +98 -0
- package/dist/server/routes.js.map +1 -1
- package/dist/threadline/BackfillCore.d.ts +70 -0
- package/dist/threadline/BackfillCore.d.ts.map +1 -0
- package/dist/threadline/BackfillCore.js +117 -0
- package/dist/threadline/BackfillCore.js.map +1 -0
- package/dist/threadline/ListenerSessionManager.d.ts +35 -0
- package/dist/threadline/ListenerSessionManager.d.ts.map +1 -1
- package/dist/threadline/ListenerSessionManager.js +41 -0
- package/dist/threadline/ListenerSessionManager.js.map +1 -1
- package/dist/threadline/TelegramBridge.d.ts +140 -0
- package/dist/threadline/TelegramBridge.d.ts.map +1 -0
- package/dist/threadline/TelegramBridge.js +224 -0
- package/dist/threadline/TelegramBridge.js.map +1 -0
- package/dist/threadline/ThreadlineMCPServer.d.ts.map +1 -1
- package/dist/threadline/ThreadlineMCPServer.js +5 -0
- package/dist/threadline/ThreadlineMCPServer.js.map +1 -1
- package/dist/threadline/ThreadlineObservability.d.ts +95 -0
- package/dist/threadline/ThreadlineObservability.d.ts.map +1 -0
- package/dist/threadline/ThreadlineObservability.js +310 -0
- package/dist/threadline/ThreadlineObservability.js.map +1 -0
- package/package.json +1 -1
- package/scripts/threadline-bridge-backfill.mjs +379 -0
- package/src/data/builtin-manifest.json +47 -47
- package/upgrades/0.28.78.md +90 -0
- package/upgrades/side-effects/threadline-bridge-backfill.md +203 -0
- package/upgrades/side-effects/threadline-observability-tab.md +206 -0
- package/upgrades/side-effects/threadline-tg-bridge-module.md +196 -0
- package/upgrades/side-effects/token-ledger-bounded-scan.md +230 -0
@@ -0,0 +1,230 @@
---
title: Token Ledger — bounded first-boot scan
slug: token-ledger-bounded-scan
date: 2026-05-01
author: echo
second_pass_required: false
---

## Summary of the change

The token ledger (shipped in v0.28.77 as Phase 1 read-only observability) does
a synchronous walk of `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` on
every poll tick, and on first boot ingests every file it finds. On Echo's host
this turned out to mean **119,130 JSONL files / 12 GB of transcripts**, and the
first scan blocked the Node event loop for minutes. The HTTP server accepted
TCP connections during the scan but never returned a response, including for
`/health`, which made the lifeline supervisor declare the agent dead and
restart it in a loop.

This change makes the scan bounded in three ways:

1. **Per-tick file cap (default 500)** with an in-memory cursor that persists
   across ticks, so the ledger backfills incrementally instead of in one pass.
2. **Async yielding (default every 25 files)** via `setImmediate`, so even
   within a tick the event loop gets to drain HTTP/health traffic.
3. **Optional max file age (default 30 days at the wiring layer)** so the
   ledger ignores transcripts older than the backfill window. The source
   JSONLs are unchanged and remain the ground truth; the operator can
   widen the window later by passing a larger `maxFileAgeMs`.

A new `scanAllAsync()` method wraps the existing scan loop and is the path
the poller now uses. The original `scanAll()` sync method is preserved for
callers and tests that don't need yielding (and now honors the per-tick
cap and age cutoff too).
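
The per-tick cap, the cross-tick cursor, and the `setImmediate` yield can be sketched as one loop. This is a minimal illustration, not the shipped `TokenLedger` code: `BoundedScanner`, its flat file list, and its single-index cursor are hypothetical simplifications, and only the names `maxFilesPerScan`, `yieldEveryNFiles`, and `scanAllAsync` come from this document. The age cutoff is left out here.

```typescript
// Illustrative sketch only; not the shipped TokenLedger implementation.
interface ScanOptions {
  maxFilesPerScan: number;  // per-tick file cap
  yieldEveryNFiles: number; // how often to yield to the event loop
}

class BoundedScanner {
  private cursor = 0; // next file index; kept in memory across ticks
  private readonly files: string[];
  private readonly opts: ScanOptions;
  private readonly ingest: (file: string) => void;

  constructor(files: string[], opts: ScanOptions, ingest: (file: string) => void) {
    this.files = files;
    this.opts = opts;
    this.ingest = ingest;
  }

  /** Handles at most maxFilesPerScan files and resolves to the number handled. */
  async scanAllAsync(): Promise<number> {
    const batch = this.files.slice(this.cursor, this.cursor + this.opts.maxFilesPerScan);
    let handled = 0;
    for (const file of batch) {
      this.ingest(file);
      handled += 1;
      if (handled % this.opts.yieldEveryNFiles === 0) {
        // Yield so queued I/O callbacks (for example a /health request) can run.
        await new Promise<void>((resolve) => setImmediate(resolve));
      }
    }
    this.cursor += handled;
    if (this.cursor >= this.files.length) this.cursor = 0; // start a fresh pass next tick
    return handled;
  }
}
```

With 7 files and a cap of 3, successive calls handle 3, 3, and then 1 file before the cursor wraps back to the start.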

Files touched:
- `src/monitoring/TokenLedger.ts`: added `maxFileAgeMs`, `maxFilesPerScan`,
  `yieldEveryNFiles` options; refactored `scanAll` into a shared
  `scanInternal` helper plus sync (`scanAll`) and async (`scanAllAsync`)
  entry points; added an in-memory `scanCursor` for cross-tick resume.
- `src/monitoring/TokenLedgerPoller.ts`: switched `tick()` to await
  `scanAllAsync()` (still fire-and-forget; reentry guard unchanged).
- `src/server/AgentServer.ts`: wires the three caps with sensible defaults
  (30-day age window, 500 files/tick, yield every 25 files).
- `tests/unit/token-ledger.test.ts`: three new tests for cursor resume,
  age cutoff, and async yielding behavior.

The change has no decision-point surface. The ledger is still pure
observability: it never gates, blocks, filters, or alters any agent behavior.
Adding caps does mean the data picture is incomplete during early backfill,
but only the *speed of completeness* changes, not whether the data inside
the age window ever becomes complete.

## Decision-point inventory

The change has no block/allow/route surface. There is no dispatcher,
sentinel, gate, or watchdog being added or modified. The "orphans" view
remains a signal-only list (no kill authority), unchanged.

---

## 1. Over-block

No block/allow surface, so over-block is not applicable.

The closest analogue would be "the dashboard hides data that does exist on
disk." That's a property of the new caps: a 90-day-old session won't appear
in `/tokens/summary` until the operator widens `maxFileAgeMs`. This is
visibility-shaping, not authority. No automation reads
`/tokens/summary` and acts on it.

---

## 2. Under-block

No block/allow surface, so under-block is not applicable.

---

## 3. Level-of-abstraction fit

The fix lives entirely inside the existing `src/monitoring/TokenLedger.ts`
file and its poller. It does not introduce a new framework, queue, or
abstraction. The caps are normal constructor options on the same class
that already exists. The cursor is a private instance field. The async
variant uses `setImmediate`, the standard Node primitive for yielding
to the event loop and the one every other long-running scanner in this
codebase uses (see `OrphanProcessReaper`, `MemoryPressureMonitor`).

The wiring change in `AgentServer.ts` is co-located with the original
ledger initialization that landed in v0.28.77: same try/catch,
same null-on-failure behavior.
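
The wiring described above might look roughly like the following. This is a hedged sketch, not the contents of `AgentServer.ts`: `LedgerOptions`, `WIRING_DEFAULTS`, and `createLedgerOrNull` are hypothetical names, and the factory parameter stands in for the real ledger constructor. Only the option names and the default values (30 days, 500 files, every 25 files) come from this document.

```typescript
// Hypothetical option shape; only the field names are taken from the text.
interface LedgerOptions {
  dbPath: string;
  claudeProjectsDir: string;
  maxFileAgeMs?: number;
  maxFilesPerScan?: number;
  yieldEveryNFiles?: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Defaults applied at the wiring layer, as stated in the summary.
const WIRING_DEFAULTS = {
  maxFileAgeMs: 30 * DAY_MS,
  maxFilesPerScan: 500,
  yieldEveryNFiles: 25,
};

/** Builds the ledger with bounded-scan defaults, or returns null on failure. */
function createLedgerOrNull<T>(
  base: Pick<LedgerOptions, "dbPath" | "claudeProjectsDir">,
  factory: (opts: Required<LedgerOptions>) => T,
  overrides: Partial<typeof WIRING_DEFAULTS> = {},
): T | null {
  try {
    return factory({ ...base, ...WIRING_DEFAULTS, ...overrides });
  } catch {
    // Observability only: a ledger failure should not take the server down.
    return null;
  }
}
```

Passing `{ maxFileAgeMs: 90 * DAY_MS }` as `overrides` would widen the backfill window while keeping the other two defaults.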

---

## 4. Signal vs authority compliance

**Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md)

- [x] No: this change has no block/allow surface.

The ledger remains pure read-side observability. The bounding logic does
not gain any authority; it only changes the cadence at which data
becomes visible. Future kill-orphan automation, budget enforcement, or
compaction triggers remain explicitly out of scope and would be separate
changes with their own review (per the principle, those would feed an
LLM-backed authority, not become their own brittle blockers).

---

## 5. Interactions

- **Shadowing:** None. No new route, no new file path, no new dispatcher.
  The ledger DB schema is unchanged (no migration needed).
- **Double-fire:** None. The poller's `running` reentry guard is unchanged
  and still skips a tick if the previous one is in flight. Cursor state
  is mutated only inside the (single-threaded) scan loop; there is no
  cross-tick race because reentry is blocked.
- **Races:** Cursor invalidation is handled: if `projectDirs` shrinks
  between ticks (a project directory was deleted), the cursor is reset to
  `{0, 0}` rather than indexing past the end. The `INSERT OR IGNORE` on
  `request_id` continues to make ingest idempotent regardless of cursor
  re-traversal.
- **Feedback loops:** None. The caps don't create any new path back into
  Claude Code or the agent's behavior. The ledger continues to be
  downstream of Claude Code's logging.
- **Cross-restart:** The cursor resets on process restart (it's an
  instance field, not persisted). This is correct: after a restart, the
  ledger DB itself records which files have been read up to which offset
  (`file_offsets` table), so re-scanning previously-ingested files is
  cheap (the offset check fires before any line parsing). The cursor
  exists only to bound *intra-process* work; per-file resume is already
  handled by the durable offset table from v0.28.77.
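
The double-fire and race points above can be illustrated with a small sketch. The names `ScanCursor`, `validateCursor`, and `GuardedPoller` are hypothetical, and the exact reset condition in the real code is an assumption; the text only says the cursor goes back to `{0, 0}` when `projectDirs` shrinks, and that a `running` guard skips overlapping ticks.

```typescript
// Illustrative sketch; field names and the reset condition are assumptions.
interface ScanCursor {
  dirIndex: number;  // position in the project directory list
  fileIndex: number; // position within that directory
}

/** Returns a cursor that cannot index past the end of the directory list. */
function validateCursor(cursor: ScanCursor, projectDirCount: number): ScanCursor {
  if (cursor.dirIndex >= projectDirCount) return { dirIndex: 0, fileIndex: 0 };
  return cursor;
}

class GuardedPoller {
  private running = false;
  private readonly scan: () => Promise<void>;

  constructor(scan: () => Promise<void>) {
    this.scan = scan;
  }

  /** Resolves to false when the tick was skipped because a scan is in flight. */
  async tick(): Promise<boolean> {
    if (this.running) return false;
    this.running = true;
    try {
      await this.scan();
      return true;
    } finally {
      this.running = false; // always release the guard, even if the scan throws
    }
  }
}
```

Because the guard is set before the first `await`, a second `tick()` fired while a scan is pending returns `false` without touching the cursor.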

One subtle interaction worth naming: the `maxFileAgeMs` filter uses
`fs.statSync(fp).mtimeMs`. If a JSONL is *appended to* (Claude Code adds
a turn to an existing session), the mtime updates and the file becomes
in-window again, so active sessions never get blackholed by the age cap.
Only sessions that are truly dormant past the cap drop out of the rotation.
Verified by test: the `respects maxFileAgeMs` test backdates a file with
`fs.utimesSync` to confirm the filter triggers on stale mtime.
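
The mtime behavior can be reproduced with a scratch file. This is a sketch under assumptions: `isWithinAgeWindow` is a hypothetical helper (the real filter lives inside the ledger's scan), and the comparison `now - mtimeMs <= maxFileAgeMs` is a guess at the cutoff rule. The `fs.statSync` and `fs.utimesSync` calls are the ones named in the text.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/** True when the file was modified within maxFileAgeMs of `now`. */
function isWithinAgeWindow(filePath: string, maxFileAgeMs: number, now: number = Date.now()): boolean {
  const { mtimeMs } = fs.statSync(filePath);
  return now - mtimeMs <= maxFileAgeMs;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Create a scratch transcript and backdate its mtime by 90 days.
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-age-"));
const transcript = path.join(scratchDir, "session.jsonl");
fs.writeFileSync(transcript, "{}\n");
const ninetyDaysAgo = new Date(Date.now() - 90 * MS_PER_DAY);
fs.utimesSync(transcript, ninetyDaysAgo, ninetyDaysAgo);

// A dormant file falls outside a 30-day window.
const dormantInWindow = isWithinAgeWindow(transcript, 30 * MS_PER_DAY);

// Appending a turn refreshes the mtime, so the file is back in the window.
fs.appendFileSync(transcript, "{}\n");
const activeInWindow = isWithinAgeWindow(transcript, 30 * MS_PER_DAY);

fs.rmSync(scratchDir, { recursive: true, force: true });
```

Here `dormantInWindow` is `false` and `activeInWindow` is `true`, which mirrors the claim that only truly dormant sessions drop out.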

---

## 6. External surfaces

- **Other agents on the same machine:** No effect on their behavior. They
  each gain the bounded-scan defaults when they upgrade.
- **Other users of the install base:** Pure additive option surface. Old
  callers passing only `dbPath` and `claudeProjectsDir` get the new
  defaults automatically. No existing API contract changed.
- **External systems:** None. No new outbound calls.
- **Persistent state:** No schema migration. The existing `file_offsets`
  table continues to drive per-file resume. The new cursor is in-memory
  only.
- **Timing/runtime:** First-boot scan now spans many ticks instead of
  blocking the event loop. On Echo's box (119k files, mostly stale): with
  defaults, the first useful tick reads ~500 files within a 30-day window,
  yielding every 25 files; subsequent ticks pick up from the cursor. Steady
  state once backfill is done is identical to v0.28.77 (same offset-check
  no-op for already-ingested files).

The reader remains **strictly read-only against `~/.claude/projects/`**.
No write fds are ever opened.
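
The offset-check no-op and the read-only access can be sketched together. This is an assumption-laden illustration: `readNewLines` is a hypothetical helper, and the `Map` stands in for the durable `file_offsets` table, whose real schema is not shown in this document.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/**
 * Returns complete lines appended since the stored offset.
 * `offsets` is a stand-in for the durable file_offsets table.
 */
function readNewLines(filePath: string, offsets: Map<string, number>): string[] {
  const known = offsets.get(filePath) ?? 0;
  const size = fs.statSync(filePath).size;
  if (size <= known) return []; // nothing new: skip before any line parsing
  const fd = fs.openSync(filePath, "r"); // read-only descriptor
  try {
    const buf = Buffer.alloc(size - known);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, known);
    const text = buf.subarray(0, bytesRead).toString("utf8");
    const lastNewline = text.lastIndexOf("\n");
    if (lastNewline === -1) return []; // only a partial line so far
    const complete = text.slice(0, lastNewline + 1);
    offsets.set(filePath, known + Buffer.byteLength(complete, "utf8"));
    return complete.slice(0, -1).split("\n");
  } finally {
    fs.closeSync(fd);
  }
}

// Usage against a scratch file.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-offset-"));
const sessionFile = path.join(tmpDir, "session.jsonl");
const offsetTable = new Map<string, number>();
fs.writeFileSync(sessionFile, '{"turn":1}\n');
const firstPass = readNewLines(sessionFile, offsetTable);  // the one existing line
const secondPass = readNewLines(sessionFile, offsetTable); // empty: offset check short-circuits
fs.appendFileSync(sessionFile, '{"turn":2}\n');
const thirdPass = readNewLines(sessionFile, offsetTable);  // only the appended line
fs.rmSync(tmpDir, { recursive: true, force: true });
```

The second pass returns an empty array without opening the file, which is the cheap re-scan path that makes the cursor reset on restart harmless.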

---

## 7. Rollback cost

Pure additive change. Rollback steps:

1. Revert the commit. Ship as next patch release.
2. The ledger DB at `<stateDir>/server-data/token-ledger.db` is unchanged
   on disk (no schema migration). Reverting goes back to unbounded scan
   behavior, which is broken on agents with deep history, so we'd want
   to either (a) deploy a different fix, or (b) ship a config option that
   defaults the agent to NOT initialize the ledger at all on the affected
   hosts. But the DB itself is fine.
3. No agent state repair needed.

Estimated rollback time: minutes. Pure code revert.

If the bounded defaults turn out to be wrong (too aggressive), the operator
can override per-agent via the AgentServer construction call (or, if a
config knob is added later, via `.instar/config.json`). No re-deploy is needed
to widen the window; the data is still in `~/.claude/projects/`.

---

## Conclusion

This change is a containment fix for a v0.28.77 regression: the ledger
shipped without considering agents that have years of accumulated Claude
Code history, and the unbounded synchronous first scan blocked the
server's event loop. The fix bounds work via three independent
mechanisms (per-tick file cap, intra-tick yielding, age cutoff) so that
no plausible JSONL tree can stall the agent. None of these mechanisms
introduces decision-point surface or changes the ledger's read-only,
observability-only character.

The change is clear to ship.

---

## Second-pass review (if required)

Not required. The change does not touch any of the trigger criteria from
the side-effects-review skill (block/allow on messaging or dispatch,
session lifecycle, context exhaustion/compaction, coherence/idempotency/
trust, sentinel/guard/gate/watchdog).

---

## Evidence pointers

- Reproduction (before fix): start v0.28.77 server on a host with deep
  Claude Code history. `curl http://localhost:4042/health` hangs;
  `sample <pid> 1` shows the main thread spending 100% of its time in
  `uv_fs_stat` callbacks under `Builtins_InterpreterEntryTrampoline`
  (i.e., a JS loop hammering the filesystem). The lifeline supervisor
  declares the server unhealthy and restarts it in a loop.
- Reproduction (after fix): same host, same `~/.claude/projects/` tree.
  `curl http://localhost:4042/health` returns within a few hundred ms
  right after boot. `curl /tokens/summary` returns valid JSON
  (initially with a small subset of recent sessions; backfill fills in
  across subsequent ticks).
- Unit tests: `tests/unit/token-ledger.test.ts`, 15/15 passing locally
  on the `fix/token-ledger-bounded-scan` branch. New tests cover the three
  bounding mechanisms (cursor resume, age cutoff, async yielding).
- Typecheck: `npx tsc --noEmit` clean.