instar 0.28.77 → 0.28.78

Files changed (46)
  1. package/dashboard/index.html +184 -4
  2. package/dist/commands/server.d.ts.map +1 -1
  3. package/dist/commands/server.js +46 -2
  4. package/dist/commands/server.js.map +1 -1
  5. package/dist/monitoring/TokenLedger.d.ts +39 -0
  6. package/dist/monitoring/TokenLedger.d.ts.map +1 -1
  7. package/dist/monitoring/TokenLedger.js +110 -13
  8. package/dist/monitoring/TokenLedger.js.map +1 -1
  9. package/dist/monitoring/TokenLedgerPoller.d.ts.map +1 -1
  10. package/dist/monitoring/TokenLedgerPoller.js +8 -8
  11. package/dist/monitoring/TokenLedgerPoller.js.map +1 -1
  12. package/dist/server/AgentServer.d.ts +4 -0
  13. package/dist/server/AgentServer.d.ts.map +1 -1
  14. package/dist/server/AgentServer.js +14 -1
  15. package/dist/server/AgentServer.js.map +1 -1
  16. package/dist/server/routes.d.ts +8 -1
  17. package/dist/server/routes.d.ts.map +1 -1
  18. package/dist/server/routes.js +98 -0
  19. package/dist/server/routes.js.map +1 -1
  20. package/dist/threadline/BackfillCore.d.ts +70 -0
  21. package/dist/threadline/BackfillCore.d.ts.map +1 -0
  22. package/dist/threadline/BackfillCore.js +117 -0
  23. package/dist/threadline/BackfillCore.js.map +1 -0
  24. package/dist/threadline/ListenerSessionManager.d.ts +35 -0
  25. package/dist/threadline/ListenerSessionManager.d.ts.map +1 -1
  26. package/dist/threadline/ListenerSessionManager.js +41 -0
  27. package/dist/threadline/ListenerSessionManager.js.map +1 -1
  28. package/dist/threadline/TelegramBridge.d.ts +140 -0
  29. package/dist/threadline/TelegramBridge.d.ts.map +1 -0
  30. package/dist/threadline/TelegramBridge.js +224 -0
  31. package/dist/threadline/TelegramBridge.js.map +1 -0
  32. package/dist/threadline/ThreadlineMCPServer.d.ts.map +1 -1
  33. package/dist/threadline/ThreadlineMCPServer.js +5 -0
  34. package/dist/threadline/ThreadlineMCPServer.js.map +1 -1
  35. package/dist/threadline/ThreadlineObservability.d.ts +95 -0
  36. package/dist/threadline/ThreadlineObservability.d.ts.map +1 -0
  37. package/dist/threadline/ThreadlineObservability.js +310 -0
  38. package/dist/threadline/ThreadlineObservability.js.map +1 -0
  39. package/package.json +1 -1
  40. package/scripts/threadline-bridge-backfill.mjs +379 -0
  41. package/src/data/builtin-manifest.json +47 -47
  42. package/upgrades/0.28.78.md +90 -0
  43. package/upgrades/side-effects/threadline-bridge-backfill.md +203 -0
  44. package/upgrades/side-effects/threadline-observability-tab.md +206 -0
  45. package/upgrades/side-effects/threadline-tg-bridge-module.md +196 -0
  46. package/upgrades/side-effects/token-ledger-bounded-scan.md +230 -0
@@ -0,0 +1,230 @@
---
title: Token Ledger — bounded first-boot scan
slug: token-ledger-bounded-scan
date: 2026-05-01
author: echo
second_pass_required: false
---

## Summary of the change

The token ledger (shipped in v0.28.77 as Phase 1 read-only observability)
synchronously walks `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` on
every poll tick and, on first boot, ingests every file it finds. On Echo's
host that meant **119,130 JSONL files / 12 GB of transcripts**, and the first
scan blocked the Node event loop for minutes. During the scan the HTTP server
still accepted TCP connections but never returned a response, not even for
`/health`, so the lifeline supervisor declared the agent dead and restarted
it in a loop.

This change bounds the scan in three ways:

1. **Per-tick file cap (default 500)** with a persistent cursor across
   ticks, so the ledger backfills incrementally instead of in one pass.
2. **Async yielding (default every 25 files)** via `setImmediate`, so the
   event loop can drain HTTP and health-check traffic even within a tick.
3. **Optional max file age (default 30 days at the wiring layer)** so the
   ledger ignores transcripts older than the backfill window. The source
   JSONLs are unchanged and remain the ground truth; the operator can
   widen the window later by passing a larger `maxFileAgeMs`.
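
A minimal sketch of how the per-tick cap, the persistent cursor, and the age cutoff could combine in one tick over a pre-listed tree of project directories. Only the option names `maxFilesPerScan` and `maxFileAgeMs` come from this review; `scanTick`, `Cursor`, and `FileEntry` are hypothetical. The sketch assumes the cap counts every file visited, stale or not (the review does not say which files count), and leaves yielding out so the loop stays synchronous. It also resets the cursor to `{0, 0}` when the directory list shrinks, as the Races bullet in section 5 describes.

```typescript
// Hypothetical sketch: one bounded scan tick over a pre-listed tree.
// Only the option names come from the review; the rest is illustrative.
interface FileEntry {
  path: string;
  mtimeMs: number;
}

interface Cursor {
  dirIdx: number; // index into the list of project directories
  fileIdx: number; // index into that directory's files
}

interface BoundOptions {
  maxFilesPerScan: number; // per-tick cap (review default: 500)
  maxFileAgeMs?: number; // optional age cutoff (wiring default: 30 days)
}

const DAY_MS = 24 * 60 * 60 * 1000;

function scanTick(
  projectDirs: FileEntry[][],
  cursor: Cursor,
  opts: BoundOptions,
  nowMs: number,
): { ingested: string[]; next: Cursor } {
  // If directories disappeared since the last tick, restart from {0, 0}
  // instead of indexing past the end.
  let { dirIdx, fileIdx } =
    cursor.dirIdx < projectDirs.length ? cursor : { dirIdx: 0, fileIdx: 0 };
  const ingested: string[] = [];
  let visited = 0;

  while (dirIdx < projectDirs.length && visited < opts.maxFilesPerScan) {
    const files = projectDirs[dirIdx];
    if (fileIdx >= files.length) {
      dirIdx++; // this directory is exhausted; move to the next one
      fileIdx = 0;
      continue;
    }
    const f = files[fileIdx++];
    visited++; // assumption: stale files count toward the cap too
    const tooOld =
      opts.maxFileAgeMs !== undefined && nowMs - f.mtimeMs > opts.maxFileAgeMs;
    if (!tooOld) ingested.push(f.path);
  }

  // Step past exhausted directories so a finished pass is detected this tick.
  while (dirIdx < projectDirs.length && fileIdx >= projectDirs[dirIdx].length) {
    dirIdx++;
    fileIdx = 0;
  }
  // After a full pass, wrap around; the next pass relies on the cheap
  // per-file offset check to skip already-ingested data.
  const next =
    dirIdx >= projectDirs.length ? { dirIdx: 0, fileIdx: 0 } : { dirIdx, fileIdx };
  return { ingested, next };
}
```

For example, with `maxFilesPerScan: 2` and a 30-day window, two directories of two files each (one file 90 days stale) are covered in two ticks: the first ingests one file and skips the stale one, and the second ingests the remaining two and wraps the cursor back to `{0, 0}`.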

A new `scanAllAsync()` method wraps the existing scan loop and is now the
path the poller uses. The original synchronous `scanAll()` is preserved for
callers and tests that don't need yielding, and it now honors the per-tick
cap and age cutoff too.
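
A sync and an async entry point can share one loop in several ways. One option (an assumption, not necessarily how `scanInternal` is written) is to make the loop a generator that pauses every `yieldEveryNFiles` files: `scanAll` drives it straight through, while `scanAllAsync` awaits `setImmediate` at each pause so that other queued callbacks, such as `/health` handlers, get a turn. The signatures are simplified stand-ins (a file list plus a hypothetical `processFile` callback), not the real `TokenLedger` methods.

```typescript
// Hypothetical sketch: one generator-based loop, two entry points.
type ProcessFile = (path: string) => void;

// Processes every file, pausing (yield) after each batch of yieldEveryNFiles.
function* scanInternal(
  files: string[],
  processFile: ProcessFile,
  yieldEveryNFiles: number,
): Generator<void, number, void> {
  let done = 0;
  for (const f of files) {
    processFile(f);
    done++;
    if (done % yieldEveryNFiles === 0) yield; // safe point to give up the thread
  }
  return done;
}

// Sync entry point: runs straight through, ignoring the pause points.
function scanAll(files: string[], processFile: ProcessFile, n: number): number {
  const it = scanInternal(files, processFile, n);
  let step = it.next();
  while (!step.done) step = it.next();
  return step.value;
}

// Async entry point: at each pause point, hands control back to the event
// loop via setImmediate before processing the next batch.
async function scanAllAsync(
  files: string[],
  processFile: ProcessFile,
  n: number,
): Promise<number> {
  const it = scanInternal(files, processFile, n);
  let step = it.next();
  while (!step.done) {
    await new Promise<void>((resolve) => setImmediate(resolve));
    step = it.next();
  }
  return step.value;
}
```

With a batch size of 25, a callback queued before `scanAllAsync` starts runs right after the first 25 files rather than after the whole scan, which is what a `/health` handler needs. With `scanAll`, the same callback would wait for every file.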

Files touched:
- `src/monitoring/TokenLedger.ts`: added the `maxFileAgeMs`, `maxFilesPerScan`,
  and `yieldEveryNFiles` options; refactored `scanAll` into a shared
  `scanInternal` helper with sync (`scanAll`) and async (`scanAllAsync`)
  entry points; added a persistent `scanCursor` for cross-tick resume.
- `src/monitoring/TokenLedgerPoller.ts`: switched `tick()` to await
  `scanAllAsync()` (still fire-and-forget; reentry guard unchanged).
- `src/server/AgentServer.ts`: wires the three caps with defaults
  (30-day age window, 500 files/tick, yield every 25 files).
- `tests/unit/token-ledger.test.ts`: 3 new tests for cursor resume,
  age cutoff, and async yielding behavior.
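
The wiring defaults and the poller's reentry guard can be sketched as follows. `TokenLedgerOptions`, `withBoundedDefaults`, and `PollerSketch` are hypothetical names; the option names and default values come from this review, and the `scan` callback stands in for `scanAllAsync()`. The review does not describe how the real poller handles a failed scan, so the error swallowing below is a sketch choice.

```typescript
// Hypothetical sketch of the AgentServer wiring defaults and the poller guard.
interface TokenLedgerOptions {
  dbPath: string;
  claudeProjectsDir: string;
  maxFileAgeMs?: number;
  maxFilesPerScan?: number;
  yieldEveryNFiles?: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Fills in the defaults described for the wiring layer; explicit values win.
function withBoundedDefaults(opts: TokenLedgerOptions): Required<TokenLedgerOptions> {
  return {
    ...opts,
    maxFileAgeMs: opts.maxFileAgeMs ?? 30 * DAY_MS,
    maxFilesPerScan: opts.maxFilesPerScan ?? 500,
    yieldEveryNFiles: opts.yieldEveryNFiles ?? 25,
  };
}

class PollerSketch {
  private running = false;
  skipped = 0;
  private readonly scan: () => Promise<void>;

  constructor(scan: () => Promise<void>) {
    this.scan = scan; // stands in for () => ledger.scanAllAsync()
  }

  // Fire-and-forget: callers (e.g. an interval timer) do not await this.
  tick(): void {
    if (this.running) {
      this.skipped++; // previous scan still in flight: skip this tick
      return;
    }
    void this.runOnce();
  }

  private async runOnce(): Promise<void> {
    this.running = true; // set synchronously, before the first await
    try {
      await this.scan();
    } catch {
      // Sketch choice: observability only, so drop the error and retry next tick.
    } finally {
      this.running = false;
    }
  }
}
```

Widening the window for one agent then only means passing a larger value, for example `maxFileAgeMs: 90 * DAY_MS`; the other two caps keep their defaults.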

The change has no decision-point surface. The ledger is still pure
observability: it never gates, blocks, filters, or alters any agent behavior.
The caps do leave the data picture incomplete during early backfill. For
transcripts inside the age window, only the *speed of completeness* changes,
not whether the data eventually becomes complete; transcripts outside the
window stay out until the operator widens `maxFileAgeMs`.

## Decision-point inventory

The change has no block/allow/route surface. No dispatcher, sentinel, gate,
or watchdog is added or modified. The "orphans" view remains a signal-only
list (no kill authority), unchanged.

---

## 1. Over-block

No block/allow surface, so over-block is not applicable.

The closest analogue would be "the dashboard hides data that does exist on
disk." That is a property of the new caps: a session last touched 90 days ago
won't appear in `/tokens/summary` until the operator widens `maxFileAgeMs`.
This is visibility-shaping, not authority. No automation reads
`/tokens/summary` and acts on it.

---

## 2. Under-block

No block/allow surface, so under-block is not applicable.

---

## 3. Level-of-abstraction fit

The fix lives entirely inside the existing `src/monitoring/TokenLedger.ts`
file and its poller. It introduces no new framework, queue, or abstraction.
The caps are ordinary constructor options on the existing class, and the
cursor is a private instance field. The async variant uses `setImmediate`,
the standard Node primitive for yielding to the event loop and the same one
every other long-running scanner in this codebase uses (see
`OrphanProcessReaper`, `MemoryPressureMonitor`).

The wiring change in `AgentServer.ts` is co-located with the original
ledger initialization that landed in v0.28.77: same try/catch,
same null-on-failure behavior.

---

## 4. Signal vs authority compliance

**Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md)

- [x] No: this change has no block/allow surface.

The ledger remains pure read-side observability. The bounding logic does
not gain any authority; it only changes the cadence at which data
becomes visible. Future kill-orphan automation, budget enforcement, or
compaction triggers remain explicitly out of scope and would be separate
changes with their own review (per the principle, those would feed an
LLM-backed authority rather than become brittle blockers of their own).

---

## 5. Interactions

- **Shadowing:** None. No new route, no new file path, no new dispatcher.
  The ledger DB schema is unchanged (no migration needed).
- **Double-fire:** None. The poller's `running` reentry guard is unchanged
  and still skips a tick if the previous one is in flight. Cursor state
  is mutated only inside the (single-threaded) scan loop, and because
  reentry is blocked there is no cross-tick race.
- **Races:** Cursor invalidation is handled: if `projectDirs` shrinks
  between ticks (a project directory was deleted), the cursor is reset to
  `{0, 0}` rather than indexing past the end. The `INSERT OR IGNORE` on
  `request_id` keeps ingest idempotent regardless of cursor
  re-traversal.
- **Feedback loops:** None. The caps create no new path back into
  Claude Code or the agent's behavior. The ledger remains
  downstream of Claude Code's logging.
- **Cross-restart:** The cursor resets on process restart (it is an
  instance field, not persisted). This is correct: the ledger DB itself
  records how far each file has been read (the `file_offsets` table), so
  re-scanning previously ingested files after a restart is cheap (the
  offset check fires before any line parsing). The cursor exists only to
  bound *intra-process* work; per-file resume is already handled by the
  durable offset table from v0.28.77.
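
The Races and Cross-restart bullets rely on two durable properties: a per-file read offset (`file_offsets`) and `INSERT OR IGNORE` keyed on `request_id`. The in-memory stand-in below shows why re-traversing a file, after a cursor reset or a restart, costs at most an offset check and never double-counts. It assumes each JSONL line carries `requestId` and `tokens` fields; those field names and the `LedgerSketch` class are hypothetical, and a real implementation would keep byte offsets in SQLite rather than string positions in a `Map`.

```typescript
// Hypothetical in-memory stand-in for the ledger's two durable tables.
// fileOffsets ~ file_offsets (position already consumed per file);
// rows ~ a table with a unique request id (INSERT OR IGNORE semantics).
class LedgerSketch {
  fileOffsets = new Map<string, number>();
  rows = new Map<string, number>(); // request id -> tokens
  linesParsed = 0;

  // Ingests only the content appended since the recorded offset.
  ingest(path: string, content: string): void {
    const offset = this.fileOffsets.get(path) ?? 0;
    if (offset >= content.length) return; // offset check: nothing new, no parsing
    const fresh = content.slice(offset);
    // Consume complete lines only; a partially written last line waits.
    const end = fresh.lastIndexOf("\n") + 1;
    for (const line of fresh.slice(0, end).split("\n")) {
      if (!line) continue;
      this.linesParsed++;
      const rec = JSON.parse(line) as { requestId: string; tokens: number };
      // INSERT OR IGNORE: a duplicate request id is a no-op.
      if (!this.rows.has(rec.requestId)) this.rows.set(rec.requestId, rec.tokens);
    }
    this.fileOffsets.set(path, offset + end);
  }
}
```

Re-reading an unchanged file returns at the offset check, an appended turn parses only the new line, and even a full re-read from position zero leaves the row count unchanged.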

One subtle interaction worth naming: the `maxFileAgeMs` filter uses
`fs.statSync(fp).mtimeMs`. When Claude Code appends a turn to an existing
session's JSONL, the mtime updates and the file moves back into the window,
so active sessions are never blackholed by the age cap. Only sessions that
stay dormant past the cap drop out of the rotation. The stale-mtime side is
covered by a test: `respects maxFileAgeMs` backdates a file with
`fs.utimesSync` and confirms the filter excludes it.
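
A self-contained check of that append behavior: it backdates a file with `fs.utimesSync`, as the test does, and then appends a turn. `isInWindow` and `demo` are hypothetical helpers that mirror the described `fs.statSync(fp).mtimeMs` comparison; they are not the ledger's code.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const DAY_MS = 24 * 60 * 60 * 1000;

// Mirrors the described filter: a file is in-window if its mtime is recent.
function isInWindow(fp: string, maxFileAgeMs: number, nowMs = Date.now()): boolean {
  return nowMs - fs.statSync(fp).mtimeMs <= maxFileAgeMs;
}

// Backdates a transcript, then appends a turn to it.
function demo(): { stale: boolean; afterAppend: boolean } {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ledger-age-"));
  const fp = path.join(dir, "session.jsonl");
  try {
    fs.writeFileSync(fp, '{"turn":1}\n');
    // Backdate atime and mtime by 90 days (numbers are seconds since epoch).
    const old = (Date.now() - 90 * DAY_MS) / 1000;
    fs.utimesSync(fp, old, old);
    const stale = !isInWindow(fp, 30 * DAY_MS);
    // Appending a turn bumps mtime, pulling the file back into the window.
    fs.appendFileSync(fp, '{"turn":2}\n');
    const afterAppend = isInWindow(fp, 30 * DAY_MS);
    return { stale, afterAppend };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

With a 30-day window, the backdated file is excluded, and the same file is back in the window right after the append.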

---

## 6. External surfaces

- **Other agents on the same machine:** No effect on their behavior. Each
  gets the bounded-scan defaults when it upgrades.
- **Other users of the install base:** Purely additive option surface.
  Existing callers that pass only `dbPath` and `claudeProjectsDir` get the
  new defaults automatically. No existing API contract changed.
- **External systems:** None. No new outbound calls.
- **Persistent state:** No schema migration. The existing `file_offsets`
  table continues to drive per-file resume. The new cursor is in-memory
  only.
- **Timing/runtime:** The first-boot scan now spans many ticks instead of
  blocking the event loop. On Echo's box (119k files, mostly stale), the
  first useful tick with defaults reads ~500 files within the 30-day window,
  yielding every 25 files, and subsequent ticks resume from the cursor. Once
  backfill is done, steady state is identical to v0.28.77 (the same
  offset-check no-op for already-ingested files).

The reader remains **strictly read-only against `~/.claude/projects/`**.
It never opens a write file descriptor.

---

## 7. Rollback cost

Purely additive change. Rollback steps:

1. Revert the commit and ship it as the next patch release.
2. The ledger DB at `<stateDir>/server-data/token-ledger.db` is unchanged
   on disk (no schema migration), so the DB itself is fine. Reverting does,
   however, restore the unbounded scan, which is broken on agents with deep
   history. We would therefore want to either (a) deploy a different fix, or
   (b) ship a config option that makes the agent skip ledger initialization
   entirely on the affected hosts.
3. No agent state repair needed.

Estimated rollback time: minutes. Pure code revert.

If the bounded defaults turn out to be wrong (too aggressive), the operator
can override them per agent via the AgentServer construction call (or, once a
config knob exists, via `.instar/config.json`). Starting narrow loses
nothing: the older transcripts are still in `~/.claude/projects/`, so widening
the window later simply lets the ledger ingest them.

---

## Conclusion

This change is a containment fix for a v0.28.77 regression: the ledger
shipped without accounting for agents with years of accumulated Claude
Code history, and the unbounded synchronous first scan blocked the
server's event loop. The fix bounds work via three independent
mechanisms (per-tick file cap, intra-tick yielding, age cutoff) so that
no plausible JSONL tree can stall the agent. None of these mechanisms
introduces a decision-point surface or changes the ledger's read-only,
observability-only character.

The change is clear to ship.

---

## Second-pass review (if required)

Not required. The change does not touch any of the trigger criteria from
the side-effects-review skill (block/allow on messaging or dispatch,
session lifecycle, context exhaustion/compaction, coherence/idempotency/trust,
sentinel/guard/gate/watchdog).

---

## Evidence pointers

- Reproduction (before fix): start a v0.28.77 server on a host with deep
  Claude Code history. `curl http://localhost:4042/health` hangs;
  `sample <pid> 1` shows the main thread spending 100% of its time in
  `uv_fs_stat` callbacks under `Builtins_InterpreterEntryTrampoline`
  (i.e., a JS loop hammering the filesystem). The lifeline supervisor
  declares the server unhealthy and restarts it in a loop.
- Reproduction (after fix): same host, same `~/.claude/projects/` tree.
  `curl http://localhost:4042/health` returns within a few hundred ms
  immediately on boot. `curl /tokens/summary` returns valid JSON
  (initially covering a small subset of recent sessions; backfill fills in
  the rest over subsequent ticks).
- Unit tests: `tests/unit/token-ledger.test.ts` passes 15/15 locally
  on the `fix/token-ledger-bounded-scan` branch. The new tests cover the
  three bounding mechanisms (cursor resume, age cutoff, async yielding).
- Typecheck: `npx tsc --noEmit` clean.