instar 1.2.62 → 1.2.63

@@ -0,0 +1,84 @@
1
+ # Upgrade Guide — rate-limit recovery now reaches non-topic-bound sessions
2
+
3
+ <!-- bump: patch -->
4
+ <!-- patch = bug fixes, refactors, test additions, doc updates -->
5
+
6
+ ## What Changed
7
+
8
+ **Fix: the rate-limit sentinel now actually recovers a session that isn't bound
9
+ to a Telegram topic.**
10
+
11
+ When Anthropic's server-side throttle hit ("Server is temporarily limiting
12
+ requests · not your usage limit"), the RateLimitSentinel detected it and ran its
13
+ backoff correctly — but both of its recovery actions started by asking "is this
14
+ session bound to a Telegram topic?" and silently did nothing if the answer was
15
+ no. A developer's interactive Claude Code window isn't bound to a topic, so:
16
+
17
+ - the "throttled, backing off, you're not dropped" notice went nowhere, and
18
+ - the resume nudge that wakes the session back up went nowhere.
19
+
20
+ From the outside this was indistinguishable from the sentinel not existing, which
21
+ is exactly what was observed: a throttle sat on screen for minutes with no recovery and
22
+ no signal. (v1.2.33 shipped past green tests because every test fixture was
23
+ topic-bound; the non-topic-bound path was never exercised.)
24
+
25
+ This release makes recovery reachable under **all** session conditions:
26
+
27
+ - **Resume** — topic-bound sessions get the topic-tagged nudge as before;
28
+ non-topic-bound sessions get a trusted in-process injection
29
+ (`SessionManager.injectInternalMessage`) that bypasses the topic-prefix
30
+ requirement. This path is in-process only — never exposed over HTTP.
31
+ - **Notify** — the user notice goes to the session's own topic, else falls back
32
+ to the always-available lifeline (system) topic, else is written as a loud
33
+ `recovery-unreachable` audit event. Never a silent drop.
34
+ - **Audit** — every recovery attempt records `recovery-reached` /
35
+ `recovery-unreachable` to `logs/sentinel-events.jsonl`, and unreachable events
36
+ also land in `.instar/sentinel-alerts.json` so the dashboard surfaces them even
37
+ when Telegram can't be reached.
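The Notify fallback order in the list above can be sketched roughly as follows. This is a minimal illustration, not the shipped factory: `NotifyDeps`, `getLifelineTopicId`, `sendToTopic`, the numeric topic id, and the `recordRecovery` signature are all assumptions made for the example. Only `getTopicForSession`, the `recovery-reached` / `recovery-unreachable` event names, and the fallback order come from the text.

```typescript
// Hypothetical shapes for illustration only; the real instar types may differ.
type NotifyOutcome =
  | { kind: "recovery-reached"; channel: "session-topic" | "lifeline"; topicId: number }
  | { kind: "recovery-unreachable"; reason: string };

interface NotifyDeps {
  getTopicForSession(sessionName: string): number | null | undefined;
  getLifelineTopicId(): number | null | undefined; // assumed accessor
  sendToTopic(topicId: number, text: string): void; // assumed sender
  recordRecovery(sessionName: string, outcome: NotifyOutcome): void; // assumed signature
}

// Session topic first, then the lifeline topic, then a recorded unreachable
// event. Every branch ends in an audit record, so nothing is dropped silently.
function notifyWithFallback(
  deps: NotifyDeps,
  sessionName: string,
  text: string,
): NotifyOutcome {
  const sessionTopic = deps.getTopicForSession(sessionName);
  const lifeline = sessionTopic == null ? deps.getLifelineTopicId() : null;

  let outcome: NotifyOutcome;
  if (sessionTopic != null) {
    deps.sendToTopic(sessionTopic, text);
    outcome = { kind: "recovery-reached", channel: "session-topic", topicId: sessionTopic };
  } else if (lifeline != null) {
    deps.sendToTopic(lifeline, text);
    outcome = { kind: "recovery-reached", channel: "lifeline", topicId: lifeline };
  } else {
    outcome = { kind: "recovery-unreachable", reason: "no session topic and no lifeline topic" };
  }
  deps.recordRecovery(sessionName, outcome);
  return outcome;
}
```

Each step is a plain null check, so the three outcomes can be exercised in unit tests with stubbed dependencies.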
38
+
39
+ The reachability branching was lifted out of the inline server closures into
40
+ `sentinelWiring.buildRateLimitRecoveryDeps()` so it is unit-testable — closing
41
+ the gap (inline + untestable logic) that let this ship past green tests.
42
+
43
+ Spec: `Sentinel Reachability + Worktree Isolation` (the worktree-clone and
44
+ socket/silence-default parts already shipped via #334/#340/#351; this is the
45
+ remaining rate-limit recovery piece).
46
+ Side-effects review: `upgrades/side-effects/rate-limit-recovery-reachability.md`.
47
+
48
+ ## What to Tell Your User
49
+
50
+ If I'm ever hit by one of Anthropic's brief "servers are busy" throttles while
51
+ you're talking to me in a plain window (not a Telegram topic), I'll now actually
52
+ tell you I'm throttled and backing off — and I'll wake myself back up and let you
53
+ know when it clears, instead of going silent until you poke me. Nothing changes
54
+ for the normal Telegram case; this just closes the gap where a throttle in a
55
+ direct dev window left you staring at a frozen screen.
56
+
57
+ ## Summary of New Capabilities
58
+
59
+ No new user-facing capabilities — this is a behavior fix to existing rate-limit
60
+ recovery. (Internal: `SessionManager.injectInternalMessage` for trusted
61
+ in-process recovery nudges; `buildRateLimitRecoveryDeps` reachability factory.)
62
+
63
+ ## Evidence
64
+
65
+ **Live reproduction (the bar that was missed last time).** Drove the *real*
66
+ RateLimitSentinel lifecycle through the *real* recovery factory (wired exactly as
67
+ `server.ts` wires it) against a *real* tmux pane that was **not** bound to any
68
+ topic — the exact failure condition. Result: the resume nudge landed in the pane
69
+ via the real internal-injection path, the "throttled, backing off" notice and the
70
+ "back online" check-in both reached the lifeline topic, and the audit log
71
+ recorded `recovery-reached` with zero `recovery-unreachable`. Before the fix all
72
+ of those were silent.
73
+
74
+ **Regression coverage (CI-permanent):**
75
+ - `tests/unit/rate-limit-recovery-reachability.test.ts` (9) — both sides of every
76
+ reachability boundary: topic / lifeline / internal-injection / unreachable.
77
+ - `tests/integration/rate-limit-recovery-sentinel-lifecycle.test.ts` (2) — the
78
+ real sentinel lifecycle (detect→backoff→resume→verify→recovered) driving the
79
+ factory to the lifeline for a non-topic-bound session, plus the
80
+ never-silent unreachable case.
81
+ - `tests/unit/rate-limit-recovery-wiring.test.ts` (6) — wiring integrity (server
82
+ wires the real primitives, not no-ops) + the InputGuard HTTP boundary.
83
+
84
+ `tsc` clean. The pre-existing rate-limit unit/integration/e2e suites stay green.
@@ -0,0 +1,116 @@
1
+ # Side-Effects Review — RateLimitSentinel recovery reachability
2
+
3
+ **Version / slug:** `rate-limit-recovery-reachability`
4
+ **Date:** `2026-05-24`
5
+ **Author:** `echo`
6
+ **Second-pass reviewer:** `internal-adversarial` (external /crossreview tooling not wired on this host)
7
+
8
+ ## Summary of the change
9
+
10
+ The RateLimitSentinel detects Anthropic's server-side throttle correctly and
11
+ schedules its backoff correctly, but its two recovery closures in `server.ts`
12
+ (`rateLimitResume`, `rateLimitNotify`) both began with
13
+ `const topicId = telegram?.getTopicForSession(sessionName); if (topicId == null) return`.
14
+ For a session not bound to any Telegram topic — e.g. a developer's interactive
15
+ Claude Code window — both paths silently no-opped. Detection + backoff ran, then
16
+ the resume nudge and the user notice were silently dropped. From the user's seat
17
+ this is indistinguishable from no sentinel existing (as in the v1.2.33 release, which
18
+ "recovered" in tests but never in Justin's real dev window).
19
+
20
+ This PR makes recovery reachable under **all** session conditions:
21
+
22
+ - **Resume** — topic-bound sessions get the topic-tagged nudge through the
23
+ provenance-checked `injectMessage` path (unchanged). Non-topic-bound sessions
24
+ get a new `SessionManager.injectInternalMessage` path that bypasses the
25
+ topic-prefix InputGuard requirement (trusted in-process caller, logged with
26
+ `source: 'sentinel-recovery'`).
27
+ - **Notify** — session topic → lifeline (system) topic → a loud
28
+ `recovery-unreachable` audit event. Never a silent return.
29
+ - **Audit** — every recovery attempt records `recovery-reached` /
30
+ `recovery-unreachable` to `logs/sentinel-events.jsonl`; an unreachable event
31
+ also appends to `.instar/sentinel-alerts.json` (rolling 200) so the dashboard
32
+ surfaces it even when Telegram is unavailable.
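A rough sketch of the Resume branch described above, under stated assumptions: the method signatures, the boolean return values, the `[topic:N]` tag format, and the `EXAMPLE_RESUME_NUDGE` text are hypothetical. The real `RATE_LIMIT_RESUME_NUDGE` value and the real tag format are not shown in this diff. What the text does confirm is the presence check on `getTopicForSession`, the two method names, and the `source: 'sentinel-recovery'` label.

```typescript
// Hypothetical interface; only the method names come from the review text.
interface ResumeDeps {
  getTopicForSession(sessionName: string): number | null | undefined;
  // Provenance-checked path, used when the session is bound to a topic.
  injectMessage(sessionName: string, text: string): boolean;
  // Trusted in-process path, used when there is no topic to tag the nudge with.
  injectInternalMessage(sessionName: string, text: string, opts: { source: string }): boolean;
}

// Placeholder text; NOT the real RATE_LIMIT_RESUME_NUDGE constant.
const EXAMPLE_RESUME_NUDGE = "The rate limit has cleared. Please continue.";

type ResumePath = "topic-injection" | "internal-injection";

function resumeSession(
  deps: ResumeDeps,
  sessionName: string,
): { path: ResumePath; delivered: boolean } {
  const topicId = deps.getTopicForSession(sessionName);
  if (topicId != null) {
    // Assumed tag format; the actual topic prefix is not given in the source.
    const tagged = `[topic:${topicId}] ${EXAMPLE_RESUME_NUDGE}`;
    return { path: "topic-injection", delivered: deps.injectMessage(sessionName, tagged) };
  }
  const delivered = deps.injectInternalMessage(sessionName, EXAMPLE_RESUME_NUDGE, {
    source: "sentinel-recovery",
  });
  return { path: "internal-injection", delivered };
}
```

The point of the shape is that the no-topic case selects a different delivery path instead of returning early.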
33
+
34
+ The reachability branching was extracted from the inline closures into
35
+ `sentinelWiring.buildRateLimitRecoveryDeps()` so it is unit-testable — the exact
36
+ gap that let the bug ship past green tests (the logic was inline + untestable).
37
+
38
+ **Files touched:**
39
+ - `src/core/SessionManager.ts` (+`injectInternalMessage`, internal-only, NOT HTTP-exposed).
40
+ - `src/monitoring/sentinelWiring.ts` (+`buildRateLimitRecoveryDeps`, `RATE_LIMIT_RESUME_NUDGE`, types).
41
+ - `src/commands/server.ts` (rewired the two closures through the factory + a `recordRecovery` audit writer).
42
+ - `tests/unit/rate-limit-recovery-reachability.test.ts` (new, 9 cases — both sides of every reachability boundary).
43
+ - `tests/unit/rate-limit-recovery-wiring.test.ts` (new, 6 cases — T5 wiring integrity + T7 InputGuard boundary).
44
+ - `upgrades/NEXT.md`, `package.json` bump.
45
+
46
+ ## Decision-point inventory
47
+
48
+ - **Topic vs. non-topic resume path** — *modify*. Pure presence check on
49
+ `getTopicForSession`; topic path is byte-for-byte the old behavior, non-topic
50
+ path is the new internal injection. No judgment.
51
+ - **Notify fallback order** — *new*. session-topic → lifeline → audit. Each step
52
+ is a null check; deterministic, no LLM.
53
+ - **InputGuard bypass (`injectInternalMessage`)** — *new security-relevant path*.
54
+ Covered in §Security boundary below.
55
+ - **Audit sink** — *new*. Append-only, best-effort, wrapped in try/catch so a
56
+ logging failure can never break a recovery nudge.
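The audit sink's "append-only, best-effort" behavior might look something like the sketch below. It assumes a `RecoveryEvent` shape, a `baseDir` parameter, and that `.instar/sentinel-alerts.json` holds a JSON array. None of these are confirmed by the diff, which only names the two files, the rolling limit of 200, and the try/catch wrapping.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical event shape for illustration.
interface RecoveryEvent {
  kind: "recovery-reached" | "recovery-unreachable";
  sessionName: string;
  at: string; // timestamp supplied by the caller
}

const MAX_ALERTS = 200; // rolling window size named in the review

// Returns true if the event was written, false if logging failed.
// It never throws, so a logging failure cannot break a recovery nudge.
function recordRecovery(baseDir: string, event: RecoveryEvent): boolean {
  try {
    const logPath = path.join(baseDir, "logs", "sentinel-events.jsonl");
    fs.mkdirSync(path.dirname(logPath), { recursive: true });
    fs.appendFileSync(logPath, JSON.stringify(event) + "\n");

    if (event.kind === "recovery-unreachable") {
      const alertsPath = path.join(baseDir, ".instar", "sentinel-alerts.json");
      fs.mkdirSync(path.dirname(alertsPath), { recursive: true });
      // Assumed file format: a JSON array of events.
      const existing: RecoveryEvent[] = fs.existsSync(alertsPath)
        ? JSON.parse(fs.readFileSync(alertsPath, "utf8"))
        : [];
      // Keep only the most recent MAX_ALERTS entries.
      const rolled = [...existing, event].slice(-MAX_ALERTS);
      fs.writeFileSync(alertsPath, JSON.stringify(rolled, null, 2));
    }
    return true;
  } catch {
    return false;
  }
}
```

Returning a boolean rather than throwing keeps the caller's recovery path independent of disk state while still letting tests observe a failed write.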
57
+
58
+ ## Over-block / under-block analysis
59
+
60
+ - **Under-block (the bug):** recovery reaching nothing for non-topic-bound
61
+ sessions. Closed — every path now terminates in a delivery or a recorded
62
+ unreachable event.
63
+ - **Over-block:** none introduced. The topic-bound path is unchanged. The new
64
+ internal injection only fires when there is genuinely no topic, and only from
65
+ the in-process sentinel.
66
+
67
+ ## Security boundary (InputGuard)
68
+
69
+ `injectInternalMessage` bypasses the topic-prefix provenance check, so it is a
70
+ trust boundary. Mitigations:
71
+ - It is a method on `SessionManager` only — **not** wired to any HTTP route
72
+ (asserted by test T7: `routes.ts` must not contain `injectInternalMessage`).
73
+ - All HTTP injection continues through `injectMessage`, which enforces
74
+ provenance / prefix.
75
+ - Every internal injection logs an `internal-recovery-injection` security event
76
+ with the `source` label, so the audit log distinguishes trusted recovery
77
+ nudges from user/topic traffic.
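The two mitigations that can be checked mechanically, the logged security event and the "not in `routes.ts`" assertion, could be sketched as below. The `SecurityEvent` shape, the `makeInternalInjector` factory, and the `writeToPane` / `logSecurityEvent` callbacks are hypothetical names for illustration. The real T7 test may check the route file differently.

```typescript
// Hypothetical event shape; only the event type string comes from the review.
interface SecurityEvent {
  type: "internal-recovery-injection";
  source: string;
  sessionName: string;
}

// Builds a trusted in-process injector that logs a security event for every
// injection before delivering it, so the audit log can tell recovery nudges
// apart from user/topic traffic.
function makeInternalInjector(
  writeToPane: (sessionName: string, text: string) => void,
  logSecurityEvent: (event: SecurityEvent) => void,
) {
  return function injectInternalMessage(
    sessionName: string,
    text: string,
    source: string,
  ): void {
    logSecurityEvent({ type: "internal-recovery-injection", source, sessionName });
    writeToPane(sessionName, text);
  };
}

// A T7-style boundary check: the HTTP route source must never mention the
// internal injector. A plain substring test is deliberately conservative.
function routesExposeInternalInjection(routesSource: string): boolean {
  return routesSource.includes("injectInternalMessage");
}
```

A substring check can produce false positives (for example a comment mentioning the method), which is an acceptable trade for a trust boundary: it errs toward failing the build.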
78
+
79
+ ## Signal vs. authority
80
+
81
+ No new blocking authority. The sentinel is a bounded recovery primitive; the new
82
+ code only adds *delivery channels* and an *audit trail*. Nothing gates user
83
+ actions or other sessions.
84
+
85
+ ## Interactions
86
+
87
+ - **CompactionSentinel / other sentinels** — unchanged. The zombie-veto
88
+ composition and bidirectional defer logic are untouched.
89
+ - **SentinelNotifier (socket/silence trio)** — untouched. Their default-off
90
+ `sentinelTelegramEscalation` is deliberately preserved (the post-2026-05-22
91
+ anti-flood design). This PR does **not** flip that default — the original
92
+ spec's Part A3 is intentionally dropped as superseded.
93
+ - **InputGuard** — only adds a new logged event type; existing provenance flow
94
+ is unchanged.
95
+
96
+ ## Migration parity
97
+
98
+ No agent-installed files change. This is server-side binary code
99
+ (`SessionManager`, `sentinelWiring`, `server.ts`) shipped via npm update — every
100
+ existing agent picks it up automatically on update with no migration entry. No
101
+ new config default (the RateLimitSentinel is already default-on), no hook, no
102
+ skill, no CLAUDE.md template capability. (Worktree clone-isolation and the
103
+ socket/silence default already shipped separately via #334/#340/#351 and are out
104
+ of scope here.)
105
+
106
+ ## Rollback
107
+
108
+ Single, independent, trivially reversible: `git revert` restores the two inline
109
+ closures (reintroducing the silent no-op). The new `injectInternalMessage` and
110
+ factory become dead code on revert; no state migration to undo.
111
+
112
+ ## Out of scope
113
+
114
+ - Flipping `sentinelTelegramEscalation` to default-on (superseded by anti-flood).
115
+ - Socket-disconnect / active-silence sentinel reachability (default-off by design).
116
+ - Worktree clone isolation (already shipped via #334).