instar 1.2.62 → 1.2.63
- package/dist/commands/server.d.ts.map +1 -1
- package/dist/commands/server.js +60 -24
- package/dist/commands/server.js.map +1 -1
- package/dist/core/SessionManager.d.ts +18 -0
- package/dist/core/SessionManager.d.ts.map +1 -1
- package/dist/core/SessionManager.js +34 -0
- package/dist/core/SessionManager.js.map +1 -1
- package/dist/monitoring/sentinelWiring.d.ts +28 -0
- package/dist/monitoring/sentinelWiring.d.ts.map +1 -1
- package/dist/monitoring/sentinelWiring.js +48 -0
- package/dist/monitoring/sentinelWiring.js.map +1 -1
- package/package.json +1 -1
- package/src/data/builtin-manifest.json +3 -3
- package/upgrades/1.2.63.md +84 -0
- package/upgrades/side-effects/rate-limit-recovery-reachability.md +116 -0
@@ -0,0 +1,84 @@
# Upgrade Guide: rate-limit recovery now reaches non-topic-bound sessions

<!-- bump: patch -->
<!-- patch = bug fixes, refactors, test additions, doc updates -->

## What Changed

**Fix: the rate-limit sentinel now actually recovers a session that isn't bound
to a Telegram topic.**

When Anthropic's server-side throttle hit ("Server is temporarily limiting
requests · not your usage limit"), the RateLimitSentinel detected it and ran its
backoff correctly, but both of its recovery actions started by asking "is this
session bound to a Telegram topic?" and silently did nothing if the answer was
no. A developer's interactive Claude Code window isn't bound to a topic, so:

- the "throttled, backing off, you're not dropped" notice went nowhere, and
- the resume nudge that wakes the session back up went nowhere.

From the outside this was indistinguishable from the sentinel not existing, which
is exactly what was observed: a throttle sat on screen for minutes with no
recovery and no signal. (v1.2.33 shipped past green tests because every test
fixture was topic-bound; the non-topic-bound path was never exercised.)

This release makes recovery reachable under **all** session conditions:

- **Resume**: topic-bound sessions get the topic-tagged nudge as before;
  non-topic-bound sessions get a trusted in-process injection
  (`SessionManager.injectInternalMessage`) that bypasses the topic-prefix
  requirement. This path is in-process only and is never exposed over HTTP.
- **Notify**: the user notice goes to the session's own topic, else falls back
  to the always-available lifeline (system) topic, else is written as a loud
  `recovery-unreachable` audit event. Never a silent drop.
- **Audit**: every recovery attempt records `recovery-reached` /
  `recovery-unreachable` to `logs/sentinel-events.jsonl`, and unreachable events
  also land in `.instar/sentinel-alerts.json` so the dashboard surfaces them even
  when Telegram can't be reached.
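
The notify fallback order above can be pictured with a minimal sketch. This is not the shipped code: the `NotifyDeps` interface, the `getLifelineTopic` and `sendToTopic` helpers, the channel labels, and the function name are all assumptions made for illustration. Only the ordering (session topic, then lifeline, then a `recovery-unreachable` audit event) comes from the notes.

```typescript
// Hypothetical shapes for illustration; the real instar types may differ.
type RecoveryOutcome = "recovery-reached" | "recovery-unreachable";

interface NotifyDeps {
  // Topic bound to the session, or null when the session has none.
  getTopicForSession(sessionName: string): number | null;
  // Lifeline (system) topic, or null when it cannot be resolved.
  getLifelineTopic(): number | null;
  sendToTopic(topicId: number, text: string): void;
  recordRecovery(outcome: RecoveryOutcome, sessionName: string, channel: string): void;
}

// Walks the fallback chain; every branch ends in a delivery or an audit record.
function notifyWithFallback(
  deps: NotifyDeps,
  sessionName: string,
  text: string
): RecoveryOutcome {
  const sessionTopic = deps.getTopicForSession(sessionName);
  if (sessionTopic != null) {
    deps.sendToTopic(sessionTopic, text);
    deps.recordRecovery("recovery-reached", sessionName, "session-topic");
    return "recovery-reached";
  }

  const lifeline = deps.getLifelineTopic();
  if (lifeline != null) {
    deps.sendToTopic(lifeline, text);
    deps.recordRecovery("recovery-reached", sessionName, "lifeline");
    return "recovery-reached";
  }

  // Nothing reachable: record it loudly instead of returning silently.
  deps.recordRecovery("recovery-unreachable", sessionName, "none");
  return "recovery-unreachable";
}
```

Returning the outcome, rather than `void`, is a choice made here so that a test can assert on each branch without inspecting side effects.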

The reachability branching was lifted out of the inline server closures into
`sentinelWiring.buildRateLimitRecoveryDeps()` so it is unit-testable, closing
the gap (inline + untestable logic) that let this ship past green tests.
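
To show why a factory makes this branching testable, here is a rough sketch of the resume side with its collaborators injected. The source does not show the real signature of `buildRateLimitRecoveryDeps`, so `ResumeWiring`, `buildResume`, the nudge text, and the `[topic:N]` prefix format are hypothetical stand-ins.

```typescript
// Hypothetical wiring shape; not the real factory signature.
interface ResumeWiring {
  getTopicForSession(sessionName: string): number | null;
  // Topic-tagged, provenance-checked path (existing behavior).
  injectMessage(sessionName: string, text: string): void;
  // Trusted in-process path for sessions without a topic.
  injectInternalMessage(sessionName: string, text: string): void;
}

// Placeholder nudge text; the real RATE_LIMIT_RESUME_NUDGE value is not shown.
const RESUME_NUDGE_PLACEHOLDER = "rate limit cleared, please continue";

// Returns a resume function whose branch depends only on the injected wiring,
// so a unit test can pass fakes for both the topic-bound and unbound cases.
function buildResume(
  wiring: ResumeWiring
): (sessionName: string) => "topic" | "internal" {
  return (sessionName) => {
    const topicId = wiring.getTopicForSession(sessionName);
    if (topicId != null) {
      // The prefix format below is invented for this sketch.
      wiring.injectMessage(
        sessionName,
        "[topic:" + topicId + "] " + RESUME_NUDGE_PLACEHOLDER
      );
      return "topic";
    }
    wiring.injectInternalMessage(sessionName, RESUME_NUDGE_PLACEHOLDER);
    return "internal";
  };
}
```

With the logic behind an interface, a fixture that returns `null` from `getTopicForSession` exercises the path that the earlier topic-bound-only fixtures never reached.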

Spec: `Sentinel Reachability + Worktree Isolation` (the worktree-clone and
socket/silence-default parts already shipped via #334/#340/#351; this is the
remaining rate-limit recovery piece).
Side-effects review: `upgrades/side-effects/rate-limit-recovery-reachability.md`.

## What to Tell Your User

If I'm ever hit by one of Anthropic's brief "servers are busy" throttles while
you're talking to me in a plain window (not a Telegram topic), I'll now actually
tell you I'm throttled and backing off, and I'll wake myself back up and let you
know when it clears, instead of going silent until you poke me. Nothing changes
for the normal Telegram case; this just closes the gap where a throttle in a
direct dev window left you staring at a frozen screen.

## Summary of New Capabilities

No new user-facing capabilities; this is a behavior fix to existing rate-limit
recovery. (Internal: `SessionManager.injectInternalMessage` for trusted
in-process recovery nudges; `buildRateLimitRecoveryDeps` reachability factory.)

## Evidence

**Live reproduction (the bar that was missed last time).** Drove the *real*
RateLimitSentinel lifecycle through the *real* recovery factory (wired exactly as
`server.ts` wires it) against a *real* tmux pane that was **not** bound to any
topic, which is the exact failure condition. Result: the resume nudge landed in
the pane via the real internal-injection path, the "throttled, backing off"
notice and the "back online" check-in both reached the lifeline topic, and the
audit log recorded `recovery-reached` with zero `recovery-unreachable`. Before
the fix all of those were silent.

**Regression coverage (CI-permanent):**
- `tests/unit/rate-limit-recovery-reachability.test.ts` (9 cases): both sides of
  every reachability boundary (topic / lifeline / internal-injection / unreachable).
- `tests/integration/rate-limit-recovery-sentinel-lifecycle.test.ts` (2 cases): the
  real sentinel lifecycle (detect→backoff→resume→verify→recovered) driving the
  factory to the lifeline for a non-topic-bound session, plus the
  never-silent unreachable case.
- `tests/unit/rate-limit-recovery-wiring.test.ts` (6 cases): wiring integrity
  (server wires the real primitives, not no-ops) + the InputGuard HTTP boundary.

`tsc` clean. The pre-existing rate-limit unit/integration/e2e suites stay green.
@@ -0,0 +1,116 @@
# Side-Effects Review: RateLimitSentinel recovery reachability

**Version / slug:** `rate-limit-recovery-reachability`
**Date:** `2026-05-24`
**Author:** `echo`
**Second-pass reviewer:** `internal-adversarial` (external /crossreview tooling not wired on this host)

## Summary of the change

The RateLimitSentinel detects Anthropic's server-side throttle correctly and
schedules its backoff correctly, but its two recovery closures in `server.ts`
(`rateLimitResume`, `rateLimitNotify`) both began with
`const topicId = telegram?.getTopicForSession(sessionName); if (topicId == null) return`.
For a session not bound to any Telegram topic (e.g. a developer's interactive
Claude Code window), both paths silently no-opped. Detection and backoff ran, then
the resume nudge and the user notice were dropped on the floor. From the user's
seat this is indistinguishable from no sentinel existing (the v1.2.33 ship that
"recovered" in tests but never in Justin's real dev window).

This PR makes recovery reachable under **all** session conditions:

- **Resume**: topic-bound sessions get the topic-tagged nudge through the
  provenance-checked `injectMessage` path (unchanged). Non-topic-bound sessions
  get a new `SessionManager.injectInternalMessage` path that bypasses the
  topic-prefix InputGuard requirement (trusted in-process caller, logged with
  `source: 'sentinel-recovery'`).
- **Notify**: session topic → lifeline (system) topic → a loud
  `recovery-unreachable` audit event. Never a silent return.
- **Audit**: every recovery attempt records `recovery-reached` /
  `recovery-unreachable` to `logs/sentinel-events.jsonl`; an unreachable event
  is also appended to `.instar/sentinel-alerts.json` (rolling 200) so the
  dashboard surfaces it even when Telegram is unavailable.
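
The "rolling 200" behavior of the alerts file can be sketched as a pure function over the in-memory list. The `SentinelAlert` fields and the `appendRolling` name are assumptions, since the source gives only the cap and the event name, and reading or writing the JSON file is deliberately left out.

```typescript
// Hypothetical alert record; the real field names are not shown in the source.
interface SentinelAlert {
  at: string;
  session: string;
  event: "recovery-unreachable";
}

// Cap taken from the "(rolling 200)" note.
const MAX_ALERTS = 200;

// Appends one alert and keeps only the newest `cap` entries.
// Returns a new array so the caller decides when to persist it.
function appendRolling(
  alerts: readonly SentinelAlert[],
  next: SentinelAlert,
  cap: number = MAX_ALERTS
): SentinelAlert[] {
  const combined = alerts.concat(next);
  return combined.length > cap
    ? combined.slice(combined.length - cap)
    : combined;
}
```

Keeping the trimming separate from file I/O means the cap can be verified without touching `.instar/sentinel-alerts.json`.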

The reachability branching was extracted from the inline closures into
`sentinelWiring.buildRateLimitRecoveryDeps()` so it is unit-testable. That was the
exact gap that let the bug ship past green tests (the logic was inline + untestable).

**Files touched:**
- `src/core/SessionManager.ts` (+`injectInternalMessage`, internal-only, NOT HTTP-exposed).
- `src/monitoring/sentinelWiring.ts` (+`buildRateLimitRecoveryDeps`, `RATE_LIMIT_RESUME_NUDGE`, types).
- `src/commands/server.ts` (rewired the two closures through the factory + a `recordRecovery` audit writer).
- `tests/unit/rate-limit-recovery-reachability.test.ts` (new, 9 cases; both sides of every reachability boundary).
- `tests/unit/rate-limit-recovery-wiring.test.ts` (new, 6 cases; T5 wiring integrity + T7 InputGuard boundary).
- `upgrades/NEXT.md`, `package.json` bump.

## Decision-point inventory

- **Topic vs. non-topic resume path**: *modify*. Pure presence check on
  `getTopicForSession`; topic path is byte-for-byte the old behavior, non-topic
  path is the new internal injection. No judgment.
- **Notify fallback order**: *new*. session-topic → lifeline → audit. Each step
  is a null check; deterministic, no LLM.
- **InputGuard bypass (`injectInternalMessage`)**: *new security-relevant path*.
  Covered in §Security boundary below.
- **Audit sink**: *new*. Append-only, best-effort, wrapped in try/catch so a
  logging failure can never break a recovery nudge.
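
The best-effort audit sink described in the last item can be illustrated with a small wrapper. The names `AuditWriter`, `bestEffort`, and `resumeWithAudit` are made up for this sketch, and the real `recordRecovery` writer may be structured differently. The point is only that the nudge runs first and a throwing writer cannot undo it.

```typescript
// Hypothetical writer type: receives one serialized audit line.
type AuditWriter = (line: string) => void;

// Wraps a writer so that any exception it throws is swallowed.
function bestEffort(write: AuditWriter): AuditWriter {
  return (line) => {
    try {
      write(line);
    } catch {
      // Intentionally ignored: auditing must not break recovery.
    }
  };
}

// Delivers the nudge, then attempts the audit record.
// Returns true once the nudge itself has run.
function resumeWithAudit(nudge: () => void, audit: AuditWriter): boolean {
  nudge();
  bestEffort(audit)(JSON.stringify({ event: "recovery-reached" }));
  return true;
}
```

Note that only the audit call is wrapped. An exception from `nudge` still propagates, which is an assumption of this sketch rather than something the source specifies.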

## Over-block / under-block analysis

- **Under-block (the bug):** recovery reaching nothing for non-topic-bound
  sessions. Closed: every path now terminates in a delivery or a recorded
  unreachable event.
- **Over-block:** none introduced. The topic-bound path is unchanged. The new
  internal injection only fires when there is genuinely no topic, and only from
  the in-process sentinel.

## Security boundary (InputGuard)

`injectInternalMessage` bypasses the topic-prefix provenance check, so it is a
trust boundary. Mitigations:
- It is a method on `SessionManager` only and is **not** wired to any HTTP route
  (asserted by test T7: `routes.ts` must not contain `injectInternalMessage`).
- All HTTP injection continues through `injectMessage`, which enforces
  provenance / prefix.
- Every internal injection logs an `internal-recovery-injection` security event
  with the `source` label, so the audit log distinguishes trusted recovery
  nudges from user/topic traffic.
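
As a rough picture of the three mitigations, the sketch below models two entry points on a toy session manager plus a T7-style source check. Everything here is an assumption for illustration: the class, the `[topic:N] ` prefix pattern, the shape of the security event, and the helper name. It does not reproduce the real `SessionManager` or InputGuard.

```typescript
// Hypothetical security event shape.
interface SecurityEvent {
  type: string;
  source: string;
  session: string;
}

class SketchSessionManager {
  readonly delivered: string[] = [];
  readonly securityLog: SecurityEvent[] = [];

  // External path: rejects text that lacks the (invented) topic prefix.
  injectMessage(session: string, text: string): boolean {
    if (!/^\[topic:\d+\] /.test(text)) {
      return false;
    }
    this.delivered.push(session + ": " + text);
    return true;
  }

  // Trusted in-process path: skips the prefix check but always logs first.
  injectInternalMessage(session: string, text: string, source: string): void {
    this.securityLog.push({
      type: "internal-recovery-injection",
      source: source,
      session: session,
    });
    this.delivered.push(session + ": " + text);
  }
}

// T7-style guard: true if route source text mentions the internal method.
function mentionsInternalInjection(routesSource: string): boolean {
  return routesSource.indexOf("injectInternalMessage") !== -1;
}
```

A plain substring check like `mentionsInternalInjection` is coarse, but it fails as soon as the method name appears anywhere in the routes file, which matches the assertion described for T7.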

## Signal vs. authority

No new blocking authority. The sentinel is a bounded recovery primitive; the new
code only adds *delivery channels* and an *audit trail*. Nothing gates user
actions or other sessions.

## Interactions

- **CompactionSentinel / other sentinels**: unchanged. The zombie-veto
  composition and bidirectional defer logic are untouched.
- **SentinelNotifier (socket/silence trio)**: untouched. Their default-off
  `sentinelTelegramEscalation` is deliberately preserved (the post-2026-05-22
  anti-flood design). This PR does **not** flip that default; the original
  spec's Part A3 is intentionally dropped as superseded.
- **InputGuard**: only adds a new logged event type; existing provenance flow
  is unchanged.

## Migration parity

No agent-installed files change. This is server-side binary code
(`SessionManager`, `sentinelWiring`, `server.ts`) shipped via npm update; every
existing agent picks it up automatically on update with no migration entry. No
new config default (the RateLimitSentinel is already default-on), no hook, no
skill, no CLAUDE.md template capability. (Worktree clone-isolation and the
socket/silence default already shipped separately via #334/#340/#351 and are out
of scope here.)

## Rollback

Single, independent, trivially reversible: `git revert` restores the two inline
closures (reintroducing the silent no-op). The new `injectInternalMessage` and
factory become dead code on revert; no state migration to undo.

## Out of scope

- Flipping `sentinelTelegramEscalation` to default-on (superseded by anti-flood).
- Socket-disconnect / active-silence sentinel reachability (default-off by design).
- Worktree clone isolation (already shipped via #334).