instar 1.2.71 → 1.2.73

@@ -0,0 +1,94 @@
+ # Side-Effects Review: Emergency-stop on the lifeline forward path
+
+ **Version / slug:** `emergency-stop-forward-path-wiring`
+ **Date:** `2026-05-24`
+ **Author:** `echo`
+ **Second-pass reviewer:** `not required (single localized, fail-open route addition; spec-driven)`
+
+ ## Summary of the change
+
+ Wires the existing `MessageSentinel` emergency-stop/pause intercept into `POST /internal/telegram-forward`, the lifeline ingress path. Agents whose Telegram polling is owned by the lifeline never run `TelegramAdapter.processUpdate()`, which is where the intercept currently lives, so until now they did not honor "stop everything". Before this change, those agents (e.g. echo) delivered emergency-stop messages to the session as ordinary text, and nothing halted a running or wedged session.
+
+ One file is touched: `src/server/routes.ts`, where a ~50-line block is inserted after request validation and before message logging/routing. The block reuses the existing `ctx.telegram.onSentinelKillSession`/`onSentinelPauseSession` callbacks and `stopAutonomousTopic`; it adds no new kill logic. Spec: `docs/specs/emergency-stop-forward-path-wiring.md`.
+
+ ## Decision-point inventory
+
+ - `MessageSentinel.classify` (existing authority): **pass-through**. Its verdict is unchanged; this change only routes the verdict to an action on a path that previously ignored it.
+ - `/internal/telegram-forward` routing decision: **modify**. Adds an emergency-stop/pause short-circuit before the existing `onTopicMessage`/registry-inject routing. Normal messages are unaffected.
+ - Session kill / pause / autonomous-clear: **pass-through**. Reuses `onSentinelKillSession`, `onSentinelPauseSession`, and `stopAutonomousTopic` exactly as the `processUpdate` path does.
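A minimal sketch of the intercept order on the forward route, assuming a framework-free handler with injected dependencies. `handleForward`, `ForwardDeps`, `resolveSession`, `routeToSession` (standing in for the `onTopicMessage`/registry-inject routing), `reply`, and the 400/426 handshake split are hypothetical; only the names `onSentinelKillSession`, `onSentinelPauseSession`, `stopAutonomousTopic`, and `logInboundMessage` come from this review, and their signatures here are guesses. It illustrates the three properties the review relies on: the handshake short-circuits first, classification precedes logging and routing, and any error inside the intercept falls through to normal delivery.

```typescript
// Hypothetical sketch of the sentinel intercept on POST /internal/telegram-forward.
// Names and signatures are illustrative, not instar's actual routes.ts.
type Verdict = "emergency-stop" | "pause" | "normal";

interface ForwardRequest {
  topicId: number;
  text: string;
  protocolVersion?: number;
}

interface ForwardResponse {
  status: number;
  body: Record<string, unknown>;
}

interface ForwardDeps {
  expectedProtocol: number;
  classify(text: string): Promise<Verdict>; // stands in for MessageSentinel.classify
  resolveSession(topicId: number): string | null; // read-only registry lookup
  onSentinelKillSession(session: string): Promise<void>; // saves resume UUID, then kills
  onSentinelPauseSession(session: string): Promise<void>;
  stopAutonomousTopic(topicId: number): void;
  reply(topicId: number, text: string): Promise<void>;
  logInboundMessage(req: ForwardRequest): void;
  routeToSession(req: ForwardRequest): Promise<void>;
}

async function handleForward(req: ForwardRequest, deps: ForwardDeps): Promise<ForwardResponse> {
  // 1. Validation and version handshake short-circuit before anything else.
  if (req.protocolVersion === undefined) return { status: 400, body: { ok: false } };
  if (req.protocolVersion !== deps.expectedProtocol) return { status: 426, body: { ok: false } };

  // 2. Sentinel intercept: classify BEFORE logging and routing.
  try {
    const verdict = await deps.classify(req.text);
    if (verdict !== "normal") {
      const flag = verdict === "emergency-stop" ? "killed" : "paused";
      const session = deps.resolveSession(req.topicId);
      if (session === null) {
        await deps.reply(req.topicId, "No active session to stop.");
        return { status: 200, body: { ok: true, sentinel: verdict, [flag]: false } };
      }
      if (verdict === "emergency-stop") {
        await deps.onSentinelKillSession(session);
        try {
          deps.stopAutonomousTopic(req.topicId); // best-effort and idempotent
        } catch {
          // Ignored: an error here must not send a completed kill down the normal path.
        }
        await deps.reply(req.topicId, "Session terminated.");
      } else {
        await deps.onSentinelPauseSession(session);
        await deps.reply(req.topicId, "Session paused.");
      }
      return { status: 200, body: { ok: true, sentinel: verdict, [flag]: true } };
    }
  } catch {
    // Fail open: a classifier (or intercept) error degrades to normal delivery.
  }

  // 3. Unchanged path for normal messages.
  deps.logInboundMessage(req);
  await deps.routeToSession(req);
  return { status: 200, body: { ok: true, forwarded: true } };
}
```

Injecting every side effect is what makes the classify-before-route ordering checkable in a test without a live Telegram connection.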
+
+ ---
+
+ ## 1. Over-block
+
+ **What legitimate inputs does this change reject that it shouldn't?**
+
+ Only messages the sentinel classifies as `emergency-stop` or `pause` are intercepted (not routed to the session). That is the intended behavior and matches the already-live `processUpdate` path. The classifier's own over-block surface is unchanged (live-tested: a long sentence that merely contains "stop" classifies as `normal` and routes normally). A false-positive emergency-stop would terminate the session, but (a) the classifier already guards against this with a word-count gate plus exact/regex/LLM disambiguation, and (b) the user can simply send a new message to respawn the session. No data is lost, because `onSentinelKillSession` saves the resume UUID. This adds no over-block beyond what `processUpdate` already exhibits.
+
+ ---
+
+ ## 2. Under-block
+
+ **What failure modes does this still miss?**
+
+ - A genuinely wedged classifier (LLM provider down) fails open: the emergency-stop is NOT honored and the message routes normally. This is the deliberate trade-off: never block delivery. The deterministic exact/regex patterns ("stop", "cancel", "halt") do not require the LLM, so the most common emergency phrasings still classify without a provider.
+ - Non-Telegram lifeline paths (none currently) would need the same treatment.
+ - This change does not fix the `processUpdate`-only assumption for any *other* per-message safety hook; such hooks stay dark on the forward path. Scope is the sentinel only.
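A minimal sketch of the layered classification these two sections rely on, assuming a word-count cap, a small deterministic pattern set, and an optional LLM fallback. `classifyMessage`, `MAX_INTERCEPT_WORDS`, the pattern lists, and the `llm` parameter are hypothetical, not `MessageSentinel`'s actual internals. It shows why the common phrasings survive a provider outage (they resolve before any LLM call) and why an LLM failure yields `normal`, i.e. fails open.

```typescript
// Hypothetical layered classifier; not MessageSentinel's real implementation.
type Verdict = "emergency-stop" | "pause" | "normal";

const MAX_INTERCEPT_WORDS = 6; // assumed cap: longer messages are treated as conversation

// Exact phrases resolve without any provider.
const EXACT = new Map<string, Verdict>([
  ["stop", "emergency-stop"],
  ["cancel", "emergency-stop"],
  ["halt", "emergency-stop"],
  ["pause", "pause"],
]);

// Short imperative phrasings, also deterministic.
const PATTERNS: ReadonlyArray<[RegExp, Verdict]> = [
  [/^(please\s+)?(stop|halt|cancel)\s+(everything|all|now)$/, "emergency-stop"],
  [/^(please\s+)?pause(\s+(now|everything))?$/, "pause"],
];

// Words that make a short message ambiguous enough to ask the LLM.
const TRIGGER = /\b(stop|halt|cancel|pause|abort)\b/;

async function classifyMessage(
  text: string,
  llm?: (text: string) => Promise<Verdict>,
): Promise<Verdict> {
  // Lowercase and drop trailing punctuation so "Stop!" matches "stop".
  const normalized = text.trim().toLowerCase().replace(/[.!?]+$/, "");
  const words = normalized.split(/\s+/).filter(Boolean);

  // Word-count gate: a long sentence that merely contains "stop" is normal.
  if (words.length === 0 || words.length > MAX_INTERCEPT_WORDS) return "normal";

  // Deterministic layers: no provider needed.
  const exact = EXACT.get(normalized);
  if (exact !== undefined) return exact;
  for (const [pattern, verdict] of PATTERNS) {
    if (pattern.test(normalized)) return verdict;
  }

  // Only short, ambiguous messages that mention a trigger word reach the LLM.
  if (llm === undefined || !TRIGGER.test(normalized)) return "normal";
  try {
    return await llm(normalized);
  } catch {
    return "normal"; // fail open: never block delivery on a provider error
  }
}
```

Ordering the cheap deterministic checks first keeps the LLM off the hot path and confines the fail-open gap to short, ambiguous phrasings.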
+
+ ---
+
+ ## 3. Level-of-abstraction fit
+
+ **Is this at the right layer?**
+
+ Yes. The smart authority (`MessageSentinel`, LLM plus deterministic patterns) already exists and already owns the decision. This change is pure routing: it ensures the authority's verdict reaches the action on the server-side ingress path that lifeline-owned agents use. It consumes the authority's output rather than re-implementing classification, which would be the wrong layer. The intercept sits at the server route layer, the single choke-point for the lifeline path, mirroring where `processUpdate` sits for the adapter-poll path.
+
+ ---
+
+ ## 4. Signal vs authority compliance
+
+ **Required reference:** docs/signal-vs-authority.md
+
+ - [x] No: this change produces no new block/allow logic; it routes an existing smart-gate verdict (`MessageSentinel`, LLM-backed with deterministic patterns) to its action.
+
+ The sentinel remains the sole authority. The forward route consumes its verdict; it is not a new detector, and no brittle logic gains blocking authority. The action is also fail-open: a classifier error degrades to normal delivery, never to a wrong block.
+
+ ---
+
+ ## 5. Interactions
+
+ - **Shadowing:** The new block runs AFTER the version handshake (426/400 still short-circuit first) and BEFORE message logging and `onTopicMessage` routing, so it does not shadow the handshake. It intentionally shadows routing for emergency-stop/pause (that is the point). Confirmed that `logInboundMessage` is skipped for intercepted messages; this is acceptable, since an emergency-stop need not be logged as a conversational turn and the kill itself is written to the server log.
+ - **Double-fire:** For lifeline-owned agents, `processUpdate` does not run, so there is no double intercept. For adapter-poll agents, the forward route is not used, so again there is no double fire. The two paths are mutually exclusive per deployment mode; no agent runs both for the same message.
+ - **Races:** `onSentinelKillSession` already encapsulates resume-UUID save + kill, and its existing concurrency behavior is unchanged. `stopAutonomousTopic` is idempotent (best-effort, wrapped in try/catch). Session resolution only reads the registry file.
+ - **Feedback loops:** None. The intercept terminates routing; it does not feed back into the sentinel.
+
+ ---
+
+ ## 6. External surfaces
+
+ - **Other agents on same machine:** none; the logic is local to the server route.
+ - **Install base:** ships to every agent via the normal server build. Adapter-poll agents are unaffected (the path is unused); lifeline-owned agents gain the intended emergency-stop. No agent-installed file (hook/config/template) changes, so there is no `PostUpdateMigrator` work.
+ - **External systems:** on emergency-stop/pause the route now sends one Telegram message ("Session terminated." / "Session paused." / "No active session to stop."), matching the user-facing copy of the `processUpdate` path. No new external endpoints.
+ - **Persistent state:** none added. Reads `topic-session-registry.json` read-only; `onSentinelKillSession` writes the resume-UUID map exactly as it does today.
+ - **Response shape:** for intercepted messages the route may now return `{ ok:true, sentinel:'emergency-stop'|'pause', killed|paused }` instead of `{ ok:true, forwarded:true }`. The lifeline treats any 200 as delivered; the new fields are additive and ignored by existing callers.
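The additive response shape can be written as a TypeScript union. The type names and helpers below are hypothetical; the sketch only illustrates the compatibility argument: a caller that checks for HTTP 200 treats both variants the same, while a newer caller can narrow on the `sentinel` field.

```typescript
// Hypothetical types for the forward route's 200 bodies; names are illustrative.
type ForwardedBody = { ok: true; forwarded: true };
type InterceptedBody =
  | { ok: true; sentinel: "emergency-stop"; killed: boolean }
  | { ok: true; sentinel: "pause"; paused: boolean };
type ForwardBody = ForwardedBody | InterceptedBody;

// Existing lifeline contract as described: any 200 counts as delivered,
// so the body variant never changes its decision.
function isDelivered(status: number): boolean {
  return status === 200;
}

// A caller that wants the detail narrows on the additive `sentinel` field.
function describeOutcome(body: ForwardBody): string {
  if (!("sentinel" in body)) return "routed to session";
  if (body.sentinel === "emergency-stop") {
    return body.killed ? "session killed" : "no session to kill";
  }
  return body.paused ? "session paused" : "no session to pause";
}
```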
+
+ ---
+
+ ## 7. Rollback cost
+
+ Pure code change in one route, fail-open by design. Back-out = revert the block; behavior returns to the current state (sentinel dark on the forward path). No persistent state, no schema, no migration, no agent-state repair. Because the block fails open, even a bug in it degrades to today's behavior (the message routes), never to blocked delivery, so rollback urgency is low even if a defect ships.
+
+ ## Conclusion
+
+ The review produced no design changes. The fix is a faithful, fail-open port of the already-tested `processUpdate` intercept to the lifeline ingress path, consuming the existing sentinel authority. The only behavioral change is that emergency-stop/pause now fire for lifeline-owned agents, which is the intended P0 safety fix. Integration tests cover both sides of every boundary (kill/pause/normal/fail-open/no-session), plus a wiring-integrity assertion that classification precedes routing (the guard whose absence allowed the original drift). Clear to ship. Live end-to-end verification (a real Telegram "stop everything" terminating a real session) happens post-deploy, since it requires the merged build to be running; the pre-merge proof is the integration suite exercising the real route handler.
+
+ ---
+
+ ## Second-pass review (if required)
+
+ Not required: a single localized, fail-open route addition with full test coverage and no new decision logic.
+
+ ---
+
+ ## Evidence pointers
+
+ - Reproduction (pre-fix): live `POST /sentinel/classify` returns `emergency-stop` for "stop everything", and a code trace shows that `/internal/telegram-forward` (the live path for lifeline-owned echo) had zero sentinel references, so the message was routed as normal.
+ - Post-fix proof: `tests/integration/telegram-forward-sentinel-intercept.test.ts`, 6/6 green (emergency-stop kills and is not routed; pause; normal routes; fail-open routes; no-session; wiring-integrity classify-before-route).
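The wiring-integrity assertion can be expressed as a call-order check: record each side effect a handler performs, in order, and require that classification happens before any logging or routing. `assertClassifyBeforeRoute`, `isWiredCorrectly`, and the step names are hypothetical; the review does not describe the real test file's mechanics.

```typescript
// Hypothetical call-order check in the spirit of the wiring-integrity test.
type Step = "classify" | "log" | "route" | "kill" | "pause";

// Throws if logging or routing happened before the sentinel was consulted.
function assertClassifyBeforeRoute(steps: readonly Step[]): void {
  const classifyAt = steps.indexOf("classify");
  if (classifyAt === -1) throw new Error("sentinel was never consulted");
  for (const step of ["log", "route"] as const) {
    const at = steps.indexOf(step);
    if (at !== -1 && at < classifyAt) {
      throw new Error(`${step} ran before classify: the wiring has drifted`);
    }
  }
}

// Convenience wrapper for table-style checks.
function isWiredCorrectly(steps: readonly Step[]): boolean {
  try {
    assertClassifyBeforeRoute(steps);
    return true;
  } catch {
    return false;
  }
}
```

A recorded sequence of `["log", "route"]`, with no classify step at all, is the shape of the pre-fix forward path, and it fails the check.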
@@ -0,0 +1,54 @@
+ # Side-Effects Review: Never a False Blocker (B17_FALSE_BLOCKER)
+
+ **Slug:** `never-a-false-blocker-standard`
+ **Date:** 2026-05-24
+ **Author:** echo
+ **Second-pass reviewer:** internal adversarial convergence (two reviewers) + real-LLM test-as-self
+
+ ## Summary of the change
+
+ Adds the constitution standard "Never a False Blocker" to `docs/STANDARDS-REGISTRY.md`, together with its structural enforcement: a new always-evaluated rule, **B17_FALSE_BLOCKER**, in `MessagingToneGate` (the outbound-message authority that hosts B15/B16). B17 holds an outbound message that defers a doable task to a person ("needs a human", "I can't", "second opinion", "reverse-engineering") when the message names no genuinely human-only item and shows no inventory of the agent's own means (computer use, terminal, send-keys, MCP). The `deferral-detector` PreToolUse hook is extended (signal-only) to prime the inventory checklist for these new excuse shapes. The standard is also registered in `docs/INSTAR-DESIGN-PRINCIPLES-AND-LESSONS.md` (P12). B17 is the sibling of B16: B16 covers feasibility-surrender, B17 covers human-deference.
+
+ ## Decision-point inventory
+
+ - `VALID_RULES` set: **add** `'B17_FALSE_BLOCKER'`. Without this, the gate's drift detection would fail open on a legitimate B17 citation (verified: a real-LLM B17 citation is accepted, `failedOpen=false`).
+ - `buildPrompt()` rule section: **add** the B17 definition after B16 (always evaluated, no precondition), including the B16/B17 de-confliction, straddle handling, citation precedence (B15>B16>B17), the UI-interaction clarification, and a worked block example.
+ - Response-format enumeration and two doc comments (`B1..B16` → `B1..B17`): **modify**.
+ - `deferral-detector` template (`PostUpdateMigrator.getDeferralDetectorHook`) and the deployed copy: **add** `needs_human_to` / `needs_reverse_engineering` patterns and a guarded `wants_second_opinion` pattern (suppressed when a model/agent is named, so self-fetched cross-model review is not flagged). The checklist text is updated to name the agent's own means and the small set of human-only items.
+ - No route changes: `checkOutboundMessage` → 422 is rule-agnostic, so B17 rides the existing outbound paths.
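A minimal sketch of why the `VALID_RULES` addition matters, assuming the gate asks the model for a JSON verdict and checks any cited rule id against a known set. The JSON shape, `parseGateVerdict`, and `GateResult` are hypothetical, not `MessagingToneGate`'s source; the behavior illustrated is the one described above: an unrecognized citation fails open, while a recognized one becomes a 422 regardless of which rule it is.

```typescript
// Hypothetical sketch of rule-id drift detection around the tone gate's LLM verdict.
// Only B17_FALSE_BLOCKER is named in this review; the B1..B16 ids are elided.
const VALID_RULES: ReadonlySet<string> = new Set<string>([
  // ...B1 through B16 rule ids...
  "B17_FALSE_BLOCKER",
]);

interface GateResult {
  status: 200 | 422;
  rule?: string;
  failedOpen: boolean;
}

const failOpen = (): GateResult => ({ status: 200, failedOpen: true });

function parseGateVerdict(raw: string): GateResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return failOpen(); // unparseable model output: let the message through
  }
  if (typeof parsed !== "object" || parsed === null) return failOpen();
  const { pass, rule } = parsed as { pass?: unknown; rule?: unknown };

  if (pass === true) return { status: 200, failedOpen: false };
  // A block must cite a known rule; an unknown id is treated as drift and fails open.
  if (typeof rule !== "string" || !VALID_RULES.has(rule)) return failOpen();
  // Rule-agnostic plumbing: every valid citation maps to the same 422.
  return { status: 422, rule, failedOpen: false };
}
```

Without `"B17_FALSE_BLOCKER"` in the set, a correct B17 citation would take the drift branch and pass, which is the failure the inventory entry guards against.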
+
+ ## 1. Over-block
+
+ Principal risk: blocking legitimate escalations. This is mitigated: severity favors false negatives, and the allowlist explicitly passes: a password/secret only the user holds, CAPTCHA, legal/billing/payment authorization, **required approvals** (side-effects/policy-gated), **account/access grants**, **external rate-limit/quota waits**, genuine value judgments, deferrals after a named-outcome inventory, self-fetched cross-model review, and rule discussion. Real-LLM test-as-self confirmed that password escalation, value judgment, and required approval all PASS while the founding false blocker BLOCKS, so the precision tightening introduced no false positive.
+
+ ## 2. Under-block (a real false blocker slipping through)
+
+ Two known holes, both accepted by design:
+ - The gate sees only message text, so a **fabricated inventory** ("I tried everything, your call") can pass; this is the same limit as B16 and is stated honestly in the rule. It is mitigated by requiring *named outcomes* rather than bare tool names, and the hollow-inventory case is covered by a unit assertion.
+ - Borderline misses are acceptable under the false-negative-favoring posture. Test-as-self initially caught the founding case passing, and the prompt was tightened (UI-interaction clarification + worked example) until real Haiku blocked it.
+
+ ## 3. Level-of-abstraction fit
+
+ Correct: the block authority lives inside the single outbound authority (where B15/B16 live), not in the detector. The `deferral-detector` extension is signal-only: it injects `additionalContext` and never blocks. Signal-vs-authority compliant.
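A minimal sketch of the signal-only detector, assuming it matches a draft message against a few regexes and, on a hit, returns guidance text rather than a decision. The pattern names `needs_human_to`, `needs_reverse_engineering`, and `wants_second_opinion` come from this review; the regexes, the reviewer-name guard, the checklist wording, `detectDeferral`, and the output shape are hypothetical and do not reproduce the actual hook protocol.

```typescript
// Hypothetical sketch of a signal-only deferral detector; regexes are illustrative.
interface DetectorOutput {
  additionalContext?: string; // guidance only: there is deliberately no "block" field
}

const DEFERRAL_PATTERNS: ReadonlyArray<[string, RegExp]> = [
  ["needs_human_to", /\b(needs?|requires?) (a )?human to\b|\bi can'?t do (this|that)\b/i],
  ["needs_reverse_engineering", /\breverse[- ]engineer(ing)?\b/i],
  ["wants_second_opinion", /\bsecond opinion\b/i],
];

// Guard: naming a model or agent means the agent is fetching the review itself.
const NAMED_REVIEWER = /\b(gpt|gemini|codex|claude|haiku|sonnet|opus|another agent)\b/i;

const CHECKLIST =
  "Before deferring: list what you tried with your own means (computer use, terminal, " +
  "send-keys, MCP) and what happened, or name the human-only item (a secret only the " +
  "user holds, CAPTCHA, payment/legal authorization, a required approval).";

function detectDeferral(draft: string): { hits: string[]; output: DetectorOutput } {
  const hits = DEFERRAL_PATTERNS.filter(([name, pattern]) => {
    if (!pattern.test(draft)) return false;
    return !(name === "wants_second_opinion" && NAMED_REVIEWER.test(draft));
  }).map(([name]) => name);
  // The detector only primes the checklist; the outbound gate keeps blocking authority.
  return { hits, output: hits.length > 0 ? { additionalContext: CHECKLIST } : {} };
}
```

Because the output type has no way to express "block", a false positive here costs one extra paragraph of context, never a held message.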
+
+ ## 4. Blocking authority
+
+ No new brittle authority. B17 is one more rule the existing authority may cite; the 422 plumbing and fail-open behavior are inherited unchanged.
+
+ ## 5. Interactions
+
+ B17 is always evaluated alongside B15/B16 in a single LLM call: no extra calls, only a marginally longer prompt. It is de-conflicted from B16 (missing mechanism → B16; person required → B17; a message straddling both → B17), with explicit citation precedence B15>B16>B17 so that telemetry is deterministic. Drift detection is unaffected: an invented rule id still fails open, and a regression test covers this. The detector's orphan-TODO patterns are preserved; the regenerated deployed copy carries them, so the migration does not regress that earlier improvement.
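The precedence is given to the model as an instruction in the prompt. As a hedged illustration only, the same tie-break can be written as a small deterministic function, assuming the set of rules that apply has already been decided. `pickCitation`, `PRECEDENCE`, and the short ids are hypothetical and not part of the gate.

```typescript
// Hypothetical restatement of the B15 > B16 > B17 citation precedence.
const PRECEDENCE = ["B15", "B16", "B17"] as const;
type RuleId = (typeof PRECEDENCE)[number];

// Given every rule that applies to a message, return the single rule to cite.
function pickCitation(applicable: ReadonlySet<RuleId>): RuleId | null {
  for (const rule of PRECEDENCE) {
    if (applicable.has(rule)) return rule;
  }
  return null; // nothing applies: the message passes
}
```

Because the choice depends only on which rules apply, not on the order in which they were noticed, the same message always yields the same cited rule.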
+
+ ## 6. External surfaces
+
+ None. No new endpoints, credentials, or network calls.
+
+ ## 7. Rollback cost
+
+ Low. Reverting removes the rule from the set and the prompt, the detector patterns, and the doc entries; there is no state, migration, or schema to undo. An older server simply lacks the rule.
+
+ ## 8. Test evidence
+
+ - Unit (`messaging-tone-gate-b17.test.ts`, 13 tests) and integration (`telegram-reply-b17-false-blocker.test.ts`, 2 tests) green; tsc clean; smoke suite (62 files / 2371 tests) green.
+ - Detector exercised behaviorally: false-blocker and reverse-engineering payloads are flagged; self-fetched cross-model review and clean status messages are not.
+ - **Real-LLM test-as-self** (real `ClaudeCliIntelligenceProvider` → Haiku, in-process against the built rule, production server untouched): the founding codex-trust message and the fused straddle both BLOCK with B17; password escalation, value judgment, required approval, self-fetched second opinion, and post-inventory deferral all PASS.