instar 0.28.49 → 0.28.51

Files changed (33)
  1. package/dist/commands/init.js +93 -93
  2. package/dist/commands/init.js.map +1 -1
  3. package/dist/commands/server.d.ts.map +1 -1
  4. package/dist/commands/server.js +61 -31
  5. package/dist/commands/server.js.map +1 -1
  6. package/dist/core/InputGuard.d.ts +29 -3
  7. package/dist/core/InputGuard.d.ts.map +1 -1
  8. package/dist/core/InputGuard.js +73 -45
  9. package/dist/core/InputGuard.js.map +1 -1
  10. package/dist/core/PostUpdateMigrator.d.ts +14 -0
  11. package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
  12. package/dist/core/PostUpdateMigrator.js +46 -0
  13. package/dist/core/PostUpdateMigrator.js.map +1 -1
  14. package/dist/messaging/shared/isSystemOrProxyMessage.d.ts +41 -0
  15. package/dist/messaging/shared/isSystemOrProxyMessage.d.ts.map +1 -0
  16. package/dist/messaging/shared/isSystemOrProxyMessage.js +64 -0
  17. package/dist/messaging/shared/isSystemOrProxyMessage.js.map +1 -0
  18. package/dist/monitoring/PresenceProxy.d.ts +3 -1
  19. package/dist/monitoring/PresenceProxy.d.ts.map +1 -1
  20. package/dist/monitoring/PresenceProxy.js +5 -16
  21. package/dist/monitoring/PresenceProxy.js.map +1 -1
  22. package/package.json +1 -1
  23. package/scripts/pre-push-gate.js +6 -3
  24. package/src/data/builtin-manifest.json +43 -43
  25. package/upgrades/0.28.50.md +59 -0
  26. package/upgrades/0.28.51.md +31 -0
  27. package/upgrades/0.28.52.md +82 -0
  28. package/upgrades/side-effects/0.28.49.md +90 -0
  29. package/upgrades/side-effects/0.28.50.md +104 -0
  30. package/upgrades/side-effects/0.28.51.md +145 -0
  31. package/upgrades/side-effects/0.28.52.md +276 -0
  32. package/upgrades/side-effects/pre-push-gate-ci-scope.md +104 -0
  33. package/upgrades/side-effects/skill-port-dynamic-resolution.md +104 -0
@@ -0,0 +1,31 @@
1
+ # Upgrade Guide — vNEXT
2
+
3
+ <!-- bump: patch -->
4
+
5
+ ## What Changed
6
+
7
+ Closes the compaction-recovery stall seen on topic 6795 (2026-04-17). When a session hit context compaction, `recoverCompactedSession` was deciding "is there pending work to re-inject?" by looking at the last message in the topic — without filtering out PresenceProxy standby messages (`🔭 …`) or server-emitted delivery/lifecycle acks (`✓ Delivered`, `Session respawned.`). Those messages are `fromUser: false` but they are NOT real agent responses. The recover helper treated them as "agent already answered," which caused it to decline three consecutive re-inject attempts while the user sat with an unanswered question for ~15 minutes.
8
+
9
+ The fix hoists the prefix classifier that `PresenceProxy.isSystemMessage()` and `checkLogForAgentResponse()` already used into a shared module (`src/messaging/shared/isSystemOrProxyMessage.ts`), adds a `findLastRealMessage(history)` walk-back helper on top, and routes `recoverCompactedSession` through it. Three scattered copies of the same prefix list — one of which was silently missing from the recovery path — are now one.
10
+
11
+ Secondary corrections: `recoverCompactedSession`'s history window widened from 5 to 20 entries so the walk-back has headroom past a run of standby/ack messages before it gives up; `checkLogForAgentResponse` now shares the classifier so any future addition to the prefix list (new system-emitted message format) lands in all three consumers for free.
12
+
13
+ Full side-effects review (over/under-block, abstraction fit, signal-vs-authority, interactions, rollback cost): `upgrades/side-effects/0.28.52.md`.
14
+
15
+ ## What to Tell Your User
16
+
17
+ - **Compaction stalls recover cleanly now**: "When the agent's context window fills up mid-conversation, there's a safety net that re-asks me your question on the fresh session. It had a blind spot — if I'd posted a status-update like '🔭 working on it' before the context window filled up, the safety net thought I'd already answered you and stayed silent. That's fixed. The safety net now looks past status updates and delivery acks to find the real last message."
18
+
19
+ ## Summary of New Capabilities
20
+
21
+ | Capability | How to Use |
22
+ |-----------|-----------|
23
+ | Compaction re-injection sees past PresenceProxy standbys and delivery acks | automatic — fires whenever CompactionSentinel detects a compaction event |
24
+
25
+ ## Evidence
26
+
27
+ **Reproduction (pre-fix):** Topic 6795 on 2026-04-17 between 16:05 and 16:20. User sent "Please proceed here." at 16:05:22. CompactionSentinel detected context exhaustion at 16:08:14, 16:13:xx, and 16:18:xx. Each time, `recoverCompactedSession` logged `recoverFn declined (no pending work or session gone)` because the last message in `telegram.getTopicHistory(6795, 5)` was a PresenceProxy standby (`🔭 Echo is currently updating the ledger spec…`), not the user's question. Result: user saw no reply for ~15 minutes until manual intervention.
28
+
29
+ **Post-fix behavior:** `findLastRealMessage(history)` walks backward past the proxy standby and the `✓ Delivered` ack to find the user's question as the last real message; `recoverCompactedSession` returns `true` and re-injects `COMPACTION_RESUME_PROMPT`. The 25-test unit suite at `tests/unit/isSystemOrProxyMessage.test.ts` includes the exact topic-6795 sequence (user question → `🔭` proxy standby → `✓ Delivered` ack) asserting the walk-back returns the user question, not the ack.
30
+
31
+ **Live verification:** After shipping, simulated a compaction event on a topic whose most recent non-user messages are a PresenceProxy standby + delivery ack; confirmed `recoverCompactedSession` fires `direct re-inject OK for topic <N>` instead of the old `recoverFn declined` line. Logged in `.instar/shared-state.jsonl` under the `[CompactionResume]` tag.
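
The walk-back described above can be sketched roughly as follows. This is a minimal illustration, not the shipped module: the `TopicMessage` shape is hypothetical, the prefix list is copied from the upgrade notes and may not match the source exactly, and treating empty text as non-real is an assumption.

```typescript
// Hypothetical message shape; the real history entry type is not shown in this diff.
interface TopicMessage {
  text?: string;
  fromUser: boolean;
}

// Prefixes named in the upgrade notes; the shipped list may differ.
const SYSTEM_PREFIXES: readonly string[] = [
  "✓ Delivered",
  "🔄 Session restarting",
  "Session respawned.",
  "Session terminated.",
  "Send a new message to start",
  "🔭", // PresenceProxy standby
];

// Assumption: empty or whitespace-only text does not count as a real message.
function isSystemOrProxyMessage(text: string | undefined): boolean {
  const trimmed = (text ?? "").trim();
  if (trimmed.length === 0) return true;
  return SYSTEM_PREFIXES.some((prefix) => trimmed.startsWith(prefix));
}

// Walks a chronological (oldest-first) history backward and returns the
// first entry that is neither a system message nor a proxy standby.
function findLastRealMessage(
  history: readonly TopicMessage[],
): TopicMessage | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const entry = history[i];
    if (entry && !isSystemOrProxyMessage(entry.text)) return entry;
  }
  return undefined;
}

// The topic-6795 sequence: user question, proxy standby, delivery ack.
const topic6795: TopicMessage[] = [
  { text: "Please proceed here.", fromUser: true },
  { text: "🔭 Echo is currently updating the ledger spec…", fromUser: false },
  { text: "✓ Delivered", fromUser: false },
];

const lastReal = findLastRealMessage(topic6795);
console.log(lastReal?.fromUser === true ? "re-inject" : "decline");
```

Under these assumptions the last real message is the user's question, so a recover helper that checks `fromUser` on the walk-back result would re-inject rather than decline.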
@@ -0,0 +1,82 @@
1
+ # Upgrade Guide — v0.28.52
2
+
3
+ <!-- bump: patch -->
4
+
5
+ ## What Changed
6
+
7
+ Compaction recovery no longer silently declines when the only from-agent
8
+ message since the user's last question is a PresenceProxy standby or a
9
+ server-emitted delivery ack.
10
+
11
+ **Background**: after context compaction, the CompactionSentinel asks a
12
+ `recoverFn` to re-inject a "continue where you left off" prompt if the user
13
+ has an unanswered question. The recoverFn decided "answered / not answered"
14
+ by looking at the last message in the topic: `fromUser === true` → recover,
15
+ else decline.
16
+
17
+ On topic 6795 (2026-04-17) this failed three times in a row. The user's
18
+ question had been answered… by a PresenceProxy standby (`🔭 Echo is
19
+ currently updating the ledger spec…`) emitted while the real agent was
20
+ still compacting. Because standby messages are `fromUser: false`, the
21
+ recoverFn treated them as "agent already answered" and refused to act.
22
+ Fifteen minutes of user-facing silence while the safety net stared at the
23
+ problem and did nothing.
24
+
25
+ The root cause was a missing filter. `PresenceProxy.isSystemMessage()` and
26
+ `checkLogForAgentResponse()` already knew to skip standby/delivery/lifecycle
27
+ messages. The recoverFn didn't. Three copies of roughly the same prefix
28
+ list were scattered across the code — the recoverFn was the one that
29
+ missed the update.
30
+
31
+ **The fix**:
32
+
33
+ 1. A new shared module `src/messaging/shared/isSystemOrProxyMessage.ts`
34
+ owns the classification: `✓ Delivered`, `🔄 Session restarting`,
35
+ `Session respawned.`, `Session terminated.`, `Send a new message to
36
+ start`, and any message starting with `🔭` (PresenceProxy standby).
37
+ 2. A companion helper `findLastRealMessage(history)` walks a chronological
38
+ history backward and returns the first non-system/non-proxy entry.
39
+ This is the "has the agent actually answered?" predicate.
40
+ 3. `recoverCompactedSession` in `src/commands/server.ts` now uses
41
+ `findLastRealMessage` instead of looking at the last message
42
+ unconditionally. Also widens the history window from 5 to 20 so a
43
+ burst of standby/ack messages can't push the real user turn out of view.
44
+ 4. `PresenceProxy.isSystemMessage()` and
45
+ `checkLogForAgentResponse()` are thin wrappers over the shared
46
+ classifier so all three callsites stay in sync going forward.
47
+
48
+ ## What to Tell Your User
49
+
50
+ - **Compaction recovery actually recovers**: "When your agent's context
51
+ fills up mid-conversation and it needs to resume, the safety net no
52
+ longer gets fooled by its own standby messages. If you asked a question
53
+ and the agent went silent during compaction, it now gets a re-injection
54
+ prompt instead of sitting there for 15 minutes thinking it already
55
+ answered you."
56
+
57
+ ## Summary of New Capabilities
58
+
59
+ | Capability | How to Use |
60
+ |-----------|-----------|
61
+ | Shared classifier for non-response agent messages | automatic — `isSystemOrProxyMessage` replaces three divergent copies |
62
+ | `findLastRealMessage` walk-back helper | automatic — used by `recoverCompactedSession`; callable from any "did the agent answer?" path |
63
+ | Widened compaction-recovery history window (5 → 20) | automatic — survives longer bursts of standby/ack traffic |
64
+
65
+ ## Evidence
66
+
67
+ Reproduction: topic 6795 on 2026-04-17 logged three identical
68
+ `recoverFn declined (no pending work or session gone)` lines at 16:08:14,
69
+ 16:13, 16:18, each immediately after the CompactionSentinel re-injected.
70
+ The topic's last message at each decline point was a
71
+ `🔭 Echo is currently updating the ledger spec…` from PresenceProxy.
72
+
73
+ Tests: `tests/unit/isSystemOrProxyMessage.test.ts` — 25 unit tests
74
+ covering the classifier (empty/whitespace, all 5 system/lifecycle
75
+ prefixes, PresenceProxy 🔭 prefix, real-message negatives, trim handling)
76
+ and the `findLastRealMessage` walk-back (empty, all-system, trailing
77
+ proxies hiding user turn, trailing proxies hiding agent reply, missing
78
+ text fields, chronological-order contract). Includes an explicit
79
+ `topic-6795 repro` case. All pass.
80
+
81
+ Regression tests for CompactionSentinel (16) and presence-proxy-* (39)
82
+ continue to pass — no behavior drift in the surrounding lifecycle.
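
The reason for widening the history window can be illustrated with a small sketch. The function name `hasUnansweredUserTurn`, the `HistoryEntry` shape, and the injected `isNoise` predicate are all hypothetical; declining when nothing real is in view is an assumption about the recover helper's behavior.

```typescript
// Hypothetical entry shape for illustration only.
interface HistoryEntry {
  text: string;
  fromUser: boolean;
}

// Looks only at the last `windowSize` entries, skips noise, and reports
// whether the most recent real message came from the user.
function hasUnansweredUserTurn(
  fullHistory: readonly HistoryEntry[],
  windowSize: number,
  isNoise: (text: string) => boolean,
): boolean {
  if (windowSize <= 0) return false;
  const visible = fullHistory.slice(-windowSize);
  for (let i = visible.length - 1; i >= 0; i--) {
    const entry = visible[i];
    if (entry && !isNoise(entry.text)) return entry.fromUser;
  }
  // Assumption: with no real message in view, the helper declines.
  return false;
}

// One user question followed by six standby messages.
const burst: HistoryEntry[] = [
  { text: "Please proceed here.", fromUser: true },
  ...Array.from({ length: 6 }, () => ({
    text: "🔭 still working…",
    fromUser: false,
  })),
];

const isStandby = (text: string): boolean => text.trim().startsWith("🔭");

console.log(hasUnansweredUserTurn(burst, 5, isStandby));
console.log(hasUnansweredUserTurn(burst, 20, isStandby));
```

With a window of 5, the six standby messages push the user's turn out of view and the helper declines; with a window of 20, the question is still visible.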
@@ -0,0 +1,90 @@
1
+ # 0.28.49 — Evolution Gate Auth Injection
2
+
3
+ ## What Changed
4
+
5
+ Four built-in gate scripts that call the local Instar HTTP API now receive an
6
+ `$INSTAR_AUTH_TOKEN` environment variable when the scheduler runs them, and
7
+ send `Authorization: Bearer $INSTAR_AUTH_TOKEN` with their curl requests.
8
+
9
+ Affected gates: `evolution-proposal-evaluate`, `evolution-proposal-implement`,
10
+ `evolution-overdue-check`, `insight-harvest`.
11
+
12
+ Behind it: `JobScheduler.runGateAsync()` now builds a per-call env object,
13
+ merging `process.env` with `INSTAR_AUTH_TOKEN` when `scheduler.authToken` is
14
+ configured. `JobSchedulerConfig` gained an optional `authToken` field, wired
15
+ up from `instar.json` via `Config.ts`.
16
+
17
+ ## Why
18
+
19
+ Three evolution jobs (`evolution-proposal-evaluate`,
20
+ `evolution-proposal-implement`, `evolution-overdue-check`) plus insight-harvest
21
+ were silently failing every gate check: they curled a localhost endpoint,
22
+ `authMiddleware` returned 401, and the downstream `python3 -c 'json.load(...)'`
23
+ crashed on invalid JSON. The job was then skipped — every cycle, no error
24
+ surface, no telemetry signal other than "this job never does anything."
25
+
26
+ The research cluster
27
+ `cluster-evolution-pipeline-jobs-permanently-skip-gate-scripts-missin` traced
28
+ this to the missing Authorization header and proposed the env-var injection
29
+ pattern.
30
+
31
+ ## Side-Effects Review
32
+
33
+ ### Intended
34
+ - Gates that depend on authenticated localhost endpoints now succeed when an
35
+ `authToken` is configured.
36
+ - Four evolution-pipeline jobs begin firing actual work on next scheduler tick.
37
+
38
+ ### Unintended — Risks Considered
39
+
40
+ **Env leak.** The auth token is exposed to gate shell commands via environment
41
+ variable. Any gate command inherits it. Gate commands are defined in
42
+ `src/commands/init.ts` (built-in) or `instar.json` (user-defined). In a
43
+ single-user local deployment the scope is identical to `process.env` — a gate
44
+ already has full shell access to the machine. No elevation.
45
+
46
+ **Gates that don't need auth.** Unaffected. The env var is present but unused
47
+ when the gate doesn't reference it. Curl without a bearer header still hits
48
+ allowlisted endpoints (`/health`, `/ping`) without issue.
49
+
50
+ **Configs without authToken.** The env var is not set. Gates that reference
51
+ `$INSTAR_AUTH_TOKEN` send `Authorization: Bearer ` (empty bearer), which
52
+ `authMiddleware` treats as unauthorized — same as before. No regression, no
53
+ new failure mode. Public-mode deployments still work because they don't have
54
+ `authToken` configured and the server doesn't require one.
55
+
56
+ **Token visibility in process listings.** `ps -eE` on Linux can expose
57
+ environment variables of running processes. The token is only present in the
58
+ gate child process for the duration of the gate (<10s, enforced by timeout).
59
+ The same token is already in `instar.json` on disk; `ps` leakage is a lesser
60
+ exposure than filesystem read.
61
+
62
+ ### Not Unintended
63
+ - No breaking change for users without `authToken` configured.
64
+ - No schema migration — `JobSchedulerConfig.authToken` is optional.
65
+ - No change to gate timing, retry semantics, or failure-handling.
66
+ - No change to non-gate code paths (the token is only injected in
67
+ `runGateAsync`, not in `runCommandAsync` or the main job command execution).
68
+
69
+ ## Compatibility
70
+
71
+ - Backwards compatible. An existing `instar.json` without `authToken` behaves
72
+ exactly as before.
73
+ - Forwards compatible. When `instar init` runs, it generates `instar.json`
74
+ with a fresh `authToken`, so new installations work out of the box.
75
+
76
+ ## Verification Artifacts
77
+
78
+ - Build: `npm run build` — passes (tsc clean + manifest regenerated).
79
+ - Tests: `npm test -- JobScheduler.test.ts` — 47 tests pass including 2 new:
80
+ 1. Gate sees `token=<configured-value>` when `authToken` is set.
81
+ 2. Gate sees empty `token=` when `authToken` is unset.
82
+ - Manual: built-in manifest shows 4 updated gate definitions with the
83
+ Authorization header.
84
+
85
+ ## Summary of New Capabilities
86
+
87
+ Evolution pipeline jobs and insight-harvest now run to completion when an
88
+ auth token is configured — which is the default post-`instar init` state.
89
+ Previously-silent failures become visible work. No new user-facing commands
90
+ or config fields beyond the documented optional `authToken`.
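
The per-call environment merge described in this review might look like the sketch below. `buildGateEnv` is a hypothetical helper name, not the actual body of `JobScheduler.runGateAsync()`; the base environment is passed in as a parameter so the example does not depend on Node's `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Returns a copy of the base environment, adding INSTAR_AUTH_TOKEN only
// when a token is configured. The base object is never mutated.
function buildGateEnv(baseEnv: Env, authToken?: string): Env {
  const env: Env = { ...baseEnv };
  if (authToken) {
    env["INSTAR_AUTH_TOKEN"] = authToken;
  }
  return env;
}

const base: Env = { PATH: "/usr/bin" };

const withToken = buildGateEnv(base, "example-token");
const withoutToken = buildGateEnv(base);

console.log("INSTAR_AUTH_TOKEN" in withToken);
console.log("INSTAR_AUTH_TOKEN" in withoutToken);
```

Building a fresh object per call keeps the token scoped to the gate child process, which matches the review's statement that non-gate code paths do not receive it.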
@@ -0,0 +1,104 @@
1
+ # Side-Effects Review — default skills: dynamic localhost port
2
+
3
+ **Version / slug:** `skill-port-dynamic-resolution`
4
+ **Date:** `2026-04-17`
5
+ **Author:** `dawn`
6
+ **Second-pass reviewer:** `not required`
7
+
8
+ ## Summary of the change
9
+
10
+ Two source changes. In `src/commands/init.ts`, every `http://localhost:${port}/...` URL inside `installBuiltinSkills` (and adjacent helpers that share the same file) is rewritten to emit `http://localhost:\${INSTAR_PORT:-${port}}/...`, so the generated `.claude/skills/*/SKILL.md` files contain a shell-expandable port reference instead of a number baked in at install time. In `src/core/PostUpdateMigrator.ts`, a new `migrateSkillPortHardcoding()` scans existing default-skill files for bare `http://localhost:NNNN/` URLs and rewrites them to `http://localhost:${INSTAR_PORT:-NNNN}/`, preserving the original port as the fallback default. The migration is scoped to the 14 known-default skill names and is idempotent. Test coverage: `tests/unit/PostUpdateMigrator-skillPortHardcoding.test.ts` — 6 cases.
11
+
12
+ ## Decision-point inventory
13
+
14
+ - `src/commands/init.ts` `installBuiltinSkills` — **modify** — replaces hardcoded port templating with runtime-expandable pattern. 93 occurrences, mechanical find/replace, all inside backtick template strings for shell-executed content.
15
+ - `src/core/PostUpdateMigrator.ts` `migrateSkillPortHardcoding` — **add** — new migration method. Called from `migrate()` between `migrateBuiltinSkills` and `migrateSelfKnowledgeTree`. Scoped to a fixed allowlist of 14 default skill names.
16
+ - `tests/unit/PostUpdateMigrator-skillPortHardcoding.test.ts` — **add** — regression coverage for the migration.
17
+
18
+ ---
19
+
20
+ ## 1. Over-block
21
+
22
+ No block/allow surface. The change is runtime port resolution in user-project skill files. No message content or agent action is gated.
23
+
24
+ Within the migration's own domain: the scan matches `/http:\/\/localhost:(\d+)\//g` in the default-skill set. This pattern is narrow enough that it will not false-positive on natural-language references ("localhost:4040" mentioned in prose without the URL form is untouched). Files outside the 14-name allowlist are never read, so custom skills are never modified — a principle the test suite asserts explicitly.
25
+
26
+ ---
27
+
28
+ ## 2. Under-block
29
+
30
+ No block surface existed before this change. The migration adds no new enforcement — it is a one-way content rewrite. There is nothing to under-block.
31
+
32
+ Edge case: if a user had a default-skill file with a mix of the new dynamic pattern and stray hardcoded ports (e.g., partial manual edits), the idempotency guard (`includes('${INSTAR_PORT:-')`) will cause the migration to skip the file entirely rather than finish the rewrite. That is the safe direction — migrating a partially-edited file risks corrupting the user's edits. Users in that state can manually finish the rewrite or delete the file and let `installBuiltinSkills` regenerate it.
33
+
34
+ ---
35
+
36
+ ## 3. Level-of-abstraction fit
37
+
38
+ The change is at the correct layer. The root cause was install-time templating of a value that should have been runtime-resolved. Fixing the template is the direct fix; fixing existing user files via migration is the correct catch-up mechanism. Neither change rearchitects the skill system — skills remain static markdown files, the only change is that a value inside them resolves later.
39
+
40
+ The dynamic pattern `${INSTAR_PORT:-PORT}` uses POSIX shell parameter expansion, the same primitive the rest of the Instar shell surface depends on. It is a recognized idiom inside curl-heavy bash content, not a novel construct the user has to learn.
41
+
42
+ ---
43
+
44
+ ## 4. Signal vs authority compliance
45
+
46
+ **Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md)
47
+
48
+ **Does this change hold blocking authority with brittle logic?**
49
+
50
+ - [x] No — this change has no block/allow surface.
51
+
52
+ The change is a content rewrite inside skill files. It does not evaluate messages, gate agent actions, or constrain information flow. Signal-vs-authority applies to decision points that judge messages or block work. A port-expansion template does neither.
53
+
54
+ ---
55
+
56
+ ## 5. Interactions
57
+
58
+ **Shadowing:** `installBuiltinSkills` and `migrateSkillPortHardcoding` target overlapping surface. Order matters: `migrateBuiltinSkills` runs first (non-destructive, writes only missing files), then `migrateSkillPortHardcoding` runs (rewrites existing files). A skill newly written by `installBuiltinSkills` in the same migration pass already uses the dynamic pattern, so `migrateSkillPortHardcoding` will see the `${INSTAR_PORT:-` marker and no-op. No double-processing.
59
+
60
+ **Double-fire:** `migrateSkillPortHardcoding` is idempotent — once a file contains the dynamic marker, it is skipped. Test case `is idempotent on a second run after migration` covers this explicitly.
61
+
62
+ **Races:** `PostUpdateMigrator.migrate()` is sequential and runs once per `instar` update. No concurrent access to the same skill file is expected. If two updaters ran simultaneously, they would both read the hardcoded content, both rewrite it, and the second write would overwrite the first with identical content — no corruption.
63
+
64
+ **Feedback loops:** None. The migration is a one-shot rewrite; the rewritten content does not feed back into any system.
65
+
66
+ ---
67
+
68
+ ## 6. External surfaces
69
+
70
+ - **Other agents:** Each agent running instar will get the migration on next `instar` upgrade. Agents on non-default ports gain working skills; agents on port 4040 see no behavioral change (the fallback matches their previous hardcoded value).
71
+ - **Install base users:** Users with customized skill files (renamed default skills, heavily edited content) are protected by the allowlist and the dynamic-marker idempotency check. The migration touches only the 14 canonical default-skill files, and only if they still contain the bare-port pattern.
72
+ - **External systems:** None. The URL targets are all `localhost` — no external traffic shape changes.
73
+ - **Persistent state:** Skill files on disk are rewritten in place. No database, no config, no registry is touched. Rollback = `git checkout` of the skill file or `rm` and re-run `installBuiltinSkills`.
74
+ - **Timing/runtime:** The `${INSTAR_PORT:-NNNN}` expansion runs at shell invocation time. An agent with `INSTAR_PORT` unset gets the fallback; with it set, gets the override. Zero-cost at skill-read time; one environment variable lookup per curl.
75
+
76
+ ---
77
+
78
+ ## 7. Rollback cost
79
+
80
+ Low. Revert: `git revert` the two source commits; the emitted skills would return to hardcoded ports, matching pre-fix behavior. Users who already ran the migration would keep their dynamic-pattern skills, which continue to work (the fallback equals the previous hardcoded value). No persistent state to undo, no agent state to repair, no user communication required.
81
+
82
+ Narrow risk: if a user's `INSTAR_PORT` env var is set to an invalid value (e.g., a port the server isn't listening on), curls will fail after this change where they would have succeeded before on the hardcoded default. Mitigation: the variable is only consulted if the user explicitly exported it. The intersection of "exported `INSTAR_PORT`" and "set it wrong" is small and self-inflicted; the fix for that case is `unset INSTAR_PORT` or set it correctly.
83
+
84
+ ---
85
+
86
+ ## Conclusion
87
+
88
+ The change is narrow, well-scoped, and covered by regression tests. The template fix is mechanical and safe. The migration is scoped to a known allowlist, idempotent, and respects user customizations. The under-block surface is zero; the over-block surface is zero. The worst case in rollback is a return to the original bug, which affected only users on non-default ports and is already worked around today by hand-sed. Ship.
89
+
90
+ No design changes were made as a result of the review.
91
+
92
+ ---
93
+
94
+ ## Evidence pointers
95
+
96
+ - `tests/unit/PostUpdateMigrator-skillPortHardcoding.test.ts` — 6 tests pass:
97
+ - rewrites hardcoded ports in a default skill
98
+ - leaves already-dynamic skills untouched (idempotent)
99
+ - does not touch custom (non-default) skills
100
+ - is idempotent on a second run after migration
101
+ - skips when the skill file does not exist
102
+ - preserves the original port number in the fallback
103
+ - Live template verification: `node -e "const {installBuiltinSkills}=require('./dist/commands/init.js'); ..."` against a temp dir shows 13 of 14 default skills emit `localhost:${INSTAR_PORT:-4040}` and zero emit bare `localhost:4040` (the 14th skill, `autonomous`, is a stub that deploys separately and has no localhost URLs).
104
+ - Source-side verification: `grep -c 'localhost:${port}' src/commands/init.ts` = 0 after the rewrite (was 93).
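
The rewrite and its idempotency guard can be sketched as a pure string transform. The regex and the marker check are taken from this review; the function name `rewriteSkillPorts` is hypothetical, and the allowlist scoping and file I/O of the real `migrateSkillPortHardcoding()` are omitted.

```typescript
// Marker used by the idempotency guard, as quoted in the review.
const DYNAMIC_MARKER = "${INSTAR_PORT:-";

// Matches bare localhost URLs with a numeric port, as quoted in the review.
const BARE_LOCALHOST_URL = /http:\/\/localhost:(\d+)\//g;

// Rewrites bare ports to the shell-expandable form, keeping the original
// port as the fallback. Content that already has the marker is returned as-is.
function rewriteSkillPorts(content: string): string {
  if (content.includes(DYNAMIC_MARKER)) return content;
  return content.replace(
    BARE_LOCALHOST_URL,
    (_match: string, port: string) =>
      "http://localhost:" + DYNAMIC_MARKER + port + "}/",
  );
}

const before = "curl http://localhost:4040/health";
const after = rewriteSkillPorts(before);

console.log(after);
console.log(rewriteSkillPorts(after) === after);
```

A replacer function is used instead of a replacement string so that the literal `$` in the output is never interpreted as a substitution pattern.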
@@ -0,0 +1,145 @@
1
+ # Side-Effects Review — Subscription-first InputGuard
2
+
3
+ **Version / slug:** `0.28.51`
4
+ **Date:** `2026-04-17`
5
+ **Author:** `echo`
6
+ **Second-pass reviewer:** converged via `/spec-converge` (2 iterations, 4 internal reviewers × 2 rounds)
7
+
8
+ ## Summary of the change
9
+
10
+ Closes the "subscription-first intelligence" loop (Telegram topic 6655, continuation of watchdog topic 6269). The principle: every LLM-powered decision in instar defaults to the Claude CLI subscription; the Anthropic API is opt-in only, selected at the single provider-construction layer rather than by each consumer.
11
+
12
+ An audit of direct `ANTHROPIC_API_KEY` runtime reads across `src/` found one real bypass: `InputGuard` was constructed with a raw `apiKey` and called `fetch('https://api.anthropic.com/v1/messages', …)` directly for its Layer 2 topic-coherence review. On machines with Claude CLI but no API key, Layer 2 silently no-opped. Three other hits were already OK (`CoherenceGate`, `StallTriageNurse`, and the `AnthropicIntelligenceProvider` itself); the remaining grep matches were doc strings, CLI help text, Docker env clearing, and redaction regex.
13
+
14
+ The initial fix routed InputGuard through the shared `IntelligenceProvider` but kept the direct-fetch/apiKey path as a legacy fallback. Multi-angle spec review (security, scalability, adversarial, integration) surfaced 12 material findings against that design, with two HIGH — the most important being that an attacker who could influence the LLM's output could deterministically produce malformed JSON to trigger the guard's existing "fail-open on parse error" path, bypassing Layer 2. Convergence iteration 2 reshaped the design:
15
+
16
+ - **The direct-fetch / apiKey fallback is removed entirely.** InputGuard has exactly one transport path: the shared `IntelligenceProvider`. No direct `fetch('https://api.anthropic.com/...')` call remains in InputGuard.
17
+ - **Malformed-JSON path is hardened** — returns `verdict: 'suspicious'` with `confidence: 0.3` and a degradation log, not silent `coherent`. Under the default `action: 'warn'`, this surfaces a non-blocking system-reminder rather than passing silently.
18
+ - **Effective review timeout is floored at 8 seconds** — the CLI subprocess cold-start p99 can exceed the pre-change 3000ms budget; a below-floor config is clamped up.
19
+ - **Startup degradation event uses a generic external impact string** — the detailed "which defenses are down" enumeration stays in local logs; external channels (Telegram/feedback) get a summary only.
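
The hardened parse path and the timeout floor listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in `InputGuard.ts`: the `ReviewResult` shape, the `degraded` flag, the fence-stripping approach, and the confidence of 0 for an empty response are all hypothetical.

```typescript
type Verdict = "coherent" | "suspicious";

interface ReviewResult {
  verdict: Verdict;
  confidence: number;
  degraded: boolean; // hypothetical flag standing in for the degradation log
}

const REVIEW_TIMEOUT_FLOOR_MS = 8000;

// A below-floor configured timeout is clamped up to the floor.
function effectiveReviewTimeout(configuredMs: number): number {
  return Math.max(configuredMs, REVIEW_TIMEOUT_FLOOR_MS);
}

const FENCE = "`".repeat(3);

// Assumption: fenced responses wrap the JSON in a single Markdown fence.
function stripMarkdownFence(raw: string): string {
  let text = raw.trim();
  if (text.length >= 6 && text.startsWith(FENCE) && text.endsWith(FENCE)) {
    text = text.slice(3, -3);
    const newline = text.indexOf("\n");
    // Drop an optional language tag such as "json" on the fence line.
    if (newline !== -1 && /^[a-z]*$/i.test(text.slice(0, newline).trim())) {
      text = text.slice(newline + 1);
    }
  }
  return text.trim();
}

function parseReviewResponse(raw: string): ReviewResult {
  const body = stripMarkdownFence(raw);
  // Empty response maps to coherent, per the regression test list.
  if (body.length === 0) {
    return { verdict: "coherent", confidence: 0, degraded: false };
  }
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null) {
      const record = parsed as Record<string, unknown>;
      const verdict = record["verdict"];
      const confidence = record["confidence"];
      if (
        (verdict === "coherent" || verdict === "suspicious") &&
        typeof confidence === "number"
      ) {
        return { verdict, confidence, degraded: false };
      }
    }
  } catch (_err) {
    // Fall through to the fail-closed-to-warn result below.
  }
  // Malformed output is weak evidence of trouble, not a silent pass.
  return { verdict: "suspicious", confidence: 0.3, degraded: true };
}

console.log(parseReviewResponse("not json at all").verdict);
console.log(effectiveReviewTimeout(3000));
```

The key design point is that the `catch` branch no longer returns `coherent`: an attacker who forces unparseable output gets a warning attached to the message instead of a clean pass.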
20
+
21
+ Files touched:
22
+
23
+ - `src/core/InputGuard.ts` — constructor accepts only `intelligence?: IntelligenceProvider`; the direct Anthropic fetch path is removed entirely; `parseReviewResponse()` extracted as a shared helper with fail-closed-to-warn semantics on malformed JSON; `reviewTimeout` JSDoc documents the 8s floor; inline comments explain the rationale for each change.
24
+ - `src/commands/server.ts` — moves `InputGuard` construction to after `sharedIntelligence` is initialized; no longer reads `ANTHROPIC_API_KEY` for the guard; startup log reports either `via shared IntelligenceProvider` or `provenance + patterns only (no LLM review)` (the "via Anthropic API (direct)" third line is gone); startup `DegradationReporter.report(...)` uses a generic external impact string.
25
+ - `tests/unit/InputGuard.test.ts` — 6 regression tests in a `topic coherence review — IntelligenceProvider routing` describe block: provider-routing happy path (fetch asserted never called), suspicious verdict pass-through, markdown-fenced responses, malformed JSON → suspicious with degradation log, empty response → coherent, provider throw → coherent with degradation log, no-provider → coherent with degradation log.
26
+
27
+ ## Decision-point inventory
28
+
29
+ - `InputGuard.reviewTopicCoherence()` — **modify** — Layer 2 authority (the LLM-backed topic-coherence reviewer). Transport layer reshaped (direct-fetch path gone); authority role and prompt unchanged.
30
+ - `InputGuard.parseReviewResponse()` — **modify** — fail-mode behavior change on malformed JSON: previously `coherent` (silent fail-open), now `suspicious` with confidence 0.3 and a degradation log (fail-closed-to-warn). This is an authority-behavior change on an edge case, compliant with signal-vs-authority (see §4).
31
+ - `startServer()` InputGuard construction — **modify** — reorders so the guard is wired with the shared provider; no longer reads `ANTHROPIC_API_KEY` for guard construction; reports chosen backend.
32
+ - `startServer()` no-intelligence branch — **modify** — adds a `DegradationReporter.report(...)` emission alongside the pre-existing yellow console log. Pure observability signal.
33
+
34
+ ---
35
+
36
+ ## 1. Over-block
37
+
38
+ **What legitimate inputs does this change reject that it shouldn't?**
39
+
40
+ The action default is `warn`, not `block` — so "reject" isn't quite the right word even for the verdict changes. Under `warn`, a `suspicious` verdict produces a non-blocking `<system-reminder>` appended to the message; the message still reaches the session.
41
+
42
+ The two verdict-behavior changes in this release:
43
+
44
+ 1. **Fail-closed-to-warn on malformed JSON.** Previously produced `coherent` (no warning); now produces `suspicious` with `confidence: 0.3` (warning emitted). Under `warn` action this adds a system-reminder to a small class of messages — those where the LLM's output genuinely failed to parse due to non-adversarial causes (provider flakiness, upstream rate-limit retries producing truncated output, etc.). Rate-limit: the existing `trackErrors()` counter escalates to an attention-queue event after 3 degradations in 10 minutes, so sustained parse-failure noise would surface as an operator signal. Legitimate over-warn is bounded by provider-output stability — not expected to be meaningful in normal operation.
45
+
46
+ 2. **Layer 2 now runs on CLI-only machines** (pre-existing bug, not a new over-block). The same LLM authority that runs on API-key machines is doing the review, so the over-block behavior matches what API-key users have already been experiencing in production.
47
+
48
+ Under the opt-in, non-default `block` action, a `suspicious` verdict would block the message. The acceptance criteria do not cover block-mode fail-closed semantics; an operator who enables block mode should separately evaluate whether the 0.3-confidence parse-error path should stay warn-only regardless of action. The spec notes that this nuance is handled in the SessionManager verdict-consumption layer rather than in InputGuard, and SessionManager's action routing is unchanged by this release.
49
+
50
+ ---
51
+
52
+ ## 2. Under-block
53
+
54
+ **What failure modes does this still miss?**
55
+
56
+ 1. **Pre-change under-block (the one this fixes):** CLI-only machines had no Layer 2 review. This release closes that.
57
+ 2. **Transport-flake fail-open (intentional):** if the provider throws or times out, Layer 2 returns `coherent` with `confidence: 0` and fires a degradation log. Rationale: routine network flakes shouldn't produce warn-spam. Authority-level dissent (suspicious verdict) would have come through the parse path with a proper verdict. The `trackErrors()` attention-queue escalation keeps this from being silent.
58
+ 3. **Consecutive-failure → sticky-suspicious state (deferred):** a motivated attacker who can sustain CLI disruption could keep the transport fail-open path active indefinitely. This is acknowledged in the spec's Deferred #1 — consecutive-failure escalation to a sticky suspicious default needs its own design pass to balance safety against warn-spam. The `trackErrors()` observability floor (3-in-10min → attention queue) is the current mitigation.
59
+ 4. **Provider-layer trust (deferred):** a local attacker who compromises the Claude CLI binary controls every verdict. Spec Deferred #5 — provider-level trust is scoped to `ClaudeCliIntelligenceProvider`, not to guard routing.
60
+
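The observability floor mentioned in items 2 and 3 (three failures within ten minutes escalate to the attention queue) can be pictured as a sliding-window counter. The sketch below is an assumption about the general shape of such a tracker, not the actual `trackErrors()` implementation; the class name `ErrorWindow`, its constructor arguments, and the reset-after-escalation behavior are all hypothetical.

```typescript
// Hypothetical sliding-window failure tracker; not the real trackErrors() code.
class ErrorWindow {
  private timestamps: number[] = [];
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly escalate: (count: number) => void;

  constructor(
    threshold: number, // e.g. 3 failures
    windowMs: number, // e.g. 10 minutes in milliseconds
    escalate: (count: number) => void, // e.g. push to an attention queue
  ) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.escalate = escalate;
  }

  // Record one failure at time `now` (ms). Returns true if it escalated.
  record(now: number): boolean {
    this.timestamps.push(now);
    // Keep only failures that are still inside the window.
    const cutoff = now - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.threshold) {
      this.escalate(this.timestamps.length);
      // Assumed: reset so one burst yields one escalation.
      this.timestamps = [];
      return true;
    }
    return false;
  }
}
```

A tracker like this keeps isolated flakes quiet while still surfacing a sustained outage; it does not address the sticky-suspicious escalation that the spec defers.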
61
+ ---
62
+
63
+ ## 3. Level-of-abstraction fit
64
+
65
+ This change operates at the *transport* layer beneath the existing Layer 2 authority. It does not invent a new authority or a new detector — it routes an existing authority through the already-blessed `IntelligenceProvider` abstraction, same as `CoherenceReviewer` (`3d4240a`) and `StallTriageNurse`.
66
+
67
+ The fail-closed-to-warn change on malformed JSON is at the boundary between transport and authority: when the authority's *output* is malformed, we treat that as weak "suspicious" evidence rather than "allow." This is explicitly a parser-layer policy (what a parse-failed response means), not a new detector with independent blocking authority.
68
+
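The boundary between transport and authority described above can be sketched as a single mapping from "what came back" to a verdict. This is a minimal illustration under stated assumptions, not the shipped `parseReviewResponse()`: the `ProviderOutcome` type, the function name `interpretOutcome`, and the JSON field names `verdict` and `confidence` are hypothetical, and treating valid JSON with the wrong shape the same as malformed JSON is also an assumption. The two policy outcomes (transport failure gives `coherent` at confidence 0, parse failure gives `suspicious` at confidence 0.3) come from the text.

```typescript
type Verdict = "coherent" | "suspicious";

interface ReviewResult {
  verdict: Verdict;
  confidence: number;
  source: "authority" | "parse-failure" | "transport-failure";
}

// Hypothetical model of what the provider call produced.
type ProviderOutcome =
  | { kind: "response"; raw: string }
  | { kind: "transport-error"; error: string };

function interpretOutcome(outcome: ProviderOutcome): ReviewResult {
  if (outcome.kind === "transport-error") {
    // Transport flake: fail open so routine network errors do not warn-spam.
    return { verdict: "coherent", confidence: 0, source: "transport-failure" };
  }
  try {
    const parsed: unknown = JSON.parse(outcome.raw);
    if (typeof parsed === "object" && parsed !== null) {
      const fields = parsed as Record<string, unknown>;
      const verdict = fields["verdict"];
      const confidence = fields["confidence"];
      if (
        (verdict === "coherent" || verdict === "suspicious") &&
        typeof confidence === "number"
      ) {
        // The authority answered in a usable form; pass its verdict through.
        return { verdict, confidence, source: "authority" };
      }
    }
  } catch {
    // Malformed JSON falls through to the parse-failure policy below.
  }
  // The authority's output is unusable: weak "suspicious" evidence, not "allow".
  return { verdict: "suspicious", confidence: 0.3, source: "parse-failure" };
}
```

The parser never upgrades its own judgment into a high-confidence verdict; it only decides what an unusable answer means.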
69
+ This is the established pattern in the repo. Alternatives considered and rejected:
70
+
71
+ - **Bypass sharedIntelligence and spawn a per-InputGuard provider** — would duplicate config-resolution logic and produce a second source of truth for the backend choice. Rejected.
72
+ - **Keep direct `fetch` and add an `if (sharedIntelligence) skipFetch` branch at the call site** — brittle, would need duplication for every consumer. Rejected; the initial design's legacy-fallback version was a softer form of this and was rejected at iteration 2.
73
+ - **Make `IntelligenceProvider` mandatory at the type level** — would produce TypeScript noise in test fixtures for no real-world gain. The guard degrades loudly when no provider is available, which is the correct runtime behavior.
74
+
75
+ ---
76
+
77
+ ## 4. Signal vs authority compliance
78
+
79
+ **Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md)
80
+
81
+ **Does this change hold blocking authority with brittle logic?**
82
+
83
+ - [x] No — the existing Layer 2 authority (LLM-backed topic-coherence reviewer) is preserved verbatim at the prompt level. The change modifies the transport used to reach it and the interpretation of a parse-failed response. No brittle check is acquiring blocking authority.
84
+
85
+ Narrative: InputGuard Layer 2 *is* an authority in the signal-vs-authority sense — it's the LLM-backed gate that evaluates topic coherence with full context. That authority is unchanged. Layers 1 (provenance tags) and 1.5 (injection-pattern regex) are detectors that operate in warn-only mode and feed signals; they're untouched by this change. The new `DegradationReporter.report(...)` call is a pure observability signal with no blocking surface.
86
+
87
+ The fail-closed-to-warn change on parse errors is authority-adjacent, not a new authority: we are defining what a parse-failed response means. Under warn-only action, this produces a non-blocking warning; the session model remains the decision-maker. A brittle parser is not acquiring blocking authority; it is mapping authority-absence to the safer of the two warn-only outcomes.
88
+
89
+ No brittle check acquires blocking authority. No existing authority is being replaced with a brittle check. The principle holds.
90
+
91
+ ---
92
+
93
+ ## 5. Interactions
94
+
95
+ - **Shadowing:** `InputGuard` construction moved in `startServer()` from ~line 1988 to ~line 2069 (after `sharedIntelligence` is initialized). Manually inspected the intervening 80 lines: none of them invoke the input guard. `sessionManager.setInputGuard()` is the only consumer, and `SessionManager` guards every access with `if (this.inputGuard)` — so even if a message arrived pre-wiring, it would pass through (prior behavior), not crash.
96
+ - **Double-fire:** `reviewTopicCoherence()` runs exactly once per incoming cross-topic message, as before. There is no longer a second branch (the fetch path is gone), so double-fire is structurally impossible.
97
+ - **Races:** Pure synchronous construction during startup. `sharedIntelligence` is created-and-stored before `InputGuard` is constructed. No lifecycle races.
98
+ - **Timeout interaction:** the effective timeout is `max(config.reviewTimeout ?? 0, 8000ms)`. An operator who explicitly sets `reviewTimeout: 3000` in `instar.json` will get 8000ms at runtime — this is documented in the `InputGuardConfig.reviewTimeout` JSDoc with rationale. The floor cannot be overridden downward by design; the direct-HTTPS path that could meaningfully use a 3000ms budget no longer exists.
99
+ - **Feedback loops:** The new `DegradationReporter.report()` event feeds console + disk + Telegram + feedback. None of those sinks feed back into the IntelligenceProvider initialization. No loop.
100
+
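The timeout interaction above reduces to one expression. The sketch below assumes only the `reviewTimeout` field named in the text; the interface name `InputGuardConfigLike`, the constant name, and the helper `effectiveReviewTimeout` are hypothetical.

```typescript
// Only the field discussed in the text; the real config has more fields.
interface InputGuardConfigLike {
  reviewTimeout?: number; // milliseconds
}

const REVIEW_TIMEOUT_FLOOR_MS = 8000;

// The floor always wins when the configured value is lower or missing.
function effectiveReviewTimeout(config: InputGuardConfigLike): number {
  return Math.max(config.reviewTimeout ?? 0, REVIEW_TIMEOUT_FLOOR_MS);
}
```

Under this rule a configured `reviewTimeout: 3000` yields 8000 at runtime, an unset value also yields 8000, and only values above the floor take effect as written.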
101
+ ---
102
+
103
+ ## 6. External surfaces
104
+
105
+ - **Other agents on the same machine:** None. InputGuard is per-server.
106
+ - **Other users of the install base:** On first upgrade, machines with Claude CLI will begin seeing Layer 2 topic-coherence warnings they weren't seeing before. This is the *intended* behavior; the previous silent no-op was the bug. Machines with `ANTHROPIC_API_KEY` but no provider would degrade to provenance + patterns only (no LLM review), with a startup `DegradationReporter` event, because the direct-fetch path they previously took is now removed. In practice this case is narrow: server startup already tries to construct `AnthropicIntelligenceProvider.fromEnv()` as a last resort in the `sharedIntelligence` init (server.ts:2031), so a machine with just an API key gets `sharedIntelligence` populated there, and InputGuard receives that provider. The new "no review" path is reached only on machines with neither the CLI nor a reachable API key, the same machines that previously had the silent no-op.
107
+ - **External systems:** No new egress. When routed via `ClaudeCliIntelligenceProvider`, Layer 2 uses the local Claude CLI subprocess (which calls Anthropic with the user's subscription credentials, no billing impact). Direct `api.anthropic.com` egress from InputGuard is removed.
108
+ - **Persistent state:** No schema changes. `security.jsonl` format unchanged. `topic-session-registry.json` unchanged.
109
+ - **Startup log format:** The `Input Guard: enabled (...)` line gained a backend suffix: `via shared IntelligenceProvider` | `provenance + patterns only (no LLM review)` (the `via Anthropic API (direct)` line is gone). Any log-scraper parsing this line for an exact string would need to accept the suffix. (No known scraper depends on it.)
110
+ - **Timing/runtime:** Review timeout on the CLI path is floored at 8s — longer than the pre-change 3000ms default on machines that had API keys. Under `warn` action (default) this is benign; no user-visible latency regression because the review is on the incoming-message path, not the agent's response path.
111
+
112
+ ---
113
+
114
+ ## 7. Rollback cost
115
+
116
+ Pure code change. Revert the commit, ship as a patch. No persistent state migration. No agent state repair. After rollback, CLI-only machines return to the pre-change silent no-op; API-key-only machines return to the direct-fetch path (no behavioral regression). Effort: one revert commit + one release bump, <10 minutes.
117
+
118
+ ---
119
+
120
+ ## Conclusion
121
+
122
+ The `/spec-converge` iterative review produced substantive design changes:
123
+
124
+ - Iteration 1 surfaced 12 material findings across 4 reviewer perspectives. The most consequential were the deterministic malformed-JSON bypass (adversarial HIGH #1) and the subscription→billed-API silent downgrade (adversarial MEDIUM #3). Both are closed in the converged design by dropping the apiKey fallback entirely and hardening the parse-failure path.
125
+ - Iteration 2 surfaced 5 LOW findings, all addressed or explicitly acknowledged. The actionable ones (timeout semantics) were resolved via code + spec alignment and a JSDoc on `InputGuardConfig.reviewTimeout`.
126
+
127
+ Tests: 6 regression cases in `InputGuard.test.ts`, all 35 unit + 36 e2e tests green. TypeScript clean. Full convergence report at `docs/specs/reports/subscription-first-input-guard-convergence.md`.
128
+
129
+ Clear to ship.
130
+
131
+ ---
132
+
133
+ ## Second-pass review
134
+
135
+ Converged via `/spec-converge` multi-reviewer process (2 iterations, 4 internal reviewers × 2 rounds = 8 independent audits). Full finding-by-finding resolution in the convergence report. Final verdict: zero material findings in iteration 2; remaining LOW items addressed or explicitly deferred with rationale.
136
+
137
+ ---
138
+
139
+ ## Evidence pointers
140
+
141
+ - Pre-fix behavior: commit `bacf7fc` on `src/core/InputGuard.ts` shows `if (!this.config.topicCoherenceReview || !this.apiKey) return { verdict: 'coherent', ... }` — the silent no-op, and the direct fetch path to `api.anthropic.com` below it.
142
+ - Post-fix behavior: `src/core/InputGuard.ts` has no `apiKey` field, no `fetch` call, and `parseReviewResponse()` returns suspicious on malformed JSON.
143
+ - Audit evidence: `grep -rn "ANTHROPIC_API_KEY" src/` (recursive, since `src/` is a directory): 4 runtime reads walked; only `server.ts:1990` (now removed) was a real bypass.
144
+ - Convergence report: `docs/specs/reports/subscription-first-input-guard-convergence.md` (full iteration log and finding resolutions).
145
+ - Spec: `docs/specs/subscription-first-input-guard.md` (tagged `review-convergence` + `approved: true` post-convergence).