instar 1.3.582 → 1.3.583

@@ -1,8 +1,8 @@
  {
  "$schema": "./builtin-manifest.schema.json",
  "schemaVersion": 1,
- "generatedAt": "2026-06-15T22:53:11.307Z",
- "instarVersion": "1.3.582",
+ "generatedAt": "2026-06-15T23:23:58.124Z",
+ "instarVersion": "1.3.583",
  "entryCount": 201,
  "entries": {
  "hook:session-start": {
  "hook:session-start": {
@@ -0,0 +1,51 @@
+ # Upgrade Guide — vNEXT
+
+ <!-- assembled-by: assemble-next-md -->
+ <!-- bump: patch -->
+
+ ## What Changed
+
+ A short timer drift that recurs while load sits in the **1.0–1.5/core band** slipped past all three
+ existing checks: the load guard fires only above 1.5/core, the consecutive burst floor resets
+ whenever on-time ticks fall between drifts, and the drifts' ~2-minute cadence outlasted the 60s
+ cooldown. Each drift therefore emitted a **false `wake`**, firing the full wake-recovery cascade
+ (tunnel restart, Slack reconnect, mesh-lease churn, topic failover). That cascade was the source of a
+ class of multi-machine UX failures: replies that lose the conversation thread, messages that get no
+ reply, and "remote typing is disabled" (the 2026-06-15 incident, measured at ~1.13/core).
+
+ The detector now adds a **recurring-drift guard**: a short drift that arrives within
+ `recentDriftWindowMs` (default 5 min) of a prior short drift, while load is oversubscribed
+ (`> recentDriftLoadFloor`, default 1.0/core), is treated as recurring CPU starvation and suppressed.
+ This generalizes the burst floor from *consecutive* ticks to *recent* ticks, and the load gate
+ confines it to oversubscribed hosts, closing the 1.0–1.5/core band that the hard guard leaves open.
+
+ ## What to Tell Your User
+
+ - **Fewer spurious reconnects on a busy laptop**: "When my machine got busy, I used to mistake the
+ slowdown for the computer going to sleep, which kicked off a disruptive recovery: dropping the
+ conversation thread, going quiet, or disabling typing. I now recognize that pattern and stay calm,
+ so those multi-machine glitches should largely stop."
+ - **Real sleeps still handled**: "If the machine genuinely sleeps, I still notice and recover
+ properly; nothing changes there."
+
+ ## Summary of New Capabilities
+
+ | Capability | How to Use |
+ |-----------|-----------|
+ | Suppress false "wake" events caused by CPU starvation on a loaded host | Automatic |
+ | Tune or disable the new guard | `monitoring.sleepWake.recentDriftWindowMs` / `.recentDriftLoadFloor` (set `recentDriftWindowMs` to `0` to disable) |
+
+ ## Evidence
+
+ Reproduction (live, 2026-06-15): on a host measured at loadavg ~18 on 16 cores (~1.13/core, above
+ 1.0 but below the 1.5 hard guard), `server.log` showed `[SleepWakeDetector] Wake detected after
+ ~33s/~21s sleep` recurring roughly every 2 minutes while the host was in active use (not asleep),
+ and each entry triggered the wake-recovery cascade. The drifts were isolated (on-time ticks between
+ them reset the consecutive counter) and ~2 min apart (outlasting the 60s cooldown), so neither the
+ burst floor nor the cooldown caught them.
+
+ After the fix (verified by 45/45 sleep-wake unit tests across 5 files, covering both sides of the
+ boundary): a recurring short drift in the 1.0–1.5 band is suppressed (no `wake` emitted; recorded as
+ `cpu-starvation`), while a genuinely isolated short drift, any drift on a light or idle host
+ (ratio ≤ 1.0), and every long (real) sleep still emit. Setting `recentDriftWindowMs: 0` restores the
+ prior behavior exactly. `tsc` is clean.
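
A minimal TypeScript sketch of the decision order this guide describes, assuming a simplified detector. Only `recentDriftWindowMs`, `recentDriftLoadFloor`, `maxLoadRatio`, `longSleepFloorSeconds` and `lastShortDriftAtMs` come from these notes. The `RecurringDriftModel` class, its `onDrift` method, the `Decision` strings, the inclusive window comparison, the rule that every short drift refreshes `lastShortDriftAtMs`, and the 60s long-sleep floor are hypothetical. The consecutive burst floor and the 60s cooldown are omitted so the new branch stands alone. This is not the code in `src/core/SleepWakeDetector.ts`.

```typescript
// Hypothetical model of the recurring-drift guard; not instar's implementation.
// Burst floor and cooldown are intentionally omitted.

interface GuardConfig {
  longSleepFloorSeconds: number; // drifts at or above this count as real sleeps
  maxLoadRatio: number; // existing hard load guard (fires above 1.5/core)
  recentDriftWindowMs: number; // new knob: default 300_000 (5 min); 0 disables
  recentDriftLoadFloor: number; // new knob: default 1.0/core
}

type Decision = "emit" | "suppress:cpu-starvation";

class RecurringDriftModel {
  // Timestamp of the previous short drift, or null if none seen yet.
  private lastShortDriftAtMs: number | null = null;

  constructor(private readonly cfg: GuardConfig) {}

  onDrift(driftSeconds: number, loadRatio: number, nowMs: number): Decision {
    // Long (real) sleeps are always emitted; no guard applies to them.
    if (driftSeconds >= this.cfg.longSleepFloorSeconds) return "emit";

    // Assumption: every short drift, emitted or suppressed, refreshes the timestamp.
    const priorMs = this.lastShortDriftAtMs;
    this.lastShortDriftAtMs = nowMs;

    // Existing hard load guard: severe oversubscription means CPU starvation.
    if (loadRatio > this.cfg.maxLoadRatio) return "suppress:cpu-starvation";

    // New guard: a short drift soon after another one on an oversubscribed host.
    const recurring =
      this.cfg.recentDriftWindowMs > 0 &&
      priorMs !== null &&
      nowMs - priorMs <= this.cfg.recentDriftWindowMs;
    if (recurring && loadRatio > this.cfg.recentDriftLoadFloor) {
      return "suppress:cpu-starvation";
    }

    return "emit";
  }
}

// Replay of the 2026-06-15 numbers: loadavg ~18 on 16 cores, drifts ~2 min apart.
const model = new RecurringDriftModel({
  longSleepFloorSeconds: 60, // illustrative value, not necessarily instar's default
  maxLoadRatio: 1.5,
  recentDriftWindowMs: 300_000,
  recentDriftLoadFloor: 1.0,
});
const loadRatio = 18 / 16; // 1.125, i.e. ~1.13/core
console.log(model.onDrift(33, loadRatio, 0)); // emit (no prior short drift)
console.log(model.onDrift(21, loadRatio, 120_000)); // suppress:cpu-starvation
```

With the incident's numbers the first drift emits and the recurrence ~2 min later is suppressed. At a ratio of 1.0 or below, with `recentDriftWindowMs: 0`, or for a drift more than 5 min after the previous one, both drifts emit, which matches the behavior listed under Evidence.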
@@ -0,0 +1,46 @@
+ # Side-effects — SleepWakeDetector recurring-drift guard (gap #3 / CMT-1563)
+
+ ## What changed (3 files)
+
+ - `src/core/SleepWakeDetector.ts`: new config `recentDriftWindowMs` (default 300000 ms, i.e. 5 min)
+ and `recentDriftLoadFloor` (default 1.0); new state `lastShortDriftAtMs`; and a new suppression
+ branch in `start()` (after the load guard, before the cooldown) that suppresses a SHORT drift
+ recurring within the window while `loadRatio > recentDriftLoadFloor`. It reuses the existing
+ `cpu-starvation` suppression reason, so the stats/telemetry type is unchanged.
+ - `src/core/types.ts`: `config.monitoring.sleepWake` gains the two optional knobs (mirroring the
+ existing `maxLoadRatio` plumbing; no ConfigDefaults change, no migration).
+ - `src/commands/server.ts`: the production `new SleepWakeDetector({...})` boot site forwards the
+ two new knobs from `config.monitoring.sleepWake`.
+
+ ## Behavioral side-effects
+
+ - **On a moderately loaded host (loadRatio in the 1.0–1.5 band):** a short timer drift that recurs
+ within 5 min of a prior short drift no longer emits a `wake`, so it no longer triggers the
+ wake-recovery cascade (tunnel restart / Slack reconnect / mesh-lease churn / topic failover). This
+ is the fix for the 2026-06-15 multi-machine UX cascade.
+ - **No change** on a light/idle host (ratio ≤ 1.0): repeated short drifts still emit (the existing
+ "genuinely-isolated drifts both emit" behavior is preserved, as the unchanged tests verify).
+ - **No change** for long sleeps (≥ `longSleepFloorSeconds`): always emitted, recovery preserved.
+ - **No change** for an isolated short drift (no prior drift in the window): still emits.
+ - The wake-reaper's cumulative-sleep accounting is unaffected: suppressed drifts were already
+ excluded from `wakeHistory`, and this branch suppresses them the same way.
+
+ ## Risk + rollback
+
+ - HIGH-risk surface (session lifecycle / recovery trigger). Fail-safe direction: the branch only
+ ADDS suppression, and only for a SHORT drift on an OVERSUBSCRIBED host; it can never suppress a
+ real long sleep or change light-host behavior.
+ - Rollback lever: `config.monitoring.sleepWake.recentDriftWindowMs: 0` disables the guard without
+ redeploying code (restoring exactly today's behavior).
+
+ ## Tests
+
+ - `tests/unit/sleep-wake-starvation-guard.test.ts`: new `describe('recurring-drift guard for the
+ moderate-load band')` with 5 cases (band-suppress, light-host-emit, isolated-emit, disable-lever,
+ long-sleep-exempt). Full sleep-wake unit suite: 39/39 green. `tsc` is clean on the touched files.
+
+ ## Migration parity
+
+ The fix ships in the class default, so every agent gets it on update (the boot site reads optional
+ config, but the default lives in the constructor). No `.claude`/hook/skill/CLAUDE.md template change
+ is required: this is an internal monitoring guard, not an agent-facing capability or route.
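
The "class default plus optional config" pattern in these notes has one pitfall worth showing. Because `recentDriftWindowMs: 0` is the rollback lever, defaults must be applied with nullish coalescing (`??`) rather than `||`, which would treat `0` as missing and silently re-enable the guard. The TypeScript below is a hypothetical sketch of that resolution step. `SleepWakeKnobs`, `ResolvedSleepWakeKnobs`, `DEFAULT_KNOBS` and `resolveKnobs` are invented for illustration and are not instar's `types.ts` or `server.ts` code. The `maxLoadRatio` default of 1.5 is inferred from the upgrade guide's "fires only above 1.5/core" rather than stated as a config default.

```typescript
// Hypothetical sketch of optional-knob resolution; names are illustrative only.
// The three knob names and their defaults come from the changelog.

interface SleepWakeKnobs {
  maxLoadRatio?: number;
  recentDriftWindowMs?: number;
  recentDriftLoadFloor?: number;
}

// Same fields, all required after defaults are applied.
type ResolvedSleepWakeKnobs = Required<SleepWakeKnobs>;

// Mirrors the note that defaults live with the detector, not in ConfigDefaults.
const DEFAULT_KNOBS: ResolvedSleepWakeKnobs = {
  maxLoadRatio: 1.5, // assumed from "the load guard fires only above 1.5/core"
  recentDriftWindowMs: 300_000, // 5 min
  recentDriftLoadFloor: 1.0,
};

function resolveKnobs(config: SleepWakeKnobs = {}): ResolvedSleepWakeKnobs {
  return {
    // `??` keeps an explicit 0; `||` would replace it and turn the guard back on.
    maxLoadRatio: config.maxLoadRatio ?? DEFAULT_KNOBS.maxLoadRatio,
    recentDriftWindowMs: config.recentDriftWindowMs ?? DEFAULT_KNOBS.recentDriftWindowMs,
    recentDriftLoadFloor: config.recentDriftLoadFloor ?? DEFAULT_KNOBS.recentDriftLoadFloor,
  };
}

// No config: the guard is on with its defaults.
console.log(resolveKnobs().recentDriftWindowMs); // 300000
// Rollback lever: an explicit 0 survives resolution and disables the guard.
console.log(resolveKnobs({ recentDriftWindowMs: 0 }).recentDriftWindowMs); // 0
```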