instar 1.3.583 → 1.3.584

Files changed (33)
  1. package/dist/config/ConfigDefaults.d.ts.map +1 -1
  2. package/dist/config/ConfigDefaults.js +13 -0
  3. package/dist/config/ConfigDefaults.js.map +1 -1
  4. package/dist/core/PlaywrightProfileRegistry.d.ts +269 -0
  5. package/dist/core/PlaywrightProfileRegistry.d.ts.map +1 -0
  6. package/dist/core/PlaywrightProfileRegistry.js +640 -0
  7. package/dist/core/PlaywrightProfileRegistry.js.map +1 -0
  8. package/dist/core/PostUpdateMigrator.d.ts +21 -0
  9. package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
  10. package/dist/core/PostUpdateMigrator.js +195 -0
  11. package/dist/core/PostUpdateMigrator.js.map +1 -1
  12. package/dist/core/devGatedFeatures.d.ts.map +1 -1
  13. package/dist/core/devGatedFeatures.js +6 -0
  14. package/dist/core/devGatedFeatures.js.map +1 -1
  15. package/dist/core/types.d.ts +13 -0
  16. package/dist/core/types.d.ts.map +1 -1
  17. package/dist/core/types.js.map +1 -1
  18. package/dist/scaffold/templates.d.ts.map +1 -1
  19. package/dist/scaffold/templates.js +8 -1
  20. package/dist/scaffold/templates.js.map +1 -1
  21. package/dist/server/CapabilityIndex.d.ts.map +1 -1
  22. package/dist/server/CapabilityIndex.js +1 -0
  23. package/dist/server/CapabilityIndex.js.map +1 -1
  24. package/dist/server/routes.d.ts +8 -0
  25. package/dist/server/routes.d.ts.map +1 -1
  26. package/dist/server/routes.js +341 -0
  27. package/dist/server/routes.js.map +1 -1
  28. package/package.json +1 -1
  29. package/src/data/builtin-manifest.json +63 -63
  30. package/src/scaffold/templates.ts +9 -1
  31. package/upgrades/1.3.584.md +84 -0
  32. package/upgrades/side-effects/playwright-profile-registry.md +140 -0
  33. package/upgrades/1.3.583.md +0 -51
@@ -0,0 +1,84 @@
1
+ # Upgrade Guide — vNEXT
2
+
3
+ <!-- assembled-by: assemble-next-md -->
4
+ <!-- bump: minor -->
5
+
6
+ ## What Changed
7
+
8
+ **A durable per-agent registry mapping each Playwright browser profile to the accounts it is logged into — plus boot-time awareness of what browser access the agent actually has.** Until now the agent self-unblocked by driving a real browser (Playwright MCP) logged into real accounts, but there was no authoritative record of *which profile holds which account*. That knowledge lived only as ~21 scattered, partly-contradictory `operationalFacts` — which led the agent to ask the operator to act (or grind a credential treadmill) instead of resolving the right profile itself.
9
+
10
+ The new `PlaywrightProfileRegistry` (`src/core/PlaywrightProfileRegistry.ts`) is the missing data + awareness + selection + activation layer:
11
+
12
+ - A durable, machine-local state file `state/playwright-profiles.json` mapping each **profile** (a physical browser user-data-dir on THIS machine) to the **accounts** it is responsible for — by vault-secret NAME only, **never values**.
13
+ - A compact boot-awareness pointer injected at session start (`GET /playwright-profiles/session-context`), so the agent knows from message one what browser access it has and **as whom** — operator-owned accounts flagged loud (Know Your Principal), login state rendered as last-asserted staleness (advisory, never a guarantee).
14
+ - Routes to list / create / assign / resolve / activate profiles. `resolve` picks the owning profile for a `(service, identity)` and forces disambiguation rather than silently picking a privileged account; `activate` rewrites the MCP config and restarts the session onto the chosen profile.
15
+
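The `resolve` behavior described in the list above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `Profile`, `Account`, and `Candidate` shapes and the `resolveProfile` function are hypothetical, and the rule is simplified to "more than one matching account means the caller must disambiguate".

```typescript
type Owner = "agent" | "operator";

interface Account {
  service: string;
  identity: string;
  owner: Owner;
  vaultRefs: string[]; // vault secret NAMES only, never values
}

interface Profile {
  id: string;
  accounts: Account[];
}

interface Candidate {
  profileId: string;
  identity: string;
  owner: Owner;
}

type ResolveResult =
  | { ambiguous: false; match: Candidate | null }
  | { ambiguous: true; candidates: Candidate[] };

// Hypothetical resolver: collect every account matching the query and refuse
// to choose when more than one matches, so a privileged (operator-owned)
// account is never picked silently.
function resolveProfile(
  profiles: Profile[],
  service: string,
  identity?: string,
): ResolveResult {
  const candidates: Candidate[] = [];
  for (const profile of profiles) {
    for (const account of profile.accounts) {
      if (account.service !== service) continue;
      if (identity !== undefined && account.identity !== identity) continue;
      candidates.push({
        profileId: profile.id,
        identity: account.identity,
        owner: account.owner,
      });
    }
  }
  if (candidates.length > 1) return { ambiguous: true, candidates };
  return { ambiguous: false, match: candidates[0] ?? null };
}
```

A service-only query that matches both an agent-owned and an operator-owned account returns `{ ambiguous: true, candidates }`; adding `identity` narrows it to a single match.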
16
+ **Safety posture (the honesty disciplines that keep this from re-creating the scattered-facts problem):** no secret VALUE is ever stored, returned, injected, or resolved (names only). Every write is audited (`logs/playwright-profiles.jsonl`). A corrupt registry file fails CLOSED for writes (never auto-overwritten) and OPEN for the boot block (injects nothing). Caller-supplied profile dirs are path-jailed to the agent home. The seed is metadata-only — it never touches `.mcp.json` / `.claude/settings.json`, so an update can never regress another agent's shared browser login.
17
+
18
+ **Rollout:** the whole feature is **dev-gated** (`playwrightRegistry.enabled` omitted → live on a development agent, **dark on the fleet** — routes 503, the boot block injects nothing). The only destructive op, `activate` (config rewrite + session restart), additionally ships `dryRun: true` — it LOGS the intended rewrite/refresh and performs NEITHER until a deliberate `dryRun: false`. Existing agents pick it up via full migration parity (state seed, session-start hook, CLAUDE.md awareness section, config default + strip-false migration).
19
+
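The rollout rules in the paragraph above (dev-gate plus dry-run) can be sketched as a small decision function. This is a hedged illustration: `isFeatureLive`, `planActivate`, and the outcome shape are hypothetical names, and the rule "an explicit `enabled` value wins, an omitted one falls back to the agent kind" is an assumption inferred from the notes rather than confirmed package behavior.

```typescript
interface PlaywrightRegistryConfig {
  enabled?: boolean; // omitted by default (the dev-gate convention)
  dryRun?: boolean; // ships as true
}

// Assumption: an explicit value wins; when omitted, the feature is live only
// on a development agent and dark everywhere else.
function isFeatureLive(
  cfg: PlaywrightRegistryConfig,
  isDevelopmentAgent: boolean,
): boolean {
  return cfg.enabled ?? isDevelopmentAgent;
}

type ActivateOutcome =
  | { status: 503 }
  | { status: 200; performed: false; log: string }
  | { status: 200; performed: true };

// Hypothetical planner for the activate route: 503 when dark, log-only while
// dryRun is true, and a real switch only after a deliberate dryRun: false.
function planActivate(
  cfg: PlaywrightRegistryConfig,
  isDevelopmentAgent: boolean,
  profileId: string,
): ActivateOutcome {
  if (!isFeatureLive(cfg, isDevelopmentAgent)) return { status: 503 };
  const dryRun = cfg.dryRun ?? true;
  if (dryRun) {
    return {
      status: 200,
      performed: false,
      log: `dry-run: would rewrite MCP config and restart onto "${profileId}"`,
    };
  }
  return { status: 200, performed: true };
}
```

With an empty config, a fleet agent gets a 503 and a development agent gets a log-only outcome; only `{ dryRun: false }` on a live agent performs the switch.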
20
+ A short timer drift that recurs while load sits in the **1.0–1.5/core band** slipped past both
21
+ existing guards: the load guard fires only above 1.5/core, and the consecutive burst floor resets
22
+ whenever on-time ticks fall between drifts. Its ~2-minute cadence also outlasted the 60s cooldown.
23
+ So each isolated drift emitted a **false `wake`**, firing the full wake-recovery cascade (tunnel
24
+ restart, Slack reconnect, mesh-lease churn, topic failover) — the source of a class of multi-machine
25
+ UX failures: a reply that's lost the conversation thread, messages that get no reply, and "remote
26
+ typing is disabled" (the 2026-06-15 incident, measured at ~1.13/core).
27
+
28
+ The detector now adds a **recurring-drift guard**: a short drift within `recentDriftWindowMs`
29
+ (default 5 min) of a prior short drift, while load is oversubscribed (`> recentDriftLoadFloor`,
30
+ default 1.0/core), is treated as recurring CPU starvation and suppressed. This generalizes the burst
31
+ floor from *consecutive* ticks to *recent* ticks, and the load gate confines it to the
32
+ oversubscribed band the hard guard leaves open.
33
+
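The recurring-drift guard described above can be sketched as follows. The option names `recentDriftWindowMs` and `recentDriftLoadFloor` come from the notes; the `RecurringDriftGuard` class, its method, the inclusive window comparison, and the choice to remember every short drift (suppressed or not) are assumptions made for illustration.

```typescript
interface RecurringDriftOptions {
  recentDriftWindowMs: number; // default 5 min; 0 disables the guard
  recentDriftLoadFloor: number; // default 1.0 load per core
}

const DEFAULT_DRIFT_OPTIONS: RecurringDriftOptions = {
  recentDriftWindowMs: 5 * 60 * 1000,
  recentDriftLoadFloor: 1.0,
};

// Hypothetical guard: call shouldSuppress() once per detected SHORT drift.
class RecurringDriftGuard {
  private lastShortDriftAt: number | null = null;
  private readonly opts: RecurringDriftOptions;

  constructor(opts: RecurringDriftOptions = DEFAULT_DRIFT_OPTIONS) {
    this.opts = opts;
  }

  // Returns true when the drift looks like recurring CPU starvation:
  // a prior short drift happened recently AND the host is oversubscribed.
  shouldSuppress(nowMs: number, loadAvg: number, cores: number): boolean {
    const previous = this.lastShortDriftAt;
    this.lastShortDriftAt = nowMs; // assumption: every short drift is remembered
    if (this.opts.recentDriftWindowMs <= 0) return false; // guard disabled
    if (previous === null) return false; // a genuinely isolated first drift
    const loadPerCore = loadAvg / cores;
    const isRecent = nowMs - previous <= this.opts.recentDriftWindowMs;
    return isRecent && loadPerCore > this.opts.recentDriftLoadFloor;
  }
}
```

Using the incident figures (load average 18 on 16 cores, drifts about 2 minutes apart), the first drift still emits and the second is suppressed, because 18 / 16 = 1.125 exceeds the 1.0 floor and 2 minutes falls inside the 5-minute window.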
34
+ ## What to Tell Your User
35
+
36
+ - ⚗️ **Experimental, development-agent only.** On the fleet this ships dark — the routes 503 and nothing is injected at session start, so a normal agent sees no change. On a development agent it runs live, but the only state-changing operation (switching the browser onto a profile) is held in dry-run by default, so it only LOGS what it would do until that is deliberately turned off.
37
+ - **What it gives a dev agent:** instead of asking you to drive the browser or produce a credential, the agent can now look up which browser profile is logged into a given account, pick the right one, and (when activated) restart its session onto that profile. It tracks which accounts are **yours** vs the agent's own, so it won't act as you in a browser unless explicitly authorized — and login state is treated as last-asserted, so it re-verifies in-browser before any privileged action.
38
+ - **At-rest honesty:** the registry file is plaintext machine-local. It lists account identities + vault key NAMES — so filesystem access to the machine reveals the agent's access *map*, never the credentials themselves (same posture as the self-knowledge tree and the relationships store).
39
+
40
+ Side-effects review: upgrades/side-effects/playwright-profile-registry.md
41
+
42
+ - **Fewer spurious reconnects on a busy laptop**: "When my machine got busy I used to mistake the
43
+ slowdown for the computer going to sleep, which kicked off a disruptive recovery — dropping the
44
+ conversation thread, going quiet, or disabling typing. I now recognize that pattern and stay calm,
45
+ so those multi-machine glitches should largely stop."
46
+ - **Real sleeps still handled**: "If the machine genuinely sleeps, I still notice and recover
47
+ properly — nothing changes there."
48
+
49
+ ## Summary of New Capabilities
50
+
51
+ | Capability | How to Use |
52
+ |-----------|-----------|
53
+ | See which browser profile holds which account (full detail; vault NAMES only) | `GET /playwright-profiles` |
54
+ | Compact boot-awareness pointer (also auto-injected at session start) | `GET /playwright-profiles/session-context` |
55
+ | Create a custom profile | `POST /playwright-profiles` `{ id, description?, userDataDir? }` |
56
+ | Assign an account to a profile (owner agent\|operator; vault NAMES only) | `POST /playwright-profiles/:id/accounts` `{ service, identity, owner, vaultRefs[], loginMethod?, note? }` |
57
+ | Pick the right profile for a task | `GET /playwright-profiles/resolve?service=&identity=` (ambiguous service-only → `{ ambiguous: true, candidates }`) |
58
+ | Switch the browser onto a profile (config rewrite + session restart) | `POST /playwright-profiles/:id/activate` (ships `dryRun: true` — logs the intended switch until a deliberate `dryRun: false`; reversible by activating `default`) |
59
+
60
+ | Capability | How to Use |
61
+ |-----------|-----------|
62
+ | Suppress false "wake" events from CPU starvation on a loaded host | automatic |
63
+ | Tune or disable the new guard | `monitoring.sleepWake.recentDriftWindowMs` / `.recentDriftLoadFloor` (set window to 0 to disable) |
64
+
65
+ ## Evidence
66
+
67
+ - `PlaywrightProfileRegistry` seeds exactly one `default` profile via the shared `resolvePlaywrightMcpConfig()` resolver (records the real `--user-data-dir` if the canonical config carries one, else `null` = the built-in default — never `.playwright-mcp`, which is the MCP output-dir, not the browser profile).
68
+ - New `DEV_GATED_FEATURES` entry `playwrightRegistry` (configPath `playwrightRegistry.enabled`) — picked up automatically by the dual-side wiring test (`tests/unit/devGatedFeatures-wiring.test.ts`): the entry resolves LIVE under a dev-agent config and DARK under a fleet config.
69
+ - `ConfigDefaults` adds `playwrightRegistry: { dryRun: true }` and OMITS `enabled` (the dev-gate convention, mirroring `credentialRepointing` / `topicProfiles`).
70
+ - Migration parity in `PostUpdateMigrator`: the `/playwright-profiles/session-context` session-start fetch+inject block is modeled byte-for-byte on the existing `/self-knowledge/session-context` block (`curl -sf --max-time 4 --connect-timeout 1`, `python3` parse of `.block`, fail-open on 503/404/empty); a `migrateClaudeMd` content-sniff appends the awareness section; the `playwright-profiles-seed-v1` marker migration seeds the default profile metadata-only (idempotent, marks done either way); a `migrateConfigPlaywrightRegistryDevGate` strip-false migration mirrors the credential-repointing strip so a stale default-shaped `enabled: false` resolves the gate live.
71
+ - The CLAUDE.md awareness section is authored ONCE (`PLAYWRIGHT_PROFILE_REGISTRY_CLAUDEMD_SECTION`) and shared by `generateClaudeMd` (new installs) and `migrateClaudeMd` (existing agents) so the two can never drift.
72
+
73
+ Reproduction (live, 2026-06-15): on a host measured at loadavg ~18 on 16 cores (~1.13/core — above
74
+ 1.0 but below the 1.5 hard guard), `server.log` showed `[SleepWakeDetector] Wake detected after
75
+ ~33s/~21s sleep` recurring roughly every 2 minutes while the host was actively in use (not sleeping),
76
+ each triggering the wake-recovery cascade. The drifts were isolated (on-time ticks between them reset
77
+ the consecutive counter) and ~2 min apart (outlasting the 60s cooldown), so neither existing guard
78
+ caught them.
79
+
80
+ After the fix (verified by 45/45 sleep-wake unit tests across 5 files, both sides of the boundary): a
81
+ recurring short drift in the 1.0–1.5 band is suppressed (no `wake` emitted, recorded as
82
+ `cpu-starvation`); a genuinely isolated short drift, any drift on a light/idle host (ratio ≤ 1.0),
83
+ and every long (real) sleep still emit; `recentDriftWindowMs: 0` restores byte-identical prior
84
+ behavior. tsc clean.
@@ -0,0 +1,140 @@
1
+ # Side-Effects Review — Playwright Profile Registry + Account-Access Awareness
2
+
3
+ Spec: `docs/specs/playwright-profile-registry.md` (converged 2 iterations, approved).
4
+ Tier: **2** (new feature: new store, 8 routes, migration parity, session-start hook,
5
+ config + dev-gate, agent awareness; risk floor raised by new-capability + identity-touch
6
+ + fleet-rollout signals — Tier 2 matches).
7
+
8
+ ## Phase 1 — Principle check (signal vs authority)
9
+
10
+ Does this change involve a decision point that gates information flow / blocks actions /
11
+ constrains agent behavior? **No blocking authority.** The feature is a
12
+ data + awareness + selection + tool layer:
13
+ - The boot block is an explicitly-ADVISORY signal (`<playwright-profiles>` envelope,
14
+ "background signal, not authority — verify before acting"); login state is
15
+ `lastAsserted`, never asserted as fact (D11).
16
+ - `activate` is a TOOL the agent invokes; it does NOT bypass any external-operation /
17
+ coherence gate — switching the browser profile is not authorization to act as that
18
+ identity (D12 + the activate clause).
19
+ - The only gates are the dev-agent dark gate + the `activate` `dryRun` — both ROLLOUT
20
+ controls, not behavioral authority.
21
+
22
+ No brittle check holds blocking authority. Compliant with `docs/signal-vs-authority.md`.
23
+
24
+ ## Phase 2 — Build location
25
+
26
+ Fresh worktree `.worktrees/playwright-profile-registry`, branch
27
+ `echo/playwright-profile-registry` off `JKHeadley/main` @461ceec0e (package.json
28
+ v1.3.579). `git remote -v`: JKHeadley = canonical. Identity set to
29
+ `Instar Agent (echo)` / `echo@instar.local`.
30
+
31
+ ## The 8 questions
32
+
33
+ ### 1. Over-block (what legitimate inputs does this reject that it shouldn't?)
34
+ - `userDataDir` jail (D9) rejects any path outside the agent home, `-`-prefixed, or
35
+ NUL-bearing. A legitimate profile dir is always inside the agent home (the worktree
36
+ convention / sandbox-stable home), so this rejects nothing legitimate. A user who
37
+ genuinely wanted a profile outside the home would be rejected — intentional (the jail
38
+ is the security boundary; out-of-home profiles are exactly the cross-agent-theft /
39
+ sandbox-revocation hazard).
40
+ - Ref-validation fails CLOSED when the vault is unreadable (D17): a legitimate assign is
41
+ rejected (409) while the vault is decrypt-failed/absent. Intentional — better to
42
+ refuse than record an unvalidated ref. The vault being unreadable is itself an
43
+ incident the operator should resolve first.
44
+
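The `userDataDir` jail in question 1 can be illustrated with a lexical path check. This is a sketch under stated assumptions: `jailUserDataDir` is a hypothetical name, POSIX path semantics are assumed (`path.posix`), and symlinks are not resolved here, so a real jail would likely also need a realpath step.

```typescript
import * as path from "node:path";

// Hypothetical jail: returns the resolved directory when it stays inside the
// agent home, and throws for NUL-bearing, "-"-prefixed, or escaping paths.
function jailUserDataDir(agentHome: string, candidate: string): string {
  if (candidate.includes("\u0000")) {
    throw new Error("userDataDir contains a NUL byte");
  }
  if (candidate.startsWith("-")) {
    // A leading "-" could be misread as a command-line flag downstream.
    throw new Error("userDataDir must not start with '-'");
  }
  const home = path.posix.resolve(agentHome);
  const resolved = path.posix.resolve(home, candidate);
  const rel = path.posix.relative(home, resolved);
  // A relative path that climbs upward means the target is outside the home.
  if (rel === ".." || rel.startsWith("../")) {
    throw new Error("userDataDir escapes the agent home");
  }
  return resolved;
}
```

Relative inputs are resolved against the agent home, so `profiles/work` is accepted while `../other`, `/etc/passwd`, and `profiles/../../x` are rejected.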
45
+ ### 2. Under-block (what failure modes does this still miss?)
46
+ - The registry cannot observe a dead browser cookie — `lastAsserted: true` can be stale.
47
+ MITIGATED by D11 (rendered staleness age + advisory framing + "verify before
48
+ privileged action"), not eliminated. This is by design: the registry is a signal, not
49
+ a liveness oracle. The agent must re-verify in-browser.
50
+ - `owner: agent|operator` is a self-asserted label, not a verified principal (D12 note):
51
+ a poisoned/mistaken write could mislabel. MITIGATED by the audit log (attributable),
52
+ the advisory framing, and the real act-as defense being the un-bypassed
53
+ external-op/coherence gate — the label is a hint, never an authorization.
54
+
55
+ ### 3. Level-of-abstraction fit
56
+ Correct layer. It mirrors the proven `BootSelfKnowledge` boot-block pattern + the
57
+ `CommitmentTracker` CAS pattern + the `credentialRepointing` dryRun convention. The
58
+ login keystrokes are deliberately NOT here (D8 — a non-deterministic interactive action
59
+ belongs in the agent, not a deterministic route). A smarter gate does not already own
60
+ this; it feeds awareness, it does not duplicate one.
61
+
62
+ ### 4. Signal vs authority compliance
63
+ Compliant (see Phase 1). The boot block is advisory; `activate` does not gate behavior;
64
+ the dev-gate + dryRun are rollout controls. No brittle blocking authority added.
65
+
66
+ ### 5. Interactions (shadowing / double-fire / races)
67
+ - `activate`'s session refresh + the MCP-health auto-refresh (`mcp-health-autorefresh.sh`)
68
+ could both target playwright. MITIGATED: the auto-refresh has a hard once-per-(session,
69
+ failed-set) loop-guard (verified at PostUpdateMigrator.ts:8442+); `activate`'s
70
+ already-active fast path (no write/no refresh when the target dir is already set) +
71
+ the per-session activate cooldown/window breaker prevent a restart storm (D19).
72
+ - Concurrent writes to `state/playwright-profiles.json`: single-writer CAS `mutate()`
73
+ (D14) — no lost update (NOT bare `writeConfigAtomic`).
74
+ - The shared `resolvePlaywrightMcpConfig()` is the SINGLE source-of-truth for both seed
75
+ and activate (F2) — the two paths cannot drift on "where the playwright arg lives."
76
+ - The boot fetch is added adjacent to the self-knowledge fetch in `getSessionStartHook()`
77
+ — same fail-open shape; it cannot block boot (D22).
78
+
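The restart-storm protection mentioned in question 5 (a per-session cooldown plus a window breaker, stated in the second-pass review as 30s and 5 per 5 minutes) can be sketched as a small limiter. The `ActivateLimiter` class and its exact semantics are assumptions for illustration: denied attempts are not recorded, and the window is a sliding one.

```typescript
type AcquireResult =
  | { allowed: true }
  | { allowed: false; reason: "cooldown" | "breaker" };

// Hypothetical per-session limiter for the real-switch path only.
class ActivateLimiter {
  private readonly history = new Map<string, number[]>();
  private readonly cooldownMs: number;
  private readonly windowMs: number;
  private readonly maxPerWindow: number;

  constructor(cooldownMs = 30 * 1000, windowMs = 5 * 60 * 1000, maxPerWindow = 5) {
    this.cooldownMs = cooldownMs;
    this.windowMs = windowMs;
    this.maxPerWindow = maxPerWindow;
  }

  tryAcquire(sessionId: string, nowMs: number): AcquireResult {
    // Keep only the switches that are still inside the sliding window.
    const recent = (this.history.get(sessionId) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    this.history.set(sessionId, recent);

    const last = recent[recent.length - 1];
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      return { allowed: false, reason: "cooldown" };
    }
    if (recent.length >= this.maxPerWindow) {
      return { allowed: false, reason: "breaker" };
    }
    recent.push(nowMs); // only successful switches count toward the limits
    return { allowed: true };
  }
}
```

Keeping the state keyed by session means one session tripping the breaker does not block another, which matches the "per-session" wording in the review.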
79
+ ### 6. External surfaces
80
+ - New HTTP routes (`/playwright-profiles/*`) — Bearer-authed, whole-feature dev-gated
81
+ (503 on fleet). Visible to the operating agent only.
82
+ - A new always-injected boot block — kept COMPACT (≤800 bytes, pointer-not-payload, D21)
83
+ to respect the boot-bloat lesson (L1); full detail behind the route.
84
+ - The plaintext `state/playwright-profiles.json` lists account IDENTITIES + vault key
85
+ NAMES (never values) — an at-rest access MAP. Documented honestly in the
86
+ agent-awareness section (same posture as `SelfKnowledgeTree`/operationalFacts).
87
+ - `activate` (only when `dryRun:false`) mutates the playwright MCP config file +
88
+ restarts the session — agent-initiated, audited, reversible.
89
+
90
+ ### 7. Multi-machine posture
91
+ **Machine-local BY DESIGN** (D6). A browser profile's logged-in session lives in cookies
92
+ on one machine's disk and cannot be moved by copying metadata. The state file, routes,
93
+ and boot block describe only the machine serving the request; the boot block reads LOCAL
94
+ state even after a topic transfer. No replication, no proxied-on-read, no generated URLs
95
+ crossing a machine boundary, no user-facing notices needing one-voice gating. Registry
96
+ state does not strand on topic transfer (it correctly does not travel). The cross-machine
97
+ "which machine holds profile X" read is tracked as a follow-up
98
+ (<!-- tracked: CMT-1554-pwprofile-crossmachine-holder-view -->), not silently assumed.
99
+
100
+ ### 8. Rollback cost
101
+ Cheap. Dark on the fleet by construction (dev-gated → all routes 503, session-start
102
+ injects nothing, state file inert). On a dev agent: `playwrightRegistry.enabled: false`.
103
+ `activate`'s config edit (only when `dryRun:false`) is reversed by activating `default`
104
+ (restores the no-arg built-in profile) or a one-line manual revert. No data migration,
105
+ no destructive state. The seeded default profile + `dryRun:true` config default are
106
+ additive. Back-out = flip the flag (no hot-fix release needed for the dark default).
107
+
108
+ ## Verification
109
+
110
+ - `npx tsc --noEmit` → clean (exit 0).
111
+ - New tests: unit 47 + integration 14 + e2e 3 = 64, all green; `devGatedFeatures-wiring`
112
+ 82 green (picks up the new entry); ratchet/capability suites (no-silent-fallbacks,
113
+ no-silent-llm-fallback, CapabilityIndex, capability-registry-generator,
114
+ lint-dev-agent-dark-gate, PostUpdateMigrator-guardsCapabilitySection) all green.
115
+ - `node scripts/lint-dev-agent-dark-gate.js` → clean. `node scripts/lint-guard-manifest.js`
116
+ → clean (request-driven feature, no manifest entry needed).
117
+
118
+ ## Phase 5 — Second-pass review (independent)
119
+
120
+ **Concur with the review** — the implementation matches the artifact's claims and is
121
+ sound. Independently verified against the code (file:line):
122
+ 1. activate (routes.ts:16933-17002): write+refresh gated behind `dryRun` default-true;
123
+ already-active fast path skips both; per-session 30s cooldown + 5/5min breaker
124
+ (:16758-16775) on the real-switch path only; rewrites only `mcpServers.playwright.args`
125
+ + schedules a refresh, makes no authorization claim → cannot grant act-as. No storm.
126
+ 2. No secret values: `listVaultNames` → `secretKeyPaths` (names only); audit log + boot
127
+ block never carry values/refs-values. Invariant holds end-to-end.
128
+ 3. Signal vs authority: boot block advisory; operator accounts marked "act-as only when
129
+ authorized"; staleness rendered; no code consumes the block as authority.
130
+ 4. Fail-closed/open: assign fails closed when vault names null; corrupt file → CRUD
131
+ throws, never overwrites; block fails open; boot hook `curl -sf --max-time 4`,
132
+ non-2xx/empty injects nothing.
133
+ 5. CAS: genuine single-writer `mutate()` (statSig before/after + retry).
134
+ 6. dev-gate: all 8 routes 503 on fleet; flag read fresh per request; `enabled` omitted;
135
+ strip-false migration + DEV_GATED_FEATURES entry present.
136
+ 7. sanitize: every rendered boot field through `sanitizeForBlock` (escapes `<`/`>`,
137
+ strips control chars); envelope breakout impossible.
138
+ 8. New risks: none material (reads call one-time idempotent ensureSeeded — no storm;
139
+ audit-log re-sanitize advisory documented, not yet a live surface).
140
+ No hole in the activate restart path or the no-value invariant.
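Item 7 above (every rendered boot field passes through `sanitizeForBlock`) can be illustrated with a small sketch. The function below is a hypothetical stand-in, not the package's `sanitizeForBlock`: the entity-style escaping (`&lt;`, `&gt;`) and the exact control-character range are assumptions, and `renderBlock` is invented for the example.

```typescript
// Hypothetical sanitizer: strip control characters, then escape angle
// brackets so a field can never open or close the envelope tag.
function sanitizeForBlockSketch(value: string): string {
  return value
    .replace(/[\u0000-\u001f\u007f-\u009f]/g, "")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical renderer: wraps sanitized lines in the advisory envelope.
function renderBlock(lines: string[]): string {
  const body = lines.map(sanitizeForBlockSketch).join("\n");
  return `<playwright-profiles>\n${body}\n</playwright-profiles>`;
}
```

Because every `<` and `>` inside a field is escaped before rendering, a value containing `</playwright-profiles>` stays inside the envelope instead of terminating it early.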
@@ -1,51 +0,0 @@
1
- # Upgrade Guide — vNEXT
2
-
3
- <!-- assembled-by: assemble-next-md -->
4
- <!-- bump: patch -->
5
-
6
- ## What Changed
7
-
8
- A short timer drift that recurs while load sits in the **1.0–1.5/core band** slipped past both
9
- existing guards: the load guard fires only above 1.5/core, and the consecutive burst floor resets
10
- whenever on-time ticks fall between drifts. Its ~2-minute cadence also outlasted the 60s cooldown.
11
- So each isolated drift emitted a **false `wake`**, firing the full wake-recovery cascade (tunnel
12
- restart, Slack reconnect, mesh-lease churn, topic failover) — the source of a class of multi-machine
13
- UX failures: a reply that's lost the conversation thread, messages that get no reply, and "remote
14
- typing is disabled" (the 2026-06-15 incident, measured at ~1.13/core).
15
-
16
- The detector now adds a **recurring-drift guard**: a short drift within `recentDriftWindowMs`
17
- (default 5 min) of a prior short drift, while load is oversubscribed (`> recentDriftLoadFloor`,
18
- default 1.0/core), is treated as recurring CPU starvation and suppressed. This generalizes the burst
19
- floor from *consecutive* ticks to *recent* ticks, and the load gate confines it to the
20
- oversubscribed band the hard guard leaves open.
21
-
22
- ## What to Tell Your User
23
-
24
- - **Fewer spurious reconnects on a busy laptop**: "When my machine got busy I used to mistake the
25
- slowdown for the computer going to sleep, which kicked off a disruptive recovery — dropping the
26
- conversation thread, going quiet, or disabling typing. I now recognize that pattern and stay calm,
27
- so those multi-machine glitches should largely stop."
28
- - **Real sleeps still handled**: "If the machine genuinely sleeps, I still notice and recover
29
- properly — nothing changes there."
30
-
31
- ## Summary of New Capabilities
32
-
33
- | Capability | How to Use |
34
- |-----------|-----------|
35
- | Suppress false "wake" events from CPU starvation on a loaded host | automatic |
36
- | Tune or disable the new guard | `monitoring.sleepWake.recentDriftWindowMs` / `.recentDriftLoadFloor` (set window to 0 to disable) |
37
-
38
- ## Evidence
39
-
40
- Reproduction (live, 2026-06-15): on a host measured at loadavg ~18 on 16 cores (~1.13/core — above
41
- 1.0 but below the 1.5 hard guard), `server.log` showed `[SleepWakeDetector] Wake detected after
42
- ~33s/~21s sleep` recurring roughly every 2 minutes while the host was actively in use (not sleeping),
43
- each triggering the wake-recovery cascade. The drifts were isolated (on-time ticks between them reset
44
- the consecutive counter) and ~2 min apart (outlasting the 60s cooldown), so neither existing guard
45
- caught them.
46
-
47
- After the fix (verified by 45/45 sleep-wake unit tests across 5 files, both sides of the boundary): a
48
- recurring short drift in the 1.0–1.5 band is suppressed (no `wake` emitted, recorded as
49
- `cpu-starvation`); a genuinely isolated short drift, any drift on a light/idle host (ratio ≤ 1.0),
50
- and every long (real) sleep still emit; `recentDriftWindowMs: 0` restores byte-identical prior
51
- behavior. tsc clean.