instar 1.3.4 → 1.3.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,68 @@
+ # Side-Effects Review — Stage-B auto-capture + funnel (Mentor System §19.2)
+
+ **Spec:** `docs/specs/FRAMEWORK-ONBOARDING-MENTOR-SPEC.md` (converged 5 iters, approved by Justin)
+ **Change:** Adds the Stage-B auto-capture entry point `FrameworkIssueLedger.captureRun()` + a
+ `framework_capture_runs` funnel table + `captureStats()` + a read-only route
+ `GET /framework-issues/capture-stats`. Builds on §19.1 (the ledger foundation, PR #405).
+ **Files:** `src/monitoring/FrameworkIssueLedger.ts`, `src/server/routes.ts`,
+ `src/server/CapabilityIndex.ts`, `tests/unit/FrameworkIssueLedger.test.ts`,
+ `tests/integration/framework-issues-routes.test.ts`,
+ `tests/e2e/framework-issue-ledger-lifecycle.test.ts`, `upgrades/NEXT.md`.
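
The funnel added by this change can be pictured with a minimal in-memory sketch of a run-log row and a per-framework aggregation in the spirit of `captureStats()`. This is an illustration under assumptions, not the ledger's schema: apart from `observationsWritten`, the field names (`framework`, `findingsGiven`, `ranAt`, `zeroWriteRuns`) and the return shape are hypothetical, and the real table is SQLite-backed rather than an array.

```typescript
// Hypothetical shape of one framework_capture_runs row; column names other
// than observationsWritten are assumptions made for this sketch.
interface CaptureRunRow {
  framework: string;
  findingsGiven: number;       // findings handed to captureRun()
  observationsWritten: number; // observations actually written after episode-collapsing
  ranAt: number;               // epoch milliseconds
}

interface FrameworkFunnel {
  runs: number;
  findingsGiven: number;
  observationsWritten: number;
  zeroWriteRuns: number; // runs that wrote nothing are still logged and counted
}

// Per-framework aggregation of the run log, in the spirit of captureStats().
// An inert writer shows up as `runs` climbing while `observationsWritten` stays flat.
function captureStats(rows: readonly CaptureRunRow[]): Record<string, FrameworkFunnel> {
  const out: Record<string, FrameworkFunnel> = {};
  for (const row of rows) {
    let f = out[row.framework];
    if (!f) {
      f = { runs: 0, findingsGiven: 0, observationsWritten: 0, zeroWriteRuns: 0 };
      out[row.framework] = f;
    }
    f.runs += 1;
    f.findingsGiven += row.findingsGiven;
    f.observationsWritten += row.observationsWritten;
    if (row.observationsWritten === 0) f.zeroWriteRuns += 1;
  }
  return out;
}

// Example: two codex runs (one wrote nothing) and one empty gemini run.
const stats = captureStats([
  { framework: "codex", findingsGiven: 3, observationsWritten: 2, ranAt: 1 },
  { framework: "codex", findingsGiven: 0, observationsWritten: 0, ranAt: 2 },
  { framework: "gemini", findingsGiven: 0, observationsWritten: 0, ranAt: 3 },
]);
```

Counting zero-write runs explicitly is what makes a silent writer visible: without a row per run, "nothing found" and "never ran" would look identical.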
+
+ ## Principle check (Phase 1)
+
+ Does this involve a decision point that gates info flow / blocks actions / constrains behavior?
+ **No.** `captureRun()` is the write-path the mentor tick will call after forensics — it records
+ findings and logs the run. It holds no blocking authority. The capture-stats route is read-only
+ observability. Signal-only. The decision-bearing components (Stage A, the job, graduation) are
+ §19.3–5.
+
+ ## The seven questions
+
+ 1. **Over-block.** None. `captureRun` writes whatever findings it's given (validated against the
+ same enum allowlists as `recordObservation`); it rejects only structurally invalid findings
+ (bad bucket/severity), which is correct.
+
+ 2. **Under-block.** The funnel exists specifically to guard against under-detection: a run that
+ writes nothing is still logged, so an inert or broken writer (runs climbing, observations flat)
+ is visible. The funnel does not *interpret* that signal (no alerting yet); it is surfaced through
+ the read-only route, and an alert/threshold is a later concern (§19.5 observability surface)
+ rather than an under-block in this change. There is still no public write route, so there is no
+ untrusted-write surface.
+
+ 3. **Level-of-abstraction fit.** `captureRun` lives ON the ledger (not a separate DI component)
+ because better-sqlite3 has no nested transactions and the capture is a thin orchestration over
+ `recordObservation` + `suggestRegressions` + a run-log insert. A single entry point for the tick
+ to call is the right seam; a separate service would add wiring with no benefit.
+
+ 4. **Signal vs authority.** Compliant. `captureRun` writes signal; regression candidates are
+ *surfaced, never auto-linked* (the writer doesn't get to decide a regression — §13.5). Promotion
+ and graduation authority stay with the human (§6/§8).
+
+ 5. **Interactions.** `captureRun` calls `recordObservation` per finding (each its own txn — no
+ nested-transaction violation), then writes one `framework_capture_runs` row. Episode-collapsing
+ in `recordObservation` already prevents double-counting across runs, so re-observing the same
+ open issue next tick does not inflate `observationsWritten`. The funnel table is independent of
+ the issues/observations tables — no shadowing.
+
+ 6. **External surfaces.** One new read-only route (`/framework-issues/capture-stats`) behind the
+ standard Bearer middleware, added to the existing `frameworkIssues` CapabilityIndex entry (no
+ new prefix → no discoverability-classification gap). No agent-facing template change needed
+ beyond §19.1's. No timing/conversation-state dependence.
+
+ 7. **Rollback cost.** Low. Still dormant — no production caller until the mentor job (§19.4). The
+ `framework_capture_runs` table auto-creates at startup (idempotent `CREATE TABLE IF NOT EXISTS`)
+ and is harmless read-only data. Back-out = revert; the table can stay or be dropped.
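
The write path described in questions 3 and 5 can be sketched in memory. This is a minimal illustration, not the ledger's code: SQLite is replaced by in-memory collections, a `Set` stands in for the `UNIQUE(issue_id, episode_key)` guard described in the §19.1 review, and the enum values, the `Finding` fields, the `fixedIssues` status model, and validating every finding before writing any of them are all assumptions made for the sketch.

```typescript
// Hypothetical enum allowlists; the real values live in the ledger.
const BUCKETS = new Set(["crash", "wrong-output", "missing-capability"]);
const SEVERITIES = new Set(["low", "medium", "high"]);

interface Finding {
  framework: string;
  issueKey: string;   // assumed stable fingerprint of the issue
  episodeKey: string; // assumed identifier of one occurrence episode
  bucket: string;
  severity: string;
}

interface CaptureSummary {
  findingsGiven: number;
  observationsWritten: number;
  regressionCandidates: string[]; // surfaced for a human; never auto-linked
}

class InMemoryLedger {
  // Stands in for UNIQUE(issue_id, episode_key): a repeat episode is not re-counted.
  private readonly episodes = new Set<string>();
  // Issues a human has already marked fixed (assumed status model).
  readonly fixedIssues = new Set<string>();
  readonly runLog: Array<{ framework: string } & CaptureSummary> = [];

  // In the real ledger each call runs in its own SQLite transaction.
  recordObservation(f: Finding): boolean {
    const key = `${f.issueKey}|${f.episodeKey}`;
    if (this.episodes.has(key)) return false; // same episode: collapsed
    this.episodes.add(key);
    return true;
  }

  // Read-only: reports fixed issues seen again and changes nothing.
  suggestRegressions(findings: readonly Finding[]): string[] {
    const keys = Array.from(new Set(findings.map((f) => f.issueKey)));
    return keys.filter((k) => this.fixedIssues.has(k));
  }

  captureRun(framework: string, findings: readonly Finding[]): CaptureSummary {
    // Reject structurally invalid findings before anything is written.
    for (const f of findings) {
      if (!BUCKETS.has(f.bucket) || !SEVERITIES.has(f.severity)) {
        throw new Error(`invalid finding: bucket=${f.bucket} severity=${f.severity}`);
      }
    }
    let written = 0;
    for (const f of findings) {
      if (this.recordObservation(f)) written += 1;
    }
    const summary: CaptureSummary = {
      findingsGiven: findings.length,
      observationsWritten: written,
      regressionCandidates: this.suggestRegressions(findings),
    };
    // Every run is logged, including zero-finding runs (the inert-writer guard).
    this.runLog.push({ framework, ...summary });
    return summary;
  }
}

const ledger = new InMemoryLedger();
ledger.fixedIssues.add("codex:dropped-tool-call");
const finding: Finding = {
  framework: "codex", issueKey: "codex:dropped-tool-call", episodeKey: "ep-1",
  bucket: "crash", severity: "high",
};
const first = ledger.captureRun("codex", [finding]);
const second = ledger.captureRun("codex", [finding]); // same open episode on the next tick
const empty = ledger.captureRun("codex", []);
```

Validating up front is one way to keep a bad finding from leaving a half-written run behind when each observation commits separately; the second run writing zero observations is the "no double-count across runs" property.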
+
+ ## Phase 5 — second-pass
+
+ **Not required.** Read-only observability + a signal-only write-path; no block/allow, no
+ session-lifecycle, nothing named sentinel/guard/gate/watchdog. The spec passed full convergence.
+
+ ## Testing
+
+ - Tier 1 (unit, +6 = 28 total): captureRun writes findings + reports summary; **every run logged
+ to the funnel including a zero-finding run** (inert-writer guard); no double-count across runs;
+ regression candidates surfaced not auto-linked; per-framework funnel breakdown; enum guard.
+ - Tier 2 (integration, +2 = 11 total): `/framework-issues/capture-stats` 503 + 200-with-funnel.
+ - Tier 3 (e2e, +1 = 7 total): a real capture run surfaces in the funnel route on the live server.
+ - Affected push-config suite (vs JKHeadley/main): 842 + 299 capability tests green, no regressions.
@@ -0,0 +1,102 @@
+ # Side-Effects Review — FrameworkIssueLedger (Mentor System §19.1 foundation)
+
+ **Spec:** `docs/specs/FRAMEWORK-ONBOARDING-MENTOR-SPEC.md` (converged 5 iters, approved by Justin)
+ **Change:** New SQLite two-table issue ledger (`framework_issues` + `framework_observations`),
+ two read-only HTTP routes (`/framework-issues`, `/framework-issues/playbook`), AgentServer
+ startup instantiation, RouteContext wiring, CLAUDE.md template row + migrator section, NEXT.md.
+ **Files:** `src/monitoring/FrameworkIssueLedger.ts` (new), `src/server/routes.ts`,
+ `src/server/AgentServer.ts`, `src/scaffold/templates.ts`, `src/core/PostUpdateMigrator.ts`,
+ `tests/unit/FrameworkIssueLedger.test.ts` (new), `tests/integration/framework-issues-routes.test.ts` (new),
+ `tests/e2e/framework-issue-ledger-lifecycle.test.ts` (new), `tests/unit/feature-delivery-completeness.test.ts`,
+ `upgrades/NEXT.md`.
+
+ ## Principle check (Phase 1)
+
+ Does this involve a decision point that gates info flow / blocks actions / filters messages /
+ constrains agent behavior? **No.** The ledger is a data store + read-only routes — it records
+ observations (signal) and serves queries. It holds zero blocking authority. The decision-bearing
+ parts of the mentor system (two-hats enforcement, assignment admission, graduation authority)
+ are §19.3–5 and ship later. This PR is a data-model + observability change → **signal-only**,
+ the correct posture per `docs/signal-vs-authority.md`.
+
+ ## The seven questions
+
+ 1. **Over-block — what legitimate inputs does this reject that it shouldn't?**
+ The routes reject unknown `framework` values (returning an empty list, not an error) and invalid
+ `bucket`/`status` enums (400). Returning an empty list for an unknown framework is intentional
+ (allowlist, §17); it could surprise a caller who mistypes, but the response includes
+ `knownFrameworks` so the caller can self-correct. No legitimate data is rejected on write:
+ `recordObservation` accepts any framework string and creates the issue.
+
+ 2. **Under-block — what failure modes does this still miss?**
+ The ledger does not yet have a *writer* (Stage B auto-capture is §19.2), so today it only
+ accepts observations from in-process callers (the e2e test writes directly). There is no public
+ write route, so there is no untrusted-write surface to under-block. Secret-scanning of evidence
+ is pattern-based (api-key/JWT/Slack/GitHub/PEM shapes), so a novel secret format could slip
+ through. This is mitigated by the hard rule that evidence is an opaque reference rather than log
+ content, and by the length cap. The probable-loop flag is a heuristic (12 obs/hr): a slow loop
+ under that rate won't trip it, but episode-collapsing already bounds recurrence inflation
+ structurally.
+
+ 3. **Level-of-abstraction fit — right layer? smarter gate exists?**
+ Yes. It mirrors the established `TokenLedger` (read-only SQLite observability in
+ `src/monitoring/`) and reuses `CommitmentTracker`'s transactional-mutate discipline. It does
+ NOT duplicate FrameworkParitySentinel (renderings) — it records *behavior*, and §10 has the
+ sentinel feed it as an upstream signal in a later PR. No smarter gate exists for this data.
+
+ 4. **Signal vs authority compliance.**
+ Compliant. The ledger produces/serves signal; it never gates. `recordObservation` writes a
+ row; `listIssues`/`playbook` read. No method blocks, throttles, kills, or constrains. All
+ authority over what to do with an entry (ship a fix, promote to playbook `extracted`, advance
+ graduation) is reserved for the human per spec §6/§8 and lands in later PRs.
+
+ 5. **Interactions — shadow / double-fire / race with cleanup?**
+ - DB isolation: a dedicated `framework-issue-ledger.db` under `server-data/`, separate from
+ `token-ledger.db` — no shadowing of TokenLedger or its BurnDetector reads.
+ - Concurrency: WAL + `busy_timeout=5000` + a single SQLite transaction per write, with a
+ `UNIQUE(issue_id, episode_key)` index as the race guard (a concurrent duplicate episode insert
+ loses cleanly and is counted as already-recorded). Retention pruning runs inside the same txn.
+ - Startup: instantiated in the same `stateDir` guard block as TokenLedger; its own try/catch
+ means a ledger failure can't take down TokenLedger or server start.
+
+ 6. **External surfaces — visible to other agents/users/systems? timing/runtime deps?**
+ Two new read-only HTTP routes behind the standard Bearer middleware (verified by e2e: a
+ bearer-less request gets 401). New agents get one Registry-First row in CLAUDE.md; existing
+ agents get a migrator section (content-sniffed, idempotent). No Codex/Gemini shadow-marker
+ (developer-layer observability, not an end-user capability — tracked as a legacyMigratorSection
+ so the parity test stays green). No timing/conversation-state dependence.
+
+ 7. **Rollback cost — if wrong in production, what's the back-out?**
+ Low. The feature is dormant (no writer wired yet) and signal-only. Back-out = revert the PR;
+ the `framework-issue-ledger.db` file is harmless read-only observability data that can be left
+ on disk or deleted (nothing reads it except these routes). No data migration, no agent-state
+ repair. The routes fail-soft to 503 if the ledger is unavailable, so even a construction
+ failure degrades cleanly rather than breaking server start.
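
The read-route behavior from questions 1 and 7 (503 when the ledger is missing, 400 on a bad enum, an empty list plus `knownFrameworks` for an unknown framework) can be sketched as a framework-agnostic handler. Everything here is a hypothetical stand-in: the framework names, enum values, `Query` shape, and the default and maximum used by this `clampLimit` are assumptions, and the real routes run inside the server's HTTP pipeline behind Bearer auth.

```typescript
// Hypothetical allowlists and limits, for illustration only.
const KNOWN_FRAMEWORKS: readonly string[] = ["claude-code", "codex", "gemini"];
const VALID_BUCKETS = new Set(["crash", "wrong-output", "missing-capability"]);
const VALID_STATUSES = new Set(["open", "fixed", "wontfix"]);

interface Issue { framework: string; bucket: string; status: string; title: string }
interface LedgerLike { listIssues(): Issue[] }
interface RouteResult { status: number; body: Record<string, unknown> }
type Query = { framework?: string; bucket?: string; status?: string; limit?: string };

// Hypothetical clamp: missing or non-positive limits fall back to a default,
// large limits are capped.
function clampLimit(raw: string | undefined, def = 50, max = 200): number {
  const n = parseInt(raw === undefined ? "" : raw, 10);
  if (isNaN(n) || n < 1) return def;
  return Math.min(n, max);
}

function getFrameworkIssues(ledger: LedgerLike | null, q: Query): RouteResult {
  // Fail soft: if the ledger failed to construct, answer 503 instead of crashing.
  if (!ledger) return { status: 503, body: { error: "framework issue ledger unavailable" } };
  // Invalid enum values are a client error.
  if (q.bucket !== undefined && !VALID_BUCKETS.has(q.bucket)) {
    return { status: 400, body: { error: `invalid bucket: ${q.bucket}` } };
  }
  if (q.status !== undefined && !VALID_STATUSES.has(q.status)) {
    return { status: 400, body: { error: `invalid status: ${q.status}` } };
  }
  // Unknown framework: an empty list rather than an error, plus the allowlist
  // so a caller who mistyped can self-correct.
  if (q.framework !== undefined && KNOWN_FRAMEWORKS.indexOf(q.framework) === -1) {
    return { status: 200, body: { issues: [], knownFrameworks: KNOWN_FRAMEWORKS } };
  }
  const issues = ledger.listIssues().filter(
    (i) =>
      (q.framework === undefined || i.framework === q.framework) &&
      (q.bucket === undefined || i.bucket === q.bucket) &&
      (q.status === undefined || i.status === q.status),
  );
  return { status: 200, body: { issues: issues.slice(0, clampLimit(q.limit)) } };
}

const demoLedger: LedgerLike = {
  listIssues: () => [
    { framework: "codex", bucket: "crash", status: "open", title: "tool call dropped" },
    { framework: "gemini", bucket: "crash", status: "fixed", title: "startup hang" },
  ],
};
```

Separating the "unknown framework" case (200, empty) from the "bad enum" case (400) mirrors the distinction the review draws: an allowlisted name is data, while a malformed enum is a caller bug.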
+
+ ## Phase 5 — second-pass
+
+ **Not required.** The Phase-5 trigger list is block/allow decisions, session lifecycle,
+ compaction, coherence/idempotency/trust gates, and anything named sentinel/guard/gate/watchdog.
+ This change is none of those — it is a read-only observability ledger with no blocking authority.
+ The decision-bearing components of the mentor system (which WILL trigger Phase 5) ship in §19.3–5.
+ The spec itself already passed a 5-iteration adversarial/security/scalability/integration/lessons
+ convergence before this build.
+
+ ## Testing
+
+ All three tiers, shipped in this PR (no "routes now, migration later"):
+ - Tier 1 (unit): 22 tests — CRUD, dedup false-merge resistance, episode collapsing, materialized
+ recurrence, impactScore + decay, regression auto-suggest, enum + SQL-injection-literal guard,
+ secret-scan redaction, retention pruning, playbook cross-framework semantics, clampLimit.
+ - Tier 2 (integration): 9 tests — routes over the HTTP pipeline (503, 200, limit clamp, framework
+ allowlist, invalid-enum 400, playbook).
+ - Tier 3 (e2e "feature is alive"): 5 tests — real AgentServer boot, DB auto-creates on production
+ init path, routes 200-not-503, written observation surfaces end-to-end, 401 without auth.
+ - Full push-config suite (vs JKHeadley/main): 3362 tests green, no regressions.
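
The two heuristics from question 2, pattern-based secret redaction and the 12 obs/hr probable-loop flag, can be sketched as small pure functions. The regexes below are illustrative approximations of the named shapes (api-key/JWT/Slack/GitHub/PEM), not the ledger's actual patterns; the 500-character cap, the `[REDACTED]` placeholder, and treating "12 obs/hr" as "at least 12 in the trailing hour" are assumptions.

```typescript
// Hypothetical patterns for the secret shapes named in question 2; the
// ledger's real patterns may differ.
const SECRET_PATTERNS: RegExp[] = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g, // PEM block
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWT (header.payload.signature)
  /\bxox[abprs]-[A-Za-z0-9-]{10,}/g, // Slack token
  /\bgh[pousr]_[A-Za-z0-9]{36,}/g, // GitHub token
  /\bapi[_-]?key\s*[:=]\s*\S+/gi, // generic api-key assignment
];
const MAX_EVIDENCE_CHARS = 500; // assumed length cap

// Redact first, then cap: truncating first could cut a secret in half so that
// no pattern matches the surviving fragment.
function redactEvidence(evidence: string): string {
  let out = evidence;
  for (const re of SECRET_PATTERNS) out = out.replace(re, "[REDACTED]");
  return out.length > MAX_EVIDENCE_CHARS ? out.slice(0, MAX_EVIDENCE_CHARS) : out;
}

const HOUR_MS = 60 * 60 * 1000;

// Sliding one-hour window: flag when at least `perHour` observations of the
// same issue landed in the trailing hour.
function isProbableLoop(observedAtMs: readonly number[], nowMs: number, perHour = 12): boolean {
  const recent = observedAtMs.filter((t) => t > nowMs - HOUR_MS && t <= nowMs);
  return recent.length >= perHour;
}
```

The sketch makes the documented blind spots concrete: a secret that matches none of the patterns passes through unchanged, and eleven observations in an hour never trip the flag.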
+
+ ## Post-push CI fix
+
+ CI shard 1/4 caught a failure in `capabilities-discoverability.test.ts`, which requires every
+ registered route prefix to be classified in `src/server/CapabilityIndex.ts`. The test reads
+ `routes.ts` via regex rather than importing it, so vitest's `--changed` graph didn't link it
+ locally. Fix: classified `/framework-issues` as a read-only observability capability (mirrors the
+ `tokens` entry, `enabled: !!ctx.frameworkIssueLedger`). No behavioral change beyond surfacing the
+ read-only routes in `/capabilities`. 322 capability tests green.
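
The CI failure comes from a test that discovers routes by scanning source text instead of importing modules. A minimal sketch of that technique shows both why a new prefix fails until it is classified and why `--changed` cannot see the dependency: there is no import edge to follow. The `router.<verb>('/path', ...)` registration style, the function names, and the classified set are hypothetical; the real test's regex and `CapabilityIndex` structure may differ.

```typescript
// Assumed registration style: router.<verb>('/path', ...). The real routes.ts
// and the real test's regex may differ; this only illustrates the technique.
const ROUTE_RE = /\brouter\.(?:get|post|put|patch|delete)\(\s*['"`](\/[^'"`]*)['"`]/g;

// Collect first path segments ("/framework-issues/playbook" -> "/framework-issues")
// by scanning source text. Nothing is imported, so no module-graph edge exists.
function routePrefixes(source: string): string[] {
  const found = new Set<string>();
  ROUTE_RE.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = ROUTE_RE.exec(source)) !== null) {
    const segment = (m[1] ?? "").split("/")[1];
    if (segment) found.add(`/${segment}`);
  }
  return Array.from(found).sort();
}

// Prefixes registered in the routes source but missing from the classification.
function unclassifiedPrefixes(source: string, classified: ReadonlySet<string>): string[] {
  return routePrefixes(source).filter((p) => !classified.has(p));
}

const routesSource = `
  router.get('/tokens/summary', handler);
  router.get('/framework-issues', handler);
  router.get("/framework-issues/playbook", handler);
`;
const before = unclassifiedPrefixes(routesSource, new Set(["/tokens"]));
const after = unclassifiedPrefixes(routesSource, new Set(["/tokens", "/framework-issues"]));
```

Here `before` holds `/framework-issues` (the gap CI reported) and `after` is empty once the prefix is classified.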