instar 1.3.3 → 1.3.5
- package/dist/core/ClaudeCliIntelligenceProvider.d.ts.map +1 -1
- package/dist/core/ClaudeCliIntelligenceProvider.js +5 -1
- package/dist/core/ClaudeCliIntelligenceProvider.js.map +1 -1
- package/dist/core/CodexCliIntelligenceProvider.d.ts.map +1 -1
- package/dist/core/CodexCliIntelligenceProvider.js +3 -1
- package/dist/core/CodexCliIntelligenceProvider.js.map +1 -1
- package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
- package/dist/core/PostUpdateMigrator.js +18 -0
- package/dist/core/PostUpdateMigrator.js.map +1 -1
- package/dist/core/reviewers/standards-conformance.d.ts +10 -0
- package/dist/core/reviewers/standards-conformance.d.ts.map +1 -1
- package/dist/core/reviewers/standards-conformance.js +11 -0
- package/dist/core/reviewers/standards-conformance.js.map +1 -1
- package/dist/monitoring/FrameworkIssueLedger.d.ts +139 -0
- package/dist/monitoring/FrameworkIssueLedger.d.ts.map +1 -0
- package/dist/monitoring/FrameworkIssueLedger.js +441 -0
- package/dist/monitoring/FrameworkIssueLedger.js.map +1 -0
- package/dist/scaffold/templates.d.ts.map +1 -1
- package/dist/scaffold/templates.js +1 -0
- package/dist/scaffold/templates.js.map +1 -1
- package/dist/server/AgentServer.d.ts +1 -0
- package/dist/server/AgentServer.d.ts.map +1 -1
- package/dist/server/AgentServer.js +32 -17
- package/dist/server/AgentServer.js.map +1 -1
- package/dist/server/CapabilityIndex.d.ts.map +1 -1
- package/dist/server/CapabilityIndex.js +12 -0
- package/dist/server/CapabilityIndex.js.map +1 -1
- package/dist/server/middleware.d.ts +31 -0
- package/dist/server/middleware.d.ts.map +1 -1
- package/dist/server/middleware.js +49 -13
- package/dist/server/middleware.js.map +1 -1
- package/dist/server/routes.d.ts +4 -0
- package/dist/server/routes.d.ts.map +1 -1
- package/dist/server/routes.js +58 -0
- package/dist/server/routes.js.map +1 -1
- package/package.json +1 -1
- package/src/data/builtin-manifest.json +63 -63
- package/src/scaffold/templates.ts +1 -0
- package/upgrades/1.3.4.md +44 -0
- package/upgrades/1.3.5.md +38 -0
- package/upgrades/side-effects/conformance-gate-timeout.md +39 -0
- package/upgrades/side-effects/framework-issue-ledger.md +102 -0
@@ -0,0 +1,39 @@
# Side-Effects Review: conformance-gate timeout fix

**Slug:** `conformance-gate-timeout`
**Date:** `2026-05-26`
**Author:** Echo
**Spec:** `docs/specs/conformance-gate-timeout.md` (converged v1, approved by Justin 2026-05-26)
**Second-pass reviewer:** independent adversarial reviewer concurred ("blocking finding closed, nothing missing") after catching the second 30s wall in the v0 draft.

## Summary of the change

`POST /spec/conformance-check` (the Standards-Conformance Gate, auto-wired into `/spec-converge` by PR #403) timed out (HTTP 408 at 30s) on a real ~400-line spec. The root cause is **two** 30s walls: (A) the route was never added to the `requestTimeout` middleware's `perPathOverrides`, and (B) both `ClaudeCliIntelligenceProvider` and `CodexCliIntelligenceProvider` hardcoded `execFile(..., { timeout: 30_000 })` and ignored the `IntelligenceOptions.timeoutMs` that the interface already defines, while the reviewer never passed one. A middleware-only fix would have been worse than no fix: it would have turned the loud 408 into a silent, empty `degraded` report.

Fix (4 source edits):
1. Both providers: `timeout: options?.timeoutMs ?? DEFAULT_TIMEOUT_MS` (the 30s default is unchanged for every other caller).
2. `StandardsConformanceReviewer`: pass `timeoutMs: CONFORMANCE_REVIEW_TIMEOUT_MS` (150_000).
3. `middleware.ts`: exported `OUTBOUND_MESSAGING_TIMEOUT_MS`, new `SPEC_REVIEW_TIMEOUT_MS = 180_000`, `buildRequestTimeoutOverrides()` (single source of truth, now includes `/spec/conformance-check`), and `resolveRequestTimeout()` (matching logic extracted and shared with tests).
4. `AgentServer.ts`: wires `requestTimeout(..., buildRequestTimeoutOverrides())` instead of an inline literal.
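
The fallback in edits 1 and 2 can be sketched as below. This is a minimal illustration, not the shipped provider code: `resolveProviderTimeout` is a hypothetical helper name, and the `IntelligenceOptions` shape is reduced to the one field the review mentions. In the real providers the resolved value is what gets passed as the `timeout` option of `execFile`.

```typescript
// Reduced, assumed shape: only the field named in the review is modelled here.
interface IntelligenceOptions {
  timeoutMs?: number;
}

// Values quoted in the review.
const DEFAULT_TIMEOUT_MS = 30_000;
const CONFORMANCE_REVIEW_TIMEOUT_MS = 150_000;

// Hypothetical helper: an explicit timeoutMs wins, otherwise the 30s default applies.
function resolveProviderTimeout(options?: IntelligenceOptions): number {
  return options?.timeoutMs ?? DEFAULT_TIMEOUT_MS;
}

// Callers that pass nothing keep the old 30s budget.
const classifierBudget = resolveProviderTimeout();

// The conformance reviewer opts in to the longer budget.
const reviewerBudget = resolveProviderTimeout({ timeoutMs: CONFORMANCE_REVIEW_TIMEOUT_MS });
```

Using `??` rather than `||` means only a missing `timeoutMs` falls back to the default; an explicit numeric value is always honored.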

## Decision-point inventory

No new decision point is created. The gate stays **signal-only + fail-open**: the change only lets the review *finish* so it can emit its (advisory) report. No blocking authority is added (that remains the separate, later `scg-blocking-authority` item).

## Seven-dimension review

1. **Over/under-reach**: The provider change is gated behind `options?.timeoutMs` being present, so the default path is byte-identical for every other `evaluate` caller (classifiers, sentinels, tone gate). The middleware override matches `/spec/conformance-check` (and its children) only; the fast sibling `/spec/conformance-metrics` keeps the default, as asserted by `resolveRequestTimeout('/spec/conformance-metrics', …) === default` in the unit test.
2. **Level-of-abstraction fit**: Each edit sits at its correct layer: child-process budget in the provider, review budget in the reviewer, HTTP budget in the middleware. No cross-layer leakage.
3. **Signal vs Authority**: Unchanged; verified above.
4. **Interactions**: The two budgets interact only through the ordering invariant `CONFORMANCE_REVIEW_TIMEOUT_MS (150s) < SPEC_REVIEW_TIMEOUT_MS (180s)`, pinned by a test, so the provider's clean kill fires before the middleware's 408 and the request ends in a fail-open report rather than a raw timeout.
5. **Rollback cost**: Trivial and total: revert the edits; no data, no state, no migration.
6. **Migration parity**: N/A. All edits are server/library runtime code shipped in the package; none touch agent-installed files (`.claude/settings.json` / config / CLAUDE.md template / hooks / skills). Existing agents receive it on package update like all runtime code.
7. **Failure modes**: (a) Spec exceeds 150s → provider kills cleanly → fail-open degraded report (advisory); no 408, no regression vs today. (b) A caller passing `timeoutMs` to a provider expecting the old hard 30s → covered by the default-unchanged guarantee plus the both-sides provider tests. (c) Override map drifting from the server's real wiring → prevented by testing the extracted production map/matcher. (d) Inner budget set ≥ outer → caught by the ordering-invariant test.
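
The override matching in dimension 1 and the ordering invariant in dimension 4 can be sketched together. The two function names come from the review, but their signatures, the map shape, and the prefix-matching rule below are assumptions made for illustration; the production map also carries other entries that are not shown here.

```typescript
// Values quoted in the review.
const CONFORMANCE_REVIEW_TIMEOUT_MS = 150_000; // inner budget: provider child process
const SPEC_REVIEW_TIMEOUT_MS = 180_000; // outer budget: HTTP request
const DEFAULT_REQUEST_TIMEOUT_MS = 30_000; // assumed name for the 30s middleware default

// Assumed shape: route prefix -> timeout in milliseconds.
function buildRequestTimeoutOverrides(): Record<string, number> {
  return { "/spec/conformance-check": SPEC_REVIEW_TIMEOUT_MS };
}

// Assumed rule: match the exact path or a child path, never a sibling
// that merely shares leading characters (such as /spec/conformance-metrics).
function resolveRequestTimeout(
  path: string,
  defaultMs: number,
  overrides: Record<string, number>,
): number {
  for (const prefix of Object.keys(overrides)) {
    if (path === prefix || path.startsWith(prefix + "/")) {
      return overrides[prefix];
    }
  }
  return defaultMs;
}

// Ordering invariant: the inner kill must fire before the outer 408.
const orderingHolds = CONFORMANCE_REVIEW_TIMEOUT_MS < SPEC_REVIEW_TIMEOUT_MS;
```

Keeping the map builder and the matcher as exported functions is what lets a test import the same objects the server wires, instead of pattern-matching the source text.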

## Tests added/changed

- `tests/unit/ClaudeCliIntelligenceProvider-timeout.test.ts` (new) and an added block in `tests/unit/CodexCliIntelligenceProvider.test.ts`: behavioral tests. A short `timeoutMs` kills a slow fake binary (the regression catcher: pre-fix this budget was ignored), and a generous or absent budget lets the call resolve (longer budgets are honored; the 30s default is unchanged).
- `tests/unit/standards-conformance-gate.test.ts`: asserts the reviewer passes `timeoutMs === CONFORMANCE_REVIEW_TIMEOUT_MS` to the provider.
- `tests/unit/AgentServer-outbound-timeout.test.ts`: rewritten from a brittle source-regex into a wiring-integrity test that imports the production `buildRequestTimeoutOverrides()` + `resolveRequestTimeout()` and asserts `/spec/conformance-check` → 180s, `/spec/conformance-metrics` → default, plus the ordering invariant and that AgentServer wires the shared builder.

All 44 tests across the four files pass. The independent reviewer re-verified the revised plan against live code.
@@ -0,0 +1,102 @@
# Side-Effects Review: FrameworkIssueLedger (Mentor System §19.1 foundation)

**Spec:** `docs/specs/FRAMEWORK-ONBOARDING-MENTOR-SPEC.md` (converged 5 iters, approved by Justin)
**Change:** New SQLite two-table issue ledger (`framework_issues` + `framework_observations`),
two read-only HTTP routes (`/framework-issues`, `/framework-issues/playbook`), AgentServer
startup instantiation, RouteContext wiring, CLAUDE.md template row + migrator section, NEXT.md.
**Files:** `src/monitoring/FrameworkIssueLedger.ts` (new), `src/server/routes.ts`,
`src/server/AgentServer.ts`, `src/scaffold/templates.ts`, `src/core/PostUpdateMigrator.ts`,
`tests/unit/FrameworkIssueLedger.test.ts` (new), `tests/integration/framework-issues-routes.test.ts` (new),
`tests/e2e/framework-issue-ledger-lifecycle.test.ts` (new), `tests/unit/feature-delivery-completeness.test.ts`,
`upgrades/NEXT.md`.

## Principle check (Phase 1)

Does this involve a decision point that gates info flow / blocks actions / filters messages /
constrains agent behavior? **No.** The ledger is a data store plus read-only routes: it records
observations (signal) and serves queries. It holds zero blocking authority. The decision-bearing
parts of the mentor system (two-hats enforcement, assignment admission, graduation authority)
are §19.3–5 and ship later. This PR is a data-model + observability change → **signal-only**,
the correct posture per `docs/signal-vs-authority.md`.

## The seven questions

1. **Over-block: what legitimate inputs does this reject that it shouldn't?**
   The routes return an empty list (not an error) for unknown `framework` values and reject
   invalid `bucket`/`status` enums with a 400. An unknown framework returning empty is
   intentional (allowlist, §17); it could surprise a caller who mistypes, but the response
   includes `knownFrameworks` so the caller can self-correct. No legitimate data is rejected on
   write: `recordObservation` accepts any framework string and creates the issue.

2. **Under-block: what failure modes does this still miss?**
   The ledger does not yet have a *writer* (Stage B auto-capture is §19.2), so today it only
   accepts observations from in-process callers (the e2e test writes directly). There is no public
   write route, so there is no untrusted-write surface to under-block. Secret-scanning of evidence
   is pattern-based (api-key/JWT/Slack/GitHub/PEM shapes), so a novel secret format could slip
   through; this is mitigated by the hard rule that evidence is an opaque reference, not log
   content, and by the length cap. The probable-loop flag is a heuristic (12 obs/hr): a slow loop
   under that rate won't trip it, but episode-collapsing already bounds recurrence inflation
   structurally.

3. **Level-of-abstraction fit: right layer? smarter gate exists?**
   Yes. It mirrors the established `TokenLedger` (read-only SQLite observability in
   `src/monitoring/`) and reuses `CommitmentTracker`'s transactional-mutate discipline. It does
   NOT duplicate FrameworkParitySentinel (renderings): it records *behavior*, and §10 has the
   sentinel feed it as an upstream signal in a later PR. No smarter gate exists for this data.

4. **Signal vs authority compliance.**
   Compliant. The ledger produces/serves signal; it never gates. `recordObservation` writes a
   row; `listIssues`/`playbook` read. No method blocks, throttles, kills, or constrains. All
   authority over what to do with an entry (ship a fix, promote to playbook `extracted`, advance
   graduation) is reserved for the human per spec §6/§8 and lands in later PRs.

5. **Interactions: shadow / double-fire / race with cleanup?**
   - DB isolation: a dedicated `framework-issue-ledger.db` under `server-data/`, separate from
     `token-ledger.db`, so there is no shadowing of TokenLedger or its BurnDetector reads.
   - Concurrency: WAL + `busy_timeout=5000` + a single SQLite transaction per write, with a
     `UNIQUE(issue_id, episode_key)` index as the race guard (a concurrent duplicate episode insert
     loses cleanly and is counted as already-recorded). Retention pruning runs inside the same txn.
   - Startup: instantiated in the same `stateDir` guard block as TokenLedger; its own try/catch
     means a ledger failure can't take down TokenLedger or server start.

6. **External surfaces: visible to other agents/users/systems? timing/runtime deps?**
   Two new read-only HTTP routes behind the standard Bearer middleware (verified by e2e: a
   bearer-less request gets 401). New agents get one Registry-First row in CLAUDE.md; existing
   agents get a migrator section (content-sniffed, idempotent). No Codex/Gemini shadow-marker
   (this is developer-layer observability, not an end-user capability; it is tracked as a
   legacyMigratorSection so the parity test stays green). No timing/conversation-state dependence.

7. **Rollback cost: if wrong in production, what's the back-out?**
   Low. The feature is dormant (no writer wired yet) and signal-only. Back-out = revert the PR;
   the `framework-issue-ledger.db` file is harmless read-only observability data that can be left
   on disk or deleted (nothing reads it except these routes). No data migration, no agent-state
   repair. The routes fail-soft to 503 if the ledger is unavailable, so even a construction
   failure degrades cleanly rather than breaking server start.
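
The episode-collapsing guard described under questions 2 and 5 can be modelled in memory. This sketch stands in a `Set` for the `UNIQUE(issue_id, episode_key)` index; the class name, field names, and return convention are hypothetical, and the real ledger does this inside a SQLite transaction rather than in process memory.

```typescript
// Hypothetical observation shape; only the two key columns from the review are modelled.
interface ObservationSketch {
  issueId: string;
  episodeKey: string;
}

class EpisodeLedgerSketch {
  // Stands in for the UNIQUE(issue_id, episode_key) index.
  private readonly episodes = new Set<string>();
  // Stands in for a materialized per-issue recurrence count.
  private readonly recurrence = new Map<string, number>();

  // Returns true when a new episode is recorded, false when it collapses into an existing one.
  recordObservation(obs: ObservationSketch): boolean {
    const key = JSON.stringify([obs.issueId, obs.episodeKey]);
    if (this.episodes.has(key)) {
      return false; // duplicate episode: treated as already recorded
    }
    this.episodes.add(key);
    this.recurrence.set(obs.issueId, (this.recurrence.get(obs.issueId) ?? 0) + 1);
    return true;
  }

  recurrenceOf(issueId: string): number {
    return this.recurrence.get(issueId) ?? 0;
  }
}
```

Because repeats within one episode never increment the count, a tight loop that emits the same observation many times inflates recurrence by at most one, which is the structural bound the review relies on when the rate heuristic does not trip.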

## Phase 5: second-pass

**Not required.** The Phase-5 trigger list is block/allow decisions, session lifecycle,
compaction, coherence/idempotency/trust gates, and anything named sentinel/guard/gate/watchdog.
This change is none of those: it is a read-only observability ledger with no blocking authority.
The decision-bearing components of the mentor system (which WILL trigger Phase 5) ship in §19.3–5.
The spec itself already passed a 5-iteration adversarial/security/scalability/integration/lessons
convergence before this build.

## Testing

All three tiers, shipped in this PR (no "routes now, migration later"):
- Tier 1 (unit): 22 tests covering CRUD, dedup false-merge resistance, episode collapsing,
  materialized recurrence, impactScore + decay, regression auto-suggest, enum +
  SQL-injection-literal guard, secret-scan redaction, retention pruning, playbook
  cross-framework semantics, and clampLimit.
- Tier 2 (integration): 9 tests covering the routes over the HTTP pipeline (503, 200, limit
  clamp, framework allowlist, invalid-enum 400, playbook).
- Tier 3 (e2e "feature is alive"): 5 tests covering a real AgentServer boot, DB auto-creation on
  the production init path, routes returning 200 rather than 503, a written observation surfacing
  end-to-end, and 401 without auth.
- Full push-config suite (vs JKHeadley/main): 3362 tests green, no regressions.
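
The route behaviors exercised by the Tier 2 tests (503 when the ledger is missing, 400 on an invalid enum, an empty list plus `knownFrameworks` for an unknown framework, and limit clamping) can be sketched as one pure function. Everything concrete here is an assumption for illustration: the allowlist entries, the status values, the limit bounds, and the reply shape are not taken from the package, and only `clampLimit` and `knownFrameworks` are names that appear in the review.

```typescript
// Illustrative values only; the real allowlist, enums, and limits are not shown in this review.
const KNOWN_FRAMEWORKS: readonly string[] = ["claude", "codex", "gemini"];
const STATUSES: readonly string[] = ["open", "fixed"];
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200;

interface IssueSketch { framework: string; status: string; title: string }
interface QuerySketch { framework?: string; status?: string; limit?: string }
interface ReplySketch {
  status: number;
  issues?: IssueSketch[];
  knownFrameworks?: readonly string[];
  error?: string;
}

// Non-numeric or non-positive input falls back to the default; large input is capped.
function clampLimit(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "", 10);
  if (Number.isNaN(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}

function listFrameworkIssues(ledger: IssueSketch[] | undefined, query: QuerySketch): ReplySketch {
  // Fail soft when the ledger could not be constructed.
  if (ledger === undefined) return { status: 503, error: "ledger unavailable" };
  // Invalid enum values are a client error.
  if (query.status !== undefined && !STATUSES.includes(query.status)) {
    return { status: 400, error: "invalid status" };
  }
  // Unknown frameworks are not an error: they yield an empty list plus the allowlist.
  const known = query.framework === undefined || KNOWN_FRAMEWORKS.includes(query.framework);
  const issues = known
    ? ledger
        .filter((i) => query.framework === undefined || i.framework === query.framework)
        .filter((i) => query.status === undefined || i.status === query.status)
        .slice(0, clampLimit(query.limit))
    : [];
  return { status: 200, issues, knownFrameworks: KNOWN_FRAMEWORKS };
}
```

Returning the allowlist alongside an empty result is what lets a caller who mistyped a framework name correct the request without reading server logs.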

## Post-push CI fix

CI shard 1/4 caught `capabilities-discoverability.test.ts`: every registered route prefix must be
classified in `src/server/CapabilityIndex.ts` (the test reads routes.ts via regex, not import, so
vitest's `--changed` graph didn't link it locally). Classified `/framework-issues` as a read-only
observability capability (mirrors the `tokens` entry, `enabled: !!ctx.frameworkIssueLedger`). No
behavioral change; it only surfaces the read-only routes in `/capabilities`. 322 capability tests green.
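
The discoverability rule behind that CI failure can be sketched as follows. Only the `/framework-issues` prefix and the `enabled: !!ctx.frameworkIssueLedger` expression come from the review; the entry shape, the context type, and both function names are hypothetical.

```typescript
// Hypothetical shapes for illustration.
interface RouteContextSketch {
  frameworkIssueLedger?: object | null;
}
interface CapabilityEntrySketch {
  prefix: string;
  readOnly: boolean;
  enabled: boolean;
}

// The capability is reported as enabled only when the ledger was constructed.
function classifyCapabilities(ctx: RouteContextSketch): CapabilityEntrySketch[] {
  return [
    { prefix: "/framework-issues", readOnly: true, enabled: !!ctx.frameworkIssueLedger },
  ];
}

// The check the CI test enforces, in spirit: every route prefix needs a classification.
function unclassifiedPrefixes(
  routePrefixes: string[],
  entries: CapabilityEntrySketch[],
): string[] {
  const classified = new Set(entries.map((e) => e.prefix));
  return routePrefixes.filter((p) => !classified.has(p));
}
```

A non-empty result from `unclassifiedPrefixes` corresponds to the failure CI reported before `/framework-issues` was classified.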