instar 1.2.72 → 1.2.74
- package/dist/commands/server.d.ts.map +1 -1
- package/dist/commands/server.js +6 -2
- package/dist/commands/server.js.map +1 -1
- package/dist/core/MessagingToneGate.d.ts +2 -2
- package/dist/core/MessagingToneGate.d.ts.map +1 -1
- package/dist/core/MessagingToneGate.js +32 -1
- package/dist/core/MessagingToneGate.js.map +1 -1
- package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
- package/dist/core/PostUpdateMigrator.js +24 -4
- package/dist/core/PostUpdateMigrator.js.map +1 -1
- package/dist/memory/WorkingMemoryAssembler.d.ts +26 -0
- package/dist/memory/WorkingMemoryAssembler.d.ts.map +1 -1
- package/dist/memory/WorkingMemoryAssembler.js +56 -0
- package/dist/memory/WorkingMemoryAssembler.js.map +1 -1
- package/dist/memory/WorkingSet.d.ts +53 -0
- package/dist/memory/WorkingSet.d.ts.map +1 -0
- package/dist/memory/WorkingSet.js +140 -0
- package/dist/memory/WorkingSet.js.map +1 -0
- package/package.json +1 -1
- package/src/data/builtin-manifest.json +17 -17
- package/upgrades/1.2.73.md +34 -0
- package/upgrades/1.2.74.md +62 -0
- package/upgrades/side-effects/cwa-unify-stores.md +83 -0
- package/upgrades/side-effects/never-a-false-blocker-standard.md +54 -0
@@ -0,0 +1,62 @@
|
# Upgrade Guide — one ranked working set across the memory stores
|
<!-- bump: minor -->
<!-- minor = new features, new APIs, new capabilities (backwards-compatible) -->
|
## What Changed
|
**The memory stores now feed one ranked reading list instead of three separate ones.**
|
Topic-intent (conversation facts + task frame), Playbook (tagged context items),
and semantic/episodic memory each had their own read path and ranking. This release
unifies the **reading**: the existing working-memory assembler, already a
token-budgeted multi-source context builder, now also draws from topic-intent and
Playbook, and it ranks everything together by relevance blended with a recency-decay
factor. Fresher, more relevant context floats up; stale context sinks but is not
deleted, and it re-warms when it is referenced again.
|
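A minimal TypeScript sketch of blending relevance with a recency-decay factor. It assumes each item carries a relevance in [0, 1] and a last-referenced timestamp, and it assumes exponential decay with a hypothetical 7-day half-life. The function names mirror the `blendedScore`/`rankWorkingSet` exports listed in the side-effects review, but the field names, signatures, and decay curve here are illustrative, not the package's actual implementation. Stale items only sink in the ordering and nothing is deleted; resetting the timestamp on reference is what re-warms them.

```typescript
// Hypothetical item shape; the real WorkingSetItem fields may differ.
interface WorkingSetItem {
  source: "topic-intent" | "playbook" | "memory";
  text: string;
  relevance: number; // assumed normalized to [0, 1]
  lastReferencedAt: number; // epoch ms of the last reference (assumed)
}

// Assumed half-life for the decay; not a documented default.
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000;

// Exponential decay: 1.0 when just referenced, 0.5 after one half-life.
function recencyFactor(lastReferencedAt: number, now: number, halfLifeMs: number = HALF_LIFE_MS): number {
  const age = Math.max(0, now - lastReferencedAt);
  return Math.pow(0.5, age / halfLifeMs);
}

// Relevance blended with recency: fresh and relevant items score highest.
function blendedScore(item: WorkingSetItem, now: number): number {
  return item.relevance * recencyFactor(item.lastReferencedAt, now);
}

// Pure sort, highest blended score first; stale items sink but are never dropped.
function rankWorkingSet(items: WorkingSetItem[], now: number): WorkingSetItem[] {
  return [...items].sort((a, b) => blendedScore(b, now) - blendedScore(a, now));
}

// Re-warming: a new reference resets the recency anchor.
function touch(item: WorkingSetItem, now: number): WorkingSetItem {
  return { ...item, lastReferencedAt: now };
}
```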
The key design choice (ratified): this unifies the **reading, not the
storage**. The three stores keep their own backends and write paths; the assembler
is the single unified read path. That makes the change additive and reversible, and
it is guarded by a regression pin: **when the new sources are empty, the assembled
context is byte-for-byte identical to what it was before.** Playbook is read straight
from its manifest files (no Python is invoked in the hot assembly path), and every
new source is degrade-safe: a missing or erroring source contributes nothing and
never breaks assembly.
|
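A sketch of the two guarantees above, with synchronous sources for brevity. The first piece is a wrapper that turns a missing or throwing source into an empty contribution. The second is a section appender that returns its input unchanged when there is nothing to add, which is the property the regression pin checks. `degradeSafe`, `appendWorkingSetSection`, and the `## Working Set` header format are hypothetical.

```typescript
// Hypothetical minimal item shape for this sketch.
interface SourceItem {
  text: string;
}

// A missing source (undefined) or one that throws contributes nothing.
function degradeSafe(read?: () => SourceItem[]): SourceItem[] {
  if (!read) return [];
  try {
    const items = read();
    return Array.isArray(items) ? items : [];
  } catch {
    return [];
  }
}

// Returns `base` untouched when there is nothing to add, so the output is
// byte-for-byte identical to the pre-change assembler in that case.
function appendWorkingSetSection(base: string, items: SourceItem[]): string {
  if (items.length === 0) return base;
  const body = items.map((item) => `- ${item.text}`).join("\n");
  return `${base}\n\n## Working Set\n${body}`;
}
```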
The result shows up as a "Working Set" section in the assembled context
(`GET /session/context/:topicId`), drawing from all sources under one budget, with
per-source contribution visible in the response's `sources`.
|
**Evidence**: 11 new tests. Eight are unit tests (blended ranking, both source
adapters, degrade-safety, and the regression pin). Three are boot-path route tests
confirming that the unified read path is live, that it surfaces topic-intent in a
Working Set section, and that a ref-less topic is unchanged. The existing 49
assembler + working-memory tests stay green, which confirms the regression pin.
`tsc` + lint clean.
|
Spec: `docs/specs/cwa-unify-stores.md` (approved; Claude-authored + manual review.
This is the most architectural rung, so a fuller multi-model review is advisable;
that caveat was ratified). ELI16: `docs/specs/cwa-unify-stores.eli16.md`.
Side-effects: `upgrades/side-effects/cwa-unify-stores.md`.
|
## What to Tell Your User
|
- **One sorted view of what matters**: "I used to keep 'what's relevant right now'
in three separate notebooks that didn't talk to each other. Now they feed one
ranked reading list, so when we pick something up I get a single sorted view —
freshest and most-relevant first — instead of three half-answers."
|
## Summary of New Capabilities
|
| Capability | How to Use |
|-----------|-----------|
| Unified working set in assembled context | Automatic — `GET /session/context/:topicId` now includes a "Working Set" section drawing from topic-intent + Playbook + memory |
| Blended relevance×recency ranking across sources | Automatic (the assembler ranks all sources together) |
|
## Evidence
|
This is not a bug fix but a new capability on top of the existing assembler. It is
verified by 11 new tests, including 3 that boot the real AgentServer and confirm
that the assembled-context route surfaces a topic-intent ref in a "Working Set"
section, plus a regression pin (a dedicated unit test and the 49 existing,
unchanged tests) showing that the assembled output is identical when the new
sources are empty. `tsc` + lint clean.
|
@@ -0,0 +1,83 @@
|
# Side-effects review — unify the working-awareness stores (rung 2)
|
**Scope**: Give the working-awareness stores one ranked read path. Per the
ratified spec (`docs/specs/cwa-unify-stores.md`), this unifies the READ: it extends
the existing `WorkingMemoryAssembler` to draw topic-intent refs + Playbook
manifest items into one ranked working-set section. It is NOT a physical store
migration. Additive, reversible, regression-pinned.
|
**Files touched**:
- `src/memory/WorkingSet.ts` — NEW. The `WorkingSetItem` lingua franca +
`blendedScore`/`rankWorkingSet` (relevance × recency-decay) + two read-only,
degrade-safe source adapters: `topicIntentToWorkingSet` (refs at/above tentative
→ items; relevance = confidence, recency = lastReinforcedAt) and
`playbookManifestToWorkingSet` (scans `{stateDir}/playbook/**.json` manifests,
trigger/tag-gated by the query, relevance = match + usefulness, recency =
freshness — **never invokes the Python scripts**).
- `src/memory/WorkingMemoryAssembler.ts` — optional `topicIntentStore` + `stateDir`
config; `workingSet` token budget (default 500); a new "Working Set" section
appended AFTER the existing knowledge/episodes/relationships sections, gated on
the new deps + content; `assembleWorkingSet`/`renderWorkingSet`; section header.
- `src/commands/server.ts` — pass `topicIntentStore` + `stateDir` to the assembler
and broaden its construction gate to include `topicIntentStore` (so the unified
read path is available even in minimal setups).
|
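The following sketch illustrates the topic-intent adapter described in the first bullet. It assumes confidence is a number in [0, 1] and that "tentative" corresponds to a hypothetical threshold of 0.5. The `TopicIntentRef` shape and the threshold are assumptions. Only the mapping comes from this review: relevance = confidence, recency = lastReinforcedAt, and refs below tentative are dropped.

```typescript
// Hypothetical topic-intent ref shape; the store's real fields may differ.
interface TopicIntentRef {
  text: string;
  confidence: number; // assumed numeric in [0, 1]
  lastReinforcedAt: number; // epoch ms
}

// Hypothetical working-set item, narrowed to this one source.
interface WorkingSetItem {
  source: "topic-intent";
  text: string;
  relevance: number;
  lastReferencedAt: number;
}

// Assumed numeric cut-off standing in for the "tentative" confidence level.
const TENTATIVE = 0.5;

// Read-only mapping; an absent store (undefined) contributes nothing.
function topicIntentToWorkingSet(refs: TopicIntentRef[] | undefined): WorkingSetItem[] {
  if (!refs) return [];
  return refs
    .filter((ref) => ref.confidence >= TENTATIVE) // keep refs at/above tentative
    .map((ref): WorkingSetItem => ({
      source: "topic-intent",
      text: ref.text,
      relevance: ref.confidence, // relevance = confidence
      lastReferencedAt: ref.lastReinforcedAt, // recency = lastReinforcedAt
    }));
}
```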
**Under-block**: None. The new section is purely additive context; it gates
nothing. Each new source is read-only and wrapped so that an error or an absent
source contributes nothing.
|
**Over-block**: None. The change carries no blocking authority: the assembler
informs context; it doesn't decide.
|
**Level-of-abstraction fit**: The stores keep their backends and write paths; only
the READ is unified, via the assembler that already does token-budgeted
multi-source assembly. The new sources speak the same `WorkingSetItem` shape and
flow through the same budget + render machinery. Playbook is read at the manifest
level (a stable JSON file), not through its Python CLI. That is the right seam for a
fast, degrade-safe assembler.
|
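This sketch reads Playbook at the manifest level with Node's `fs`. It assumes each manifest is a JSON file with an `items` array whose entries carry `triggers`, `tags`, `usefulness`, and `updatedAt`; all of these field names are hypothetical. The sketch shows a bounded, degrade-safe scan (a depth limit and per-file `try`/`catch`) and the trigger/tag gate, where an empty query yields nothing. Averaging match and usefulness into one relevance score is an assumed way to keep scores in [0, 1]. In the assembler, a call like this would presumably look like `findManifests(path.join(stateDir, "playbook"))`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical manifest entry; real Playbook manifests may use other fields.
interface PlaybookEntry {
  text: string;
  triggers?: string[];
  tags?: string[];
  usefulness?: number; // assumed in [0, 1]
  updatedAt?: number; // epoch ms
}

interface PlaybookItem {
  text: string;
  relevance: number;
  lastReferencedAt: number;
}

// Depth-limited walk; a missing or unreadable directory contributes nothing.
function findManifests(dir: string, maxDepth: number = 3): string[] {
  if (maxDepth < 0) return [];
  try {
    const out: string[] = [];
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) out.push(...findManifests(full, maxDepth - 1));
      else if (entry.isFile() && entry.name.endsWith(".json")) out.push(full);
    }
    return out;
  } catch {
    return [];
  }
}

// Per-file try/catch: a malformed or unreadable manifest is skipped, never thrown.
function readEntries(file: string): PlaybookEntry[] {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed?.items) ? parsed.items : [];
  } catch {
    return [];
  }
}

// Trigger/tag gate: only entries whose triggers or tags appear in the query surface.
function playbookItems(entries: PlaybookEntry[], query: string): PlaybookItem[] {
  const words = new Set(query.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
  if (words.size === 0) return []; // empty query: no Playbook items
  const items: PlaybookItem[] = [];
  for (const entry of entries) {
    const keys = [...(entry.triggers ?? []), ...(entry.tags ?? [])].map((k) => k.toLowerCase());
    const hits = keys.filter((k) => words.has(k)).length;
    if (hits === 0) continue;
    const match = hits / keys.length;
    items.push({
      text: entry.text,
      relevance: (match + (entry.usefulness ?? 0)) / 2, // assumed blend, kept in [0, 1]
      lastReferencedAt: entry.updatedAt ?? 0,
    });
  }
  return items;
}
```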
**Signal vs authority**: N/A — pure read/ranking.
|
**Interactions**:
- **REGRESSION PIN (load-bearing):** the working-set section is appended after the
existing three sections, and only when `topicIntentStore || stateDir` is configured
AND a source returns content. With the new deps absent OR their sources empty, the
assembled output is byte-for-byte unchanged. This is verified by (a) a dedicated
unit test; (b) the existing 26 assembler unit tests, 9 working-memory route tests,
and the assembler-context route tests, all still green; and (c) a route test
asserting that a ref-less topic yields no working-set section.
- The assembler construction gate is broadened to include `topicIntentStore` (always
present), so the assembler is now constructed in more setups. This is additive:
callers in minimal setups gain a working-set context they didn't have before, and
existing setups' output is unchanged (per the pin).
- The new section draws only from the REMAINING token budget after the existing
sources, so it cannot starve them.
- The Playbook manifest scan is bounded (depth-limited, per-file try/catch) and
trigger-gated (empty query → no Playbook items), so it can't flood or slow
assembly.
|
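The remaining-budget rule can be sketched as a greedy fill over already-ranked items. The sketch assumes a rough 4-characters-per-token estimate. It also assumes the section is capped by both the leftover total budget and its own `workingSet` budget (500 by default per this review). `fillWorkingSet` and `estimateTokens` are hypothetical helpers, not the assembler's API.

```typescript
interface RankedItem {
  text: string;
  score: number;
}

// Rough estimate (~4 characters per token); an assumption, not the assembler's estimator.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk items in rank order and keep those that fit in what the existing sections
// left over, capped by the section's own budget. Existing sections are never touched.
function fillWorkingSet(
  ranked: RankedItem[],
  totalBudget: number,
  usedByExisting: number,
  workingSetBudget: number = 500,
): RankedItem[] {
  let remaining = Math.min(workingSetBudget, Math.max(0, totalBudget - usedByExisting));
  const picked: RankedItem[] = [];
  for (const item of ranked) {
    const cost = estimateTokens(item.text);
    if (cost > remaining) continue; // too big; a smaller, lower-ranked item may still fit
    picked.push(item);
    remaining -= cost;
  }
  return picked;
}
```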
**External surfaces**: New module `src/memory/WorkingSet.ts` (exports
`WorkingSetItem`, `blendedScore`, `rankWorkingSet`, the two adapters). New optional
assembler config fields + a `workingSet` budget. The assembled-context HTTP routes
now MAY include a "Working Set" section. No new endpoint, no config-shape change
for users (the assembler deps are wired server-side).
|
**Deferred (tracked)**: deeper cross-source blended re-ranking of the existing
sources (`cwa-physical-store-merge` was rejected; cross-blending is implicit future
work), the Usher / mid-task re-surfacing (`cwa-usher`), and capability + standards
descriptors as sources (`cwa-capability-index-context`). All are listed in the
spec's non-goals.
|
**Rollback cost**: Low. Drop the working-set section + the two adapter calls (or
just don't pass `topicIntentStore`/`stateDir`); the assembler returns to its
original three sources. No data migration. The regression pin guarantees the
revert is a no-op for existing output.
|
**Migration parity**: Additive assembler sources + budget default + the broadened
construction gate — all server-side (every agent gets it on update). No store
schema change, no hook/template/skill change, no user config change required.
|
**Convergence honesty**: Claude-authored + manual review; full multi-model
convergence tooling was absent on the host. This is the most architectural rung (it
touches the shared assembler), so the regression pin is the primary safeguard and a
fuller multi-model review remains advisable. Even so, the pin and the unchanged
existing suites bound the risk tightly.
|
@@ -0,0 +1,54 @@
|
# Side-Effects Review — Never a False Blocker (B17_FALSE_BLOCKER)
|
**Slug:** `never-a-false-blocker-standard`
**Date:** 2026-05-24
**Author:** echo
**Second-pass reviewer:** internal adversarial convergence (two reviewers) + real-LLM test-as-self
|
## Summary of the change
|
Adds the constitution standard "Never a False Blocker" to `docs/STANDARDS-REGISTRY.md`, together with its structural enforcement: a new always-evaluated rule **B17_FALSE_BLOCKER** in `MessagingToneGate` (the outbound-message authority that hosts B15/B16). B17 holds an outbound message that defers a doable task to a person ("needs a human / I can't / second opinion / reverse-engineering") when the message names no genuinely human-only item and shows no inventory of the agent's own means (computer use, terminal, send-keys, MCP). The `deferral-detector` PreToolUse hook is extended (signal-only) to prime the inventory checklist for the new excuse shapes. The standard is also registered in `docs/INSTAR-DESIGN-PRINCIPLES-AND-LESSONS.md` (P12). B17 is the sibling of B16: B16 covers feasibility-surrender, B17 covers human-deference.
|
## Decision-point inventory
|
- `VALID_RULES` set — **add** `'B17_FALSE_BLOCKER'`. Without this, the gate's drift-detection would fail open on a legitimate B17 citation (verified: a real-LLM B17 citation is accepted, `failedOpen=false`).
- `buildPrompt()` rule section — **add** the B17 definition after B16 (always-evaluated, no precondition), including the B16/B17 de-confliction + straddle handling + citation precedence (B15>B16>B17) + the UI-interaction clarification + a worked block example.
- Response-format enumeration + two doc comments (`B1..B16`→`B1..B17`) — **modify**.
- `deferral-detector` template (`PostUpdateMigrator.getDeferralDetectorHook`) + the deployed copy — **add** `needs_human_to` / `needs_reverse_engineering` patterns and a guarded `wants_second_opinion` (suppressed when a model/agent is named, so self-fetched cross-model review is not flagged). Checklist text updated to name the agent's own means + the tiny human-only set.
- No route changes: `checkOutboundMessage` → 422 is rule-agnostic; B17 rides the existing outbound paths.
|
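The drift-detection behavior in the first bullet can be sketched as follows, assuming the LLM returns a verdict with an optional cited rule id. A block that cites an id outside `VALID_RULES` is treated as drift and fails open, which is why B17 has to be registered before a legitimate B17 citation can take effect. The `GateVerdict`/`Decision` shapes and the `B15`/`B16` placeholder ids are hypothetical; only `B17_FALSE_BLOCKER` comes from this review.

```typescript
// Hypothetical verdict shape parsed from the gate's single LLM call.
interface GateVerdict {
  block: boolean;
  rule?: string;
}

interface Decision {
  allowed: boolean;
  rule?: string;
  failedOpen: boolean;
}

// "B15"/"B16" are placeholders; only B17_FALSE_BLOCKER is named in this review.
const VALID_RULES: ReadonlySet<string> = new Set(["B15", "B16", "B17_FALSE_BLOCKER"]);

// An unknown rule id is treated as model drift: fail open (allow) rather than
// trust a block the gate cannot attribute to a registered rule.
function applyVerdict(verdict: GateVerdict, validRules: ReadonlySet<string> = VALID_RULES): Decision {
  if (!verdict.block) return { allowed: true, failedOpen: false };
  if (!verdict.rule || !validRules.has(verdict.rule)) return { allowed: true, failedOpen: true };
  return { allowed: false, rule: verdict.rule, failedOpen: false };
}
```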
## 1. Over-block
|
Principal risk: blocking legitimate escalations. This is mitigated: severity favors false negatives, and the allowlist explicitly passes a password/secret only the user holds, CAPTCHA, legal/billing/payment authorization, **required approvals** (side-effects/policy-gated), **account/access grants**, **external rate-limit/quota waits**, genuine value judgments, deferrals after a named-outcome inventory, self-fetched cross-model review, and rule discussion. Real-LLM test-as-self confirmed that password escalation, value judgment, and required approval all PASS while the founding false blocker BLOCKS, so no false positive was introduced by the precision-tightening.
|
## 2. Under-block (a real false blocker slipping through)
|
Two known holes, both accepted by design:
- The gate sees only message text, so a **fabricated inventory** ("I tried everything, your call") can pass. This is the same limit as B16 and is stated honestly in the rule. It is mitigated by requiring *named outcomes* (not bare tool names); the hollow-inventory case is covered by a unit assertion.
- Borderline misses are acceptable given the false-negative-favoring posture. Test-as-self initially caught the founding case passing, and the prompt was tightened (UI-interaction clarification + worked example) until real Haiku blocked it.
|
## 3. Level-of-abstraction fit
|
Correct: the block authority lives inside the single outbound authority (where B15/B16 live), not in the detector. The `deferral-detector` extension is signal-only (injects `additionalContext`, never blocks). Signal-vs-authority compliant.
|
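A signal-only detector in the spirit described above might look like the following sketch. It returns optional `additionalContext` and has no way to express a block. The regular expressions, the model-name guard list, and the simplified `HookOutput` shape are assumptions for illustration. The real hook's patterns and JSON envelope are not shown in this review.

```typescript
// Hypothetical patterns; the real hook's regular expressions are not shown here.
const PATTERNS: Record<string, RegExp> = {
  needs_human_to: /\bneeds? (a )?human to\b/i,
  needs_reverse_engineering: /\brequires? reverse[- ]engineering\b/i,
};
const SECOND_OPINION = /\bsecond opinion\b/i;
// Guard: naming a model or agent suggests a self-fetched cross-model review.
const NAMES_MODEL = /\b(gpt|gemini|codex|claude|haiku|sonnet|opus)\b/i;

// Simplified output: context to inject, never a block decision.
interface HookOutput {
  additionalContext?: string;
}

function deferralSignal(text: string): HookOutput {
  const hits = Object.entries(PATTERNS)
    .filter(([, re]) => re.test(text))
    .map(([name]) => name);
  // Guarded pattern: only flag a second-opinion request when no model/agent is named.
  if (SECOND_OPINION.test(text) && !NAMES_MODEL.test(text)) hits.push("wants_second_opinion");
  if (hits.length === 0) return {};
  return {
    additionalContext:
      `Possible deferral (${hits.join(", ")}). Before handing this to a person, ` +
      "inventory your own means (computer use, terminal, send-keys, MCP); only " +
      "genuinely human-only items justify deferring.",
  };
}
```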
## 4. Blocking authority
|
No new brittle authority. B17 is one more rule the existing authority may cite; the 422 plumbing and fail-open behavior are inherited unchanged.
|
## 5. Interactions
|
B17 is always evaluated alongside B15/B16 in one LLM call: no extra calls, only a marginally longer prompt. It is de-conflicted from B16 (missing mechanism → B16; person required → B17; the straddle → B17), with explicit citation precedence B15>B16>B17 so telemetry is deterministic. Drift-detection is unaffected (an invented rule id still fails open; a regression test is included). The detector's orphan-TODO patterns are preserved: the regenerated deployed copy carries them, so migration does not regress that earlier improvement.
|
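The citation precedence is stated in the prompt, but the same ordering can be expressed as a deterministic function, for example in a test that checks telemetry attribution. This sketch assumes the prompt has already classified the B16/B17 straddle as B17 alone. It uses short placeholder ids rather than the gate's real rule identifiers, and `primaryCitation` is a hypothetical helper.

```typescript
// Placeholder ids standing in for the gate's real rule identifiers.
type RuleId = "B15" | "B16" | "B17";

// Highest precedence first: B15 > B16 > B17.
const PRECEDENCE: readonly RuleId[] = ["B15", "B16", "B17"];

// Report exactly one rule per block so telemetry attribution is deterministic.
function primaryCitation(applicable: readonly RuleId[]): RuleId | undefined {
  return PRECEDENCE.find((rule) => applicable.indexOf(rule) !== -1);
}
```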
## 6. External surfaces
|
None. No new endpoints, credentials, or network calls.
|
## 7. Rollback cost
|
Low. Reverting removes the rule from the set + prompt, the detector patterns, and the doc entries; no state, no migration, no schema. An older server simply lacks the rule.
|
## 8. Test evidence
|
- Unit (`messaging-tone-gate-b17.test.ts`, 13 tests) + integration (`telegram-reply-b17-false-blocker.test.ts`, 2 tests) green; tsc clean; smoke suite (62 files / 2371 tests) green.
- Detector behaviorally exercised: false-blocker and reverse-engineering payloads flag; self-fetched cross-model review and clean status messages do not.
- **Real-LLM test-as-self** (real `ClaudeCliIntelligenceProvider` → Haiku, in-process against the built rule, production server untouched): founding codex-trust message + the fused straddle both BLOCK with B17; password escalation, value judgment, required approval, self-fetched second opinion, and post-inventory deferral all PASS.