instar 1.2.58 → 1.2.60
- package/.claude/skills/autonomous/hooks/autonomous-stop-hook.sh +31 -0
- package/.claude/skills/autonomous/scripts/setup-autonomous.sh +25 -0
- package/dist/commands/server.d.ts.map +1 -1
- package/dist/commands/server.js +19 -0
- package/dist/commands/server.js.map +1 -1
- package/dist/core/PostUpdateMigrator.js +2 -2
- package/dist/core/PostUpdateMigrator.js.map +1 -1
- package/dist/monitoring/HumanAsDetectorLog.d.ts +123 -0
- package/dist/monitoring/HumanAsDetectorLog.d.ts.map +1 -0
- package/dist/monitoring/HumanAsDetectorLog.js +237 -0
- package/dist/monitoring/HumanAsDetectorLog.js.map +1 -0
- package/dist/server/CapabilityIndex.d.ts.map +1 -1
- package/dist/server/CapabilityIndex.js +3 -0
- package/dist/server/CapabilityIndex.js.map +1 -1
- package/dist/server/routes.d.ts.map +1 -1
- package/dist/server/routes.js +67 -0
- package/dist/server/routes.js.map +1 -1
- package/package.json +1 -1
- package/src/data/builtin-manifest.json +61 -61
- package/upgrades/1.2.59.md +67 -0
- package/upgrades/1.2.60.md +57 -0
- package/upgrades/side-effects/goal-native-delegation.md +76 -0
- package/upgrades/side-effects/human-as-detector.md +43 -0
@@ -0,0 +1,76 @@
# Side-Effects Review — native /goal delegation (Phase 2)

**Version / slug:** `goal-native-delegation`
**Date:** 2026-05-24
**Author:** echo
**Second-pass reviewer:** internal conformance pass

## Summary of the change

Where the framework has a native /goal loop (Claude Code >= 2.1.139), autonomous mode delegates
completion to it: instar **injects `/goal <condition>`** into the session via
`SessionManager.sendInput` (tmux send-keys — its existing session-input mechanism), marks the
job `goal_mode: native`, and the stop-hook **defers** the continue/stop decision to native
/goal (approves each turn). instar still enforces emergency-stop + duration by injecting
`/goal clear` first. Phase 2 of `docs/specs/goal-completion-evaluator.md`. `src/`: two routes
(`/autonomous/native-goal/set|clear`) + capability-index entry + migration marker bump. Non-src:
hook native branch, setup auto-detection.

## Decision-point inventory
- Stop-hook `goal_mode: native` branch — **modify**: defer completion to native /goal (approve);
  still enforce emergency-stop + duration (clear native first). This REMOVES instar's completion
  authority for native topics by design (native /goal is the authority there).
- `POST /autonomous/native-goal/set` / `clear` — **add**: inject the slash command + flip
  goal_mode. Thin; the side-effect is the session injection.
- `setup-autonomous.sh` native detection — **modify** (`.claude/`): activate native mode when
  Claude Code >= 2.1.139 + a condition is set.
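
The activation rule in the last bullet (a new-enough Claude Code plus a non-empty condition) can be sketched as follows. This is only an illustration: the real check lives in `setup-autonomous.sh`, and the helper names, the shape of the version string, and the fallback on an unparseable version are all assumptions rather than the shipped logic.

```typescript
// Hypothetical sketch of the native-mode gate; not taken from setup-autonomous.sh.
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Minimum Claude Code version with a native /goal loop, per the review text.
const NATIVE_GOAL_MIN: Version = { major: 2, minor: 1, patch: 139 };

// Pull the first x.y.z triple out of whatever version text is reported.
function parseVersion(text: string): Version | null {
  const m = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (!m) return null;
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// Numeric comparison, field by field (a string compare would rank "2.1.99" above "2.1.139").
function atLeast(have: Version, need: Version): boolean {
  if (have.major !== need.major) return have.major > need.major;
  if (have.minor !== need.minor) return have.minor > need.minor;
  return have.patch >= need.patch;
}

// Native mode needs both a new-enough version and a non-empty condition.
// Assumption: an unparseable version falls back to the instar evaluator.
function shouldActivateNative(reportedVersion: string, condition?: string): boolean {
  const have = parseVersion(reportedVersion);
  if (!have || !atLeast(have, NATIVE_GOAL_MIN)) return false;
  return condition !== undefined && condition.trim().length > 0;
}
```

Comparing the fields numerically is the one design point worth keeping from this sketch, since patch numbers here already exceed two digits.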

## 1. Over-block
- In native mode instar approves (never blocks) for completion, so instar cannot over-block.
  Native /goal's own hook decides. No new over-block path.

## 2. Under-block (false "done" / premature exit)
- instar approving in native mode does NOT cause a false done: in Claude Code's hook composition
  a `block` from native /goal wins over instar's `approve`, so native /goal keeps the session
  working until ITS evaluator confirms the condition. If native /goal somehow isn't active, the
  session would exit — mitigated because goal_mode:native is only set after a successful inject
  (the set endpoint flips the flag only when sendInput returns true).
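
The mitigation above (flip `goal_mode` only after a successful injection) might look roughly like this. The `InputSender` interface, the synchronous boolean return, and the `JobState` fields are assumptions made for the sketch; only `sendInput`, `/goal <condition>`, and `goal_mode: native` come from the review.

```typescript
// Assumed shape of the session-input dependency; the real SessionManager may differ.
interface InputSender {
  sendInput(sessionName: string, text: string): boolean;
}

// Hypothetical job state; an absent goal_mode means the instar evaluator path.
interface JobState {
  topic: string;
  goal_mode?: "native";
}

// Inject "/goal <condition>" and mark the job native only if the inject succeeded.
function setNativeGoal(
  sender: InputSender,
  sessionName: string,
  state: JobState,
  condition: string,
): JobState {
  const injected = sender.sendInput(sessionName, `/goal ${condition}`);
  if (!injected) {
    // Leave the state untouched so the stop-hook keeps using instar's own evaluator.
    return state;
  }
  return { ...state, goal_mode: "native" };
}
```

Ordering the flag flip after the inject means a failed send-keys leaves the job on the existing evaluator path instead of approving every turn with nothing enforcing completion.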

## 3. Level-of-abstraction fit
- Correct + the point of the change: drive the framework's native feature via instar's own
  session-input mechanism, rather than reimplementing or treating "no /goal API" as a blocker.

## 4. Blocking authority
- [x] In native mode instar **yields** completion authority to native /goal (reduces instar's
  authority — safe direction) while retaining its terminal STOP concerns (emergency/duration) by
  clearing the native goal. No new brittle authority added.

## 5. Interactions (the key one: two Stop hooks)
- instar's hook + native /goal's hook both fire each turn. Resolved by composition: instar
  approves (completion) so native /goal's block keeps control; instar only force-stops on
  emergency/duration, and does so by clearing native /goal first (so they don't fight).
- **Emergency-stop** already kills the session via the sentinel path (native /goal dies with it);
  the hook also clears native /goal on the flag. **Duration** clears native /goal then exits.
- Falls back cleanly to the instar evaluator (Phase 1) when native /goal is absent.
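
The two-hook interaction can be modelled in a few lines. This is a simplified sketch, not the contents of `autonomous-stop-hook.sh`: the field names are hypothetical, and `composeStopHooks` only encodes the rule stated in this review (a `block` wins over an `approve`).

```typescript
type StopDecision = "approve" | "block";

// Hypothetical per-turn inputs to instar's hook in native mode.
interface NativeTurn {
  emergencyStop: boolean;
  elapsedMs: number;
  maxDurationMs: number;
}

interface InstarHookResult {
  decision: StopDecision; // instar never blocks for completion in native mode
  inject: string[];       // slash commands to type into the session first
  forceStop: boolean;     // true only for emergency-stop or duration
}

// instar's side: always approve; on a terminal STOP concern, clear native /goal first.
function instarNativeStopHook(turn: NativeTurn): InstarHookResult {
  const mustStop = turn.emergencyStop || turn.elapsedMs >= turn.maxDurationMs;
  if (mustStop) {
    return { decision: "approve", inject: ["/goal clear"], forceStop: true };
  }
  return { decision: "approve", inject: [], forceStop: false };
}

// Composition rule as described in the review: any "block" keeps the session working.
function composeStopHooks(decisions: StopDecision[]): StopDecision {
  return decisions.some((d) => d === "block") ? "block" : "approve";
}
```

On an ordinary turn instar's `approve` composes with native /goal's `block` into `block`, so the session keeps working. After `/goal clear` there is no native block left, so instar's exit goes through.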

## 6. External surfaces
- **Session injection:** instar types `/goal <condition>` / `/goal clear` into the agent's own
  tmux session (send-keys). This is instar's established mechanism (initial-message injection).
  No new external/credential surface.
- **HTTP:** two authed routes under the already-claimed `/autonomous` prefix.

## 7. Rollback cost
- Low. Reverting restores instar's own evaluator everywhere (Phase 1 still in main). A
  `goal_mode: native` left in a state file is ignored by an older hook (falls to the
  evaluator/promise path). Migration marker is content-sniffed (rollback re-deploys cleanly).

## 8. Test evidence
- Hook: native defers (approve/exit, retained) + emergency/duration clear+exit. Integration:
  set injects `/goal <cond>` (verified sendInput) + flips goal_mode; clear injects `/goal clear`;
  404 unknown topic. tsc clean; 174 affected tests green.

## Deviation from the original spec
Spec Phase 2 sketched a `ThreadGoalSlot` provider primitive. Shipped instead via direct
slash-command injection through `SessionManager.sendInput` — simpler and the correct use of an
existing instar capability (per maintainer direction: "we already input text into sessions; use
that to call /goal"). Same intent, better mechanism; `ThreadGoalSlot` left unimplemented (not needed).
@@ -0,0 +1,43 @@
# Side-Effects Review — HumanAsDetectorLog

**Change**: Ports Dawn's human-as-detector pattern into Instar. Treats every human-caught
coherence break as a first-class signal about which automated layer failed to catch it.

## Files
- `src/monitoring/HumanAsDetectorLog.ts` (new) — singleton, deterministic no-LLM classifier,
  append-only `.instar/metrics/human-as-detector.jsonl`, `summarizeByLayer()` heat map, plus
  the `observeInboundMessage()` gating helper (inbound-human-only).
- `tests/unit/HumanAsDetectorLog.test.ts` (new) — 19 unit tests (classifier/observe/heat-map +
  gating-helper wiring integrity).
- `tests/integration/human-as-detector-routes.test.ts` (new) — 3 tests via real `createRoutes`.
- `tests/e2e/human-as-detector-lifecycle.test.ts` (new) — 2 tests, live HTTP boot + disk.
- `src/server/routes.ts` — adds read-only `GET /human-as-detector/summary` (singleton-backed).
- `src/commands/server.ts` — configure() at startup; `observeInboundMessage()` chained onto
  `telegram.onMessageLogged` (chains prior callbacks; only inbound human messages).
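
The wiring in the last bullet (chain onto an existing `onMessageLogged` callback, observe only inbound human messages) can be sketched like this. The `LoggedMessage` fields and the `chainObserver` helper are hypothetical; the actual message shape and the ~6 wiring lines in `server.ts` are not shown in this diff.

```typescript
// Hypothetical message shape; the real Telegram log entry may carry different fields.
interface LoggedMessage {
  text: string;
  inbound: boolean;
  fromHuman: boolean;
}

type MessageLoggedCallback = (msg: LoggedMessage) => void;

// Anything exposing a single replaceable callback slot, like the adapter described above.
interface MessageSource {
  onMessageLogged?: MessageLoggedCallback;
}

// Wrap the existing callback instead of overwriting it, then observe as a side channel.
function chainObserver(source: MessageSource, observe: (text: string) => void): void {
  const prior = source.onMessageLogged;
  source.onMessageLogged = (msg) => {
    prior?.(msg); // earlier callbacks keep running unchanged
    if (!msg.inbound || !msg.fromHuman) return; // gate: inbound human messages only
    try {
      observe(msg.text);
    } catch {
      // Observation is best-effort and must never break message handling.
    }
  };
}
```

Capturing `prior` before reassigning the slot is what preserves the callbacks that were registered earlier.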

## Side effects
- **New disk write**: appends to `.instar/metrics/human-as-detector.jsonl` only when an
  inbound human message matches a correction signal. Best-effort; wrapped in try/catch; never
  throws into message handling.
- **Console**: one `[HUMAN-AS-DETECTOR]` warn line per detected signal (mirrors
  DegradationReporter's loud-not-silent convention).
- **No network, no LLM, no external calls.** Classifier is pure regex over a conservative set.
- **No behavior change to message handling**: the hook only observes; prior `onMessageLogged`
  callbacks (TopicMemory dual-write, PresenceProxy, keep-watching) are preserved via chaining.
- **New endpoint** is read-only and singleton-backed (always available; no 503 path).
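
A minimal sketch of the best-effort write described in the first two bullets. The record fields are invented for illustration, and the sink is passed in as a function so the example runs without touching disk; in instar the sink would be the append to `.instar/metrics/human-as-detector.jsonl`.

```typescript
// Hypothetical record shape; the real JSONL schema is not shown in this diff.
interface DetectorRecord {
  ts: string;
  text: string;
  totalWeight: number;
}

// Where one serialized line goes (a file append in practice, an array in a test).
type LineSink = (line: string) => void;

// Append one JSON line and emit one warn line; swallow any failure.
// Returns whether the record was written, so callers can count drops if they care.
function recordBestEffort(
  sink: LineSink,
  record: DetectorRecord,
  warn: (message: string) => void = console.warn,
): boolean {
  try {
    sink(JSON.stringify(record) + "\n");
    warn(`[HUMAN-AS-DETECTOR] correction signal (weight ${record.totalWeight})`);
    return true;
  } catch {
    // Never propagate into message handling.
    return false;
  }
}
```

One object per line keeps the file append-only and lets a reader process it line by line.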

## Risk
- Low. Additive, isolated module. Worst case on a logic bug: a spurious JSONL line or a missed
  signal — neither affects message delivery (observe never throws into the caller).
- False-positive risk on the classifier is bounded by the `totalWeight >= 2` threshold (lone
  weak signals like "actually," are ignored).
- Rollback cost: trivial — drop the module, the endpoint, and the ~6 wiring lines; no schema,
  no migration, no config default.
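
The threshold in the second bullet can be illustrated with a small weighted-regex classifier. The patterns and weights below are made up for the example and are not the module's actual signal set; only the `totalWeight >= 2` cutoff and the "actually," example come from the review.

```typescript
// Illustrative signal table; the real pattern set in HumanAsDetectorLog is not shown here.
interface Signal {
  pattern: RegExp;
  weight: number;
}

const SIGNALS: Signal[] = [
  { pattern: /\bactually\b/i, weight: 1 },                          // weak on its own
  { pattern: /\bagain\b/i, weight: 1 },                             // weak on its own
  { pattern: /\bthat'?s (wrong|incorrect|not right)\b/i, weight: 2 },
  { pattern: /\byou (forgot|missed)\b/i, weight: 2 },
];

const CORRECTION_THRESHOLD = 2;

// Sum the weights of every matching pattern and compare against the threshold.
function classify(text: string): { totalWeight: number; isCorrection: boolean } {
  let totalWeight = 0;
  for (const signal of SIGNALS) {
    if (signal.pattern.test(text)) totalWeight += signal.weight;
  }
  return { totalWeight, isCorrection: totalWeight >= CORRECTION_THRESHOLD };
}
```

With these weights a lone "actually," scores 1 and is ignored, while two weak signals in one message, or a single strong one, reach the threshold.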

## Signal vs authority
- Pure signal. The log only *records* and *summarizes*; it has no blocking authority and gates
  nothing. Consumers (a human reading the heat map, or future evolution tooling) decide.

## Verification
- `npx tsc --noEmit` — clean (no new errors).
- `npx vitest run` on the three new test files — 24/24 pass across unit + integration + e2e.