@forwardimpact/libeval 0.1.45 → 0.1.47

package/README.md CHANGED
@@ -7,82 +7,57 @@ reproducible evidence.
7
7
 
8
8
  <!-- END:description -->
9
9
 
10
- `libeval` provides the runtime and tool surface for multi-LLM
11
- coordination: an agent talks to a supervisor, a facilitator chairs a
12
- team meeting, or a lead drives an asynchronous discussion across a
13
- human channel. Every conversation produces a structured NDJSON trace
14
- for analysis.
10
+ `libeval` provides the runtime and tool surface for multi-LLM coordination —
11
+ an agent talks to a supervisor, a facilitator chairs a meeting, a lead drives
12
+ an asynchronous discussion — plus a CLI suite that runs evals, queries the
13
+ traces they produce, and edits skill files under controlled conditions.
15
14
 
16
- ## Modes
17
-
18
- | Mode | Lead | Participants | Terminal tool |
19
- | ------------- | ------------- | ------------- | ---------------------- |
20
- | `run` | (none) | one agent | task completion |
21
- | `supervise` | `supervisor` | one `agent` | `Conclude` |
22
- | `facilitate` | `facilitator` | N named | `Conclude` |
23
- | `discuss` | `lead` | N named | `Adjourn` or `Recess` |
24
- | `judge` | `judge` | (none) | `Conclude` |
15
+ ## CLIs
25
16
 
26
- Every mode except `run` and `judge` shares one orchestration loop
27
- (`OrchestrationLoop`) and one tool surface (`Ask` / `Answer` /
28
- `Announce` / `RollCall`, plus a mode-specific terminal tool). The
29
- loop fires the lead's LLM, fans messages out to participants over an
30
- in-memory bus, wakes them when something lands, and emits the
31
- universal `{source, seq, event}` NDJSON envelope for every line.
17
+ | CLI | Purpose |
18
+ | --------------- | ---------------------------------------------------------------------- |
19
+ | `fit-eval` | Run agents in `run`/`supervise`/`facilitate`/`discuss` subcommands. |
20
+ | `fit-trace` | Download, query, and analyze NDJSON traces produced by `fit-eval`. |
21
+ | `fit-benchmark` | Run task families for N runs each and aggregate pass@k. |
22
+ | `fit-selfedit` | Write stdin to `.claude/**` paths, gated by settings.json + branch. |
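The table does not say how `fit-benchmark` aggregates pass@k. As a sketch only, assuming the widely used unbiased estimator (for n runs with c passes, pass@k = 1 - C(n-c, k) / C(n, k)), the per-task computation could look like this. `passAtK` is a hypothetical name, not a libeval export.

```js
// Hypothetical helper, not part of libeval: unbiased pass@k estimate
// for one task with n runs, of which c passed.
function passAtK(n, c, k) {
  if (k < 1 || k > n) throw new RangeError("k must satisfy 1 <= k <= n");
  // Fewer than k failures: every k-sized sample contains at least one pass.
  if (n - c < k) return 1;
  // C(n-c, k) / C(n, k) written as a running product to avoid large factorials.
  let allFail = 1;
  for (let i = n - c + 1; i <= n; i++) allFail *= 1 - k / i;
  return 1 - allFail;
}
```

With k = 1 this reduces to the plain pass rate c / n.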
32
23
 
33
- ## The Ask / Answer protocol
24
+ `fit-eval`'s subcommands share one orchestration loop and one async tool
25
+ surface, below. The `judge` role is a profile passed to `supervise`.
34
26
 
35
- Coordination uses one async request/reply pattern with one piece of
36
- state per question — the `askId`. Every Ask returns immediately; the
37
- reply arrives later on the asker's inbox.
38
-
39
- ### Ask
40
-
41
- ```text
42
- Ask({ question, to? }) → { askIds: [N, …] }
43
- ```
27
+ ## Modes
44
28
 
45
- The handler registers a pending entry per addressee, posts the
46
- question on the bus, and returns immediately. Each pending entry is
47
- keyed by a numeric `askId`. Two Asks to the same addressee each get
48
- their own id, so they coexist without overwriting.
29
+ | Mode | Lead | Participants | Terminal tool |
30
+ | ------------ | ------------- | ------------- | ---------------------- |
31
+ | `run` | (none) | one agent | task completion |
32
+ | `supervise` | `supervisor` | one `agent` | `Conclude` |
33
+ | `facilitate` | `facilitator` | N named | `Conclude` |
34
+ | `discuss` | `lead` | N named | `Adjourn` or `Recess` |
35
+ | `judge` | `judge` | (none) | `Conclude` |
49
36
 
50
- Broadcast: omit `to` on a multi-participant lead's Ask to fan out to
51
- every other participant; the result `askIds` array has one entry
52
- per addressee.
37
+ `run` and `judge` are one-shot. The other three share `OrchestrationLoop`
38
+ plus an async Ask/Answer/Announce/RollCall tool surface; the loop fans
39
+ messages out over an in-memory bus and emits a `{source, seq, event}`
40
+ NDJSON envelope for every line.
53
41
 
54
- ### Answer
42
+ ## Async Ask / Answer / Announce
55
43
 
56
44
  ```text
45
+ Ask({ question, to? }) → { askIds: [N, …] }
57
46
  Answer({ message, askId? }) → routed to the asker
47
+ Announce({ message }) → broadcast, no reply expected
58
48
  ```
59
49
 
60
- The reply lands in the asker's bus inbox as
61
- `[answer#N] <participant>: <text>` on a later turn. `askId` is
62
- optional and the handler is forgiving:
63
-
64
- - **Provided + matches an ask owed by the caller** → routes the reply
65
- to that specific asker.
66
- - **Provided but unknown or wrong addressee** → `isError` with a
67
- pointed message. The caller tried to specify; we tell them why.
68
- - **Omitted + exactly one ask is owed to the caller** → auto-picks
69
- that ask. (Forcing an Announce when the only owed ask is obvious
70
- would be pedantic.)
71
- - **Omitted + 0 or many asks owed** → broadcasts as Announce so the
72
- message still reaches every participant.
73
-
74
- ### Announce
75
-
76
- ```text
77
- Announce({ message }) → broadcast, no reply expected
78
- ```
79
-
80
- Lands on every other participant's queue as `[shared] <from>: <text>`.
50
+ Every Ask returns immediately and registers a pending entry keyed by an
51
+ `askId`. The reply arrives later on the asker's inbox as `[answer#N]
52
+ <participant>: <text>`. Broadcast: omit `to` on a multi-participant
53
+ lead. Answer's `askId` is optional — the handler is forgiving:
81
54
 
82
- ### Inbox format
55
+ - **Provided + matches an ask owed by the caller** → routes to that asker.
56
+ - **Provided but unknown or wrong addressee** → `isError` with a pointed message.
57
+ - **Omitted + exactly one ask owed to the caller** → auto-picks it.
58
+ - **Omitted + 0 or many asks owed** → broadcasts as Announce.
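The four `askId` cases above can be read as one small decision function. This is a minimal sketch, not the handler in `orchestration-toolkit.js`: the function name, the shape of the `owed` list, and the returned objects are all assumptions made for illustration.

```js
// Hypothetical sketch of Answer routing. `owed` is the list of pending asks
// addressed to the caller, e.g. [{ askId: 42, asker: "facilitator" }].
function resolveAnswer(owed, askId) {
  if (askId !== undefined) {
    const match = owed.find((ask) => ask.askId === askId);
    // Provided: either route to the matching asker or report an error.
    return match
      ? { kind: "route", to: match.asker, askId }
      : { kind: "error", isError: true, message: `askId ${askId} is not owed by this caller` };
  }
  // Omitted with exactly one owed ask: pick it automatically.
  if (owed.length === 1) {
    return { kind: "route", to: owed[0].asker, askId: owed[0].askId };
  }
  // Omitted with zero or many owed asks: fall back to a broadcast.
  return { kind: "announce" };
}
```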
83
59
 
84
- Every line a participant reads on a resume is one bus message rendered
85
- with its tag:
60
+ Inbox lines on resume:
86
61
 
87
62
  ```text
88
63
  [ask#42] facilitator: What is your current condition?
@@ -91,59 +66,39 @@ with its tag:
91
66
  [system] @orchestrator: You have an unanswered ask from facilitator (askId=42)…
92
67
  ```
93
68
 
94
- The `[ask#N]` tag is what the participant quotes back in Answer's
95
- `askId` field.
96
-
97
- ### Why async
98
-
99
- The lead can issue Asks, end its turn, and use the gap between turns
100
- for planning, reflection, or follow-up Asks while participants work
101
- in parallel. Nothing blocks the LLM thread waiting on a reply. The
102
- orchestrator wakes the lead whenever the inbox has new content.
103
-
104
- ## The orchestration loop
69
+ Async means the lead can issue Asks, end its turn, and plan in the gap
70
+ while participants work in parallel — nothing blocks the LLM thread.
105
71
 
106
- `OrchestrationLoop` runs one outer pattern for both the lead and each
107
- participant:
72
+ ## Orchestration loop
108
73
 
109
- 1. Drain the bus queue, or wait for the first message.
110
- 2. Run (first turn) or resume (every subsequent turn) the LLM with the
111
- drained messages formatted as tagged lines.
112
- 3. If the participant ended a turn with an unanswered Ask owed to it,
113
- inject one synthetic reminder and resume once more. If still
114
- unanswered, emit a `protocol_violation` event and cancel the
115
- pending entry with a synthetic null answer so the asker unblocks.
74
+ Each participant drains the bus (or waits), runs/resumes the LLM with
75
+ drained messages as tagged lines, and on an unanswered owed Ask injects
76
+ one synthetic reminder before emitting `protocol_violation` and
77
+ unblocking the asker with a synthetic null answer.
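The reminder-then-violation step can be sketched as a pure decision made after each participant turn. The state shape (`owedAsks`, `reminded`) and the action names other than `protocol_violation` are assumptions for illustration; the real logic lives in `orchestration-loop.js` and may differ.

```js
// Hypothetical sketch: decide what the loop does after a participant's turn.
// `state.owedAsks` lists askIds still unanswered; `state.reminded` records
// whether the single synthetic reminder has already been injected.
function afterTurn(state) {
  if (state.owedAsks.length === 0) {
    return { action: "wait" }; // nothing owed: go back to draining the bus
  }
  if (!state.reminded) {
    return { action: "remind", askIds: state.owedAsks }; // one reminder, then resume
  }
  // Still unanswered after the reminder: record the violation and unblock
  // the asker with a synthetic null answer.
  return { action: "protocol_violation", askIds: state.owedAsks, answer: null };
}
```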
116
78
 
117
- The lead's first turn starts with the task as its initial prompt;
118
- participants' first runs are triggered by their first inbound message.
119
-
120
- Termination flips two flags:
121
-
122
- - `ctx.concluded` — explicit `Conclude` / `Adjourn` / `Recess`. The
123
- handler also cancels any in-flight Asks with a synthetic null so
124
- askers see why their question won't be answered.
125
- - `stopped` — broader: also true on a lead error, an agent crash, or
126
- any abort path. Loops watch `stopped`; `ctx.concluded` is only used
127
- for the summary's `success` / `verdict`.
79
+ Termination uses two flags. `ctx.concluded` is explicit
80
+ `Conclude`/`Adjourn`/`Recess`, which also cancels in-flight Asks so askers
81
+ see why their question won't be answered. `stopped` is broader: lead
82
+ error, agent crash, abort path. Loops watch `stopped`; `ctx.concluded`
83
+ only feeds the summary's `success`/`verdict`.
128
84
 
129
85
  ## Tool surface, by role
130
86
 
131
- | Role | Ask | Answer | Announce | RollCall | Conclude | Other |
132
- | ------------ | --- | ------ | -------- | -------- | -------- | ------------------------------------ |
133
- | Facilitator | ✓ | ✓ | ✓ | ✓ | ✓ | |
134
- | Fac. agent | ✓ | ✓ | ✓ | ✓ | | |
135
- | Supervisor | ✓ | ✓ | ✓ | ✓ | ✓ | |
136
- | Sup. agent | ✓ | ✓ | ✓ | ✓ | | |
87
+ | Role | Ask | Answer | Announce | RollCall | Conclude | Other |
88
+ | ------------ | --- | ------ | -------- | -------- | -------- | ---------------------------------------- |
89
+ | Facilitator | ✓ | ✓ | ✓ | ✓ | ✓ | |
90
+ | Fac. agent | ✓ | ✓ | ✓ | ✓ | | |
91
+ | Supervisor | ✓ | ✓ | ✓ | ✓ | ✓ | |
92
+ | Sup. agent | ✓ | ✓ | ✓ | ✓ | | |
137
93
  | Discuss lead | ✓ | ✓ | ✓ | ✓ | | `RequestForComment`, `Recess`, `Adjourn` |
138
- | Discuss agt | ✓ | ✓ | ✓ | ✓ | | |
139
- | Judge | | | | | ✓ | |
94
+ | Discuss agt | ✓ | ✓ | ✓ | ✓ | | |
95
+ | Judge | | | | | ✓ | |
140
96
 
141
97
  Ask's `to` accepts a participant name on multi-participant roles
142
- (facilitator, discuss lead, all participants); supervise's
143
- `supervisor` / `agent` pair don't accept `to` because there's only
144
- one possible target.
98
+ (facilitator, discuss lead, all participants). The supervise pair has
99
+ only one possible target so `to` is rejected there.
145
100
 
146
- ## Minimal example: a two-participant facilitator
101
+ ## Minimal example: two-participant facilitator
147
102
 
148
103
  ```js
149
104
  import { createFacilitator, createRedactor } from "@forwardimpact/libeval";
@@ -165,66 +120,77 @@ const result = await facilitator.run("Run a kata storyboard meeting.");
165
120
  // result.success / result.turns / NDJSON trace on process.stdout
166
121
  ```
167
122
 
168
- The facilitator's LLM, started with that task, has access to `Ask`,
169
- `Answer`, `Announce`, `RollCall`, and `Conclude`. Alice and Bob each
170
- get `Ask`, `Answer`, `Announce`, `RollCall`. Every tool call, every
171
- message routed through the bus, and every orchestrator event becomes a
172
- line in the trace.
123
+ The facilitator gets `Ask`/`Answer`/`Announce`/`RollCall`/`Conclude`;
124
+ each agent gets the same minus `Conclude`. Every tool call, bus
125
+ message, and orchestrator event becomes one trace line.
173
126
 
174
- ## Trace format
127
+ ## Trace format and redaction
175
128
 
176
- Every line is one JSON object with three fields:
129
+ Each line is `{ "source": "<participant|orchestrator>", "seq": N, "event":
130
+ {…} }`. `seq` is monotonic across the whole trace; `orchestrator` emits
131
+ `session_start`, `agent_start`, `protocol_violation`, `lead_turn_limit`,
132
+ and `summary`. `event` is the SDK event verbatim or the orchestrator
133
+ payload. `fit-trace` consumes this format.
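Because every line carries the same `{source, seq, event}` envelope, a trace can be read without `fit-trace` when a quick check is enough. The sketch below parses NDJSON text, orders it by `seq`, and counts lines per source. The function names are illustrative, and any fields inside `event` are not specified here, so the code does not rely on them.

```js
// Parse NDJSON trace text into envelopes ordered by their global `seq`.
function parseTrace(ndjson) {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .sort((a, b) => a.seq - b.seq);
}

// Count how many trace lines each source (participant or orchestrator) emitted.
function countBySource(envelopes) {
  const counts = {};
  for (const { source } of envelopes) counts[source] = (counts[source] ?? 0) + 1;
  return counts;
}
```

Sorting by `seq` matters when lines from concurrent participants were written out of order.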
177
134
 
178
- ```json
179
- { "source": "facilitator", "seq": 42, "event": { … } }
180
- ```
135
+ Redaction is on by default for `fit-eval run`/`supervise`/`facilitate`
136
+ and composes two layers:
181
137
 
182
- - `source`: the participant whose LLM produced the line, or
183
- `orchestrator` for loop-level events (`session_start`, `agent_start`,
184
- `protocol_violation`, `lead_turn_limit`, `summary`).
185
- - `seq` — monotonically increasing across the whole trace; useful for
186
- reconstructing the wall-clock order across concurrent participants.
187
- - `event`: the SDK event verbatim, or the orchestrator event payload.
188
-
189
- `fit-trace` consumes this format. See the trace analysis guide for the
190
- full schema.
191
-
192
- ## Trace redaction
193
-
194
- `fit-eval run`, `fit-eval supervise`, and `fit-eval facilitate` redact
195
- secrets in trace artifacts before they reach disk. Two layers compose:
196
-
197
- - **Env-var allowlist**, defaulting to `ANTHROPIC_API_KEY`, `GH_TOKEN`,
198
- `GITHUB_TOKEN`. The runtime values of these vars are replaced with
199
- `[REDACTED:env:NAME]` wherever they appear in tool inputs, tool
200
- outputs, assistant text, or orchestrator summaries. Override the
201
- list with `LIBEVAL_REDACTION_ENV_VARS=NAME1,NAME2,…` (replaces, not
202
- extends).
203
- - **Credential-shape patterns**, covering Anthropic API keys
204
- (`sk-ant-`), GitHub PATs (`ghp_`), installation tokens (`ghs_`),
205
- OAuth tokens (`gho_`), and fine-grained PATs (`github_pat_`).
206
- Pattern hits become `[REDACTED:pattern:KIND]`.
207
-
208
- Redaction is on by default. To disable, set
209
- `LIBEVAL_REDACTION_DISABLED=1` — a stderr warning fires once per run.
210
- Never set this in CI on a public repository: workflow artifacts there
211
- are downloadable through the retention window.
138
+ - **Env-var allowlist**: `ANTHROPIC_API_KEY`, `GH_TOKEN`, `GITHUB_TOKEN`
139
+ by default; override with `LIBEVAL_REDACTION_ENV_VARS=NAME1,…`
140
+ (replaces, not extends). Runtime values become `[REDACTED:env:NAME]`
141
+ everywhere they appear.
142
+ - **Credential-shape patterns** — `sk-ant-`, `ghp_`, `ghs_`, `gho_`,
143
+ `github_pat_`. Hits become `[REDACTED:pattern:KIND]`.
144
+
145
+ Set `LIBEVAL_REDACTION_DISABLED=1` to disable (one stderr warning per
146
+ run). Never do this in CI for a public repo: workflow artifacts there are
147
+ downloadable through the retention window.
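How the two layers compose can be sketched as follows. This is not the code in `redaction.js`: the regular expressions, the `kind` labels, and the `redact` signature are assumptions. Only the default variable names, the token prefixes, and the `[REDACTED:…]` placeholder shapes come from the README.

```js
// Hypothetical credential-shape patterns built from the documented prefixes.
// The character classes and `kind` labels are guesses for illustration.
const PATTERNS = [
  { kind: "anthropic", re: /sk-ant-[A-Za-z0-9_-]+/g },
  { kind: "github_pat", re: /github_pat_[A-Za-z0-9_]+/g },
  { kind: "github", re: /gh[pso]_[A-Za-z0-9]+/g },
];

const DEFAULT_ENV_VARS = ["ANTHROPIC_API_KEY", "GH_TOKEN", "GITHUB_TOKEN"];

function redact(text, env, names = DEFAULT_ENV_VARS) {
  let out = text;
  // Layer 1: replace the runtime value of each allowlisted variable.
  for (const name of names) {
    const value = env[name];
    if (value) out = out.split(value).join(`[REDACTED:env:${name}]`);
  }
  // Layer 2: replace anything that still looks like a credential.
  for (const { kind, re } of PATTERNS) {
    out = out.replace(re, `[REDACTED:pattern:${kind}]`);
  }
  return out;
}
```

Passing an explicit `names` array mirrors the "replaces, not extends" behaviour of `LIBEVAL_REDACTION_ENV_VARS`: the defaults are not merged in.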
212
148
 
213
149
  ## Module map
214
150
 
215
- | Module | Purpose |
216
- | ---------------------------- | ----------------------------------------------------------------------- |
217
- | `agent-runner.js` | One Claude Agent SDK session; emits NDJSON via the redactor. |
218
- | `message-bus.js` | In-memory per-participant queues + `waitForMessages` Promise wakeup. |
219
- | `orchestration-toolkit.js` | Shared Ask / Answer / Announce / Conclude / RollCall handlers + builders. |
220
- | `orchestration-loop.js` | Unified lead+participant loop; reminder/violation handling. |
221
- | `facilitator.js` | `Facilitator` class + factory + system prompts. |
222
- | `supervisor.js` | `Supervisor` class + factory + system prompts. |
223
- | `discuss-tools.js` | Discuss-only RequestForComment / Recess / Adjourn handlers + tool servers. |
224
- | `discusser.js` | `Discusser` class + factory + system prompt + resume hydration. |
225
- | `judge.js` | One-shot post-hoc verdict via `Conclude`. |
226
- | `trace-collector.js` / `trace-query.js` / `trace-github.js` | Trace ingestion / querying / GitHub-attachment helpers. |
227
- | `redaction.js` | Env-var allowlist + credential-shape pattern redaction. |
151
+ | Module | Purpose |
152
+ | ----------------------------------------------------------- | -------------------------------------------------------------------- |
153
+ | `agent-runner.js` | One Claude Agent SDK session; emits NDJSON via the redactor. |
154
+ | `message-bus.js` | Per-participant queues + `waitForMessages` Promise wakeup. |
155
+ | `orchestration-toolkit.js` | Shared Ask/Answer/Announce/Conclude/RollCall handlers + builders. |
156
+ | `orchestration-loop.js` | Unified lead+participant loop; reminder/violation handling. |
157
+ | `facilitator.js` / `supervisor.js` / `discusser.js` / `judge.js` | Per-mode class + factory + system prompt. |
158
+ | `discuss-tools.js` | Discuss-only `RequestForComment`/`Recess`/`Adjourn`. |
159
+ | `trace-collector.js` / `trace-query.js` / `trace-github.js` | Trace ingestion / querying / GitHub-attachment helpers. |
160
+ | `redaction.js` | Env-var allowlist + credential-shape pattern redaction. |
161
+
162
+ ## fit-selfedit
163
+
164
+ A narrow, audited bypass for sessions where `Edit`/`Write` (and bash
165
+ writes) are blocked against paths the project's own allowlist permits —
166
+ see [#1162](https://github.com/forwardimpact/monorepo/issues/1162) and
167
+ [#441](https://github.com/forwardimpact/monorepo/issues/441) for the
168
+ original episodes. Reads stdin, writes the target, exits 0 / 2
169
+ (safeguard violation) / 1 (I/O error).
170
+
171
+ ```sh
172
+ echo "<content>" | bunx fit-selfedit <path>
173
+ ```
174
+
175
+ Two safeguards, checked in order:
176
+
177
+ 1. **Settings-allow.** Walk upward from the target with
178
+ [`Finder.findUpward`](../libutil/src/finder.js) to find the nearest
179
+ `.claude/settings.json`. The target path, taken relative to that file's
180
+ grandparent directory (the project root), must match at least one `Edit(<glob>)` rule in
181
+ `permissions.allow[]` (matched with
182
+ [`minimatch`](https://github.com/isaacs/minimatch), `dot: true`).
183
+ Settings.json is the single source of truth — widen the project
184
+ allowlist and the CLI follows. Traversal like `.claude/../README.md`
185
+ is rejected as a side effect: `path.resolve` collapses `..` first,
186
+ then the resolved path tests against the rules.
187
+
188
+ 2. **Branch scope.** `git rev-parse --abbrev-ref HEAD` must not be
189
+ `HEAD` (detached) or `main`. Edits ride a feature branch through
190
+ whatever merge gates the project has configured.
191
+
192
+ Failure messages name the safeguard that rejected; safeguard 1 also
193
+ lists the `Edit()` rules that were tried.
228
194
 
229
195
  ## Documentation
230
196
 
package/bin/fit-benchmark.js CHANGED
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env node
2
2
 
3
+ import "@forwardimpact/libpreflight/node22";
4
+
3
5
  import { readFileSync, realpathSync } from "node:fs";
4
6
  import { createCli } from "@forwardimpact/libcli";
5
7
  import { createLogger } from "@forwardimpact/libtelemetry";
package/bin/fit-eval.js CHANGED
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env node
2
2
 
3
+ import "@forwardimpact/libpreflight/node22";
4
+
3
5
  import { readFileSync } from "node:fs";
4
6
  import { createCli } from "@forwardimpact/libcli";
5
7
  import { createLogger } from "@forwardimpact/libtelemetry";
package/bin/fit-selfedit.js ADDED
@@ -0,0 +1,163 @@
1
+ #!/usr/bin/env node
2
+ /**
3
+ * fit-selfedit — write stdin to a path that .claude/settings.json
4
+ * permits Edit on, while on a non-main git branch. See
5
+ * libraries/libeval/README.md § fit-selfedit for the full rationale.
6
+ */
7
+
8
+ import "@forwardimpact/libpreflight/node22";
9
+ import { existsSync, readFileSync, writeFileSync } from "node:fs";
10
+ import fsPromises from "node:fs/promises";
11
+ import { parseArgs } from "node:util";
12
+ import { resolve, relative, dirname } from "node:path";
13
+ import { execFileSync } from "node:child_process";
14
+
15
+ import { Finder } from "@forwardimpact/libutil";
16
+ import { minimatch } from "minimatch";
17
+
18
+ const HELP = `fit-selfedit — write stdin to a settings.json-allowed path on a non-main branch.
19
+
20
+ Usage:
21
+ echo content | fit-selfedit <path>
22
+ fit-selfedit <path> < input.txt
23
+
24
+ Safeguards (checked in order):
25
+ 1. The nearest .claude/settings.json must contain an Edit(<glob>) rule
26
+ in permissions.allow[] that resolves to the target path.
27
+ 2. HEAD must not be detached and the current branch must not be 'main'.
28
+
29
+ Exit codes:
30
+ 0 wrote the file
31
+ 2 safeguard violation (no settings.json, no matching Edit rule, on
32
+ main, detached HEAD, missing parent directory, TTY stdin)
33
+ 1 unexpected I/O error
34
+
35
+ Why this exists:
36
+ Some session harnesses block Edit/Write (and interactive bash writes)
37
+ on .claude/skills/**, even when the project allowlist permits them.
38
+ This CLI is a narrow, audited bypass: a subprocess write that still
39
+ has to clear the project allowlist and the normal merge gates.
40
+ `;
41
+
42
+ function fail(message) {
43
+ process.stderr.write(`fit-selfedit: ${message}\n`);
44
+ process.exit(2);
45
+ }
46
+
47
+ const { values, positionals } = parseArgs({
48
+ options: {
49
+ help: { type: "boolean", short: "h" },
50
+ version: { type: "boolean" },
51
+ },
52
+ allowPositionals: true,
53
+ });
54
+
55
+ if (values.help) {
56
+ process.stdout.write(HELP);
57
+ process.exit(0);
58
+ }
59
+
60
+ if (values.version) {
61
+ const pkg = JSON.parse(
62
+ readFileSync(new URL("../package.json", import.meta.url), "utf8"),
63
+ );
64
+ process.stdout.write(`${pkg.version}\n`);
65
+ process.exit(0);
66
+ }
67
+
68
+ const [targetArg, ...extra] = positionals;
69
+ if (!targetArg) fail("missing <path> (try --help)");
70
+ if (extra.length > 0) fail(`unexpected extra arguments: ${extra.join(" ")}`);
71
+
72
+ const absoluteTarget = resolve(process.cwd(), targetArg);
73
+
74
+ // Safeguard 1: settings.json must grant Edit() on this path.
75
+ const settingsPath = new Finder(fsPromises, { debug() {} }).findUpward(
76
+ dirname(absoluteTarget),
77
+ ".claude/settings.json",
78
+ 20,
79
+ );
80
+ if (!settingsPath) {
81
+ fail(
82
+ `no .claude/settings.json found walking upward from ${dirname(absoluteTarget)}`,
83
+ );
84
+ }
85
+
86
+ const projectRoot = dirname(dirname(settingsPath));
87
+ const relativeTarget = relative(projectRoot, absoluteTarget);
88
+
89
+ let settings;
90
+ try {
91
+ settings = JSON.parse(readFileSync(settingsPath, "utf8"));
92
+ } catch (err) {
93
+ fail(`failed to parse ${settingsPath}: ${err.message}`);
94
+ }
95
+
96
+ const allowRules = settings?.permissions?.allow;
97
+ if (!Array.isArray(allowRules)) {
98
+ fail(`${settingsPath} has no permissions.allow[] array`);
99
+ }
100
+
101
+ const editPatterns = allowRules
102
+ .filter((rule) => typeof rule === "string")
103
+ .map((rule) => rule.match(/^Edit\((.+)\)$/)?.[1])
104
+ .filter(Boolean);
105
+
106
+ if (editPatterns.length === 0) {
107
+ fail(`${settingsPath} has no Edit() rules in permissions.allow[]`);
108
+ }
109
+
110
+ const matchedPattern = editPatterns.find((pattern) =>
111
+ minimatch(relativeTarget, pattern, { dot: true }),
112
+ );
113
+ if (!matchedPattern) {
114
+ fail(
115
+ `no Edit() rule in ${relative(projectRoot, settingsPath)} matches '${relativeTarget}' ` +
116
+ `(tried: ${editPatterns.map((p) => `Edit(${p})`).join(", ")})`,
117
+ );
118
+ }
119
+
120
+ // Safeguard 2: branch must not be main and HEAD must not be detached.
121
+ let branch;
122
+ try {
123
+ branch = execFileSync("git", ["rev-parse", "--abbrev-ref", "HEAD"], {
124
+ stdio: ["ignore", "pipe", "pipe"],
125
+ encoding: "utf8",
126
+ }).trim();
127
+ } catch {
128
+ fail("failed to read current git branch (not inside a git repository?)");
129
+ }
130
+
131
+ if (branch === "HEAD") {
132
+ fail("HEAD is detached — refusing (check out a non-main branch first)");
133
+ }
134
+ if (branch === "main") {
135
+ fail("refusing to write while on branch 'main' — switch to a feature branch");
136
+ }
137
+
138
+ const parent = dirname(absoluteTarget);
139
+ if (!existsSync(parent)) {
140
+ fail(`parent directory '${relative(projectRoot, parent)}' does not exist`);
141
+ }
142
+
143
+ if (process.stdin.isTTY) {
144
+ fail(
145
+ "stdin is a TTY — pipe content in (e.g. `echo … | fit-selfedit <path>`)",
146
+ );
147
+ }
148
+
149
+ const chunks = [];
150
+ for await (const chunk of process.stdin) chunks.push(chunk);
151
+ const content = Buffer.concat(chunks);
152
+
153
+ try {
154
+ writeFileSync(absoluteTarget, content);
155
+ } catch (err) {
156
+ process.stderr.write(`fit-selfedit: write failed: ${err.message}\n`);
157
+ process.exit(1);
158
+ }
159
+
160
+ process.stderr.write(
161
+ `fit-selfedit: wrote ${content.length} byte${content.length === 1 ? "" : "s"} to ${relativeTarget} ` +
162
+ `(matched Edit(${matchedPattern}), branch ${branch})\n`,
163
+ );
package/bin/fit-trace.js CHANGED
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env node
2
2
 
3
+ import "@forwardimpact/libpreflight/node22";
4
+
3
5
  import { readFileSync } from "node:fs";
4
6
  import { createCli } from "@forwardimpact/libcli";
5
7
  import { createScriptConfig } from "@forwardimpact/libconfig";
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@forwardimpact/libeval",
3
- "version": "0.1.45",
3
+ "version": "0.1.47",
4
4
  "description": "Agent evaluation framework — prove whether agent changes improved outcomes with reproducible evidence.",
5
5
  "keywords": [
6
6
  "eval",
@@ -33,12 +33,14 @@
33
33
  ".": "./src/index.js",
34
34
  "./bin/fit-eval.js": "./bin/fit-eval.js",
35
35
  "./bin/fit-trace.js": "./bin/fit-trace.js",
36
- "./bin/fit-benchmark.js": "./bin/fit-benchmark.js"
36
+ "./bin/fit-benchmark.js": "./bin/fit-benchmark.js",
37
+ "./bin/fit-selfedit.js": "./bin/fit-selfedit.js"
37
38
  },
38
39
  "bin": {
39
40
  "fit-eval": "./bin/fit-eval.js",
40
41
  "fit-trace": "./bin/fit-trace.js",
41
- "fit-benchmark": "./bin/fit-benchmark.js"
42
+ "fit-benchmark": "./bin/fit-benchmark.js",
43
+ "fit-selfedit": "./bin/fit-selfedit.js"
42
44
  },
43
45
  "files": [
44
46
  "src/**/*.js",
@@ -52,8 +54,11 @@
52
54
  "@anthropic-ai/claude-agent-sdk": "0.2.112",
53
55
  "@forwardimpact/libcli": "^0.1.0",
54
56
  "@forwardimpact/libconfig": "^0.1.0",
57
+ "@forwardimpact/libpreflight": "^0.1.0",
55
58
  "@forwardimpact/libtelemetry": "^0.1.22",
59
+ "@forwardimpact/libutil": "^0.1.0",
56
60
  "jmespath": "^0.16.0",
61
+ "minimatch": "^10.0.0",
57
62
  "zod": "^4.4.3"
58
63
  },
59
64
  "devDependencies": {
@@ -61,7 +66,7 @@
61
66
  },
62
67
  "engines": {
63
68
  "bun": ">=1.2.0",
64
- "node": ">=18.0.0"
69
+ "node": ">=22.0.0"
65
70
  },
66
71
  "publishConfig": {
67
72
  "access": "public"