@forwardimpact/libeval 0.1.45 → 0.1.47
- package/README.md +124 -158
- package/bin/fit-benchmark.js +2 -0
- package/bin/fit-eval.js +2 -0
- package/bin/fit-selfedit.js +163 -0
- package/bin/fit-trace.js +2 -0
- package/package.json +9 -4
package/README.md
CHANGED
|
@@ -7,82 +7,57 @@ reproducible evidence.
|
|
|
7
7
|
|
|
8
8
|
<!-- END:description -->
|
|
9
9
|
|
|
10
|
-
`libeval` provides the runtime and tool surface for multi-LLM
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
for analysis.
|
|
10
|
+
`libeval` provides the runtime and tool surface for multi-LLM coordination —
|
|
11
|
+
an agent talks to a supervisor, a facilitator chairs a meeting, a lead drives
|
|
12
|
+
an asynchronous discussion — plus a CLI suite that runs evals, queries the
|
|
13
|
+
traces they produce, and edits skill files under controlled conditions.
|
|
15
14
|
|
|
16
|
-
##
|
|
17
|
-
|
|
18
|
-
| Mode | Lead | Participants | Terminal tool |
|
|
19
|
-
| ------------- | ------------- | ------------- | ---------------------- |
|
|
20
|
-
| `run` | (none) | one agent | task completion |
|
|
21
|
-
| `supervise` | `supervisor` | one `agent` | `Conclude` |
|
|
22
|
-
| `facilitate` | `facilitator` | N named | `Conclude` |
|
|
23
|
-
| `discuss` | `lead` | N named | `Adjourn` or `Recess` |
|
|
24
|
-
| `judge` | `judge` | (none) | `Conclude` |
|
|
15
|
+
## CLIs
|
|
25
16
|
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
`
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
17
|
+
| CLI | Purpose |
|
|
18
|
+
| --------------- | ---------------------------------------------------------------------- |
|
|
19
|
+
| `fit-eval` | Run agents in `run`/`supervise`/`facilitate`/`discuss` subcommands. |
|
|
20
|
+
| `fit-trace` | Download, query, and analyze NDJSON traces produced by `fit-eval`. |
|
|
21
|
+
| `fit-benchmark` | Run task families for N runs each and aggregate pass@k. |
|
|
22
|
+
| `fit-selfedit` | Write stdin to `.claude/**` paths, gated by settings.json + branch. |
|
|
32
23
|
|
|
33
|
-
|
|
24
|
+
`fit-eval`'s subcommands share one orchestration loop and one async tool
|
|
25
|
+
surface, below. The `judge` role is a profile passed to `supervise`.
|
|
34
26
|
|
|
35
|
-
|
|
36
|
-
state per question — the `askId`. Every Ask returns immediately; the
|
|
37
|
-
reply arrives later on the asker's inbox.
|
|
38
|
-
|
|
39
|
-
### Ask
|
|
40
|
-
|
|
41
|
-
```text
|
|
42
|
-
Ask({ question, to? }) → { askIds: [N, …] }
|
|
43
|
-
```
|
|
27
|
+
## Modes
|
|
44
28
|
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
29
|
+
| Mode | Lead | Participants | Terminal tool |
|
|
30
|
+
| ------------ | ------------- | ------------- | ---------------------- |
|
|
31
|
+
| `run` | (none) | one agent | task completion |
|
|
32
|
+
| `supervise` | `supervisor` | one `agent` | `Conclude` |
|
|
33
|
+
| `facilitate` | `facilitator` | N named | `Conclude` |
|
|
34
|
+
| `discuss` | `lead` | N named | `Adjourn` or `Recess` |
|
|
35
|
+
| `judge` | `judge` | (none) | `Conclude` |
|
|
49
36
|
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
37
|
+
`run` and `judge` are one-shot. The other three share `OrchestrationLoop`
|
|
38
|
+
plus an async Ask/Answer/Announce/RollCall tool surface; the loop fans
|
|
39
|
+
messages out over an in-memory bus and emits a `{source, seq, event}`
|
|
40
|
+
NDJSON envelope for every line.
|
|
53
41
|
|
|
54
|
-
|
|
42
|
+
## Async Ask / Answer / Announce
|
|
55
43
|
|
|
56
44
|
```text
|
|
45
|
+
Ask({ question, to? }) → { askIds: [N, …] }
|
|
57
46
|
Answer({ message, askId? }) → routed to the asker
|
|
47
|
+
Announce({ message }) → broadcast, no reply expected
|
|
58
48
|
```
|
|
59
49
|
|
|
60
|
-
|
|
61
|
-
`
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
- **Provided + matches an ask owed by the caller** → routes the reply
|
|
65
|
-
to that specific asker.
|
|
66
|
-
- **Provided but unknown or wrong addressee** → `isError` with a
|
|
67
|
-
pointed message. The caller tried to specify; we tell them why.
|
|
68
|
-
- **Omitted + exactly one ask is owed to the caller** → auto-picks
|
|
69
|
-
that ask. (Forcing an Announce when the only owed ask is obvious
|
|
70
|
-
would be pedantic.)
|
|
71
|
-
- **Omitted + 0 or many asks owed** → broadcasts as Announce so the
|
|
72
|
-
message still reaches every participant.
|
|
73
|
-
|
|
74
|
-
### Announce
|
|
75
|
-
|
|
76
|
-
```text
|
|
77
|
-
Announce({ message }) → broadcast, no reply expected
|
|
78
|
-
```
|
|
79
|
-
|
|
80
|
-
Lands on every other participant's queue as `[shared] <from>: <text>`.
|
|
50
|
+
Every Ask returns immediately and registers a pending entry keyed by an
|
|
51
|
+
`askId`. The reply arrives later on the asker's inbox as `[answer#N]
|
|
52
|
+
<participant>: <text>`. Broadcast: omit `to` on a multi-participant
|
|
53
|
+
lead. Answer's `askId` is optional — the handler is forgiving:
|
|
81
54
|
|
|
82
|
-
|
|
55
|
+
- **Provided + matches an ask owed by the caller** → routes to that asker.
|
|
56
|
+
- **Provided but unknown or wrong addressee** → `isError` with a pointed message.
|
|
57
|
+
- **Omitted + exactly one ask owed to the caller** → auto-picks it.
|
|
58
|
+
- **Omitted + 0 or many asks owed** → broadcasts as Announce.
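The four Answer cases listed above can be sketched as one small pure function. This is only an illustration of the routing rules, not libeval's handler: `resolveAnswer`, the shape of the `pending` map, and the returned `kind` values are hypothetical names chosen for the example.

```js
// Hypothetical sketch of the Answer routing rules; not libeval's actual handler.
// `pending` maps askId -> { asker, addressee } for asks still awaiting a reply.
function resolveAnswer(pending, caller, askId) {
  if (askId !== undefined) {
    const entry = pending.get(askId);
    // Provided but unknown, or addressed to someone else: report an error.
    if (!entry || entry.addressee !== caller) {
      return { kind: "error", isError: true };
    }
    // Provided and owed by the caller: route to that specific asker.
    return { kind: "route", askId, to: entry.asker };
  }
  // Omitted: collect every ask the caller currently owes.
  const owed = [...pending].filter(([, entry]) => entry.addressee === caller);
  if (owed.length === 1) {
    const [id, entry] = owed[0];
    return { kind: "route", askId: id, to: entry.asker };
  }
  // Zero or many owed asks: fall back to a broadcast.
  return { kind: "announce" };
}

// Example state: alice owes one ask, bob owes two.
const pending = new Map([
  [42, { asker: "facilitator", addressee: "alice" }],
  [43, { asker: "facilitator", addressee: "bob" }],
  [44, { asker: "alice", addressee: "bob" }],
]);
```

With this state, an Answer from `alice` without an `askId` resolves to ask 42, while the same call from `bob` becomes a broadcast because two asks are owed.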
|
|
83
59
|
|
|
84
|
-
|
|
85
|
-
with its tag:
|
|
60
|
+
Inbox lines on resume:
|
|
86
61
|
|
|
87
62
|
```text
|
|
88
63
|
[ask#42] facilitator: What is your current condition?
|
|
@@ -91,59 +66,39 @@ with its tag:
|
|
|
91
66
|
[system] @orchestrator: You have an unanswered ask from facilitator (askId=42)…
|
|
92
67
|
```
|
|
93
68
|
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
### Why async
|
|
98
|
-
|
|
99
|
-
The lead can issue Asks, end its turn, and use the gap between turns
|
|
100
|
-
for planning, reflection, or follow-up Asks while participants work
|
|
101
|
-
in parallel. Nothing blocks the LLM thread waiting on a reply. The
|
|
102
|
-
orchestrator wakes the lead whenever the inbox has new content.
|
|
103
|
-
|
|
104
|
-
## The orchestration loop
|
|
69
|
+
Async means the lead can issue Asks, end its turn, and plan in the gap
|
|
70
|
+
while participants work in parallel — nothing blocks the LLM thread.
|
|
105
71
|
|
|
106
|
-
|
|
107
|
-
participant:
|
|
72
|
+
## Orchestration loop
|
|
108
73
|
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
inject one synthetic reminder and resume once more. If still
|
|
114
|
-
unanswered, emit a `protocol_violation` event and cancel the
|
|
115
|
-
pending entry with a synthetic null answer so the asker unblocks.
|
|
74
|
+
Each participant drains the bus (or waits), runs/resumes the LLM with
|
|
75
|
+
drained messages as tagged lines, and on an unanswered owed Ask injects
|
|
76
|
+
one synthetic reminder before emitting `protocol_violation` and
|
|
77
|
+
unblocking the asker with a synthetic null answer.
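The reminder-then-violation rule described above could look roughly like the following. It is a sketch under assumptions: `afterTurn`, the `entry` fields, and the `action` values are hypothetical, and the real loop lives in `orchestration-loop.js`, which this diff does not show.

```js
// Hypothetical sketch: decide what happens to an owed ask once the
// addressee's turn has ended. Not the actual OrchestrationLoop code.
function afterTurn(entry) {
  if (entry.answered) return { action: "none" };
  if (!entry.reminded) {
    // First miss: inject exactly one synthetic reminder and resume once more.
    entry.reminded = true;
    return { action: "remind" };
  }
  // Second miss: record the violation and unblock the asker with a null answer.
  return { action: "violation", event: "protocol_violation", answer: null };
}

const owedAsk = { askId: 42, asker: "facilitator", answered: false, reminded: false };
const firstPass = afterTurn(owedAsk);  // reminder is injected
const secondPass = afterTurn(owedAsk); // still unanswered, so the ask is cancelled
```

Keeping the reminder state on the pending entry means each ask gets at most one reminder, regardless of how many turns the participant takes.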
|
|
116
78
|
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
- `ctx.concluded` — explicit `Conclude` / `Adjourn` / `Recess`. The
|
|
123
|
-
handler also cancels any in-flight Asks with a synthetic null so
|
|
124
|
-
askers see why their question won't be answered.
|
|
125
|
-
- `stopped` — broader: also true on a lead error, an agent crash, or
|
|
126
|
-
any abort path. Loops watch `stopped`; `ctx.concluded` is only used
|
|
127
|
-
for the summary's `success` / `verdict`.
|
|
79
|
+
Termination uses two flags. `ctx.concluded` is explicit
|
|
80
|
+
`Conclude`/`Adjourn`/`Recess` — also cancels in-flight Asks so askers
|
|
81
|
+
see why their question won't be answered. `stopped` is broader: lead
|
|
82
|
+
error, agent crash, abort path. Loops watch `stopped`; `ctx.concluded`
|
|
83
|
+
only feeds the summary's `success`/`verdict`.
|
|
128
84
|
|
|
129
85
|
## Tool surface, by role
|
|
130
86
|
|
|
131
|
-
| Role | Ask | Answer | Announce | RollCall | Conclude | Other
|
|
132
|
-
| ------------ | --- | ------ | -------- | -------- | -------- |
|
|
133
|
-
| Facilitator | ✓ | ✓ | ✓ | ✓ | ✓ |
|
|
134
|
-
| Fac. agent | ✓ | ✓ | ✓ | ✓ | |
|
|
135
|
-
| Supervisor | ✓ | ✓ | ✓ | ✓ | ✓ |
|
|
136
|
-
| Sup. agent | ✓ | ✓ | ✓ | ✓ | |
|
|
87
|
+
| Role | Ask | Answer | Announce | RollCall | Conclude | Other |
|
|
88
|
+
| ------------ | --- | ------ | -------- | -------- | -------- | ---------------------------------------- |
|
|
89
|
+
| Facilitator | ✓ | ✓ | ✓ | ✓ | ✓ | |
|
|
90
|
+
| Fac. agent | ✓ | ✓ | ✓ | ✓ | | |
|
|
91
|
+
| Supervisor | ✓ | ✓ | ✓ | ✓ | ✓ | |
|
|
92
|
+
| Sup. agent | ✓ | ✓ | ✓ | ✓ | | |
|
|
137
93
|
| Discuss lead | ✓ | ✓ | ✓ | ✓ | | `RequestForComment`, `Recess`, `Adjourn` |
|
|
138
|
-
| Discuss agt | ✓ | ✓ | ✓ | ✓ | |
|
|
139
|
-
| Judge | | | | | ✓ |
|
|
94
|
+
| Discuss agt | ✓ | ✓ | ✓ | ✓ | | |
|
|
95
|
+
| Judge | | | | | ✓ | |
|
|
140
96
|
|
|
141
97
|
Ask's `to` accepts a participant name on multi-participant roles
|
|
142
|
-
(facilitator, discuss lead, all participants)
|
|
143
|
-
|
|
144
|
-
one possible target.
|
|
98
|
+
(facilitator, discuss lead, all participants). The supervise pair has
|
|
99
|
+
only one possible target so `to` is rejected there.
|
|
145
100
|
|
|
146
|
-
## Minimal example:
|
|
101
|
+
## Minimal example: two-participant facilitator
|
|
147
102
|
|
|
148
103
|
```js
|
|
149
104
|
import { createFacilitator, createRedactor } from "@forwardimpact/libeval";
|
|
@@ -165,66 +120,77 @@ const result = await facilitator.run("Run a kata storyboard meeting.");
|
|
|
165
120
|
// result.success / result.turns / NDJSON trace on process.stdout
|
|
166
121
|
```
|
|
167
122
|
|
|
168
|
-
The facilitator
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
message routed through the bus, and every orchestrator event becomes a
|
|
172
|
-
line in the trace.
|
|
123
|
+
The facilitator gets `Ask`/`Answer`/`Announce`/`RollCall`/`Conclude`;
|
|
124
|
+
each agent gets the same minus `Conclude`. Every tool call, bus
|
|
125
|
+
message, and orchestrator event becomes one trace line.
|
|
173
126
|
|
|
174
|
-
## Trace format
|
|
127
|
+
## Trace format and redaction
|
|
175
128
|
|
|
176
|
-
|
|
129
|
+
Each line is `{ "source": "<participant|orchestrator>", "seq": N, "event":
|
|
130
|
+
{…} }`. `seq` is monotonic across the whole trace; `orchestrator` emits
|
|
131
|
+
`session_start`, `agent_start`, `protocol_violation`, `lead_turn_limit`,
|
|
132
|
+
and `summary`. `event` is the SDK event verbatim or the orchestrator
|
|
133
|
+
payload. `fit-trace` consumes this format.
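A consumer of this envelope can be sketched in a few lines. The example below is not part of `fit-trace`: `parseTrace` is a hypothetical helper, the `type` field inside `event` is an assumed payload shape used only for illustration, and "monotonic" is read here as strictly increasing.

```js
// Hypothetical sketch: parse NDJSON trace text into envelopes and
// check that `seq` strictly increases across the whole trace.
function parseTrace(text) {
  const envelopes = text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  for (let i = 1; i < envelopes.length; i++) {
    if (envelopes[i].seq <= envelopes[i - 1].seq) {
      throw new Error(`seq is not increasing at line ${i + 1}`);
    }
  }
  return envelopes;
}

// Sample trace; the event payloads are invented for the example.
const sample = [
  '{"source":"orchestrator","seq":1,"event":{"type":"session_start"}}',
  '{"source":"alice","seq":2,"event":{"type":"assistant"}}',
  '{"source":"orchestrator","seq":3,"event":{"type":"summary"}}',
].join("\n");

const envelopes = parseTrace(sample);
const fromOrchestrator = envelopes.filter((e) => e.source === "orchestrator");
```

Filtering on `source` separates orchestrator events from participant events without inspecting the payload.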
|
|
177
134
|
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
```
|
|
135
|
+
Redaction is on by default for `fit-eval run`/`supervise`/`facilitate`
|
|
136
|
+
and composes two layers:
|
|
181
137
|
|
|
182
|
-
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
`
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
## Trace redaction
|
|
193
|
-
|
|
194
|
-
`fit-eval run`, `fit-eval supervise`, and `fit-eval facilitate` redact
|
|
195
|
-
secrets in trace artifacts before they reach disk. Two layers compose:
|
|
196
|
-
|
|
197
|
-
- **Env-var allowlist**, defaulting to `ANTHROPIC_API_KEY`, `GH_TOKEN`,
|
|
198
|
-
`GITHUB_TOKEN`. The runtime values of these vars are replaced with
|
|
199
|
-
`[REDACTED:env:NAME]` wherever they appear in tool inputs, tool
|
|
200
|
-
outputs, assistant text, or orchestrator summaries. Override the
|
|
201
|
-
list with `LIBEVAL_REDACTION_ENV_VARS=NAME1,NAME2,…` (replaces, not
|
|
202
|
-
extends).
|
|
203
|
-
- **Credential-shape patterns**, covering Anthropic API keys
|
|
204
|
-
(`sk-ant-`), GitHub PATs (`ghp_`), installation tokens (`ghs_`),
|
|
205
|
-
OAuth tokens (`gho_`), and fine-grained PATs (`github_pat_`).
|
|
206
|
-
Pattern hits become `[REDACTED:pattern:KIND]`.
|
|
207
|
-
|
|
208
|
-
Redaction is on by default. To disable, set
|
|
209
|
-
`LIBEVAL_REDACTION_DISABLED=1` — a stderr warning fires once per run.
|
|
210
|
-
Never set this in CI on a public repository: workflow artifacts there
|
|
211
|
-
are downloadable through the retention window.
|
|
138
|
+
- **Env-var allowlist** — `ANTHROPIC_API_KEY`, `GH_TOKEN`, `GITHUB_TOKEN`
|
|
139
|
+
by default; override with `LIBEVAL_REDACTION_ENV_VARS=NAME1,…`
|
|
140
|
+
(replaces, not extends). Runtime values become `[REDACTED:env:NAME]`
|
|
141
|
+
everywhere they appear.
|
|
142
|
+
- **Credential-shape patterns** — `sk-ant-`, `ghp_`, `ghs_`, `gho_`,
|
|
143
|
+
`github_pat_`. Hits become `[REDACTED:pattern:KIND]`.
|
|
144
|
+
|
|
145
|
+
Set `LIBEVAL_REDACTION_DISABLED=1` to disable (one stderr warning per
|
|
146
|
+
run). Never on CI for a public repo — workflow artifacts are
|
|
147
|
+
downloadable through retention.
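The two layers described above compose naturally as two passes over each string. The sketch below is illustrative only and is not the `redaction.js` implementation: `makeSketchRedactor` is a hypothetical name, the pattern's character class and the `github-pat` kind label are assumptions, and running the env-var pass before the pattern pass is a choice made for the example.

```js
// Hypothetical sketch of two-layer redaction; not libeval's redaction.js.
// `names` is the env-var allowlist; `patterns` is a list of [kind, RegExp].
function makeSketchRedactor(env, names, patterns) {
  return (text) => {
    let out = text;
    // Layer 1: replace the runtime value of each listed env var.
    for (const name of names) {
      const value = env[name];
      if (value) out = out.split(value).join(`[REDACTED:env:${name}]`);
    }
    // Layer 2: replace anything that merely looks like a credential.
    for (const [kind, pattern] of patterns) {
      out = out.replace(pattern, () => `[REDACTED:pattern:${kind}]`);
    }
    return out;
  };
}

const redact = makeSketchRedactor(
  { GH_TOKEN: "tok-123456" }, // stand-in for process.env
  ["ANTHROPIC_API_KEY", "GH_TOKEN", "GITHUB_TOKEN"],
  [["github-pat", /ghp_[A-Za-z0-9]+/g]], // assumed token shape
);
const redacted = redact("push with tok-123456 or ghp_abc123XYZ");
```

The env-var layer catches secrets of any shape as long as their variable is listed, while the pattern layer catches well-known token prefixes even when they never passed through the environment.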
|
|
212
148
|
|
|
213
149
|
## Module map
|
|
214
150
|
|
|
215
|
-
| Module
|
|
216
|
-
|
|
|
217
|
-
| `agent-runner.js`
|
|
218
|
-
| `message-bus.js`
|
|
219
|
-
| `orchestration-toolkit.js`
|
|
220
|
-
| `orchestration-loop.js`
|
|
221
|
-
| `facilitator.js`
|
|
222
|
-
| `
|
|
223
|
-
| `
|
|
224
|
-
| `
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
|
|
151
|
+
| Module | Purpose |
|
|
152
|
+
| ----------------------------------------------------------- | -------------------------------------------------------------------- |
|
|
153
|
+
| `agent-runner.js` | One Claude Agent SDK session; emits NDJSON via the redactor. |
|
|
154
|
+
| `message-bus.js` | Per-participant queues + `waitForMessages` Promise wakeup. |
|
|
155
|
+
| `orchestration-toolkit.js` | Shared Ask/Answer/Announce/Conclude/RollCall handlers + builders. |
|
|
156
|
+
| `orchestration-loop.js` | Unified lead+participant loop; reminder/violation handling. |
|
|
157
|
+
| `facilitator.js` / `supervisor.js` / `discusser.js` / `judge.js` | Per-mode class + factory + system prompt. |
|
|
158
|
+
| `discuss-tools.js` | Discuss-only `RequestForComment`/`Recess`/`Adjourn`. |
|
|
159
|
+
| `trace-collector.js` / `trace-query.js` / `trace-github.js` | Trace ingestion / querying / GitHub-attachment helpers. |
|
|
160
|
+
| `redaction.js` | Env-var allowlist + credential-shape pattern redaction. |
|
|
161
|
+
|
|
162
|
+
## fit-selfedit
|
|
163
|
+
|
|
164
|
+
A narrow, audited bypass for sessions where `Edit`/`Write` (and bash
|
|
165
|
+
writes) are blocked against paths the project's own allowlist permits —
|
|
166
|
+
see [#1162](https://github.com/forwardimpact/monorepo/issues/1162) and
|
|
167
|
+
[#441](https://github.com/forwardimpact/monorepo/issues/441) for the
|
|
168
|
+
original episodes. Reads stdin, writes the target, exits 0 / 2
|
|
169
|
+
(safeguard violation) / 1 (I/O error).
|
|
170
|
+
|
|
171
|
+
```sh
|
|
172
|
+
echo "<content>" | bunx fit-selfedit <path>
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
Two safeguards, checked in order:
|
|
176
|
+
|
|
177
|
+
1. **Settings-allow.** Walk upward from the target with
|
|
178
|
+
[`Finder.findUpward`](../libutil/src/finder.js) to find the nearest
|
|
179
|
+
`.claude/settings.json`. The target relative to its grandparent
|
|
180
|
+
directory must match at least one `Edit(<glob>)` rule in
|
|
181
|
+
`permissions.allow[]` (matched with
|
|
182
|
+
[`minimatch`](https://github.com/isaacs/minimatch), `dot: true`).
|
|
183
|
+
Settings.json is the single source of truth — widen the project
|
|
184
|
+
allowlist and the CLI follows. Traversal like `.claude/../README.md`
|
|
185
|
+
is rejected as a side effect: `path.resolve` collapses `..` first,
|
|
186
|
+
then the resolved path tests against the rules.
|
|
187
|
+
|
|
188
|
+
2. **Branch scope.** `git rev-parse --abbrev-ref HEAD` must not be
|
|
189
|
+
`HEAD` (detached) or `main`. Edits ride a feature branch through
|
|
190
|
+
whatever merge gates the project has configured.
|
|
191
|
+
|
|
192
|
+
Failure messages name the safeguard that rejected; safeguard 1 also
|
|
193
|
+
lists the `Edit()` rules that were tried.
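The ordering of the two safeguards can be sketched as follows. The rule-extraction regex mirrors the one in `bin/fit-selfedit.js` from this diff; everything else is an assumption for illustration: `checkSafeguards` and its return shape are hypothetical, and `prefixMatch` is a toy stand-in for `minimatch` that only understands a trailing `/**`.

```js
// Pull the glob out of each Edit(<glob>) rule, ignoring other rule kinds.
function editPatterns(allowRules) {
  return allowRules
    .filter((rule) => typeof rule === "string")
    .map((rule) => rule.match(/^Edit\((.+)\)$/)?.[1])
    .filter(Boolean);
}

// Hypothetical sketch: run the safeguards in order and name the one that rejects.
function checkSafeguards({ allowRules, relativeTarget, branch, matches }) {
  const patterns = editPatterns(allowRules);
  // Safeguard 1: some Edit() rule must match the project-relative target.
  if (!patterns.some((pattern) => matches(relativeTarget, pattern))) {
    return { ok: false, safeguard: "settings-allow", tried: patterns };
  }
  // Safeguard 2: refuse on a detached HEAD or on main.
  if (branch === "HEAD" || branch === "main") {
    return { ok: false, safeguard: "branch-scope" };
  }
  return { ok: true };
}

// Toy matcher for this example only; the CLI itself uses minimatch.
const prefixMatch = (target, pattern) =>
  pattern.endsWith("/**")
    ? target.startsWith(pattern.slice(0, -2))
    : target === pattern;

const allowRules = ["Edit(.claude/skills/**)", "Bash(git status)"];
```

Because safeguard 1 runs first, a target outside the allowlist is reported as a settings problem even when the branch would also have been rejected.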
|
|
228
194
|
|
|
229
195
|
## Documentation
|
|
230
196
|
|
package/bin/fit-benchmark.js
CHANGED
package/bin/fit-eval.js
CHANGED
package/bin/fit-selfedit.js
ADDED
|
@@ -0,0 +1,163 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
/**
|
|
3
|
+
* fit-selfedit — write stdin to a path that .claude/settings.json
|
|
4
|
+
* permits Edit on, while on a non-main git branch. See
|
|
5
|
+
* libraries/libeval/README.md § fit-selfedit for the full rationale.
|
|
6
|
+
*/
|
|
7
|
+
|
|
8
|
+
import "@forwardimpact/libpreflight/node22";
|
|
9
|
+
import { existsSync, readFileSync, writeFileSync } from "node:fs";
|
|
10
|
+
import fsPromises from "node:fs/promises";
|
|
11
|
+
import { parseArgs } from "node:util";
|
|
12
|
+
import { resolve, relative, dirname } from "node:path";
|
|
13
|
+
import { execFileSync } from "node:child_process";
|
|
14
|
+
|
|
15
|
+
import { Finder } from "@forwardimpact/libutil";
|
|
16
|
+
import { minimatch } from "minimatch";
|
|
17
|
+
|
|
18
|
+
const HELP = `fit-selfedit — write stdin to a settings.json-allowed path on a non-main branch.
|
|
19
|
+
|
|
20
|
+
Usage:
|
|
21
|
+
echo content | fit-selfedit <path>
|
|
22
|
+
fit-selfedit <path> < input.txt
|
|
23
|
+
|
|
24
|
+
Safeguards (checked in order):
|
|
25
|
+
1. The nearest .claude/settings.json must contain an Edit(<glob>) rule
|
|
26
|
+
in permissions.allow[] that resolves to the target path.
|
|
27
|
+
2. HEAD must not be detached and the current branch must not be 'main'.
|
|
28
|
+
|
|
29
|
+
Exit codes:
|
|
30
|
+
0 wrote the file
|
|
31
|
+
2 safeguard violation (no settings.json, no matching Edit rule, on
|
|
32
|
+
main, detached HEAD, missing parent directory, TTY stdin)
|
|
33
|
+
1 unexpected I/O error
|
|
34
|
+
|
|
35
|
+
Why this exists:
|
|
36
|
+
Some session harnesses block Edit/Write (and interactive bash writes)
|
|
37
|
+
on .claude/skills/**, even when the project allowlist permits them.
|
|
38
|
+
This CLI is a narrow, audited bypass: a subprocess write that still
|
|
39
|
+
has to clear the project allowlist and the normal merge gates.
|
|
40
|
+
`;
|
|
41
|
+
|
|
42
|
+
function fail(message) {
|
|
43
|
+
process.stderr.write(`fit-selfedit: ${message}\n`);
|
|
44
|
+
process.exit(2);
|
|
45
|
+
}
|
|
46
|
+
|
|
47
|
+
const { values, positionals } = parseArgs({
|
|
48
|
+
options: {
|
|
49
|
+
help: { type: "boolean", short: "h" },
|
|
50
|
+
version: { type: "boolean" },
|
|
51
|
+
},
|
|
52
|
+
allowPositionals: true,
|
|
53
|
+
});
|
|
54
|
+
|
|
55
|
+
if (values.help) {
|
|
56
|
+
process.stdout.write(HELP);
|
|
57
|
+
process.exit(0);
|
|
58
|
+
}
|
|
59
|
+
|
|
60
|
+
if (values.version) {
|
|
61
|
+
const pkg = JSON.parse(
|
|
62
|
+
readFileSync(new URL("../package.json", import.meta.url), "utf8"),
|
|
63
|
+
);
|
|
64
|
+
process.stdout.write(`${pkg.version}\n`);
|
|
65
|
+
process.exit(0);
|
|
66
|
+
}
|
|
67
|
+
|
|
68
|
+
const [targetArg, ...extra] = positionals;
|
|
69
|
+
if (!targetArg) fail("missing <path> (try --help)");
|
|
70
|
+
if (extra.length > 0) fail(`unexpected extra arguments: ${extra.join(" ")}`);
|
|
71
|
+
|
|
72
|
+
const absoluteTarget = resolve(process.cwd(), targetArg);
|
|
73
|
+
|
|
74
|
+
// Safeguard 1: settings.json must grant Edit() on this path.
|
|
75
|
+
const settingsPath = new Finder(fsPromises, { debug() {} }).findUpward(
|
|
76
|
+
dirname(absoluteTarget),
|
|
77
|
+
".claude/settings.json",
|
|
78
|
+
20,
|
|
79
|
+
);
|
|
80
|
+
if (!settingsPath) {
|
|
81
|
+
fail(
|
|
82
|
+
`no .claude/settings.json found walking upward from ${dirname(absoluteTarget)}`,
|
|
83
|
+
);
|
|
84
|
+
}
|
|
85
|
+
|
|
86
|
+
const projectRoot = dirname(dirname(settingsPath));
|
|
87
|
+
const relativeTarget = relative(projectRoot, absoluteTarget);
|
|
88
|
+
|
|
89
|
+
let settings;
|
|
90
|
+
try {
|
|
91
|
+
settings = JSON.parse(readFileSync(settingsPath, "utf8"));
|
|
92
|
+
} catch (err) {
|
|
93
|
+
fail(`failed to parse ${settingsPath}: ${err.message}`);
|
|
94
|
+
}
|
|
95
|
+
|
|
96
|
+
const allowRules = settings?.permissions?.allow;
|
|
97
|
+
if (!Array.isArray(allowRules)) {
|
|
98
|
+
fail(`${settingsPath} has no permissions.allow[] array`);
|
|
99
|
+
}
|
|
100
|
+
|
|
101
|
+
const editPatterns = allowRules
|
|
102
|
+
.filter((rule) => typeof rule === "string")
|
|
103
|
+
.map((rule) => rule.match(/^Edit\((.+)\)$/)?.[1])
|
|
104
|
+
.filter(Boolean);
|
|
105
|
+
|
|
106
|
+
if (editPatterns.length === 0) {
|
|
107
|
+
fail(`${settingsPath} has no Edit() rules in permissions.allow[]`);
|
|
108
|
+
}
|
|
109
|
+
|
|
110
|
+
const matchedPattern = editPatterns.find((pattern) =>
|
|
111
|
+
minimatch(relativeTarget, pattern, { dot: true }),
|
|
112
|
+
);
|
|
113
|
+
if (!matchedPattern) {
|
|
114
|
+
fail(
|
|
115
|
+
`no Edit() rule in ${relative(projectRoot, settingsPath)} matches '${relativeTarget}' ` +
|
|
116
|
+
`(tried: ${editPatterns.map((p) => `Edit(${p})`).join(", ")})`,
|
|
117
|
+
);
|
|
118
|
+
}
|
|
119
|
+
|
|
120
|
+
// Safeguard 2: branch must not be main and HEAD must not be detached.
|
|
121
|
+
let branch;
|
|
122
|
+
try {
|
|
123
|
+
branch = execFileSync("git", ["rev-parse", "--abbrev-ref", "HEAD"], {
|
|
124
|
+
stdio: ["ignore", "pipe", "pipe"],
|
|
125
|
+
encoding: "utf8",
|
|
126
|
+
}).trim();
|
|
127
|
+
} catch {
|
|
128
|
+
fail("failed to read current git branch (not inside a git repository?)");
|
|
129
|
+
}
|
|
130
|
+
|
|
131
|
+
if (branch === "HEAD") {
|
|
132
|
+
fail("HEAD is detached — refusing (check out a non-main branch first)");
|
|
133
|
+
}
|
|
134
|
+
if (branch === "main") {
|
|
135
|
+
fail("refusing to write while on branch 'main' — switch to a feature branch");
|
|
136
|
+
}
|
|
137
|
+
|
|
138
|
+
const parent = dirname(absoluteTarget);
|
|
139
|
+
if (!existsSync(parent)) {
|
|
140
|
+
fail(`parent directory '${relative(projectRoot, parent)}' does not exist`);
|
|
141
|
+
}
|
|
142
|
+
|
|
143
|
+
if (process.stdin.isTTY) {
|
|
144
|
+
fail(
|
|
145
|
+
"stdin is a TTY — pipe content in (e.g. `echo … | fit-selfedit <path>`)",
|
|
146
|
+
);
|
|
147
|
+
}
|
|
148
|
+
|
|
149
|
+
const chunks = [];
|
|
150
|
+
for await (const chunk of process.stdin) chunks.push(chunk);
|
|
151
|
+
const content = Buffer.concat(chunks);
|
|
152
|
+
|
|
153
|
+
try {
|
|
154
|
+
writeFileSync(absoluteTarget, content);
|
|
155
|
+
} catch (err) {
|
|
156
|
+
process.stderr.write(`fit-selfedit: write failed: ${err.message}\n`);
|
|
157
|
+
process.exit(1);
|
|
158
|
+
}
|
|
159
|
+
|
|
160
|
+
process.stderr.write(
|
|
161
|
+
`fit-selfedit: wrote ${content.length} byte${content.length === 1 ? "" : "s"} to ${relativeTarget} ` +
|
|
162
|
+
`(matched Edit(${matchedPattern}), branch ${branch})\n`,
|
|
163
|
+
);
|
package/bin/fit-trace.js
CHANGED
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@forwardimpact/libeval",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.47",
|
|
4
4
|
"description": "Agent evaluation framework — prove whether agent changes improved outcomes with reproducible evidence.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"eval",
|
|
@@ -33,12 +33,14 @@
|
|
|
33
33
|
".": "./src/index.js",
|
|
34
34
|
"./bin/fit-eval.js": "./bin/fit-eval.js",
|
|
35
35
|
"./bin/fit-trace.js": "./bin/fit-trace.js",
|
|
36
|
-
"./bin/fit-benchmark.js": "./bin/fit-benchmark.js"
|
|
36
|
+
"./bin/fit-benchmark.js": "./bin/fit-benchmark.js",
|
|
37
|
+
"./bin/fit-selfedit.js": "./bin/fit-selfedit.js"
|
|
37
38
|
},
|
|
38
39
|
"bin": {
|
|
39
40
|
"fit-eval": "./bin/fit-eval.js",
|
|
40
41
|
"fit-trace": "./bin/fit-trace.js",
|
|
41
|
-
"fit-benchmark": "./bin/fit-benchmark.js"
|
|
42
|
+
"fit-benchmark": "./bin/fit-benchmark.js",
|
|
43
|
+
"fit-selfedit": "./bin/fit-selfedit.js"
|
|
42
44
|
},
|
|
43
45
|
"files": [
|
|
44
46
|
"src/**/*.js",
|
|
@@ -52,8 +54,11 @@
|
|
|
52
54
|
"@anthropic-ai/claude-agent-sdk": "0.2.112",
|
|
53
55
|
"@forwardimpact/libcli": "^0.1.0",
|
|
54
56
|
"@forwardimpact/libconfig": "^0.1.0",
|
|
57
|
+
"@forwardimpact/libpreflight": "^0.1.0",
|
|
55
58
|
"@forwardimpact/libtelemetry": "^0.1.22",
|
|
59
|
+
"@forwardimpact/libutil": "^0.1.0",
|
|
56
60
|
"jmespath": "^0.16.0",
|
|
61
|
+
"minimatch": "^10.0.0",
|
|
57
62
|
"zod": "^4.4.3"
|
|
58
63
|
},
|
|
59
64
|
"devDependencies": {
|
|
@@ -61,7 +66,7 @@
|
|
|
61
66
|
},
|
|
62
67
|
"engines": {
|
|
63
68
|
"bun": ">=1.2.0",
|
|
64
|
-
"node": ">=
|
|
69
|
+
"node": ">=22.0.0"
|
|
65
70
|
},
|
|
66
71
|
"publishConfig": {
|
|
67
72
|
"access": "public"
|