@tangle-network/agent-eval 0.51.0 → 0.53.0
- package/CHANGELOG.md +54 -1
- package/dist/adapters/otel.d.ts +1 -1
- package/dist/campaign/index.d.ts +7 -66
- package/dist/campaign/index.js +5 -122
- package/dist/campaign/index.js.map +1 -1
- package/dist/{chunk-XAP6DJZE.js → chunk-YXD7GWJI.js} +35 -2
- package/dist/chunk-YXD7GWJI.js.map +1 -0
- package/dist/contract/index.d.ts +16 -4
- package/dist/contract/index.js +147 -1
- package/dist/contract/index.js.map +1 -1
- package/dist/hosted/index.d.ts +1 -1
- package/dist/{index-DQHtWQ57.d.ts → index-C7RhhEME.d.ts} +46 -0
- package/dist/openapi.json +1 -1
- package/dist/{run-improvement-loop-BPMjNKMJ.d.ts → run-improvement-loop-Cc7oZlRP.d.ts} +48 -15
- package/docs/design/self-improvement-protocol.md +223 -0
- package/docs/specs/driver-honest-spec.md +251 -0
- package/docs/specs/hermes-self-improvement-audit.md +93 -0
- package/docs/specs/profile-versioning.md +291 -0
- package/package.json +1 -1
- package/dist/chunk-XAP6DJZE.js.map +0 -1
package/CHANGELOG.md
CHANGED
|
@@ -4,7 +4,60 @@ All notable changes to `@tangle-network/agent-eval` and its sibling `agent-eval-
|
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
-
## [0.
|
|
7
|
+
## [0.53.0] — 2026-05-27 — prior-period comparison ("did my last change help?")
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **`analyzeRuns({ runs, baselineRuns?, baselineLabel? })`** — when `baselineRuns` is provided, `InsightReport` gains a `priorPeriodComparison` block. Two-sample Welch comparison (unpaired — the two windows do NOT need to share scenarios) on: composite score, cost, duration, token usage, and every judge dimension present in both windows.
|
|
12
|
+
- **`PriorPeriodComparison` + `MetricDelta` types** — per-metric `current`, `baseline`, `delta`, Welch 95% CI, p-value, Cohen's d, `baselineN`/`currentN`, and `significant` boolean (p < 0.05 AND |d| ≥ 0.2 — conjunction prevents large-effect-but-noisy and significant-but-tiny from triggering).
|
|
13
|
+
- **`regressedMetrics` + `improvedMetrics` lists** — direction-aware (cost/duration are lower-is-better; composite/dimensions are higher-is-better). Drives the recommendations engine.
|
|
14
|
+
- **New recommendations** — `critical/investigate` fires per regressed metric with the full statistical detail in the rationale (`Welch CI95 = [..], p=.., Cohen's d=..`). `low/ship` fires per improved metric so consumers see what to celebrate without noise.
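
The comparison described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `welchDelta`, `classify`, `MetricDeltaSketch`, and every field name are hypothetical, and the p-value and 95% interval use a normal approximation instead of the Student t distribution with Welch-Satterthwaite degrees of freedom that a full Welch test would use. It also assumes each window has at least two runs and non-zero variance.

```typescript
// Hypothetical shape, loosely mirroring the fields the changelog lists.
interface MetricDeltaSketch {
  current: number;
  baseline: number;
  delta: number;
  ci95: [number, number];
  pValue: number;
  cohensD: number;
  significant: boolean;
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Unbiased sample variance (n - 1 denominator).
function variance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((a, x) => a + (x - m) * (x - m), 0) / (xs.length - 1);
}

// Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation.
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Unpaired two-sample comparison: the windows may differ in size and content.
function welchDelta(current: number[], baseline: number[]): MetricDeltaSketch {
  const mc = mean(current);
  const mb = mean(baseline);
  const vc = variance(current);
  const vb = variance(baseline);
  const delta = mc - mb;
  // Welch standard error: variances are NOT pooled.
  const se = Math.sqrt(vc / current.length + vb / baseline.length);
  const pValue = 2 * (1 - normalCdf(Math.abs(delta / se))); // normal approximation
  // Cohen's d with the usual pooled standard deviation.
  const pooledSd = Math.sqrt(
    ((current.length - 1) * vc + (baseline.length - 1) * vb) / (current.length + baseline.length - 2)
  );
  const cohensD = delta / pooledSd;
  return {
    current: mc,
    baseline: mb,
    delta: delta,
    ci95: [delta - 1.96 * se, delta + 1.96 * se],
    pValue: pValue,
    cohensD: cohensD,
    // Conjunction: statistically detectable AND at least a small effect.
    significant: pValue < 0.05 && Math.abs(cohensD) >= 0.2,
  };
}

// Direction-aware bucketing: cost and duration are lower-is-better.
const LOWER_IS_BETTER = ["cost", "duration"];

function classify(metric: string, d: MetricDeltaSketch): "improved" | "regressed" | "unchanged" {
  if (!d.significant) return "unchanged";
  const lowerIsBetter = LOWER_IS_BETTER.indexOf(metric) !== -1;
  const better = lowerIsBetter ? d.delta < 0 : d.delta > 0;
  return better ? "improved" : "regressed";
}
```

Under these assumptions, the same positive delta lands in the improved bucket for a composite score and in the regressed bucket for cost, which is the direction-awareness the changelog describes.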
|
|
15
|
+
|
|
16
|
+
### Why this matters
|
|
17
|
+
|
|
18
|
+
"Did my last change help?" is the conversion question for every observability prospect. LangSmith / Braintrust / Phoenix ship scorecards without paired-CI deltas. Hermes has no comparison at all. Our `priorPeriodComparison` answers the question with a falsifiable, statistically-rigorous delta. The block lands in the existing `InsightReport` so every consumer of `analyzeRuns` picks it up automatically.
|
|
19
|
+
|
|
20
|
+
### Architectural context
|
|
21
|
+
|
|
22
|
+
Part of the self-improvement-protocol design (`docs/design/self-improvement-protocol.md`). This release is the 0.53.0 step of a roadmap that ends at 1.0.0 (profile-versioning + composite driver) and 1.1.0 (empirical-proof publication).
|
|
23
|
+
|
|
24
|
+
### Notes
|
|
25
|
+
|
|
26
|
+
Purely additive surface. `priorPeriodComparison?` is optional, so existing consumers are unaffected. 10 new tests under `tests/prior-period-comparison.test.ts` cover: no-comparison-when-omitted, significant improvement, significant regression, direction-awareness for cost/duration, noise rejection, per-dimension comparison, empty windows, CI bracket-the-truth, both recommendation types. Full suite 1454/1454 green.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## [0.52.0] — 2026-05-27 — honest drivers + profile-versioning architecture
|
|
31
|
+
|
|
32
|
+
### Honest correction
|
|
33
|
+
|
|
34
|
+
After cloning and reading the actual SkillOpt source (microsoft/SkillOpt) and the GEPA paper (Agrawal et al., arXiv:2507.19457), 0.51.0's `skillOptDriver` was **not** SkillOpt — it was `gepaDriver` + 2 post-parse rejection rules. 0.52.0 closes that integrity gap. Greenfield in-place collapse; no V2.
|
|
35
|
+
|
|
36
|
+
### Changed (breaking)
|
|
37
|
+
|
|
38
|
+
- **`skillOptDriver` removed.** Its only substantive behavior (section preservation + sentence-edit-count cap) moves into `gepaDriver` as opt-in `constraints`. The `skillOptDriver` name is reserved for when we ship the real 6-stage patch-mode pipeline (tracked as task #100, blocked on profile-versioning).
|
|
39
|
+
- **`gepaDriver` gains `constraints?: { preserveSections?, maxSentenceEdits? }`**. When `preserveSections: []`, the driver auto-detects current H2 headings and rejects candidates that drop or rename them. When `maxSentenceEdits: N`, candidates whose sentence-level edit count vs the parent exceeds `N * 2` are rejected. Both inspired by SkillOpt's edit-budget-as-textual-learning-rate principle.
|
|
40
|
+
- **`gepaDriver` docstring updated** to be honest about Pareto: today the driver implements GEPA's *reflection* primitive but not the Pareto frontier or combine-complementary-lessons step. Tracked as task #101.
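
The two constraints described above can be illustrated with a small rejection check. This is a sketch under assumptions: `passesConstraints`, `ConstraintsSketch`, and `splitSentences` are hypothetical names, the sentence splitter is a simplified stand-in for the one in the removed `skillopt.ts` source, and the handling of a non-empty `preserveSections` list (use the list as given) is assumed, not confirmed by the changelog.

```typescript
// Hypothetical mirror of the `constraints` option shape named in the changelog.
interface ConstraintsSketch {
  preserveSections?: string[];
  maxSentenceEdits?: number;
}

// Collect the text of every "## " heading, in document order.
function extractH2Sections(text: string): string[] {
  const out: string[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const match = /^##\s+(.+?)\s*$/.exec(lines[i]);
    if (match) out.push(match[1]);
  }
  return out;
}

// Simplified splitter: a sentence is a run of text up to its terminal punctuation.
function splitSentences(text: string): string[] {
  const out: string[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const parts = lines[i].match(/[^.!?]+[.!?]*/g) || [];
    for (let j = 0; j < parts.length; j++) {
      const s = parts[j].trim();
      if (s.length > 0 && out.indexOf(s) === -1) out.push(s); // keep distinct sentences only
    }
  }
  return out;
}

// Sentences present in only one of the two texts; a replacement counts twice
// (one removal plus one addition), which is why the cap is compared against N * 2.
function countSentenceEdits(parent: string, candidate: string): number {
  const a = splitSentences(parent);
  const b = splitSentences(candidate);
  let edits = 0;
  for (let i = 0; i < a.length; i++) if (b.indexOf(a[i]) === -1) edits++;
  for (let i = 0; i < b.length; i++) if (a.indexOf(b[i]) === -1) edits++;
  return edits;
}

function passesConstraints(parent: string, candidate: string, c: ConstraintsSketch): boolean {
  if (c.preserveSections !== undefined) {
    // Empty list means "auto-detect the parent's current H2 headings".
    const required = c.preserveSections.length === 0 ? extractH2Sections(parent) : c.preserveSections;
    const have = extractH2Sections(candidate);
    for (let i = 0; i < required.length; i++) {
      if (have.indexOf(required[i]) === -1) return false; // dropped or renamed heading
    }
  }
  if (c.maxSentenceEdits !== undefined) {
    if (countSentenceEdits(parent, candidate) > c.maxSentenceEdits * 2) return false;
  }
  return true;
}
```

With `maxSentenceEdits: 1`, a candidate that rewords a single sentence passes (two differing sentences, cap of two), while one that renames a heading or rewrites a whole section is rejected.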
|
|
41
|
+
|
|
42
|
+
### Added
|
|
43
|
+
|
|
44
|
+
- **`docs/specs/driver-honest-spec.md`** — primary-source comparison vs GEPA and SkillOpt. Quotes the actual source. Names 13 deviations between 0.51.0's `skillOptDriver` and the real SkillOpt pipeline.
|
|
45
|
+
- **`docs/specs/hermes-self-improvement-audit.md`** — corrected audit after cloning NousResearch/hermes-agent. Hermes has two loops, not one: the 7-day curator (housekeeping) AND a per-turn `background_review` fork that uses **user corrective feedback as a first-class skill-update signal** ("stop doing X", "you always do Y"). Signal source we don't capture today.
|
|
46
|
+
- **`docs/specs/profile-versioning.md`** — architecture for the offline/online drift problem. Symmetric-fork framing (both writers are peers, neither is the authority). `AgentProfileVersion` content-hashing, `ProfileDiff` patch/replace types, 4-way `DriftGateDecision` (ship-substrate / ship-harness / merge / inconclusive), opt-in `driftPolicy` (ignore / reject-on-drift / benchmark-branches), four conflict-resolution cases including semantic-duplication detection. Phase 0 forcing-function experiment specified.
|
|
47
|
+
|
|
48
|
+
### Where we beat the prior art (now named explicitly)
|
|
49
|
+
|
|
50
|
+
Our `defaultProductionGate` uses paired bootstrap CI + Cohen's d + MDE + p-value. **SkillOpt's gate is a literal `cand_hard > current_score`** (verified at `skillopt/evaluation/gate.py:38`). **Hermes has no gate** — the forked review agent decides. We are statistically stricter than both.
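
To make the contrast concrete, here is a rough sketch of a bare greater-than gate next to a gate that requires a paired bootstrap confidence interval to exclude zero. It covers only the bootstrap-CI ingredient; Cohen's d, MDE, and the p-value are omitted. All names (`naiveGate`, `ciGate`, `pairedBootstrapCi`, `makeRng`) are hypothetical and this is not the code of `defaultProductionGate`. A small seeded generator is used so the result is reproducible.

```typescript
// Small linear congruential generator so resampling is reproducible.
function makeRng(seed: number): () => number {
  let state = seed % 4294967296;
  return function (): number {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Percentile bootstrap over per-scenario differences (candidate minus current).
// Paired: index i in both arrays must refer to the same scenario.
function pairedBootstrapCi(
  candidate: number[],
  current: number[],
  resamples: number,
  rng: () => number
): [number, number] {
  const diffs = candidate.map((c, i) => c - current[i]);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let k = 0; k < diffs.length; k++) {
      sum += diffs[Math.floor(rng() * diffs.length)]; // sample with replacement
    }
    means.push(sum / diffs.length);
  }
  means.sort((a, b) => a - b);
  const lo = means[Math.floor(0.025 * (resamples - 1))];
  const hi = means[Math.ceil(0.975 * (resamples - 1))];
  return [lo, hi];
}

// Bare comparison of aggregate scores: any positive difference ships.
function naiveGate(candidate: number[], current: number[]): boolean {
  return mean(candidate) > mean(current);
}

// Ships only when the whole 95% interval of the mean difference is above zero.
function ciGate(candidate: number[], current: number[], rng: () => number): boolean {
  const ci = pairedBootstrapCi(candidate, current, 2000, rng);
  return ci[0] > 0;
}
```

On a candidate whose per-scenario gains are small and mixed in sign, the bare comparison can accept while the interval-based gate declines, since the interval still straddles zero.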
|
|
51
|
+
|
|
52
|
+
### Notes
|
|
53
|
+
|
|
54
|
+
`gepaDriver({ constraints })` covers every use case the deleted `skillOptDriver` covered. The single `skillOptDriver` test file was removed; 13 new tests under `tests/gepa-driver-constraints.test.ts` cover the absorbed behavior + the unconstrained baseline behavior. Full suite 1444 / 1444 green.
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## [0.51.0] — 2026-05-27 — skillOptDriver (SkillOpt methodology as a substrate driver) — SUPERSEDED BY 0.52.0
|
|
59
|
+
|
|
60
|
+
⚠️ 0.51.0 named a driver `skillOptDriver` after Microsoft's SkillOpt methodology but did not implement it (it was `gepaDriver` + 2 post-parse rules). The honest replacement landed in 0.52.0; this entry is preserved for changelog continuity.
|
|
8
61
|
|
|
9
62
|
### Added
|
|
10
63
|
|
package/dist/adapters/otel.d.ts
CHANGED
@@ -1,4 +1,4 @@
-import { T as TraceSpanEvent, H as HostedClient } from '../index-
+import { T as TraceSpanEvent, H as HostedClient } from '../index-C7RhhEME.js';
 import '../types-Dbj5gu8n.js';
 import '../summary-report-B7gNRX-r.js';
 import '../run-record-BGY6bHRh.js';
package/dist/campaign/index.d.ts
CHANGED
|
@@ -1,72 +1,13 @@
|
|
|
1
|
-
export { C as CampaignStorage, D as DefaultProductionGateOptions, E as EvolutionaryDriverOptions, G as GepaDriverOptions, H as HeldOutGateOptions, O as OpenAutoPrOptions,
|
|
2
|
-
import { L as
|
|
3
|
-
|
|
4
|
-
|
|
1
|
+
export { C as CampaignStorage, D as DefaultProductionGateOptions, E as EvolutionaryDriverOptions, G as GepaDriverConstraints, a as GepaDriverOptions, H as HeldOutGateOptions, O as OpenAutoPrOptions, b as OpenAutoPrResult, R as RunCampaignOptions, c as RunEvalOptions, d as RunImprovementLoopOptions, e as RunImprovementLoopResult, f as RunOptimizationOptions, g as RunOptimizationResult, h as composeGate, i as countSentenceEdits, j as defaultProductionGate, k as evolutionaryDriver, l as extractH2Sections, m as fsCampaignStorage, n as gepaDriver, o as heldOutGate, p as inMemoryCampaignStorage, q as openAutoPr, r as runCampaign, s as runEval, t as runImprovementLoop, u as runOptimization, v as surfaceHash } from '../run-improvement-loop-Cc7oZlRP.js';
|
|
2
|
+
import { L as LabeledScenarioStore, c as LabeledScenarioWrite, d as LabeledScenarioSampleArgs, e as LabeledScenarioRecord, C as CodeSurface } from '../types-Dbj5gu8n.js';
|
|
3
|
+
export { f as CampaignAggregates, g as CampaignArtifactWriter, h as CampaignCellResult, i as CampaignCostMeter, j as CampaignResult, k as CampaignTraceWriter, b as DispatchContext, D as DispatchFn, G as Gate, l as GateContext, m as GateDecision, n as GateResult, o as GenerationCandidate, p as GenerationRecord, I as ImprovementDriver, q as JudgeAggregate, a as JudgeConfig, r as JudgeDimension, J as JudgeScore, s as LabeledScenarioSource, M as MutableSurface, t as Mutator, O as OptimizerConfig, P as ProposeContext, R as RedactionStatus, S as Scenario, u as ScenarioAggregate, v as SessionScript, T as TraceSpan } from '../types-Dbj5gu8n.js';
|
|
4
|
+
import '../llm-client-BXVRUZyX.js';
|
|
5
|
+
import '../errors-mje_cKOs.js';
|
|
6
|
+
import '../raw-provider-sink-C46HDghv.js';
|
|
5
7
|
import '../red-team-30II1T4o.js';
|
|
6
8
|
import '../dataset-BlwAtYYf.js';
|
|
7
|
-
import '../errors-mje_cKOs.js';
|
|
8
9
|
import '../store-Db2Bv8Cf.js';
|
|
9
10
|
import '../run-record-BGY6bHRh.js';
|
|
10
|
-
import '../raw-provider-sink-C46HDghv.js';
|
|
11
|
-
|
|
12
|
-
/**
|
|
13
|
-
* @experimental
|
|
14
|
-
*
|
|
15
|
-
* `skillOptDriver` — a section-aware, bounded-edit `ImprovementDriver` for
|
|
16
|
-
* structured natural-language procedures (SKILL.md files, runbooks, sectioned
|
|
17
|
-
* system prompts, judge rubrics with dimensions). Implements the SkillOpt
|
|
18
|
-
* methodology (Microsoft, 2026): treat the skill document as a trainable
|
|
19
|
-
* optimization target, train the procedure not the weights, constrain each
|
|
20
|
-
* generation to ≤N targeted edits to prevent useful-rule overwrites.
|
|
21
|
-
*
|
|
22
|
-
* Differs from `gepaDriver` in two specific ways:
|
|
23
|
-
*
|
|
24
|
-
* 1. **Bounded edits.** Each candidate must differ from the baseline by at
|
|
25
|
-
* most `editBudget` sentence-level changes. The "edit budget functions
|
|
26
|
-
* as a textual learning rate" — without it, an LLM proposal can rewrite
|
|
27
|
-
* so much that useful prior rules get overwritten.
|
|
28
|
-
*
|
|
29
|
-
* 2. **Section preservation.** When the surface is a structured doc, the
|
|
30
|
-
* H2 headers (and an opt-in `preserveSections` allowlist) are
|
|
31
|
-
* load-bearing for discoverability. Candidates that delete or rename
|
|
32
|
-
* preserved sections are rejected at parse time.
|
|
33
|
-
*
|
|
34
|
-
* Selectable alongside `gepaDriver` and `evolutionaryDriver`. Use this when
|
|
35
|
-
* the surface IS a structured doc; use `gepaDriver` when the surface is
|
|
36
|
-
* unstructured prose.
|
|
37
|
-
*/
|
|
38
|
-
|
|
39
|
-
interface SkillOptDriverOptions {
|
|
40
|
-
/** Router transport (apiKey/baseUrl). */
|
|
41
|
-
llm: LlmClientOptions;
|
|
42
|
-
/** Model that performs the reflection. */
|
|
43
|
-
model: string;
|
|
44
|
-
/** What is being optimized — appears in the reflection prompt for orientation. */
|
|
45
|
-
target: string;
|
|
46
|
-
/** Max edits per generation — SkillOpt's "textual learning rate".
|
|
47
|
-
* Default 3. Lower = more conservative, higher = more exploratory. */
|
|
48
|
-
editBudget?: number;
|
|
49
|
-
/** Section headings the driver MUST preserve. When the surface is a
|
|
50
|
-
* structured skill doc, sections are load-bearing for discoverability.
|
|
51
|
-
* Default: auto-detected from H2 headers in the baseline. */
|
|
52
|
-
preserveSections?: string[];
|
|
53
|
-
/** Surface-specific mutation levers offered to the model. */
|
|
54
|
-
mutationPrimitives?: string[];
|
|
55
|
-
/** Top/bottom scenarios surfaced as evidence each generation. Default 3. */
|
|
56
|
-
evidenceK?: number;
|
|
57
|
-
/** Reflection sampling temperature. Default 0.7. */
|
|
58
|
-
temperature?: number;
|
|
59
|
-
/** Reflection max tokens. Default 6000. */
|
|
60
|
-
maxTokens?: number;
|
|
61
|
-
}
|
|
62
|
-
/** Internal — exported for tests. */
|
|
63
|
-
declare function extractH2Sections(text: string): string[];
|
|
64
|
-
/** Sentence-level edit distance. Counts distinct sentence add/remove/replace
|
|
65
|
-
* ops between baseline and candidate using a normalised line-by-line diff.
|
|
66
|
-
* Imperfect (treats trivial whitespace as identical) but tight enough to
|
|
67
|
-
* bound an LLM rewrite. Exported for tests. */
|
|
68
|
-
declare function countSentenceEdits(baseline: string, candidate: string): number;
|
|
69
|
-
declare function skillOptDriver(opts: SkillOptDriverOptions): ImprovementDriver;
|
|
70
11
|
|
|
71
12
|
/**
|
|
72
13
|
* @experimental
|
|
@@ -183,4 +124,4 @@ declare function gitWorktreeAdapter(opts: GitWorktreeAdapterOptions): WorktreeAd
|
|
|
183
124
|
* as a ref under the adapter's worktree dir. */
|
|
184
125
|
declare function resolveWorktreePath(surface: CodeSurface, worktreeDir?: string): string;
|
|
185
126
|
|
|
186
|
-
export { CodeSurface, FsLabeledScenarioStore, type FsLabeledScenarioStoreOptions, type GitWorktreeAdapterOptions,
|
|
127
|
+
export { CodeSurface, FsLabeledScenarioStore, type FsLabeledScenarioStoreOptions, type GitWorktreeAdapterOptions, LabeledScenarioRecord, LabeledScenarioSampleArgs, LabeledScenarioStore, LabeledScenarioStoreError, LabeledScenarioWrite, type Worktree, type WorktreeAdapter, WorktreeAdapterError, gitWorktreeAdapter, resolveWorktreePath };
|
package/dist/campaign/index.js
CHANGED
|
@@ -1,7 +1,9 @@
|
|
|
1
1
|
import {
|
|
2
2
|
composeGate,
|
|
3
|
+
countSentenceEdits,
|
|
3
4
|
defaultProductionGate,
|
|
4
5
|
evolutionaryDriver,
|
|
6
|
+
extractH2Sections,
|
|
5
7
|
gepaDriver,
|
|
6
8
|
heldOutGate,
|
|
7
9
|
openAutoPr,
|
|
@@ -9,139 +11,21 @@ import {
|
|
|
9
11
|
runImprovementLoop,
|
|
10
12
|
runOptimization,
|
|
11
13
|
surfaceHash
|
|
12
|
-
} from "../chunk-
|
|
14
|
+
} from "../chunk-YXD7GWJI.js";
|
|
13
15
|
import {
|
|
14
16
|
fsCampaignStorage,
|
|
15
17
|
inMemoryCampaignStorage,
|
|
16
18
|
runCampaign
|
|
17
19
|
} from "../chunk-J3EIOI3O.js";
|
|
18
|
-
import
|
|
19
|
-
buildReflectionPrompt,
|
|
20
|
-
parseReflectionResponse
|
|
21
|
-
} from "../chunk-N4SBKEPJ.js";
|
|
20
|
+
import "../chunk-N4SBKEPJ.js";
|
|
22
21
|
import "../chunk-YV7J7X5N.js";
|
|
23
22
|
import "../chunk-WP7SY7AI.js";
|
|
24
23
|
import "../chunk-GGE4NNQT.js";
|
|
25
|
-
import
|
|
26
|
-
callLlm
|
|
27
|
-
} from "../chunk-VXNVVBZO.js";
|
|
24
|
+
import "../chunk-VXNVVBZO.js";
|
|
28
25
|
import "../chunk-PC4UYEBM.js";
|
|
29
26
|
import "../chunk-QYJT52YW.js";
|
|
30
27
|
import "../chunk-NSBPE2FW.js";
|
|
31
28
|
|
|
32
|
-
// src/campaign/drivers/skillopt.ts
|
|
33
|
-
var REFLECTION_SYSTEM = 'You are an expert prompt engineer applying the SkillOpt methodology. You will edit a structured natural-language procedure under TWO HARD CONSTRAINTS: (1) preserve every H2 section heading verbatim \u2014 do NOT delete, rename, or merge sections; (2) make at most EDIT_BUDGET targeted sentence-level edits per candidate \u2014 bounded edits prevent overwriting useful prior rules. Output ONLY a JSON object of shape {"proposals":[{"label":string,"rationale":string,"payload":string}]} where each `payload` is the FULL improved skill text. No prose outside the JSON.';
|
|
34
|
-
function extractH2Sections(text) {
|
|
35
|
-
const out = [];
|
|
36
|
-
for (const line of text.split("\n")) {
|
|
37
|
-
const match = /^##\s+(.+?)\s*$/.exec(line);
|
|
38
|
-
if (match) out.push(match[1]);
|
|
39
|
-
}
|
|
40
|
-
return out;
|
|
41
|
-
}
|
|
42
|
-
function countSentenceEdits(baseline, candidate) {
|
|
43
|
-
const norm = (s) => s.split(/(?<=[.!?])\s+|\n/g).map((p) => p.trim()).filter((p) => p.length > 0);
|
|
44
|
-
const a = new Set(norm(baseline));
|
|
45
|
-
const b = new Set(norm(candidate));
|
|
46
|
-
let edits = 0;
|
|
47
|
-
for (const s of a) if (!b.has(s)) edits++;
|
|
48
|
-
for (const s of b) if (!a.has(s)) edits++;
|
|
49
|
-
return edits;
|
|
50
|
-
}
|
|
51
|
-
function skillOptDriver(opts) {
|
|
52
|
-
const evidenceK = opts.evidenceK ?? 3;
|
|
53
|
-
const editBudget = opts.editBudget ?? 3;
|
|
54
|
-
if (editBudget < 1) {
|
|
55
|
-
throw new Error(
|
|
56
|
-
`skillOptDriver: editBudget must be >= 1, got ${editBudget} (use evolutionaryDriver with a noop mutator for measure-only runs)`
|
|
57
|
-
);
|
|
58
|
-
}
|
|
59
|
-
return {
|
|
60
|
-
kind: "skillopt",
|
|
61
|
-
async propose(ctx) {
|
|
62
|
-
if (typeof ctx.currentSurface !== "string") {
|
|
63
|
-
throw new Error(
|
|
64
|
-
`skillOptDriver: surface must be a string skill document; got ${typeof ctx.currentSurface}. Use evolutionaryDriver with a CodeSurface mutator for code-tier surfaces.`
|
|
65
|
-
);
|
|
66
|
-
}
|
|
67
|
-
const baseline = ctx.currentSurface;
|
|
68
|
-
const preserveSections = opts.preserveSections ?? extractH2Sections(baseline);
|
|
69
|
-
const { top, bottom, target } = buildEvidence(ctx, evidenceK, opts.target);
|
|
70
|
-
const reflectionUser = buildReflectionPrompt({
|
|
71
|
-
target,
|
|
72
|
-
parentPayload: baseline,
|
|
73
|
-
topTrials: top,
|
|
74
|
-
bottomTrials: bottom,
|
|
75
|
-
childCount: ctx.populationSize,
|
|
76
|
-
mutationPrimitives: opts.mutationPrimitives
|
|
77
|
-
});
|
|
78
|
-
const constraintPreamble = [
|
|
79
|
-
"",
|
|
80
|
-
"## SkillOpt constraints (hard rules \u2014 violations rejected)",
|
|
81
|
-
"",
|
|
82
|
-
`- Edit budget: at most ${editBudget} sentence-level edits per candidate.`,
|
|
83
|
-
"- Section preservation: every H2 heading below must appear unchanged in your output.",
|
|
84
|
-
...preserveSections.map((s) => ` - \`## ${s}\``),
|
|
85
|
-
"",
|
|
86
|
-
"Reject any candidate in your own thinking that would delete a section, rename a heading, or exceed the edit budget. Make TARGETED, surgical edits \u2014 not rewrites.",
|
|
87
|
-
""
|
|
88
|
-
].join("\n");
|
|
89
|
-
const userPrompt = `${reflectionUser}${constraintPreamble}`;
|
|
90
|
-
const system = REFLECTION_SYSTEM.replace("EDIT_BUDGET", String(editBudget));
|
|
91
|
-
const result = await callLlm(
|
|
92
|
-
{
|
|
93
|
-
model: opts.model,
|
|
94
|
-
messages: [
|
|
95
|
-
{ role: "system", content: system },
|
|
96
|
-
{ role: "user", content: userPrompt }
|
|
97
|
-
],
|
|
98
|
-
jsonMode: true,
|
|
99
|
-
temperature: opts.temperature ?? 0.7,
|
|
100
|
-
maxTokens: opts.maxTokens ?? 6e3
|
|
101
|
-
},
|
|
102
|
-
opts.llm
|
|
103
|
-
);
|
|
104
|
-
const proposals = parseReflectionResponse(result.content, ctx.populationSize);
|
|
105
|
-
const out = [];
|
|
106
|
-
for (const proposal of proposals) {
|
|
107
|
-
const text = typeof proposal.payload === "string" ? proposal.payload.trim() : "";
|
|
108
|
-
if (!text || text === baseline) continue;
|
|
109
|
-
if (!validateSections(text, preserveSections)) continue;
|
|
110
|
-
if (countSentenceEdits(baseline, text) > editBudget * 2) continue;
|
|
111
|
-
if (out.includes(text)) continue;
|
|
112
|
-
out.push(text);
|
|
113
|
-
}
|
|
114
|
-
return out;
|
|
115
|
-
}
|
|
116
|
-
};
|
|
117
|
-
}
|
|
118
|
-
function validateSections(candidate, required) {
|
|
119
|
-
if (required.length === 0) return true;
|
|
120
|
-
const have = new Set(extractH2Sections(candidate));
|
|
121
|
-
for (const section of required) {
|
|
122
|
-
if (!have.has(section)) return false;
|
|
123
|
-
}
|
|
124
|
-
return true;
|
|
125
|
-
}
|
|
126
|
-
function buildEvidence(ctx, evidenceK, baseTarget) {
|
|
127
|
-
const last = ctx.history.at(-1);
|
|
128
|
-
if (!last || last.candidates.length === 0) {
|
|
129
|
-
return { top: [], bottom: [], target: baseTarget };
|
|
130
|
-
}
|
|
131
|
-
const best = [...last.candidates].sort((a, b) => b.composite - a.composite)[0];
|
|
132
|
-
if (!best) return { top: [], bottom: [], target: baseTarget };
|
|
133
|
-
const byScore = [...best.scenarios].sort((a, b) => b.composite - a.composite);
|
|
134
|
-
const toTrace = (s) => ({
|
|
135
|
-
id: s.scenarioId,
|
|
136
|
-
score: s.composite
|
|
137
|
-
});
|
|
138
|
-
const top = byScore.slice(0, evidenceK).map(toTrace);
|
|
139
|
-
const bottom = byScore.slice(-evidenceK).reverse().map(toTrace);
|
|
140
|
-
const weakest = Object.entries(best.dimensions).sort((a, b) => a[1] - b[1]).slice(0, 3).map(([dim, value]) => `${dim} (${value.toFixed(2)})`);
|
|
141
|
-
const target = weakest.length > 0 ? `${baseTarget} \u2014 weakest dimensions: ${weakest.join(", ")}` : baseTarget;
|
|
142
|
-
return { top, bottom, target };
|
|
143
|
-
}
|
|
144
|
-
|
|
145
29
|
// src/campaign/labeled-store/fs-adapter.ts
|
|
146
30
|
import { createHash } from "crypto";
|
|
147
31
|
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
|
|
@@ -416,7 +300,6 @@ export {
 runEval,
 runImprovementLoop,
 runOptimization,
-skillOptDriver,
 surfaceHash
 };
 //# sourceMappingURL=index.js.map
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"sources":["../../src/campaign/drivers/skillopt.ts","../../src/campaign/labeled-store/fs-adapter.ts","../../src/campaign/worktree/index.ts"],"sourcesContent":["/**\n * @experimental\n *\n * `skillOptDriver` — a section-aware, bounded-edit `ImprovementDriver` for\n * structured natural-language procedures (SKILL.md files, runbooks, sectioned\n * system prompts, judge rubrics with dimensions). Implements the SkillOpt\n * methodology (Microsoft, 2026): treat the skill document as a trainable\n * optimization target, train the procedure not the weights, constrain each\n * generation to ≤N targeted edits to prevent useful-rule overwrites.\n *\n * Differs from `gepaDriver` in two specific ways:\n *\n * 1. **Bounded edits.** Each candidate must differ from the baseline by at\n * most `editBudget` sentence-level changes. The \"edit budget functions\n * as a textual learning rate\" — without it, an LLM proposal can rewrite\n * so much that useful prior rules get overwritten.\n *\n * 2. **Section preservation.** When the surface is a structured doc, the\n * H2 headers (and an opt-in `preserveSections` allowlist) are\n * load-bearing for discoverability. Candidates that delete or rename\n * preserved sections are rejected at parse time.\n *\n * Selectable alongside `gepaDriver` and `evolutionaryDriver`. Use this when\n * the surface IS a structured doc; use `gepaDriver` when the surface is\n * unstructured prose.\n */\n\nimport { callLlm, type LlmClientOptions } from '../../llm-client'\nimport {\n buildReflectionPrompt,\n parseReflectionResponse,\n type TrialTrace,\n} from '../../reflective-mutation'\nimport type { ImprovementDriver, MutableSurface, ProposeContext } from '../types'\n\nconst REFLECTION_SYSTEM =\n 'You are an expert prompt engineer applying the SkillOpt methodology. 
' +\n 'You will edit a structured natural-language procedure under TWO HARD ' +\n 'CONSTRAINTS: (1) preserve every H2 section heading verbatim — do NOT ' +\n 'delete, rename, or merge sections; (2) make at most EDIT_BUDGET targeted ' +\n 'sentence-level edits per candidate — bounded edits prevent overwriting ' +\n 'useful prior rules. Output ONLY a JSON object of shape ' +\n '{\"proposals\":[{\"label\":string,\"rationale\":string,\"payload\":string}]} ' +\n 'where each `payload` is the FULL improved skill text. No prose outside the JSON.'\n\nexport interface SkillOptDriverOptions {\n /** Router transport (apiKey/baseUrl). */\n llm: LlmClientOptions\n /** Model that performs the reflection. */\n model: string\n /** What is being optimized — appears in the reflection prompt for orientation. */\n target: string\n\n /** Max edits per generation — SkillOpt's \"textual learning rate\".\n * Default 3. Lower = more conservative, higher = more exploratory. */\n editBudget?: number\n\n /** Section headings the driver MUST preserve. When the surface is a\n * structured skill doc, sections are load-bearing for discoverability.\n * Default: auto-detected from H2 headers in the baseline. */\n preserveSections?: string[]\n\n /** Surface-specific mutation levers offered to the model. */\n mutationPrimitives?: string[]\n /** Top/bottom scenarios surfaced as evidence each generation. Default 3. */\n evidenceK?: number\n /** Reflection sampling temperature. Default 0.7. */\n temperature?: number\n /** Reflection max tokens. Default 6000. */\n maxTokens?: number\n}\n\n/** Internal — exported for tests. */\nexport function extractH2Sections(text: string): string[] {\n const out: string[] = []\n for (const line of text.split('\\n')) {\n const match = /^##\\s+(.+?)\\s*$/.exec(line)\n if (match) out.push(match[1]!)\n }\n return out\n}\n\n/** Sentence-level edit distance. 
Counts distinct sentence add/remove/replace\n * ops between baseline and candidate using a normalised line-by-line diff.\n * Imperfect (treats trivial whitespace as identical) but tight enough to\n * bound an LLM rewrite. Exported for tests. */\nexport function countSentenceEdits(baseline: string, candidate: string): number {\n const norm = (s: string) =>\n s\n .split(/(?<=[.!?])\\s+|\\n/g)\n .map((p) => p.trim())\n .filter((p) => p.length > 0)\n const a = new Set(norm(baseline))\n const b = new Set(norm(candidate))\n let edits = 0\n for (const s of a) if (!b.has(s)) edits++ // deletions\n for (const s of b) if (!a.has(s)) edits++ // additions\n return edits\n}\n\nexport function skillOptDriver(opts: SkillOptDriverOptions): ImprovementDriver {\n const evidenceK = opts.evidenceK ?? 3\n const editBudget = opts.editBudget ?? 3\n if (editBudget < 1) {\n throw new Error(\n `skillOptDriver: editBudget must be >= 1, got ${editBudget} (use evolutionaryDriver with a noop mutator for measure-only runs)`,\n )\n }\n return {\n kind: 'skillopt',\n async propose(ctx: ProposeContext): Promise<MutableSurface[]> {\n if (typeof ctx.currentSurface !== 'string') {\n throw new Error(\n `skillOptDriver: surface must be a string skill document; got ${typeof ctx.currentSurface}. Use evolutionaryDriver with a CodeSurface mutator for code-tier surfaces.`,\n )\n }\n const baseline = ctx.currentSurface\n const preserveSections = opts.preserveSections ?? 
extractH2Sections(baseline)\n\n const { top, bottom, target } = buildEvidence(ctx, evidenceK, opts.target)\n\n const reflectionUser = buildReflectionPrompt({\n target,\n parentPayload: baseline,\n topTrials: top,\n bottomTrials: bottom,\n childCount: ctx.populationSize,\n mutationPrimitives: opts.mutationPrimitives,\n })\n const constraintPreamble = [\n '',\n '## SkillOpt constraints (hard rules — violations rejected)',\n '',\n `- Edit budget: at most ${editBudget} sentence-level edits per candidate.`,\n '- Section preservation: every H2 heading below must appear unchanged in your output.',\n ...preserveSections.map((s) => ` - \\`## ${s}\\``),\n '',\n 'Reject any candidate in your own thinking that would delete a section, rename a heading, or exceed the edit budget. Make TARGETED, surgical edits — not rewrites.',\n '',\n ].join('\\n')\n const userPrompt = `${reflectionUser}${constraintPreamble}`\n const system = REFLECTION_SYSTEM.replace('EDIT_BUDGET', String(editBudget))\n\n const result = await callLlm(\n {\n model: opts.model,\n messages: [\n { role: 'system', content: system },\n { role: 'user', content: userPrompt },\n ],\n jsonMode: true,\n temperature: opts.temperature ?? 0.7,\n maxTokens: opts.maxTokens ?? 6000,\n },\n opts.llm,\n )\n\n const proposals = parseReflectionResponse(result.content, ctx.populationSize)\n const out: MutableSurface[] = []\n for (const proposal of proposals) {\n const text = typeof proposal.payload === 'string' ? 
proposal.payload.trim() : ''\n if (!text || text === baseline) continue\n if (!validateSections(text, preserveSections)) continue\n if (countSentenceEdits(baseline, text) > editBudget * 2) continue // x2: add+remove pair per edit\n if (out.includes(text)) continue\n out.push(text)\n }\n return out\n },\n }\n}\n\nfunction validateSections(candidate: string, required: string[]): boolean {\n if (required.length === 0) return true\n const have = new Set(extractH2Sections(candidate))\n for (const section of required) {\n if (!have.has(section)) return false\n }\n return true\n}\n\n/** Reused from gepaDriver pattern — build evidence from prior best candidate. */\nfunction buildEvidence(\n ctx: ProposeContext,\n evidenceK: number,\n baseTarget: string,\n): { top: TrialTrace[]; bottom: TrialTrace[]; target: string } {\n const last = ctx.history.at(-1)\n if (!last || last.candidates.length === 0) {\n return { top: [], bottom: [], target: baseTarget }\n }\n const best = [...last.candidates].sort((a, b) => b.composite - a.composite)[0]\n if (!best) return { top: [], bottom: [], target: baseTarget }\n\n const byScore = [...best.scenarios].sort((a, b) => b.composite - a.composite)\n const toTrace = (s: { scenarioId: string; composite: number }): TrialTrace => ({\n id: s.scenarioId,\n score: s.composite,\n })\n const top = byScore.slice(0, evidenceK).map(toTrace)\n const bottom = byScore.slice(-evidenceK).reverse().map(toTrace)\n\n const weakest = Object.entries(best.dimensions)\n .sort((a, b) => a[1] - b[1])\n .slice(0, 3)\n .map(([dim, value]) => `${dim} (${value.toFixed(2)})`)\n const target =\n weakest.length > 0 ? `${baseTarget} — weakest dimensions: ${weakest.join(', ')}` : baseTarget\n\n return { top, bottom, target }\n}\n","/**\n * @experimental\n *\n * Filesystem `LabeledScenarioStore` adapter. The default capture sink for\n * traces + eval artifacts. 
Production deployments typically swap for a\n * Turso/SQLite adapter (same interface).\n *\n * Records land as one JSONL file per source under `<root>/<source>.jsonl`.\n * Each line is a `LabeledScenarioRecord`. Append-only — no in-place edits.\n *\n * Safety properties enforced at write-time:\n *\n * - **Provenance required**: writes without `source`, `sourceVersionHash`,\n * `capturedAt`, `redactionStatus` are rejected. Closes the alignment\n * reviewer's data-poisoning gap.\n * - **Per-source rate limits**: optional `rateLimitBucket` + `maxWritesPerMinute`\n * stops a single tenant/source from flooding the store.\n *\n * Safety properties enforced at sample-time:\n *\n * - **Required split + capturedBefore**: substrate refuses to sample without\n * an explicit `split` ('train' | 'test') AND a temporal cutoff. Eliminates\n * accidental train/test contamination.\n * - **Default training-source filter**: when the store is sampled with\n * `split: 'train'`, production-trace records are EXCLUDED unless the\n * caller passes `filter.source: 'production-trace'` explicitly. Closes\n * the contamination-by-default gap flagged by the senior eval engineer.\n */\n\nimport { createHash } from 'node:crypto'\nimport { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs'\nimport { join } from 'node:path'\nimport type {\n LabeledScenarioRecord,\n LabeledScenarioSampleArgs,\n LabeledScenarioSource,\n LabeledScenarioStore,\n LabeledScenarioWrite,\n} from '../types'\n\nexport interface FsLabeledScenarioStoreOptions {\n /** Root directory for JSONL files. Created if missing. */\n root: string\n /** Per-source rate limit. When set, writes exceeding the cap are rejected\n * with a typed error. Default: no limit. */\n maxWritesPerMinutePerBucket?: number\n /** Test seam — override `Date.now()` for deterministic tests. 
*/\n now?: () => number\n}\n\nexport class LabeledScenarioStoreError extends Error {\n constructor(\n public readonly code: string,\n message: string,\n ) {\n super(message)\n this.name = 'LabeledScenarioStoreError'\n }\n}\n\ninterface RateLimitState {\n bucket: string\n windowStartMs: number\n count: number\n}\n\nexport class FsLabeledScenarioStore implements LabeledScenarioStore {\n private readonly now: () => number\n private readonly rateLimits = new Map<string, RateLimitState>()\n\n constructor(private readonly options: FsLabeledScenarioStoreOptions) {\n if (!existsSync(options.root)) mkdirSync(options.root, { recursive: true })\n this.now = options.now ?? Date.now\n }\n\n async observe(write: LabeledScenarioWrite): Promise<void> {\n this.assertProvenance(write)\n this.assertRateLimit(write)\n const record = this.toRecord(write)\n const path = this.pathForSource(write.source)\n const line = `${JSON.stringify(record)}\\n`\n // Append atomically. For high-throughput a writev-friendly buffered\n // implementation lands in the Turso adapter; FS adapter is for tests +\n // local dev + small workloads.\n appendLine(path, line)\n }\n\n async sample(args: LabeledScenarioSampleArgs): Promise<LabeledScenarioRecord[]> {\n if (!args.split) {\n throw new LabeledScenarioStoreError(\n 'split_required',\n 'sample() requires an explicit `split` (train | test) — substrate refuses ambiguous reads',\n )\n }\n if (!args.capturedBefore) {\n throw new LabeledScenarioStoreError(\n 'capturedBefore_required',\n 'sample() requires an explicit `capturedBefore` timestamp for temporal-split discipline',\n )\n }\n\n const all: LabeledScenarioRecord[] = []\n for (const source of ALL_SOURCES) {\n // Default training-source filter: when sampling train, EXCLUDE\n // production-trace records unless the caller asks for them.\n if (args.split === 'train' && source === 'production-trace') {\n const explicit = sourceFilterContains(args.filter?.source, 'production-trace')\n if (!explicit) continue\n 
}\n const path = this.pathForSource(source)\n if (!existsSync(path)) continue\n const lines = readFileSync(path, 'utf8').split('\\n').filter(Boolean)\n for (const line of lines) {\n let record: LabeledScenarioRecord\n try {\n record = JSON.parse(line) as LabeledScenarioRecord\n } catch {\n continue\n }\n if (!matchesFilter(record, args, source)) continue\n all.push(record)\n }\n }\n\n // Deterministic order: by capturedAt ascending, then recordHash.\n all.sort((a, b) => {\n if (a.capturedAt !== b.capturedAt) return a.capturedAt.localeCompare(b.capturedAt)\n return a.recordHash.localeCompare(b.recordHash)\n })\n\n return all.slice(0, args.count)\n }\n\n async size(): Promise<{ train: number; test: number; bySource: Record<string, number> }> {\n const bySource: Record<string, number> = {}\n let total = 0\n for (const source of ALL_SOURCES) {\n const path = this.pathForSource(source)\n if (!existsSync(path)) {\n bySource[source] = 0\n continue\n }\n const count = readFileSync(path, 'utf8').split('\\n').filter(Boolean).length\n bySource[source] = count\n total += count\n }\n // FS adapter doesn't track split assignments per-record (split is\n // computed at sample-time based on `capturedBefore`). 
For size(), we\n // report `train`+`test` as the same total — split is a sampling concept.\n return { train: total, test: total, bySource }\n }\n\n private assertProvenance(write: LabeledScenarioWrite): void {\n if (!write.source) {\n throw new LabeledScenarioStoreError(\n 'missing_source',\n 'LabeledScenarioWrite requires `source`',\n )\n }\n if (!write.sourceVersionHash || write.sourceVersionHash.length === 0) {\n throw new LabeledScenarioStoreError(\n 'missing_source_version',\n 'LabeledScenarioWrite requires `sourceVersionHash` (git sha or substrate version)',\n )\n }\n if (!write.capturedAt) {\n throw new LabeledScenarioStoreError(\n 'missing_captured_at',\n 'LabeledScenarioWrite requires `capturedAt` ISO timestamp',\n )\n }\n if (!write.redactionStatus) {\n throw new LabeledScenarioStoreError(\n 'missing_redaction_status',\n 'LabeledScenarioWrite requires explicit `redactionStatus` — raw / redacted-pii / redacted-secrets / fully-redacted',\n )\n }\n if (!ALL_SOURCES.includes(write.source)) {\n throw new LabeledScenarioStoreError(\n 'unknown_source',\n `LabeledScenarioWrite.source must be one of: ${ALL_SOURCES.join(', ')}`,\n )\n }\n }\n\n private assertRateLimit(write: LabeledScenarioWrite): void {\n const cap = this.options.maxWritesPerMinutePerBucket\n if (!cap || !write.rateLimitBucket) return\n const now = this.now()\n const windowMs = 60_000\n let state = this.rateLimits.get(write.rateLimitBucket)\n if (!state || now - state.windowStartMs >= windowMs) {\n state = { bucket: write.rateLimitBucket, windowStartMs: now, count: 0 }\n this.rateLimits.set(write.rateLimitBucket, state)\n }\n if (state.count >= cap) {\n throw new LabeledScenarioStoreError(\n 'rate_limit_exceeded',\n `LabeledScenarioStore: bucket ${write.rateLimitBucket} exceeded ${cap} writes/min`,\n )\n }\n state.count += 1\n }\n\n private toRecord(write: LabeledScenarioWrite): LabeledScenarioRecord {\n const recordHash = sha256(\n JSON.stringify({\n id: write.scenario.id,\n src: write.source,\n 
at: write.capturedAt,\n ver: write.sourceVersionHash,\n }),\n )\n // FS adapter assigns split at sample-time, but we cache a hint here\n // based on capturedAt vs the world's \"now\" — sampler overrides this.\n return {\n ...write,\n recordHash,\n split: 'train',\n }\n }\n\n private pathForSource(source: string): string {\n return join(this.options.root, `${source}.jsonl`)\n }\n}\n\nconst ALL_SOURCES: LabeledScenarioWrite['source'][] = [\n 'production-trace',\n 'eval-run',\n 'manual',\n 'red-team',\n 'synthetic',\n]\n\nfunction sourceFilterContains(\n filter: LabeledScenarioSource | LabeledScenarioSource[] | undefined,\n needle: LabeledScenarioSource,\n): boolean {\n if (!filter) return false\n if (Array.isArray(filter)) return filter.includes(needle)\n return filter === needle\n}\n\nfunction matchesFilter(\n record: LabeledScenarioRecord,\n args: LabeledScenarioSampleArgs,\n source: string,\n): boolean {\n // Temporal cutoff — train must be capturedAt < capturedBefore.\n if (args.split === 'train' && record.capturedAt >= args.capturedBefore) return false\n if (args.split === 'test' && record.capturedAt < args.capturedBefore) return false\n\n const f = args.filter\n if (!f) return true\n if (f.kind && record.scenario.kind !== f.kind) return false\n if (f.source) {\n const sources = Array.isArray(f.source) ? f.source : [f.source]\n if (!sources.includes(source as never)) return false\n }\n if (f.minComposite !== undefined || f.maxComposite !== undefined) {\n const composites = Object.values(record.judgeScores).map((s) => s.composite)\n const max = composites.length === 0 ? 
0 : Math.max(...composites)\n if (f.minComposite !== undefined && max < f.minComposite) return false\n if (f.maxComposite !== undefined && max > f.maxComposite) return false\n }\n return true\n}\n\nfunction sha256(input: string): string {\n return createHash('sha256').update(input).digest('hex').slice(0, 16)\n}\n\nfunction appendLine(path: string, line: string): void {\n if (existsSync(path)) {\n const existing = readFileSync(path, 'utf8')\n writeFileSync(path, existing + line)\n } else {\n writeFileSync(path, line)\n }\n}\n","/**\n * @experimental\n *\n * VCS-pluggable worktree adapter. One improvement = one worktree, PR-like\n * (multiple commits allowed). A code-tier driver's `propose()` creates a\n * worktree, an agent commits the change into it, and `finalize()` returns a\n * `CodeSurface{ worktreeRef }` the measurement checks out to run the worker\n * against the changed code. On promotion the worktree becomes the PR branch.\n *\n * The interface is VCS-agnostic so a future `jj` ([jj-vcs](https://github.com/jj-vcs/jj))\n * adapter can slot in without touching driver code. Only the git adapter\n * ships today. See `docs/design/self-improvement-engine.md`.\n */\n\nimport { execFileSync } from 'node:child_process'\nimport { existsSync } from 'node:fs'\nimport { basename, isAbsolute, join } from 'node:path'\nimport type { CodeSurface } from '../types'\n\nexport interface Worktree {\n /** Absolute path to the checked-out worktree directory. */\n path: string\n /** The branch the worktree is on (becomes the PR branch on promotion). */\n branch: string\n /** The ref the worktree was forked from. */\n baseRef: string\n}\n\nexport interface WorktreeAdapter {\n /** Create an isolated worktree on a fresh branch off `baseRef`. */\n create(opts: { baseRef: string; label: string }): Promise<Worktree>\n /** Commit any pending changes in the worktree, then return a CodeSurface\n * pointing at it. 
The agent has already written its change into\n * `worktree.path` by the time this is called. */\n finalize(worktree: Worktree, summary: string): Promise<CodeSurface>\n /** Remove the worktree (and its branch) — called for losing candidates. */\n discard(worktree: Worktree): Promise<void>\n}\n\nexport class WorktreeAdapterError extends Error {\n constructor(\n message: string,\n readonly cause?: unknown,\n ) {\n super(message)\n this.name = 'WorktreeAdapterError'\n }\n}\n\nexport interface GitWorktreeAdapterOptions {\n /** Repo root the worktrees fork from. */\n repoRoot: string\n /** Directory worktrees are created under. Default: `<repoRoot>/.worktrees`. */\n worktreeDir?: string\n /** Branch-name prefix. Default: `improve`. */\n branchPrefix?: string\n /** Test seam — defaults to a real `git` runner. */\n git?: (args: string[], cwd: string) => string\n}\n\nfunction defaultGit(args: string[], cwd: string): string {\n try {\n return execFileSync('git', args, { cwd, encoding: 'utf8' }).trim()\n } catch (err) {\n const stderr =\n err && typeof err === 'object' && 'stderr' in err\n ? String((err as { stderr: unknown }).stderr)\n : ''\n throw new WorktreeAdapterError(`git ${args.join(' ')} failed: ${stderr || String(err)}`, err)\n }\n}\n\n/** Slugify a label into a branch-safe segment. */\nfunction slug(label: string): string {\n return (\n label\n .toLowerCase()\n .replace(/[^a-z0-9]+/g, '-')\n .replace(/^-+|-+$/g, '')\n .slice(0, 48) || 'candidate'\n )\n}\n\nexport function gitWorktreeAdapter(opts: GitWorktreeAdapterOptions): WorktreeAdapter {\n const git = opts.git ?? defaultGit\n const worktreeDir = opts.worktreeDir ?? join(opts.repoRoot, '.worktrees')\n const branchPrefix = opts.branchPrefix ?? 
'improve'\n\n return {\n async create({ baseRef, label }) {\n const id = `${slug(label)}-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 6)}`\n const branch = `${branchPrefix}/${id}`\n const path = join(worktreeDir, id)\n git(['worktree', 'add', '-b', branch, path, baseRef], opts.repoRoot)\n return { path, branch, baseRef }\n },\n\n async finalize(worktree, summary) {\n // Stage + commit any pending changes the agent left in the worktree.\n // A no-op commit is refused by git, so only commit when the tree is dirty.\n const status = git(['status', '--porcelain'], worktree.path)\n if (status.length > 0) {\n git(['add', '-A'], worktree.path)\n git(['commit', '-m', summary], worktree.path)\n }\n return {\n kind: 'code',\n worktreeRef: worktree.path,\n baseRef: worktree.baseRef,\n summary,\n }\n },\n\n async discard(worktree) {\n // Remove the worktree, then delete its branch. Force-remove because the\n // worktree may hold uncommitted experiment state we're discarding.\n git(['worktree', 'remove', '--force', worktree.path], opts.repoRoot)\n git(['branch', '-D', worktree.branch], opts.repoRoot)\n },\n }\n}\n\n/** Resolve a `CodeSurface`'s worktreeRef to a directory the measurement can\n * run the worker in. A path ref is returned as-is; anything else is treated\n * as a ref under the adapter's worktree dir. 
*/\nexport function resolveWorktreePath(surface: CodeSurface, worktreeDir?: string): string {\n if (isAbsolute(surface.worktreeRef) && existsSync(surface.worktreeRef)) return surface.worktreeRef\n if (worktreeDir) return join(worktreeDir, basename(surface.worktreeRef))\n return surface.worktreeRef\n}\n"],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAmCA,IAAM,oBACJ;AAqCK,SAAS,kBAAkB,MAAwB;AACxD,QAAM,MAAgB,CAAC;AACvB,aAAW,QAAQ,KAAK,MAAM,IAAI,GAAG;AACnC,UAAM,QAAQ,kBAAkB,KAAK,IAAI;AACzC,QAAI,MAAO,KAAI,KAAK,MAAM,CAAC,CAAE;AAAA,EAC/B;AACA,SAAO;AACT;AAMO,SAAS,mBAAmB,UAAkB,WAA2B;AAC9E,QAAM,OAAO,CAAC,MACZ,EACG,MAAM,mBAAmB,EACzB,IAAI,CAAC,MAAM,EAAE,KAAK,CAAC,EACnB,OAAO,CAAC,MAAM,EAAE,SAAS,CAAC;AAC/B,QAAM,IAAI,IAAI,IAAI,KAAK,QAAQ,CAAC;AAChC,QAAM,IAAI,IAAI,IAAI,KAAK,SAAS,CAAC;AACjC,MAAI,QAAQ;AACZ,aAAW,KAAK,EAAG,KAAI,CAAC,EAAE,IAAI,CAAC,EAAG;AAClC,aAAW,KAAK,EAAG,KAAI,CAAC,EAAE,IAAI,CAAC,EAAG;AAClC,SAAO;AACT;AAEO,SAAS,eAAe,MAAgD;AAC7E,QAAM,YAAY,KAAK,aAAa;AACpC,QAAM,aAAa,KAAK,cAAc;AACtC,MAAI,aAAa,GAAG;AAClB,UAAM,IAAI;AAAA,MACR,gDAAgD,UAAU;AAAA,IAC5D;AAAA,EACF;AACA,SAAO;AAAA,IACL,MAAM;AAAA,IACN,MAAM,QAAQ,KAAgD;AAC5D,UAAI,OAAO,IAAI,mBAAmB,UAAU;AAC1C,cAAM,IAAI;AAAA,UACR,gEAAgE,OAAO,IAAI,cAAc;AAAA,QAC3F;AAAA,MACF;AACA,YAAM,WAAW,IAAI;AACrB,YAAM,mBAAmB,KAAK,oBAAoB,kBAAkB,QAAQ;AAE5E,YAAM,EAAE,KAAK,QAAQ,OAAO,IAAI,cAAc,KAAK,WAAW,KAAK,MAAM;AAEzE,YAAM,iBAAiB,sBAAsB;AAAA,QAC3C;AAAA,QACA,eAAe;AAAA,QACf,WAAW;AAAA,QACX,cAAc;AAAA,QACd,YAAY,IAAI;AAAA,QAChB,oBAAoB,KAAK;AAAA,MAC3B,CAAC;AACD,YAAM,qBAAqB;AAAA,QACzB;AAAA,QACA;AAAA,QACA;AAAA,QACA,0BAA0B,UAAU;AAAA,QACpC;AAAA,QACA,GAAG,iBAAiB,IAAI,CAAC,MAAM,YAAY,CAAC,IAAI;AAAA,QAChD;AAAA,QACA;AAAA,QACA;AAAA,MACF,EAAE,KAAK,IAAI;AACX,YAAM,aAAa,GAAG,cAAc,GAAG,kBAAkB;AACzD,YAAM,SAAS,kBAAkB,QAAQ,eAAe,OAAO,UAAU,CAAC;AAE1E,YAAM,SAAS,MAAM;AAAA,QACnB;AAAA,UACE,OAAO,KAAK;AAAA,UACZ,UAAU;AAAA,YACR,EAAE,MAAM,UAAU,SAAS,OAAO;AAAA,YAClC,EAAE,MAAM,QAAQ,SAAS,WAAW;AAAA,UACtC;AAAA,UACA,UAAU;AAAA,UACV,aAAa,KAAK,eAAe;AAAA,UACjC,WAAW,KAAK,aAAa;AAAA,QAC/B;AAAA,QACA,KAAK;AAAA,MAC
P;AAEA,YAAM,YAAY,wBAAwB,OAAO,SAAS,IAAI,cAAc;AAC5E,YAAM,MAAwB,CAAC;AAC/B,iBAAW,YAAY,WAAW;AAChC,cAAM,OAAO,OAAO,SAAS,YAAY,WAAW,SAAS,QAAQ,KAAK,IAAI;AAC9E,YAAI,CAAC,QAAQ,SAAS,SAAU;AAChC,YAAI,CAAC,iBAAiB,MAAM,gBAAgB,EAAG;AAC/C,YAAI,mBAAmB,UAAU,IAAI,IAAI,aAAa,EAAG;AACzD,YAAI,IAAI,SAAS,IAAI,EAAG;AACxB,YAAI,KAAK,IAAI;AAAA,MACf;AACA,aAAO;AAAA,IACT;AAAA,EACF;AACF;AAEA,SAAS,iBAAiB,WAAmB,UAA6B;AACxE,MAAI,SAAS,WAAW,EAAG,QAAO;AAClC,QAAM,OAAO,IAAI,IAAI,kBAAkB,SAAS,CAAC;AACjD,aAAW,WAAW,UAAU;AAC9B,QAAI,CAAC,KAAK,IAAI,OAAO,EAAG,QAAO;AAAA,EACjC;AACA,SAAO;AACT;AAGA,SAAS,cACP,KACA,WACA,YAC6D;AAC7D,QAAM,OAAO,IAAI,QAAQ,GAAG,EAAE;AAC9B,MAAI,CAAC,QAAQ,KAAK,WAAW,WAAW,GAAG;AACzC,WAAO,EAAE,KAAK,CAAC,GAAG,QAAQ,CAAC,GAAG,QAAQ,WAAW;AAAA,EACnD;AACA,QAAM,OAAO,CAAC,GAAG,KAAK,UAAU,EAAE,KAAK,CAAC,GAAG,MAAM,EAAE,YAAY,EAAE,SAAS,EAAE,CAAC;AAC7E,MAAI,CAAC,KAAM,QAAO,EAAE,KAAK,CAAC,GAAG,QAAQ,CAAC,GAAG,QAAQ,WAAW;AAE5D,QAAM,UAAU,CAAC,GAAG,KAAK,SAAS,EAAE,KAAK,CAAC,GAAG,MAAM,EAAE,YAAY,EAAE,SAAS;AAC5E,QAAM,UAAU,CAAC,OAA8D;AAAA,IAC7E,IAAI,EAAE;AAAA,IACN,OAAO,EAAE;AAAA,EACX;AACA,QAAM,MAAM,QAAQ,MAAM,GAAG,SAAS,EAAE,IAAI,OAAO;AACnD,QAAM,SAAS,QAAQ,MAAM,CAAC,SAAS,EAAE,QAAQ,EAAE,IAAI,OAAO;AAE9D,QAAM,UAAU,OAAO,QAAQ,KAAK,UAAU,EAC3C,KAAK,CAAC,GAAG,MAAM,EAAE,CAAC,IAAI,EAAE,CAAC,CAAC,EAC1B,MAAM,GAAG,CAAC,EACV,IAAI,CAAC,CAAC,KAAK,KAAK,MAAM,GAAG,GAAG,KAAK,MAAM,QAAQ,CAAC,CAAC,GAAG;AACvD,QAAM,SACJ,QAAQ,SAAS,IAAI,GAAG,UAAU,+BAA0B,QAAQ,KAAK,IAAI,CAAC,KAAK;AAErF,SAAO,EAAE,KAAK,QAAQ,OAAO;AAC/B;;;ACrLA,SAAS,kBAAkB;AAC3B,SAAS,YAAY,WAAW,cAAc,qBAAqB;AACnE,SAAS,YAAY;AAmBd,IAAM,4BAAN,cAAwC,MAAM;AAAA,EACnD,YACkB,MAChB,SACA;AACA,UAAM,OAAO;AAHG;AAIhB,SAAK,OAAO;AAAA,EACd;AAAA,EALkB;AAMpB;AAQO,IAAM,yBAAN,MAA6D;AAAA,EAIlE,YAA6B,SAAwC;AAAxC;AAC3B,QAAI,CAAC,WAAW,QAAQ,IAAI,EAAG,WAAU,QAAQ,MAAM,EAAE,WAAW,KAAK,CAAC;AAC1E,SAAK,MAAM,QAAQ,OAAO,KAAK;AAAA,EACjC;AAAA,EAH6B;AAAA,EAHZ;AAAA,EACA,aAAa,oBAAI,IAA4B;AAAA,EAO9D,MAAM,QAAQ,OAA4C;AACxD,SAAK,iBAAiB,KAAK;AAC3B,SAAK,gBAAgB,KAAK;AAC1B,UAAM,SAAS,KAAK,SAAS,KAAK;AAClC,UAAM,OAAO,KAAK,cAAc,MAAM,MAAM;AAC5C,UAAM
,OAAO,GAAG,KAAK,UAAU,MAAM,CAAC;AAAA;AAItC,eAAW,MAAM,IAAI;AAAA,EACvB;AAAA,EAEA,MAAM,OAAO,MAAmE;AAC9E,QAAI,CAAC,KAAK,OAAO;AACf,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,KAAK,gBAAgB;AACxB,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AAEA,UAAM,MAA+B,CAAC;AACtC,eAAW,UAAU,aAAa;AAGhC,UAAI,KAAK,UAAU,WAAW,WAAW,oBAAoB;AAC3D,cAAM,WAAW,qBAAqB,KAAK,QAAQ,QAAQ,kBAAkB;AAC7E,YAAI,CAAC,SAAU;AAAA,MACjB;AACA,YAAM,OAAO,KAAK,cAAc,MAAM;AACtC,UAAI,CAAC,WAAW,IAAI,EAAG;AACvB,YAAM,QAAQ,aAAa,MAAM,MAAM,EAAE,MAAM,IAAI,EAAE,OAAO,OAAO;AACnE,iBAAW,QAAQ,OAAO;AACxB,YAAI;AACJ,YAAI;AACF,mBAAS,KAAK,MAAM,IAAI;AAAA,QAC1B,QAAQ;AACN;AAAA,QACF;AACA,YAAI,CAAC,cAAc,QAAQ,MAAM,MAAM,EAAG;AAC1C,YAAI,KAAK,MAAM;AAAA,MACjB;AAAA,IACF;AAGA,QAAI,KAAK,CAAC,GAAG,MAAM;AACjB,UAAI,EAAE,eAAe,EAAE,WAAY,QAAO,EAAE,WAAW,cAAc,EAAE,UAAU;AACjF,aAAO,EAAE,WAAW,cAAc,EAAE,UAAU;AAAA,IAChD,CAAC;AAED,WAAO,IAAI,MAAM,GAAG,KAAK,KAAK;AAAA,EAChC;AAAA,EAEA,MAAM,OAAmF;AACvF,UAAM,WAAmC,CAAC;AAC1C,QAAI,QAAQ;AACZ,eAAW,UAAU,aAAa;AAChC,YAAM,OAAO,KAAK,cAAc,MAAM;AACtC,UAAI,CAAC,WAAW,IAAI,GAAG;AACrB,iBAAS,MAAM,IAAI;AACnB;AAAA,MACF;AACA,YAAM,QAAQ,aAAa,MAAM,MAAM,EAAE,MAAM,IAAI,EAAE,OAAO,OAAO,EAAE;AACrE,eAAS,MAAM,IAAI;AACnB,eAAS;AAAA,IACX;AAIA,WAAO,EAAE,OAAO,OAAO,MAAM,OAAO,SAAS;AAAA,EAC/C;AAAA,EAEQ,iBAAiB,OAAmC;AAC1D,QAAI,CAAC,MAAM,QAAQ;AACjB,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,MAAM,qBAAqB,MAAM,kBAAkB,WAAW,GAAG;AACpE,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,MAAM,YAAY;AACrB,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,MAAM,iBAAiB;AAC1B,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,YAAY,SAAS,MAAM,MAAM,GAAG;AACvC,YAAM,IAAI;AAAA,QACR;AAAA,QACA,+CAA+C,YAAY,KAAK,IAAI,CAAC;AAAA,MACvE;AAAA,IACF;AAAA,EACF;AAAA,EAEQ,gBAAgB,OAAmC;AACzD,UAAM,MAAM,KAAK,QAAQ;AACzB,QAAI,CAAC,OAAO,CAAC,MAAM,gBAAiB;AACpC,UAAM,MAAM,KAAK,IAAI;AACrB,UAAM,WAAW;AACjB,QAAI,QAAQ,KAAK,WAAW,IAAI,MAAM,eAAe;AACrD,QAAI,CAAC,SAAS,MAAM,MAAM,iBAAiB,UAAU;AACnD,cAAQ,EAAE,QAAQ,MAAM,iBAAiB,e
AAe,KAAK,OAAO,EAAE;AACtE,WAAK,WAAW,IAAI,MAAM,iBAAiB,KAAK;AAAA,IAClD;AACA,QAAI,MAAM,SAAS,KAAK;AACtB,YAAM,IAAI;AAAA,QACR;AAAA,QACA,gCAAgC,MAAM,eAAe,aAAa,GAAG;AAAA,MACvE;AAAA,IACF;AACA,UAAM,SAAS;AAAA,EACjB;AAAA,EAEQ,SAAS,OAAoD;AACnE,UAAM,aAAa;AAAA,MACjB,KAAK,UAAU;AAAA,QACb,IAAI,MAAM,SAAS;AAAA,QACnB,KAAK,MAAM;AAAA,QACX,IAAI,MAAM;AAAA,QACV,KAAK,MAAM;AAAA,MACb,CAAC;AAAA,IACH;AAGA,WAAO;AAAA,MACL,GAAG;AAAA,MACH;AAAA,MACA,OAAO;AAAA,IACT;AAAA,EACF;AAAA,EAEQ,cAAc,QAAwB;AAC5C,WAAO,KAAK,KAAK,QAAQ,MAAM,GAAG,MAAM,QAAQ;AAAA,EAClD;AACF;AAEA,IAAM,cAAgD;AAAA,EACpD;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF;AAEA,SAAS,qBACP,QACA,QACS;AACT,MAAI,CAAC,OAAQ,QAAO;AACpB,MAAI,MAAM,QAAQ,MAAM,EAAG,QAAO,OAAO,SAAS,MAAM;AACxD,SAAO,WAAW;AACpB;AAEA,SAAS,cACP,QACA,MACA,QACS;AAET,MAAI,KAAK,UAAU,WAAW,OAAO,cAAc,KAAK,eAAgB,QAAO;AAC/E,MAAI,KAAK,UAAU,UAAU,OAAO,aAAa,KAAK,eAAgB,QAAO;AAE7E,QAAM,IAAI,KAAK;AACf,MAAI,CAAC,EAAG,QAAO;AACf,MAAI,EAAE,QAAQ,OAAO,SAAS,SAAS,EAAE,KAAM,QAAO;AACtD,MAAI,EAAE,QAAQ;AACZ,UAAM,UAAU,MAAM,QAAQ,EAAE,MAAM,IAAI,EAAE,SAAS,CAAC,EAAE,MAAM;AAC9D,QAAI,CAAC,QAAQ,SAAS,MAAe,EAAG,QAAO;AAAA,EACjD;AACA,MAAI,EAAE,iBAAiB,UAAa,EAAE,iBAAiB,QAAW;AAChE,UAAM,aAAa,OAAO,OAAO,OAAO,WAAW,EAAE,IAAI,CAAC,MAAM,EAAE,SAAS;AAC3E,UAAM,MAAM,WAAW,WAAW,IAAI,IAAI,KAAK,IAAI,GAAG,UAAU;AAChE,QAAI,EAAE,iBAAiB,UAAa,MAAM,EAAE,aAAc,QAAO;AACjE,QAAI,EAAE,iBAAiB,UAAa,MAAM,EAAE,aAAc,QAAO;AAAA,EACnE;AACA,SAAO;AACT;AAEA,SAAS,OAAO,OAAuB;AACrC,SAAO,WAAW,QAAQ,EAAE,OAAO,KAAK,EAAE,OAAO,KAAK,EAAE,MAAM,GAAG,EAAE;AACrE;AAEA,SAAS,WAAW,MAAc,MAAoB;AACpD,MAAI,WAAW,IAAI,GAAG;AACpB,UAAM,WAAW,aAAa,MAAM,MAAM;AAC1C,kBAAc,MAAM,WAAW,IAAI;AAAA,EACrC,OAAO;AACL,kBAAc,MAAM,IAAI;AAAA,EAC1B;AACF;;;AC1QA,SAAS,oBAAoB;AAC7B,SAAS,cAAAA,mBAAkB;AAC3B,SAAS,UAAU,YAAY,QAAAC,aAAY;AAuBpC,IAAM,uBAAN,cAAmC,MAAM;AAAA,EAC9C,YACE,SACS,OACT;AACA,UAAM,OAAO;AAFJ;AAGT,SAAK,OAAO;AAAA,EACd;AAAA,EAJW;AAKb;AAaA,SAAS,WAAW,MAAgB,KAAqB;AACvD,MAAI;AACF,WAAO,aAAa,OAAO,MAAM,EAAE,KAAK,UAAU,OAAO,CAAC,EAAE,KAAK;AAAA,EACnE,SAAS,KAAK;AACZ,UAAM,SACJ,OAAO,OAAO,QAAQ,YAAY,YAAY,MAC1C,OAAQ,IA
A4B,MAAM,IAC1C;AACN,UAAM,IAAI,qBAAqB,OAAO,KAAK,KAAK,GAAG,CAAC,YAAY,UAAU,OAAO,GAAG,CAAC,IAAI,GAAG;AAAA,EAC9F;AACF;AAGA,SAAS,KAAK,OAAuB;AACnC,SACE,MACG,YAAY,EACZ,QAAQ,eAAe,GAAG,EAC1B,QAAQ,YAAY,EAAE,EACtB,MAAM,GAAG,EAAE,KAAK;AAEvB;AAEO,SAAS,mBAAmB,MAAkD;AACnF,QAAM,MAAM,KAAK,OAAO;AACxB,QAAM,cAAc,KAAK,eAAeA,MAAK,KAAK,UAAU,YAAY;AACxE,QAAM,eAAe,KAAK,gBAAgB;AAE1C,SAAO;AAAA,IACL,MAAM,OAAO,EAAE,SAAS,MAAM,GAAG;AAC/B,YAAM,KAAK,GAAG,KAAK,KAAK,CAAC,IAAI,KAAK,IAAI,EAAE,SAAS,EAAE,CAAC,IAAI,KAAK,OAAO,EAAE,SAAS,EAAE,EAAE,MAAM,GAAG,CAAC,CAAC;AAC9F,YAAM,SAAS,GAAG,YAAY,IAAI,EAAE;AACpC,YAAM,OAAOA,MAAK,aAAa,EAAE;AACjC,UAAI,CAAC,YAAY,OAAO,MAAM,QAAQ,MAAM,OAAO,GAAG,KAAK,QAAQ;AACnE,aAAO,EAAE,MAAM,QAAQ,QAAQ;AAAA,IACjC;AAAA,IAEA,MAAM,SAAS,UAAU,SAAS;AAGhC,YAAM,SAAS,IAAI,CAAC,UAAU,aAAa,GAAG,SAAS,IAAI;AAC3D,UAAI,OAAO,SAAS,GAAG;AACrB,YAAI,CAAC,OAAO,IAAI,GAAG,SAAS,IAAI;AAChC,YAAI,CAAC,UAAU,MAAM,OAAO,GAAG,SAAS,IAAI;AAAA,MAC9C;AACA,aAAO;AAAA,QACL,MAAM;AAAA,QACN,aAAa,SAAS;AAAA,QACtB,SAAS,SAAS;AAAA,QAClB;AAAA,MACF;AAAA,IACF;AAAA,IAEA,MAAM,QAAQ,UAAU;AAGtB,UAAI,CAAC,YAAY,UAAU,WAAW,SAAS,IAAI,GAAG,KAAK,QAAQ;AACnE,UAAI,CAAC,UAAU,MAAM,SAAS,MAAM,GAAG,KAAK,QAAQ;AAAA,IACtD;AAAA,EACF;AACF;AAKO,SAAS,oBAAoB,SAAsB,aAA8B;AACtF,MAAI,WAAW,QAAQ,WAAW,KAAKD,YAAW,QAAQ,WAAW,EAAG,QAAO,QAAQ;AACvF,MAAI,YAAa,QAAOC,MAAK,aAAa,SAAS,QAAQ,WAAW,CAAC;AACvE,SAAO,QAAQ;AACjB;","names":["existsSync","join"]}
|
|
1
|
+
{"version":3,"sources":["../../src/campaign/labeled-store/fs-adapter.ts","../../src/campaign/worktree/index.ts"],"sourcesContent":["/**\n * @experimental\n *\n * Filesystem `LabeledScenarioStore` adapter. The default capture sink for\n * traces + eval artifacts. Production deployments typically swap for a\n * Turso/SQLite adapter (same interface).\n *\n * Records land as one JSONL file per source under `<root>/<source>.jsonl`.\n * Each line is a `LabeledScenarioRecord`. Append-only — no in-place edits.\n *\n * Safety properties enforced at write-time:\n *\n * - **Provenance required**: writes without `source`, `sourceVersionHash`,\n * `capturedAt`, `redactionStatus` are rejected. Closes the alignment\n * reviewer's data-poisoning gap.\n * - **Per-source rate limits**: optional `rateLimitBucket` + `maxWritesPerMinute`\n * stops a single tenant/source from flooding the store.\n *\n * Safety properties enforced at sample-time:\n *\n * - **Required split + capturedBefore**: substrate refuses to sample without\n * an explicit `split` ('train' | 'test') AND a temporal cutoff. Eliminates\n * accidental train/test contamination.\n * - **Default training-source filter**: when the store is sampled with\n * `split: 'train'`, production-trace records are EXCLUDED unless the\n * caller passes `filter.source: 'production-trace'` explicitly. Closes\n * the contamination-by-default gap flagged by the senior eval engineer.\n */\n\nimport { createHash } from 'node:crypto'\nimport { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs'\nimport { join } from 'node:path'\nimport type {\n LabeledScenarioRecord,\n LabeledScenarioSampleArgs,\n LabeledScenarioSource,\n LabeledScenarioStore,\n LabeledScenarioWrite,\n} from '../types'\n\nexport interface FsLabeledScenarioStoreOptions {\n /** Root directory for JSONL files. Created if missing. */\n root: string\n /** Per-source rate limit. When set, writes exceeding the cap are rejected\n * with a typed error. Default: no limit. 
*/\n maxWritesPerMinutePerBucket?: number\n /** Test seam — override `Date.now()` for deterministic tests. */\n now?: () => number\n}\n\nexport class LabeledScenarioStoreError extends Error {\n constructor(\n public readonly code: string,\n message: string,\n ) {\n super(message)\n this.name = 'LabeledScenarioStoreError'\n }\n}\n\ninterface RateLimitState {\n bucket: string\n windowStartMs: number\n count: number\n}\n\nexport class FsLabeledScenarioStore implements LabeledScenarioStore {\n private readonly now: () => number\n private readonly rateLimits = new Map<string, RateLimitState>()\n\n constructor(private readonly options: FsLabeledScenarioStoreOptions) {\n if (!existsSync(options.root)) mkdirSync(options.root, { recursive: true })\n this.now = options.now ?? Date.now\n }\n\n async observe(write: LabeledScenarioWrite): Promise<void> {\n this.assertProvenance(write)\n this.assertRateLimit(write)\n const record = this.toRecord(write)\n const path = this.pathForSource(write.source)\n const line = `${JSON.stringify(record)}\\n`\n // Append atomically. 
For high-throughput a writev-friendly buffered\n // implementation lands in the Turso adapter; FS adapter is for tests +\n // local dev + small workloads.\n appendLine(path, line)\n }\n\n async sample(args: LabeledScenarioSampleArgs): Promise<LabeledScenarioRecord[]> {\n if (!args.split) {\n throw new LabeledScenarioStoreError(\n 'split_required',\n 'sample() requires an explicit `split` (train | test) — substrate refuses ambiguous reads',\n )\n }\n if (!args.capturedBefore) {\n throw new LabeledScenarioStoreError(\n 'capturedBefore_required',\n 'sample() requires an explicit `capturedBefore` timestamp for temporal-split discipline',\n )\n }\n\n const all: LabeledScenarioRecord[] = []\n for (const source of ALL_SOURCES) {\n // Default training-source filter: when sampling train, EXCLUDE\n // production-trace records unless the caller asks for them.\n if (args.split === 'train' && source === 'production-trace') {\n const explicit = sourceFilterContains(args.filter?.source, 'production-trace')\n if (!explicit) continue\n }\n const path = this.pathForSource(source)\n if (!existsSync(path)) continue\n const lines = readFileSync(path, 'utf8').split('\\n').filter(Boolean)\n for (const line of lines) {\n let record: LabeledScenarioRecord\n try {\n record = JSON.parse(line) as LabeledScenarioRecord\n } catch {\n continue\n }\n if (!matchesFilter(record, args, source)) continue\n all.push(record)\n }\n }\n\n // Deterministic order: by capturedAt ascending, then recordHash.\n all.sort((a, b) => {\n if (a.capturedAt !== b.capturedAt) return a.capturedAt.localeCompare(b.capturedAt)\n return a.recordHash.localeCompare(b.recordHash)\n })\n\n return all.slice(0, args.count)\n }\n\n async size(): Promise<{ train: number; test: number; bySource: Record<string, number> }> {\n const bySource: Record<string, number> = {}\n let total = 0\n for (const source of ALL_SOURCES) {\n const path = this.pathForSource(source)\n if (!existsSync(path)) {\n bySource[source] = 0\n continue\n }\n 
const count = readFileSync(path, 'utf8').split('\\n').filter(Boolean).length\n bySource[source] = count\n total += count\n }\n // FS adapter doesn't track split assignments per-record (split is\n // computed at sample-time based on `capturedBefore`). For size(), we\n // report `train`+`test` as the same total — split is a sampling concept.\n return { train: total, test: total, bySource }\n }\n\n private assertProvenance(write: LabeledScenarioWrite): void {\n if (!write.source) {\n throw new LabeledScenarioStoreError(\n 'missing_source',\n 'LabeledScenarioWrite requires `source`',\n )\n }\n if (!write.sourceVersionHash || write.sourceVersionHash.length === 0) {\n throw new LabeledScenarioStoreError(\n 'missing_source_version',\n 'LabeledScenarioWrite requires `sourceVersionHash` (git sha or substrate version)',\n )\n }\n if (!write.capturedAt) {\n throw new LabeledScenarioStoreError(\n 'missing_captured_at',\n 'LabeledScenarioWrite requires `capturedAt` ISO timestamp',\n )\n }\n if (!write.redactionStatus) {\n throw new LabeledScenarioStoreError(\n 'missing_redaction_status',\n 'LabeledScenarioWrite requires explicit `redactionStatus` — raw / redacted-pii / redacted-secrets / fully-redacted',\n )\n }\n if (!ALL_SOURCES.includes(write.source)) {\n throw new LabeledScenarioStoreError(\n 'unknown_source',\n `LabeledScenarioWrite.source must be one of: ${ALL_SOURCES.join(', ')}`,\n )\n }\n }\n\n private assertRateLimit(write: LabeledScenarioWrite): void {\n const cap = this.options.maxWritesPerMinutePerBucket\n if (!cap || !write.rateLimitBucket) return\n const now = this.now()\n const windowMs = 60_000\n let state = this.rateLimits.get(write.rateLimitBucket)\n if (!state || now - state.windowStartMs >= windowMs) {\n state = { bucket: write.rateLimitBucket, windowStartMs: now, count: 0 }\n this.rateLimits.set(write.rateLimitBucket, state)\n }\n if (state.count >= cap) {\n throw new LabeledScenarioStoreError(\n 'rate_limit_exceeded',\n `LabeledScenarioStore: bucket 
${write.rateLimitBucket} exceeded ${cap} writes/min`,\n )\n }\n state.count += 1\n }\n\n private toRecord(write: LabeledScenarioWrite): LabeledScenarioRecord {\n const recordHash = sha256(\n JSON.stringify({\n id: write.scenario.id,\n src: write.source,\n at: write.capturedAt,\n ver: write.sourceVersionHash,\n }),\n )\n // FS adapter assigns split at sample-time, but we cache a hint here\n // based on capturedAt vs the world's \"now\" — sampler overrides this.\n return {\n ...write,\n recordHash,\n split: 'train',\n }\n }\n\n private pathForSource(source: string): string {\n return join(this.options.root, `${source}.jsonl`)\n }\n}\n\nconst ALL_SOURCES: LabeledScenarioWrite['source'][] = [\n 'production-trace',\n 'eval-run',\n 'manual',\n 'red-team',\n 'synthetic',\n]\n\nfunction sourceFilterContains(\n filter: LabeledScenarioSource | LabeledScenarioSource[] | undefined,\n needle: LabeledScenarioSource,\n): boolean {\n if (!filter) return false\n if (Array.isArray(filter)) return filter.includes(needle)\n return filter === needle\n}\n\nfunction matchesFilter(\n record: LabeledScenarioRecord,\n args: LabeledScenarioSampleArgs,\n source: string,\n): boolean {\n // Temporal cutoff — train must be capturedAt < capturedBefore.\n if (args.split === 'train' && record.capturedAt >= args.capturedBefore) return false\n if (args.split === 'test' && record.capturedAt < args.capturedBefore) return false\n\n const f = args.filter\n if (!f) return true\n if (f.kind && record.scenario.kind !== f.kind) return false\n if (f.source) {\n const sources = Array.isArray(f.source) ? f.source : [f.source]\n if (!sources.includes(source as never)) return false\n }\n if (f.minComposite !== undefined || f.maxComposite !== undefined) {\n const composites = Object.values(record.judgeScores).map((s) => s.composite)\n const max = composites.length === 0 ? 
0 : Math.max(...composites)\n if (f.minComposite !== undefined && max < f.minComposite) return false\n if (f.maxComposite !== undefined && max > f.maxComposite) return false\n }\n return true\n}\n\nfunction sha256(input: string): string {\n return createHash('sha256').update(input).digest('hex').slice(0, 16)\n}\n\nfunction appendLine(path: string, line: string): void {\n if (existsSync(path)) {\n const existing = readFileSync(path, 'utf8')\n writeFileSync(path, existing + line)\n } else {\n writeFileSync(path, line)\n }\n}\n","/**\n * @experimental\n *\n * VCS-pluggable worktree adapter. One improvement = one worktree, PR-like\n * (multiple commits allowed). A code-tier driver's `propose()` creates a\n * worktree, an agent commits the change into it, and `finalize()` returns a\n * `CodeSurface{ worktreeRef }` the measurement checks out to run the worker\n * against the changed code. On promotion the worktree becomes the PR branch.\n *\n * The interface is VCS-agnostic so a future `jj` ([jj-vcs](https://github.com/jj-vcs/jj))\n * adapter can slot in without touching driver code. Only the git adapter\n * ships today. See `docs/design/self-improvement-engine.md`.\n */\n\nimport { execFileSync } from 'node:child_process'\nimport { existsSync } from 'node:fs'\nimport { basename, isAbsolute, join } from 'node:path'\nimport type { CodeSurface } from '../types'\n\nexport interface Worktree {\n /** Absolute path to the checked-out worktree directory. */\n path: string\n /** The branch the worktree is on (becomes the PR branch on promotion). */\n branch: string\n /** The ref the worktree was forked from. */\n baseRef: string\n}\n\nexport interface WorktreeAdapter {\n /** Create an isolated worktree on a fresh branch off `baseRef`. */\n create(opts: { baseRef: string; label: string }): Promise<Worktree>\n /** Commit any pending changes in the worktree, then return a CodeSurface\n * pointing at it. 
The agent has already written its change into\n * `worktree.path` by the time this is called. */\n finalize(worktree: Worktree, summary: string): Promise<CodeSurface>\n /** Remove the worktree (and its branch) — called for losing candidates. */\n discard(worktree: Worktree): Promise<void>\n}\n\nexport class WorktreeAdapterError extends Error {\n constructor(\n message: string,\n readonly cause?: unknown,\n ) {\n super(message)\n this.name = 'WorktreeAdapterError'\n }\n}\n\nexport interface GitWorktreeAdapterOptions {\n /** Repo root the worktrees fork from. */\n repoRoot: string\n /** Directory worktrees are created under. Default: `<repoRoot>/.worktrees`. */\n worktreeDir?: string\n /** Branch-name prefix. Default: `improve`. */\n branchPrefix?: string\n /** Test seam — defaults to a real `git` runner. */\n git?: (args: string[], cwd: string) => string\n}\n\nfunction defaultGit(args: string[], cwd: string): string {\n try {\n return execFileSync('git', args, { cwd, encoding: 'utf8' }).trim()\n } catch (err) {\n const stderr =\n err && typeof err === 'object' && 'stderr' in err\n ? String((err as { stderr: unknown }).stderr)\n : ''\n throw new WorktreeAdapterError(`git ${args.join(' ')} failed: ${stderr || String(err)}`, err)\n }\n}\n\n/** Slugify a label into a branch-safe segment. */\nfunction slug(label: string): string {\n return (\n label\n .toLowerCase()\n .replace(/[^a-z0-9]+/g, '-')\n .replace(/^-+|-+$/g, '')\n .slice(0, 48) || 'candidate'\n )\n}\n\nexport function gitWorktreeAdapter(opts: GitWorktreeAdapterOptions): WorktreeAdapter {\n const git = opts.git ?? defaultGit\n const worktreeDir = opts.worktreeDir ?? join(opts.repoRoot, '.worktrees')\n const branchPrefix = opts.branchPrefix ?? 
'improve'\n\n return {\n async create({ baseRef, label }) {\n const id = `${slug(label)}-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 6)}`\n const branch = `${branchPrefix}/${id}`\n const path = join(worktreeDir, id)\n git(['worktree', 'add', '-b', branch, path, baseRef], opts.repoRoot)\n return { path, branch, baseRef }\n },\n\n async finalize(worktree, summary) {\n // Stage + commit any pending changes the agent left in the worktree.\n // A no-op commit is refused by git, so only commit when the tree is dirty.\n const status = git(['status', '--porcelain'], worktree.path)\n if (status.length > 0) {\n git(['add', '-A'], worktree.path)\n git(['commit', '-m', summary], worktree.path)\n }\n return {\n kind: 'code',\n worktreeRef: worktree.path,\n baseRef: worktree.baseRef,\n summary,\n }\n },\n\n async discard(worktree) {\n // Remove the worktree, then delete its branch. Force-remove because the\n // worktree may hold uncommitted experiment state we're discarding.\n git(['worktree', 'remove', '--force', worktree.path], opts.repoRoot)\n git(['branch', '-D', worktree.branch], opts.repoRoot)\n },\n }\n}\n\n/** Resolve a `CodeSurface`'s worktreeRef to a directory the measurement can\n * run the worker in. A path ref is returned as-is; anything else is treated\n * as a ref under the adapter's worktree dir. 
*/\nexport function resolveWorktreePath(surface: CodeSurface, worktreeDir?: string): string {\n if (isAbsolute(surface.worktreeRef) && existsSync(surface.worktreeRef)) return surface.worktreeRef\n if (worktreeDir) return join(worktreeDir, basename(surface.worktreeRef))\n return surface.worktreeRef\n}\n"],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;AA6BA,SAAS,kBAAkB;AAC3B,SAAS,YAAY,WAAW,cAAc,qBAAqB;AACnE,SAAS,YAAY;AAmBd,IAAM,4BAAN,cAAwC,MAAM;AAAA,EACnD,YACkB,MAChB,SACA;AACA,UAAM,OAAO;AAHG;AAIhB,SAAK,OAAO;AAAA,EACd;AAAA,EALkB;AAMpB;AAQO,IAAM,yBAAN,MAA6D;AAAA,EAIlE,YAA6B,SAAwC;AAAxC;AAC3B,QAAI,CAAC,WAAW,QAAQ,IAAI,EAAG,WAAU,QAAQ,MAAM,EAAE,WAAW,KAAK,CAAC;AAC1E,SAAK,MAAM,QAAQ,OAAO,KAAK;AAAA,EACjC;AAAA,EAH6B;AAAA,EAHZ;AAAA,EACA,aAAa,oBAAI,IAA4B;AAAA,EAO9D,MAAM,QAAQ,OAA4C;AACxD,SAAK,iBAAiB,KAAK;AAC3B,SAAK,gBAAgB,KAAK;AAC1B,UAAM,SAAS,KAAK,SAAS,KAAK;AAClC,UAAM,OAAO,KAAK,cAAc,MAAM,MAAM;AAC5C,UAAM,OAAO,GAAG,KAAK,UAAU,MAAM,CAAC;AAAA;AAItC,eAAW,MAAM,IAAI;AAAA,EACvB;AAAA,EAEA,MAAM,OAAO,MAAmE;AAC9E,QAAI,CAAC,KAAK,OAAO;AACf,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,KAAK,gBAAgB;AACxB,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AAEA,UAAM,MAA+B,CAAC;AACtC,eAAW,UAAU,aAAa;AAGhC,UAAI,KAAK,UAAU,WAAW,WAAW,oBAAoB;AAC3D,cAAM,WAAW,qBAAqB,KAAK,QAAQ,QAAQ,kBAAkB;AAC7E,YAAI,CAAC,SAAU;AAAA,MACjB;AACA,YAAM,OAAO,KAAK,cAAc,MAAM;AACtC,UAAI,CAAC,WAAW,IAAI,EAAG;AACvB,YAAM,QAAQ,aAAa,MAAM,MAAM,EAAE,MAAM,IAAI,EAAE,OAAO,OAAO;AACnE,iBAAW,QAAQ,OAAO;AACxB,YAAI;AACJ,YAAI;AACF,mBAAS,KAAK,MAAM,IAAI;AAAA,QAC1B,QAAQ;AACN;AAAA,QACF;AACA,YAAI,CAAC,cAAc,QAAQ,MAAM,MAAM,EAAG;AAC1C,YAAI,KAAK,MAAM;AAAA,MACjB;AAAA,IACF;AAGA,QAAI,KAAK,CAAC,GAAG,MAAM;AACjB,UAAI,EAAE,eAAe,EAAE,WAAY,QAAO,EAAE,WAAW,cAAc,EAAE,UAAU;AACjF,aAAO,EAAE,WAAW,cAAc,EAAE,UAAU;AAAA,IAChD,CAAC;AAED,WAAO,IAAI,MAAM,GAAG,KAAK,KAAK;AAAA,EAChC;AAAA,EAEA,MAAM,OAAmF;AACvF,UAAM,WAAmC,CAAC;AAC1C,QAAI,QAAQ;AACZ,eAAW,UAAU,aAAa;AAChC,YAAM,OAAO,KAAK,cAAc,MAAM;AACtC,UAAI,CAAC,WAAW,IAAI,GAAG;AACrB,iBAAS,MAAM,IAAI;AACnB;AAAA,MACF;AACA,YAAM,QAAQ,aAA
a,MAAM,MAAM,EAAE,MAAM,IAAI,EAAE,OAAO,OAAO,EAAE;AACrE,eAAS,MAAM,IAAI;AACnB,eAAS;AAAA,IACX;AAIA,WAAO,EAAE,OAAO,OAAO,MAAM,OAAO,SAAS;AAAA,EAC/C;AAAA,EAEQ,iBAAiB,OAAmC;AAC1D,QAAI,CAAC,MAAM,QAAQ;AACjB,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,MAAM,qBAAqB,MAAM,kBAAkB,WAAW,GAAG;AACpE,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,MAAM,YAAY;AACrB,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,MAAM,iBAAiB;AAC1B,YAAM,IAAI;AAAA,QACR;AAAA,QACA;AAAA,MACF;AAAA,IACF;AACA,QAAI,CAAC,YAAY,SAAS,MAAM,MAAM,GAAG;AACvC,YAAM,IAAI;AAAA,QACR;AAAA,QACA,+CAA+C,YAAY,KAAK,IAAI,CAAC;AAAA,MACvE;AAAA,IACF;AAAA,EACF;AAAA,EAEQ,gBAAgB,OAAmC;AACzD,UAAM,MAAM,KAAK,QAAQ;AACzB,QAAI,CAAC,OAAO,CAAC,MAAM,gBAAiB;AACpC,UAAM,MAAM,KAAK,IAAI;AACrB,UAAM,WAAW;AACjB,QAAI,QAAQ,KAAK,WAAW,IAAI,MAAM,eAAe;AACrD,QAAI,CAAC,SAAS,MAAM,MAAM,iBAAiB,UAAU;AACnD,cAAQ,EAAE,QAAQ,MAAM,iBAAiB,eAAe,KAAK,OAAO,EAAE;AACtE,WAAK,WAAW,IAAI,MAAM,iBAAiB,KAAK;AAAA,IAClD;AACA,QAAI,MAAM,SAAS,KAAK;AACtB,YAAM,IAAI;AAAA,QACR;AAAA,QACA,gCAAgC,MAAM,eAAe,aAAa,GAAG;AAAA,MACvE;AAAA,IACF;AACA,UAAM,SAAS;AAAA,EACjB;AAAA,EAEQ,SAAS,OAAoD;AACnE,UAAM,aAAa;AAAA,MACjB,KAAK,UAAU;AAAA,QACb,IAAI,MAAM,SAAS;AAAA,QACnB,KAAK,MAAM;AAAA,QACX,IAAI,MAAM;AAAA,QACV,KAAK,MAAM;AAAA,MACb,CAAC;AAAA,IACH;AAGA,WAAO;AAAA,MACL,GAAG;AAAA,MACH;AAAA,MACA,OAAO;AAAA,IACT;AAAA,EACF;AAAA,EAEQ,cAAc,QAAwB;AAC5C,WAAO,KAAK,KAAK,QAAQ,MAAM,GAAG,MAAM,QAAQ;AAAA,EAClD;AACF;AAEA,IAAM,cAAgD;AAAA,EACpD;AAAA,EACA;AAAA,EACA;AAAA,EACA;AAAA,EACA;AACF;AAEA,SAAS,qBACP,QACA,QACS;AACT,MAAI,CAAC,OAAQ,QAAO;AACpB,MAAI,MAAM,QAAQ,MAAM,EAAG,QAAO,OAAO,SAAS,MAAM;AACxD,SAAO,WAAW;AACpB;AAEA,SAAS,cACP,QACA,MACA,QACS;AAET,MAAI,KAAK,UAAU,WAAW,OAAO,cAAc,KAAK,eAAgB,QAAO;AAC/E,MAAI,KAAK,UAAU,UAAU,OAAO,aAAa,KAAK,eAAgB,QAAO;AAE7E,QAAM,IAAI,KAAK;AACf,MAAI,CAAC,EAAG,QAAO;AACf,MAAI,EAAE,QAAQ,OAAO,SAAS,SAAS,EAAE,KAAM,QAAO;AACtD,MAAI,EAAE,QAAQ;AACZ,UAAM,UAAU,MAAM,QAAQ,EAAE,MAAM,IAAI,EAAE,SAAS,CAAC,EAAE,MAAM;AAC9D,QAAI,CAAC,QAAQ,SAAS,MAAe,EAAG,QAAO;AAAA,EACjD;AACA,MAAI,EAAE,iB
AAiB,UAAa,EAAE,iBAAiB,QAAW;AAChE,UAAM,aAAa,OAAO,OAAO,OAAO,WAAW,EAAE,IAAI,CAAC,MAAM,EAAE,SAAS;AAC3E,UAAM,MAAM,WAAW,WAAW,IAAI,IAAI,KAAK,IAAI,GAAG,UAAU;AAChE,QAAI,EAAE,iBAAiB,UAAa,MAAM,EAAE,aAAc,QAAO;AACjE,QAAI,EAAE,iBAAiB,UAAa,MAAM,EAAE,aAAc,QAAO;AAAA,EACnE;AACA,SAAO;AACT;AAEA,SAAS,OAAO,OAAuB;AACrC,SAAO,WAAW,QAAQ,EAAE,OAAO,KAAK,EAAE,OAAO,KAAK,EAAE,MAAM,GAAG,EAAE;AACrE;AAEA,SAAS,WAAW,MAAc,MAAoB;AACpD,MAAI,WAAW,IAAI,GAAG;AACpB,UAAM,WAAW,aAAa,MAAM,MAAM;AAC1C,kBAAc,MAAM,WAAW,IAAI;AAAA,EACrC,OAAO;AACL,kBAAc,MAAM,IAAI;AAAA,EAC1B;AACF;;;AC1QA,SAAS,oBAAoB;AAC7B,SAAS,cAAAA,mBAAkB;AAC3B,SAAS,UAAU,YAAY,QAAAC,aAAY;AAuBpC,IAAM,uBAAN,cAAmC,MAAM;AAAA,EAC9C,YACE,SACS,OACT;AACA,UAAM,OAAO;AAFJ;AAGT,SAAK,OAAO;AAAA,EACd;AAAA,EAJW;AAKb;AAaA,SAAS,WAAW,MAAgB,KAAqB;AACvD,MAAI;AACF,WAAO,aAAa,OAAO,MAAM,EAAE,KAAK,UAAU,OAAO,CAAC,EAAE,KAAK;AAAA,EACnE,SAAS,KAAK;AACZ,UAAM,SACJ,OAAO,OAAO,QAAQ,YAAY,YAAY,MAC1C,OAAQ,IAA4B,MAAM,IAC1C;AACN,UAAM,IAAI,qBAAqB,OAAO,KAAK,KAAK,GAAG,CAAC,YAAY,UAAU,OAAO,GAAG,CAAC,IAAI,GAAG;AAAA,EAC9F;AACF;AAGA,SAAS,KAAK,OAAuB;AACnC,SACE,MACG,YAAY,EACZ,QAAQ,eAAe,GAAG,EAC1B,QAAQ,YAAY,EAAE,EACtB,MAAM,GAAG,EAAE,KAAK;AAEvB;AAEO,SAAS,mBAAmB,MAAkD;AACnF,QAAM,MAAM,KAAK,OAAO;AACxB,QAAM,cAAc,KAAK,eAAeA,MAAK,KAAK,UAAU,YAAY;AACxE,QAAM,eAAe,KAAK,gBAAgB;AAE1C,SAAO;AAAA,IACL,MAAM,OAAO,EAAE,SAAS,MAAM,GAAG;AAC/B,YAAM,KAAK,GAAG,KAAK,KAAK,CAAC,IAAI,KAAK,IAAI,EAAE,SAAS,EAAE,CAAC,IAAI,KAAK,OAAO,EAAE,SAAS,EAAE,EAAE,MAAM,GAAG,CAAC,CAAC;AAC9F,YAAM,SAAS,GAAG,YAAY,IAAI,EAAE;AACpC,YAAM,OAAOA,MAAK,aAAa,EAAE;AACjC,UAAI,CAAC,YAAY,OAAO,MAAM,QAAQ,MAAM,OAAO,GAAG,KAAK,QAAQ;AACnE,aAAO,EAAE,MAAM,QAAQ,QAAQ;AAAA,IACjC;AAAA,IAEA,MAAM,SAAS,UAAU,SAAS;AAGhC,YAAM,SAAS,IAAI,CAAC,UAAU,aAAa,GAAG,SAAS,IAAI;AAC3D,UAAI,OAAO,SAAS,GAAG;AACrB,YAAI,CAAC,OAAO,IAAI,GAAG,SAAS,IAAI;AAChC,YAAI,CAAC,UAAU,MAAM,OAAO,GAAG,SAAS,IAAI;AAAA,MAC9C;AACA,aAAO;AAAA,QACL,MAAM;AAAA,QACN,aAAa,SAAS;AAAA,QACtB,SAAS,SAAS;AAAA,QAClB;AAAA,MACF;AAAA,IACF;AAAA,IAEA,MAAM,QAAQ,UAAU;AAGtB,UAAI,CAAC,YAAY,UAAU,WAAW,SAAS,IAAI,GAAG,KAAK,QAAQ;AACnE,UAAI,CAAC,UAAU,M
AAM,SAAS,MAAM,GAAG,KAAK,QAAQ;AAAA,IACtD;AAAA,EACF;AACF;AAKO,SAAS,oBAAoB,SAAsB,aAA8B;AACtF,MAAI,WAAW,QAAQ,WAAW,KAAKD,YAAW,QAAQ,WAAW,EAAG,QAAO,QAAQ;AACvF,MAAI,YAAa,QAAOC,MAAK,aAAa,SAAS,QAAQ,WAAW,CAAC;AACvE,SAAO,QAAQ;AACjB;","names":["existsSync","join"]}
@@ -174,14 +174,45 @@ function gepaDriver(opts) {
       );
       const proposals = parseReflectionResponse(result.content, ctx.populationSize);
       const out = [];
+      const constraints = opts.constraints;
+      const preserveSections = constraints?.preserveSections !== void 0 ? constraints.preserveSections.length === 0 ? extractH2Sections(parent) : constraints.preserveSections : null;
+      const maxEdits = constraints?.maxSentenceEdits;
       for (const proposal of proposals) {
         const text = typeof proposal.payload === "string" ? proposal.payload.trim() : "";
-        if (text
+        if (!text || text === parent || out.includes(text)) continue;
+        if (preserveSections && !validatePreservedSections(text, preserveSections)) continue;
+        if (maxEdits !== void 0 && countSentenceEdits(parent, text) > maxEdits * 2) continue;
+        out.push(text);
       }
       return out;
     }
   };
 }
+function extractH2Sections(text) {
+  const out = [];
+  for (const line of text.split("\n")) {
+    const match = /^##\s+(.+?)\s*$/.exec(line);
+    if (match) out.push(match[1]);
+  }
+  return out;
+}
+function countSentenceEdits(baseline, candidate) {
+  const norm = (s) => s.split(/(?<=[.!?])\s+|\n/g).map((p) => p.trim()).filter((p) => p.length > 0);
+  const a = new Set(norm(baseline));
+  const b = new Set(norm(candidate));
+  let edits = 0;
+  for (const s of a) if (!b.has(s)) edits++;
+  for (const s of b) if (!a.has(s)) edits++;
+  return edits;
+}
+function validatePreservedSections(candidate, required) {
+  if (required.length === 0) return true;
+  const have = new Set(extractH2Sections(candidate));
+  for (const section of required) {
+    if (!have.has(section)) return false;
+  }
+  return true;
+}
 function buildEvidence(ctx, evidenceK, baseTarget) {
   const last = ctx.history.at(-1);
   if (!last || last.candidates.length === 0) {
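The sentence-edit check in this hunk compares against `maxEdits * 2`. A plausible reading (an inference from the code, not something the diff states) is that `countSentenceEdits` measures a symmetric difference between sentence sets, so one rewritten sentence counts twice: once as removed, once as added. The sketch below restates the helper from the hunk and adds a hypothetical `exceedsBudget` wrapper, which is not part of the package, to make the threshold visible. The sample prompts are invented for illustration.

```javascript
// Restated from the diff for illustration; behaviour is intended to match.
function countSentenceEdits(baseline, candidate) {
  // Split on whitespace that follows ., ! or ?, and on newlines.
  const norm = (s) =>
    s.split(/(?<=[.!?])\s+|\n/g).map((p) => p.trim()).filter((p) => p.length > 0);
  const a = new Set(norm(baseline));
  const b = new Set(norm(candidate));
  let edits = 0;
  for (const s of a) if (!b.has(s)) edits++; // sentences removed
  for (const s of b) if (!a.has(s)) edits++; // sentences added
  return edits;
}

// Hypothetical wrapper (not in the package) mirroring the condition in the loop.
function exceedsBudget(parent, text, maxEdits) {
  return maxEdits !== undefined && countSentenceEdits(parent, text) > maxEdits * 2;
}

// Invented sample prompts.
const parent = "Be concise. Cite sources. Ask before deleting files.";
const oneRewrite = "Be concise. Always cite sources. Ask before deleting files.";
const oneAddition = "Be concise. Cite sources. Ask before deleting files. Prefer small diffs.";
const twoRewrites = "Be brief. Always cite sources. Ask before deleting files.";

const rewriteEdits = countSentenceEdits(parent, oneRewrite);   // 2: one removed + one added
const additionEdits = countSentenceEdits(parent, oneAddition); // 1: one added
const twoRewriteEdits = countSentenceEdits(parent, twoRewrites); // 4

console.log({ rewriteEdits, additionEdits, twoRewriteEdits });
console.log(exceedsBudget(parent, oneRewrite, 1));  // false: 2 is not greater than 2
console.log(exceedsBudget(parent, twoRewrites, 1)); // true: 4 is greater than 2
```

Because the comparison uses sets, repeated identical sentences collapse to one entry and reordering sentences counts as zero edits; whether that is intended is not stated in the diff.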
@@ -631,6 +662,8 @@ export {
   openAutoPr,
   evolutionaryDriver,
   gepaDriver,
+  extractH2Sections,
+  countSentenceEdits,
   composeGate,
   defaultProductionGate,
   heldOutGate,
@@ -639,4 +672,4 @@ export {
   surfaceHash,
   runImprovementLoop
 };
-//# sourceMappingURL=chunk-
+//# sourceMappingURL=chunk-YXD7GWJI.js.map
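The `preserveSections` handling added to `gepaDriver` in this file has three cases that are easy to miss in the nested ternary: an undefined value disables the check, an empty array means "preserve every H2 heading found in the parent", and a non-empty array is used as given. The sketch below restates the two helpers from the diff and wraps the ternary in a hypothetical `resolvePreserveSections` function (that name is not in the package). The sample prompts are invented.

```javascript
// Restated from the diff for illustration.
function extractH2Sections(text) {
  const out = [];
  for (const line of text.split("\n")) {
    // Matches "## Title" only; "# Title" and "### Title" do not match.
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) out.push(match[1]);
  }
  return out;
}

// Equivalent to the diff's loop; an empty `required` list passes trivially.
function validatePreservedSections(candidate, required) {
  const have = new Set(extractH2Sections(candidate));
  return required.every((section) => have.has(section));
}

// Hypothetical helper (not in the package) mirroring the nested ternary.
function resolvePreserveSections(constraints, parent) {
  if (constraints?.preserveSections === undefined) return null; // check disabled
  return constraints.preserveSections.length === 0
    ? extractH2Sections(parent) // empty list: derive headings from the parent
    : constraints.preserveSections; // explicit list: use as given
}

// Invented sample prompts.
const parentPrompt = "# Agent\n\n## Role\nYou review code.\n\n## Rules\nBe terse.\n";
const keepsBoth = "# Agent\n\n## Role\nYou review pull requests.\n\n## Rules\nBe terse.\n";
const dropsRules = "# Agent\n\n## Role\nYou review pull requests.\n";

const derived = resolvePreserveSections({ preserveSections: [] }, parentPrompt);
console.log(derived);                                        // [ 'Role', 'Rules' ]
console.log(validatePreservedSections(keepsBoth, derived));  // true
console.log(validatePreservedSections(dropsRules, derived)); // false
console.log(resolvePreserveSections({}, parentPrompt));      // null
```

Heading comparison is exact string equality after trimming trailing whitespace, so a candidate that renames `## Rules` to `## Rule` would be rejected under this reading.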