@tangle-network/agent-eval 0.17.3 → 0.19.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -81,7 +81,8 @@ The recipe for a code-generator eval is in [`SKILL.md` §Minimal working path](.
81
81
  | `runAgentControlLoop` | Policy-based runtime for agentic tasks: observe typed state, validate, decide, act, repeat with budgets, tracing, and stuck-loop guards. | [control-runtime.md](./docs/control-runtime.md) |
82
82
  | `FeedbackTrajectory`, `InMemoryFeedbackTrajectoryStore`, `FileSystemFeedbackTrajectoryStore` | Human/environment feedback loops: capture approvals, rejections, choices, revisions, metrics, and policy blocks as train/dev/test/holdout examples. | [feedback-trajectories.md](./docs/feedback-trajectories.md) |
83
83
  | `evaluateActionPolicy` | Generic action preflight for approval, budget, expected-outcome, and kill-criteria checks. | [feature-guide.md](./docs/feature-guide.md) |
84
- | `ExperimentTracker`, `PromptOptimizer`, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
84
+ | `ExperimentTracker`, steering optimizers, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
85
+ | `runMultiShotOptimization`, `trialTraceFromMultiShotTrial` | GEPA-style optimization for variable-length agent trajectories with ASI, paired seeds, and optional held-out promotion gating. | [multi-shot-optimization.md](./docs/multi-shot-optimization.md) |
85
86
  | `runPromptEvolution`, `createCompositeMutator`, `createSandboxPool`, `createSandboxCodeMutator`, `MutationTelemetry`, `LineageRecorder`, `CostLedger`, `JsonlTrialCache` | Prompt + code evolution loops with bounded sandbox pools, durable JSONL telemetry, plateau-detecting composite mutators, crash-resumable trial cache. | §Evolution loop |
86
87
  | `reflective-mutation` (`buildReflectionPrompt`, `parseReflectionResponse`, `DEFAULT_MUTATION_PRIMITIVES`) | Trace-conditioned LLM mutator that reasons over top/bottom trials instead of blind rewrites. | inline JSDoc |
87
88
  | `correlationStudy`, `OutcomeStore`, `ProductRegistry` | Meta-eval: do our scores predict deployment outcomes (revenue, retention)? | inline JSDoc |
@@ -89,6 +90,12 @@ The recipe for a code-generator eval is in [`SKILL.md` §Minimal working path](.
89
90
 
90
91
  ## Evolution loop
91
92
 
93
+ For agent tasks that run across many chat turns or tool calls, start with
94
+ [`runMultiShotOptimization`](./docs/multi-shot-optimization.md). It runs the
95
+ same prompt-evolution core over full trajectories, carries actionable side
96
+ information into reflection, and separates the search winner from the variant
97
+ that actually passes held-out promotion.
98
+
92
99
  Closing the loop on a prompt or codebase is **two adapters + a config**. Compose `runPromptEvolution` with `createCompositeMutator` (plateau policy) and you get prompt-only optimization until improvement stalls, then automatic switch to code-channel mutations from a coding agent inside a `SandboxPool`.
93
100
 
94
101
  ```ts