@tangle-network/agent-eval 0.17.1 → 0.17.2

package/README.md CHANGED
@@ -77,6 +77,8 @@ The recipe for a code-generator eval is in [`SKILL.md` §Minimal working path](.
  | Wire protocol (`agent-eval serve` / `rpc`) | HTTP and stdio RPC interface for cross-language clients. | wire-protocol.md |
  | `clients/python/` | First-party Python client (`tangle-agent-eval` on PyPI). Version-locked to npm. | clients/python/README.md |
  | `BenchmarkRunner`, `executeScenario`, `ConvergenceTracker` | Multi-turn scenario execution + cross-run tracking. | SKILL.md |
+ | `runAgentControlLoop` | Policy-based runtime for agentic tasks: observe typed state, validate, decide, act, repeat with budgets, tracing, and stuck-loop guards. | [control-runtime.md](./docs/control-runtime.md) |
+ | `FeedbackTrajectory`, `InMemoryFeedbackTrajectoryStore`, `FileSystemFeedbackTrajectoryStore` | Product-native learning loops: capture approvals, rejections, choices, revisions, metrics, and policy blocks as train/dev/test/holdout examples. | [feedback-trajectories.md](./docs/feedback-trajectories.md) |
  | `ExperimentTracker`, `PromptOptimizer`, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
  | `runPromptEvolution`, `createCompositeMutator`, `createSandboxPool`, `createSandboxCodeMutator`, `MutationTelemetry`, `LineageRecorder`, `CostLedger`, `JsonlTrialCache` | Prompt + code evolution loops with bounded sandbox pools, durable JSONL telemetry, plateau-detecting composite mutators, crash-resumable trial cache. | §Evolution loop |
  | `reflective-mutation` (`buildReflectionPrompt`, `parseReflectionResponse`, `DEFAULT_MUTATION_PRIMITIVES`) | Trace-conditioned LLM mutator that reasons over top/bottom trials instead of blind rewrites. | inline JSDoc |
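The `runAgentControlLoop` row added above describes an observe, validate, decide, act cycle bounded by budgets and stuck-loop guards. The diff does not show that function's signature, so the following is only a minimal, self-contained sketch of the general pattern. Every name in it (`runSketchLoop`, `LoopBudget`, `Decision`, and so on) is hypothetical and is not the package's API; see `docs/control-runtime.md` for the real interface.

```typescript
// Hypothetical sketch of a budgeted control loop. None of these names come
// from @tangle-network/agent-eval.
interface LoopState { value: number }
type Decision = { type: 'act'; delta: number } | { type: 'stop' }
interface LoopBudget { maxSteps: number; maxRepeatedStates: number }
interface LoopResult {
  finalState: LoopState
  steps: number
  stoppedBy: 'policy' | 'budget' | 'stuck' | 'invalid'
  trace: string[]
}

function runSketchLoop(
  initial: LoopState,
  validate: (s: LoopState) => boolean,
  decide: (s: LoopState) => Decision,
  budget: LoopBudget,
): LoopResult {
  let state = initial
  const trace: string[] = []
  const visitsByState = new Map<string, number>()

  for (let step = 0; step < budget.maxSteps; step++) {
    // Observe: fingerprint the state so repeated states can be counted.
    const key = JSON.stringify(state)
    const visits = (visitsByState.get(key) || 0) + 1
    visitsByState.set(key, visits)

    // Stuck-loop guard: stop if the same state keeps coming back.
    if (visits > budget.maxRepeatedStates) {
      trace.push(`step ${step}: stuck on ${key}`)
      return { finalState: state, steps: step, stoppedBy: 'stuck', trace }
    }

    // Validate before deciding anything.
    if (!validate(state)) {
      trace.push(`step ${step}: invalid state ${key}`)
      return { finalState: state, steps: step, stoppedBy: 'invalid', trace }
    }

    // Decide, then act.
    const decision = decide(state)
    if (decision.type === 'stop') {
      trace.push(`step ${step}: policy stopped`)
      return { finalState: state, steps: step, stoppedBy: 'policy', trace }
    }
    state = { value: state.value + decision.delta }
    trace.push(`step ${step}: acted with delta ${decision.delta}`)
  }

  // Step budget exhausted without the policy choosing to stop.
  return { finalState: state, steps: budget.maxSteps, stoppedBy: 'budget', trace }
}

// Usage: count up to 3, then let the policy stop the loop.
const result = runSketchLoop(
  { value: 0 },
  (s) => s.value >= 0,
  (s) => (s.value >= 3 ? { type: 'stop' } : { type: 'act', delta: 1 }),
  { maxSteps: 10, maxRepeatedStates: 2 },
)
console.log(result.stoppedBy, result.finalState.value)
```

Returning the stop reason alongside the trace lets a caller tell a policy-chosen stop apart from a budget or stuck-loop cutoff.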
@@ -168,6 +170,51 @@ The `MutationTelemetry`, `LineageRecorder`, and `CostLedger` pass into the `code
 
  For the full primitive surface and rationale, read each module's JSDoc — `prompt-evolution.ts`, `composite-mutator.ts`, `sandbox-pool.ts`, `code-mutator.ts`, `reflective-mutation.ts`, `evolution-telemetry.ts`.
 
+ ## Product feedback loop
+
+ When normal product usage should generate training/eval signal, use feedback
+ trajectories. They turn approvals, rejections, option choices, edits, metrics,
+ and policy blocks into reusable examples.
+
+ ```ts
+ import {
+   createFeedbackTrajectory,
+   summarizePreferenceMemory,
+   feedbackTrajectoriesToDatasetScenarios,
+   feedbackTrajectoriesToOptimizerRows,
+ } from '@tangle-network/agent-eval'
+
+ const trajectory = createFeedbackTrajectory({
+   projectId: 'gtm-agent',
+   scenarioId: 'ad-positioning-choice',
+   task: { intent: 'Choose a paid-social positioning angle.' },
+   attempts: [{
+     id: 'draft-1',
+     stepIndex: 0,
+     artifactType: 'decision',
+     artifact: { option: 'enterprise procurement language' },
+     options: ['enterprise procurement', 'technical founder pain'],
+     createdAt: new Date().toISOString(),
+   }],
+   labels: [{
+     source: 'user',
+     kind: 'reject',
+     value: 'enterprise procurement',
+     reason: 'too enterprise; our buyer is a technical founder',
+     severity: 'error',
+     createdAt: new Date().toISOString(),
+   }],
+ })
+
+ const memory = summarizePreferenceMemory([trajectory])
+ const scenarios = feedbackTrajectoriesToDatasetScenarios([trajectory])
+ const optimizerRows = feedbackTrajectoriesToOptimizerRows([trajectory])
+ ```
+
+ This is the bridge between product UX and optimization: normal user feedback
+ becomes immediate memory, replayable eval scenarios, and prompt/signature/code
+ optimizer input. See [`docs/feedback-trajectories.md`](./docs/feedback-trajectories.md).
+
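The added README section says feedback becomes "immediate memory" and train/dev/test/holdout examples, but the diff does not show what `summarizePreferenceMemory` or the dataset converters return. As a rough illustration of the idea only, the sketch below folds reject/approve labels into a preference summary and assigns each scenario to a split by hashing its id. All types and functions here (`SketchTrajectory`, `summarizeSketch`, `assignSplit`, the 70/10/10/10 ratio) are hypothetical assumptions, not the package's behavior.

```typescript
// Hypothetical local types; they only mirror the field names used in the
// README example and are not exported by @tangle-network/agent-eval.
type SketchLabelKind = 'approve' | 'reject'
interface SketchLabel { kind: SketchLabelKind; value: string; reason?: string }
interface SketchTrajectory { scenarioId: string; labels: SketchLabel[] }
interface PreferenceSummary { preferred: string[]; avoided: string[]; reasons: string[] }
type Split = 'train' | 'dev' | 'test' | 'holdout'

// Fold labels into de-duplicated lists of preferred and avoided values.
function summarizeSketch(trajectories: SketchTrajectory[]): PreferenceSummary {
  const preferred = new Set<string>()
  const avoided = new Set<string>()
  const reasons = new Set<string>()
  for (const trajectory of trajectories) {
    for (const label of trajectory.labels) {
      if (label.kind === 'approve') preferred.add(label.value)
      else avoided.add(label.value)
      if (label.reason) reasons.add(label.reason)
    }
  }
  return {
    preferred: Array.from(preferred),
    avoided: Array.from(avoided),
    reasons: Array.from(reasons),
  }
}

// 32-bit FNV-1a hash, used so the same scenario id always maps to the same split.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Assumed 70/10/10/10 split; the real package may split differently.
function assignSplit(scenarioId: string): Split {
  const bucket = fnv1a(scenarioId) % 10
  if (bucket < 7) return 'train'
  if (bucket === 7) return 'dev'
  if (bucket === 8) return 'test'
  return 'holdout'
}

const sketch: SketchTrajectory = {
  scenarioId: 'ad-positioning-choice',
  labels: [{
    kind: 'reject',
    value: 'enterprise procurement',
    reason: 'too enterprise; our buyer is a technical founder',
  }],
}
console.log(summarizeSketch([sketch]), assignSplit(sketch.scenarioId))
```

Hashing the scenario id instead of drawing a random number keeps a scenario in the same split across runs, which matters when holdout examples must never leak into training.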
  ## v0.16 highlights — production-rigor primitives
 
  These are the primitives any team running prompt-optimization in production needs, regardless of whether they're writing a paper. v0.15 shipped them under "paper-grade" naming; v0.16 corrects that — they're production-first, paper-grade as a side effect.