npm: @tangle-network/agent-eval version diff, 0.17.1 → 0.17.2


Files changed (5)

package/README.md CHANGED

@@ -77,6 +77,8 @@ The recipe for a code-generator eval is in [`SKILL.md` §Minimal working path](.
 | Wire protocol (`agent-eval serve` / `rpc`) | HTTP and stdio RPC interface for cross-language clients. | wire-protocol.md |
 | `clients/python/` | First-party Python client (`tangle-agent-eval` on PyPI). Version-locked to npm. | clients/python/README.md |
 | `BenchmarkRunner`, `executeScenario`, `ConvergenceTracker` | Multi-turn scenario execution + cross-run tracking. | SKILL.md |
+| `runAgentControlLoop` | Policy-based runtime for agentic tasks: observe typed state, validate, decide, act, repeat with budgets, tracing, and stuck-loop guards. | [control-runtime.md](./docs/control-runtime.md) |
+| `FeedbackTrajectory`, `InMemoryFeedbackTrajectoryStore`, `FileSystemFeedbackTrajectoryStore` | Product-native learning loops: capture approvals, rejections, choices, revisions, metrics, and policy blocks as train/dev/test/holdout examples. | [feedback-trajectories.md](./docs/feedback-trajectories.md) |
 | `ExperimentTracker`, `PromptOptimizer`, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
 | `runPromptEvolution`, `createCompositeMutator`, `createSandboxPool`, `createSandboxCodeMutator`, `MutationTelemetry`, `LineageRecorder`, `CostLedger`, `JsonlTrialCache` | Prompt + code evolution loops with bounded sandbox pools, durable JSONL telemetry, plateau-detecting composite mutators, crash-resumable trial cache. | §Evolution loop |
 | `reflective-mutation` (`buildReflectionPrompt`, `parseReflectionResponse`, `DEFAULT_MUTATION_PRIMITIVES`) | Trace-conditioned LLM mutator that reasons over top/bottom trials instead of blind rewrites. | inline JSDoc |
@@ -168,6 +170,51 @@ The `MutationTelemetry`, `LineageRecorder`, and `CostLedger` pass into the `code
 For the full primitive surface and rationale, read each module's JSDoc — `prompt-evolution.ts`, `composite-mutator.ts`, `sandbox-pool.ts`, `code-mutator.ts`, `reflective-mutation.ts`, `evolution-telemetry.ts`.
+## Product feedback loop
+When normal product usage should generate training/eval signal, use feedback
+trajectories. They turn approvals, rejections, option choices, edits, metrics,
+and policy blocks into reusable examples.
+```ts
+import {
+  createFeedbackTrajectory,
+  summarizePreferenceMemory,
+  feedbackTrajectoriesToDatasetScenarios,
+  feedbackTrajectoriesToOptimizerRows,
+} from '@tangle-network/agent-eval'
+const trajectory = createFeedbackTrajectory({
+  projectId: 'gtm-agent',
+  scenarioId: 'ad-positioning-choice',
+  task: { intent: 'Choose a paid-social positioning angle.' },
+  attempts: [{
+    id: 'draft-1',
+    stepIndex: 0,
+    artifactType: 'decision',
+    artifact: { option: 'enterprise procurement language' },
+    options: ['enterprise procurement', 'technical founder pain'],
+    createdAt: new Date().toISOString(),
+  }],
+  labels: [{
+    source: 'user',
+    kind: 'reject',
+    value: 'enterprise procurement',
+    reason: 'too enterprise; our buyer is a technical founder',
+    severity: 'error',
+    createdAt: new Date().toISOString(),
+  }],
+})
+const memory = summarizePreferenceMemory([trajectory])
+const scenarios = feedbackTrajectoriesToDatasetScenarios([trajectory])
+const optimizerRows = feedbackTrajectoriesToOptimizerRows([trajectory])
+```
+This is the bridge between product UX and optimization: normal user feedback
+becomes immediate memory, replayable eval scenarios, and prompt/signature/code
+optimizer input. See [`docs/feedback-trajectories.md`](./docs/feedback-trajectories.md).
 ## v0.16 highlights — production-rigor primitives
 These are the primitives any team running prompt-optimization in production needs, regardless of whether they're writing a paper. v0.15 shipped them under "paper-grade" naming; v0.16 corrects that — they're production-first, paper-grade as a side effect.
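
Editor's sketch (not part of the package diff): the "Product feedback loop" section added above says approvals and rejections become "immediate memory" through `summarizePreferenceMemory`, but the diff does not show what that summary contains. As a rough illustration only, the fold from labels to a preference summary might look like the following. Every name here (`SketchLabel`, `SketchTrajectory`, `PreferenceSummary`, `sketchSummarizePreferences`) is hypothetical and is not taken from `@tangle-network/agent-eval`; the real types and output shape may differ.

```typescript
// Hypothetical shapes for illustration only; not the agent-eval API.
type SketchLabelKind = 'approve' | 'reject'

interface SketchLabel {
  kind: SketchLabelKind
  value: string
  reason?: string
}

interface SketchTrajectory {
  scenarioId: string
  labels: SketchLabel[]
}

interface PreferenceSummary {
  preferred: string[]
  avoided: Array<{ value: string; reasons: string[] }>
}

// Fold the labels of many trajectories into one summary:
// approved values are collected once, rejected values keep their distinct reasons.
function sketchSummarizePreferences(
  trajectories: SketchTrajectory[],
): PreferenceSummary {
  const preferred = new Set<string>()
  const avoided = new Map<string, string[]>()

  for (const trajectory of trajectories) {
    for (const label of trajectory.labels) {
      if (label.kind === 'approve') {
        preferred.add(label.value)
        continue
      }
      // Rejection: record the value and any reason not seen before.
      const reasons = avoided.get(label.value) ?? []
      if (label.reason !== undefined && !reasons.includes(label.reason)) {
        reasons.push(label.reason)
      }
      avoided.set(label.value, reasons)
    }
  }

  return {
    preferred: Array.from(preferred),
    avoided: Array.from(avoided.entries()).map(([value, reasons]) => ({
      value,
      reasons,
    })),
  }
}

// Usage, reusing the values from the README example above.
const sketchSummary = sketchSummarizePreferences([
  {
    scenarioId: 'ad-positioning-choice',
    labels: [
      {
        kind: 'reject',
        value: 'enterprise procurement',
        reason: 'too enterprise; our buyer is a technical founder',
      },
    ],
  },
  {
    scenarioId: 'ad-positioning-choice',
    labels: [
      { kind: 'approve', value: 'technical founder pain' },
      {
        kind: 'reject',
        value: 'enterprise procurement',
        reason: 'too enterprise; our buyer is a technical founder',
      },
    ],
  },
])
```

The sketch deduplicates repeated rejection reasons so that the same complaint voiced in several sessions appears once; whether the package does the same is not stated in this diff.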