@tangle-network/agent-app 0.2.0 → 0.3.1
- package/.claude/skills/eval-architect/SKILL.md +44 -0
- package/.claude/skills/eval-bootstrap/SKILL.md +46 -0
- package/.claude/skills/eval-campaign/SKILL.md +116 -0
- package/.claude/skills/improve-conductor/SKILL.md +45 -0
- package/.claude/skills/measurement-validation/SKILL.md +38 -0
- package/.claude/skills/skill-evolution/SKILL.md +42 -0
- package/.claude/skills/surface-evolution/SKILL.md +35 -0
- package/dist/{chunk-3LP6PEWS.js → chunk-SVCJYRVM.js} +97 -1
- package/dist/chunk-SVCJYRVM.js.map +1 -0
- package/dist/config/index.d.ts +1 -1
- package/dist/eval-campaign/index.d.ts +81 -1
- package/dist/eval-campaign/index.js +94 -1
- package/dist/eval-campaign/index.js.map +1 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.js +19 -1
- package/dist/knowledge-loop/index.d.ts +1 -1
- package/dist/model-CKzniMMr.d.ts +110 -0
- package/dist/runtime/index.d.ts +1 -1
- package/dist/runtime/index.js +19 -1
- package/package.json +4 -3
- package/dist/chunk-3LP6PEWS.js.map +0 -1
- package/dist/model-BOP69mVu.d.ts +0 -35
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-architect
|
|
3
|
+
description: Build a measurement that scores an agent's REAL deliverable — not a proxy — for a product you've never seen before. Use when scaffolding or repairing the eval an Improve loop optimizes against. Get this wrong and every downstream optimization perfects a fiction.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Eval Architect — measure the real deliverable
|
|
7
|
+
|
|
8
|
+
You are building the measurement an improvement loop will optimize against. **The loop optimizes whatever you measure.** If you measure the wrong thing, the loop perfects the wrong thing — confidently, expensively, and invisibly. The measurement is the product. Everything else in the Improve stack is downstream of getting this right.
|
|
9
|
+
|
|
10
|
+
This skill is held by the agent that *builds* the eval (often a delegated coding agent). Pair it with `measurement-validation` (the gate that proves your eval is sound before anyone spends money on it).
|
|
11
|
+
|
|
12
|
+
## The cardinal question
|
|
13
|
+
|
|
14
|
+
**Where does this agent's deliverable actually land?** Prose in the reply? Validated tool calls? Persisted artifacts (vault docs, DB rows)? A PR? A rendered UI? Find out by inspecting *real runs* — never by assuming it's the chat text.
|
|
15
|
+
|
|
16
|
+
> Worked failure (legal-agent, this is why the skill exists): the eval scored the assistant's chat prose. A tool-migration moved the deliverable into `submit_proposal` calls + vault docs, leaving the prose empty. Every scorer reading prose silently collapsed to ~0. The loop would have optimized an empty string. The deliverable had *moved* and the measurement didn't follow it.
|
|
17
|
+
|
|
18
|
+
## Invariant (non-negotiable — violate these and the loop is a slot machine)
|
|
19
|
+
|
|
20
|
+
1. **Score the produced artifact, not the conversation.** Locate the real output channel and score *that*.
|
|
21
|
+
2. **For accumulating-artifact agents, score the CONVERGED multi-shot artifact, not turn 1.** Most real agents build their deliverable over several turns. Define a convergence criterion (e.g. the artifact stops growing for N shots) and score the converged state.
|
|
22
|
+
3. **A held-out split exists and is never trained on.** No held-out → no honest gate → no trustworthy lift.
|
|
23
|
+
4. **Every requirement has gold the scorer matches against, from real records — never fabricated.** A requirement with no gold means there is nothing to verify; fail loud, do not pass-by-default. A fluent hallucination that produced nothing must score 0, not 0.9.
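The convergence criterion in invariant 2 can be sketched roughly as follows, assuming the artifact can be reduced to one size per shot (for example, a count of persisted docs). The names `isConverged`, `firstConvergedShot`, `sizes`, and `stableShots` are hypothetical and are not part of any package API.

```ts
// Hypothetical helper: the artifact counts as converged once its size has not
// changed across the last `stableShots` transitions between consecutive shots.
function isConverged(sizes: number[], stableShots: number): boolean {
  if (stableShots < 1 || sizes.length < stableShots + 1) return false
  const tail = sizes.slice(-(stableShots + 1))
  return tail.every((size) => size === tail[0])
}

// Index of the first shot at which the artifact can be scored,
// or -1 if it never settles within the observed shots.
function firstConvergedShot(sizes: number[], stableShots: number): number {
  for (let end = stableShots + 1; end <= sizes.length; end++) {
    if (isConverged(sizes.slice(0, end), stableShots)) return end - 1
  }
  return -1
}
```

Size stability is only a proxy for "stopped accumulating": an agent that rewrites content in place without growing the artifact would need a content hash per shot instead.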
|
|
24
|
+
|
|
25
|
+
## Judgment (figure this out per product — the agentic core)
|
|
26
|
+
|
|
27
|
+
- What *is* the deliverable here, and where does it persist? Read the runtime events / tool calls / storage, not the transcript.
|
|
28
|
+
- What is the convergence criterion for this agent's artifact? When has it stopped accumulating?
|
|
29
|
+
- What gold defines "correct" for each requirement, and where does it come from (real records, never invented figures)?
|
|
30
|
+
- Which dimensions matter, and what are their weights? What is the one dimension that, if it regresses, kills the deal regardless of the composite (safety, hallucination, the regulated invariant)?
|
|
31
|
+
|
|
32
|
+
## Self-test (prove the metric works before trusting it)
|
|
33
|
+
|
|
34
|
+
- **Baseline sanity:** run it. Is the score non-zero and plausible for a competent agent? A near-zero baseline usually means you're scoring the wrong channel, not that the agent is terrible.
|
|
35
|
+
- **The mutation test (the one that catches the empty-string bug):** hand-edit the produced artifact to be *obviously better* and *obviously worse*. Does the score move in the right direction and magnitude? A metric that doesn't move under obvious changes is measuring the wrong thing.
|
|
36
|
+
- **Audit EVERY scoring surface together.** Completion, quality, and the optimizer's own scorer all read *something*. When the deliverable's channel moves, every scorer still reading the old channel silently drops to zero. (In the session that produced this skill, completion and quality were fixed, but the optimizer's own scorer was missed and only found by tracing. There are three surfaces: enumerate them, don't assume one.)
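The mutation test above could look something like this sketch. It assumes a synchronous scorer that returns one number per artifact; `mutationTest`, `minDelta`, the `Run` shape, and both example scorers are hypothetical illustrations, not package APIs.

```ts
type Scorer<A> = (artifact: A) => number

interface MutationTestResult {
  passed: boolean
  reasons: string[]
}

// Hypothetical check: a hand-improved artifact must score above the baseline and a
// hand-degraded one below it, each by at least `minDelta`.
function mutationTest<A>(
  score: Scorer<A>,
  baseline: A,
  better: A,
  worse: A,
  minDelta: number,
): MutationTestResult {
  const base = score(baseline)
  const reasons: string[] = []
  if (score(better) - base < minDelta) reasons.push('score did not rise for the better artifact')
  if (base - score(worse) < minDelta) reasons.push('score did not fall for the worse artifact')
  return { passed: reasons.length === 0, reasons }
}

// Illustration only: one scorer reads the chat prose, the other reads the produced artifact.
interface Run {
  prose: string
  proposals: string[]
}
const proseScorer: Scorer<Run> = (run) => (run.prose.length > 0 ? 1 : 0)
const artifactScorer: Scorer<Run> = (run) => Math.min(run.proposals.length / 3, 1)
```

When the deliverable lives in `proposals` and the prose is empty, `proseScorer` fails both directions of the test while `artifactScorer` passes, which is the empty-string bug in miniature.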
|
|
37
|
+
|
|
38
|
+
## Evolves-by
|
|
39
|
+
|
|
40
|
+
When a later optimization shows lift on *training* but none on *held-out*, your eval was overfittable or gameable — add the gap it missed as a new judgment rule. The architect's judgment surface is itself optimized by the meta-eval *"did evals built this way yield real held-out lift, no critical regression?"* See `skill-evolution`.
|
|
41
|
+
|
|
42
|
+
## Fleet as dogfood
|
|
43
|
+
|
|
44
|
+
legal / tax / gtm / creative / insurance each put their deliverable in a *different* channel — filings, forms, published copy, rendered artifacts, routed proposals. The skill is general precisely because it forces you to *locate* the channel for the product in front of you rather than hardcode "the reply text."
|
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-bootstrap
|
|
3
|
+
description: When a product has NO improvement infrastructure yet, build it for real — elicit the RIGHT target, ground the measurement in external truth, and construct a validated harness (often via a delegated agent-runtime build loop) BEFORE any optimization spend. The anti-toy, anti-circular skill: it exists so the improver moves what the user actually wants, not a measurable proxy it invented.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Eval Bootstrap — build the apparatus for real, at cold start
|
|
7
|
+
|
|
8
|
+
Cold start is the most dangerous moment in the whole stack. There is no eval, so the agent is tempted to invent one — and **an invented eval optimizes an invented target.** Your job here is to build a measurement of the thing the *user* actually values, grounded in something *real*, and prove it works *before a dollar is spent optimizing against it*. You are a builder, not a tuner: if the apparatus does not exist, you construct it — delegating a coding / agent-runtime build loop that runs to completion when the work is substantial.
|
|
9
|
+
|
|
10
|
+
This skill is what makes the Improve button honest at cold start. Held by the delegated builder; gated by `improve-conductor`; it hands off to `surface-evolution` only once the harness is validated.
|
|
11
|
+
|
|
12
|
+
## The two loops — never collapse them
|
|
13
|
+
|
|
14
|
+
- **BUILD loop** — construct + *validate* the harness: representative scenarios, **externally-grounded** gold, a scorer that passes the mutation test, a held-out split, and the wiring for the single surface to evolve. The done-criterion is **`measurement-validation` passes** — not "the files exist." When the build is substantial, mechanical, or long-running, **delegate it to an agent-runtime driven loop** that builds to completion in its own sandbox and returns a validated harness.
|
|
15
|
+
- **IMPROVE loop** — optimize the surface against the now-trusted harness (`surface-evolution`). **Starts only after BUILD exits clean.**
|
|
16
|
+
|
|
17
|
+
"I built an eval and it shows improvement," said in one breath, is the toy. The gate between the two loops is the product.
|
|
18
|
+
|
|
19
|
+
## Invariant (non-negotiable)
|
|
20
|
+
|
|
21
|
+
1. **No optimization spend until: (a) the target is user-confirmed and tied to a product-value claim, (b) the gold is grounded in EXTERNAL real truth, and (c) the harness passed `measurement-validation`.** The BUILD loop returns a *validated* harness or it isn't done.
|
|
22
|
+
2. **Gold is grounded in external reality** — the user's accepted past outputs, reference documents, real records, or human labels. **NEVER gold the agent generates and then optimizes against.** That is grading its own homework: the number always rises and means nothing. If you cannot name the external source of the gold, it is circular — stop.
|
|
23
|
+
3. **The target is the thing the user would REJECT a draft over** — not the easiest thing to measure. If you're scoring length / format / keyword presence while the user cares about correctness / usefulness / persuasiveness, you are building a toy. Re-elicit.
|
|
24
|
+
4. **If grounding does not exist, ACQUIRE it** — ask the user for examples, pull references, label a seed set — do not fabricate it. (This is where `@tangle-network/agent-app/knowledge-loop`'s source-grounded, propose-don't-apply acquisition plugs in.)
|
|
25
|
+
|
|
26
|
+
## Judgment (figure this out per product)
|
|
27
|
+
|
|
28
|
+
- What does the user *actually* value about this artifact? Extract it, confirm it, phrase it as a product-value claim. The user is often unsure — anticipate the decision-relevant quality and propose it.
|
|
29
|
+
- What external truth can ground it, and is there *enough* (the data threshold)? If not, what's the cheapest way to acquire real grounding?
|
|
30
|
+
- What's the *minimal real* harness — fewest scenarios, simplest scorer — that still measures the real thing? **Small and real beats big and toy.**
|
|
31
|
+
- Build inline, or delegate an agent-runtime loop to construct it? Delegate when it's substantial or long-running; the loop is accountable for returning a *validated* harness.
|
|
32
|
+
|
|
33
|
+
## Self-test
|
|
34
|
+
|
|
35
|
+
- **The "would the user agree?" test:** show the user 2–3 scored examples — one high, one low. Do they agree with the scores? If not, the measurement is wrong; fix it before optimizing. This single check kills most toys.
|
|
36
|
+
- **The mutation test:** an obviously-better and an obviously-worse artifact move the score in the right direction and magnitude. (A metric that doesn't move is measuring the wrong channel — see `eval-architect`.)
|
|
37
|
+
- **The non-circularity check:** name the external source of the gold. If you can't, it's circular — stop.
|
|
38
|
+
- **It RUNS, not just compiles:** a baseline produces a real, non-zero, plausible score against the grounded gold.
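The non-circularity check lends itself to a mechanical pre-flight. The sketch below assumes each gold record carries a provenance tag; the `GoldSource` labels mirror the external sources named in the invariants, while `GoldRecord` and `circularGold` are hypothetical names introduced here.

```ts
type GoldSource =
  | 'accepted-past-output'
  | 'reference-document'
  | 'real-record'
  | 'human-label'
  | 'agent-generated'

interface GoldRecord {
  requirementId: string
  source?: GoldSource // undefined means nobody can name where the gold came from
}

// Returns the requirement ids whose gold has no named source or was generated
// by the agent itself. A non-empty result means: stop, the harness is circular.
function circularGold(records: GoldRecord[]): string[] {
  return records
    .filter((r) => r.source === undefined || r.source === 'agent-generated')
    .map((r) => r.requirementId)
}
```

An empty array is necessary but not sufficient: a tag can be wrong, so the "would the user agree?" test still applies.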
|
|
39
|
+
|
|
40
|
+
## Evolves-by
|
|
41
|
+
|
|
42
|
+
When an improve loop later ships a "win" the user *rejects*, the bootstrap mis-framed the target or mis-grounded the gold — that rejection becomes a sharper elicitation / grounding rule. The bootstrap's judgment surface is optimized by the meta-eval *"did harnesses built this way produce lifts the user accepted as real?"* See `skill-evolution`.
|
|
43
|
+
|
|
44
|
+
## Why this is the accountable skill
|
|
45
|
+
|
|
46
|
+
The improver is tasked with *moving the thing the user wants moved* — end to end: elicit the target, ground it, build the apparatus (constructing it for real when it's missing), run the loop, report the honest lift, iterate to threshold or budget. It is accountable to the real improvement, not to "I ran a loop." But it will **not** spend the user's money optimizing a target it made up — it builds the right measurement first, or it tells the user what it needs to.
|
|
@@ -0,0 +1,116 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-campaign
|
|
3
|
+
description: Wire a product agent's self-improvement loop (measure → optimize → gate → ship) onto the shared @tangle-network/agent-app/eval-campaign scaffold. Use when adding or refactoring any product agent's eval/ loop.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Wiring a product onto the eval-campaign scaffold
|
|
7
|
+
|
|
8
|
+
You are integrating a product agent's self-improvement loop. The loop **engine already exists** in the substrate — do not rebuild it. Your job is to supply the three things only the product knows, and call one function.
|
|
9
|
+
|
|
10
|
+
## Mental model (read first)
|
|
11
|
+
|
|
12
|
+
`selfImprove` (from `@tangle-network/agent-eval/contract`, re-exported here) owns the entire cycle:
|
|
13
|
+
|
|
14
|
+
- the **train/holdout split** from a flat `scenarios` array,
|
|
15
|
+
- the **driver** (default `gepaDriver` from your `mutationPrimitives`),
|
|
16
|
+
- the **held-out production gate** (default `defaultProductionGate`, `deltaThreshold` 0.05),
|
|
17
|
+
- **durable provenance** + optional hosted ingest,
|
|
18
|
+
- every budget/seed/storage default.
|
|
19
|
+
|
|
20
|
+
A product brings exactly three things:
|
|
21
|
+
|
|
22
|
+
1. **`scenarios`** — your corpus (personas / cases / tasks) in the substrate `Scenario` shape.
|
|
23
|
+
2. **`agent`** — `(surface, scenario, ctx) => artifact`: run your agent under the current surface (a system-prompt addendum the loop optimizes) and return the artifact your judge scores. Report real cost via `ctx.cost.observe(...)` so the backend-integrity guard sees a real run.
|
|
24
|
+
3. **`judge`** — score an artifact on your rubric. Use `buildEnsembleJudge` (below) for a multi-model ensemble, or hand-write a `JudgeConfig` for a bespoke composite.
|
|
25
|
+
|
|
26
|
+
Everything else is a default you override only when you have a reason.
|
|
27
|
+
|
|
28
|
+
## The one import
|
|
29
|
+
|
|
30
|
+
```ts
import {
  selfImprove,
  buildEnsembleJudge,
  type SelfImproveOptions,
  type JudgeVerdict,
} from '@tangle-network/agent-app/eval-campaign'
```
|
|
38
|
+
|
|
39
|
+
> Requires `@tangle-network/agent-eval >= 0.81.0` (peer). The scaffold composes the substrate downward; never import a product package from agent-eval (layering rule).
|
|
40
|
+
|
|
41
|
+
## Minimal wiring (copy, then fill the three blanks)
|
|
42
|
+
|
|
43
|
+
```ts
// MyArtifact, MyScenario, JUDGE_MODELS, callMyJudge, loadMyScenarios, dispatchUnderSurface,
// MY_DIRECTIVES, and ship are placeholders the product supplies.
const RUBRIC = ['accuracy', 'grounding', 'tone'] as const
type Dim = (typeof RUBRIC)[number]

const judge = buildEnsembleJudge<MyArtifact, MyScenario, Dim>({
  name: 'my-product',
  rubric: RUBRIC,
  judgeReps: 3, // 3 uncorrelated judges → inter-rater bands
  async scoreOne({ artifact, scenario, rep }) {
    const model = JUDGE_MODELS[rep % JUDGE_MODELS.length] // vary the model per rep
    try {
      // callMyJudge → { accuracy, grounding, tone, note, cost }; keep only the rubric scores in perDimension
      const { note, cost, ...perDimension } = await callMyJudge(model, artifact, scenario)
      return { model, perDimension, rationale: note, costUsd: cost }
    } catch (err) {
      return { model, perDimension: null, rationale: String(err) } // failure ≠ zero
    }
  },
})

const result = await selfImprove<MyScenario, MyArtifact>({
  scenarios: loadMyScenarios(), // YOU own
  agent: dispatchUnderSurface, // YOU own: (surface, scenario, ctx) => artifact
  judge, // built above
  baselineSurface: '', // the addendum the loop optimizes (start empty)
  mutationPrimitives: MY_DIRECTIVES, // the optimization levers (default driver mutates toward these)
  runDir: process.env.MY_RUN_DIR, // a real path → durable provenance; omit → in-memory
  // budget / model / gate / hostedTenant all default; override only when needed
})

if (result.gate.decision === 'ship') await ship(result.winnerSurface)
```
|
|
74
|
+
|
|
75
|
+
## `buildEnsembleJudge` contract
|
|
76
|
+
|
|
77
|
+
- `scoreOne` is called `judgeReps` times per artifact; **vary the model by `rep`** so the ensemble is uncorrelated (judges sharing a base model share its bias).
|
|
78
|
+
- Return `{ model, perDimension: null }` to record a judge failure **without** killing the ensemble; the reducer averages over the surviving reps.
|
|
79
|
+
- The reducer (`aggregateJudgeVerdicts`) **throws only if every rep failed** → the campaign records a failed cell, never a silent zero.
|
|
80
|
+
- `weights` (partial) selects and weights the named dimensions; the default is uniform.
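To make the contract above concrete, here is a rough sketch of a survivor-mean reducer. It is not the implementation of `aggregateJudgeVerdicts`: the `RepVerdict` and `Aggregate` shapes, the name `aggregateSketch`, the reading of `weights` as "only named dimensions count", and summing cost over every rep that reports one are all assumptions made for illustration.

```ts
interface RepVerdict {
  model: string
  perDimension: Record<string, number> | null // null records a failed rep
  costUsd?: number
}

interface Aggregate {
  perDimension: Record<string, number>
  composite: number
  survivors: number
  costUsd: number
}

function aggregateSketch(
  reps: RepVerdict[],
  dims: readonly string[],
  weights: Record<string, number> = {},
): Aggregate {
  const ok = reps.filter((r) => r.perDimension !== null)
  // Fail loud: with no surviving rep there is nothing honest to report.
  if (ok.length === 0) throw new Error('every judge rep failed')

  // Per-dimension mean over the surviving reps only.
  const perDimension: Record<string, number> = {}
  for (const d of dims) {
    perDimension[d] = ok.reduce((sum, r) => sum + r.perDimension![d], 0) / ok.length
  }

  // Named weights select and weight dimensions; with none given, every dimension weighs 1.
  const named = dims.filter((d) => d in weights)
  const used = named.length > 0 ? named : dims
  const totalWeight = used.reduce((sum, d) => sum + (weights[d] ?? 1), 0)
  const composite =
    used.reduce((sum, d) => sum + perDimension[d] * (weights[d] ?? 1), 0) / totalWeight

  const costUsd = reps.reduce((sum, r) => sum + (r.costUsd ?? 0), 0)
  return { perDimension, composite, survivors: ok.length, costUsd }
}
```

The key property is that a failed rep shrinks the denominator instead of contributing a zero, so one judge outage cannot drag the score down.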
|
|
81
|
+
|
|
82
|
+
## Config reference (all `SelfImproveOptions`, all optional unless noted)
|
|
83
|
+
|
|
84
|
+
| Field | Default | When to set |
|---|---|---|
| `scenarios` | none (required) | your corpus |
| `agent` | none (required) | your dispatch under a surface |
| `judge` | none (required) | `buildEnsembleJudge` or a `JudgeConfig` |
| `baselineSurface` | none (required) | the surface the loop optimizes; start `''` |
| `mutationPrimitives` | gepaDriver's own | your optimization levers (additive directives) |
| `driver` | `gepaDriver` | pass `evolutionaryDriver({ mutator })` for blind addendum rotation |
| `gate` | `defaultProductionGate` (Δ 0.05) | `paretoSignificanceGate` for multi-objective; tune `deltaThreshold` for your rubric scale |
| `budget` | 3 gens × pop 2, 0.25 holdout | `budget.reps` (replicates → tighter CIs), `budget.promoteTopK`, `budget.holdoutScenarios` (explicit split), `budget.dollars` (cost cap) |
| `expectUsage` | **`'assert'`** | the fail-loud backend-integrity guard. Leave at `'assert'` for real runs (a stub cell throws); set `'off'` ONLY for a deterministic offline/replay run |
| `labeledStore` | off | capture every artifact + judge score (the dataset you ship + few-shot corpus); set `captureSource` (default `'eval-run'`) |
| `analyzeGeneration` | none | the per-generation findings producer (EYES→HANDS): plug a trace-analyst / HALO to refresh `ctx.findings` each round |
| `runDir` | `mem://…` (non-durable) | a real path to persist provenance + spans |
| `hostedTenant` | off | ship eval-run events to a hosted orchestrator |
| `collectWorkerRecords` | none | return the per-call `RunRecord`s your agent accumulated → real backend-integrity verdict |
| `onProgress` | none | stream baseline/generation/gate events to a UI |
|
|
101
|
+
|
|
102
|
+
## Fail-loud contract (do not break)
|
|
103
|
+
|
|
104
|
+
- In `agent`, report real cost via `ctx.cost.observe(costUsd, label)` + `ctx.cost.observeTokens(...)`. A dispatch that reports `{0,0}` trips `expectUsage` — that is the honest "ran against a stub" signal; never paper over it.
|
|
105
|
+
- A judge failure is `perDimension: null`, never a fabricated zero.
|
|
106
|
+
- Train and holdout must both be non-empty (`selfImprove` derives the split; supply enough scenarios).
|
|
107
|
+
|
|
108
|
+
## Anti-patterns (these are what this scaffold deletes)
|
|
109
|
+
|
|
110
|
+
- ❌ Hand-rolling `runImprovementLoop({...})` + `emitLoopProvenance({...})` + a train/holdout split. That is ~100 lines of identical boilerplate per product. Call `selfImprove`.
|
|
111
|
+
- ❌ A per-product copy of the judge-ensemble reducer (survivor-mean / disagreement / cost-sum). Use `buildEnsembleJudge` → `aggregateJudgeVerdicts`.
|
|
112
|
+
- ❌ `import type` from a product package inside the scaffold or substrate (upward dependency — forbidden).
|
|
113
|
+
|
|
114
|
+
## Where it lives in the product
|
|
115
|
+
|
|
116
|
+
One file: `eval/self-improve.ts`. It exports `runMyEval` (measure: `selfImprove` with `budget.generations = 0`, or `runCampaign`) and `runMySelfImprovement` (optimize: the wiring above). The product's harness/CLI calls these; nothing else duplicates the loop.
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: improve-conductor
|
|
3
|
+
description: The user-facing controller for the Improve button. Decide whether a request is improvable, translate a dollar budget into a run, read the verdict honestly, and promote or refuse with a reason. Never promise a lift you cannot measure.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Improve Conductor — own the user's trust
|
|
7
|
+
|
|
8
|
+
You are the agent the user talks to when they click **Improve**. You do not build the eval or run the loop yourself — you decide *whether* to, *how much* to spend, and *what to tell the user about the result*. The product you are protecting is **trust**, not lift. You would rather say "I could not prove an improvement — here is what another $X buys" than ship noise and call it a win.
|
|
9
|
+
|
|
10
|
+
You delegate the building to an agent holding `eval-architect` + `surface-evolution`; you both share `measurement-validation` as the honesty contract.
|
|
11
|
+
|
|
12
|
+
## Invariant (non-negotiable)
|
|
13
|
+
|
|
14
|
+
1. **Never promise or report a lift you cannot measure with valid paired evidence.** Surface the honest verdict: `ship` / `hold` / `need-more-data` / `invalid`. "Invalid" (incomplete or unpaired evidence) is a first-class outcome — say it plainly, never paper over it with a survivor-mean number.
|
|
15
|
+
2. **Refuse below the data threshold, and say why** — "I have N real outcomes; I won't optimize below M. Here's how to get to M." A refusal with a reason builds more trust than a fabricated win.
|
|
16
|
+
3. **Route correctly.** Improvable by surface-tuning → dispatch `surface-evolution`. Needs a new capability or architecture → escalate and say so; don't pretend tuning will fix a structural gap.
|
|
17
|
+
4. **No optimization spend before the target is confirmed and the measurement is real.** If there is no improvement infrastructure yet, you do NOT improvise a metric and start spending — you dispatch `eval-bootstrap` to BUILD a validated, externally-grounded harness first. The gate between "build the apparatus" and "spend optimizing" is yours to hold.
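One way to picture the four invariants is as a single routing decision. The sketch below is illustrative only: `RequestState`, `Action`, `route`, and especially the order in which the checks fire are assumptions, not behavior defined by the package.

```ts
type Action =
  | { kind: 'escalate'; reason: string }
  | { kind: 'bootstrap'; reason: string }
  | { kind: 'refuse'; reason: string }
  | { kind: 'optimize' }

interface RequestState {
  structuralGap: boolean // the agent cannot do the task at all
  targetConfirmed: boolean // the user confirmed what "better" means
  harnessValidated: boolean // measurement-validation has passed
  realOutcomes: number // N
  dataThreshold: number // M
}

function route(s: RequestState): Action {
  if (s.structuralGap) {
    return { kind: 'escalate', reason: 'needs a new capability or architecture, not tuning' }
  }
  if (!s.targetConfirmed || !s.harnessValidated) {
    return { kind: 'bootstrap', reason: 'no confirmed target or validated harness yet' }
  }
  if (s.realOutcomes < s.dataThreshold) {
    return {
      kind: 'refuse',
      reason: `I have ${s.realOutcomes} real outcomes; I won't optimize below ${s.dataThreshold}.`,
    }
  }
  return { kind: 'optimize' }
}
```

Only the `optimize` branch spends money; every other branch returns a reason the conductor can show the user verbatim.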
|
|
18
|
+
|
|
19
|
+
## Cold start — no infrastructure yet
|
|
20
|
+
|
|
21
|
+
The most dangerous request is "improve this" for a product with no eval. The wrong move is to invent a metric and start a loop — you'll perfect a proxy and report a fake win. The right move is a strict two-step you orchestrate:
|
|
22
|
+
|
|
23
|
+
1. **Frame + build (no spend):** confirm with the user *what "better" means* — the thing they'd reject a draft over, tied to a product-value claim — then dispatch `eval-bootstrap` (often a delegated agent-runtime build loop) to construct a harness grounded in **external truth**, exiting only when `measurement-validation` passes. The improver is a *builder* here, not a tuner.
|
|
24
|
+
2. **Then optimize (spend):** only once the harness is validated, dispatch `surface-evolution` against it.
|
|
25
|
+
|
|
26
|
+
Never let the user believe step 2 happened when only a toy of step 1 did. If you can't yet build a real measurement (no grounding, target unclear), say so and ask for what you need — that's the honest move, not a loop against an invented number.
|
|
27
|
+
|
|
28
|
+
## Judgment (figure this out per request)
|
|
29
|
+
|
|
30
|
+
- Is this a surface-tuning problem or an architectural one? (If the agent literally cannot do the task, no prompt edit fixes it.)
|
|
31
|
+
- Translate the user's dollars into a run: more spend buys a wider candidate search and more reps, which means a tighter CI and a higher chance of clearing the gate. $0.20 ≈ one quick generation on a couple of scenarios; $50 ≈ a multi-generation search with a held-out gate that can actually reach significance.
|
|
32
|
+
- When to stop: threshold met, plateaued, or budget exhausted — and report which.
|
|
33
|
+
|
|
34
|
+
## Self-test
|
|
35
|
+
|
|
36
|
+
- **Before spending,** you can state out loud: the metric, its variance, the threshold, the held-out set, and what this budget buys. If you can't, you're not ready to charge for the click.
|
|
37
|
+
- **After,** you report the gated lift with its CI and the decision's *reason*. If the run came back `invalid` (a cell errored, evidence unpaired), you tell the user that and offer the re-run — you do not quote the broken number.
|
|
38
|
+
|
|
39
|
+
## Evolves-by
|
|
40
|
+
|
|
41
|
+
User accept/reject of promotions; spend→lift efficiency; the rate of `invalid` runs. A rising invalid rate is a signal the measurement or the infra needs hardening — route it back to `measurement-validation` / `eval-architect`, don't absorb it silently. See `skill-evolution`.
|
|
42
|
+
|
|
43
|
+
## Why this is calibrated, not timid
|
|
44
|
+
|
|
45
|
+
A naive Improve button maximizes the displayed number and tells the user "improved +47%". The disciplined one, faced with the same +47, checks the evidence, finds it unpaired, and says "I found a promising candidate but can't yet prove it beats baseline — $X more will confirm it." The second one is the one people pay for twice.
|
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: measurement-validation
|
|
3
|
+
description: Prove a measurement is sound BEFORE spending money optimizing against it. The gate that decides whether an Improve run is allowed to start, and whether its result is allowed to be believed. Refuse metrics whose noise exceeds the effect, that have no held-out split, or whose evidence is incomplete.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Measurement Validation — earn the right to optimize
|
|
7
|
+
|
|
8
|
+
Optimization is only as trustworthy as the measurement under it. This skill is the gate on both ends: **before** a paid run (is this metric allowed to be optimized?) and **after** (is this result allowed to be believed?). It is the difference between an Improve button that is a product and one that is a slot machine.
|
|
9
|
+
|
|
10
|
+
Held by both the orchestrator (`improve-conductor`) and the builder (`eval-architect`). It is the shared honesty contract.
|
|
11
|
+
|
|
12
|
+
## Invariant (non-negotiable)
|
|
13
|
+
|
|
14
|
+
1. **Refuse to optimize if CV(metric), the metric's run-to-run coefficient of variation, exceeds the target delta.** If the run-to-run noise is bigger than the effect you're paying to move, the metric *cannot* validate the change: raise reps or fix the metric first. Do not tune against noise.
|
|
15
|
+
2. **Refuse to report a lift over INCOMPLETE or UNPAIRED evidence.** Every held-out scenario must have a non-errored cell on *both* the baseline and the candidate side. With fewer valid pairs than the paired-n floor (at least 3), the run is **invalid**, not a verdict. A lift computed over survivors is worse than no number.
|
|
16
|
+
> **Enforced by** `trustVerdicts` from `@tangle-network/agent-app/eval-campaign` — the after-gate: IRR floor + per-item rater spread (within-item, never pooled) + survivor floor, with each failed check named in `trustReasons`.
|
|
17
|
+
3. **Every metric ties to a product-value claim** — "if this number moves, *this* user-visible outcome moves with it." No claim → it's a proxy → don't optimize it.
|
|
18
|
+
4. **Below the data threshold of real outcomes, refuse to optimize** — state N and say why. You cannot improve what you have not yet observed enough of.
|
|
19
|
+
|
|
20
|
+
> Worked failures (this is why the skill exists):
|
|
21
|
+
> - **Noise read as signal:** ~6 optimization rounds were burned chasing ±0.15 run-to-run swings as if they were real. The metric's variance was 3× any prompt delta — every conclusion was unprovable. The bug was the *measurement*, not the model.
|
|
22
|
+
> - **A lift that was a lie:** a GEPA run reported `heldOutLift = +47`. Reading the actual cells: 2 of 4 held-out cells had errored, so "baseline" was *delaware alone* (42) and "winner" was *saas alone* (89) — two different personas. The +47 was differencing unlike cells. The gate correctly held (0 valid pairs), but the headline number a naive promoter would have shipped was fiction.
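The second failure can be reproduced in a few lines. This sketch is not the `trustVerdicts` implementation; `Cell`, `pairedLift`, and `survivorMeanLift` are hypothetical names, and the exact layout of which two cells errored is an assumption reconstructed from the description (baseline survived only on delaware, the candidate only on saas).

```ts
interface Cell {
  scenario: string
  score: number | null // null marks an errored cell
}

type LiftResult =
  | { status: 'invalid'; pairedN: number; reason: string }
  | { status: 'ok'; pairedN: number; lift: number }

// Mean of per-scenario differences, using only scenarios scored on BOTH sides.
function pairedLift(baseline: Cell[], candidate: Cell[], pairedFloor = 3): LiftResult {
  const base = new Map(baseline.map((c) => [c.scenario, c.score] as const))
  const deltas: number[] = []
  for (const c of candidate) {
    const b = base.get(c.scenario)
    if (c.score !== null && b !== null && b !== undefined) deltas.push(c.score - b)
  }
  if (deltas.length < pairedFloor) {
    const reason = `only ${deltas.length} valid pairs, floor is ${pairedFloor}`
    return { status: 'invalid', pairedN: deltas.length, reason }
  }
  const lift = deltas.reduce((a, d) => a + d, 0) / deltas.length
  return { status: 'ok', pairedN: deltas.length, lift }
}

// The naive headline: mean of surviving candidate cells minus mean of surviving baseline cells.
function survivorMeanLift(baseline: Cell[], candidate: Cell[]): number {
  const mean = (cells: Cell[]) => {
    const ok = cells.filter((c) => c.score !== null).map((c) => c.score as number)
    return ok.reduce((a, s) => a + s, 0) / ok.length
  }
  return mean(candidate) - mean(baseline)
}

const baselineCells: Cell[] = [
  { scenario: 'delaware', score: 42 },
  { scenario: 'saas', score: null },
]
const candidateCells: Cell[] = [
  { scenario: 'delaware', score: null },
  { scenario: 'saas', score: 89 },
]
```

On these cells `survivorMeanLift` reports 47 while `pairedLift` reports `invalid` with zero valid pairs, which is the gap between the headline and the evidence.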
|
|
23
|
+
|
|
24
|
+
## Judgment (figure this out per metric)
|
|
25
|
+
|
|
26
|
+
- How many reps establish variance for *this* metric? (Noisy targets need 5+, converged-artifact metrics fewer.)
|
|
27
|
+
- Is an observed "noisy" result model variance, or a measurement smell? **Default: suspect the metric** until its CI is shown tighter than the effect.
|
|
28
|
+
- Where might this metric diverge from real value (the Goodhart risk specific to this product)?
|
|
29
|
+
|
|
30
|
+
## Self-test
|
|
31
|
+
|
|
32
|
+
- Report **mean ± 95% CI over K converged rollouts.** Show CI < target delta *before* greenlighting spend. If you can't, you haven't earned the right to optimize yet.
|
|
33
|
+
- Confirm the held-out split is disjoint from training and large enough that the paired-n floor survives an errored cell.
|
|
34
|
+
- **Verify against ground truth, never the summary.** Read the actual cells / artifacts, not the provenance headline. (The +47 above was sitting right there in the summary; only the cells revealed it was unpaired. A green build-hook is not a successful build; a typechecking harness is not a running one; a reported lift is not a measured lift.)
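The first self-test reduces to a small calculation. The sketch below assumes "CI < target delta" refers to the half-width of a 95% interval and uses a normal approximation (z = 1.96); for small K a Student-t quantile would give a wider, more honest interval. `meanWithCi95` and `mayOptimize` are hypothetical names.

```ts
interface Interval {
  mean: number
  halfWidth: number // the "±" part of mean ± 95% CI
}

// Sample mean and normal-approximation 95% half-width over K rollout scores.
function meanWithCi95(scores: number[]): Interval {
  const k = scores.length
  if (k < 2) throw new Error('need at least 2 rollouts to estimate variance')
  const mean = scores.reduce((a, s) => a + s, 0) / k
  // Unbiased sample variance (divide by K - 1).
  const variance = scores.reduce((a, s) => a + (s - mean) ** 2, 0) / (k - 1)
  return { mean, halfWidth: 1.96 * Math.sqrt(variance / k) }
}

// Greenlight spend only when the noise band is tighter than the effect being chased.
function mayOptimize(scores: number[], targetDelta: number): boolean {
  return meanWithCi95(scores).halfWidth < targetDelta
}
```

For example, five rollouts scoring 0.4, 0.7, 0.55, 0.85, and 0.5 give a half-width of about 0.155, so a target delta of 0.05 cannot be validated without more reps or a less noisy metric.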
|
|
35
|
+
|
|
36
|
+
## Evolves-by
|
|
37
|
+
|
|
38
|
+
Track promotions that passed validation but regressed in production → that's a missed variance source or an unguarded dimension; strengthen the preflight. The validation bar itself is a surface that tightens from its own misses. See `skill-evolution`.
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: skill-evolution
|
|
3
|
+
description: How every skill in the Improve family stays agentic and general instead of rotting into a brittle rulebook. A skill is a measured hypothesis — a few human-owned invariants plus a wide loop-owned judgment surface that improves from outcome data via its own meta-eval. This is the recursion that lets the agent builder learn to build improvable agents.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Skill Evolution — the skill is a surface, too
|
|
7
|
+
|
|
8
|
+
A skill written as a fixed checklist is brittle: it can't handle the product nobody anticipated, and it goes stale silently. A skill written as vibes is unaccountable. The resolution is to give every skill the *same* structure the Improve loop optimizes — a small frozen core and a wide evolvable surface — and then point the loop at the skill itself.
|
|
9
|
+
|
|
10
|
+
This is the meta-skill. It governs `eval-architect`, `measurement-validation`, `surface-evolution`, and `improve-conductor`, and it is what makes them general rather than legal-specific (or any-product-specific).
|
|
11
|
+
|
|
12
|
+
## The 4-part contract (every skill in this family follows it)
|
|
13
|
+
|
|
14
|
+
- **Invariant**: the handful of laws that, if violated, turn the loop into a slot machine. **Human-owned. Frozen.** Few. ("Gate on held-out." "Score the real deliverable." "Fail loud on incomplete evidence." "Never promote a regression on a guarded dimension.")
|
|
15
|
+
- **Judgment** — what the agent figures out for *this* product. **Loop-owned. Wide.** This is the agentic surface — the place the agent is *supposed* to think, not follow steps.
|
|
16
|
+
- **Self-test** — a checkable signal the agent actually ran to verify it did the work right (the mutation test, CI < delta, diff-the-deployed-surface). Not "I followed the procedure" — a *result*.
|
|
17
|
+
- **Evolves-by** — the outcome data that updates the *judgment* surface. Never the invariants.
|
|
18
|
+
|
|
19
|
+
**The split is the answer to "how do you keep it agentic and not a dumb rulebook":** few invariants hold the line; judgment is broad and loop-owned; outcomes are measured; the judgment surface self-revises. The agent stays free to solve the novel case — it just cannot violate the handful of laws that make optimization mean anything.
|
|
20
|
+
|
|
21
|
+
## The recursion
|
|
22
|
+
|
|
23
|
+
Each skill's *judgment* surface is itself an evolvable surface, optimized by the **same loop the skill describes**, with a verifiable reward:
|
|
24
|
+
|
|
25
|
+
> *Did following this skill produce an eval that yielded real held-out lift, with no critical-dimension regression?*
|
|
26
|
+
|
|
27
|
+
Above a data threshold of real runs, the skill proposes revisions to its own judgment — gated identically (held-out, critical-dimension floor, paired-n ≥ floor). **Invariants are the frozen surface; judgment is the evolvable surface.** A skill improving itself is just `surface-evolution` pointed inward.
|
|
28
|
+
|
|
29
|
+
## The north-star: the agent builder, closed-loop
|
|
30
|
+
|
|
31
|
+
The agent builder builds agents *and* the evals that improve them. Its success is **not** "wrote an eval file." It is the verifiable reward above, applied to the agent it just built:
|
|
32
|
+
|
|
33
|
+
> *The eval the builder produced yields real held-out lift on the agent the builder built — no Goodhart regression.*
|
|
34
|
+
|
|
35
|
+
The fleet — **legal, tax, gtm, creative, insurance** — is the training distribution. Each is `{an agent + a known set of gaps}`. The builder's score is how much of each gap its produced eval+loop closes on held-out scenarios. The first dogfood data point already exists: legal-agent's loop, repaired in the session that produced these skills, found a transferable jurisdictional-divergence rule and the gate *correctly refused to ship it until the evidence was valid* — a clean demonstration that the builder's reward must be "real, evidence-backed lift," never "a number went up."
|
|
36
|
+
|
|
37
|
+
## Anti-patterns (the rulebook smells)
|
|
38
|
+
|
|
39
|
+
- A skill that lists steps but has **no self-test** — you can't tell if following it worked.
|
|
40
|
+
- An "invariant" that's really a **judgment call in disguise** — over-constraining; it should live in Judgment so the agent can adapt it per product.
|
|
41
|
+
- A judgment surface with **no evolves-by hook** — it will rot, and nothing will notice.
|
|
42
|
+
- A reported lift with **no held-out or no paired-n** — the slot machine. This is the one that ends the product.
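
The first and third smells above are mechanical enough to check in code. The following is a minimal sketch, not something the package ships: the `SkillContract` type and `lintSkill` function are hypothetical names that only mirror the four parts of the contract. The other two smells (a judgment call disguised as an invariant, a lift without held-out or paired-n) need human review or real run data and are not covered here.

```typescript
// Hypothetical shape mirroring the 4-part contract; not defined by the package.
interface SkillContract {
  name: string;
  invariants: string[];     // human-owned, frozen
  judgment: string[];       // loop-owned questions answered per product
  selfTests: string[];      // checkable results, not "I followed the procedure"
  evolvesBy: string | null; // outcome data that updates the judgment surface
}

// Returns the list of rulebook smells found; an empty list means none detected.
function lintSkill(skill: SkillContract): string[] {
  const smells: string[] = [];
  if (skill.selfTests.length === 0) {
    smells.push("no self-test: cannot tell whether following the skill worked");
  }
  const hasHook = Boolean(skill.evolvesBy?.trim());
  if (skill.judgment.length > 0 && !hasHook) {
    smells.push("judgment surface has no evolves-by hook: it will rot unnoticed");
  }
  return smells;
}
```

A check like this could run over each skill's parsed front matter in CI, assuming the skills are parsed into this shape first.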
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: surface-evolution
|
|
3
|
+
description: Optimize ONE evolvable surface (a prompt section / tool config) against a validated measurement, gate winners on held-out evidence plus a critical-dimension floor, and promote without offline/online drift. Use to run an Improve loop once eval-architect + measurement-validation pass.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Surface Evolution — run the gated loop, promote without drift
|
|
7
|
+
|
|
8
|
+
You are a closed-loop controller for agent quality. **Sensor** = the eval (built by `eval-architect`, certified by `measurement-validation`). **Controller** = the driver that proposes surface rewrites. **Actuator** = promotion (writing the surface to the live agent). **Safety interlock** = the gate. The interlock is the entire point: it prefers *under-promotion* to Goodhart. A loop that ships every apparent gain is worthless; a loop that ships only evidence-backed gains is the product.
|
|
9
|
+
|
|
10
|
+
The engine exists in the substrate (`@tangle-network/agent-eval/contract` `selfImprove` / `runImprovementLoop`, `gepaDriver`, `defaultProductionGate`) — re-exported via `@tangle-network/agent-app/eval-campaign`. **Do not rebuild it.** This skill is how you wire and run it safely.
|
|
11
|
+
|
|
12
|
+
## Invariant (non-negotiable)
|
|
13
|
+
|
|
14
|
+
1. **Optimize exactly ONE surface that production renders identically.** The artifact you mutate offline must be the artifact the live agent loads — one source, rendered both places (e.g. an evolvable prompt section materialized from a single file into the live system prompt). If offline and online diverge, the lift is fictional the moment it ships.
|
|
15
|
+
2. **Gate promotion on a held-out split AND a critical-dimension floor.** Never promote a net composite gain that regresses a guarded dimension (safety, hallucination, the regulated invariant). A +10 composite that loses 30 on hallucination is a regression, not a win.
|
|
16
|
+
3. **Budget is a hard ceiling and cost-aware** — skip cells beyond the ceiling, never abort. The user's spend maps to generations × candidates × reps: $0.20 buys one quick generation; $50 buys a wide search with tight CIs.
|
|
17
|
+
4. **Never evolve a frozen surface.** The regulated invariants — human-in-the-loop, the compliance gate, auth/RBAC — are off-limits. Declare exactly what is evolvable; everything else the loop must not touch.
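
Invariant 3 above (a hard ceiling that skips cells rather than aborting) can be sketched as a planning step. This is an illustration under assumptions, not the engine's actual budget logic: `planCells`, `Cell`, and `BudgetPlan` are hypothetical names, and a flat per-cell cost in integer cents stands in for real token-based pricing.

```typescript
// Hypothetical types; the real loop's budget handling is not shown in this diff.
interface Cell {
  generation: number;
  candidate: number;
  rep: number;
}

interface BudgetPlan {
  run: Cell[];      // cells that fit under the ceiling
  skipped: Cell[];  // cells beyond the ceiling: skipped, never an abort
  spentCents: number;
}

// Enumerate generations x candidates x reps and admit cells until the
// ceiling is reached. Integer cents avoid floating-point drift in the sum.
function planCells(
  budgetCents: number,
  costPerCellCents: number,
  generations: number,
  candidates: number,
  reps: number,
): BudgetPlan {
  const plan: BudgetPlan = { run: [], skipped: [], spentCents: 0 };
  for (let generation = 0; generation < generations; generation++) {
    for (let candidate = 0; candidate < candidates; candidate++) {
      for (let rep = 0; rep < reps; rep++) {
        const cell: Cell = { generation, candidate, rep };
        if (plan.spentCents + costPerCellCents <= budgetCents) {
          plan.run.push(cell);
          plan.spentCents += costPerCellCents;
        } else {
          plan.skipped.push(cell);
        }
      }
    }
  }
  return plan;
}
```

Returning the skipped cells explicitly keeps the shortfall visible, so a gate can later report how much evidence was not collected instead of scoring over partial data silently.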
|
|
18
|
+
|
|
19
|
+
> Worked result (this is the skill working): a GEPA run on the legal addendum proposed a "multi-jurisdictional divergence handling" section, diagnosed from the worst *training* personas. The gate guarded `hallucination_free` (no regression) and required held-out significance. When the held-out evidence came back incomplete, it **refused to ship** — exactly right. Trust > lift.
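
The gate behaviour in the worked result can be sketched as follows. This is a simplified illustration, not the `defaultProductionGate` implementation: every name here (`PairedRun`, `GateConfig`, `gateOnHeldOut`) is hypothetical, and a plain mean-lift threshold stands in for a real significance test.

```typescript
// Hypothetical types for illustration; not the @tangle-network/agent-eval API.
type Scores = Record<string, number>;

interface PairedRun {
  baseline: Scores;  // a held-out scenario scored with the current surface
  candidate: Scores; // the same scenario scored with the proposed surface
}

interface GateConfig {
  compositeKey: string;
  guarded: string[];     // dimensions that must not regress
  minPairedRuns: number; // paired-n floor
  minLift: number;       // required mean composite lift (stand-in for significance)
}

interface Verdict {
  promote: boolean;
  reason: string;
}

const score = (s: Scores, key: string): number => s[key] ?? Number.NaN;
const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

function gateOnHeldOut(runs: PairedRun[], cfg: GateConfig): Verdict {
  // Incomplete evidence: a run counts only if every needed score is present.
  const keys = [cfg.compositeKey, ...cfg.guarded];
  const valid = runs.filter((r) =>
    keys.every((k) => Number.isFinite(score(r.baseline, k)) && Number.isFinite(score(r.candidate, k))),
  );
  const floor = Math.max(1, cfg.minPairedRuns);
  if (valid.length < floor) {
    return { promote: false, reason: `${valid.length} valid paired runs (need ${floor})` };
  }
  // Critical-dimension floor: checked before the composite is even considered.
  for (const dim of cfg.guarded) {
    const delta = mean(valid.map((r) => score(r.candidate, dim) - score(r.baseline, dim)));
    if (delta < 0) return { promote: false, reason: `regressed guarded dimension: ${dim}` };
  }
  const lift = mean(valid.map((r) => score(r.candidate, cfg.compositeKey) - score(r.baseline, cfg.compositeKey)));
  if (lift < cfg.minLift) {
    return { promote: false, reason: `held-out lift ${lift} below required ${cfg.minLift}` };
  }
  return { promote: true, reason: `held-out lift ${lift} with no guarded regression` };
}
```

The ordering is the design choice: evidence completeness first, guarded dimensions second, composite lift last, so a large composite gain can never mask a missing run or a guarded regression.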
|
|
20
|
+
|
|
21
|
+
## Judgment (figure this out per product)
|
|
22
|
+
|
|
23
|
+
- Which surface is safe to evolve (a guidance section, a tool description, a config knob) vs frozen (the invariants above)?
|
|
24
|
+
- How to scope budget to the gap — one generation to confirm a hunch, many to search a wide space?
|
|
25
|
+
- When has it converged or plateaued? If surface-tuning is exhausted and the gap is architectural, escalate — don't keep spending on a surface that's maxed.
|
|
26
|
+
|
|
27
|
+
## Self-test
|
|
28
|
+
|
|
29
|
+
- **After a promotion, the live agent renders the exact winning surface** — diff the deployed artifact against the promoted one. They must be byte-identical.
|
|
30
|
+
- **The held-out lift reproduces on a fresh run** — a one-shot gain is noise until it repeats.
|
|
31
|
+
- **The gate's rejections are honest** — a held verdict carries a stated reason ("0 valid paired runs", "regressed guarded dimension"), never a silent pass and never a lift over partial data.
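
The first self-test above (deployed surface byte-identical to the promoted one) reduces to a byte comparison once both artifacts are in hand. A minimal sketch, assuming both have already been read into `Uint8Array` buffers; how the deployed artifact is fetched is product-specific and not shown, and `firstByteDifference` is a hypothetical helper name.

```typescript
// Returns -1 when the two buffers are byte-identical, otherwise the index of
// the first differing byte (or the shorter length when one is a prefix).
function firstByteDifference(promoted: Uint8Array, deployed: Uint8Array): number {
  const shared = Math.min(promoted.length, deployed.length);
  for (let i = 0; i < shared; i++) {
    if (promoted[i] !== deployed[i]) return i;
  }
  return promoted.length === deployed.length ? -1 : shared;
}
```

Reporting the offset rather than a bare boolean makes drift easier to diagnose, for example trailing whitespace added by a template renderer.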
|
|
32
|
+
|
|
33
|
+
## Evolves-by
|
|
34
|
+
|
|
35
|
+
Production outcomes of shipped surfaces (did the held-out lift actually hold live?) feed back into the driver's mutation priors and the gate's thresholds. A surface that lifted offline but flatlined live tells you the held-out set wasn't representative — widen it. See `skill-evolution`.
|
|
@@ -6,11 +6,98 @@ import {
|
|
|
6
6
|
|
|
7
7
|
// src/runtime/model.ts
|
|
8
8
|
var DEFAULT_TANGLE_ROUTER_BASE_URL = "https://router.tangle.tools/v1";
|
|
9
|
+
var DEFAULT_TANGLE_BILLING_ENFORCEMENT_ENV_VAR = "TANGLE_BILLING_ENFORCEMENT";
|
|
9
10
|
function requireEnv(env, name) {
|
|
10
11
|
const value = env[name]?.trim();
|
|
11
12
|
if (!value) throw new Error(`${name} is required`);
|
|
12
13
|
return value;
|
|
13
14
|
}
|
|
15
|
+
function trimOrNull(value) {
|
|
16
|
+
const trimmed = value?.trim();
|
|
17
|
+
return trimmed ? trimmed : null;
|
|
18
|
+
}
|
|
19
|
+
function isTangleExecutionKeyErrorCode(value) {
|
|
20
|
+
return value === "local_tangle_api_key_required" || value === "tangle_account_not_connected";
|
|
21
|
+
}
|
|
22
|
+
var TangleExecutionKeyError = class extends Error {
|
|
23
|
+
code;
|
|
24
|
+
status;
|
|
25
|
+
constructor(code, message, status) {
|
|
26
|
+
super(message);
|
|
27
|
+
this.name = "TangleExecutionKeyError";
|
|
28
|
+
this.code = code;
|
|
29
|
+
this.status = status;
|
|
30
|
+
}
|
|
31
|
+
};
|
|
32
|
+
function isTangleExecutionKeyError(error) {
|
|
33
|
+
return error instanceof TangleExecutionKeyError || typeof error === "object" && error !== null && error.name === "TangleExecutionKeyError" && typeof error.message === "string" && isTangleExecutionKeyErrorCode(error.code) && typeof error.status === "number";
|
|
34
|
+
}
|
|
35
|
+
function resolveTangleExecutionEnvironment(env = process.env) {
|
|
36
|
+
const raw = (env.APP_ENV ?? env.NODE_ENV ?? "").trim().toLowerCase();
|
|
37
|
+
if (raw === "development" || raw === "dev" || raw === "local") return "development";
|
|
38
|
+
if (raw === "staging") return "staging";
|
|
39
|
+
if (raw === "test") return "test";
|
|
40
|
+
return "production";
|
|
41
|
+
}
|
|
42
|
+
function isTangleBillingEnforcementDisabled(opts = {}) {
|
|
43
|
+
const env = opts.env ?? process.env;
|
|
44
|
+
const enforcementEnvVar = opts.enforcementEnvVar ?? DEFAULT_TANGLE_BILLING_ENFORCEMENT_ENV_VAR;
|
|
45
|
+
const override = env[enforcementEnvVar]?.trim().toLowerCase();
|
|
46
|
+
if (override === "disabled") return true;
|
|
47
|
+
if (override === "enabled") return false;
|
|
48
|
+
return resolveTangleExecutionEnvironment(env) === "development";
|
|
49
|
+
}
|
|
50
|
+
function tangleExecutionKeyHttpError(error) {
|
|
51
|
+
if (!isTangleExecutionKeyError(error)) return null;
|
|
52
|
+
return {
|
|
53
|
+
status: error.status,
|
|
54
|
+
body: {
|
|
55
|
+
error: error.message,
|
|
56
|
+
code: error.code
|
|
57
|
+
}
|
|
58
|
+
};
|
|
59
|
+
}
|
|
60
|
+
async function resolveUserTangleExecutionKey(opts) {
|
|
61
|
+
const env = opts.env ?? process.env;
|
|
62
|
+
const environment = opts.environment ?? resolveTangleExecutionEnvironment(env);
|
|
63
|
+
if (environment === "development") {
|
|
64
|
+
const apiKey2 = trimOrNull(env.TANGLE_API_KEY);
|
|
65
|
+
if (apiKey2) return { apiKey: apiKey2, source: "local-env" };
|
|
66
|
+
}
|
|
67
|
+
const apiKey = trimOrNull(await opts.getUserApiKey());
|
|
68
|
+
if (apiKey) return { apiKey, source: "user" };
|
|
69
|
+
if (environment === "development") {
|
|
70
|
+
throw new TangleExecutionKeyError(
|
|
71
|
+
"local_tangle_api_key_required",
|
|
72
|
+
"TANGLE_API_KEY or a linked Tangle account is required for local Tangle model execution.",
|
|
73
|
+
503
|
|
74
|
+
);
|
|
75
|
+
}
|
|
76
|
+
throw new TangleExecutionKeyError(
|
|
77
|
+
"tangle_account_not_connected",
|
|
78
|
+
"Connect your Tangle account before invoking this agent.",
|
|
79
|
+
401
|
|
80
|
+
);
|
|
81
|
+
}
|
|
82
|
+
async function resolveUserTangleExecutionKeyForUser(opts) {
|
|
83
|
+
return resolveUserTangleExecutionKey({
|
|
84
|
+
environment: opts.environment,
|
|
85
|
+
env: opts.env,
|
|
86
|
+
getUserApiKey: () => opts.getUserApiKey(opts.userId)
|
|
87
|
+
});
|
|
88
|
+
}
|
|
89
|
+
function createTangleRouterModelConfig(opts) {
|
|
90
|
+
const apiKey = opts.apiKey.trim();
|
|
91
|
+
if (!apiKey) throw new Error("apiKey is required");
|
|
92
|
+
const model = opts.model.trim();
|
|
93
|
+
if (!model) throw new Error("model is required");
|
|
94
|
+
return {
|
|
95
|
+
provider: "openai-compat",
|
|
96
|
+
model,
|
|
97
|
+
apiKey,
|
|
98
|
+
baseUrl: (opts.baseUrl?.trim() || DEFAULT_TANGLE_ROUTER_BASE_URL).replace(/\/+$/, "")
|
|
99
|
+
};
|
|
100
|
+
}
|
|
14
101
|
function resolveTangleModelConfig(opts = {}) {
|
|
15
102
|
const env = opts.env ?? process.env;
|
|
16
103
|
const provider = env.MODEL_PROVIDER?.trim() || "openai-compat";
|
|
@@ -265,6 +352,15 @@ ${lines.join("\n")}` });
|
|
|
265
352
|
|
|
266
353
|
export {
|
|
267
354
|
DEFAULT_TANGLE_ROUTER_BASE_URL,
|
|
355
|
+
DEFAULT_TANGLE_BILLING_ENFORCEMENT_ENV_VAR,
|
|
356
|
+
TangleExecutionKeyError,
|
|
357
|
+
isTangleExecutionKeyError,
|
|
358
|
+
resolveTangleExecutionEnvironment,
|
|
359
|
+
isTangleBillingEnforcementDisabled,
|
|
360
|
+
tangleExecutionKeyHttpError,
|
|
361
|
+
resolveUserTangleExecutionKey,
|
|
362
|
+
resolveUserTangleExecutionKeyForUser,
|
|
363
|
+
createTangleRouterModelConfig,
|
|
268
364
|
resolveTangleModelConfig,
|
|
269
365
|
toLoopEvents,
|
|
270
366
|
createOpenAICompatStreamTurn,
|
|
@@ -272,4 +368,4 @@ export {
|
|
|
272
368
|
runAppToolLoop,
|
|
273
369
|
streamAppToolLoop
|
|
274
370
|
};
|
|
275
|
-
//# sourceMappingURL=chunk-
|
|
371
|
+
//# sourceMappingURL=chunk-SVCJYRVM.js.map
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"sources":["../src/runtime/model.ts","../src/runtime/openai-stream.ts","../src/runtime/agent.ts","../src/runtime/index.ts"],"sourcesContent":["/**\n * Resolve the model config a Tangle agent's sandbox/runtime runs on.\n *\n * Every Tangle agent product resolves the SAME thing from env: the Tangle Router\n * (OpenAI-compatible, metered at the platform markup against a single\n * `TANGLE_API_KEY`) by default, with a direct-Anthropic BYOK escape hatch. The\n * shape feeds the sandbox SDK's `backend.model`. Lifted here so no product\n * hand-rolls the env parsing + the router default.\n */\n\nexport interface TangleModelConfig {\n /** The Tangle Router is OpenAI-compatible → driven via `openai-compat`.\n * `anthropic` is the BYOK escape hatch. */\n provider: 'openai-compat' | 'anthropic'\n model: string\n apiKey: string\n baseUrl: string\n}\n\nexport type TangleExecutionEnvironment = 'development' | 'staging' | 'production' | 'test'\nexport type TangleExecutionKeySource = 'local-env' | 'user'\nexport type TangleExecutionKeyErrorCode =\n | 'local_tangle_api_key_required'\n | 'tangle_account_not_connected'\n\nexport interface ResolveModelOptions {\n /** Env to read (defaults to process.env). */\n env?: Record<string, string | undefined>\n /** Router base URL default when `TANGLE_ROUTER_BASE_URL` is unset. */\n defaultRouterBaseUrl?: string\n}\n\nexport interface ResolveUserTangleExecutionKeyOptions {\n /** Deployment context. Only local development may fall back to env keys. */\n environment?: TangleExecutionEnvironment\n /** Env to read for the local-development fallback. */\n env?: Record<string, string | undefined>\n /** App-owned lookup for the caller's linked platform API key. 
*/\n getUserApiKey: () => string | null | undefined | Promise<string | null | undefined>\n}\n\nexport interface ResolveUserTangleExecutionKeyForUserOptions<UserId = string> {\n userId: UserId\n environment?: TangleExecutionEnvironment\n env?: Record<string, string | undefined>\n getUserApiKey: (userId: UserId) => string | null | undefined | Promise<string | null | undefined>\n}\n\nexport interface ResolvedTangleExecutionKey {\n apiKey: string\n source: TangleExecutionKeySource\n}\n\nexport interface TangleExecutionKeyHttpError {\n status: number\n body: {\n error: string\n code: TangleExecutionKeyErrorCode\n }\n}\n\nexport interface CreateTangleRouterModelConfigOptions {\n apiKey: string\n model: string\n baseUrl?: string\n}\n\nexport interface TangleBillingEnforcementOptions {\n /** Env to read (defaults to process.env). */\n env?: Record<string, string | undefined>\n /**\n * Optional app-specific override flag, e.g. `GTM_BILLING_ENFORCEMENT`.\n * Defaults to the shared `TANGLE_BILLING_ENFORCEMENT`.\n */\n enforcementEnvVar?: string\n}\n\nexport const DEFAULT_TANGLE_ROUTER_BASE_URL = 'https://router.tangle.tools/v1'\nexport const DEFAULT_TANGLE_BILLING_ENFORCEMENT_ENV_VAR = 'TANGLE_BILLING_ENFORCEMENT'\n\nfunction requireEnv(env: Record<string, string | undefined>, name: string): string {\n const value = env[name]?.trim()\n if (!value) throw new Error(`${name} is required`)\n return value\n}\n\nfunction trimOrNull(value: string | null | undefined): string | null {\n const trimmed = value?.trim()\n return trimmed ? 
trimmed : null\n}\n\nfunction isTangleExecutionKeyErrorCode(value: unknown): value is TangleExecutionKeyErrorCode {\n return value === 'local_tangle_api_key_required' || value === 'tangle_account_not_connected'\n}\n\nexport class TangleExecutionKeyError extends Error {\n readonly code: TangleExecutionKeyErrorCode\n readonly status: number\n\n constructor(code: TangleExecutionKeyErrorCode, message: string, status: number) {\n super(message)\n this.name = 'TangleExecutionKeyError'\n this.code = code\n this.status = status\n }\n}\n\nexport function isTangleExecutionKeyError(error: unknown): error is TangleExecutionKeyError {\n return error instanceof TangleExecutionKeyError\n || (\n typeof error === 'object'\n && error !== null\n && (error as { name?: unknown }).name === 'TangleExecutionKeyError'\n && typeof (error as { message?: unknown }).message === 'string'\n && isTangleExecutionKeyErrorCode((error as { code?: unknown }).code)\n && typeof (error as { status?: unknown }).status === 'number'\n )\n}\n\nexport function resolveTangleExecutionEnvironment(\n env: Record<string, string | undefined> = process.env as Record<string, string | undefined>,\n): TangleExecutionEnvironment {\n const raw = (env.APP_ENV ?? env.NODE_ENV ?? '').trim().toLowerCase()\n if (raw === 'development' || raw === 'dev' || raw === 'local') return 'development'\n if (raw === 'staging') return 'staging'\n if (raw === 'test') return 'test'\n return 'production'\n}\n\n/**\n * Shared policy for agent products that bill through the Tangle Platform.\n *\n * Local development defaults billing enforcement off so apps can use a local\n * `TANGLE_API_KEY` without requiring a browser-linked platform account. Any\n * non-development environment defaults enforcement on. 
Apps may pass their own\n * override flag (`FOO_BILLING_ENFORCEMENT`) while new apps can use the shared\n * `TANGLE_BILLING_ENFORCEMENT`.\n */\nexport function isTangleBillingEnforcementDisabled(\n opts: TangleBillingEnforcementOptions = {},\n): boolean {\n const env = opts.env ?? (process.env as Record<string, string | undefined>)\n const enforcementEnvVar = opts.enforcementEnvVar ?? DEFAULT_TANGLE_BILLING_ENFORCEMENT_ENV_VAR\n const override = env[enforcementEnvVar]?.trim().toLowerCase()\n\n if (override === 'disabled') return true\n if (override === 'enabled') return false\n\n return resolveTangleExecutionEnvironment(env) === 'development'\n}\n\nexport function tangleExecutionKeyHttpError(error: unknown): TangleExecutionKeyHttpError | null {\n if (!isTangleExecutionKeyError(error)) return null\n return {\n status: error.status,\n body: {\n error: error.message,\n code: error.code,\n },\n }\n}\n\n/**\n * Resolve the user-facing Tangle API key for model execution.\n *\n * Local development may use a server env key so apps remain easy to run.\n * Deployed contexts must use the caller's linked platform key; this keeps\n * model execution, billing, and account ownership aligned across products.\n */\nexport async function resolveUserTangleExecutionKey(\n opts: ResolveUserTangleExecutionKeyOptions,\n): Promise<ResolvedTangleExecutionKey> {\n const env = opts.env ?? (process.env as Record<string, string | undefined>)\n const environment = opts.environment ?? 
resolveTangleExecutionEnvironment(env)\n\n if (environment === 'development') {\n const apiKey = trimOrNull(env.TANGLE_API_KEY)\n if (apiKey) return { apiKey, source: 'local-env' }\n }\n\n const apiKey = trimOrNull(await opts.getUserApiKey())\n if (apiKey) return { apiKey, source: 'user' }\n\n if (environment === 'development') {\n throw new TangleExecutionKeyError(\n 'local_tangle_api_key_required',\n 'TANGLE_API_KEY or a linked Tangle account is required for local Tangle model execution.',\n 503,\n )\n }\n\n throw new TangleExecutionKeyError(\n 'tangle_account_not_connected',\n 'Connect your Tangle account before invoking this agent.',\n 401,\n )\n}\n\nexport async function resolveUserTangleExecutionKeyForUser<UserId = string>(\n opts: ResolveUserTangleExecutionKeyForUserOptions<UserId>,\n): Promise<ResolvedTangleExecutionKey> {\n return resolveUserTangleExecutionKey({\n environment: opts.environment,\n env: opts.env,\n getUserApiKey: () => opts.getUserApiKey(opts.userId),\n })\n}\n\n/**\n * Build an OpenAI-compatible Tangle Router model config from an already\n * resolved execution key. This intentionally does not read TANGLE_API_KEY.\n */\nexport function createTangleRouterModelConfig(\n opts: CreateTangleRouterModelConfigOptions,\n): TangleModelConfig {\n const apiKey = opts.apiKey.trim()\n if (!apiKey) throw new Error('apiKey is required')\n const model = opts.model.trim()\n if (!model) throw new Error('model is required')\n return {\n provider: 'openai-compat',\n model,\n apiKey,\n baseUrl: (opts.baseUrl?.trim() || DEFAULT_TANGLE_ROUTER_BASE_URL).replace(/\\/+$/, ''),\n }\n}\n\n/**\n * Resolve the model config from env. DEFAULT path (`MODEL_PROVIDER` unset or\n * `openai-compat`/`tangle-router`/`tcloud`): the Tangle Router, authenticated\n * with `TANGLE_API_KEY`, model from `MODEL_NAME`. BYOK path\n * (`MODEL_PROVIDER=anthropic`): direct Anthropic with `ANTHROPIC_API_KEY` +\n * `ANTHROPIC_BASE_URL`. 
Throws (fail-loud) on a missing required var so a\n * misconfigured deploy fails at boot, not mid-turn.\n */\nexport function resolveTangleModelConfig(opts: ResolveModelOptions = {}): TangleModelConfig {\n const env = opts.env ?? (process.env as Record<string, string | undefined>)\n const provider = env.MODEL_PROVIDER?.trim() || 'openai-compat'\n const model = requireEnv(env, 'MODEL_NAME')\n\n if (provider === 'openai-compat' || provider === 'tangle-router' || provider === 'tcloud') {\n return {\n provider: 'openai-compat',\n model,\n apiKey: requireEnv(env, 'TANGLE_API_KEY'),\n baseUrl: (env.TANGLE_ROUTER_BASE_URL?.trim() || opts.defaultRouterBaseUrl || DEFAULT_TANGLE_ROUTER_BASE_URL).replace(/\\/+$/, ''),\n }\n }\n\n if (provider === 'anthropic') {\n return {\n provider,\n model,\n apiKey: requireEnv(env, 'ANTHROPIC_API_KEY'),\n baseUrl: requireEnv(env, 'ANTHROPIC_BASE_URL'),\n }\n }\n\n throw new Error(`Unsupported MODEL_PROVIDER: ${provider} (use openai-compat for the Tangle Router, or anthropic for BYOK)`)\n}\n","/**\n * OpenAI-compatible stream → `LoopEvent` adapter, for NON-sandbox copilots.\n *\n * `streamAppToolLoop` takes a `streamTurn` seam that yields `LoopEvent`s. A\n * sandboxed agent produces those from its container; a browser/edge copilot\n * instead calls a model directly. The Tangle Router, the tcloud SDK, and most\n * providers all speak the OpenAI Chat Completions streaming shape — so the ONE\n * reusable piece is assembling that stream (content deltas + FRAGMENTED\n * tool-call deltas) into `LoopEvent`s. That assembly is the boilerplate every\n * copilot would re-write (and get wrong — OpenAI streams tool-call arguments in\n * pieces across chunks).\n *\n * This does NOT implement an HTTP client beyond a minimal `fetch` + SSE reader\n * (browser/edge/Node-safe, zero deps). 
For richer transport use the tcloud SDK\n * or the Vercel AI SDK and pipe their stream through {@link toLoopEvents}.\n */\nimport type { LoopEvent, LoopToolCall } from './index'\n\n/** Minimal OpenAI Chat Completions streaming chunk (structural — no `openai` dep). */\nexport interface OpenAIStreamChunk {\n choices?: Array<{\n delta?: {\n content?: string | null\n tool_calls?: Array<{\n index: number\n id?: string\n function?: { name?: string; arguments?: string }\n }>\n }\n finish_reason?: string | null\n }>\n}\n\ninterface PartialToolCall {\n id?: string\n name: string\n args: string\n}\n\n/**\n * Map an OpenAI-compat streaming chunk iterator to `LoopEvent`s: each content\n * delta → a `text` event; tool-call deltas are accumulated by index across\n * chunks and emitted as one complete `tool_call` event when the stream finishes\n * (arguments JSON-parsed; an empty/garbled args string yields `{}` rather than\n * throwing). Works for the Tangle Router, tcloud, or any OpenAI-compat source.\n */\nexport async function* toLoopEvents(chunks: AsyncIterable<OpenAIStreamChunk>): AsyncIterable<LoopEvent> {\n const calls = new Map<number, PartialToolCall>()\n for await (const chunk of chunks) {\n const choice = chunk.choices?.[0]\n if (!choice) continue\n const content = choice.delta?.content\n if (content) yield { type: 'text', text: content }\n for (const tc of choice.delta?.tool_calls ?? []) {\n const cur = calls.get(tc.index) ?? 
{ name: '', args: '' }\n if (tc.id) cur.id = tc.id\n if (tc.function?.name) cur.name += tc.function.name\n if (tc.function?.arguments) cur.args += tc.function.arguments\n calls.set(tc.index, cur)\n }\n }\n for (const [, c] of [...calls.entries()].sort((a, b) => a[0] - b[0])) {\n if (!c.name) continue\n yield { type: 'tool_call', call: { toolCallId: c.id, toolName: c.name, args: safeParse(c.args) } satisfies LoopToolCall }\n }\n}\n\nfunction safeParse(s: string): Record<string, unknown> {\n if (!s.trim()) return {}\n try {\n const v = JSON.parse(s)\n return v && typeof v === 'object' && !Array.isArray(v) ? (v as Record<string, unknown>) : {}\n } catch {\n return {}\n }\n}\n\nexport interface OpenAICompatStreamTurnOptions {\n /** OpenAI-compat base URL (e.g. the Tangle Router `https://router.tangle.tools/v1`). */\n baseUrl: string\n apiKey: string\n model: string\n /** OpenAI tool definitions — pass `buildAppToolOpenAITools(taxonomy)` so the\n * model can call the app tools. Omit for a tool-free copilot. */\n tools?: unknown[]\n temperature?: number\n fetchImpl?: typeof fetch\n /** Extra body fields (e.g. `max_tokens`). */\n extraBody?: Record<string, unknown>\n}\n\n/**\n * Build a `streamTurn` that calls an OpenAI-compatible `/chat/completions`\n * endpoint (Tangle Router / tcloud / any compat provider) with `stream: true`\n * and yields `LoopEvent`s via {@link toLoopEvents}. Browser/edge/Node-safe —\n * just `fetch` + an SSE reader. Drop straight into `streamAppToolLoop`:\n *\n * const cfg = resolveTangleModelConfig() // or { baseUrl, apiKey, model }\n * streamAppToolLoop({ streamTurn: createOpenAICompatStreamTurn({ ...cfg, tools }), executeToolCall, ... })\n */\nexport function createOpenAICompatStreamTurn(\n opts: OpenAICompatStreamTurnOptions,\n): (messages: Array<{ role: string; content: string }>) => AsyncIterable<LoopEvent> {\n const base = opts.baseUrl.replace(/\\/+$/, '')\n const doFetch = opts.fetchImpl ?? 
fetch\n return (messages) =>\n toLoopEvents(\n streamChatCompletions(doFetch, `${base}/chat/completions`, opts.apiKey, {\n model: opts.model,\n messages,\n stream: true,\n ...(opts.tools && opts.tools.length > 0 ? { tools: opts.tools } : {}),\n ...(opts.temperature != null ? { temperature: opts.temperature } : {}),\n ...opts.extraBody,\n }),\n )\n}\n\n/** Stream + parse an OpenAI-compat SSE response into chunks. Tolerates `data:`\n * framing, multi-line buffers, and the terminal `[DONE]`. */\nasync function* streamChatCompletions(\n doFetch: typeof fetch,\n url: string,\n apiKey: string,\n body: Record<string, unknown>,\n): AsyncIterable<OpenAIStreamChunk> {\n const res = await doFetch(url, {\n method: 'POST',\n headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json', Accept: 'text/event-stream' },\n body: JSON.stringify(body),\n })\n if (!res.ok || !res.body) {\n const text = res.body ? await res.text().catch(() => '') : ''\n throw new Error(`OpenAI-compat stream failed (HTTP ${res.status})${text ? `: ${text.slice(0, 200)}` : ''}`)\n }\n const reader = res.body.getReader()\n const decoder = new TextDecoder()\n let buffer = ''\n for (;;) {\n const { done, value } = await reader.read()\n if (done) break\n buffer += decoder.decode(value, { stream: true })\n const lines = buffer.split('\\n')\n buffer = lines.pop() ?? 
''\n for (const line of lines) {\n const trimmed = line.trim()\n if (!trimmed.startsWith('data:')) continue\n const data = trimmed.slice(5).trim()\n if (data === '[DONE]') return\n try {\n yield JSON.parse(data) as OpenAIStreamChunk\n } catch {\n /* skip a partial/garbled SSE frame */\n }\n }\n }\n}\n","/**\n * `createAgentRuntime` — the in-process agent core, assembled.\n *\n * The bricks to run an agent turn WITHOUT a sandbox already exist in this\n * package, but a consumer must hand-wire five of them every time: resolve the\n * model config, build the OpenAI tool schemas from the taxonomy, build a\n * `streamTurn` over the model endpoint, build an `executeToolCall` over the\n * product's handlers, and drive `runAppToolLoop` / `streamAppToolLoop` with an\n * `isExecutableTool` predicate. That boilerplate is identical across every\n * sandbox-free surface (an edge/browser copilot, an eval harness, a Node CLI),\n * and getting it subtly wrong — e.g. NOT advertising the tools, so the model\n * never emits a `tool_call` and no side effect ever fires — is exactly the\n * failure that makes a tool-driven agent score zero off-sandbox.\n *\n * This factory bundles those five into one object configured for ONE agent:\n *\n * const runtime = createAgentRuntime({ model, taxonomy, handlers, systemPrompt })\n * const result = await runtime.run(userMessage, { ctx }) // awaitable\n * for await (const y of runtime.stream(userMessage, { ctx })) {…} // streaming\n *\n * The model is advertised the app tools (so it CAN call them); each call is\n * dispatched against the product's `handlers` (so the side effect is real); the\n * `onProduced` hook fires at the real side-effect site (so an eval/UI credits a\n * persisted proposal or artifact). Substrate-free: no `@tangle-network/sandbox`,\n * no Durable Object, no `@tangle-network/agent-runtime` import. 
The SAME core\n * the Cloudflare Worker runs, runnable anywhere a `fetch` to an OpenAI-compatible\n * endpoint works.\n *\n * Domain stays out: the proposal taxonomy, the handlers, and the system prompt\n * are all injected — the factory knows nothing about insurance, law, tax, etc.\n */\nimport {\n type AppToolHandlers,\n type AppToolContext,\n type AppToolOutcome,\n type AppToolProducedEvent,\n type AppToolTaxonomy,\n} from '../tools/types'\nimport { buildAppToolOpenAITools, isAppToolName } from '../tools/openai'\nimport { createAppToolRuntimeExecutor } from '../tools/runtime'\nimport {\n runAppToolLoop,\n streamAppToolLoop,\n type LoopEvent,\n type LoopToolCall,\n type StreamLoopYield,\n type ToolLoopResult,\n} from './index'\nimport { createOpenAICompatStreamTurn } from './openai-stream'\n\n/** OpenAI-compatible model endpoint (Tangle Router / tcloud / any compat\n * provider). Build from {@link resolveTangleModelConfig} or pass literals. */\nexport interface AgentRuntimeModelConfig {\n baseUrl: string\n apiKey: string\n model: string\n temperature?: number\n fetchImpl?: typeof fetch\n /** Extra request-body fields (e.g. `max_tokens`, a `reasoning` block). */\n extraBody?: Record<string, unknown>\n}\n\nexport interface CreateAgentRuntimeOptions {\n /** The model endpoint the turns stream from. */\n model: AgentRuntimeModelConfig\n /** The product's proposal taxonomy — advertises `submit_proposal`'s `type`\n * enum to the model and labels the regulated subset on the result. */\n taxonomy: AppToolTaxonomy\n /** Domain handlers persisting each tool to the product's store/vault. */\n handlers: AppToolHandlers\n /** Default agent identity / system prompt. A turn may override it. */\n systemPrompt: string\n /** Max tool-driven re-runs per turn. Default 8. */\n maxToolTurns?: number\n /** Extra OpenAI tool definitions advertised ALONGSIDE the four app tools\n * (e.g. `integration_invoke`). Pair with {@link executeOtherTool}. 
*/\n extraTools?: unknown[]\n /** Execute a tool that is NOT one of the four app tools (e.g. an integration\n * action). Only consulted for names {@link isOtherExecutableTool} accepts. */\n executeOtherTool?: (call: LoopToolCall, ctx: AppToolContext) => Promise<AppToolOutcome>\n /** Which non-app tool names are executable here. Required if {@link executeOtherTool} is set. */\n isOtherExecutableTool?: (toolName: string) => boolean\n}\n\nexport interface AgentTurnOptions {\n /** The trusted per-turn context (who/where the turn runs as). */\n ctx: AppToolContext\n /** Prior conversation turns, in order. */\n priorMessages?: Array<{ role: string; content: string }>\n /** Override the factory's default system prompt for this turn. */\n systemPrompt?: string\n /** Fires at the real side-effect site for each produced proposal/artifact. */\n onProduced?: (event: AppToolProducedEvent) => void\n}\n\nexport interface AgentRuntime {\n /** Run the bounded tool loop to completion; resolve with final text + every\n * executed tool outcome. */\n run(userMessage: string, turn: AgentTurnOptions): Promise<ToolLoopResult>\n /** Stream the bounded tool loop: yields each raw model event and each executed\n * tool result as it happens (for SSE re-emission + telemetry). */\n stream(userMessage: string, turn: AgentTurnOptions): AsyncGenerator<StreamLoopYield<LoopEvent>, void, unknown>\n}\n\n/**\n * Create an in-process agent runtime for one agent. 
See the module doc for the\n * full rationale; the short version: it advertises the app tools to the model,\n * dispatches each emitted call against `handlers`, and drives the bounded loop —\n * the whole agent core, sandbox-free.\n */\nexport function createAgentRuntime(opts: CreateAgentRuntimeOptions): AgentRuntime {\n if (opts.executeOtherTool && !opts.isOtherExecutableTool) {\n throw new Error('createAgentRuntime: isOtherExecutableTool is required when executeOtherTool is set')\n }\n\n // Tool schemas + the streamTurn are stable across turns — build once. The\n // model MUST be advertised the tools or it never emits a tool_call (the exact\n // failure that scores a tool-driven agent zero off-sandbox).\n const tools = [...buildAppToolOpenAITools(opts.taxonomy), ...(opts.extraTools ?? [])]\n const m = opts.model\n const streamTurn = createOpenAICompatStreamTurn({\n baseUrl: m.baseUrl,\n apiKey: m.apiKey,\n model: m.model,\n tools,\n temperature: m.temperature,\n fetchImpl: m.fetchImpl,\n extraBody: m.extraBody,\n })\n\n const isExecutableTool = (name: string): boolean =>\n isAppToolName(name) || (opts.isOtherExecutableTool?.(name) ?? false)\n\n const buildExecutor = (turn: AgentTurnOptions) => {\n const appExecutor = createAppToolRuntimeExecutor({\n handlers: opts.handlers,\n taxonomy: opts.taxonomy,\n ctx: turn.ctx,\n onProduced: turn.onProduced,\n })\n return async (call: LoopToolCall): Promise<AppToolOutcome> => {\n if (isAppToolName(call.toolName)) return appExecutor({ toolName: call.toolName, args: call.args })\n if (opts.executeOtherTool && opts.isOtherExecutableTool?.(call.toolName)) {\n return opts.executeOtherTool(call, turn.ctx)\n }\n return { ok: false, code: 'unknown_tool', message: `No executor for tool: ${call.toolName}` }\n }\n }\n\n return {\n run(userMessage, turn) {\n return runAppToolLoop({\n systemPrompt: turn.systemPrompt ?? 
opts.systemPrompt,\n userMessage,\n priorMessages: turn.priorMessages,\n streamTurn,\n executeToolCall: buildExecutor(turn),\n isExecutableTool,\n maxToolTurns: opts.maxToolTurns,\n })\n },\n stream(userMessage, turn) {\n return streamAppToolLoop<LoopEvent>({\n systemPrompt: turn.systemPrompt ?? opts.systemPrompt,\n userMessage,\n priorMessages: turn.priorMessages,\n streamTurn,\n extractText: (ev) => (ev.type === 'text' ? ev.text : ''),\n extractToolCall: (ev) => (ev.type === 'tool_call' ? ev.call : null),\n isExecutableTool,\n executeToolCall: buildExecutor(turn),\n maxToolTurns: opts.maxToolTurns,\n })\n },\n }\n}\n","export * from './model'\nexport * from './openai-stream'\nexport * from './agent'\n/**\n * The bounded agent tool-loop — the mechanism every app's chat runtime\n * hand-rolls on top of `@tangle-network/agent-runtime`.\n *\n * A model turn may emit tool calls (integration-hub actions, the app tools from\n * `../tools`, delegation). The loop: stream a turn, collect the executable tool\n * calls, stop if there are none / no executor / the turn cap is hit, otherwise\n * execute each, fold the results back as a message, and re-run so the model\n * reads them. Bounded by `maxToolTurns` so a model looping on a failing action\n * can't run forever.\n *\n * Substrate-free by design: the app supplies `streamTurn` (wrapping whatever\n * backend / `runAgentTaskStream` it uses) and `executeToolCall` (routing to its\n * integration + app-tool executors). This package owns the LOOP; the app owns\n * the model and the executors.\n *\n * LAYERING NOTE: this turn-level tool-dispatch loop is a generic RUNTIME\n * capability. It has been CONTRIBUTED DOWN and MERGED into\n * `@tangle-network/agent-runtime` as `runToolLoop` / `streamToolLoop` (PR #137),\n * but is not yet PUBLISHED (agent-runtime main is ahead of its last npm release;\n * cutting that release is the agent-runtime maintainer's call). 
TERMINAL STATE:\n * the moment agent-runtime publishes a version carrying #137, bump the\n * `@tangle-network/agent-runtime` peer-dep here and replace the bodies below with\n * a thin re-export — `streamAppToolLoop = streamToolLoop`, `runAppToolLoop =\n * runToolLoop` (types alias 1:1; `AppToolOutcome` ≡ `ToolCallOutcome`). Kept\n * substrate-free + shipping until then so consumers aren't blocked on the release.\n */\nimport type { AppToolOutcome } from '../tools/types'\n\nexport interface LoopToolCall {\n toolCallId?: string\n toolName: string\n args: Record<string, unknown>\n}\n\n/** Events a turn stream yields. `text` accumulates into the final answer;\n * `tool_call` is collected for dispatch. Extra event types pass through\n * untouched (the caller re-emits them to its own UI stream). */\nexport type LoopEvent =\n | { type: 'text'; text: string }\n | { type: 'tool_call'; call: LoopToolCall }\n | { type: 'other'; event: unknown }\n\nexport interface ToolLoopResult {\n /** The model's final text across the loop. */\n finalText: string\n /** Every tool call executed, with its outcome, in order. */\n toolResults: Array<{ call: LoopToolCall; label: string; outcome: AppToolOutcome }>\n /** Number of model turns run (1 + tool-driven re-runs). */\n turns: number\n /** True when the loop stopped because it hit `maxToolTurns` with calls still pending. */\n cappedOut: boolean\n}\n\nexport interface AppToolLoopOptions {\n systemPrompt: string\n userMessage: string\n priorMessages?: Array<{ role: string; content: string }>\n /** Stream one model turn over the running message list. The app wraps its\n * backend here. */\n streamTurn: (messages: Array<{ role: string; content: string }>) => AsyncIterable<LoopEvent>\n /** Execute one tool call. The app routes to its integration executor / app-tool\n * executor and returns the outcome. */\n executeToolCall: (call: LoopToolCall) => Promise<AppToolOutcome>\n /** Which emitted tool names are executable (others are ignored — e.g. 
a UI-only\n * tool the app renders but doesn't run here). */\n isExecutableTool: (toolName: string) => boolean\n /** Max tool-driven re-runs. Default 8. */\n maxToolTurns?: number\n /** Render one tool outcome as a line the next turn's message carries. Default\n * is a compact `- <label> → ok/failed: …`. */\n renderResult?: (label: string, outcome: AppToolOutcome) => string\n /** Map a tool call to the label its result is keyed under (default: toolName). */\n labelFor?: (call: LoopToolCall) => string\n}\n\nconst DEFAULT_MAX_TOOL_TURNS = 8\n\nfunction defaultRender(label: string, outcome: AppToolOutcome): string {\n if (outcome.ok) return `- ${label} → ok: ${JSON.stringify(outcome.result)}`\n return `- ${label} → failed (${outcome.code}): ${outcome.message}`\n}\n\n/**\n * Run the bounded tool loop and return the final text + every executed tool\n * outcome. Yields nothing — it's an awaitable driver; callers that need to\n * re-emit events to a UI stream should do so inside `streamTurn`. (A streaming\n * variant can wrap this later; keeping the core awaitable makes it trivially\n * testable.)\n */\nexport async function runAppToolLoop(opts: AppToolLoopOptions): Promise<ToolLoopResult> {\n const maxTurns = opts.maxToolTurns ?? DEFAULT_MAX_TOOL_TURNS\n const render = opts.renderResult ?? defaultRender\n const labelFor = opts.labelFor ?? ((c: LoopToolCall) => c.toolName)\n\n const messages: Array<{ role: string; content: string }> = [\n { role: 'system', content: opts.systemPrompt },\n ...(opts.priorMessages ?? 
[]),\n { role: 'user', content: opts.userMessage },\n ]\n\n const toolResults: ToolLoopResult['toolResults'] = []\n let finalText = ''\n let turns = 0\n\n for (let toolTurn = 0; ; toolTurn++) {\n turns++\n let turnText = ''\n const pending: LoopToolCall[] = []\n\n for await (const ev of opts.streamTurn([...messages])) {\n if (ev.type === 'text') {\n turnText += ev.text\n finalText += ev.text\n } else if (ev.type === 'tool_call' && opts.isExecutableTool(ev.call.toolName)) {\n pending.push(ev.call)\n }\n }\n\n if (pending.length === 0) break\n if (toolTurn >= maxTurns) {\n return { finalText, toolResults, turns, cappedOut: true }\n }\n\n // Record the assistant's tool-calling turn so the next turn has its context.\n if (turnText.trim()) messages.push({ role: 'assistant', content: turnText })\n\n const lines: string[] = []\n for (const call of pending) {\n let outcome: AppToolOutcome\n try {\n outcome = await opts.executeToolCall(call)\n } catch (err) {\n outcome = { ok: false, code: 'executor_error', message: err instanceof Error ? err.message : String(err) }\n }\n const label = labelFor(call)\n toolResults.push({ call, label, outcome })\n lines.push(render(label, outcome))\n }\n // Fold every outcome back as one user-role message so the model reads them.\n messages.push({ role: 'user', content: `Tool results:\\n${lines.join('\\n')}` })\n }\n\n return { finalText, toolResults, turns, cappedOut: false }\n}\n\n// ── Streaming variant ──────────────────────────────────────────────────────\n//\n// `runAppToolLoop` is awaitable — perfect for tests and drain-only callers. A\n// real chat runtime instead needs to STREAM each model event to the client (SSE)\n// AND record telemetry per event as it happens. 
`streamAppToolLoop` is the same\n// bounded loop as an async generator: it yields every raw turn event (the app\n// maps + telemetries + re-emits it) and every executed tool result (same), while\n// owning the loop control flow (collect → stop/dispatch → fold → re-run, capped).\n// `Raw` is the app's own runtime-event type — this package stays substrate-free.\n\nexport type StreamLoopYield<Raw> =\n | { kind: 'event'; event: Raw }\n | { kind: 'tool_result'; toolName: string; toolCallId?: string; label: string; outcome: AppToolOutcome }\n | { kind: 'capped'; pending: number }\n\nexport interface StreamAppToolLoopOptions<Raw> {\n systemPrompt: string\n userMessage: string\n priorMessages?: Array<{ role: string; content: string }>\n /** Stream one model turn (the app wraps its backend / runAgentTaskStream). */\n streamTurn: (messages: Array<{ role: string; content: string }>) => AsyncIterable<Raw>\n /** Text contribution of a raw event, '' if none — used to record the\n * assistant's turn so the next turn has its context. */\n extractText: (event: Raw) => string\n /** The tool call a raw event represents, or null. */\n extractToolCall: (event: Raw) => LoopToolCall | null\n /** Which tool names are executable here (others pass through, unexecuted). */\n isExecutableTool: (toolName: string) => boolean\n /** Execute one call — the app routes to its integration / app-tool executor. */\n executeToolCall: (call: LoopToolCall) => Promise<AppToolOutcome>\n maxToolTurns?: number\n renderResult?: (label: string, outcome: AppToolOutcome) => string\n labelFor?: (call: LoopToolCall) => string\n}\n\n/**\n * The streaming bounded tool loop. Yields `event` for each raw turn event and\n * `tool_result` for each executed tool; emits a single `capped` when it stops at\n * the turn limit with calls still pending. 
The app drives telemetry + UI\n * emission off the yielded items.\n */\nexport async function* streamAppToolLoop<Raw>(opts: StreamAppToolLoopOptions<Raw>): AsyncGenerator<StreamLoopYield<Raw>, void, unknown> {\n const maxTurns = opts.maxToolTurns ?? DEFAULT_MAX_TOOL_TURNS\n const render = opts.renderResult ?? defaultRender\n const labelFor = opts.labelFor ?? ((c: LoopToolCall) => c.toolName)\n\n const messages: Array<{ role: string; content: string }> = [\n { role: 'system', content: opts.systemPrompt },\n ...(opts.priorMessages ?? []),\n { role: 'user', content: opts.userMessage },\n ]\n\n for (let toolTurn = 0; ; toolTurn++) {\n let turnText = ''\n const pending: LoopToolCall[] = []\n\n for await (const event of opts.streamTurn([...messages])) {\n yield { kind: 'event', event }\n turnText += opts.extractText(event)\n const call = opts.extractToolCall(event)\n if (call && opts.isExecutableTool(call.toolName)) pending.push(call)\n }\n\n if (pending.length === 0) return\n if (toolTurn >= maxTurns) {\n yield { kind: 'capped', pending: pending.length }\n return\n }\n\n if (turnText.trim()) messages.push({ role: 'assistant', content: turnText })\n\n const lines: string[] = []\n for (const call of pending) {\n let outcome: AppToolOutcome\n try {\n outcome = await opts.executeToolCall(call)\n } catch (err) {\n outcome = { ok: false, code: 'executor_error', message: err instanceof Error ? 
err.message : String(err) }\n }\n const label = labelFor(call)\n yield { kind: 'tool_result', toolName: call.toolName, toolCallId: call.toolCallId, label, outcome }\n lines.push(render(label, outcome))\n }\n messages.push({ role: 'user', content: `Tool results:\\n${lines.join('\\n')}` })\n }\n}\n"],"mappings":";;;;;;;AA6EO,IAAM,iCAAiC;AACvC,IAAM,6CAA6C;AAE1D,SAAS,WAAW,KAAyC,MAAsB;AACjF,QAAM,QAAQ,IAAI,IAAI,GAAG,KAAK;AAC9B,MAAI,CAAC,MAAO,OAAM,IAAI,MAAM,GAAG,IAAI,cAAc;AACjD,SAAO;AACT;AAEA,SAAS,WAAW,OAAiD;AACnE,QAAM,UAAU,OAAO,KAAK;AAC5B,SAAO,UAAU,UAAU;AAC7B;AAEA,SAAS,8BAA8B,OAAsD;AAC3F,SAAO,UAAU,mCAAmC,UAAU;AAChE;AAEO,IAAM,0BAAN,cAAsC,MAAM;AAAA,EACxC;AAAA,EACA;AAAA,EAET,YAAY,MAAmC,SAAiB,QAAgB;AAC9E,UAAM,OAAO;AACb,SAAK,OAAO;AACZ,SAAK,OAAO;AACZ,SAAK,SAAS;AAAA,EAChB;AACF;AAEO,SAAS,0BAA0B,OAAkD;AAC1F,SAAO,iBAAiB,2BAEpB,OAAO,UAAU,YACd,UAAU,QACT,MAA6B,SAAS,6BACvC,OAAQ,MAAgC,YAAY,YACpD,8BAA+B,MAA6B,IAAI,KAChE,OAAQ,MAA+B,WAAW;AAE3D;AAEO,SAAS,kCACd,MAA0C,QAAQ,KACtB;AAC5B,QAAM,OAAO,IAAI,WAAW,IAAI,YAAY,IAAI,KAAK,EAAE,YAAY;AACnE,MAAI,QAAQ,iBAAiB,QAAQ,SAAS,QAAQ,QAAS,QAAO;AACtE,MAAI,QAAQ,UAAW,QAAO;AAC9B,MAAI,QAAQ,OAAQ,QAAO;AAC3B,SAAO;AACT;AAWO,SAAS,mCACd,OAAwC,CAAC,GAChC;AACT,QAAM,MAAM,KAAK,OAAQ,QAAQ;AACjC,QAAM,oBAAoB,KAAK,qBAAqB;AACpD,QAAM,WAAW,IAAI,iBAAiB,GAAG,KAAK,EAAE,YAAY;AAE5D,MAAI,aAAa,WAAY,QAAO;AACpC,MAAI,aAAa,UAAW,QAAO;AAEnC,SAAO,kCAAkC,GAAG,MAAM;AACpD;AAEO,SAAS,4BAA4B,OAAoD;AAC9F,MAAI,CAAC,0BAA0B,KAAK,EAAG,QAAO;AAC9C,SAAO;AAAA,IACL,QAAQ,MAAM;AAAA,IACd,MAAM;AAAA,MACJ,OAAO,MAAM;AAAA,MACb,MAAM,MAAM;AAAA,IACd;AAAA,EACF;AACF;AASA,eAAsB,8BACpB,MACqC;AACrC,QAAM,MAAM,KAAK,OAAQ,QAAQ;AACjC,QAAM,cAAc,KAAK,eAAe,kCAAkC,GAAG;AAE7E,MAAI,gBAAgB,eAAe;AACjC,UAAMA,UAAS,WAAW,IAAI,cAAc;AAC5C,QAAIA,QAAQ,QAAO,EAAE,QAAAA,SAAQ,QAAQ,YAAY;AAAA,EACnD;AAEA,QAAM,SAAS,WAAW,MAAM,KAAK,cAAc,CAAC;AACpD,MAAI,OAAQ,QAAO,EAAE,QAAQ,QAAQ,OAAO;AAE5C,MAAI,gBAAgB,eAAe;AACjC,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,MACA;AAAA,IACF;AAAA,EACF;AAEA,QAAM,IAAI;AAAA,IACR;AAAA,IACA;AAAA,IACA;AAAA,EACF;AACF;AAEA,eAAsB,qCACpB,MACqC;A
ACrC,SAAO,8BAA8B;AAAA,IACnC,aAAa,KAAK;AAAA,IAClB,KAAK,KAAK;AAAA,IACV,eAAe,MAAM,KAAK,cAAc,KAAK,MAAM;AAAA,EACrD,CAAC;AACH;AAMO,SAAS,8BACd,MACmB;AACnB,QAAM,SAAS,KAAK,OAAO,KAAK;AAChC,MAAI,CAAC,OAAQ,OAAM,IAAI,MAAM,oBAAoB;AACjD,QAAM,QAAQ,KAAK,MAAM,KAAK;AAC9B,MAAI,CAAC,MAAO,OAAM,IAAI,MAAM,mBAAmB;AAC/C,SAAO;AAAA,IACL,UAAU;AAAA,IACV;AAAA,IACA;AAAA,IACA,UAAU,KAAK,SAAS,KAAK,KAAK,gCAAgC,QAAQ,QAAQ,EAAE;AAAA,EACtF;AACF;AAUO,SAAS,yBAAyB,OAA4B,CAAC,GAAsB;AAC1F,QAAM,MAAM,KAAK,OAAQ,QAAQ;AACjC,QAAM,WAAW,IAAI,gBAAgB,KAAK,KAAK;AAC/C,QAAM,QAAQ,WAAW,KAAK,YAAY;AAE1C,MAAI,aAAa,mBAAmB,aAAa,mBAAmB,aAAa,UAAU;AACzF,WAAO;AAAA,MACL,UAAU;AAAA,MACV;AAAA,MACA,QAAQ,WAAW,KAAK,gBAAgB;AAAA,MACxC,UAAU,IAAI,wBAAwB,KAAK,KAAK,KAAK,wBAAwB,gCAAgC,QAAQ,QAAQ,EAAE;AAAA,IACjI;AAAA,EACF;AAEA,MAAI,aAAa,aAAa;AAC5B,WAAO;AAAA,MACL;AAAA,MACA;AAAA,MACA,QAAQ,WAAW,KAAK,mBAAmB;AAAA,MAC3C,SAAS,WAAW,KAAK,oBAAoB;AAAA,IAC/C;AAAA,EACF;AAEA,QAAM,IAAI,MAAM,+BAA+B,QAAQ,mEAAmE;AAC5H;;;ACrNA,gBAAuB,aAAa,QAAoE;AACtG,QAAM,QAAQ,oBAAI,IAA6B;AAC/C,mBAAiB,SAAS,QAAQ;AAChC,UAAM,SAAS,MAAM,UAAU,CAAC;AAChC,QAAI,CAAC,OAAQ;AACb,UAAM,UAAU,OAAO,OAAO;AAC9B,QAAI,QAAS,OAAM,EAAE,MAAM,QAAQ,MAAM,QAAQ;AACjD,eAAW,MAAM,OAAO,OAAO,cAAc,CAAC,GAAG;AAC/C,YAAM,MAAM,MAAM,IAAI,GAAG,KAAK,KAAK,EAAE,MAAM,IAAI,MAAM,GAAG;AACxD,UAAI,GAAG,GAAI,KAAI,KAAK,GAAG;AACvB,UAAI,GAAG,UAAU,KAAM,KAAI,QAAQ,GAAG,SAAS;AAC/C,UAAI,GAAG,UAAU,UAAW,KAAI,QAAQ,GAAG,SAAS;AACpD,YAAM,IAAI,GAAG,OAAO,GAAG;AAAA,IACzB;AAAA,EACF;AACA,aAAW,CAAC,EAAE,CAAC,KAAK,CAAC,GAAG,MAAM,QAAQ,CAAC,EAAE,KAAK,CAAC,GAAG,MAAM,EAAE,CAAC,IAAI,EAAE,CAAC,CAAC,GAAG;AACpE,QAAI,CAAC,EAAE,KAAM;AACb,UAAM,EAAE,MAAM,aAAa,MAAM,EAAE,YAAY,EAAE,IAAI,UAAU,EAAE,MAAM,MAAM,UAAU,EAAE,IAAI,EAAE,EAAyB;AAAA,EAC1H;AACF;AAEA,SAAS,UAAU,GAAoC;AACrD,MAAI,CAAC,EAAE,KAAK,EAAG,QAAO,CAAC;AACvB,MAAI;AACF,UAAM,IAAI,KAAK,MAAM,CAAC;AACtB,WAAO,KAAK,OAAO,MAAM,YAAY,CAAC,MAAM,QAAQ,CAAC,IAAK,IAAgC,CAAC;AAAA,EAC7F,QAAQ;AACN,WAAO,CAAC;AAAA,EACV;AACF;AAyBO,SAAS,6BACd,MACkF;AAClF,QAAM,OAAO,KAAK,QAAQ,QAAQ,QAAQ,EAAE;AAC5C,QAAM,UAAU,KAAK,aAAa;AAClC,SAAO,CAAC,aACN;AA
AA,IACE,sBAAsB,SAAS,GAAG,IAAI,qBAAqB,KAAK,QAAQ;AAAA,MACtE,OAAO,KAAK;AAAA,MACZ;AAAA,MACA,QAAQ;AAAA,MACR,GAAI,KAAK,SAAS,KAAK,MAAM,SAAS,IAAI,EAAE,OAAO,KAAK,MAAM,IAAI,CAAC;AAAA,MACnE,GAAI,KAAK,eAAe,OAAO,EAAE,aAAa,KAAK,YAAY,IAAI,CAAC;AAAA,MACpE,GAAG,KAAK;AAAA,IACV,CAAC;AAAA,EACH;AACJ;AAIA,gBAAgB,sBACd,SACA,KACA,QACA,MACkC;AAClC,QAAM,MAAM,MAAM,QAAQ,KAAK;AAAA,IAC7B,QAAQ;AAAA,IACR,SAAS,EAAE,eAAe,UAAU,MAAM,IAAI,gBAAgB,oBAAoB,QAAQ,oBAAoB;AAAA,IAC9G,MAAM,KAAK,UAAU,IAAI;AAAA,EAC3B,CAAC;AACD,MAAI,CAAC,IAAI,MAAM,CAAC,IAAI,MAAM;AACxB,UAAM,OAAO,IAAI,OAAO,MAAM,IAAI,KAAK,EAAE,MAAM,MAAM,EAAE,IAAI;AAC3D,UAAM,IAAI,MAAM,qCAAqC,IAAI,MAAM,IAAI,OAAO,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC,KAAK,EAAE,EAAE;AAAA,EAC5G;AACA,QAAM,SAAS,IAAI,KAAK,UAAU;AAClC,QAAM,UAAU,IAAI,YAAY;AAChC,MAAI,SAAS;AACb,aAAS;AACP,UAAM,EAAE,MAAM,MAAM,IAAI,MAAM,OAAO,KAAK;AAC1C,QAAI,KAAM;AACV,cAAU,QAAQ,OAAO,OAAO,EAAE,QAAQ,KAAK,CAAC;AAChD,UAAM,QAAQ,OAAO,MAAM,IAAI;AAC/B,aAAS,MAAM,IAAI,KAAK;AACxB,eAAW,QAAQ,OAAO;AACxB,YAAM,UAAU,KAAK,KAAK;AAC1B,UAAI,CAAC,QAAQ,WAAW,OAAO,EAAG;AAClC,YAAM,OAAO,QAAQ,MAAM,CAAC,EAAE,KAAK;AACnC,UAAI,SAAS,SAAU;AACvB,UAAI;AACF,cAAM,KAAK,MAAM,IAAI;AAAA,MACvB,QAAQ;AAAA,MAER;AAAA,IACF;AAAA,EACF;AACF;;;AC9CO,SAAS,mBAAmB,MAA+C;AAChF,MAAI,KAAK,oBAAoB,CAAC,KAAK,uBAAuB;AACxD,UAAM,IAAI,MAAM,oFAAoF;AAAA,EACtG;AAKA,QAAM,QAAQ,CAAC,GAAG,wBAAwB,KAAK,QAAQ,GAAG,GAAI,KAAK,cAAc,CAAC,CAAE;AACpF,QAAM,IAAI,KAAK;AACf,QAAM,aAAa,6BAA6B;AAAA,IAC9C,SAAS,EAAE;AAAA,IACX,QAAQ,EAAE;AAAA,IACV,OAAO,EAAE;AAAA,IACT;AAAA,IACA,aAAa,EAAE;AAAA,IACf,WAAW,EAAE;AAAA,IACb,WAAW,EAAE;AAAA,EACf,CAAC;AAED,QAAM,mBAAmB,CAAC,SACxB,cAAc,IAAI,MAAM,KAAK,wBAAwB,IAAI,KAAK;AAEhE,QAAM,gBAAgB,CAAC,SAA2B;AAChD,UAAM,cAAc,6BAA6B;AAAA,MAC/C,UAAU,KAAK;AAAA,MACf,UAAU,KAAK;AAAA,MACf,KAAK,KAAK;AAAA,MACV,YAAY,KAAK;AAAA,IACnB,CAAC;AACD,WAAO,OAAO,SAAgD;AAC5D,UAAI,cAAc,KAAK,QAAQ,EAAG,QAAO,YAAY,EAAE,UAAU,KAAK,UAAU,MAAM,KAAK,KAAK,CAAC;AACjG,UAAI,KAAK,oBAAoB,KAAK,wBAAwB,KAAK,QAAQ,GAAG;AACxE,eAAO,KAAK,iBAAiB,MAAM,KAAK,GAAG;AAAA,MAC7C;AACA,aAAO,EAAE,IAAI,OAAO,MAAM,gBAAgB,SAAS,yBAAyB,KAAK
,QAAQ,GAAG;AAAA,IAC9F;AAAA,EACF;AAEA,SAAO;AAAA,IACL,IAAI,aAAa,MAAM;AACrB,aAAO,eAAe;AAAA,QACpB,cAAc,KAAK,gBAAgB,KAAK;AAAA,QACxC;AAAA,QACA,eAAe,KAAK;AAAA,QACpB;AAAA,QACA,iBAAiB,cAAc,IAAI;AAAA,QACnC;AAAA,QACA,cAAc,KAAK;AAAA,MACrB,CAAC;AAAA,IACH;AAAA,IACA,OAAO,aAAa,MAAM;AACxB,aAAO,kBAA6B;AAAA,QAClC,cAAc,KAAK,gBAAgB,KAAK;AAAA,QACxC;AAAA,QACA,eAAe,KAAK;AAAA,QACpB;AAAA,QACA,aAAa,CAAC,OAAQ,GAAG,SAAS,SAAS,GAAG,OAAO;AAAA,QACrD,iBAAiB,CAAC,OAAQ,GAAG,SAAS,cAAc,GAAG,OAAO;AAAA,QAC9D;AAAA,QACA,iBAAiB,cAAc,IAAI;AAAA,QACnC,cAAc,KAAK;AAAA,MACrB,CAAC;AAAA,IACH;AAAA,EACF;AACF;;;AChGA,IAAM,yBAAyB;AAE/B,SAAS,cAAc,OAAe,SAAiC;AACrE,MAAI,QAAQ,GAAI,QAAO,KAAK,KAAK,eAAU,KAAK,UAAU,QAAQ,MAAM,CAAC;AACzE,SAAO,KAAK,KAAK,mBAAc,QAAQ,IAAI,MAAM,QAAQ,OAAO;AAClE;AASA,eAAsB,eAAe,MAAmD;AACtF,QAAM,WAAW,KAAK,gBAAgB;AACtC,QAAM,SAAS,KAAK,gBAAgB;AACpC,QAAM,WAAW,KAAK,aAAa,CAAC,MAAoB,EAAE;AAE1D,QAAM,WAAqD;AAAA,IACzD,EAAE,MAAM,UAAU,SAAS,KAAK,aAAa;AAAA,IAC7C,GAAI,KAAK,iBAAiB,CAAC;AAAA,IAC3B,EAAE,MAAM,QAAQ,SAAS,KAAK,YAAY;AAAA,EAC5C;AAEA,QAAM,cAA6C,CAAC;AACpD,MAAI,YAAY;AAChB,MAAI,QAAQ;AAEZ,WAAS,WAAW,KAAK,YAAY;AACnC;AACA,QAAI,WAAW;AACf,UAAM,UAA0B,CAAC;AAEjC,qBAAiB,MAAM,KAAK,WAAW,CAAC,GAAG,QAAQ,CAAC,GAAG;AACrD,UAAI,GAAG,SAAS,QAAQ;AACtB,oBAAY,GAAG;AACf,qBAAa,GAAG;AAAA,MAClB,WAAW,GAAG,SAAS,eAAe,KAAK,iBAAiB,GAAG,KAAK,QAAQ,GAAG;AAC7E,gBAAQ,KAAK,GAAG,IAAI;AAAA,MACtB;AAAA,IACF;AAEA,QAAI,QAAQ,WAAW,EAAG;AAC1B,QAAI,YAAY,UAAU;AACxB,aAAO,EAAE,WAAW,aAAa,OAAO,WAAW,KAAK;AAAA,IAC1D;AAGA,QAAI,SAAS,KAAK,EAAG,UAAS,KAAK,EAAE,MAAM,aAAa,SAAS,SAAS,CAAC;AAE3E,UAAM,QAAkB,CAAC;AACzB,eAAW,QAAQ,SAAS;AAC1B,UAAI;AACJ,UAAI;AACF,kBAAU,MAAM,KAAK,gBAAgB,IAAI;AAAA,MAC3C,SAAS,KAAK;AACZ,kBAAU,EAAE,IAAI,OAAO,MAAM,kBAAkB,SAAS,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG,EAAE;AAAA,MAC3G;AACA,YAAM,QAAQ,SAAS,IAAI;AAC3B,kBAAY,KAAK,EAAE,MAAM,OAAO,QAAQ,CAAC;AACzC,YAAM,KAAK,OAAO,OAAO,OAAO,CAAC;AAAA,IACnC;AAEA,aAAS,KAAK,EAAE,MAAM,QAAQ,SAAS;AAAA,EAAkB,MAAM,KAAK,IAAI,CAAC,GAAG,CAAC;AAAA,EAC/E;AAEA,SAAO,EAAE,WAAW,aAAa,OAAO,WAAW,MAAM;AAC3D;AA2CA,gBAAuB,kBAAuB,MAA0F;AACtI,QAAM,WAAW,K
AAK,gBAAgB;AACtC,QAAM,SAAS,KAAK,gBAAgB;AACpC,QAAM,WAAW,KAAK,aAAa,CAAC,MAAoB,EAAE;AAE1D,QAAM,WAAqD;AAAA,IACzD,EAAE,MAAM,UAAU,SAAS,KAAK,aAAa;AAAA,IAC7C,GAAI,KAAK,iBAAiB,CAAC;AAAA,IAC3B,EAAE,MAAM,QAAQ,SAAS,KAAK,YAAY;AAAA,EAC5C;AAEA,WAAS,WAAW,KAAK,YAAY;AACnC,QAAI,WAAW;AACf,UAAM,UAA0B,CAAC;AAEjC,qBAAiB,SAAS,KAAK,WAAW,CAAC,GAAG,QAAQ,CAAC,GAAG;AACxD,YAAM,EAAE,MAAM,SAAS,MAAM;AAC7B,kBAAY,KAAK,YAAY,KAAK;AAClC,YAAM,OAAO,KAAK,gBAAgB,KAAK;AACvC,UAAI,QAAQ,KAAK,iBAAiB,KAAK,QAAQ,EAAG,SAAQ,KAAK,IAAI;AAAA,IACrE;AAEA,QAAI,QAAQ,WAAW,EAAG;AAC1B,QAAI,YAAY,UAAU;AACxB,YAAM,EAAE,MAAM,UAAU,SAAS,QAAQ,OAAO;AAChD;AAAA,IACF;AAEA,QAAI,SAAS,KAAK,EAAG,UAAS,KAAK,EAAE,MAAM,aAAa,SAAS,SAAS,CAAC;AAE3E,UAAM,QAAkB,CAAC;AACzB,eAAW,QAAQ,SAAS;AAC1B,UAAI;AACJ,UAAI;AACF,kBAAU,MAAM,KAAK,gBAAgB,IAAI;AAAA,MAC3C,SAAS,KAAK;AACZ,kBAAU,EAAE,IAAI,OAAO,MAAM,kBAAkB,SAAS,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG,EAAE;AAAA,MAC3G;AACA,YAAM,QAAQ,SAAS,IAAI;AAC3B,YAAM,EAAE,MAAM,eAAe,UAAU,KAAK,UAAU,YAAY,KAAK,YAAY,OAAO,QAAQ;AAClG,YAAM,KAAK,OAAO,OAAO,OAAO,CAAC;AAAA,IACnC;AACA,aAAS,KAAK,EAAE,MAAM,QAAQ,SAAS;AAAA,EAAkB,MAAM,KAAK,IAAI,CAAC,GAAG,CAAC;AAAA,EAC/E;AACF;","names":["apiKey"]}
|
package/dist/config/index.d.ts
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
import { KnowledgeRequirementSpec } from '../knowledge/index.js';
|
|
2
2
|
export { SatisfiedByRule } from '../knowledge/index.js';
|
|
3
|
-
import {
|
|
3
|
+
import { j as TangleModelConfig } from '../model-CKzniMMr.js';
|
|
4
4
|
import '@tangle-network/agent-eval';
|
|
5
5
|
|
|
6
6
|
/**
|