open-agents-ai 0.185.71 → 0.185.72
- package/README.md +5 -5
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1286,7 +1286,7 @@ The voice narration system produces **zero static phrase pools** — every spoke
 | 2 | Contextual opener | "Moving to voice.ts" |
 | 3 | Gerund-led | "Taking a deeper look at voice.ts now" |
 
-**Ring buffer deduplication** ([Moshi inner monologue, arXiv:2410.00037](https://arxiv.org/abs/2410.00037)): A sliding window of the last 8 utterances catches near-duplicates via Jaccard word-level similarity (threshold 0.7). When a near-duplicate is detected, **DITTO adaptive rotation** ([arXiv:2206.02369](https://arxiv.org/abs/2206.02369), NeurIPS 2022) advances the structure pattern by 2 positions to break self-reinforcing repetition loops.
+**Ring buffer deduplication** ([Moshi inner monologue, [arXiv:2410.00037](https://arxiv.org/abs/2410.00037)](https://arxiv.org/abs/2410.00037)): A sliding window of the last 8 utterances catches near-duplicates via Jaccard word-level similarity (threshold 0.7). When a near-duplicate is detected, **DITTO adaptive rotation** ([arXiv:2206.02369](https://arxiv.org/abs/2206.02369), NeurIPS 2022) advances the structure pattern by 2 positions to break self-reinforcing repetition loops.
 
 **State-computed emotion interjections**: Instead of word pools, emotion interjections are computed from real session metrics. The emotion quadrant (from ADV coordinates) determines *which* metrics to surface:
 
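The ring buffer deduplication described in the README hunk above could look roughly like the following sketch. The window size (8), the Jaccard threshold (0.7), and the rotation step (2) come from the README text; everything else is an assumption made for illustration, including the names `UtteranceRing`, `wordSet`, and `jaccard`, the whitespace tokenizer, the `>=` comparison, and the choice to store near-duplicates in the window. It is not the package's actual implementation.

```typescript
// Hypothetical sketch: identifiers are illustrative, not taken from the package.
const WINDOW_SIZE = 8; // last 8 utterances (from the README)
const JACCARD_THRESHOLD = 0.7; // near-duplicate threshold (from the README)
const ROTATION_STEP = 2; // structure-pattern advance (from the README)

// Assumed tokenizer: lowercase, split on whitespace.
function wordSet(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/\s+/)
      .filter((w) => w.length > 0),
  );
}

// Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const w of a) {
    if (b.has(w)) intersection++;
  }
  return intersection / (a.size + b.size - intersection);
}

class UtteranceRing {
  private recent: string[] = [];
  private patternCount: number;
  patternIndex = 0;

  constructor(patternCount: number) {
    this.patternCount = patternCount;
  }

  // Records the utterance and returns true if it is a near-duplicate
  // of anything in the sliding window. On a near-duplicate, the
  // structure pattern index advances by ROTATION_STEP (wrapping around).
  observe(utterance: string): boolean {
    const words = wordSet(utterance);
    const duplicate = this.recent.some(
      (prev) => jaccard(words, wordSet(prev)) >= JACCARD_THRESHOLD,
    );
    if (duplicate) {
      this.patternIndex = (this.patternIndex + ROTATION_STEP) % this.patternCount;
    }
    this.recent.push(utterance);
    if (this.recent.length > WINDOW_SIZE) this.recent.shift();
    return duplicate;
  }
}
```

With this sketch, "Moving to voice.ts" followed by "Moving to voice.ts now" shares 3 of 4 distinct words (similarity 0.75), so the second utterance would be flagged and the pattern index would move forward by 2.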
@@ -1299,7 +1299,7 @@ The voice narration system produces **zero static phrase pools** — every spoke
 
 ### Emotion-Driven Prosody (SEST)
 
-The voice engine modulates **three prosodic dimensions** from the emotion state — text vocabulary stays factual, emotion is expressed through *how* it sounds, not *what* it says ([EmoShift, arXiv:2601.22873](https://arxiv.org/abs/2601.22873)):
+The voice engine modulates **three prosodic dimensions** from the emotion state — text vocabulary stays factual, emotion is expressed through *how* it sounds, not *what* it says ([EmoShift, [arXiv:2601.22873](https://arxiv.org/abs/2601.22873)](https://arxiv.org/abs/2601.22873)):
 
 | Dimension | Source | Effect | Range |
 |-----------|--------|--------|-------|
@@ -1307,7 +1307,7 @@ The voice engine modulates **three prosodic dimensions** from the emotion state
 | **Speed** | Arousal (primary) + Dominance (secondary) | High arousal = faster, high dominance = more deliberate | [0.85x, 1.15x] |
 | **Volume** | Speaker role | Primary = 100%, subordinate (sub-agent) = 55% | [0.55, 1.0] |
 
-Pitch and speed use **nonlinear tanh squashing** ([UDDETTS, arXiv:2505.10599](https://arxiv.org/abs/2505.10599)) — moderate emotions get amplified for expressiveness, extreme emotions saturate gracefully instead of clipping.
+Pitch and speed use **nonlinear tanh squashing** ([UDDETTS, [arXiv:2505.10599](https://arxiv.org/abs/2505.10599)](https://arxiv.org/abs/2505.10599)) — moderate emotions get amplified for expressiveness, extreme emotions saturate gracefully instead of clipping.
 
 Each narration also emits a **ProsodyHint** metadata object following the RLAIF-SPA SEST schema ([arXiv:2510.14628](https://arxiv.org/abs/2510.14628)) — Structure/Emotion/Speed/Tone — which downstream consumers (WebSocket voice sessions, Telegram TTS) can use independently:
 
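The tanh squashing of the speed dimension in the hunk above can be illustrated with a small sketch. Only the output range [0.85x, 1.15x] and the direction of the effects (high arousal is faster, high dominance is more deliberate) come from the README. The gain of 2.0, the dominance weight of 0.3, the assumption that arousal and dominance lie in [-1, 1], and the names `squash` and `speedMultiplier` are all hypothetical choices for illustration, not the package's actual values.

```typescript
// Hypothetical sketch: constants other than the range are assumptions.
const SPEED_MIN = 0.85; // from the README table
const SPEED_MAX = 1.15; // from the README table
const GAIN = 2.0; // assumed steepness of the tanh curve
const DOMINANCE_WEIGHT = 0.3; // assumed: dominance is the secondary signal

// tanh maps any real input into (-1, 1). With gain > 1, moderate
// inputs are amplified while large inputs level off smoothly.
function squash(x: number, gain: number = GAIN): number {
  return Math.tanh(gain * x);
}

// arousal and dominance are assumed to lie in [-1, 1].
// Higher arousal raises the multiplier; higher dominance lowers it.
function speedMultiplier(arousal: number, dominance: number): number {
  const drive = squash(arousal - DOMINANCE_WEIGHT * dominance);
  const center = (SPEED_MIN + SPEED_MAX) / 2;
  const halfRange = (SPEED_MAX - SPEED_MIN) / 2;
  return center + halfRange * drive;
}
```

Under these assumed constants, an arousal of 0.3 is squashed to roughly 0.54, so a moderate emotion moves the speed further than a linear mapping would, while the output stays inside the documented range however large the input becomes.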
@@ -2533,7 +2533,7 @@ The eval runner supports `--runs N` for pass^k reliability measurement (consiste
 
 ### REST API Enterprise Evaluation (v0.185.68)
 
-35 test cases executed against the
+35 test cases executed against the oa REST API (`oa serve` on port 11435) across **10 industries** and **3 model tiers**. Each case sends a domain-specific prompt via `/v1/chat/completions` and verifies correctness against expected patterns.
 
 ```bash
 node eval/api-enterprise-eval.mjs # Run all 85 tests (35 cases × 3 models)
@@ -2572,7 +2572,7 @@ node eval/api-enterprise-eval.mjs # Run all 85 tests (35 case
 | 9B + PoT hint | 13% | **100%** | Models write correct Python but chat API can't execute it |
 | 27B + PoT hint | 47% | **100%** | Larger models can trace code mentally; full accuracy requires `repl_exec` in agentic mode |
 
-The PoT (Program-of-Thought) guidance achieves **100% code generation rate** — every model writes Python instead of computing in-head. Full correctness is realized in agentic mode where `repl_exec` executes the code. Research basis: PAL (arXiv:2211.10435), PoT (arXiv:2211.12588), ToRA (arXiv:2309.17452), START (arXiv:2503.04625).
+The PoT (Program-of-Thought) guidance achieves **100% code generation rate** — every model writes Python instead of computing in-head. Full correctness is realized in agentic mode where `repl_exec` executes the code. Research basis: PAL ([arXiv:2211.10435](https://arxiv.org/abs/2211.10435)), PoT ([arXiv:2211.12588](https://arxiv.org/abs/2211.12588)), ToRA ([arXiv:2309.17452](https://arxiv.org/abs/2309.17452)), START ([arXiv:2503.04625](https://arxiv.org/abs/2503.04625)).
 
 **Key architectural findings:**
 - API proxy timeout of 10s caused **100% failure** for cold model loads (Ollama needs 15-115s to load models). Fixed to 120s in v0.185.60.
package/package.json
CHANGED