open-agents-ai 0.187.0 → 0.187.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +0 -120
- package/package.json +1 -1
package/README.md
CHANGED
@@ -72,8 +72,6 @@ An autonomous multi-turn tool-calling agent that reads your code, makes changes,
 - [License](#license)
 
 
-<details id="the-organism-not-the-cortex">
-<summary><strong>The Organism, Not the Cortex</strong> — Why the LLM is one organ inside a larger organism</summary>
 
 ## The Organism, Not the Cortex
 
@@ -97,11 +95,8 @@ An LLM is a high-bandwidth associative generative core — closer to a cortex-li
 
 Don't chase larger models. Build the organism around whatever model you have.
 
-</details>
 
 
-<details id="how-it-works">
-<summary><strong>How It Works</strong> — Multi-turn autonomous tool-calling loop in action</summary>
 
 ## How It Works
 
@@ -119,11 +114,8 @@ Agent: [Turn 1] file_read(src/auth.ts)
 
 The agent uses tools autonomously in a loop — reading errors, fixing code, and re-running validation until the task succeeds or the turn limit is reached.
 
-</details>
 
 
-<details id="features">
-<summary><strong>Features</strong> — 61 tools, voice, vision, P2P mesh, self-play, COHERE cognitive stack</summary>
 
 ## Features
 
@@ -229,11 +221,8 @@ D8AgCTrxpDKD5meJ2bpAfVwcST3NF3EPuy9xczYycnXn
 0x81Ce81F0B6B5928E15d3a2850F913C88D07051ec
 ```
 
-</details>
 
 
-<details id="enterprise--headless-mode">
-<summary><strong>Enterprise & Headless Mode</strong> — REST API, background jobs, JSON output, auth scopes, tool profiles</summary>
 
 ## Enterprise & Headless Mode
 
@@ -746,11 +735,8 @@ Open `http://localhost:11435/` in a browser when `oa serve` is running. Zero ext
 
 Free for non-commercial use under CC-BY-NC-4.0. For enterprise/commercial licensing, contact [zoomerconsulting.com](https://zoomerconsulting.com).
 
-</details>
 
 
-<details id="architecture">
-<summary><strong>Architecture</strong> — AgenticRunner core loop with structured context assembly</summary>
 
 ## Architecture
 
@@ -774,11 +760,8 @@ User task → assembleContext(c_instr, c_state, c_know) → LLM → tool_calls
 - **Context-aware** — dynamic compaction, Memex archiving, session persistence, model-tier scaling
 - **Brute-force** — optional auto re-engagement when turn limit is hit (keeps going until task_complete or user abort)
 
-</details>
 
 
-<details id="context-engineering">
-<summary><strong>Context Engineering</strong> — C = A(c_instr, c_know, c_tools, c_mem, c_state, c_query) structured assembly</summary>
 
 ## Context Engineering
 
@@ -806,11 +789,8 @@ Key design decisions grounded in research:
 
 Research provenance: grounded in "A Survey of Context Engineering for LLMs" (context assembly equation), "Modular Prompt Optimization" (section-local textual gradients), "Reasoning Up the Instruction Ladder" (priority hierarchy), "GEPA" (reflective prompt evolution), and "Prompt Flow Integrity" (least-privilege context passing).
 
-</details>
 
 
-<details id="model-tier-awareness">
-<summary><strong>Model-Tier Awareness</strong> — Dynamic tool sets, prompts, and limits that scale with model size</summary>
 
 ## Model-Tier Awareness
 
@@ -849,11 +829,8 @@ All context-dependent values scale automatically with the actual context window
 | Tool output cap | 2K-8K chars (scales with context) |
 | File read limits | 80-120 line cap for small/medium context windows |
 
-</details>
 
 
-<details id="auto-expanding-context-window">
-<summary><strong>Auto-Expanding Context Window</strong> — RAM/VRAM detection creates optimized model variants automatically</summary>
 
 ## Auto-Expanding Context Window
 
@@ -870,11 +847,8 @@ On startup and `/model` switch, Open Agents detects your RAM/VRAM and creates an
 | 8GB+ | 8K tokens |
 | < 8GB | 4K tokens |
 
-</details>
 
 
-<details id="tools-61">
-<summary><strong>Tools (61)</strong> — File I/O, shell, web, vision, memory, agents, COHERE, P2P, x402</summary>
 
 ## Tools (61)
 
@@ -984,11 +958,8 @@ The agent has 4 web tools. Pick the right one:
 
 **Structured extraction**: Pass `extract_schema='{"price": "number", "name": "string"}'` to `web_crawl` for best-effort regex-based field extraction from page content.
 
-</details>
 
 
-<details id="ralph-loop--iteration-first-design">
-<summary><strong>Ralph Loop — Iteration-First Design</strong> — Iterative retry loop where errors become learning data</summary>
 
 ## Ralph Loop — Iteration-First Design
 
@@ -1016,11 +987,8 @@ The loop tracks iteration history, generates completion reports saved to `.aiwg/
 /ralph-abort # Cancel running loop
 ```
 
-</details>
 
 
-<details id="task-control">
-<summary><strong>Task Control</strong> — Pause, stop, resume, destroy, and session context persistence</summary>
 
 ## Task Control
 
@@ -1063,11 +1031,8 @@ When you launch `oa` in a workspace that has saved session context from a previo
 
 Type `y` to restore — the previous session context will be prepended to your next task, giving the agent full continuity. Type `n` (or anything else) to start fresh. The prompt only appears on fresh starts, not on `/update` resumes (which auto-restore context).
 
-</details>
 
 
-<details id="cohere-cognitive-framework">
-<summary><strong>COHERE Cognitive Framework</strong> — 8-layer cognitive stack with distributed inference, identity, and reflection</summary>
 
 ## COHERE Cognitive Framework
 
@@ -1149,11 +1114,8 @@ The identity kernel maintains a persistent self-model across sessions, the refle
 | L8 | Darwin Gödel Machine: Open-Ended Self-Improvement (2025) | [arxiv:2505.22954](https://arxiv.org/abs/2505.22954) |
 | L8 | i-MENTOR: Intrinsic Motivation Exploration (2025) | [arxiv:2505.17621](https://arxiv.org/abs/2505.17621) |
 
-</details>
 
 
-<details id="agent-immune-system--constraint-enforcement--pressure-resistance">
-<summary><strong>Agent Immune System — Constraint Enforcement & Pressure Resistance</strong> — Behavioral constraints, pressure-aware decision gates, and audit logging</summary>
 
 ## Agent Immune System — Constraint Enforcement & Pressure Resistance
 
@@ -1216,11 +1178,8 @@ User (frustrated): "fix this broken shit"
 → Model fixes the architecture instead of adding a prompt hack
 ```
 
-</details>
 
 
-<details id="context-compaction--research-backed-memory-management">
-<summary><strong>Context Compaction — Research-Backed Memory Management</strong> — 6 compaction strategies, Memex archive, SNR tracking, deep context mode</summary>
 
 ## Context Compaction — Research-Backed Memory Management
 
@@ -1350,11 +1309,8 @@ Compaction summaries include:
 
 This ensures the agent can resume coherently after compaction without re-reading files or re-running commands.
 
-</details>
 
 
-<details id="personality-core--sac-framework-style-control">
-<summary><strong>Personality Core — SAC Framework Style Control</strong> — Five-dimension behavioral intensity from silent operator to teacher mode</summary>
 
 ## Personality Core — SAC Framework Style Control
 
@@ -1406,11 +1362,8 @@ The personality system draws on:
 - **Linear Personality Probing** ([arXiv:2512.17639](https://arxiv.org/abs/2512.17639)) — Prompt-level steering completely dominates activation-level interventions
 - **The Prompt Report** ([arXiv:2406.06608](https://arxiv.org/abs/2406.06608)) — Positive framing outperforms negated instructions for behavioral control
 
-</details>
 
 
-<details id="emotion-engine--affective-state-modulation">
-<summary><strong>Emotion Engine — Affective State Modulation</strong> — Circumplex affect model with valence, arousal, dominance axes</summary>
 
 ## Emotion Engine — Affective State Modulation
 
@@ -1476,11 +1429,8 @@ The emotion system is informed by peer-reviewed and preprint research:
 
 8. **EmotionBench** — Huang et al. ([arXiv:2308.03656](https://arxiv.org/abs/2308.03656), 2023). LLMs cannot maintain emotional state across turns implicitly — argues for explicit external mood state representation (which this engine implements).
 
-</details>
 
 
-<details id="voice-feedback-tts">
-<summary><strong>Voice Feedback (TTS)</strong> — GLaDOS, Overwatch, Kokoro, LuxTTS voice clone with emotion-driven prosody</summary>
 
 ## Voice Feedback (TTS)
 
@@ -1675,11 +1625,8 @@ The stochastic narration engine generates spoken descriptions of what the agent
 - **Personality scaling** — terse mode (level 1-2) uses short functional descriptions; conversational (3) adds natural phrasing; chatty (4-5) adds theatrical commentary and content references
 - **Natural silence** — on bland successes without notable content, ~40% of the time the narration is skipped entirely for a more natural rhythm
 
-</details>
 
 
-<details id="listen-mode--live-bidirectional-audio">
-<summary><strong>Listen Mode — Live Bidirectional Audio</strong> — Real-time Whisper transcription with hands-free auto-submit</summary>
 
 ## Listen Mode — Live Bidirectional Audio
 
@@ -1719,11 +1666,8 @@ The `transcribe-cli` dependency auto-installs in the background on first use. On
 
 **File transcription**: Drag-and-drop audio/video files (`.mp3`, `.wav`, `.mp4`, `.mkv`, etc.) onto the terminal to transcribe them. Results are saved to `.oa/transcripts/`.
 
-</details>
 
 
-<details id="vision--desktop-automation-moondream">
-<summary><strong>Vision & Desktop Automation (Moondream)</strong> — Local VLM for screenshots, point-and-click, browser automation, OCR</summary>
 
 ## Vision & Desktop Automation (Moondream)
 
@@ -1913,11 +1857,8 @@ Supports `apt` (Debian/Ubuntu), `dnf` (Fedora), `pacman` (Arch), and `brew` (mac
 - Moondream Station (local) — runs entirely on your machine, no API keys needed
 - Moondream Cloud API — set `MOONDREAM_API_KEY` for cloud inference
 
-</details>
 
 
-<details id="interactive-tui">
-<summary><strong>Interactive TUI</strong> — REPL with slash commands, mid-task steering, animated metrics bar</summary>
 
 ## Interactive TUI
 
@@ -2024,11 +1965,8 @@ The steering sub-agent uses the same model and backend as the main agent with `m
 - **LATS** (Zhou et al., 2024) — mid-execution replanning with user-provided value signals improves task completion on complex multi-step problems
 - **AutoGen** (Wu et al., 2023) — human-in-the-loop patterns work best when user messages are expanded into structured instructions, reducing ambiguity for the primary agent
 
-</details>
 
 
-<details id="telegram-bridge--sub-agent-per-chat">
-<summary><strong>Telegram Bridge — Sub-Agent Per Chat</strong> — Per-chat sub-agents with admin passthrough, media handling, and streaming</summary>
 
 ## Telegram Bridge — Sub-Agent Per Chat
 
@@ -2163,11 +2101,8 @@ The bridge automatically handles Telegram's rate limits (HTTP 429) with exponent
 
 **Combined with blessed mode** — `/full-send-bless` + `/telegram` creates a persistent, always-on agent that processes Telegram messages around the clock while keeping the model warm.
 
-</details>
 
 
-<details id="x402-payment-rails--nexus-p2p">
-<summary><strong>x402 Payment Rails & Nexus P2P</strong> — EVM wallets, EIP-3009 USDC transfers, metered inference, budget policies</summary>
 
 ## x402 Payment Rails & Nexus P2P
 
@@ -2228,11 +2163,8 @@ nexus(action='budget_set', auto_approve_below='0.01') # Auto-approve micropayme
 - All outbound messages scanned for key material before sending
 - Keys NEVER appear in tool output, logs, or LLM context
 
-</details>
 
 
-<details id="sponsored-inference--share-your-gpu-with-the-world">
-<summary><strong>Sponsored Inference — Share Your GPU With the World</strong> — 5-step wizard to share models via secure branded relay</summary>
 
 ## Sponsored Inference — Share Your GPU With the World
 
@@ -2330,11 +2262,8 @@ Three independent layers prevent remote peers from accessing destructive Ollama
 
 The `--full` flag is required to grant remote peers model management access. Sponsor mode always blocks destructive operations regardless of flags. Tool definitions are now forwarded through all relay paths (v0.186.68+).
 
-</details>
 
 
-<details id="cohere-distributed-mind">
-<summary><strong>COHERE Distributed Mind</strong> — Multi-node mesh with NATS pub/sub, peer review, collective learning</summary>
 
 ## COHERE Distributed Mind
 
@@ -2402,11 +2331,8 @@ Inbound queries are scanned for prompt injection attempts before processing:
 - Remote constraints from peer nodes (CM-07, published every 5 minutes)
 - Blocked queries increment `queriesErrors` and are silently dropped
 
-</details>
 
 
-<details id="dream-mode--creative-idle-exploration">
-<summary><strong>Dream Mode — Creative Idle Exploration</strong> — NREM/REM sleep cycles with autoresearch swarm on GPU</summary>
 
 ## Dream Mode — Creative Idle Exploration
 
@@ -2484,11 +2410,8 @@ If the Python scripts are invoked directly (without `uv run`), they self-bootstr
 
 If no GPU is detected, the REM stage falls back to the standard multi-agent creative exploration (Visionary + Pragmatist + Cross-Pollinator + Synthesizer).
 
-</details>
 
 
-<details id="blessed-mode--infinite-warm-loop">
-<summary><strong>Blessed Mode — Infinite Warm Loop</strong> — Keep model warm in VRAM, auto-cycle tasks, Default Mode Network</summary>
 
 ## Blessed Mode — Infinite Warm Loop
 
@@ -2529,11 +2452,8 @@ Each DMN cycle runs a lightweight LLM agent (15 max turns, temperature 0.4) with
 
 **Research basis**: Reflexion ([arXiv:2303.11366](https://arxiv.org/abs/2303.11366)), Self-Rewarding LMs ([arXiv:2401.10020](https://arxiv.org/abs/2401.10020)), Generative Agents ([arXiv:2304.03442](https://arxiv.org/abs/2304.03442)), STOP ([arXiv:2310.02226](https://arxiv.org/abs/2310.02226)), Voyager ([arXiv:2305.16291](https://arxiv.org/abs/2305.16291))
 
-</details>
 
 
-<details id="docker-sandbox--collective-intelligence">
-<summary><strong>Docker Sandbox & Collective Intelligence</strong> — Container isolation, multi-agent testbed, self-play loop</summary>
 
 ## Docker Sandbox & Collective Intelligence
 
@@ -2652,11 +2572,8 @@ Nodes share identity kernel updates via `nexus.cohere.kernel.delta` on NATS. Ado
 4. **Tool Use = Quality** — Agents using `web_search` produced current, verifiable data. Non-tool responses were generic.
 5. **Identity Divergence** — Different task exposure → different specializations. Intern gained `web-research` from heavy search; Director gained nothing (still loading).
 
-</details>
 
 
-<details id="code-sandbox">
-<summary><strong>Code Sandbox</strong> — Isolated JS, Python, Bash, TypeScript execution in subprocess or Docker</summary>
 
 ## Code Sandbox
 
@@ -2676,11 +2593,8 @@ Supports JavaScript, TypeScript, Python, and Bash. Two execution modes:
 - **Subprocess** (default) — runs in a child process with timeout and output limits
 - **Docker** — runs in an isolated container when `docker` is available
 
-</details>
 
 
-<details id="structured-data-tools">
-<summary><strong>Structured Data Tools</strong> — Generate and parse CSV, TSV, JSON, Markdown tables, Excel files</summary>
 
 ## Structured Data Tools
 
@@ -2710,11 +2624,8 @@ Agent: read_structured_file(path="report.md")
 
 Detects binary formats (XLSX, PDF, DOCX) and suggests conversion tools.
 
-</details>
 
 
-<details id="multi-provider-web-search">
-<summary><strong>Multi-Provider Web Search</strong> — DuckDuckGo, Tavily, and Jina AI with auto-detection</summary>
 
 ## Multi-Provider Web Search
 
@@ -2733,11 +2644,8 @@ export TAVILY_API_KEY=tvly-... # Enable Tavily (optional)
 export JINA_API_KEY=jina_... # Enable Jina AI (optional)
 ```
 
-</details>
 
 
-<details id="task-templates">
-<summary><strong>Task Templates</strong> — Specialized system prompts for code, document, analysis, and plan tasks</summary>
 
 ## Task Templates
 
@@ -2752,11 +2660,8 @@ Set a task type to get specialized system prompts, recommended tools, and output
 /task-type plan # Planning — emphasizes steps, dependencies, risks
 ```
 
-</details>
 
 
-<details id="human-expert-speed-ratio">
-<summary><strong>Human Expert Speed Ratio</strong> — Real-time Exp: Nx gauge calibrated across 47 tool baselines</summary>
 
 ## Human Expert Speed Ratio
 
@@ -2799,11 +2704,8 @@ Color coding: green (2x+ faster), yellow (1-2x, comparable), red (<1x, slower th
 
 All 47 tools have calibrated baselines ranging from 3s (`task_stop`) to 180s (`codebase_map`). Unknown tools default to 20s.
 
-</details>
 
 
-<details id="cost-tracking--session-metrics">
-<summary><strong>Cost Tracking & Session Metrics</strong> — Token cost estimation for 15+ providers with LLM-as-judge evaluation</summary>
 
 ## Cost Tracking & Session Metrics
 
@@ -2821,11 +2723,8 @@ Cost tracking supports 15+ providers including Groq, Together AI, OpenRouter, Fi
 
 Work evaluation uses five task-type-specific rubrics (code, document, analysis, plan, general) scoring correctness, completeness, efficiency, code quality, and communication on a 1-5 scale.
 
-</details>
 
 
-<details id="configuration">
-<summary><strong>Configuration</strong> — CLI flags, env vars, config files, project context, and .oa/ directory</summary>
 
 ## Configuration
 
@@ -2859,11 +2758,8 @@ Create `AGENTS.md`, `OA.md`, or `.open-agents.md` in your project root for agent
 └── pending-task.json # Saved task state for /stop and /update resume
 ```
 
-</details>
 
 
-<details id="model-support">
-<summary><strong>Model Support</strong> — Qwen3.5-122B primary target, any Ollama or OpenAI-compatible model</summary>
 
 ## Model Support
 
@@ -2879,11 +2775,8 @@ oa --backend vllm --backend-url http://localhost:8000/v1 "add tests"
 oa --backend-url http://10.0.0.5:11434 "refactor auth"
 ```
 
-</details>
 
 
-<details id="supported-inference-providers">
-<summary><strong>Supported Inference Providers</strong> — 14 providers from local Ollama to Groq, Chutes, OpenRouter, and P2P mesh</summary>
 
 ## Supported Inference Providers
 
@@ -3043,11 +2936,8 @@ When you've used multiple endpoints, the agent automatically builds a failover c
 
 No configuration needed — the cascade is built from your endpoint usage history. Works across local Ollama, cloud providers, and P2P peers.
 
-</details>
 
 
-<details id="evaluation-suite">
-<summary><strong>Evaluation Suite</strong> — 23 web nav + 46 coding + 35 enterprise tasks with pass^k reliability</summary>
 
 ## Evaluation Suite
 
@@ -3256,11 +3146,8 @@ The PoT (Program-of-Thought) guidance achieves **100% code generation rate** —
 - **~80 tokens of prompt additions** (PoT math guidance + search-when-uncertain) took the eval from 41.2% to 100% across all tiers — no fine-tuning required.
 - 4B models match 9B/27B on structured domain tasks (healthcare, DevOps, e-commerce) but need search tools for specialized regulatory knowledge.
 
-</details>
 
 
-<details id="aiwg-integration">
-<summary><strong>AIWG Integration</strong> — AI-augmented SDLC with 85+ agents, structured memory, and traceability</summary>
 
 ## AIWG Integration
 
@@ -3281,11 +3168,8 @@ oa "analyze this project's SDLC health and set up documentation"
 | **85+ Agents** | Specialized AI personas (Test Engineer, Security Auditor, API Designer) |
 | **Traceability** | @-mention system links requirements to code to tests |
 
-</details>
 
 
-<details id="research-citations">
-<summary><strong>Research Citations</strong> — 32 papers (2023-2026) grounding self-play, memory, identity, and containers</summary>
 
 ## Research Citations
 
@@ -3340,11 +3224,8 @@ The COHERE collective intelligence system, self-play idle loop, identity evoluti
 | LatentMAS: Latent-Space Collaboration | [2511.20639](https://arxiv.org/abs/2511.20639) | Nov 2025 | Future: 4x faster, 70-84% token reduction |
 | Agent-Kernel Microkernel Architecture | [2512.01610](https://arxiv.org/abs/2512.01610) | Dec 2025 | Architecture: 10k agent coordination |
 
-</details>
 
 
-<details id="license">
-<summary><strong>License</strong> — CC BY-NC 4.0 with enterprise licensing available</summary>
 
 ## License
 
@@ -3354,4 +3235,3 @@ The COHERE collective intelligence system, self-play idle loop, identity evoluti
 
 Free for non-commercial use. For enterprise/commercial licensing, contact [zoomerconsulting.com](https://zoomerconsulting.com).
 
-</details>
package/package.json
CHANGED