@framers/agentos 0.5.9 → 0.5.12
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +137 -408
- package/dist/api/agency.d.ts.map +1 -1
- package/dist/api/agency.js +0 -2
- package/dist/api/agency.js.map +1 -1
- package/dist/api/runtime/AgentOSOrchestrator.d.ts.map +1 -1
- package/dist/api/runtime/AgentOSOrchestrator.js +6 -3
- package/dist/api/runtime/AgentOSOrchestrator.js.map +1 -1
- package/dist/api/runtime/strategies/debate.d.ts.map +1 -1
- package/dist/api/runtime/strategies/debate.js +64 -21
- package/dist/api/runtime/strategies/debate.js.map +1 -1
- package/dist/api/runtime/strategies/graph.d.ts.map +1 -1
- package/dist/api/runtime/strategies/graph.js +11 -25
- package/dist/api/runtime/strategies/graph.js.map +1 -1
- package/dist/api/runtime/strategies/hierarchical.d.ts.map +1 -1
- package/dist/api/runtime/strategies/hierarchical.js +27 -7
- package/dist/api/runtime/strategies/hierarchical.js.map +1 -1
- package/dist/api/runtime/strategies/parallel.d.ts.map +1 -1
- package/dist/api/runtime/strategies/parallel.js +4 -8
- package/dist/api/runtime/strategies/parallel.js.map +1 -1
- package/dist/api/runtime/strategies/review-loop.d.ts +25 -0
- package/dist/api/runtime/strategies/review-loop.d.ts.map +1 -1
- package/dist/api/runtime/strategies/review-loop.js +5 -13
- package/dist/api/runtime/strategies/review-loop.js.map +1 -1
- package/dist/api/runtime/strategies/sequential.d.ts.map +1 -1
- package/dist/api/runtime/strategies/sequential.js +11 -25
- package/dist/api/runtime/strategies/sequential.js.map +1 -1
- package/dist/api/runtime/strategies/shared.d.ts +45 -8
- package/dist/api/runtime/strategies/shared.d.ts.map +1 -1
- package/dist/api/runtime/strategies/shared.js +47 -8
- package/dist/api/runtime/strategies/shared.js.map +1 -1
- package/dist/api/types.d.ts +16 -0
- package/dist/api/types.d.ts.map +1 -1
- package/dist/api/types.js.map +1 -1
- package/dist/core/llm/providers/implementations/OpenAIProvider.d.ts.map +1 -1
- package/dist/core/llm/providers/implementations/OpenAIProvider.js +26 -0
- package/dist/core/llm/providers/implementations/OpenAIProvider.js.map +1 -1
- package/dist/core/llm/providers/implementations/OpenRouterProvider.d.ts.map +1 -1
- package/dist/core/llm/providers/implementations/OpenRouterProvider.js +31 -3
- package/dist/core/llm/providers/implementations/OpenRouterProvider.js.map +1 -1
- package/dist/orchestration/builders/AgentGraph.d.ts +13 -1
- package/dist/orchestration/builders/AgentGraph.d.ts.map +1 -1
- package/dist/orchestration/builders/AgentGraph.js +5 -3
- package/dist/orchestration/builders/AgentGraph.js.map +1 -1
- package/dist/orchestration/builders/MissionBuilder.d.ts +12 -1
- package/dist/orchestration/builders/MissionBuilder.d.ts.map +1 -1
- package/dist/orchestration/builders/MissionBuilder.js +6 -3
- package/dist/orchestration/builders/MissionBuilder.js.map +1 -1
- package/dist/orchestration/builders/WorkflowBuilder.d.ts +39 -1
- package/dist/orchestration/builders/WorkflowBuilder.d.ts.map +1 -1
- package/dist/orchestration/builders/WorkflowBuilder.js +18 -3
- package/dist/orchestration/builders/WorkflowBuilder.js.map +1 -1
- package/dist/orchestration/ir/types.d.ts +13 -0
- package/dist/orchestration/ir/types.d.ts.map +1 -1
- package/dist/orchestration/runtime/GraphRuntime.d.ts.map +1 -1
- package/dist/orchestration/runtime/GraphRuntime.js +34 -1
- package/dist/orchestration/runtime/GraphRuntime.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,12 +1,14 @@
 <div align="center">
 
 <a href="https://agentos.sh">
-<img src="https://raw.githubusercontent.com/framersai/agentos/master/assets/agentos-primary-no-tagline-transparent-2x.png" alt="AgentOS — TypeScript AI Agent Framework" height="100" />
+<img src="https://raw.githubusercontent.com/framersai/agentos/master/assets/agentos-primary-no-tagline-transparent-2x.png" alt="AgentOS — TypeScript AI Agent Framework with Cognitive Memory" height="100" />
 </a>
 
 <br />
 
-**Open-
+# **AgentOS** — Open-Source TypeScript AI Agent Runtime with Cognitive Memory, HEXACO Personality, and Runtime Tool Forging
+
+**85.6% on LongMemEval-S** at $0.0090/correct, +1.4 above Mastra OM gpt-4o (84.23%) at the matched reader · **70.2% on LongMemEval-M** (1.5M-token variant), the only open-source library on the public record above 65% on M with publicly reproducible methodology · 16 LLM providers · 8 neuroscience-backed memory mechanisms · Apache-2.0
 
 [](https://www.npmjs.com/package/@framers/agentos)
 [](https://github.com/framersai/agentos/actions/workflows/ci.yml)
@@ -16,316 +18,150 @@
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://wilds.ai/discord)
 
-[Website](https://agentos.sh) · [Docs](https://docs.agentos.sh) · [npm](https://www.npmjs.com/package/@framers/agentos) · [
+[**Benchmarks**](https://github.com/framersai/agentos-bench/blob/master/results/LEADERBOARD.md) · [Website](https://agentos.sh) · [Docs](https://docs.agentos.sh) · [npm](https://www.npmjs.com/package/@framers/agentos) · [Discord](https://wilds.ai/discord) · [Blog](https://docs.agentos.sh/blog)
 
 </div>
 
 ---
 
-## What is AgentOS?
-
-AgentOS is a TypeScript runtime for building AI agents that remember, adapt, and create new tools at runtime. Each agent is a **Generalized Mind Instance** (GMI) with its own personality, memory lifecycle, and behavioral adaptation loop.
-
-### Why AgentOS over alternatives?
-
-| vs. | AgentOS differentiator |
-|-----|------------------------|
-| **LangChain / LangGraph** | Cognitive memory (8 neuroscience-backed mechanisms), HEXACO personality, runtime tool forging |
-| **Vercel AI SDK** | Multi-agent teams (6 strategies), full RAG pipeline (7 vector backends), guardrails, voice/telephony |
-| **CrewAI / Mastra** | Unified orchestration (workflow DAGs + agent graphs + goal-driven missions), personality-driven routing |
-
-> **Full comparison:** [AgentOS vs LangGraph vs CrewAI vs Mastra](https://docs.agentos.sh/blog/2026/02/20/agentos-vs-langgraph-vs-crewai)
-
----
-
-## Classifier-Driven Memory Pipeline
-
-Most memory libraries retrieve on every query. AgentOS gates memory through three independent LLM-as-judge classifiers, so trivial queries skip retrieval entirely, queries that need memory get the right architecture, and the right reader handles each category.
-
-```
-User query
-    │
-    ▼
-┌──────────────────────────────────┐
-│ Stage 1: QueryClassifier         │ gpt-5-mini few-shot, ~$0.0001 / query
-│ Memory needed at all?            │
-│   T0 = none ────────────────► answer from context, skip retrieval
-│   T1+ = simple/moderate/complex  │
-└──────────────────────────────────┘
-    │ (T1+ only)
-    ▼
-┌──────────────────────────────────┐
-│ Stage 2: MemoryRouter            │ reuses Stage 1 classification
-│ Which retrieval architecture?    │
-│   canonical-hybrid · OM-v10 · OM-v11
-└──────────────────────────────────┘
-    │
-    ▼
-┌──────────────────────────────────┐
-│ Stage 3: ReaderRouter            │ reuses Stage 1 classification
-│ Which reader tier?               │
-│   gpt-4o (TR/SSU) · gpt-5-mini (SSA/SSP/KU/MS)
-└──────────────────────────────────┘
-    │
-    ▼
-Grounded answer
-```
-
-Each stage is a small LLM-as-judge classifier (gpt-5-mini, ~$0.0001-0.0014 per call). Each stage is independent and shippable on its own. Stages 2 and 3 reuse the Stage 1 classification output, so the full pipeline costs **one classifier call per query**, not three.
-
-**Validated on LongMemEval-S Phase B at N=500, gpt-4o judge, bootstrap CI 10k resamples**: 85.6% [82.4%, 88.6%] accuracy at $0.0090 per correct, 4-second average latency. Beats Mastra OM gpt-4o (84.2% published) on accuracy. Beats EmergenceMem Simple Fast (80.6% measured apples-to-apples in our harness) by +5.0 pp on accuracy at 6.5× lower cost-per-correct.
-
-| Primitive | Source | Decision per query | Cost per call |
-|---|---|---|---:|
-| `QueryClassifier` | `@framers/agentos/query-router` | T0/none vs T1/simple vs T2/moderate vs T3/complex | ~$0.0001 |
-| `MemoryRouter` | `@framers/agentos/memory-router` | canonical-hybrid vs observational-memory-v10 vs observational-memory-v11 | reuses Stage 1 output |
-| `ReaderRouter` | `@framers/agentos/memory-router` (v0.5.5) | gpt-4o vs gpt-5-mini per category | reuses Stage 1 output |
-
-The pipeline is novel because the **T0 / no-memory gate** removes retrieval entirely for queries that don't need it (greetings, small talk, general knowledge), saving the embedding+rerank+reader cost on a substantial fraction of typical agent traffic. The per-category dispatch then routes the remaining queries to the architecture and reader best-suited for the question type, calibrated from per-category Phase B accuracy data on LongMemEval-S.
-
-**[Full benchmark suite + reproducible run JSONs →](https://github.com/framersai/agentos-bench)** · **[Cognitive Pipeline docs →](https://docs.agentos.sh/features/cognitive-pipeline)** · **[Query Router docs →](https://docs.agentos.sh/features/query-routing)** · **[Memory Router docs →](https://docs.agentos.sh/features/memory-router)**
-
----
-
-## Memory Benchmarks at Matched Reader
-
-Honest, apples-to-apples comparison: same reader (`gpt-4o`), same dataset, same N=500 Phase B methodology, same `gpt-4o-2024-08-06` judge with `rubric 2026-04-18.1` (FPR 1% [0%, 3%] at n=100). Cross-provider configurations (e.g. Gemini observers) are not included because their results cannot be reproduced from public methodology disclosures.
-
-### LongMemEval-S Phase B (115K tokens, 50 sessions per haystack)
-
-| System (gpt-4o reader, Phase B N=500) | Accuracy | 95% CI | $/correct | p50 latency | Source |
-|---|---:|---|---:|---:|---|
-| EmergenceMem Internal | 86.0% | not published | not published | 5,650 ms | [emergence.ai](https://www.emergence.ai/blog/sota-on-longmemeval-with-rag) |
-| **🚀 AgentOS canonical-hybrid + reader-router** | **85.6%** | **[82.4%, 88.6%]** | **$0.0090** | **3,558 ms** | [85.6% post](https://docs.agentos.sh/blog/2026/04/28/reader-router-pareto-win) |
-| Mastra OM gpt-4o (gemini-flash observer) | 84.23% | not published | not published | not published | [mastra.ai](https://mastra.ai/research/observational-memory) |
-| Supermemory gpt-4o | 81.6% | not published | not published | not published | [supermemory.ai](https://supermemory.ai/research/) |
-| EmergenceMem Simple Fast (apples-to-apples in our harness) | 80.6% | measured | $0.0586 | not published | v2 vendor reproduction |
-| Zep self-reported / independently reproduced | 71.2% / 63.8% | not published | not published | 632 ms p95 search | [self](https://blog.getzep.com/state-of-the-art-agent-memory/) / [arXiv:2512.13564](https://arxiv.org/abs/2512.13564) |
-
-**+1.4 pp accuracy over Mastra OM gpt-4o at the same reader.** Statistically tied with EmergenceMem Internal (86.0% point estimate sits inside our 95% CI [82.4%, 88.6%]). Median latency vs EmergenceMem is **1.6× faster** (3.558 s vs 5.650 s).
-
-### LongMemEval-M Phase B (1.5M tokens, 500 sessions per haystack)
-
-The harder variant. M's haystacks exceed every production context window (GPT-4o 128K, Claude Opus 200K, Gemini 3 Pro 1M). Most memory vendors stop at S because raw long-context fits there.
-
-| System | Accuracy | 95% CI | License | Source |
-|---|---:|---|---|---|
-| AgentBrain | 71.7% (Test 0) | not published | closed-source SaaS | [github.com/AgentBrainHQ](https://github.com/AgentBrainHQ) |
-| **🚀 AgentOS (sem-embed + reader-router + top-K=5)** | **70.2%** | **[66.0%, 74.0%]** | **MIT** | [70.2% post](https://docs.agentos.sh/blog/2026/04/29/longmemeval-m-70-with-topk5) |
-| LongMemEval paper academic baseline | 65.7% | not published | open repo | [Wu et al., ICLR 2025, Table 3](https://arxiv.org/abs/2410.10813) |
-| Mem0 v3, Mastra OM, Hindsight, Zep, EmergenceMem, Supermemory, MemMachine, Memoria, agentmemory, Backboard, ByteRover, Letta | not published | — | various | reports S only |
-
-**Statistically tied with AgentBrain's closed-source SaaS** (their 71.7% sits inside our CI [66.0%, 74.0%]). **+4.5 pp above the LongMemEval paper's published academic ceiling.** **First open-source memory library on the public record above 65% on M with full methodology disclosure** (bootstrap CIs, per-case run JSONs, reproducible CLI, MIT-licensed).
-
-### Methodology disclosure (12 axes most vendors omit)
-
-| Axis | AgentOS | Most vendors |
-|---|:-:|:-:|
-| Aggregate accuracy | yes | yes |
-| 95% bootstrap CI on headline | yes | no |
-| Per-category 95% CI | yes | no |
-| Reader model disclosed | yes | mostly |
-| Observer / ingest model disclosed | yes | mostly |
-| USD cost per correct | yes | no |
-| Latency avg / p50 / p95 | yes | rarely |
-| Per-category breakdown | yes | sometimes |
-| Open-source benchmark runner | yes | rarely |
-| Per-case run JSONs at fixed seed | yes | no |
-| Judge-adversarial FPR probe | yes (1% S, 2% M, 0% LOCOMO) | no |
-| Matched-reader cross-vendor table | yes | partial |
-
-The full audit framework is at [Memory Benchmark Transparency Audit](https://docs.agentos.sh/blog/2026/04/24/memory-benchmark-transparency-audit). Every run referenced above ships with a per-case run JSON at `seed=42`.
-
----
-
-## See It In Action
-
-### 🌀 Paracosm — AI Agent Swarm Simulation
-
-Define any scenario as JSON. Run it with AI commanders that have different HEXACO personalities. Same starting conditions, different decisions, divergent civilizations. Built on AgentOS.
-
-```bash
-npm install paracosm
-```
-
-**[Live Demo](https://paracosm.agentos.sh/sim)** · **[GitHub](https://github.com/framersai/paracosm)** · **[npm](https://www.npmjs.com/package/paracosm)** · **[Landing Page](https://paracosm.agentos.sh)**
-
----
-
 ## Install
 
 ```bash
 npm install @framers/agentos
 ```
 
-
-
-```bash
-# Environment variables (recommended for production)
-export OPENAI_API_KEY=sk-...
-export ANTHROPIC_API_KEY=sk-ant-...
-export GEMINI_API_KEY=AIza...
+```typescript
+import { agent } from '@framers/agentos';
 
-
-
-
+const tutor = agent({
+  provider: 'anthropic',
+  instructions: 'You are a patient CS tutor.',
+  personality: { openness: 0.9, conscientiousness: 0.95 },
+  memory: { types: ['episodic', 'semantic'], working: { enabled: true } },
+});
 
-
-
-await
+const session = tutor.session('student-1');
+await session.send('Explain recursion with an analogy.');
+await session.send('Can you expand on that?'); // remembers context
 ```
 
-
+[Full quickstart](https://docs.agentos.sh/getting-started) · [Examples cookbook](https://docs.agentos.sh/getting-started/examples) · [API reference](https://docs.agentos.sh/api)
 
 ---
 
-##
+## Memory Benchmarks at Matched Reader
 
-
+Same `gpt-4o` reader, same dataset, same `gpt-4o-2024-08-06` judge across every row. Cross-provider configurations are excluded because they cannot be reproduced from public methodology disclosures.
 
-
-import { generateText } from '@framers/agentos';
+### LongMemEval-S (115K tokens, 50 sessions)
 
-
-
-
-
+| System (gpt-4o reader) | Accuracy | $/correct | p50 latency | Source |
+|---|---:|---:|---:|---|
+| EmergenceMem Internal | 86.0% | not published | 5,650 ms | [emergence.ai](https://www.emergence.ai/blog/sota-on-longmemeval-with-rag) |
+| **🚀 AgentOS canonical-hybrid + reader-router** | **85.6%** | **$0.0090** | **3,558 ms** | [post](https://docs.agentos.sh/blog/2026/04/28/reader-router-pareto-win) |
+| Mastra OM gpt-4o (gemini-flash observer) | 84.23% | not published | not published | [mastra.ai](https://mastra.ai/research/observational-memory) |
+| Supermemory gpt-4o | 81.6% | not published | not published | [supermemory.ai](https://supermemory.ai/research/) |
+| EmergenceMem Simple Fast (rerun in agentos-bench) | 80.6% | $0.0586 | 3,703 ms | [adapter](https://github.com/framersai/agentos-bench/blob/master/vendors/emergence-simple-fast/) |
+| Zep self / independent reproduction | 71.2% / 63.8% | not published | not published | [self](https://blog.getzep.com/state-of-the-art-agent-memory/) / [arXiv](https://arxiv.org/abs/2512.13564) |
 
-
-const { text: claude } = await generateText({
-  provider: 'anthropic',
-  prompt: 'Compare TCP and UDP.',
-});
-```
+**+1.4 points above Mastra OM gpt-4o (84.23%) at the matched reader.** Among open-source memory libraries that publish at gpt-4o with publicly reproducible runs (per-case run JSONs at fixed seed, single-CLI reproduction), AgentOS at 85.6% is the highest published number. EmergenceMem Internal posts 86.0% (0.4 above us) but does not publish per-case results or a reproducible CLI. AgentOS p50 latency is 3,558 ms vs EmergenceMem's published median of 5,650 ms.
 
-
+Notes on cross-provider numbers excluded from this table: Mastra also publishes 94.87% with a gpt-5-mini reader plus gemini-2.5-flash observer (cross-provider); agentmemory publishes 96.2% with a Claude Opus 4.6 reader; MemMachine publishes 93.0% with a GPT-5-mini reader; Hindsight publishes 91.4% with an unspecified stronger backbone. None of these are at the matched gpt-4o reader, and most do not publish full methodology details (judge model, dataset version, per-case results, single-CLI reproduction).
 
-
-import { generateText } from '@framers/agentos';
+**Cost at scale**: $0.0090 per memory-grounded answer = $9 per 1,000 RAG calls. A chatbot averaging 5 RAG calls per conversation across 1,000 conversations costs ~$45.
 
-
-  provider: 'anthropic',
-  prompt: 'Compare TCP and UDP.',
-});
-// Anthropic primary, falls through to OpenAI / Gemini / OpenRouter / etc. on retryable errors
-```
+### LongMemEval-M (1.5M tokens, 500 sessions)
 
-
+The harder variant. M's haystacks exceed every production context window. Most vendors stop at S because raw long-context fits there.
 
-
-
-
-
-
-
-
+| System | Accuracy | License | Source |
+|---|---:|---|---|
+| LongMemEval paper, strongest GPT-4o (round, Top-10) | 72.0% | open repo | [Wu et al., ICLR 2025, Table 3](https://arxiv.org/abs/2410.10813) |
+| AgentBrain | 71.7% | closed-source SaaS | [github.com/AgentBrainHQ](https://github.com/AgentBrainHQ) |
+| LongMemEval paper, strongest GPT-4o at Top-5 (session) | 71.4% | open repo | [Wu et al., ICLR 2025, Table 3](https://arxiv.org/abs/2410.10813) |
+| **🚀 AgentOS** (sem-embed + reader-router + top-K=5) | **70.2%** | **Apache-2.0** | [post](https://docs.agentos.sh/blog/2026/04/29/longmemeval-m-70-with-topk5) |
+| LongMemEval paper, GPT-4o at Top-5 (round) | 65.7% | open repo | [Wu et al., ICLR 2025, Table 3](https://arxiv.org/abs/2410.10813) |
+| Mem0 v3, Mastra, Hindsight, Zep, EmergenceMem, Supermemory, Letta, others | not published | various | reports S only |
 
-
+**Competitive with the strongest published M results in the LongMemEval paper.** At matched Top-5 retrieval, AgentOS at 70.2% is +4.5 points above the round-level configuration (65.7%) and 1.2 points below the session-level configuration (71.4%); the paper's strongest GPT-4o result overall is 72.0% at round-level Top-10. Among open-source memory libraries with publicly reproducible runs (per-case run JSONs at fixed seed, single-CLI reproduction), AgentOS is the only one on the public record above 65% on M.
 
-
-import { generateText, buildFallbackChain } from '@framers/agentos';
+> **[Full benchmarks page →](https://github.com/framersai/agentos-bench/blob/master/results/LEADERBOARD.md)** · **[Reproducible run JSONs →](https://github.com/framersai/agentos-bench/tree/master/results/runs)** · **[Methodology audit →](https://agentos.sh/en/blog/agentos-memory-sota-longmemeval/)**
 
-
-  provider: 'anthropic',
-  prompt: 'Compare TCP and UDP.',
-  fallbackProviders: [
-    { provider: 'openai', model: 'gpt-4o-mini' },
-    { provider: 'openrouter' },
-  ],
-});
-```
+---
 
-
+## 📄 Technical Whitepaper · Coming Soon
 
-
-import { streamText } from '@framers/agentos';
+The full architecture and benchmark methodology, written for engineers and researchers who want a citable PDF instead of scrolling docs. Cognitive memory pipeline, classifier-driven dispatch, HEXACO personality modulation, runtime tool forging, full LongMemEval-S/M and LOCOMO benchmark methodology with confidence interval math, judge-FPR probes, per-stage retention metrics, and reproducibility recipes.
 
-
-
-
+| Covers | What's inside |
+|---|---|
+| **Architecture** | Generalized Mind Instances, IngestRouter / MemoryRouter / ReadRouter, 8 cognitive mechanisms with primary-source citations |
+| **Benchmarks** | LongMemEval-S 85.6%, LongMemEval-M 70.2%, vendor landscape, confidence interval methodology, judge FPR probes, full transparency stack |
+| **Reproducibility** | Per-case run JSONs at `--seed 42`, single-CLI reproduction, Apache-2.0 bench at [github.com/framersai/agentos-bench](https://github.com/framersai/agentos-bench) |
 
-
+**[Notify me when it drops →](mailto:team@frame.dev?subject=AgentOS%20Whitepaper%20Notify)** · **[Read the benchmarks now →](https://github.com/framersai/agentos-bench/blob/master/results/LEADERBOARD.md)** · **[Discord](https://wilds.ai/discord)**
 
-
-import { generateObject } from '@framers/agentos';
-import { z } from 'zod';
-
-const { object } = await generateObject({
-  provider: 'gemini',
-  schema: z.object({
-    sentiment: z.enum(['positive', 'negative', 'neutral']),
-    topics: z.array(z.string()),
-  }),
-  prompt: 'Analyze: "Great camera but disappointing battery."',
-});
-```
+---
 
-
+## Classifier-Driven Memory Pipeline
 
-
-import { agent } from '@framers/agentos';
+Most memory libraries retrieve on every query. AgentOS gates memory through three LLM-as-judge classifiers in a single shared pass, so trivial queries skip retrieval entirely and the rest get the right architecture and reader per category.
 
-
-
-
+```
+User query
+  │
+  ▼ Stage 1: QueryClassifier (gpt-5-mini, ~$0.0001/query)
+  │   T0=none ─────► answer from context, skip retrieval
+  │   T1+=needs memory
+  ▼ Stage 2: MemoryRouter → canonical-hybrid · OM-v10 · OM-v11
+  ▼ Stage 3: ReaderRouter → gpt-4o (TR/SSU) · gpt-5-mini (SSA/SSP/KU/MS)
+  ▼
+Grounded answer
 ```
 
-
+Stages 2 and 3 reuse the Stage 1 classification, so the full pipeline costs **one classifier call per query**, not three. **The T0 / no-memory gate is the novel piece**: removing retrieval entirely for greetings and small talk saves the embedding + rerank + reader cost on a substantial fraction of typical agent traffic.
 
-
-
-
-
-
-    openness: 0.9,
-    conscientiousness: 0.95,
-    agreeableness: 0.85,
-  },
-  memory: {
-    types: ['episodic', 'semantic'],
-    working: { enabled: true, maxTokens: 1200 },
-  },
-});
+| Primitive | Source | Decision |
+|---|---|---|
+| `QueryClassifier` | [`@framers/agentos/query-router`](https://docs.agentos.sh/features/query-routing) | T0/none vs T1/simple vs T2/moderate vs T3/complex |
+| `MemoryRouter` | [`@framers/agentos/memory-router`](https://docs.agentos.sh/features/memory-router) | canonical-hybrid vs observational-memory-v10 vs v11 |
+| `ReaderRouter` | [`@framers/agentos/memory-router`](https://docs.agentos.sh/features/memory-router) | gpt-4o vs gpt-5-mini per category |
 
-
-await session.send('Explain recursion with an analogy.');
-await session.send('Can you expand on that?'); // remembers context
-```
+[Cognitive Pipeline docs →](https://docs.agentos.sh/features/cognitive-pipeline) · [Architecture deep dive →](https://docs.agentos.sh/blog/2026/04/10/cognitive-memory-architecture-deep-dive) · [Beyond RAG →](https://docs.agentos.sh/blog/2026/03/31/cognitive-memory-beyond-rag)
 
-
+---
 
-
+## Why AgentOS
 
-
-
+| vs. | AgentOS differentiator |
+|---|---|
+| **LangChain / LangGraph** | Cognitive memory ([8 neuroscience-backed mechanisms](https://docs.agentos.sh/features/cognitive-memory)), HEXACO personality, runtime tool forging |
+| **Vercel AI SDK** | Multi-agent teams (6 strategies), 7 vector backends, [guardrails](https://docs.agentos.sh/features/guardrails-architecture), voice/telephony |
+| **CrewAI / Mastra** | Unified orchestration (DAGs + graphs + missions), personality-driven routing, **published reproducible numbers on LongMemEval-S (85.6%) and LongMemEval-M (70.2%) with full methodology disclosure** |
 
-
-  async getContext(text, opts) {
-    return { contextText: await recallRelevant(text, opts?.tokenBudget) };
-  },
-  async observe(role, text) {
-    await persist(role, text);
-  },
-};
+[Full framework comparison →](https://docs.agentos.sh/blog/2026/02/20/agentos-vs-langgraph-vs-crewai)
 
-
-  provider: 'anthropic',
-  instructions: 'You are a patient CS tutor.',
-  memoryProvider: myProvider,
-});
+---
 
-
-// recorded after. No session required.
-const stream = tutor.stream('Explain recursion.');
+## Key Features
 
-
-
-
-
+| Category | Highlights |
+|---|---|
+| **LLM Providers** | 16: OpenAI, Anthropic, Gemini, Groq, Ollama, OpenRouter, Together, Mistral, xAI, Claude/Gemini CLI, + 5 image/video |
+| **Cognitive Memory** | 8 mechanisms: reconsolidation, retrieval-induced forgetting, involuntary recall, FOK, gist extraction, schema encoding, source decay, emotion regulation |
+| **HEXACO Personality** | 6 traits modulate memory, retrieval bias, response style |
+| **RAG Pipeline** | 7 vector backends · 4 retrieval strategies · GraphRAG · HyDE · Cohere rerank-v3.5 |
+| **Multi-Agent Teams** | 6 coordination strategies · shared memory · inter-agent messaging · HITL gates |
+| **Orchestration** | `workflow()` DAGs · `AgentGraph` cycles · `mission()` goal-driven planning · checkpointing |
+| **Guardrails** | 5 security tiers · 6 packs (PII, ML classifiers, topicality, code safety, grounding, content policy) |
+| **Emergent Capabilities** | Runtime tool forging · 4 self-improvement tools · tiered promotion · skill export |
+| **Voice & Telephony** | ElevenLabs, Deepgram, Whisper · Twilio, Telnyx, Plivo |
+| **Channels** | 37 platform adapters (Telegram, Discord, Slack, WhatsApp, webchat, ...) |
+| **Observability** | OpenTelemetry · usage ledger · cost guard · circuit breaker |
 
-
+---
 
-
+## Multi-Agent in 6 Lines
 
 ```typescript
 import { agency } from '@framers/agentos';
@@ -334,180 +170,83 @@ const team = agency({
|
|
|
334
170
|
strategy: 'graph',
|
|
335
171
|
agents: {
|
|
336
172
|
researcher: { provider: 'anthropic', instructions: 'Find relevant facts.' },
|
|
337
|
-
writer: { provider: 'openai',
|
|
338
|
-
reviewer: { provider: 'gemini',
|
|
173
|
+
writer: { provider: 'openai', instructions: 'Summarize clearly.', dependsOn: ['researcher'] },
|
|
174
|
+
reviewer: { provider: 'gemini', instructions: 'Check accuracy.', dependsOn: ['writer'] },
|
|
339
175
|
},
|
|
340
176
|
});
|
|
341
177
|
|
|
342
178
|
const result = await team.generate('Compare TCP vs UDP for game networking.');
|
|
343
179
|
```

-
-
-### Multimodal
-
-```typescript
-import { generateImage, generateVideo, generateMusic, performOCR, embedText } from '@framers/agentos';
-
-const image = await generateImage({ provider: 'openai', prompt: 'Neon cityscape at sunset' });
-const video = await generateVideo({ prompt: 'Drone over misty forest' });
-const music = await generateMusic({ prompt: 'Lo-fi hip hop beat' });
-const ocr = await performOCR({ image: './receipt.png', strategy: 'progressive' });
-const embed = await embedText({ provider: 'openai', input: ['hello', 'world'] });
-```
-
-### Orchestration
+Strategies: `sequential` · `parallel` · `debate` · `review-loop` · `hierarchical` · `graph`. [Multi-agent docs →](https://docs.agentos.sh/features/multi-agent)

-
+---

-
-import { workflow, AgentGraph, mission } from '@framers/agentos/orchestration';
+## See It In Action

-
-const pipe = workflow('content').step('research', { tool: 'web_search' }).then('draft', { gmi: { instructions: '...' } }).compile();
+### 🌀 Paracosm — AI Agent Swarm Simulation

-
-const graph = new AgentGraph('review').addNode('draft', gmiNode({...})).addNode('review', judgeNode({...})).addEdge('draft','review').compile();
+Define any scenario as JSON. Run it with AI commanders that have different HEXACO personalities. Same starting conditions, different decisions, divergent civilizations. Built on AgentOS.

-
-
+```bash
+npm install paracosm
 ```

-
-
-AgentOS exposes related entry points at different depths. The shared config surface does not imply identical enforcement across them.
-
-- The lightweight `agent()` facade owns prompt assembly, sessions, personality shaping, hooks, tools, and usage-ledger forwarding.
-- `generateText()` and `streamText()` are the low-level generation helpers for provider control, native tool calling, and text-fallback tool loops.
-- The full `AgentOS` runtime and `agency()` own emergent tooling, guardrails, discovery, RAG initialization, permissions/security tiers, HITL, channels/voice, and provenance-aware orchestration.
-
----
-
-## Key Features
-
-| Category | Highlights |
-|----------|-----------|
-| **LLM Providers** | 16 providers: OpenAI, Anthropic, Gemini, Groq, Ollama, OpenRouter, Together, Mistral, xAI, Claude CLI, Gemini CLI, + 5 image/video |
-| **Cognitive Memory** | 8 neuroscience-backed mechanisms (reconsolidation, RIF, involuntary recall, FOK, gist extraction, schema encoding, source decay, emotion regulation) |
-| **HEXACO Personality** | 6 traits modulate memory, retrieval bias, response style — agents have consistent identity |
-| **RAG Pipeline** | 7 vector backends (InMemory, SQL, HNSW, Qdrant, Neo4j, pgvector, Pinecone) · 4 retrieval strategies · GraphRAG |
-| **Multi-Agent Teams** | 6 coordination strategies · shared memory · inter-agent messaging · HITL approval gates |
-| **Orchestration** | `workflow()` DAGs · `AgentGraph` cycles/subgraphs · `mission()` goal-driven planning · persistent checkpointing |
-| **Guardrails** | 5 security tiers · 6 packs (PII redaction, ML classifiers, topicality, code safety, grounding, content policy) |
-| **Emergent Capabilities** | Runtime tool forging · 4 self-improvement tools · tiered promotion (session → agent → shared) · skill export |
-| **Capability Discovery** | Semantic per-turn tool selection · ~90% token reduction · 3-tier context model · Neo4j graph backend |
-| **Skills** | 88 curated skills · 3-tier architecture (engine, content, catalog SDK) · auto-update on install |
-| **Voice & Telephony** | ElevenLabs, Deepgram, OpenAI Whisper · Twilio, Telnyx, Plivo |
-| **Channels** | 37 platform adapters (Telegram, Discord, Slack, WhatsApp, and more) |
-| **Structured Output** | Zod-validated JSON extraction with retry · provider-native structured output |
-| **Observability** | OpenTelemetry traces/metrics · usage ledger · cost guard · circuit breaker |
+[Live Demo](https://paracosm.agentos.sh/sim) · [GitHub](https://github.com/framersai/paracosm) · [npm](https://www.npmjs.com/package/paracosm)

 ---

-##
-
-| Provider | Text Model | Image Model | Env Var |
-|---|---|---|---|
-| `openai` | gpt-4o | gpt-image-1 | `OPENAI_API_KEY` |
-| `anthropic` | claude-sonnet-4 | — | `ANTHROPIC_API_KEY` |
-| `gemini` | gemini-2.5-flash | — | `GEMINI_API_KEY` |
-| `groq` | llama-3.3-70b | — | `GROQ_API_KEY` |
-| `ollama` | llama3.2 | stable-diffusion | `OLLAMA_BASE_URL` |
-| `openrouter` | openai/gpt-4o | — | `OPENROUTER_API_KEY` |
-| `together` | Llama-3.1-70B | — | `TOGETHER_API_KEY` |
-| `mistral` | mistral-large | — | `MISTRAL_API_KEY` |
-| `xai` | grok-2 | — | `XAI_API_KEY` |
-| `stability` | — | stable-diffusion-xl | `STABILITY_API_KEY` |
-| `replicate` | — | flux-1.1-pro | `REPLICATE_API_TOKEN` |
-| `bfl` | — | flux-pro-1.1 | `BFL_API_KEY` |
-| `fal` | — | fal-ai/flux/dev | `FAL_API_KEY` |
-| `claude-code-cli` | claude-sonnet-4 | — | `claude` on PATH |
-| `gemini-cli` | gemini-2.5-flash | — | `gemini` on PATH |
+## Configure API Keys

-
-
-
-
-Three ways to specify a model:
-
-```ts
-// 1. Separate fields (recommended)
-generateText({ provider: 'anthropic', model: 'claude-sonnet-4-20250514', prompt: '...' });
-
-// 2. Colon format (canonical combined string)
-generateText({ model: 'anthropic:claude-sonnet-4-20250514', prompt: '...' });
-
-// 3. Slash format (also supported for known providers)
-generateText({ model: 'anthropic/claude-sonnet-4-20250514', prompt: '...' });
+```bash
+export OPENAI_API_KEY=sk-...
+export ANTHROPIC_API_KEY=sk-ant-...
+export GEMINI_API_KEY=AIza...

-
-
+# Comma-separated keys auto-rotate with quota detection
+export OPENAI_API_KEY=sk-key1,sk-key2,sk-key3
 ```
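The comma-separated form added above implies a rotating key pool. Here is a hedged, standalone sketch of that idea (not the library's actual implementation; class and method names are hypothetical):

```typescript
// Illustrative only: a comma-separated OPENAI_API_KEY feeding a round-robin
// pool that advances when the provider reports a quota/rate-limit error.
class KeyPool {
  private index = 0;

  constructor(private readonly keys: string[]) {
    if (keys.length === 0) throw new Error('no API keys configured');
  }

  // Parse "sk-key1,sk-key2,sk-key3" into a pool, tolerating whitespace.
  static fromEnv(value: string): KeyPool {
    return new KeyPool(value.split(',').map((k) => k.trim()).filter(Boolean));
  }

  current(): string {
    return this.keys[this.index];
  }

  // Called on a quota error; wraps around to the first key.
  rotate(): string {
    this.index = (this.index + 1) % this.keys.length;
    return this.current();
  }
}

const pool = KeyPool.fromEnv('sk-key1,sk-key2,sk-key3');
console.log(pool.current()); // sk-key1
console.log(pool.rotate()); // sk-key2
```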

-
+Or pass `apiKey` inline on any call. Auto-detection order: OpenAI → Anthropic → OpenRouter → Gemini → Groq → Together → Mistral → xAI → CLI → Ollama. [Default models per provider →](https://docs.agentos.sh/architecture/llm-providers)

 ---

-## API
-
-### High-Level Functions
-
-| Function | Description |
-|----------|-------------|
-| `generateText()` | Text generation with multi-step tool calling |
-| `streamText()` | Streaming text with async iterables |
-| `generateObject()` | Zod-validated structured output |
-| `streamObject()` | Streaming structured output |
-| `generateImage()` | Image generation (7 providers, character consistency) |
-| `generateVideo()` | Video generation |
-| `generateMusic()` / `generateSFX()` | Audio generation |
-| `performOCR()` | Text extraction from images |
-| `embedText()` | Embedding generation |
-| `agent()` | Lightweight stateful agent for prompts, tools, memory, and sessions |
-| `agency()` | Multi-agent teams plus full runtime-owned orchestration features |
-
-### Orchestration
+## API Surfaces

-
-
-
-
-| `mission(name)` | Goal-driven, planner decides steps |
+- **`agent()`**: lightweight stateful agent. Prompts, sessions, personality, hooks, tools, memory.
+- **`agency()`**: multi-agent teams + full runtime. Emergent tooling, guardrails, RAG, voice, channels, HITL.
+- **`generateText()` / `streamText()` / `generateObject()` / `generateImage()` / `generateVideo()` / `generateMusic()` / `performOCR()` / `embedText()`**: low-level multi-modal helpers with native tool calling.
+- **`workflow()` / `AgentGraph` / `mission()`**: three orchestration authoring APIs over one graph runtime.

-Full API reference
+[Full API reference →](https://docs.agentos.sh/api) · [High-Level API guide →](https://docs.agentos.sh/getting-started/high-level-api)

 ---

 ## Ecosystem

 | Package | Description |
-
-| [`@framers/agentos`](https://www.npmjs.com/package/@framers/agentos) | Core runtime
+|---|---|
+| [`@framers/agentos`](https://www.npmjs.com/package/@framers/agentos) | Core runtime |
 | [`@framers/agentos-extensions`](https://www.npmjs.com/package/@framers/agentos-extensions) | 100+ extensions and templates |
-| [`@framers/agentos-extensions-registry`](https://www.npmjs.com/package/@framers/agentos-extensions-registry) | Curated manifest builder |
 | [`@framers/agentos-skills`](https://www.npmjs.com/package/@framers/agentos-skills) | 88 curated SKILL.md definitions |
-| [`@framers/agentos-
+| [`@framers/agentos-bench`](https://github.com/framersai/agentos-bench) | Open benchmark harness with 95% confidence intervals, judge-FPR probes, per-case run JSONs (MIT-licensed; agentos itself is Apache 2.0) |
 | [`@framers/sql-storage-adapter`](https://www.npmjs.com/package/@framers/sql-storage-adapter) | SQL persistence (SQLite, Postgres, IndexedDB) |
+| [paracosm](https://www.npmjs.com/package/paracosm) | AI agent swarm simulation engine |

-
+**Extensions auto-pickup at startup.** The runtime walks the curated registry plus user-supplied extension paths, resolves each pack's `createExtensionPack(context)` factory, and registers tools, guardrails, and channels without manual wiring. The same model applies to skills: SKILL.md files are auto-discovered from the curated registry and any local `skills/` directory, with capability gating and HITL approval for side-effecting installs. See [extensions architecture](https://docs.agentos.sh/architecture/extension-loading) for the full loading model.

-
+---

-
-|-------|-------|
-| [Architecture](./docs/architecture/ARCHITECTURE.md) | System design, data flow, layer breakdown |
-| [High-Level API](./docs/getting-started/HIGH_LEVEL_API.md) | `generateText`, `agent`, `agency` reference |
-| [Orchestration](./docs/orchestration/UNIFIED_ORCHESTRATION.md) | Workflows, graphs, missions |
-| [Cognitive Memory](./docs/memory/COGNITIVE_MECHANISMS.md) | 8 mechanisms, 30+ APA citations |
-| [RAG Configuration](./docs/memory/RAG_MEMORY_CONFIGURATION.md) | Vector stores, embeddings, data sources |
-| [Guardrails](./docs/safety/GUARDRAILS_USAGE.md) | 5 tiers, 6 packs |
-| [Human-in-the-Loop](./docs/safety/HUMAN_IN_THE_LOOP.md) | Approval workflows, escalation |
-| [Emergent Capabilities](./docs/architecture/EMERGENT_CAPABILITIES.md) | Runtime tool forging |
-| [Channels & Platforms](./docs/architecture/PLATFORM_SUPPORT.md) | 37 platform adapters |
-| [Voice Pipeline](./docs/features/VOICE_PIPELINE.md) | TTS, STT, telephony |
-| [Uncensored Content](./docs/features/UNCENSORED_CONTENT.md) | `policyTier`-driven routing for mature text + image generation |
+## Documentation & Community

-
+- **[Benchmarks](https://github.com/framersai/agentos-bench/blob/master/results/LEADERBOARD.md)**: matched-reader benchmark tables, 95% confidence intervals, methodology audit
+- **[Architecture](https://docs.agentos.sh/architecture/system-architecture)**: system design, layer breakdown
+- **[Cognitive Memory](https://docs.agentos.sh/features/cognitive-memory)**: 8 mechanisms with 30+ APA citations
+- **[RAG Configuration](https://docs.agentos.sh/features/rag-memory-configuration)**: vector stores, embeddings, sources
+- **[Guardrails](https://docs.agentos.sh/features/guardrails-architecture)**: 5 tiers, 6 packs
+- **[Voice Pipeline](https://docs.agentos.sh/features/voice-pipeline)**: TTS, STT, telephony
+- **[Blog](https://docs.agentos.sh/blog)**: engineering posts, benchmark publications, transparency audits
+- **[Discord](https://wilds.ai/discord)** · **[GitHub Issues](https://github.com/framersai/agentos/issues)** · **[Wilds.ai](https://wilds.ai)** (AI game worlds powered by AgentOS)

 ---

@@ -518,17 +257,7 @@ git clone https://github.com/framersai/agentos.git && cd agentos
 pnpm install && pnpm build && pnpm test
 ```

-
-
----
-
-## Community
-
-- **Discord:** [wilds.ai/discord](https://wilds.ai/discord)
-- **GitHub Issues:** [github.com/framersai/agentos/issues](https://github.com/framersai/agentos/issues)
-- **Blog:** [docs.agentos.sh/blog](https://docs.agentos.sh/blog)
-- **Paracosm:** [paracosm.agentos.sh](https://paracosm.agentos.sh) — AI agent swarm simulation engine built on AgentOS
-- **Wilds.ai:** [wilds.ai](https://wilds.ai) — AI game worlds powered by AgentOS
+[Contributing Guide](https://github.com/framersai/agentos/blob/master/CONTRIBUTING.md) · We use [Conventional Commits](https://www.conventionalcommits.org/).

 ---
