open-agents-ai 0.187.191 → 0.187.193
- package/README.md +156 -7
- package/dist/index.js +420 -652
- package/package.json +2 -2
package/README.md
CHANGED
@@ -642,9 +642,85 @@ curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
   http://localhost:11435/v1/profiles/frontend-dev
 ```
 
+#### Parallelism & Concurrency
+
+The daemon is built for **unbounded concurrent requests** with per-key enforcement. Every agentic task (`/v1/run`, `/v1/chat`, `/api/chat`, `/api/generate`) spawns its own subprocess, so multiple jobs run in true parallel — same model or different models, same or different profiles, same or different sandbox modes.
+
+**Per-key concurrency limits** are enforced from the `OA_API_KEYS` env var:
+
+```bash
+# key:scope:user:rpm:tpd:maxJobs
+# exported so that the `oa serve` process started below actually sees it
+export OA_API_KEYS="ci-key:run:github-actions:60:100000:5, \
+  ops-key:admin:ops:120:500000:20, \
+  read-key:read:grafana:600::"
+oa serve
+```
+
+The 6th field is `maxJobs` — the maximum number of **concurrent** (in-flight) agentic tasks for that key. When exceeded, the daemon returns **`429 Too Many Requests`** with an RFC 7807 problem-details body:
+
+```json
+{
+  "type": "https://openagents.nexus/problems/rate-limited",
+  "title": "Concurrent job limit exceeded",
+  "status": 429,
+  "detail": "Concurrent job limit exceeded for github-actions: 5/5",
+  "instance": "a1b2c3d4-..."
+}
+```
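
The key format and the limit check described above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: `ApiKeyEntry`, `parseApiKeys`, and `checkJobLimit` are hypothetical names, and treating an empty `maxJobs` field as "no limit" is an assumption.

```typescript
// Hypothetical shape for one parsed entry of OA_API_KEYS
// (key:scope:user:rpm:tpd:maxJobs). Not taken from the package source.
interface ApiKeyEntry {
  key: string;
  scope: string;
  user: string;
  rpm?: number;
  tpd?: number;
  maxJobs?: number; // assumption: undefined means "no concurrency limit"
}

function parseApiKeys(raw: string): ApiKeyEntry[] {
  const num = (s?: string): number | undefined => (s ? Number(s) : undefined);
  return raw
    .split(",")
    .map((part) => part.trim()) // tolerate whitespace left by line continuations
    .filter((part) => part.length > 0)
    .map((part) => {
      const [key = "", scope = "", user = "", rpm, tpd, maxJobs] = part.split(":");
      return { key, scope, user, rpm: num(rpm), tpd: num(tpd), maxJobs: num(maxJobs) };
    });
}

// Returns a problem-details body shaped like the example above when the key
// is at its limit, or null when the job may start.
function checkJobLimit(entry: ApiKeyEntry, activeJobs: number, instance: string) {
  if (entry.maxJobs === undefined || activeJobs < entry.maxJobs) return null;
  return {
    type: "https://openagents.nexus/problems/rate-limited",
    title: "Concurrent job limit exceeded",
    status: 429,
    detail: `Concurrent job limit exceeded for ${entry.user}: ${activeJobs}/${entry.maxJobs}`,
    instance,
  };
}

const keys = parseApiKeys(
  "ci-key:run:github-actions:60:100000:5, ops-key:admin:ops:120:500000:20, read-key:read:grafana:600::"
);
console.log(checkJobLimit(keys[0], 5, "a1b2c3d4"));
```

The check runs before spawning, so a key that is already at `maxJobs` never starts another subprocess.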
+
+> **Previously this was dead code.** `maxJobs` was parsed but never checked — a CI key with `maxJobs:5` could spawn 50 concurrent subprocesses and OOM the host. Fixed in v0.187.189.
+
+**64-bit job IDs** — `job-${randomBytes(8).toString("hex")}`. By the birthday bound (p ≈ n²/2^(b+1) for n IDs of b bits), 1M jobs collide with a probability of roughly 3×10⁻⁸ under 64-bit IDs, whereas the old 24-bit IDs (about 16.7M possible values) would almost certainly have collided at that volume. Bumped in v0.187.189.
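
To sanity-check collision estimates like these, the birthday approximation p ≈ 1 − exp(−n(n−1)/2N) can be evaluated directly. The sketch below is illustrative only; `collisionProbability` is a hypothetical helper, and only the `job-${randomBytes(8).toString("hex")}` expression comes from the text above.

```typescript
import { randomBytes } from "node:crypto";

// Birthday approximation: probability that at least two of n uniformly
// random IDs collide in a space of 2^bits values.
function collisionProbability(n: number, bits: number): number {
  const space = 2 ** bits;
  const expectedPairs = (n * (n - 1)) / (2 * space);
  // -expm1(-x) computes 1 - e^(-x) without losing precision for tiny x.
  return -Math.expm1(-expectedPairs);
}

// 8 random bytes give 16 hex characters, i.e. a 64-bit ID.
const jobId = `job-${randomBytes(8).toString("hex")}`;
console.log(jobId);

console.log("64-bit, 1M jobs:", collisionProbability(1e6, 64));
console.log("24-bit, 1M jobs:", collisionProbability(1e6, 24));
```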
+
+**Atomic job record writes** — all 4 job state transitions (initial spawn, stream-exit, non-stream-exit, cancel) use `atomicJobWrite()` which writes to `.tmp` then `rename()`s. No race conditions between concurrent `DELETE /v1/runs/:id` and child-exit handlers. Fixed in v0.187.189.
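
The write-then-rename pattern can be sketched as below. This is an assumption-laden illustration rather than the package's `atomicJobWrite()`: the function name `atomicJobWriteSketch`, the JSON record shape, and the file layout are all hypothetical.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write the full record to a sibling temp file, then rename it over the
// target. On POSIX, rename() within one filesystem replaces the target
// atomically, so readers see either the old record or the new one,
// never a partially written file.
function atomicJobWriteSketch(path: string, record: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(record));
  renameSync(tmp, path);
}

// Usage: two state transitions for the same (hypothetical) job record.
const dir = mkdtempSync(join(tmpdir(), "oa-jobs-"));
const file = join(dir, "job-3a7c9f1e2b8d0a45.json");
atomicJobWriteSketch(file, { status: "running" });
atomicJobWriteSketch(file, { status: "aborted" });
console.log(readFileSync(file, "utf8"), existsSync(`${file}.tmp`));
```

A fixed `.tmp` suffix is enough for synchronous writes inside one process; writers in separate processes would need unique temp names.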
+
+**Running concurrent jobs**:
+
+```bash
+# Fire 5 different jobs with 5 different models in parallel
+for model in qwen3.5:4b qwen3.5:9b qwen3.5:32b qwen3.5:72b qwen3.5:122b; do
+  curl -s -X POST http://localhost:11435/v1/run \
+    -H "Authorization: Bearer $KEY" \
+    -H "Content-Type: application/json" \
+    -d "{\"task\":\"Describe $model in one sentence\",\"model\":\"$model\",\"stream\":false}" &
+done
+wait
+```
+
+Each subprocess inherits a **clean env** — `OA_DAEMON` and `OA_PORT` are explicitly stripped so the child doesn't re-enter daemon mode. Fixed in v0.187.189 (root cause of the earlier "Task incomplete (0 turns, 0 tool calls)" bug).
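
Stripping the two variables before spawning might look like the sketch below. `cleanChildEnv` is a hypothetical helper name; only the variable names `OA_DAEMON` and `OA_PORT` come from the text above.

```typescript
type Env = Record<string, string | undefined>;

// Copy the parent environment and drop the variables that would make the
// child re-enter daemon mode. The parent's own env object is left untouched.
function cleanChildEnv(env: Env): Env {
  const child: Env = { ...env };
  delete child.OA_DAEMON;
  delete child.OA_PORT;
  return child;
}

// In a real spawn call this would be passed as the `env` option, e.g.
// spawn(cmd, args, { env: cleanChildEnv(process.env) }).
const parentEnv: Env = { PATH: "/usr/bin", OA_DAEMON: "1", OA_PORT: "11435" };
const childEnv = cleanChildEnv(parentEnv);
console.log(Object.keys(childEnv));
```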
+
+**Observing parallelism live** — subscribe to the event bus to watch every job lifecycle event:
+
+```bash
+curl -N 'http://localhost:11435/v1/events?type=run.*'
+```
+
+Every spawn, completion, failure, and abort publishes to the bus:
+
+```
+event: run.started
+data: {"type":"run.started","ts":"2026-04-07T21:00:14Z","data":{"run_id":"job-3a7c9f1e2b8d0a45","model":"qwen3.5:9b","pid":12345},"subject":"ci-key","aims:control":"A.6.2.6"}
+
+event: run.completed
+data: {"type":"run.completed","ts":"2026-04-07T21:00:39Z","data":{"run_id":"job-3a7c9f1e2b8d0a45","exit_code":0,"summary":"..."},"subject":"ci-key","aims:control":"A.6.2.6"}
+```
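
A client consuming this stream needs to split it into frames and decode each `data:` payload. The following is a simplified sketch of that step, assuming every frame carries a JSON payload as in the sample above; `parseSseFrames` and `BusEvent` are hypothetical names, and the parser skips parts of the Server-Sent Events format such as `id:` and `retry:` fields.

```typescript
interface BusEvent {
  event: string;
  data: Record<string, any>;
}

// Frames are separated by a blank line; each frame has an optional
// `event:` line and one or more `data:` lines holding JSON.
function parseSseFrames(text: string): BusEvent[] {
  const events: BusEvent[] = [];
  for (const frame of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event name when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

const sample = [
  "event: run.started",
  'data: {"type":"run.started","data":{"run_id":"job-3a7c9f1e2b8d0a45","pid":12345},"subject":"ci-key"}',
  "",
  "event: run.completed",
  'data: {"type":"run.completed","data":{"run_id":"job-3a7c9f1e2b8d0a45","exit_code":0},"subject":"ci-key"}',
  "",
].join("\n");

for (const e of parseSseFrames(sample)) {
  console.log(e.event, e.data.data.run_id);
}
```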
+
+**Abort a running job** — SIGTERM the process group, then SIGKILL after 3s:
+
+```bash
+curl -X DELETE http://localhost:11435/v1/runs/job-3a7c9f1e2b8d0a45 \
+  -H "Authorization: Bearer $KEY"
+```
+
+The abort also cleans up the Docker container if the job was spawned with `"sandbox":"container"`, decrements the per-key `activeJobs` counter so the quota is immediately released, and publishes `run.aborted` on the event bus.
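
The SIGTERM-then-SIGKILL escalation can be sketched as below. This is not the daemon's implementation: `abortProcessGroup` is a hypothetical name, and the sketch assumes a POSIX host where the child was spawned with `detached: true` so that it leads its own process group and a negative pid addresses that group.

```typescript
type Signal = "SIGTERM" | "SIGKILL";
type KillFn = (pid: number, signal: Signal) => void;
type ScheduleFn = (fn: () => void, ms: number) => void;

// Ask the whole process group to stop, then force-kill it after a grace
// period. `kill` and `schedule` are injectable so the flow can be exercised
// without real processes or timers.
function abortProcessGroup(
  pid: number,
  kill: KillFn = (p, s) => { process.kill(p, s); },
  schedule: ScheduleFn = (fn, ms) => { setTimeout(fn, ms); },
  graceMs = 3000
): void {
  kill(-pid, "SIGTERM"); // negative pid targets the process group
  schedule(() => {
    try {
      kill(-pid, "SIGKILL");
    } catch {
      // The group already exited during the grace period; nothing to do.
    }
  }, graceMs);
}

// Dry run with a recording kill function and an immediate scheduler.
const sent: string[] = [];
abortProcessGroup(12345, (pid, sig) => { sent.push(`${sig}:${pid}`); }, (fn) => fn());
console.log(sent);
```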
+
+**Safety timeout on `/v1/chat` + `/api/chat` + `/api/generate`** — the non-streaming paths bound the subprocess wait at `timeout_s + 30s` (default `180s + 30s = 210s`). If the child doesn't close in time, the daemon SIGTERMs then SIGKILLs it and returns an OpenAI-shaped `finish_reason:"error"` response with the real reason. Fixed in v0.187.191.
+
+**Tested end-to-end** — 10 concurrent `/v1/skills` GETs, 3 concurrent `/v1/aims/incidents` POSTs (each gets a unique ID, no write races), 2 concurrent `/v1/events` SSE subscribers (both receive the same events). All covered by `packages/cli/tests/api-endpoint-matrix.test.ts`. 201/201 tests green.
+
 #### Endpoint Reference
 
-> **Verified against `open-agents-ai@0.187.
+> **Verified against `open-agents-ai@0.187.191`.** Examples in earlier README revisions are deprecated.
 
 **Health & observability**
 | Method | Path | Auth | Description |
@@ -666,11 +742,15 @@ curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
 | GET | `/v1/models` | read | List models (aggregated across endpoints) |
 | POST | `/v1/chat/completions` | read | Chat inference (sync + stream, OpenAI-shaped) |
 | POST | `/v1/embeddings` | read | Generate embeddings |
+| POST | `/api/embed` | read | **Ollama-compatible alias** of `/v1/embeddings`. Accepts `{model, input}` or `{model, prompt}`. |
 
-**Chat with full agent (drop-in for /v1/chat/completions)**
+**Chat with full agent (drop-in for Ollama /api/chat and OpenAI /v1/chat/completions)**
 | Method | Path | Auth | Description |
 |--------|------|------|-------------|
-| POST | `/v1/chat` | run | Full agent under the hood, OpenAI chat.completion shape. Default = tools=true (subprocess agent). Set `tools:false` for direct backend bypass. |
+| POST | `/v1/chat` | run | Full agent under the hood, OpenAI chat.completion shape. Default = tools=true (subprocess agent). Set `tools:false` for direct backend bypass. Supports `timeout_s` body field (default 180s). Non-streaming path has a safety SIGTERM→SIGKILL after `timeout_s + 30s`. |
+| POST | `/api/chat` | run | **Ollama-compatible alias** — same handler as `/v1/chat`. Accepts both OA-shape (`{message, model}`) and Ollama-shape (`{model, messages: [...]}`) bodies. Returns OpenAI `chat.completion` shape on success and failure (failure uses `finish_reason:"error"`). |
+| POST | `/v1/generate` | run | **One-off completion** — same agent stack as `/v1/chat` but no session history. Returns Ollama-shape `{model, response, done, total_duration}`. |
+| POST | `/api/generate` | run | **Ollama-compatible alias** of `/v1/generate`. Drop-in for Ollama `/api/generate`. |
 | GET | `/v1/chat/sessions` | read | List active chat sessions |
 
 **Agentic task execution**
@@ -796,14 +876,43 @@ curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
 | POST | `/v1/aiwg/use` | run | `aiwg use all` equivalent — model-tier-sized activation bundle |
 | POST | `/v1/aiwg/expand` | run | Sub-agent unpack a specific skill/agent on demand |
 
-#### Stateful Chat — `/v1/chat` (OpenAI drop-in with full agent under the hood)
+#### Stateful Chat — `/v1/chat` + `/api/chat` (OpenAI drop-in with full agent under the hood)
+
+The chat endpoint is mounted at **two paths on port 11435**:
+
+| Path | Purpose |
+|------|---------|
+| `POST /v1/chat` | OA-native path |
+| `POST /api/chat` | **Ollama-compatible alias** — same handler, so clients pointing at Ollama can be flipped over by changing only the port (`11434` → `11435`) |
+
+It's a **drop-in replacement for OpenAI `/v1/chat/completions` and Ollama `/api/chat`**. The endpoint runs the full OA agent (tools, multi-agent, memory, skills) under the hood and returns an **OpenAI `chat.completion`-shaped response** so any client SDK can use it without modification.
 
-
+**Both body shapes are accepted** on either path:
 
-
-
+```jsonc
+// OA-native
+{"message": "hello", "model": "qwen3.5:9b", "stream": false}
+
+// Ollama-native (the `messages` array; the last user message is extracted)
+{"model": "qwen3.5:9b", "messages": [{"role":"user","content":"hello"}], "stream": false}
+```
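
Accepting both shapes comes down to normalising the body into a single prompt. The sketch below shows one way that could work; `extractUserPrompt` and the `ChatBody` type are hypothetical, and letting `message` win when both fields are present is an assumption, not documented behaviour.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

interface ChatBody {
  message?: string;         // OA-native shape
  messages?: ChatMessage[]; // Ollama-native shape
  model?: string;
  stream?: boolean;
}

// Returns the prompt to run: the OA-native `message` if present, otherwise
// the content of the last message whose role is "user".
function extractUserPrompt(body: ChatBody): string | undefined {
  if (typeof body.message === "string") return body.message;
  const userMessages = (body.messages ?? []).filter((m) => m.role === "user");
  return userMessages.length > 0 ? userMessages[userMessages.length - 1].content : undefined;
}

console.log(extractUserPrompt({ message: "hello", model: "qwen3.5:9b", stream: false }));
console.log(
  extractUserPrompt({
    model: "qwen3.5:9b",
    messages: [
      { role: "user", content: "first" },
      { role: "assistant", content: "reply" },
      { role: "user", content: "hello" },
    ],
    stream: false,
  })
);
```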
+
+> **Two execution modes:**
+> - **Default (`tools` unset or `tools: true`)** — full agent: spawns the OA subprocess with the entire 82-tool set, runs the agent loop, returns the final answer with `tool_calls` metadata.
 > - **Direct (`tools: false`)** — fast path: bypasses the agent and forwards straight to the configured backend (Ollama/vLLM) using the session history. Useful for plain chat without tools.
 
+**Safety timeout** — every non-streaming request is bounded by `timeout_s` (default **180s**). If the agent subprocess doesn't close in `timeout_s + 30s`, the daemon SIGTERMs (then SIGKILLs) it and returns an OpenAI-shaped error with `finish_reason:"error"` and a clear explanation. No more hung requests.
+
+**Flip Ollama → OA by port alone** — this is verified to work via `scripts/oa-vs-ollama-chat-compare.sh` (see [Live Comparison](#live-comparison-ollama-vs-oa-full-agent) below):
+
+```bash
+# Before (Ollama)
+curl -s http://127.0.0.1:11434/api/chat -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"hi"}],"stream":false}'
+
+# After (OA with full agent) — only port changed
+curl -s http://127.0.0.1:11435/api/chat -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"hi"}],"stream":false}'
+```
+
 ```bash
 # DEFAULT: full agent — multi-step tool use, memory, the works.
 # Returns OpenAI chat.completion shape with the assistant's final answer.
@@ -904,6 +1013,46 @@ curl -s http://localhost:11435/v1/chat \
 
 Sessions expire after 30 minutes of inactivity. List active sessions: `GET /v1/chat/sessions`.
 
+#### Live Comparison: Ollama vs OA Full Agent
+
+The repo ships a reproducible side-by-side harness at [`scripts/oa-vs-ollama-chat-compare.sh`](scripts/oa-vs-ollama-chat-compare.sh). It runs **5 tool-call-required prompts** × **4 phases** (Ollama non-stream, OA non-stream, Ollama stream, OA stream) = **20 runs per invocation** with the same model and the same `/api/chat` path on both ports.
+
+```bash
+MODEL=qwen3.5:9b bash scripts/oa-vs-ollama-chat-compare.sh
+```
+
+**Results from `open-agents-ai@0.187.191` with `qwen3.5:9b`** (all 20 runs completed, zero timeouts):
+
+| # | Prompt | Ollama (bare) | Open Agents (full agent) | Winner |
+|---|---|---|---|---|
+| 1 | "Latest stable Node.js version + source URL" | ❌ **v22.10.0** — hallucinated from Aug-2024 training cutoff | ✅ **v25.9.0** fetched from `nodejs.org/download/current`, **3 tool calls** (`web_search` → `web_fetch` → `task_complete`) | **OA** |
+| 2 | "Biggest tech news this week + source URL" | ❌ "I don't have real-time access" + generic AI trend guess | ✅ **Anthropic Mythos, Intel Terafab, Apple foldable, Russian router breach, Firmus $5.5B** — sourced from TechCrunch, **4 tool calls** | **OA** |
+| 3 | "Current OS, CPU cores, free memory — use shell tools" | ❌ Confabulated **"Linux / 8 cores / 6.1 GB"** (all wrong) | ✅ **Ubuntu 24.04.2 / 48 cores / 120 GB** (all correct), **6–7 shell tool calls** | **OA** |
+| 4 | "List files in cwd, count top level, most recent" | ❌ "I cannot access your filesystem" | ✅ **20 files, 50+ dirs, `.claude.json` (81 KB, 09:09 UTC)** via `list_directory`, **2 tool calls** | **OA** |
+| 5 | "2022 FIFA World Cup final winner + score" (both endpoints have this in training data) | ✅ Argentina 4–2 France | ✅ Argentina 3–3 France, **4–2 on penalties at Lusail Stadium, Dec 18 2022** — grounded with 4 tool calls | **Tie (OA more detailed)** |
+
+**Latency profile** (wall clock, 5-prompt median):
+
+| Phase | Ollama | OA agent | OA overhead |
+|---|---|---|---|
+| Non-streaming | 12–18s | 24–42s | 12–26s (agent loop + tool calls) |
+| Streaming SSE | 11–16s | 24–56s | 10–40s |
+
+**Streaming parser validation** — every OA stream delivered:
+- Live intermediate `tool_call` events mid-stream (e.g. `['web_search', 'web_fetch', 'task_complete']`)
+- OpenAI `chat.completion.chunk` deltas with `id`, `model`, `finish_reason`
+- Clean `data: [DONE]` termination with `finish_reason:"stop"`
+
+The harness is **reproducible** — rerun it after any `/v1/chat` change to catch regressions:
+
+```bash
+MODEL=qwen3.5:4b bash scripts/oa-vs-ollama-chat-compare.sh                    # faster tier for quick smoke
+MODEL=qwen3.5:9b OA_TIMEOUT=300 bash scripts/oa-vs-ollama-chat-compare.sh    # default
+MODEL=qwen3.5:32b OA_TIMEOUT=600 bash scripts/oa-vs-ollama-chat-compare.sh   # higher tier
+```
+
+**Bottom line**: for any question that needs fresh data, system access, or filesystem visibility — bare Ollama is wrong or refuses; OA with the full agent is correct with citations. That's the differentiator captured live in the harness output.
+
 #### AIWG Cascade — `/v1/aiwg/*`
 
 Exposes the entire AIWG ecosystem (5 frameworks, 19 addons, 136+ skills, ~42 MB / ~2M tokens of markdown) through a **4-tier cascade loader** that auto-sizes responses to the detected model tier and **never overflows small-model context**.