npm - open-agents-ai - Versions diffs - 0.187.191 → 0.187.193 - Mend

open-agents-ai 0.187.191 → 0.187.193

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -642,9 +642,85 @@ curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
   http://localhost:11435/v1/profiles/frontend-dev
 ```
+#### Parallelism & Concurrency
+The daemon places **no global cap on concurrent requests**; concurrency is limited only per API key. Every agentic task (`/v1/run`, `/v1/chat`, `/api/chat`, `/api/generate`) spawns its own subprocess, so multiple jobs run truly in parallel, whether they use the same or different models, profiles, or sandbox modes.
+**Per-key concurrency limits** are enforced from the `OA_API_KEYS` env var:
+```bash
+# key:scope:user:rpm:tpd:maxJobs
+OA_API_KEYS="ci-key:run:github-actions:60:100000:5, \
+             ops-key:admin:ops:120:500000:20, \
+             read-key:read:grafana:600::"
+oa serve
+```
+The 6th field is `maxJobs`, the maximum number of **concurrent** (in-flight) agentic tasks for that key. When it is exceeded, the daemon responds with **`429 Too Many Requests`** and an RFC 7807 problem-details body:
+```json
+{
+  "type": "https://openagents.nexus/problems/rate-limited",
+  "title": "Concurrent job limit exceeded",
+  "status": 429,
+  "detail": "Concurrent job limit exceeded for github-actions: 5/5",
+  "instance": "a1b2c3d4-..."
+}
+```
+> **Previously this was dead code.** `maxJobs` was parsed but never checked — a CI key with `maxJobs:5` could spawn 50 concurrent subprocesses and OOM the host. Fixed in v0.187.189.
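A minimal TypeScript sketch of how the `OA_API_KEYS` format above could be parsed and enforced. It is not the daemon's code: `KeyConfig`, `parseApiKeys`, `JobLimiter`, and the whitespace handling are assumptions; only the field order, the 429 status, and the problem-details fields come from the README.

```typescript
// Hypothetical sketch: parse OA_API_KEYS and cap in-flight jobs per key.
// Field order (key:scope:user:rpm:tpd:maxJobs), the 429 status, and the problem
// fields follow the README; every name below is illustrative.

interface KeyConfig {
  key: string;
  scope: string;
  user: string;
  rpm?: number;
  tpd?: number;
  maxJobs?: number; // undefined means "no per-key cap"
}

interface Problem {
  type: string;
  title: string;
  status: number;
  detail: string;
  instance: string;
}

// Empty or missing numeric fields become undefined (e.g. "read-key:read:grafana:600::").
function parseApiKeys(raw: string): Map<string, KeyConfig> {
  const num = (s?: string): number | undefined =>
    s === undefined || s.trim() === "" ? undefined : Number(s);
  const keys = new Map<string, KeyConfig>();
  for (const entry of raw.split(",")) {
    const [key, scope = "", user = "", rpm, tpd, maxJobs] = entry.trim().split(":");
    if (!key) continue; // skip blank entries
    keys.set(key, { key, scope, user, rpm: num(rpm), tpd: num(tpd), maxJobs: num(maxJobs) });
  }
  return keys;
}

class JobLimiter {
  private active = new Map<string, number>();

  // Returns null if the job may start, or an RFC 7807 body to send with HTTP 429.
  tryAcquire(cfg: KeyConfig, requestId: string): Problem | null {
    const inFlight = this.active.get(cfg.key) ?? 0;
    if (cfg.maxJobs !== undefined && inFlight >= cfg.maxJobs) {
      return {
        type: "https://openagents.nexus/problems/rate-limited",
        title: "Concurrent job limit exceeded",
        status: 429,
        detail: `Concurrent job limit exceeded for ${cfg.user}: ${inFlight}/${cfg.maxJobs}`,
        instance: requestId,
      };
    }
    this.active.set(cfg.key, inFlight + 1);
    return null;
  }

  // Call exactly once per acquired job: on exit, failure, or abort.
  release(key: string): void {
    this.active.set(key, Math.max(0, (this.active.get(key) ?? 0) - 1));
  }

  count(key: string): number {
    return this.active.get(key) ?? 0;
  }
}
```

The release step is what keeps the quota honest: if any exit path (normal exit, crash, or abort) skips it, the key slowly loses capacity until it can run nothing.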
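+**64-bit job IDs**: `job-${randomBytes(8).toString("hex")}`. By the birthday bound, the chance of any collision among 1M IDs is about n²/2N ≈ 2.7 × 10⁻⁸ for 64-bit IDs (N = 2⁶⁴); with the old 24-bit IDs (N = 2²⁴ ≈ 16.8M) a collision at that volume was all but certain. Bumped in v0.187.189.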
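For random IDs, the birthday bound is P ≈ 1 − exp(−n(n−1)/2N) with N = 2^bits. This illustrative snippet evaluates it for 1M IDs at 24 and 64 bits alongside the ID format quoted above; `newJobId` and `collisionProbability` are hypothetical helpers, not part of the package.

```typescript
import { randomBytes } from "node:crypto";

// Job ID format quoted in the README: "job-" + 8 random bytes as 16 hex chars (64 bits).
function newJobId(): string {
  return `job-${randomBytes(8).toString("hex")}`;
}

// Birthday bound: probability that n IDs drawn uniformly from 2^bits values contain
// at least one duplicate. -expm1(-x) equals 1 - e^-x but stays accurate for tiny x.
function collisionProbability(n: number, bits: number): number {
  const space = 2 ** bits;
  return -Math.expm1(-(n * (n - 1)) / (2 * space));
}

console.log(newJobId());                    // e.g. job-3a7c9f1e2b8d0a45 (random)
console.log(collisionProbability(1e6, 64)); // ≈ 2.7e-8
console.log(collisionProbability(1e6, 24)); // ≈ 1, a duplicate is near-certain
```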
+**Atomic job record writes**: all 4 job state transitions (initial spawn, stream exit, non-stream exit, cancel) go through `atomicJobWrite()`, which writes to a `.tmp` file and then `rename()`s it into place. A concurrent `DELETE /v1/runs/:id` and child-exit handler can therefore no longer leave a torn or half-written job record. Fixed in v0.187.189.
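A hypothetical TypeScript sketch of the write-then-rename pattern. Only the name `atomicJobWrite` comes from the README; the signature is assumed, and this version gives each temp file a unique suffix (rather than a fixed `.tmp`) so two concurrent writers cannot clobber each other's temp file.

```typescript
import { mkdtempSync, readdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { randomBytes } from "node:crypto";

// Hypothetical stand-in for atomicJobWrite(); the daemon's real signature may differ.
// rename() within one directory replaces the target atomically on POSIX filesystems,
// so a reader sees either the previous record or the new one, never a partial write.
function atomicJobWrite(path: string, record: unknown): void {
  // A unique temp name keeps two concurrent writers from sharing one .tmp file.
  const tmp = `${path}.${process.pid}.${randomBytes(4).toString("hex")}.tmp`;
  writeFileSync(tmp, JSON.stringify(record));
  renameSync(tmp, path);
}

// Usage: two state transitions on the same record in a scratch directory.
const dir = mkdtempSync(join(tmpdir(), "oa-jobs-"));
const file = join(dir, "job-3a7c9f1e2b8d0a45.json");
atomicJobWrite(file, { status: "running" });
atomicJobWrite(file, { status: "completed", exit_code: 0 });
const saved = JSON.parse(readFileSync(file, "utf8"));
const leftovers = readdirSync(dir); // only the record itself; no stray temp files
```

Rename makes each write all-or-nothing, but it is still last-writer-wins: when a cancel and an exit handler race, whichever rename lands second determines the final state.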
+**Running concurrent jobs**:
+```bash
+# Fire 5 different jobs with 5 different models in parallel
+for model in qwen3.5:4b qwen3.5:9b qwen3.5:32b qwen3.5:72b qwen3.5:122b; do
+  curl -s -X POST http://localhost:11435/v1/run \
+    -H "Authorization: Bearer $KEY" \
+    -H "Content-Type: application/json" \
+    -d "{\"task\":\"Describe $model in one sentence\",\"model\":\"$model\",\"stream\":false}" &
+done
+wait
+```
+Each subprocess inherits a **clean env** — `OA_DAEMON` and `OA_PORT` are explicitly stripped so the child doesn't re-enter daemon mode. Fixed in v0.187.189 (root cause of the earlier "Task incomplete (0 turns, 0 tool calls)" bug).
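In Node terms this amounts to passing an explicit `env` to the child. A hypothetical helper, assuming the two variable names quoted above are the only ones to strip:

```typescript
// Hypothetical helper: copy the parent environment but drop the variables that
// would make a child re-enter daemon mode. The two names come from the README.
type Env = Record<string, string | undefined>;

const DAEMON_ONLY_VARS = ["OA_DAEMON", "OA_PORT"];

function childEnv(parent: Env): Env {
  const env: Env = { ...parent }; // shallow copy; the parent's env is left untouched
  for (const name of DAEMON_ONLY_VARS) delete env[name];
  return env;
}
```

The result would be passed as `spawn(cmd, args, { env: childEnv(process.env) })`. When `env` is omitted, `spawn` hands the child the parent's `process.env` unchanged, which is how daemon-only variables can leak into a subprocess.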
+**Observing parallelism live** — subscribe to the event bus to watch every job lifecycle event:
+```bash
+curl -N 'http://localhost:11435/v1/events?type=run.*'
+```
+Every spawn, completion, failure, and abort publishes to the bus:
+```
+event: run.started
+data: {"type":"run.started","ts":"2026-04-07T21:00:14Z","data":{"run_id":"job-3a7c9f1e2b8d0a45","model":"qwen3.5:9b","pid":12345},"subject":"ci-key","aims:control":"A.6.2.6"}
+event: run.completed
+data: {"type":"run.completed","ts":"2026-04-07T21:00:39Z","data":{"run_id":"job-3a7c9f1e2b8d0a45","exit_code":0,"summary":"..."},"subject":"ci-key","aims:control":"A.6.2.6"}
+```
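Any SSE client can consume this stream. For illustration, a minimal TypeScript parser for frames like the ones above; it is a hypothetical helper that assumes each `data:` payload is JSON and that events are separated by blank lines, as the SSE format requires.

```typescript
// Minimal text/event-stream parser (hypothetical helper, not part of OA).
// Events are separated by blank lines; only the `event` and `data` fields are kept,
// and multiple data lines are joined with "\n" as the SSE format specifies.
interface BusEvent {
  event: string;
  data: unknown;
}

function parseSse(text: string): BusEvent[] {
  const events: BusEvent[] = [];
  let name = "message"; // SSE default when no `event:` field is given
  let data: string[] = [];

  const dispatch = () => {
    if (data.length > 0) events.push({ event: name, data: JSON.parse(data.join("\n")) });
    name = "message";
    data = [];
  };

  for (const line of text.split(/\r?\n/)) {
    if (line === "") { dispatch(); continue; } // blank line ends an event
    if (line.startsWith(":")) continue;        // comment / keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is stripped
    if (field === "event") name = value;
    else if (field === "data") data.push(value);
  }
  // An unterminated trailing event is discarded, as the SSE spec requires.
  return events;
}
```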
+**Abort a running job**: the daemon sends SIGTERM to the job's process group, then SIGKILL after 3s:
+```bash
+curl -X DELETE http://localhost:11435/v1/runs/job-3a7c9f1e2b8d0a45 \
+  -H "Authorization: Bearer $KEY"
+```
+The abort also cleans up the Docker container if the job was spawned with `"sandbox":"container"`, decrements the per-key `activeJobs` counter so the quota is released immediately, and publishes `run.aborted` on the event bus.
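A hypothetical TypeScript sketch of that SIGTERM-then-SIGKILL escalation. It assumes the job was spawned with `detached: true`, so the child leads its own process group and a negative PID addresses the whole group (POSIX only). The `isAlive` check before SIGKILL and the injectable `kill`/`schedule` parameters are assumptions added for illustration and testing; container cleanup and the quota release are left out.

```typescript
type Signal = "SIGTERM" | "SIGKILL";
type KillFn = (pid: number, signal: Signal) => void;
type ScheduleFn = (fn: () => void, ms: number) => void;

// process.kill throws ESRCH once the group is gone; that is not an error here.
const killQuietly: KillFn = (pid, signal) => {
  try {
    process.kill(pid, signal);
  } catch {
    // group already exited
  }
};

function abortProcessGroup(
  pid: number,
  isAlive: () => boolean,
  kill: KillFn = killQuietly,
  schedule: ScheduleFn = (fn, ms) => { setTimeout(fn, ms); },
  graceMs = 3000, // 3 s grace period per the README
): void {
  kill(-pid, "SIGTERM"); // negative pid = every process in the group
  schedule(() => {
    if (isAlive()) kill(-pid, "SIGKILL");
  }, graceMs);
}
```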
+**Safety timeout on `/v1/chat` + `/api/chat` + `/api/generate`** — the non-streaming paths bound the subprocess wait at `timeout_s + 30s` (default `180s + 30s = 210s`). If the child doesn't close in time, the daemon SIGTERMs then SIGKILLs it and returns an OpenAI-shaped `finish_reason:"error"` response with the real reason. Fixed in v0.187.191.
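The bound can be expressed as a race between the child's exit and a timer. A hypothetical sketch: `childExit` stands for a promise that resolves with the child's exit code (how it is wired to the subprocess is not shown), and on `timedOut` the caller would escalate SIGTERM to SIGKILL and send the `finish_reason:"error"` response.

```typescript
type WaitResult = { timedOut: false; code: number } | { timedOut: true };

// Resolve with the exit code, or with timedOut once timeout_s + grace has elapsed
// (180 s + 30 s = 210 s with the README defaults).
function waitWithDeadline(
  childExit: Promise<number>,
  timeoutS = 180,
  graceS = 30,
): Promise<WaitResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<WaitResult>((resolve) => {
    timer = setTimeout(() => resolve({ timedOut: true }), (timeoutS + graceS) * 1000);
  });
  const exited = childExit.then((code): WaitResult => ({ timedOut: false, code }));
  // Clear the timer either way so a finished job does not hold the event loop open.
  return Promise.race([exited, deadline]).finally(() => clearTimeout(timer));
}
```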
+**Tested end-to-end** — 10 concurrent `/v1/skills` GETs, 3 concurrent `/v1/aims/incidents` POSTs (each gets a unique ID, no write races), 2 concurrent `/v1/events` SSE subscribers (both receive the same events). All covered by `packages/cli/tests/api-endpoint-matrix.test.ts`. 201/201 tests green.
 #### Endpoint Reference
-> **Verified against `open-agents-ai@0.187.189`.** Examples in earlier README revisions are deprecated.
+> **Verified against `open-agents-ai@0.187.191`.** Examples in earlier README revisions are deprecated.
 **Health & observability**
 | Method | Path | Auth | Description |
@@ -666,11 +742,15 @@ curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
 | GET | `/v1/models` | read | List models (aggregated across endpoints) |
 | POST | `/v1/chat/completions` | read | Chat inference (sync + stream, OpenAI-shaped) |
 | POST | `/v1/embeddings` | read | Generate embeddings |
+| POST | `/api/embed` | read | **Ollama-compatible alias** of `/v1/embeddings`. Accepts `{model, input}` or `{model, prompt}`. |
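A hypothetical sketch of the `/api/embed` alias's body handling. Ollama's `/api/embed` sends `input` (a string or an array of strings), and its older `/api/embeddings` endpoint used `prompt`, so accepting both lets either client style through. Normalizing to an array is a choice made here, not documented behavior.

```typescript
interface EmbedBody {
  model: string;
  input?: string | string[];
  prompt?: string;
}

// Map either body shape onto one OpenAI-style embeddings request.
function toEmbeddingsRequest(body: EmbedBody): { model: string; input: string[] } {
  const raw = body.input ?? body.prompt; // `input` wins if both are present
  if (raw === undefined) throw new Error("expected `input` or `prompt`");
  return { model: body.model, input: Array.isArray(raw) ? raw : [raw] };
}
```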
-**Chat with full agent (drop-in for /v1/chat/completions)**
+**Chat with full agent (drop-in for Ollama /api/chat and OpenAI /v1/chat/completions)**
 | Method | Path | Auth | Description |
 |--------|------|------|-------------|
-| POST | `/v1/chat` | run | Full agent under the hood, OpenAI chat.completion shape. Default = tools=true (subprocess agent). Set `tools:false` for direct backend bypass. |
+| POST | `/v1/chat` | run | Full agent under the hood, OpenAI chat.completion shape. Default = tools=true (subprocess agent). Set `tools:false` for direct backend bypass. Supports `timeout_s` body field (default 180s). Non-streaming path has a safety SIGTERM→SIGKILL after `timeout_s + 30s`. |
+| POST | `/api/chat` | run | **Ollama-compatible alias** — same handler as `/v1/chat`. Accepts both OA-shape (`{message, model}`) and Ollama-shape (`{model, messages: [...]}`) bodies. Returns OpenAI `chat.completion` shape on success and failure (failure uses `finish_reason:"error"`). |
+| POST | `/v1/generate` | run | **One-off completion** — same agent stack as `/v1/chat` but no session history. Returns Ollama-shape `{model, response, done, total_duration}`. |
+| POST | `/api/generate` | run | **Ollama-compatible alias** of `/v1/generate`. Drop-in for Ollama `/api/generate`. |
 | GET | `/v1/chat/sessions` | read | List active chat sessions |
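For `/v1/generate` and `/api/generate`, a hypothetical sketch of wrapping the agent's final answer in the Ollama-style fields listed above. Ollama reports `total_duration` in nanoseconds; that OA uses the same unit is an assumption here.

```typescript
interface GenerateResponse {
  model: string;
  response: string;
  done: boolean;
  total_duration: number; // nanoseconds (assumed, matching Ollama)
}

function toGenerateResponse(
  model: string,
  answer: string,
  startedAtMs: number,
  finishedAtMs: number,
): GenerateResponse {
  return {
    model,
    response: answer,
    done: true, // non-streaming: the whole answer is in one object
    total_duration: Math.round((finishedAtMs - startedAtMs) * 1e6), // ms → ns
  };
}
```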
 **Agentic task execution**
@@ -796,14 +876,43 @@ curl -X DELETE -H "Authorization: Bearer $ADMIN_KEY" \
 | POST | `/v1/aiwg/use` | run | `aiwg use all` equivalent — model-tier-sized activation bundle |
 | POST | `/v1/aiwg/expand` | run | Sub-agent unpack a specific skill/agent on demand |
-#### Stateful Chat — `/v1/chat` (OpenAI drop-in with full agent under the hood)
+#### Stateful Chat — `/v1/chat` + `/api/chat` (OpenAI drop-in with full agent under the hood)
+The chat endpoint is mounted at **two paths on port 11435**:
+| Path | Purpose |
+|------|---------|
+| `POST /v1/chat` | OA-native path |
+| `POST /api/chat` | **Ollama-compatible alias** — same handler, so clients pointing at Ollama can be flipped over by changing only the port (`11434` → `11435`) |
+It's a **drop-in replacement for OpenAI `/v1/chat/completions` and Ollama `/api/chat`**. The endpoint runs the full OA agent (tools, multi-agent, memory, skills) under the hood and returns an **OpenAI `chat.completion`-shaped response** so any client SDK can use it without modification.
-`/v1/chat` is a **drop-in replacement for OpenAI `/v1/chat/completions` and Ollama `/api/chat`**. The endpoint runs the full OA agent (tools, multi-agent, memory, skills) under the hood and returns an **OpenAI `chat.completion`-shaped response** so any client SDK can use it without modification.
+**Both body shapes are accepted** on either path:
-> **Two modes:**
-> - **Default (`tools` unset or `tools: true`)** — full agent: spawns the OA subprocess with the entire 82-tool set, runs the agent loop, returns the final answer.
+```jsonc
+// OA-native
+{"message": "hello", "model": "qwen3.5:9b", "stream": false}
+// Ollama-native (the `messages` array; the last user message is extracted)
+{"model": "qwen3.5:9b", "messages": [{"role":"user","content":"hello"}], "stream": false}
+```
+> **Two execution modes:**
+> - **Default (`tools` unset or `tools: true`)** — full agent: spawns the OA subprocess with the entire 82-tool set, runs the agent loop, returns the final answer with `tool_calls` metadata.
 > - **Direct (`tools: false`)** — fast path: bypasses the agent and forwards straight to the configured backend (Ollama/vLLM) using the session history. Useful for plain chat without tools.
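A hypothetical TypeScript sketch of the request handling just described: accept either body shape, take the latest user turn from an Ollama-style `messages` array, and pick the execution mode from `tools`. The names are illustrative, and what the daemon does with earlier turns is not covered here.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

interface ChatBody {
  model?: string;
  message?: string;         // OA-native shape
  messages?: ChatMessage[]; // Ollama-native shape
  tools?: unknown;          // unset or true → agent; false → direct
}

interface NormalizedChat {
  model?: string;
  prompt: string;
  mode: "agent" | "direct";
}

function normalizeChat(body: ChatBody): NormalizedChat {
  let prompt = body.message;
  if (prompt === undefined) {
    const msgs = body.messages ?? [];
    // Walk backwards to the most recent user turn.
    for (let i = msgs.length - 1; i >= 0; i--) {
      if (msgs[i].role === "user") {
        prompt = msgs[i].content;
        break;
      }
    }
  }
  if (prompt === undefined) throw new Error("expected `message` or a user turn in `messages`");
  return { model: body.model, prompt, mode: body.tools === false ? "direct" : "agent" };
}
```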
+**Safety timeout** — every non-streaming request is bounded by `timeout_s` (default **180s**). If the agent subprocess doesn't close in `timeout_s + 30s`, the daemon SIGTERMs (then SIGKILLs) it and returns an OpenAI-shaped error with `finish_reason:"error"` and a clear explanation. No more hung requests.
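A hypothetical sketch of the timeout response: an OpenAI `chat.completion` object whose only choice carries `finish_reason:"error"`. That value is an OA extension (OpenAI's documented values include `stop`, `length`, `tool_calls`, and `content_filter`), and the `id` format and message wording below are assumptions.

```typescript
function timeoutCompletion(model: string, timeoutS = 180) {
  const killedAfterS = timeoutS + 30; // same grace period as the safety bound
  const now = Date.now();
  return {
    id: `chatcmpl-${now.toString(36)}`, // illustrative id format
    object: "chat.completion",
    created: Math.floor(now / 1000), // OpenAI uses Unix seconds
    model,
    choices: [
      {
        index: 0,
        message: {
          role: "assistant",
          content: `Agent subprocess did not exit within ${killedAfterS}s (timeout_s=${timeoutS} + 30s); it was terminated.`,
        },
        finish_reason: "error",
      },
    ],
  };
}
```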
+**Flip Ollama → OA by port alone**: verified with `scripts/oa-vs-ollama-chat-compare.sh` (see [Live Comparison](#live-comparison-ollama-vs-oa-full-agent) below):
+```bash
+# Before (Ollama)
+curl -s http://127.0.0.1:11434/api/chat -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"hi"}],"stream":false}'
+# After (OA with full agent) — only port changed
+curl -s http://127.0.0.1:11435/api/chat -d '{"model":"qwen3.5:9b","messages":[{"role":"user","content":"hi"}],"stream":false}'
+```
 ```bash
 # DEFAULT: full agent — multi-step tool use, memory, the works.
 # Returns OpenAI chat.completion shape with the assistant's final answer.
@@ -904,6 +1013,46 @@ curl -s http://localhost:11435/v1/chat \
 Sessions expire after 30 minutes of inactivity. List active sessions: `GET /v1/chat/sessions`.
+#### Live Comparison: Ollama vs OA Full Agent
+The repo ships a reproducible side-by-side harness at [`scripts/oa-vs-ollama-chat-compare.sh`](scripts/oa-vs-ollama-chat-compare.sh). It runs **5 tool-call-required prompts** × **4 phases** (Ollama non-stream, OA non-stream, Ollama stream, OA stream) = **20 runs per invocation** with the same model and the same `/api/chat` path on both ports.
+```bash
+MODEL=qwen3.5:9b bash scripts/oa-vs-ollama-chat-compare.sh
+```
+**Results from `open-agents-ai@0.187.191` with `qwen3.5:9b`** (all 20 runs completed, zero timeouts):
+| # | Prompt | Ollama (bare) | Open Agents (full agent) | Winner |
+|---|---|---|---|---|
+| 1 | "Latest stable Node.js version + source URL" | ❌ **v22.10.0** — hallucinated from Aug-2024 training cutoff | ✅ **v25.9.0** fetched from `nodejs.org/download/current`, **3 tool calls** (`web_search` → `web_fetch` → `task_complete`) | **OA** |
+| 2 | "Biggest tech news this week + source URL" | ❌ "I don't have real-time access" + generic AI trend guess | ✅ **Anthropic Mythos, Intel Terafab, Apple foldable, Russian router breach, Firmus $5.5B** — sourced from TechCrunch, **4 tool calls** | **OA** |
+| 3 | "Current OS, CPU cores, free memory — use shell tools" | ❌ Confabulated **"Linux / 8 cores / 6.1 GB"** (all wrong) | ✅ **Ubuntu 24.04.2 / 48 cores / 120 GB** (all correct), **6–7 shell tool calls** | **OA** |
+| 4 | "List files in cwd, count top level, most recent" | ❌ "I cannot access your filesystem" | ✅ **20 files, 50+ dirs, `.claude.json` (81 KB, 09:09 UTC)** via `list_directory`, **2 tool calls** | **OA** |
+| 5 | "2022 FIFA World Cup final winner + score" (both endpoints have this in training data) | ✅ Argentina 4–2 France | ✅ Argentina 3–3 France, **4–2 on penalties at Lusail Stadium, Dec 18 2022** — grounded with 4 tool calls | **Tie (OA more detailed)** |
+**Latency profile** (wall clock, 5-prompt median):
+| Phase | Ollama | OA agent | OA overhead |
+|---|---|---|---|
+| Non-streaming | 12–18s | 24–42s | 12–26s (agent loop + tool calls) |
+| Streaming SSE | 11–16s | 24–56s | 10–40s |
+**Streaming parser validation** — every OA stream delivered:
+- Live intermediate `tool_call` events mid-stream (e.g. `['web_search', 'web_fetch', 'task_complete']`)
+- OpenAI `chat.completion.chunk` deltas with `id`, `model`, `finish_reason`
+- Clean `data: [DONE]` termination with `finish_reason:"stop"`
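A client consuming that stream only needs to concatenate the deltas until `[DONE]`. A hypothetical sketch, fed one `data:` payload at a time; it assumes every payload other than `[DONE]` is JSON, reads only standard `chat.completion.chunk` fields, and ignores anything without `choices[0]`, since the exact shape of OA's intermediate tool-call events is not documented in this section.

```typescript
interface StreamState {
  text: string;
  finishReason: string | null;
  done: boolean;
}

function applyChunk(state: StreamState, payload: string): StreamState {
  if (state.done) return state;
  if (payload.trim() === "[DONE]") return { ...state, done: true };
  const chunk = JSON.parse(payload); // assumes every other payload is JSON
  const choice = chunk?.choices?.[0];
  if (!choice) return state; // not a chat.completion.chunk; ignore
  return {
    text: state.text + (choice.delta?.content ?? ""),
    finishReason: choice.finish_reason ?? state.finishReason,
    done: false,
  };
}
```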
+The harness is **reproducible** — rerun it after any `/v1/chat` change to catch regressions:
+```bash
+MODEL=qwen3.5:4b bash scripts/oa-vs-ollama-chat-compare.sh       # faster tier for quick smoke
+MODEL=qwen3.5:9b OA_TIMEOUT=300 bash scripts/oa-vs-ollama-chat-compare.sh   # default
+MODEL=qwen3.5:32b OA_TIMEOUT=600 bash scripts/oa-vs-ollama-chat-compare.sh  # higher tier
+```
+**Bottom line**: for any question that needs fresh data, system access, or filesystem visibility — bare Ollama is wrong or refuses; OA with the full agent is correct with citations. That's the differentiator captured live in the harness output.
 #### AIWG Cascade — `/v1/aiwg/*`
 Exposes the entire AIWG ecosystem (5 frameworks, 19 addons, 136+ skills, ~42 MB / ~2M tokens of markdown) through a **4-tier cascade loader** that auto-sizes responses to the detected model tier and **never overflows small-model context**.