open-agents-ai 0.187.194 → 0.187.196

Files changed (3)

package/README.md CHANGED

@@ -1053,6 +1053,174 @@ MODEL=qwen3.5:32b OA_TIMEOUT=600 bash scripts/oa-vs-ollama-chat-compare.sh  # hi
 **Bottom line**: for any question that needs fresh data, system access, or filesystem visibility — bare Ollama is wrong or refuses; OA with the full agent is correct with citations. That's the differentiator captured live in the harness output.
+#### One-Off Completions — `/api/generate` + `/v1/generate`
+Drop-in for **Ollama `/api/generate`**. Same body shape, same response shape, same port-swap semantics as `/api/chat`. No session history — pure one-shot completion. The full agent runs under the hood by default (`tools: true`), returning the final `assistant_text` wrapped in Ollama's shape.
+```bash
+# Ollama (bare LLM)
+curl -s http://127.0.0.1:11434/api/generate \
+  -d '{"model":"qwen3.5:9b","prompt":"Name 3 open-source databases.","stream":false}'
+# OA with full agent — only port changed
+curl -s http://127.0.0.1:11435/api/generate \
+  -d '{"model":"qwen3.5:9b","prompt":"Name 3 open-source databases.","stream":false}'
+# OA direct backend bypass (fast path, no agent)
+curl -s http://127.0.0.1:11435/api/generate \
+  -d '{"model":"qwen3.5:9b","prompt":"Name 3 open-source databases.","stream":false,"tools":false}'
+```
+**Response shape** — Ollama-native so any client parsing `done`, `response`, `total_duration` keeps working:
+```json
+{
+  "model": "qwen3.5:9b",
+  "created_at": "2026-04-07T22:01:08Z",
+  "response": "1. PostgreSQL\n2. MongoDB\n3. Redis",
+  "done": true,
+  "done_reason": "stop",
+  "total_duration": 18000000000,
+  "eval_count": 45,
+  "_oa": {
+    "tool_calls": 0,
+    "finish_reason": "stop",
+    "duration_ms": 17991,
+    "request_id": "..."
+  }
+}
+```
+The `_oa` extension block carries the OA-specific metadata: tool call count, agent duration in milliseconds, and a request ID for correlation with `/v1/audit`. Ollama clients that ignore unknown fields need no changes.
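As a rough illustration of how a client might read this shape, the Python sketch below parses one response body and pulls out both the Ollama-native fields and the optional `_oa` block. The helper name `summarize_generate_response` is hypothetical, and the sample body simply mirrors the example above. The conversion relies on Ollama reporting `total_duration` in nanoseconds.

```python
import json
from typing import Any


def summarize_generate_response(raw: str) -> dict[str, Any]:
    """Extract Ollama-native fields plus the optional `_oa` metadata (hypothetical helper)."""
    body = json.loads(raw)
    oa = body.get("_oa", {})  # absent when the reply comes from bare Ollama
    return {
        "text": body["response"],
        "done": body["done"],
        # Ollama reports total_duration in nanoseconds; convert to seconds
        "total_seconds": body.get("total_duration", 0) / 1e9,
        "tool_calls": oa.get("tool_calls"),
        "request_id": oa.get("request_id"),
    }


# Sample body mirroring the documented response shape
sample = json.dumps({
    "model": "qwen3.5:9b",
    "response": "1. PostgreSQL\n2. MongoDB\n3. Redis",
    "done": True,
    "done_reason": "stop",
    "total_duration": 18000000000,
    "eval_count": 45,
    "_oa": {"tool_calls": 0, "finish_reason": "stop", "duration_ms": 17991, "request_id": "..."},
})

summary = summarize_generate_response(sample)
print(summary["total_seconds"], summary["tool_calls"])
```

Because the `_oa` lookup falls back to an empty dict, the same helper should also work on a response from bare Ollama, where the metadata fields come back as `None`.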
+**Streaming** — set `"stream": true` and receive Ollama-style NDJSON chunks:
+```
+{"model":"qwen3.5:9b","created_at":"...","response":"","done":false,"_oa":{"type":"tool_call","tool":"web_search","args":{...}}}
+{"model":"qwen3.5:9b","created_at":"...","response":"PostgreSQL...","done":false}
+{"model":"qwen3.5:9b","created_at":"...","response":"...","done":true,"done_reason":"stop","total_duration":18000000000,"eval_count":45}
+```
+Tool-call events appear as NDJSON frames with `_oa.type: "tool_call"` interleaved between content frames.
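One way a client could consume such a stream is sketched below in Python. It assumes each content frame's `response` is a text delta to be concatenated, as in Ollama's own streaming; the README does not state this for OA. The function name `collect_stream` and the sample frames, including the `args` payload, are illustrative placeholders rather than captured output.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> tuple[str, list[dict]]:
    """Join content deltas and collect `_oa` tool-call events from NDJSON lines."""
    text_parts: list[str] = []
    tool_events: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        frame = json.loads(line)
        oa = frame.get("_oa") or {}
        if oa.get("type") == "tool_call":
            tool_events.append(oa)
        text_parts.append(frame.get("response", ""))
        if frame.get("done"):
            break  # the final frame carries done: true
    return "".join(text_parts), tool_events


# Illustrative frames only; the args payload is a made-up placeholder
sample_stream = [
    '{"model":"qwen3.5:9b","response":"","done":false,'
    '"_oa":{"type":"tool_call","tool":"web_search","args":{"query":"open-source databases"}}}',
    '{"model":"qwen3.5:9b","response":"PostgreSQL, ","done":false}',
    '{"model":"qwen3.5:9b","response":"MongoDB, Redis","done":true,"done_reason":"stop"}',
]

text, tools = collect_stream(sample_stream)
print(text)
print([event["tool"] for event in tools])
```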
+#### Embeddings — `/v1/embeddings` + `/api/embed`
+Drop-in for Ollama `/api/embed` (returns Ollama's `{embeddings: [[...]]}` shape) **and** OpenAI `/v1/embeddings` (returns OpenAI's `{object:"list", data: [{object:"embedding", embedding:[...], index: 0}]}` shape). The endpoint path determines the response shape; both wire to the same backend embedding model.
+```bash
+# Ollama shape
+curl -s http://127.0.0.1:11435/api/embed \
+  -d '{"model":"nomic-embed-text","input":"hello world"}'
+# OpenAI shape
+curl -s http://127.0.0.1:11435/v1/embeddings \
+  -d '{"model":"nomic-embed-text","input":"hello world"}'
+```
+Both paths accept `{input: "..."}` or `{prompt: "..."}` in the body, and both support `input: ["a","b","c"]` for batched embeddings.
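Since the two paths differ only in response shape, a caller that wants to stay agnostic could normalize both into a plain list of vectors. The Python sketch below shows one possible approach; `extract_vectors` is a hypothetical helper and the tiny vectors are placeholders, not real model output.

```python
from typing import Any


def extract_vectors(body: dict[str, Any]) -> list[list[float]]:
    """Return embedding vectors from either the Ollama or the OpenAI response shape."""
    if "embeddings" in body:  # Ollama /api/embed: {"embeddings": [[...]]}
        return body["embeddings"]
    if body.get("object") == "list":  # OpenAI /v1/embeddings: {"object": "list", "data": [...]}
        ordered = sorted(body["data"], key=lambda item: item["index"])
        return [item["embedding"] for item in ordered]
    raise ValueError("unrecognized embedding response shape")


# Placeholder bodies in the two documented shapes
ollama_body = {"embeddings": [[0.1, 0.2], [0.3, 0.4]]}
openai_body = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 1},
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 0},
    ],
}

print(extract_vectors(ollama_body) == extract_vectors(openai_body))
```

Sorting by `index` keeps batched results aligned with the order of the `input` array even if the entries arrive out of order.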
+#### Memory Recall + Knowledge Graph — `/v1/memory/*`
+Backed by `@open-agents/memory` (SQLite + better-sqlite3). The endpoints expose the daemon's persistent memory stores that the agent uses under the hood.
+```bash
+# Backend summary
+curl -s http://127.0.0.1:11435/v1/memory
+# Write a memory entry (run scope)
+curl -s -X POST http://127.0.0.1:11435/v1/memory/write \
+  -d '{"kind":"fact","content":"PostgreSQL supports JSONB indexing via GIN.","tags":["db","postgres"]}'
+# Semantic/keyword search (returns ranked episodes)
+curl -s -X POST http://127.0.0.1:11435/v1/memory/search \
+  -d '{"query":"postgres indexing","limit":5}'
+# Paginated episode walk (knowledge graph)
+curl -s 'http://127.0.0.1:11435/v1/memory/episodes?limit=10'
+# Paginated failure store (anti-patterns)
+curl -s 'http://127.0.0.1:11435/v1/memory/failures?limit=10'
+```
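For scripted use, the same write and search calls could be prepared from Python's standard library. The sketch below only builds the requests and does not send them. `BASE_URL` and `build_json_post` are assumptions for illustration, and the `Content-Type` header is added as a precaution; the curl examples above do not set it, and the README does not say whether the daemon requires it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:11435"  # assumed local OA daemon address, as in the curl examples


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for a memory endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},  # assumption, not documented
        method="POST",
    )


write_req = build_json_post("/v1/memory/write", {
    "kind": "fact",
    "content": "PostgreSQL supports JSONB indexing via GIN.",
    "tags": ["db", "postgres"],
})
search_req = build_json_post("/v1/memory/search", {"query": "postgres indexing", "limit": 5})

print(write_req.get_method(), write_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, provided a daemon is listening on that port.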
+**Example search response** — search returns real episode records with timestamps, content, importance scores, and retrieval counts:
+```json
+{
+  "query": "sorting algorithm complexity",
+  "results": [
+    {
+      "kind": "episode",
+      "id": "89e5b7f3-e6ee-462f-97fa-e9f1bbec3d73",
+      "timestamp": 1775599267977,
+      "content": "The QuickSort algorithm has average O(n log n), worst case O(n²)",
+      "contentHash": "fd43a4bc9bfbec3b",
+      "importance": 0.5,
+      "decayClass": "daily",
+      "strength": 2,
+      "lastRetrieved": 1775599267983
+    }
+  ]
+}
+```
+The `strength` and `lastRetrieved` fields are updated on every search — the store keeps a read-count that decays over time, matching the spaced-repetition model used by the agent for context selection.
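To make the record easier to read, the Python sketch below treats `timestamp` and `lastRetrieved` as Unix epoch milliseconds, which is an inference from their magnitude rather than something the README states. The helper names `ms_to_utc` and `summarize_episode` are hypothetical. The episode is copied from the example response above.

```python
from datetime import datetime, timedelta, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime using integer arithmetic."""
    whole_seconds, remainder_ms = divmod(ms, 1000)
    return datetime.fromtimestamp(whole_seconds, tz=timezone.utc) + timedelta(milliseconds=remainder_ms)


def summarize_episode(ep: dict) -> dict:
    """Condense one search result into human-readable fields."""
    return {
        "id": ep["id"],
        "created": ms_to_utc(ep["timestamp"]).isoformat(timespec="milliseconds"),
        "strength": ep["strength"],
        # gap between creation and the most recent retrieval, in milliseconds
        "ms_until_last_read": ep["lastRetrieved"] - ep["timestamp"],
    }


# Episode copied from the example search response
episode = {
    "kind": "episode",
    "id": "89e5b7f3-e6ee-462f-97fa-e9f1bbec3d73",
    "timestamp": 1775599267977,
    "importance": 0.5,
    "decayClass": "daily",
    "strength": 2,
    "lastRetrieved": 1775599267983,
}

info = summarize_episode(episode)
print(info["created"], info["ms_until_last_read"])
```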
+#### Generate/Embed/Memory Test Harness
+A second harness at [`scripts/oa-vs-ollama-generate-embed-memory.sh`](scripts/oa-vs-ollama-generate-embed-memory.sh) covers the four non-chat endpoint families:
+```bash
+MODEL=qwen3.5:9b EMBED_MODEL=nomic-embed-text \
+  bash scripts/oa-vs-ollama-generate-embed-memory.sh
+```
+**Tested results from `open-agents-ai@0.187.195`** (live, single run, `qwen3.5:9b` + `nomic-embed-text`):
+**Part 1 — `/api/generate` one-off prompts**:
+| Prompt | Ollama | OA direct | OA full agent |
+|---|---|---|---|
+| "TCP vs UDP in one sentence" | 26.8s — correct | 12.5s — correct | 43.8s — correct, **1 tool call** |
+| "One-line Python square function" | 32.1s — correct | 12.2s — correct | ~3min — correct, **2 tool calls** |
+| "Name 3 open-source databases" | 36.6s — Postgres/MySQL/SQLite | 21.0s — Postgres/MySQL/MongoDB | 18.2s — Postgres/MongoDB/Redis |
+**Part 2 — `/api/embed` cosine similarity sanity** (4 test sentences):
+Both Ollama and OA emitted **identical 768-dim vectors** (same backend). Cosine similarity matrix:
+```
+                   France→Par  Paris→Fran  Germany→Be   Bananas
+France→Paris          1.000       0.979       1.000      0.449
+Paris→France          0.979       1.000       0.979      0.477
+Germany→Berlin        1.000       0.979       1.000      0.449
+Bananas               0.449       0.477       0.449      1.000
+```
+Semantic sanity check: `sim(Paris, Paris-paraphrase) = 0.979 > sim(Paris, Bananas) = 0.449`. ✅ Both endpoints took `0.22–0.25s` per batch of 4 embeddings.
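For readers who want to reproduce this kind of check, a minimal cosine-similarity sketch in Python follows. The three-dimensional vectors are toy placeholders chosen for illustration, not the 768-dim embeddings from the run, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> list[list[float]]:
    """Pairwise cosine similarities; symmetric with ones on the diagonal."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors standing in for a sentence, its paraphrase, and an unrelated sentence
sentence = [1.0, 0.9, 0.1]
paraphrase = [0.9, 1.0, 0.1]
unrelated = [0.1, 0.2, 1.0]

matrix = similarity_matrix([sentence, paraphrase, unrelated])
# The sanity condition: a paraphrase should score higher than an unrelated sentence
print(matrix[0][1] > matrix[0][2])
```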
+**Part 3 — `/v1/memory/write` + `/v1/memory/search`** round-trip:
+```
+write: "The QuickSort algorithm has O(n log n) average..."   → {"status":"written", "timestamp":"2026-04-07T22:01:07.931Z"}
+write: "HTTP/2 uses binary framing..."                        → {"status":"written", ...}
+write: "The Rust ownership model enforces memory safety..."   → {"status":"written", ...}
+search query="sorting algorithm complexity" → 3 episodes returned with content, importance, strength, lastRetrieved
+search query="network protocol streaming"  → 3 episodes returned (strength incremented on re-read)
+```
+Every write round-trips correctly. Search returns ranked episodes with updated `strength` counts and `lastRetrieved` timestamps, so the spaced-repetition reinforcement loop is live.
+**Part 4 — Knowledge graph walk** (`/v1/memory/episodes`, `/v1/memory/failures`):
+```
+GET /v1/memory              → backends: episodes (available), failures (available), temporal_graph (available)
+GET /v1/memory/episodes     → paginated episode list with {data, pagination}
+GET /v1/memory/failures     → paginated failure list with {data, pagination}
+```
+The episode and failure lists are empty on a fresh daemon and populate as the agent runs tasks. These endpoints were fixed in v0.187.195: earlier versions silently fell back to "memory stores unavailable" because the dynamic `await import("@open-agents/memory")` did not resolve in the esbuild-bundled daemon. The daemon now uses a static top-level import.
 #### AIWG Cascade — `/v1/aiwg/*`
 Exposes the entire AIWG ecosystem (5 frameworks, 19 addons, 136+ skills, ~42 MB / ~2M tokens of markdown) through a **4-tier cascade loader** that auto-sizes responses to the detected model tier and **never overflows small-model context**.