open-agents-ai 0.187.266 → 0.187.267
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +16 -0
- package/dist/index.js +24977 -24368
- package/dist/launcher.cjs +7 -429
- package/dist/postinstall-daemon.cjs +2 -495
- package/dist/preinstall.cjs +2 -563
- package/package.json +2 -2
- package/dist/scripts/.env +0 -14
- package/dist/scripts/.scrape_setup_complete +0 -1
package/README.md
CHANGED

@@ -280,6 +280,22 @@ The agent uses tools autonomously in a loop — reading errors, fixing code, and
 - **Littleman Observer** — parallel meta-analysis system that watches the agent loop in real-time. Detects false failure claims after successful tools, blocks redundant re-execution, catches runaway one-sided output in conversations, and dynamically extends turn limits when active work is detected. Emits `debug_context` and `debug_littleman` events for live observability
 - **Interactive Session Lock** — generic `SESSION_ACTIVE` protocol prevents premature task completion during long-running sessions (phone calls, live chat, monitoring). Any MCP contract can adopt the protocol. Paired with context-engineered system prompts that teach small models to maintain conversation loops
 - **Voice Chat** — `/voicechat` starts an async voice conversation that runs parallel to the main agent loop. Mic audio is transcribed via Whisper and injected as user messages; agent responses are synthesized to speech via TTS. Neither blocks the other — talk to the agent while it works
+
+### Cross-Modal Workers
+
+Open Agents includes background workers that compute and associate embeddings across vision, audio, and text:
+
+- Visual embeddings: CLIP ViT-B/32 (OpenCLIP) image embeddings for episodes with `modality: "visual"`.
+- Audio embeddings: speaker embeddings (ECAPA) when available; automatic fallback to normalized log-mel in constrained environments.
+- Transcription: Whisper runs automatically for audio ingests; transcripts are stored as text episodes and embedded for retrieval.
+- Associations: `appears_in` for visual presence, `said_by` for transcripts, and `alias_of` for alternate labels (e.g., username + display name). Workers also link visual episodes to nearby transcripts via a time-window co-occurrence pass.
+
+Config (env vars):
+
+- `OA_COOCUR_WINDOW_MS` — max time delta between visual and transcript episodes to create co-occurrence links (default: 120000 ms).
+- `OA_COOCUR_CLIP_SIM_MIN` — minimum CLIP text↔image cosine (0..1, default: 0.22) for linking when both embeddings are available.
+
+The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile, speechbrain, Whisper) into `~/.open-agents/venv` and registers providers automatically. No manual installs are required.
 - **Ralph Loop** — iterative task execution that keeps retrying until completion criteria are met
 - **Dream Mode** — creative idle exploration modeled after real sleep architecture (NREM→REM cycles)
 - **COHERE Cognitive Stack** — layered cognitive architecture implementing [Recursive Language Models](https://arxiv.org/abs/2512.24601), [SPRINT parallel reasoning](https://arxiv.org/abs/2506.05745), governed memory metabolism, identity kernel with continuity register, immune-system reflection, [strategy-space exploration](https://arxiv.org/abs/2603.02045), and **distributed inference mesh** — any `/cohere` participant automatically serves AND consumes inference from the network with complexity-based model routing, multi-node claim coordination, IPFS-pinned identity persistence, model exposure control, and Ollama safety hardening. See [COHERE Framework](#cohere-cognitive-framework) below