open-agents-ai 0.187.172 → 0.187.174

Files changed (3)
  1. package/README.md +145 -10
  2. package/dist/index.js +1247 -1137
  3. package/package.json +2 -2
package/README.md CHANGED
@@ -40,7 +40,8 @@ An autonomous multi-turn tool-calling agent that reads your code, makes changes,
40
40
  - [Model-Tier Awareness](#model-tier-awareness)
41
41
  - [Live Code Knowledge Graph](#live-code-knowledge-graph)
42
42
  - [Auto-Expanding Context Window](#auto-expanding-context-window)
43
- - [Tools (68+)](#tools-68)
43
+ - [Tools (85+)](#tools-85)
44
+ - [Associative Memory & Cross-Modal Binding](#associative-memory--cross-modal-binding)
44
45
  - [Ralph Loop — Iteration-First Design](#ralph-loop--iteration-first-design)
45
46
  - [Task Control](#task-control)
46
47
  - [COHERE Cognitive Framework](#cohere-cognitive-framework)
@@ -923,7 +924,7 @@ On startup and `/model` switch, Open Agents detects your RAM/VRAM and creates an
923
924
 
924
925
 
925
926
 
926
- ## Tools (68)
927
+ ## Tools (85+)
927
928
 
928
929
  <div align="right"><a href="#top">back to top</a></div>
929
930
 
@@ -1010,11 +1011,24 @@ On startup and `/model` switch, Open Agents detects your RAM/VRAM and creates an
1010
1011
  | **Hardware Access** | |
1011
1012
  | `camera_capture` | Access system cameras — list devices, capture JPEG frames, query capabilities. Uses ffmpeg + v4l2. Supports USB, CSI, and 360 cameras (QooCam, RealSense). Captured images can be piped to vision tools |
1012
1013
  | `audio_capture` | Record from microphone — list input devices, record WAV/MP3 (configurable duration/rate/channels), check real-time mic level (RMS dBFS). Uses arecord + ffmpeg backends |
1013
- | `audio_playback` | Speaker control and TTS — play audio files (WAV/MP3/OGG), text-to-speech via espeak-ng (multi-language), get/set system volume. Uses aplay/ffplay/amixer backends |
1014
+ | `audio_playback` | Speaker control and TTS — play audio files (WAV/MP3/OGG), text-to-speech via LuxTTS voice clone (persistent GPU daemon, ~2s synthesis), get/set system volume. Uses aplay/ffplay/amixer backends |
1014
1015
  | `wifi_control` | WiFi network scanning and management — scan nearby networks (SSID, signal, channel, security), list WiFi adapters (built-in + USB dongles), connect/disconnect, check connection status, toggle monitor mode. Auto-detects AC600/RTL8811AU and other USB adapters |
1015
1016
  | `bluetooth_scan` | Bluetooth device discovery — scan for Classic and BLE devices, list HCI adapters, get device info. Uses hcitool/bluetoothctl backends |
1016
1017
  | `sdr_scan` | Software-defined radio scanning — frequency sweeps, ADS-B aircraft tracking (1090 MHz), FM radio capture. Auto-installs rtl-sdr tools when RTL-SDR hardware detected. Uses rtl_power/rtl_fm/dump1090 |
1017
1018
  | `flipper_zero` | Flipper Zero multi-tool control — Sub-GHz scanning (315/433/868/915 MHz), NFC tag reading, 125kHz RFID reading, IR capture, GPIO pin reading, storage browsing. Serial CLI via /dev/ttyACM* |
1019
+ | `meshtastic` | Mesh network communication via LoRa — send/receive messages, list nodes, get device info, configure channels. Auto-installs meshtastic CLI in venv, auto-fixes serial permissions via pkexec |
1020
+ | `gps_location` | GPS positioning from 45+ USB receivers — auto-detects device, probes NMEA at multiple baud rates. Uses pyserial+pynmea2 for reliable parsing. Returns lat/lon/alt/speed/heading |
1021
+ | `audio_analyze` | Audio scene analysis — YAMNet 521-class classification (AudioSet taxonomy), Silero VAD voice activity detection, FFT spectrum analysis with peak frequency detection |
1022
+ | `asr_listen` | Record from microphone and transcribe speech to text — combines audio capture + Whisper ASR in one call. Uses PipeWire (bluetooth/USB) → faster-whisper → openai-whisper backends |
1023
+ | **Visual Intelligence** | |
1024
+ | `visual_memory` | Face recognition + object memory — InsightFace ArcFace 512d face enrollment/identification, CLIP ViT-B/32 object teaching/recognition. Persistent face+object databases in `.open-agents/visual-memory/` |
1025
+ | `multimodal_memory` | Cross-modal episode binding — captures face + voice + text + location into unified episodes. Actions: capture (photo+audio), meet (register person with name+face+voice), recall (associative retrieval), timeline (chronological query) |
1026
+ | **Associative Memory** | |
1027
+ | `episode_store` | SQLite episode store with triple-factor scoring (recency × importance × relevance), 4-class temporal decay (session/daily/procedural/permanent), Ebbinghaus strengthening on retrieval |
1028
+ | `temporal_graph` | Temporal knowledge graph with Graphiti-style valid_from/valid_until edges, entity upsert with mention counting, temporal queries, neighbor traversal for context building |
1029
+ | `zettelkasten` | A-MEM Zettelkasten note linking — retroactive context evolution, top-3 neighbor discovery via cosine similarity, bidirectional linking |
1030
+ | `ppr_retrieval` | HippoRAG Personalized PageRank retrieval — entity extraction, seed node mapping, multi-hop associative traversal over temporal KG, episode scoring |
1031
+ | `gist_compressor` | ReadAgent-style trajectory compression — deterministic gist extraction from multi-turn interactions, no LLM needed |
1018
1032
 
1019
1033
  Read-only tools execute concurrently when called in the same turn. Mutating tools run sequentially.
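The scheduling rule above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the `ToolCall` type, its `read_only` flag, and the choice to run the read-only batch before the mutating calls are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class ToolCall:
    name: str                            # hypothetical tool name
    read_only: bool                      # hypothetical flag marking non-mutating tools
    run: Callable[[], Awaitable[str]]    # coroutine factory that performs the call


async def run_turn(calls: list[ToolCall]) -> dict[str, str]:
    """Run read-only calls concurrently, then mutating calls one at a time."""
    readers = [c for c in calls if c.read_only]
    writers = [c for c in calls if not c.read_only]

    results: dict[str, str] = {}
    # Read-only calls cannot conflict with each other, so they share one gather().
    outputs = await asyncio.gather(*(c.run() for c in readers))
    results.update({c.name: out for c, out in zip(readers, outputs)})

    # Mutating calls keep their original relative order and never overlap.
    for call in writers:
        results[call.name] = await call.run()
    return results
```

Results are keyed by tool name here for brevity, so two calls to the same tool in one turn would need a call id instead.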
1020
1034
 
@@ -1049,7 +1063,7 @@ The agent can access physical hardware — cameras, microphones, and speakers
1049
1063
  | List cameras | `camera_capture` action=list | Discover `/dev/video*` devices |
1050
1064
  | Record audio | `audio_capture` action=record duration=10 | Record 10s WAV from default mic |
1051
1065
  | Check if mic works | `audio_capture` action=level | RMS level in dBFS |
1052
- | Speak aloud | `audio_playback` action=speak text="Hello" | TTS via espeak-ng |
1066
+ | Speak aloud | `audio_playback` action=speak text="Hello" | TTS via LuxTTS voice clone |
1053
1067
  | Play a sound file | `audio_playback` action=play file=alert.wav | Play WAV/MP3/OGG |
1054
1068
  | Check volume | `audio_playback` action=volume | Get current volume % |
1055
1069
  | Set volume | `audio_playback` action=volume volume=50 | Set to 50% |
@@ -1067,12 +1081,33 @@ The agent can access physical hardware — cameras, microphones, and speakers
1067
1081
  | Sub-GHz scan | `flipper_zero` action=subghz_scan frequency=433920000 | RF signals |
1068
1082
  | Read NFC tag | `flipper_zero` action=nfc_read | Tag UID, type |
1069
1083
  | Read RFID tag | `flipper_zero` action=rfid_read | 125kHz tag ID |
1084
+ | Send mesh message | `meshtastic` action=send message="Hello mesh" | LoRa broadcast |
1085
+ | List mesh nodes | `meshtastic` action=nodes | All nodes + signal info |
1086
+ | Get GPS location | `gps_location` action=locate | Lat/lon/alt/speed |
1087
+ | Analyze audio scene | `audio_analyze` action=classify file="rec.wav" | Top AudioSet classes |
1088
+ | Detect voice activity | `audio_analyze` action=vad file="rec.wav" | Speech segments |
1089
+ | Listen + transcribe | `asr_listen` action=listen duration=8 | Record + Whisper ASR |
1090
+ | Transcribe audio file | `asr_listen` action=transcribe file="rec.wav" | Whisper transcription |
1091
+ | Enroll a face | `visual_memory` action=enroll name="Alice" image="photo.jpg" | Face database entry |
1092
+ | Identify faces | `visual_memory` action=identify image="photo.jpg" | Known face matches |
1093
+ | Teach an object | `visual_memory` action=teach label="coffee_mug" image="obj.jpg" | CLIP object memory |
1094
+ | Meet a person | `multimodal_memory` action=meet name="Bob" | Photo+voice+text episode |
1095
+ | Recall a person | `multimodal_memory` action=recall query="Bob" | Associative memory search |
1096
+ | Event timeline | `multimodal_memory` action=timeline | Chronological episodes |
1070
1097
 
1071
- **Prerequisites**: `ffmpeg`, `arecord`, `aplay`, `amixer` (ALSA utils), `espeak-ng`, `bluez` (Bluetooth). Install: `sudo apt install ffmpeg alsa-utils espeak-ng bluez`
1098
+ **Prerequisites**: `ffmpeg`, `arecord`, `aplay`, `amixer` (ALSA utils), `bluez` (Bluetooth). Install: `sudo apt install ffmpeg alsa-utils bluez`
1072
1099
 
1073
- **Camera support**: USB cameras (UVC), Intel RealSense (via UVC), 360 cameras (QooCam, Ricoh Theta raw fisheye via v4l2loopback + ffmpeg crop). The captured frame is returned as base64 JPEG that can be fed directly to the `vision` tool for analysis.
1100
+ **Camera support**: USB cameras (UVC), Intel RealSense (via UVC), QooCam 8K 360 via WiFi OSC protocol (auto-discovers hotspot, connects, switches modes, captures frames). Captured frames returned as base64 JPEG for direct piping to `vision` or `visual_memory` tools.
1074
1101
 
1075
- **Audio workflow**: Record → transcribe → analyze: `audio_capture action=record` → `transcribe_file` → process transcript. The tools handle device enumeration and graceful degradation when hardware is unavailable.
1102
+ **Audio workflow**: Record → transcribe → analyze → remember:
1103
+ 1. `audio_capture action=record` → WAV recording
1104
+ 2. `asr_listen action=listen` → record + Whisper transcription in one call
1105
+ 3. `audio_analyze action=classify` → YAMNet scene classification (521 AudioSet classes)
1106
+ 4. `multimodal_memory action=meet` → bind face + voice + text into persistent episode
1107
+
1108
+ **Mesh/GPS/SDR**: Auto-installs dependencies when hardware is detected. Meshtastic creates a Python venv with the CLI. GPS auto-probes NMEA at multiple baud rates. RTL-SDR auto-blacklists kernel modules and installs udev rules via pkexec.
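To make the NMEA probing step concrete, here is a standard-library sketch of what validating and decoding a GGA sentence involves. The README states the tool itself uses pyserial and pynmea2; the function names, the injected `read_line` callable, and the list of baud rates below are assumptions for illustration only.

```python
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def _to_degrees(value: str, hemisphere: str, deg_digits: int) -> float:
    """Convert NMEA ddmm.mmm / dddmm.mmm into signed decimal degrees."""
    decimal = int(value[:deg_digits]) + float(value[deg_digits:]) / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence: str):
    """Return lat/lon/alt from a valid GGA sentence, or None if unusable."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, checksum = sentence[1:].partition("*")
    if nmea_checksum(body) != checksum.upper():
        return None
    fields = body.split(",")
    if not fields[0].endswith("GGA") or len(fields) < 10:
        return None
    if fields[6] in ("", "0"):  # fix quality 0 means no position fix
        return None
    return {
        "lat": _to_degrees(fields[2], fields[3], 2),
        "lon": _to_degrees(fields[4], fields[5], 3),
        "alt_m": float(fields[9]) if fields[9] else None,
    }


def probe_baud_rates(read_line, rates=(4800, 9600, 38400, 115200)):
    """Try each baud rate until one yields a parsable fix.

    read_line(rate) is a hypothetical callable that opens the port at `rate`
    and returns one line of text; the rate list is an assumed default.
    """
    for rate in rates:
        fix = parse_gga(read_line(rate))
        if fix is not None:
            return rate, fix
    return None
```

A wrong baud rate typically produces unreadable bytes, which fail the `$`/checksum test and cause the probe to move on to the next rate.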
1109
+
1110
+ **Visual Intelligence**: `visual_memory` provides persistent face recognition (InsightFace ArcFace 512d) and object memory (CLIP ViT-B/32). `multimodal_memory` binds all modalities into cross-session episodes with associative recall.
1076
1111
 
1077
1112
 
1078
1113
  ## Ralph Loop — Iteration-First Design
@@ -1561,7 +1596,7 @@ The emotion system is informed by peer-reviewed and preprint research:
1561
1596
  /voice clone overwatch # Generate clone ref from Overwatch → LuxTTS
1562
1597
  ```
1563
1598
 
1564
- Auto-downloads the ONNX voice model (~50MB) on first use. Install `espeak-ng` for best quality (`apt install espeak-ng` / `brew install espeak-ng`).
1599
+ Auto-downloads the ONNX voice model (~50MB) on first use. LuxTTS is the primary TTS engine with a persistent GPU daemon that keeps the model warm in VRAM for ~2s synthesis latency.
1565
1600
 
1566
1601
  ### LuxTTS Voice Cloning
1567
1602
 
@@ -1583,6 +1618,8 @@ Auto-downloads the ONNX voice model (~50MB) on first use. Install `espeak-ng` fo
1583
1618
  - **Pitch** → post-synthesis resampling via `resamplePitch()` (valence+arousal tanh curve)
1584
1619
  - **Volume** → WAV sample scaling (dominance-driven)
1585
1620
 
1621
+ **Persistent GPU daemon**: The `audio_playback` tool runs a persistent LuxTTS daemon process that keeps the ZipVoice model warm in GPU memory (~19GB VRAM). First call starts the daemon (~7s model load), subsequent calls synthesize in ~2s. The daemon communicates via JSON-over-stdin/stdout protocol and caches encoded voice prompts for instant reuse. Falls back to standalone synthesis (~10s) if the daemon stalls.
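The daemon's request loop might look roughly like the sketch below. It assumes one JSON object per line and invents the field names (`id`, `voice`, `text`, `path`, `prompt_cached`); the real wire format is not documented here, and `synthesize` stands in for the actual model call.

```python
import json


def serve(stdin, stdout, synthesize):
    """Answer newline-delimited JSON requests until stdin closes.

    synthesize(text, encoded_prompt) is a hypothetical callable that returns
    the path of the generated WAV file.
    """
    prompt_cache = {}  # voice name -> encoded voice prompt, reused across requests
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        voice = request["voice"]
        cached = voice in prompt_cache
        if not cached:
            # Stand-in for the expensive step of encoding a reference voice clip.
            prompt_cache[voice] = f"encoded:{voice}"
        path = synthesize(request["text"], prompt_cache[voice])
        reply = {"id": request["id"], "path": path, "prompt_cached": cached}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # the caller blocks on each reply, so never buffer
```

Because the process stays alive between requests, both the loaded model and the prompt cache survive from one call to the next, which is where the latency saving comes from.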
1622
+
1586
1623
  Output: 48kHz WAV, compatible with Telegram voice messages and WebSocket streaming.
1587
1624
 
1588
1625
  ### Narration Engine Architecture
@@ -2477,6 +2514,98 @@ Every completed task is logged to `.oa/trajectories/trajectories.jsonl` with ful
2477
2514
  | **Skill extraction** | Post-task via `/skillify` | Converts corrections into reusable SKILL.md |
2478
2515
 
2479
2516
 
2517
+ ## Associative Memory & Cross-Modal Binding
2518
+
2519
+ <div align="right"><a href="#top">back to top</a></div>
2520
+
2521
+ Open Agents implements an associative memory system inspired by hippocampal episodic memory research. Every tool call, observation, and interaction is captured as a richly linked episode that can be retrieved through multi-hop associative traversal, not just keyword search.
2522
+
2523
+ ### Architecture
2524
+
2525
+ ```
2526
+ ┌─────────────────────────────────────────────────────────────────┐
2527
+ │ Associative Memory Pipeline │
2528
+ │ │
2529
+ │ Tool Call → Episode Store → Temporal KG → Zettelkasten Links │
2530
+ │ │ │ │ │
2531
+ │ Triple-Factor Entity Edges Neighbor Discovery │
2532
+ │ Scoring (Graphiti) (A-MEM cosine) │
2533
+ │ │ │ │ │
2534
+ │ └───── PPR Retrieval ───────────┘ │
2535
+ │ (HippoRAG) │
2536
+ │ │ │
2537
+ │ Context Injection (every 3 turns) │
2538
+ └─────────────────────────────────────────────────────────────────┘
2539
+ ```
2540
+
2541
+ ### Episode Store (SQLite)
2542
+
2543
+ Every tool call generates an episode stored in SQLite with WAL journal mode:
2544
+
2545
+ | Field | Description |
2546
+ |-------|-------------|
2547
+ | `content` | Tool name + args + result summary |
2548
+ | `importance` | 0-10 scale (errors=8, file edits=6, reads=3) |
2549
+ | `decay_class` | session (1h), daily (1d), procedural (30d), permanent (∞) |
2550
+ | `embedding` | 384d vector for semantic similarity |
2551
+ | `strength` | Ebbinghaus curve — increases on each retrieval |
2552
+
2553
+ **Scoring**: `score = recency_weight × importance × relevance`, combining the three retrieval factors (recency, importance, relevance) introduced in [Generative Agents (Park et al., 2023)](https://arxiv.org/abs/2304.03442).
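A small sketch of how the three factors could be multiplied together. Several details are assumptions rather than documented behavior: each decay class duration is treated as an exponential half-life, `strength` stretches that half-life, importance is normalized from the 0-10 scale to 0-1, and relevance is the cosine similarity clamped at zero.

```python
import math

# Decay-class durations from the table above, read here as half-lives in seconds.
HALF_LIFE_S = {
    "session": 3600.0,
    "daily": 86400.0,
    "procedural": 30 * 86400.0,
    "permanent": math.inf,
}


def recency_weight(age_s: float, decay_class: str, strength: float = 1.0) -> float:
    """Exponential decay; a larger strength (more retrievals) slows forgetting."""
    half_life = HALF_LIFE_S[decay_class] * strength
    return 0.5 ** (age_s / half_life)  # age / inf == 0.0, so permanent stays at 1.0


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def episode_score(importance, age_s, decay_class, query_vec, episode_vec, strength=1.0):
    """score = recency_weight x importance x relevance, each factor in [0, 1]."""
    relevance = max(0.0, cosine(query_vec, episode_vec))
    return recency_weight(age_s, decay_class, strength) * (importance / 10.0) * relevance
```

Because the factors are multiplied, an episode that scores zero on any one of them is never retrieved, however strong the other two are.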
2554
+
2555
+ ### Temporal Knowledge Graph
2556
+
2557
+ Entities extracted from tool results form a temporal KG with [Graphiti](https://arxiv.org/abs/2501.13956)-style edges:
2558
+
2559
+ - **Nodes**: files, functions, errors, people, concepts — with `mention_count` and `last_seen`
2560
+ - **Edges**: causal relationships (`modifies`, `calls`, `causes_error`, `met_person`) with `valid_from`/`valid_until` temporal bounds
2561
+ - **Temporal queries**: "What was the state at time T?" via validity filtering
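Validity filtering for a "state at time T" query can be sketched as follows. The `Edge` type and the half-open interval convention (`valid_from <= t < valid_until`, with `None` meaning still valid) are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    valid_from: float                    # epoch seconds
    valid_until: Optional[float] = None  # None means the fact still holds


def edges_at(edges, t: float):
    """Return the edges whose validity interval contains time t."""
    return [
        e for e in edges
        if e.valid_from <= t and (e.valid_until is None or t < e.valid_until)
    ]
```

Closing an edge by setting `valid_until` instead of deleting it is what keeps past states queryable.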
2562
+
2563
+ ### Zettelkasten Linking (A-MEM)
2564
+
2565
+ After embedding computation, each episode discovers its top-3 nearest neighbors by cosine similarity and creates bidirectional links — implementing the [A-MEM Zettelkasten pattern (NeurIPS 2025)](https://arxiv.org/abs/2502.12110). Over time, episodes form a densely connected knowledge graph where context evolves retroactively as new episodes link to old ones.
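The neighbor discovery step described above might be sketched like this, using plain dictionaries as stand-ins for the real store. The function name and data layout are hypothetical.

```python
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_episode(new_id, embeddings, links, k=3):
    """Link new_id to its k most similar episodes, in both directions.

    embeddings: episode id -> vector; links: episode id -> set of linked ids.
    """
    scored = [
        (cosine(embeddings[new_id], vec), other)
        for other, vec in embeddings.items()
        if other != new_id
    ]
    neighbors = [other for _, other in sorted(scored, reverse=True)[:k]]
    for other in neighbors:
        links.setdefault(new_id, set()).add(other)
        links.setdefault(other, set()).add(new_id)  # the old note gains a new link too
    return neighbors
```

The second `add` is the retroactive part: an older episode's link set changes when a newer, related episode arrives.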
2566
+
2567
+ ### PPR Retrieval (HippoRAG)
2568
+
2569
+ Retrieval uses [Personalized PageRank over the temporal KG](https://arxiv.org/abs/2405.14831):
2570
+
2571
+ 1. **Entity extraction** from the current query
2572
+ 2. **Seed node mapping** — find KG nodes matching query entities
2573
+ 3. **PPR diffusion** — importance flows along edges with damping factor α=0.15
2574
+ 4. **Episode scoring** — episodes connected to high-PPR nodes are ranked
2575
+ 5. **Context injection** — top episodes injected every 3 turns as `[ASSOCIATIVE MEMORY]` context
2576
+
2577
+ This enables multi-hop retrieval: asking about "the auth bug" can surface episodes about the specific file, the test that caught it, and the person who reported it — even if those episodes don't share keywords.
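Steps 3 and 4 can be illustrated with a plain power-iteration Personalized PageRank. This sketch reads α=0.15 as the restart (teleport) probability, which is the usual convention for that value; the text calls it a damping factor, so the actual implementation may define it differently. The graph, entity names, and episode ids are made up for the example.

```python
def personalized_pagerank(graph, seeds, alpha=0.15, iterations=50):
    """graph: node -> list of neighbor nodes; seeds: non-empty list of seed nodes."""
    seeds = set(seeds)
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs} | seeds
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in nodes}  # teleport back to the seeds
        for node in nodes:
            nbrs = graph.get(node, [])
            if nbrs:
                share = (1 - alpha) * rank[node] / len(nbrs)
                for nb in nbrs:
                    nxt[nb] += share
            else:
                # Dangling node: hand its mass back to the seeds so nothing leaks.
                for s in seeds:
                    nxt[s] += (1 - alpha) * rank[node] / len(seeds)
        rank = nxt
    return rank


def rank_episodes(episode_entities, rank):
    """Order episodes by the summed PPR mass of the entities they mention."""
    scores = {
        ep: sum(rank.get(entity, 0.0) for entity in entities)
        for ep, entities in episode_entities.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

In a chain such as `auth bug → login.py → test_login → Dana`, the person node receives nonzero mass from the seed even though it is three hops away, while nodes in a disconnected component stay at zero.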
2578
+
2579
+ ### Cross-Modal Binding
2580
+
2581
+ The `multimodal_memory` tool binds face, voice, text, and location into unified episodes:
2582
+
2583
+ ```
2584
+ meet("Cole") → {
2585
+ face: InsightFace ArcFace 512d embedding,
2586
+ voice: Whisper transcription of spoken name,
2587
+ photo: CLIP ViT-B/32 512d scene embedding,
2588
+ text: "My name is Cole",
2589
+ episode_id: shared across all modalities,
2590
+ timestamp: ISO-8601
2591
+ }
2592
+ ```
2593
+
2594
+ **Recall** uses the shared `episode_id` to retrieve all modalities at once. CLIP embeddings enable visual queries ("who was in the photo with the whiteboard?") and face embeddings enable identity queries ("when did I last see Cole?").
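A minimal sketch of binding modalities through a shared `episode_id`, using an in-memory SQLite table. The schema, table name, and function names are assumptions for illustration; the real storage layout is not described here.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS modality_records ("
        "episode_id TEXT NOT NULL, modality TEXT NOT NULL, "
        "payload TEXT NOT NULL, timestamp TEXT NOT NULL)"
    )
    return conn


def bind_episode(conn, records: dict) -> str:
    """Store one row per modality, all sharing a fresh episode_id."""
    episode_id = uuid.uuid4().hex
    timestamp = datetime.now(timezone.utc).isoformat()  # ISO-8601
    conn.executemany(
        "INSERT INTO modality_records VALUES (?, ?, ?, ?)",
        [(episode_id, m, json.dumps(p), timestamp) for m, p in records.items()],
    )
    conn.commit()
    return episode_id


def recall(conn, episode_id: str) -> dict:
    """Fetch every modality recorded for one episode in a single query."""
    rows = conn.execute(
        "SELECT modality, payload FROM modality_records WHERE episode_id = ?",
        (episode_id,),
    )
    return {modality: json.loads(payload) for modality, payload in rows}
```

A match in any single modality (a face hit, a text hit) yields an `episode_id`, and `recall` then returns the other modalities captured at the same moment.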
2595
+
2596
+ ### Gist Compression
2597
+
2598
+ Post-task, the [ReadAgent](https://arxiv.org/abs/2402.09727) gist compressor creates deterministic summaries of multi-turn trajectories (>10 turns), preserving key decisions and outcomes while discarding redundant intermediate steps. No LLM needed — uses extractive heuristics.
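One way an extractive, LLM-free gist could work is shown below. The turn format and the set of "key" turn kinds are hypothetical; the only details taken from the text are the 10-turn threshold and the requirement that the output be deterministic.

```python
# Hypothetical turn kinds that are always worth keeping in a gist.
KEY_KINDS = {"decision", "error", "file_edit", "outcome"}


def gist(turns, min_turns=10):
    """Compress a trajectory to its task, key steps, and final result.

    turns: list of dicts with "kind" and "summary" keys (assumed format).
    Trajectories of min_turns or fewer are returned unchanged.
    """
    if len(turns) <= min_turns:
        return [t["summary"] for t in turns]
    last = len(turns) - 1
    kept = [
        t for i, t in enumerate(turns)
        if i == 0 or i == last or t["kind"] in KEY_KINDS
    ]
    return [t["summary"] for t in kept]
```

Since the selection depends only on position and turn kind, the same trajectory always compresses to the same gist.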
2599
+
2600
+ ### Near-Critical Cognitive Architecture
2601
+
2602
+ The associative memory integrates with a near-critical cognitive framework inspired by [Beggs & Plenz (2003)](https://doi.org/10.1523/JNEUROSCI.23-35-11167.2003) neuronal avalanche dynamics:
2603
+
2604
+ - **Auto-consolidation**: At task boundaries, the system writes consolidation snapshots to `.oa/consolidations/` with lessons learned and key patterns
2605
+ - **Provenance KG**: Every agent action is tracked in `.oa/provenance/` for full action traceability
2606
+ - **Homeostasis modulation**: Error rate drives exploration guidance — high error rates inject more careful approaches, low error rates encourage bolder exploration
2607
+ - **Error pattern learning**: Recurring error patterns are detected, stored globally in `~/.open-agents/error-patterns.json`, and injected as `[LEARNED FROM EXPERIENCE]` guidance before similar actions in future sessions
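The homeostasis idea in the list above can be sketched as a rolling error rate mapped to a guidance mode. The window size, the two thresholds, and the mode names are invented for the example and are not taken from the package.

```python
from collections import deque


class ErrorRateMonitor:
    """Track recent tool outcomes and suggest how cautious the agent should be."""

    def __init__(self, window=20, high=0.4, low=0.1):
        self.outcomes = deque(maxlen=window)  # True = success, False = error
        self.high = high                      # assumed threshold for "careful"
        self.low = low                        # assumed threshold for "bold"

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def guidance(self) -> str:
        rate = self.error_rate()
        if rate >= self.high:
            return "careful"  # many recent errors: favor smaller, verified steps
        if rate <= self.low:
            return "bold"     # few recent errors: allow broader exploration
        return "neutral"
```

The bounded deque makes old outcomes fall out of the window, so the mode recovers once errors stop.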
2608
+
2480
2609
 
2481
2610
  ## Dream Mode — Creative Idle Exploration
2482
2611
 
@@ -3371,16 +3500,22 @@ The COHERE collective intelligence system, self-play idle loop, identity evoluti
3371
3500
  | Hyperagents: Self-Referential Meta-Improvement | [2603.19461](https://arxiv.org/abs/2603.19461) | Mar 2026 | D6: Recursive meta-improvement |
3372
3501
  | STOP: Self-Taught Optimizer | [2310.02304](https://arxiv.org/abs/2310.02304) | COLM 2024 | D6: Scaffold self-improvement |
3373
3502
 
3374
- ### Memory & Identity
3503
+ ### Memory, Identity & Associative Retrieval
3375
3504
  | Paper | ArXiv | Venue | Used In |
3376
3505
  |-------|-------|-------|---------|
3377
3506
  | MemoryOS: Memory Operating System | [2506.06326](https://arxiv.org/abs/2506.06326) | EMNLP 2025 Oral | D3: Three-tier consolidation |
3378
- | A-MEM: Agentic Memory (Zettelkasten) | [2502.12110](https://arxiv.org/abs/2502.12110) | NeurIPS 2025 | D3: Retroactive narrative |
3507
+ | A-MEM: Agentic Memory (Zettelkasten) | [2502.12110](https://arxiv.org/abs/2502.12110) | NeurIPS 2025 | Zettelkasten linking, retroactive context evolution |
3508
+ | HippoRAG: Neurobiological Retrieval | [2405.14831](https://arxiv.org/abs/2405.14831) | NeurIPS 2024 | PPR retrieval over temporal KG |
3509
+ | Generative Agents: Interactive Simulacra | [2304.03442](https://arxiv.org/abs/2304.03442) | UIST 2023 | Triple-factor scoring (recency × importance × relevance) |
3510
+ | Graphiti: Temporal Knowledge Graphs | [2501.13956](https://arxiv.org/abs/2501.13956) | Jan 2025 | Temporal edges with valid_from/valid_until |
3511
+ | ReadAgent: Gist Memories | [2402.09727](https://arxiv.org/abs/2402.09727) | Feb 2024 | Post-task trajectory compression |
3512
+ | RGMem: Phase-Transition Memory | — | — | Phase-transition threshold θ_inf=3 |
3379
3513
  | MemRL: Runtime RL on Episodic Memory | [2601.03192](https://arxiv.org/abs/2601.03192) | Jan 2026 | D3: Value-based retrieval |
3380
3514
  | Memory-R1: RL Memory Manager | [2508.19828](https://arxiv.org/abs/2508.19828) | Jan 2026 | D3: ADD/UPDATE/DELETE ops |
3381
3515
  | ExpeL: Experiential Learning | [2308.10144](https://arxiv.org/abs/2308.10144) | AAAI 2024 | D2: Insight extraction |
3382
3516
  | Experiential Reflective Learning | [2603.24639](https://arxiv.org/abs/2603.24639) | Mar 2026 | D2: Heuristics > trajectories |
3383
3517
  | EvoSkill: Automated Skill Discovery | [2603.02766](https://arxiv.org/abs/2603.02766) | Mar 2026 | D2+D4: Pareto + zero-shot transfer |
3518
+ | JARVIS-1: Open-World Multi-Modal Agent | [2311.05997](https://arxiv.org/abs/2311.05997) | NeurIPS 2023 | Cross-modal CLIP retrieval pattern |
3384
3519
 
3385
3520
  ### Collective Identity & Emergence
3386
3521
  | Paper | ArXiv | Venue | Used In |