open-agents-ai 0.187.133 → 0.187.135

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +26 -3
  2. package/dist/index.js +508 -226
  3. package/package.json +2 -2
package/README.md CHANGED
@@ -40,7 +40,7 @@ An autonomous multi-turn tool-calling agent that reads your code, makes changes,
  - [Model-Tier Awareness](#model-tier-awareness)
  - [Live Code Knowledge Graph](#live-code-knowledge-graph)
  - [Auto-Expanding Context Window](#auto-expanding-context-window)
- - [Tools (67+)](#tools-67)
+ - [Tools (64+)](#tools-64)
  - [Ralph Loop — Iteration-First Design](#ralph-loop--iteration-first-design)
  - [Task Control](#task-control)
  - [COHERE Cognitive Framework](#cohere-cognitive-framework)
@@ -833,7 +833,7 @@ Small models (4B-7B) receive 10+ optimizations that larger models don't need, ea
 
  ### Tool Nesting for Small Models
 
- Small models use an **explore_tools** meta-tool pattern inspired by hierarchical API retrieval research ([ToolLLM](https://arxiv.org/abs/2307.16789)). Instead of presenting all 67 tools (which overwhelms small context windows), only core tools are loaded initially. The agent calls `explore_tools()` to discover additional capabilities, then activates specific tools as needed. This reduces tool schema tokens by ~80% while preserving access to the full toolset.
+ Small models use an **explore_tools** meta-tool pattern inspired by hierarchical API retrieval research ([ToolLLM](https://arxiv.org/abs/2307.16789)). Instead of presenting all 64+ tools (which overwhelms small context windows), only core tools are loaded initially. The agent calls `explore_tools()` to discover additional capabilities, then activates specific tools as needed. This reduces tool schema tokens by ~80% while preserving access to the full toolset.
 
  ### Dynamic Context Limits
 
@@ -923,7 +923,7 @@ On startup and `/model` switch, Open Agents detects your RAM/VRAM and creates an
 
 
 
- ## Tools (61)
+ ## Tools (64)
 
  <div align="right"><a href="#top">back to top</a></div>
 
@@ -1007,6 +1007,10 @@ On startup and `/model` switch, Open Agents detects your RAM/VRAM and creates an
  | `identity_kernel` | Persistent identity state — hydrate, observe events, propose updates with justification, publish snapshot, reconcile contradictions. Persists in `.oa/identity/` |
  | `reflect` | Immune-system reflection — diagnostic (find flaws), epistemic (identify missing evidence), constitutional (review self-updates). Returns pass/revise/block verdict |
  | `explore` | ARCHE strategy-space exploration — generate diverse strategies, archive successful variants with tags/confidence, compare competing approaches, retrieve past strategies |
+ | **Hardware Access** | |
+ | `camera_capture` | Access system cameras — list devices, capture JPEG frames, query capabilities. Uses ffmpeg + v4l2. Supports USB, CSI, and 360 cameras (QooCam, RealSense). Captured images can be piped to vision tools |
+ | `audio_capture` | Record from microphone — list input devices, record WAV/MP3 (configurable duration/rate/channels), check real-time mic level (RMS dBFS). Uses arecord + ffmpeg backends |
+ | `audio_playback` | Speaker control and TTS — play audio files (WAV/MP3/OGG), text-to-speech via espeak-ng (multi-language), get/set system volume. Uses aplay/ffplay/amixer backends |
 
  Read-only tools execute concurrently when called in the same turn. Mutating tools run sequentially.
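
The scheduling rule in the context line above (read-only tools run concurrently, mutating tools run sequentially) could be sketched roughly as follows. This is only an illustration: the `ToolCall` shape and the `runTurn` function are hypothetical names, not taken from `package/dist/index.js`, and the choice to finish the read-only batch before starting any mutating call is an assumption the README does not state.

```typescript
// Hypothetical shape of a pending tool call; not the package's real type.
interface ToolCall {
  name: string;
  readOnly: boolean;
  run: () => Promise<string>;
}

// Runs every read-only call concurrently, then the mutating calls one at a
// time. Results are returned in the order the calls were requested.
async function runTurn(calls: ToolCall[]): Promise<string[]> {
  const results: string[] = new Array(calls.length).fill("");

  // Read-only calls cannot conflict with each other, so start them together.
  await Promise.all(
    calls.map(async (call, i) => {
      if (call.readOnly) results[i] = await call.run();
    }),
  );

  // Mutating calls run strictly one after another, in request order
  // (assumption: they start only after the read-only batch has finished).
  for (let i = 0; i < calls.length; i++) {
    const call = calls[i];
    if (call && !call.readOnly) results[i] = await call.run();
  }
  return results;
}
```

Keeping results indexed by request position means the caller sees the same ordering whether or not any calls overlapped in time.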
 
@@ -1031,7 +1035,26 @@ The agent has 4 web tools. Pick the right one:
 
  **Structured extraction**: Pass `extract_schema='{"price": "number", "name": "string"}'` to `web_crawl` for best-effort regex-based field extraction from page content.
 
+ ### Hardware Tool Guide
 
+ The agent can access physical hardware — cameras, microphones, and speakers — through three dedicated tools:
+
+ | Need | Tool | Example |
+ |------|------|---------|
+ | See the environment | `camera_capture` action=capture | Grab a JPEG frame from any USB/CSI camera |
+ | List cameras | `camera_capture` action=list | Discover `/dev/video*` devices |
+ | Record audio | `audio_capture` action=record duration=10 | Record 10s WAV from default mic |
+ | Check if mic works | `audio_capture` action=level | RMS level in dBFS |
+ | Speak aloud | `audio_playback` action=speak text="Hello" | TTS via espeak-ng |
+ | Play a sound file | `audio_playback` action=play file=alert.wav | Play WAV/MP3/OGG |
+ | Check volume | `audio_playback` action=volume | Get current volume % |
+ | Set volume | `audio_playback` action=volume volume=50 | Set to 50% |
+
+ **Prerequisites**: `ffmpeg`, `arecord`, `aplay`, `amixer` (ALSA utils), `espeak-ng`. Install: `sudo apt install ffmpeg alsa-utils espeak-ng`
+
+ **Camera support**: USB cameras (UVC), Intel RealSense (via UVC), 360 cameras (QooCam, Ricoh Theta — raw fisheye via v4l2loopback + ffmpeg crop). The captured frame is returned as base64 JPEG that can be fed directly to the `vision` tool for analysis.
+
+ **Audio workflow**: Record → transcribe → analyze: `audio_capture action=record` → `transcribe_file` → process transcript. The tools handle device enumeration and graceful degradation when hardware is unavailable.
 
 
  ## Ralph Loop — Iteration-First Design
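
The Hardware Tool Guide added in this hunk says `audio_capture action=level` reports an "RMS level in dBFS". As a reading aid, here is a minimal sketch of how such a figure is commonly computed from signed 16-bit PCM samples. The function name `rmsDbfs` is hypothetical, and the diff does not show how the tool itself derives the value; this sketch assumes 0 dBFS corresponds to an RMS of 1.0 after normalising samples by 32768.

```typescript
// RMS level of signed 16-bit PCM samples in dB relative to full scale.
// Assumption: 0 dBFS means an RMS of 1.0 after dividing samples by 32768.
function rmsDbfs(samples: Int16Array): number {
  if (samples.length === 0) return -Infinity;

  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const x = samples[i] / 32768; // normalise to [-1, 1)
    sumSquares += x * x;
  }

  const rms = Math.sqrt(sumSquares / samples.length);
  // Digital silence has no finite level on a logarithmic scale.
  return rms === 0 ? -Infinity : 20 * Math.log10(rms);
}
```

Under this convention a signal held at half of full scale measures about -6 dBFS, and pure silence yields `-Infinity`, which a caller could treat as "no input detected".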