RubyGems - ruby_llm-toolbox - Versions diffs - 0.1.0 - Mend

ruby_llm-toolbox 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (83)

checksums.yaml +7 -0
data/CHANGELOG.md +49 -0
data/GUIDE.md +598 -0
data/LICENSE +21 -0
data/README.md +412 -0
data/bin/verify_prism_parity +112 -0
data/lib/ruby_llm/toolbox/base.rb +112 -0
data/lib/ruby_llm/toolbox/configuration.rb +148 -0
data/lib/ruby_llm/toolbox/data_path.rb +54 -0
data/lib/ruby_llm/toolbox/process_registry.rb +226 -0
data/lib/ruby_llm/toolbox/process_runner.rb +72 -0
data/lib/ruby_llm/toolbox/ruby_outline.rb +213 -0
data/lib/ruby_llm/toolbox/safe_math.rb +182 -0
data/lib/ruby_llm/toolbox/safety/command_guard.rb +42 -0
data/lib/ruby_llm/toolbox/safety/path_jail.rb +55 -0
data/lib/ruby_llm/toolbox/safety/url_guard.rb +111 -0
data/lib/ruby_llm/toolbox/sandbox/base.rb +151 -0
data/lib/ruby_llm/toolbox/sandbox/bubblewrap.rb +70 -0
data/lib/ruby_llm/toolbox/sandbox/docker.rb +69 -0
data/lib/ruby_llm/toolbox/sandbox/sandbox_exec.rb +75 -0
data/lib/ruby_llm/toolbox/search/brave.rb +64 -0
data/lib/ruby_llm/toolbox/search/searxng.rb +64 -0
data/lib/ruby_llm/toolbox/search/tavily.rb +70 -0
data/lib/ruby_llm/toolbox/text_diff.rb +81 -0
data/lib/ruby_llm/toolbox/toml.rb +409 -0
data/lib/ruby_llm/toolbox/tools/apply_patch.rb +92 -0
data/lib/ruby_llm/toolbox/tools/bash_tool.rb +101 -0
data/lib/ruby_llm/toolbox/tools/bundle.rb +71 -0
data/lib/ruby_llm/toolbox/tools/calculator.rb +42 -0
data/lib/ruby_llm/toolbox/tools/create_directory.rb +35 -0
data/lib/ruby_llm/toolbox/tools/csv_read.rb +69 -0
data/lib/ruby_llm/toolbox/tools/csv_write.rb +51 -0
data/lib/ruby_llm/toolbox/tools/date_time.rb +42 -0
data/lib/ruby_llm/toolbox/tools/delete_file.rb +64 -0
data/lib/ruby_llm/toolbox/tools/diff.rb +35 -0
data/lib/ruby_llm/toolbox/tools/download_file.rb +55 -0
data/lib/ruby_llm/toolbox/tools/edit_file.rb +82 -0
data/lib/ruby_llm/toolbox/tools/gem_tool.rb +140 -0
data/lib/ruby_llm/toolbox/tools/git_add.rb +46 -0
data/lib/ruby_llm/toolbox/tools/git_blame.rb +58 -0
data/lib/ruby_llm/toolbox/tools/git_branch.rb +35 -0
data/lib/ruby_llm/toolbox/tools/git_checkout.rb +43 -0
data/lib/ruby_llm/toolbox/tools/git_commit.rb +47 -0
data/lib/ruby_llm/toolbox/tools/git_diff.rb +50 -0
data/lib/ruby_llm/toolbox/tools/git_grep.rb +66 -0
data/lib/ruby_llm/toolbox/tools/git_helpers.rb +68 -0
data/lib/ruby_llm/toolbox/tools/git_log.rb +47 -0
data/lib/ruby_llm/toolbox/tools/git_show.rb +48 -0
data/lib/ruby_llm/toolbox/tools/git_status.rb +27 -0
data/lib/ruby_llm/toolbox/tools/glob.rb +62 -0
data/lib/ruby_llm/toolbox/tools/grep_files.rb +221 -0
data/lib/ruby_llm/toolbox/tools/http_helpers.rb +130 -0
data/lib/ruby_llm/toolbox/tools/http_request.rb +75 -0
data/lib/ruby_llm/toolbox/tools/json_query.rb +69 -0
data/lib/ruby_llm/toolbox/tools/lint.rb +67 -0
data/lib/ruby_llm/toolbox/tools/list_directory.rb +87 -0
data/lib/ruby_llm/toolbox/tools/move_file.rb +54 -0
data/lib/ruby_llm/toolbox/tools/multi_edit.rb +107 -0
data/lib/ruby_llm/toolbox/tools/parse_ruby.rb +111 -0
data/lib/ruby_llm/toolbox/tools/process_kill.rb +41 -0
data/lib/ruby_llm/toolbox/tools/process_list.rb +29 -0
data/lib/ruby_llm/toolbox/tools/process_output.rb +55 -0
data/lib/ruby_llm/toolbox/tools/process_start.rb +109 -0
data/lib/ruby_llm/toolbox/tools/python_tests.rb +77 -0
data/lib/ruby_llm/toolbox/tools/read_file.rb +75 -0
data/lib/ruby_llm/toolbox/tools/replace_in_files.rb +139 -0
data/lib/ruby_llm/toolbox/tools/run_python.rb +38 -0
data/lib/ruby_llm/toolbox/tools/run_ruby.rb +37 -0
data/lib/ruby_llm/toolbox/tools/run_rust.rb +42 -0
data/lib/ruby_llm/toolbox/tools/run_tests.rb +81 -0
data/lib/ruby_llm/toolbox/tools/sandbox_run.rb +40 -0
data/lib/ruby_llm/toolbox/tools/todo_write.rb +57 -0
data/lib/ruby_llm/toolbox/tools/toml_query.rb +70 -0
data/lib/ruby_llm/toolbox/tools/toolchain_helpers.rb +62 -0
data/lib/ruby_llm/toolbox/tools/tree.rb +87 -0
data/lib/ruby_llm/toolbox/tools/web_fetch.rb +77 -0
data/lib/ruby_llm/toolbox/tools/web_search.rb +81 -0
data/lib/ruby_llm/toolbox/tools/write_file.rb +52 -0
data/lib/ruby_llm/toolbox/tools/yaml_query.rb +73 -0
data/lib/ruby_llm/toolbox/truncator.rb +68 -0
data/lib/ruby_llm/toolbox/version.rb +7 -0
data/lib/ruby_llm/toolbox.rb +161 -0
metadata +194 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 7901ba482abf5f0ee5176b69fe4a1fd3015dd60794e4b049227d7553d0b79fe2
+  data.tar.gz: ab519ed44d20c0fa121ee85bc1d71891697823d5929bfd4c67f8ec428da167d2
+SHA512:
+  metadata.gz: a8f716e573c86df412c75521097e16d6939adbe1080603937366ad1ba2ef190506f83d87977de31eca1ac96ce92846d7134e1a31a1d44a28a2dcbf9468447d5b
+  data.tar.gz: 645bc71e1f16769099d73aab4326573380f9e2b6e93139332b349677f6aa15db77ca1362d8a5288bd32e30d7a20a76f071fd3cb16b691c461e10c2671fb7eabf

data/CHANGELOG.md ADDED

@@ -0,0 +1,49 @@
+# Changelog
+All notable changes to this project are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/), and this project adheres to
+Semantic Versioning.
+## [0.1.0] - Unreleased
+Initial release: fifty `RubyLLM::Tool` subclasses behind a safe-by-default loader.
+### Added
+- Framework: single `require "ruby_llm/toolbox"`, per-instance configuration,
+  uniform `{ error:, code: }` return contract, and token-budgeted output via
+  `ruby_llm-tokenizer`.
+- Safe (read-only) tools: `read_file`, `list_directory`, `tree`, `glob`,
+  `grep_files`, `gem`, `parse_ruby`, `web_fetch`, `web_search`, `http_request`,
+  `git_status`, `git_diff`, `git_log`, `git_show`, `git_blame`, `git_grep`,
+  `git_branch`, `json_query`, `yaml_query`, `toml_query`, `csv_read`,
+  `calculator`, `date_time`, `diff`, `todo_write`, `process_output`,
+  `process_list`, `process_kill`.
+- Exec (gated) tools: `write_file`, `edit_file`, `multi_edit`,
+  `replace_in_files`, `create_directory`, `move_file`, `delete_file`,
+  `download_file`, `git_add`, `git_commit`, `git_checkout`, `apply_patch`,
+  `csv_write`, `run_tests`, `python_tests`, `lint`, `bundle`, `bash`,
+  `run_ruby`, `run_python`, `run_rust`, `process_start`.
+- Background process management: `process_start` (gated) launches an
+  allowlisted long-running command in its own process group with an
+  address-space cap and a `max_processes` concurrency limit; `process_output`
+  reads new stdout/stderr incrementally with bounded buffers; `process_list`
+  enumerates running processes; `process_kill` stops a process (group signal +
+  `/proc` descendant sweep, TERM→KILL escalation) and returns its final output.
+  Everything still running is cleaned up at interpreter exit.
+- A dependency-free TOML parser (`RubyLLM::Toolbox::Toml`) backing `toml_query`.
+- Swappable `web_search` backends behind one adapter seam: Tavily (default),
+  `:brave` (commercial Brave Search API, header-key auth), and `:searxng`
+  (keyless, self-hosted), selected via `config.search_adapter` with
+  `brave_api_key` / `searxng_url` knobs.
+- Safety: path-jailing, no-shell argv execution with an allowlist,
+  repo-config-RCE hardening for git tools, SSRF protection with private-IP
+  blocking and DNS-rebinding-safe IP pinning, ReDoS guards, and a pluggable
+  code-execution sandbox: Docker, plus host-process bubblewrap (Linux) and
+  sandbox-exec (macOS) backends selected by `sandbox_runtime`, with rlimit
+  memory/CPU caps for the host-process backends.
+- `parse_ruby` dual backend: Prism (Ruby 3.3+) with a Ripper fallback, kept in
+  parity by `spec/ruby_outline_parity_spec.rb` and `bin/verify_prism_parity`.
+- Operator-controlled unsafe override (`config.allow_unsafe`, per-call
+  `unsafe:`) that an agent can request but cannot grant itself.
+[0.1.0]: https://github.com/washu/ruby_llm-toolbox/releases/tag/v0.1.0

data/GUIDE.md ADDED

@@ -0,0 +1,598 @@
+# ruby_llm-toolbox — Usage Guide
+A practical, end-to-end guide to wiring `ruby_llm-toolbox` into an agent harness
+and using its fifty tools well. It covers the mental model, configuration, the
+safety architecture, the sandbox and search backends, the background-process
+lifecycle, a full tool catalog, and a set of "reach for X, not Y" decision rules
+you can hand to the agent itself.
+This guide is for two audiences. The early sections (mental model through
+configuration) are for the **developer** wiring the gem into a harness. The
+catalog and the [Decision rules](#decision-rules-reach-for-x-not-y) are written
+so they can be dropped into an **agent's** system prompt verbatim. See
+[Using this guide as agent context](#using-this-guide-as-agent-context).
+---
+## Table of contents
+1. [The mental model](#the-mental-model)
+2. [Quick start](#quick-start)
+3. [Configuration reference](#configuration-reference)
+4. [The safe → exec security model](#the-safe--exec-security-model)
+5. [The unsafe override](#the-unsafe-override)
+6. [Sandboxing code execution](#sandboxing-code-execution)
+7. [Web search backends](#web-search-backends)
+8. [Background processes](#background-processes)
+9. [Tool catalog](#tool-catalog)
+10. [Decision rules: reach for X, not Y](#decision-rules-reach-for-x-not-y)
+11. [The return contract and error codes](#the-return-contract-and-error-codes)
+12. [Token budgeting](#token-budgeting)
+13. [Recipes](#recipes)
+14. [Operational notes and honest limitations](#operational-notes-and-honest-limitations)
+15. [Using this guide as agent context](#using-this-guide-as-agent-context)
+---
+## The mental model
+Three ideas explain almost everything about how the toolbox behaves.
+**1. Many narrow, typed tools — not one big shell.** Where a generic agent shells
+out to `cat`, `grep`, `jq`, `git`, and `curl`, this toolbox gives each of those a
+dedicated tool with typed parameters, structured output, and its own guard rails.
+A typed tool is easier for a model to call correctly, easier to secure, and
+easier to reason about than a free-form shell string. `bash` still exists, but as
+a deliberate, allowlisted escape hatch — not the default path.
+**2. Safe by default, exec on request.** Every tool is either **safe**
+(read-only: it observes the world but never changes it) or **exec** (it writes
+files, mutates a repo, runs code, or starts processes). Safe tools always work.
+Exec tools are loaded but **inert** until an operator flips
+`config.enable_exec_tools = true`. This means you can hand an agent the full tool
+set and trust that, until you opt in, it cannot alter anything.
+**3. A uniform return contract.** Every tool returns either a `String` (success)
+or a `Hash` of the shape `{ error: "...", code: :some_symbol }` (failure). The
+agent — and your harness — can branch on one predictable shape regardless of
+which tool ran. See [The return contract](#the-return-contract-and-error-codes).
+The toolbox namespace is `RubyLLM::Toolbox`; tools live under
+`RubyLLM::Toolbox::Tools`. A single `require "ruby_llm/toolbox"` loads everything.
+---
+## Quick start
+```ruby
+require "ruby_llm/toolbox"
+# Configure once, at boot.
+RubyLLM::Toolbox.configure do |c|
+  c.fs_root           = Dir.pwd        # the jail: no tool reads/writes outside this
+  c.max_output_tokens = 2_000          # per-call output budget (the default)
+  # exec tools stay OFF here — read-only agent
+end
+chat = RubyLLM.chat(model: "gpt-4o")          # your ruby_llm chat object
+chat.with_tools(*RubyLLM::Toolbox.safe_tools) # hand it the read-only set
+chat.ask("What does lib/foo.rb do, and where is bar() defined?")
+```
+Three sets are available to wire in:
+| Method | Returns |
+| --- | --- |
+| `RubyLLM::Toolbox.safe_tools` | the read-only tools (always usable) |
+| `RubyLLM::Toolbox.exec_tools` | the mutating/exec tools (honor the gate) |
+| `RubyLLM::Toolbox.all_tools` | both sets together |
+`with_tools` accepts tool **instances** (what these methods return) or tool
+**classes**; both work with `ruby_llm`. Each tool snapshots the global
+config at construction, so configure before you build the tool list, or rebuild
+the list after a config change.
+To enable the mutating set:
+```ruby
+RubyLLM::Toolbox.configure do |c|
+  c.fs_root           = "/srv/project"
+  c.enable_exec_tools = true
+  c.allowed_commands  = %w[ls cat git rspec]  # bash / process_start allowlist
+end
+chat.with_tools(*RubyLLM::Toolbox.all_tools)
+```
+`RubyLLM::Toolbox.reset!` restores a pristine global configuration — handy in
+tests and between sessions.
+---
+## Configuration reference
+All configuration goes through `RubyLLM::Toolbox.configure { |c| ... }`. Knobs are
+grouped below by concern. Per-call overrides are possible too: most tools accept
+keyword overrides at construction (e.g. `ReadFile.new(fs_root: "/other")`), which
+produce a one-off config snapshot without touching global state.
+### Core / filesystem
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `fs_root` | `Dir.pwd` | The path jail. Every filesystem tool resolves real paths and refuses anything outside this root. |
+| `enable_exec_tools` | `false` | Master switch for the entire exec set. |
+| `ignored_dirs` | `.git .hg .svn node_modules .bundle tmp` | Directories skipped by `tree`, `grep_files`, `replace_in_files`. |
+| `max_output_tokens` | `2_000` | Output budget per call; longer output is truncated with a marker. |
+| `tokenizer_model` | `gpt-4o` | Model name for `ruby_llm-tokenizer` so truncation counts the right tokens. |
+### Command execution
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `allowed_commands` | `[]` | Executables that `bash` **and** `process_start` may run. Empty = nothing runs. |
+| `command_timeout` | `30` | Wall-clock seconds before a spawned process is killed. |
+| `max_processes` | `8` | Max concurrent background processes (`process_start`). |
+| `env_passthrough` | `PATH LANG LC_ALL HOME` | Which host env vars are forwarded to spawned processes. Everything else is stripped. |
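The interaction of `allowed_commands` and `env_passthrough` can be sketched in plain Ruby. This is an illustration under stated assumptions, not the gem's implementation: `run_argv` is a hypothetical helper, and matching the allowlist on the program's basename is an assumption made for the example.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper (not part of ruby_llm-toolbox): run one allowlisted
# program from an argv array with a stripped environment and no shell.
def run_argv(argv, allowed:, env_passthrough: %w[PATH LANG LC_ALL HOME])
  program, *args = argv
  unless allowed.include?(File.basename(program))
    return { error: "#{program} is not allowlisted", code: :command_denied }
  end

  # Forward only the named variables; unsetenv_others drops everything else.
  env = ENV.to_h.slice(*env_passthrough)
  # The [program, program] form forces exec-style spawning even with no
  # arguments, so shell metacharacters in args are passed through literally.
  output, _status = Open3.capture2e(env, [program, program], *args, unsetenv_others: true)
  output
end

ruby = RbConfig.ruby
# The ";" is just a character in one argument; nothing interprets it.
puts run_argv([ruby, "-e", "print ARGV.first", "a; echo injected"],
              allowed: [File.basename(ruby)])
# A program that is not on the allowlist yields the error Hash instead.
p run_argv(["curl", "http://example.com"], allowed: [File.basename(ruby)])
```

Because the arguments never pass through a shell, composing commands has to happen as separate calls rather than with pipes or redirects.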
+### Search and ReDoS
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `regex_timeout` | `2` | `Regexp.timeout` ceiling (seconds) for `grep_files` / `replace_in_files`, defusing catastrophic backtracking. |
+| `max_grep_matches` | `200` | Cap on matches returned by a single grep. |
+### Web search
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `search_adapter` | `nil` | `nil`/`:tavily`, `:brave`, `:searxng`, or a custom object responding to `#search(query, max_results:)`. |
+| `tavily_api_key` | `ENV["TAVILY_API_KEY"]` | Key for the default Tavily adapter. |
+| `brave_api_key` | `ENV["BRAVE_API_KEY"]` | Subscription token for the `:brave` adapter. |
+| `searxng_url` | `ENV["SEARXNG_URL"]` | Base URL of a self-hosted SearXNG instance for `:searxng`. |
+### HTTP / fetch
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `web_allowlist` / `web_denylist` | `[]` / `[]` | Host filters layered on top of the SSRF guard for `web_fetch` / `http_request`. |
+| `http_timeout` | `10` | Open/read timeout for `gem`, `web_fetch`, `web_search`, `http_request`. |
+| `user_agent` | `ruby_llm-toolbox/<version>` | User-Agent header for outbound requests. |
+| `max_fetch_bytes` | `2_000_000` | Size cap on a fetched/downloaded body. |
+| `max_redirects` | `5` | Redirect hops followed (each re-checked by the SSRF guard). |
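The address classes the SSRF guard is described as blocking can be expressed with Ruby's standard `IPAddr`. The sketch below is an assumption-laden illustration: `blocked_ip?` and `CGNAT` are hypothetical names, the check is not exhaustive, and it covers only the classification step. A real guard also has to resolve the host once and connect to that same vetted address.

```ruby
require "ipaddr"

# Carrier-grade NAT range (RFC 6598); IPAddr has no built-in predicate for it.
CGNAT = IPAddr.new("100.64.0.0/10")

# Hypothetical check: true when an already-resolved address should be refused.
def blocked_ip?(address)
  ip = IPAddr.new(address)
  ip = ip.native if ip.ipv4_mapped? # treat ::ffff:10.0.0.1 as 10.0.0.1
  ip.loopback? || ip.private? || ip.link_local? || (ip.ipv4? && CGNAT.include?(ip))
end

%w[127.0.0.1 10.1.2.3 169.254.169.254 100.64.0.1 93.184.216.34].each do |addr|
  puts "#{addr} blocked=#{blocked_ip?(addr)}"
end
```

The cloud-metadata address `169.254.169.254` falls inside the link-local range, so the link-local test covers it here.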
+### Sandbox (code-execution tools)
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `sandbox_runtime` | `:auto` | `:auto`, `:docker`, `:bubblewrap`, `:sandbox_exec`. |
+| `docker_image` / `python_image` / `rust_image` | `ruby:3.3-slim` / `python:3.12-slim` / `rust:1-slim` | Images for `run_ruby` / `run_python` / `run_rust` under Docker. |
+| `sandbox_network` | `none` | Network mode for sandboxed code (default: no network). |
+| `sandbox_memory` / `sandbox_cpus` / `sandbox_pids` | `256m` / `1.0` / `128` | Resource caps. |
+| `sandbox_user` | `1000:1000` | UID:GID the sandboxed process runs as. |
+| `sandbox_bwrap_extra` | `[]` | Extra `bwrap` args (e.g. masks to hide host paths). |
+| `sandbox_seatbelt_profile` | `nil` | Override the default macOS Seatbelt SBPL profile. |
+### Unsafe override
+| Knob | Default | Purpose |
+| --- | --- | --- |
+| `allow_unsafe` | `false` | Operator master switch enabling per-call `unsafe:` requests. |
+| `unsafe_logger` | `nil` | A callable invoked on every honored unsafe call, for audit. |
+---
+## The safe → exec security model
+The split between safe and exec tools is the backbone of the design.
+**Safe tools** are read-only by construction. `read_file`, `grep_files`,
+`git_log`, `web_fetch`, `json_query`, and the rest observe the filesystem, a repo,
+or the network but never mutate state. They are always available, even with
+`enable_exec_tools = false`. (The three process-management tools `process_output`,
+`process_list`, and `process_kill` are classed safe because they only act on
+processes that already exist — and `process_kill` is deliberately always
+available as a stop valve.)
+**Exec tools** can change the world: write files, commit to a repo, run arbitrary
+code, start processes. They are loaded but refuse to run until
+`enable_exec_tools = true`. An exec tool called while the gate is closed returns
+`{ error: ..., code: :exec_disabled }` rather than doing anything.
+On top of the gate sit several independent guards, each defending a specific
+class of attack:
+| Guard | Defends against | Where |
+| --- | --- | --- |
+| **Path jail** | Reading or writing outside `fs_root` (`../../etc/passwd`, symlink escapes). Paths are resolved to their real location and checked. | all filesystem tools |
+| **No-shell argv execution** | OS command injection. `bash`/`process_start` take a program plus an argument array — never a shell string — so there is no place for `;`, `|`, `$()`, or globbing to be interpreted. Plus an allowlist. | `bash`, `process_start` |
+| **SSRF guard with IP pinning** | Server-side request forgery and DNS rebinding. Only http/https; private, loopback, link-local, CGNAT, and cloud-metadata IPs are blocked; the socket is pinned to the vetted IP; every redirect hop is re-checked. | `web_fetch`, `http_request`, `download_file` |
+| **ReDoS guard** | Catastrophic regex backtracking locking the process. User patterns run under `Regexp.timeout`. | `grep_files`, `replace_in_files` |
+| **Repo-config RCE hardening** | A malicious checked-out repo executing code via git config / hooks. Git tools run with hardened flags. | all `git_*` tools |
+| **Sandbox** | Untrusted code touching the host. Code-execution tools run in an isolated, no-network, resource-capped sandbox. | `run_ruby`, `run_python`, `run_rust` |
+| **Token budget** | Blowing the context window with a huge file or command output. | every tool, via `max_output_tokens` |
+The guards are layered: enabling exec tools does **not** disable the path jail,
+the SSRF guard, or the sandbox. Each must be crossed on its own terms.
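To make the path-jail row concrete, here is a minimal sketch of a real-path containment check. `inside_jail?` is a hypothetical helper written for this illustration; the gem's own `path_jail.rb` may work differently.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: resolve `path` relative to `root` and report whether
# its real location (symlinks followed) is still inside the root.
def inside_jail?(root, path)
  real_root = File.realpath(root)
  candidate = File.expand_path(path, real_root)
  # realpath raises for missing files, so walk up to the deepest existing
  # ancestor (this covers writes to files that do not exist yet).
  candidate = File.dirname(candidate) until File.exist?(candidate) || File.symlink?(candidate)
  real = File.realpath(candidate)
  real == real_root || real.start_with?(real_root + File::SEPARATOR)
rescue Errno::ENOENT, Errno::ELOOP
  false # dangling or looping symlinks are refused
end

Dir.mktmpdir do |root|
  File.write(File.join(root, "ok.txt"), "hi")
  File.symlink("/etc", File.join(root, "escape"))
  p inside_jail?(root, "ok.txt")           # a file inside the root
  p inside_jail?(root, "../../etc/passwd") # a traversal attempt
  p inside_jail?(root, "escape/passwd")    # a symlink pointing outside
end
```

Comparing resolved paths rather than the strings the caller supplied is what defeats both `..` traversal and symlink escapes.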
+---
+## The unsafe override
+Sometimes a guard is in the way of legitimate work — reading a file just outside
+`fs_root`, fetching a `localhost` dev server, running a non-allowlisted binary.
+The toolbox provides an escape hatch that is **two-key by design**: the agent can
+*request* a bypass, but only an operator can *grant* the capability.
+- The operator sets `config.allow_unsafe = true` (off by default).
+- The agent passes `unsafe: true` on a supporting tool call.
+- Only when **both** are true does the specific guard step aside — and only for
+  that one call. The agent cannot self-escalate; setting `unsafe: true` while
+  `allow_unsafe` is false is simply refused (`code: :refused`).
+- Every honored unsafe call is passed to `config.unsafe_logger` (if set) for an
+  audit trail.
+Tools that expose an `unsafe:` parameter: **`read_file`**, **`write_file`**,
+**`bash`**, **`process_start`**, **`web_fetch`**, **`http_request`**. Each bypass
+is scoped to that tool's guard only (e.g. `read_file unsafe: true` relaxes the
+path jail for that read; it does not turn off anything else).
+Treat `allow_unsafe` as a trusted-operator, trusted-environment setting. In a
+hands-off or adversarial deployment, leave it off.
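The two-key rule can be modeled in a few lines. This is a sketch of the decision logic only: `OverrideConfig` and `unsafe_decision` are hypothetical names, and passing the tool name to the logger is an assumption about what an audit callable might receive.

```ruby
# Hypothetical model of the two-key rule; illustrative, not the gem's code.
OverrideConfig = Struct.new(:allow_unsafe, :unsafe_logger, keyword_init: true)

# Returns true when the guard may step aside for this one call, false when no
# bypass was requested, or a refusal Hash when the agent asked without
# operator approval.
def unsafe_decision(config, tool:, unsafe: false)
  return false unless unsafe
  return { error: "unsafe override is not enabled", code: :refused } unless config.allow_unsafe

  config.unsafe_logger&.call(tool) # audit every honored bypass
  true
end

audit = []
locked = OverrideConfig.new(allow_unsafe: false)
opened = OverrideConfig.new(allow_unsafe: true, unsafe_logger: ->(tool) { audit << tool })

p unsafe_decision(locked, tool: "read_file", unsafe: true) # agent asks, operator has not granted
p unsafe_decision(opened, tool: "read_file", unsafe: true) # both keys present
p audit
```

The agent controls only the `unsafe:` argument; the `allow_unsafe` flag lives in configuration it cannot reach.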
+---
+## Sandboxing code execution
+`run_ruby`, `run_python`, and `run_rust` execute model-authored code, so they run
+inside a sandbox rather than directly on the host. The backend is pluggable and
+selected by `config.sandbox_runtime`:
+| Runtime | Platform | Isolation | Notes |
+| --- | --- | --- | --- |
+| `:docker` | any with Docker | **Strongest.** Code runs in a container off an image, so the host filesystem isn't visible at all. | `--network none`, `--read-only`, `--cap-drop ALL`, `--security-opt no-new-privileges`, memory/cpu/pids caps, non-root user, tmpfs `/tmp`. |
+| `:bubblewrap` | Linux | Process-level. No-network, restricted writes, rlimit caps. | Host FS is **bind-mounted read-only** — see caveat below. |
+| `:sandbox_exec` | macOS | Process-level via Seatbelt (SBPL). Deny-by-default, no network, writes only to temp dirs. | Host FS is **readable** — see caveat below. |
+| `:auto` (default) | — | Picks the best available: macOS → Seatbelt then Docker; Linux → bubblewrap then Docker; otherwise Docker; falls back to a `Null` backend that refuses if none exist. | |
+When no sandbox is available, the code-execution tools return
+`code: :sandbox_unavailable` rather than running unsandboxed.
+**Honest caveat on the host-process backends.** Bubblewrap and Seatbelt isolate
+*writes* and *network*, but they leave the host filesystem **readable** (Docker
+does not, because the container only sees its image). If read-confidentiality
+matters — secrets on disk the code shouldn't see — prefer `:docker`, or add
+`sandbox_bwrap_extra` masks / a custom `sandbox_seatbelt_profile` to hide
+sensitive paths. The default backends are about containing damage and side
+effects, not about hiding the source tree the agent is already working in.
+---
+## Web search backends
+`web_search` runs through a swappable adapter so you are not locked to one vendor.
+Select with `config.search_adapter`:
+| Value | Backend | Auth | Best for |
+| --- | --- | --- | --- |
+| `nil` / `:tavily` | Tavily | `tavily_api_key` | Default. Agent-oriented: returns cleaned content plus a synthesized answer. |
+| `:brave` | Brave Search API | `brave_api_key` (header token) | A commercial drop-in alternative; ranked web results. |
+| `:searxng` | self-hosted SearXNG | none — `searxng_url` | Keyless and private; you run the instance. Surfaces SearXNG "instant answers". |
+| any object | custom | your code | Anything responding to `#search(query, max_results:)` returning `{ answer:, results: [{title:, url:, content:}] }`. |
+Every adapter returns the same shape, so the tool's output and your harness logic
+are identical regardless of provider. A missing credential surfaces as
+`code: :no_api_key`; an unknown symbol or backend failure as `code: :search_failed`.
+The SearXNG base URL is treated as operator-configured infrastructure (often on a
+private network) and is **not** run through the SSRF guard — reaching an internal
+instance is the intended behavior.
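A custom adapter only has to honor the `#search(query, max_results:)` contract described in the table. The class below is a hypothetical in-memory example for tests or offline use; returning `answer: nil` when there is no synthesized answer is an assumption.

```ruby
# Hypothetical adapter: any object with #search(query, max_results:) that
# returns { answer:, results: [{ title:, url:, content: }] } fits the seam.
class StaticSearchAdapter
  def initialize(pages)
    @pages = pages
  end

  def search(query, max_results: 5)
    terms = query.downcase.split
    hits = @pages.select do |page|
      terms.any? { |term| page[:content].downcase.include?(term) }
    end
    { answer: nil, results: hits.first(max_results) }
  end
end

adapter = StaticSearchAdapter.new([
  { title: "Path jail", url: "https://example.test/jail", content: "Paths are resolved and checked." },
  { title: "Sandbox", url: "https://example.test/sandbox", content: "Code runs in isolation." }
])
p adapter.search("sandbox isolation", max_results: 1)
# Wiring it in would then look like:
#   RubyLLM::Toolbox.configure { |c| c.search_adapter = adapter }
```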
+---
+## Background processes
+Four tools manage long-running commands — dev servers, file watchers, log tails —
+that you don't want to block on:
+- **`process_start`** (exec, gated) launches one allowlisted command in the
+  background and returns an id like `proc_1` immediately. Same safety model as
+  `bash`: argv only, the minimal `env_passthrough` environment, run in `fs_root`,
+  in its own process group with a memory cap (no CPU cap — these run
+  indefinitely). Bounded by `max_processes`.
+- **`process_output`** (safe) returns the stdout/stderr produced *since the last
+  read*, plus status and exit code. Poll it in a loop to stream output without
+  repeats.
+- **`process_list`** (safe) shows every process with id, status, pid, age, and
+  command.
+- **`process_kill`** (safe) stops a process — SIGTERM to its group, escalating to
+  SIGKILL, plus a `/proc` descendant sweep so children are reaped — then returns
+  any final output and removes it from the registry.
+Lifecycle: **start → poll output (repeat) → kill**. Output buffers are bounded
+(256 KB of unread data per stream; older bytes drop with a marker) so a chatty
+process can't exhaust memory, and everything still running is killed at
+interpreter exit so nothing is orphaned.
+```
+process_start command:"ruby" args:["server.rb"] name:"web"   # → "Started proc_1 (pid 4242)"
+process_output id:"proc_1"                                    # → new output + "running"
+# ... do other work, poll again ...
+process_kill   id:"proc_1"                                    # → final output, removed
+```
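The bounded, read-since-last-poll buffering described above can be sketched as a small class. `BoundedBuffer` and its marker text are hypothetical; the gem's registry may store and report dropped bytes differently, and this version slices on byte boundaries without regard to multibyte characters.

```ruby
# Hypothetical sketch: keep at most `limit` unread bytes, drop the oldest
# bytes on overflow, and hand back only what arrived since the last drain.
class BoundedBuffer
  def initialize(limit)
    @limit = limit
    @data = +""
    @dropped = 0
  end

  def <<(chunk)
    @data << chunk
    overflow = @data.bytesize - @limit
    if overflow.positive?
      @data = @data.byteslice(overflow, @limit)
      @dropped += overflow
    end
    self
  end

  # Return everything unread (with a marker if bytes were dropped), then reset.
  def drain
    prefix = @dropped.zero? ? "" : "[#{@dropped} bytes dropped]\n"
    out = prefix + @data
    @data = +""
    @dropped = 0
    out
  end
end

buffer = BoundedBuffer.new(5)
buffer << "abc" << "defgh"
puts buffer.drain         # the marker, then the newest five bytes
puts buffer.drain.inspect # nothing new since the last read
```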
+---
+## Tool catalog
+Fifty tools. Safe tools are always available; **(exec)** tools require
+`enable_exec_tools = true`.
+### Filesystem — read
+| Tool | What it does |
+| --- | --- |
+| `read_file` | Read a text file in `fs_root`. Optional `start_line`/`end_line` window, or `tail` for the last N lines. `unsafe:` relaxes the jail. |
+| `list_directory` | List one directory's entries. |
+| `tree` | Depth-limited recursive overview (default depth 3), dirs marked `/`, skips ignored/hidden, no symlink follow, capped at 500 entries. |
+| `glob` | Match files by glob pattern within `fs_root`. |
+| `grep_files` | Content search with a regex (ReDoS-guarded). Supports `before`/`after`/`context` lines like `grep -B/-A/-C`. |
+### Filesystem — write (exec)
+| Tool | What it does |
+| --- | --- |
+| `write_file` | Create or overwrite a whole file (makes parent dirs). `unsafe:` relaxes the jail. |
+| `edit_file` | Replace an exact substring — must match **once** unless `replace_all`. The precise single-edit primitive. |
+| `multi_edit` | Several exact edits to one file, applied sequentially and atomically; names the first failing edit. |
+| `replace_in_files` | Project-wide find/replace across a glob (literal or regex with backrefs), `ignore_case`, `dry_run`; skips binary and ignored dirs. |
+| `create_directory` / `move_file` / `delete_file` | Directory and file management within the jail. |
+| `download_file` | SSRF-guarded fetch straight to a file on disk (size-capped). |
+| `apply_patch` | Apply a unified diff. |
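The match-once rule behind `edit_file` can be shown with a short sketch. `exact_edit` is a hypothetical helper, and the mapping of conditions to the `:ambiguous`, `:edit_failed`, and `:no_change` codes is an assumption based on the error-code table, not a statement of the gem's behavior.

```ruby
# Hypothetical sketch of an exact-substring edit with the match-once rule.
def exact_edit(text, old_string, new_string, replace_all: false)
  return { error: "old string is empty", code: :edit_failed } if old_string.empty?
  return { error: "old and new strings are identical", code: :no_change } if old_string == new_string

  count = text.scan(old_string).size # String patterns match literally
  return { error: "old string not found", code: :edit_failed } if count.zero?
  if count > 1 && !replace_all
    return { error: "old string matches #{count} times", code: :ambiguous }
  end

  # Block form keeps backslashes in new_string from acting as backreferences.
  replace_all ? text.gsub(old_string) { new_string } : text.sub(old_string) { new_string }
end

source = "total = a + b\nprint(a)\n"
p exact_edit(source, "a", "x")                     # "a" occurs more than once
p exact_edit(source, "total = a", "total = x")     # enough context to be unique
p exact_edit(source, "a + b", "b + a", replace_all: true)
```

Including surrounding context in `old_string` is what turns an ambiguous match into a unique one.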
+### Code intelligence
+| Tool | What it does |
+| --- | --- |
+| `parse_ruby` | Structural outline of a Ruby file (classes, modules, methods) via Prism, with a Ripper fallback. |
+### Structured data
+| Tool | What it does |
+| --- | --- |
+| `json_query` | Extract from JSON with a dot/bracket path (`a.b[0].c`, `[]` maps). |
+| `yaml_query` | Same path engine over YAML (`safe_load`). |
+| `toml_query` | Same path engine over TOML (dependency-free parser; file in `fs_root` or inline). |
+| `csv_read` | Read CSV with headers. |
+| `csv_write` *(exec)* | Write rows to a CSV. |
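A dot/bracket path such as `a.b[0].c` can be evaluated with a small recursive walker. The sketch below is an assumption about how such a path engine could behave (missing keys yield `nil`, `[]` maps the rest of the path over an array); `json_path` and `dig_path` are hypothetical names.

```ruby
require "json"

# Hypothetical walker over parsed data, one path token at a time.
def dig_path(node, tokens)
  return node if tokens.empty?

  head, *rest = tokens
  case head
  when "[]" # map the remaining path over every array element
    node.is_a?(Array) ? node.map { |item| dig_path(item, rest) } : nil
  when /\A\[(\d+)\]\z/ # numeric index
    index = Regexp.last_match(1).to_i
    dig_path(node.is_a?(Array) ? node[index] : nil, rest)
  else # hash key
    dig_path(node.is_a?(Hash) ? node[head] : nil, rest)
  end
end

# Split "a.b[0].c" into ["a", "b", "[0]", "c"] and walk the parsed document.
def json_path(json, path)
  dig_path(JSON.parse(json), path.scan(/\[\d*\]|[^.\[\]]+/))
end

doc = '{"a": {"b": [{"c": 1}, {"c": 2}]}}'
p json_path(doc, "a.b[0].c")
p json_path(doc, "a.b[].c")
```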
+### Git — read
+`git_status`, `git_diff`, `git_log`, `git_show`, `git_blame`, `git_grep`,
+`git_branch` (`-vv`, remotes). All run with repo-config-RCE hardening.
+### Git — write (exec)
+`git_add`, `git_commit`, `git_checkout`.
+### Web
+| Tool | What it does |
+| --- | --- |
+| `web_fetch` | Fetch a page (SSRF-guarded, size-capped, follows redirects). `unsafe:` relaxes the guard. |
+| `web_search` | Search via the configured adapter (Tavily/Brave/SearXNG/custom). |
+| `http_request` *(mutating verbs gated)* | General HTTP client returning status/headers/body. Safe for GET/HEAD; POST/PUT/PATCH/DELETE need the exec gate. `unsafe:` relaxes the guard. |
+### Toolchain (exec)
+| Tool | What it does |
+| --- | --- |
+| `run_ruby` / `run_python` / `run_rust` | Execute code in the sandbox. |
+| `run_tests` | Run the Ruby test suite (RSpec / Minitest). |
+| `python_tests` | Run pytest / unittest. |
+| `lint` | Run a linter (e.g. RuboCop). |
+| `bundle` | Run a Bundler subcommand. |
+| `bash` | Run one allowlisted executable, argv only, no shell. The escape hatch. `unsafe:` relaxes the allowlist. |
+### Background processes
+`process_start` *(exec)*, `process_output`, `process_list`, `process_kill` — see
+[Background processes](#background-processes).
+### Utilities
+| Tool | What it does |
+| --- | --- |
+| `calculator` | Evaluate arithmetic with a real parser — never `eval` — with functions and constants. |
+| `date_time` | Current time or convert a unix timestamp; optional strftime format. |
+| `diff` | Line-by-line comparison of two text blocks. |
+| `todo_write` | Maintain a task list across calls for multi-step work (pass the full list each time). |
+---
+## Decision rules: reach for X, not Y
+This section is written for the agent. Prefer the dedicated typed tool over a
+shell command every time one exists — it is safer, its output is structured, and
+it won't trip the allowlist.
+**Looking at the filesystem**
+- Read a file → `read_file` (use `start_line`/`end_line` or `tail` for big files). Not `bash cat`/`head`/`tail`.
+- See a project's shape → `tree`. List one dir → `list_directory`. Find by name → `glob`. Not `bash ls`/`find`.
+- Search file contents → `grep_files` (add `context` for surrounding lines). In a git repo, `git_grep` is faster and respects tracking. Not `bash grep`/`rg`.
+**Reading structured data**
+- JSON → `json_query`; YAML → `yaml_query`; TOML → `toml_query`; CSV → `csv_read`. These return typed extractions via dot/bracket paths. Do **not** shell out to `jq`/`yq` or hand-parse with regex.
+**Changing files**
+- One precise change → `edit_file` (exact-once match; include surrounding context to disambiguate).
+- Several changes to the same file → `multi_edit` (atomic; one call).
+- The same change across many files → `replace_in_files` (try `dry_run: true` first).
+- Create or fully rewrite a file → `write_file`.
+- You have a unified diff → `apply_patch`.
+- Avoid `bash sed`/`awk` for edits; the typed tools are jailed and reversible-by-diff.
+**Working with git**
+- Status, history, diffs, blame, branches → the `git_*` tools. They are hardened against malicious repo configs in a way `bash git` is not.
+**Touching the network**
+- Read one page → `web_fetch`. Discover sources → `web_search`. Call an API → `http_request`. Save a file → `download_file`. All are SSRF-guarded. Do **not** `bash curl`/`wget` — those bypass the guard and need allowlisting.
+**Running code**
+- Ruby/Python/Rust snippets → `run_ruby`/`run_python`/`run_rust` (sandboxed). Tests → `run_tests`/`python_tests`. Lint → `lint`. Dependencies → `bundle`. Prefer these over `bash ruby`/`python`, which run unsandboxed and need allowlisting.
+- Arithmetic → `calculator`, never code execution.
+**Long-running commands**
+- A server, watcher, or anything that doesn't return promptly → `process_start`, then `process_output` to follow it, then `process_kill`. Never start these with `bash` — `bash` is one-shot and will block until it times out.
+**Planning**
+- Multi-step work → `todo_write` to track it; pass the full list each call and update statuses (pending / in_progress / completed).
+**When to use `bash` at all**
+- Only for something with no dedicated tool, and only once the operator has allowlisted the executable. It takes a program plus an argument array — there is no shell, so pipes, redirects, globs, and `$()` won't work; compose multiple tool calls instead.
+---
+## The return contract and error codes
+Every tool returns **either**:
+- a `String` on success, or
+- a `Hash` `{ error: "human-readable message", code: :symbol }` on failure.
+Branch on the presence of `:error` (or check `result.is_a?(Hash)`). The `code` is
+a stable symbol you can match programmatically. Common codes you'll encounter:
+| Code | Meaning |
+| --- | --- |
+| `:exec_disabled` | An exec tool was called while the gate is closed. Enable `enable_exec_tools`. |
+| `:path_denied` / `:bad_path` | A path fell outside `fs_root` (or was malformed). |
+| `:command_denied` | The executable isn't on `allowed_commands`. |
+| `:too_many_processes` | `max_processes` reached; kill some first. |
+| `:not_found` | Unknown process id (or missing target). |
+| `:url_blocked` | The SSRF guard rejected a host/IP. |
+| `:http_error` / `:fetch_failed` / `:request_failed` | Network/HTTP failure. |
+| `:regex_timeout` | A pattern hit the ReDoS ceiling. |
+| `:no_api_key` | A search adapter is missing its credential. |
+| `:search_failed` | Search backend error or unknown adapter. |
+| `:sandbox_unavailable` | No sandbox runtime available for code execution. |
+| `:ambiguous` / `:edit_failed` / `:no_change` | An edit didn't match uniquely, failed, or was a no-op. |
+| `:bad_json` / `:bad_yaml` / `:bad_toml` / `:bad_csv` | A structured-data input didn't parse. |
+| `:refused` | An `unsafe:` request was made while `allow_unsafe` is off. |
+(The full vocabulary is larger and tool-specific; the above are the ones worth
+handling explicitly. Treat any unrecognized `code` as a soft failure and surface
+the `error` message.)
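A minimal sketch of branching on this contract. The `handle_tool_result` helper, the two code groupings, and the sample values are hypothetical and not part of the gem; only the String-or-Hash shape and the code symbols come from the table above.

```ruby
# Hypothetical groupings; which codes are worth retrying is an assumption.
RETRYABLE_CODES = %i[http_error fetch_failed request_failed].freeze
OPERATOR_CODES  = %i[exec_disabled command_denied no_api_key sandbox_unavailable refused].freeze

# Turns a tool result into [status, payload].
# Relies only on the documented contract: a String on success,
# a { error:, code: } Hash on failure.
def handle_tool_result(result)
  # Test for a Hash before touching :error; a String cannot be indexed by Symbol.
  return [:ok, result] unless result.is_a?(Hash) && result.key?(:error)

  case result[:code]
  when *RETRYABLE_CODES then [:retry, result[:error]]
  when *OPERATOR_CODES  then [:needs_operator, result[:error]]
  else                       [:soft_failure, result[:error]] # unrecognized code
  end
end

p handle_tool_result("file contents")
p handle_tool_result({ error: "host not allowed", code: :url_blocked })
```

Unrecognized codes fall through to `:soft_failure`, so codes missing from the table still surface their `error` message.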
+---
+## Token budgeting
+Tool output can be large — a file, a diff, a command's stdout. Rather than let one
+call flood the context window, every tool truncates its output to
+`max_output_tokens`, counted with `ruby_llm-tokenizer` using `tokenizer_model` so
+the count matches your actual model. Truncated output ends with a clear marker.
+Practical implications for the agent:
+- For large files, pass `start_line`/`end_line` or `tail` to `read_file` instead
+  of reading the whole thing.
+- Narrow `grep_files` with a tighter pattern or a path glob rather than scanning
+  everything; results are also capped at `max_grep_matches`.
+- Prefer `tree` (depth-limited) over a deep recursive listing.
+Set `max_output_tokens` to a fraction of your model's window so several tool
+calls can coexist in one turn.
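One way to pick that fraction, sketched as plain arithmetic. The `per_call_token_budget` helper and every number here are illustrative assumptions, not gem defaults; the result is a candidate value for `max_output_tokens`.

```ruby
# Illustrative only: splits the part of the context window left over after
# the prompt and the model's reply across the tool calls expected in one turn.
def per_call_token_budget(context_window:, reserved_tokens:, expected_calls:)
  raise ArgumentError, "expected_calls must be positive" unless expected_calls.positive?

  available = context_window - reserved_tokens
  raise ArgumentError, "reserved_tokens exceeds the window" unless available.positive?

  available / expected_calls # integer division rounds down
end

budget = per_call_token_budget(
  context_window: 128_000,  # assumed model window
  reserved_tokens: 32_000,  # assumed room for system prompt, history, and reply
  expected_calls: 8         # assumed tool calls per turn
)
puts budget # => 12000
```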
+---
+## Recipes
+**Read-only code investigation (no exec needed)**
+```
+tree depth:2                        # get the lay of the land
+grep_files pattern:"def process"    # find a definition
+read_file path:"lib/x.rb" start_line:40 end_line:90
+parse_ruby path:"lib/x.rb"          # structural outline
+git_log path:"lib/x.rb"             # how it got here
+git_blame path:"lib/x.rb"           # who changed the suspicious line
+```
+**Make a change and verify it (exec enabled)**
+```
+read_file path:"lib/x.rb"                       # confirm exact text
+edit_file path:"lib/x.rb" old_string:"..." new_string:"..."
+run_tests                                        # or python_tests
+lint                                             # style check
+git_diff                                         # review
+git_add paths:["lib/x.rb"]
+git_commit message:"Fix ..."
+```
+**Run a dev server while you work**
+```
+process_start command:"ruby" args:["bin/server"] name:"web"
+# ... edit files, run tests ...
+process_output id:"proc_1"          # check it's healthy / read logs
+process_kill   id:"proc_1"          # when done
+```
+**Fetch and use external data**
+```
+web_search query:"library X changelog 4.0"
+web_fetch  url:"https://.../CHANGELOG.md"
+http_request url:"https://api.example.com/v1/status"   # GET is safe
+download_file url:"https://.../data.csv" path:"tmp/data.csv"   # requires exec tools
+csv_read path:"tmp/data.csv"
+```
+**Project-wide rename (cautiously)**
+```
+replace_in_files glob:"**/*.rb" pattern:"OldName" replacement:"NewName" dry_run:true
+# review the report, then:
+replace_in_files glob:"**/*.rb" pattern:"OldName" replacement:"NewName"
+run_tests
+```
+---
+## Operational notes and honest limitations
+- **`fs_root` is the boundary that matters most.** Set it deliberately to the
+  project you want the agent working in. Everything filesystem-related is
+  measured against it.
+- **An empty `allowed_commands` means `bash` and `process_start` can run
+  nothing.** That's intentional — opt in to exactly the executables you trust.
+- **Host-process sandboxes leave the host FS readable.** Bubblewrap and Seatbelt
+  contain writes and network but not reads; use Docker (or masks) when
+  read-confidentiality matters. See [Sandboxing](#sandboxing-code-execution).
+- **`process_kill`'s full descendant reaping depends on the OS.** The
+  implementation is standard (process-group signal + `/proc` descendant sweep +
+  TERM→KILL escalation) and reaps a whole tree on a real Linux host and in CI.
+  Some restricted container runtimes don't deliver process-group signals to
+  non-leader members; there, deeply nested grandchildren are caught only by the
+  per-pid sweep, which works only where `/proc` is present.
+- **The default search provider needs a key.** Out of the box `web_search` uses
+  the Tavily adapter; with no `tavily_api_key` set, it returns `:no_api_key`.
+  Switch to `:searxng` for a keyless, self-hosted option.
+- **Requires Ruby ≥ 3.3.** `parse_ruby` uses Prism (bundled with supported Ruby)
+  with a Ripper fallback for non-MRI runtimes.
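The process-group signal and TERM→KILL escalation named in the `process_kill` note can be sketched with Ruby's standard `Process` API. This is not the gem's implementation: `terminate_group` is a hypothetical helper, the grace period is an assumption, and the `/proc` descendant sweep is left out. It assumes a POSIX host.

```ruby
require "rbconfig"

# Sends TERM to the whole process group, waits up to `grace` seconds for the
# group leader to exit, then escalates to KILL. Returns :term, :kill, or :gone.
def terminate_group(pid, grace: 2.0)
  Process.kill("-TERM", pid) # the leading "-" targets the process group
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  loop do
    # WNOHANG makes wait return nil while the child is still running.
    return :term if Process.wait(pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep 0.05
  end
  Process.kill("-KILL", pid)
  Process.wait(pid)
  :kill
rescue Errno::ESRCH, Errno::ECHILD
  :gone # the group or child had already disappeared
end

# pgroup: true makes the child its own group leader, so its pid is also
# the group id that the signals above are sent to.
pid = Process.spawn(RbConfig.ruby, "-e", "sleep 60", pgroup: true)
outcome = terminate_group(pid)
puts outcome
```

Only the direct child is reaped with `Process.wait` here; grandchildren that ignore the group signal are the case the per-pid sweep exists for.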
+---
+## Using this guide as agent context
+The [Decision rules](#decision-rules-reach-for-x-not-y) and
+[Tool catalog](#tool-catalog) sections are written to be dropped into an agent's
+system prompt directly — they tell the model which tool to reach for and why,
+which measurably improves tool selection over exposing the raw tool schemas
+alone. A compact prompt-ready summary:
+> You have a toolbox of typed tools. Always prefer the specific tool over `bash`:
+> `read_file`/`tree`/`glob`/`grep_files` for the filesystem;
+> `json_query`/`yaml_query`/`toml_query`/`csv_read` for structured data;
+> `edit_file`/`multi_edit`/`replace_in_files`/`write_file`/`apply_patch` for changes;
+> the `git_*` tools for version control; `web_fetch`/`web_search`/`http_request`/`download_file` for the network;
+> `run_ruby`/`run_python`/`run_rust`/`run_tests`/`lint`/`bundle` for code and toolchain;
+> `process_start`/`process_output`/`process_kill` for anything long-running;
+> `calculator` for arithmetic; `todo_write` to plan multi-step work.
+> Reserve `bash` for tasks with no dedicated tool. Tools return a string on
+> success or `{ error:, code: }` on failure — read the `code` and adjust.

data/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 washu
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.