ruby_llm-toolbox 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (83)

checksums.yaml +7 -0
data/CHANGELOG.md +49 -0
data/GUIDE.md +598 -0
data/LICENSE +21 -0
data/README.md +412 -0
data/bin/verify_prism_parity +112 -0
data/lib/ruby_llm/toolbox/base.rb +112 -0
data/lib/ruby_llm/toolbox/configuration.rb +148 -0
data/lib/ruby_llm/toolbox/data_path.rb +54 -0
data/lib/ruby_llm/toolbox/process_registry.rb +226 -0
data/lib/ruby_llm/toolbox/process_runner.rb +72 -0
data/lib/ruby_llm/toolbox/ruby_outline.rb +213 -0
data/lib/ruby_llm/toolbox/safe_math.rb +182 -0
data/lib/ruby_llm/toolbox/safety/command_guard.rb +42 -0
data/lib/ruby_llm/toolbox/safety/path_jail.rb +55 -0
data/lib/ruby_llm/toolbox/safety/url_guard.rb +111 -0
data/lib/ruby_llm/toolbox/sandbox/base.rb +151 -0
data/lib/ruby_llm/toolbox/sandbox/bubblewrap.rb +70 -0
data/lib/ruby_llm/toolbox/sandbox/docker.rb +69 -0
data/lib/ruby_llm/toolbox/sandbox/sandbox_exec.rb +75 -0
data/lib/ruby_llm/toolbox/search/brave.rb +64 -0
data/lib/ruby_llm/toolbox/search/searxng.rb +64 -0
data/lib/ruby_llm/toolbox/search/tavily.rb +70 -0
data/lib/ruby_llm/toolbox/text_diff.rb +81 -0
data/lib/ruby_llm/toolbox/toml.rb +409 -0
data/lib/ruby_llm/toolbox/tools/apply_patch.rb +92 -0
data/lib/ruby_llm/toolbox/tools/bash_tool.rb +101 -0
data/lib/ruby_llm/toolbox/tools/bundle.rb +71 -0
data/lib/ruby_llm/toolbox/tools/calculator.rb +42 -0
data/lib/ruby_llm/toolbox/tools/create_directory.rb +35 -0
data/lib/ruby_llm/toolbox/tools/csv_read.rb +69 -0
data/lib/ruby_llm/toolbox/tools/csv_write.rb +51 -0
data/lib/ruby_llm/toolbox/tools/date_time.rb +42 -0
data/lib/ruby_llm/toolbox/tools/delete_file.rb +64 -0
data/lib/ruby_llm/toolbox/tools/diff.rb +35 -0
data/lib/ruby_llm/toolbox/tools/download_file.rb +55 -0
data/lib/ruby_llm/toolbox/tools/edit_file.rb +82 -0
data/lib/ruby_llm/toolbox/tools/gem_tool.rb +140 -0
data/lib/ruby_llm/toolbox/tools/git_add.rb +46 -0
data/lib/ruby_llm/toolbox/tools/git_blame.rb +58 -0
data/lib/ruby_llm/toolbox/tools/git_branch.rb +35 -0
data/lib/ruby_llm/toolbox/tools/git_checkout.rb +43 -0
data/lib/ruby_llm/toolbox/tools/git_commit.rb +47 -0
data/lib/ruby_llm/toolbox/tools/git_diff.rb +50 -0
data/lib/ruby_llm/toolbox/tools/git_grep.rb +66 -0
data/lib/ruby_llm/toolbox/tools/git_helpers.rb +68 -0
data/lib/ruby_llm/toolbox/tools/git_log.rb +47 -0
data/lib/ruby_llm/toolbox/tools/git_show.rb +48 -0
data/lib/ruby_llm/toolbox/tools/git_status.rb +27 -0
data/lib/ruby_llm/toolbox/tools/glob.rb +62 -0
data/lib/ruby_llm/toolbox/tools/grep_files.rb +221 -0
data/lib/ruby_llm/toolbox/tools/http_helpers.rb +130 -0
data/lib/ruby_llm/toolbox/tools/http_request.rb +75 -0
data/lib/ruby_llm/toolbox/tools/json_query.rb +69 -0
data/lib/ruby_llm/toolbox/tools/lint.rb +67 -0
data/lib/ruby_llm/toolbox/tools/list_directory.rb +87 -0
data/lib/ruby_llm/toolbox/tools/move_file.rb +54 -0
data/lib/ruby_llm/toolbox/tools/multi_edit.rb +107 -0
data/lib/ruby_llm/toolbox/tools/parse_ruby.rb +111 -0
data/lib/ruby_llm/toolbox/tools/process_kill.rb +41 -0
data/lib/ruby_llm/toolbox/tools/process_list.rb +29 -0
data/lib/ruby_llm/toolbox/tools/process_output.rb +55 -0
data/lib/ruby_llm/toolbox/tools/process_start.rb +109 -0
data/lib/ruby_llm/toolbox/tools/python_tests.rb +77 -0
data/lib/ruby_llm/toolbox/tools/read_file.rb +75 -0
data/lib/ruby_llm/toolbox/tools/replace_in_files.rb +139 -0
data/lib/ruby_llm/toolbox/tools/run_python.rb +38 -0
data/lib/ruby_llm/toolbox/tools/run_ruby.rb +37 -0
data/lib/ruby_llm/toolbox/tools/run_rust.rb +42 -0
data/lib/ruby_llm/toolbox/tools/run_tests.rb +81 -0
data/lib/ruby_llm/toolbox/tools/sandbox_run.rb +40 -0
data/lib/ruby_llm/toolbox/tools/todo_write.rb +57 -0
data/lib/ruby_llm/toolbox/tools/toml_query.rb +70 -0
data/lib/ruby_llm/toolbox/tools/toolchain_helpers.rb +62 -0
data/lib/ruby_llm/toolbox/tools/tree.rb +87 -0
data/lib/ruby_llm/toolbox/tools/web_fetch.rb +77 -0
data/lib/ruby_llm/toolbox/tools/web_search.rb +81 -0
data/lib/ruby_llm/toolbox/tools/write_file.rb +52 -0
data/lib/ruby_llm/toolbox/tools/yaml_query.rb +73 -0
data/lib/ruby_llm/toolbox/truncator.rb +68 -0
data/lib/ruby_llm/toolbox/version.rb +7 -0
data/lib/ruby_llm/toolbox.rb +161 -0
metadata +194 -0

data/README.md ADDED

@@ -0,0 +1,412 @@
+# ruby_llm-toolbox
+[![CI](https://github.com/washu/ruby_llm-toolbox/actions/workflows/ci.yml/badge.svg)](https://github.com/washu/ruby_llm-toolbox/actions/workflows/ci.yml)
+[![Gem Version](https://badge.fury.io/rb/ruby_llm-toolbox.svg)](https://rubygems.org/gems/ruby_llm-toolbox)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
+[![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.3-CC342D.svg)](https://www.ruby-lang.org/)
+A safe-by-default bundle of [`RubyLLM::Tool`](https://github.com/crmne/ruby_llm) classes
+covering the skills common to most LLM harnesses — filesystem, shell, web, git, and
+structured-data tools — packaged as one gem with one require.
+- **One gem, one require.** `require "ruby_llm/toolbox"` loads everything. No sub-gems, no second require.
+- **Safe by default.** Read-only tools work out of the box. Mutating/exec tools are loaded but **inert** until you explicitly enable them.
+- **Token-budgeted output.** Every tool result is truncated (head + tail, middle elided) to fit a token budget, counted with [`ruby_llm-tokenizer`](https://github.com/washu/ruby_llm-tokenizer) — so a single `grep` can't blow up the context window.
+- **Uniform failure contract.** Tools never raise into the harness; failures come back as `{ error:, code: }`, matching ruby_llm's own convention.
+> Status: **v0.1** — ships the framework plus fifty tools across filesystem, search, code intelligence, git, web, the Ruby/Python/Rust toolchain, structured data (JSON/YAML/TOML/CSV), background process management, and small utilities. Safe tools are on by default; exec tools (writes, mutations, code execution) are gated behind `enable_exec_tools`. `parse_ruby` uses Prism (bundled with the supported Ruby 3.3+), with a Ripper fallback for non-MRI runtimes. An optional, operator-controlled [unsafe override](#security-override) lets specific calls bypass individual guards when explicitly permitted. See [Tools](#tools) and the [Roadmap](#roadmap). For an end-to-end walkthrough — wiring, the safety model, sandbox/search selection, the full tool catalog, and "reach for X, not Y" rules you can hand to the agent — read the **[Usage Guide](GUIDE.md)**.
+## Installation
+Requires **Ruby >= 3.3** (where Prism is bundled, so `parse_ruby` uses it with no extra
+dependency).
+The tokenizer dependency (`ruby_llm-tokenizer`) pulls in the `sentencepiece` native gem,
+which requires the SentencePiece C library to be present at build time:
+```bash
+# Ubuntu / Debian
+sudo apt-get install -y libsentencepiece-dev
+# macOS (Homebrew — arm64 installs to /opt/homebrew, so point the build at it)
+brew install sentencepiece
+bundle config set build.sentencepiece \
+  "--with-sentencepiece-dir=$(brew --prefix sentencepiece)"
+```
+Then add the gem to your Gemfile:
+```ruby
+# Gemfile
+gem "ruby_llm-toolbox"
+```
+## Quick start
+```ruby
+require "ruby_llm/toolbox"
+RubyLLM::Toolbox.configure do |c|
+  c.fs_root           = "/srv/project"   # filesystem tools are jailed to this
+  c.max_output_tokens = 2_000            # per-result budget
+  c.tokenizer_model   = "gpt-4o"         # which tokenizer to count with
+end
+chat = RubyLLM.chat
+chat.with_tools(*RubyLLM::Toolbox.safe_tools)   # read-only set, always on
+chat.ask("What does config/database.yml configure?")
+```
+### Enabling exec tools
+Dangerous tools (`bash`, and the upcoming `write_file`, `edit_file`, `run_code`,
+`git_commit`, mutating `http_request`) are loaded but refuse to run until you opt in:
+```ruby
+RubyLLM::Toolbox.configure do |c|
+  c.enable_exec_tools = true
+  c.allowed_commands  = %w[ls cat grep rg]   # bash runs ONLY these executables
+  c.command_timeout   = 30
+end
+chat.with_tools(*RubyLLM::Toolbox.all_tools)  # exec tools still honor the gate
+```
+You can also scope a single instance without touching global config:
+```ruby
+chat.with_tool(RubyLLM::Toolbox::Tools::ReadFile.new(fs_root: "/srv/other"))
+```
+## Tools
+### `read_file` (safe)
+Reads a UTF-8 text file from within `fs_root`, with an optional 1-based line range or a `tail`
+of the last N lines (like `tail -n N`, which takes precedence over the range).
+Output is token-budgeted. Path traversal and symlink escapes are rejected.
+### `list_directory` (safe)
+Lists directory entries within `fs_root` with type (dir/file/symlink) and size.
+Optional `recursive` and `include_hidden`. Symlinked directories are listed but not
+traversed, so a link can't walk out of the jail.
+### `tree` (safe)
+Renders a depth-limited directory tree under `fs_root` (default 3 levels) — a fast way to
+grasp project structure without walking it one level at a time. Directories are marked with a
+trailing slash; ignored directories and hidden entries are skipped (toggle with `show_hidden`),
+symlinks aren't followed, and the listing is capped.
+### `glob` (safe)
+Finds files matching a glob (`**/*.rb`, `app/models/*.rb`) within `fs_root`, relative
+to an optional `base`. Patterns containing `..` are rejected and each hit is re-checked
+through the jail to drop symlink escapes.
+### `grep_files` (safe)
+Searches file contents for a regex within `fs_root`, returning `path:line: text`. Optional
+file `glob` filter and `ignore_case`, plus `before`/`after`/`context` lines (like grep
+`-B`/`-A`/`-C`) — context lines render as `path-line- text` and separate blocks are divided
+with `--`. The pattern is compiled with a per-match timeout (ReDoS backstop), binary files and
+noisy/VCS directories are skipped, and results are capped.
+### `gem` (safe)
+Read-only RubyGems.org metadata lookup. Actions: `info` (summary), `version` (latest),
+`dependencies` (runtime deps), `search` (find gems by query). The host is fixed and all
+input is URL-encoded, so there's no arbitrary-URL surface.
+### `parse_ruby` (safe)
+In-process structural outline of a Ruby file (classes, modules, methods, constants with
+line numbers and nesting), or definition lookup by `query`/`kind`. It parses — never executes
+— the code, through one of two interchangeable backends behind `RubyOutline`: **Prism** when
+it can be loaded (it's bundled with Ruby 3.3+, the supported floor, so no gem install is
+needed), and **Ripper** (stdlib) as a fallback for runtimes that don't bundle Prism (e.g.
+non-MRI). The two are held to identical output by
+`spec/ruby_outline_parity_spec.rb` and `bin/verify_prism_parity`, which compares both
+backends over a corpus and can be run under any Ruby — including a sandboxed one
+(`docker run --rm -v "$PWD":/app -w /app ruby:3.4-slim ruby bin/verify_prism_parity`).
+### `json_query` / `yaml_query` / `toml_query` / `csv_read` (safe), `csv_write` (exec)
+`json_query`, `yaml_query`, and `toml_query` parse JSON / YAML / TOML (from a file in
+`fs_root` or an inline string) and extract values with a shared dot/bracket path
+(`users[0].name`, `dependencies.serde.version`, `products[].name`) or pretty-print. YAML is
+loaded with `safe_load` (no arbitrary Ruby objects); TOML uses a dependency-free parser
+covering the common surface of TOML 1.0 (tables, arrays-of-tables, inline tables, dotted
+keys, all scalar forms). `csv_read` reads a CSV into readable rows (optional header, `limit`);
+`csv_write` writes an array of rows (optional `headers`) to a CSV.
+### `web_fetch` / `web_search` / `http_request` (safe)
+`web_fetch` retrieves a URL over http/https and returns readable text (HTML stripped),
+following redirects. `web_search` queries the web through a swappable adapter — **Tavily** by default
+(set `tavily_api_key`), or set `search_adapter` to `:brave` (commercial Brave Search API,
+set `brave_api_key`), `:searxng` (a keyless, self-hosted SearXNG instance, set `searxng_url`),
+or any object responding to `#search(query, max_results:)`. `http_request` is a general
+client returning status/headers/body.
+All three route through `Safety::UrlGuard` (see below). `http_request` allows GET/HEAD by
+default; POST/PUT/PATCH/DELETE require `enable_exec_tools`.
+### `download_file` (exec, gated)
+Downloads a URL to a file within `fs_root` (whereas `web_fetch` returns text). Routes through
+`Safety::UrlGuard`, follows redirects safely, is capped at `config.max_fetch_bytes`, and jails
+the destination path.
+### `bash` (exec, gated)
+Runs **one allowlisted executable** with arguments. Deliberately **not a shell** — no
+pipes, redirects, globs, quoting, or variable expansion. The program goes in `command`;
+each argument is a separate element of `args`, passed verbatim as argv. This is the
+primitive that the OS-command-injection bug class can't reach, because nothing ever
+parses the input as a shell line.
+```jsonc
+// model emits:
+{ "command": "rg", "args": ["TODO", "app/models"] }
+```
+### `run_ruby` (exec, gated)
+Executes a Ruby snippet inside the active [sandbox runtime](#sandbox-runtimes) with code piped
+on stdin. Under Docker it runs in an ephemeral, no-network, read-only, cap-dropped container;
+under bubblewrap or sandbox-exec it runs the host's `ruby` in an isolated, no-network,
+write-restricted environment. Requires `enable_exec_tools` and an available sandbox; returns a
+clean `:sandbox_unavailable` error otherwise.
+### `run_python` (exec, gated)
+Same sandbox as `run_ruby`, running Python (the `config.python_image` under Docker, or the
+host's `python3` under the host-process backends). Code is piped to `python3` on stdin.
+### `python_tests` (exec, gated)
+Runs the project's Python tests from `fs_root` — pytest by default, or unittest
+(`python -m unittest discover`) — with a parsed pass/fail headline, mirroring `run_tests`.
+### `run_rust` (exec, gated)
+Compiles and runs a self-contained Rust program in the same sandbox (`config.rust_image` under
+Docker, or the host's `rustc` under the host-process backends). The source is piped on stdin; a
+shell step inside the sandbox writes it to scratch, compiles with `rustc`, and runs the binary,
+returning compiler output plus the program's stdout/stderr and exit.
+### `calculator` / `date_time` / `diff` / `todo_write` (safe)
+Small in-process utilities. `calculator` evaluates an arithmetic expression with a real
+recursive-descent parser — never `eval` — supporting `+ - * / % **`, parentheses, common
+functions (`sqrt`, `sin`, `ln`, …), and constants (`pi`, `e`). `date_time` returns the
+current time (or converts a unix timestamp), with an optional strftime format. `diff`
+produces a readable line-by-line comparison of two text blocks. `todo_write` maintains a
+task list across calls for multi-step work (pass the full list each time; statuses are
+pending/in_progress/completed).
+### Background processes: `process_start` / `process_output` / `process_list` / `process_kill`
+Long-running commands — dev servers, file watchers, log tails — that an agent
+starts, polls, and stops without blocking on them.
+`process_start` (**exec, gated**) launches one allowlisted executable as a
+background process and returns its id (e.g. `proc_1`) immediately. It carries the
+same safety model as `bash`: argv only (no shell), the minimal `env_passthrough`
+environment, run in `fs_root`, in its own process group with an address-space cap
+derived from `sandbox_memory` (but **no** CPU cap — these are meant to run
+indefinitely). The number of concurrent live processes is bounded by
+`max_processes`.
+The other three are **safe** — they only act on processes already started, and
+`process_kill` is always available as a stop valve even if exec tools are later
+disabled. `process_output` returns the stdout/stderr produced since the last read
+(incremental, so polling in a loop streams output without repeats) plus the
+current status and exit code. `process_list` shows every process with its id,
+status, pid, age, and command. `process_kill` stops a process — SIGTERM to its
+group, escalating to SIGKILL, plus a `/proc` descendant sweep so children are
+reaped even where group-signal delivery is incomplete — then returns any final
+output and removes it from the registry. Output buffers are bounded (256 KB of
+unread data per stream; older bytes are dropped with a marker), so a chatty
+process can't exhaust memory. Everything still running is killed at interpreter
+exit so nothing is orphaned.
+### `write_file` (exec, gated)
+Creates or overwrites a text file within `fs_root`, creating missing parent directories.
+### `edit_file` (exec, gated)
+The core editing primitive: replace an exact substring. `old_string` must match **exactly
+once** (include surrounding context) unless `replace_all` is set; a missing or ambiguous
+match fails clearly instead of guessing. Backslash sequences in `new_string` are written
+literally — no accidental backreference interpretation.
+### `multi_edit` (exec, gated)
+Applies several `edit_file`-style replacements to one file **atomically**. Edits run in order
+(a later edit sees earlier results), each following the exact-match-once rule unless
+`replace_all` is set. If any edit can't be applied, nothing is written and the failing edit is
+named — so the file is never left half-edited. Saves a round-trip per change when batching.
+### `replace_in_files` (exec, gated)
+Project-wide find/replace across files matching a glob (default `**/*`). Literal by default,
+or `regex: true` with `\1` backreferences in the replacement; `ignore_case` and `dry_run`
+are supported. Binary files and `ignored_dirs` are skipped, the pattern runs under a ReDoS
+timeout, and every path is jailed to `fs_root`.
+### `create_directory` / `move_file` / `delete_file` (exec, gated)
+`create_directory` does `mkdir -p` within the jail. `move_file` renames/moves with **both**
+endpoints confined to `fs_root` and refuses to clobber unless `overwrite`. `delete_file`
+removes a file or empty directory; a non-empty directory needs `recursive`, and `fs_root`
+itself can't be deleted.
+### `git_status` / `git_diff` / `git_log` / `git_show` / `git_blame` / `git_grep` / `git_branch` (safe)
+Read-only views of the repo at `fs_root`. `git_diff` takes optional `staged`, `path`, and
+`ref`; `git_log` takes `count` and `path`; `git_show` shows a commit or a file at a ref;
+`git_blame` shows line-by-line authorship (optional range); `git_grep` searches tracked
+content (optional `path`, `ignore_case`, `fixed`), passing the pattern via `-e` so a
+dash-leading pattern can't inject a git option; `git_branch` lists branches with the current
+one marked (optional `all` for remotes). Because git can be made to run repo-configured
+commands during read operations (`core.fsmonitor` on status, `diff.external`/textconv on
+diff/show), these are neutralized so a hostile checkout can't turn a diff into code execution.
+Refs are validated to block option injection, path arguments are jailed, and the pager and
+credential prompts are disabled so nothing hangs. Requires git on the host.
+### `git_add` / `git_commit` / `git_checkout` / `apply_patch` (exec, gated)
+`git_add`/`git_commit`/`git_checkout` stage, commit, and switch branches. `apply_patch`
+applies a unified diff via `git apply` — validated with `--check` first (nothing is written
+if it wouldn't apply cleanly), with `check: true` for a dry run. Path-escaping patches are
+rejected. Does not push.
+### `run_tests` / `lint` / `bundle` (exec, gated)
+The verify trio, run from `fs_root`. `run_tests` auto-detects RSpec (`spec/`/`.rspec`) or
+Minitest (`test/` via rake) and returns output with a pass/fail headline (a failing suite is
+a result, not a tool error). `lint` runs RuboCop (or Standard when `.standard.yml` is
+present), with optional `autocorrect`. `bundle` runs Bundler actions (`install`, `update`,
+`outdated`, `check`, `lock`, `add`). These inherit the full host environment (so bundler,
+rbenv/rvm, and the dev binaries resolve), use `bundle exec` when a Gemfile exists, and report
+`:unavailable` if the tool isn't installed.
+## Safety model
+The dangerous surface is engineered, not just documented:
+| Concern | Mitigation |
+| --- | --- |
+| Path traversal / symlink escape | `Safety::PathJail` resolves realpath and confines to `fs_root` |
+| OS command injection | `bash` uses array-form spawn (no shell) + executable allowlist |
+| Env leakage | spawned processes get a scrubbed env (`env_passthrough` only) |
+| Runaway processes | hard wall-clock `command_timeout`, then `SIGKILL` |
+| Untrusted code execution | runs in a pluggable [sandbox](#sandbox-runtimes) — Docker (no-network, read-only, cap-dropped) or host-process bubblewrap/sandbox-exec with no network, restricted writes, and rlimit caps |
+| Malicious repo config (RCE) | git tools disable `core.fsmonitor`, external diff drivers, and textconv |
+| Context blowup | every result passes through the token budgeter |
+| ReDoS (user regex) | `grep_files` compiles patterns with a per-match `regex_timeout` |
+| SSRF (web tools) | `Safety::UrlGuard` allows only http/https, blocks private/loopback/link-local/metadata IPs, **pins the socket to the vetted IP** (closing DNS rebinding), and re-checks every redirect hop |
+| Privilege escalation by the agent | the unsafe override is opt-in per call **and** requires an operator-set `allow_unsafe`; an agent passing `unsafe: true` on its own gets `:unsafe_denied` |
+### Security override
+Sometimes an operator genuinely wants a tool to step outside its guard — read a file outside
+`fs_root`, run a non-allowlisted binary, fetch an internal URL. The override is built so the
+**agent can ask but never grant**:
+- A few tools (`read_file`, `write_file`, `bash`, `web_fetch`, `http_request`) take an
+  `unsafe: true` parameter.
+- That alone does nothing. Unless a human has set `RubyLLM::Toolbox.config.allow_unsafe = true`,
+  any call requesting it is refused with `:unsafe_denied`. The model cannot flip that switch.
+- When both line up, the call bypasses only its own guard (path jail, command allowlist, or
+  SSRF check) — never the deeper invariants (e.g. `bash` is still argv-only with no shell, and
+  still rejects NUL bytes). Set `config.unsafe_logger = ->(tool, detail) { … }` to audit every
+  override that fires.
+This keeps the default safe, makes escalation a deliberate operator decision, and leaves an
+audit trail — rather than a single boolean an agent could talk its way into.
+### Sandbox runtimes
+The code-execution tools (`run_ruby`/`run_python`/`run_rust`) run through a pluggable sandbox,
+chosen by `config.sandbox_runtime` (default `:auto`):
+| Runtime | Platform | How it isolates |
+| --- | --- | --- |
+| `:docker` | any with Docker | Ephemeral container: `--network none`, read-only root + tmpfs `/tmp`, `--cap-drop ALL`, no-new-privileges, non-root user, memory/CPU/pids limits. Only the image is visible — not the host. |
+| `:bubblewrap` | Linux (`bwrap`) | Fresh namespaces via `--unshare-all` (no network), host filesystem bound read-only, writable tmpfs `/tmp`, `--die-with-parent`. Runs host interpreters. |
+| `:sandbox_exec` | macOS | Seatbelt profile: deny-by-default, all network denied, reads allowed, writes only to temp. Runs host interpreters. |
+| `:none` | — | Disables code execution (`:sandbox_unavailable`). |
+`:auto` prefers the native lightweight sandbox per platform (bubblewrap on Linux, sandbox-exec
+on macOS), falling back to Docker, then to `:none`. The host-process backends apply
+memory/CPU caps as inherited rlimits (since they don't use cgroups), and can be tuned with
+`config.sandbox_bwrap_extra` and `config.sandbox_seatbelt_profile`.
+One tradeoff worth knowing: unlike Docker (which only exposes its image), the host-process
+backends leave the host filesystem **readable** (read-only) inside the sandbox. On a host with
+secrets the model shouldn't read, prefer Docker, or add masks via `sandbox_bwrap_extra`
+(e.g. `["--tmpfs", "/home"]`).
+## Return contract
+- **Success** → a `String` (or a `Hash` for structured tools).
+- **Failure** → `{ error: "human-readable message", code: :symbol }`. Never an exception.
+Failure codes include `:exec_disabled`, `:path_denied`, `:not_a_file`, `:too_large`,
+`:command_denied`, `:tool_exception`.
+## Configuration reference
+| Option | Default | Purpose |
+| --- | --- | --- |
+| `fs_root` | `Dir.pwd` | Jail root for filesystem tools |
+| `enable_exec_tools` | `false` | Master switch for the dangerous set |
+| `allowed_commands` | `[]` | Executables `bash` and `process_start` may run |
+| `command_timeout` | `30` | Wall-clock limit (seconds) for spawned processes |
+| `max_processes` | `8` | Maximum concurrent background processes (`process_start`) |
+| `env_passthrough` | `%w[PATH LANG LC_ALL HOME]` | Env vars forwarded to subprocesses |
+| `max_output_tokens` | `2000` | Per-result token budget |
+| `tokenizer_model` | `"gpt-4o"` | Model id used to pick a tokenizer |
+| `regex_timeout` | `2` | Per-match timeout (seconds) for `grep_files` patterns |
+| `max_grep_matches` | `200` | Cap on grep matches per call |
+| `search_adapter` | `nil` | Web search backend: `nil`/`:tavily`, `:brave`, `:searxng`, or a custom adapter object |
+| `tavily_api_key` | `ENV["TAVILY_API_KEY"]` | API key for the default (Tavily) `web_search` adapter |
+| `brave_api_key` | `ENV["BRAVE_API_KEY"]` | Subscription token for the `:brave` adapter |
+| `searxng_url` | `ENV["SEARXNG_URL"]` | Base URL of a self-hosted SearXNG instance for the `:searxng` adapter |
+| `web_allowlist` / `web_denylist` | `[]` | Domain allow/deny lists enforced by `UrlGuard` |
+| `max_fetch_bytes` / `max_redirects` | `2_000_000` / `5` | `web_fetch`/`http_request` body cap and redirect limit |
+| `docker_image` / `python_image` / `rust_image` | `"ruby:3.3-slim"` / `"python:3.12-slim"` / `"rust:1-slim"` | Images for `run_ruby` / `run_python` / `run_rust` (Docker runtime) |
+| `sandbox_runtime` | `:auto` | `:auto`, `:docker`, `:bubblewrap`, `:sandbox_exec`, or `:none` |
+| `sandbox_bwrap_extra` | `[]` | Extra bubblewrap args (e.g. `["--tmpfs", "/home"]`) |
+| `sandbox_seatbelt_profile` | `nil` | Custom macOS Seatbelt SBPL profile (overrides the default) |
+| `allow_unsafe` | `false` | Operator master switch enabling the per-call unsafe override |
+| `unsafe_logger` | `nil` | Callable `->(tool_name, detail)` invoked whenever an override fires |
+| `sandbox_network` / `sandbox_memory` / `sandbox_cpus` / `sandbox_pids` | `none` / `256m` / `1.0` / `128` | Container limits for `run_ruby`/`run_python`/`run_rust` |
+| `http_timeout` | `10` | Open/read timeout (seconds) for the `gem`, `web_fetch`, `web_search`, and `http_request` tools |
+> Counting Claude models: call `RubyLLM::Tokenizer.enable_claude_approximation!` once at
+> boot, then set `tokenizer_model` to your Claude model id.
+## Roadmap
+Locked decisions: single gem, tokenizer-based budgeting, **Tavily** as the default search
+provider (behind a swappable adapter — Brave / SearXNG drop in), **Docker** as the
+`run_code` sandbox backend.
+1. **Skeleton + pattern** — base class, config, truncator, return contract, RSpec harness, `read_file`, `bash`. ✅
+2. **Filesystem read set** — `list_directory`, `glob`, `grep_files`. ✅
+3. **Ruby tools** — `gem` (RubyGems.org metadata, safe) and `run_ruby` (Docker sandbox, exec). ✅
+4. **Filesystem write set** — `write_file`, `edit_file`, `create_directory`, `move_file`, `delete_file` (exec). ✅
+5. **Git** — `git_status`/`git_diff`/`git_log` (safe), `git_add`/`git_commit`/`git_checkout` (exec). ✅
+6. **Verify loop** — `run_tests`, `lint`, `bundle` (exec). ✅
+7. **Python** — `run_python` (Docker sandbox) and `python_tests` (pytest/unittest), exec. ✅
+8. **Code intelligence** — `parse_ruby` (Ripper outline/navigation, safe). ✅
+9. **Web** — `web_fetch`, `web_search` (Tavily), `http_request` + `Safety::UrlGuard` SSRF protection. ✅
+10. **Patch, git history & data** — `apply_patch`, `git_show`, `git_blame`, `json_query`, `csv_read`/`csv_write`. ✅
+11. **Utilities, Rust & hardening** — `calculator`, `date_time`, `diff`, `todo_write`; `run_rust`; UrlGuard IP-pinning; operator-controlled unsafe override. ✅
+12. **Search, YAML & the Prism backend** — `git_grep`; `yaml_query` (safe_load) sharing one path engine with `json_query`; `parse_ruby` now auto-selects Prism (Ruby 3.3+) with a Ripper fallback and a parity harness. ✅
+13. **CI & sandbox runtimes** — GitHub Actions (rspec on Ruby 3.3/3.4 × Linux/macOS, parity harness, gem build); pluggable sandbox with bubblewrap (Linux) and sandbox-exec (macOS) backends alongside Docker, selected by `sandbox_runtime`. ✅
+14. **More tools** — `toml_query` (dependency-free TOML parser, completing JSON/YAML/TOML/CSV); `replace_in_files` (project-wide find/replace); `download_file` (SSRF-guarded fetch to disk); `git_branch`. ✅
+15. **Editing & navigation ergonomics** — `multi_edit` (atomic batched edits), `tree` (depth-limited overview); `read_file` already supports line ranges. ✅
+16. **Background processes** — `process_start` (gated), `process_output`, `process_list`, `process_kill`: stateful long-running commands (dev servers, watchers, log tails) with incremental output, bounded buffers, a concurrency cap, and group + `/proc`-descendant cleanup. ✅
+17. **Search isn't single-vendor** — two more `web_search` adapters behind the same seam: `:brave` (commercial Brave Search API, header-key auth) and `:searxng` (keyless, self-hosted), selected by `search_adapter`. ✅
+18. **Next** — an ecosystem-docs PR against `crmne/ruby_llm`, and a toolbox-level usage guide (safe→exec model, unsafe override, sandbox + search selection).
+## Development
+```bash
+bundle install          # installs ruby_llm, ruby_llm-tokenizer, rspec
+bundle exec rspec       # run the test suite
+bundle exec rake build  # build the gem into pkg/
+bundle exec rake install # build + install locally
+# verify the parse_ruby backends agree (Prism vs Ripper)
+ruby bin/verify_prism_parity
+```
+Requires Ruby >= 3.3. The Docker-backed tools (`run_ruby`/`run_python`/`run_rust`)
+need a Docker daemon to actually execute; without one they return a clean
+`:sandbox_unavailable` error, and their specs stub the sandbox.
+## License
+MIT.
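
Note on the `bash` section of the README above: its safety argument is that the command is passed as argv and never parsed as a shell line. The following is a minimal sketch of that idea using only Ruby's standard library. `run_allowlisted` and its return shape are hypothetical names chosen for illustration; the gem's actual `bash_tool.rb` and `process_runner.rb` are not shown in this part of the diff and may differ. The `env_keys` default mirrors the README's documented `env_passthrough` default, and the `:command_denied` code is taken from the README's failure-code list.

```ruby
require "open3"

# Hypothetical helper (not the gem's implementation): run one allowlisted
# executable with argv-style arguments and a scrubbed environment.
def run_allowlisted(command, args = [], allowed:, env_keys: %w[PATH LANG LC_ALL HOME])
  argv = args.map(&:to_s)

  unless allowed.include?(command)
    return { error: "#{command} is not in the allowlist", code: :command_denied }
  end
  if [command, *argv].any? { |part| part.include?("\0") }
    return { error: "NUL byte in command or arguments", code: :command_denied }
  end

  # Forward only the named variables; unsetenv_others drops everything else.
  env = ENV.to_h.slice(*env_keys)

  # The [command, argv0] pair forces the no-shell form of spawn even when
  # there are no arguments, so nothing is ever interpreted as a shell line.
  output, status = Open3.capture2e(env, [command, command], *argv, unsetenv_others: true)
  { output: output, exit_status: status.exitstatus }
rescue Errno::ENOENT
  { error: "#{command} was not found", code: :command_denied }
end

result = run_allowlisted("echo", ["$(whoami)", "a | b"], allowed: %w[echo])
puts result[:output] # prints the arguments literally: $(whoami) a | b
```

Shell metacharacters arrive at the program as plain bytes, which is why the substitution and the pipe are echoed rather than executed. This sketch leaves out the wall-clock timeout and process-group kill that the README's safety table describes.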

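Note on the README's safety table: it says `Safety::PathJail` "resolves realpath and confines to `fs_root`". A rough sketch of that check, under stated assumptions, might look like the following. `JailSketch` and `Denied` are hypothetical names; the real `path_jail.rb` is listed in this diff but its contents are not shown here, so this illustrates the technique rather than the gem's code.

```ruby
require "tmpdir"

# Hypothetical sketch of a realpath-based jail (not the gem's Safety::PathJail).
class JailSketch
  Denied = Class.new(StandardError)

  def initialize(root)
    @root = File.realpath(root)
  end

  # Returns the resolved path if it stays inside the root, else raises Denied.
  def resolve(path)
    real = real_or_parent(File.expand_path(path, @root))
    prefix = @root.end_with?(File::SEPARATOR) ? @root : @root + File::SEPARATOR
    return real if real == @root || real.start_with?(prefix)

    raise Denied, "#{path} resolves outside #{@root}"
  end

  private

  # File.realpath needs the target to exist. For a file that does not exist
  # yet, resolve the nearest existing ancestor and re-append the remainder.
  # Only Errno::ENOENT is handled here; other errors propagate in this sketch.
  def real_or_parent(path)
    File.realpath(path)
  rescue Errno::ENOENT
    raise Denied, "#{path} is a dangling symlink" if File.symlink?(path)

    parent = File.dirname(path)
    return path if parent == path

    File.join(real_or_parent(parent), File.basename(path))
  end
end

Dir.mktmpdir do |root|
  File.write(File.join(root, "notes.txt"), "hello")
  jail = JailSketch.new(root)
  puts jail.resolve("notes.txt")
  begin
    jail.resolve("../outside.txt")
  rescue JailSketch::Denied => e
    puts "denied: #{e.message}"
  end
end
```

Comparing against the root plus a trailing separator, rather than the bare root string, keeps a sibling such as `/srv/project-old` from passing as a child of `/srv/project`.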
data/bin/verify_prism_parity ADDED

@@ -0,0 +1,112 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+# Verifies that the Prism and Ripper backends of RubyOutline produce identical
+# outlines. Runnable on any Ruby; it only actually compares when Prism is
+# available (Ruby 3.3+, where Prism is bundled — no gem install needed).
+#
+# Usage:
+#   ruby bin/verify_prism_parity            # parse this gem's own lib/ + samples
+#   ruby bin/verify_prism_parity path/*.rb  # parse the given files instead
+#
+# In a sandbox / CI this is the "run it under ruby:3.4" check:
+#   docker run --rm -v "$PWD":/app -w /app ruby:3.4-slim ruby bin/verify_prism_parity
+#
+# Exits 0 if the backends agree (or Prism is unavailable, so there's nothing to
+# compare), 1 on any divergence.
+$LOAD_PATH.unshift File.expand_path("../lib", __dir__)
+require "ruby_llm/toolbox/ruby_outline"
+RO = RubyLLM::Toolbox::RubyOutline
+unless RO.prism_available?
+  warn "Prism is not available on this Ruby (#{RUBY_VERSION}); nothing to compare."
+  warn "Run this under Ruby 3.3+ (e.g. ruby:3.4-slim) to verify Prism/Ripper parity."
+  exit 0
+end
+SAMPLES = {
+  "nested" => <<~RUBY,
+    module App
+      CONFIG = 1
+      class User < Base
+        VERSION = "1"
+        def initialize; end
+        def self.find(id); end
+        class << self
+          def helper; end
+        end
+      end
+      module Helpers
+        def util; end
+      end
+    end
+  RUBY
+  "conditionals" => <<~RUBY,
+    class C
+      if RUBY_VERSION > "3"
+        def modern; end
+      else
+        def legacy; end
+      end
+      FLAG = true
+    end
+  RUBY
+  "toplevel" => <<~RUBY
+    TOP = 1
+    def bare; end
+    class A; end
+    class B::C; end
+  RUBY
+}
+def files_from_args
+  return [] if ARGV.empty?
+  ARGV.flat_map { |pattern| Dir.glob(pattern) }.select { |f| File.file?(f) }
+end
+def gem_lib_files
+  Dir.glob(File.expand_path("../lib/**/*.rb", __dir__))
+end
+def diff(label, source)
+  prism  = RO.extract(source, backend: RO::PrismBackend)
+  ripper = RO.extract(source, backend: RO::RipperBackend)
+  return nil if prism == ripper
+  { label: label, prism: prism, ripper: ripper }
+end
+targets = files_from_args
+targets = gem_lib_files if targets.empty?
+mismatches = []
+SAMPLES.each { |label, src| (m = diff("sample:#{label}", src)) && mismatches << m }
+targets.each do |path|
+  source = File.read(path)
+  m = diff(path, source)
+  mismatches << m if m
+rescue RubyLLM::Toolbox::RubyOutline::ParseError => e
+  warn "skip #{path}: #{e.message}"
+end
+checked = SAMPLES.size + targets.size
+if mismatches.empty?
+  puts "OK: Prism and Ripper agree on #{checked} source(s) (Ruby #{RUBY_VERSION}, Prism #{Prism::VERSION})."
+  exit 0
+end
+puts "MISMATCH in #{mismatches.size} of #{checked} source(s):"
+mismatches.each do |m|
+  puts "\n--- #{m[:label]} ---"
+  only_prism  = m[:prism]  - m[:ripper]
+  only_ripper = m[:ripper] - m[:prism]
+  only_prism.each  { |e| puts "  prism-only : #{e.kind} #{e.name} (L#{e.line}, d#{e.depth})" }
+  only_ripper.each { |e| puts "  ripper-only: #{e.kind} #{e.name} (L#{e.line}, d#{e.depth})" }
+end
+exit 1
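
Note on the script above: its mismatch report relies on `Array#-` over outline entries, so two entries count as equal only when every field matches. The sketch below illustrates that comparison in isolation. `Entry` and `outline_diff` are hypothetical stand-ins, since the real entry type lives in `ruby_outline.rb` and is not shown here; the only assumption is that entries respond to `kind`, `name`, `line`, and `depth`, as the report lines in the script suggest.

```ruby
# Hypothetical stand-in for an outline entry (assumption: the real type is
# value-comparable, as a Struct is).
Entry = Struct.new(:kind, :name, :line, :depth)

# Returns nil when both outlines match, otherwise the entries unique to each side.
def outline_diff(label, left, right)
  return nil if left == right

  { label: label, only_left: left - right, only_right: right - left }
end

a = [Entry.new(:class, "User", 1, 0), Entry.new(:def, "find", 2, 1)]
b = [Entry.new(:class, "User", 1, 0), Entry.new(:def, "find", 3, 1)]

result = outline_diff("sample", a, b)
result[:only_left].each  { |e| puts "left-only : #{e.kind} #{e.name} (L#{e.line}, d#{e.depth})" }
result[:only_right].each { |e| puts "right-only: #{e.kind} #{e.name} (L#{e.line}, d#{e.depth})" }
# prints:
#   left-only : def find (L2, d1)
#   right-only: def find (L3, d1)
```

A one-line disagreement on a single method therefore shows up twice, once per side, which makes line-number drift between backends easy to spot.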

data/lib/ruby_llm/toolbox/base.rb ADDED

@@ -0,0 +1,112 @@
+# frozen_string_literal: true
+require "ruby_llm"
+module RubyLLM
+  module Toolbox
+    # Every toolbox tool subclasses this instead of RubyLLM::Tool directly.
+    # It adds four things on top of the base ruby_llm DSL:
+    #
+    #   1. A per-instance config snapshot (overridable at construction).
+    #   2. An exec gate: tools marked `exec_tool!` refuse to run unless
+    #      config.enable_exec_tools is true.
+    #   3. A uniform failure contract: tools return { error:, code: } and never
+    #      raise into the harness. (This matches ruby_llm's own convention of
+    #      returning { error: ... } for bad arguments.)
+    #   4. Token-budgeted output via #truncate.
+    #
+    # Success returns are whatever the tool produces (usually a String, or a
+    # Hash for structured results). Failures are always { error:, code: }.
+    class Base < RubyLLM::Tool
+      # Raised when a call requests unsafe escalation that the operator has not
+      # permitted. Mapped to { error:, code: :unsafe_denied }.
+      class UnsafeDenied < StandardError; end
+      class << self
+        # Mark a subclass as part of the dangerous set.
+        def exec_tool!
+          @exec_tool = true
+        end
+        def exec_tool?
+          @exec_tool == true
+        end
+      end
+      def initialize(**overrides)
+        super()
+        @config = RubyLLM::Toolbox.config.dup_with(**overrides)
+      end
+      attr_reader :config
+      # ruby_llm derives the tool name from the full class name, which would
+      # turn RubyLLM::Toolbox::Tools::ReadFile into an ugly namespaced string.
+      # Demodulize first so tools get clean names ("read_file", "bash", ...).
+      def name
+        @name ||= begin
+          base = self.class.name.to_s.split("::").last.to_s
+          base.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
+              .gsub(/([a-z\d])([A-Z])/, '\1_\2')
+              .downcase
+              .delete_suffix("_tool")
+        end
+      end
+      # Wraps the base #call to enforce the exec gate and guarantee that no
+      # exception ever escapes into the model loop.
+      def call(args)
+        if self.class.exec_tool? && !config.enable_exec_tools
+          return error(
+            "Exec tools are disabled. Set RubyLLM::Toolbox.config.enable_exec_tools = true " \
+            "(and an allowlist where relevant) to use #{self.class.name}.",
+            code: :exec_disabled
+          )
+        end
+        super
+      rescue UnsafeDenied => e
+        error(e.message, code: :unsafe_denied)
+      rescue StandardError => e
+        error("#{self.class.name} failed: #{e.message}", code: :tool_exception)
+      end
+      private
+      def error(message, code:)
+        { error: message, code: code }
+      end
+      # Security override. Returns true if this call may bypass its guard, false
+      # if no escalation was requested, and raises UnsafeDenied if escalation was
+      # requested but the operator hasn't permitted it (config.allow_unsafe). The
+      # agent can request, but only the operator can grant — and grants are
+      # logged via config.unsafe_logger.
+      def permit_unsafe!(requested, detail = nil)
+        return false unless requested
+        unless config.allow_unsafe
+          raise UnsafeDenied,
+                "this call requested an unsafe override, but it is not permitted. An operator must set " \
+                "RubyLLM::Toolbox.config.allow_unsafe = true to allow #{self.class.name} to bypass its guard."
+        end
+        logger = config.unsafe_logger
+        logger.call(self.class.name, detail) if logger.respond_to?(:call)
+        true
+      end
+      # A path jail that enforces fs_root unless this call was granted an unsafe
+      # override, in which case it resolves paths anywhere on the host.
+      def path_jail(unsafe: false, detail: nil)
+        Safety::PathJail.new(config.fs_root, enforce: !permit_unsafe!(unsafe, detail))
+      end
+      def truncate(text)
+        Truncator.new(
+          model: config.tokenizer_model,
+          max_tokens: config.max_output_tokens
+        ).call(text.to_s)
+      end
+    end
+  end
+end
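
Note on `base.rb`: the exec gate, the `{ error:, code: }` failure contract, and the name derivation can be tried without installing ruby_llm. The sketch below is a minimal approximation, not the gem's code. `FakeTool` is a hypothetical stand-in for `RubyLLM::Tool` (assumed here to dispatch `call` to `execute` with keyword arguments), `SketchBase` mirrors only the relevant parts of `Base`, and the `enable_exec_tools:` constructor argument replaces the real config object.

```ruby
# Hypothetical stand-in for RubyLLM::Tool so the sketch runs without the gem.
class FakeTool
  def call(args)
    execute(**args)
  end
end

class SketchBase < FakeTool
  class UnsafeDenied < StandardError; end

  class << self
    # Mark a subclass as part of the dangerous set.
    def exec_tool!
      @exec_tool = true
    end

    def exec_tool?
      @exec_tool == true
    end
  end

  # Assumption: a plain boolean replaces the per-instance config snapshot.
  def initialize(enable_exec_tools: false)
    super()
    @enable_exec_tools = enable_exec_tools
  end

  # Same derivation as Base#name: demodulize, snake_case, drop "_tool".
  def name
    base = self.class.name.to_s.split("::").last.to_s
    base.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
        .gsub(/([a-z\d])([A-Z])/, '\1_\2')
        .downcase
        .delete_suffix("_tool")
  end

  # Gate exec tools first, then convert any exception into an error hash.
  def call(args)
    if self.class.exec_tool? && !@enable_exec_tools
      return { error: "Exec tools are disabled.", code: :exec_disabled }
    end
    super
  rescue UnsafeDenied => e
    { error: e.message, code: :unsafe_denied }
  rescue StandardError => e
    { error: "#{self.class.name} failed: #{e.message}", code: :tool_exception }
  end
end

class BashTool < SketchBase
  exec_tool!

  def execute(command:)
    "ran #{command}"
  end
end

class ReadFile < SketchBase
  def execute(path:)
    raise ArgumentError, "no such file: #{path}"
  end
end

puts BashTool.new.name                                           # bash
puts BashTool.new.call({ command: "ls" })[:code]                 # exec_disabled
puts BashTool.new(enable_exec_tools: true).call({ command: "ls" }) # ran ls
puts ReadFile.new.call({ path: "x" })[:code]                     # tool_exception
```

Because the gate check runs before `super`, a disabled exec tool never reaches `execute`, and because both `rescue` clauses return a hash, the caller sees the same shape for denials and for unexpected failures.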