npm - rhai-mcp - Versions diffs - 0.1.0 - Mend

rhai-mcp 0.1.0


Files changed (24)

package/.claude-plugin/marketplace.json +11 -0
package/.claude-plugin/plugin.json +7 -0
package/.mcp.json +12 -0
package/README.md +178 -0
package/dist/agent.js +542 -0
package/dist/agent.js.map +1 -0
package/dist/browser.js +361 -0
package/dist/browser.js.map +1 -0
package/dist/cli.js +43 -0
package/dist/cli.js.map +1 -0
package/dist/config.js +78 -0
package/dist/config.js.map +1 -0
package/dist/login.js +65 -0
package/dist/login.js.map +1 -0
package/dist/memory.js +266 -0
package/dist/memory.js.map +1 -0
package/dist/run.js +61 -0
package/dist/run.js.map +1 -0
package/dist/server.js +96 -0
package/dist/server.js.map +1 -0
package/dist/ui.js +206 -0
package/dist/ui.js.map +1 -0
package/package.json +34 -0
package/prisma/schema.prisma +72 -0

package/.claude-plugin/marketplace.json ADDED

@@ -0,0 +1,11 @@
+{
+  "name": "rhai",
+  "owner": { "name": "Aman Pandit" },
+  "plugins": [
+    {
+      "name": "rhai",
+      "source": "./",
+      "description": "Browser agent coding agents delegate manual web tasks to. Bundles the rhai MCP server."
+    }
+  ]
+}

package/.claude-plugin/plugin.json ADDED

@@ -0,0 +1,7 @@
+{
+  "name": "rhai",
+  "description": "A browser agent coding agents delegate manual web tasks to — enabling APIs, creating projects, generating keys in dashboards. Plans, acts, web-searches solutions, and self-corrects on roadblocks. Bundles the rhai MCP server.",
+  "version": "0.1.0",
+  "author": { "name": "Aman Pandit" },
+  "keywords": ["mcp", "browser", "automation", "agent", "setup"]
+}

package/.mcp.json ADDED

@@ -0,0 +1,12 @@
+{
+  "mcpServers": {
+    "rhai": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "rhai-mcp"],
+      "env": {
+        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
+      }
+    }
+  }
+}
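
The `${OPENAI_API_KEY}` value in the `env` block above is a placeholder, not a literal key. As a rough illustration of what a client launching this server might do with it, the TypeScript sketch below expands `${NAME}` placeholders from a supplied environment map. The helper name `expandEnv`, the `EnvMap` type, and the choice to throw on an unset variable are assumptions made for this example; they are not taken from rhai-mcp or from any particular MCP client.

```typescript
// Hypothetical helper: expands ${NAME} placeholders in an MCP server's env block.
// The name, types, and error policy are assumptions for illustration only.
type EnvMap = Record<string, string | undefined>;

function expandEnv(env: Record<string, string>, source: EnvMap): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    out[key] = env[key].replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
      const value = source[name];
      if (value === undefined) {
        // Fail loudly rather than launching the server with an empty key.
        throw new Error(`Environment variable ${name} is not set`);
      }
      return value;
    });
  }
  return out;
}

// Example: resolve the env block from .mcp.json against a sample environment.
const launchEnv = expandEnv(
  { OPENAI_API_KEY: "${OPENAI_API_KEY}" },
  { OPENAI_API_KEY: "sk-example" },
);
console.log(launchEnv.OPENAI_API_KEY);
```

In a real client the second argument would typically be `process.env`; a plain object is used here so the sketch stays self-contained.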

package/README.md ADDED

@@ -0,0 +1,178 @@
+# Rhai — a browser agent with a brain, for coding agents
+When you're working with a coding agent (Claude Code, Codex, Cursor) and it says *"now go to the Google Cloud Console and enable the Gmail API"* or *"create a project in Supabase and copy the anon key"* — that's the part where **you** stop and start clicking. Rhai removes that step.
+Rhai is an **MCP server** that gives your coding agent a second agent: an autonomous browser operator that drives a real, already-logged-in Chrome. The coding agent delegates the manual web task; Rhai observes the page, acts, and **re-reasons on its own when it hits a roadblock** — a moved menu, a missing prerequisite, an error message, a wrong instruction. It reports back a summary plus any values you need (API keys, project IDs, URLs).
+```
+Claude Code (writing your app)
+   │  "user needs to enable the Gmail API in Google Cloud Console"
+   │  calls MCP tool:  delegate_browser_task("Enable the Gmail API for project X")
+   ▼
+RHAI  (this MCP server)
+   ├─ drives a real Chrome using a persistent profile (already logged in)
+   ├─ GPT-5 loop:  snapshot → act → snapshot → self-correct on roadblocks
+   └─ returns: "Done. API enabled. Here's the new key: ..."   ← back to Claude Code
+```
+## How it works
+- **Delivery: MCP.** One server, usable by any agent that speaks the Model Context Protocol — Claude Code, Codex, Cursor, etc.
+- **Hands: Playwright.** Drives a real Chromium. Rhai reads the page's accessibility/DOM tree, tags interactive elements with refs, and clicks/types by ref — fast and reliable on standard web apps like cloud consoles.
+- **Brain: OpenAI GPT-5** (via the Responses API, with reasoning effort `high`). An agentic loop that follows a deliberate cycle: **recall → plan → act → detect failure → fix → verify → remember**. It writes an explicit plan, checks after each action whether it worked, and revises the plan when reality diverges.
+- **Memory: Prisma + SQLite.** Every task, every action, and every distilled *learning* (the correct navigation path for a service, a non-obvious prerequisite, the fix to a roadblock) is stored in a local SQLite database. Before each task the agent recalls what it previously learned about that service; afterwards it writes new learnings, so run #2 on Google Cloud Console is faster and smarter than run #1.
+- **Self-service research: web search.** When the agent hits a roadblock it can't solve from memory, it uses GPT-5's built-in `web_search` tool to look up how others solved that exact problem (it searches the error text or "how to X in <service>"), then revises its plan and tries the corrected approach.
+- **Auth: a persistent browser profile.** You log into each service **once** (`rhai-mcp login <url>`, or `npm run login -- <url>` from a clone); the session is saved into a profile dir that the agent reuses on later runs. The agent never handles your password or 2FA.
+### The failure → fix loop
+This loop is the core of the agent's brain. After every action the agent asks: *did that do what I expected?* If a button is missing, a setting is greyed out, an error appears, or the page didn't change, it treats that as a roadblock and does **not** blindly retry. Instead it:
+1. Reasons about *why* it's stuck (the coding agent's suggested steps may be wrong/outdated/missing a prerequisite).
+2. Checks **memory** for a known fix.
+3. If still stuck, runs a **web search** for the solution.
+4. Calls `set_plan` again with the corrected approach and continues.
+5. **Verifies** the end state before declaring success, then **remembers** what worked.
+## Setup
+```bash
+npm install          # downloads Chromium (Playwright) and generates the Prisma client
+cp .env.example .env # then put your OPENAI_API_KEY in .env
+npm run build
+```
+The memory database (`rhai.db`) is created automatically on first run — no migration step needed.
+### 1. Log into the services you'll use (one time each)
+```bash
+rhai-mcp login https://console.cloud.google.com
+rhai-mcp login https://supabase.com/dashboard
+# (from a clone, before publishing:  npm run login -- <url>)
+```
+A real Chrome window opens. Log in normally (including 2FA). Press **Enter** in the terminal to save the session. The agent reuses these logins from now on. The agent itself runs **headless** (no visible window) — set `RHAI_HEADLESS=false` when you want to watch it work. This `login` step always opens a window regardless, since you need to see it to sign in.
+> **"This login window has none of my saved passwords!"** Right — Playwright launches an isolated profile by design, and Chrome 136+ blocks automation from touching your *default* profile (anti-cookie-theft). Two ways to fix it:
+>
+> **Quick:** logging in is one-time per service. Paste from your password manager once, and the session persists in the profile until the service expires it. Add `RHAI_BROWSER_CHANNEL=chrome` to use your real Chrome binary (familiar UI, less bot detection), though the profile is still fresh.
+>
+> **Best — use your real browser (saved passwords + extensions):** attach Rhai to a Chrome *you* launch on a dedicated, synced profile.
+> 1. Launch Chrome with a debug port and a dedicated dir (macOS):
+>    ```bash
+>    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
+>      --remote-debugging-port=9222 --user-data-dir="$HOME/.rhai-chrome"
+>    ```
+> 2. In that window, sign into Chrome once and turn on **Sync** (your saved passwords flow in), **or** install your password-manager extension (1Password/Bitwarden) — both work because it's a real Chrome.
+> 3. Set `RHAI_CDP_URL=http://localhost:9222` in `.env`.
+>
+> Now `rhai-mcp login` opens in *your* browser with autofill, and the agent attaches to that same Chrome (it disconnects when done — it never kills your browser).
+### 2. Install it into your coding agent
+Rhai is an MCP server, so any agent that speaks MCP can use it. Once installed, the agent **reaches for it automatically** — the server ships proactive `instructions` telling the agent to delegate any "go click through a website" step to Rhai, and it renders as a normal tool call in the agent's output.
+> The commands below assume Rhai is installed from npm as `rhai-mcp` (so `npx -y rhai-mcp` just works). To run a local build instead, swap `npx -y rhai-mcp` for `node <path-to-your-clone>/dist/cli.js`.
+**Claude Code — as a plugin (one install, recommended):**
+```
+/plugin marketplace add amanpandit/rhai        # or the path/URL to this repo
+/plugin install rhai@rhai
+```
+The plugin bundles the MCP server ([.mcp.json](.mcp.json)) and registers it automatically.
+**Claude Code — as a plain MCP server:**
+```bash
+claude mcp add rhai --env OPENAI_API_KEY=$OPENAI_API_KEY -- npx -y rhai-mcp
+```
+**Cursor** — add to `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (project):
+```json
+{
+  "mcpServers": {
+    "rhai": {
+      "command": "npx",
+      "args": ["-y", "rhai-mcp"],
+      "env": { "OPENAI_API_KEY": "${OPENAI_API_KEY}" }
+    }
+  }
+}
+```
+**Codex CLI** — add to `~/.codex/config.toml`:
+```toml
+[mcp_servers.rhai]
+command = "npx"
+args = ["-y", "rhai-mcp"]
+[mcp_servers.rhai.env]
+OPENAI_API_KEY = "${OPENAI_API_KEY}"
+```
+That's it. When the agent's work hits a manual web step, it calls Rhai itself instead of telling you to go click.
+## Usage
+Once wired up, your coding agent calls the `delegate_browser_task` tool on its own when it hits a manual web step. You can also nudge it:
+> "Use the rhai tool to enable the Gmail API for my Google Cloud project `my-app`, then give me the steps to use it."
+The tool takes:
+- `task` (required) — the **goal** as an outcome, e.g. *"Enable the Gmail API for project my-app"*. Describe the destination, not the clicks — the agent figures out the path even if the suggested steps are slightly off.
+- `start_url` (optional) — a page to open first.
+- `context` (optional) — which account/project, specifics, or steps you think are involved.
+It returns a status (`success` / `blocked`), a summary, and any concrete values (keys, IDs, URLs) verbatim.
+## Configuration (`.env`)
+| Var | Default | Purpose |
+|---|---|---|
+| `OPENAI_API_KEY` | — | **Required.** Powers the reasoning loop (GPT-5). |
+| `RHAI_PROFILE_DIR` | `./.rhai-profile` | Where logged-in sessions are stored. |
+| `RHAI_HEADLESS` | `true` | Agent runs headless. Set `false` to watch it work. |
+| `RHAI_MODEL` | `gpt-5` | Model for the brain. |
+| `RHAI_MAX_STEPS` | `40` | Max reason/act steps before giving up. |
+| `DATABASE_URL` | `file:./rhai.db` | Where the persistent memory (SQLite) lives. |
+## Safety notes
+- The profile dir holds live logged-in sessions — it's gitignored; keep it private.
+- The agent does not attempt to complete hard 2FA challenges and does not knowingly take destructive actions; in either case it stops and reports `blocked`.
+- It runs headless by default; set `RHAI_HEADLESS=false` the first few times so you can watch exactly what it does.
+## Project layout
+```
+.claude-plugin/
+  plugin.json      Claude Code plugin manifest
+  marketplace.json marketplace entry for one-line install
+.mcp.json          bundled MCP server config (used by the plugin / projects)
+prisma/
+  schema.prisma    Task / Action / Memory tables (SQLite)
+src/
+  config.ts        env/config + DB location + stderr logging
+  browser.ts       Playwright persistent context + DOM snapshot + auth gate + actions
+  memory.ts        Prisma-backed persistent memory (recall / remember / task log)
+  agent.ts         the GPT-5 reasoning loop: recall → plan → act → fix → verify → remember
+  server.ts        MCP server (proactive instructions + delegate_browser_task tool)
+  cli.ts           binary entry: server (default) | login | task
+  login.ts         one-time interactive login
+  run.ts           direct single-task runner (for testing)
+```
+## Inspecting memory
+It's just a SQLite file. Browse it with `npx prisma studio` (opens a local UI), or query `rhai.db` with any SQLite client. The `Memory` table holds the reusable learnings; `Task` + `Action` hold the full transcript of every run.
+## Roadmap ideas
+- Screenshot/vision fallback when the DOM is unreadable (canvas-heavy UIs).
+- A `wait_for_human` handoff for unavoidable 2FA/captcha mid-task.
+- Periodic memory compaction (merge near-duplicate learnings per service).
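
The failure → fix loop described in the README above can be sketched in TypeScript as follows. This is a minimal, synchronous sketch under stated assumptions: `runTask`, `Deps`, `Outcome`, and every callback name are hypothetical and do not come from the rhai-mcp source, the real agent drives GPT-5 and Playwright rather than plain callbacks, and inserting the fix ahead of the failed step is just one plausible way to "revise the plan". Only the overall shape (act, detect failure, check memory, fall back to search, verify, then remember) and the default step cap of 40 follow the README.

```typescript
// Hypothetical sketch of a failure -> fix loop. None of these names come from rhai-mcp.
type Outcome = { ok: boolean; note: string };

interface Deps {
  act(step: string): Outcome;                 // perform one browser action
  recallFix(problem: string): string | null;  // known fix from local memory, if any
  searchFix(problem: string): string | null;  // fallback: fix found via web search
  verify(): boolean;                          // check the end state
  remember(learning: string): void;           // persist a learning for later runs
}

function runTask(plan: string[], deps: Deps, maxSteps = 40): "success" | "blocked" {
  const queue = [...plan];
  const fixes: string[] = [];
  for (let steps = 0; queue.length > 0; steps++) {
    if (steps >= maxSteps) return "blocked"; // analogous to the RHAI_MAX_STEPS cap
    const step = queue[0];
    const result = deps.act(step);
    if (result.ok) {
      queue.shift();
      continue;
    }
    // Roadblock: look for a fix instead of blindly retrying the same step.
    const fix = deps.recallFix(result.note) ?? deps.searchFix(result.note);
    if (fix === null) return "blocked";
    queue.unshift(fix); // revise the plan: run the fix first, then retry the failed step
    fixes.push(`${result.note} -> ${fix}`);
  }
  if (!deps.verify()) return "blocked";
  fixes.forEach((f) => deps.remember(f)); // remember only fixes that led to a verified result
  return "success";
}

// Usage with stubbed dependencies: "click Enable" fails until billing is enabled.
let billingEnabled = false;
const learned: string[] = [];
const status = runTask(["open API page", "click Enable"], {
  act: (step) => {
    if (step === "enable billing") {
      billingEnabled = true;
      return { ok: true, note: "" };
    }
    if (step === "click Enable" && !billingEnabled) {
      return { ok: false, note: "Enable is greyed out" };
    }
    return { ok: true, note: "" };
  },
  recallFix: () => null,             // nothing in memory yet
  searchFix: () => "enable billing", // pretend a web search found the prerequisite
  verify: () => billingEnabled,
  remember: (learning) => {
    learned.push(learning);
  },
});
console.log(status, learned);
```

Deferring `remember` until after `verify` keeps unverified guesses out of memory, which matches the README's ordering of "verify, then remember".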