npm - @humanjs/mcp - Versions diffs - 0.1.0 → 0.3.0 - Mend

@humanjs/mcp 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/README.md CHANGED

@@ -48,10 +48,15 @@ Some clients can register the server for you, no manual JSON:
 **Claude Code:**
 ```bash
+# this project only (default scope: local)
 claude mcp add humanjs --env HUMANJS_PERSONALITY=careful -- npx -y @humanjs/mcp
-# add --scope user to install it globally (all projects)
+# all your projects (global): add --scope user (-s user)
+claude mcp add humanjs --scope user --env HUMANJS_PERSONALITY=careful -- npx -y @humanjs/mcp
 ```
+`--scope` is `local` (default, this project only), `user` (you, across all projects), or `project` (shared via a checked-in `.mcp.json`). Use `user` for a one-time global install.
 **Cursor** — one click:
 [![Add to Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=humanjs&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBodW1hbmpzL21jcCJdfQ==)
@@ -66,6 +71,7 @@ The `config` payload is base64 of `{"command":"npx","args":["-y","@humanjs/mcp"]
 | `HUMANJS_SPEED` | `human` \| `fast` \| `instant` | `human` | Humanization pace. `human` = full realistic motion; `fast` = humanized but quick; `instant` = no humanized motion. Changes how long each action *executes*, not the wait between actions. |
 | `HUMANJS_HEADLESS` | `true` \| `false` | `false` | Headless browser. Default is visible — the point of the MCP. |
 | `HUMANJS_OUTPUT_DIR` | path | server's CWD | Where screenshots and recordings are written. |
+| `HUMANJS_UPLOAD_DIR` | path | server's CWD | Folder `human_upload` reads files from (basename only — can't escape it). |
 | `HUMANJS_VIEWPORT` | `WIDTHxHEIGHT` | `1440x900` | Default viewport for new sessions. Bump to `1920x1080` for crisper recordings. |
 | `HUMANJS_AUTO_INSTALL` | `true` \| `false` | `true` | Auto-download the Chromium binary on first launch if missing. Set `false` to require a manual `npx playwright install chromium`. |
 | `HUMANJS_PERSIST` | `true` \| `false` | `false` | Persist a profile across runs (logins/cookies survive). Uses `~/.humanjs/profile` unless `HUMANJS_USER_DATA_DIR` is set. See [Browser modes](#browser-modes). |
@@ -128,7 +134,7 @@ Click / rightClick / move / drag take a **selector or raw x/y coordinates** —
 | Tool | What it does |
 |---|---|
 | `human_start_recording` | Begin capturing (frames + action timeline) |
-| `human_stop_recording` | Finalize and write one or more files — `.mp4` / `.webm` / `.gif` / `.json` (e.g. video + timeline from one recording) |
+| `human_stop_recording` | Finalize and write one or more files — `.mp4` / `.webm` (video), `.gif`, `.json` (timeline), `.ts` (HumanJS script), `.spec.ts` / `.test.ts` (Playwright test). Pass several to export multiple ways, e.g. a video + a ready-to-commit test |
 **Sessions** — only needed for parallel browsers; the default session is implicit:
@@ -227,6 +233,7 @@ The server ships **built-in guidance** (sent to the agent on connect via MCP `in
 - **No arbitrary-JS `evaluate` tool.** Executing page-supplied JavaScript is a prompt-injection cliff — a malicious page could trick the agent into running code that exfiltrates data. The read-only inspection tools cover the legitimate "what's on the page" need.
 - **File-path safety.** Tools that write files accept a basename only; path components (`../`, absolute paths) are rejected, so a prompt-injected filename can't escape `HUMANJS_OUTPUT_DIR`.
+- **Upload path safety.** `human_upload` can attach a local file to a web form — a potential exfiltration path if a page prompt-injects the agent. So it reads files by **basename only** from `HUMANJS_UPLOAD_DIR` (default: the server's working dir); subdirectories, `../`, and absolute paths are rejected, so the agent can't reach (and send) files outside that folder. Point `HUMANJS_UPLOAD_DIR` at where your upload fixtures live.
 - **No credentials handling.** The server drives the browser; it doesn't manage logins, payment details, or secrets on your behalf.
 - **Attaching to your real browser (CDP) is opt-in and env-only.** When you point `HUMANJS_CDP_URL` at your running browser, the agent acts with *your* live sessions — a bigger blast radius if a page tries to manipulate it. That's why it's a deliberate config choice you make up front, never something a tool can switch on.
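
The "Upload path safety" bullet added above can be illustrated with a minimal sketch of the basename-only check. It follows the rule the README describes, but it is not the package's exact code: `tryResolve` and the `/srv/fixtures` directory are hypothetical, and `path.posix` is used only so the output is the same on every operating system.

```javascript
const path = require("node:path");

// Sketch of the basename-only rule from the README bullet above.
// path.posix is an assumption made for the demo, to keep results OS-independent.
function resolveUploadPath(uploadDir, filename) {
  const base = path.posix.basename(filename);
  // Any path component ("../", "sub/", a leading "/") makes base differ from filename.
  if (base !== filename || base.length === 0) {
    throw new Error(`upload filename must be a plain name, got "${filename}"`);
  }
  return path.posix.join(uploadDir, base);
}

// Hypothetical helper: report the outcome instead of throwing.
function tryResolve(uploadDir, filename) {
  try {
    return resolveUploadPath(uploadDir, filename);
  } catch {
    return "rejected";
  }
}

const dir = "/srv/fixtures"; // hypothetical HUMANJS_UPLOAD_DIR
for (const name of ["resume.pdf", "../secret.txt", "/etc/passwd", "sub/a.txt", ""]) {
  console.log(JSON.stringify(name), "->", tryResolve(dir, name));
}
```

In this sketch only `resume.pdf` resolves (to `/srv/fixtures/resume.pdf`); the other four names are rejected. One edge worth knowing: a bare `..` is its own basename, so a check written this way lets it through and the join yields the parent directory. A stricter variant might also reject `.` and `..` explicitly.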

package/dist/index.cjs CHANGED

@@ -23,6 +23,7 @@ function readEnv() {
     speed: parseSpeed(process.env.HUMANJS_SPEED),
     headless: parseBool(process.env.HUMANJS_HEADLESS, false),
     outputDir: process.env.HUMANJS_OUTPUT_DIR ?? process.cwd(),
+    uploadDir: process.env.HUMANJS_UPLOAD_DIR ?? process.cwd(),
     viewport: parseViewport(process.env.HUMANJS_VIEWPORT),
     autoInstall: parseBool(process.env.HUMANJS_AUTO_INSTALL, true),
     browser: resolveBrowserConfig(),
@@ -192,7 +193,10 @@ var SessionManager = class {
       stop = resolve;
     });
     const video = options.video ?? true;
-    const done = session.human.record({ video, quality: options.quality ?? "high" }, () => signal);
+    const done = session.human.record(
+      { name: options.name, video, quality: options.quality ?? "high" },
+      () => signal
+    );
     session.recording = {
       name: options.name ?? "recording",
       startedAt: Date.now(),
@@ -589,6 +593,24 @@ function resolveOutputPath(outputDir, filename) {
   }
   return path.join(outputDir, base);
 }
+function resolveUploadPath(uploadDir, filename) {
+  const base = path.basename(filename);
+  if (base !== filename || base.length === 0) {
+    throw new Error(
+      `upload filename must be a plain name with no path components, got "${filename}". Files are read from HUMANJS_UPLOAD_DIR \u2014 place the file there (or point HUMANJS_UPLOAD_DIR at its folder) and pass just the name.`
+    );
+  }
+  return path.join(uploadDir, base);
+}
+function resolveRecordingFormat(filename) {
+  const lower = filename.toLowerCase();
+  if (lower.endsWith(".mp4") || lower.endsWith(".webm")) return "video";
+  if (lower.endsWith(".gif")) return "gif";
+  if (lower.endsWith(".json")) return "timeline";
+  if (lower.endsWith(".spec.ts") || lower.endsWith(".test.ts")) return "playwright";
+  if (lower.endsWith(".ts")) return "humanjs";
+  return null;
+}
 // src/tools/inspection.ts
 var sessionArg = zod.z.string().optional().describe("Session ID to act on. Omit to use the default session.");
@@ -634,6 +656,22 @@ function registerInspectionTools(server, ctx) {
       return { content: [{ type: "text", text }] };
     }
   );
+  server.registerTool(
+    "human_outline",
+    {
+      title: "Page outline (accessibility tree)",
+      description: 'Returns a compact accessibility-tree outline of the page (or a region) \u2014 every interactive element and landmark by its ARIA role + accessible name, as YAML (e.g. `- button "Sign in"`, `- textbox "Email"`). The most token-efficient way to see what is actionable and pick a selector: the names map directly to getByRole / accessible-name selectors. Prefer this over human_get_html for "what can I click or fill"; use human_screenshot when you need the visual layout.',
+      inputSchema: {
+        selector: zod.z.string().optional().describe("Optional region selector to scope the outline. Omit for the whole page."),
+        session: sessionArg
+      }
+    },
+    async ({ selector, session }) => {
+      const { human } = await ctx.sessions.get(session);
+      const text = await human.outline(selector);
+      return { content: [{ type: "text", text }] };
+    }
+  );
   server.registerTool(
     "human_get_text",
     {
@@ -712,7 +750,7 @@ function resolveTarget(input) {
 var sessionArg2 = zod.z.string().optional().describe(
   "Session ID to act on. Omit to use the default session (created lazily on first call). Use human_create_session for parallel browsers."
 );
-function registerPrimitiveTools(server, { sessions }) {
+function registerPrimitiveTools(server, { sessions, env }) {
   server.registerTool(
     "human_goto",
     {
@@ -759,6 +797,22 @@ function registerPrimitiveTools(server, { sessions }) {
       };
     }
   );
+  server.registerTool(
+    "human_doubleClick",
+    {
+      title: "Double-click (humanized)",
+      description: "Double-clicks the target \u2014 same humanized motion as human_click, but two presses within the OS double-click window. Use for things that open/activate on double-click (list rows, file items, editable cells). Target is a selector OR x/y coordinates.",
+      inputSchema: { ...targetFields, session: sessionArg2 }
+    },
+    async ({ selector, x, y, session }) => {
+      const { human } = await sessions.get(session);
+      const target = resolveTarget({ selector, x, y });
+      await human.doubleClick(target);
+      return {
+        content: [{ type: "text", text: `double-clicked ${describeTarget(selector, x, y)}` }]
+      };
+    }
+  );
   server.registerTool(
     "human_hover",
     {
@@ -853,6 +907,94 @@ function registerPrimitiveTools(server, { sessions }) {
       return { content: [{ type: "text", text: `pasted ${value.length} chars into ${selector}` }] };
     }
   );
+  server.registerTool(
+    "human_clear",
+    {
+      title: "Clear a field (humanized)",
+      description: "Clears a text field (input/textarea/contenteditable) with a real keyboard gesture \u2014 click to focus, select-all, then delete \u2014 firing the input events the page expects. Use before human_type when you need to replace an existing value rather than append to it.",
+      inputSchema: {
+        selector: zod.z.string().describe("Selector of the field to clear."),
+        session: sessionArg2
+      }
+    },
+    async ({ selector, session }) => {
+      const { human } = await sessions.get(session);
+      await human.clear(selector);
+      return { content: [{ type: "text", text: `cleared ${selector}` }] };
+    }
+  );
+  server.registerTool(
+    "human_check",
+    {
+      title: "Check a box (humanized)",
+      description: "Ticks a checkbox or radio \u2014 moves the cursor to it and clicks, but only if it is not already checked (a real user does not re-click a ticked box). Verifies the resulting state. Pass the checkbox/radio input itself (or a [role=checkbox]) \u2014 not a wrapping <label> \u2014 so the current state can be read and the click stays idempotent.",
+      inputSchema: {
+        selector: zod.z.string().describe("Selector of the checkbox/radio input."),
+        session: sessionArg2
+      }
+    },
+    async ({ selector, session }) => {
+      const { human } = await sessions.get(session);
+      await human.check(selector);
+      return { content: [{ type: "text", text: `checked ${selector}` }] };
+    }
+  );
+  server.registerTool(
+    "human_uncheck",
+    {
+      title: "Uncheck a box (humanized)",
+      description: "Unticks a checkbox \u2014 humanized click only if currently checked. Radios cannot be unchecked by clicking (select a different option instead). Pass the checkbox input itself (or a [role=checkbox]) \u2014 not a wrapping <label> \u2014 so its state can be read and the click stays idempotent.",
+      inputSchema: {
+        selector: zod.z.string().describe("Selector of the checkbox input."),
+        session: sessionArg2
+      }
+    },
+    async ({ selector, session }) => {
+      const { human } = await sessions.get(session);
+      await human.uncheck(selector);
+      return { content: [{ type: "text", text: `unchecked ${selector}` }] };
+    }
+  );
+  server.registerTool(
+    "human_selectOption",
+    {
+      title: "Select dropdown option (humanized)",
+      description: "Chooses option(s) in a native <select> \u2014 moves the cursor to the dropdown, then sets the value (native selects open an OS menu automation can't drive, so the value is set programmatically, firing change/input). For custom DOM dropdowns, use human_click on the rendered options instead. Match by value(s); pass one string or an array for multi-selects.",
+      inputSchema: {
+        selector: zod.z.string().describe("Selector of the <select> element."),
+        values: zod.z.union([zod.z.string(), zod.z.array(zod.z.string())]).describe("Option value, or array of values for a multi-select."),
+        session: sessionArg2
+      }
+    },
+    async ({ selector, values, session }) => {
+      const { human } = await sessions.get(session);
+      const selected = await human.selectOption(selector, values);
+      return {
+        content: [{ type: "text", text: `selected ${selected.join(", ")} in ${selector}` }]
+      };
+    }
+  );
+  server.registerTool(
+    "human_upload",
+    {
+      title: "Upload file(s) (humanized)",
+      description: `Attaches file(s) to a file input \u2014 moves the cursor to the control, then sets the files (never opens the OS dialog, which would hang). For safety, files are read by basename from HUMANJS_UPLOAD_DIR (default: the server working dir) \u2014 subdirectories, "../", and absolute paths are rejected, so the agent can't read and exfiltrate arbitrary local files. Pass the <input type="file"> selector and the filename(s).`,
+      inputSchema: {
+        selector: zod.z.string().describe("Selector of the file input."),
+        files: zod.z.union([zod.z.string(), zod.z.array(zod.z.string())]).describe("Filename(s) inside HUMANJS_UPLOAD_DIR \u2014 a basename only, no path components."),
+        session: sessionArg2
+      }
+    },
+    async ({ selector, files, session }) => {
+      const { human } = await sessions.get(session);
+      const names = Array.isArray(files) ? files : [files];
+      const paths = names.map((name) => resolveUploadPath(env.uploadDir, name));
+      await human.upload(selector, paths);
+      return {
+        content: [{ type: "text", text: `uploaded ${paths.length} file(s) to ${selector}` }]
+      };
+    }
+  );
   server.registerTool(
     "human_press",
     {
@@ -955,32 +1097,32 @@ function registerRecordingTools(server, { sessions, env }) {
     "human_stop_recording",
     {
       title: "Stop recording and save",
-      description: `Stops the active recording and writes it to one or more files in HUMANJS_OUTPUT_DIR. Each filename's extension picks its format: .mp4/.webm = video, .gif = animated gif, .json = action timeline. Pass several to export the same recording multiple ways, e.g. ["demo.mp4", "demo.json"] for video + timeline. Path components are rejected for safety.`,
+      description: `Stops the active recording and writes it to one or more files in HUMANJS_OUTPUT_DIR. Each filename's extension picks its format: .mp4/.webm = video, .gif = animated gif, .json = action timeline, .ts = runnable HumanJS script, .spec.ts/.test.ts = @playwright/test spec (humanized, with derived assertions). Pass several to export the same recording multiple ways, e.g. ["demo.mp4", "checkout.spec.ts"] for a video plus a ready-to-commit test. Path components are rejected for safety.`,
       inputSchema: {
         filenames: zod.z.array(zod.z.string()).min(1).describe(
-          'One or more output filenames. The recording is saved to each, format chosen by extension. e.g. ["demo.mp4"] or ["demo.mp4", "demo.gif", "demo.json"].'
+          'One or more output filenames. The recording is saved to each, format chosen by extension. e.g. ["demo.mp4"], ["checkout.spec.ts"], or ["demo.mp4", "demo.json", "demo.ts"].'
         ),
         session: zod.z.string().optional().describe("Session ID. Omit for the default session.")
       }
     },
     async ({ filenames, session }) => {
-      const targets = filenames.map((filename) => ({
-        path: resolveOutputPath(env.outputDir, filename),
-        ext: path.extname(filename).toLowerCase()
-      }));
-      for (const { ext } of targets) {
-        if (ext !== ".mp4" && ext !== ".webm" && ext !== ".gif" && ext !== ".json") {
+      const targets = filenames.map((filename) => {
+        const format = resolveRecordingFormat(filename);
+        if (format === null) {
           throw new Error(
-            `Unsupported output extension "${ext}". Use .mp4, .webm, .gif, or .json.`
+            `Unsupported output extension for "${filename}". Use .mp4/.webm (video), .gif, .json (timeline), .ts (HumanJS script), or .spec.ts/.test.ts (Playwright test).`
           );
         }
-      }
+        return { path: resolveOutputPath(env.outputDir, filename), format };
+      });
       const recording = await sessions.stopRecording(session);
       try {
         const saved = [];
-        for (const { path, ext } of targets) {
-          if (ext === ".gif") saved.push(await recording.toGif(path));
-          else if (ext === ".json") saved.push(await recording.toTimeline(path));
+        for (const { path, format } of targets) {
+          if (format === "gif") saved.push(await recording.toGif(path));
+          else if (format === "timeline") saved.push(await recording.toTimeline(path));
+          else if (format === "humanjs") saved.push(await recording.toHumanJS(path));
+          else if (format === "playwright") saved.push(await recording.toPlaywright(path));
           else saved.push(await recording.toVideo(path));
         }
         return { content: [{ type: "text", text: `saved recording to:
@@ -1065,6 +1207,10 @@ Recording a flow (the natural-looking way):
 1. EXPLORE FIRST (un-recorded). Navigate the flow once to discover correct, unambiguous selectors (human_screenshot / human_get_html / human_get_attribute). Do this by default whenever the selectors aren't already known \u2014 no need for the user to ask. Skip it only if the selectors are already known or the user tells you not to explore.
 2. THEN RECORD ONE CLEAN RUN AS A SINGLE BATCH: human_start_recording + every action + human_stop_recording, all emitted in one turn. Keep selector-guessing and fumbles out of the take.
+Export as a test: human_stop_recording picks format by extension. A .spec.ts (or .test.ts) filename writes a ready-to-commit @playwright/test with derived assertions; a .ts writes a standalone HumanJS script; .mp4/.webm/.gif/.json are video/timeline. So "record this flow and save it as a test" = run the clean pass, then stop into e.g. "checkout.spec.ts".
+Captured input + passwords: typed/pasted text IS recorded into the timeline and code exports, so generated scripts/tests are runnable \u2014 EXCEPT password fields, which are always masked (emitted as an empty string with a "fill in" comment). This is intentional, not a bug; don't work around it by hand-editing the secret back in. If the user explicitly wants the flow to log in, edit the exported file to read the credential from an env var (e.g. process.env.APP_PASSWORD) and tell them to set it \u2014 never hardcode a real password into a file that may be committed.
 Dynamic UI: prefer specific selectors (role, aria-label) over text \u2014 the same visible text often matches several cards before a filter, or the wrong one after. If a click reports multiple matches, narrow the selector.
 Browser state: by default each run is a fresh, signed-out browser. If a flow needs a login, tell the user to enable persistence (human_enable_persistence or HUMANJS_PERSIST) or CDP attach \u2014 see human_browser_info.`;
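
The "picks format by extension" rule in the guidance above depends on check order: `.spec.ts` and `.test.ts` also end in `.ts`, so the longer suffixes have to be tested first. Below is a minimal standalone sketch of that mapping, modeled on the `resolveRecordingFormat` helper in this diff. The sample filenames are illustrative only.

```javascript
// Sketch of extension-based format selection, modeled on resolveRecordingFormat in this diff.
function resolveRecordingFormat(filename) {
  const lower = filename.toLowerCase(); // matching is case-insensitive
  if (lower.endsWith(".mp4") || lower.endsWith(".webm")) return "video";
  if (lower.endsWith(".gif")) return "gif";
  if (lower.endsWith(".json")) return "timeline";
  // Longer suffixes first: a ".spec.ts" name would otherwise match the plain ".ts" rule.
  if (lower.endsWith(".spec.ts") || lower.endsWith(".test.ts")) return "playwright";
  if (lower.endsWith(".ts")) return "humanjs";
  return null; // unsupported extension; the caller decides how to report it
}

// Illustrative filenames, not taken from the package.
const samples = ["demo.mp4", "Demo.GIF", "demo.json", "checkout.spec.ts", "login.test.ts", "demo.ts", "notes.txt"];
for (const name of samples) {
  console.log(name, "->", resolveRecordingFormat(name));
}
```

With this ordering `checkout.spec.ts` maps to `playwright` and `demo.ts` maps to `humanjs`; swapping the last two checks would send every test filename to the HumanJS script exporter.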