@humanjs/mcp 0.1.0 → 0.3.0

package/README.md CHANGED
@@ -48,10 +48,15 @@ Some clients can register the server for you, no manual JSON:
  **Claude Code:**
 
  ```bash
+ # this project only (default scope: local)
  claude mcp add humanjs --env HUMANJS_PERSONALITY=careful -- npx -y @humanjs/mcp
- # add --scope user to install it globally (all projects)
+
+ # all your projects (global): add --scope user (-s user)
+ claude mcp add humanjs --scope user --env HUMANJS_PERSONALITY=careful -- npx -y @humanjs/mcp
  ```
 
+ `--scope` is `local` (default, this project only), `user` (you, across all projects), or `project` (shared via a checked-in `.mcp.json`). Use `user` for a one-time global install.
+
  **Cursor** — one click:
 
  [![Add to Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=humanjs&config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBodW1hbmpzL21jcCJdfQ==)
@@ -66,6 +71,7 @@ The `config` payload is base64 of `{"command":"npx","args":["-y","@humanjs/mcp"]
  | `HUMANJS_SPEED` | `human` \| `fast` \| `instant` | `human` | Humanization pace. `human` = full realistic motion; `fast` = humanized but quick; `instant` = no humanized motion. Changes how long each action *executes*, not the wait between actions. |
  | `HUMANJS_HEADLESS` | `true` \| `false` | `false` | Headless browser. Default is visible — the point of the MCP. |
  | `HUMANJS_OUTPUT_DIR` | path | server's CWD | Where screenshots and recordings are written. |
+ | `HUMANJS_UPLOAD_DIR` | path | server's CWD | Folder `human_upload` reads files from (basename only — can't escape it). |
  | `HUMANJS_VIEWPORT` | `WIDTHxHEIGHT` | `1440x900` | Default viewport for new sessions. Bump to `1920x1080` for crisper recordings. |
  | `HUMANJS_AUTO_INSTALL` | `true` \| `false` | `true` | Auto-download the Chromium binary on first launch if missing. Set `false` to require a manual `npx playwright install chromium`. |
  | `HUMANJS_PERSIST` | `true` \| `false` | `false` | Persist a profile across runs (logins/cookies survive). Uses `~/.humanjs/profile` unless `HUMANJS_USER_DATA_DIR` is set. See [Browser modes](#browser-modes). |
@@ -128,7 +134,7 @@ Click / rightClick / move / drag take a **selector or raw x/y coordinates** —
  | Tool | What it does |
  |---|---|
  | `human_start_recording` | Begin capturing (frames + action timeline) |
- | `human_stop_recording` | Finalize and write one or more files — `.mp4` / `.webm` / `.gif` / `.json` (e.g. video + timeline from one recording) |
+ | `human_stop_recording` | Finalize and write one or more files — `.mp4` / `.webm` (video), `.gif`, `.json` (timeline), `.ts` (HumanJS script), `.spec.ts` / `.test.ts` (Playwright test). Pass several to export multiple ways, e.g. a video + a ready-to-commit test |
 
  **Sessions** — only needed for parallel browsers; the default session is implicit:
 
@@ -227,6 +233,7 @@ The server ships **built-in guidance** (sent to the agent on connect via MCP `in
 
  - **No arbitrary-JS `evaluate` tool.** Executing page-supplied JavaScript is a prompt-injection cliff — a malicious page could trick the agent into running code that exfiltrates data. The read-only inspection tools cover the legitimate "what's on the page" need.
  - **File-path safety.** Tools that write files accept a basename only; path components (`../`, absolute paths) are rejected, so a prompt-injected filename can't escape `HUMANJS_OUTPUT_DIR`.
+ - **Upload path safety.** `human_upload` can attach a local file to a web form — a potential exfiltration path if a page prompt-injects the agent. So it reads files by **basename only** from `HUMANJS_UPLOAD_DIR` (default: the server's working dir); subdirectories, `../`, and absolute paths are rejected, so the agent can't reach (and send) files outside that folder. Point `HUMANJS_UPLOAD_DIR` at where your upload fixtures live.
  - **No credentials handling.** The server drives the browser; it doesn't manage logins, payment details, or secrets on your behalf.
  - **Attaching to your real browser (CDP) is opt-in and env-only.** When you point `HUMANJS_CDP_URL` at your running browser, the agent acts with *your* live sessions — a bigger blast radius if a page tries to manipulate it. That's why it's a deliberate config choice you make up front, never something a tool can switch on.
 
package/dist/index.cjs CHANGED
@@ -23,6 +23,7 @@ function readEnv() {
  speed: parseSpeed(process.env.HUMANJS_SPEED),
  headless: parseBool(process.env.HUMANJS_HEADLESS, false),
  outputDir: process.env.HUMANJS_OUTPUT_DIR ?? process.cwd(),
+ uploadDir: process.env.HUMANJS_UPLOAD_DIR ?? process.cwd(),
  viewport: parseViewport(process.env.HUMANJS_VIEWPORT),
  autoInstall: parseBool(process.env.HUMANJS_AUTO_INSTALL, true),
  browser: resolveBrowserConfig(),
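Reviewer note: the new `uploadDir` entry uses the same `??` fallback as `outputDir`. A minimal sketch of what that implies, assuming a hypothetical helper `readUploadDir` (not part of the package) that takes the environment and working directory as parameters so it can be exercised without touching `process.env`:

```javascript
// Hypothetical helper, not part of @humanjs/mcp: mirrors the `??` fallback
// used in readEnv(), but with its inputs passed in explicitly.
function readUploadDir(env, cwd) {
  return env.HUMANJS_UPLOAD_DIR ?? cwd;
}

const cwd = "/work/project"; // illustrative path only

// Unset: falls back to the working directory.
console.log(readUploadDir({}, cwd));

// Set: the configured folder is used as-is.
console.log(readUploadDir({ HUMANJS_UPLOAD_DIR: "/fixtures" }, cwd));

// Empty string is not nullish, so `??` keeps it instead of falling back.
console.log(JSON.stringify(readUploadDir({ HUMANJS_UPLOAD_DIR: "" }, cwd)));
```

Because `??` only falls back on `null` or `undefined`, an empty-string `HUMANJS_UPLOAD_DIR` is kept rather than replaced by the working directory.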
@@ -192,7 +193,10 @@ var SessionManager = class {
  stop = resolve;
  });
  const video = options.video ?? true;
- const done = session.human.record({ video, quality: options.quality ?? "high" }, () => signal);
+ const done = session.human.record(
+ { name: options.name, video, quality: options.quality ?? "high" },
+ () => signal
+ );
  session.recording = {
  name: options.name ?? "recording",
  startedAt: Date.now(),
@@ -589,6 +593,24 @@ function resolveOutputPath(outputDir, filename) {
  }
  return path.join(outputDir, base);
  }
+ function resolveUploadPath(uploadDir, filename) {
+ const base = path.basename(filename);
+ if (base !== filename || base.length === 0) {
+ throw new Error(
+ `upload filename must be a plain name with no path components, got "${filename}". Files are read from HUMANJS_UPLOAD_DIR \u2014 place the file there (or point HUMANJS_UPLOAD_DIR at its folder) and pass just the name.`
+ );
+ }
+ return path.join(uploadDir, base);
+ }
+ function resolveRecordingFormat(filename) {
+ const lower = filename.toLowerCase();
+ if (lower.endsWith(".mp4") || lower.endsWith(".webm")) return "video";
+ if (lower.endsWith(".gif")) return "gif";
+ if (lower.endsWith(".json")) return "timeline";
+ if (lower.endsWith(".spec.ts") || lower.endsWith(".test.ts")) return "playwright";
+ if (lower.endsWith(".ts")) return "humanjs";
+ return null;
+ }
 
  // src/tools/inspection.ts
  var sessionArg = zod.z.string().optional().describe("Session ID to act on. Omit to use the default session.");
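Reviewer note: the basename check in `resolveUploadPath` can be exercised on its own. The sketch below copies its logic with a shortened error message; `resolveAll` and `isRejected` are hypothetical helpers added for illustration, and `/srv/fixtures` is an assumed folder.

```javascript
const path = require("node:path");

// Same check as resolveUploadPath in the hunk above (error text shortened).
function resolveUploadPath(uploadDir, filename) {
  const base = path.basename(filename);
  if (base !== filename || base.length === 0) {
    throw new Error(`upload filename must be a plain name, got "${filename}"`);
  }
  return path.join(uploadDir, base);
}

// Hypothetical wrapper: accept a string or an array and resolve every name
// up front, so one bad name rejects the whole call before any file is read.
function resolveAll(uploadDir, files) {
  const names = Array.isArray(files) ? files : [files];
  return names.map((name) => resolveUploadPath(uploadDir, name));
}

// Hypothetical helper: true when the name is rejected by the check.
function isRejected(name) {
  try {
    resolveUploadPath("/srv/fixtures", name);
    return false;
  } catch {
    return true;
  }
}

// A plain name resolves inside the upload folder.
console.log(resolveAll("/srv/fixtures", "avatar.png"));

// Parent traversal, subdirectories, absolute paths, and empty names all throw.
console.log(["../secret.txt", "sub/avatar.png", "/etc/passwd", ""].map(isRejected));
```

One edge worth knowing: a bare `..` contains no separator, so `path.basename("..")` returns it unchanged and this check alone does not reject it.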
@@ -634,6 +656,22 @@ function registerInspectionTools(server, ctx) {
  return { content: [{ type: "text", text }] };
  }
  );
+ server.registerTool(
+ "human_outline",
+ {
+ title: "Page outline (accessibility tree)",
+ description: 'Returns a compact accessibility-tree outline of the page (or a region) \u2014 every interactive element and landmark by its ARIA role + accessible name, as YAML (e.g. `- button "Sign in"`, `- textbox "Email"`). The most token-efficient way to see what is actionable and pick a selector: the names map directly to getByRole / accessible-name selectors. Prefer this over human_get_html for "what can I click or fill"; use human_screenshot when you need the visual layout.',
+ inputSchema: {
+ selector: zod.z.string().optional().describe("Optional region selector to scope the outline. Omit for the whole page."),
+ session: sessionArg
+ }
+ },
+ async ({ selector, session }) => {
+ const { human } = await ctx.sessions.get(session);
+ const text = await human.outline(selector);
+ return { content: [{ type: "text", text }] };
+ }
+ );
  server.registerTool(
  "human_get_text",
  {
@@ -712,7 +750,7 @@ function resolveTarget(input) {
  var sessionArg2 = zod.z.string().optional().describe(
  "Session ID to act on. Omit to use the default session (created lazily on first call). Use human_create_session for parallel browsers."
  );
- function registerPrimitiveTools(server, { sessions }) {
+ function registerPrimitiveTools(server, { sessions, env }) {
  server.registerTool(
  "human_goto",
  {
@@ -759,6 +797,22 @@ function registerPrimitiveTools(server, { sessions }) {
  };
  }
  );
+ server.registerTool(
+ "human_doubleClick",
+ {
+ title: "Double-click (humanized)",
+ description: "Double-clicks the target \u2014 same humanized motion as human_click, but two presses within the OS double-click window. Use for things that open/activate on double-click (list rows, file items, editable cells). Target is a selector OR x/y coordinates.",
+ inputSchema: { ...targetFields, session: sessionArg2 }
+ },
+ async ({ selector, x, y, session }) => {
+ const { human } = await sessions.get(session);
+ const target = resolveTarget({ selector, x, y });
+ await human.doubleClick(target);
+ return {
+ content: [{ type: "text", text: `double-clicked ${describeTarget(selector, x, y)}` }]
+ };
+ }
+ );
  server.registerTool(
  "human_hover",
  {
@@ -853,6 +907,94 @@ function registerPrimitiveTools(server, { sessions }) {
  return { content: [{ type: "text", text: `pasted ${value.length} chars into ${selector}` }] };
  }
  );
+ server.registerTool(
+ "human_clear",
+ {
+ title: "Clear a field (humanized)",
+ description: "Clears a text field (input/textarea/contenteditable) with a real keyboard gesture \u2014 click to focus, select-all, then delete \u2014 firing the input events the page expects. Use before human_type when you need to replace an existing value rather than append to it.",
+ inputSchema: {
+ selector: zod.z.string().describe("Selector of the field to clear."),
+ session: sessionArg2
+ }
+ },
+ async ({ selector, session }) => {
+ const { human } = await sessions.get(session);
+ await human.clear(selector);
+ return { content: [{ type: "text", text: `cleared ${selector}` }] };
+ }
+ );
+ server.registerTool(
+ "human_check",
+ {
+ title: "Check a box (humanized)",
+ description: "Ticks a checkbox or radio \u2014 moves the cursor to it and clicks, but only if it is not already checked (a real user does not re-click a ticked box). Verifies the resulting state. Pass the checkbox/radio input itself (or a [role=checkbox]) \u2014 not a wrapping <label> \u2014 so the current state can be read and the click stays idempotent.",
+ inputSchema: {
+ selector: zod.z.string().describe("Selector of the checkbox/radio input."),
+ session: sessionArg2
+ }
+ },
+ async ({ selector, session }) => {
+ const { human } = await sessions.get(session);
+ await human.check(selector);
+ return { content: [{ type: "text", text: `checked ${selector}` }] };
+ }
+ );
+ server.registerTool(
+ "human_uncheck",
+ {
+ title: "Uncheck a box (humanized)",
+ description: "Unticks a checkbox \u2014 humanized click only if currently checked. Radios cannot be unchecked by clicking (select a different option instead). Pass the checkbox input itself (or a [role=checkbox]) \u2014 not a wrapping <label> \u2014 so its state can be read and the click stays idempotent.",
+ inputSchema: {
+ selector: zod.z.string().describe("Selector of the checkbox input."),
+ session: sessionArg2
+ }
+ },
+ async ({ selector, session }) => {
+ const { human } = await sessions.get(session);
+ await human.uncheck(selector);
+ return { content: [{ type: "text", text: `unchecked ${selector}` }] };
+ }
+ );
+ server.registerTool(
+ "human_selectOption",
+ {
+ title: "Select dropdown option (humanized)",
+ description: "Chooses option(s) in a native <select> \u2014 moves the cursor to the dropdown, then sets the value (native selects open an OS menu automation can't drive, so the value is set programmatically, firing change/input). For custom DOM dropdowns, use human_click on the rendered options instead. Match by value(s); pass one string or an array for multi-selects.",
+ inputSchema: {
+ selector: zod.z.string().describe("Selector of the <select> element."),
+ values: zod.z.union([zod.z.string(), zod.z.array(zod.z.string())]).describe("Option value, or array of values for a multi-select."),
+ session: sessionArg2
+ }
+ },
+ async ({ selector, values, session }) => {
+ const { human } = await sessions.get(session);
+ const selected = await human.selectOption(selector, values);
+ return {
+ content: [{ type: "text", text: `selected ${selected.join(", ")} in ${selector}` }]
+ };
+ }
+ );
+ server.registerTool(
+ "human_upload",
+ {
+ title: "Upload file(s) (humanized)",
+ description: `Attaches file(s) to a file input \u2014 moves the cursor to the control, then sets the files (never opens the OS dialog, which would hang). For safety, files are read by basename from HUMANJS_UPLOAD_DIR (default: the server working dir) \u2014 subdirectories, "../", and absolute paths are rejected, so the agent can't read and exfiltrate arbitrary local files. Pass the <input type="file"> selector and the filename(s).`,
+ inputSchema: {
+ selector: zod.z.string().describe("Selector of the file input."),
+ files: zod.z.union([zod.z.string(), zod.z.array(zod.z.string())]).describe("Filename(s) inside HUMANJS_UPLOAD_DIR \u2014 a basename only, no path components."),
+ session: sessionArg2
+ }
+ },
+ async ({ selector, files, session }) => {
+ const { human } = await sessions.get(session);
+ const names = Array.isArray(files) ? files : [files];
+ const paths = names.map((name) => resolveUploadPath(env.uploadDir, name));
+ await human.upload(selector, paths);
+ return {
+ content: [{ type: "text", text: `uploaded ${paths.length} file(s) to ${selector}` }]
+ };
+ }
+ );
  server.registerTool(
  "human_press",
  {
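Reviewer note: the `human_check` and `human_uncheck` descriptions promise a click only when the state differs, followed by a verification. How `human.check` does this internally is not visible in this diff; the sketch below only illustrates that contract against a hypothetical in-memory checkbox (`makeBox` and `setChecked` are made-up names).

```javascript
// Hypothetical in-memory stand-in for a checkbox; no real page element is modeled.
function makeBox(checked) {
  return {
    checked,
    clicks: 0,
    click() {
      this.checked = !this.checked;
      this.clicks += 1;
    },
  };
}

// Click only when the current state differs from the desired one, then verify.
function setChecked(box, desired) {
  if (box.checked !== desired) box.click();
  if (box.checked !== desired) {
    throw new Error(`expected checked=${desired}, got ${box.checked}`);
  }
  return box;
}

const box = makeBox(false);
setChecked(box, true); // unticked -> one click
setChecked(box, true); // already ticked -> no second click
console.log(box.checked, box.clicks); // true 1
```

This is also why the descriptions ask for the input itself rather than a wrapping `<label>`: the idempotence depends on being able to read the current state before clicking.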
@@ -955,32 +1097,32 @@ function registerRecordingTools(server, { sessions, env }) {
  "human_stop_recording",
  {
  title: "Stop recording and save",
- description: `Stops the active recording and writes it to one or more files in HUMANJS_OUTPUT_DIR. Each filename's extension picks its format: .mp4/.webm = video, .gif = animated gif, .json = action timeline. Pass several to export the same recording multiple ways, e.g. ["demo.mp4", "demo.json"] for video + timeline. Path components are rejected for safety.`,
+ description: `Stops the active recording and writes it to one or more files in HUMANJS_OUTPUT_DIR. Each filename's extension picks its format: .mp4/.webm = video, .gif = animated gif, .json = action timeline, .ts = runnable HumanJS script, .spec.ts/.test.ts = @playwright/test spec (humanized, with derived assertions). Pass several to export the same recording multiple ways, e.g. ["demo.mp4", "checkout.spec.ts"] for a video plus a ready-to-commit test. Path components are rejected for safety.`,
  inputSchema: {
  filenames: zod.z.array(zod.z.string()).min(1).describe(
- 'One or more output filenames. The recording is saved to each, format chosen by extension. e.g. ["demo.mp4"] or ["demo.mp4", "demo.gif", "demo.json"].'
+ 'One or more output filenames. The recording is saved to each, format chosen by extension. e.g. ["demo.mp4"], ["checkout.spec.ts"], or ["demo.mp4", "demo.json", "demo.ts"].'
  ),
  session: zod.z.string().optional().describe("Session ID. Omit for the default session.")
  }
  },
  async ({ filenames, session }) => {
- const targets = filenames.map((filename) => ({
- path: resolveOutputPath(env.outputDir, filename),
- ext: path.extname(filename).toLowerCase()
- }));
- for (const { ext } of targets) {
- if (ext !== ".mp4" && ext !== ".webm" && ext !== ".gif" && ext !== ".json") {
+ const targets = filenames.map((filename) => {
+ const format = resolveRecordingFormat(filename);
+ if (format === null) {
  throw new Error(
- `Unsupported output extension "${ext}". Use .mp4, .webm, .gif, or .json.`
+ `Unsupported output extension for "${filename}". Use .mp4/.webm (video), .gif, .json (timeline), .ts (HumanJS script), or .spec.ts/.test.ts (Playwright test).`
  );
  }
- }
+ return { path: resolveOutputPath(env.outputDir, filename), format };
+ });
  const recording = await sessions.stopRecording(session);
  try {
  const saved = [];
- for (const { path, ext } of targets) {
- if (ext === ".gif") saved.push(await recording.toGif(path));
- else if (ext === ".json") saved.push(await recording.toTimeline(path));
+ for (const { path, format } of targets) {
+ if (format === "gif") saved.push(await recording.toGif(path));
+ else if (format === "timeline") saved.push(await recording.toTimeline(path));
+ else if (format === "humanjs") saved.push(await recording.toHumanJS(path));
+ else if (format === "playwright") saved.push(await recording.toPlaywright(path));
  else saved.push(await recording.toVideo(path));
  }
  return { content: [{ type: "text", text: `saved recording to:
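Reviewer note: the handler now dispatches on the result of `resolveRecordingFormat` instead of `path.extname`, which matters because `path.extname("checkout.spec.ts")` is just `.ts`. The sketch below copies the suffix checks from the earlier hunk and runs them over a few illustrative filenames (the filenames themselves are made up).

```javascript
// Same suffix checks as resolveRecordingFormat in the earlier hunk.
function resolveRecordingFormat(filename) {
  const lower = filename.toLowerCase();
  if (lower.endsWith(".mp4") || lower.endsWith(".webm")) return "video";
  if (lower.endsWith(".gif")) return "gif";
  if (lower.endsWith(".json")) return "timeline";
  // Must run before the plain ".ts" check, or test files would be
  // classified as HumanJS scripts.
  if (lower.endsWith(".spec.ts") || lower.endsWith(".test.ts")) return "playwright";
  if (lower.endsWith(".ts")) return "humanjs";
  return null;
}

const samples = ["demo.mp4", "DEMO.GIF", "flow.json", "checkout.spec.ts", "flow.ts", "types.d.ts", "notes.txt"];
for (const name of samples) {
  console.log(name, "->", resolveRecordingFormat(name));
}
```

Two consequences follow from the code as written: matching is case-insensitive because the name is lowercased first, and any other `.ts` suffix, including `types.d.ts`, is treated as a HumanJS script.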
@@ -1065,6 +1207,10 @@ Recording a flow (the natural-looking way):
  1. EXPLORE FIRST (un-recorded). Navigate the flow once to discover correct, unambiguous selectors (human_screenshot / human_get_html / human_get_attribute). Do this by default whenever the selectors aren't already known \u2014 no need for the user to ask. Skip it only if the selectors are already known or the user tells you not to explore.
  2. THEN RECORD ONE CLEAN RUN AS A SINGLE BATCH: human_start_recording + every action + human_stop_recording, all emitted in one turn. Keep selector-guessing and fumbles out of the take.
 
+ Export as a test: human_stop_recording picks format by extension. A .spec.ts (or .test.ts) filename writes a ready-to-commit @playwright/test with derived assertions; a .ts writes a standalone HumanJS script; .mp4/.webm/.gif/.json are video/timeline. So "record this flow and save it as a test" = run the clean pass, then stop into e.g. "checkout.spec.ts".
+
+ Captured input + passwords: typed/pasted text IS recorded into the timeline and code exports, so generated scripts/tests are runnable \u2014 EXCEPT password fields, which are always masked (emitted as an empty string with a "fill in" comment). This is intentional, not a bug; don't work around it by hand-editing the secret back in. If the user explicitly wants the flow to log in, edit the exported file to read the credential from an env var (e.g. process.env.APP_PASSWORD) and tell them to set it \u2014 never hardcode a real password into a file that may be committed.
+
  Dynamic UI: prefer specific selectors (role, aria-label) over text \u2014 the same visible text often matches several cards before a filter, or the wrong one after. If a click reports multiple matches, narrow the selector.
 
  Browser state: by default each run is a fresh, signed-out browser. If a flow needs a login, tell the user to enable persistence (human_enable_persistence or HUMANJS_PERSIST) or CDP attach \u2014 see human_browser_info.`;
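Reviewer note: the guidance above tells the agent to read a masked password from an environment variable rather than hardcode it. A minimal sketch of what that edit to an exported file might look like, assuming a hypothetical `requireEnv` helper; the exported file's actual shape is not shown in this diff, and `APP_PASSWORD` is the example name from the guidance text.

```javascript
// Hypothetical helper for an exported script: read a secret from the
// environment and fail loudly when it is missing, instead of hardcoding it.
// `env` defaults to process.env and is a parameter only so it can be tested.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Set ${name} in the environment before running this script.`);
  }
  return value;
}

// In the exported file, the masked empty string would be replaced by a call
// such as the following (left commented out so this sketch runs without the variable):
// const password = requireEnv("APP_PASSWORD");

// Demonstration with an explicit environment object and a dummy value.
console.log(requireEnv("APP_PASSWORD", { APP_PASSWORD: "dummy-value" }));
```

Failing early with a named variable keeps the secret out of the committed file and gives the user a clear message about what to set.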