browser-automation-skill 0.71.0

Files changed (117)
  1. package/LICENSE +21 -0
  2. package/README.md +144 -0
  3. package/SECURITY.md +39 -0
  4. package/SKILL.md +206 -0
  5. package/bin/cli.mjs +55 -0
  6. package/install.sh +143 -0
  7. package/package.json +54 -0
  8. package/references/adapter-candidates.md +40 -0
  9. package/references/browser-mcp-cheatsheet.md +132 -0
  10. package/references/browser-stats-cheatsheet.md +155 -0
  11. package/references/chrome-devtools-mcp-cheatsheet.md +232 -0
  12. package/references/midscene-integration.md +359 -0
  13. package/references/obscura-cheatsheet.md +103 -0
  14. package/references/playwright-cli-cheatsheet.md +64 -0
  15. package/references/playwright-lib-cheatsheet.md +90 -0
  16. package/references/recipes/add-a-tool-adapter.md +134 -0
  17. package/references/recipes/agent-workflows/README.md +37 -0
  18. package/references/recipes/agent-workflows/cache-driven-bulk-operation.md +110 -0
  19. package/references/recipes/agent-workflows/flow-record-and-replay.md +102 -0
  20. package/references/recipes/agent-workflows/incremental-pattern-discovery.md +125 -0
  21. package/references/recipes/agent-workflows/login-then-scrape.md +100 -0
  22. package/references/recipes/anti-patterns-tool-extension.md +182 -0
  23. package/references/recipes/body-bytes-not-body.md +139 -0
  24. package/references/recipes/cache-write-security.md +210 -0
  25. package/references/recipes/fingerprint-rescue.md +154 -0
  26. package/references/recipes/model-routing.md +143 -0
  27. package/references/recipes/path-security.md +138 -0
  28. package/references/recipes/privacy-canary.md +96 -0
  29. package/references/recipes/visual-rescue-hook.md +182 -0
  30. package/references/stats-prices.json +42 -0
  31. package/references/stats-schema.json +77 -0
  32. package/references/tool-versions.md +8 -0
  33. package/scripts/browser-add-site.sh +113 -0
  34. package/scripts/browser-assert.sh +106 -0
  35. package/scripts/browser-audit.sh +68 -0
  36. package/scripts/browser-baseline.sh +135 -0
  37. package/scripts/browser-click.sh +100 -0
  38. package/scripts/browser-creds-add.sh +254 -0
  39. package/scripts/browser-creds-list.sh +67 -0
  40. package/scripts/browser-creds-migrate.sh +122 -0
  41. package/scripts/browser-creds-remove.sh +69 -0
  42. package/scripts/browser-creds-rotate-totp.sh +109 -0
  43. package/scripts/browser-creds-show.sh +82 -0
  44. package/scripts/browser-creds-totp.sh +94 -0
  45. package/scripts/browser-do.sh +630 -0
  46. package/scripts/browser-doctor.sh +365 -0
  47. package/scripts/browser-drag.sh +90 -0
  48. package/scripts/browser-extract.sh +192 -0
  49. package/scripts/browser-fill.sh +142 -0
  50. package/scripts/browser-flow.sh +316 -0
  51. package/scripts/browser-history.sh +187 -0
  52. package/scripts/browser-hover.sh +92 -0
  53. package/scripts/browser-inspect.sh +188 -0
  54. package/scripts/browser-list-sessions.sh +78 -0
  55. package/scripts/browser-list-sites.sh +42 -0
  56. package/scripts/browser-login.sh +279 -0
  57. package/scripts/browser-mcp.sh +65 -0
  58. package/scripts/browser-migrate.sh +195 -0
  59. package/scripts/browser-open.sh +134 -0
  60. package/scripts/browser-press.sh +80 -0
  61. package/scripts/browser-remove-session.sh +72 -0
  62. package/scripts/browser-remove-site.sh +68 -0
  63. package/scripts/browser-replay.sh +206 -0
  64. package/scripts/browser-route.sh +174 -0
  65. package/scripts/browser-select.sh +122 -0
  66. package/scripts/browser-show-session.sh +57 -0
  67. package/scripts/browser-show-site.sh +37 -0
  68. package/scripts/browser-snapshot.sh +176 -0
  69. package/scripts/browser-stats.sh +522 -0
  70. package/scripts/browser-tab-close.sh +112 -0
  71. package/scripts/browser-tab-list.sh +70 -0
  72. package/scripts/browser-tab-switch.sh +111 -0
  73. package/scripts/browser-upload.sh +132 -0
  74. package/scripts/browser-use.sh +60 -0
  75. package/scripts/browser-vlm.sh +707 -0
  76. package/scripts/browser-wait.sh +97 -0
  77. package/scripts/install-git-hooks.sh +16 -0
  78. package/scripts/lib/capture.sh +356 -0
  79. package/scripts/lib/common.sh +262 -0
  80. package/scripts/lib/credential.sh +237 -0
  81. package/scripts/lib/fingerprint-rescue.js +123 -0
  82. package/scripts/lib/flow.sh +448 -0
  83. package/scripts/lib/flow_record.sh +210 -0
  84. package/scripts/lib/mask.sh +49 -0
  85. package/scripts/lib/memory.sh +427 -0
  86. package/scripts/lib/migrate.sh +390 -0
  87. package/scripts/lib/migrators/README.md +23 -0
  88. package/scripts/lib/migrators/memory/v1_to_v2.sh +15 -0
  89. package/scripts/lib/migrators/recent_urls/README.md +13 -0
  90. package/scripts/lib/migrators/stats/README.md +24 -0
  91. package/scripts/lib/node/chrome-devtools-bridge.mjs +1812 -0
  92. package/scripts/lib/node/mcp-server.mjs +531 -0
  93. package/scripts/lib/node/mcp-tools.json +68 -0
  94. package/scripts/lib/node/playwright-driver.mjs +1104 -0
  95. package/scripts/lib/node/totp-core.mjs +52 -0
  96. package/scripts/lib/node/totp.mjs +52 -0
  97. package/scripts/lib/node/url-pattern-cluster.mjs +102 -0
  98. package/scripts/lib/node/url-pattern-resolver.mjs +77 -0
  99. package/scripts/lib/output.sh +79 -0
  100. package/scripts/lib/router.sh +342 -0
  101. package/scripts/lib/sanitize.sh +107 -0
  102. package/scripts/lib/secret/keychain.sh +91 -0
  103. package/scripts/lib/secret/libsecret.sh +74 -0
  104. package/scripts/lib/secret/plaintext.sh +75 -0
  105. package/scripts/lib/secret_backend_select.sh +57 -0
  106. package/scripts/lib/session.sh +153 -0
  107. package/scripts/lib/site.sh +126 -0
  108. package/scripts/lib/stats.sh +419 -0
  109. package/scripts/lib/tool/.gitkeep +0 -0
  110. package/scripts/lib/tool/chrome-devtools-mcp.sh +349 -0
  111. package/scripts/lib/tool/obscura.sh +249 -0
  112. package/scripts/lib/tool/playwright-cli.sh +155 -0
  113. package/scripts/lib/tool/playwright-lib.sh +106 -0
  114. package/scripts/lib/verb_helpers.sh +222 -0
  115. package/scripts/lib/visual-rescue-default.sh +145 -0
  116. package/scripts/regenerate-docs.sh +99 -0
  117. package/uninstall.sh +51 -0
@@ -0,0 +1,154 @@
1
+ # Recipe — Phase 13 fingerprint rescue
2
+
3
+ **Use when**: a cached selector goes stale (class rename, id rename, minor DOM
4
+ reshuffle). The Phase-11 memory cache used to require 4 consecutive failures
5
+ + an LLM re-resolve to recover. Phase 13 adds a *pre-LLM* rescue tier that
6
+ tries to find an equivalent element via weak-fingerprint similarity. If it
7
+ succeeds, the cache silently heals (selector overwritten, fail_count reset,
8
+ `self_heal_history[]` appended with `event:"rescued"`). If it fails, the
9
+ existing fail_count path runs unchanged.
10
+
11
+ **Inspired by**: Scrapling's adaptive selectors. *Not* a Scrapling adapter —
12
+ the algorithm is ported as ~150 LOC of Node-side scoring + bash glue. No
13
+ Python dependency added.
14
+
15
+ ## When the rescue runs
16
+
17
+ Only on `browser-do --intent` cache-hit-then-fail with exit code
18
+ `EXIT_EMPTY_RESULT` (11) or `EXIT_ASSERTION_FAILED` (13). Environmental
19
+ failures (network 30, tool crash 42, timeout 43) skip the rescue — those
20
+ would poison the cache if counted.
21
+
22
+ ```
23
+ cache hit → dispatch verb → adapter resolves 0 elements (rc=11)
24
+
25
+
26
+ memory_fingerprint_rescue
27
+
28
+ ┌───────────────┴───────────────┐
29
+ ▼ ▼
30
+ returns rescued_selector returns "" (no match ≥ threshold)
31
+ │ │
32
+ ▼ ▼
33
+ retry verb with rescued memory_record_failure (existing path)
34
+ │ fail_count++ → after 4 → disabled
35
+ retry succeeds?
36
+
37
+ ┌─────┴─────┐
38
+ ▼ ▼
39
+ yes no
40
+ │ │
41
+ memory_record_heal: memory_record_failure (existing path)
42
+ - overwrite selector
43
+ - reset fail_count
44
+ - bump success_count
45
+ - append self_heal_history
46
+ - emit stats event {rescued:true}
47
+ ```
48
+
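The gating rule and the flowchart above can be sketched as a small decision function. This is a minimal illustration, not the shipped bash logic: the exit codes come from this recipe, while `shouldAttemptRescue`, `handleFailure`, and the returned labels are hypothetical names, and `rescue` / `retry` are caller-supplied stand-ins for `memory_fingerprint_rescue` and the verb retry.

```javascript
// Exit codes as listed in this recipe; all function names here are illustrative.
const EXIT_EMPTY_RESULT = 11;
const EXIT_ASSERTION_FAILED = 13;
const RESCUE_ELIGIBLE = new Set([EXIT_EMPTY_RESULT, EXIT_ASSERTION_FAILED]);

// True only for a cache hit whose verb failed with a selector-level exit code.
// Environmental codes (30, 42, 43) are not in the set, so they skip the rescue.
function shouldAttemptRescue(cacheHit, exitCode) {
  return Boolean(cacheHit) && RESCUE_ELIGIBLE.has(exitCode);
}

// Hypothetical driver mirroring the flowchart: rescue, then retry, then heal or fail.
function handleFailure({ cacheHit, exitCode, rescue, retry }) {
  if (!shouldAttemptRescue(cacheHit, exitCode)) return "skip_rescue";
  const rescuedSelector = rescue(); // "" means no candidate reached the threshold
  if (rescuedSelector === "") return "record_failure";
  return retry(rescuedSelector) ? "record_heal" : "record_failure";
}
```

Keeping the eligible codes in one set makes the "do not count environmental failures" rule a single membership test rather than a chain of comparisons.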
49
+ ## Algorithm
50
+
51
+ 1. **Parse cached selector → weak fingerprint** (bash + jq):
52
+
53
+ ```
54
+ "button.delete" → { tag: "BUTTON", classes: ["delete"], attrs: {} }
55
+ "#submit" → { tag: "*", classes: [], attrs: {id: "submit"} }
56
+ "form > input.email" → { tag: "FORM", classes: ["email"], attrs: {} } (combinator stripped — weak)
57
+ ```
58
+
59
+ Combinators (`>`, `+`, `~`), pseudo-classes (`:hover`), attribute operators
60
+ (`^=`, `*=`) are not parsed. The fingerprint will simply be weaker and the
61
+ JS scorer will probably miss — caller falls through to LLM re-resolve.
62
+
63
+ 2. **Inject scorer into the page** via `browser-extract --eval` (Phase 13 JS
64
+ file: `scripts/lib/fingerprint-rescue.js`). Constants `__FP` (the
65
+ fingerprint) and `__TH` (threshold) are prepended bash-side.
66
+
67
+ 3. **Score each DOM element**:
68
+
69
+ ```
70
+ score = 0.4 × tag_match
71
+ + 0.4 × jaccard(target.classes, candidate.classList)
72
+ + 0.2 × jaccard(target.attrs, candidate.attributes)
73
+ ```
74
+
75
+ 4. **Synthesise selector for the best-scoring candidate** above threshold:
76
+
77
+ ```
78
+ 1. #id (when id is /^[A-Za-z][\w-]*$/ AND uniquely resolving)
79
+ 2. [data-testid="…"] (preferred test-automation hook)
80
+ 3. tag.class[.class…] (uniquely resolving)
81
+ 4. nth-child path (absolute last-resort)
82
+ ```
83
+
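Steps 1 and 3 above can be sketched together. This is an illustration under stated assumptions, not the shipped bash + jq parser or `fingerprint-rescue.js`: `parseSelector` only reproduces the three documented examples, candidates are plain objects rather than DOM elements, attributes are compared as `name=value` strings, two empty sets score 0, and a `*` tag never counts as a tag match. All four conventions are assumptions made for the example.

```javascript
// Sketch: reproduces the three documented examples; combinators are ignored.
function parseSelector(selector) {
  const tagMatch = selector.match(/^[A-Za-z][\w-]*/);
  const classes = [...selector.matchAll(/\.([\w-]+)/g)].map((m) => m[1]);
  const idMatch = selector.match(/#([\w-]+)/);
  return {
    tag: tagMatch ? tagMatch[0].toUpperCase() : "*",
    classes,
    attrs: idMatch ? { id: idMatch[1] } : {},
  };
}

// Jaccard similarity |A ∩ B| / |A ∪ B|. Assumption: two empty sets score 0.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  const union = new Set([...setA, ...setB]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const item of setA) if (setB.has(item)) shared += 1;
  return shared / union.size;
}

// Assumption: attributes are compared as "name=value" strings.
const attrPairs = (attrs) => Object.entries(attrs).map(([k, v]) => `${k}=${v}`);

// Weighted score from step 3. Assumption: a "*" tag contributes tag_match = 0.
function score(target, candidate) {
  const tagMatch = target.tag !== "*" && target.tag === candidate.tag ? 1 : 0;
  return (
    0.4 * tagMatch +
    0.4 * jaccard(target.classes, candidate.classes) +
    0.2 * jaccard(attrPairs(target.attrs), attrPairs(candidate.attrs))
  );
}
```

For example, `score(parseSelector("button.delete"), { tag: "BUTTON", classes: ["delete", "btn"], attrs: {} })` combines a full tag match with a class overlap of one out of two. Under the empty-set convention chosen here it stays below the default threshold, which shows how sensitive the outcome is to those conventions.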
84
+ ## Threshold
85
+
86
+ Default `0.70`. Override per-session:
87
+
88
+ ```bash
89
+ BROWSER_DO_RESCUE_THRESHOLD=0.85 bash scripts/browser-do.sh --site myapp --verb click --intent "delete row" --pattern '/devices/:id'
90
+ ```
91
+
92
+ - `0.70` (default) — Scrapling-like balance. Accepts moderate drift (class
93
+ rename) but rejects very-different candidates.
94
+ - `0.85` — conservative. Fewer false positives, more LLM round-trips on
95
+ borderline drift.
96
+ - `0.50` — permissive. More heals, but watch the `failure_mode=
97
+ wrong_element_acted` count in `browser-stats report` for false-positive
98
+ drift.
99
+
100
+ ## Audit visibility
101
+
102
+ Each successful rescue emits a dedicated stats event:
103
+
104
+ ```json
105
+ {
106
+ "schema_version": 1,
107
+ "ts": "2026-05-18T01:02:03.456Z",
108
+ "gen_ai_tool_name": "browser-do.fingerprint_rescue",
109
+ "verb": "do",
110
+ "adapter_route": "browser-do",
111
+ "outcome": "success",
112
+ "rescued": true,
113
+ "fingerprint_from_selector": "button.delete",
114
+ "fingerprint_to_selector": "button[data-testid=delete-btn]"
115
+ }
116
+ ```
117
+
118
+ Query the heal-rate:
119
+
120
+ ```bash
121
+ bash scripts/browser-stats.sh report --verb do
122
+ # → look for "browser-do.fingerprint_rescue" rows + outcome=success share
123
+ ```
124
+
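If the stats rows are available as parsed JSON, the success share can also be computed directly. A minimal sketch, assuming each row has the shape of the event shown above; `rescueSuccessShare` is a hypothetical helper, and any `outcome` value other than `"success"` is an assumption, since this recipe only documents the success event.

```javascript
// Sketch: `events` is assumed to be an array of parsed stats rows.
function rescueSuccessShare(events) {
  const rescues = events.filter(
    (e) => e.gen_ai_tool_name === "browser-do.fingerprint_rescue"
  );
  if (rescues.length === 0) return null; // no rescue rows in this window
  const succeeded = rescues.filter((e) => e.outcome === "success").length;
  return succeeded / rescues.length;
}
```

Returning `null` for an empty window keeps "no rescues ran" distinct from "every rescue failed".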
125
+ ## When NOT to use
126
+
127
+ - **Cross-page rescue.** The rescue only runs against the *currently-loaded*
128
+ DOM. If the verb already redirected and the cached selector is for the
129
+ prior page, rescue won't find it. (`navigation_mismatch` failure mode
130
+ instead.)
131
+ - **Identifier-free designs.** If the target element has no id, no
132
+ data-testid, no stable classes, and no unique nth-child position, the
133
+ synthesised selector will be brittle. Better to invest in a stable
134
+ test-automation hook than depend on rescue.
135
+ - **Heavy DOMs.** The scorer walks `document.querySelectorAll('*')` — O(n).
136
+ For pages with 10k+ DOM nodes the scoring will take >500ms. Consider
137
+ scoping with `document.querySelectorAll(target.tag.toLowerCase())` if you
138
+ hit this in practice (future v2 — the current implementation prioritises
139
+ recall over speed).
140
+
141
+ ## Failure modes mapped (vs the Phase-12 stats audit)
142
+
143
+ | Outcome | failure_mode | rescued | Interpretation |
144
+ |---|---|---|---|
145
+ | Cache hit, adapter ok | null | null | Steady-state. No rescue ran. |
146
+ | Cache hit, adapter fail, rescue ok | null | true | Silent heal. Audit row: `gen_ai_tool_name=browser-do.fingerprint_rescue`. |
147
+ | Cache hit, adapter fail, rescue no-match | stale_ref (on the verb event) | null | Original fail_count++. Eventual LLM re-resolve. |
148
+ | Cache hit, adapter fail, rescue match-but-retry-fail | stale_ref (on the verb event) | false (future — currently null) | Algorithm picked wrong candidate. Track this metric to tune threshold. |
149
+ | Cache hit, adapter wrong-click | wrong_element_acted | true (false positive!) | Rescue scored a wrong element ≥ threshold. **Tune threshold up** if this rises. |
150
+
151
+ ## Related
152
+
153
+ - Phase 11 `self_heal_history[]` lifecycle is preserved; see `scripts/lib/memory.sh::memory_record_failure` (enabled→disabled) + `memory_record` (disabled→enabled). Phase 13 adds the third event type: `event:"rescued"`.
154
+ - Phase 12 audit surface: `references/browser-stats-cheatsheet.md`.
@@ -0,0 +1,143 @@
1
+ # Recipe: Model routing — three-tier strategy
2
+
3
+ When and how to route Claude model selection across the parent session, this skill, and (eventually) per-verb invocations. The default ships with `model: sonnet` + `effort: low` in `SKILL.md` frontmatter; this recipe explains why, when to override, and how to layer in `opusplan` / `/advisor` at the parent session level.
4
+
5
+ ## When to use this recipe
6
+
7
+ Use this whenever:
8
+ - A user reports the skill seems to "make wrong choices" on complex flows (multi-step logins, ambiguous snapshots) — they may need to escalate from default Sonnet.
9
+ - Your session token bill on browser tasks is bigger than you'd like — verify the three-tier setup is in place.
10
+ - You're integrating this skill into a different host CLI (Codex, Cursor, Gemini CLI) — the model-routing primitives may differ.
11
+
12
+ Do NOT use this recipe for:
13
+ - Picking which Anthropic SDK to install. Model routing is consumer-side; the SDK is producer-side.
14
+ - Speed-of-response tuning. Use `effort:` (low/medium/high/xhigh/max), not `model:`, for that knob.
15
+
16
+ ## The three tiers
17
+
18
+ | Tier | Where it lives | Default in this skill | What it controls |
19
+ |---|---|---|---|
20
+ | 1. Parent session | `/model` command, `~/.claude/settings.json::model`, env var `ANTHROPIC_MODEL` | (user's choice — recommended: `opusplan`) | The "thinking" model — used when Claude reasons about what verb to call, parses snapshots, plans next steps |
21
+ | 2. Skill turn | `model:` field in `SKILL.md` frontmatter | `sonnet` + `effort: low` | The "acting" model — used during the single turn that invokes the skill. Per-turn override; resumes Tier 1 on next prompt |
22
+ | 3. Per-verb (future) | (not yet supported) | n/a | Some verbs may need Opus reasoning (login flow auto-detect); most just shell out to bash |
23
+
24
+ ## Tier 1: Parent session
25
+
26
+ ### Recommended: `/model opusplan` (stable)
27
+
28
+ ```bash
29
+ # In any Claude Code session:
30
+ /model opusplan
31
+
32
+ # Or persist as default in ~/.claude/settings.json:
33
+ { "model": "opusplan" }
34
+ ```
35
+
36
+ `opusplan` is a Claude Code built-in alias: **Opus during plan mode, Sonnet during execution mode**. Plan mode is entered with `shift+tab` or `/plan`; exited with `shift+tab` again. Plan-mode reasoning is where the heavy thinking happens (designing flows, deciding how to debug a failure, brainstorming a feature plan). Execution mode is where Claude calls bash, edits files, runs verbs — Sonnet is enough.
37
+
38
+ This is the **zero-risk** starting point. No beta header. No skill edits. Available everywhere Claude Code runs (Anthropic-direct, Bedrock, Vertex, Foundry — though `opus`/`sonnet` resolve to provider-pinned versions on third-party providers).
39
+
40
+ ### Advanced: `/advisor` (experimental as of v2.1.x)
41
+
42
+ ```bash
43
+ /advisor # toggle in current session
44
+ ```
45
+
46
+ `/advisor` is the Claude Code surface for the [Advisor Tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool). The session model becomes the **executor** (Sonnet 4.6 by default; can pair Haiku 4.5 too); during generation the executor consults an **advisor** model (Opus 4.7) mid-stream when it hits decision points.
47
+
48
+ Mechanism (per [Advisor Tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool)):
49
+ 1. Executor decides to consult — emits `server_tool_use { name: "advisor", input: {} }`.
50
+ 2. Anthropic server runs Opus inference with the full transcript (no client orchestration).
51
+ 3. Advisor returns ~400-700 token plan (~1,400-1,800 with thinking).
52
+ 4. Executor continues, advice in context.
53
+
54
+ Cost economics:
55
+ - Executor (Sonnet) generates the bulk of output → billed at Sonnet rate ($3/$15 per 1M).
56
+ - Advisor (Opus) generates only advice tokens, billed at Opus rate ($5/$25 per 1M).
57
+ - Internal Anthropic benchmarks: "Sonnet executor at medium effort + Opus advisor → intelligence comparable to Sonnet at default effort, at lower cost."
58
+
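The output-side arithmetic behind these bullets can be worked through as follows. This is a rough sketch only: it uses the output prices quoted above, a hypothetical turn size, and it ignores input tokens (the advisor reads the full transcript, so its input cost is real and not modelled here). `outputCostUsd` is an illustrative helper.

```javascript
// Output-token prices in USD per 1M tokens, taken from the bullets above.
const OUTPUT_PRICE_PER_MTOK = { sonnet: 15, opus: 25 };

function outputCostUsd(model, outputTokens) {
  return (outputTokens * OUTPUT_PRICE_PER_MTOK[model]) / 1_000_000;
}

// Hypothetical turn: 10,000 executor output tokens plus one 700-token advisor reply.
const executorCost = outputCostUsd("sonnet", 10_000);
const advisorCost = outputCostUsd("opus", 700);
const pairedCost = executorCost + advisorCost;

// Same 10,000 output tokens generated entirely by Opus, for comparison.
const allOpusCost = outputCostUsd("opus", 10_000);

console.log({ executorCost, advisorCost, pairedCost, allOpusCost });
```

In this hypothetical turn the advisor reply adds a small fraction to the executor's output cost, while an all-Opus turn pays the higher rate on every output token.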
59
+ **Caveats:**
60
+ - Beta status. May change. May hit rate limits on the advisor sub-inference (`too_many_requests` error code; executor continues without the advice).
61
+ - Not yet on Bedrock/Vertex/Foundry — Anthropic-direct only.
62
+ - Advisor sub-inference doesn't stream — expect a pause when consultation fires.
63
+ - No built-in conversation-level cap; if cost balloons, set `max_uses` per request or toggle `/advisor` off.
64
+
65
+ **When to add `/advisor`**: after `opusplan` proves out the cost-saving direction. If browser-automation flows show ad-hoc reasoning bottlenecks (Sonnet picking wrong refs, missing the right verb sequence), `/advisor` lets Sonnet ask Opus for a plan without paying full Opus rate for the whole turn.
66
+
67
+ ## Tier 2: This skill's frontmatter
68
+
69
+ ```yaml
70
+ ---
71
+ name: browser-automation-skill
72
+ ...
73
+ model: sonnet
74
+ effort: low
75
+ ---
76
+ ```
77
+
78
+ Per [Claude Code skills docs](https://code.claude.com/docs/en/skills): "The override applies for the rest of the current turn and is not saved to settings; the session model resumes on your next prompt."
79
+
80
+ So when a user (or Claude auto-loading) invokes the skill:
81
+ - Skill turn = Sonnet + low effort
82
+ - Next prompt = back to parent session model (Tier 1: opusplan or whatever the user set)
83
+
84
+ **Why Sonnet, not Haiku.** Haiku 4.5 is ~3× cheaper than Sonnet 4.6 but has noticeably less robustness on multi-step verb chaining (snapshot → pick `eN` ref → fill → submit). Browser-automation flows have enough orchestration that Sonnet earns its 3× over Haiku. If a specific user's flows are simple (single-step extracts, dry-run-only), they can override per-skill via `~/.claude/settings.json::skillOverrides` (not currently a documented field for model — file an issue if you need this).
85
+
86
+ **Why `effort: low`.** Effort is independent of model. Sonnet at `effort: low` saves tokens vs Sonnet at default effort, with minimal capability loss for pure verb-driving (no deep reasoning needed — Claude already planned in the parent turn). If a flow regresses, bump to `effort: medium`.
87
+
88
+ **Override escape-hatch.** When a session demands Opus reasoning during the skill turn (debugging a complex login flow that Sonnet keeps mishandling), the user can:
89
+
90
+ ```bash
91
+ # Switch the session model for the rest of the session; a skill with `model: inherit` follows it
92
+ /model opus # before invoking the skill
93
+ ```
94
+
95
+ Or edit the skill's frontmatter to `model: inherit` permanently to disable the per-turn override (skill follows parent session model).
96
+
97
+ ## Tier 3: Per-verb (deferred)
98
+
99
+ Not currently supported by Claude Code's frontmatter model. If different verbs in this skill needed different models (e.g. `login --interactive` wants Opus for form-shape detection; `snapshot` wants Haiku for pure screen-scrape), the workaround would be:
100
+
101
+ - Split the skill into N skills, each with its own `SKILL.md` + `model:` field. Path: every verb script becomes a tiny standalone skill.
102
+ - Or use `Agent` tool from inside the skill body to spawn a subagent with a different `model:` parameter.
103
+
104
+ Not worth the structural complexity until multiple users report it. Track as open follow-up.
105
+
106
+ ## How to verify the routing is working
107
+
108
+ ```bash
109
+ # In Claude Code:
110
+ /status # shows current session model
111
+ /model # opens picker — verify opusplan or your choice is selected
112
+ ```
113
+
114
+ After invoking a skill verb (e.g. `/browser-automation-skill snapshot`), `/status` may show that the active model briefly flipped to Sonnet (depending on how Claude Code surfaces per-turn overrides). The `usage.iterations[]` array in the API response shows the breakdown if you're driving via SDK.
115
+
116
+ For cost-per-session tracking, `/cost` (when available) summarizes token spend by model.
117
+
118
+ ## Common failure modes
119
+
120
+ | Symptom | Likely cause | Fix |
121
+ |---|---|---|
122
+ | Skill picks wrong `eN` ref repeatedly | Sonnet at `effort: low` spends too little reasoning on snapshot interpretation | Bump skill frontmatter to `effort: medium` or override session-side with `/model opus` |
123
+ | `/advisor` toggle fails with "too_many_requests" | Advisor rate-limited on Opus | Toggle `/advisor` off; rely on opusplan only. Or wait + retry. |
124
+ | `model: opusplan` in SKILL.md doesn't activate plan mode | opusplan is plan-mode-state-aware; skill turn doesn't enter plan mode by itself | Use `model: sonnet` (current default) — opusplan is a parent-session-level alias, not a skill-turn primitive |
125
+ | Cost still high after opusplan + skill model:sonnet | Most tokens going to non-skill turns (parent reasoning, file reads) | Profile with `/cost`; if parent-side dominates, that's where to optimize next |
126
+
127
+ ## Recommended setup for new users
128
+
129
+ ```
130
+ 1. Run `claude update` to get v2.1.x (advisor support).
131
+ 2. Run `/model opusplan` once — persists across sessions.
132
+ 3. (Optional) Run `/advisor` to enable advisor consultation.
133
+ 4. Use this skill — frontmatter already routes the skill turn to Sonnet + low effort.
134
+ 5. Watch token usage via `/cost` over a few sessions; tune effort if needed.
135
+ ```
136
+
137
+ ## See also
138
+
139
+ - [Claude Code Model configuration (`opusplan`)](https://code.claude.com/docs/en/model-config)
140
+ - [Skills frontmatter reference (`model`/`effort`)](https://code.claude.com/docs/en/skills)
141
+ - [Advisor Tool — Claude API Docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool)
142
+ - [Anthropic API pricing](https://platform.claude.com/docs/en/about-claude/pricing)
143
+ - Sister recipes: [privacy-canary.md](privacy-canary.md), [path-security.md](path-security.md), [body-bytes-not-body.md](body-bytes-not-body.md)
@@ -0,0 +1,138 @@
1
+ # Recipe: Path security
2
+
3
+ For any verb that takes a `--path` argument and forwards bytes from disk to a downstream tool (browser, MCP server, network upstream). Establishes the four checks the verb owes its caller, all enforced bash-side BEFORE the adapter dispatches.
4
+
5
+ ## When to use this recipe
6
+
7
+ Use this whenever a verb accepts a filesystem path and acts on its contents. Already shipped: `scripts/browser-upload.sh` (Phase 6 part 6).
8
+
9
+ Phase 7 capture-pipeline work will reuse this for any verb that writes capture artifacts to a caller-specified location (`--out PATH`).
10
+
11
+ Do NOT use this recipe for:
12
+ - Paths the **framework owns** (e.g. `${BROWSER_SKILL_HOME}/sessions/<name>.json`). Trust boundary doesn't apply — those paths are constructed, not accepted.
13
+ - Read-only metadata commands (`show-site --name X` — name isn't a path).
14
+
15
+ ## The four checks (in order)
16
+
17
+ ```bash
18
+ # scripts/browser-<verb>.sh — paste-ready scaffold
19
+ [ -n "${path}" ] || die "${EXIT_USAGE_ERROR}" "<verb> requires --path PATH"
20
+
21
+ # 1. Existence + regular-file check.
22
+ # Rejects: missing files, directories, devices, FIFOs, sockets.
23
+ if [ ! -e "${path}" ]; then
24
+ die "${EXIT_USAGE_ERROR}" "<verb>: path does not exist: ${path}"
25
+ fi
26
+ if [ ! -f "${path}" ]; then
27
+ die "${EXIT_USAGE_ERROR}" "<verb>: path is not a regular file: ${path}"
28
+ fi
29
+
30
+ # 2. Readability check (current user, current process).
31
+ if [ ! -r "${path}" ]; then
32
+ die "${EXIT_USAGE_ERROR}" "<verb>: path is not readable by the current user: ${path}"
33
+ fi
34
+
35
+ # 3. Sensitive-pattern reject. Override with --allow-sensitive (typed ack).
36
+ if [ "${allow_sensitive}" -ne 1 ]; then
37
+ case "${path}" in
38
+ *.ssh/*|*/.ssh/*|*.aws/credentials|*/.aws/credentials|*/.env|*.env|\
39
+ */credentials|*/credentials.json|*/secrets.json|*/private_key*|*/id_rsa*|\
40
+ */id_ed25519*|*/id_ecdsa*)
41
+ die "${EXIT_USAGE_ERROR}" "<verb>: path '${path}' matches a sensitive pattern; pass --allow-sensitive to override"
42
+ ;;
43
+ esac
44
+ fi
45
+
46
+ # 4. Realpath canonicalization (eliminates symlink games).
47
+ canonical_path="$(realpath "${path}" 2>/dev/null \
48
+ || readlink -f "${path}" 2>/dev/null \
49
+ || printf '%s' "${path}")"
50
+
51
+ # Forward the canonical path, not the user's input.
52
+ verb_argv+=(--path "${canonical_path}")
53
+ ```
54
+
55
+ Source of truth: `scripts/browser-upload.sh:74-103`.
56
+
57
+ ## Why each check exists
58
+
59
+ ### Check 1 — `[ -f "${path}" ]`
60
+
61
+ ```
62
+ WRONG — only check existence
63
+ [ -e "${path}" ] || die ...
64
+ # Passes for /dev/zero, /tmp/some-fifo, /etc — agent uploads garbage.
65
+ ```
66
+
67
+ `-f` rejects directories, character/block devices, FIFOs, and sockets. The downstream tool was promised "a file's bytes"; an attempt to read `/dev/zero` would either hang the agent or upload an arbitrary number of zero bytes.
68
+
69
+ ### Check 2 — `[ -r "${path}" ]`
70
+
71
+ Failing this check returns a clear UX error from the verb script. Skipping it pushes the failure down to the adapter, where the error message is whatever cdt-mcp / playwright happens to emit (often opaque, often surfaces a permissions dump).
72
+
73
+ ### Check 3 — Sensitive-pattern reject
74
+
75
+ ```
76
+ WRONG — trust the agent to know what they're doing
77
+ verb_argv+=(--path "${path}") # forward whatever the agent typed
78
+ ```
79
+
80
+ The agent can be tricked. A user pastes `--path ~/.ssh/id_rsa` into a browser-automation prompt and the agent obediently uploads it. Sensitive-pattern reject is the **default-deny**; `--allow-sensitive` is the typed acknowledgment that the agent saw the pattern and is uploading intentionally (e.g. uploading a GPG key to a keyserver).
81
+
82
+ The pattern list is intentionally short — it covers the **boring high-frequency** cases (SSH keys, AWS credentials, `.env` files). It is not a full DLP. Don't expand it to chase exotic filenames; that creates an arms race against tools-of-the-month.
83
+
84
+ ### Check 4 — Realpath canonicalization
85
+
86
+ ```
87
+ WRONG — forward agent input verbatim
88
+ verb_argv+=(--path "${path}")
89
+
90
+ # Then a symlink game beats the sensitive-pattern reject:
91
+ $ ln -s ~/.ssh/id_rsa /tmp/innocent.txt
92
+ $ verb --path /tmp/innocent.txt # passes step 3 (path doesn't match patterns)
93
+ # but actually uploads the SSH key
94
+ ```
95
+
96
+ `realpath` resolves the symlink; the canonical path becomes `~/.ssh/id_rsa`, which the sensitive-pattern check would have caught — but that check already ran. Two correct orderings exist:
97
+
98
+ - **Resolve THEN check** (safer; resolution can't be skipped). The scaffold above checks first and resolves afterwards because that's how `browser-upload.sh` shipped; on its own that order lets the symlink in the example through, so resolve-first is what to write next time.
99
+ - **Check then resolve, then re-check** (paranoid). Reasonable if you're worried about TOCTOU between the two operations.
100
+
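The resolve-then-check ordering can be sketched as below. This is an illustration, not a port of `browser-upload.sh`: the regexes only approximate the shell `case` globs from the scaffold, `checkPath` and `isSensitive` are hypothetical names, and `resolve` is an injected stand-in for `realpath` so the sketch has no filesystem side effects.

```javascript
// Regexes approximating the shell `case` globs in the scaffold (not an exact port).
const SENSITIVE_PATTERNS = [
  /\.ssh\//,             // *.ssh/* and */.ssh/*
  /\.aws\/credentials$/, // *.aws/credentials and */.aws/credentials
  /\.env$/,              // */.env and *.env
  /\/credentials$/,
  /\/credentials\.json$/,
  /\/secrets\.json$/,
  /\/private_key/,
  /\/id_rsa/,
  /\/id_ed25519/,
  /\/id_ecdsa/,
];

const isSensitive = (p) => SENSITIVE_PATTERNS.some((re) => re.test(p));

// `resolve` stands in for realpath and is supplied by the caller.
function checkPath(userPath, resolve, allowSensitive = false) {
  const canonical = resolve(userPath); // resolve FIRST, so symlinks are unmasked
  if (!allowSensitive && isSensitive(canonical)) {
    return { ok: false, reason: "sensitive", path: canonical };
  }
  return { ok: true, path: canonical }; // forward the canonical path, not the input
}
```

With a resolver that maps `/tmp/innocent.txt` to an SSH key path, `checkPath` rejects the request because the pattern match runs against the resolved path rather than the name the caller typed.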
101
+ Cross-platform fallback: macOS pre-Xcode-11 lacks `readlink -f` and may lack `realpath`. The chain `realpath || readlink -f || printf '%s'` gracefully degrades to verbatim path on the rare platform that has neither — at the cost of skipping symlink resolution on that platform. CI exercises both GNU (Linux) and BSD (macOS) realpath paths.
102
+
103
+ ## What's NOT this recipe's job
104
+
105
+ - **Encryption-at-rest of the file's contents.** That's a different layer (the user's filesystem, FileVault, LUKS, etc.).
106
+ - **Anti-malware scanning.** Verb is a thin transport, not a security product.
107
+ - **Quarantining files after upload.** Out of scope; the user owns the file.
108
+ - **Sandboxing the downstream tool.** That's the adapter's concern — sensitive-pattern reject + realpath stops *accidental* exfil; defense against a hostile downstream tool is the wrong threat model for this layer.
109
+
110
+ ## Test surface (already shipped for upload, copy for new verbs)
111
+
112
+ `tests/browser-upload.bats` cases worth porting:
113
+ - Path doesn't exist → `EXIT_USAGE_ERROR`.
114
+ - Path is a directory → `EXIT_USAGE_ERROR`.
115
+ - Path matches `~/.ssh/id_rsa` pattern → `EXIT_USAGE_ERROR` mentioning sensitive.
116
+ - Same path with `--allow-sensitive` → success (dry-run).
117
+ - Symlink-to-sensitive resolved by realpath → still rejected.
118
+ - Symlink-to-innocent resolved by realpath → success, canonical path forwarded.
119
+
120
+ ## Checklist for any new path-accepting verb
121
+
122
+ ```
123
+ 1. Verb takes --path PATH and (if writes) maybe --out PATH.
124
+ 2. Add --allow-sensitive flag (default 0; typed ack).
125
+ 3. Inline the four-step block from this recipe between argv parsing and
126
+ adapter dispatch. Replace `<verb>` with the verb name in error strings.
127
+ 4. Forward the CANONICAL path (post-realpath), not the user's input.
128
+ 5. Test cases: missing / dir / unreadable / sensitive-rejected /
129
+ sensitive-allowed / symlink-to-sensitive / symlink-to-innocent.
130
+ 6. CHANGELOG entry with [security] tag if this is a new attack surface.
131
+ ```
132
+
133
+ ## See also
134
+
135
+ - `scripts/browser-upload.sh:74-103` — the source-of-truth implementation.
136
+ - `tests/browser-upload.bats` — test cases to port.
137
+ - [Privacy canary recipe](privacy-canary.md) — sister pattern for credential bytes.
138
+ - [Body-bytes-not-body recipe](body-bytes-not-body.md) — sister pattern for content bodies.
@@ -0,0 +1,96 @@
1
+ # Recipe: Privacy canary
2
+
3
+ A sentinel-byte regression test for any verb that ingests caller-supplied secrets (passwords, tokens, TOTP shared-secrets, session storage state). Detects the day a refactor accidentally re-emits the secret on stdout, in a log line, or inside a JSON reply.
4
+
5
+ ## When to use this recipe
6
+
7
+ Use this **whenever you add a verb that reads a secret via stdin** (the AP-7 pattern — see `anti-patterns-tool-extension.md::AP-7`). Examples already shipped:
8
+
9
+ - `tests/creds-show.bats::49` — `creds show` invariant.
10
+ - `tests/creds-migrate.bats::124` — backend transfer mustn't echo.
11
+ - `tests/creds-rotate-totp.bats::99` — TOTP shared-secret roundtrip.
12
+ - `tests/chrome-devtools-mcp_daemon_e2e.bats::140` — `fill --secret-stdin` end-to-end.
13
+
14
+ Do NOT use this recipe for:
15
+ - Verbs that don't ingest secrets (the canary has nothing to detect).
16
+ - Verbs whose only "secret" is something the agent typed and is happy to read back (e.g. `route fulfill --body` — see `body-bytes-not-body.md` instead; the body is content, not a credential).
17
+
18
+ ## The pattern
19
+
20
+ ```
21
+ WRONG — assert "secret didn't appear in some specific field"
22
+ @test "fill --secret-stdin: reply has no .text key" {
23
+ printf 'pw' | bash browser-fill.sh --ref e1 --secret-stdin
24
+ printf '%s' "$output" | jq -e '.text == null' >/dev/null
25
+ }
26
+ # Brittle: only catches a single regression mode (echoing in a known field).
27
+ # A new code path that puts the secret into a NEW field passes this test.
28
+ ```
29
+
30
+ ```
31
+ RIGHT — assert "this exact byte sequence does not appear ANYWHERE on stdout"
32
+ @test "fill --secret-stdin: privacy canary" {
33
+ CANARY="sekret-do-not-leak-XYZ"
34
+ run bash -c "printf '%s' '${CANARY}' | bash '${SCRIPTS_DIR}/browser-fill.sh' --ref e1 --secret-stdin"
35
+ assert_status 0
36
+ ! printf '%s' "${output}" | grep -qF -- "${CANARY}" \
37
+ || fail "skill stdout leaked the secret canary: ${CANARY}"
38
+ # Reply shape still correct (don't accept a "no output" pass).
39
+ printf '%s' "${output}" | jq -e '.verb == "fill" and .status == "ok"' >/dev/null
40
+ }
41
+ ```
42
+
43
+ The canary string MUST be:
44
+ - **Unique** to this test (so a grep can't accidentally match a real reply field). Embed the verb name and the test number: `sekret-do-not-leak-CDT-1c-ii`, `canary-creds-show-49`.
45
+ - **Long enough** that grep's failure mode is meaningful. ~10+ ASCII characters; shorter strings risk colliding with field names like `id` or `ok`.
46
+ - **Distinct** from the bytes the test injects elsewhere (don't reuse the canary as a username).
47
+
48
+ ## Why a sentinel beats field-shape assertions
49
+
50
+ Field-shape assertions catch the regression you predict; sentinels catch the regression you didn't predict. The sentinel test answers a different question:
51
+
52
+ > "Does **any** code path between stdin-read and stdout-write echo this byte sequence?"
53
+
54
+ That question is what the AP-7 invariant actually claims. Anchoring the test to a specific field reduces it to "does *this one path* echo," which is the strictly weaker claim.
55
+
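The difference between the two assertions can be shown on a single reply. A minimal sketch, assuming a hypothetical regression that copies the secret into a new `debug` field; `fieldShapeCheck`, `sentinelCheck`, and the reply contents are illustrative and not taken from the test suite.

```javascript
// Field-shape check: only notices the secret in the one field it was told about.
function fieldShapeCheck(replyJson) {
  return JSON.parse(replyJson).text == null;
}

// Sentinel check: passes only if the canary bytes appear nowhere in the raw output.
function sentinelCheck(rawOutput, canary) {
  return !rawOutput.includes(canary);
}

// Hypothetical regression: the secret leaks through a NEW field named `debug`.
const canary = "sekret-do-not-leak-fill-1";
const leakyReply = JSON.stringify({ verb: "fill", status: "ok", debug: canary });
const cleanReply = JSON.stringify({ verb: "fill", status: "ok" });

console.log({
  fieldShapeOnLeak: fieldShapeCheck(leakyReply),
  sentinelOnLeak: sentinelCheck(leakyReply, canary),
  sentinelOnClean: sentinelCheck(cleanReply, canary),
});
```

The field-shape check accepts the leaky reply because `.text` is still absent, while the sentinel check rejects it. A plain-ASCII canary also survives JSON encoding unchanged, so the substring test stays meaningful.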
56
+ ## Layered coverage: bash AND daemon
57
+
58
+ Verbs that go through the bridge daemon need **two** canaries:
59
+
60
+ ```
61
+ tests/<verb>.bats # bash-side canary (verb script -> adapter)
62
+ tests/<adapter>_daemon_e2e.bats # daemon-side canary (bridge -> daemon -> MCP)
63
+ ```
64
+
65
+ Each layer can independently leak — bridge could echo on its way to the daemon, or the daemon could echo back through IPC. The bash-side canary doesn't catch a daemon-side leak and vice versa.
66
+
67
+ Sample placement (already shipped): `fill --secret-stdin` has a canary in both `tests/browser-fill.bats` (bash) and `tests/chrome-devtools-mcp_daemon_e2e.bats:140` (daemon).
68
+
69
+ ## Don't grep the file system
70
+
71
+ ```
72
+ WRONG — assert canary doesn't appear in any state-dir file
73
+ grep -r "${CANARY}" "${BROWSER_SKILL_HOME}" && fail "leaked to disk"
74
+ ```
75
+
76
+ Some verbs **legitimately persist the secret** (creds-add writes to keychain/file backend; that's the whole point). Disk persistence is governed by the credential-backend test, not by the privacy canary. The privacy canary is exclusively about **stdout** — what the agent / Claude transcript sees.
77
+
78
+ ## Checklist for any new secret-ingesting verb
79
+
80
+ ```
81
+ 1. Pick a unique canary string (`sekret-do-not-leak-<verb>-<n>`).
82
+ 2. Pipe the canary in via the verb's --secret-stdin / --*-stdin flag.
83
+ 3. Capture stdout with `run bash -c '...'` (let bats own the buffer).
84
+ 4. Negative assert: `printf '%s' "$output" | grep -q "${CANARY}" && fail`.
85
+ 5. Positive assert: jq the reply shape so a "no output" run doesn't false-pass.
86
+ 6. If the verb routes through a daemon/bridge, add a SECOND canary at the
87
+ daemon-e2e layer. Each layer's stdout is separately observable.
88
+ 7. NEVER grep ${BROWSER_SKILL_HOME} for the canary — disk persistence is
89
+ the credential backend's responsibility, not this test's invariant.
90
+ ```
91
+
92
+ ## See also
93
+
94
+ - [Anti-patterns: tool extension `AP-7`](anti-patterns-tool-extension.md) — secrets-via-stdin invariant.
95
+ - [Body-bytes-not-body recipe](body-bytes-not-body.md) — sister pattern for non-secret caller-supplied content.
96
+ - `tests/argv_leak.bats` — the AP-7 enforcement test (catches secrets on argv).