browser-automation-skill 0.71.0
- package/LICENSE +21 -0
- package/README.md +144 -0
- package/SECURITY.md +39 -0
- package/SKILL.md +206 -0
- package/bin/cli.mjs +55 -0
- package/install.sh +143 -0
- package/package.json +54 -0
- package/references/adapter-candidates.md +40 -0
- package/references/browser-mcp-cheatsheet.md +132 -0
- package/references/browser-stats-cheatsheet.md +155 -0
- package/references/chrome-devtools-mcp-cheatsheet.md +232 -0
- package/references/midscene-integration.md +359 -0
- package/references/obscura-cheatsheet.md +103 -0
- package/references/playwright-cli-cheatsheet.md +64 -0
- package/references/playwright-lib-cheatsheet.md +90 -0
- package/references/recipes/add-a-tool-adapter.md +134 -0
- package/references/recipes/agent-workflows/README.md +37 -0
- package/references/recipes/agent-workflows/cache-driven-bulk-operation.md +110 -0
- package/references/recipes/agent-workflows/flow-record-and-replay.md +102 -0
- package/references/recipes/agent-workflows/incremental-pattern-discovery.md +125 -0
- package/references/recipes/agent-workflows/login-then-scrape.md +100 -0
- package/references/recipes/anti-patterns-tool-extension.md +182 -0
- package/references/recipes/body-bytes-not-body.md +139 -0
- package/references/recipes/cache-write-security.md +210 -0
- package/references/recipes/fingerprint-rescue.md +154 -0
- package/references/recipes/model-routing.md +143 -0
- package/references/recipes/path-security.md +138 -0
- package/references/recipes/privacy-canary.md +96 -0
- package/references/recipes/visual-rescue-hook.md +182 -0
- package/references/stats-prices.json +42 -0
- package/references/stats-schema.json +77 -0
- package/references/tool-versions.md +8 -0
- package/scripts/browser-add-site.sh +113 -0
- package/scripts/browser-assert.sh +106 -0
- package/scripts/browser-audit.sh +68 -0
- package/scripts/browser-baseline.sh +135 -0
- package/scripts/browser-click.sh +100 -0
- package/scripts/browser-creds-add.sh +254 -0
- package/scripts/browser-creds-list.sh +67 -0
- package/scripts/browser-creds-migrate.sh +122 -0
- package/scripts/browser-creds-remove.sh +69 -0
- package/scripts/browser-creds-rotate-totp.sh +109 -0
- package/scripts/browser-creds-show.sh +82 -0
- package/scripts/browser-creds-totp.sh +94 -0
- package/scripts/browser-do.sh +630 -0
- package/scripts/browser-doctor.sh +365 -0
- package/scripts/browser-drag.sh +90 -0
- package/scripts/browser-extract.sh +192 -0
- package/scripts/browser-fill.sh +142 -0
- package/scripts/browser-flow.sh +316 -0
- package/scripts/browser-history.sh +187 -0
- package/scripts/browser-hover.sh +92 -0
- package/scripts/browser-inspect.sh +188 -0
- package/scripts/browser-list-sessions.sh +78 -0
- package/scripts/browser-list-sites.sh +42 -0
- package/scripts/browser-login.sh +279 -0
- package/scripts/browser-mcp.sh +65 -0
- package/scripts/browser-migrate.sh +195 -0
- package/scripts/browser-open.sh +134 -0
- package/scripts/browser-press.sh +80 -0
- package/scripts/browser-remove-session.sh +72 -0
- package/scripts/browser-remove-site.sh +68 -0
- package/scripts/browser-replay.sh +206 -0
- package/scripts/browser-route.sh +174 -0
- package/scripts/browser-select.sh +122 -0
- package/scripts/browser-show-session.sh +57 -0
- package/scripts/browser-show-site.sh +37 -0
- package/scripts/browser-snapshot.sh +176 -0
- package/scripts/browser-stats.sh +522 -0
- package/scripts/browser-tab-close.sh +112 -0
- package/scripts/browser-tab-list.sh +70 -0
- package/scripts/browser-tab-switch.sh +111 -0
- package/scripts/browser-upload.sh +132 -0
- package/scripts/browser-use.sh +60 -0
- package/scripts/browser-vlm.sh +707 -0
- package/scripts/browser-wait.sh +97 -0
- package/scripts/install-git-hooks.sh +16 -0
- package/scripts/lib/capture.sh +356 -0
- package/scripts/lib/common.sh +262 -0
- package/scripts/lib/credential.sh +237 -0
- package/scripts/lib/fingerprint-rescue.js +123 -0
- package/scripts/lib/flow.sh +448 -0
- package/scripts/lib/flow_record.sh +210 -0
- package/scripts/lib/mask.sh +49 -0
- package/scripts/lib/memory.sh +427 -0
- package/scripts/lib/migrate.sh +390 -0
- package/scripts/lib/migrators/README.md +23 -0
- package/scripts/lib/migrators/memory/v1_to_v2.sh +15 -0
- package/scripts/lib/migrators/recent_urls/README.md +13 -0
- package/scripts/lib/migrators/stats/README.md +24 -0
- package/scripts/lib/node/chrome-devtools-bridge.mjs +1812 -0
- package/scripts/lib/node/mcp-server.mjs +531 -0
- package/scripts/lib/node/mcp-tools.json +68 -0
- package/scripts/lib/node/playwright-driver.mjs +1104 -0
- package/scripts/lib/node/totp-core.mjs +52 -0
- package/scripts/lib/node/totp.mjs +52 -0
- package/scripts/lib/node/url-pattern-cluster.mjs +102 -0
- package/scripts/lib/node/url-pattern-resolver.mjs +77 -0
- package/scripts/lib/output.sh +79 -0
- package/scripts/lib/router.sh +342 -0
- package/scripts/lib/sanitize.sh +107 -0
- package/scripts/lib/secret/keychain.sh +91 -0
- package/scripts/lib/secret/libsecret.sh +74 -0
- package/scripts/lib/secret/plaintext.sh +75 -0
- package/scripts/lib/secret_backend_select.sh +57 -0
- package/scripts/lib/session.sh +153 -0
- package/scripts/lib/site.sh +126 -0
- package/scripts/lib/stats.sh +419 -0
- package/scripts/lib/tool/.gitkeep +0 -0
- package/scripts/lib/tool/chrome-devtools-mcp.sh +349 -0
- package/scripts/lib/tool/obscura.sh +249 -0
- package/scripts/lib/tool/playwright-cli.sh +155 -0
- package/scripts/lib/tool/playwright-lib.sh +106 -0
- package/scripts/lib/verb_helpers.sh +222 -0
- package/scripts/lib/visual-rescue-default.sh +145 -0
- package/scripts/regenerate-docs.sh +99 -0
- package/uninstall.sh +51 -0
|
@@ -0,0 +1,154 @@
|
|
|
1
|
+
# Recipe — Phase 13 fingerprint rescue
|
|
2
|
+
|
|
3
|
+
**Use when**: a cached selector goes stale (class rename, id rename, minor DOM
|
|
4
|
+
reshuffle). The Phase-11 memory cache used to require 4 consecutive failures
|
|
5
|
+
+ an LLM re-resolve to recover. Phase 13 adds a *pre-LLM* rescue tier that
|
|
6
|
+
tries to find an equivalent element via weak-fingerprint similarity. If it
|
|
7
|
+
succeeds, the cache silently heals (selector overwritten, fail_count reset,
|
|
8
|
+
`self_heal_history[]` appended with `event:"rescued"`). If it fails, the
|
|
9
|
+
existing fail_count path runs unchanged.
|
|
10
|
+
|
|
11
|
+
**Inspired by**: Scrapling's adaptive selectors. *Not* a Scrapling adapter —
|
|
12
|
+
the algorithm is ported as ~150 LOC of Node-side scoring + bash glue. No
|
|
13
|
+
Python dependency added.
|
|
14
|
+
|
|
15
|
+
## When the rescue runs
|
|
16
|
+
|
|
17
|
+
Only on `browser-do --intent` cache-hit-then-fail with exit code
|
|
18
|
+
`EXIT_EMPTY_RESULT` (11) or `EXIT_ASSERTION_FAILED` (13). Environmental
|
|
19
|
+
failures (network 30, tool crash 42, timeout 43) skip the rescue — those
|
|
20
|
+
would poison the cache if counted.
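
The gate described above can be sketched as a small shell predicate. This is an illustration only, not the code in `browser-do.sh`: the function name `should_attempt_rescue` is hypothetical, and the numeric codes are the ones quoted in this section.

```bash
# Hypothetical helper (not from the package): returns 0 when the adapter's
# exit code is one this recipe says should trigger a rescue attempt.
should_attempt_rescue() {
  case "$1" in
    11|13) return 0 ;;  # EXIT_EMPTY_RESULT, EXIT_ASSERTION_FAILED
    *)     return 1 ;;  # success, or environmental failures such as 30, 42, 43
  esac
}

# Only selector-level failures qualify for a rescue attempt.
for rc in 11 13 30 42 43; do
  if should_attempt_rescue "${rc}"; then
    printf 'rc=%s: try rescue\n' "${rc}"
  else
    printf 'rc=%s: skip rescue\n' "${rc}"
  fi
done
```

Keeping the allow-list to the two selector-level codes means an unknown or new exit code defaults to "skip", which matches the stated goal of not poisoning the cache.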
|
|
21
|
+
|
|
22
|
+
```
cache hit → dispatch verb → adapter resolves 0 elements (rc=11)
                        │
                        ▼
            memory_fingerprint_rescue
                        │
        ┌───────────────┴───────────────┐
        ▼                               ▼
returns rescued_selector        returns "" (no match ≥ threshold)
        │                               │
        ▼                               ▼
retry verb with rescued         memory_record_failure (existing path)
        │                       fail_count++ → after 4 → disabled
retry succeeds?
        │
  ┌─────┴─────┐
  ▼           ▼
 yes          no
  │           │
memory_record_heal:   memory_record_failure (existing path)
  - overwrite selector
  - reset fail_count
  - bump success_count
  - append self_heal_history
  - emit stats event {rescued:true}
```
|
|
48
|
+
|
|
49
|
+
## Algorithm
|
|
50
|
+
|
|
51
|
+
1. **Parse cached selector → weak fingerprint** (bash + jq):
|
|
52
|
+
|
|
53
|
+
   ```
   "button.delete"      → { tag: "BUTTON", classes: ["delete"], attrs: {} }
   "#submit"            → { tag: "*",      classes: [],         attrs: {id: "submit"} }
   "form > input.email" → { tag: "FORM",   classes: ["email"],  attrs: {} }   (combinator stripped, so weak)
   ```
|
|
58
|
+
|
|
59
|
+
Combinators (`>`, `+`, `~`), pseudo-classes (`:hover`), attribute operators
|
|
60
|
+
(`^=`, `*=`) are not parsed. The fingerprint will simply be weaker and the
|
|
61
|
+
JS scorer will probably miss — caller falls through to LLM re-resolve.
|
|
62
|
+
|
|
63
|
+
2. **Inject scorer into the page** via `browser-extract --eval` (Phase 13 JS
|
|
64
|
+
file: `scripts/lib/fingerprint-rescue.js`). Constants `__FP` (the
|
|
65
|
+
fingerprint) and `__TH` (threshold) are prepended bash-side.
|
|
66
|
+
|
|
67
|
+
3. **Score each DOM element**:
|
|
68
|
+
|
|
69
|
+
   ```
   score = 0.4 × tag_match
         + 0.4 × jaccard(target.classes, candidate.classList)
         + 0.2 × jaccard(target.attrs, candidate.attributes)
   ```
|
|
74
|
+
|
|
75
|
+
4. **Synthesise selector for the best-scoring candidate** above threshold:
|
|
76
|
+
|
|
77
|
+
```
|
|
78
|
+
1. #id (when id is /^[A-Za-z][\w-]*$/ AND uniquely resolving)
|
|
79
|
+
2. [data-testid="…"] (preferred test-automation hook)
|
|
80
|
+
3. tag.class[.class…] (uniquely resolving)
|
|
81
|
+
4. nth-child path (absolute last-resort)
|
|
82
|
+
```
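
The weighted sum in step 3 can be reproduced outside the browser to get a feel for the numbers. The sketch below uses shell plus awk rather than the package's in-page JavaScript. `jaccard` and `fingerprint_score` are hypothetical names, sets are passed as space-separated tokens, and scoring two empty sets as 0 is an assumption the source does not state.

```bash
# Hypothetical helpers; the real scorer lives in scripts/lib/fingerprint-rescue.js.

# jaccard "a b c" "b c d" prints |A ∩ B| / |A ∪ B| with two decimals.
# Assumption: two empty sets score 0.00 (the source does not say).
jaccard() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, " "); nb = split(b, y, " ")
    for (i = 1; i <= na; i++) A[x[i]] = 1
    for (i = 1; i <= nb; i++) B[y[i]] = 1
    inter = 0; uni = 0
    for (k in A) { uni++; if (k in B) inter++ }
    for (k in B) if (!(k in A)) uni++
    if (uni == 0) { print "0.00"; exit }
    printf "%.2f\n", inter / uni
  }'
}

# fingerprint_score TAG_MATCH CLASS_SIM ATTR_SIM applies the 0.4/0.4/0.2 weights.
fingerprint_score() {
  LC_ALL=C awk -v t="$1" -v c="$2" -v a="$3" \
    'BEGIN { printf "%.2f\n", 0.4 * t + 0.4 * c + 0.2 * a }'
}

# Target "button.delete" vs a candidate <button class="delete btn-danger">:
# tag matches (1), class similarity is 1/2, attribute similarity taken as 0.
class_sim="$(jaccard "delete" "delete btn-danger")"
fingerprint_score 1 "${class_sim}" 0   # prints 0.60
```

Under these assumptions the candidate scores 0.60, which is below the default threshold, so a tag match plus a partial class overlap alone is not enough to trigger a heal.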
|
|
83
|
+
|
|
84
|
+
## Threshold
|
|
85
|
+
|
|
86
|
+
Default `0.70`. Override per-session:
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
BROWSER_DO_RESCUE_THRESHOLD=0.85 bash scripts/browser-do.sh --site myapp --verb click --intent "delete row" --pattern '/devices/:id'
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
- `0.70` (default) — Scrapling-like balance. Accepts moderate drift (class
|
|
93
|
+
rename) but rejects very-different candidates.
|
|
94
|
+
- `0.85` — conservative. Fewer false positives, more LLM round-trips on
|
|
95
|
+
borderline drift.
|
|
96
|
+
- `0.50` — permissive. More heals, but watch the `failure_mode=
|
|
97
|
+
wrong_element_acted` count in `browser-stats report` for false-positive
|
|
98
|
+
drift.
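
How the override interacts with a score can be sketched as follows. The function name `meets_rescue_threshold` is hypothetical. Only the variable `BROWSER_DO_RESCUE_THRESHOLD`, its `0.70` default, and the "≥ threshold" comparison come from this recipe.

```bash
# Hypothetical helper: exit 0 when SCORE meets the configured threshold.
# Falls back to the documented default of 0.70 when the variable is unset.
meets_rescue_threshold() {
  LC_ALL=C awk -v s="$1" -v t="${BROWSER_DO_RESCUE_THRESHOLD:-0.70}" \
    'BEGIN { exit !((s + 0) >= (t + 0)) }'
}

# A 0.60 candidate is rejected at the default but accepted at 0.50.
meets_rescue_threshold 0.60 || echo "0.60 rejected at default threshold"
( BROWSER_DO_RESCUE_THRESHOLD=0.50
  meets_rescue_threshold 0.60 && echo "0.60 accepted at threshold 0.50" )
```

The comparison runs in awk because POSIX shell arithmetic is integer-only.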
|
|
99
|
+
|
|
100
|
+
## Audit visibility
|
|
101
|
+
|
|
102
|
+
Each successful rescue emits a dedicated stats event:
|
|
103
|
+
|
|
104
|
+
```json
{
  "schema_version": 1,
  "ts": "2026-05-18T01:02:03.456Z",
  "gen_ai_tool_name": "browser-do.fingerprint_rescue",
  "verb": "do",
  "adapter_route": "browser-do",
  "outcome": "success",
  "rescued": true,
  "fingerprint_from_selector": "button.delete",
  "fingerprint_to_selector": "button[data-testid=delete-btn]"
}
```
|
|
117
|
+
|
|
118
|
+
Query the heal-rate:
|
|
119
|
+
|
|
120
|
+
```bash
|
|
121
|
+
bash scripts/browser-stats.sh report --verb do
|
|
122
|
+
# → look for "browser-do.fingerprint_rescue" rows + outcome=success share
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
## When NOT to use
|
|
126
|
+
|
|
127
|
+
- **Cross-page rescue.** The rescue only runs against the *currently-loaded*
|
|
128
|
+
DOM. If the verb already redirected and the cached selector is for the
|
|
129
|
+
prior page, rescue won't find it. (`navigation_mismatch` failure mode
|
|
130
|
+
instead.)
|
|
131
|
+
- **Identifier-free designs.** If the target element has no id, no
|
|
132
|
+
data-testid, no stable classes, and no unique nth-child position, the
|
|
133
|
+
synthesised selector will be brittle. Better to invest in a stable
|
|
134
|
+
test-automation hook than depend on rescue.
|
|
135
|
+
- **Heavy DOMs.** The scorer walks `document.querySelectorAll('*')` — O(n).
|
|
136
|
+
For pages with 10k+ DOM nodes the scoring will take >500ms. Consider
|
|
137
|
+
scoping with `document.querySelectorAll(target.tag.toLowerCase())` if you
|
|
138
|
+
hit this in practice (future v2 — the current implementation prioritises
|
|
139
|
+
recall over speed).
|
|
140
|
+
|
|
141
|
+
## Failure modes mapped (vs the Phase-12 stats audit)
|
|
142
|
+
|
|
143
|
+
| Outcome | failure_mode | rescued | Interpretation |
|---|---|---|---|
| Cache hit, adapter ok | null | null | Steady-state. No rescue ran. |
| Cache hit, adapter fail, rescue ok | null | true | Silent heal. Audit row: `gen_ai_tool_name=browser-do.fingerprint_rescue`. |
| Cache hit, adapter fail, rescue no-match | stale_ref (on the verb event) | null | Original fail_count++. Eventual LLM re-resolve. |
| Cache hit, adapter fail, rescue match-but-retry-fail | stale_ref (on the verb event) | false (future; currently null) | Algorithm picked wrong candidate. Track this metric to tune threshold. |
| Cache hit, adapter wrong-click | wrong_element_acted | true (false positive!) | Rescue scored a wrong element ≥ threshold. **Tune threshold up** if this rises. |
|
|
150
|
+
|
|
151
|
+
## Related
|
|
152
|
+
|
|
153
|
+
- Phase 11 `self_heal_history[]` lifecycle is preserved; see `scripts/lib/memory.sh::memory_record_failure` (enabled→disabled) + `memory_record` (disabled→enabled). Phase 13 adds the third event type: `event:"rescued"`.
|
|
154
|
+
- Phase 12 audit surface: `references/browser-stats-cheatsheet.md`.
|
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# Recipe: Model routing — three-tier strategy
|
|
2
|
+
|
|
3
|
+
When and how to route Claude model selection across the parent session, this skill, and (eventually) per-verb invocations. The default ships with `model: sonnet` + `effort: low` in `SKILL.md` frontmatter; this recipe explains why, when to override, and how to layer in `opusplan` / `/advisor` at the parent session level.
|
|
4
|
+
|
|
5
|
+
## When to use this recipe
|
|
6
|
+
|
|
7
|
+
Use this whenever:
|
|
8
|
+
- A user reports the skill seems to "make wrong choices" on complex flows (multi-step logins, ambiguous snapshots) — they may need to escalate from default Sonnet.
|
|
9
|
+
- Your session token bill on browser tasks is bigger than you'd like — verify the three-tier setup is in place.
|
|
10
|
+
- You're integrating this skill into a different host CLI (Codex, Cursor, Gemini CLI) — the model-routing primitives may differ.
|
|
11
|
+
|
|
12
|
+
Do NOT use this recipe for:
|
|
13
|
+
- Picking which Anthropic SDK to install. Model routing is consumer-side; the SDK is producer-side.
|
|
14
|
+
- Speed-of-response tuning. Use `effort:` (low/medium/high/xhigh/max), not `model:`, for that knob.
|
|
15
|
+
|
|
16
|
+
## The three tiers
|
|
17
|
+
|
|
18
|
+
| Tier | Where it lives | Default in this skill | What it controls |
|---|---|---|---|
| 1. Parent session | `/model` command, `~/.claude/settings.json::model`, env var `ANTHROPIC_MODEL` | (user's choice; recommended: `opusplan`) | The "thinking" model, used when Claude reasons about what verb to call, parses snapshots, plans next steps |
| 2. Skill turn | `model:` field in `SKILL.md` frontmatter | `sonnet` + `effort: low` | The "acting" model, used during the single turn that invokes the skill. Per-turn override; resumes Tier 1 on next prompt |
| 3. Per-verb (future) | (not yet supported) | n/a | Some verbs may need Opus reasoning (login flow auto-detect); most just shell out to bash |
|
|
23
|
+
|
|
24
|
+
## Tier 1: Parent session
|
|
25
|
+
|
|
26
|
+
### Recommended: `/model opusplan` (stable)
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
# In any Claude Code session:
|
|
30
|
+
/model opusplan
|
|
31
|
+
|
|
32
|
+
# Or persist as default in ~/.claude/settings.json:
|
|
33
|
+
{ "model": "opusplan" }
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
`opusplan` is a Claude Code built-in alias: **Opus during plan mode, Sonnet during execution mode**. Plan mode is entered with `shift+tab` or `/plan`; exited with `shift+tab` again. Plan-mode reasoning is where the heavy thinking happens (designing flows, deciding how to debug a failure, brainstorming a feature plan). Execution mode is where Claude calls bash, edits files, runs verbs — Sonnet is enough.
|
|
37
|
+
|
|
38
|
+
This is the **zero-risk** starting point. No beta header. No skill edits. Available everywhere Claude Code runs (Anthropic-direct, Bedrock, Vertex, Foundry — though `opus`/`sonnet` resolve to provider-pinned versions on third-party providers).
|
|
39
|
+
|
|
40
|
+
### Advanced: `/advisor` (experimental as of v2.1.x)
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
/advisor # toggle in current session
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
`/advisor` is the Claude Code surface for the [Advisor Tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool). The session model becomes the **executor** (Sonnet 4.6 by default; can pair Haiku 4.5 too); during generation the executor consults an **advisor** model (Opus 4.7) mid-stream when it hits decision points.
|
|
47
|
+
|
|
48
|
+
Mechanism (per [Advisor Tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool)):
|
|
49
|
+
1. Executor decides to consult — emits `server_tool_use { name: "advisor", input: {} }`.
|
|
50
|
+
2. Anthropic server runs Opus inference with the full transcript (no client orchestration).
|
|
51
|
+
3. Advisor returns ~400-700 token plan (~1,400-1,800 with thinking).
|
|
52
|
+
4. Executor continues, advice in context.
|
|
53
|
+
|
|
54
|
+
Cost economics:
|
|
55
|
+
- Executor (Sonnet) generates the bulk of output → billed at Sonnet rate ($3/$15 per 1M).
|
|
56
|
+
- Advisor (Opus) generates only advice tokens, billed at Opus rate ($5/$25 per 1M).
|
|
57
|
+
- Internal Anthropic benchmarks: "Sonnet executor at medium effort + Opus advisor → intelligence comparable to Sonnet at default effort, at lower cost."
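
A rough way to see the output-side economics is to price a single turn by hand. In the sketch below the token counts are invented for illustration and `token_cost` is a hypothetical helper. The per-million output rates are the ones quoted above, and input-token billing is deliberately left out because this section does not spell it out.

```bash
# token_cost TOKENS USD_PER_MILLION prints the cost in USD with four decimals.
token_cost() {
  LC_ALL=C awk -v n="$1" -v rate="$2" \
    'BEGIN { printf "%.4f\n", n * rate / 1000000 }'
}

# Hypothetical turn: 4,000 executor output tokens plus one 600-token advisor reply.
token_cost 4000 15   # executor output at the Sonnet rate: 0.0600
token_cost 600 25    # advisor output at the Opus rate:    0.0150
token_cost 4000 25   # the same 4,000 tokens at the Opus rate: 0.1000
```

With these made-up numbers, Sonnet output plus one advisor reply comes to $0.075 on the output side, against $0.10 for generating the whole turn on Opus.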
|
|
58
|
+
|
|
59
|
+
**Caveats:**
|
|
60
|
+
- Beta status. May change. May hit rate limits on the advisor sub-inference (`too_many_requests` error code; executor continues without the advice).
|
|
61
|
+
- Not yet on Bedrock/Vertex/Foundry — Anthropic-direct only.
|
|
62
|
+
- Advisor sub-inference doesn't stream — expect a pause when consultation fires.
|
|
63
|
+
- No built-in conv-level cap; if cost balloons, set `max_uses` per request or toggle `/advisor` off.
|
|
64
|
+
|
|
65
|
+
**When to add `/advisor`**: after `opusplan` proves out the cost-saving direction. If browser-automation flows show ad-hoc reasoning bottlenecks (Sonnet picking wrong refs, missing the right verb sequence), `/advisor` lets Sonnet ask Opus for a plan without paying full Opus rate for the whole turn.
|
|
66
|
+
|
|
67
|
+
## Tier 2: This skill's frontmatter
|
|
68
|
+
|
|
69
|
+
```yaml
|
|
70
|
+
---
|
|
71
|
+
name: browser-automation-skill
|
|
72
|
+
...
|
|
73
|
+
model: sonnet
|
|
74
|
+
effort: low
|
|
75
|
+
---
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Per [Claude Code skills docs](https://code.claude.com/docs/en/skills): "The override applies for the rest of the current turn and is not saved to settings; the session model resumes on your next prompt."
|
|
79
|
+
|
|
80
|
+
So when a user (or Claude auto-loading) invokes the skill:
|
|
81
|
+
- Skill turn = Sonnet + low effort
|
|
82
|
+
- Next prompt = back to parent session model (Tier 1: opusplan or whatever the user set)
|
|
83
|
+
|
|
84
|
+
**Why Sonnet, not Haiku.** Haiku 4.5 is ~3× cheaper than Sonnet 4.6 but has noticeably less robustness on multi-step verb chaining (snapshot → pick `eN` ref → fill → submit). Browser-automation flows have enough orchestration that Sonnet earns its 3× over Haiku. If a specific user's flows are simple (single-step extracts, dry-run-only), they can override per-skill via `~/.claude/settings.json::skillOverrides` (not currently a documented field for model — file an issue if you need this).
|
|
85
|
+
|
|
86
|
+
**Why `effort: low`.** Effort is independent of model. Sonnet at `effort: low` saves tokens vs Sonnet at default effort, with minimal capability loss for pure verb-driving (no deep reasoning needed — Claude already planned in the parent turn). If a flow regresses, bump to `effort: medium`.
|
|
87
|
+
|
|
88
|
+
**Override escape-hatch.** When a session demands Opus reasoning during the skill turn (debugging a complex login flow that Sonnet keeps mishandling), the user can:
|
|
89
|
+
|
|
90
|
+
```bash
|
|
91
|
+
# Override for the rest of the session — `inherit` keeps the parent model
|
|
92
|
+
/model opus # before invoking the skill
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
Or edit the skill's frontmatter to `model: inherit` permanently to disable the per-turn override (skill follows parent session model).
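
To check which model and effort a given `SKILL.md` currently requests, a small reader like the one below may help. It is a sketch, not part of the package: `frontmatter_field` is a hypothetical name, and it only understands the flat `key: value` frontmatter shown above, not full YAML.

```bash
# frontmatter_field FILE KEY prints the value of KEY from the leading
# '---' ... '---' block, or nothing when the key is absent.
frontmatter_field() {
  awk -v key="$2" '
    /^---[[:space:]]*$/ { fence++; if (fence == 2) exit; next }
    fence == 1 {
      prefix = key ":"
      if (index($0, prefix) == 1) {
        val = substr($0, length(prefix) + 1)
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", val)
        print val
        exit
      }
    }
  ' "$1"
}

# Usage against a throwaway file that mimics the frontmatter above.
sample="$(mktemp)"
printf '%s\n' '---' 'name: browser-automation-skill' 'model: sonnet' 'effort: low' '---' > "${sample}"
printf 'model=%s effort=%s\n' \
  "$(frontmatter_field "${sample}" model)" \
  "$(frontmatter_field "${sample}" effort)"
rm -f "${sample}"
```

An empty result for `model` means the field is absent from the frontmatter.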
|
|
96
|
+
|
|
97
|
+
## Tier 3: Per-verb (deferred)
|
|
98
|
+
|
|
99
|
+
Not currently supported by Claude Code's frontmatter model. If different verbs in this skill needed different models (e.g. `login --interactive` wants Opus for form-shape detection; `snapshot` wants Haiku for pure screen-scrape), the workaround would be:
|
|
100
|
+
|
|
101
|
+
- Split the skill into N skills, each with its own `SKILL.md` + `model:` field. Path: every verb script becomes a tiny standalone skill.
|
|
102
|
+
- Or use `Agent` tool from inside the skill body to spawn a subagent with a different `model:` parameter.
|
|
103
|
+
|
|
104
|
+
Not worth the structural complexity until multiple users report it. Track as open follow-up.
|
|
105
|
+
|
|
106
|
+
## How to verify the routing is working
|
|
107
|
+
|
|
108
|
+
```bash
|
|
109
|
+
# In Claude Code:
|
|
110
|
+
/status # shows current session model
|
|
111
|
+
/model # opens picker — verify opusplan or your choice is selected
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
After invoking a skill verb (e.g. `/browser-automation-skill snapshot`), `/status` may show that the active model briefly flipped to Sonnet (depending on how Claude Code surfaces per-turn overrides). The `usage.iterations[]` array in the API response shows the breakdown if you're driving via SDK.
|
|
115
|
+
|
|
116
|
+
For cost-per-session tracking, `/cost` (when available) summarizes token spend by model.
|
|
117
|
+
|
|
118
|
+
## Common failure modes
|
|
119
|
+
|
|
120
|
+
| Symptom | Likely cause | Fix |
|---|---|---|
| Skill picks wrong `eN` ref repeatedly | Sonnet at `effort: low` undercutting on snapshot interpretation | Bump skill frontmatter to `effort: medium` or override session-side with `/model opus` |
| `/advisor` toggle fails with "too_many_requests" | Advisor rate-limited on Opus | Toggle `/advisor` off; rely on opusplan only. Or wait + retry. |
| `model: opusplan` in SKILL.md doesn't activate plan mode | opusplan is plan-mode-state-aware; skill turn doesn't enter plan mode by itself | Use `model: sonnet` (current default); opusplan is a parent-session-level alias, not a skill-turn primitive |
| Cost still high after opusplan + skill model:sonnet | Most tokens going to non-skill turns (parent reasoning, file reads) | Profile with `/cost`; if parent-side dominates, that's where to optimize next |
|
|
126
|
+
|
|
127
|
+
## Recommended setup for new users
|
|
128
|
+
|
|
129
|
+
```
|
|
130
|
+
1. Run `claude update` to get v2.1.x (advisor support).
|
|
131
|
+
2. Run `/model opusplan` once — persists across sessions.
|
|
132
|
+
3. (Optional) Run `/advisor` to enable advisor consultation.
|
|
133
|
+
4. Use this skill — frontmatter already routes the skill turn to Sonnet + low effort.
|
|
134
|
+
5. Watch token usage via `/cost` over a few sessions; tune effort if needed.
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
## See also
|
|
138
|
+
|
|
139
|
+
- [Claude Code Model configuration (`opusplan`)](https://code.claude.com/docs/en/model-config)
|
|
140
|
+
- [Skills frontmatter reference (`model`/`effort`)](https://code.claude.com/docs/en/skills)
|
|
141
|
+
- [Advisor Tool — Claude API Docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool)
|
|
142
|
+
- [Anthropic API pricing](https://platform.claude.com/docs/en/about-claude/pricing)
|
|
143
|
+
- Sister recipes: [privacy-canary.md](privacy-canary.md), [path-security.md](path-security.md), [body-bytes-not-body.md](body-bytes-not-body.md)
|
|
@@ -0,0 +1,138 @@
|
|
|
1
|
+
# Recipe: Path security
|
|
2
|
+
|
|
3
|
+
For any verb that takes a `--path` argument and forwards bytes from disk to a downstream tool (browser, MCP server, network upstream). Establishes the four checks the verb owes its caller, all enforced bash-side BEFORE the adapter dispatches.
|
|
4
|
+
|
|
5
|
+
## When to use this recipe
|
|
6
|
+
|
|
7
|
+
Use this whenever a verb accepts a filesystem path and acts on its contents. Already shipped: `scripts/browser-upload.sh` (Phase 6 part 6).
|
|
8
|
+
|
|
9
|
+
Phase 7 capture-pipeline work will reuse this for any verb that writes capture artifacts to a caller-specified location (`--out PATH`).
|
|
10
|
+
|
|
11
|
+
Do NOT use this recipe for:
|
|
12
|
+
- Paths the **framework owns** (e.g. `${BROWSER_SKILL_HOME}/sessions/<name>.json`). Trust boundary doesn't apply — those paths are constructed, not accepted.
|
|
13
|
+
- Read-only metadata commands (`show-site --name X` — name isn't a path).
|
|
14
|
+
|
|
15
|
+
## The four checks (in order)
|
|
16
|
+
|
|
17
|
+
```bash
# scripts/browser-<verb>.sh: paste-ready scaffold
[ -n "${path}" ] || die "${EXIT_USAGE_ERROR}" "<verb> requires --path PATH"

# 1. Existence + regular-file check.
#    Rejects: missing files, directories, devices, FIFOs, sockets.
if [ ! -e "${path}" ]; then
  die "${EXIT_USAGE_ERROR}" "<verb>: path does not exist: ${path}"
fi
if [ ! -f "${path}" ]; then
  die "${EXIT_USAGE_ERROR}" "<verb>: path is not a regular file: ${path}"
fi

# 2. Readability check (current user, current process).
if [ ! -r "${path}" ]; then
  die "${EXIT_USAGE_ERROR}" "<verb>: path is not readable by the current user: ${path}"
fi

# 3. Sensitive-pattern reject. Override with --allow-sensitive (typed ack).
if [ "${allow_sensitive}" -ne 1 ]; then
  case "${path}" in
    *.ssh/*|*/.ssh/*|*.aws/credentials|*/.aws/credentials|*/.env|*.env|\
    */credentials|*/credentials.json|*/secrets.json|*/private_key*|*/id_rsa*|\
    */id_ed25519*|*/id_ecdsa*)
      die "${EXIT_USAGE_ERROR}" "<verb>: path '${path}' matches a sensitive pattern; pass --allow-sensitive to override"
      ;;
  esac
fi

# 4. Realpath canonicalization (eliminates symlink games).
canonical_path="$(realpath "${path}" 2>/dev/null \
  || readlink -f "${path}" 2>/dev/null \
  || printf '%s' "${path}")"

# Forward the canonical path, not the user's input.
verb_argv+=(--path "${canonical_path}")
```
|
|
54
|
+
|
|
55
|
+
Source of truth: `scripts/browser-upload.sh:74-103`.
|
|
56
|
+
|
|
57
|
+
## Why each check exists
|
|
58
|
+
|
|
59
|
+
### Check 1 — `[ -f "${path}" ]`
|
|
60
|
+
|
|
61
|
+
```
|
|
62
|
+
WRONG — only check existence
|
|
63
|
+
[ -e "${path}" ] || die ...
|
|
64
|
+
# Passes for /dev/zero, /tmp/some-fifo, /etc — agent uploads garbage.
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
`-f` rejects directories, character/block devices, FIFOs, and sockets. The downstream tool was promised "a file's bytes"; an attempt to read `/dev/zero` would either hang the agent or upload an arbitrary number of zero bytes.
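
The difference between the two tests is easy to observe directly. The helper name `classify_path` is made up for this illustration. `-e` and `-f` are the standard `test` operators used in the scaffold.

```bash
# classify_path PATH prints how the existence and regular-file tests see it.
classify_path() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -f "$1" ]; then
    echo "regular file"
  else
    echo "exists but not a regular file"
  fi
}

classify_path /dev/null   # a character device: passes -e, fails -f
classify_path /           # a directory: passes -e, fails -f
```

Both paths would pass an existence-only guard, which is why the scaffold adds the `-f` test.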
|
|
68
|
+
|
|
69
|
+
### Check 2 — `[ -r "${path}" ]`
|
|
70
|
+
|
|
71
|
+
Failing this check returns a clear UX error from the verb script. Skipping it pushes the failure down to the adapter, where the error message is whatever cdt-mcp / playwright happens to emit (often opaque, often surfaces a permissions dump).
|
|
72
|
+
|
|
73
|
+
### Check 3 — Sensitive-pattern reject
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
WRONG — trust the agent to know what they're doing
|
|
77
|
+
verb_argv+=(--path "${path}") # forward whatever the agent typed
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
The agent can be tricked. A user pastes `--path ~/.ssh/id_rsa` into a browser-automation prompt and the agent obediently uploads it. Sensitive-pattern reject is the **default-deny**; `--allow-sensitive` is the typed acknowledgment that the agent saw the pattern and is uploading intentionally (e.g. uploading a GPG key to a keyserver).
|
|
81
|
+
|
|
82
|
+
The pattern list is intentionally short — it covers the **boring high-frequency** cases (SSH keys, AWS credentials, `.env` files). It is not a full DLP. Don't expand it to chase exotic filenames; that creates an arms race against tools-of-the-month.
|
|
83
|
+
|
|
84
|
+
### Check 4 — Realpath canonicalization
|
|
85
|
+
|
|
86
|
+
```
|
|
87
|
+
WRONG — forward agent input verbatim
|
|
88
|
+
verb_argv+=(--path "${path}")
|
|
89
|
+
|
|
90
|
+
# Then a symlink game beats the sensitive-pattern reject:
|
|
91
|
+
$ ln -s ~/.ssh/id_rsa /tmp/innocent.txt
|
|
92
|
+
$ verb --path /tmp/innocent.txt # passes step 3 (path doesn't match patterns)
|
|
93
|
+
# but actually uploads the SSH key
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
`realpath` resolves the symlink; the canonical path becomes `~/.ssh/id_rsa`, which the sensitive-pattern check would have caught — but that check already ran. Two correct orderings exist:
|
|
97
|
+
|
|
98
|
+
- **Resolve THEN check** (safer; resolution can't be skipped). Order in the recipe is "check then resolve" because that's how `browser-upload.sh` shipped; both work, but resolve-first is what to write next time.
|
|
99
|
+
- **Check then resolve, then re-check** (paranoid). Reasonable if you're worried about TOCTOU between the two operations.
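
The resolve-first ordering can be sketched as below. This is not the shipped `browser-upload.sh` code: the function names `is_sensitive_path`, `resolve_path`, and `check_resolved_path` are hypothetical. The pattern list and the `realpath || readlink -f || printf` chain are copied from the scaffold above.

```bash
# Returns 0 when PATH matches one of the recipe's sensitive patterns.
is_sensitive_path() {
  case "$1" in
    *.ssh/*|*/.ssh/*|*.aws/credentials|*/.aws/credentials|*/.env|*.env|\
    */credentials|*/credentials.json|*/secrets.json|*/private_key*|*/id_rsa*|\
    */id_ed25519*|*/id_ecdsa*) return 0 ;;
    *) return 1 ;;
  esac
}

# Canonicalize; degrades to the verbatim path if neither tool is available.
resolve_path() {
  realpath "$1" 2>/dev/null || readlink -f "$1" 2>/dev/null || printf '%s\n' "$1"
}

# Resolve FIRST, then pattern-check the canonical path. Prints the canonical
# path on success; returns 1 (with a message on stderr) on a sensitive match.
check_resolved_path() {
  canonical="$(resolve_path "$1")"
  if is_sensitive_path "${canonical}"; then
    printf 'rejected: %s matches a sensitive pattern\n' "${canonical}" >&2
    return 1
  fi
  printf '%s\n' "${canonical}"
}

# Demo: the symlink name looks innocent, but its target does not.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/.ssh"
: > "${demo_dir}/.ssh/id_rsa"
ln -s "${demo_dir}/.ssh/id_rsa" "${demo_dir}/innocent.txt"
check_resolved_path "${demo_dir}/innocent.txt" || echo "blocked after resolution"
rm -rf "${demo_dir}"
```

Because the pattern check only ever sees the canonical path, the symlink trick shown earlier no longer gets through, provided `realpath` or `readlink -f` exists on the platform.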
|
|
100
|
+
|
|
101
|
+
Cross-platform fallback: macOS pre-Xcode-11 lacks `readlink -f` and may lack `realpath`. The chain `realpath || readlink -f || printf '%s'` gracefully degrades to verbatim path on the rare platform that has neither — at the cost of skipping symlink resolution on that platform. CI exercises both GNU (Linux) and BSD (macOS) realpath paths.
|
|
102
|
+
|
|
103
|
+
## What's NOT this recipe's job
|
|
104
|
+
|
|
105
|
+
- **Encryption-at-rest of the file's contents.** That's a different layer (the user's filesystem, FileVault, LUKS, etc.).
|
|
106
|
+
- **Anti-malware scanning.** Verb is a thin transport, not a security product.
|
|
107
|
+
- **Quarantining files after upload.** Out of scope; the user owns the file.
|
|
108
|
+
- **Sandboxing the downstream tool.** That's the adapter's concern — sensitive-pattern reject + realpath stops *accidental* exfil; defense against a hostile downstream tool is the wrong threat model for this layer.
|
|
109
|
+
|
|
110
|
+
## Test surface (already shipped for upload, copy for new verbs)
|
|
111
|
+
|
|
112
|
+
`tests/browser-upload.bats` cases worth porting:
|
|
113
|
+
- Path doesn't exist → `EXIT_USAGE_ERROR`.
|
|
114
|
+
- Path is a directory → `EXIT_USAGE_ERROR`.
|
|
115
|
+
- Path matches `~/.ssh/id_rsa` pattern → `EXIT_USAGE_ERROR` mentioning sensitive.
|
|
116
|
+
- Same path with `--allow-sensitive` → success (dry-run).
|
|
117
|
+
- Symlink-to-sensitive resolved by realpath → still rejected.
|
|
118
|
+
- Symlink-to-innocent resolved by realpath → success, canonical path forwarded.
|
|
119
|
+
|
|
120
|
+
## Checklist for any new path-accepting verb
|
|
121
|
+
|
|
122
|
+
```
|
|
123
|
+
1. Verb takes --path PATH and (if writes) maybe --out PATH.
|
|
124
|
+
2. Add --allow-sensitive flag (default 0; typed ack).
|
|
125
|
+
3. Inline the four-step block from this recipe between argv parsing and
|
|
126
|
+
adapter dispatch. Replace `<verb>` with the verb name in error strings.
|
|
127
|
+
4. Forward the CANONICAL path (post-realpath), not the user's input.
|
|
128
|
+
5. Test cases: missing / dir / unreadable / sensitive-rejected /
|
|
129
|
+
sensitive-allowed / symlink-to-sensitive / symlink-to-innocent.
|
|
130
|
+
6. CHANGELOG entry with [security] tag if this is a new attack surface.
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
## See also
|
|
134
|
+
|
|
135
|
+
- `scripts/browser-upload.sh:74-103` — the source-of-truth implementation.
|
|
136
|
+
- `tests/browser-upload.bats` — test cases to port.
|
|
137
|
+
- [Privacy canary recipe](privacy-canary.md) — sister pattern for credential bytes.
|
|
138
|
+
- [Body-bytes-not-body recipe](body-bytes-not-body.md) — sister pattern for content bodies.
|
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
# Recipe: Privacy canary
|
|
2
|
+
|
|
3
|
+
A sentinel-byte regression test for any verb that ingests caller-supplied secrets (passwords, tokens, TOTP shared-secrets, session storage state). Detects the day a refactor accidentally re-emits the secret on stdout, in a log line, or inside a JSON reply.
|
|
4
|
+
|
|
5
|
+
## When to use this recipe
|
|
6
|
+
|
|
7
|
+
Use this **whenever you add a verb that reads a secret via stdin** (the AP-7 pattern — see `anti-patterns-tool-extension.md::AP-7`). Examples already shipped:
|
|
8
|
+
|
|
9
|
+
- `tests/creds-show.bats::49` — `creds show` invariant.
|
|
10
|
+
- `tests/creds-migrate.bats::124` — backend transfer mustn't echo.
|
|
11
|
+
- `tests/creds-rotate-totp.bats::99` — TOTP shared-secret roundtrip.
|
|
12
|
+
- `tests/chrome-devtools-mcp_daemon_e2e.bats::140` — `fill --secret-stdin` end-to-end.
|
|
13
|
+
|
|
14
|
+
Do NOT use this recipe for:
|
|
15
|
+
- Verbs that don't ingest secrets (the canary has nothing to detect).
|
|
16
|
+
- Verbs whose only "secret" is something the agent typed and is happy to read back (e.g. `route fulfill --body` — see `body-bytes-not-body.md` instead; the body is content, not a credential).
|
|
17
|
+
|
|
18
|
+
## The pattern
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
# WRONG: assert "secret didn't appear in some specific field"
|
|
22
|
+
@test "fill --secret-stdin: reply has no .text key" {
|
|
23
|
+
run bash -c "printf 'pw' | bash browser-fill.sh --ref e1 --secret-stdin"
|
|
24
|
+
printf '%s' "$output" | jq -e '.text == null' >/dev/null
|
|
25
|
+
}
|
|
26
|
+
# Brittle: only catches a single regression mode (echoing in a known field).
|
|
27
|
+
# A new code path that puts the secret into a NEW field passes this test.
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
# RIGHT: assert "this exact byte sequence does not appear ANYWHERE on stdout"
|
|
32
|
+
@test "fill --secret-stdin: privacy canary" {
|
|
33
|
+
CANARY="sekret-do-not-leak-XYZ"
|
|
34
|
+
run bash -c "printf '%s' '${CANARY}' | bash '${SCRIPTS_DIR}/browser-fill.sh' --ref e1 --secret-stdin"
|
|
35
|
+
assert_status 0
|
|
36
|
+
if printf '%s' "${output}" | grep -qF -- "${CANARY}"; then
|
|
37
|
+
fail "skill stdout leaked the secret canary: ${CANARY}"; fi
|
|
38
|
+
# Reply shape still correct (don't accept a "no output" pass).
|
|
39
|
+
printf '%s' "${output}" | jq -e '.verb == "fill" and .status == "ok"' >/dev/null
|
|
40
|
+
}
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
The canary string MUST be:
|
|
44
|
+
- **Unique** to this test (so a grep can't accidentally match a real reply field). Embed the verb name and the test number: `sekret-do-not-leak-CDT-1c-ii`, `canary-creds-show-49`.
|
|
45
|
+
- **Long enough** that a match cannot be coincidental: at least 10 ASCII characters. Shorter strings risk colliding with field names like `id` or `ok`.
|
|
46
|
+
- **Distinct** from the bytes the test injects elsewhere (don't reuse the canary as a username).
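
These three rules can be checked mechanically before the canary is used. The helpers below are a hedged sketch: `make_canary` and `canary_is_usable` are hypothetical names, not part of the shipped test helpers.

```shell
#!/usr/bin/env bash
# Hypothetical: build a canary from the verb name and the test number.
make_canary() {
  printf 'sekret-do-not-leak-%s-%s' "$1" "$2"
}

# Hypothetical: succeed only if the canary is long enough and does not
# occur inside any other value the test injects (usernames, refs, ...).
canary_is_usable() {
  local canary="$1"; shift
  (( ${#canary} >= 10 )) || return 1
  local other
  for other in "$@"; do
    [[ "$other" != *"$canary"* ]] || return 1
  done
}
```

For example, `canary_is_usable "$(make_canary fill 1)" "alice"` succeeds, while passing the canary itself as the second argument (reusing it as a username) fails.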
|
|
47
|
+
|
|
48
|
+
## Why a sentinel beats field-shape assertions
|
|
49
|
+
|
|
50
|
+
Field-shape assertions catch the regression you predict; sentinels catch the regression you didn't predict. The sentinel test answers a different question:
|
|
51
|
+
|
|
52
|
+
> "Does **any** code path between stdin-read and stdout-write echo this byte sequence?"
|
|
53
|
+
|
|
54
|
+
That question is what the AP-7 invariant actually claims. Anchoring the test to a specific field reduces it to "does *this one path* echo," which is a strictly weaker claim.
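
A small stand-alone demonstration of the difference, outside bats. All names are hypothetical: `leaky_verb` stands in for a verb after a bad refactor that echoes the secret into a new field, and the two check functions mimic the field-shape and sentinel assertions with plain `grep`.

```shell
#!/usr/bin/env bash
# Hypothetical verb after a refactor: the secret lands in a NEW field.
leaky_verb() {
  local secret
  IFS= read -r secret || true   # input may lack a trailing newline
  printf '{"verb":"fill","status":"ok","debug_value":"%s"}\n' "$secret"
}

# Field-shape check: passes when the reply has no "text" key.
field_shape_check() {
  ! printf '%s' "$1" | grep -q '"text"'
}

# Sentinel check: passes when the canary bytes appear nowhere in the reply.
sentinel_check() {
  ! printf '%s' "$1" | grep -qF -- "$2"
}
```

Feeding a canary through `leaky_verb` produces a reply that `field_shape_check` accepts and `sentinel_check` rejects, which is exactly the unpredicted regression the sentinel is meant to catch.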
|
|
55
|
+
|
|
56
|
+
## Layered coverage: bash AND daemon
|
|
57
|
+
|
|
58
|
+
Verbs that go through the bridge daemon need **two** canaries:
|
|
59
|
+
|
|
60
|
+
```
|
|
61
|
+
tests/<verb>.bats # bash-side canary (verb script -> adapter)
|
|
62
|
+
tests/<adapter>_daemon_e2e.bats # daemon-side canary (bridge -> daemon -> MCP)
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Each layer can leak independently: the bridge could echo on its way to the daemon, or the daemon could echo back through IPC. The bash-side canary does not catch a daemon-side leak, and vice versa.
|
|
66
|
+
|
|
67
|
+
Sample placement (already shipped): `fill --secret-stdin` has a canary in both `tests/browser-fill.bats` (bash) and `tests/chrome-devtools-mcp_daemon_e2e.bats:140` (daemon).
|
|
68
|
+
|
|
69
|
+
## Don't grep the file system
|
|
70
|
+
|
|
71
|
+
```
|
|
72
|
+
# WRONG: assert canary doesn't appear in any state-dir file
|
|
73
|
+
grep -r "${CANARY}" "${BROWSER_SKILL_HOME}" && fail "leaked to disk"
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Some verbs **legitimately persist the secret** (creds-add writes to keychain/file backend; that's the whole point). Disk persistence is governed by the credential-backend test, not by the privacy canary. The privacy canary is exclusively about **stdout** — what the agent / Claude transcript sees.
|
|
77
|
+
|
|
78
|
+
## Checklist for any new secret-ingesting verb
|
|
79
|
+
|
|
80
|
+
```
|
|
81
|
+
1. Pick a unique canary string (`sekret-do-not-leak-<verb>-<n>`).
|
|
82
|
+
2. Pipe the canary in via the verb's --secret-stdin / --*-stdin flag.
|
|
83
|
+
3. Capture stdout with `run bash -c '...'` (let bats own the buffer).
|
|
84
|
+
4. Negative assert: `printf '%s' "$output" | grep -q "${CANARY}" && fail`.
|
|
85
|
+
5. Positive assert: jq the reply shape so a "no output" run doesn't false-pass.
|
|
86
|
+
6. If the verb routes through a daemon/bridge, add a SECOND canary at the
|
|
87
|
+
daemon-e2e layer. Each layer's stdout is separately observable.
|
|
88
|
+
7. NEVER grep ${BROWSER_SKILL_HOME} for the canary — disk persistence is
|
|
89
|
+
the credential backend's responsibility, not this test's invariant.
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
## See also
|
|
93
|
+
|
|
94
|
+
- [Anti-patterns: tool extension `AP-7`](anti-patterns-tool-extension.md) — secrets-via-stdin invariant.
|
|
95
|
+
- [Body-bytes-not-body recipe](body-bytes-not-body.md) — sister pattern for non-secret caller-supplied content.
|
|
96
|
+
- `tests/argv_leak.bats` — the AP-7 enforcement test (catches secrets on argv).
|