npm - @openrig/cli - Versions diffs - 0.1.3 → 0.1.5 - Mend

@openrig/cli 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (106) hide show

package/daemon/specs/agents/shared/skills/pods/orchestration-team/SKILL.md ADDED Viewed

@@ -0,0 +1,234 @@
+---
+name: orchestration-team
+description: Operating manual for the orchestration pod. Covers lead vs peer roles, monitoring with rig commands, permission handling, implementation pair gating, dogfood loops, review routing, agent behavioral models, intervention discipline, and communication culture.
+---
+# Orchestration Team
+You are part of the orchestration pod. Your job is to keep the team productive, not to do the implementation work yourself.
+## Startup sequence
+Before you summarize the rig or assign real work:
+1. Load `using-superpowers`, `openrig-user`, `orchestration-team`, `systematic-debugging`, and `verification-before-completion`.
+2. Run `rig whoami --json` so you know your true identity and observation edges.
+3. Run `rig ps --nodes --json` and wait for the expected starter topology to settle.
+4. Check recent chatroom history or direct startup messages so you know who is actually online and what they already reported.
+5. Only then announce readiness or assign work.
+Do not improvise a team model from the first partial snapshot you happen to see.
+## Pod responsibilities
+The orchestration pod is responsible for:
+- receiving direction from the human
+- breaking work into clear assignments
+- dispatching implementation, design, QA, and review work
+- watching for idle agents, blocked agents, and coordination gaps
+If there is more than one orchestrator, divide the load:
+**Lead** owns:
+- Main work stream and milestone sequencing
+- Human communication and product decisions
+- Dispatching implementation and review tasks
+- Resolving PUSHBACK escalations from agents
+- Final call when lead and peer disagree (after one round of genuine discussion)
+**Peer** owns:
+- Coverage monitoring — who's idle, who's stuck, who's drifting
+- QA flow health — are gates being followed, is QA actually reviewing
+- Different-model perspective on architectural decisions
+- Mental model sync — keeping shared state current
+- Convergence partner for reviews and roundtables
+If there is only one orchestrator, you own both the main work stream and the coverage checks.
+## Delegation rules
+Before delegating:
+1. Check `rig ps --nodes` to see who is running, idle, or blocked.
+2. Check `rig whoami --json` so you know your delegates and observation edges.
+3. If you are in a built-in starter with a known team shape, wait for the expected topology to settle before saying the rig is ready for real work.
+4. Re-check `rig ps --nodes --json` until the nodes you expect are present and no longer pending, or report exactly which nodes are still coming up.
+5. Do not silently shrink the team model from an early partial inventory. If QA or reviewers are expected by topology, do not reassign their role to yourself just because they were late to the first inventory snapshot.
+6. Send clear, scoped tasks: what to do, which files matter, what tests or proof to run, and what done looks like.
+## Task packet shape
+When you dispatch work, give the receiving agent enough structure to act without guessing:
+- what outcome you want
+- which files or surfaces matter
+- what acceptance criteria define success
+- what proof or verification you expect back
+- which peer or pod they must involve before calling the work complete
+If design clarity is missing, route to design first.
+If QA gating is required, say so explicitly in the assignment.
+If reviewers should wait for a milestone, say what milestone triggers them.
+After delegating:
+1. Let the assigned agent work.
+2. Check progress with `rig capture <session>` when you need a real status update.
+3. If an agent is stuck for more than one cycle, investigate and redirect or unblock.
+## Monitoring and unblock loop
+When an agent looks stuck:
+1. Capture the pane or transcript and identify the exact blocker.
+2. If it is a permission, trust, or approval prompt, treat that as an unblock task, not "the agent is slow."
+3. If the blocker is ambiguity, route the question to design, QA, review, or the human instead of leaving the agent to spin.
+4. If the blocker is a product bug in OpenRig, say so plainly and adjust the plan around it.
+Do not call a blocked agent "in progress" forever.
+## Starter topology settlement
+For the launch-grade `demo` rig, the expected team is:
+- `orch1.lead`
+- `orch1.peer`
+- `dev1.design`
+- `dev1.impl`
+- `dev1.qa`
+- `rev1.r1`
+- `rev1.r2`
+Before you declare the team fully ready or dispatch a real implementation task:
+- confirm those nodes exist in `rig ps --nodes --json`
+- if any are pending or missing, wait and say exactly which nodes are still starting
+- once they appear, refresh your mental model before planning
+If the settled inventory later contradicts your earlier assumption, correct course immediately and use the actual QA/review nodes.
+## Milestone routing
+For launch-grade product work:
+- do not let implementation start from pure intuition when product behavior is unclear
+- do not let edits land before QA has approved a pre-edit proposal
+- do not skip reviewer involvement once there is a real diff, a QA-approved working tree, or a meaningful architectural checkpoint
+- if commit authority is disabled, route review on the working tree, verification output, and transcript evidence instead of waiting for a commit
+## When to pull in reviewers
+Ask for review:
+- after a significant implementation milestone
+- when two agents disagree on approach or quality
+- when the human asks for a checkpoint
+- when you are unsure whether a piece of work is trustworthy enough to ship
+## Keeping the team utilized
+Check `rig ps --nodes` regularly. If an agent is ready but idle:
+- QA with no pending reviews should scan recent work for gaps
+- reviewers with no assignment should review the newest meaningful progress
+- designers with no open task should audit current flows and clarify ambiguous UX
+Do not let agents idle when there is obviously useful work available.
+## Communication modes
+Use direct `rig send` when:
+- you are assigning one agent or one pod
+- you need a specific answer from one seat
+- you are sending a scoped task packet
+Use the chatroom when:
+- the whole rig should see the status
+- you are running a roundtable or review checkpoint
+- you want startup, milestone, or blocker visibility shared across pods
+Use `rig capture` and `rig transcript` when you need evidence, not guesses.
+## Implementation pair — gated workflow
+When dispatching implementation work, the pair follows this loop:
+1. Impl sends a pre-edit proposal to QA
+2. QA approves or rejects with specifics
+3. Impl implements with TDD
+4. Impl sends post-edit diff to QA
+5. QA approves or rejects
+6. Impl commits
+7. Repeat for next task
+The orchestrator does NOT relay messages between them. They communicate directly via `rig send`. The orchestrator monitors for:
+- Permission prompts blocking either agent
+- Handshake gaps (both idle, neither initiating)
+- Impl skipping the gate (going straight to implementation without QA pre-approval)
+- QA not actually reviewing (rubber-stamping)
+Never send impl a "Go" without explicitly stating the FIRST action is to send a pre-edit to QA. Impl will race through an entire task list if given a general "Go."
+## Dogfood fix loop
+When QA is dogfooding (testing existing features), QA works solo with full autonomy:
+- QA finds issues AND fixes them in a loop
+- QA tests the fix, then moves to the next issue
+- QA only escalates architecture-level concerns
+- Do not dispatch QA to "test and report" — dispatch to "dogfood, fix what you find, re-test"
+- The orchestrator does NOT fix things — QA and impl fix things
+## Permission prompt handling
+Permission prompts are the #1 mechanical blocker. Check for them every monitoring cycle.
+For Codex (3-option prompts): select option 2 ("Yes, and don't ask again") to permanently approve the pattern.
+For Claude (2-option): approve with Enter.
+For destructive operations (git push, rm, daemon stop, npm publish): DO NOT auto-approve. Check with the human.
+## Agent behavioral models
+### Claude Code agents (impl, reviewers, lead)
+- Will blast through an entire task list if given a "Go" without explicit gates
+- After being told to slow down, over-corrects to "wait for permission for everything"
+- Compaction is catastrophic — full context loss, needs preparation
+- After compaction: must re-read ALL skills from disk (skill names survive in system reminders but content is truncated)
+### Codex agents (QA, peer, R2)
+- Self-manages its own context window — do NOT intervene based on context percentage
+- Compacts automatically and continues working — this is normal, not an emergency
+- Never tell Codex to "wrap up" or "save state" based on context percentage
+- Over-engineers when given spec-writing authority — never let Codex write implementation specs
+- Excellent at: implementation, code review, dogfood testing, finding edge cases
+## Intervention discipline
+Agents treat orchestrator messages as high-authority commands. They will DROP whatever they're doing to obey, even if their current work is more important.
+Rules:
+1. Never command. Provide information. The agent decides when to act.
+2. Always say "finish what you're on first." Explicitly. Every time.
+3. Frame as context updates, not directives.
+4. Do not interrupt working agents. If an agent shows ANY sign of activity, do not send a message.
+5. Wait for confirmed idleness (2+ monitoring cycles) before nudging.
+## Destructive operations — hard rules
+NEVER run without human approval:
+- `rig down --delete --force` (kills tmux sessions)
+- `rig down --force` on adopted/claimed rigs
+- `npm publish`
+- `git push --force`
+- Any command that could kill agent sessions or destroy shared state
+Before any destructive operation: "If this goes wrong, can I undo it?" If no, confirm with the human.
+## After compaction recovery
+1. Re-read ALL skills from disk — actually read the SKILL.md files, not just check names
+2. `rig whoami --json` to recover identity
+3. `rig ps --nodes` to see the topology
+4. Read your restore file and session log if available
+5. Ask your peer for a quiz to verify your mental model
+## What you do not do
+- write production code just because it would be faster
+- override QA or reviewer concerns without understanding them
+- pretend blocked agents are making progress
+- keep hidden work queues in your head instead of assigning them clearly
+- relay messages between agents (they communicate directly)
+- auto-approve destructive operations
+- rush agents with deadline pressure
+- write implementation specs (that's a Claude task, not Codex)
+- intervene based on Codex context percentage

package/daemon/specs/agents/shared/skills/pods/review-team/SKILL.md ADDED Viewed

@@ -0,0 +1,210 @@
+---
+name: review-team
+description: Complete operating manual for the review pod. Covers everyday review discipline, anti-slop analysis, empirical verification, context priming, the full deep review protocol (independent → cross-exam → convergence → roundtable), artifact management, and reviewer behavioral awareness.
+---
+# Review Team
+You are part of the review pod. Your value is fresh scrutiny that implementation and QA do not have.
+## Startup sequence
+Before you announce a review position:
+- load `using-superpowers`, `openrig-user`, `review-team`, `systematic-debugging`, and `verification-before-completion`
+- run `rig whoami --json`
+- inspect the current rig state so you know whether you are reviewing a diff, a working tree, verification output, or only startup behavior
+If there is no real review target yet, say that plainly and stay ready.
+## Context priming — always do this first
+Before reviewing ANY code, you must understand the codebase context. Never review cold.
+1. Read the project's `CLAUDE.md` or equivalent conventions doc
+2. Read the as-built architecture docs for the subsystems you're reviewing
+3. Read the relevant planning/spec docs if they exist
+4. Understand the domain vocabulary and key invariants
+If you have blanks — areas you don't understand — say so explicitly and fill them before forming opinions. A review built on misunderstood context is worse than no review.
+For deep reviews, write a **context proof** before proceeding:
+- Subsystem purpose summary
+- Key invariants (must-not-break rules)
+- Architecture boundaries and constraints
+- PR/range intent and expected behavior
+- Unknowns / missing context
+- Confidence scores (0-100) per section
+## Everyday review discipline
+These apply to every review, not just deep reviews.
+### Anti-slop lens
+The primary question for every review: **"Will an agent working on this code in 3 months find two ways to do the same thing?"**
+Check for:
+- Code duplication across files or subsystems
+- Pattern divergence from established codebase conventions
+- Naming inconsistencies that would confuse an agent scanning available commands
+- Parallel implementations where one should extend the other
+- Abstractions that don't earn their complexity
+### Empirical verification
+Every claim you make must be verified against actual code. Not plausible inference. Not file-tree reasoning.
+- Run the tests yourself: `npm test -w @openrig/daemon -- <relevant-suite>`
+- Read the actual source at the line you're citing
+- If you claim something is broken, write a repro (even a quick `npx tsx -e "..."`)
+- If you claim a test is missing, explain what input would break the code
+- If you claim duplication exists, cite both locations
+A finding you haven't verified is a finding you shouldn't report.
+### Severity rating
+Rate every finding clearly:
+- **MUST-FIX** — blocks merge. Broken behavior, security issue, or test suite failure.
+- **HIGH** — contract violation or honesty failure. Should fix before calling the range clean.
+- **MEDIUM** — real concern that affects maintenance or agent UX. Should fix soon.
+- **LOW** — polish, robustness, or minor inconsistency. Fix when convenient.
+- **INFO** — observation worth noting. Not a defect.
+### Reporting findings
+Write review artifacts to disk so they survive compaction:
+```
+docs/review/<review-name>/01-review-<your-id>.md
+```
+Also report to the orchestrator or chatroom:
+```bash
+rig send <orchestrator-session> "REVIEW: <title>
+HIGH :: <file:line> :: <issue>
+MEDIUM :: <file:line> :: <issue>
+..." --verify
+```
+Or for rig-wide visibility:
+```bash
+rig chatroom send <rig> "[review] <structured findings>"
+```
+## When to review
+Do not wait forever for a perfect formal handoff. Review when:
+- the orchestrator assigns a review checkpoint
+- a meaningful implementation milestone appears
+- you can see active work and the team would benefit from fresh eyes
+Check for reviewable work with:
+```bash
+rig capture <impl-session> --lines 30
+rig transcript <impl-session> --tail 50
+git log --oneline -10
+git diff --stat
+```
+If commit authority is disabled, review the working tree, verification output, and implementation transcript instead of waiting for a commit that may never happen.
+## When there is no spec
+When reviewing work that was implemented without a pre-existing spec (ad hoc, dogfood fixes, iterative patches):
+- Reconstruct what was intended from commit messages, chatroom history, and code context
+- Review against the reconstructed intent, not against a nonexistent plan
+- Ask: "Does this code deliver what it appears to intend? Are the contracts honest?"
+- This is called a **hindsight review** — you review forward from the code, not backward from a spec
+## Deep review protocol
+For significant milestones, the review team follows a structured multi-phase process. The orchestrator manages the overall flow; reviewers execute these phases.
+### Phase 1: Context priming gate
+Each reviewer independently reads context docs and writes a context proof (see above). The orchestrator reads both proofs and decides GO or NO-GO. No code review starts until the gate passes.
+### Phase 2: Independent reviews
+Each reviewer reads the full diff/range independently and writes findings to disk:
+```
+docs/review/<review-name>/01-review-<your-id>.md
+```
+Do NOT read the other reviewer's work during this phase. Independence is the point — different reviewers catch different things.
+Your independent review should cover:
+- Test posture (does the suite pass? are there regressions?)
+- Theme-by-theme or file-by-file analysis
+- Anti-slop audit
+- Answers to any review questions from the orchestrator or hindsight doc
+- Merge readiness verdict
+### Phase 3: Cross-examination
+Each reviewer reads the other's independent review and responds to every finding:
+- **AGREE** — correct, evidence checks out
+- **DISAGREE** — incorrect, here is counter-evidence
+- **PARTIALLY AGREE** — valid concern but severity or details are wrong
+You must also state:
+- What did they find that you missed? (Be honest about your blind spots)
+- What did you find that they missed?
+- Do their findings change any of your severity assessments?
+- Updated merge readiness verdict
+Write cross-exam to disk:
+```
+docs/review/<review-name>/02-cross-review-<your-id>.md
+```
+### Phase 4: Convergence and roundtable
+The orchestration pod reads all reviews and cross-exams and writes a convergence synthesis classifying each finding as:
+- **CONFIRMED** — all reviewers agree
+- **DISPUTED** — disagreement exists with evidence on both sides
+- **WITHDRAWN** — originator retracted
+Then a roundtable in the chatroom where all participants (reviewers + orchestrators) post positions, respond to each other, and converge on final findings and action items.
+Culture for the roundtable:
+- Truth-seeking. Not contrarian for theater. Not agreeable to be nice.
+- Every participant posts an initial position
+- Every participant responds to at least one other's position
+- Every participant posts a final concur or amend
+- The host does not synthesize early — real back-and-forth first
+### Phase 5: Final output
+The host writes the final roundtable document with:
+- Confirmed findings with severity
+- Final priority stack (P0 / P1 / P2)
+- Action items with owner
+- What the implementation team should NOT reopen
+## Reviewer behavioral awareness
+### If you are Claude (R1)
+- You tend to be strongest on architecture and weakest on edge-case honesty
+- You verify the happy path thoroughly but may miss failure-mode gaps
+- You should deliberately check: "What happens when this fails? What happens with bad input? What about the release-then-remove sequence?"
+### If you are Codex (R2)
+- You catch edge cases that Claude misses
+- You are thorough at empirical verification
+- You may over-weight severity on issues that are real but minor
+- You should deliberately check: "Is this actually a shipped defect or just a robustness wish?"
+### When reviewers disagree
+Disagreement is useful. Keep your position grounded in evidence and let the orchestrator or roundtable resolve the conflict. Do not collapse your view just to create false consensus. If you're right, defend it. If you're wrong, retract it honestly.
+## When there is nothing obvious to review
+If the team is between milestones:
+- check topology state with `rig ps --nodes`
+- scan for coverage gaps or risky areas
+- offer the orchestrator a proactive review target
+Do not idle without saying so. If you are available, make that explicit.

package/daemon/specs/agents/shared/skills/process/agent-browser/LOCAL-INSIGHTS.md ADDED Viewed

@@ -0,0 +1,189 @@
+# agent-browser: Local Dev Insights
+> Companion to the official SKILL.md. These are gotchas, corrections, and best practices
+> discovered through hands-on testing that the upstream skill doesn't cover.
+> Last updated: 2026-02-20 | Tested against: v0.13.0
+---
+## Command Compatibility Matrix
+**Not all `get` subcommands accept @refs.** This is the #1 source of confusion.
+| Command | @refs | CSS selectors | Notes |
+|---------|-------|---------------|-------|
+| `get text @e1` | YES | YES | Works with both |
+| `get html` | NO | YES | Fails silently with refs |
+| `get box` | NO | YES | Returns `{x, y, width, height}` JSON |
+| `get styles` | NO | YES | Returns compact summary (font, color, bg, border-radius) |
+| `get value` | NO | YES | For form inputs |
+| `get attr` | NO | YES | Any HTML attribute |
+| `get count` | N/A | YES | Returns element count |
+| `get url` | N/A | N/A | No selector needed |
+| `get title` | N/A | N/A | No selector needed |
+| `click` | YES | YES | Works with both |
+| `fill` | YES | YES | Works with both |
+| `highlight` | NO | YES | Skill shows `highlight @e1` but this fails |
+**Rule of thumb:** Interaction commands (click, fill, type, check, select) work with @refs.
+Inspection commands (get html/box/styles, highlight) need CSS selectors.
+## CSS Selectors: Strict Mode
+Playwright strict mode means CSS selectors must match **exactly one element**. If multiple match, you get an error listing all matches (which is actually helpful for debugging).
+**Strategies for unique selectors:**
+- Use IDs: `#fork-button`
+- Use unique attributes: `[data-testid="submit"]`
+- Combine: `.header > a:first-child`
+- Use `nth`: `.item:nth-child(3)`
+## Ref Lifecycle: The Golden Rule
+Refs are invalidated by **any page state change**. This includes:
+- Navigation (click links, `open`, `back`, `forward`)
+- Scoped snapshots (`snapshot -s`)  <-- easy to forget this one
+- Form submissions
+- Dynamic content (modals, dropdowns, AJAX loads)
+- Even `snapshot` itself replaces all previous refs
+**Pattern:** Always snapshot immediately before interacting. Never cache refs across multiple actions that change the page.
+## Snapshot Mode Comparison
+| Flag | What it returns | When to use |
+|------|----------------|-------------|
+| `-i` | Interactive elements only | **Default choice** - best token efficiency |
+| `-i -C` | Interactive + cursor-interactive | When divs with onclick aren't showing up |
+| `-c` | Compact (removes empty nodes) | Unreliable - can return "Empty page" on some sites |
+| `-d N` | Depth-limited | When `-i` returns too much |
+| `-s "#sel"` | Scoped to selector | Laser focus on one component |
+| `--json` | JSON format | Programmatic parsing |
+**Token efficiency example:** GitHub repo page with 4,574 DOM elements → `snapshot -i` returns ~25 lines.
+## Annotated Screenshots
+`screenshot --annotate` is powerful but **can hang on complex pages** (known issue #509). If it hangs:
+1. Kill with Ctrl-C or timeout
+2. Fall back to regular `screenshot` + separate `snapshot -i`
+3. Works best on simpler pages
+The annotated screenshot also **caches refs**, so you can interact with elements immediately after without a separate snapshot.
+## Network Monitoring
+```bash
+# See all requests (captured since page was opened)
+agent-browser network requests
+# Filter to just API calls (huge noise reduction)
+agent-browser network requests --filter "/api/"
+# Mock an API response
+agent-browser network route "https://api.example.com/data" --body '{"mocked": true}'
+# Block a request (e.g., analytics)
+agent-browser network route "https://www.google-analytics.com/*" --abort
+```
+Requests are captured from session start. The `--filter` flag is essential on real sites - without it you get dozens of CSS/image/analytics requests.
+## JavaScript Eval Patterns
+```bash
+# Quick one-liner (single quotes, no nesting)
+agent-browser eval 'document.title'
+# Complex JS (ALWAYS use --stdin for anything with quotes/arrows/template literals)
+agent-browser eval --stdin <<'EVALEOF'
+JSON.stringify(
+  Array.from(document.querySelectorAll("a"))
+    .map(a => ({ text: a.textContent.trim(), href: a.href }))
+    .filter(a => a.text.length > 0)
+    .slice(0, 10)
+)
+EVALEOF
+# Fetch API from browser context (uses page cookies/auth)
+agent-browser eval --stdin <<'EVALEOF'
+(async () => {
+  const res = await fetch('/api/data');
+  return JSON.stringify(await res.json());
+})()
+EVALEOF
+```
+## Session Management
+- **Always close when done:** `agent-browser close` prevents leaked daemon processes
+- **Headed mode for debugging:** `agent-browser --headed open <url>`
+- **Persistent headed config:** Add `{"headed": true}` to `~/.agent-browser/config.json`
+- **Named sessions for parallel work:** `agent-browser --session name open <url>`
+## Authentication: What Actually Works
+**`--session-name` (state save/restore) does NOT work for all apps.** It saves cookies and localStorage, but apps using HTTP-only cookies, server-side sessions, or complex auth flows may not persist. Tested and failed on: tbbc (The Big Blue Cloud / localhost:8083).
+**`--profile` (persistent Chrome profile) is the reliable approach.** It preserves everything - cookies, localStorage, IndexedDB, cache, service workers. This is what actually works for real apps.
+### Saved Profiles
+| Profile | Service | URL | Command |
+|---------|---------|-----|---------|
+| `tbbc` | The Big Blue Cloud | `http://localhost:8083` | `agent-browser --profile ~/.agent-browser/profiles/tbbc open http://localhost:8083` |
+| `localhost-3000` | Specright Formulate (Clerk auth) | `http://localhost:3000` | `agent-browser --profile ~/.agent-browser/profiles/localhost-3000 open http://localhost:3000` |
+| `localhost-3010-email` | Smart Report Writer (email login) | `http://localhost:3010` | `agent-browser --profile ~/.agent-browser/profiles/localhost-3010-email open http://localhost:3010` |
+| `localhost-3010-google` | Smart Report Writer (Google auth) | `http://localhost:3010` | See Google OAuth note below |
+### Google OAuth Profiles
+Google blocks sign-in from the bundled Chromium ("This browser or app may not be secure"). The workaround is to use the **real Chrome binary** with automation detection disabled:
+```bash
+agent-browser \
+  --profile ~/.agent-browser/profiles/localhost-3010-google \
+  --executable-path "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
+  --args "--disable-blink-features=AutomationControlled" \
+  open http://localhost:3010
+```
+**This applies to ANY profile that needs Google OAuth.** Always use `--executable-path` + `--args` for Google sign-in flows.
+### Auth Setup Pattern
+```bash
+# First time: login in headed mode (user enters password)
+agent-browser --profile ~/.agent-browser/profiles/<name> --headed true open <login-url>
+# ... user logs in manually ...
+agent-browser close
+# Every future run: headless, already authenticated
+agent-browser --profile ~/.agent-browser/profiles/<name> open <app-url>
+```
+### Encryption
+Session state files in `~/.agent-browser/sessions/` are encrypted with AES-256-GCM.
+Key stored at `~/.agent-browser/.encryption-key` (chmod 600).
+Loaded via `AGENT_BROWSER_ENCRYPTION_KEY` env var in `~/.zshrc`.
+Note: `--profile` directories are NOT encrypted (they're standard Chromium profile dirs).
+Keep `~/.agent-browser/profiles/` permissions locked down.
+## Updating the Official Skill
+To sync SKILL.md with upstream while preserving local insights:
+```bash
+# Download latest official SKILL.md
+curl -sL https://raw.githubusercontent.com/vercel-labs/agent-browser/main/skills/agent-browser/SKILL.md \
+  -o ~/.claude/skills/agent-browser/SKILL.md
+# Re-append the local insights reference (3 lines at end of SKILL.md)
+cat >> ~/.claude/skills/agent-browser/SKILL.md << 'EOF'
+## Local Dev Insights
+**IMPORTANT:** Read `LOCAL-INSIGHTS.md` in this skill directory for gotchas, corrections, and tested workflows discovered through hands-on use that this upstream skill doesn't cover.
+EOF
+```