npm - typeclaw - Versions diffs - 0.5.1 → 0.7.0 - Mend

typeclaw 0.5.1 → 0.7.0

Files changed (48)

package/README.md +34 -84
package/package.json +1 -1
package/src/agent/index.ts +80 -8
package/src/agent/live-subagents.ts +215 -0
package/src/agent/plugin-tools.ts +60 -20
package/src/agent/session-origin.ts +15 -0
package/src/agent/subagents.ts +140 -3
package/src/agent/system-prompt.ts +42 -0
package/src/agent/tools/channel-reply.ts +24 -1
package/src/agent/tools/channel-send.ts +26 -1
package/src/agent/tools/spawn-subagent.ts +283 -0
package/src/agent/tools/subagent-cancel.ts +96 -0
package/src/agent/tools/subagent-output.ts +192 -0
package/src/bundled-plugins/agent-browser/skills/agent-browser/SKILL.md +26 -0
package/src/bundled-plugins/explorer/explorer.ts +103 -0
package/src/bundled-plugins/explorer/index.ts +11 -0
package/src/bundled-plugins/guard/index.ts +12 -1
package/src/bundled-plugins/guard/policies/managed-config.ts +139 -0
package/src/bundled-plugins/guard/policy.ts +1 -0
package/src/bundled-plugins/operator/index.ts +11 -0
package/src/bundled-plugins/operator/operator.ts +76 -0
package/src/bundled-plugins/scout/index.ts +11 -0
package/src/bundled-plugins/scout/scout.ts +94 -0
package/src/channels/router.ts +32 -0
package/src/cli/init.ts +8 -1
package/src/cli/oauth-callbacks.ts +64 -34
package/src/cli/provider.ts +9 -4
package/src/config/config.ts +73 -16
package/src/config/index.ts +3 -0
package/src/config/providers.ts +106 -0
package/src/cron/index.ts +3 -0
package/src/cron/schema.ts +20 -0
package/src/init/dockerfile.ts +44 -5
package/src/init/models-dev.ts +1 -0
package/src/permissions/builtins.ts +23 -2
package/src/plugin/define.ts +2 -0
package/src/plugin/index.ts +2 -0
package/src/plugin/types.ts +15 -22
package/src/run/bundled-plugins.ts +6 -0
package/src/run/channel-session-factory.ts +19 -0
package/src/run/index.ts +56 -6
package/src/server/index.ts +103 -0
package/src/skills/typeclaw-claude-code/SKILL.md +273 -0
package/src/skills/typeclaw-claude-code/references/auth-flow.md +135 -0
package/src/skills/typeclaw-claude-code/references/stop-hook.md +99 -0
package/src/skills/typeclaw-claude-code/references/tmux-driving.md +157 -0
package/src/skills/typeclaw-config/SKILL.md +29 -26
package/typeclaw.schema.json +12 -0

package/src/skills/typeclaw-claude-code/SKILL.md ADDED

@@ -0,0 +1,273 @@
+---
+name: typeclaw-claude-code
+description: Use this skill whenever you decide to delegate substantial coding or code-analysis work to Claude Code (Anthropic's official coding-agent CLI). Triggers include "use Claude Code", "ask Claude Code", "delegate to claude", "claude cli", "have claude do it", any task where you want a more capable agent than yourself, and any time you're about to run `claude` from a shell. Read it before you spawn the CLI — Claude Code is a TTY-only TUI in interactive mode (you must drive it through tmux, not pipes), it operates inside a dedicated `git worktree` checkout under `/tmp/` so its commits never pollute the agent folder, and you detect "turn done" through a `Stop` hook that writes a sentinel file. Skipping this skill means you'll either fall back to `claude -p` (which strips plan mode and sub-agents), let claude mutate the live agent checkout (which loses you the rollback safety), or try to parse the TUI buffer with capture-pane heuristics (fragile, version-locked).
+---
+# typeclaw-claude-code
+You can delegate work to Claude Code, Anthropic's official coding agent. The agent runs as an interactive TUI: it plans, uses sub-agents, edits files, runs tools — the full loop. You drive it through tmux because your own process has no TTY, you isolate it in a dedicated `git worktree` so its experiments never touch the live agent checkout, and you detect "turn done" through a `Stop` hook that writes a sentinel file (not by parsing the TUI buffer).
+This skill is for the case where Claude Code is the right tool: hard architecture work, multi-file refactors, deep code analysis, a second-opinion read on something you wrote. It is **not** for trivial edits — the round-trip cost (worktree setup + process spawn + auth check + TUI init + at least one full Claude turn) is 15–45 seconds and several thousand tokens of someone else's context window. Do trivial edits yourself.
+## When to delegate to Claude Code
+Use Claude Code for:
+- **Multi-file refactors** that need a holistic plan before any edit lands.
+- **Code analysis** the user wants done thoroughly — "review this module", "find the bug in this 800-line file", "explain why X is slow".
+- **Implementations you're unsure about** where a more capable model would catch issues you'd miss.
+- **A second pair of eyes** on a design you've already drafted, especially when the user asks for one.
+Do **not** use Claude Code for:
+- One-line edits, typo fixes, single-function tweaks.
+- Anything where the user is watching your tool calls and wants to see each step — Claude's intermediate output is captured but not streamed back to the user.
+- Tasks that depend on context you haven't extracted yet. Claude won't have repo-wide context either; you have to brief it explicitly.
+## First-time auth (interactive)
+If `claude` is installed but no credential is set up, you have to broker the auth flow yourself. The user is talking to you through the TUI (or a channel); you walk them through one of two paths.
+**Decision rule, top to bottom:**
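+1. **Already authenticated?** Run `env | grep -Eo '^(ANTHROPIC_API_KEY|CLAUDE_CODE_OAUTH_TOKEN)='`. The `-o` flag prints only the matched variable name, so the secret itself never reaches your tool output. If either name appears, skip auth entirely.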
+2. **User has an Anthropic Console workspace** (API billing, no subscription) → API key path.
+3. **User has a Claude Pro/Max/Team/Enterprise subscription** → OAuth token path.
+4. **User is unsure** → ask which kind of Claude account they have. Both paths are now equally low-friction (one user action each — paste an API key, or run one command on their machine and paste the result), so the old "prefer API key when unsure" bias is gone. Pick by account shape, not by flow complexity.
+Both paths converge on the same final steps: read `.env`, merge one new `KEY=value` line, write back with the `nonWorkspaceWrite` guard ack, verify, and prompt the user to restart the container. Only the credential differs.
+### API key path
+1. Ask the user: "Paste your Anthropic API key (starts with `sk-ant-`) — or say 'cancel' to use OAuth instead."
+2. **Validate** the pasted value before writing: `/^sk-ant-[A-Za-z0-9_-]{20,}$/`. If it doesn't match, refuse and ask again — neither the guard nor the restart tool catches a malformed token.
+3. **Read** the existing `.env` first (if any). Parse it into a key→value map so you don't clobber unrelated entries.
+4. **Reconstruct** the full `.env` content with `ANTHROPIC_API_KEY=<value>` added or replaced.
+5. **Write** with `acknowledgeGuards: { nonWorkspaceWrite: true }`. `.env` is in the `nonWorkspaceWrite` guard's deny set; the call fails without the ack flag.
+6. **Verify** by re-reading the file.
+7. **Ask the user**: "Auth is on disk. The container needs to restart to load it (TUI will briefly disconnect). May I restart now, or do you have other changes to make first?"
+8. On yes → call the `restart` tool. On no → tell them to run `typeclaw restart` themselves when ready.
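Steps 2 to 4 above can be sketched as two small pure functions. This is a minimal illustration, not typeclaw's implementation: `isValidApiKey` and `mergeEnv` are hypothetical names, and the merge works line by line (rather than through a literal key→value map) so that comments and unrelated entries survive verbatim. The actual write still has to go through the write tool with the `nonWorkspaceWrite` ack.

```typescript
// Hypothetical helpers for illustration; these names are not typeclaw APIs.
const API_KEY_PATTERN = /^sk-ant-[A-Za-z0-9_-]{20,}$/;

// Shape check only: it cannot tell whether the key is actually live.
function isValidApiKey(value: string): boolean {
  return API_KEY_PATTERN.test(value);
}

// Read-modify-write merge: replace KEY=... where it already exists,
// otherwise append it. Every other line is kept exactly as it was.
function mergeEnv(existing: string, key: string, value: string): string {
  const lines = existing === "" ? [] : existing.replace(/\n$/, "").split("\n");
  const entry = `${key}=${value}`;
  const merged: string[] = [];
  let replaced = false;
  for (const line of lines) {
    const match = /^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=/.exec(line);
    if (match && match[1] === key) {
      // Keep one copy of the key; drop any later duplicates.
      if (!replaced) merged.push(entry);
      replaced = true;
    } else {
      merged.push(line);
    }
  }
  if (!replaced) merged.push(entry);
  return merged.join("\n") + "\n";
}
```

The same `mergeEnv` call would serve the OAuth path with `CLAUDE_CODE_OAUTH_TOKEN` as the key.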
+### OAuth path
+The OAuth flow runs **on the user's own machine**, not inside the container. The user generates a long-lived `CLAUDE_CODE_OAUTH_TOKEN` with `claude setup-token` on whatever local machine they're already authenticated on, copies the printed token, and pastes it back to you. You write it to `.env` exactly like the API key path.
+Why this works: `claude setup-token` is Anthropic's documented path for "CI pipelines, scripts, or other environments where interactive browser login isn't available" ([code.claude.com/docs/en/authentication](https://code.claude.com/docs/en/authentication)). A typeclaw container is exactly that environment. The token is valid for one year, authenticates against the user's Claude subscription, and is scoped to inference only: it can't establish Remote Control sessions or otherwise act outside of `claude` CLI calls.
+Do **not** run `claude setup-token` inside the container. The container has no browser, no display, and (for remote-host typeclaw deployments) is on a different machine from the user's browser anyway. The user's local machine is the right place to run the one-off `setup-token` command: it has a browser, and step 1 below confirms that `claude` is installed and signed in there.
+1. Confirm with the user: "Do you have the `claude` CLI installed on your local machine and are you signed in to it with your Claude Pro/Max/Team/Enterprise account? If not, install it from claude.com/code and `claude login` first."
+2. Once they confirm, instruct them: "Run `claude setup-token` on your machine. It opens a browser, you authorize, and the terminal prints a long token (looks like `sk-ant-oat01-...` or similar). Copy that token and paste it back to me. The token is long-lived (one year) and authenticates against your Claude subscription — keep it private."
+3. When they paste, **validate** before writing: `/^[A-Za-z0-9_-]{30,}$/`. Strip surrounding whitespace first. If it doesn't match (too short, contains slashes, looks like a URL or a sentence), refuse and ask again — the user may have pasted a partial copy or the wrong line.
+4. **Read** the existing `.env` first. Parse it into a key→value map.
+5. **Reconstruct** the full `.env` content with `CLAUDE_CODE_OAUTH_TOKEN=<value>` added or replaced.
+6. **Write** with `acknowledgeGuards: { nonWorkspaceWrite: true }`.
+7. **Verify** by re-reading the file.
+8. **Ask before restart** (same prompt as the API key path).
+9. On yes → call the `restart` tool. On no → tell them to run `typeclaw restart` themselves when ready.
+The full validation rules, the failure modes on the user's side (their `claude` CLI is signed out, their `setup-token` command 401s, their subscription is expired), and the rationale for not doing the OAuth dance in-container are in `references/auth-flow.md`.
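The parse-and-validate step for a pasted token might look like the sketch below. It follows the order given in `references/auth-flow.md` (trim, drop `export `, drop the `CLAUDE_CODE_OAUTH_TOKEN=` prefix, strip quotes, then validate by shape rather than by prefix). `normalizeOauthToken` is a hypothetical name, and returning `null` on failure is an assumption about how the caller signals "ask again".

```typescript
// Hypothetical helper for illustration; not a typeclaw API.
const OAUTH_TOKEN_PATTERN = /^[A-Za-z0-9_-]{30,}$/;

// Returns the bare token, or null when the paste fails shape validation.
// Never log or echo the input or the return value: it is a one-year credential.
function normalizeOauthToken(pasted: string): string | null {
  let value = pasted.trim();
  if (value.startsWith("export ")) {
    value = value.slice("export ".length).trim();
  }
  const prefix = "CLAUDE_CODE_OAUTH_TOKEN=";
  if (value.startsWith(prefix)) {
    value = value.slice(prefix.length);
  }
  // Strip one pair of matching surrounding quotes, if present.
  const quoted = /^(["'])(.*)\1$/.exec(value);
  if (quoted) {
    value = quoted[2];
  }
  return OAUTH_TOKEN_PATTERN.test(value) ? value : null;
}
```

A `null` result maps to the "refuse and ask again" branch; a string result is what gets merged into `.env`.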
+### Cost-cap warning
+Interactive-mode Claude Code has **no built-in spend cap** — `--max-budget-usd` only works in `-p` mode, which is not what we use here. If the user is on the API-key path, recommend setting a workspace spend limit in the Anthropic Console; that's the only safety net. If they're on OAuth (subscription), usage is bounded by the subscription's monthly Agent SDK credit pool. Tell them once before the first delegation so it's not a surprise.
+## Prerequisites
+Before you spawn `claude` for any real work:
+- **`docker.file.claudeCode: true`** in `typeclaw.json`. Verify with `which claude`; if missing, the toggle isn't on. Tell the user to enable it and `typeclaw start --build`.
+- **`docker.file.tmux: true`** (default `true`, but check). Verify with `which tmux`.
+- **Auth set up** (see above). Verify with `env | grep -Eo '^(ANTHROPIC_API_KEY|CLAUDE_CODE_OAUTH_TOKEN)='`, which prints only the variable names and keeps the secret out of your tool output.
+- **Agent folder is a git repo.** Verify with `git -C /agent rev-parse --is-inside-work-tree`. The worktree model below requires it. If the user's agent folder somehow isn't a repo (rare — `typeclaw init` scaffolds one), tell them to `git init && git add -A && git commit -m "initial"` first.
+- **No uncommitted changes that you care about.** `git -C /agent status --porcelain` should be clean, or you should be willing to set the working tree aside before delegating. The worktree is a separate checkout, so claude can't see your uncommitted changes — meaning claude operates on the last committed state. If the user wants claude to work with in-progress edits, commit them first (even on a WIP branch).
+If any prerequisite is missing, stop and surface the gap to the user. Do not try to install `claude` yourself in the running container — the install belongs in the Dockerfile layer, not at runtime.
+## Create the worktree
+Each delegation runs inside a dedicated `git worktree` checkout under `/tmp/`. This is the load-bearing isolation that makes the rest of the skill safe:
+- **Claude can edit, commit, reset, run tests** — none of it touches the agent folder's live working tree or its main branch pointer.
+- **You get perfect introspection.** `git diff` between claude's branch and your main checkout shows exactly what claude changed; `git log` shows how it got there.
+- **Cleanup is bounded.** When you're done, you remove the worktree and its branch; nothing persists on disk except deliberately cherry-picked commits.
+- **The agent folder's `git status` stays clean during delegation** — the user can keep working on their own checkout while claude operates in parallel.
+### Setup
+Pick a task id (short hex string or `verb-noun` like `refactor-auth`) and create the worktree:
+```sh
+git -C /agent worktree add -b cc-<task-id> /tmp/cc-<task-id> HEAD
+cd /tmp/cc-<task-id>
+mkdir -p .claude
+```
+This creates:
+- A new branch `cc-<task-id>` rooted at the agent folder's current `HEAD`.
+- A new working tree at `/tmp/cc-<task-id>/` containing every file from that commit.
+- An entry in `/agent/.git/worktrees/cc-<task-id>/` that ties the two together.
+The worktree shares the agent folder's `.git` directory but has its own `HEAD`, index, and working tree. Branch state lives in `/agent/.git/refs/heads/cc-<task-id>` regardless of where the worktree itself lives on disk.
+Inside `/tmp/cc-<task-id>/`, write the per-task hook config (see "The Stop hook" below):
+```
+/tmp/cc-<task-id>/
+├── .claude/
+│   └── settings.json        # registers the Stop hook
+├── hook-on-stop.sh          # the hook script, chmod +x
+├── sentinel.json            # written by the hook (does not exist yet)
+├── .done                    # flag file (does not exist yet)
+└── ...                      # plus every file from the agent folder's HEAD
+```
+### Why `/tmp/`, not `workspace/`?
+`workspace/` is the agent folder's gitignored scratch zone — fine for one-off scripts. But a `git worktree` is a _checkout_, not scratch: it carries an index, refs in `/agent/.git/worktrees/`, and (briefly) shares working-tree state with the main checkout. Putting it under `workspace/` would mean the agent folder contains a worktree of itself, which works mechanically but is recursive and confusing (nested worktrees? infinite recursion if claude does `git status`?). `/tmp/cc-<id>/` keeps the worktree clearly outside the agent folder. It's also genuinely ephemeral — `/tmp/` is tmpfs-ish, survives container life but never enters git history or backups.
+## The Stop hook
+Claude Code fires a `Stop` hook every time it finishes responding — turn-end, not session-end. The hook runs an arbitrary shell command with the lifecycle event payload (JSON) on stdin. We use this as the done-signal: the hook writes the payload to `sentinel.json` and `touch`es `.done`, and your polling loop watches for `.done`.
+Minimum `/tmp/cc-<id>/.claude/settings.json`:
+```json
+{
+  "hooks": {
+    "Stop": [
+      {
+        "matcher": "*",
+        "hooks": [{ "type": "command", "command": "./hook-on-stop.sh" }]
+      }
+    ]
+  }
+}
+```
+Minimum `/tmp/cc-<id>/hook-on-stop.sh` (chmod +x):
+```sh
+#!/bin/sh
+# stdin carries the Stop event JSON; transcript_path points at the JSONL.
+cat > sentinel.json.tmp
+mv sentinel.json.tmp sentinel.json
+touch .done
+```
+The temp-file-then-rename keeps the read side from ever seeing a partial sentinel. The full schema of the Stop event (every field Claude Code populates, including `last_assistant_message` and `transcript_path`) is in `references/stop-hook.md`.
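If you generate the two hook files from code rather than by hand, a sketch along these lines would do it. `writeStopHook` is a hypothetical helper name; the file contents mirror the minimum `settings.json` and `hook-on-stop.sh` shown above, and the worktree directory is assumed to exist already.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: writes .claude/settings.json and hook-on-stop.sh
// into an already-created worktree directory.
function writeStopHook(worktreeDir: string): void {
  const settings = {
    hooks: {
      Stop: [
        {
          matcher: "*",
          hooks: [{ type: "command", command: "./hook-on-stop.sh" }],
        },
      ],
    },
  };
  // Same script as the minimum hook: write to a temp file, rename, then flag.
  const script = [
    "#!/bin/sh",
    "cat > sentinel.json.tmp",
    "mv sentinel.json.tmp sentinel.json",
    "touch .done",
    "",
  ].join("\n");

  const claudeDir = path.join(worktreeDir, ".claude");
  fs.mkdirSync(claudeDir, { recursive: true });
  fs.writeFileSync(
    path.join(claudeDir, "settings.json"),
    JSON.stringify(settings, null, 2) + "\n",
  );

  const hookPath = path.join(worktreeDir, "hook-on-stop.sh");
  fs.writeFileSync(hookPath, script);
  fs.chmodSync(hookPath, 0o755); // the hook must be executable
}
```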
+## Driving the session
+The minimum protocol — translate to your actual tool calls:
+1. Create the worktree, write the hook config (above).
+2. `tmux new-session -d -s cc-<id> -c /tmp/cc-<id> claude`.
+3. Wait ~3 seconds for the TUI to initialize.
+4. `tmux send-keys -t cc-<id> "<your prompt>" Enter`.
+5. **Poll** for `/tmp/cc-<id>/.done` in a 500ms-cadence loop with a wall-clock budget (default 10 minutes). On every iteration, also check `tmux has-session -t cc-<id>` — if the session died, claude crashed or auth failed.
+6. When `.done` exists: `rm .done`, read `sentinel.json`, examine `last_assistant_message`.
+7. Decide using the multi-turn loop below.
+8. When done: `tmux send-keys -t cc-<id> "/exit" Enter && sleep 1 && tmux kill-session -t cc-<id>`.
+The full polling implementation, the ANSI-handling rules for `capture-pane` fallbacks, and the "tmux session died unexpectedly" recovery path are in `references/tmux-driving.md`.
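Steps 5 and 6 of the protocol could be sketched as follows. This is not the implementation from `references/tmux-driving.md`; `checkOnce`, `waitForTurn`, and the `TurnResult` shape are hypothetical names. The liveness check is injected as a callback so the loop can be exercised without tmux. `tmuxSessionAlive` shows one way to supply it, relying on `tmux has-session` exiting 0 when the session exists.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { spawnSync } from "node:child_process";

type TurnResult =
  | { kind: "done"; lastMessage: string }
  | { kind: "session-died" }
  | { kind: "timeout" };

// True while the tmux session exists (`tmux has-session` exits 0).
function tmuxSessionAlive(session: string): boolean {
  return spawnSync("tmux", ["has-session", "-t", session]).status === 0;
}

// One polling iteration. Checks .done before liveness so that a turn which
// finished just before the session ended is still captured.
function checkOnce(worktreeDir: string, isAlive: () => boolean): TurnResult | null {
  const donePath = path.join(worktreeDir, ".done");
  if (fs.existsSync(donePath)) {
    fs.rmSync(donePath); // reset the flag for the next turn
    const raw = fs.readFileSync(path.join(worktreeDir, "sentinel.json"), "utf8");
    const sentinel = JSON.parse(raw);
    return { kind: "done", lastMessage: String(sentinel.last_assistant_message ?? "") };
  }
  if (!isAlive()) return { kind: "session-died" };
  return null; // nothing yet, keep polling
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls at a fixed cadence until a turn ends, the session dies,
// or the wall-clock budget (default 10 minutes) runs out.
async function waitForTurn(
  worktreeDir: string,
  isAlive: () => boolean,
  budgetMs = 10 * 60 * 1000,
  intervalMs = 500,
): Promise<TurnResult> {
  const deadline = Date.now() + budgetMs;
  while (Date.now() < deadline) {
    const result = checkOnce(worktreeDir, isAlive);
    if (result !== null) return result;
    await sleep(intervalMs);
  }
  return { kind: "timeout" };
}
```

A caller would typically pass `() => tmuxSessionAlive("cc-<id>")` as `isAlive` and treat both `session-died` and `timeout` as failure paths that still run the cleanup steps.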
+## The multi-turn decision loop
+`Stop` fires every turn — including turns where claude paused to ask you a question, not just turns where claude finished the task. After every Stop sentinel, read `last_assistant_message` and decide:
+- **Ends with a question mark, or contains "Do you want me to", "Should I", "Could you clarify"** → claude is asking a clarifying question. Compose an answer from the original task brief and `send-keys` it back. Reset the loop: `rm .done`, poll again.
+- **Mentions a permission-style ask** ("May I run `<command>`?", "Allow me to edit `<file>`?") → answer per the task's safety constraints. If the constraint is unclear, abort with `/exit` and surface to the user — never invent a yes/no on the user's behalf for an unbounded operation.
+- **Looks like a final result** (code block + summary, or "Done.", "Here's the result.", "I've finished") → capture and `/exit`.
+- **Looks like a status update mid-tool-use** ("Let me check…", "Reading the file now…") → this is a spurious Stop (a Claude turn-boundary that isn't real task progress). Just `rm .done` and keep polling.
+**Hard turn cap: 8 turns per delegation.** Beyond that, either the task is too complex to delegate cleanly or claude is stuck in a loop. Abort with `/exit`, capture what you have, surface to the user with: "Claude took 8 turns without finishing — here's what it produced, what do you want to do?"
+This loop is the most failure-prone part of the skill. If you find yourself uncertain whether a message is a question or a result, **default to surfacing to the user**, not to guessing. Wrong answers compound across turns.
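The four cases can be approximated with a small classifier. Treat this as a sketch of the heuristic, not a reliable parser: `classifyTurn` and `turnCapReached` are hypothetical names, the phrase lists come straight from the bullets above, and two choices are assumptions. Permission-style asks are checked before the generic question test (they usually end in a question mark too), and anything that matches nothing is reported as `unclear` so the caller surfaces it to the user.

```typescript
// Hypothetical names for illustration; the phrase lists mirror the bullets above.
type TurnKind = "permission" | "question" | "final" | "status" | "unclear";

const MAX_TURNS = 8;

function classifyTurn(message: string): TurnKind {
  const text = message.trim();
  // Checked first: "May I run X?" would otherwise match the question rule.
  if (/\b(May I|Allow me to)\b/i.test(text)) return "permission";
  if (text.endsWith("?") || /Do you want me to|Should I|Could you clarify/i.test(text)) {
    return "question";
  }
  if (text.includes("```") || /Done\.|Here's the result|I've finished/i.test(text)) {
    return "final";
  }
  if (/^(Let me|Reading)\b/i.test(text)) return "status";
  // No rule matched: do not guess, surface to the user.
  return "unclear";
}

// True once the delegation has used up its turn budget.
function turnCapReached(turnsSoFar: number): boolean {
  return turnsSoFar >= MAX_TURNS;
}
```

The result feeds the decision, it does not make it: a `permission` turn still has to be weighed against the task's safety constraints before any answer is sent back.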
+## Capturing the output
+Four sources, in order of preference:
+1. **`git -C /agent diff main..cc-<id>`** (or plain `git diff main..cc-<id>` when you are already in `/agent`). This is the killer feature of the worktree model: the exact set of changes claude made, branch-vs-branch. Use this for code-change tasks.
+2. **`git -C /agent log --oneline main..cc-<id>`** for how claude got there (the sequence of commits). Useful when claude broke a refactor into steps you want to attribute or cherry-pick.
+3. **`sentinel.json` from the final turn** (`last_assistant_message`). The narrative summary claude gave you. Use this for analysis tasks where the answer is prose, not code.
+4. **The JSONL transcript** at `transcript_path` in the sentinel. The complete conversation including intermediate tool calls. Use when the diff/log aren't enough and you need to see how claude reasoned. Schema in `references/stop-hook.md`.
+For code-change tasks, the canonical pattern is:
+1. Read `last_assistant_message` for the summary.
+2. Run `git diff main..cc-<id> -- <files>` to see the actual changes.
+3. Decide: are these changes good? If yes, either `git cherry-pick <commits>` onto the agent folder's branch OR copy the changes manually into the main checkout and commit there with proper attribution (per `typeclaw-git`).
+4. Throw away the `cc-<id>` branch.
+Never paste Claude's output verbatim into your reply or a commit message. Summarize, attribute ("Claude Code's analysis: ..."), and stay accountable for the work. You delegated up; you didn't outsource ownership.
+## Cleanup discipline
+Cleanup is git-aware: a worktree isn't just a directory. Three steps, in order:
+```sh
+tmux kill-session -t cc-<id> 2>/dev/null || true
+git -C /agent worktree remove --force /tmp/cc-<id>
+git -C /agent branch -D cc-<id>
+```
+- **`tmux kill-session`** first because claude might still be holding files open. `|| true` because a clean `/exit` already killed the session.
+- **`git worktree remove --force`** because the working tree may have dirty files (the sentinel, the hook script, claude's in-progress edits). `--force` skips the "uncommitted changes" check; this is correct here because we're explicitly discarding the worktree.
+- **`git branch -D cc-<id>`** to delete the branch ref. Without this, `cc-<id>` lingers in `git branch -a` indefinitely. `-D` (capital) because `cc-<id>` is unmerged into anything you care about.
+Always do all three, including on failure paths. Orphan worktrees:
+- Show up in `git worktree list` forever.
+- Cause `git status` in the agent folder to mention "another worktree exists at /tmp/cc-<id>" if you `cd` somewhere related.
+- Make the next delegation with the same task-id fail with "branch already exists".
+Before starting a new delegation, check for orphans:
+```sh
+git -C /agent worktree list | grep cc-
+tmux ls 2>/dev/null | grep '^cc-'
+```
+Kill anything you find first.
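For a scripted orphan check, the machine-readable form `git -C /agent worktree list --porcelain` is easier to parse than the `grep` above. The sketch below assumes that output has already been captured as a string; `findDelegationWorktrees` is a hypothetical name. It returns the path of every worktree checked out on a `cc-*` branch.

```typescript
// Hypothetical helper for illustration. Input is the text printed by
// `git -C /agent worktree list --porcelain`, which describes each worktree as
// a block of lines: "worktree <path>", "HEAD <sha>", "branch <ref>".
function findDelegationWorktrees(porcelain: string): string[] {
  const found: string[] = [];
  let currentPath: string | null = null;
  for (const line of porcelain.split("\n")) {
    if (line.startsWith("worktree ")) {
      currentPath = line.slice("worktree ".length);
    } else if (line.startsWith("branch refs/heads/cc-") && currentPath !== null) {
      found.push(currentPath);
    }
  }
  return found;
}
```

Each returned path is a candidate for the three-step teardown before a new delegation starts.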
+## When not to delegate
+A re-statement, because this is where the skill is most often misused:
+- **Trivial edits**: the round-trip cost dominates. Do it yourself.
+- **Tasks needing live user visibility**: claude's tool calls don't stream back through TypeClaw. The user sees a long pause, not progress. Use your own tools.
+- **Tasks where you don't have the context to brief claude**: spend tokens narrowing the problem first. A vague delegation produces a vague result.
+- **Anything secret beyond `ANTHROPIC_API_KEY`**: claude only sees the prompt you send it and the files in its worktree (which is everything at `HEAD`). Don't try to pass secrets through the prompt — they'll land in claude's transcript and in your sentinel.
+## Things you must not do
+- **Do not use `claude -p` for delegation work.** The headless print mode strips plan mode, sub-agents, and the agent loop. The whole reason to delegate up is the loop. If you find yourself reaching for `-p`, the right answer is probably "do it yourself".
+- **Do not run `claude` directly inside `/agent`.** Always inside `/tmp/cc-<id>/`. Running claude in the agent folder lets it mutate the live working tree and break the user's session in flight.
+- **Do not skip the worktree.** Even for short delegations, the worktree is what gives you the `git diff` introspection and the rollback safety. Skipping it because "this one's small" is the path to claude accidentally committing on the wrong branch.
+- **Do not share a tmux session across two delegated tasks.** Each task needs its own worktree, its own session, and its own `.claude/settings.json`. Sharing corrupts the sentinel state and crosses transcripts.
+- **Do not leave a tmux session, worktree, or branch alive after capturing the result.** All three need explicit teardown. Reusing them defeats the per-task isolation that makes the Stop hook reliable.
+- **Do not push claude's branch to a remote.** `cc-<id>` is throwaway. If something useful happened, cherry-pick onto a real branch first; don't push the experimental branch directly.
+- **Do not merge claude's branch into main without reviewing the diff.** The `git diff main..cc-<id>` is your review surface. Skipping the diff and merging blindly means you don't actually know what shipped.
+- **Do not commit `/tmp/cc-<id>/` artifacts back to the agent folder.** The sentinel, the hook script, the captured pane content are scratch — they live in `/tmp/`, they die with `worktree remove`.
+- **Do not paste Claude's output verbatim into a commit message or a user reply.** Summarize and attribute. You're accountable for the work you ship.
+- **Do not put `ANTHROPIC_API_KEY` or `CLAUDE_CODE_OAUTH_TOKEN` in `typeclaw.json`, in a prompt, or in any committed file.** They live in `.env`, which is gitignored. Period.
+- **Do not poll the JSONL transcript directly as the done-signal.** The JSONL has documented race conditions (the file can be stale when `Stop` fires, or occasionally missing entirely). The sentinel is the reliable signal; the JSONL is for content, not lifecycle.
+- **Do not write to `.env` without `acknowledgeGuards: { nonWorkspaceWrite: true }`.** The guard will refuse, the agent loop will retry the same broken write, and you'll waste tokens fighting the guard. The ack is required every write, not just the first one.
+- **Do not edit `.env` with the `edit` tool's patch semantics.** Use read-modify-write: read the whole file, reconstruct the new content, write the whole file. `.env` is a flat KV store; a fragile `oldText` match could corrupt unrelated lines.
+- **Do not run `claude setup-token` inside the container.** It's a TUI OAuth flow that wants a browser. The container has no display, no browser, and is often on a different machine from the user anyway. Always have the user run `setup-token` on their own machine and paste the resulting token back; never spawn it in tmux on this side.
+- **Do not echo, log, or transcribe the pasted `CLAUDE_CODE_OAUTH_TOKEN` value back to the user, into a sentinel, into a commit message, or into any message you send.** It's a one-year credential. Confirm receipt with "got it, validating" — never with the token itself.
+- **Do not invent answers to Claude's clarifying questions.** If you can't derive the answer from the original task brief, surface the question to the user. Wrong answers compound across multi-turn delegations.
+- **Do not exceed 8 turns per delegation.** Abort, capture what you have, surface. Long delegations almost always mean the task wasn't shaped right.
+- **Do not assume `claude` exists.** If `which claude` returns empty, the `docker.file.claudeCode` toggle isn't on. Tell the user, don't try to install it yourself.
+## Cross-references
+- **`references/auth-flow.md`** — both auth paths in detail: the API-key recap, the OAuth user-machine flow (what to tell the user, what their `claude setup-token` output looks like, validation rules), and the failure-mode catalogue (expired subscription, wrong account, malformed paste).
+- **`references/tmux-driving.md`** — full polling implementation, ANSI handling, session-died recovery, the `capture-pane` fallback details, the worktree-is-not-scratch distinction.
+- **`references/stop-hook.md`** — complete `Stop` event JSON schema, `SubagentStop` differences, transcript JSONL schema (unofficial but reverse-engineered), documented race conditions to handle.
+- **`typeclaw-config`** — the `docker.file.claudeCode` toggle that gates the install.
+- **`typeclaw-git`** — commit discipline for any cherry-picks or hand-copies from claude's worktree back into the agent folder.
+- **`typeclaw-monorepo`** — the `workspace/` vs `packages/` distinction (this skill uses `/tmp/`, not `workspace/`, for reasons explained above).

package/src/skills/typeclaw-claude-code/references/auth-flow.md ADDED

@@ -0,0 +1,135 @@
+# Auth flow — interactive
+Deep dive for the auth paths. Read it when `SKILL.md`'s "First-time auth (interactive)" section sends you here, or when an auth attempt fails and you need to understand what went wrong.
+The two paths are intentionally symmetric: in both, the user produces one string on their side, pastes it to you, you validate it, you do read-modify-write on `.env`, you offer a restart. Only the credential differs.
+## Path A — API key (recap)
+The API key path lives entirely in `SKILL.md` because there's nothing to elaborate. Summary:
+1. Prompt user for `sk-ant-…`.
+2. Validate `/^sk-ant-[A-Za-z0-9_-]{20,}$/`.
+3. Read `.env`, merge `ANTHROPIC_API_KEY=<value>` into the parsed map, reconstruct full content, write with `acknowledgeGuards: { nonWorkspaceWrite: true }`.
+4. Verify.
+5. Ask before restart.
+When to recommend it: the user has an **Anthropic Console** workspace (API billing, no Claude subscription). They get their key from `console.anthropic.com`. Cost is metered per-token against the Console workspace.
+## Path B — OAuth long-lived token, generated on the user's machine
+This is the path for users with a Claude **Pro / Max / Team / Enterprise** subscription. Inference cost is bounded by the subscription's monthly Agent SDK credit pool, not per-token.
+The token is generated by `claude setup-token`, which is Anthropic's own one-time setup command. From the [official docs](https://code.claude.com/docs/en/authentication):
+> For CI pipelines, scripts, or other environments where interactive browser login isn't available, generate a one-year OAuth token with `claude setup-token`. The command walks you through OAuth authorization and prints a token to the terminal. It does not save the token anywhere; copy it and set it as the `CLAUDE_CODE_OAUTH_TOKEN` environment variable wherever you want to authenticate.
+A typeclaw container is precisely that environment ("CI pipelines, scripts, or other environments where interactive browser login isn't available"). The user runs `setup-token` on their own machine — where they already have `claude` installed and `/login`-ed — copies the printed token, and pastes it to you.
+### Why on the user's machine, not in the container
+This was originally implemented as an in-container tmux dance: agent spawns `claude setup-token` in a tmux pane, scrapes the URL with `capture-pane`, surfaces it to the user, brokers the auth code back, regex-extracts the token from the pane. It worked, barely. It cost ~150 lines of pane-capture mechanics, ANSI stripping, URL-or-code parsing, retry logic, and timing assumptions that broke on every Claude Code version bump.
+The user-machine flow is strictly better:
+1. **Zero in-container surface area.** No tmux session, no pane capture, no version-locked regex matching the prompt wording, no 30-second polling budget, no race between OAuth-code single-use and your retry loop.
+2. **The user already has `claude` installed locally.** They had to, to have a subscription worth using `setup-token` against. The marginal install cost is zero.
+3. **The browser is already on the user's machine.** No matter where the container lives — laptop, remote VM, shared workstation, cloud sandbox — the user's browser is where the user is. `setup-token` on the user's machine has a working `localhost:1455` callback by definition; no cross-device dance needed.
+4. **Failure modes are easier to debug.** When `setup-token` fails on the user's machine, the user sees the error directly. When it failed in the container, you had to surface a stripped-ANSI capture-pane snapshot and hope the user could decipher it.
+5. **The token has no network dependency from inside the container.** Once it's in `.env`, `claude` reads it from the environment on startup — no token-refresh round-trips, no `api.anthropic.com` reachability requirement at auth time (only at inference time, which the agent needs anyway).
+There is no remaining case where running `setup-token` inside the container is preferable. The only thing the container needs is the resulting token string.
+### Step-by-step
+1. **Confirm prerequisites with the user, in one message:**
+   > To set up OAuth auth, you'll generate a long-lived token on your own machine. Two prerequisites:
+   >
+   > 1. Do you have the `claude` CLI installed locally? If not: install from `claude.com/code`, then `claude login` with your Claude Pro / Max / Team / Enterprise account.
+   > 2. Do you have a paid Claude subscription? (`setup-token` requires Pro, Max, Team, or Enterprise — it doesn't work on free accounts.)
+   >
+   > Once both are true, reply "ready" and I'll send the next step.
+   This single confirmation up-front is the difference between a one-paste flow and a multi-turn debugging session when the user discovers mid-flow that their CLI isn't installed.
+2. **When the user confirms, send the generation instructions:**
+   > Great. On your machine, run:
+   >
+   > ```sh
+   > claude setup-token
+   > ```
+   >
+   > It opens a browser, you authorize with your Claude account, and then prints **one long token** on the terminal. It looks something like:
+   >
+   > ```
+   > sk-ant-oat01-<long random string>
+   > ```
+   >
+   > Copy the whole token (just the token, not any surrounding text) and paste it back to me. The token is valid for one year and authenticates against your Claude subscription — treat it like a password.
+3. **Wait for the user's reply.** Expected shapes:
+   - **A bare token string.** Typically starts with `sk-ant-oat01-` but Anthropic has changed the prefix before and may again — do not hardcode the prefix.
+   - **The full line including `CLAUDE_CODE_OAUTH_TOKEN=`** if they pasted the `export` line they wrote themselves. Strip the `CLAUDE_CODE_OAUTH_TOKEN=` prefix (and any leading `export `) before validating.
+   - **An error message** if `setup-token` failed on their side. See the failure-mode list below.
+   - **"cancel"** or equivalent. Drop the flow cleanly.
+4. **Parse and validate**, in order:
+   1. Trim leading/trailing whitespace.
+   2. If the string starts with `export ` (with the space), drop the `export ` prefix.
+   3. If the string starts with `CLAUDE_CODE_OAUTH_TOKEN=`, drop that prefix. Also strip surrounding single or double quotes that the pasted shell line may have included.
+   4. Validate the remainder: `/^[A-Za-z0-9_-]{30,}$/`. Tokens are opaque alphanumeric blobs with `_` and `-` only (current observed prefix is `sk-ant-oat01-` but Anthropic has changed prefixes before — validate by shape, not prefix). If the token format ever grows to include `.`, `/`, or other characters, this regex will reject valid tokens; widen the character class then, not preemptively.
+   5. If validation fails, ask once more: "That doesn't look like a `claude setup-token` token — it should be one long string with no spaces or newlines. Paste just the token. Or say 'cancel' to switch to API-key auth instead."
+   6. If the second attempt also fails, drop OAuth and recommend the API-key path.
+5. **Confirm receipt without echoing the token.** Reply something like "Got it, validating and writing to `.env`." Never include the token in your reply, in a log line, in a sentinel, in a commit message, or anywhere else.
+6. **Read `.env`.** The file may not exist yet; treat a missing file as empty.
+7. **Parse into a key→value map.** Be tolerant of comments (`#`-prefixed lines), blank lines, and quoted values. Preserve order and comments when reconstructing.
+8. **Merge `CLAUDE_CODE_OAUTH_TOKEN=<value>`.** Add if absent, replace if present. Do not quote the value: Docker's `--env-file` parser keeps quote characters as part of the value, and validation already guarantees the token contains no whitespace.
+9. **Write back with `acknowledgeGuards: { nonWorkspaceWrite: true }`.** `.env` is in the `nonWorkspaceWrite` guard's deny set; the ack flag is required on every write to it, not just the first.
+10. **Verify by re-reading `.env`** and confirming the new line is there exactly once and the value matches what you wrote.
+11. **Ask before restart**, same prompt as the API-key path:
+    > Auth is on disk. The container needs to restart to load it (TUI will briefly disconnect). May I restart now, or do you have other changes to make first?
+12. On yes → call the `restart` tool. On no → tell them to run `typeclaw restart` themselves when ready.
+13. **Done.** There is no auth scratch directory, no tmux session to tear down, no worktree. The OAuth path has the same on-disk footprint as the API-key path: one new line in `.env`.
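The parse-and-validate step (4) and the merge step (7 and 8) above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `normalizeToken` and `mergeEnvLine` are hypothetical helper names, the regex is the one given in step 4, and the `.env` handling assumes `\n` line endings.

```typescript
// Hypothetical helpers illustrating steps 4, 7, and 8; these names are not part of typeclaw.
const TOKEN_SHAPE = /^[A-Za-z0-9_-]{30,}$/;
const ENV_KEY = "CLAUDE_CODE_OAUTH_TOKEN";

/** Returns the cleaned token, or null if the paste does not look like a token. */
function normalizeToken(raw: string): string | null {
  let s = raw.trim();
  if (s.startsWith("export ")) s = s.slice("export ".length).trimStart();
  if (s.startsWith(`${ENV_KEY}=`)) s = s.slice(ENV_KEY.length + 1);
  // Strip one pair of matching surrounding quotes, if present.
  const first = s.charAt(0);
  if (s.length >= 2 && (first === '"' || first === "'") && s.endsWith(first)) {
    s = s.slice(1, -1);
  }
  // Validate by shape (length + character class), never by prefix.
  return TOKEN_SHAPE.test(s) ? s : null;
}

/** Adds or replaces the token line, leaving comments, blank lines, and other keys in place. */
function mergeEnvLine(envText: string, token: string): string {
  const newLine = `${ENV_KEY}=${token}`; // unquoted, per step 8
  const lines = envText === "" ? [] : envText.replace(/\n$/, "").split("\n");
  const out: string[] = [];
  let replaced = false;
  for (const line of lines) {
    const isTokenLine = /^\s*(export\s+)?CLAUDE_CODE_OAUTH_TOKEN\s*=/.test(line);
    if (!isTokenLine) {
      out.push(line);
    } else if (!replaced) {
      out.push(newLine); // replace the first occurrence in place
      replaced = true;
    }
    // Later duplicates are dropped so the key appears exactly once (step 10).
  }
  if (!replaced) out.push(newLine);
  return out.join("\n") + "\n";
}
```

A `null` from `normalizeToken` maps to the "ask once more" branch in step 4; the merged text is what gets written back in step 9, still subject to the `acknowledgeGuards` requirement.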
+## Failure modes on the user's side
+These all surface as the user's reply being an error message instead of a token. Recognize them, do not validate them as tokens, and respond with the matching guidance.
+- **"command not found: claude"** — they don't have the CLI installed locally. Point them at `claude.com/code` and ask them to `claude login` after installing.
+- **"Not logged in"** / `setup-token` immediately asking them to log in — they have the CLI but no active subscription session. Have them run `claude login` first, then re-try `claude setup-token`.
+- **"This account doesn't have access to a paid Claude subscription"** — they're on a free account. `setup-token` requires Pro / Max / Team / Enterprise. They either upgrade or use the API-key path.
+- **"Token request failed"** / generic network error during `setup-token` — their local machine couldn't reach `claude.ai` or `api.anthropic.com`. Check VPN, firewall, corporate proxy. Re-try in a moment.
+- **Browser opened but no token appeared in terminal** — they authorized in the wrong account, or they closed the tab before the callback completed. Have them run `setup-token` again and wait for the terminal to finish.
+- **They report success but pasted a string that fails validation** — most likely they pasted the surrounding output (the `export` line, a banner, instructions) rather than just the token. Re-ask, emphasize "just the token, no surrounding text".
+## Failure modes after you've written the token
+- **`typeclaw restart` fails or the container won't come up** — the credential is on disk, the restart is the problem. Don't re-prompt for auth; surface the restart failure and tell the user to run `typeclaw restart` from their host shell to see the underlying error.
+- **`claude` invocations after restart still say "Invalid API key" / "Unauthorized"** — token validation passed locally but the credential is rejected upstream. Three likely causes:
+  1. **Token from a different account than expected.** The user has multiple Claude accounts on their local machine and `setup-token` used the wrong one. Have them check `claude login` and re-run `setup-token` from the right account.
+  2. **`ANTHROPIC_API_KEY` is also set in `.env` and takes precedence.** Per the [auth precedence rules](https://code.claude.com/docs/en/authentication#authentication-precedence), `ANTHROPIC_API_KEY` outranks `CLAUDE_CODE_OAUTH_TOKEN`. Check `.env`; remove the stale `ANTHROPIC_API_KEY` line if the user wants OAuth.
+  3. **Token expired or revoked.** Tokens are valid for one year; revocation happens if the user `/logout`s from the subscription that issued the token. Have them re-run `setup-token`.
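The second cause above (a stale `ANTHROPIC_API_KEY` outranking the OAuth token) is easy to check mechanically before asking the user anything. The sketch below is illustrative only: `envDefinesKey` and `oauthIsShadowed` are hypothetical names, and it assumes any uncommented `KEY=` line counts as "set", whatever its value.

```typescript
/** True if an uncommented line in the .env text assigns the given key. */
function envDefinesKey(envText: string, key: string): boolean {
  return envText.split("\n").some((rawLine) => {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) return false; // skip blanks and comments
    const body = line.startsWith("export ")
      ? line.slice("export ".length).trimStart()
      : line;
    const eq = body.indexOf("=");
    return eq > 0 && body.slice(0, eq).trim() === key;
  });
}

/** True when both credentials are present, so the API key would take precedence. */
function oauthIsShadowed(envText: string): boolean {
  return (
    envDefinesKey(envText, "CLAUDE_CODE_OAUTH_TOKEN") &&
    envDefinesKey(envText, "ANTHROPIC_API_KEY")
  );
}
```

If `oauthIsShadowed` returns true, confirm with the user that they want OAuth before removing the `ANTHROPIC_API_KEY` line.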
+## Things you must not do during auth
+- **Do not run `claude setup-token` inside the container.** Use the user-machine flow above. The in-container tmux dance that this skill used to recommend has been removed; it was strictly worse than asking the user to run one command on their machine.
+- **Do not log, echo, paste-back, or otherwise transcribe the user's token.** Not in a confirmation message, not in a sentinel, not in a commit. Leaking a credential that stays valid for a year costs far more than skipping a "did you mean this?" confirmation, so there is no good reason for the token to leave the `.env` write path.
+- **Do not write the token to `.env` until you've validated its format.** A malformed token quietly turns into a broken auth that surfaces only on the next claude invocation, long after the user has moved on.
+- **Do not retry validation more than once.** If the first paste fails the regex, ask once with clearer guidance ("just the token, no surrounding text"). If the second also fails, the user is in a state that text instructions won't resolve — drop the OAuth path and recommend API-key auth.
+- **Do not advise the user to `typeclaw shell` and run `claude setup-token` inside the container as a "fallback"**. It does not work — the container has no browser. If `setup-token` is failing on the user's machine, the right fix is on their machine (check `claude login`, check network), not switching the dance to the container.
+- **Do not assume the token format prefix.** Anthropic has changed the prefix on long-lived tokens before (the docs use generic placeholders like `your-token`). Validate by shape (length + character class), not by prefix.
+- **Do not write to `.env` without `acknowledgeGuards: { nonWorkspaceWrite: true }`.** Same guard contract as every other `.env` write.
+- **Do not patch-edit `.env`.** Read-modify-write the whole file. A fragile `oldText` match could corrupt unrelated lines.
+- **Do not branch on local-vs-remote container topology.** The user-machine flow is the same whether the container is on the user's laptop or on a remote host — the user runs `setup-token` on whatever local machine they're at, the token works in either container.

package/src/skills/typeclaw-claude-code/references/stop-hook.md ADDED

@@ -0,0 +1,99 @@
+# Stop hook — schema and gotchas
+Deep dive for the `Stop` lifecycle hook that powers the done-signal. Read it when the basic hook in `SKILL.md` isn't enough — when the transcript looks stale, the sentinel is malformed, or you need to extract intermediate tool calls from the JSONL.
+## What fires when
+Claude Code supports several lifecycle hooks. The two relevant to delegation are:
+- **`Stop`** — fires every time the _main_ agent finishes responding. This is per-turn, not per-session. A 5-turn conversation fires `Stop` five times. The "task is done" signal is just "the latest Stop, where claude's message looks like a result not a question" — that's the multi-turn decision loop in `SKILL.md`.
+- **`SubagentStop`** — fires when a _sub-agent_ (Task tool, plan-mode sub-agents, etc.) finishes. Sub-agents are claude spawning claude. You don't typically need to handle this — the parent's `Stop` fires after its sub-agents are done. Configure it only if you want progress signals during a sub-agent-heavy turn.
+Other hooks that exist (`PreToolUse`, `PostToolUse`, `Notification`, `SessionStart`, `SessionEnd`, `PreCompact`, etc.) are out of scope for this skill — they're useful for progress logging, command auditing, or session bookkeeping, but they're not the done-signal.
+## Stop event JSON schema
+The hook command receives a single JSON object on stdin. Fields observed in current Claude Code (subject to upstream churn — the docs page is at `https://docs.anthropic.com/en/docs/claude-code/hooks`):
+```jsonc
+{
+  "session_id": "abc123…", // The Claude Code session UUID
+  "transcript_path": "/root/.claude/projects/-tmp-cc-foo/abc123.jsonl",
+  "cwd": "/tmp/cc-foo", // Should match your worktree path
+  "permission_mode": "default", // or "plan", "bypassPermissions", etc.
+  "hook_event_name": "Stop", // Literal "Stop" for this event
+  "stop_hook_active": false, // True when claude is already continuing because of an earlier Stop hook
+  "last_assistant_message": "…", // The text of claude's just-finished turn
+}
+```
+Fields you actually use:
+- **`last_assistant_message`** — your primary capture for the multi-turn decision loop. Read this from `sentinel.json`, classify (question / permission / result / spurious), act.
+- **`transcript_path`** — points at the JSONL with the full conversation. Useful when `last_assistant_message` isn't enough.
+- **`cwd`** — sanity check. If this isn't `/tmp/cc-<id>`, something is wrong with your tmux spawn (likely missing `-c`).
+- **`session_id`** — useful for logging or if you want to correlate with the JSONL filename.
+Fields you ignore:
+- `permission_mode`, `stop_hook_active` — for hook-internal coordination, not delegation logic.
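For a reader of `sentinel.json` (or a hook written as a script), the fields listed above can be checked before use. This is a sketch only: the field names mirror the schema above, while `StopHookPayload`, `parseStopPayload`, and `cwdLooksRight` are hypothetical names, and treating every listed field as a required string is an assumption given that the schema is subject to upstream churn.

```typescript
// Field names mirror the schema above; the type and function names are hypothetical.
interface StopHookPayload {
  session_id: string;
  transcript_path: string;
  cwd: string;
  hook_event_name: string;
  last_assistant_message: string;
}

const REQUIRED_STRING_FIELDS = [
  "session_id",
  "transcript_path",
  "cwd",
  "hook_event_name",
  "last_assistant_message",
] as const;

/** Parses the hook's stdin JSON and rejects anything that is not a Stop payload. */
function parseStopPayload(stdinText: string): StopHookPayload {
  const data: unknown = JSON.parse(stdinText);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("hook payload is not a JSON object");
  }
  const rec = data as Record<string, unknown>;
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof rec[field] !== "string") {
      throw new Error(`hook payload field is missing or not a string: ${field}`);
    }
  }
  if (rec["hook_event_name"] !== "Stop") {
    throw new Error("not a Stop event");
  }
  return {
    session_id: rec["session_id"] as string,
    transcript_path: rec["transcript_path"] as string,
    cwd: rec["cwd"] as string,
    hook_event_name: rec["hook_event_name"] as string,
    last_assistant_message: rec["last_assistant_message"] as string,
  };
}

/** Sanity check from the list above: the session should run inside /tmp/cc-<id>. */
function cwdLooksRight(payload: StopHookPayload): boolean {
  return payload.cwd.startsWith("/tmp/cc-");
}
```

Failing loudly on a malformed payload keeps a broken sentinel from being mistaken for an empty result.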
+### SubagentStop deltas
+If you ever configure a `SubagentStop` hook, expect these additional fields:
+```jsonc
+{
+  "agent_id": "def456…",
+  "agent_transcript_path": "/root/.claude/projects/-tmp-cc-foo/abc123/subagents/agent-def456.jsonl",
+}
+```
+The schema is otherwise the same. `agent_transcript_path` is a separate JSONL per sub-agent.
+## The transcript JSONL
+`transcript_path` points at a JSONL file with one JSON object per line. Anthropic does not publish a formal schema — community tools (claudeoo, maury, serac) have reverse-engineered it. What you'll see:
+- **`{ "type": "user", "message": { … } }`** — what you sent claude (or what the upstream parent sent, for sub-agents).
+- **`{ "type": "assistant", "message": { "content": [ … ] } }`** — claude's response. `content` is an array of `{ "type": "text", "text": "…" }` and `{ "type": "tool_use", … }` objects.
+- **`{ "type": "tool_use", "name": "Read", "input": { … } }`** — tool calls claude made.
+- **`{ "type": "tool_result", "tool_use_id": "…", "content": "…" }`** — tool results.
+- **`{ "type": "system", "subtype": "…" }`** — system events: `compact_boundary`, `turn_duration`, `stop_hook_summary`, etc.
+- **`{ "type": "attachment", … }`** — file uploads or contextual attachments.
+To extract claude's final text answer when `last_assistant_message` isn't enough:
+```sh
+# Print the text of every assistant text block in the JSONL
+jq -r 'select(.type == "assistant") | .message.content[] | select(.type == "text") | .text' "$transcript_path"
+```
+Filter further by timestamp or message-id if you only want the last turn.
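To keep only the last turn without `jq`, the same selection can be done in code. The sketch below assumes the reverse-engineered entry shapes described above (`type: "assistant"` with a `message.content` array of text blocks); `lastAssistantText` is a hypothetical helper, and it skips lines that fail to parse on the assumption that a trailing line may be only partially flushed.

```typescript
type TextBlock = { type: "text"; text: string };

function isTextBlock(v: unknown): v is TextBlock {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return r["type"] === "text" && typeof r["text"] === "string";
}

/** Returns the text of the last assistant entry that contains text, or null if none. */
function lastAssistantText(jsonl: string): string | null {
  let last: string | null = null;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or partially written line
    }
    if (typeof entry !== "object" || entry === null) continue;
    const rec = entry as Record<string, unknown>;
    if (rec["type"] !== "assistant") continue;
    const message = rec["message"];
    if (typeof message !== "object" || message === null) continue;
    const content = (message as Record<string, unknown>)["content"];
    if (!Array.isArray(content)) continue;
    const text = content
      .filter(isTextBlock)
      .map((block) => block.text)
      .join("\n");
    if (text !== "") last = text; // later entries overwrite earlier ones
  }
  return last;
}
```

Assistant entries that contain only `tool_use` blocks are ignored, so the result is the last thing claude actually said rather than its last tool call.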
+## Documented race conditions
+Three known races, all from upstream Claude Code issues. The skill body's design avoids them by preferring the sentinel over the JSONL; this is the reasoning if you have to debug:
+1. **Stale transcript on Stop (#15813).** The `Stop` hook can fire before the last assistant message is flushed to the JSONL. If you read `transcript_path` immediately on hook fire, the last message may not be there yet. **Mitigation:** use `last_assistant_message` from the hook's stdin JSON as the primary capture; treat the JSONL as the backup, with a 1–2 second wait if it looks stale.
+2. **Missing transcript file (#20612, #30217).** Some users report `transcript_path` pointing at a file that doesn't exist, especially in multi-session or concurrent-worktree setups. **Mitigation:** capture `last_assistant_message` on every Stop and accumulate it yourself if you need the full history. Falling back to `tmux capture-pane -S -` is the last-resort path.
+3. **Inaccurate final token counts (#27361).** The JSONL has historically missed the final `message_stop` SSE event, causing `output_tokens` to be a mid-stream snapshot (sometimes undercounted by ~2x). **Mitigation:** don't rely on JSONL token counts for cost calculations; the Anthropic Console workspace usage is the authoritative source.
+## Permission prompts vs Stop
+A subtle point that confuses the multi-turn decision loop: **permission prompts do not fire `Stop`**. When claude is waiting for an "Allow this command?" yes/no, the turn isn't over: the model is waiting for the _user_ (you) to type y/n into the TUI, not waiting for a new prompt. So:
+- **Permission prompt appears** → no `Stop`, no `.done`, you keep polling.
+- **You answer the prompt** (via `tmux send-keys "y" Enter`) → claude continues working, eventually finishes its turn → `Stop` fires.
+This is why the multi-turn loop's classification is "ends with question mark / contains 'Do you want me to'" — it's looking for _content-level_ questions claude wrote as part of its response, not permission-tool prompts. Permission prompts don't reach `last_assistant_message`; they only appear in the pane.
+If you need to detect permission prompts (to auto-answer them), `capture-pane` is the only signal. Look for the literal yes/no UI affordance at the bottom of the pane. This is risky to automate: you would be answering on the user's behalf for an operation whose full safety implications you can't see. Default behavior in this skill: poll for the sentinel, look at the pane only once the budget elapses without a `.done`, and if a permission prompt is sitting there, surface it to the user rather than auto-answering.
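The content-level classification described above ("ends with question mark / contains 'Do you want me to'") can be sketched as a small pure function. This is an illustrative sketch, not the loop in `SKILL.md`: `classifyTurn` is a hypothetical name, and treating an empty message as `"spurious"` is an assumption made here. Permission prompts are deliberately absent, since they never reach `last_assistant_message`.

```typescript
type TurnKind = "question" | "result" | "spurious";

/** Classifies the text of claude's just-finished turn using the two heuristics above. */
function classifyTurn(lastAssistantMessage: string): TurnKind {
  const text = lastAssistantMessage.trim();
  if (text === "") return "spurious"; // assumption: an empty turn carries no signal
  if (text.endsWith("?") || text.includes("Do you want me to")) return "question";
  return "result";
}
```

A `"question"` means the delegating agent should answer and wait for the next `Stop`; a `"result"` means the task can be treated as done.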
+## Things you must not do with the Stop hook
+- **Do not set `matcher` to anything other than `"*"`.** The matcher filters by hook tool name; for `Stop`, there's no tool — `"*"` is the canonical "fire on every Stop". Other values may silently never match.
+- **Do not put long-running commands in the hook.** The hook runs synchronously on the Claude Code main loop; a slow hook blocks the user's next prompt. Write the payload + touch a flag + exit. Anything heavier belongs in your polling loop, not the hook.
+- **Do not skip the temp-file rename pattern.** Writing `sentinel.json` directly with `>` lets readers see partial JSON if they poll mid-write. Always `cat > sentinel.json.tmp && mv sentinel.json.tmp sentinel.json`.
+- **Do not delete `transcript_path` from inside the hook.** The path is shared with `SessionEnd` and other lifecycle events; deleting it breaks downstream hooks.
+- **Do not log the full hook payload to a place you don't control.** It contains `last_assistant_message`, which can contain anything claude said — including code, secrets the user pasted, or private context. Sentinel is fine (it's in `/tmp/`); piping to a shared log is not.
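If the hook command is a small script rather than a shell one-liner, the temp-file rename pattern from the list above looks roughly like this. A minimal sketch, assuming Node's `fs` and `path` modules and a temp file in the same directory as the final file; `writeSentinelAtomically` is a hypothetical name, and touching the `.done` flag is left out.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Writes sentinel.json so that readers never observe a partially written file. */
function writeSentinelAtomically(dir: string, payload: unknown): string {
  const finalPath = path.join(dir, "sentinel.json");
  const tmpPath = `${finalPath}.tmp`;
  // Write the full payload to a sibling temp file first.
  fs.writeFileSync(tmpPath, JSON.stringify(payload));
  // Renaming within one directory replaces the target in a single step on POSIX filesystems.
  fs.renameSync(tmpPath, finalPath);
  return finalPath;
}
```

Both calls are synchronous and short, which fits the rule above about keeping the hook fast: write the payload, then exit.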