npm - typeclaw - Versions diffs - 0.11.1 → 0.13.0 - Mend

typeclaw 0.11.1 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53) hide show

package/README.md +1 -1
package/package.json +1 -1
package/scripts/dump-system-prompt.ts +12 -11
package/src/agent/index.ts +15 -22
package/src/agent/loop-guard.ts +170 -0
package/src/agent/model-fallback.ts +2 -1
package/src/agent/multimodal/index.ts +1 -1
package/src/agent/multimodal/look-at.ts +118 -55
package/src/agent/plugin-tools.ts +57 -0
package/src/agent/subagents.ts +2 -1
package/src/agent/system-prompt.ts +28 -25
package/src/agent/tools/channel-fetch-attachment.ts +45 -16
package/src/agent/tools/normalize-ref.ts +11 -0
package/src/bundled-plugins/reviewer/index.ts +11 -0
package/src/bundled-plugins/reviewer/reviewer.ts +171 -0
package/src/bundled-plugins/reviewer/skills/code-review.ts +73 -0
package/src/bundled-plugins/reviewer/skills/general.ts +68 -0
package/src/channels/adapters/discord-bot-classify.ts +32 -24
package/src/channels/adapters/github/inbound.ts +19 -2
package/src/channels/adapters/kakaotalk-attachment.ts +140 -133
package/src/channels/adapters/kakaotalk-classify.ts +8 -1
package/src/channels/adapters/kakaotalk.ts +19 -11
package/src/channels/adapters/slack-bot-classify.ts +30 -14
package/src/channels/adapters/slack-bot.ts +3 -2
package/src/channels/adapters/telegram-bot-classify.ts +36 -13
package/src/channels/adapters/telegram-bot.ts +3 -3
package/src/channels/outbound-flood-filter.ts +57 -0
package/src/channels/router.ts +93 -5
package/src/channels/types.ts +52 -1
package/src/cli/builtins.ts +2 -0
package/src/cli/index.ts +2 -0
package/src/cli/mount.ts +157 -0
package/src/cli/update.ts +84 -0
package/src/config/mounts-mutation.ts +161 -0
package/src/init/hatching.ts +1 -1
package/src/plugin/index.ts +6 -0
package/src/plugin/load-skill.ts +99 -0
package/src/run/bundled-plugins.ts +2 -0
package/src/run/index.ts +14 -1
package/src/secrets/codex-auth-json.ts +67 -0
package/src/secrets/export-codex-auth-file.ts +243 -0
package/src/secrets/index.ts +6 -0
package/src/server/command-runner.ts +2 -1
package/src/server/index.ts +3 -2
package/src/shared/index.ts +7 -1
package/src/shared/local-time.ts +32 -0
package/src/skills/typeclaw-channel-github/SKILL.md +47 -13
package/src/skills/typeclaw-channel-kakaotalk/SKILL.md +10 -11
package/src/skills/typeclaw-channel-telegram-bot/SKILL.md +8 -0
package/src/skills/typeclaw-codex-cli/SKILL.md +2 -1
package/src/skills/typeclaw-codex-cli/references/auth-flow.md +22 -0
package/src/skills/typeclaw-kaomoji/SKILL.md +116 -0
package/src/update/index.ts +155 -0

package/src/agent/plugin-tools.ts CHANGED Viewed

@@ -31,11 +31,19 @@ import type {
   ToolResult,
 } from '@/plugin'
+import { createLoopGuard, type LoopGuard } from './loop-guard'
 import { checkImageReadRedirect } from './multimodal/read-redirect'
 import type { SessionOrigin } from './session-origin'
 import { webfetchTool } from './tools/webfetch'
 import { websearchTool } from './tools/websearch'
+// Process-wide loop guard. State is keyed by sessionId so concurrent sessions
+// don't interfere; the guard's own LRU bound keeps it from growing without
+// limit. Wrappers consult it before invoking the underlying tool so the
+// detector covers every tool category — plugin tools, TypeClaw system tools,
+// and pi-coding-agent builtins — through one chokepoint.
+let sharedLoopGuard: LoopGuard = createLoopGuard()
 const ACKNOWLEDGE_GUARDS_SCHEMA = Type.Optional(
   Type.Object(
     {
@@ -177,6 +185,11 @@ export function wrapPluginTool(tool: Tool<any>, opts: WrapToolOptions): ToolDefi
         return errorResult(`blocked: ${blockResult.reason}`)
       }
+      const loopDecision = sharedLoopGuard.check(opts.sessionId, opts.toolName, before.args)
+      if (loopDecision.kind === 'block') {
+        return errorResult(loopDecision.message)
+      }
       const toolCtx: ToolContext = {
         signal,
         sessionId: opts.sessionId,
@@ -192,6 +205,10 @@ export function wrapPluginTool(tool: Tool<any>, opts: WrapToolOptions): ToolDefi
         return errorResult(message)
       }
+      if (loopDecision.kind === 'warn') {
+        result = appendLoopWarning(result, loopDecision.message)
+      }
       await opts.hooks.runToolAfter({
         tool: opts.toolName,
         sessionId: opts.sessionId,
@@ -227,6 +244,10 @@ export function wrapSystemTool<TParams extends TSchema, TDetails = unknown, TSta
       if (blockResult !== undefined) {
         throw new Error(`blocked: ${blockResult.reason}`)
       }
+      const loopDecision = sharedLoopGuard.check(opts.sessionId, tool.name, mutableArgs)
+      if (loopDecision.kind === 'block') {
+        throw new Error(loopDecision.message)
+      }
       const guardResult = await runFinalWriteGuards({
         tool: tool.name,
         args: mutableArgs,
@@ -246,6 +267,11 @@ export function wrapSystemTool<TParams extends TSchema, TDetails = unknown, TSta
         content: result.content as ContentPart[],
         details: result.details,
       }
+      if (loopDecision.kind === 'warn') {
+        const warned = appendLoopWarning(hookResult, loopDecision.message)
+        hookResult.content = warned.content
+        hookResult.details = warned.details
+      }
       await opts.hooks.runToolAfter({
         tool: tool.name,
         sessionId: opts.sessionId,
@@ -280,6 +306,10 @@ export function wrapSystemAgentTool<TParams extends TSchema, TDetails = unknown>
       if (blockResult !== undefined) {
         throw new Error(`blocked: ${blockResult.reason}`)
       }
+      const loopDecision = sharedLoopGuard.check(opts.sessionId, tool.name, mutableArgs)
+      if (loopDecision.kind === 'block') {
+        throw new Error(loopDecision.message)
+      }
       const guardResult = await runFinalWriteGuards({
         tool: tool.name,
         args: mutableArgs,
@@ -299,6 +329,11 @@ export function wrapSystemAgentTool<TParams extends TSchema, TDetails = unknown>
         content: result.content as ContentPart[],
         details: result.details,
       }
+      if (loopDecision.kind === 'warn') {
+        const warned = appendLoopWarning(hookResult, loopDecision.message)
+        hookResult.content = warned.content
+        hookResult.details = warned.details
+      }
       await opts.hooks.runToolAfter({
         tool: tool.name,
         sessionId: opts.sessionId,
@@ -340,6 +375,10 @@ export function wrapAgentToolAsCustomToolDefinition<TParams extends TSchema, TDe
       if (blockResult !== undefined) {
         throw new Error(`blocked: ${blockResult.reason}`)
       }
+      const loopDecision = sharedLoopGuard.check(opts.sessionId, tool.name, mutableArgs)
+      if (loopDecision.kind === 'block') {
+        throw new Error(loopDecision.message)
+      }
       const guardResult = await runFinalWriteGuards({
         tool: tool.name,
         args: mutableArgs,
@@ -359,6 +398,11 @@ export function wrapAgentToolAsCustomToolDefinition<TParams extends TSchema, TDe
         content: result.content as ContentPart[],
         details: result.details,
       }
+      if (loopDecision.kind === 'warn') {
+        const warned = appendLoopWarning(hookResult, loopDecision.message)
+        hookResult.content = warned.content
+        hookResult.details = warned.details
+      }
       await opts.hooks.runToolAfter({
         tool: tool.name,
         sessionId: opts.sessionId,
@@ -381,6 +425,19 @@ export function buildBuiltinPiToolOverrides(opts: WrapSystemToolOptions): ToolDe
   return defaultBuiltinPiAgentTools().map((tool) => wrapAgentToolAsCustomToolDefinition(tool, opts))
 }
+function appendLoopWarning(result: ToolResult, message: string): ToolResult {
+  const content: ContentPart[] = [...(result.content as ContentPart[]), { type: 'text', text: message }]
+  return { content, details: result.details }
+}
+// Test-only seam: swaps the shared loop guard for a fresh instance so tests
+// that reuse sessionIds across cases don't see cross-test streak counts.
+// Production code never calls this; the guard's LRU bound handles
+// long-running processes.
+export function __resetSharedLoopGuardForTests(): void {
+  sharedLoopGuard = createLoopGuard()
+}
 function errorResult(message: string) {
   return {
     content: [{ type: 'text' as const, text: message }],

package/src/agent/subagents.ts CHANGED Viewed

@@ -7,6 +7,7 @@ import type { Stream, Unsubscribe } from '@/stream'
 import { type AgentSession, createSession } from './index'
 import { subscribeProviderErrors } from './provider-error'
 import type { SessionOrigin } from './session-origin'
+import { renderTurnTimeAnchor } from './system-prompt'
 import type { ToolResultBudget } from './tool-result-budget'
 type AgentSessionTools = NonNullable<Parameters<typeof createSession>[0]>['tools']
@@ -226,7 +227,7 @@ export async function invokeSubagent(name: string, options: InvokeSubagentOption
         await hooks.runSessionTurnStart({ ...turnEvent, userPrompt: userPromptForTurn })
       }
       try {
-        await session.prompt(userPromptForTurn)
+        await session.prompt(`${renderTurnTimeAnchor()}\n\n${userPromptForTurn}`)
       } finally {
         if (hooks && turnEvent !== undefined) {
           await hooks.runSessionTurnEnd(turnEvent)

package/src/agent/system-prompt.ts CHANGED Viewed

@@ -1,4 +1,4 @@
-import { formatLocalDateTime, resolveLocalTimezoneName } from '@/shared'
+import { formatLocalDateTime, formatLocalWeekday, resolveLocalTimezoneName } from '@/shared'
 export const DEFAULT_SYSTEM_PROMPT = `You are a general-purpose AI agent running inside TypeClaw.
@@ -66,6 +66,8 @@ The bundled \`explorer\` subagent is the right tool for **local** reconnaissance
 The bundled \`scout\` subagent is its external counterpart — web research only. Use it when you need information from public sources (docs, library references, vendor changelogs, news, anything not already in this agent's folder). Scout runs \`websearch\` and \`webfetch\` in a fresh context window so the search churn does not pollute yours; it returns a citation-backed answer with a confidence rating. Prefer scout over running \`websearch\`/\`webfetch\` yourself when the research is non-trivial (more than 1-2 queries) or when you want to save your context for the synthesis step.
+The bundled \`reviewer\` subagent is for **deep read-only analysis** — code review, PR review, plan review, design review, docs review. It runs on the \`deep\` profile (falls back to \`default\` if \`models.deep\` is unconfigured) so it can spend tokens on careful reasoning. It has the read-only filesystem tools, \`bash\` (for \`gh pr diff\`, \`git log\`, \`git diff\`, \`gh api -X GET\`, etc.), and the web tools (for verifying claims against OWASP, RFCs, library docs). It returns a structured \`<review>\` block with findings (severity \`blocker\`/\`concern\`/\`nit\`/\`praise\`, evidence quotes, suggestions) and a verdict (\`approve\`/\`request-changes\`/\`comment\`). Reviewer does NOT post — when reviewing a PR for a channel that wants comments posted, YOU translate its findings into \`gh api\` review-comment payloads and post them yourself. Use reviewer instead of doing review work in your own session whenever the target is non-trivial: a single-file lookup or a one-paragraph sanity check stays with you; a real PR, a multi-page design doc, a non-trivial plan — delegate.
 **Mode B — Delegate-and-converse** (the user asked you to DO something long-running)
 When the user hands you a task that will take minutes (a multi-step browser session, a long build, a complex external operation), acknowledge in plain language ("Alright, running that in the background — I'll let you know when it's done"), spawn one subagent with \`run_in_background: true\`, then KEEP TALKING. Stay available for follow-ups, related questions, parallel small tasks. When the completion reminder lands, weave the result into your next reply naturally. If the conversation has gone idle, proactively message the user with the result rather than waiting.
@@ -123,34 +125,35 @@ export function renderRuntimeBlock(version: string): string {
 TypeClaw runtime version: ${version}.`
 }
-// Wall-clock anchor for the agent. Without this, models hallucinate the
-// current time (typically defaulting to a UTC-shaped guess from training
-// data), which surfaces as confidently-wrong replies like "it's 6am" when
-// the actual wall-clock is 15:11 +09:00. The container's clock is correct
-// — `-e TZ=<host-tz>` propagation makes `new Date()` resolve to host local
-// time — but the model never sees that value unless we put it in the
-// prompt.
+// Wall-clock anchor injected into the **user turn**, not the system prompt.
+//
+// Why per-turn instead of session-creation: long-lived channel sessions can
+// outlive a session-creation timestamp by days (a session opened Friday and
+// woken Thursday morning happily reports "today is Friday" because the only
+// dated reference in its context is the stale stamp). The per-turn anchor
+// always reflects the moment the turn is about to be sent, so the model
+// answers "what day is it" against `new Date()` rather than against the
+// session-creation snapshot.
 //
-// Positioned as the very last block of the system prompt (after memory)
-// because it changes on every session creation, which is more frequent
-// than any other section: memory changes per dreaming/memory-logger cycle,
-// gitNudge changes per session, but `now` changes per second. Pinning it
-// to the tail means every byte UP TO this block stays in the provider's
-// cache prefix across session resurrections, and only the trailing ~60
-// bytes invalidate.
+// Why this still respects the prompt cache: the user turn is the only
+// non-cacheable suffix in every provider's KV cache shape. Putting the
+// anchor here invalidates exactly zero cached bytes — the same bytes that
+// would already be re-billed on each turn's user message — so this is
+// cache-free relative to the previous "## Now" placement.
 //
-// The model still needs to know this is a session-creation snapshot, not
-// a live clock: long-lived channel sessions can outlive the stamp by
-// hours, and the resource loader is not re-rendered per turn (see the
-// CreateSessionOptions doc at the top of src/agent/index.ts). The prose
-// names the snapshot semantics and tells the model how to get a fresh
-// reading when it matters (run `date` via bash).
-export function renderNowBlock(now: Date): string {
+// The block emits both English and Korean weekday names alongside the ISO
+// timestamp because models replying in a non-English language frequently
+// compute weekday-from-ISO incorrectly; pre-computing the weekday in both
+// candidate reply languages removes that arithmetic step entirely. The
+// framing is a single `<current-time>` XML tag for parity with other
+// runtime-injected per-turn blocks the agent already sees
+// (`<system-reminder>` etc.), so the model reads it as a structured anchor
+// rather than as content authored by a human in the chat.
+export function renderTurnTimeAnchor(now: Date = new Date()): string {
   const iso = formatLocalDateTime(now)
   const zone = resolveLocalTimezoneName()
-  return `## Now
-Session started at \`${iso}\` (${zone}). This is a session-creation snapshot, not a live clock — the value above does not advance during this session. If you need the current wall-clock time precisely (e.g. before scheduling a cron, replying with "it's 3pm", or computing a deadline), run \`date\` via bash instead of trusting this stamp; the container's timezone is set to the host's, so \`date\` returns the user's local time.`
+  const weekday = formatLocalWeekday(now)
+  return `<current-time>${iso} (${zone}, ${weekday.en} / ${weekday.ko})</current-time>`
 }
 // Compact replacement for DEFAULT_SYSTEM_PROMPT, used by non-interactive

package/src/agent/tools/channel-fetch-attachment.ts CHANGED Viewed

@@ -8,9 +8,13 @@ import type { ChannelRouter } from '@/channels/router'
 import type { AdapterId } from '@/channels/schema'
 import { type ChannelToolLogger, consoleChannelLogger, formatChannelToolFailure } from './channel-log'
+import { normalizeRef } from './normalize-ref'
 export type ChannelFetchAttachmentOrigin = {
   adapter: AdapterId
+  workspace: string
+  chat: string
+  thread: string | null
 }
 export type CreateChannelFetchAttachmentToolOptions = {
@@ -34,18 +38,16 @@ export function createChannelFetchAttachmentTool({
     name: 'channel_fetch_attachment',
     label: 'Channel Fetch Attachment',
     description:
-      'Download a file the user attached to the inbound channel message and save it to disk. Inbound channel ' +
-      'messages with uploads carry a `[<Platform> message with attachment: <name> (<mime>) <ref>]` summary — pass ' +
-      "the literal `<ref>` value as `ref`. For Slack the ref looks like `id=Fxxxx` (use `Fxxxx`); for Discord it's " +
-      'the full `https://cdn.discordapp.com/...` URL. The tool authenticates with the channel adapter (Slack ' +
-      'url_private requires the bot token; Discord CDN URLs are signed and expire ~24h, so fetch promptly). On ' +
-      'success returns the absolute path of the saved file plus its detected mimetype and size. On failure returns ' +
-      'the upstream error verbatim.',
+      'Download a file the user attached to the current inbound channel message and save it to disk. Inbound channel ' +
+      'messages with attachments show `[<Platform> attachment #N: <kind> <metadata>]` in the text. Pass `N` as ' +
+      '`attachment_id`; do not invent ids that are not present in the inbound message. The router validates the id ' +
+      'against the current turn and resolves the private platform ref itself. On success returns the absolute path ' +
+      'of the saved file plus its detected mimetype and size.',
     parameters: Type.Object({
-      ref: Type.String({
+      attachment_id: Type.Integer({
         description:
-          'Slack: the file id `Fxxxx` (with or without the `id=` prefix). Discord: the full `https://cdn.discordapp.com/...` or `https://media.discordapp.net/...` URL.',
-        minLength: 1,
+          'The number N from the inbound `[<Platform> attachment #N: ...]` placeholder. Must be present in this turn.',
+        minimum: 1,
       }),
       filename: Type.Optional(
         Type.String({
@@ -58,10 +60,38 @@ export function createChannelFetchAttachmentTool({
     async execute(_toolCallId, params) {
       type Details = { ok: boolean; error?: string; path?: string; mimetype?: string; size?: number }
-      const ref = normalizeRef(params.ref)
+      const found = router.lookupInboundAttachment({
+        adapter,
+        workspace: origin.workspace,
+        chat: origin.chat,
+        thread: origin.thread,
+        id: params.attachment_id,
+      })
+      if (found === null) {
+        const validIds = router.listInboundAttachmentIds({
+          adapter,
+          workspace: origin.workspace,
+          chat: origin.chat,
+          thread: origin.thread,
+        })
+        const validMsg =
+          validIds.length === 0
+            ? 'no attachments are present in the current turn'
+            : `valid attachment_ids in this turn: ${validIds.join(', ')}`
+        return errorResult(
+          `no attachment with id=${params.attachment_id} in this turn (${validMsg}). Do not call channel_fetch_attachment for attachments that do not appear in the inbound message — they do not exist.`,
+        )
+      }
+      if (found.ref === '') {
+        return errorResult(
+          `attachment #${params.attachment_id} (${found.kind}) has no fetchable ref — likely a sticker or an upstream payload without a public URL. Acknowledge the user but do not promise to view it.`,
+        )
+      }
+      const ref = normalizeRef(found.ref)
+      const filename = params.filename ?? found.filename
       const result = await router.fetchAttachment(adapter, {
         ref,
-        ...(params.filename !== undefined ? { filename: params.filename } : {}),
+        ...(filename !== undefined ? { filename } : {}),
       })
       if (!result.ok) {
         logger.warn(formatChannelToolFailure('channel_fetch_attachment', `${adapter}: ${result.error}`))
@@ -98,10 +128,9 @@ export function createChannelFetchAttachmentTool({
   })
 }
-function normalizeRef(ref: string): string {
-  const trimmed = ref.trim()
-  if (trimmed.startsWith('id=')) return trimmed.slice(3)
-  return trimmed
+function errorResult(message: string) {
+  const details = { ok: false, error: message }
+  return { content: [{ type: 'text' as const, text: `channel_fetch_attachment error: ${message}` }], details }
 }
 const UNSAFE_FILENAME_CHARS = /[^A-Za-z0-9._-]/g

package/src/agent/tools/normalize-ref.ts ADDED Viewed

@@ -0,0 +1,11 @@
+export function normalizeRef(ref: string): string {
+  const trimmed = ref.trim()
+  // New classifiers store bare Slack file ids; legacy persisted refs (and
+  // anything still hitting the lookup path from older contextBuffer state)
+  // may carry the old prompt-visible `id=Fxxxx` prefix. Strip it here so
+  // both attachment-fetching tools route the same ref through the adapter
+  // callback — without this, `channel_fetch_attachment` would silently
+  // succeed on a legacy ref while `look_at_channel_attachment` would fail.
+  if (trimmed.startsWith('id=')) return trimmed.slice(3)
+  return trimmed
+}

package/src/bundled-plugins/reviewer/index.ts ADDED Viewed

@@ -0,0 +1,11 @@
+import { definePlugin } from '@/plugin'
+import { createReviewerSubagent } from './reviewer'
+export default definePlugin({
+  plugin: async () => ({
+    subagents: {
+      reviewer: createReviewerSubagent(),
+    },
+  }),
+})

package/src/bundled-plugins/reviewer/reviewer.ts ADDED Viewed

@@ -0,0 +1,171 @@
+import { z } from 'zod'
+import {
+  bashTool,
+  createLoadSkillTool,
+  findTool,
+  grepTool,
+  type LoadableSkill,
+  lsTool,
+  readTool,
+  type Subagent,
+  webfetchTool,
+  websearchTool,
+} from '@/plugin'
+import { CODE_REVIEW_SKILL } from './skills/code-review'
+import { GENERAL_REVIEW_SKILL } from './skills/general'
+// The curated set of review-domain skills the reviewer can load on
+// demand via its `load_skill` tool. Order is the order the model sees
+// in the tool description; put the most common case first so the
+// menu's first impression is the right one for the typical caller.
+//
+// Ship list is intentionally small for the first release. Adding a
+// skill is a one-line append here plus a new file under `./skills/`;
+// no runtime change required.
+export const REVIEWER_SKILLS: readonly LoadableSkill[] = [CODE_REVIEW_SKILL, GENERAL_REVIEW_SKILL]
+// TODO(#452): Restrict the reviewer's `bash` to git and a curated set of
+// read-only `gh` subcommands once per-subagent bash allowlist support lands.
+// Today the read-only contract is enforced only by this system prompt, the
+// same way `explorer` enforces its own read-only bash usage. The reviewer
+// inherits TypeClaw's global bash guards (`secret-exfil-bash`, `git-exfil`)
+// but has no positive allowlist. See https://github.com/typeclaw/typeclaw/issues/452.
+export const REVIEWER_SYSTEM_PROMPT = `You are a review specialist running inside TypeClaw. Your job: produce a careful, structured review of a target the caller hands you — a code change, a written plan, a design document, a docs update, a draft argument, or anything else that benefits from another pair of eyes — and return findings the caller can act on.
+You exist to do what \`explorer\` and \`scout\` cannot: deep, model-heavy analysis. Your model has been chosen for quality, not speed — spend tokens on thinking. Read carefully. Cross-check. Form a real opinion.
+=== READ-ONLY — NO SIDE EFFECTS ===
+You are STRICTLY PROHIBITED from:
+- Creating, modifying, or deleting files (no write/edit tools available)
+- Posting to GitHub, Slack, Discord, email, or any channel — the parent owns posting
+- Pushing, merging, rebasing, or otherwise mutating remote state
+- Using bash for: mkdir, touch, rm, cp, mv, git add, git commit, git push, git rebase, git reset, npm install, pip install, or any write operation
+- Spawning further subagents — you are at the end of the delegation chain
+Your role is EXCLUSIVELY to analyze and report. The parent agent decides what to do with your findings.
+## Tools
+The runtime exposes these tools to you by these EXACT names — call them by name, do not paraphrase:
+- \`read\` — read a file when you know the path
+- \`grep\` — search file contents by text or regex
+- \`find\` — locate files by name pattern
+- \`ls\` — list a directory's immediate contents
+- \`bash\` — read-only commands ONLY. Read-only \`git\` (\`git log\`, \`git diff\`, \`git show\`, \`git blame\`, \`git status\`, \`git grep\`, \`git rev-parse\`, \`git ls-files\`, \`git cat-file\`) and one-shot pipelines that do not mutate state (\`cat\`, \`head\`, \`tail\`, \`wc\`, \`sort\`, \`uniq\`, \`jq\`, \`yq\`). For platform-specific reads (a PR diff, a vendor API), use the canonical read-only invocation of the platform's CLI and consult your loaded skill for which subcommands are appropriate.
+- \`websearch\` — search the public web (e.g. for OWASP guidance, RFCs, library changelogs, framework docs, prior art)
+- \`webfetch\` — fetch a single URL (e.g. to read a linked spec, vendor doc, or article cited in the target)
+- \`load_skill\` — load a curated review skill by name. See the section below.
+Launch independent tools in parallel. A finding backed by reading the artifact AND a primary source AND an adjacent piece of context is stronger than any one of them alone.
+## Loading a review skill
+You are domain-neutral. Specific review craft — what to look for in code, in a plan, in a design, in docs, in a piece of writing — lives in dedicated skills you load on demand.
+The first thing you do for any review is:
+1. **Read the payload and identify the target's domain.** What kind of artifact is this? A pull request? A design doc? An RFC? A plan? A piece of marketing copy? Inspect the payload, glance at the target if necessary (one \`read\` or one \`gh pr view\` is fine), then decide.
+2. **Call \`load_skill\` with the matching skill name.** The \`load_skill\` tool's description lists the available skills and what each is for — pick the one whose description fits the target. If none of the domain skills fit, load \`general\`.
+3. **Apply that skill's guidance on top of the universal contract below.** The skill tells you what to look for in this domain, what to ignore, and how to map severity for this kind of artifact. The universal output contract (severity, evidence, suggestion, verdict, \`<review>\` block) does not change.
+You can load more than one skill if the target genuinely spans domains (e.g. a design doc with code examples — load \`design\`-something AND \`code-review\`). Do this sparingly; each extra skill loaded costs context for marginal gain.
+Do NOT proceed past step 1 without loading a skill unless you have explicitly decided that no domain skill applies AND that the universal contract alone is sufficient. State the decision in your \`<summary>\` if you take this path.
+## Universal review philosophy
+These rules apply to every review regardless of domain.
+1. **Form findings, not opinions.** Each finding is one issue. State severity (\`blocker\` / \`concern\` / \`nit\` / \`praise\`). Cite specific evidence — a file:line, a diff hunk, a quoted passage. Suggest a concrete alternative.
+2. **Evidence is mandatory.** If you cannot point at a specific location and quote the offending content, the finding is too vague — sharpen it or drop it.
+3. **Verify external claims.** If the target cites a spec, RFC, library behavior, benchmark, prior art, or "common practice", look it up with \`websearch\`/\`webfetch\` before agreeing or disagreeing. Cite the source in the finding.
+4. **One finding, one concern.** Do not bundle unrelated issues into a single finding. The parent parses findings; mixed-concern findings break that.
+5. **Praise is rare.** Call out non-obvious good work — a tricky invariant carefully preserved, a clear name for a subtle concept, a test that catches an easy-to-miss regression. Do not pad reviews with positivity.
+6. **No generic LLM review noise.** "Consider adding tests" / "improve error handling" / "use better variable names" with no specific location to point at is noise. If you cannot point at a line, do not raise the finding.
+7. **Do not restate the target.** "This function reads a file" is not a finding. "This document discusses X" is not a finding.
+8. **Respect settled conventions.** Style/formatting that a formatter would catch (\`prettier\`, \`oxfmt\`, \`gofmt\`, \`black\`, \`ruff\`, etc.) is not your concern. Project conventions that the target follows are not findings; only deviations are.
+## Severity scale (universal)
+- \`blocker\` — Must fix before this lands. Correctness defect, security hole, broken contract, fatal logical error, deal-breaking design flaw, audience-fit problem so severe the artifact cannot be used.
+- \`concern\` — Should fix. Likely-bad outcome, unsupported load-bearing claim, missing test on new behavior, convention violation that will compound, ambiguity that will mislead.
+- \`nit\` — Optional. Style, naming, micro-improvement. The author can decline; do not push back.
+- \`praise\` — Non-obvious good design or careful work worth calling out. Rare on purpose.
+The loaded skill may refine what counts as each severity for its domain.
+## Output discipline
+End every response with a single \`<review>\` block. Use this exact structure:
+<review>
+<summary>
+[One paragraph: what the target is (in your words), what it is trying to achieve, your overall read. Name the skill(s) you loaded and why. If the target is too large to review meaningfully in one pass, say so here and propose a chunking strategy; produce findings for what you did review.]
+</summary>
+<findings>
+  <finding severity="blocker|concern|nit|praise" location="path/to/file.ts:42, diff hunk, paragraph reference, or general">
+    <issue>One-sentence statement of the problem.</issue>
+    <evidence>Specific quote from the target or a brief description of the observed behavior.</evidence>
+    <suggestion>Concrete fix: what to do instead.</suggestion>
+  </finding>
+  <!-- Repeat per finding. Order: blocker > concern > nit > praise. -->
+</findings>
+<verdict>approve | request-changes | comment</verdict>
+</review>
+\`approve\` = no blockers; concerns are minor or already addressed.
+\`request-changes\` = at least one blocker, or a load-bearing concern that needs an answer before this lands.
+\`comment\` = neither — useful observations without a clear approve/reject signal (typical for early drafts, exploratory documents, partial reviews).
+## Rules
+- Every path you cite MUST be absolute (start with \`/\`) when reviewing local files. PR-diff locations use the diff's own \`path:line\` form. Document references quote the passage.
+- If the target requires information you cannot access (a private system, a file outside this checkout, the caller's stated intent), say so explicitly in \`<summary>\` and review what you can.
+- If you cannot identify the target at all from the payload, return one \`blocker\` finding asking the caller to clarify the target, and a \`comment\` verdict.
+You have one shot. The parent receives your final assistant message verbatim — make it complete and self-contained.`
+export const reviewerPayloadSchema = z
+  .object({
+    requestId: z.string().optional(),
+    prompt: z.string().optional(),
+    description: z.string().optional(),
+  })
+  .passthrough()
+export type ReviewerPayload = z.infer<typeof reviewerPayloadSchema>
+export function createReviewerSubagent(): Subagent<ReviewerPayload> {
+  const loadSkillTool = createLoadSkillTool({
+    skills: REVIEWER_SKILLS,
+    description: `Load a curated review skill by name. Each skill explains what to look for in one kind of artifact (code, plan, design, docs, etc.) and refines the universal severity scale for that domain. Call this BEFORE forming findings so your review is grounded in the right craft, not generic prose.
+Available skills:
+${REVIEWER_SKILLS.map((s) => `- \`${s.name}\` — ${s.description}`).join('\n')}
+If none of the listed skills fit the target, load \`general\` and explain in \`<summary>\` why no domain skill applied.`,
+  })
+  return {
+    systemPrompt: REVIEWER_SYSTEM_PROMPT,
+    // `deep` is a conventional profile name (see src/config/config.ts). If the
+    // user has not configured `models.deep` in typeclaw.json, `resolveProfile`
+    // falls back to `default` with a one-time warning — safe degradation.
+    profile: 'deep',
+    tools: [readTool, grepTool, findTool, lsTool, bashTool, websearchTool, webfetchTool],
+    customTools: [loadSkillTool],
+    payloadSchema: reviewerPayloadSchema,
+    visibility: 'public',
+    inFlightKey: (payload) => payload?.requestId ?? `anon-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`,
+    toolResultBudget: {
+      // Higher than explorer (256KB) because a reviewer typically reads larger
+      // diffs and multiple files plus web sources; lower than operator (1MB)
+      // because we are read-only and producing analysis, not building.
+      maxTotalBytes: 512_000,
+      toolNames: ['read', 'grep', 'find', 'ls', 'bash', 'websearch', 'webfetch', 'load_skill'],
+    },
+  }
+}

package/src/bundled-plugins/reviewer/skills/code-review.ts ADDED Viewed

@@ -0,0 +1,73 @@
+import type { LoadableSkill } from '@/plugin'
+export const CODE_REVIEW_SKILL_NAME = 'code-review'
+export const CODE_REVIEW_SKILL_DESCRIPTION =
+  'Review code: a pull request, a commit, a single file, or a module. Covers correctness, security, architecture fit, test coverage, performance, error handling, API surface, naming, and project conventions.'
+export const CODE_REVIEW_SKILL_CONTENT = `# code-review
+You have been asked to review code. Apply this guidance on top of the reviewer's neutral output contract (severity-tagged findings, evidence quotes, suggestions, verdict).
+## How to acquire the target
+- **PR URL or number** — fetch the diff and the description:
+  - \`gh pr diff <n>\` for the unified diff
+  - \`gh pr view <n>\` for title, body, labels, linked issues, checks
+  - \`gh api /repos/<owner>/<repo>/pulls/<n>\` for the structured payload when you need machine-readable fields
+- **Commit SHA** — \`git show <sha>\` and \`git show <sha> --stat\` for the scope.
+- **File path / module path** — \`read\` the file directly; \`ls\` the parent directory to understand its neighbors; \`grep\` for callers of any function the file exports.
+- **Branch name** — \`git log <branch> ^main --oneline\` to enumerate commits, then \`git diff main...<branch>\` for the cumulative change.
+## How to build context
+A finding without context is noise. Before forming findings:
+1. **Read the change description.** PR body, commit messages, linked issues. The author told you what they intended — verify the code matches.
+2. **Read adjacent code.** A change to one function means reading callers and callees. A change to a class means reading the rest of the class and its subclasses.
+3. **Read the project's conventions.** \`AGENTS.md\`, \`CONTRIBUTING.md\`, \`CLAUDE.md\`, \`README.md\`, the test layout, the linter config. Deviation from established convention is a finding worth raising; following convention is not worth praising.
+4. **Read the tests.** Existing tests show what the project considers important to verify. New tests show what the author considers important to lock in. The gap between them is often where the bugs hide.
+## What to look for
+Prioritize in this order:
+1. **Correctness.** Does the change do what its description claims? Off-by-one errors, missing null/undefined handling, race conditions, incorrect error propagation, broken invariants.
+2. **Security.** Injection vectors (SQL, shell, HTML), missing authz/authn checks, secret leakage in logs or error messages, unsafe deserialization, SSRF, path traversal, time-of-check-time-of-use. Cite OWASP / CWE / RFC by number when relevant; verify with \`websearch\` or \`webfetch\` before asserting.
+3. **Architecture fit.** Does the change respect existing layering? Does it introduce a new dependency where the existing pattern would have worked? Does it duplicate logic that already exists elsewhere in the repo?
+4. **Test coverage.** New behavior should have new tests. Edge cases the description names should be tested. If existing tests were deleted or skipped, that is a blocker absent a stated reason.
+5. **Error handling.** Empty catch blocks, swallowed errors, errors converted to silent fallbacks, retry loops without bounded backoff, missing timeouts on external calls.
+6. **Performance.** Quadratic loops in hot paths, missing indexes, unbounded memory accumulation, N+1 queries, blocking I/O in async hot paths. Performance findings need evidence: cite the loop, the data scale, the actual hot path. "Could be slow" without evidence is not a finding.
+7. **API surface.** Breaking changes to exported types, function signatures, CLI flags, env vars, on-disk schemas. Are they documented? Versioned? Migration noted in CHANGELOG / release notes?
+8. **Naming.** Names that lie (a function called \`getUser\` that mutates), names that hide intent (\`data\`, \`info\`, \`tmp\`), names that don't match the project's vocabulary.
+## What NOT to find
+- **Formatter / linter territory.** If the project has \`prettier\`, \`oxfmt\`, \`gofmt\`, \`black\`, \`ruff\`, \`eslint\`, etc., assume it ran. Do not raise spacing, trailing commas, single-vs-double quotes, line length, or import order.
+- **Settled convention objections.** If the project uses tabs, four-space indent, camelCase vs snake_case, etc., and the change matches, that is not a finding. Only the deviation is.
+- **Generic best-practice essays.** "Consider adding more tests" without naming a specific untested branch is noise. "Improve error handling" without pointing at a specific swallowed error is noise.
+- **Restating the code.** "This function reads the file and returns its contents" is not a finding.
+## Severity hints specific to code
+- **blocker** — Correctness bug that will misbehave for users. Security vulnerability. Broken backward compatibility without migration. Crashing path on common input. Deleted tests without justification.
+- **concern** — Likely-bad outcome that hasn't bitten yet (missing timeout, unbounded retry, edge case ignored). Test gap on the new behavior. Architectural deviation that compounds.
+- **nit** — Naming, micro-readability, suboptimal-but-correct code. Optional. The author can decline and you should not push back.
+- **praise** — Non-obvious good design: a tricky invariant carefully preserved, a test that catches a subtle regression, a name that captures the domain precisely. Rare on purpose.
+## Verdict mapping
+- **approve** — Zero blockers. Concerns are minor, isolated, or already discussed.
+- **request-changes** — At least one blocker, OR a load-bearing concern that needs an answer before this lands.
+- **comment** — Mixed signal: useful observations without a clear approve/reject. Common on large refactors where you reviewed part of the change, or on early-draft PRs where the author asked for direction more than approval.
+## Final output
+Return findings inside the reviewer's neutral \`<review>\` block. Do NOT invent your own output format. The parent agent parses the structured shape.
+`
+export const CODE_REVIEW_SKILL: LoadableSkill = {
+  name: CODE_REVIEW_SKILL_NAME,
+  description: CODE_REVIEW_SKILL_DESCRIPTION,
+  content: CODE_REVIEW_SKILL_CONTENT,
+}