npm - agent-afk - Versions diffs - 3.80.2 → 3.80.4 - Mend

agent-afk 3.80.2 → 3.80.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18) hide show

package/dist/bundled-plugins/awa-bundled/skills/intent-lock/SKILL.md ADDED Viewed

@@ -0,0 +1,99 @@
+---
+name: intent-lock
+description: "Fires before multi-step work when the user's request contains ambiguous referents ('the text', 'her Y'), characterizations of unverified entities ('the meeting is substantive'), or identity assumptions (which contact = the user). Emits a one-sentence interpretation lock for fast async correction; escalates to Asking only when interpretation gates an irreversible action AND multiple plausible reads exist."
+---
+## When to invoke
+Check for these signal classes before starting any multi-step task:
+**Ambiguous referents** — a noun phrase that could resolve to more than one entity in context:
+- Bare demonstratives with no prior binding: "the text", "the doc", "that email", "the file"
+- Possessives pointing to unintroduced people: "her draft", "their proposal", "his calendar"
+- Ordinals with unclear basis: "the first one", "the latest version", "the other branch"
+**Unverified characterizations** — a status or quality claim about a named entity the agent cannot confirm:
+- "The meeting is substantive" (agent has not read the meeting)
+- "The PR is approved" (agent has not checked)
+- "The last run passed" (agent has not run or read CI output)
+**Identity assumptions** — who is "the user", "me", "us", "the team", or a named person when the identity matters for action:
+- Sending to "me" when multiple contact entries exist
+- Filing under "my account" when credentials are ambiguous
+- "The owner" when ownership is not established in context
+**Code-vs-runtime dual referent** — a term that names an entity in the active codebase AND a model runtime concept. Fires when the user is working on a tool whose vocabulary mirrors the agent's own runtime (parity-mirror projects like agent CLIs, plugin frameworks, or harness tooling):
+- "memory" — the project's memory store OR the agent's own session memory
+- "session" — a project's session class OR the agent's conversation
+- "hooks" — the project's hook registry OR the agent's tool-use hooks
+- "agent" / "subagent" — the project's agent abstraction OR the agent dispatching them
+- "tool" / "skill" / "plugin" — the project's loadable units OR the agent's available tools
+- "MCP" / "terminal" / "REPL" — primitives the project implements that the agent also has
+**Skip when:**
+- Referent resolves unambiguously from immediately prior context (file was just read this turn, entity was named and confirmed, prior message established the binding).
+- Request is reversible, exploratory, and any misread is immediately correctable without cost.
+- User has already provided the interpretation explicitly ("I mean X by 'the text'", "agent-afk's memory", "my session", "this codebase's hooks").
+- Single-step clarification would cost more than simply proceeding and correcting.
+- Dual-referent term has no matching symbol in cwd — no parity risk, standard interpretation applies.
+---
+## Procedure
+### Step 1 — Scan
+Before acting, scan the request for signal classes above. List every ambiguous referent, unverified characterization, identity assumption, or code-vs-runtime dual referent found. If zero are found, skip this skill entirely.
+### Step 2 — Classify each finding
+For each finding, determine:
+- **Plausible reads** — how many distinct interpretations exist given available context?
+- **Gate type** — does this interpretation gate a reversible or irreversible action?
+### Step 3 — Choose resolution mode
+| Condition | Mode |
+|-----------|------|
+| Single plausible read, reversible action | **Lock** — emit interpretation, proceed |
+| Single plausible read, irreversible action | **Lock** — emit interpretation, proceed (but flag) |
+| Multiple plausible reads, reversible action | **Lock** — emit most-likely read with explicit note, proceed |
+| Multiple plausible reads, irreversible action | **Asking** — stop, ask one question |
+**Multiple plausible reads** means two or more interpretations that would produce materially different outcomes. "the text" pointing to one of two equally recent documents = multiple. "the text" when only one document was discussed this session = single.
+### Step 4 — Emit the lock
+**Lock format (most cases):**
+> Interpreting: [ambiguous phrase] → [resolved referent]. Proceeding on that basis — correct me if wrong.
+One sentence. No preamble. Append to the start of the work output, not as a standalone turn. The user can correct asynchronously; work continues.
+**If multiple locks needed:** stack them, one per line, before the work output.
+**Asking format (irreversible + multiple reads only):**
+> [One precise question that resolves the ambiguity.] (This determines [what action follows].)
+One question. State what it unlocks. Do not proceed until answered.
+---
+## Interaction with other skills
+**thesis-lock** fires on first-person thesis before drafting. Intent-lock fires on the user's *request* before any multi-step work. They are complementary: thesis-lock protects the agent's own claims; intent-lock protects the agent's reading of what the user asked.
+**ground-state** fires before implementation to survey repo state. Intent-lock fires before *any* multi-step work on requests with ambiguous inputs — including non-implementation work like drafting, research, or messaging. They can both fire on the same request (ground-state runs after intent-lock resolves).
+**premise-gate** checks named-entity and status-claim pairs during research and analysis. Intent-lock fires *before* work begins on the request itself. They address the same underlying hazard at different points in the pipeline: intent-lock at request intake, premise-gate during execution.
+---
+## Exit criteria
+- Every ambiguous referent, unverified characterization, identity assumption, and code-vs-runtime dual referent is either locked (one-sentence interpretation emitted) or escalated to Asking.
+- No multi-step work begins on a request whose interpretation gates an irreversible action when multiple plausible reads exist.
+- Lock statements are visible in the turn output before the work they govern.
+- Asking state contains exactly one question and states what it unlocks.

package/dist/bundled-plugins/awa-bundled/skills/parallelize/SKILL.md ADDED Viewed

@@ -0,0 +1,9 @@
+---
+name: parallelize
+description: "When finished creating the plan in plan mode, run /parallelize so Claude dispatches one planning agent to transform the current approach into a plan to orchestrate waves of parallel sub-agents."
+---
+## Sub-agent contract
+/contract
+Dispatch a planning sub-agent to transform the current plan into a dependency-aware orchestration plan for waves of parallel sub-agents. Preserve the original goal, infer the necessary decomposition, identify sequential vs parallelizable work, embed TDD into implementation lanes, and avoid unsafe or redundant parallelization. Return an orchestration-ready revised plan. Ensure plan specifies to run tests.

package/dist/bundled-plugins/awa-bundled/skills/refactor/SKILL.md ADDED Viewed

@@ -0,0 +1,154 @@
+---
+name: refactor
+description: "Orchestrates safe, large-scale structural changes across a codebase — symbol renames, API migrations, pattern standardizations, layer restructurings. Enumerates all affected sites, groups them into dependency layers via the DAG executor, applies changes in parallel per layer with worktree isolation, and verifies behavioral preservation at each layer boundary before proceeding. Exits immediately on first regression, surfacing the exact site, diff, and behavioral delta before any commit."
+argument-hint: "<refactor goal> [--scope <glob-or-path>]"
+failure_modes:
+  - bad decomposition
+  - dependency blindness
+  - false completeness
+  - premature execution
+---
+## Sub-agent contract
+/contract
+**Skip when:** the change touches ≤3 files and no dependency ordering is required — just edit directly. Also skip when the codebase has no test coverage at all (behavioral preservation cannot be verified; ask the user to add tests first or proceed manually).
+---
+### Triage (inline, before dispatching any sub-agent)
+Parse `$ARGUMENT` to extract:
+- **Change type**: rename | api-migration | pattern-standardization | layer-restructure | dependency-upgrade | other
+- **Target symbol / path / pattern** (what is changing)
+- **Destination** (what it becomes)
+- **Scope**: default to the entire repo; narrow if `--scope` is provided
+If `$ARGUMENT` is ambiguous on any of the above, stop and ask exactly one question.
+---
+### Wave 1 — Scope enumeration (parallel, `subagent_type: "research-agent"`)
+Dispatch both agents simultaneously in a single response turn.
+**Site-finder agent**
+- goal: Enumerate every file and symbol that must change for the refactor to be complete and correct.
+- inputs: change type, target symbol/pattern, scope glob
+- artifacts:
+  - `sites`: array of `{file, line_range, site_type: "definition"|"import"|"usage"|"type-ref"|"test", change_required: string}`
+  - `dependency_graph`: for each site, which other sites must be changed first (a site depends on its callers if it exports a symbol; a definition must be changed before its imports)
+  - `site_count`: integer
+  - `confidence`: low | medium | high
+  - `coverage_gaps`: what the agent couldn't search (generated files, third-party vendored code, etc.)
+- non_goals: Do not apply any changes. Do not read unrelated files.
+- failure_modes: If grep tooling is unavailable, return `confidence: low` and list what was searched manually.
+**Contract-extractor agent**
+- goal: Identify the public interfaces and test commands that verify behavioral preservation for the affected API surface.
+- inputs: site list (from site-finder, passed inline if available; if not yet available, derive from change type + scope)
+- artifacts:
+  - `contracts`: array of `{symbol, signature_before, exported_by, consumed_by[]}`
+  - `test_commands`: array of shell commands that exercise the affected contracts (e.g., `pnpm test -- --grep "AuthService"`)
+  - `test_coverage_verdict`: "adequate" | "partial" | "absent"
+  - `confidence`: low | medium | high
+  - `coverage_gaps`: test paths not reachable by the identified commands
+- non_goals: Do not run tests. Do not read unrelated files.
+**Scope gate (after Wave 1):**
+- `site_count > 50` → warn the user: "This refactor touches N sites. Recommend scoping down with `--scope` or batching via `/parallelize`. Proceeding, but each layer will take longer."
+- `test_coverage_verdict == "absent"` → hard stop: "No tests cover the affected contracts. Behavioral preservation cannot be verified. Add tests before proceeding, or explicitly confirm you accept unverified risk."
+- Either agent returns `confidence: low` → surface the coverage gaps and ask the user to confirm before proceeding.
+---
+### Synthesize: Build the layer map (inline)
+From `dependency_graph`, compute a topological sort into layers using Kahn's algorithm:
+- **Layer 0** — leaf sites with no dependents (definitions that nothing imports, or sites that depend on nothing else changing first)
+- **Layer N** — sites whose dependencies are all in layers 0…N-1
+If the graph has cycles (rare but possible in circular-import codebases), surface them explicitly: "Cycle detected between [A, B, C]. Manual intervention required before automation can proceed." Do not continue.
+---
+### Wave 2 — Layer-by-layer application (sequential between layers, parallel within each layer)
+For **each layer** (starting at Layer 0):
+Dispatch one sub-agent per site in this layer, all in parallel, each in a worktree-isolated environment:
+**Layer applicator agent** (per site, `isolation: "worktree"`)
+- goal: Apply the transformation at exactly this site and verify it passes its targeted tests.
+- inputs: site descriptor `{file, line_range, change_required}`, full transformation spec (target → destination), test commands that cover this site
+- artifacts:
+  - `site`: `{file, line_range}`
+  - `status`: "green" | "red" | "skipped"
+  - `diff_applied`: the exact unified diff applied (≤30 lines; truncate with `...` if longer)
+  - `test_output`: truncated stdout of the targeted test run (last 20 lines)
+  - `failure_reason`: populated only if `status == "red"` — the first failing assertion + the file:line where it fired
+- non_goals: Do not change any file other than the one in your site descriptor. Do not run the full test suite.
+- IMPORTANT: this sub-agent MAY call `edit_file` and run bash commands; `isolation: "worktree"` is required.
+**Layer gate — hard stop before proceeding to the next layer:**
+- All sites in this layer `status == "green"` → proceed to the next layer.
+- Any site `status == "red"` → **abort the entire refactor immediately**. Do NOT apply any more changes. Surface:
+  - The failing site (`file:line_range`)
+  - The `diff_applied` for that site
+  - The `failure_reason`
+  - The current layer number and which layers (if any) already completed successfully
+  - Recommendation: "Revert layer N changes manually or run `git stash` in each affected worktree, then diagnose with `/diagnose`."
+---
+### Wave 3 — Behavioral diff verification (after all layers green, `subagent_type: "research-agent"` or Bash-capable)
+Dispatch a single behavioral-diff agent:
+**Behavioral-diff agent** (`isolation: "worktree"`, Bash-capable)
+- goal: Run the full test suite and compare observable behavior at the API boundary before vs. after.
+- inputs: `test_commands` from the contract-extractor, plus any project-wide test command (inferred from `package.json`, `Makefile`, etc.)
+- artifacts:
+  - `full_suite_status`: "all-green" | "failures"
+  - `failing_tests`: array of `{test_name, file, failure_reason}` (empty if all-green)
+  - `behavioral_deltas`: array of `{symbol, before_behavior, after_behavior, expected: true|false}` — note only deltas, not noise
+  - `unexpected_deltas`: subset of `behavioral_deltas` where `expected == false`
+  - `confidence`: low | medium | high
+- non_goals: Do not commit. Do not push.
+**Post-Wave 3 decision:**
+- `full_suite_status == "all-green"` AND `unexpected_deltas` is empty → proceed to synthesis, recommend commit.
+- `full_suite_status == "failures"` OR `unexpected_deltas` is non-empty → surface the unexpected deltas for human review. Do NOT commit. Recommend: "Review unexpected behavioral changes before committing. If intentional, update tests to reflect new contracts. If not, route to `/diagnose`."
+---
+### Synthesis: Change report
+Emit a structured change report:
+```
+## Refactor Report
+**Goal:** <change type> — <target> → <destination>
+**Scope:**
+- Sites enumerated: N
+- Layers processed: M (of M total)
+- Status: COMPLETE | ABORTED AT LAYER N
+**Changes applied:**
+| File | Lines | Type | Status |
+|------|-------|------|--------|
+**Tests run:**
+- Targeted (per-site): N runs, N green, N red
+- Full suite: all-green | N failures
+**Behavioral deltas:**
+- Expected: N
+- Unexpected: N (see below if non-zero)
+**Recommendation:** COMMIT | REVIEW REQUIRED | ABORT
+```
+If recommendation is COMMIT, offer to invoke `/ship` for the final commit + PR. If REVIEW REQUIRED or ABORT, do not invoke `/ship` and do not commit.

package/dist/bundled-plugins/awa-bundled/skills/research/SKILL.md ADDED Viewed

@@ -0,0 +1,33 @@
+---
+name: research
+description: "Dispatches two sub-agents in parallel to gather external and local context for the current task."
+---
+## Sub-agent contract
+/contract
+Dispatch two sub-agents in parallel using the Agent tool. Prefer `subagent_type: "research-agent"`; fall back to `subagent_type: "Explore"` with thoroughness "very thorough" if the research-agent is not available. One researches the web for external context relevant to the current task. The other inspects the local working directory for domain-relevant artifacts. Return a concise merged research brief highlighting relevant findings, conflicts, risks, and implications for the task.
+**Web research agent** — always the same: search for external context, prior art, patterns, APIs, and comparable approaches relevant to the task. Domain-agnostic.
+**Local inspection agent** — adapt to the domain:
+| Domain | What to inspect |
+|--------|----------------|
+| `software` | Code, config files, package manifests, CI configs, test suites, git history, README/docs, existing patterns and conventions |
+| `research` | Papers (PDF/LaTeX), notes, data files, citation databases (.bib), lab notebooks, analysis scripts, prior drafts |
+| `design` | Design files (Figma exports, SVGs, mockups), brand guidelines, component libraries, user research docs, style guides |
+| `business` | Financial models, strategy docs, market research, pitch decks, competitive analyses, KPI dashboards, stakeholder maps |
+| *(other)* | Scan the working directory for any files relevant to the stated domain — documents, data, config, scripts — and describe what you find |
+When domain is unspecified, infer it: git repo → software; PDFs/LaTeX/.bib → research; design assets → design; spreadsheets/decks → business. If ambiguous, inspect broadly and note what you found.
+## Coverage reporting
+Both agents must end their response with a coverage assessment:
+- **Coverage confidence**: low / medium / high — how thoroughly could this domain be searched?
+- **Known gaps**: what couldn't be accessed? (proprietary databases, paywalled papers, unpublished work, practitioner-only knowledge)
+- **Tacit knowledge risk**: low / medium / high — is this a domain where critical knowledge is unwritten or not documented online?
+When merging results, surface coverage gaps prominently. If both agents report low coverage, flag: "Low epistemic coverage — findings may be incomplete. Consider consulting domain practitioners or providing access to private sources."

package/dist/bundled-plugins/awa-bundled/skills/review/SKILL.md ADDED Viewed

@@ -0,0 +1,104 @@
+---
+name: review
+description: "Dispatches parallel dimension agents across a diff, PR (URL or number), commit SHA, branch, staged changes, or patch file — covering security, correctness, api-compat, test-coverage, and perf-observability — synthesizes findings by severity, and emits a merge recommendation. Use when changes are ready for review before merge. Read-only: this skill analyzes and reports only — it never edits files, commits, pushes, comments on a PR, or modifies the PR description."
+argument-hint: "[diff|pr-url|pr-number|commit-sha|branch|--staged|--head] [--light] [--change-type hotfix|feature|refactor|dep-bump|new-service]"
+---
+## Read-only — hard constraint
+This skill **analyzes and reports**; it never mutates the repository, the PR/MR, or anything external. After you emit the merge recommendation, **STOP**.
+Never — not for a real bug, not for a blocking defect, not even when there is no human reviewer and "someone has to fix it":
+- edit, create, or delete files (no `write`/`edit`-style mutations);
+- `git add` / `commit` / `stash` / `reset`, `git checkout` to discard changes, or `git push`;
+- `gh pr comment` / `review` / `edit` / `merge` / `create`, or post or edit any PR/MR body, comment, or description;
+- run any other write- or network-mutating shell command.
+The only shell permitted is **read-only inspection**: `git diff` / `git show` / `gh pr diff`, `grep` / `rg`, and file reads — plus dispatching the review sub-agents. Resolving findings, fixing bugs, resolving merge conflicts, and "making the branch mergeable" are explicitly **out of scope**: a fixable defect is a finding to report (`file:line` + a one-line fix in the `suggestion` field), never a license to act.
+## Sub-agent contract
+/contract
+**Skip for:** lock files (`package-lock.json`, `go.sum`, `yarn.lock`), auto-generated files (`*.generated.*`), pure-docs diffs, vendored deps.
+**Resolve target → diff (inline).** The review target argument is: `$ARGUMENT` (empty = review working-tree/HEAD changes). Map this argument to a diff source, then capture the diff text plus a one-line target descriptor for the triage header. Also capture the **reviewed ref** (branch HEAD SHA or equivalent) — this is required for citation verification later:
+- `--staged` → `git diff --staged`; reviewed ref = `git write-tree` (snapshots the staged index to a throwaway tree so citations resolve against the staged content under review, not HEAD)
+- `--head` or no arg → `git diff HEAD`; reviewed ref = `git stash create` (snapshots worktree + index to a throwaway commit so citations resolve against the content under review; empty output = no local changes → fall back to `git rev-parse HEAD`)
+- arg matches `^https?://.*/pull/\d+` (GitHub/GitLab PR URL) → `gh pr diff <url>` (or `glab mr diff`); reviewed ref = head SHA from `gh pr view <url> --json headRefOid -q .headRefOid`; record PR title + base/head refs
+- arg matches `^#?\d+$` (bare PR number, optionally `#`-prefixed) → resolve in current repo with `gh pr diff <n>`; reviewed ref = head SHA from `gh pr view <n> --json headRefOid -q .headRefOid`; if `gh` is unavailable or repo has no PR matching, abort with `Asking` (one question: which repo/PR)
+- arg matches `^[0-9a-f]{7,40}$` (commit SHA) → `git show <sha>`; reviewed ref = `<sha>`
+- arg matches a known ref (`git rev-parse --verify <arg>` succeeds) → `git diff <merge-base>...<arg>` against the repo's default branch; reviewed ref = `git rev-parse <arg>`
+- arg is a path or `*.diff`/`*.patch` file → read file contents as the diff; reviewed ref = `unknown (patch file — no live ref available)`
+- otherwise → abort with `Asking` naming the ambiguous arg
+**Triage (inline).** From the resolved diff extract: change type (hotfix | feature | refactor | dep-bump | new-service), files changed, total lines changed, summary. Classify regime: `light` if ≤300 lines or change type is hotfix/dep-bump; `full` otherwise.
+**Wave 1 — Full review (regime=full, 2 parallel agents, `subagent_type: "research-agent"`).** Dispatch:
+- **security · api-compat** — contracts, auth, injection, breaking changes, secret exposure.
+- **correctness · test-coverage · perf-observability** — logic bugs, missing tests, regressions, hot-path perf, logging gaps.
+Each agent receives: full diff + file tree + triage header + **reviewed ref (SHA)**, the severity rubric, and the finding schema.
+**Citation requirement (enforced per agent):** Before quoting any line in a `blocking` or `high/critical` finding, the agent MUST read that exact line from the branch HEAD using `git show <reviewed-ref>:<file>` (or `gh api /repos/{owner}/{repo}/contents/{path}?ref=<sha>` if outside a git context). The agent must:
+1. State the ref it read from in each finding: `ref: <sha>`.
+2. Distinguish `diff-context` citations (line visible in the diff hunk, no re-read required) from `file-state` citations (line in the post-merge file, re-read required before quoting).
+3. If the line does not exist at that ref, omit the finding entirely — do not paraphrase or reconstruct from memory.
+Banned words: "ensure", "consider", "may", "could". No `file:line` citation → omit the finding.
+**api-compat reachability pre-check (mandatory before surfacing any breaking-change finding).**
+For every symbol flagged as a breaking change, grep production source files (exclude `*.test.*`, `*.spec.*`, `__tests__/`, `__mocks__/`, `/test/`, `/tests/`) for imports or usages of that symbol. Decision table:
+- Zero production importers → downgrade finding to `nit`, append `[UNVERIFIED: no production importers]`, set confidence `low`.
+- One or more production importers → severity stands; include one importer path as evidence.
+If grep tooling is unavailable, tag the finding `[UNVERIFIED: reachability not checked]` and downgrade one severity tier.
+**Absence-claim grounding (mandatory for any claim that something does not exist).** Before emitting a finding of the form "no test covers X", "no handler validates Y", "no caller invokes Z", "X is not tested": grep the production tree (`rg --no-heading -n <pattern>` or `git grep -n <pattern>`) for plausible match strings. Decision table:
+- Zero matches → finding stands; cite the grep command in the evidence field.
+- One or more matches → emit `unverified — absence claim refuted by <path>:<line>` instead of the finding.
+- Grep tooling unavailable or claim cannot be reduced to a pattern → tag finding `[UNVERIFIED: absence not checked]` and downgrade one severity tier.
+This is the agent's first-line self-check; **Wave 1.5 Check B** independently re-verifies any surviving absence claims against the reviewed ref as a backstop.
+**Wave 1 — Light review (regime=light, 1 agent, `subagent_type: "research-agent"`).** Single agent covers all dimensions. Same rubric, schema, and citation requirement.
+**Wave 1.5 — Citation + absence-claim verification (1 agent, `subagent_type: "research-agent"`).** Run after Wave 1 returns, before Wave 2 synthesis. This agent performs two independent checks.
+**Check A — Citation verification.** Extracts every `file:line` citation from any `blocking`, `critical`, or `high` finding across all Wave 1 results. For each citation, runs `git show <reviewed-ref>:<file>` (or equivalent) and checks whether the quoted evidence snippet actually appears at that line in the reviewed ref — not in main, not in diff context alone. Classifies each citation as:
+- `verified` — content matches what is actually at that line on the reviewed ref.
+- `diff-only` — line appears in the diff hunk but no longer exists at the reviewed ref HEAD (e.g., deleted block). Finding must be downgraded: the issue may already be resolved.
+- `fabricated` — line does not exist at the reviewed ref and was not in the diff hunk; the evidence snippet is unverifiable. Finding is **dropped** from the report.
+**Check B — Absence-claim verification.** Extracts every **absence claim** from blocking/critical/high findings — claims of the form "no test covers X", "no handler validates Y", "no caller invokes Z", "X is not tested", "Y has no validation". Citations are not required for absence claims, so Check A cannot catch them; they need their own gate. For each, identify the asserted-absent symbol, test name, or behavior, then run `git grep -n <pattern> <reviewed-ref>` (or `rg --no-heading -n <pattern>` if outside a git context) across the production tree (exclude the same paths as the api-compat reachability check: tests, mocks, fixtures, when the claim is about production code). Classify as:
+- `confirmed-absent` — zero matches in the asserted scope; finding stands.
+- `false-absent` — one or more matches in the asserted scope; finding is **dropped** (the asserted-absent entity exists at the reviewed ref). Name the matching path(s) in the dropped-findings manifest.
+- `grep-unavailable` — tooling missing, symbol ambiguous, or absence claim cannot be reduced to a grep pattern; finding tagged `[UNVERIFIED: absence not checked]` and downgraded one severity tier.
+Returns a combined verification manifest: `[{type: citation|absence, claim, status, finding_id, evidence?}]`. Findings classified `fabricated` (citation) or `false-absent` (absence) are excluded from Wave 2 input. `diff-only` citations are passed to Wave 2 with a `⚠ diff-only citation — line absent at the reviewed ref` annotation and auto-downgraded one severity tier. `grep-unavailable` absence claims are passed through with their `[UNVERIFIED]` tag intact.
+**Wave 2 — Synthesis (1 agent, `subagent_type: "research-agent"`).** Receives: Wave 1 findings **after** citation-verification filtering + manifest of dropped/downgraded citations. Dedup by `(file, line_range, dimension)` — keep highest severity on exact match. Flag cross-agent conflicts as `CONFLICT` blocks (surface both rationales; do not auto-resolve).
+**Severity sort order within the blocking list:** findings tagged with semantics matching `invariant violation`, `defeats stated purpose`, `defeats refactor goal`, or `breaks stated contract` sort above all other `high` findings, even those with higher mechanical severity (e.g. test/build hygiene). Within that group, sort by tier (critical → high). Mechanical findings (missing test, build hygiene) sort last within their tier.
+Sort overall: critical → high → medium → low → nit; security first within tier; semantic/invariant findings above mechanical findings within tier. Template-fill summary block. Emit merge decision: **DO NOT MERGE**, **MERGE**. This is the terminal step — after emitting the decision, STOP. Do not act on any finding: no edits, commits, pushes, or PR/MR mutations. A blocking bug is a finding to report, not a fix to apply.
+**Severity rubric:**
+- `critical` — data loss, auth bypass, secret exposure, RCE. If it cannot cause unauthorized access or data loss, it is NOT critical.
+- `high` — wrong output under reachable conditions (reachable = called from production code, not tests-only)
+- `medium` — missing edge case, perf degraded under load, deprecated API
+- `low` — missing test, unclear error message
+- `nit` — naming, formatting
+Confidence `low` → auto-downgrade one tier + append `[low confidence — verify with runtime context]`.
+**Output per dimension:** if you have read the file(s) at the reviewed ref and have either real findings or a confirmed clean read, emit findings or `no issues found — read <file> at <ref>`. If evidence is insufficient — you could not read the file, the tool was unavailable, no production importers were found for the symbol, or no test file exists at the asserted path — emit `unverified — <reason>` naming the missing evidence rather than invent a finding to fill the slot. Banned words from the hedging list (`ensure`, `consider`, `may`, `could`) remain banned **inside findings**; the `unverified` channel is the sanctioned path for uncertainty.
+**Finding schema:** `severity · confidence · dimension · file:line_range · ref:<sha> · citation-type:(diff-context|file-state) · finding (one concrete sentence naming the failure mode) · evidence (verbatim code ≤4 lines) · suggestion (one concrete fix)`.
+**Epistemic scope disclosure (required in synthesis output).** The "What was not checked" section must include:
+- Which ref(s) Wave 1 agents actually read from (list the SHA or `unknown` if patch-file input). Example: `Read against branch HEAD abc1234 — citations verified at that ref.`
+- If any citations could not be verified against a live ref (patch-file input): `Citation verification skipped — no live ref available; diff-context citations only.`
+- Any topical gaps (e.g. 'did not review Telegram surface', 'did not run tests').
+**Post-synthesis:** if any `critical` or `high` finding is present, invoke `/shadow-verify` on those findings before surfacing to the user. Shadow-verify independently re-derives each top-severity claim against source; fabricated or unsupportable findings drop here before they reach the merge decision. `medium` and below go straight through.

package/dist/bundled-plugins/awa-bundled/skills/shadow-verify/SKILL.md ADDED Viewed

@@ -0,0 +1,38 @@
+---
+name: shadow-verify
+description: "Dispatch a parallel adversarial verifier wave after any high-stakes sub-agent investigation (code reviews, audits, findings reports, large refactors). Shadow verifiers independently re-derive 2–3 key claims from scratch and flag disagreements before the user acts. Use when sub-agent output will drive decisions, file changes, commits, or external side-effects."
+---
+## Sub-agent contract
+/contract
+When a sub-agent (or wave) returns investigation findings, code-review conclusions, audit claims, refactor plans, or counts that will drive user decisions or file changes, do NOT surface the report. Instead, run a shadow verification wave **before** merging.
+**Wave 2 — Adversarial verifiers (parallel, independent):**
+1. Extract 2–3 concrete, re-checkable claims from the returned report (e.g., "X function is unused", "file Y exceeds 300 lines", "PR targets main", "no tests cover Z").
+2. Dispatch one shadow sub-agent per claim, in parallel. Each receives ONLY the claim + the user's original goal — never the original agent's reasoning or cited evidence. **Default to `subagent_type: "research-agent"` (mechanically locked to Read/Grep/Glob/WebFetch/WebSearch — cannot Edit/commit/push).** If the claim requires Bash to verify (running a failing test, `gh pr view`, `git log origin/...`), fall back to a Bash-capable subagent type with `isolation: "worktree"` and prepend this prefix to the prompt: *"Verifier sub-agent — do not Edit, Write, commit, push, `gh pr create`, or `curl`. Return findings only."*
+3. Each verifier re-derives the verdict independently. Returns `{claim, verifier_verdict, evidence_pointer}`.
+**Merge:**
+- All verifiers confirm → surface the original report as validated.
+- Any verifier disagrees → show the original claim alongside the verifier's counter-claim with evidence. Do not act until the conflict is resolved.
+**When to invoke:**
+Any time sub-agent output will drive user decisions, file edits, commits, external side-effects, or is the basis of a user-facing summary.
+**Skip when:**
+Sub-agent ran inside an orchestrator skill that already verifies (`resolve`, `diagnose`, `appmap`); sub-agent returned explicit failure; work was purely exploratory and no decision follows.
+## Appendix: verification methods by domain (non-binding)
+Reference aid for choosing re-derivation methods when dispatching a verifier. Consult when the claim's domain isn't obvious.
+| Domain | Re-derivation methods |
+|--------|----------------------|
+| `software` | Grep, Read, test runs, git commands (`gh pr view`, `git log`, `git diff`), build output |
+| `research` | Web search for citation verification, independent literature re-search, replication/methodology audit, cross-reference checks |
+| `design` | Competitive audit via web search, heuristic evaluation against stated criteria, accessibility/usability re-assessment |
+| `business` | Market comp search, independent financial/metric re-derivation, assumption stress-test via web research |
+| *(other)* | Web search re-derivation, independent source verification, assumption audit — use whatever tools can independently check the claim |
+When domain is unspecified, infer from the claim content.

package/dist/bundled-plugins/awa-bundled/skills/ship/SKILL.md ADDED Viewed

@@ -0,0 +1,128 @@
+---
+name: ship
+description: "Release pipeline for already-done local work. Dispatches /ground-state pre-flight, runs the project test suite, drafts a commit message, pushes, and opens a PR with a structured verification summary. Use when local changes are ready to hand off to review — e.g. 'ship this', 'push and open a PR', 'release this work'. Add --verify to trigger an adversarial verifier wave on the diff before a human reads the PR."
+---
+## Sub-agent contract
+/contract
+Release pipeline for work that is **already done locally**. This skill does NOT build, implement, or fix — if the task needs code, route to `/mint`; if a bug needs diagnosis, route to `/diagnose`. This skill's job is the hand-off from "done locally" to "visible in a PR."
+**Skip when:**
+- Working tree is clean (nothing to ship — tell the user).
+- Current branch is the default branch (`main`/`master`) — ask the user for a feature branch.
+- User invokes with `--dry-run` → produce the plan but take no git actions.
+**Arguments:**
+- `--verify` — force the optional adversarial verifier wave (Phase 6), regardless of diff size.
+- `--draft` — open the PR as a draft.
+- `--dry-run` — run phases 1–5 but stop before Phase 7 (no push, no PR).
+---
+**Hard rules (non-negotiable — these override any inferred convention):**
+- **Branch lock.** Whatever branch is checked out when `/ship` starts is the only branch you may commit and push from. **NEVER** `git checkout` to a different branch during this skill — not to `main`, not to `master`, not to a sibling feature branch. If pre-flight finds you on the default branch, abort. Do not "recover" by switching branches yourself.
+- **Never push to the default branch.** Phase 5 pushes the CURRENT (feature) branch. Phase 8 opens a PR. There is no path through this skill that pushes to `main`/`master` or merges without a PR. If you find yourself about to run `git push origin main` (or equivalent), stop.
+- **User intent is absolute.** If the user said "make a PR," "open a PR," "submit for review," or anything synonymous, you MUST complete Phase 8. "Ship" / "release" alone also implies PR — there is no interpretation under which `/ship` means "skip the PR step."
+- **Do not invent project convention.** Never assert "this project uses direct-to-main flow," "this repo merges direct," or any equivalent justification for bypassing the PR step. If you genuinely cannot tell whether the project uses PRs, run `gh pr list --state merged --limit 5`; ≥3 merged PRs in the last week ⇒ PR is the convention. If still ambiguous, ask the user. Default to PR — it is the safer choice.
+- **Anchor cwd.** When `/ship` is invoked, `pwd` at that moment is the only working directory you operate in. All git commands and file reads must run inside that directory — never `cd` to a sibling worktree, the parent repo, or any other path during this skill. If you find yourself reading state from a path other than the invocation cwd (e.g. `git log` returning unrelated commits, or a file diff that doesn't match what was just edited), stop and re-anchor on the original cwd. Worktrees and sibling repos look almost identical; cwd is the only disambiguator.
+---
+**Phase 1 — Pre-flight (mandatory, runs BEFORE any git mutation).**
+Invoke `ground-state` via the Skill tool. Abort and surface the blocker with remediation steps if ground-state reports any of:
+- Uncommitted changes in files unrelated to the declared scope
+- Branch behind `origin/<default>` (stale — user needs to rebase/merge first)
+- Current branch IS the default branch
+- Wrong-repo context (user's cwd doesn't match the branch's purpose)
+**Phase 2 — Test gate.**
+Detect the project's test harness by scanning, in order:
+1. `package.json` → `scripts.test` (run via the project's package manager: `pnpm`/`yarn`/`npm`)
+2. `pyproject.toml` → `pytest` if configured, or `scripts.test`
+3. `Cargo.toml` → `cargo test`
+4. `go.mod` → `go test ./...`
+5. `Makefile` → `test` target
+If found → run it. Non-zero exit → **abort**; surface the failing output and recommend `/diagnose`. Do not proceed to commit.
+If no harness detected → **warn the user:** "No test harness detected. Ship without running tests?" Require explicit yes. Do not silently skip.
+**Phase 3 — Draft commit message.**
+Read the cumulative diff: `git diff --stat origin/<default>...HEAD` + full `git diff`. Synthesize a commit message:
+- **Subject (≤70 chars):** imperative, Conventional Commits format (`feat`, `fix`, `chore`, `refactor`, `docs`, etc.)
+- **Body:** 2–5 bullets on WHY — the motivation, constraint, or trade-off. Skip the WHAT — the diff speaks for itself.
+Before committing, review the **file list** to stage — don't sweep in untracked config/scratch/secret files. Print the draft message + file list to the user as info-only output, then **immediately** invoke Phase 4. **This is not a gate. Do not ask "does this look good?" Do not wait for approval.** The user surface is one continuous turn: draft → commit → push → PR URL.
+**Phase 4 — Commit.**
+Stage only the confirmed files. Commit via HEREDOC to preserve formatting:
+```
+git commit -m "$(cat <<'EOF'
+<subject>
+<body>
+EOF
+)"
+```
+Never `--amend` (creates a new commit each time). Never bypass hooks (`--no-verify`).
+**Phase 5 — Push.**
+- **Assertion (before any push):** `git rev-parse --abbrev-ref HEAD` must NOT equal the default branch. If it does, abort with the same error as Phase 1 — Branch lock was violated somewhere upstream.
+- Upstream unset → `git push -u origin <branch>`
+- Upstream set → `git push`
+- Non-fast-forward rejection → **abort**; do not force-push. Surface and let the user decide.
+- **Never** `git push origin main` (or `master`). Pushing the feature branch is the only allowed form.
+**Phase 6 — Optional adversarial verifier wave.**
+Trigger when `$ARGUMENT` contains `--verify`, OR when the cumulative diff exceeds **100 changed lines** (rough proxy for "big enough that a human reviewer will miss something"). Skip for diffs under 100 lines unless `--verify` is explicit.
+Dispatch an adversarial verifier sub-agent via the Agent tool. Use `subagent_type: "research-agent"` (read-only, no Edit/Bash-that-mutates). Brief it with:
+- The cumulative branch diff
+- The draft PR body (Phase 7)
+- Instruction: from the diff alone — **without consulting the draft's reasoning** — independently re-derive the 2–3 load-bearing claims the PR body makes. Return per claim: `CONFIRM` / `REFINE` / `CONTRADICT` with file:line evidence.
+If any verifier returns `CONTRADICT` → **surface to the user** before opening the PR. Append the counter-claim to the PR body's Verification section. User decides whether to proceed, revise, or abort.
+**Phase 7 — Synthesize PR body.**
+Structure:
+```
+## Summary
+<2–4 bullets: what changed, why>
+## Test results
+- <command>: <passed/failed + counts>
+<repeat per test invocation>
+## Verification
+<if --verify ran: per-claim verdict + evidence>
+<if skipped: "No adversarial verify requested (diff under threshold).">
+## Out of scope
+<anything touched but not central to this PR, if any; omit section otherwise>
+```
+**Phase 8 — Open PR.**
+```
+gh pr create \
+  --title "<subject line of the commit>" \
+  --body "$(cat <<'EOF'
+<PR body from Phase 7>
+EOF
+)" \
+  --base <default-branch> \
+  [--draft]
+```
+Return the PR URL to the user. Done.
+---
+**Failure modes to watch:**
+- `/ground-state` unavailable → skip Phase 1 with a loud warning; do not proceed silently.
+- `gh` CLI not authenticated → abort Phase 8 with the exact `gh auth login` command.
+- Detached HEAD or shallow clone → abort at Phase 1; these are edge cases `/ship` should not try to auto-recover.