npm - pi-diffwarden - Versions diffs - 0.26.1 - Mend

pi-diffwarden 0.26.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/CHANGELOG.md +755 -0
package/LICENSE +21 -0
package/README.md +846 -0
package/extensions/diffwarden/index.ts +84 -0
package/package.json +31 -0
package/skills/diffwarden/SKILL.md +2428 -0
package/skills/diffwarden/commands/diffwarden.md +22 -0
package/skills/diffwarden/commands/dw.md +22 -0
package/skills/diffwarden/prompts/diffwarden.md +3 -0
package/skills/diffwarden/prompts/dw.md +3 -0

package/CHANGELOG.md ADDED

@@ -0,0 +1,755 @@
+# Changelog
+All notable changes to Diffwarden are documented here.
+Format follows Keep a Changelog style. Version tags use SemVer.
+## [0.26.1] - 2026-06-24
+### Added
+- Added npm package metadata for publishing the Pi extension package as
+  `pi-diffwarden` with the `pi-package` keyword.
+- Added npm install instructions for Pi Agent (`pi install npm:pi-diffwarden@0.26.1`).
+- Added `npm pack --dry-run` and package-publication checks to CI.
+## [0.26.0] - 2026-06-24
+### Added
+- Added optional Pi package extension (`extensions/diffwarden/index.ts`) that
+  registers native `/dw` and `/diffwarden` commands, forwards to
+  `/skill:diffwarden`, provides basic argument completions, and discovers the
+  bundled skill.
+- Added `package.json` Pi package manifest so Pi can install Diffwarden from
+  git/local paths.
+- Added CI static checks for the Pi package extension manifest and wrapper.
+- Documented Pi extension install, behavior, and security warning in README and
+  `SKILL.md`.
+### Kept
+- Installer still writes only Pi skills/prompts; no extension auto-install
+  through `install.sh`.
+- Safety stance unchanged: no auto-merge, no force-push, no blind push, no
+  CI/test weakening, no resolving human comments without approval.
+## [0.25.0] - 2026-06-16
+### Added
+- **Evidence-Based Findings** in `SKILL.md`: actionable findings require an
+  anchor (`file:line`, check name, PR field, or comment/thread id) plus a
+  verbatim quote or diff hunk; low-confidence guesses cannot be P0/P1 without
+  local proof.
+- **Hallucination Guard** expanded beyond `How to test` to cover findings, fix
+  plans, PR comments, and thread replies.
+- Grounded verification discovery: use manifest/workflow targets only when they
+  exist; never invent runners.
+- Structured `verify: pass|fail|skipped` block in `--verbose` loop output.
+- Fix-plan rules: `Will change` / `Will run` must trace to diff/read files and
+  discovered commands only.
+- Delegated-read findings cross-linked to evidence rules; verification checklist
+  updated.
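The evidence rule in the 0.25.0 entry can be sketched as a small gate. This is only an illustration under stated assumptions: the `Finding` shape, the `gateFinding` name, and the choice to downgrade an unproven P0/P1 guess to P2 (rather than drop it) are hypothetical and do not come from `SKILL.md`.

```typescript
// Hypothetical shapes for illustration; the real rules live in SKILL.md as prose.
type Severity = "P0" | "P1" | "P2" | "P3";

interface Finding {
  severity: Severity;
  anchor?: string;        // e.g. "src/auth.ts:42", a check name, a PR field, or a thread id
  evidence?: string;      // verbatim quote or diff hunk
  lowConfidence: boolean; // true when the finding is a guess
  hasLocalProof: boolean; // true when local verification backs it
}

// Returns the finding when it is actionable, a downgraded copy when a
// low-confidence guess claims P0/P1 without local proof, or null when it
// lacks an anchor or a verbatim quote.
function gateFinding(f: Finding): Finding | null {
  const hasAnchor = !!f.anchor && f.anchor.trim() !== "";
  const hasEvidence = !!f.evidence && f.evidence.trim() !== "";
  if (!hasAnchor || !hasEvidence) return null;

  const highSeverity = f.severity === "P0" || f.severity === "P1";
  if (highSeverity && f.lowConfidence && !f.hasLocalProof) {
    // Assumption: downgrade to P2 instead of discarding the finding.
    return { ...f, severity: "P2" };
  }
  return f;
}
```

A caller would run every candidate finding through `gateFinding` before it reaches a report, a fix plan, or a PR comment.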
+## [0.24.1] - 2026-06-15
+### Fixed
+- Restored the mandatory final status lines for lean, verbose, status, and PR
+  comment output. Final reviews now end with `Status:` followed by `Level:`,
+  without extra final fields or headings.
+## [0.24.0] - 2026-06-13
+### Added
+- Added `loop` as the primary review-fix-verify command.
+- Added `workspace` target for non-git and no-branch workspace review.
+- Added auto-fallback to workspace mode when no git repo, no branch, or no PR exists.
+- Added document review mode for plans, docs, guides, tutorials, specs, and markdown files.
+- Added lean default loop output using `cN/5` progress lines.
+- Added `--mvp` to stop at `c4/5` when only P3/info items remain.
+- Added `--verbose` for the full detailed report.
+- Added `--orchestrate` for optional reviewer/fixer role split.
+- Added orchestration model defaults via global config and project override.
+- Added safe config precedence for optional orchestration.
+- Added short PR comment format: Findings, Status, Level.
+- Added explicit PR comment posting safety: approval, head-SHA recheck, dedupe, and COMMENT-only reviews.
+- Added workspace edit backups before non-git workspace fixes.
+- Added document backups before document fixes.
+- Added Pi Agent README install guide.
+- Added optional Pi Agent installer target (`--pi`, `--pi-root`).
+- Added Pi prompt-template aliases for `/dw` and `/diffwarden`.
+- Added `docs/orchestration.md` and linked it from README.
+- Added agentic implementation safety guidance for multi-agent worktrees, file ownership, merge gates, and conflict policy.
+### Changed
+- Recommended global install by default because Diffwarden is a machine-wide reviewer/fixer.
+- Simplified visible command surface to `review`, `loop`, `status`, `comment`, and `help`.
+- Kept `fix`, `prepare`, and `security` as compatibility aliases.
+- Changed default loop behavior to local edits only unless `--commit` or `--push` is passed.
+- Reduced default review and loop output to minimize tokens.
+- Moved detailed reports behind `--verbose`.
+### Fixed
+- Non-git folders no longer block review.
+- Git workspaces with no branch or detached HEAD no longer block workspace review.
+- Git workspaces with no current PR now fall back to local/workspace review unless PR behavior was explicit.
+- Missing `gh` no longer blocks non-PR review.
+- Document review no longer requires git.
+### Kept
+- No auto-merge.
+- No force-push.
+- No destructive git cleanup.
+- No weakening CI/tests/lint/auth/secrets.
+- No resolving human comments without explicit approval.
+- Normal `/dw loop` remains single-agent unless `--orchestrate` is passed.
+## [0.23.2] - 2026-06-10
+### Changed
+- Dropped unshipped `skills/diffwarden/prompts/` templates (never in a tagged
+  release). Codex 0.117.0 removed custom prompts upstream (`/prompts:dw`,
+  `/prompts:diffwarden`); the installer no longer references prompt files.
+## [0.23.1] - 2026-06-10
+### Fixed
+- Codex CLI docs and installer now match current Codex behavior (≥ 0.117.0):
+  invoke Diffwarden with `$diffwarden` or `/skills`, not `/dw`, `/diffwarden`, or
+  `/prompts:*`. Custom prompts in `~/.codex/prompts/` were removed upstream;
+  `install.sh` no longer copies prompt files there.
+- README adds a **Codex CLI** section listing supported vs unsupported invocation
+  paths and why (built-in-only `/` menu, deprecated/removed custom prompts,
+  `.agents/skills` as the skill path).
+- `SKILL.md` Slash Commands section now documents per-agent invocation and parses
+  `$diffwarden` the same as `/diffwarden` / `/dw`.
+## [0.23.0] - 2026-06-10
+### Fixed
+- Corrected Codex install support to match current Codex docs: Codex skills now
+  install to `.agents/skills/diffwarden/` or `~/.agents/skills/diffwarden/`,
+  not `.codex/skills/diffwarden/`.
+- Removed the unsupported `.codex/commands/` path. Codex CLI prompt aliases now
+  install to `~/.codex/prompts/` and are invoked as `/prompts:dw` or
+  `/prompts:diffwarden`.
+- README and `SKILL.md` now distinguish Claude Code/Cursor slash-command files
+  from Codex CLI skills and prompt aliases.
+## [0.22.0] - 2026-06-10
+### Added
+- **Codex installer support.** `install.sh` now detects `.codex` / `~/.codex`,
+  accepts `--codex`, and copies the skill plus `/dw` / `/diffwarden` command
+  files to `.codex/skills/diffwarden/` and `.codex/commands/`.
+- README and `SKILL.md` now document Codex as a first-class install target,
+  including manual-copy paths and `/dw` troubleshooting.
+### Changed
+- Installer safety guard now allows writes only under `.claude/`, `.codex/`,
+  and `.cursor/`, preserving the no-`sudo`, no-outside-config-dir stance.
+- README local-mode wording now matches the skill: `prepare local` is valid;
+  only `status` and posting/push flags are rejected with a local target.
+## [0.21.0] - 2026-06-08
+### Added
+- **Web-Augmented Review (`--web`, alias `--research`; slash `--web`) — opt-in,
+  human-gated web grounding.** Off by default. When enabled *and* genuinely
+  uncertain (a low-confidence finding, a time-sensitive CVE/advisory/best-practice
+  question, or a user-requested deep review), Diffwarden may consult the web to
+  ground a single finding — but only after a per-finding `[y/N]` consent prompt it
+  **waits** on, and only with a redacted minimal finding descriptor. Two gates,
+  both required: the `--web` flag, then per-finding human consent. It never
+  auto-searches silently and never batch-approves.
+- Findings are marked `web-verified` (a consented search grounded it; URL cited)
+  or `local-only` (the default). Web grounding never auto-raises severity and
+  never bypasses a safety cap (P0/security still caps at `1/5`, needs-user at
+  `3/5`) — severity and confidence stay Diffwarden's own judgment.
+- **Data-egress guard.** A web query carries only the abstract finding descriptor;
+  repo source, diff hunks, secrets, tokens, file paths, and internal names are
+  never sent. The exact redacted query is shown in the consent prompt, and the
+  data-exfiltration/scope risk is noted in the finding's rationale.
+- Valid on `review` / `fix` / `prepare` / `security` (code targets, incl.
+  `local` / `staged` / `worktree`) and compatible with `--dry-run` /
+  `--security-focus`; **rejected** on `status` (snapshot only) and plan mode
+  (`--as-plan` or a `.md` plan target). New "Web-Augmented Review (opt-in)"
+  section; wired into Inputs, the slash grammar + flag-mapping table + help
+  output, Invalid-combinations rows, the Confidence Score / Classification flow,
+  the Loop Algorithm, Dry Run Mode, the Final Report, Common Pitfalls, and the
+  Verification Checklist. Synced the `/dw` and `/diffwarden` command files and the
+  README (new "Web-augmented review (opt-in)" section, command/flag tables, TOC).
+### Kept
+- Full safety stance unchanged: no auto-merge, no force-push, no blind push, no
+  weakening of CI/tests/lint/auth/secrets, and no resolving human comments without
+  explicit approval. Web access is off by default and never silent; the help-path
+  version check remains the only other network call and is unaffected by `--web`.
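The two-gate rule from the 0.21.0 entry (the `--web` flag first, then per-finding consent) might look like the sketch below. The type and function names are hypothetical, and accepting `yes` as well as `y` at the `[y/N]` prompt is an assumption.

```typescript
type Subcommand = "review" | "fix" | "prepare" | "security" | "status";

interface WebRequest {
  subcommand: Subcommand;
  planMode: boolean; // --as-plan or a .md plan target
  webFlag: boolean;  // --web (alias --research)
}

// Gate 1: the flag must be set, and it is rejected on status and in plan mode.
function webFlagAccepted(req: WebRequest): boolean {
  if (!req.webFlag) return false;
  if (req.subcommand === "status" || req.planMode) {
    throw new Error("--web is rejected on status and in plan mode");
  }
  return true;
}

// Gate 2: per-finding consent. Anything other than an explicit yes means no,
// matching the [y/N] default. Accepting "yes" is an assumption.
function consentGiven(answer: string): boolean {
  return /^(y|yes)$/i.test(answer.trim());
}

// Both gates are required for every single finding; there is no batch approval.
function mayQueryWeb(req: WebRequest, answer: string): boolean {
  return webFlagAccepted(req) && consentGiven(answer);
}
```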
+## [0.20.0] - 2026-06-07
+### Changed
+- **Collapsed the plan subcommand surface into auto-detected `review` / `fix`.**
+  There is now **one** `review` and **one** `fix`; each classifies its *target*
+  and selects the matching internal mode — a PR / `#num` / URL / `current` /
+  `local` / `staged` / `worktree` → **code** mode; a single prose `.md` plan
+  (headings/sections, no diff payload) → **plan** mode. The code-review and
+  plan-review rubric logic is unchanged — only the entrypoint collapses.
+- Mixed signals (e.g. a PR ref *and* a `.md` plan path, or a `.md` carrying diff
+  hunks) → Diffwarden **asks** which mode and states that the default is **code**;
+  it never silently guesses.
+### Added
+- **`--as-code` / `--as-plan` override flags** on `review` / `fix` to force the
+  mode past the detector. They are mutually exclusive, and `--as-plan` is rejected
+  on a PR / `local` / `staged` / `worktree` target (not a plan document).
+- **Mandatory mode banner.** Every `review` / `fix` run prints the auto-selected
+  mode before working: `detected: code review | plan review | code fix | plan fix`.
+- Updated the grammar, Target Auto-Detection section, subcommand and flag-mapping
+  tables, expansion examples, Invalid-combinations table, help output, Plan
+  Review/Fix Mode triggers, How-to-Test scope, and the Verification Checklist.
+  Synced the `/dw` and `/diffwarden` command files and the README (new
+  "Auto-detected mode (code vs plan)" section, command/flag tables).
+### Kept
+- **Hidden back-compat aliases.** `review-plan <filepath>` ≡ `review <filepath>
+  --as-plan` and `fix-plan <filepath>` ≡ `fix <filepath> --as-plan` are still
+  accepted — expanded internally, not advertised in `help`.
+- The full safety stance is unchanged: no auto-merge, no force-push, no blind
+  push, no weakening of CI/tests/lint/auth/secrets, and no resolving human
+  comments without explicit approval. Plan mode still touches no PR, git, or code
+  (plan `review` is read-only; plan `fix` edits only the plan file).
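The target auto-detection described in the 0.20.0 entry can be approximated as follows. This is a sketch, not the skill's actual detector: the function names are hypothetical, treating a bare number as a PR reference is an assumption, and so is defaulting to code mode when no target is given.

```typescript
type DetectedMode = "code" | "plan" | "ask";

interface ModeFlags {
  asCode?: boolean;        // --as-code
  asPlan?: boolean;        // --as-plan
  mdHasDiffHunks?: boolean; // the .md target carries a diff payload
}

const CODE_KEYWORDS = ["current", "local", "staged", "worktree"];

// PR number (with or without '#', the bare form is an assumption), URL, or keyword.
function isCodeTarget(t: string): boolean {
  return CODE_KEYWORDS.indexOf(t) !== -1 || /^#?\d+$/.test(t) || /^https?:\/\//.test(t);
}

function isPlanTarget(t: string): boolean {
  return /\.md$/i.test(t);
}

function detectMode(targets: string[], flags: ModeFlags = {}): DetectedMode {
  if (flags.asCode && flags.asPlan) {
    throw new Error("--as-code and --as-plan are mutually exclusive");
  }
  const hasCode = targets.some(isCodeTarget);
  const hasPlan = targets.some(isPlanTarget);
  if (flags.asPlan) {
    if (hasCode) throw new Error("--as-plan is rejected on a PR/local/staged/worktree target");
    return "plan";
  }
  if (flags.asCode) return "code";
  if (hasCode && hasPlan) return "ask";                        // mixed signals: ask, never guess
  if (hasPlan) return flags.mdHasDiffHunks ? "ask" : "plan";
  return "code";                                               // assumption: no target means code
}
```

When the result is `"ask"`, the changelog says Diffwarden asks which mode to use and states that the default is code.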
+## [0.19.0] - 2026-06-06
+### Added
+- **`fix-plan <filepath>` — Plan Fix Mode.** The edit counterpart to
+  `review-plan`: it runs the same plan critique, then **revises the plan file in
+  place** to address findings, looping review → revise → re-score until
+  plan-readiness `5/5` or `--max-iterations` (default `5`, hard max `5`).
+- Before the first edit it backs up the original to `<filepath>.orig` (and never
+  overwrites an existing backup — it falls back to `<filepath>.orig.N`). It edits
+  **only the plan file**: no code, no git, no commit, no push. Needs-user findings
+  are left flagged, never invented, and the plan is never weakened to raise the
+  score.
+- Flags: `--security` deepens the security pass; `--delegate` may digest a long
+  plan under the grounding contract; `--max N` bounds the loop. `--comment` /
+  `--reply` / `--resolve` / `--push` / `--dry-run` and any `<pr>` / `local` target
+  are rejected.
+- Reports `Plan-readiness: N/5 (checks: n/a (plan))`, the backup path, and
+  `Iterations: N/M`. Updated the grammar, subcommand table, expansion examples,
+  Invalid-combinations table, help output, Final Report notes, and Verification
+  Checklist. Added `review-plan` and `fix-plan` to the README command reference.
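The backup naming from the 0.19.0 entry (`<filepath>.orig`, falling back to `<filepath>.orig.N`) can be sketched like this. The function name and the injected `exists` predicate are hypothetical, and starting `N` at 1 is an assumption; the changelog does not say where numbering begins.

```typescript
// Picks a backup path that never overwrites an existing backup.
// `exists` is injected so the sketch stays independent of any filesystem API.
function nextBackupPath(planPath: string, exists: (p: string) => boolean): string {
  const base = `${planPath}.orig`;
  if (!exists(base)) return base;
  // Assumption: numbered fallbacks start at 1.
  for (let n = 1; ; n++) {
    const candidate = `${base}.${n}`;
    if (!exists(candidate)) return candidate;
  }
}
```

For example, with `PLAN.md.orig` and `PLAN.md.orig.1` already on disk, the next backup would go to `PLAN.md.orig.2`.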
+## [0.18.0] - 2026-06-06
+### Added
+- **`review-plan <filepath>` — Plan Review Mode.** A new subcommand that
+  critiques a plan/design document *before* any code is written: completeness,
+  ordering & dependencies, ambiguity, scope, risk (destructive/irreversible
+  steps), security, per-step verification, rollback/failure handling, grounding
+  (do the files/commands/symbols the plan names actually exist?), and unstated
+  assumptions.
+- Plan Review Mode is **read-only**: no PR, no git operations, no code edits, no
+  fix loop. It reads the plan and (read-only) the files it references to ground
+  the critique, never rewrites the plan, and reports a `0–5` plan-readiness score
+  (`ready | needs revision | blocked | user decision needed`).
+- Flags: `--security` deepens the security pass; `--delegate` may digest a long
+  plan under the grounding contract. `--comment` / `--reply` / `--resolve` /
+  `--push` and any `<pr>` / `local` target are rejected (no PR, no code change).
+## [0.17.0] - 2026-06-06
+### Added
+- **`prepare` on a local target.** `prepare local` (also `prepare staged` /
+  `prepare worktree`) is now valid: it loops review → fix → verify on the working
+  tree, recomputing the local confidence score each pass, until the score reaches
+  `5/5` (clean) or `--max-iterations` is hit — stopping as soon as `5/5` is
+  reached, then reporting the verdict. Local prep defaults to `--max-iterations 5`
+  (hard max `5`).
+- Like every local run, `prepare local` **never commits or pushes** (no PR
+  exists); the user commits afterward. It also honors all normal loop stop
+  conditions (needs-user decision, oscillation, ambiguous verification failure,
+  out-of-scope risk).
+### Changed
+- Local mode now accepts `review` / `fix` / `prepare` / `security` (was
+  `review` / `fix` / `security`). `status local` remains rejected (no PR to
+  snapshot). Updated the Invalid-combinations table, slash-command grammar,
+  subcommand table, expansion examples, help output, and Verification Checklist
+  accordingly.
+## [0.16.0] - 2026-06-06
+### Added
+- **"How to test" in fix/prepare reports.** When a run changes code (`fix` or
+  `prepare`, any target), the report now adds a grounded `How to test` block
+  between `Next action` and the final status lines — concrete setup / exercise /
+  expect steps a human can run by hand. Included in posted review bodies
+  (`--comment`) and in `fixed` thread replies (`--reply`). Omitted on read-only
+  runs.
+- **Hallucination guard for test steps.** Every command, path, flag, and
+  expected output in a `How to test` block must trace to real evidence (the
+  diff, a discovered script, a command actually run, a confirmed binary).
+  Ungroundable steps are omitted, never fabricated — online and offline.
+## [0.15.0] - 2026-06-06
+### Changed
+- **Final report puts readiness last.** Status, confidence, and target context
+  print at the bottom of the report, after `Next action`, instead of at the top.
+  Lets the reader scan findings → verification → next action → readiness in
+  order.
+## [0.14.0] - 2026-06-05
+### Added
+- **Local (Uncommitted) Review Mode.** Diffwarden now reviews uncommitted
+  working-tree changes with no PR required. Pass a `local`, `staged`, or
+  `worktree` target to `review`, `fix`, or `security` (e.g. `/dw review local`,
+  `/dw fix staged --security`). `local`/`worktree` cover all changes vs `HEAD`
+  plus untracked files (gitignored excluded); `staged` covers staged changes
+  only. The full review pipeline still applies — classification, severity,
+  confidence score, fix loop, verification, and the security checklist — while
+  the PR-only machinery is skipped: no PR detection, no CI, no review threads,
+  no posting, and no commit or push. Preflight runs with `LOCAL_MODE=1` (skips
+  the `gh`/remote checks; no Phase 2 PR gate). The confidence score drops its CI
+  dimension and reports `checks: n/a (local)`, reflecting readiness-to-commit
+  rather than merge-readiness. `prepare`/`status` and any posting/push flag are
+  rejected with a local target.
+## [0.13.0] - 2026-06-05
+### Added
+- **Best-effort version check on the help path.** Bare `/diffwarden` / `/dw`
+  (and the explicit `help` subcommand) now does one notify-only check for a
+  newer release and, if the installed skill is behind, appends a single
+  `↑ Diffwarden vX.Y.Z available …` line to the help output. Security-first by
+  design: it runs *only* on the help path (never during a review loop), is
+  best-effort and non-blocking (any failure — offline, no `curl`, rate-limit —
+  is silently skipped), uses the unauthenticated public releases API (never
+  reads or sends a token), and is **notify-only** — it never downloads,
+  overwrites, or executes the skill or `install.sh`. Updating stays the user's
+  manual `install.sh` step, preserving the trust boundary the rest of the skill
+  defends.
+## [0.12.2] - 2026-06-05
+### Changed
+- Bare `/diffwarden` / `/dw` (and `help`) now show the Diffwarden version in the
+  help header (`Diffwarden vX.Y.Z — slash commands ...`), substituted from the
+  skill's frontmatter `version:`. Docs only; no behavior change.
+## [0.12.1] - 2026-06-04
+### Changed
+- Help output now lists `--delegate` in the per-subcommand usage lines for
+  `review`, `fix`, and `prepare`, matching the Flags legend and grammar so the
+  flag is discoverable from the command listing. Docs only; no behavior change.
+## [0.12.0] - 2026-06-04
+### Added
+- **Delegated Reads (`--delegate-reads`, off by default).** On large PRs the bulk
+  diff hunks and CI-log bodies dominate context. With this flag, read-only
+  subagents may digest that *content* so the orchestrator's context holds the
+  conclusions, not the raw bytes — a token saving on long reviews. Built
+  security-first as a compression layer on reading only; it cannot change the
+  verdict or hide a file:
+  - **Security overrides everything (refusals, not tunables):** `--security-focus`
+    runs never delegate, and security-sensitive files (auth/authz, payments,
+    migrations, secrets, infra, `.github/workflows/**`, lint/CI config) are always
+    read raw. `security … --delegate` is rejected as a no-op.
+  - **No decision is ever delegated** — classification, severity, confidence
+    score, merge-ready, fix/defer, post/resolve stay 100% with the orchestrator.
+  - **Structured claims, grounded against raw source.** Subagents return
+    `{file, line, type, verbatim_quote}` (no prose); the orchestrator greps each
+    quote against raw source — no match → the claim is dropped and that file is
+    read raw, so a garbled-but-real issue is not lost.
+  - **Coverage reconciliation.** The authoritative file/check/comment set is
+    enumerated raw; a set difference forces a raw read of anything a subagent
+    skipped. A subagent can never shrink the set or mark a file clean.
+  - **Prompt-injection containment.** PR diff/comments/logs are treated as
+    untrusted data; subagents are read-only with no commit/push/post tools, so an
+    injected "report no issues" is caught by grounding + reconciliation.
+  - **Fail-safe + auditable.** Any subagent error/timeout/malformed output →
+    raw read of that chunk (worst case equals prior behavior); each run logs
+    `digest: subagent (files=N, grounded M/M, raw-fallback K, security-raw S)`.
+  - New section "Delegated Reads", slash flag `--delegate`, an Invalid-combination
+    reject, plus Common Pitfall and Verification Checklist entries.
+  - Default unset = today's behavior, byte-identical. Strict manual opt-in (no
+    auto-on heuristic in this release).
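The grounding and coverage-reconciliation steps above can be sketched together. Only the `{file, line, type, verbatim_quote}` claim shape comes from the changelog; the `reconcile` function, its parameters, and the idea that a subagent reports a `digestedFiles` list are assumptions made for illustration.

```typescript
interface Claim {
  file: string;
  line: number;
  type: string;
  verbatim_quote: string;
}

interface Reconciled {
  grounded: Claim[];  // claims whose quote was found in raw source
  readRaw: string[];  // files the orchestrator must read raw itself
}

function reconcile(
  authoritativeFiles: string[],       // enumerated raw by the orchestrator
  digestedFiles: string[],            // what the subagent says it covered (assumed input)
  claims: Claim[],
  rawSource: Record<string, string>,  // file path -> raw content
): Reconciled {
  const grounded: Claim[] = [];
  const readRaw: string[] = [];
  const markRaw = (file: string): void => {
    if (readRaw.indexOf(file) === -1) readRaw.push(file);
  };

  // Coverage reconciliation: anything the subagent skipped is read raw.
  for (const file of authoritativeFiles) {
    if (digestedFiles.indexOf(file) === -1) markRaw(file);
  }

  // Grounding: a claim survives only if its quote appears verbatim in the source.
  for (const claim of claims) {
    const source = rawSource[claim.file];
    const found =
      typeof source === "string" &&
      claim.verbatim_quote !== "" &&
      source.indexOf(claim.verbatim_quote) !== -1;
    if (found) {
      grounded.push(claim);
    } else {
      markRaw(claim.file); // dropped claim: read that file raw so a real issue is not lost
    }
  }
  return { grounded, readRaw };
}
```

Note that the subagent's output can only add files to `readRaw`; nothing it returns removes a file from the authoritative set.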
+## [0.11.0] - 2026-06-04
+### Added
+- **Incremental re-collection (loop iterations 2+).** The loop's biggest
+  repeated cost was re-fetching the full diff, every comment, and every CI log
+  on every iteration (full × N). Iterations 2+ may now fetch only what changed
+  since the last collection, cutting cost to roughly full + small × (N-1).
+  Designed so a missed delta is both unreachable at the verdict and cheap to
+  detect:
+  - Iteration 1 is always a full collection.
+  - Small signals (check status, `reviewDecision`, thread resolution state,
+    comment counts) are always re-pulled full; only the diff and failing-check
+    CI logs are deltaed.
+  - **Ancestry guard:** `git merge-base --is-ancestor LAST_HEAD HEAD` (or the PR
+    head SHA in review-only mode) forces a full re-pull on any rebase/force-push.
+  - **Count probe:** a comment-count mismatch vs the last collection forces a
+    full re-pull, catching added or deleted comments (edits don't change the
+    count — see the `updated_at` filter next).
+  - Comment deltas filter on `updated_at` (not `created_at`) so edits and
+    in-place bot updates are caught, and the diff delta unions in files that
+    still carry an open finding.
+  - **The merge-ready verdict always rests on a full collection** — `5/5` is
+    never declared on delta evidence (Loop Algorithm steps 5 and 14).
+  - Each iteration logs its mode (`evidence: full` / `evidence: delta`) so a
+    wrong delta is visible, never silent.
+  - New Common Pitfall and Verification Checklist item cover the delta path.
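The guards listed above amount to a decision between a full and a delta collection. A minimal sketch, assuming a hypothetical `CollectionState` record whose fields the caller fills in (for instance from the exit status of `git merge-base --is-ancestor LAST_HEAD HEAD`):

```typescript
type EvidenceMode = "full" | "delta";

interface CollectionState {
  iteration: number;           // 1-based loop iteration
  lastHeadIsAncestor: boolean; // ancestry guard result
  lastCommentCount: number;    // count at the previous collection
  currentCommentCount: number; // count probed now
  backsVerdict: boolean;       // true when this collection would support a 5/5 verdict
}

function collectionMode(s: CollectionState): EvidenceMode {
  if (s.iteration <= 1) return "full";                              // iteration 1 is always full
  if (!s.lastHeadIsAncestor) return "full";                         // rebase or force-push
  if (s.lastCommentCount !== s.currentCommentCount) return "full";  // count probe mismatch
  if (s.backsVerdict) return "full";                                // never declare 5/5 on delta evidence
  return "delta";
}
```

The returned value is what each iteration would log as `evidence: full` or `evidence: delta`.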
+## [0.10.2] - 2026-06-04
+### Changed
+- **Preflight: deduplicated the review-only vs local-edit explanation.** The
+  mode definition lived in three places verbatim (Preflight intro, the Phase 1
+  protected-branch comment, and the Phase 2 prose). It is now stated once in the
+  Preflight intro; Phase 1 and Phase 2 reference it and keep only their
+  location-specific detail. No behavior change — gate logic, modes, and all
+  bash are identical; this only trims repeated prose to cut per-run input
+  tokens.
+## [0.10.1] - 2026-06-04
+### Changed
+- **Evidence Collection now filters noise out of context** to cut token usage
+  with no loss of review coverage. The diff stream is path-filtered to drop
+  generated/vendored files (`*.lock`, `dist/`, `*.min.js`, `__snapshots__/`,
+  `vendor/`); CI logs are pulled only for failing checks; inline/issue
+  comments are reduced to the fields the classifier reads (dropping
+  `diff_hunk`, URLs, reactions); and the
+  PR snapshot omits the `comments` field that is fetched separately. These
+  filters only remove data the review never acts on — same findings, fewer
+  tokens. Added a caution to widen/drop a glob when a matched file is actually
+  human-reviewed, and a pointer to the GraphQL `reviewThreads` query for
+  resolved-thread state.
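The path filter from the 0.10.1 entry could be expressed as below. The five patterns come from the changelog; translating them to regular expressions, and matching `dist/`, `__snapshots__/`, and `vendor/` at any directory depth, are assumptions.

```typescript
// Regex translations of the changelog's globs (nesting behavior is assumed).
const NOISE_PATTERNS: RegExp[] = [
  /\.lock$/,               // *.lock
  /(^|\/)dist\//,          // dist/
  /\.min\.js$/,            // *.min.js
  /(^|\/)__snapshots__\//, // __snapshots__/
  /(^|\/)vendor\//,        // vendor/
];

// Keeps only the paths a reviewer would actually read.
function dropNoise(paths: string[]): string[] {
  return paths.filter((p) => !NOISE_PATTERNS.some((re) => re.test(p)));
}
```

As the entry cautions, a pattern should be widened or dropped when a matched file is in fact human-reviewed in a given repository.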
+## [0.10.0] - 2026-06-04
+### Added
+- `install.sh` — a self-contained installer. Detects which agents are present
+  (Claude Code, Cursor) and at which scope (project / global), then copies the
+  skill into `.../skills/diffwarden/SKILL.md` and the optional `/dw` and
+  `/diffwarden` command files into `.../commands/`. Idempotent (skips files
+  already up to date, diffs and asks before overwriting a changed file). Flags:
+  `--claude`, `--cursor`, `--project`, `--global`, `--dry-run`, `--yes`,
+  `--force`, `--ref`. Security-hardened: `set -euo pipefail`, HTTPS-only fetch
+  pinned to a release tag, no `sudo`, refuses to write outside `.claude/` and
+  `.cursor/`. Runs from a clone with no network, or from a downloaded copy.
+- `.github/workflows/ci.yml` — CI that shellchecks `install.sh` (`bash -n` +
+  `shellcheck`) and enforces version sync across all files. Required on `main`.
+- README **Contributing** section documenting the fork/PR flow and the `main`
+  branch-protection rules (PR required, 1 approval, CI green, squash-only, no
+  direct push / force-push — enforced for everyone, including the maintainer).
+### Changed
+- **Removed the `npx`/skills.sh install path** — it proved flaky. Install is now
+  the installer (Option A) or a plain manual copy (Option B). README Install
+  section rewritten end to end; skills.sh badge and references dropped.
+- README Command reference, Troubleshooting, Files list, and version badge
+  updated to describe installer-based install instead of the skill loader.
+## [0.9.2] - 2026-06-04
+### Changed
+- README: clarified how the `/dw` and `/diffwarden` slash commands actually
+  register. `/diffwarden` works automatically in Claude Code (matches the skill
+  name); `/dw` is never auto-installed and needs a one-time command-file copy.
+  Install section now documents copying `dw.md` into `.claude/commands/` (Claude
+  Code) as well as `.cursor/commands/` (Cursor), with a note that Claude Code
+  loads commands at session start. Updated the Command reference intro,
+  Troubleshooting entry, and Files list to match.
+## [0.9.1] - 2026-06-04
+### Added
+- Final report and `status` snapshot now print the Diffwarden version (from the
+  skill frontmatter `version:`) on the first line, so users can see which playbook
+  ran.
+- README: Cursor-specific caveman setup. Documents per-agent caveman activation
+  (hook-driven for Claude Code/Codex/Gemini vs. static `.cursor/rules/` file for
+  Cursor/Windsurf/Cline/Copilot), the `--with-init` symlink caution for this repo
+  (`AGENTS.md` → `CLAUDE.md`), the safe manual rule copy, and a Troubleshooting
+  entry for caveman not activating in Cursor.
+## [0.9.0] - 2026-06-04
+### Added
+- Caveman Mode (token savings): at the start of every invocation Diffwarden now
+  checks whether the `caveman` skill is available. If present, it runs in caveman
+  mode (compact, high-signal output) while preserving exact paths, commands,
+  errors, verification results, risks, and next actions, and keeping caveman's
+  safety carve-outs. If caveman is not installed, it emits a one-time suggestion
+  to install it for ~75% output-token savings, then continues normally. Output
+  style only — never changes classification, fix scope, safety gates, or the loop.
+## [0.8.0] - 2026-06-04
+### Added
+- Confidence Score: pending-checks bucket. A required check in a non-terminal
+  state (`pending`, `in_progress`, `queued`, `expected`) is now scored as
+  unresolved evidence capped at `3/5` with `checks: pending`, not as a failing
+  check (`2/5`) or as passing (`5/5`). Never declare `5/5` while a required check
+  is pending.
+- Preflight: review-only mode (`REVIEW_ONLY`). `review`, `status`, `security`,
+  any `--dry-run` run, and `--post-review` on a PR you do not own no longer
+  require the PR branch to be checked out locally — they pin the PR head SHA from
+  `gh` and read evidence via the API. Phase 1 skips the protected-branch halt and
+  Phase 2 skips the base-branch/head-drift checks in this mode. Fixes spurious
+  halts when reviewing another developer's PR from a different machine or clone
+  (e.g. a reviewer sitting on `main`). Local-edit mode (`fix`/`prepare`) keeps
+  the full protected-branch + base/head-drift gate.
+- Explicit `OWNER/REPO` resolution from the PR reference before any API call,
+  with `--repo "$OWNER/$REPO"` on every `gh pr`/`gh api` command. Stops `gh`'s
+  implicit current-directory repo resolution from silently targeting the wrong
+  repo (fork, renamed remote, different clone) and returning empty comment sets
+  that look like an uncommented PR.
+### Changed
+- Confidence Score: it is now explicitly relative to the commit it was computed
+  against. Two runs at different head SHAs (or check states) can legitimately
+  produce different scores for the same PR; scores must not be compared without
+  comparing their stamps first.
+- Final Report: `Confidence:` line now stamps the head SHA and check-state —
+  `Confidence: N/5 @ <head-sha> (checks: passing | pending | failing) — reason`.
+  Makes cross-device/cross-run score differences self-explaining instead of
+  looking like a contradiction.
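The safety caps mentioned across these entries (pending checks cap at `3/5`, needs-user at `3/5`, P0/security at `1/5`) can be sketched as a clamp applied after any other scoring. The `Evidence` shape and `confidence` function are hypothetical, and treating a failing required check as a cap of `2/5` is an assumption based on the wording of the 0.8.0 entry.

```typescript
type CheckState = "passing" | "pending" | "failing";

interface Evidence {
  baseScore: number;         // score from other signals, before caps
  hasP0OrSecurity: boolean;  // unresolved P0 or security finding
  needsUser: boolean;        // pending user decision
  checks: CheckState;        // aggregate state of required checks
}

// Applies the caps; a cap can lower the score but never raise it.
function confidence(e: Evidence): number {
  let score = Math.max(0, Math.min(5, Math.floor(e.baseScore)));
  if (e.checks === "pending") score = Math.min(score, 3);
  if (e.checks === "failing") score = Math.min(score, 2); // assumption: 2/5 acts as a cap
  if (e.needsUser) score = Math.min(score, 3);
  if (e.hasP0OrSecurity) score = Math.min(score, 1);
  return score;
}
```

Because the score depends on `checks`, two runs against different head SHAs or check states can disagree, which is why the report stamps both next to the number.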
+## [0.7.7] - 2026-06-02
+### Changed
+- GitHub auth: prefer `gh auth status` (user/keyring login) over
+  `GH_TOKEN`/`GITHUB_TOKEN`. When a `gh` user is active, env tokens are unset
+  for the session so they do not override the keyring login. Env tokens are
+  validated only when there is no active `gh` user (CI/automation fallback).
+  No filesystem token search.
+## [0.7.6] - 2026-06-01
+### Changed
+- Remove tracked `.cursor/commands/` from repo; add `.cursor/` and `.claude/` to
+  `.gitignore`. Skill stays agent-agnostic; `skills/diffwarden/commands/` remains
+  an optional Cursor-only install. README and SKILL.md clarify Cursor slash menu
+  is optional.
+## [0.7.5] - 2026-06-01
+### Changed
+- README: move Contents before Command reference; reorder TOC to match section
+  order (command reference and loop guide first).
+## [0.7.4] - 2026-06-01
+### Changed
+- README: add "Loop until merge-ready (5/5)" section after Command reference
+  (loop commands, confidence scale, stop conditions, example workflow).
+## [0.7.3] - 2026-06-01
+### Added
+- Cursor slash command files: `skills/diffwarden/commands/dw.md` and
+  `diffwarden.md` (plus `.cursor/commands/` in this repo). `/dw` and
+  `/diffwarden` now work in Cursor's `/` menu after copying to
+  `.cursor/commands/` or `~/.cursor/commands/`. README install + FAQ updated.
+## [0.7.2] - 2026-06-01
+### Changed
+- README: add "Command reference" section with command and flag tables after the
+  intro overview; dedupe slash-command section to examples only.
+## [0.7.1] - 2026-06-01
+### Added
+- Safe GitHub token handling in preflight and new "GitHub Authentication"
+  section. Use `GH_TOKEN` / `GITHUB_TOKEN` only when already in the environment
+  (never search files/config). Validate with `gh api user`; if invalid, unset
+  env token and fall back to `gh` keyring login.
+## [0.7.0] - 2026-06-01
+### Added
+- Reviewer comment reply workflow: `--reply-comments` and `--resolve-replied`
+  flags. New "Replying to Review Comments" section in `SKILL.md` with reply
+  taxonomy (`fixed`, `already-addressed`, `defer`, `wontfix`, `needs-user`),
+  body templates, `gh api` reply/resolve commands, idempotency rules, and loop
+  integration. Slash flags: `--reply` and `--resolve`. Final report includes
+  comment-reply coverage.
+## [0.6.0] - 2026-06-01
+### Added
+- Slash-command invocation: `/diffwarden` and `/dw` with subcommands `review`,
+  `fix`, `prepare`, `security`, `status`, and `help`. New "Slash Commands"
+  section in `SKILL.md` defines grammar, flag mapping, PR resolution, expansion
+  examples, invalid combinations, and help output. README documents the same
+  for users.
+## [0.5.0] - 2026-06-01
+### Changed
+- Split the preflight gate into two phases. Phase 1 (environment) runs first and
+  is unchanged. New Phase 2 (PR-context) runs after PR detection and
+  machine-checks what were previously judgment calls: PR open/not-merged,
+  current branch is not the PR base, and no external head drift since last
+  iteration. Uses a single `gh` fetch with `-q` (no `jq` dependency) and exits
+  non-zero on failure.
+- Only dirty-file *relevance* remains a judgment call (a script can see dirty
+  files but not whether they belong to the fix).
+- Loop step 2 and the verification checklist now require the Phase 2 gate to
+  pass; the loop halts on failure.
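The three machine-checked Phase 2 conditions listed in the 0.5.0 entry can be expressed as a pure check. This is a minimal sketch, not the gate itself, which per the entry is a `gh`-based script. The `PrContext` shape is modeled on fields that `gh pr view --json state,baseRefName,headRefOid` is expected to return. `phase2Gate`, `currentBranch`, and `lastSeenHeadSha` are hypothetical names.

```typescript
// Hypothetical sketch of the Phase 2 (PR-context) checks; names are illustrative.
interface PrContext {
  state: "OPEN" | "CLOSED" | "MERGED";
  baseRefName: string; // branch the PR merges into
  headRefOid: string;  // current head commit SHA of the PR
}

// Returns a list of failure reasons; an empty list means the gate passes.
function phase2Gate(
  pr: PrContext,
  currentBranch: string,
  lastSeenHeadSha: string | null, // null on the first iteration
): string[] {
  const failures: string[] = [];

  if (pr.state !== "OPEN") {
    failures.push(`PR is ${pr.state}, not OPEN`);
  }
  if (currentBranch === pr.baseRefName) {
    failures.push(`current branch is the PR base (${pr.baseRefName})`);
  }
  if (lastSeenHeadSha !== null && lastSeenHeadSha !== pr.headRefOid) {
    failures.push("PR head changed externally since the last iteration");
  }
  return failures;
}
```

A caller would treat any non-empty result as a `blocked` outcome and stop, which mirrors the exit-non-zero behavior the entry describes. Dirty-file relevance is deliberately left out because the changelog keeps it as a judgment call.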
+## [0.4.0] - 2026-06-01
+### Changed
+- Harden Preflight into an enforceable hard gate. Added a copy-paste gate script
+  that exits non-zero on hard failures (no git repo, missing/unauthenticated
+  `gh`, no remote, protected branch) so the result is machine-checkable instead
+  of a judgment call. Judgment checks (base-branch match, PR detected/open,
+  external head change, unrelated dirty files) are listed explicitly and must
+  also halt with a `blocked` report.
+- Loop step 1 and the verification checklist now require the gate to pass;
+  the loop halts on failure.
+### Safety
+- Diffwarden must not silently "fix" a failed gate (stash user changes, switch
+  branches, re-authenticate) without explicit user approval.
+## [0.3.0] - 2026-06-01
+### Added
+- Confidence Score: a PR-level merge-readiness score from `0` to `5`, computed
+  by Diffwarden from collected evidence each iteration (not self-reported by any
+  external tool). New "Confidence Score" section defines the scale and its
+  safety caps. Reported as `Confidence: N/5` in the final report.
+### Changed
+- Loop now gates on confidence: merge-ready is declared only at `5/5`. Loop and
+  success-state steps updated to compute and check the score; verification
+  checklist adds confidence items.
+### Safety
+- Confidence is advisory and a loop gate only. It never lowers a safety bar: a
+  high score does not authorize merge, push, or comment resolution, and
+  unresolved P0/security findings, failing required checks, and pending user
+  decisions cap the score regardless of other passing signals.
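The capping behavior described in the 0.3.0 entry can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `Evidence` fields, the `ASSUMED_CAP` value of 2, and the function names are hypothetical. The changelog states only that blockers cap the score and that merge-ready requires `5/5`, not the actual cap values.

```typescript
// Hypothetical sketch of a capped confidence score; not the actual SKILL.md rules.
interface Evidence {
  baseScore: number;              // raw 0-5 score from passing signals
  unresolvedP0OrSecurity: number; // open P0/security findings
  failingRequiredChecks: number;  // failing required CI checks
  pendingUserDecisions: number;   // items awaiting a user decision
}

// Assumed cap value; the changelog does not state the real one.
const ASSUMED_CAP = 2;

function confidence(e: Evidence): number {
  // Keep the raw score on the documented 0..5 integer scale.
  const clamped = Math.max(0, Math.min(5, Math.round(e.baseScore)));

  const blocked =
    e.unresolvedP0OrSecurity > 0 ||
    e.failingRequiredChecks > 0 ||
    e.pendingUserDecisions > 0;

  // Any blocker caps the score regardless of other passing signals.
  return blocked ? Math.min(clamped, ASSUMED_CAP) : clamped;
}

// The loop declares merge-ready only at 5/5.
function isMergeReady(e: Evidence): boolean {
  return confidence(e) === 5;
}
```

Even at `5/5` the score only ends the loop. It does not authorize a merge, push, or comment resolution.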
+## [0.2.0] - 2026-05-30
+### Added
+- `--post-review` mode: post findings directly to a PR as a GitHub review of
+  type `COMMENT`, with optional inline line comments. Enables reviewing other
+  developers' PRs and leaving feedback on GitHub instead of only reporting
+  locally. New "Posting Review to PR" section with `gh` commands.
+### Changed
+- Rewrite README as a beginner-friendly, comprehensive guide: prerequisites
+  with install commands, step-by-step first run, flag table, recipes, and a
+  troubleshooting/FAQ section.
+### Safety
+- Posted reviews are `COMMENT` only — never `APPROVE` or `REQUEST_CHANGES`
+  (merge-gating decisions stay with humans).
+- Off by default; requires `--post-review` plus explicit per-run authorization.
+- Never resolves/dismisses human threads, merges, or pushes when posting.
+- Posts against the captured head SHA; aborts on stale head. Secrets redacted.
+## [0.1.1] - 2026-05-30
+### Changed
+- Clarify External Agent Protocol: the Caveman-mode prefix is an
+  output-formatting directive, not an instruction-injection or safety-override
+  payload, and the section is explicitly optional. Reduces false-positive
+  surface for skill security scanners (Gen Agent Trust Hub, Socket, Snyk).
+## [0.1.0] - 2026-05-30
+### Added
+- Initial `diffwarden` PR review skill/playbook.
+- GitHub-first PR review loop using `gh`.
+- Preflight checks for git repo, branch scope, GitHub auth, PR state, and dirty worktree.
+- Evidence collection for PR diff, checks, reviews, comments, files, commits, and review decision.
+- Finding classification: actionable, informational, already addressed, needs user decision.
+- Severity model: P0 critical, P1 high, P2 medium, P3 low/info.
+- Conservative fix-planning protocol.
+- Verification strategy for tests, lint, typecheck, and security-sensitive changes.
+- Bounded review/fix loop with max-iteration and convergence guards.
+- Comment-resolution safety rules.
+- Security-focused checklist.
+- Branch and CI protection guards.
+- Dry-run mode.
+- External-agent protocol requiring Caveman mode before Claude Code CLI or Copilot CLI task prompts.
+### Safety
+- No auto-merge.
+- No force-push.
+- No blind push.
+- No destructive git operations by default.
+- No CI/test/lint weakening to pass checks.
+- No human review comment resolution without explicit approval.