pi-diffwarden 0.26.1

package/CHANGELOG.md ADDED
@@ -0,0 +1,755 @@
1
+ # Changelog
2
+
3
+ All notable changes to Diffwarden are documented here.
4
+
5
+ Format follows Keep a Changelog style. Version tags use SemVer.
6
+
7
+ ## [0.26.1] - 2026-06-24
8
+
9
+ ### Added
10
+
11
+ - Added npm package metadata for publishing the Pi extension package as
12
+ `pi-diffwarden` with the `pi-package` keyword.
13
+ - Added npm install instructions for Pi Agent (`pi install npm:pi-diffwarden@0.26.1`).
14
+ - Added `npm pack --dry-run` and package-publication checks to CI.
15
+
16
+ ## [0.26.0] - 2026-06-24
17
+
18
+ ### Added
19
+
20
+ - Added optional Pi package extension (`extensions/diffwarden/index.ts`) that
21
+ registers native `/dw` and `/diffwarden` commands, forwards to
22
+ `/skill:diffwarden`, provides basic argument completions, and discovers the
23
+ bundled skill.
24
+ - Added `package.json` Pi package manifest so Pi can install Diffwarden from
25
+ git/local paths.
26
+ - Added CI static checks for the Pi package extension manifest and wrapper.
27
+ - Documented Pi extension install, behavior, and security warning in README and
28
+ `SKILL.md`.
29
+
30
+ ### Kept
31
+
32
+ - Installer still writes only Pi skills/prompts; no extension auto-install
33
+ through `install.sh`.
34
+ - Safety stance unchanged: no auto-merge, no force-push, no blind push, no
35
+ CI/test weakening, no resolving human comments without approval.
36
+
37
+ ## [0.25.0] - 2026-06-16
38
+
39
+ ### Added
40
+
41
+ - **Evidence-Based Findings** in `SKILL.md`: actionable findings require an
42
+ anchor (`file:line`, check name, PR field, or comment/thread id) plus a
43
+ verbatim quote or diff hunk; low-confidence guesses cannot be P0/P1 without
44
+ local proof.
45
+ - **Hallucination Guard** expanded beyond `How to test` to cover findings, fix
46
+ plans, PR comments, and thread replies.
47
+ - Grounded verification discovery: use manifest/workflow targets only when they
48
+ exist; never invent runners.
49
+ - Structured `verify: pass|fail|skipped` block in `--verbose` loop output.
50
+ - Fix-plan rules: `Will change` / `Will run` must trace to diff/read files and
51
+ discovered commands only.
52
+ - Delegated-read findings cross-linked to evidence rules; verification checklist
53
+ updated.
54
+
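The evidence rule in 0.25.0 can be pictured as a small validation step. The sketch below is illustrative only: the `Finding` fields, the `has_local_proof` flag, and both helper names are assumptions made for this example, not Diffwarden's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    severity: str                  # "P0" .. "P3"
    anchor: Optional[str]          # file:line, check name, PR field, or comment/thread id
    evidence: Optional[str]        # verbatim quote or diff hunk
    low_confidence: bool = False
    has_local_proof: bool = False  # hypothetical flag: the guess was confirmed locally


def is_actionable(f: Finding) -> bool:
    """A finding is actionable only with an anchor plus quoted evidence."""
    has_anchor = bool(f.anchor and f.anchor.strip())
    has_evidence = bool(f.evidence and f.evidence.strip())
    return has_anchor and has_evidence


def severity_allowed(f: Finding) -> bool:
    """Low-confidence guesses may not be P0/P1 unless backed by local proof."""
    if f.severity in ("P0", "P1") and f.low_confidence:
        return f.has_local_proof
    return True
```

What happens to a finding that fails either check (dropped, downgraded, or flagged) is not stated in the changelog, so the sketch only reports the verdict.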
55
+ ## [0.24.1] - 2026-06-15
56
+
57
+ ### Fixed
58
+
59
+ - Restored the mandatory final status lines for lean, verbose, status, and PR
60
+ comment output. Final reviews now end with `Status:` followed by `Level:`,
61
+ without extra final fields or headings.
62
+
63
+ ## [0.24.0] - 2026-06-13
64
+
65
+ ### Added
66
+
67
+ - Added `loop` as the primary review-fix-verify command.
68
+ - Added `workspace` target for non-git and no-branch workspace review.
69
+ - Added auto-fallback to workspace mode when no git repo, no branch, or no PR exists.
70
+ - Added document review mode for plans, docs, guides, tutorials, specs, and markdown files.
71
+ - Added lean default loop output using `cN/5` progress lines.
72
+ - Added `--mvp` to stop at `c4/5` when only P3/info items remain.
73
+ - Added `--verbose` for the full detailed report.
74
+ - Added `--orchestrate` for optional reviewer/fixer role split.
75
+ - Added orchestration model defaults via global config and project override.
76
+ - Added safe config precedence for optional orchestration.
77
+ - Added short PR comment format: Findings, Status, Level.
78
+ - Added explicit PR comment posting safety: approval, head-SHA recheck, dedupe, and COMMENT-only reviews.
79
+ - Added workspace edit backups before non-git workspace fixes.
80
+ - Added document backups before document fixes.
81
+ - Added Pi Agent README install guide.
82
+ - Added optional Pi Agent installer target (`--pi`, `--pi-root`).
83
+ - Added Pi prompt-template aliases for `/dw` and `/diffwarden`.
84
+ - Added `docs/orchestration.md` and linked it from README.
85
+ - Added agentic implementation safety guidance for multi-agent worktrees, file ownership, merge gates, and conflict policy.
86
+
87
+ ### Changed
88
+
89
+ - Recommended global install by default because Diffwarden is a machine-wide reviewer/fixer.
90
+ - Simplified visible command surface to `review`, `loop`, `status`, `comment`, and `help`.
91
+ - Kept `fix`, `prepare`, and `security` as compatibility aliases.
92
+ - Changed default loop behavior to local edits only unless `--commit` or `--push` is passed.
93
+ - Reduced default review and loop output to minimize tokens.
94
+ - Moved detailed reports behind `--verbose`.
95
+
96
+ ### Fixed
97
+
98
+ - Non-git folders no longer block review.
99
+ - Git workspaces with no branch or detached HEAD no longer block workspace review.
100
+ - Git workspaces with no current PR now fall back to local/workspace review unless PR behavior was explicit.
101
+ - Missing `gh` no longer blocks non-PR review.
102
+ - Document review no longer requires git.
103
+
104
+ ### Kept
105
+
106
+ - No auto-merge.
107
+ - No force-push.
108
+ - No destructive git cleanup.
109
+ - No weakening CI/tests/lint/auth/secrets.
110
+ - No resolving human comments without explicit approval.
111
+ - Normal `/dw loop` remains single-agent unless `--orchestrate` is passed.
112
+
113
+ ## [0.23.2] - 2026-06-10
114
+
115
+ ### Changed
116
+
117
+ - Dropped unshipped `skills/diffwarden/prompts/` templates (never in a tagged
118
+ release). Codex 0.117.0 removed custom prompts upstream (`/prompts:dw`,
119
+ `/prompts:diffwarden`); the installer no longer references prompt files.
120
+
121
+ ## [0.23.1] - 2026-06-10
122
+
123
+ ### Fixed
124
+
125
+ - Codex CLI docs and installer now match current Codex behavior (≥ 0.117.0):
126
+ invoke Diffwarden with `$diffwarden` or `/skills`, not `/dw`, `/diffwarden`, or
127
+ `/prompts:*`. Custom prompts in `~/.codex/prompts/` were removed upstream;
128
+ `install.sh` no longer copies prompt files there.
129
+ - README adds a **Codex CLI** section listing supported vs unsupported invocation
130
+ paths and why (built-in-only `/` menu, deprecated/removed custom prompts,
131
+ `.agents/skills` as the skill path).
132
+ - `SKILL.md` Slash Commands section now documents per-agent invocation and parses
133
+ `$diffwarden` the same as `/diffwarden` / `/dw`.
134
+
135
+ ## [0.23.0] - 2026-06-10
136
+
137
+ ### Fixed
138
+
139
+ - Corrected Codex install support to match current Codex docs: Codex skills now
140
+ install to `.agents/skills/diffwarden/` or `~/.agents/skills/diffwarden/`,
141
+ not `.codex/skills/diffwarden/`.
142
+ - Removed the unsupported `.codex/commands/` path. Codex CLI prompt aliases now
143
+ install to `~/.codex/prompts/` and are invoked as `/prompts:dw` or
144
+ `/prompts:diffwarden`.
145
+ - README and `SKILL.md` now distinguish Claude Code/Cursor slash-command files
146
+ from Codex CLI skills and prompt aliases.
147
+
148
+ ## [0.22.0] - 2026-06-10
149
+
150
+ ### Added
151
+
152
+ - **Codex installer support.** `install.sh` now detects `.codex` / `~/.codex`,
153
+ accepts `--codex`, and copies the skill plus `/dw` / `/diffwarden` command
154
+ files to `.codex/skills/diffwarden/` and `.codex/commands/`.
155
+ - README and `SKILL.md` now document Codex as a first-class install target,
156
+ including manual-copy paths and `/dw` troubleshooting.
157
+
158
+ ### Changed
159
+
160
+ - Installer safety guard now allows writes only under `.claude/`, `.codex/`,
161
+ and `.cursor/`, preserving the no-`sudo`, no-outside-config-dir stance.
162
+ - README local-mode wording now matches the skill: `prepare local` is valid;
163
+ only `status` and posting/push flags are rejected with a local target.
164
+
165
+ ## [0.21.0] - 2026-06-08
166
+
167
+ ### Added
168
+
169
+ - **Web-Augmented Review (`--web`, alias `--research`; slash `--web`) — opt-in,
170
+ human-gated web grounding.** Off by default. When enabled *and* genuinely
171
+ uncertain (a low-confidence finding, a time-sensitive CVE/advisory/best-practice
172
+ question, or a user-requested deep review), Diffwarden may consult the web to
173
+ ground a single finding — but only after a per-finding `[y/N]` consent prompt it
174
+ **waits** on, and only with a redacted minimal finding descriptor. Two gates,
175
+ both required: the `--web` flag, then per-finding human consent. It never
176
+ auto-searches silently and never batch-approves.
177
+ - Findings are marked `web-verified` (a consented search grounded it; URL cited)
178
+ or `local-only` (the default). Web grounding never auto-raises severity and
179
+ never bypasses a safety cap (P0/security still caps at `1/5`, needs-user at
180
+ `3/5`) — severity and confidence stay Diffwarden's own judgment.
181
+ - **Data-egress guard.** A web query carries only the abstract finding descriptor;
182
+ repo source, diff hunks, secrets, tokens, file paths, and internal names are
183
+ never sent. The exact redacted query is shown in the consent prompt, and the
184
+ data-exfiltration/scope risk is noted in the finding's rationale.
185
+ - Valid on `review` / `fix` / `prepare` / `security` (code targets, incl.
186
+ `local` / `staged` / `worktree`) and compatible with `--dry-run` /
187
+ `--security-focus`; **rejected** on `status` (snapshot only) and plan mode
188
+ (`--as-plan` or a `.md` plan target). New "Web-Augmented Review (opt-in)"
189
+ section; wired into Inputs, the slash grammar + flag-mapping table + help
190
+ output, Invalid-combinations rows, the Confidence Score / Classification flow,
191
+ the Loop Algorithm, Dry Run Mode, the Final Report, Common Pitfalls, and the
192
+ Verification Checklist. Synced the `/dw` and `/diffwarden` command files and the
193
+ README (new "Web-augmented review (opt-in)" section, command/flag tables, TOC).
194
+
195
+ ### Kept
196
+
197
+ - Full safety stance unchanged: no auto-merge, no force-push, no blind push, no
198
+ weakening of CI/tests/lint/auth/secrets, and no resolving human comments without
199
+ explicit approval. Web access is off by default and never silent; the help-path
200
+ version check remains the only other network call and is unaffected by `--web`.
201
+
202
+ ## [0.20.0] - 2026-06-07
203
+
204
+ ### Changed
205
+
206
+ - **Collapsed the plan subcommand surface into auto-detected `review` / `fix`.**
207
+ There is now **one** `review` and **one** `fix`; each classifies its *target*
208
+ and selects the matching internal mode — a PR / `#num` / URL / `current` /
209
+ `local` / `staged` / `worktree` → **code** mode; a single prose `.md` plan
210
+ (headings/sections, no diff payload) → **plan** mode. The code-review and
211
+ plan-review rubric logic is unchanged — only the entrypoint collapses.
212
+ - Mixed signals (e.g. a PR ref *and* a `.md` plan path, or a `.md` carrying diff
213
+ hunks) → Diffwarden **asks** which mode and states that the default is **code**;
214
+ it never silently guesses.
215
+
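As a rough sketch of the auto-detection described above: the function names, the bare-number PR form, the `md_has_diff_hunks` input, and the fall-through to code mode for unrecognized targets are all assumptions for illustration. The real detector lives in `SKILL.md` prose, not in code like this.

```python
import re

CODE_KEYWORDS = {"current", "local", "staged", "worktree"}


def looks_like_code_target(target: str) -> bool:
    """PR number, '#num', URL, or one of the keyword targets."""
    return (
        target in CODE_KEYWORDS
        or re.fullmatch(r"#?\d+", target) is not None
        or target.startswith(("http://", "https://"))
    )


def detect_mode(targets: list[str], md_has_diff_hunks: bool = False) -> str:
    """Return 'code', 'plan', or 'ask' (mixed signals; the offered default is code)."""
    has_code = any(looks_like_code_target(t) for t in targets)
    has_plan = any(t.endswith(".md") for t in targets)
    if has_code and has_plan:
        return "ask"  # e.g. a PR ref and a .md plan path together
    if has_plan:
        # A .md file carrying diff hunks is also a mixed signal.
        return "ask" if md_has_diff_hunks else "plan"
    return "code"  # assumption: anything else is treated as a code target
```

The `"ask"` result stands in for the prompt that asks which mode to use; the sketch never guesses between the two.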
216
+ ### Added
217
+
218
+ - **`--as-code` / `--as-plan` override flags** on `review` / `fix` to force the
219
+ mode past the detector. They are mutually exclusive, and `--as-plan` is rejected
220
+ on a PR / `local` / `staged` / `worktree` target (not a plan document).
221
+ - **Mandatory mode banner.** Every `review` / `fix` run prints the auto-selected
222
+ mode before working: `detected: code review | plan review | code fix | plan fix`.
223
+ - Updated the grammar, Target Auto-Detection section, subcommand and flag-mapping
224
+ tables, expansion examples, Invalid-combinations table, help output, Plan
225
+ Review/Fix Mode triggers, How-to-Test scope, and the Verification Checklist.
226
+ Synced the `/dw` and `/diffwarden` command files and the README (new
227
+ "Auto-detected mode (code vs plan)" section, command/flag tables).
228
+
229
+ ### Kept
230
+
231
+ - **Hidden back-compat aliases.** `review-plan <filepath>` ≡ `review <filepath>
232
+ --as-plan` and `fix-plan <filepath>` ≡ `fix <filepath> --as-plan` are still
233
+ accepted — expanded internally, not advertised in `help`.
234
+ - The full safety stance is unchanged: no auto-merge, no force-push, no blind
235
+ push, no weakening of CI/tests/lint/auth/secrets, and no resolving human
236
+ comments without explicit approval. Plan mode still touches no PR, git, or code
237
+ (plan `review` is read-only; plan `fix` edits only the plan file).
238
+
239
+ ## [0.19.0] - 2026-06-06
240
+
241
+ ### Added
242
+
243
+ - **`fix-plan <filepath>` — Plan Fix Mode.** The edit counterpart to
244
+ `review-plan`: it runs the same plan critique, then **revises the plan file in
245
+ place** to address findings, looping review → revise → re-score until
246
+ plan-readiness `5/5` or `--max-iterations` (default `5`, hard max `5`).
247
+ - Before the first edit it backs up the original to `<filepath>.orig` (and never
248
+ overwrites an existing backup — it falls back to `<filepath>.orig.N`). It edits
249
+ **only the plan file**: no code, no git, no commit, no push. Needs-user findings
250
+ are left flagged, never invented, and the plan is never weakened to raise the
251
+ score.
252
+ - Flags: `--security` deepens the security pass; `--delegate` may digest a long
253
+ plan under the grounding contract; `--max N` bounds the loop. `--comment` /
254
+ `--reply` / `--resolve` / `--push` / `--dry-run` and any `<pr>` / `local` target
255
+ are rejected.
256
+ - Reports `Plan-readiness: N/5 (checks: n/a (plan))`, the backup path, and
257
+ `Iterations: N/M`. Updated the grammar, subcommand table, expansion examples,
258
+ Invalid-combinations table, help output, Final Report notes, and Verification
259
+ Checklist. Added `review-plan` and `fix-plan` to the README command reference.
260
+
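The backup naming rule for `fix-plan` (use `<filepath>.orig`, never overwrite an existing backup, fall back to `<filepath>.orig.N`) might look like the following. This is a minimal sketch: the function name, the injectable `exists` predicate, and the choice to start `N` at 1 are assumptions, since the changelog does not say where numbering begins.

```python
import os
from typing import Callable


def pick_backup_path(plan: str, exists: Callable[[str], bool] = os.path.exists) -> str:
    """Return `<plan>.orig`, or the first free `<plan>.orig.N` if that is taken."""
    candidate = f"{plan}.orig"
    n = 1  # assumption: numbering starts at 1
    while exists(candidate):
        candidate = f"{plan}.orig.{n}"
        n += 1
    return candidate
```

Passing `exists` as a parameter keeps the rule testable without touching the filesystem; by default it checks real paths with `os.path.exists`.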
261
+ ## [0.18.0] - 2026-06-06
262
+
263
+ ### Added
264
+
265
+ - **`review-plan <filepath>` — Plan Review Mode.** A new subcommand that
266
+ critiques a plan/design document *before* any code is written: completeness,
267
+ ordering & dependencies, ambiguity, scope, risk (destructive/irreversible
268
+ steps), security, per-step verification, rollback/failure handling, grounding
269
+ (do the files/commands/symbols the plan names actually exist?), and unstated
270
+ assumptions.
271
+ - Plan Review Mode is **read-only**: no PR, no git operations, no code edits, no
272
+ fix loop. It reads the plan and (read-only) the files it references to ground
273
+ the critique, never rewrites the plan, and reports a `0–5` plan-readiness score
274
+ (`ready | needs revision | blocked | user decision needed`).
275
+ - Flags: `--security` deepens the security pass; `--delegate` may digest a long
276
+ plan under the grounding contract. `--comment` / `--reply` / `--resolve` /
277
+ `--push` and any `<pr>` / `local` target are rejected (no PR, no code change).
278
+
279
+ ## [0.17.0] - 2026-06-06
280
+
281
+ ### Added
282
+
283
+ - **`prepare` on a local target.** `prepare local` (also `prepare staged` /
284
+ `prepare worktree`) is now valid: it loops review → fix → verify on the working
285
+ tree, recomputing the local confidence score each pass, until the score reaches
286
+ `5/5` (clean) or `--max-iterations` is hit — stopping as soon as `5/5` is
287
+ reached, then reporting the verdict. Local prep defaults to `--max-iterations 5`
288
+ (hard max `5`).
289
+ - Like every local run, `prepare local` **never commits or pushes** (no PR
290
+ exists); the user commits afterward. It also honors all normal loop stop
291
+ conditions (needs-user decision, oscillation, ambiguous verification failure,
292
+ out-of-scope risk).
293
+
294
+ ### Changed
295
+
296
+ - Local mode now accepts `review` / `fix` / `prepare` / `security` (was
297
+ `review` / `fix` / `security`). `status local` remains rejected (no PR to
298
+ snapshot). Updated the Invalid-combinations table, slash-command grammar,
299
+ subcommand table, expansion examples, help output, and Verification Checklist
300
+ accordingly.
301
+
302
+ ## [0.16.0] - 2026-06-06
303
+
304
+ ### Added
305
+
306
+ - **"How to test" in fix/prepare reports.** When a run changes code (`fix` or
307
+ `prepare`, any target), the report now adds a grounded `How to test` block
308
+ between `Next action` and the final status lines — concrete setup / exercise /
309
+ expect steps a human can run by hand. Included in posted review bodies
310
+ (`--comment`) and in `fixed` thread replies (`--reply`). Omitted on read-only
311
+ runs.
312
+ - **Hallucination guard for test steps.** Every command, path, flag, and
313
+ expected output in a `How to test` block must trace to real evidence (the
314
+ diff, a discovered script, a command actually run, a confirmed binary).
315
+ Ungroundable steps are omitted, never fabricated — online and offline.
316
+
317
+ ## [0.15.0] - 2026-06-06
318
+
319
+ ### Changed
320
+
321
+ - **Final report puts readiness last.** Status, confidence, and target context
322
+ print at the bottom of the report, after `Next action`, instead of at the top.
323
+ Lets the reader scan findings → verification → next action → readiness in
324
+ order.
325
+
326
+ ## [0.14.0] - 2026-06-05
327
+
328
+ ### Added
329
+
330
+ - **Local (Uncommitted) Review Mode.** Diffwarden now reviews uncommitted
331
+ working-tree changes with no PR required. Pass a `local`, `staged`, or
332
+ `worktree` target to `review`, `fix`, or `security` (e.g. `/dw review local`,
333
+ `/dw fix staged --security`). `local`/`worktree` cover all changes vs `HEAD`
334
+ plus untracked files (gitignored excluded); `staged` covers staged changes
335
+ only. The full review pipeline still applies — classification, severity,
336
+ confidence score, fix loop, verification, and the security checklist — while
337
+ the PR-only machinery is skipped: no PR detection, no CI, no review threads,
338
+ no posting, and no commit or push. Preflight runs with `LOCAL_MODE=1` (skips
339
+ the `gh`/remote checks; no Phase 2 PR gate). The confidence score drops its CI
340
+ dimension and reports `checks: n/a (local)`, reflecting readiness-to-commit
341
+ rather than merge-readiness. `prepare`/`status` and any posting/push flag are
342
+ rejected with a local target.
343
+
344
+ ## [0.13.0] - 2026-06-05
345
+
346
+ ### Added
347
+
348
+ - **Best-effort version check on the help path.** Bare `/diffwarden` / `/dw`
349
+ (and the explicit `help` subcommand) now does one notify-only check for a
350
+ newer release and, if the installed skill is behind, appends a single
351
+ `↑ Diffwarden vX.Y.Z available …` line to the help output. Security-first by
352
+ design: it runs *only* on the help path (never during a review loop), is
353
+ best-effort and non-blocking (any failure — offline, no `curl`, rate-limit —
354
+ is silently skipped), uses the unauthenticated public releases API (never
355
+ reads or sends a token), and is **notify-only** — it never downloads,
356
+ overwrites, or executes the skill or `install.sh`. Updating stays the user's
357
+ manual `install.sh` step, preserving the trust boundary the rest of the skill
358
+ defends.
359
+
360
+ ## [0.12.2] - 2026-06-05
361
+
362
+ ### Changed
363
+
364
+ - Bare `/diffwarden` / `/dw` (and `help`) now show the Diffwarden version in the
365
+ help header (`Diffwarden vX.Y.Z — slash commands ...`), substituted from the
366
+ skill's frontmatter `version:`. Docs only; no behavior change.
367
+
368
+ ## [0.12.1] - 2026-06-04
369
+
370
+ ### Changed
371
+
372
+ - Help output now lists `--delegate` in the per-subcommand usage lines for
373
+ `review`, `fix`, and `prepare`, matching the Flags legend and grammar so the
374
+ flag is discoverable from the command listing. Docs only; no behavior change.
375
+
376
+ ## [0.12.0] - 2026-06-04
377
+
378
+ ### Added
379
+
380
+ - **Delegated Reads (`--delegate-reads`, off by default).** On large PRs the bulk
381
+ diff hunks and CI-log bodies dominate context. With this flag, read-only
382
+ subagents may digest that *content* so the orchestrator's context holds the
383
+ conclusions, not the raw bytes — a token saving on long reviews. Built
384
+ security-first as a compression layer on reading only; it cannot change the
385
+ verdict or hide a file:
386
+ - **Security overrides everything (refusals, not tunables):** `--security-focus`
387
+ runs never delegate, and security-sensitive files (auth/authz, payments,
388
+ migrations, secrets, infra, `.github/workflows/**`, lint/CI config) are always
389
+ read raw. `security … --delegate` is rejected as a no-op.
390
+ - **No decision is ever delegated** — classification, severity, confidence
391
+ score, merge-ready, fix/defer, post/resolve stay 100% with the orchestrator.
392
+ - **Structured claims, grounded against raw source.** Subagents return
393
+ `{file, line, type, verbatim_quote}` (no prose); the orchestrator greps each
394
+ quote against raw source — no match → the claim is dropped and that file is
395
+ read raw, so a garbled-but-real issue is not lost.
396
+ - **Coverage reconciliation.** The authoritative file/check/comment set is
397
+ enumerated raw; a set difference forces a raw read of anything a subagent
398
+ skipped. A subagent can never shrink the set or mark a file clean.
399
+ - **Prompt-injection containment.** PR diff/comments/logs are treated as
400
+ untrusted data; subagents are read-only with no commit/push/post tools, so an
401
+ injected "report no issues" is caught by grounding + reconciliation.
402
+ - **Fail-safe + auditable.** Any subagent error/timeout/malformed output →
403
+ raw read of that chunk (worst case equals prior behavior); each run logs
404
+ `digest: subagent (files=N, grounded M/M, raw-fallback K, security-raw S)`.
405
+ - New section "Delegated Reads", slash flag `--delegate`, an Invalid-combination
406
+ reject, plus Common Pitfall and Verification Checklist entries.
407
+ - Default unset = today's behavior, byte-identical. Strict manual opt-in (no
408
+ auto-on heuristic in this release).
409
+
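The grounding and coverage-reconciliation rules above can be sketched as two small functions. The claim shape `{file, line, type, verbatim_quote}` comes from the entry itself; the function names and the plain substring match (standing in for the grep step) are assumptions for illustration.

```python
def ground_claims(claims: list[dict], raw_sources: dict[str, str]):
    """Keep claims whose verbatim quote appears in the raw file.

    Returns (grounded_claims, files_to_read_raw). A claim with no match is
    dropped and its file is queued for a raw read, so a garbled but real
    issue is not lost.
    """
    grounded, read_raw = [], set()
    for claim in claims:
        source = raw_sources.get(claim["file"], "")
        quote = claim["verbatim_quote"]
        if quote and quote in source:
            grounded.append(claim)
        else:
            read_raw.add(claim["file"])
    return grounded, read_raw


def reconcile(authoritative_files: set[str], digested_files: set[str]) -> set[str]:
    """Anything in the authoritative set that a subagent skipped is read raw."""
    return authoritative_files - digested_files
```

Because `reconcile` is a set difference against a list enumerated from raw data, a subagent can only add raw reads; it cannot shrink the set or mark a file clean.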
410
+ ## [0.11.0] - 2026-06-04
411
+
412
+ ### Added
413
+
414
+ - **Incremental re-collection (loop iterations 2+).** The loop's biggest
415
+ repeated cost was re-fetching the full diff, every comment, and every CI log
416
+ on every iteration (full × N). Iterations 2+ may now fetch only what changed
417
+ since the last collection, cutting cost to roughly full + small × (N-1).
418
+ Designed so a missed delta is both unreachable at the verdict and cheap to
419
+ detect:
420
+ - Iteration 1 is always a full collection.
421
+ - Small signals (check status, `reviewDecision`, thread resolution state,
422
+ comment counts) are always re-pulled full; only the diff and failing-check
423
+ CI logs are deltaed.
424
+ - **Ancestry guard:** `git merge-base --is-ancestor LAST_HEAD HEAD` (or the PR
425
+ head SHA in review-only mode) forces a full re-pull on any rebase/force-push.
426
+ - **Count probe:** a comment-count mismatch vs the last collection forces a
427
+ full re-pull, catching added or deleted comments (edits don't change the
428
+ count — see the `updated_at` filter next).
429
+ - Comment deltas filter on `updated_at` (not `created_at`) so edits and
430
+ in-place bot updates are caught, and the diff delta unions in files that
431
+ still carry an open finding.
432
+ - **The merge-ready verdict always rests on a full collection** — `5/5` is
433
+ never declared on delta evidence (Loop Algorithm steps 5 and 14).
434
+ - Each iteration logs its mode (`evidence: full` / `evidence: delta`) so a
435
+ wrong delta is visible, never silent.
436
+ - New Common Pitfall and Verification Checklist item cover the delta path.
437
+
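One way to read the full-versus-delta rules above is as a single decision per iteration. The sketch below assumes hypothetical names (`is_ancestor`, `evidence_mode`, `verdict_pass`); only the `git merge-base --is-ancestor` command and the `full` / `delta` labels come from the entry.

```python
import subprocess


def is_ancestor(last_head: str, head: str = "HEAD") -> bool:
    """Ancestry guard: exit status 0 means `last_head` is an ancestor of `head`."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", last_head, head],
        capture_output=True,
    )
    return result.returncode == 0


def evidence_mode(iteration: int, ancestor_ok: bool, last_count: int,
                  count: int, verdict_pass: bool = False) -> str:
    """Pick 'full' or 'delta' collection for this loop iteration."""
    if iteration == 1 or verdict_pass:
        return "full"   # first pass and any merge-ready verdict need full evidence
    if not ancestor_ok:
        return "full"   # rebase or force-push detected
    if count != last_count:
        return "full"   # comments were added or deleted
    return "delta"
```

A caller would compute `ancestor_ok` with `is_ancestor(last_head)` inside the repository and log the returned mode as `evidence: full` or `evidence: delta`.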
438
+ ## [0.10.2] - 2026-06-04
439
+
440
+ ### Changed
441
+
442
+ - **Preflight: deduplicated the review-only vs local-edit explanation.** The
443
+ mode definition lived in three places verbatim (Preflight intro, the Phase 1
444
+ protected-branch comment, and the Phase 2 prose). It is now stated once in the
445
+ Preflight intro; Phase 1 and Phase 2 reference it and keep only their
446
+ location-specific detail. No behavior change — gate logic, modes, and all
447
+ bash are identical; this only trims repeated prose to cut per-run input
448
+ tokens.
449
+
450
+ ## [0.10.1] - 2026-06-04
451
+
452
+ ### Changed
453
+
454
+ - **Evidence Collection now filters noise out of context** to cut token usage
455
+ with no loss of review coverage. The diff stream is path-filtered to drop
456
+ generated/vendored files (`*.lock`, `dist/`, `*.min.js`, `__snapshots__/`,
457
+ `vendor/`); CI logs are
458
+ pulled only for failing checks; inline/issue comments are reduced to the
459
+ fields the classifier reads (dropping `diff_hunk`, URLs, reactions); and the
460
+ PR snapshot omits the `comments` field that is fetched separately. These
461
+ filters only remove data the review never acts on — same findings, fewer
462
+ tokens. Added a caution to widen/drop a glob when a matched file is actually
463
+ human-reviewed, and a pointer to the GraphQL `reviewThreads` query for
464
+ resolved-thread state.
465
+
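The path filter for generated and vendored files could be approximated as below. The glob list is taken from the entry; the function name and the choice to match the directory names at any depth are assumptions, and the actual skill may apply the filter differently.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

NOISE_FILE_GLOBS = ("*.lock", "*.min.js")
NOISE_DIRS = {"dist", "__snapshots__", "vendor"}


def is_noise(path: str) -> bool:
    """True when a diff path is generated/vendored and can be dropped from context."""
    p = PurePosixPath(path)
    if any(fnmatch(p.name, glob) for glob in NOISE_FILE_GLOBS):
        return True
    # Assumption: a noise directory anywhere above the file excludes it.
    return any(part in NOISE_DIRS for part in p.parts[:-1])
```

Per the caution in the entry, a glob should be widened or dropped when a matched file is actually human-reviewed, which here would mean editing the two constants.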
466
+ ## [0.10.0] - 2026-06-04
467
+
468
+ ### Added
469
+
470
+ - `install.sh` — a self-contained installer. Detects which agents are present
471
+ (Claude Code, Cursor) and at which scope (project / global), then copies the
472
+ skill into `.../skills/diffwarden/SKILL.md` and the optional `/dw` and
473
+ `/diffwarden` command files into `.../commands/`. Idempotent (skips files
474
+ already up to date, diffs and asks before overwriting a changed file). Flags:
475
+ `--claude`, `--cursor`, `--project`, `--global`, `--dry-run`, `--yes`,
476
+ `--force`, `--ref`. Security-hardened: `set -euo pipefail`, HTTPS-only fetch
477
+ pinned to a release tag, no `sudo`, refuses to write outside `.claude/` and
478
+ `.cursor/`. Runs from a clone with no network, or from a downloaded copy.
479
+ - `.github/workflows/ci.yml` — CI that shellchecks `install.sh` (`bash -n` +
480
+ `shellcheck`) and enforces version sync across all files. Required on `main`.
481
+ - README **Contributing** section documenting the fork/PR flow and the `main`
482
+ branch-protection rules (PR required, 1 approval, CI green, squash-only, no
483
+ direct push / force-push — enforced for everyone, including the maintainer).
484
+
485
+ ### Changed
486
+
487
+ - **Removed the `npx`/skills.sh install path** — it proved flaky. Install is now
488
+ the installer (Option A) or a plain manual copy (Option B). README Install
489
+ section rewritten end to end; skills.sh badge and references dropped.
490
+ - README Command reference, Troubleshooting, Files list, and version badge
491
+ updated to describe installer-based install instead of the skill loader.
492
+
493
+ ## [0.9.2] - 2026-06-04
494
+
495
+ ### Changed
496
+
497
+ - README: clarified how the `/dw` and `/diffwarden` slash commands actually
498
+ register. `/diffwarden` works automatically in Claude Code (matches the skill
499
+ name); `/dw` is never auto-installed and needs a one-time command-file copy.
500
+ Install section now documents copying `dw.md` into `.claude/commands/` (Claude
501
+ Code) as well as `.cursor/commands/` (Cursor), with a note that Claude Code
502
+ loads commands at session start. Updated the Command reference intro,
503
+ Troubleshooting entry, and Files list to match.
504
+
505
+ ## [0.9.1] - 2026-06-04
506
+
507
+ ### Added
508
+
509
+ - Final report and `status` snapshot now print the Diffwarden version (from the
510
+ skill frontmatter `version:`) on the first line, so users can see which playbook
511
+ ran.
512
+ - README: Cursor-specific caveman setup. Documents per-agent caveman activation
513
+ (hook-driven for Claude Code/Codex/Gemini vs. static `.cursor/rules/` file for
514
+ Cursor/Windsurf/Cline/Copilot), the `--with-init` symlink caution for this repo
515
+ (`AGENTS.md` → `CLAUDE.md`), the safe manual rule copy, and a Troubleshooting
516
+ entry for caveman not activating in Cursor.
517
+
518
+ ## [0.9.0] - 2026-06-04
519
+
520
+ ### Added
521
+
522
+ - Caveman Mode (token savings): at the start of every invocation Diffwarden now
523
+ checks whether the `caveman` skill is available. If present, it runs in caveman
524
+ mode (compact, high-signal output) while preserving exact paths, commands,
525
+ errors, verification results, risks, and next actions, and keeping caveman's
526
+ safety carve-outs. If caveman is not installed, it emits a one-time suggestion
527
+ to install it for ~75% output-token savings, then continues normally. Output
528
+ style only — never changes classification, fix scope, safety gates, or the loop.
529
+
530
+ ## [0.8.0] - 2026-06-04
531
+
532
+ ### Added
533
+
534
+ - Confidence Score: pending-checks bucket. A required check in a non-terminal
535
+ state (`pending`, `in_progress`, `queued`, `expected`) is now scored as
536
+ unresolved evidence capped at `3/5` with `checks: pending`, not as a failing
537
+ check (`2/5`) or as passing (`5/5`). Never declare `5/5` while a required check
538
+ is pending.
539
+ - Preflight: review-only mode (`REVIEW_ONLY`). `review`, `status`, `security`,
540
+ any `--dry-run` run, and `--post-review` on a PR you do not own no longer
541
+ require the PR branch to be checked out locally — they pin the PR head SHA from
542
+ `gh` and read evidence via the API. Phase 1 skips the protected-branch halt and
543
+ Phase 2 skips the base-branch/head-drift checks in this mode. Fixes spurious
544
+ halts when reviewing another developer's PR from a different machine or clone
545
+ (e.g. a reviewer sitting on `main`). Local-edit mode (`fix`/`prepare`) keeps
546
+ the full protected-branch + base/head-drift gate.
547
+ - Explicit `OWNER/REPO` resolution from the PR reference before any API call,
548
+ with `--repo "$OWNER/$REPO"` on every `gh pr`/`gh api` command. Stops `gh`'s
549
+ implicit current-directory repo resolution from silently targeting the wrong
550
+ repo (fork, renamed remote, different clone) and returning empty comment sets
551
+ that look like an uncommented PR.
552
+
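The pending-checks bucket can be sketched as a cap applied to an already computed score. The non-terminal state names come from the entry; the function name, the `"failure"` state string, and treating `2/5` as a cap rather than a fixed value are assumptions for illustration.

```python
NON_TERMINAL = {"pending", "in_progress", "queued", "expected"}


def apply_check_cap(score: int, required_check_states: list[str]) -> tuple[int, str]:
    """Cap a 0-5 score by required-check state and return (score, checks label)."""
    if any(state == "failure" for state in required_check_states):
        return min(score, 2), "failing"
    if any(state in NON_TERMINAL for state in required_check_states):
        return min(score, 3), "pending"   # never 5/5 while a required check is pending
    return score, "passing"
```

Failing checks are tested first so that a mix of failing and pending checks reports the lower cap.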
553
+ ### Changed
554
+
555
+ - Confidence Score: it is now explicitly relative to the commit it was computed
556
+ against. Two runs at different head SHAs (or check states) can legitimately
557
+ produce different scores for the same PR; scores must not be compared without
558
+ comparing their stamps first.
559
+ - Final Report: `Confidence:` line now stamps the head SHA and check-state —
560
+ `Confidence: N/5 @ <head-sha> (checks: passing | pending | failing) — reason`.
561
+ Makes cross-device/cross-run score differences self-explaining instead of
562
+ looking like a contradiction.
563
+
564
+ ## [0.7.7] - 2026-06-02
565
+
566
+ ### Changed
567
+
568
+ - GitHub auth: prefer `gh auth status` (user/keyring login) over
569
+ `GH_TOKEN`/`GITHUB_TOKEN`. When a user is active, unset env tokens for the
570
+ session so `gh` does not override keyring. Env tokens are validated only when
571
+ there is no active `gh` user (CI/automation fallback). No filesystem token search.
572
+
573
+ ## [0.7.6] - 2026-06-01
574
+
575
+ ### Changed
576
+
577
+ - Remove tracked `.cursor/commands/` from repo; add `.cursor/` and `.claude/` to
578
+ `.gitignore`. Skill stays agent-agnostic; `skills/diffwarden/commands/` remains
579
+ an optional Cursor-only install. README and `SKILL.md` clarify that the Cursor
579
+ slash menu is optional.
581
+
582
+ ## [0.7.5] - 2026-06-01
583
+
584
+ ### Changed
585
+
586
+ - README: move Contents before Command reference; reorder TOC to match section
587
+ order (command reference and loop guide first).
588
+
589
+ ## [0.7.4] - 2026-06-01
590
+
591
+ ### Changed
592
+
593
+ - README: add "Loop until merge-ready (5/5)" section after Command reference
594
+ (loop commands, confidence scale, stop conditions, example workflow).
595
+
596
+ ## [0.7.3] - 2026-06-01
597
+
598
+ ### Added
599
+
600
+ - Cursor slash command files: `skills/diffwarden/commands/dw.md` and
601
+ `diffwarden.md` (plus `.cursor/commands/` in this repo). `/dw` and
602
+ `/diffwarden` now work in Cursor's `/` menu after copying to
603
+ `.cursor/commands/` or `~/.cursor/commands/`. README install + FAQ updated.
604
+
605
+ ## [0.7.2] - 2026-06-01
606
+
607
+ ### Changed
608
+
609
+ - README: add "Command reference" section with command and flag tables after the
610
+ intro overview; dedupe slash-command section to examples only.
611
+
612
+ ## [0.7.1] - 2026-06-01
613
+
614
+ ### Added
615
+
616
+ - Safe GitHub token handling in preflight and new "GitHub Authentication"
617
+ section. Use `GH_TOKEN` / `GITHUB_TOKEN` only when already in the environment
618
+ (never search files/config). Validate with `gh api user`; if invalid, unset
619
+ env token and fall back to `gh` keyring login.
620
+
621
+ ## [0.7.0] - 2026-06-01
622
+
623
+ ### Added
624
+
625
+ - Reviewer comment reply workflow: `--reply-comments` and `--resolve-replied`
626
+ flags. New "Replying to Review Comments" section in `SKILL.md` with reply
627
+ taxonomy (`fixed`, `already-addressed`, `defer`, `wontfix`, `needs-user`),
628
+ body templates, `gh api` reply/resolve commands, idempotency rules, and loop
629
+ integration. Slash flags: `--reply` and `--resolve`. Final report includes
630
+ comment-reply coverage.
631
+
632
+ ## [0.6.0] - 2026-06-01
633
+
634
+ ### Added
635
+
636
+ - Slash-command invocation: `/diffwarden` and `/dw` with subcommands `review`,
637
+ `fix`, `prepare`, `security`, `status`, and `help`. New "Slash Commands"
638
+ section in `SKILL.md` defines grammar, flag mapping, PR resolution, expansion
639
+ examples, invalid combinations, and help output. README documents the same
640
+ for users.
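
To illustrate the command shape, here is a minimal Python sketch of splitting an invocation into a subcommand and its remaining arguments. The full grammar lives in `SKILL.md` and is not reproduced here; `parse_slash` is a hypothetical helper, and rejecting a missing subcommand is this sketch's choice rather than documented behavior.

```python
import shlex
from typing import List, Tuple

COMMANDS = {"/diffwarden", "/dw"}
SUBCOMMANDS = {"review", "fix", "prepare", "security", "status", "help"}


def parse_slash(text: str) -> Tuple[str, List[str]]:
    """Split a slash invocation into (subcommand, remaining tokens)."""
    tokens = shlex.split(text)
    if not tokens or tokens[0] not in COMMANDS:
        raise ValueError("not a Diffwarden slash command")
    if len(tokens) < 2 or tokens[1] not in SUBCOMMANDS:
        raise ValueError("missing or unknown subcommand")
    # Flags and PR references are passed through untouched for later mapping.
    return tokens[1], tokens[2:]
```

For example, `parse_slash("/dw review 123 --reply")` yields the subcommand `review` with the tokens `123` and `--reply` left for flag mapping and PR resolution (the PR number is a placeholder).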
641
+
642
+ ## [0.5.0] - 2026-06-01
643
+
644
+ ### Changed
645
+
646
+ - Split the preflight gate into two phases. Phase 1 (environment) runs first and
647
+ is unchanged. New Phase 2 (PR-context) runs after PR detection and
648
+ machine-checks what were previously judgment calls: PR open/not-merged,
649
+ current branch is not the PR base, and no external head drift since last
650
+ iteration. Uses a single `gh` fetch with `-q` (no `jq` dependency) and exits
651
+ non-zero on failure.
652
+ - Only dirty-file *relevance* remains a judgment call (a script can see dirty
653
+ files but not whether they belong to the fix).
654
+ - Loop step 2 and the verification checklist now require the Phase 2 gate to
655
+ pass and require the loop to halt if it fails.
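
The three Phase 2 checks listed above can be expressed as a pure function over PR fields. The Python sketch below assumes the PR data was fetched as JSON with the `state`, `baseRefName`, and `headRefOid` fields that `gh pr view --json` exposes; the function name and the failure messages are illustrative, and the real gate is a shell script that exits non-zero.

```python
from typing import List, Mapping, Optional


def phase2_failures(pr: Mapping[str, str], current_branch: str,
                    last_seen_head: Optional[str]) -> List[str]:
    """Return the list of failed PR-context checks (empty means the gate passes).

    last_seen_head is the head SHA recorded at the end of the previous
    iteration, or None on the first iteration. The caller is assumed to
    update it after its own pushes so only external changes count as drift.
    """
    failures = []
    if pr["state"] != "OPEN":
        failures.append(f"PR is {pr['state']}, not OPEN")
    if current_branch == pr["baseRefName"]:
        failures.append("current branch is the PR base")
    if last_seen_head is not None and pr["headRefOid"] != last_seen_head:
        failures.append("head changed externally since last iteration")
    return failures
```

A wrapper would halt when the returned list is non-empty. Dirty-file relevance is deliberately absent because the changelog keeps it as a judgment call.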
656
+
657
+ ## [0.4.0] - 2026-06-01
658
+
659
+ ### Changed
660
+
661
+ - Harden Preflight into an enforceable hard gate. Added a copy-paste gate script
662
+ that exits non-zero on hard failures (no git repo, missing/unauthenticated
663
+ `gh`, no remote, protected branch) so the result is machine-checkable instead
664
+ of a judgment call. Judgment checks (base-branch match, PR detected/open,
665
+ external head change, unrelated dirty files) are listed explicitly; failing any
666
+ of them must also halt the run with a `blocked` report.
667
+ - Loop step 1 and the verification checklist now require the gate to pass and
668
+ require the loop to halt if it fails.
669
+
670
+ ### Safety
671
+
672
+ - Diffwarden must not silently "fix" a failed gate (stash user changes, switch
673
+ branches, re-authenticate) without explicit user approval.
674
+
675
+ ## [0.3.0] - 2026-06-01
676
+
677
+ ### Added
678
+
679
+ - Confidence Score: a PR-level merge-readiness score from `0` to `5`, computed
680
+ by Diffwarden from collected evidence each iteration (not self-reported by any
681
+ external tool). New "Confidence Score" section defines the scale and its
682
+ safety caps. Reported as `Confidence: N/5` in the final report.
683
+
684
+ ### Changed
685
+
686
+ - Loop now gates on confidence: merge-ready is declared only at `5/5`. Loop and
687
+ success-state steps updated to compute and check the score; verification
688
+ checklist adds confidence items.
689
+
690
+ ### Safety
691
+
692
+ - Confidence is advisory and a loop gate only. It never lowers a safety bar: a
693
+ high score does not authorize merge, push, or comment resolution, and
694
+ unresolved P0/security findings, failing required checks, and pending user
695
+ decisions cap the score regardless of other passing signals.
696
+
697
+ ## [0.2.0] - 2026-05-30
698
+
699
+ ### Added
700
+
701
+ - `--post-review` mode: post findings directly to a PR as a GitHub review of
702
+ type `COMMENT`, with optional inline line comments. Enables reviewing other
703
+ developers' PRs and leaving feedback on GitHub instead of only reporting
704
+ locally. New "Posting Review to PR" section with `gh` commands.
705
+
706
+ ### Changed
707
+
708
+ - Rewrite README as a beginner-friendly, comprehensive guide: prerequisites
709
+ with install commands, step-by-step first run, flag table, recipes, and a
710
+ troubleshooting/FAQ section.
711
+
712
+ ### Safety
713
+
714
+ - Posted reviews are `COMMENT` only — never `APPROVE` or `REQUEST_CHANGES`
715
+ (merge-gating decisions stay with humans).
716
+ - Off by default; requires `--post-review` plus explicit per-run authorization.
717
+ - Never resolves/dismisses human threads, merges, or pushes when posting.
718
+ - Posts against the captured head SHA; aborts on stale head. Secrets redacted.
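
The safety rules above constrain what a posted review may contain. As a rough Python sketch, the payload for GitHub's "create a review for a pull request" REST endpoint could be assembled like this. The field names (`commit_id`, `event`, `body`, `comments` with `path`/`line`/`body`) follow that endpoint; `build_review_payload` and its arguments are hypothetical, and secret redaction and per-run authorization are assumed to happen elsewhere.

```python
from typing import Any, Dict, List, Optional


def build_review_payload(captured_head: str, current_head: str, body: str,
                         inline: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Build a COMMENT-only review payload pinned to the captured head SHA."""
    if captured_head != current_head:
        # Abort instead of posting findings against code that has changed.
        raise RuntimeError("stale head: PR changed since evidence was collected")
    # The event is hard-coded: APPROVE and REQUEST_CHANGES are never produced.
    payload: Dict[str, Any] = {"commit_id": captured_head, "event": "COMMENT", "body": body}
    if inline:
        # Copy only the fields needed for an inline line comment.
        payload["comments"] = [
            {"path": c["path"], "line": c["line"], "body": c["body"]} for c in inline
        ]
    return payload
```

Hard-coding the event rather than accepting it as a parameter keeps the merge-gating decision out of reach of any caller.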
719
+
720
+ ## [0.1.1] - 2026-05-30
721
+
722
+ ### Changed
723
+
724
+ - Clarify External Agent Protocol: the Caveman-mode prefix is an
725
+ output-formatting directive, not an instruction-injection or safety-override
726
+ payload, and the section is explicitly optional. Reduces false-positive
727
+ surface for skill security scanners (Gen Agent Trust Hub, Socket, Snyk).
728
+
729
+ ## [0.1.0] - 2026-05-30
730
+
731
+ ### Added
732
+
733
+ - Initial `diffwarden` PR review skill/playbook.
734
+ - GitHub-first PR review loop using `gh`.
735
+ - Preflight checks for git repo, branch scope, GitHub auth, PR state, and dirty worktree.
736
+ - Evidence collection for PR diff, checks, reviews, comments, files, commits, and review decision.
737
+ - Finding classification: actionable, informational, already addressed, needs user decision.
738
+ - Severity model: P0 critical, P1 high, P2 medium, P3 low/info.
739
+ - Conservative fix-planning protocol.
740
+ - Verification strategy for tests, lint, typecheck, and security-sensitive changes.
741
+ - Bounded review/fix loop with max-iteration and convergence guards.
742
+ - Comment-resolution safety rules.
743
+ - Security-focused checklist.
744
+ - Branch and CI protection guards.
745
+ - Dry-run mode.
746
+ - External-agent protocol requiring Caveman mode before Claude Code CLI or Copilot CLI task prompts.
747
+
748
+ ### Safety
749
+
750
+ - No auto-merge.
751
+ - No force-push.
752
+ - No blind push.
753
+ - No destructive git operations by default.
754
+ - No CI/test/lint weakening to pass checks.
755
+ - No human review comment resolution without explicit approval.