@tw93/waza 3.25.0 → 3.28.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/README.md +49 -25
  2. package/package.json +5 -3
  3. package/rules/anti-patterns.md +24 -20
  4. package/rules/durable-context.md +6 -0
  5. package/rules/waza-routing.md +18 -0
  6. package/scripts/build_metadata.py +28 -16
  7. package/scripts/check_routing_drift.py +8 -0
  8. package/scripts/package-skill.sh +2 -3
  9. package/scripts/setup-rule.sh +4 -2
  10. package/scripts/setup-statusline.sh +1 -1
  11. package/scripts/skill_checks.py +290 -2
  12. package/scripts/statusline.sh +6 -14
  13. package/scripts/validate_package.py +1 -1
  14. package/scripts/verify_skills.py +12 -0
  15. package/skills/RESOLVER.md +8 -8
  16. package/skills/check/SKILL.md +78 -28
  17. package/skills/check/references/project-context.md +14 -6
  18. package/skills/check/scripts/audit_signals.py +192 -11
  19. package/skills/design/SKILL.md +39 -2
  20. package/skills/design/references/design-reference.md +17 -0
  21. package/skills/design/references/design-tokens.md +3 -11
  22. package/skills/health/SKILL.md +53 -26
  23. package/skills/health/agents/inspector-context.md +1 -1
  24. package/skills/health/scripts/check_agent_context.py +38 -1
  25. package/skills/health/scripts/check_maintainability.py +6 -0
  26. package/skills/health/scripts/collect-data.sh +11 -20
  27. package/skills/hunt/SKILL.md +33 -1
  28. package/skills/hunt/references/failure-patterns.md +54 -0
  29. package/skills/learn/SKILL.md +13 -3
  30. package/skills/read/SKILL.md +40 -9
  31. package/skills/read/references/read-methods.md +23 -4
  32. package/skills/read/scripts/fetch.sh +8 -7
  33. package/skills/read/scripts/fetch_feishu.py +11 -6
  34. package/skills/think/SKILL.md +33 -8
  35. package/skills/write/SKILL.md +88 -10
  36. package/skills/write/references/write-en.md +19 -17
  37. package/skills/write/references/write-product-localization.md +43 -0
  38. package/skills/write/references/write-zh-bilingual.md +2 -3
  39. package/skills/write/references/write-zh-prose.md +2 -0
  40. package/skills/write/references/write-zh.md +144 -68
  41. package/skills/read/references/save-paths.md +0 -33
@@ -13,6 +13,13 @@ Prefix your first line with 🥷 inline, not as its own paragraph.
13
13
 
14
14
  Read the diff, find the problems, fix what can be fixed safely, ask about the rest. Done means verification ran in this session and passed.
15
15
 
16
+ ## Outcome Contract
17
+
18
+ - Outcome: a review, release decision, or maintainer action grounded in the current diff, project context, and live evidence.
19
+ - Done when: findings, fixes, shipped state, or blockers are stated with the commands, artifacts, or remote state that prove them.
20
+ - Evidence: worktree status, diff, public project docs, manifests, CI, package contents, release or registry state, and current command output.
21
+ - Output: concise findings first, then verification and shipped-state summary when applicable.
22
+
16
23
  ## Worktree Safety Preflight
17
24
 
18
25
  Before any review, triage, ship, release, or PR operation, read the current worktree with:
@@ -25,6 +32,10 @@ Treat modified, staged, and untracked files as user work. You may read them and
25
32
 
26
33
  Do not run these commands as default review or PR setup: `git switch`, `git checkout`, `git reset --hard`, `git clean`, `git stash -u`, `git stash --include-untracked`, `git stash -a`, `git stash --all`, or `gh pr checkout`. If a branch change or cleanup is genuinely required, stop and ask for that exact operation.
27
34
 
35
+ Do not "protect" user work by moving untracked files, generated files, screenshots, or local scratch files into `/tmp` or another holding directory. Moving someone else's WIP out of the checkout is the same class of interference as stashing it. If a clean tree is required for generation, packaging, or verification, use a separate worktree from a known commit and copy only the artifact or patch you own back into the current checkout.
36
+
37
+ For commit or push follow-through in a dirty or multi-agent checkout, record `git rev-parse HEAD` before staging. Re-read `git status --short --branch -uall` and `git rev-parse HEAD` immediately before commit and again before push. If HEAD moved, unknown commits appeared, or the worktree changed outside your intended files, stop and report the mismatch instead of rebasing, recommitting, or pushing.
38
+
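The HEAD and status re-check in the added paragraph above can be pictured as a comparison between two snapshots. What follows is a minimal sketch and not part of the package: `Snapshot`, `parse_status`, and `drift_report` are hypothetical names, and the parser assumes plain `git status --short --branch -uall` output without renames or quoted paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """HEAD commit plus the set of paths git reports as changed."""
    head: str
    changed: frozenset


def parse_status(head: str, status_text: str) -> Snapshot:
    """Build a Snapshot from `git rev-parse HEAD` and short-format status text."""
    paths = set()
    for line in status_text.splitlines():
        if not line or line.startswith("##"):  # skip the --branch header line
            continue
        # Short format: two status characters, one space, then the path.
        # Simplification: rename entries ("old -> new") are not split here.
        paths.add(line[3:])
    return Snapshot(head=head.strip(), changed=frozenset(paths))


def drift_report(before: Snapshot, now: Snapshot, owned: set) -> list:
    """Return reasons to stop; an empty list means nothing unexpected moved."""
    reasons = []
    if now.head != before.head:
        reasons.append(f"HEAD moved: {before.head} -> {now.head}")
    # Paths that became dirty since the first snapshot and are not ours.
    foreign = sorted((now.changed - before.changed) - owned)
    if foreign:
        reasons.append("changed outside intended files: " + ", ".join(foreign))
    return reasons
```

Taking one snapshot before staging and another immediately before commit and before push, then stopping on any non-empty report, mirrors the rule. The sketch only notices newly dirty paths; a file that was already dirty and changed again would need a content hash to detect.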
28
39
  For PR inspection, prefer commands that do not switch the current working tree: `gh pr view`, `gh pr diff`, `git fetch origin pull/<n>/head:refs/tmp/pr-<n>`, and `git merge-tree`.
29
40
 
30
41
  ## Mode Picker
@@ -43,22 +54,6 @@ Pick the mode that matches the user's intent, then read that section in full. Mo
43
54
 
44
55
  Before any mode, run [Project Context Extraction](#project-context-extraction) and (if memory is in scope) [Durable Context Preflight](#durable-context-preflight).
45
56
 
46
- ## Plan Execution Mode
47
-
48
- Activate when the user's message starts with "Implement the following plan", "按计划实施", "按照计划", "整", "可以干", "直接改" followed by a plan body, or links to a `/think` output.
49
-
50
- In this mode, do not run a code review. Instead:
51
-
52
- 1. State which plan is being executed (first heading or summary line).
53
- 2. Check for obvious repo drift: run `git status --short --branch -uall` and skim any changed files that contradict the plan. If drift makes the plan unsafe, name the specific conflict and stop.
54
- 3. Work through each plan item as a to-do. Mark each complete as you go.
55
- 4. After all items are done, run the project's verification command.
56
- 5. Transition automatically into Ship mode if the project context or current thread indicates review-then-ship.
57
-
58
- ## Default Continuation (review-then-ship)
59
-
60
- When the project's `AGENTS.md` or the current thread explicitly asks to "commit after review", "ship if green", or equivalent, transition directly from review to the Ship flow after a clean review. Do not ask again. State "proceeding to ship" before acting.
61
-
62
57
  ## Project Context Extraction
63
58
 
64
59
  This is Waza's public, standalone code-review capability. It should not depend on private machine paths or unpublished project instructions.
@@ -81,6 +76,22 @@ See [rules/durable-context.md](../../rules/durable-context.md) for when to read
81
76
 
82
77
  For `/check`, private task constraints are `decision`, `preference`, and `principle` entries; review checklists are `pattern` and `learning`. Current code, diff, public docs, CI, tests, and remote state override memory. Durable memory can explain user intent and preferred follow-through, but public project rules still come from README files, manifests, CI workflows, release docs, the diff, and explicit instructions in the current thread. Never cite private memory as a public project requirement.
83
78
 
79
+ ## Plan Execution Mode
80
+
81
+ Activate when the user's message starts with "Implement the following plan", "按计划实施", "按照计划", "整", "可以干", "直接改" followed by a plan body, or links to a `/think` output.
82
+
83
+ In this mode, do not run a code review. Instead:
84
+
85
+ 1. State which plan is being executed (first heading or summary line).
86
+ 2. Check for obvious repo drift: run `git status --short --branch -uall` and skim any changed files that contradict the plan. If drift makes the plan unsafe, name the specific conflict and stop.
87
+ 3. Work through each plan item as a to-do. Mark each complete as you go.
88
+ 4. After all items are done, run the project's verification command.
89
+ 5. Transition automatically into Ship mode if the project context or current thread indicates review-then-ship.
90
+
91
+ ## Default Continuation (review-then-ship)
92
+
93
+ When the project's `AGENTS.md` or the current thread explicitly asks to "commit after review", "ship if green", or equivalent, transition directly from review to the Ship flow after a clean review. Do not ask again. State "proceeding to ship" before acting.
94
+
84
95
  ## Get the Diff
85
96
 
86
97
  Get the full diff between the current branch and the base branch. If unclear, ask. If already on the base branch, ask which commits to review.
@@ -91,11 +102,15 @@ Activate when the user mentions: issue, PR, "review all", triage, "batch", or "
91
102
 
92
103
  **Action-first rule:** Items with a clear disposition (already fixed, duplicate, already released) get acted on immediately without analysis paragraphs. When analyzing screenshots or images, state what you see and the suggested action in one message. Only ask the user when the disposition is genuinely ambiguous.
93
104
 
94
- **Flow:** Pull open items with `gh issue list -R <repo> --state open --limit 20` and `gh pr list -R <repo> --state open`. For each item, check if a fix already shipped: `git log --oneline <latest-tag>..HEAD | grep -i "<keyword>"`. If shipped: close with note. If merged but unreleased: reply "已修复,等下一个版本 release" and close. If no fix: analyze and act. Fix now if possible (`fix: closes #N` commit); when the target project documents a nightly, beta, or pre-release channel that already contains the fix, reply with that exact upgrade path and close; for valid-but-unreleased items acknowledge and leave open; for invalid items give one-two sentence reason and close.
105
+ **Bundled request classification:** When one issue, PR, or support thread contains several asks, split them before acting: core bug, existing affordance, cosmetic preference, and out-of-scope request. Fix or close only the validated core bug; answer existing affordances with the current path; defer or decline cosmetic and out-of-scope asks instead of treating the whole report as a to-do list.
106
+
107
+ **Status answer order:** For "都解决了吗", "is this fixed", "is this ready", or similar status checks, answer in this order: code or commit state, branch or CI state, release artifact or registry state, then public issue or PR state. Do not collapse fixed-on-main, available in pre-release, next stable release, and already shipped.
108
+
109
+ **Flow:** First identify the project's issue/PR host from public context. For GitHub projects, pull open items with `gh issue list -R <repo> --state open --limit 20` and `gh pr list -R <repo> --state open`. For non-GitHub projects, use the platform CLI/API named by the project docs or user request; if none exists, stop and report the missing integration instead of pretending GitHub commands apply. For each item, check if a fix already shipped: `git log --oneline <latest-tag>..HEAD | grep -i "<keyword>"`. If shipped: close with note. If merged but unreleased: reply "已修复,等下一个版本 release" and close. If no fix: analyze and act. Fix now if possible (`fix: closes #N` commit); when the target project documents a nightly, beta, or pre-release channel that already contains the fix, reply with that exact upgrade path and close; for valid-but-unreleased items acknowledge and leave open; for invalid items give one-two sentence reason and close.
95
110
 
96
111
  Before final conclusions in a live queue, refresh the issue/PR list once more and re-read any item that changed during the run. If evidence is incomplete, hold the item instead of closing it on a guess.
97
112
 
98
- **PR handling:** If the PR direction is accepted but the patch needs changes, prefer pushing the maintainer's fixes to the contributor's PR branch and then merging the PR. Check `maintainerCanModify` first. If branch edits are not allowed, ask the contributor to enable maintainer edits or push the needed revision; only fall back to a separate maintainer commit when timing or release safety requires it, and say so in the PR. Close without merging only when the direction is rejected, unsafe, no longer needed, or explicitly not part of the project's scope. Do not silently absorb an accepted PR into `main` and close it.
113
+ **PR handling:** If the PR direction is accepted but the patch needs changes, prefer pushing the maintainer's fixes to the contributor's PR branch and then merging the PR. Check `maintainerCanModify` first, then confirm the push remote, target branch, and current HEAD immediately before pushing so you do not overwrite contributor work or push maintainer fixes to the wrong repository. If branch edits are not allowed, ask the contributor to enable maintainer edits or push the needed revision; only fall back to a separate maintainer commit when timing or release safety requires it, and say so in the PR. Close without merging only when the direction is rejected, unsafe, no longer needed, or explicitly not part of the project's scope. Do not silently absorb an accepted PR into `main` and close it.
99
114
 
100
115
  **Public reply shape:** load `references/public-reply.md` for the full template (mention, single thanks, factual paragraphs, next-release step, editing rules, closure criteria). Ship Mode uses the same template; the file is the single source.
101
116
 
@@ -126,12 +141,23 @@ This mode extends review; it does not skip review. Before any public or irrevers
126
141
  1. Extract release rules from public project context: README, manifests, CI workflows, release notes, package scripts, changelogs, and explicit user instructions in the current thread.
127
142
  2. Fill the Release Gate 2.0 matrix from `references/project-context.md`: review base, dirty/staged/untracked state, latest tag, origin sync, version fields, generated artifacts, package/archive contents, release assets, registry/appcast/CI, and public issue/PR state.
128
143
  3. Verify generated or bundled outputs, version fields, release notes, package contents, and required artifacts are in sync. Prefer dry-run commands when the ecosystem provides them.
129
- 4. Commit only intended files. Preserve unrelated dirty work, and serialize git operations so index locks or overlapping adds do not corrupt the workflow.
144
+ Generated deliverables include tracked archives, ignored dist files, appcasts, site/download copy, registry packages, checksums, and release assets. If project docs require them, regenerate, inspect, and stage or upload them explicitly even when they are ignored by git; do not infer readiness from source-only tests.
145
+ 4. Commit only intended files. Preserve unrelated dirty work, serialize git operations so index locks or overlapping adds do not corrupt the workflow, and re-check HEAD/status before pushing so concurrent agent or maintainer commits are not swept into your ship action.
130
146
  5. Push, publish, tag, or create a release only when the user has explicitly approved that action. If auth, OTP, CI, registry, or network state blocks the operation, pause and report the exact blocker.
131
- 6. For issue/PR follow-through, confirm the item identity with `gh issue view` or `gh pr view` before posting. Use `references/public-reply.md` for the maintainer reply template (mention, single thanks, facts, explicit next release or verification step) and its closure criteria.
132
- 7. For GitHub release reaction follow-through, only do it when project context or the current thread asks for it. After the release exists and required assets are verified, resolve the release id from the tag, POST every positive release reaction to `repos/<owner>/<repo>/releases/<id>/reactions` with `gh api`, and re-read reactions to confirm. Positive release reactions are `+1`, `laugh`, `heart`, `hooray`, `rocket`, and `eyes`.
147
+ 6. For issue/PR follow-through, confirm the item identity with the host's read command before posting. On GitHub, use `gh issue view` or `gh pr view`; on other hosts, use the CLI/API named by project docs or the current request. Use `references/public-reply.md` for the maintainer reply template (mention, single thanks, facts, explicit next release or verification step) and its closure criteria.
148
+ 7. For GitHub release reaction follow-through, only do it when project context or the current thread asks for it. After the release exists and required assets are verified, resolve the release id from the tag, POST every positive release reaction to `repos/<owner>/<repo>/releases/<id>/reactions` with `gh api` or the available GitHub tool, and re-read reactions to confirm. Positive release reactions are `+1`, `laugh`, `heart`, `hooray`, `rocket`, and `eyes`.
133
149
  8. After network or API failures, re-read the end state instead of assuming success or failure.
134
150
 
151
+ ### Reworked Or Cancelled Release Gate
152
+
153
+ Activate this gate when a release candidate was cancelled, a preview or beta had repeated bug-fix churn, or the user asks whether a delayed release is finally safe.
154
+
155
+ 1. Lock the review base to the last public stable tag or release artifact, then review through current `HEAD`. Do not limit the review to recent commits or the latest local diff.
156
+ 2. Record the exact base, `HEAD`, dirty state, origin sync, version fields, generated artifacts, release notes, package contents, CI, and remote distribution state. If any state changes mid-review, refresh the range and rerun the fast gates.
157
+ 3. Review by shipped risk surface: user-reported regressions, crash or hang paths, destructive operations, privilege or permission boundaries, background workers, startup or first-frame work, update feeds, package contents, and public support claims.
158
+ 4. Output two release decisions, not one: whether the preview or beta can keep taking user testing, and whether stable release prep can start.
159
+ 5. Every conclusion must name blockers, deferrable maintenance, commands that ran, and runtime or user-smoke coverage. Source tests alone cannot prove a reworked UI/native release ready.
160
+
135
161
  End with the concrete shipped state: commit hash, tag, release URL, registry/version result, pushed branch, release asset state, release reaction state, issue/PR state, and any remaining blockers. Omit fields that do not apply.
136
162
 
137
163
  ## Project Audit Mode
@@ -140,7 +166,7 @@ Activate when the user asks for a project-wide code-quality scorecard: "audit",
140
166
 
141
167
  **Flow**
142
168
 
143
- 1. Run `python3 <waza>/skills/check/scripts/audit_signals.py --root <project>` from the target repo. The script emits ten labelled blocks (`=== FILE SIZE HOTSPOTS ===` ... `=== DENYLIST IN BUILD ===`) each ending with `status: PASS|WARN|FAIL`.
169
+ 1. Run `python3 <waza>/skills/check/scripts/audit_signals.py --root <project>` from the target repo. The script emits labelled blocks (`=== FILE SIZE HOTSPOTS ===` ... `=== DENYLIST IN BUILD ===`) each ending with `status: PASS|WARN|FAIL|N/A`.
144
170
  2. Skim the largest source files surfaced by `FILE SIZE HOTSPOTS` (typically 3-5; stop sooner if the architecture is already clear).
145
171
  3. Read `CLAUDE.md` / `AGENTS.md` / `README.md` to learn the project's own stated conventions before judging it against generic ones.
146
172
  4. Apply the four-axis rubric below. Each axis is independently scored 0-10. Overall = arithmetic mean.
@@ -207,6 +233,8 @@ State the depth before proceeding.
207
233
 
208
234
  Before reading code, check scope drift: do the diff and the stated goal match? Label: **on target** / **drift** / **incomplete**.
209
235
 
236
+ Also check surgical traceability: every changed file and every new public surface must trace back to the user's stated goal. If a file, dependency, config knob, abstraction, generated artifact, workflow permission, or release behavior cannot be explained in one sentence from the request, label it drift until proven necessary.
237
+
210
238
  Drift signals (examples, not exhaustive -- any one is enough to label drift):
211
239
  - A changed file has no connection to the stated goal
212
240
  - The diff includes pure refactoring (renames, formatting, restructuring) when the goal was a bug fix or feature
@@ -219,21 +247,39 @@ Drift signals (examples, not exhaustive -- any one is enough to label drift):
219
247
 
220
248
  When the diff fixes one instance of a class-of-bug (a missing validation, a wrong selector, an off-by-one, a missing lock), the same shape often lives elsewhere. Extract the pattern signature, `grep -rn` it across the repo (exclude generated dirs), and confirm sibling instances were also handled. List any unswept sibling: flag it as a hard stop when it carries the same risk, advisory when lower-risk. For a deeper sweep playbook, see hunt's Scope Blast Mode.
221
249
 
250
+ ## Testability Seam For Recurring Bugs
251
+
252
+ When the diff fixes a visual, layout, timing, or stateful-UI bug that has recurred (the same area broke before, or the fix reads as "tune a number until it looks right"), a code change alone will let the regression return: the logic is entangled with mutable render or UI state, so there is nowhere to assert on it. Flag the fix as incomplete unless it pulls the decision into a pure function -- inputs in, value out, no mutable receiver -- and unit-tests the invariant that was violated (a width never collapses to zero, a hit region stays half-open, an offset stays in bounds). "Verified by running the app" confirms this one instance; only a pinned invariant stops the next one. Reserve this for classes that recur or that runtime checks cannot see; do not demand a seam for one-off logic that already has straightforward coverage.
253
+
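As an illustration of the seam described above, the sketch below pulls two layout decisions into pure functions whose invariants can be pinned by unit tests. The functions `column_width` and `hit_index`, their parameters, and the default minimum are all hypothetical; they stand in for whatever decision the recurring bug actually lives in.

```python
def column_width(available: float, columns: int, gap: float, minimum: float = 1.0) -> float:
    """Pure layout decision: inputs in, value out, no UI state touched."""
    if columns < 1:
        raise ValueError("columns must be >= 1")
    raw = (available - gap * (columns - 1)) / columns
    # Invariant under test: a width never collapses to zero or goes negative.
    return max(raw, minimum)


def hit_index(x: float, origin: float, width: float, count: int):
    """Map a pointer x to a cell index using half-open regions [start, end)."""
    if width <= 0 or count < 1:
        return None
    offset = x - origin
    # Invariant under test: the right edge of the last cell is outside.
    if offset < 0 or offset >= width * count:
        return None
    return int(offset // width)
```

A test can then assert the invariant directly, for example that `column_width(10, 4, 10)` still returns the minimum and that a pointer exactly on a cell boundary lands in the next cell, without launching the app.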
254
+ ## CLI Command Surface
255
+
256
+ When a diff touches a CLI entrypoint, installer, completion, config/env handling, package wrapper, or a mutating command such as cleanup, update, uninstall, migration, or cache removal, fill the CLI Command Surface from `references/project-context.md` before sign-off.
257
+
258
+ Check command contract and installed-runtime behavior, not just library tests: help/version, subcommands/flags, exit codes, stdout/stderr, JSON/schema output, TTY/non-interactive paths, env/config precedence, shebang/executable bit, PATH shim, and package-manager install path when applicable.
259
+
260
+ For mutating CLI commands, also run the Safety Sink Review: dry-run or confirmation path, operation log or rollback story, retry/idempotency, signal/partial-failure handling, and test-mode guards for auth prompts or real system changes. For cleanup, uninstall, prune, reset, or cache-removal commands, add two checks before approval: can a normal user verify each selected item is safe, and is the deleted content locally rebuildable rather than a downloaded dependency or user data? If either answer is no, require narrower matching, explicit user selection, or leave the item visible but non-destructive.
261
+
222
262
  ## Hard Stops (fix before merging)
223
263
 
224
264
  Examples, not exhaustive -- flag any diff that could cause irreversible harm if merged unreviewed.
225
265
 
226
266
  - **No unverified claims.** Do not write "I verified X", "I ran Y", "tests pass", or "this fixes Z" unless the shell output is in this turn's transcript. If you reason about behavior without running, say "based on reading the code" instead of "I verified". Every verification claim in the sign-off must point to a command that actually ran in this session.
227
267
  - **Re-read before citing source-of-truth facts.** Before writing a line number, dirty-file count, branch ahead/behind state, fallback behavior, locale coverage, or release artifact state into a handoff or review report, re-read the source in this turn (`git status`, `git diff`, file `Read`, `rg`, command output). Earlier chat context, prior agent's notes, and your own recall from a hundred turns ago are stale by default; restating "the catalog uses en fallback" or "the file is at line 310" without checking has been the recurring failure mode in long sessions. Cite the verification path inline (`per current Read of <file>` / `per `git status` this turn`) so reviewers know which facts are anchored.
268
+ - **String-matching on captured output?** When a diff branches on, greps, or classifies an error message or command output, verify what that string actually holds at runtime before approving. A subprocess spawned with `stdio: 'inherit'` (or any uncaptured pipe) streams its diagnostics to the terminal, not into `error.message` -- which then contains only the command line. Such a matcher silently matches the command, not the output: it can pass tests, fire on the wrong token, or be dead in production while looking correct. Probe the real `error.message` (a one-line repro) instead of assuming, and prefer driving behavior off a structured fact the caller already holds (build target, exit code) over re-parsing a string.
228
269
  - **Destructive auto-execution**: any task marked "safe" or "auto-run" that modifies user-visible state (history files, config, preferences, installed software) must require explicit confirmation.
229
270
  - **Release artifacts missing**: verify every artifact listed in release notes, release templates, or project workflows exists and has been uploaded before declaring done.
230
271
  - **Generated artifact drift**: if source changes require generated or bundled outputs, verify the output was regenerated and included.
272
+ - **Verifier failure layer unclear**: if a verifier fails before assertions or due to missing optional dependencies, bootstrap noise, transient build-service crashes, unavailable simulators, or tool setup, classify setup versus product failure. Retry only with new evidence or a narrower environment. Do not call the repo broken until the intended test body or artifact check actually ran.
231
273
  - **Tracked package omissions**: if a package script builds from tracked files, allowlists, or generated manifests, verify every new helper module, reference file, template, or script used by the diff is tracked and present in the built archive before sign-off.
232
274
  - **Version skew**: release version fields across manifests, package metadata, app configs, changelogs, tags, or lockfiles must stay synchronized.
233
275
  - **Unknown identifiers in diff**: any function, variable, or type introduced in the diff that does not exist in the codebase is a hard stop. Grep before writing or approving any reference: `grep -r "name" .` -- no results outside the diff = does not exist.
276
+ - **Dead-code or YAGNI deletion without proof**: any "zero callers" or "unused" claim must be checked across the whole repository, including top-level entrypoints, docs, tests, generated dispatch tables, scripts, CI, and dynamic lookup patterns. Treat sub-agent or tool reports as leads, not proof. Before deleting, batch-grep all candidates, classify test-only references separately from production references, and chase written variables or data tables that may become orphaned together. If the grep scope is partial, do not delete.
234
277
  - **Injection and validation**: SQL, command, path injection at system entry points. Credentials hardcoded, logged, committed, or copied into public docs.
235
278
  - **Dependency changes**: unexpected additions or version bumps in package.json, Cargo.toml, go.mod, requirements.txt. Flag any new dependency not obviously required by the diff.
236
279
  - **Safety sinks**: destructive file operations, shell or AppleScript construction, cwd/path/symlink traversal, approval or sandbox boundary changes, signing/appcast flows, and auth prompts need explicit review of validation, rollback, and user-confirmation behavior.
280
+ - **Audit before restore**: when the diff re-adds a symbol, string, asset, or config field that recent history removed, grep the rest of the diff and the main branch to confirm anything still uses it. A rule file that names the symbol is not proof of life. If only a parity test references it, the rule is stale and the restore is wrong; reject the restore and flag the stale rule. Specifically suspicious: re-adding an enum case, xcstrings entry, dictionary key, or asset file that the prior commit deleted intentionally.
281
+ - **AI-generated PR with broad matchers in destructive sinks**: any PR that introduces `find`-like recursion, mass-delete, sandbox/container traversal, ID-prefix wildcards, or fallback regex branches feeding a destructive sink, and was likely AI-generated, must be reviewed line-by-line for three things: matcher breadth in every branch (fallback paths often regress to broad globs even when the primary branch is correct), protected-path coverage (does the existing guard list include this new entry point?), and whether the change bypasses an existing user-confirmation step. Generic plausibility is not safety. When in doubt, ask the contributor to narrow the matcher to an exact constant (exact bundle ID, exact app name, exact path), not a prefix or wildcard; do not approve "this looks fine."
282
+ - **Migration code for features that did not ship before**: reject migration scaffolding, version-gated defaults, or "carry old key forward" logic when the underlying preference / schema / feature was introduced in this same release. `git show v<last-release>:<path>` is the gate: if the key is absent from the last tag, no migration is needed; ship the default. Migration code added for a never-shipped key is dead-on-arrival complexity.
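The "String-matching on captured output" hard stop above can be probed with a one-function repro. The original text describes Node's `stdio: 'inherit'`; the sketch below is a Python analog under that assumption, not the package's own check. `CHILD` and `run_and_inspect` are hypothetical names, and the child process is a stand-in for a real build tool.

```python
import subprocess
import sys

# Hypothetical failing child: writes a diagnostic to stderr, then exits 1.
# The diagnostic is assembled at runtime so it does not appear in the command line.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('missing ' + 'symbol'); sys.exit(1)",
]


def run_and_inspect(capture: bool):
    """Run the child; return (captured stderr, exception text, exit code)."""
    try:
        subprocess.run(CHILD, check=True, capture_output=capture, text=True)
    except subprocess.CalledProcessError as error:
        # When stderr is inherited rather than captured, error.stderr is None
        # and str(error) holds only the command line and the exit status.
        return error.stderr, str(error), error.returncode
    return None, "", 0
```

With `capture=False` the diagnostic streams to the terminal and a matcher looking for "missing symbol" in the exception text finds nothing, while the exit code is still available as a structured fact in both cases.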
237
283
 
238
284
  ## Finding Quality Gate
239
285
 
@@ -282,6 +328,8 @@ Load `references/persona-catalog.md` to determine which specialists activate. La
282
328
 
283
329
  Merge findings: when two specialists flag the same code location, keep the higher severity and note cross-reviewer agreement. Findings on different code locations are never duplicates even if they share a theme.
284
330
 
331
+ Treat each specialist finding as a claim to verify, not a fact to act on. Before routing a finding to Autofix or sign-off, re-read the cited code this turn and confirm it is real and live: not already handled elsewhere, not consistent-by-design, not a latent-only risk labeled as a live bug. Parallel reviewers over-report from name-based inference and partial context; drop or downgrade what dissolves on direct read, and cite the verification path.
332
+
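The merge rule stated just before this paragraph (same location keeps the higher severity and notes agreement; different locations never merge) could be sketched as follows. This is illustrative only: the `Finding` fields, the severity scale in `SEVERITY_RANK`, and the `location` string format are assumptions, not the package's data model.

```python
from dataclasses import dataclass, replace

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}  # assumed scale


@dataclass(frozen=True)
class Finding:
    location: str          # assumed format, e.g. "src/app.py:42"
    severity: str
    reviewer: str
    agreed_by: tuple = ()  # other reviewers who flagged the same location


def merge_findings(findings):
    """Keep one finding per location, preferring the higher severity."""
    merged = {}
    for finding in findings:
        kept = merged.get(finding.location)
        if kept is None:
            merged[finding.location] = finding
            continue
        if SEVERITY_RANK[finding.severity] > SEVERITY_RANK[kept.severity]:
            winner, other = finding, kept
        else:
            winner, other = kept, finding
        # Record cross-reviewer agreement on the surviving finding.
        agreed = set(kept.agreed_by) | set(finding.agreed_by) | {other.reviewer}
        merged[finding.location] = replace(winner, agreed_by=tuple(sorted(agreed)))
    return list(merged.values())
```

Merging only deduplicates; each surviving finding would still need the direct re-read described above before it is routed to Autofix or sign-off.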
285
333
  ## Autofix Routing
286
334
 
287
335
  | Class | Definition | Action |
@@ -297,9 +345,9 @@ Apply all `safe_auto` fixes first. Batch all `gated_auto` into one confirmation
297
345
 
298
346
  "If I were trying to break this system through this specific diff, what would I exploit?" Four angles (see `references/persona-catalog.md`): assumption violation, composition failures, cascade construction, abuse cases. Suppress findings below 0.60 confidence.
299
347
 
300
- ## GitHub Operations
348
+ ## Platform Operations
301
349
 
302
- Use `gh` CLI for all GitHub interactions, not MCP or raw API. Confirm CI passes before merging.
350
+ Use the platform tool that matches the project. For GitHub projects, prefer `gh` or the available GitHub integration and confirm CI passes before merging. For non-GitHub projects, derive the CLI/API from public project docs or the user's explicit platform context; do not force GitHub commands onto other hosts.
303
351
 
304
352
  ## Verification
305
353
 
@@ -309,16 +357,18 @@ If the script exits non-zero or prints `(no test command detected)`: halt. Do no
309
357
 
310
358
  For bug fixes: a regression test that fails on the old code must exist before the fix is done.
311
359
 
360
+ In a dirty or multi-agent checkout, a passing local build or test run is not proof your change is sound: unrelated WIP already in the tree can supply missing symbols, mask a break, or fail for reasons unrelated to you. Verify in isolation -- `git worktree add --detach <known-good-commit>`, `git apply` only the diff of the files you own, then build/test there. The clean isolated pass is the real signal; the contaminated local pass is not.
361
+
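The isolated verification described above can be laid out as an ordered list of shell steps. The helper below is a hedged sketch, with `isolation_commands` and its parameters as hypothetical names. It assumes the patch file path is absolute (because `git -C` changes directory before applying), and it passes an explicit worktree path to `git worktree add`, which takes the path first and the commit second.

```python
import shlex


def isolation_commands(commit, worktree_dir, patch_file, verify_cmd):
    """Return the shell steps, in order, for checking one patch in a clean worktree."""
    q = shlex.quote
    return [
        # Detached worktree at a known-good commit, away from the dirty checkout.
        f"git worktree add --detach {q(worktree_dir)} {q(commit)}",
        # Apply only the diff of the files you own (use an absolute patch path).
        f"git -C {q(worktree_dir)} apply {q(patch_file)}",
        # Build or test inside the clean tree; verify_cmd is passed through as-is.
        f"cd {q(worktree_dir)} && {verify_cmd}",
        # Clean up the temporary worktree afterwards.
        f"git worktree remove --force {q(worktree_dir)}",
    ]
```

Printing the list before running it keeps the operation reviewable, and nothing in it touches the files in the original checkout.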
312
362
  ## Gotchas
313
363
 
314
364
  | What happened | Rule |
315
365
  |---------------|------|
316
- | Commented on #249 when discussing #255 | Run `gh issue view N` to confirm title before acting |
366
+ | Posted a public reply to the wrong issue or PR thread | Re-read the target with `gh issue view N` or `gh pr view N` and confirm title, author, and current state before acting |
317
367
  | PR comment sounded like a report | 1-2 sentences, natural, like a colleague. Not structured, not AI-sounding. |
318
368
  | PR comment used bullet points | Write as short paragraphs, one thought per paragraph; thank the contributor first |
319
- | article.en.md inside _posts_en/ doubled the suffix | Check naming convention of existing files in the target directory first |
320
- | Deployed without env vars set | Run `vercel env ls` before deploying; diff against local keys |
321
- | Push failed from auth mismatch | Run `git remote -v` before the first push in a new project |
369
+ | New file name duplicated a locale, platform, or suffix convention | Check the target directory's existing naming convention before creating or renaming files |
370
+ | Deployed without provider runtime or env checks | Follow the project's public deployment docs and compare provider config with local required env and runtime settings |
371
+ | Push failed from auth mismatch | Check `git remote -v`, current branch, and auth identity before the first push in a new project |
322
372
 
323
373
  ## Document Review
324
374
 
@@ -20,6 +20,8 @@ Use this template to compress repository context before running Waza `/check`. T
20
20
  - Protected files and directories.
21
21
  - Generated or bundled artifacts that must stay in sync with source changes.
22
22
  - Packaging source of truth: whether archives are built from `git ls-files`, explicit allowlists, generated manifests, or source directories.
23
+ - Delivery surfaces: whether generated outputs are tracked, ignored, external release assets, registry uploads, appcasts, installer metadata, checksums, or site/download copy; how they are regenerated, inspected, staged, or uploaded.
24
+ - CLI command surfaces: entrypoints, subcommands, flags, help/version behavior, exit codes, stdout/stderr contract, TTY and non-interactive paths, config/env precedence, and installed-runtime checks.
23
25
  - Runtime dependencies introduced by the diff: Python packages, CLIs, network services, package managers, or platform tools that are not already declared in CI/docs.
24
26
  - Domain-specific safety rules.
25
27
  - Release artifacts that must exist.
@@ -34,6 +36,7 @@ Use this template to compress repository context before running Waza `/check`. T
34
36
  - Maintainer-only machine paths.
35
37
  - One-off personal preferences that do not affect project behavior.
36
38
  - One-off review reports, scorecards, or diagnostic snapshots copied as guidance instead of distilled into stable project rules.
39
+ - Raw memory, chat excerpts, screenshots, private support details, local paths, project-specific commands, issue/PR numbers, release tags, or commit hashes from another project.
37
40
  - Full copies of Waza `/check` sections.
38
41
 
39
42
  ## Recommended Context Shape
@@ -45,10 +48,19 @@ Use this template to compress repository context before running Waza `/check`. T
45
48
  - Fast check: `<command>`
46
49
  - Full verification: `<command>`
47
50
 
51
+ ## CLI Command Surface
52
+
53
+ - Entrypoints: `<command or bin>`.
54
+ - Command contract: help/version, subcommands, flags, exit codes, stdout/stderr, JSON/schema output.
55
+ - Runtime shape: TTY vs non-interactive behavior, env/config precedence, completion/manpage or shell integration.
56
+ - Install/run proof: built package, temp prefix, PATH shim, shebang/executable bit, or package-manager path checked with `<command>`.
57
+ - Mutating commands: dry-run/confirmation, operation log, rollback/retry behavior, signal/partial-failure handling.
58
+
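One way to gather evidence for the command-contract bullets above is a small probe that runs an entrypoint non-interactively and records exit code, stdout, and stderr separately. The sketch below is an assumption-laden illustration: `probe` and `CliResult` are hypothetical names, and the demo command is a stand-in Python one-liner rather than a real project CLI.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliResult:
    exit_code: int
    stdout: str
    stderr: str


def probe(argv: list[str], timeout: float = 30.0) -> CliResult:
    """Run a command without a terminal and capture its contract surface."""
    proc = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # stdin is not a TTY: non-interactive path
        capture_output=True,       # stdout and stderr are captured separately
        text=True,
        timeout=timeout,
    )
    return CliResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Stand-in for a real entrypoint: writes to both streams and exits 3.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('data'); print('warn', file=sys.stderr); sys.exit(3)",
    ]
    result = probe(demo)
    print(result.exit_code, result.stdout.strip(), result.stderr.strip())
```

In a real project the same probe would be pointed at the installed command (for example with `--help` or `--version`) rather than at the source tree, which is what the "Install/run proof" bullet asks for.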
48
59
  ## Project Hard Stops
49
60
 
50
61
  - Do not modify `<protected path>` unless explicitly requested.
51
62
  - If `<artifact>` is generated from `<source>`, verify it was regenerated.
63
+ - If `<artifact>` is ignored by git but required for release, verify the regeneration and force-stage, upload, or registry publish path named by the project.
52
64
  - If `<package script>` builds from tracked files or an allowlist, verify newly introduced helpers, references, templates, and scripts are included in `<archive>`.
53
65
  - If an installer fetches remote content, verify the default ref is pinned to a release tag or checksum-protected; floating `main` must be an explicit override.
54
66
  - If a helper introduces a non-stdlib package or external CLI, verify CI installs it or the helper fails with a clear setup path.
@@ -60,11 +72,7 @@ Use this template to compress repository context before running Waza `/check`. T
60
72
 
61
73
  ## Public Replies
62
74
 
63
- - Draft replies in the same language as the thread.
64
- - Do not post comments, close issues, or merge PRs without maintainer approval.
65
- - For accepted PRs, prefer updating the contributor branch and merging the PR; close without merge only when the direction is rejected, unsafe, out of scope, or the branch cannot be updated and a maintainer commit is explicitly needed.
66
- - Default reply shape: `@<user>` + thanks, brief reason/action, then update command, release/version, or next step.
67
- - Keep shipped-fix replies to 1-2 natural sentences unless the project explicitly uses a longer template.
75
+ See `public-reply.md` for the full reply template (language match, `@user` + thanks, factual paragraphs, ship-state line, closure criteria). It is the single source; do not restate the rules here.
68
76
 
69
77
  ## Release Follow-through
70
78
 
@@ -88,7 +96,7 @@ Fill this before claiming a change is release-ready. Use "n/a" only when the pro
88
96
  | Remote state | `origin/main` or release branch sync checked |
89
97
  | Version fields | Manifest, app config, changelog, appcast, and lockfile versions aligned |
90
98
  | Runtime dependencies | Newly introduced Python packages, CLIs, package managers, and network tools declared and available in CI |
91
- | Generated artifacts | Bundled/minified/archive outputs regenerated or proven not needed |
99
+ | Generated artifacts | Tracked archives, ignored dist outputs, bundled/minified files, appcasts, installer metadata, checksums, and site/download copy regenerated or proven not needed |
92
100
  | Package/archive contents | Built package inspected for required files, newly introduced helpers/references, and missing extras |
93
101
  | Release assets | GitHub release, appcast, download archive, checksum, or installer assets verified |
94
102
  | Registry/appcast | npm/crates/Homebrew/appcast/App Store or equivalent state re-read after publish |
@@ -1,8 +1,8 @@
1
1
  #!/usr/bin/env python3
2
2
  """Project audit signals (Phase 1) for /check audit mode.
3
3
 
4
- Walks a project root and emits 10 structured signal blocks to stdout.
5
- Each block ends with `status: PASS|WARN|FAIL` so the LLM driving the
4
+ Walks a project root and emits structured signal blocks to stdout.
5
+ Each block ends with `status: PASS|WARN|FAIL|N/A` so the LLM driving the
6
6
  4-axis Linus-style scorecard can skim quickly.
7
7
 
8
8
  Pure stdlib. Read-only. Exits 0 even on WARN/FAIL so the harness does
@@ -14,6 +14,7 @@ Run as: python3 skills/check/scripts/audit_signals.py --root <path>
14
14
  from __future__ import annotations
15
15
 
16
16
  import argparse
17
+ import json
17
18
  import os
18
19
  import re
19
20
  import subprocess
@@ -54,8 +55,42 @@ DENYLIST_HINT_RE = re.compile(
54
55
  re.IGNORECASE,
55
56
  )
56
57
  MINIFIED_RE = re.compile(r"\.min\.[a-z]+$", re.IGNORECASE)
58
+ CLI_CONTRACT_BUCKETS: tuple[tuple[str, re.Pattern[str]], ...] = (
59
+     ("help_or_usage", re.compile(r"(--help|\busage\b|\bhelp output\b)", re.IGNORECASE)),
60
+     ("version", re.compile(r"(--version|\bversion output\b)", re.IGNORECASE)),
61
+     ("exit_code", re.compile(r"\b(exit code|exit status|return code|exit_code|\$\?)\b", re.IGNORECASE)),
62
+     ("stdout", re.compile(r"\b(stdout|standard output)\b|>\s*\"\$?[A-Za-z0-9_./-]*stdout", re.IGNORECASE)),
63
+     ("stderr", re.compile(r"\b(stderr|standard error)\b|2>\s*\"\$?[A-Za-z0-9_./-]*stderr", re.IGNORECASE)),
64
+     ("non_interactive_or_tty", re.compile(r"\b(non-interactive|noninteractive|tty|isatty|/dev/null|CI=1)\b", re.IGNORECASE)),
65
+     (
66
+         "install_run",
67
+         re.compile(
68
+             r"(\binstall\s+-m\b|\binstalled command\b|\binstalled-runtime\b|"
69
+             r"\binstall/run\b|\binstall run\b|\btemp prefix\b|\bPATH shim\b|"
70
+             r"\bpackage-manager path\b|\bnpm link\b|\bpipx install\b|"
71
+             r"\bcargo install\b|\bbrew install\b|\bmake install\b)",
72
+             re.IGNORECASE,
73
+         ),
74
+     ),
75
+     ("json_or_schema", re.compile(r"\b(json|schema)\b", re.IGNORECASE)),
76
+     ("completion", re.compile(r"\bcompletion\b", re.IGNORECASE)),
77
+ )
78
+ CLI_CORE_BUCKETS = (
79
+     "help_or_usage",
80
+     "version",
81
+     "exit_code",
82
+     "stdout",
83
+     "stderr",
84
+     "install_run",
85
+ )
57
86
 
58
87
 
88
+ # The file-walk helpers below are deliberately duplicated in
89
+ # skills/health/scripts/check_maintainability.py. Both scripts ship
90
+ # standalone (see packaging.allowlist) and run inside an arbitrary target
91
+ # project, so they import only stdlib. Do not hoist them into a shared
92
+ # scripts/ module: it is dev-only, not on the ship allowlist, and would
93
+ # couple a standalone tool to the install layout.
59
94
  def is_excluded(path: Path, root: Path) -> bool:
60
95
      try:
61
96
          parts = path.relative_to(root).parts
@@ -129,23 +164,18 @@ def status(label: str) -> None:
129
164
 
130
165
  def block_hotspots(files: list[Path], root: Path) -> None:
131
166
      header("FILE SIZE HOTSPOTS")
167
+     sized = ((p, line_count(p)) for p in files if p.suffix in SOURCE_EXTS)
132
168
      big = sorted(
133
-         ((p, line_count(p)) for p in files
134
-         if p.suffix in SOURCE_EXTS and line_count(p) >= HOTSPOT_LINES),
169
+         (item for item in sized if item[1] >= HOTSPOT_LINES),
135
170
          key=lambda x: -x[1],
136
171
      )[:10]
137
172
      if not big:
138
-         print("(no source files >= 800 lines)")
173
+         print(f"(no source files >= {HOTSPOT_LINES} lines)")
139
174
          status("PASS")
140
175
          return
141
176
      for path, n in big:
142
177
          print(f" {n:>5} {rel(path, root)}")
143
-     if any(n >= HOTSPOT_FAIL for _, n in big):
144
-         status("FAIL")
145
-     elif len(big) > 3:
146
-         status("WARN")
147
-     else:
148
-         status("WARN")
178
+     status("FAIL" if any(n >= HOTSPOT_FAIL for _, n in big) else "WARN")
149
179
 
150
180
 
151
181
  def block_heredoc(files: list[Path], root: Path) -> None:
@@ -214,6 +244,156 @@ def block_test_ci(files: list[Path], root: Path) -> None:
214
244
  status("PASS")
215
245
 
216
246
 
247
+ def _package_bin_entrypoints(root: Path) -> list[str]:
248
+     path = root / "package.json"
249
+     if not path.is_file():
250
+         return []
251
+     text = read_text(path, 200_000)
252
+     try:
253
+         data = json.loads(text)
254
+     except json.JSONDecodeError:
255
+         return []
256
+     bin_field = data.get("bin")
257
+     name = str(data.get("name") or "package")
258
+     if isinstance(bin_field, str):
259
+         return [f"package.json bin:{name} -> {bin_field}"]
260
+     if isinstance(bin_field, dict):
261
+         return [
262
+             f"package.json bin:{cmd} -> {target}"
263
+             for cmd, target in sorted(bin_field.items())
264
+             if isinstance(cmd, str) and isinstance(target, str)
265
+         ]
266
+     return []
267
+
268
+
269
+ def _pyproject_script_entrypoints(root: Path) -> list[str]:
270
+     path = root / "pyproject.toml"
271
+     if not path.is_file():
272
+         return []
273
+     text = read_text(path, 200_000)
274
+     entries: list[str] = []
275
+     in_scripts = False
276
+     for line in text.splitlines():
277
+         stripped = line.strip()
278
+         if stripped.startswith("[") and stripped.endswith("]"):
279
+             in_scripts = stripped in {
280
+                 "[project.scripts]",
281
+                 "[tool.poetry.scripts]",
282
+             }
283
+             continue
284
+         if not in_scripts or not stripped or stripped.startswith("#"):
285
+             continue
286
+         m = re.match(r'([A-Za-z0-9_.-]+)\s*=\s*["\']([^"\']+)["\']', stripped)
287
+         if m:
288
+             entries.append(f"pyproject.toml script:{m.group(1)} -> {m.group(2)}")
289
+     return entries
290
+
291
+
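To make the behaviour of the line-based `pyproject.toml` scan concrete, here is a self-contained sketch that applies the same approach to an in-memory string. The names `script_entries`, `SCRIPT_TABLES`, `ENTRY_RE`, and the sample text are illustrative assumptions; the table names and the entry regex are copied from the function in the diff.

```python
import re

# Table headers the scan treats as script sections (taken from the diff).
SCRIPT_TABLES = {"[project.scripts]", "[tool.poetry.scripts]"}
# name = "target" with either quote style (same regex as in the diff).
ENTRY_RE = re.compile(r'([A-Za-z0-9_.-]+)\s*=\s*["\']([^"\']+)["\']')


def script_entries(toml_text: str) -> list[str]:
    """Collect `name -> target` pairs from script tables, line by line."""
    entries: list[str] = []
    in_scripts = False
    for line in toml_text.splitlines():
        stripped = line.strip()
        # Any table header switches the scan on or off.
        if stripped.startswith("[") and stripped.endswith("]"):
            in_scripts = stripped in SCRIPT_TABLES
            continue
        if not in_scripts or not stripped or stripped.startswith("#"):
            continue
        m = ENTRY_RE.match(stripped)
        if m:
            entries.append(f"{m.group(1)} -> {m.group(2)}")
    return entries


SAMPLE = """\
[project]
name = "demo"

[project.scripts]
demo = "demo.cli:main"
# a comment inside the table
demo-admin = 'demo.admin:run'

[tool.other]
ignored = "x"
"""

print(script_entries(SAMPLE))
# ['demo -> demo.cli:main', 'demo-admin -> demo.admin:run']
```

A scan like this keeps the script free of any TOML parser dependency, at the cost of skipping forms it does not recognise, such as quoted keys or scripts declared as an inline table.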
292
+ def _cargo_entrypoints(root: Path) -> list[str]:
293
+ entries: list[str] = []
294
+ cargo = root / "Cargo.toml"
295
+ if cargo.is_file():
296
+ text = read_text(cargo, 200_000)
297
+ if "[[bin]]" in text:
298
+ names = re.findall(r'(?m)^\s*name\s*=\s*["\']([^"\']+)["\']', text)
299
+ if names:
300
+ entries.extend(f"Cargo.toml bin:{name}" for name in sorted(set(names)))
301
+ else:
302
+ entries.append("Cargo.toml [[bin]]")
303
+ if (root / "src" / "main.rs").is_file():
304
+ entries.append("src/main.rs")
305
+ return entries
306
+
307
+
308
+ def cli_entrypoints(files: list[Path], root: Path) -> list[str]:
309
+     entries: set[str] = set()
310
+     entries.update(_package_bin_entrypoints(root))
311
+     entries.update(_pyproject_script_entrypoints(root))
312
+     entries.update(_cargo_entrypoints(root))
313
+
314
+     for path in files:
315
+         try:
316
+             parts = path.relative_to(root).parts
317
+         except ValueError:
318
+             continue
319
+         if not parts:
320
+             continue
321
+         if parts[0] == "bin" and len(parts) >= 2:
322
+             entries.add("/".join(parts[:2]))
323
+         if parts[0] == "cmd" and len(parts) >= 3 and path.suffix == ".go":
324
+             entries.add(f"cmd/{parts[1]}")
325
+     return sorted(entries)
326
+
327
+
328
+ def _is_cli_contract_candidate(path: Path, root: Path) -> bool:
329
+     try:
330
+         parts = path.relative_to(root).parts
331
+     except ValueError:
332
+         return False
333
+     if not parts:
334
+         return False
335
+     lower_parts = tuple(p.lower() for p in parts)
336
+     name = lower_parts[-1]
337
+     if name in {"readme.md", "readme.txt", "agents.md", "claude.md"}:
338
+         return True
339
+     if lower_parts[0] in {"tests", "test", "spec", "scripts"}:
340
+         return True
341
+     if "test" in name or "spec" in name:
342
+         return True
343
+     if len(lower_parts) >= 3 and lower_parts[:2] == (".github", "workflows"):
344
+         return True
345
+     return False
346
+
347
+
348
+ def cli_contract_evidence(files: list[Path], root: Path) -> dict[str, list[tuple[str, str]]]:
349
+     hits: dict[str, list[tuple[str, str]]] = {}
350
+     for path in files:
351
+         if not _is_cli_contract_candidate(path, root):
352
+             continue
353
+         text = read_text(path, 200_000)
354
+         if not text:
355
+             continue
356
+         for bucket, pattern in CLI_CONTRACT_BUCKETS:
357
+             m = pattern.search(text)
358
+             if m:
359
+                 hits.setdefault(bucket, []).append((rel(path, root), m.group(0)))
360
+     return {bucket: sorted(values) for bucket, values in sorted(hits.items())}
361
+
362
+
363
+ def block_cli_contract_surface(files: list[Path], root: Path) -> None:
364
+     header("CLI CONTRACT SURFACE")
365
+     entries = cli_entrypoints(files, root)
366
+     if not entries:
367
+         print("(no CLI entrypoints detected)")
368
+         status("N/A")
369
+         return
370
+
371
+     print(f"entrypoints={len(entries)}")
372
+     for entry in entries[:12]:
373
+         print(f" entry: {entry}")
374
+     if len(entries) > 12:
375
+         print(f" ... {len(entries) - 12} more")
376
+
377
+     evidence = cli_contract_evidence(files, root)
378
+     covered = tuple(bucket for bucket, _ in CLI_CONTRACT_BUCKETS if bucket in evidence)
379
+     missing = tuple(bucket for bucket in CLI_CORE_BUCKETS if bucket not in evidence)
380
+     print(f"covered={','.join(covered) if covered else 'none'}")
381
+     print(f"missing={','.join(missing) if missing else 'none'}")
382
+     printed = 0
383
+     for bucket in covered:
384
+         for path, signal in evidence[bucket][:3]:
385
+             print(f" evidence: {bucket} {path} signal={signal}")
386
+             printed += 1
387
+             if printed >= 12:
388
+                 break
389
+         if printed >= 12:
390
+             break
391
+     if not missing:
392
+         status("PASS")
393
+     else:
394
+         status("WARN")
395
+
396
+
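The PASS/WARN rule of the new CLI contract block can be illustrated with a reduced bucket table. The sketch below is a simplification under stated assumptions: `classify`, `BUCKETS`, and `CORE` are hypothetical names, only two of the nine patterns are copied from the diff, and a single string stands in for the evidence that the real block aggregates across README, test, script, and workflow files.

```python
import re

# Reduced copy of two buckets from the diff; the real table has nine.
BUCKETS = (
    ("help_or_usage", re.compile(r"(--help|\busage\b|\bhelp output\b)", re.IGNORECASE)),
    ("exit_code", re.compile(r"\b(exit code|exit status|return code|exit_code|\$\?)\b", re.IGNORECASE)),
)
# Buckets that must all be covered for PASS (a two-item stand-in).
CORE = ("help_or_usage", "exit_code")


def classify(text: str) -> tuple[list[str], list[str], str]:
    """Return (covered, missing, status) for one blob of text."""
    covered = [name for name, pattern in BUCKETS if pattern.search(text)]
    missing = [name for name in CORE if name not in covered]
    return covered, missing, "PASS" if not missing else "WARN"


print(classify("Run `tool --help`; a bad flag gives exit code 2."))
# (['help_or_usage', 'exit_code'], [], 'PASS')
print(classify("Run `tool --help` first."))
# (['help_or_usage'], ['exit_code'], 'WARN')
print(classify("tool; echo $?"))
# ([], ['help_or_usage', 'exit_code'], 'WARN')
```

The last call shows a property of the copied pattern worth knowing when reading results: `\b` needs a word character on one side, so the `\$\?` alternative wrapped in `\b...\b` does not match a bare `echo $?`. Evidence for that bucket comes from the spelled-out phrases instead.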
217
397
  def _grep_version(path: Path, pattern: str) -> str | None:
218
398
      text = read_text(path, 20_000)
219
399
      if not text:
@@ -471,6 +651,7 @@ def main() -> int:
471
651
      block_hotspots(files, root); print()
472
652
      block_heredoc(files, root); print()
473
653
      block_test_ci(files, root); print()
654
+     block_cli_contract_surface(files, root); print()
474
655
      block_version_sources(root); print()
475
656
      block_packaging_posture(root); print()
476
657
      block_install_url(root); print()