@tw93/waza 3.25.0 → 3.27.0

@@ -2,9 +2,9 @@
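
  ## Shared Output Marker

- All skills follow the same output convention: put `🥷` inline on the first line, not in a paragraph of its own. The convention is written into each skill's `SKILL.md`, and `verify-skills.sh` also checks it.
+ All skills follow the same output convention: put `🥷` inline on the first line, not in a paragraph of its own. The convention is written into each skill's `SKILL.md`, and `scripts/verify_skills.py` also checks it.

- Routing table from trigger words to skills. Claude Code matches skills automatically through each SKILL.md's `description`; this document is a centralized index for humans and also the reference that `verify-skills.sh` validates against. When you change a SKILL.md's scope, update this file too.
+ Routing table from trigger words to skills. Claude Code matches skills automatically through each SKILL.md's `description`; this document is a centralized index for humans and also the reference that `scripts/verify_skills.py` validates against. When you change a SKILL.md's scope, update this file too.

  > **Read the skill file before acting.** When two skills could both match, read both. They are designed to chain (e.g. `/think` → implement → `/check`).
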
@@ -59,19 +59,19 @@
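
  ## Chaining (common chains)

- Transitions between skills must be triggered manually by the user; skills never chain automatically. Each skill stops when it finishes and waits for you to decide the next step.
+ By default, skills do not chain automatically. Each skill stops when it finishes and waits for the user to decide the next step, unless the current request or the project's public context has already explicitly authorized the follow-up action (for example "implement this plan", "review then ship if green", "triage and close").

  - `/think` produces a proposal → **user says "实现" (implement)** → implementation → **user says "/check"** → `/check` gates the result
  - `/think` produces an executable plan → **user says "Implement the plan / 可以干 / 直接改" (go ahead / just edit it)** → implement as planned, without re-arguing the direction
  - `/hunt` fixes an issue → **user says "发布 / push / 关闭 issue" (release / push / close the issue)** → `/check` runs pre-release checks and wraps up
- - `/read` fetches multiple URLs → **user says "/learn"** → `/learn` synthesizes them into a write-up
+ - `/read` fetches multiple URLs → **user says "/learn"** → `/learn` synthesizes them into a write-up; if the same turn already explicitly asked for a summary or analysis, `/read` fulfills that request directly after fetching
  - `/learn` produces a first draft → **user says "/write"** → `/write` removes the AI flavor
  - `/hunt` locates the root cause → **user says "修" (fix)** → fix applied → **user says "/check"** → `/check` confirms there are no side effects
  - `/health` finds a skill configuration problem → **user says "修" (fix)** → fix applied → **user says "/health"** → run `/health` again

  ## Latent vs Deterministic

- Waza's skills are all fat skills (Markdown judgment); the deterministic constraints underneath go through `scripts/verify-skills.sh` and `rules/*.md`. When adding a capability, first ask:
+ Waza's skills are all fat skills (Markdown judgment); the deterministic constraints underneath go through `scripts/verify_skills.py` and `rules/*.md`. When adding a capability, first ask:

  - Needs judgment, adapts to the situation, or asks the user follow-up questions? → skill
  - Same input gives the same output, or only validates and enumerates? → script or rule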
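@@ -83,8 +83,8 @@ Waza's skills are all fat skills (Markdown judgment); the deterministic constraints underneath

  General programmer capabilities are distilled into Waza. For a specific project, first extract constraints from the public project context, then run the corresponding skill:

  - `code-review` / `/check` -> extract verification commands, generated artifacts, risks, safety sinks, and release rules from the diff, README, manifest, CI, and release notes.
- - `github-ops` -> reuse the Triage Mode in `skills/check/SKILL.md`, and confirm the repo, release state, and reply language from the live issue/PR.
- - `release` -> confirm prerequisites, artifacts, and verification commands from the project's public release docs, scripts, and CI.
+ - GitHub issue/PR/release intents -> handled by `skills/check/SKILL.md`. When the target is GitHub, prefer `gh` or an available GitHub tool; for non-GitHub platforms, choose the matching CLI/API from the project's public context.
+ - Release/publish intents -> `skills/check/SKILL.md` confirms prerequisites, artifacts, and verification commands from the project's public release docs, scripts, and CI.

  Local durable memory / preview may serve as optional private context for understanding user preferences, past decisions, and transferable patterns; it is not part of the public project constraints and must be re-verified against current code, logs, tests, docs, or remote state.
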
@@ -13,6 +13,13 @@ Prefix your first line with 🥷 inline, not as its own paragraph.

  Read the diff, find the problems, fix what can be fixed safely, ask about the rest. Done means verification ran in this session and passed.

+ ## Outcome Contract
+
+ - Outcome: a review, release decision, or maintainer action grounded in the current diff, project context, and live evidence.
+ - Done when: findings, fixes, shipped state, or blockers are stated with the commands, artifacts, or remote state that prove them.
+ - Evidence: worktree status, diff, public project docs, manifests, CI, package contents, release or registry state, and current command output.
+ - Output: concise findings first, then verification and shipped-state summary when applicable.
+
  ## Worktree Safety Preflight

  Before any review, triage, ship, release, or PR operation, read the current worktree with:
@@ -25,6 +32,10 @@ Treat modified, staged, and untracked files as user work. You may read them and

  Do not run these commands as default review or PR setup: `git switch`, `git checkout`, `git reset --hard`, `git clean`, `git stash -u`, `git stash --include-untracked`, `git stash -a`, `git stash --all`, or `gh pr checkout`. If a branch change or cleanup is genuinely required, stop and ask for that exact operation.

+ Do not "protect" user work by moving untracked files, generated files, screenshots, or local scratch files into `/tmp` or another holding directory. Moving someone else's WIP out of the checkout is the same class of interference as stashing it. If a clean tree is required for generation, packaging, or verification, use a separate worktree from a known commit and copy only the artifact or patch you own back into the current checkout.
+
+ For commit or push follow-through in a dirty or multi-agent checkout, record `git rev-parse HEAD` before staging. Re-read `git status --short --branch -uall` and `git rev-parse HEAD` immediately before commit and again before push. If HEAD moved, unknown commits appeared, or the worktree changed outside your intended files, stop and report the mismatch instead of rebasing, recommitting, or pushing.
+
  For PR inspection, prefer commands that do not switch the current working tree: `gh pr view`, `gh pr diff`, `git fetch origin pull/<n>/head:refs/tmp/pr-<n>`, and `git merge-tree`.

  ## Mode Picker
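The commit/push re-check in the preflight hunk above can be sketched in Python, the language of the bundled `audit_signals.py`. This is a minimal sketch, not Waza code: `read_state`, `foreign_changes`, and `check_guard` are hypothetical names, and the status parsing assumes the plain `git status --short` layout (`XY path`, renames shown as `old -> new`, no quoted paths).

```python
import subprocess


def read_state(repo: str = ".") -> tuple[str, str]:
    """Return (HEAD sha, short status with branch header) for the checkout."""
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo, *args],
            check=True, capture_output=True, text=True,
        ).stdout
    head = git("rev-parse", "HEAD").strip()
    status = git("status", "--short", "--branch", "-uall")
    return head, status


def foreign_changes(status: str, intended: set[str]) -> set[str]:
    """Status entries that touch files outside the set we intend to commit."""
    foreign = set()
    for line in status.splitlines():
        if not line or line.startswith("## "):
            continue  # skip the branch/tracking header added by --branch
        path = line[3:]  # short format is "XY path"
        if " -> " in path:
            path = path.split(" -> ", 1)[1]  # renames: keep the new path
        if path not in intended:
            foreign.add(line)
    return foreign


def check_guard(expected_head: str, baseline: set[str], head: str,
                status: str, intended: set[str]) -> list[str]:
    """Reasons to stop before commit or push; an empty list means proceed."""
    problems = []
    if head != expected_head:
        problems.append(f"HEAD moved: expected {expected_head}, found {head}")
    if foreign_changes(status, intended) != baseline:
        problems.append("worktree changed outside the intended files")
    return problems
```

Record the HEAD and `baseline = foreign_changes(status, intended)` before staging. Before commit, pass the recorded HEAD as `expected_head`; before push, pass the hash of the commit you just made. A non-empty result means stop and report, not rebase.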
@@ -91,11 +102,15 @@ Activate when the user mentions: issue, PR, "review all", triage, "batch", or "

  **Action-first rule:** Items with a clear disposition (already fixed, duplicate, already released) get acted on immediately without analysis paragraphs. When analyzing screenshots or images, state what you see and the suggested action in one message. Only ask the user when the disposition is genuinely ambiguous.

- **Flow:** Pull open items with `gh issue list -R <repo> --state open --limit 20` and `gh pr list -R <repo> --state open`. For each item, check if a fix already shipped: `git log --oneline <latest-tag>..HEAD | grep -i "<keyword>"`. If shipped: close with note. If merged but unreleased: reply "已修复,等下一个版本 release" ("fixed, waiting for the next release") and close. If no fix: analyze and act. Fix now if possible (`fix: closes #N` commit); when the target project documents a nightly, beta, or pre-release channel that already contains the fix, reply with that exact upgrade path and close; for valid-but-unreleased items acknowledge and leave open; for invalid items give one-two sentence reason and close.
+ **Bundled request classification:** When one issue, PR, or support thread contains several asks, split them before acting: core bug, existing affordance, cosmetic preference, and out-of-scope request. Fix or close only the validated core bug; answer existing affordances with the current path; defer or decline cosmetic and out-of-scope asks instead of treating the whole report as a to-do list.
+
+ **Status answer order:** For "都解决了吗" ("is everything resolved?"), "is this fixed", "is this ready", or similar status checks, answer in this order: code or commit state, branch or CI state, release artifact or registry state, then public issue or PR state. Do not collapse fixed-on-main, available in pre-release, next stable release, and already shipped.
+
+ **Flow:** First identify the project's issue/PR host from public context. For GitHub projects, pull open items with `gh issue list -R <repo> --state open --limit 20` and `gh pr list -R <repo> --state open`. For non-GitHub projects, use the platform CLI/API named by the project docs or user request; if none exists, stop and report the missing integration instead of pretending GitHub commands apply. For each item, check if a fix already shipped: `git log --oneline <latest-tag>..HEAD | grep -i "<keyword>"`. If shipped: close with note. If merged but unreleased: reply "已修复,等下一个版本 release" ("fixed, waiting for the next release") and close. If no fix: analyze and act. Fix now if possible (`fix: closes #N` commit); when the target project documents a nightly, beta, or pre-release channel that already contains the fix, reply with that exact upgrade path and close; for valid-but-unreleased items acknowledge and leave open; for invalid items give one-two sentence reason and close.

  Before final conclusions in a live queue, refresh the issue/PR list once more and re-read any item that changed during the run. If evidence is incomplete, hold the item instead of closing it on a guess.

- **PR handling:** If the PR direction is accepted but the patch needs changes, prefer pushing the maintainer's fixes to the contributor's PR branch and then merging the PR. Check `maintainerCanModify` first. If branch edits are not allowed, ask the contributor to enable maintainer edits or push the needed revision; only fall back to a separate maintainer commit when timing or release safety requires it, and say so in the PR. Close without merging only when the direction is rejected, unsafe, no longer needed, or explicitly not part of the project's scope. Do not silently absorb an accepted PR into `main` and close it.
+ **PR handling:** If the PR direction is accepted but the patch needs changes, prefer pushing the maintainer's fixes to the contributor's PR branch and then merging the PR. Check `maintainerCanModify` first, then confirm the push remote, target branch, and current HEAD immediately before pushing so you do not overwrite contributor work or push maintainer fixes to the wrong repository. If branch edits are not allowed, ask the contributor to enable maintainer edits or push the needed revision; only fall back to a separate maintainer commit when timing or release safety requires it, and say so in the PR. Close without merging only when the direction is rejected, unsafe, no longer needed, or explicitly not part of the project's scope. Do not silently absorb an accepted PR into `main` and close it.

  **Public reply shape:** load `references/public-reply.md` for the full template (mention, single thanks, factual paragraphs, next-release step, editing rules, closure criteria). Ship Mode uses the same template; the file is the single source.
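The shipped check in the triage flow (`git log --oneline <latest-tag>..HEAD | grep -i "<keyword>"`) only sees commits made after the tag. Below is a minimal sketch that also searches the tag's own history, so "merged but unreleased" and "already shipped" stay separate, as the status-answer rule requires. Assumptions: plain substring matching stands in for `grep -i` regex matching, and `fix_state` with its three labels is hypothetical. A keyword hit is a lead to confirm by reading the commit, not proof.

```python
import subprocess


def git_log(rev_range: str, repo: str = ".") -> str:
    """`git log --oneline <range>`; pass "<tag>..HEAD" or just "<tag>"."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--oneline", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout


def matching_commits(log_text: str, keyword: str) -> list[str]:
    """Case-insensitive substring filter, approximating `grep -i <keyword>`."""
    needle = keyword.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def fix_state(unreleased_log: str, released_log: str, keyword: str) -> str:
    """Classify a candidate fix from the two log ranges."""
    if matching_commits(unreleased_log, keyword):
        return "merged-unreleased"  # on the branch after the latest tag
    if matching_commits(released_log, keyword):
        return "shipped"  # reachable from the latest tag itself
    return "no-fix-found"
```

In a real checkout this would be called as `fix_state(git_log(f"{tag}..HEAD"), git_log(tag), "<keyword>")`.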
 
@@ -126,10 +141,11 @@ This mode extends review; it does not skip review. Before any public or irrevers
  1. Extract release rules from public project context: README, manifests, CI workflows, release notes, package scripts, changelogs, and explicit user instructions in the current thread.
  2. Fill the Release Gate 2.0 matrix from `references/project-context.md`: review base, dirty/staged/untracked state, latest tag, origin sync, version fields, generated artifacts, package/archive contents, release assets, registry/appcast/CI, and public issue/PR state.
  3. Verify generated or bundled outputs, version fields, release notes, package contents, and required artifacts are in sync. Prefer dry-run commands when the ecosystem provides them.
- 4. Commit only intended files. Preserve unrelated dirty work, and serialize git operations so index locks or overlapping adds do not corrupt the workflow.
+ Generated deliverables include tracked archives, ignored dist files, appcasts, site/download copy, registry packages, checksums, and release assets. If project docs require them, regenerate, inspect, and stage or upload them explicitly even when they are ignored by git; do not infer readiness from source-only tests.
+ 4. Commit only intended files. Preserve unrelated dirty work, serialize git operations so index locks or overlapping adds do not corrupt the workflow, and re-check HEAD/status before pushing so concurrent agent or maintainer commits are not swept into your ship action.
  5. Push, publish, tag, or create a release only when the user has explicitly approved that action. If auth, OTP, CI, registry, or network state blocks the operation, pause and report the exact blocker.
- 6. For issue/PR follow-through, confirm the item identity with `gh issue view` or `gh pr view` before posting. Use `references/public-reply.md` for the maintainer reply template (mention, single thanks, facts, explicit next release or verification step) and its closure criteria.
- 7. For GitHub release reaction follow-through, only do it when project context or the current thread asks for it. After the release exists and required assets are verified, resolve the release id from the tag, POST every positive release reaction to `repos/<owner>/<repo>/releases/<id>/reactions` with `gh api`, and re-read reactions to confirm. Positive release reactions are `+1`, `laugh`, `heart`, `hooray`, `rocket`, and `eyes`.
+ 6. For issue/PR follow-through, confirm the item identity with the host's read command before posting. On GitHub, use `gh issue view` or `gh pr view`; on other hosts, use the CLI/API named by project docs or the current request. Use `references/public-reply.md` for the maintainer reply template (mention, single thanks, facts, explicit next release or verification step) and its closure criteria.
+ 7. For GitHub release reaction follow-through, only do it when project context or the current thread asks for it. After the release exists and required assets are verified, resolve the release id from the tag, POST every positive release reaction to `repos/<owner>/<repo>/releases/<id>/reactions` with `gh api` or the available GitHub tool, and re-read reactions to confirm. Positive release reactions are `+1`, `laugh`, `heart`, `hooray`, `rocket`, and `eyes`.
  8. After network or API failures, re-read the end state instead of assuming success or failure.

  End with the concrete shipped state: commit hash, tag, release URL, registry/version result, pushed branch, release asset state, release reaction state, issue/PR state, and any remaining blockers. Omit fields that do not apply.
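Step 7 can be prepared as reviewable commands before anything public runs. This sketch assumes the GitHub REST endpoints `GET /repos/{owner}/{repo}/releases/tags/{tag}` and `POST`/`GET /repos/{owner}/{repo}/releases/{release_id}/reactions`, driven through `gh api`; the helper names are hypothetical. It only builds argument lists, so nothing is posted until the user has approved the action.

```python
POSITIVE_RELEASE_REACTIONS = ("+1", "laugh", "heart", "hooray", "rocket", "eyes")


def release_id_command(owner: str, repo: str, tag: str) -> list[str]:
    """Look up the numeric release id for a tag; `--jq .id` prints only the id."""
    return ["gh", "api", f"repos/{owner}/{repo}/releases/tags/{tag}", "--jq", ".id"]


def reaction_commands(owner: str, repo: str, release_id: int) -> list[list[str]]:
    """One POST per positive reaction; `-f content=...` sends a string field."""
    endpoint = f"repos/{owner}/{repo}/releases/{release_id}/reactions"
    return [
        ["gh", "api", "--method", "POST", endpoint, "-f", f"content={reaction}"]
        for reaction in POSITIVE_RELEASE_REACTIONS
    ]


def readback_command(owner: str, repo: str, release_id: int) -> list[str]:
    """Re-read reactions afterwards instead of trusting the POST responses."""
    endpoint = f"repos/{owner}/{repo}/releases/{release_id}/reactions"
    return ["gh", "api", endpoint, "--jq", ".[].content"]


def missing_reactions(readback_output: str) -> list[str]:
    """Positive reactions that do not appear in the read-back output."""
    present = set(readback_output.split())
    return [r for r in POSITIVE_RELEASE_REACTIONS if r not in present]
```

The read-back lists every user's reactions, so a present entry does not prove it came from the maintainer account. Once a release collects more reactions than fit on one page, the read-back also needs `gh api --paginate`.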
@@ -140,7 +156,7 @@ Activate when the user asks for a project-wide code-quality scorecard: "audit",

  **Flow**

- 1. Run `python3 <waza>/skills/check/scripts/audit_signals.py --root <project>` from the target repo. The script emits ten labelled blocks (`=== FILE SIZE HOTSPOTS ===` ... `=== DENYLIST IN BUILD ===`) each ending with `status: PASS|WARN|FAIL`.
+ 1. Run `python3 <waza>/skills/check/scripts/audit_signals.py --root <project>` from the target repo. The script emits labelled blocks (`=== FILE SIZE HOTSPOTS ===` ... `=== DENYLIST IN BUILD ===`) each ending with `status: PASS|WARN|FAIL|N/A`.
  2. Skim the largest source files surfaced by `FILE SIZE HOTSPOTS` (typically 3-5; stop sooner if the architecture is already clear).
  3. Read `CLAUDE.md` / `AGENTS.md` / `README.md` to learn the project's own stated conventions before judging it against generic ones.
  4. Apply the four-axis rubric below. Each axis is independently scored 0-10. Overall = arithmetic mean.
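Below is a minimal sketch for skimming the `audit_signals.py` output. It assumes each block opens with a `=== NAME ===` line and closes with a line that reads exactly `status: <value>`. The diff does not show the lines between those two markers, so treat the parser as hypothetical. `N/A` is kept distinct from `PASS` so a check that did not apply is not reported as a passing one.

```python
import re

HEADER_RE = re.compile(r"^=== (.+) ===$")
STATUS_RE = re.compile(r"^status: (PASS|WARN|FAIL|N/A)$")


def parse_blocks(output: str) -> dict[str, str]:
    """Map each block label to the status line that closes it."""
    statuses: dict[str, str] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            statuses[current] = status.group(1)
            current = None  # block closed; ignore stray status lines
    return statuses


def needs_attention(statuses: dict[str, str]) -> list[str]:
    """WARN/FAIL blocks in output order; N/A means the check did not apply."""
    return [label for label, value in statuses.items() if value in {"WARN", "FAIL"}]
```

The input would be the captured stdout of the step 1 command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.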
@@ -207,6 +223,8 @@ State the depth before proceeding.

  Before reading code, check scope drift: do the diff and the stated goal match? Label: **on target** / **drift** / **incomplete**.

+ Also check surgical traceability: every changed file and every new public surface must trace back to the user's stated goal. If a file, dependency, config knob, abstraction, generated artifact, workflow permission, or release behavior cannot be explained in one sentence from the request, label it drift until proven necessary.
+
  Drift signals (examples, not exhaustive -- any one is enough to label drift):
  - A changed file has no connection to the stated goal
  - The diff includes pure refactoring (renames, formatting, restructuring) when the goal was a bug fix or feature
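The three scope labels can be sketched as bookkeeping over the changed-file list. The sketch makes two assumptions. First, the reviewer supplies a one-sentence justification per file, tied to the stated goal. Second, "incomplete" is read here as "files the goal clearly requires that the diff never touches". `scope_label` and the choice to let drift take precedence over incomplete are illustrative, not rules from the skill.

```python
def scope_label(changed: set[str], justification: dict[str, str],
                required: set[str]) -> tuple[str, list[str]]:
    """Return ('on target' | 'drift' | 'incomplete', offending paths)."""
    # Any changed file without a one-sentence reason is drift until proven necessary.
    drift = sorted(p for p in changed if not justification.get(p, "").strip())
    if drift:
        return "drift", drift
    # Files the goal needs but the diff never touched.
    missing = sorted(required - changed)
    if missing:
        return "incomplete", missing
    return "on target", []
```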
@@ -219,6 +237,14 @@ Drift signals (examples, not exhaustive -- any one is enough to label drift):

  When the diff fixes one instance of a class-of-bug (a missing validation, a wrong selector, an off-by-one, a missing lock), the same shape often lives elsewhere. Extract the pattern signature, `grep -rn` it across the repo (exclude generated dirs), and confirm sibling instances were also handled. List any unswept sibling: flag it as a hard stop when it carries the same risk, advisory when lower-risk. For a deeper sweep playbook, see hunt's Scope Blast Mode.

+ ## CLI Command Surface
+
+ When a diff touches a CLI entrypoint, installer, completion, config/env handling, package wrapper, or a mutating command such as cleanup, update, uninstall, migration, or cache removal, fill the CLI Command Surface from `references/project-context.md` before sign-off.
+
+ Check command contract and installed-runtime behavior, not just library tests: help/version, subcommands/flags, exit codes, stdout/stderr, JSON/schema output, TTY/non-interactive paths, env/config precedence, shebang/executable bit, PATH shim, and package-manager install path when applicable.
+
+ For mutating CLI commands, also run the Safety Sink Review: dry-run or confirmation path, operation log or rollback story, retry/idempotency, signal/partial-failure handling, and test-mode guards for auth prompts or real system changes. For cleanup, uninstall, prune, reset, or cache-removal commands, add two checks before approval: can a normal user verify each selected item is safe, and is the deleted content locally rebuildable rather than a downloaded dependency or user data? If either answer is no, require narrower matching, explicit user selection, or leave the item visible but non-destructive.
+
  ## Hard Stops (fix before merging)

  Examples, not exhaustive -- flag any diff that could cause irreversible harm if merged unreviewed.
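For the installed-runtime part of the CLI Command Surface above, here is a sketch that probes a command as a subprocess with stdin closed (the non-interactive path) and records its exit codes and output streams. Assumptions: the CLI accepts `--help` and `--version`, and `--definitely-not-a-real-flag` is a made-up flag that should be rejected. Many real tools behave differently (some print the version on stderr, for example), so adapt the checks to the project's documented contract.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Probe:
    args: tuple
    exit_code: int
    stdout: str
    stderr: str


def probe(cmd: list[str], *args: str, timeout: float = 10.0) -> Probe:
    """Run the installed command non-interactively and capture both streams."""
    proc = subprocess.run(
        [*cmd, *args],
        stdin=subprocess.DEVNULL,  # no TTY input: exercises the non-interactive path
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return Probe(args, proc.returncode, proc.stdout, proc.stderr)


def contract_report(cmd: list[str]) -> dict[str, bool]:
    """A few contract checks; keys are illustrative, not a standard."""
    help_out = probe(cmd, "--help")
    version = probe(cmd, "--version")
    bad_flag = probe(cmd, "--definitely-not-a-real-flag")  # hypothetical unknown flag
    return {
        "help exits 0 with text on stdout": help_out.exit_code == 0 and bool(help_out.stdout.strip()),
        "version exits 0 and prints something": version.exit_code == 0
        and bool((version.stdout + version.stderr).strip()),
        "unknown flag exits non-zero": bad_flag.exit_code != 0,
        "unknown flag explains itself on stderr": bool(bad_flag.stderr.strip()),
    }
```

Pointing `contract_report` at the command as installed (a temp prefix, a PATH shim, or the package-manager path), rather than at the source tree, is what makes it a runtime check. `contract_report([sys.executable])` is a quick self-test against the Python interpreter itself.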
@@ -228,12 +254,17 @@ Examples, not exhaustive -- flag any diff that could cause irreversible harm if
  - **Destructive auto-execution**: any task marked "safe" or "auto-run" that modifies user-visible state (history files, config, preferences, installed software) must require explicit confirmation.
  - **Release artifacts missing**: verify every artifact listed in release notes, release templates, or project workflows exists and has been uploaded before declaring done.
  - **Generated artifact drift**: if source changes require generated or bundled outputs, verify the output was regenerated and included.
+ - **Verifier failure layer unclear**: if a verifier fails before assertions or due to missing optional dependencies, bootstrap noise, transient build-service crashes, unavailable simulators, or tool setup, classify setup versus product failure. Retry only with new evidence or a narrower environment. Do not call the repo broken until the intended test body or artifact check actually ran.
  - **Tracked package omissions**: if a package script builds from tracked files, allowlists, or generated manifests, verify every new helper module, reference file, template, or script used by the diff is tracked and present in the built archive before sign-off.
  - **Version skew**: release version fields across manifests, package metadata, app configs, changelogs, tags, or lockfiles must stay synchronized.
  - **Unknown identifiers in diff**: any function, variable, or type introduced in the diff that does not exist in the codebase is a hard stop. Grep before writing or approving any reference: `grep -r "name" .` -- no results outside the diff = does not exist.
+ - **Dead-code or YAGNI deletion without proof**: any "zero callers" or "unused" claim must be checked across the whole repository, including top-level entrypoints, docs, tests, generated dispatch tables, scripts, CI, and dynamic lookup patterns. Treat sub-agent or tool reports as leads, not proof. Before deleting, batch-grep all candidates, classify test-only references separately from production references, and chase written variables or data tables that may become orphaned together. If the grep scope is partial, do not delete.
  - **Injection and validation**: SQL, command, path injection at system entry points. Credentials hardcoded, logged, committed, or copied into public docs.
  - **Dependency changes**: unexpected additions or version bumps in package.json, Cargo.toml, go.mod, requirements.txt. Flag any new dependency not obviously required by the diff.
  - **Safety sinks**: destructive file operations, shell or AppleScript construction, cwd/path/symlink traversal, approval or sandbox boundary changes, signing/appcast flows, and auth prompts need explicit review of validation, rollback, and user-confirmation behavior.
+ - **Audit before restore**: when the diff re-adds a symbol, string, asset, or config field that recent history removed, grep the rest of the diff and the main branch to confirm anything still uses it. A rule file that names the symbol is not proof of life. If only a parity test references it, the rule is stale and the restore is wrong; reject the restore and flag the stale rule. Specifically suspicious: re-adding an enum case, xcstrings entry, dictionary key, or asset file that the prior commit deleted intentionally.
+ - **AI-generated PR with broad matchers in destructive sinks**: any PR that introduces `find`-like recursion, mass-delete, sandbox/container traversal, ID-prefix wildcards, or fallback regex branches feeding a destructive sink, and was likely AI-generated, must be reviewed line-by-line for three things: matcher breadth in every branch (fallback paths often regress to broad globs even when the primary branch is correct), protected-path coverage (does the existing guard list include this new entry point?), and whether the change bypasses an existing user-confirmation step. Generic plausibility is not safety. When in doubt, ask the contributor to narrow the matcher to an exact constant (exact bundle ID, exact app name, exact path), not a prefix or wildcard; do not approve "this looks fine."
+ - **Migration code for features that did not ship before**: reject migration scaffolding, version-gated defaults, or "carry old key forward" logic when the underlying preference / schema / feature was introduced in this same release. `git show v<last-release>:<path>` is the gate: if the key is absent from the last tag, no migration is needed; ship the default. Migration code added for a never-shipped key is dead-on-arrival complexity.

  ## Finding Quality Gate
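The dead-code hard stop from the hunk above asks for one batch grep over all candidates, with test-only references kept separate from production ones. Below is a stdlib sketch of that bookkeeping. The skipped directories, the test-path heuristic, and the `classify_references` / `verdict` names are assumptions to adapt per project. Word-boundary matching also catches string mentions such as dynamic lookups and docs. A "test-only" or "no references" verdict is still a lead for human review, not permission to delete.

```python
import os
import re
from pathlib import Path

# Generated or vendored directories to skip (assumption; adjust per project).
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}


def is_test_path(rel: str) -> bool:
    """Heuristic: test/tests/__tests__ directories, test_*, *_test.py, *.test.* files."""
    parts = Path(rel).parts
    name = parts[-1]
    return (
        any(p in {"test", "tests", "__tests__"} for p in parts[:-1])
        or name.startswith("test_")
        or name.endswith("_test.py")
        or ".test." in name
    )


def classify_references(root: Path, candidates: list[str]) -> dict[str, dict[str, list[str]]]:
    """Grep every candidate across the whole tree in a single walk."""
    patterns = {name: re.compile(rf"\b{re.escape(name)}\b") for name in candidates}
    hits = {name: {"production": [], "test": []} for name in candidates}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in filenames:
            path = Path(dirpath, filename)
            rel = path.relative_to(root).as_posix()
            try:
                lines = path.read_text(encoding="utf-8").splitlines()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            bucket = "test" if is_test_path(rel) else "production"
            for lineno, line in enumerate(lines, 1):
                for name, pattern in patterns.items():
                    if pattern.search(line):
                        hits[name][bucket].append(f"{rel}:{lineno}")
    for groups in hits.values():
        for refs in groups.values():
            refs.sort()
    return hits


def verdict(hits: dict[str, list[str]], definition_site: str) -> str:
    """'live' if any production line other than the definition mentions the name."""
    if any(ref != definition_site for ref in hits["production"]):
        return "live"
    return "test-only" if hits["test"] else "no references"
```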
 
@@ -297,9 +328,9 @@ Apply all `safe_auto` fixes first. Batch all `gated_auto` into one confirmation

  "If I were trying to break this system through this specific diff, what would I exploit?" Four angles (see `references/persona-catalog.md`): assumption violation, composition failures, cascade construction, abuse cases. Suppress findings below 0.60 confidence.

- ## GitHub Operations
+ ## Platform Operations

- Use `gh` CLI for all GitHub interactions, not MCP or raw API. Confirm CI passes before merging.
+ Use the platform tool that matches the project. For GitHub projects, prefer `gh` or the available GitHub integration and confirm CI passes before merging. For non-GitHub projects, derive the CLI/API from public project docs or the user's explicit platform context; do not force GitHub commands onto other hosts.

  ## Verification
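Choosing the platform tool starts from where the remote actually points. This sketch extracts the host from a `git remote -v` style URL, handling both the `scheme://` form and the scp-like `user@host:path` form, and only suggests `gh` for `github.com`. The function names are hypothetical. GitHub Enterprise and every other host deliberately return `None`, so the tool comes from the project's docs instead of a guess.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# scp-like remotes such as "git@host:owner/repo.git": no scheme, colon before the path.
SCP_LIKE = re.compile(r"^(?:[^@/]+@)?([^:/]+):(?!//)")


def remote_host(url: str) -> str:
    """Lower-cased host of a git remote URL, or "" for local paths."""
    if "://" in url:
        return (urlparse(url).hostname or "").lower()
    match = SCP_LIKE.match(url)
    return match.group(1).lower() if match else ""


def platform_tool(url: str) -> Optional[str]:
    """Suggest `gh` only for github.com; anything else needs the project's docs."""
    return "gh" if remote_host(url) == "github.com" else None
```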
 
@@ -313,12 +344,12 @@ For bug fixes: a regression test that fails on the old code must exist before th

  | What happened | Rule |
  |---------------|------|
- | Commented on #249 when discussing #255 | Run `gh issue view N` to confirm title before acting |
+ | Posted a public reply to the wrong issue or PR thread | Re-read the target with `gh issue view N` or `gh pr view N` and confirm title, author, and current state before acting |
  | PR comment sounded like a report | 1-2 sentences, natural, like a colleague. Not structured, not AI-sounding. |
  | PR comment used bullet points | Write as short paragraphs, one thought per paragraph; thank the contributor first |
- | article.en.md inside _posts_en/ doubled the suffix | Check naming convention of existing files in the target directory first |
- | Deployed without env vars set | Run `vercel env ls` before deploying; diff against local keys |
- | Push failed from auth mismatch | Run `git remote -v` before the first push in a new project |
+ | New file name duplicated a locale, platform, or suffix convention | Check the target directory's existing naming convention before creating or renaming files |
+ | Deployed without provider runtime or env checks | Follow the project's public deployment docs and compare provider config with local required env and runtime settings |
+ | Push failed from auth mismatch | Check `git remote -v`, current branch, and auth identity before the first push in a new project |

  ## Document Review

@@ -20,6 +20,8 @@ Use this template to compress repository context before running Waza `/check`. T
  - Protected files and directories.
  - Generated or bundled artifacts that must stay in sync with source changes.
  - Packaging source of truth: whether archives are built from `git ls-files`, explicit allowlists, generated manifests, or source directories.
+ - Delivery surfaces: whether generated outputs are tracked, ignored, external release assets, registry uploads, appcasts, installer metadata, checksums, or site/download copy; how they are regenerated, inspected, staged, or uploaded.
+ - CLI command surfaces: entrypoints, subcommands, flags, help/version behavior, exit codes, stdout/stderr contract, TTY and non-interactive paths, config/env precedence, and installed-runtime checks.
  - Runtime dependencies introduced by the diff: Python packages, CLIs, network services, package managers, or platform tools that are not already declared in CI/docs.
  - Domain-specific safety rules.
  - Release artifacts that must exist.
@@ -34,6 +36,7 @@ Use this template to compress repository context before running Waza `/check`. T
  - Maintainer-only machine paths.
  - One-off personal preferences that do not affect project behavior.
  - One-off review reports, scorecards, or diagnostic snapshots copied as guidance instead of distilled into stable project rules.
+ - Raw memory, chat excerpts, screenshots, private support details, local paths, project-specific commands, issue/PR numbers, release tags, or commit hashes from another project.
  - Full copies of Waza `/check` sections.

  ## Recommended Context Shape
@@ -45,10 +48,19 @@ Use this template to compress repository context before running Waza `/check`. T
  - Fast check: `<command>`
  - Full verification: `<command>`

+ ## CLI Command Surface
+
+ - Entrypoints: `<command or bin>`.
+ - Command contract: help/version, subcommands, flags, exit codes, stdout/stderr, JSON/schema output.
+ - Runtime shape: TTY vs non-interactive behavior, env/config precedence, completion/manpage or shell integration.
+ - Install/run proof: built package, temp prefix, PATH shim, shebang/executable bit, or package-manager path checked with `<command>`.
+ - Mutating commands: dry-run/confirmation, operation log, rollback/retry behavior, signal/partial-failure handling.
+
  ## Project Hard Stops

  - Do not modify `<protected path>` unless explicitly requested.
  - If `<artifact>` is generated from `<source>`, verify it was regenerated.
+ - If `<artifact>` is ignored by git but required for release, verify the regeneration and force-stage, upload, or registry publish path named by the project.
  - If `<package script>` builds from tracked files or an allowlist, verify newly introduced helpers, references, templates, and scripts are included in `<archive>`.
  - If an installer fetches remote content, verify the default ref is pinned to a release tag or checksum-protected; floating `main` must be an explicit override.
  - If a helper introduces a non-stdlib package or external CLI, verify CI installs it or the helper fails with a clear setup path.
@@ -88,7 +100,7 @@ Fill this before claiming a change is release-ready. Use "n/a" only when the pro
  | Remote state | `origin/main` or release branch sync checked |
  | Version fields | Manifest, app config, changelog, appcast, and lockfile versions aligned |
  | Runtime dependencies | Newly introduced Python packages, CLIs, package managers, and network tools declared and available in CI |
- | Generated artifacts | Bundled/minified/archive outputs regenerated or proven not needed |
+ | Generated artifacts | Tracked archives, ignored dist outputs, bundled/minified files, appcasts, installer metadata, checksums, and site/download copy regenerated or proven not needed |
  | Package/archive contents | Built package inspected for required files, newly introduced helpers/references, and missing extras |
  | Release assets | GitHub release, appcast, download archive, checksum, or installer assets verified |
  | Registry/appcast | npm/crates/Homebrew/appcast/App Store or equivalent state re-read after publish |
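The Package/archive contents row can be backed by reading the built archive itself rather than the source tree. This sketch uses the standard `tarfile` and `zipfile` modules. `strip_prefix` covers tarballs that wrap their files in a top-level directory (npm pack tarballs use `package/`). The required-file list has to come from the project context, and the helper names are hypothetical.

```python
import tarfile
import zipfile


def archive_members(path: str) -> set[str]:
    """File names inside a .zip or any tarfile-readable archive (.tar, .tgz, ...)."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
    else:
        with tarfile.open(path) as tf:  # default mode "r" auto-detects compression
            names = tf.getnames()
    return {name.rstrip("/") for name in names}


def missing_from_archive(path: str, required: list[str], strip_prefix: str = "") -> list[str]:
    """Required relative paths that the built archive does not contain."""
    members = archive_members(path)
    if strip_prefix:
        prefix = strip_prefix.rstrip("/") + "/"
        members = {m[len(prefix):] for m in members if m.startswith(prefix)}
    return sorted(r for r in required if r not in members)
```

Running this against the artifact produced by the project's own package command catches the "tracked package omissions" hard stop: a new helper that exists in the checkout but never made it into the archive.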
@@ -1,8 +1,8 @@
  #!/usr/bin/env python3
  """Project audit signals (Phase 1) for /check audit mode.

- Walks a project root and emits 10 structured signal blocks to stdout.
- Each block ends with `status: PASS|WARN|FAIL` so the LLM driving the
+ Walks a project root and emits structured signal blocks to stdout.
+ Each block ends with `status: PASS|WARN|FAIL|N/A` so the LLM driving the
  4-axis Linus-style scorecard can skim quickly.

  Pure stdlib. Read-only. Exits 0 even on WARN/FAIL so the harness does
@@ -14,6 +14,7 @@ Run as: python3 skills/check/scripts/audit_signals.py --root <path>
  from __future__ import annotations

  import argparse
+ import json
  import os
  import re
  import subprocess
@@ -54,6 +55,34 @@ DENYLIST_HINT_RE = re.compile(
      re.IGNORECASE,
  )
  MINIFIED_RE = re.compile(r"\.min\.[a-z]+$", re.IGNORECASE)
+ CLI_CONTRACT_BUCKETS: tuple[tuple[str, re.Pattern[str]], ...] = (
+     ("help_or_usage", re.compile(r"(--help|\busage\b|\bhelp output\b)", re.IGNORECASE)),
+     ("version", re.compile(r"(--version|\bversion output\b)", re.IGNORECASE)),
+     ("exit_code", re.compile(r"\b(exit code|exit status|return code|exit_code|\$\?)\b", re.IGNORECASE)),
+     ("stdout", re.compile(r"\b(stdout|standard output)\b|>\s*\"\$?[A-Za-z0-9_./-]*stdout", re.IGNORECASE)),
+     ("stderr", re.compile(r"\b(stderr|standard error)\b|2>\s*\"\$?[A-Za-z0-9_./-]*stderr", re.IGNORECASE)),
+     ("non_interactive_or_tty", re.compile(r"\b(non-interactive|noninteractive|tty|isatty|/dev/null|CI=1)\b", re.IGNORECASE)),
+     (
+         "install_run",
+         re.compile(
+             r"(\binstall\s+-m\b|\binstalled command\b|\binstalled-runtime\b|"
+             r"\binstall/run\b|\binstall run\b|\btemp prefix\b|\bPATH shim\b|"
+             r"\bpackage-manager path\b|\bnpm link\b|\bpipx install\b|"
+             r"\bcargo install\b|\bbrew install\b|\bmake install\b)",
+             re.IGNORECASE,
+         ),
+     ),
+     ("json_or_schema", re.compile(r"\b(json|schema)\b", re.IGNORECASE)),
+     ("completion", re.compile(r"\bcompletion\b", re.IGNORECASE)),
+ )
+ CLI_CORE_BUCKETS = (
+     "help_or_usage",
+     "version",
+     "exit_code",
+     "stdout",
+     "stderr",
+     "install_run",
+ )


  def is_excluded(path: Path, root: Path) -> bool:
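The hunk above defines the `CLI_CONTRACT_BUCKETS` patterns and the `CLI_CORE_BUCKETS` subset, but the code that consumes them falls outside this diff. One way such buckets could be applied is to scan documentation or test text and report which core buckets have no evidence at all. The function names and the trimmed pattern set below are hypothetical and simplified. `install_run` and the redirect-style stdout/stderr alternatives are left out, so this is not the script's actual logic.

```python
import re

# A trimmed, simplified copy of a few bucket patterns (not the script's full set).
BUCKETS = (
    ("help_or_usage", re.compile(r"(--help|\busage\b|\bhelp output\b)", re.IGNORECASE)),
    ("version", re.compile(r"(--version|\bversion output\b)", re.IGNORECASE)),
    ("exit_code", re.compile(r"\b(exit code|exit status|return code|exit_code)\b", re.IGNORECASE)),
    ("stdout", re.compile(r"\b(stdout|standard output)\b", re.IGNORECASE)),
    ("stderr", re.compile(r"\b(stderr|standard error)\b", re.IGNORECASE)),
)
CORE = ("help_or_usage", "version", "exit_code", "stdout", "stderr")


def bucket_hits(texts: list[str]) -> dict[str, bool]:
    """Which buckets have at least one matching mention across all texts."""
    hits = {name: False for name, _ in BUCKETS}
    for text in texts:
        for name, pattern in BUCKETS:
            if pattern.search(text):
                hits[name] = True
    return hits


def missing_core_buckets(texts: list[str]) -> list[str]:
    """Core contract areas with no evidence, in CORE order."""
    hits = bucket_hits(texts)
    return [name for name in CORE if not hits[name]]
```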
@@ -214,6 +243,156 @@ def block_test_ci(files: list[Path], root: Path) -> None:
214
243
  status("PASS")
215
244
 
216
245
 
246
+ def _package_bin_entrypoints(root: Path) -> list[str]:
247
+ path = root / "package.json"
248
+ if not path.is_file():
249
+ return []
250
+ text = read_text(path, 200_000)
251
+ try:
252
+ data = json.loads(text)
253
+ except json.JSONDecodeError:
254
+ return []
255
+ bin_field = data.get("bin")
256
+ name = str(data.get("name") or "package")
257
+ if isinstance(bin_field, str):
258
+ return [f"package.json bin:{name} -> {bin_field}"]
259
+ if isinstance(bin_field, dict):
260
+ return [
261
+ f"package.json bin:{cmd} -> {target}"
262
+ for cmd, target in sorted(bin_field.items())
263
+ if isinstance(cmd, str) and isinstance(target, str)
264
+ ]
265
+ return []
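The `package.json` handling above accepts two shapes of `bin`: a single string, labelled with the package name, or an object that maps command names to scripts. The sketch below is a minimal, I/O-free restatement of that logic so the shapes can be tried directly. The function name `bin_entrypoints_from_json` is hypothetical. It also adds an `isinstance(data, dict)` guard that is not in the diff, because `json.loads` can return a list, and a list has no `.get`.

```python
import json


def bin_entrypoints_from_json(text: str) -> list[str]:
    """Sketch of the package.json "bin" handling above, minus file I/O."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):  # added guard: top-level JSON may be a list
        return []
    bin_field = data.get("bin")
    name = str(data.get("name") or "package")
    if isinstance(bin_field, str):
        # String form: one executable, labelled with the package name.
        return [f"package.json bin:{name} -> {bin_field}"]
    if isinstance(bin_field, dict):
        # Object form: one entry per command, skipping non-string values.
        return [
            f"package.json bin:{cmd} -> {target}"
            for cmd, target in sorted(bin_field.items())
            if isinstance(cmd, str) and isinstance(target, str)
        ]
    return []


print(bin_entrypoints_from_json('{"name": "waza", "bin": "cli.js"}'))
print(bin_entrypoints_from_json('{"bin": {"b": "b.js", "a": "a.js", "x": 1}}'))
```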
266
+
267
+
268
+ def _pyproject_script_entrypoints(root: Path) -> list[str]:
269
+ path = root / "pyproject.toml"
270
+ if not path.is_file():
271
+ return []
272
+ text = read_text(path, 200_000)
273
+ entries: list[str] = []
274
+ in_scripts = False
275
+ for line in text.splitlines():
276
+ stripped = line.strip()
277
+ if stripped.startswith("[") and stripped.endswith("]"):
278
+ in_scripts = stripped in {
279
+ "[project.scripts]",
280
+ "[tool.poetry.scripts]",
281
+ }
282
+ continue
283
+ if not in_scripts or not stripped or stripped.startswith("#"):
284
+ continue
285
+ m = re.match(r'([A-Za-z0-9_.-]+)\s*=\s*["\']([^"\']+)["\']', stripped)
286
+ if m:
287
+ entries.append(f"pyproject.toml script:{m.group(1)} -> {m.group(2)}")
288
+ return entries
289
+
290
+
291
+ def _cargo_entrypoints(root: Path) -> list[str]:
292
+ entries: list[str] = []
293
+ cargo = root / "Cargo.toml"
294
+ if cargo.is_file():
295
+ text = read_text(cargo, 200_000)
296
+ if "[[bin]]" in text:
297
+ names = re.findall(r'(?m)^\s*name\s*=\s*["\']([^"\']+)["\']', text)
298
+ if names:
299
+ entries.extend(f"Cargo.toml bin:{name}" for name in sorted(set(names)))
300
+ else:
301
+ entries.append("Cargo.toml [[bin]]")
302
+ if (root / "src" / "main.rs").is_file():
303
+ entries.append("src/main.rs")
304
+ return entries
305
+
306
+
307
+ def cli_entrypoints(files: list[Path], root: Path) -> list[str]:
308
+ entries: set[str] = set()
309
+ entries.update(_package_bin_entrypoints(root))
310
+ entries.update(_pyproject_script_entrypoints(root))
311
+ entries.update(_cargo_entrypoints(root))
312
+
313
+ for path in files:
314
+ try:
315
+ parts = path.relative_to(root).parts
316
+ except ValueError:
317
+ continue
318
+ if not parts:
319
+ continue
320
+ if parts[0] == "bin" and len(parts) >= 2:
321
+ entries.add("/".join(parts[:2]))
322
+ if parts[0] == "cmd" and len(parts) >= 3 and path.suffix == ".go":
323
+ entries.add(f"cmd/{parts[1]}")
324
+ return sorted(entries)
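Besides manifest fields, `cli_entrypoints` infers entrypoints from the file layout. Any file under `bin/` contributes its first two path components, and a `.go` file at least two levels under `cmd/` collapses to `cmd/<name>`, following the common one-directory-per-command Go layout. The sketch below restates only that path rule over relative POSIX paths. The function name `path_entrypoints` is hypothetical.

```python
from pathlib import PurePosixPath


def path_entrypoints(rel_paths: list[str]) -> list[str]:
    """Sketch of the bin/ and cmd/ layout rules above, on relative paths."""
    entries: set[str] = set()
    for raw in rel_paths:
        path = PurePosixPath(raw)
        parts = path.parts
        if not parts:
            continue
        if parts[0] == "bin" and len(parts) >= 2:
            # bin/<name> (or bin/<dir> for nested files).
            entries.add("/".join(parts[:2]))
        if parts[0] == "cmd" and len(parts) >= 3 and path.suffix == ".go":
            # Every Go file in cmd/<name>/ maps to the same command.
            entries.add(f"cmd/{parts[1]}")
    return sorted(entries)


print(path_entrypoints([
    "bin/waza", "bin/helpers/x.sh",
    "cmd/server/main.go", "cmd/server/util.go",
    "cmd/README.md", "src/main.py",
]))
```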
325
+
326
+
327
+ def _is_cli_contract_candidate(path: Path, root: Path) -> bool:
328
+ try:
329
+ parts = path.relative_to(root).parts
330
+ except ValueError:
331
+ return False
332
+ if not parts:
333
+ return False
334
+ lower_parts = tuple(p.lower() for p in parts)
335
+ name = lower_parts[-1]
336
+ if name in {"readme.md", "readme.txt", "agents.md", "claude.md"}:
337
+ return True
338
+ if lower_parts[0] in {"tests", "test", "spec", "scripts"}:
339
+ return True
340
+ if "test" in name or "spec" in name:
341
+ return True
342
+ if len(lower_parts) >= 3 and lower_parts[:2] == (".github", "workflows"):
343
+ return True
344
+ return False
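`_is_cli_contract_candidate` decides which files are scanned for contract evidence: top-level instruction files, anything under a test or scripts directory, names containing `test` or `spec`, and GitHub workflow files. The sketch below applies the same checks to relative path strings (the function name is hypothetical). Because the name check is a plain substring test, a file such as `docs/LATEST.md` also qualifies.

```python
from pathlib import PurePosixPath


def is_cli_contract_candidate(rel: str) -> bool:
    """Sketch of the candidate filter above, on a relative POSIX path."""
    lower_parts = tuple(p.lower() for p in PurePosixPath(rel).parts)
    if not lower_parts:
        return False
    name = lower_parts[-1]
    if name in {"readme.md", "readme.txt", "agents.md", "claude.md"}:
        return True
    if lower_parts[0] in {"tests", "test", "spec", "scripts"}:
        return True
    if "test" in name or "spec" in name:  # substring match, so "latest.md" counts
        return True
    if len(lower_parts) >= 3 and lower_parts[:2] == (".github", "workflows"):
        return True
    return False


for rel in ["README.md", "src/cli_test.go", ".github/workflows/ci.yml", "docs/LATEST.md", "src/main.rs"]:
    print(rel, is_cli_contract_candidate(rel))
```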
345
+
346
+
347
+ def cli_contract_evidence(files: list[Path], root: Path) -> dict[str, list[tuple[str, str]]]:
348
+ hits: dict[str, list[tuple[str, str]]] = {}
349
+ for path in files:
350
+ if not _is_cli_contract_candidate(path, root):
351
+ continue
352
+ text = read_text(path, 200_000)
353
+ if not text:
354
+ continue
355
+ for bucket, pattern in CLI_CONTRACT_BUCKETS:
356
+ m = pattern.search(text)
357
+ if m:
358
+ hits.setdefault(bucket, []).append((rel(path, root), m.group(0)))
359
+ return {bucket: sorted(values) for bucket, values in sorted(hits.items())}
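`cli_contract_evidence` records at most one hit per file per bucket. The hit is the leftmost match, and its text is kept as it appears in the file. The sketch below runs the same loop over an in-memory mapping of path to text, with a reduced, hypothetical two-bucket table standing in for the nine buckets defined earlier.

```python
import re

# Reduced, hypothetical bucket table; the diff defines nine buckets.
BUCKETS: tuple[tuple[str, re.Pattern[str]], ...] = (
    ("help_or_usage", re.compile(r"(--help|\busage\b)", re.IGNORECASE)),
    ("stderr", re.compile(r"\b(stderr|standard error)\b", re.IGNORECASE)),
)


def contract_evidence(files: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    hits: dict[str, list[tuple[str, str]]] = {}
    for path, text in files.items():
        if not text:
            continue
        for bucket, pattern in BUCKETS:
            m = pattern.search(text)  # leftmost match only
            if m:
                hits.setdefault(bucket, []).append((path, m.group(0)))
    return {bucket: sorted(values) for bucket, values in sorted(hits.items())}


evidence = contract_evidence({
    "tests/test_cli.py": "assert 'Usage:' in run('--help').stdout",
    "README.md": "Errors go to STDERR.",
})
print(evidence)
```

In the test file, `Usage` occurs before `--help`, so it is the recorded signal.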
360
+
361
+
362
+ def block_cli_contract_surface(files: list[Path], root: Path) -> None:
363
+ header("CLI CONTRACT SURFACE")
364
+ entries = cli_entrypoints(files, root)
365
+ if not entries:
366
+ print("(no CLI entrypoints detected)")
367
+ status("N/A")
368
+ return
369
+
370
+ print(f"entrypoints={len(entries)}")
371
+ for entry in entries[:12]:
372
+ print(f" entry: {entry}")
373
+ if len(entries) > 12:
374
+ print(f" ... {len(entries) - 12} more")
375
+
376
+ evidence = cli_contract_evidence(files, root)
377
+ covered = tuple(bucket for bucket, _ in CLI_CONTRACT_BUCKETS if bucket in evidence)
378
+ missing = tuple(bucket for bucket in CLI_CORE_BUCKETS if bucket not in evidence)
379
+ print(f"covered={','.join(covered) if covered else 'none'}")
380
+ print(f"missing={','.join(missing) if missing else 'none'}")
381
+ printed = 0
382
+ for bucket in covered:
383
+ for path, signal in evidence[bucket][:3]:
384
+ print(f" evidence: {bucket} {path} signal={signal}")
385
+ printed += 1
386
+ if printed >= 12:
387
+ break
388
+ if printed >= 12:
389
+ break
390
+ if not missing:
391
+ status("PASS")
392
+ else:
393
+ status("WARN")
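The status rule in `block_cli_contract_surface` reduces to the following. With no entrypoints the result is `N/A`. Otherwise `covered` lists every bucket with evidence, in table order, while `missing` checks only the six core buckets. The optional buckets (`non_interactive_or_tty`, `json_or_schema`, `completion`) therefore count as coverage but never cause a `WARN`. The sketch below is a minimal restatement with a hypothetical function name.

```python
CLI_CORE_BUCKETS = ("help_or_usage", "version", "exit_code", "stdout", "stderr", "install_run")
ALL_BUCKETS = (
    "help_or_usage", "version", "exit_code", "stdout", "stderr",
    "non_interactive_or_tty", "install_run", "json_or_schema", "completion",
)


def contract_status(
    entrypoints: list[str], evidence_buckets: set[str]
) -> tuple[str, tuple[str, ...], tuple[str, ...]]:
    """Return (status, covered, missing) following the block's rules."""
    if not entrypoints:
        return "N/A", (), ()
    covered = tuple(b for b in ALL_BUCKETS if b in evidence_buckets)
    missing = tuple(b for b in CLI_CORE_BUCKETS if b not in evidence_buckets)
    return ("PASS" if not missing else "WARN"), covered, missing


status, covered, missing = contract_status(["bin/waza"], {"help_or_usage", "stdout", "completion"})
print(status, covered, missing)
```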
394
+
395
+
217
396
  def _grep_version(path: Path, pattern: str) -> str | None:
218
397
  text = read_text(path, 20_000)
219
398
  if not text:
@@ -471,6 +650,7 @@ def main() -> int:
471
650
  block_hotspots(files, root); print()
472
651
  block_heredoc(files, root); print()
473
652
  block_test_ci(files, root); print()
653
+ block_cli_contract_surface(files, root); print()
474
654
  block_version_sources(root); print()
475
655
  block_packaging_posture(root); print()
476
656
  block_install_url(root); print()
@@ -11,6 +11,13 @@ Prefix your first line with 🥷 inline, not as its own paragraph.
11
11
 
12
12
  If it could have been generated by a default prompt, it is not good enough.
13
13
 
14
+ ## Outcome Contract
15
+
16
+ - Outcome: a usable interface or visual fix with a clear point of view and no incoherent layout, text, or responsive breakage.
17
+ - Done when: the real rendered surface or generated artifact has been checked against the user's visual goal and the relevant viewport states.
18
+ - Evidence: screenshots, rendered UI, source components, design tokens, accessibility constraints, and user-provided references.
19
+ - Output: the implemented visual change or a precise visual review with the remaining verification gap named.
20
+
14
21
  **Output language rule:** Never use em-dash (—) in any output from this skill. Use commas, colons, or periods instead.
15
22
 
16
23
  **Chinese gut-feel complaints**: when the user says "很傻", "很怪", "突兀", "不协调", "不和谐" about a visual, treat it as an aesthetic rejection, not a debugging symptom. Route to Screenshot Iteration Mode, not to `/hunt`.
@@ -21,6 +28,28 @@ See [rules/durable-context.md](../../rules/durable-context.md) for when to read
21
28
 
22
29
  For `/design`, visual constraints are `decision`, `preference`, and `principle` entries; reusable product and UI patterns are `pattern` and `learning`. Current screenshots, rendered output, code, design tokens, and user feedback override memory. Reuse durable visual preferences and mature interaction patterns, but still name the current visual problem from the screenshot or source before changing code.
23
30
 
31
+ ## Visual Quick-Fix Mode
32
+
33
+ Activate when the user asks for a narrow visual repair with a concrete symptom: overflow, clipped or wrapped text, misalignment, spacing imbalance, contrast/readability, localized text not fitting, or compact responsive breakage. This is for fixing an existing surface, not redesigning it.
34
+
35
+ Flow:
36
+
37
+ 1. Read the current UI evidence: screenshot, rendered page, native view, or responsible component.
38
+ 2. Name the exact visual defect in one sentence.
39
+ 3. Make the smallest material, geometry, spacing, contrast, typography, or text-fit change that fixes that defect.
40
+ 4. Verify the real running surface or generated artifact. Check long words, localized strings, compact states, and at least one narrow viewport when applicable.
41
+ 5. If the fix touches three or more components, changes product behavior, or reveals a direction problem, stop and switch to Screenshot Iteration Mode or Lock the Direction First.
42
+
43
+ **Spacing unification rule.** If a magic spacing or sizing value has been adjusted three times and the layout still looks off, stop tuning. Replace the N independent padding / gap / margin / size values with one shared named token (`Spacing.s4`, `--gap-content`, `gap-4`). Outer container padding defaults to the same value as inner element gap. Asymmetry that survives tuning is structural, not numeric, so more rounds of magic numbers will not converge. Reduce the count of independent values first, then argue about the specific value.
44
+
45
+ **Fixed-height action slot, uniform typography.** Any container that swaps children based on state (status bar, action slot, toolbar row, menu item) must use one font size across every state. Vary fill, stroke, opacity, color, or icon, never font size. A 1pt height delta between `secondary 13px` and `primary 14px` becomes visible jitter at the state transition. CTA pill buttons in the same slot use the same size (typically 14px), distinguished by background and border, not by typography.
46
+
47
+ **Completion screen layout.** Operation-complete surfaces show the single result the user came for: the actual reclaimed size / processed count / changed state. Long explanations belong in a details overlay opened from a summary row, not in the primary completion line. Do not add a separate "Review" button next to the summary row when one tap on the row already opens details; do not show an empty "0 skipped" entry point. If there is no skipped or failed item, hide the details affordance entirely.
48
+
49
+ **Safety-bound action design.** For cleanup, deletion, uninstall, reset, or permission-changing surfaces, do not make the UI feel simpler by hiding recoverability. Bulk select, auto-select, one-tap delete, or "recommended" destructive defaults are only appropriate when each row is understandable to the target user and carries enough identity to verify safety (name, source, owner, path, preview, or recovery implication as relevant). If rows are opaque identifiers, inferred leftovers, or machine-only paths, prefer review-first UI, current-target scoping, disabled destructive affordances, or explanatory grouping over faster batch controls. A feature request for fewer clicks is not enough to remove the user's ability to verify what will change.
50
+
51
+ **Quiet product boundary.** Fewer clicks and richer controls are not automatically better. Remove misleading affordances before adding alternate controls, prefer quiet defaults for diagnostics and alerts, and fix unstable motion cadence before changing speed or adding a new motion preference. If the current UI implies an action, state, or promise it cannot support, remove that implication first.
52
+
24
53
  ## Screenshot Iteration Mode
25
54
 
26
55
  Activate when the user sends a screenshot or image alongside a complaint ("这里很丑", "这个不对", "fix this", "looks wrong"). The existing product is the direction. Skip the five-question direction lock.
@@ -56,7 +85,7 @@ Before writing any code, ask the user directly, using the environment's native q
56
85
  2. **What is the aesthetic direction?** Name it precisely: dense editorial, raw terminal, ink-on-paper, brutalist grid, warm analog. "Clean and modern" is not a direction. If the user names a reference site or product ("feels like Linear / Claude.ai / Vercel"), do not accept it as a direction -- extract 3 concrete properties from it: button radius philosophy, surface depth treatment (shadow vs background step vs border), and accent color family. Name those instead.
57
86
 
58
87
  **Shortcut for well-known brands**: see "Brand preset flow" in `references/design-reference.md`. Ask first, run the preset, then decompose against the generated file.
59
- 3. **What is the one thing this leaves in memory?** A typeface, color system, unexpected motion, asymmetric layout. Pick one and make it obvious.
88
+ 3. **What is the design signature?** A typeface, color system, unexpected motion, asymmetric layout. Pick one and make it obvious.
60
89
  4. **What are the hard constraints?** Framework, bundle size, contrast minimums, keyboard accessibility.
61
90
  5. **What is the signature micro-interaction?** Scale on press, staggered reveal, or contextual icon animation. Pick one and know exactly how it's implemented.
62
91
 
@@ -73,6 +102,10 @@ Lift exact values: hex codes, spacing scale entries, font stacks, border radii.
73
102
 
74
103
  Only attach the target component folder or package. Exclude `.git`, `node_modules`, `dist`, and lock files. Dragging in an entire monorepo pollutes the context with irrelevant code and degrades output quality.
75
104
 
105
+ ### Existing-native-app exception (do not propose wholesale platform restyling)
106
+
107
+ When the target is an existing macOS / iOS / Android native app that already has a coherent visual direction, do not propose a wholesale port to a newer platform style (macOS 26 Liquid Glass, iOS 18 frosted material, Material You, Fluent Design, etc.) as the default improvement plan. Wholesale restyling reads as "I do not have a specific design intent, here is the platform's." Default to incremental polish on the existing direction: spacing, alignment, hover and focus states, typography hierarchy, copy tightening, motion timing. Only propose a platform-style migration when the user has explicitly asked for it in this turn, or when the existing direction is broken in a way that incremental polish cannot fix. State the existing direction in one sentence before proposing changes so the user can correct the read.
108
+
76
109
  ### App shell exception (sidebar + main workspace)
77
110
 
78
111
  If question 1 is an app shell (Slack, Linear, Notion class), load the "App shell rules" section in `references/design-reference.md` and apply those constraints before proceeding.
@@ -108,6 +141,7 @@ Give at least 3 variations across genuinely different dimensions (density, typog
108
141
  | Chose glassmorphism, ignored the mobile constraint | `backdrop-filter` is expensive on low-power devices. Name the tradeoff. |
109
142
  | Light-mode app: white panel on white background, visually indistinguishable | Adjacent nested surfaces must differ visually. Either background step (sidebar vs main ≥4% lightness difference) or shadow minimum `0 1px 3px rgba(0,0,0,0.10)`. |
110
143
  | Fixed visual polish by redesigning the whole surface | Locate the concrete visual delta first, then make the smallest material, opacity, geometry, or typography change that addresses it. |
144
+ | Added a setting or louder control to solve UI noise | Remove the misleading affordance or choose a quiet default first. |
111
145
  | English looked fine, localized text overflowed | Test long words and localized strings before handoff, especially inside buttons, tabs, nav, and compact cards. |
112
146
 
113
147
  ## Aesthetic Review
@@ -1,4 +1,6 @@
1
- # Design Tokens: Color, Typography, and Motion
1
+ # Design Tokens: Color and Typography
2
+
3
+ Motion rules live in [design-reference.md](./design-reference.md) under Animation and Motion Specifics. This file owns color and typography only.
2
4
 
3
5
  ## Color System: OKLCH Rules
4
6
 
@@ -41,13 +43,3 @@ Reject: Inter, DM Sans, DM Serif Display, DM Serif Text, Outfit, Plus Jakarta Sa
41
43
  - `-webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale` once on root layout (macOS only)
42
44
  - `font-variant-numeric: tabular-nums` for counters, timers, prices, number columns
43
45
  - Letter-spacing: roughly -0.022em for display sizes (32px+), -0.012em for mid-range (20-28px), normal at 16px and below
44
-
45
- ## Motion Specifics
46
-
47
- - No bounce or elastic easing. Use exponential ease-out: `cubic-bezier(0.16,1,0.3,1)` for natural deceleration.
48
- - Animate `transform` and `opacity` only. Every other property triggers layout or paint.
49
- - For height reveals: `grid-template-rows: 0fr` to `1fr` (avoids `height: auto` animation trap).
50
- - Icon swaps: 120ms cross-fade with `opacity` and subtle `scale(0.9)` to `scale(1)`.
51
- - Scale on press: `scale(0.96)` on active/press via CSS transitions.
52
- - Page-load guard: `initial={false}` on animated presence wrappers for toggles and tabs (prevents enter animations on first render).
53
- - Honor `prefers-reduced-motion`: disable or reduce animations when set.
@@ -1,11 +1,11 @@
1
1
  ---
2
2
  name: health
3
- description: "Runs a budget-aware Agent Health audit for Codex, Claude Code, Pi, agent instructions, hooks/MCP, verifier surfaces, and AI maintainability. Use when users ask 检查claude/检查codex/检查pi/配置检查/健康度 or report agents ignoring instructions, missing validation, or code becoming hard to maintain. Not for debugging code or reviewing PRs."
3
+ description: "Runs a budget-aware agent-assisted engineering health audit for instruction/config drift, hooks/MCP, verifier surfaces, and AI maintainability. Use when users ask 检查claude/检查codex/检查pi/配置检查/健康度 or report agents ignoring instructions, missing validation, or code becoming hard to maintain. Not for debugging code or reviewing PRs."
4
4
  when_to_use: "检查claude, 检查codex, 检查pi, Codex 配置, Pi 配置, AGENTS.md, config.toml, agent instructions, 健康度, 配置检查, 配置对不对, AI coding 腐化, 代码变烂, 维护性, 上下文混乱, 验证缺失, 验证命令失真, Claude ignoring instructions, Pi coding agent, check config, settings not working, audit config"
5
5
  dispatch_intent: "Codex/Claude/Pi ignoring instructions, agent config audit, hooks/MCP broken, health token usage, AI coding code rot, hotspot ownership, unclear context, missing verification, stale verifier output"
6
6
  ---
7
7
 
8
- # Health: Agent Config and AI Maintainability
8
+ # Health: Agent-Assisted Engineering Health
9
9
 
10
10
  Prefix your first line with 🥷 inline, not as its own paragraph.
11
11
 
@@ -14,6 +14,18 @@ Audit the current project's agent setup and AI coding maintainability against th
14
14
 
15
15
  Find violations. Identify the misaligned layer. Calibrate to project complexity only.
16
16
 
17
+ ## Outcome Contract
18
+
19
+ - Outcome: a budget-aware health report that separates agent configuration risk from AI maintainability risk.
20
+ - Done when: each finding names the misaligned layer, the concrete evidence, and a copy-pasteable action or diagnostic command.
21
+ - Evidence: collected health script output, tracked project instructions, runtime config summaries, verifier logs, hooks/MCP surfaces, and live probes when needed.
22
+ - Output: prioritized findings with status, impact, and next action, or a clear clean bill with residual risk.
23
+
24
+ Two lanes share one report:
25
+
26
+ - **Agent config health**: Codex/Claude/Pi instruction drift, permissions, hooks, MCP, skills, and memory supply chain.
27
+ - **AI maintainability health**: project context surface, verifier wrapper, generated-artifact checks, hotspot ownership, and stale or misleading durable docs.
28
+
17
29
  **Output language:** Check in order: (1) project agent instructions (`AGENTS.md` before runtime-specific files); (2) global agent instructions; (3) user's recent language; (4) English.
18
30
 
19
31
  **Budget posture:** Start with the summary audit. Escalate automatically when the user asks for a deep, full, complete, thorough, "深入", "完整", "彻底", or "继续跑完" audit, when the user explicitly mentions AI coding code rot, Codex/Claude config drift, unclear context, missing verification, verifier output that points at stale paths, or "代码变烂", when current project instructions or remembered user preference says to run deep health checks by default, when the project is Complex, or when the summary pass exposes a critical ambiguity that cannot be resolved locally. Otherwise do not read full conversation extracts or launch inspector subagents. Tell the user before escalating because deep health audits can consume significant token quota.
@@ -78,7 +90,7 @@ Test every MCP server: call one harmless tool per server. Record `live=yes/no` w
78
90
 
79
91
  Run these on every audit, regardless of tier. They are the floor, not the ceiling.
80
92
 
81
- **Deny-list floor.** The project's agent settings should deny, at minimum: credential and key directories (SSH, cloud providers, GPG, gh CLI), secret files (`.env`, `credentials*`, `secrets*`), pipe-to-shell installers (`curl ... | bash`, `wget ... | sh`), and outbound shells (`ssh`, `scp`, `nc`). Flag missing categories as Critical findings; let the reviewer fill in the exact paths from the project's environment.
93
+ **Deny-list floor.** Apply this only when the project or runtime exposes agent permission settings, hook settings, MCP settings, allowed/denied tools, or a documented autonomous-agent launcher. In that case, the settings should deny, at minimum: credential and key directories (SSH, cloud providers, GPG, gh CLI), secret files (`.env`, `credentials*`, `secrets*`), pipe-to-shell installers (`curl ... | bash`, `wget ... | sh`), and outbound shells (`ssh`, `scp`, `nc`). Report this as one concise WARN with the missing categories and suggested fix; let the reviewer fill in exact local paths from the environment. If no agent settings surface exists, report the deny-list as not applicable rather than a failure.
82
94
 
83
95
  **Environment override surface.** Treat the following as attack surface, report when set in tracked files or shipped settings without a justification comment: API base-URL overrides (redirect all traffic to a third party), auto-trust flags for project-local MCP servers, wildcard tool allowlists (`allowedTools: ["*"]`), and permission-skip flags (`--dangerously-skip-permissions` or equivalents). Print file:line and the key name only; never print secrets.
84
96
 
@@ -155,6 +167,8 @@ bash skills/health/scripts/check-agent-context.sh . summary
155
167
 
156
168
  **AI-maintainability gaps.** Use `AI MAINTAINABILITY SUMMARY` in summary mode and `AI MAINTAINABILITY DETAIL` in deep mode. Report `FAIL` when the project has no executable verification command, no agent instruction surface for a non-trivial repo, or broken doc references. Report `WARN` when instructions exist but lack a project map, verification guidance, boundary/non-goal language, when TODO/HACK markers are concentrated, when large source hotspots lack ownership/boundary and verification guidance, or when durable docs contain raw one-off review reports, scorecards, dated line references, or diagnostic dumps instead of stable invariants. Treat missing `docs/`, `specs/`, `.specify/`, `HANDOFF.md`, `CHANGELOG`, issue templates, and PR templates as informational unless project complexity makes them necessary for handoff. The action for stale reports is to extract stable rules into public instructions, rules, references, or verifier scripts, then remove or archive the transient report.
157
169
 
170
+ **Conversation-derived guidance.** When a health audit reads recent agent conversations, do not recommend copying the conversation or a scorecard into docs. Recommend a candidate-matrix pass instead: repeated failure, durable invariant, target public layer, verifier if deterministic, and redaction risk. If the lesson cannot be stated without local paths, issue numbers, customer details, or one-machine state, keep it out of public guidance and leave it as private context.
171
+
158
172
  **Hotspot ownership gaps.** In deep mode, read `HOTSPOT OWNERSHIP SURFACE`. If a largest source file exceeds the hotspot threshold and `AGENTS.md` / `CLAUDE.md` / shared instruction files do not name who owns the hotspot, what boundary should stay stable, and which verification command covers it, report a Structural `WARN`. Do not treat documented large files as code rot by size alone; some modules are intentionally large.
159
173
 
160
174
  **Missing stable verifier wrapper.** If the repo exposes multiple verification commands through CI, scripts, or manifests but `Makefile` has no `check`, `test`, or `verify` target, report a Structural `WARN`. This is an AI-maintainability gap because agents need one stable default entrypoint, not because the project is broken.