@tw93/waza 3.27.0 → 3.28.1
- package/README.md +29 -6
- package/package.json +5 -3
- package/rules/anti-patterns.md +23 -23
- package/scripts/build_metadata.py +28 -16
- package/scripts/check_routing_drift.py +8 -0
- package/scripts/package-skill.sh +2 -3
- package/scripts/setup-rule.sh +1 -1
- package/scripts/setup-statusline.sh +1 -1
- package/scripts/skill_checks.py +30 -2
- package/scripts/statusline.sh +88 -9
- package/scripts/validate_package.py +1 -1
- package/skills/RESOLVER.md +1 -1
- package/skills/check/SKILL.md +39 -17
- package/skills/check/references/project-context.md +7 -6
- package/skills/check/scripts/audit_signals.py +10 -9
- package/skills/design/SKILL.md +4 -1
- package/skills/design/references/design-reference.md +17 -0
- package/skills/health/SKILL.md +37 -24
- package/skills/health/scripts/check_agent_context.py +1 -1
- package/skills/health/scripts/check_maintainability.py +6 -0
- package/skills/health/scripts/collect-data.sh +11 -20
- package/skills/hunt/SKILL.md +22 -2
- package/skills/hunt/references/failure-patterns.md +18 -0
- package/skills/read/scripts/fetch.sh +8 -7
- package/skills/read/scripts/fetch_feishu.py +11 -6
- package/skills/think/SKILL.md +24 -8
- package/skills/write/SKILL.md +47 -6
- package/skills/write/references/write-en.md +19 -17
- package/skills/write/references/write-product-localization.md +43 -0
- package/skills/write/references/write-zh-bilingual.md +2 -3
- package/skills/write/references/write-zh-prose.md +2 -0
- package/skills/write/references/write-zh-release-notes.md +2 -0
- package/skills/write/references/write-zh.md +70 -71
package/skills/check/SKILL.md
CHANGED
|
@@ -54,22 +54,6 @@ Pick the mode that matches the user's intent, then read that section in full. Mo
|
|
|
54
54
|
|
|
55
55
|
Before any mode, run [Project Context Extraction](#project-context-extraction) and (if memory is in scope) [Durable Context Preflight](#durable-context-preflight).
|
|
56
56
|
|
|
57
|
-
## Plan Execution Mode
|
|
58
|
-
|
|
59
|
-
Activate when the user's message starts with "Implement the following plan", "按计划实施", "按照计划", "整", "可以干", "直接改" followed by a plan body, or links to a `/think` output.
|
|
60
|
-
|
|
61
|
-
In this mode, do not run a code review. Instead:
|
|
62
|
-
|
|
63
|
-
1. State which plan is being executed (first heading or summary line).
|
|
64
|
-
2. Check for obvious repo drift: run `git status --short --branch -uall` and skim any changed files that contradict the plan. If drift makes the plan unsafe, name the specific conflict and stop.
|
|
65
|
-
3. Work through each plan item as a to-do. Mark each complete as you go.
|
|
66
|
-
4. After all items are done, run the project's verification command.
|
|
67
|
-
5. Transition automatically into Ship mode if the project context or current thread indicates review-then-ship.
|
|
68
|
-
|
|
69
|
-
## Default Continuation (review-then-ship)
|
|
70
|
-
|
|
71
|
-
When the project's `AGENTS.md` or the current thread explicitly asks to "commit after review", "ship if green", or equivalent, transition directly from review to the Ship flow after a clean review. Do not ask again. State "proceeding to ship" before acting.
|
|
72
|
-
|
|
73
57
|
## Project Context Extraction
|
|
74
58
|
|
|
75
59
|
This is Waza's public, standalone code-review capability. It should not depend on private machine paths or unpublished project instructions.
|
|
@@ -92,6 +76,22 @@ See [rules/durable-context.md](../../rules/durable-context.md) for when to read
|
|
|
92
76
|
|
|
93
77
|
For `/check`, private task constraints are `decision`, `preference`, and `principle` entries; review checklists are `pattern` and `learning`. Current code, diff, public docs, CI, tests, and remote state override memory. Durable memory can explain user intent and preferred follow-through, but public project rules still come from README files, manifests, CI workflows, release docs, the diff, and explicit instructions in the current thread. Never cite private memory as a public project requirement.
|
|
94
78
|
|
|
79
|
+
## Plan Execution Mode
|
|
80
|
+
|
|
81
|
+
Activate when the user's message starts with "Implement the following plan", "按计划实施", "按照计划", "整", "可以干", "直接改" followed by a plan body, or links to a `/think` output.
|
|
82
|
+
|
|
83
|
+
In this mode, do not run a code review. Instead:
|
|
84
|
+
|
|
85
|
+
1. State which plan is being executed (first heading or summary line).
|
|
86
|
+
2. Check for obvious repo drift: run `git status --short --branch -uall` and skim any changed files that contradict the plan. If drift makes the plan unsafe, name the specific conflict and stop.
|
|
87
|
+
3. Work through each plan item as a to-do. Mark each complete as you go.
|
|
88
|
+
4. After all items are done, run the project's verification command.
|
|
89
|
+
5. Transition automatically into Ship mode if the project context or current thread indicates review-then-ship.
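The drift check in step 2 can be sketched as a small parser over the output of `git status --short --branch -uall`. This is a minimal illustration, not part of Waza: the function names and the idea of passing the plan's target paths as a set are assumptions, and rename entries (`R  old -> new`) are not handled.

```python
def parse_status(output: str) -> dict:
    """Split `git status --short --branch` output into the branch line and changed paths."""
    branch = ""
    changed = []
    for line in output.splitlines():
        if line.startswith("## "):
            branch = line[3:]
        elif line.strip():
            # Short format: two status columns, one space, then the path.
            changed.append((line[:2], line[3:]))
    return {"branch": branch, "changed": changed}


def plan_conflicts(output: str, plan_paths: set) -> list:
    """Return already-dirty paths that the plan also intends to edit (hypothetical helper)."""
    status = parse_status(output)
    return sorted(path for _, path in status["changed"] if path in plan_paths)
```

A non-empty result from `plan_conflicts` is a candidate for the "name the specific conflict and stop" branch; whether the drift actually makes the plan unsafe still needs a read of the files.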
|
|
90
|
+
|
|
91
|
+
## Default Continuation (review-then-ship)
|
|
92
|
+
|
|
93
|
+
When the project's `AGENTS.md` or the current thread explicitly asks to "commit after review", "ship if green", or equivalent, transition directly from review to the Ship flow after a clean review. Do not ask again. State "proceeding to ship" before acting.
|
|
94
|
+
|
|
95
95
|
## Get the Diff
|
|
96
96
|
|
|
97
97
|
Get the full diff between the current branch and the base branch. If unclear, ask. If already on the base branch, ask which commits to review.
|
|
@@ -141,13 +141,24 @@ This mode extends review; it does not skip review. Before any public or irrevers
|
|
|
141
141
|
1. Extract release rules from public project context: README, manifests, CI workflows, release notes, package scripts, changelogs, and explicit user instructions in the current thread.
|
|
142
142
|
2. Fill the Release Gate 2.0 matrix from `references/project-context.md`: review base, dirty/staged/untracked state, latest tag, origin sync, version fields, generated artifacts, package/archive contents, release assets, registry/appcast/CI, and public issue/PR state.
|
|
143
143
|
3. Verify generated or bundled outputs, version fields, release notes, package contents, and required artifacts are in sync. Prefer dry-run commands when the ecosystem provides them.
|
|
144
|
-
Generated deliverables include tracked archives, ignored dist files, appcasts, site/download copy, registry packages, checksums, and release assets. If project docs require them, regenerate, inspect, and stage or upload them explicitly even when they are ignored by git; do not infer readiness from source-only tests.
|
|
144
|
+
Generated deliverables include tracked archives, ignored dist files, appcasts, site/download copy, registry packages, checksums, and release assets. If project docs require them, regenerate, inspect, and stage or upload them explicitly even when they are ignored by git; do not infer readiness from source-only tests. For remote assets, prefer downloading or reading back the published artifact and comparing entries, checksums, or manifest contents; release page text, file size, or workflow success alone is not artifact proof.
|
|
145
|
+
If the project has preview, beta, nightly, stable, or App Store lanes, name the lane explicitly. Do not use a preview or beta artifact to claim stable release readiness, and do not touch stable appcast, registry, or download surfaces when the requested lane is preview-only unless project docs require it.
|
|
145
146
|
4. Commit only intended files. Preserve unrelated dirty work, serialize git operations so index locks or overlapping adds do not corrupt the workflow, and re-check HEAD/status before pushing so concurrent agent or maintainer commits are not swept into your ship action.
|
|
146
147
|
5. Push, publish, tag, or create a release only when the user has explicitly approved that action. If auth, OTP, CI, registry, or network state blocks the operation, pause and report the exact blocker.
|
|
147
148
|
6. For issue/PR follow-through, confirm the item identity with the host's read command before posting. On GitHub, use `gh issue view` or `gh pr view`; on other hosts, use the CLI/API named by project docs or the current request. Use `references/public-reply.md` for the maintainer reply template (mention, single thanks, facts, explicit next release or verification step) and its closure criteria.
|
|
148
149
|
7. For GitHub release reaction follow-through, only do it when project context or the current thread asks for it. After the release exists and required assets are verified, resolve the release id from the tag, POST every positive release reaction to `repos/<owner>/<repo>/releases/<id>/reactions` with `gh api` or the available GitHub tool, and re-read reactions to confirm. Positive release reactions are `+1`, `laugh`, `heart`, `hooray`, `rocket`, and `eyes`.
|
|
149
150
|
8. After network or API failures, re-read the end state instead of assuming success or failure.
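The remote-asset proof from step 3 (download the published artifact and compare checksums rather than trusting page text or file size) could look roughly like the sketch below. It assumes the release publishes a `sha256sum`-style manifest and that the assets were already downloaded into a local directory; the function names are illustrative, not part of Waza.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large release assets do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict:
    """Parse `sha256sum`-style lines of the form '<hex>  <name>'."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify_assets(download_dir: Path, manifest_text: str) -> list:
    """Return a list of problems; an empty list means every listed asset matched."""
    problems = []
    for name, digest in parse_manifest(manifest_text).items():
        asset = download_dir / name
        if not asset.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(asset) != digest:
            problems.append(f"checksum mismatch: {name}")
    return problems
```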
|
|
150
151
|
|
|
152
|
+
### Reworked Or Cancelled Release Gate
|
|
153
|
+
|
|
154
|
+
Activate this gate when a release candidate was cancelled, a preview or beta had repeated bug-fix churn, or the user asks whether a delayed release is finally safe.
|
|
155
|
+
|
|
156
|
+
1. Lock the review base to the last public stable tag or release artifact, then review through current `HEAD`. Do not limit the review to recent commits or the latest local diff.
|
|
157
|
+
2. Record the exact base, `HEAD`, dirty state, origin sync, version fields, generated artifacts, release notes, package contents, CI, and remote distribution state. If any state changes mid-review, refresh the range and rerun the fast gates.
|
|
158
|
+
3. Review by shipped risk surface: user-reported regressions, crash or hang paths, destructive operations, privilege or permission boundaries, background workers, startup or first-frame work, update feeds, package contents, and public support claims.
|
|
159
|
+
4. Output two release decisions, not one: whether the preview or beta can keep taking user testing, and whether stable release prep can start.
|
|
160
|
+
5. Every conclusion must name blockers, deferrable maintenance, commands that ran, and runtime or user-smoke coverage. Source tests alone cannot prove a reworked UI/native release ready.
|
|
161
|
+
|
|
151
162
|
End with the concrete shipped state: commit hash, tag, release URL, registry/version result, pushed branch, release asset state, release reaction state, issue/PR state, and any remaining blockers. Omit fields that do not apply.
|
|
152
163
|
|
|
153
164
|
## Project Audit Mode
|
|
@@ -219,6 +230,8 @@ Measure the diff and classify depth:
|
|
|
219
230
|
|
|
220
231
|
State the depth before proceeding.
|
|
221
232
|
|
|
233
|
+
Static content diffs can stay quick even when they touch several generated files: version strings, dates, release-copy mirrors, sitemap dates, or one-for-one localization copy changes usually need line-by-line readback plus grep consistency, not a specialist fleet. Escalate only when the diff changes logic, generation rules, public distribution behavior, or user-facing semantics beyond the literal text replacement.
|
|
234
|
+
|
|
222
235
|
## Did We Build What Was Asked?
|
|
223
236
|
|
|
224
237
|
Before reading code, check scope drift: do the diff and the stated goal match? Label: **on target** / **drift** / **incomplete**.
|
|
@@ -237,6 +250,10 @@ Drift signals (examples, not exhaustive -- any one is enough to label drift):
|
|
|
237
250
|
|
|
238
251
|
When the diff fixes one instance of a class-of-bug (a missing validation, a wrong selector, an off-by-one, a missing lock), the same shape often lives elsewhere. Extract the pattern signature, `grep -rn` it across the repo (exclude generated dirs), and confirm sibling instances were also handled. List any unswept sibling: flag it as a hard stop when it carries the same risk, advisory when lower-risk. For a deeper sweep playbook, see hunt's Scope Blast Mode.
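A sibling sweep of this kind can also be scripted instead of run as `grep -rn`. The sketch below is one possible shape, assuming a regex pattern signature and a hypothetical set of generated directories to skip; adjust both to the repository at hand.

```python
import re
from pathlib import Path

# Assumed examples of generated or vendored directories; not a list taken from Waza.
EXCLUDED_DIRS = {".git", "node_modules", "dist", "build"}


def sweep(root: Path, pattern: str) -> list:
    """Return (relative path, line number, line) for every match outside excluded dirs."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or EXCLUDED_DIRS.intersection(rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((rel.as_posix(), lineno, line.strip()))
    return hits
```

Each hit that the diff did not touch is an unswept sibling to classify as a hard stop or advisory.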
|
|
239
252
|
|
|
253
|
+
## Testability Seam For Recurring Bugs
|
|
254
|
+
|
|
255
|
+
When the diff fixes a visual, layout, timing, or stateful-UI bug that has recurred (the same area broke before, or the fix reads as "tune a number until it looks right"), a code change alone will let the regression return: the logic is entangled with mutable render or UI state, so there is nowhere to assert on it. Flag the fix as incomplete unless it pulls the decision into a pure function -- inputs in, value out, no mutable receiver -- and unit-tests the invariant that was violated (a width never collapses to zero, a hit region stays half-open, an offset stays in bounds). "Verified by running the app" confirms this one instance; only a pinned invariant stops the next one. Reserve this for classes that recur or that runtime checks cannot see; do not demand a seam for one-off logic that already has straightforward coverage.
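As an illustration of such a seam, the sketch below pulls a hypothetical column-width decision into a pure function so the "width never collapses to zero" invariant can be asserted directly. The function name, parameters, and the `min_width` floor are assumptions made for the example, not code from any reviewed project.

```python
def column_width(available: float, columns: int, gap: float, min_width: float = 1.0) -> float:
    """Pure layout decision: inputs in, value out, no UI or render state touched."""
    if columns < 1:
        raise ValueError("columns must be >= 1")
    usable = available - gap * (columns - 1)
    # The pinned invariant: the result never drops below min_width.
    return max(min_width, usable / columns)


def check_width_invariant() -> None:
    """Unit-test style sweep over inputs, including containers narrower than the gaps."""
    for available in (0, 10, 100, 1280):
        for columns in (1, 2, 4, 12):
            assert column_width(available, columns, gap=10) >= 1.0
```

Because the function has no mutable receiver, the test runs without launching the app, which is what lets the invariant catch the next regression rather than only this instance.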
|
|
256
|
+
|
|
240
257
|
## CLI Command Surface
|
|
241
258
|
|
|
242
259
|
When a diff touches a CLI entrypoint, installer, completion, config/env handling, package wrapper, or a mutating command such as cleanup, update, uninstall, migration, or cache removal, fill the CLI Command Surface from `references/project-context.md` before sign-off.
|
|
@@ -251,6 +268,7 @@ Examples, not exhaustive -- flag any diff that could cause irreversible harm if
|
|
|
251
268
|
|
|
252
269
|
- **No unverified claims.** Do not write "I verified X", "I ran Y", "tests pass", or "this fixes Z" unless the shell output is in this turn's transcript. If you reason about behavior without running, say "based on reading the code" instead of "I verified". Every verification claim in the sign-off must point to a command that actually ran in this session.
|
|
253
270
|
- **Re-read before citing source-of-truth facts.** Before writing a line number, dirty-file count, branch ahead/behind state, fallback behavior, locale coverage, or release artifact state into a handoff or review report, re-read the source in this turn (`git status`, `git diff`, file `Read`, `rg`, command output). Earlier chat context, prior agent's notes, and your own recall from a hundred turns ago are stale by default; restating "the catalog uses en fallback" or "the file is at line 310" without checking has been the recurring failure mode in long sessions. Cite the verification path inline (`per current Read of <file>` / ``per `git status` this turn``) so reviewers know which facts are anchored.
|
|
271
|
+
- **String-matching on captured output?** When a diff branches on, greps, or classifies an error message or command output, verify what that string actually holds at runtime before approving. A subprocess spawned with `stdio: 'inherit'` (or any uncaptured pipe) streams its diagnostics to the terminal, not into `error.message` -- which then contains only the command line. Such a matcher silently matches the command, not the output: it can pass tests, fire on the wrong token, or be dead in production while looking correct. Probe the real `error.message` (a one-line repro) instead of assuming, and prefer driving behavior off a structured fact the caller already holds (build target, exit code) over re-parsing a string.
|
|
254
272
|
- **Destructive auto-execution**: any task marked "safe" or "auto-run" that modifies user-visible state (history files, config, preferences, installed software) must require explicit confirmation.
|
|
255
273
|
- **Release artifacts missing**: verify every artifact listed in release notes, release templates, or project workflows exists and has been uploaded before declaring done.
|
|
256
274
|
- **Generated artifact drift**: if source changes require generated or bundled outputs, verify the output was regenerated and included.
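The string-matching hazard above is described in terms of Node's `stdio: 'inherit'`. A rough Python analog, offered only as an illustration, is `subprocess.run` without capture: the raised `CalledProcessError` carries the command and exit code, while the child's diagnostics went straight to the terminal. The child command and the `E_TARGET` token below are made up for the probe.

```python
import subprocess
import sys

# Hypothetical failing child: writes a diagnostic to stderr and exits 3.
# The token is assembled at runtime so it does not appear in the command line itself.
FAILING = [sys.executable, "-c", "import sys; sys.stderr.write('E_' + 'TARGET'); sys.exit(3)"]


def run_uncaptured() -> subprocess.CalledProcessError:
    """Child inherits our stderr, so the error object never sees the diagnostic."""
    try:
        subprocess.run(FAILING, check=True)
    except subprocess.CalledProcessError as err:
        return err
    raise AssertionError("command unexpectedly succeeded")


def run_captured() -> subprocess.CalledProcessError:
    """Same command with output captured, so the diagnostic is available on the error."""
    try:
        subprocess.run(FAILING, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        return err
    raise AssertionError("command unexpectedly succeeded")
```

In the uncaptured case a matcher on `str(err)` can only ever match the command line, so branching on `err.returncode` (a structured fact) is the safer driver.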
|
|
@@ -313,6 +331,8 @@ Load `references/persona-catalog.md` to determine which specialists activate. La
|
|
|
313
331
|
|
|
314
332
|
Merge findings: when two specialists flag the same code location, keep the higher severity and note cross-reviewer agreement. Findings on different code locations are never duplicates even if they share a theme.
|
|
315
333
|
|
|
334
|
+
Treat each specialist finding as a claim to verify, not a fact to act on. Before routing a finding to Autofix or sign-off, re-read the cited code this turn and confirm it is real and live: not already handled elsewhere, not consistent-by-design, not a latent-only risk labeled as a live bug. Parallel reviewers over-report from name-based inference and partial context; drop or downgrade what dissolves on direct read, and cite the verification path.
|
|
335
|
+
|
|
316
336
|
## Autofix Routing
|
|
317
337
|
|
|
318
338
|
| Class | Definition | Action |
|
|
@@ -340,6 +360,8 @@ If the script exits non-zero or prints `(no test command detected)`: halt. Do no
|
|
|
340
360
|
|
|
341
361
|
For bug fixes: a regression test that fails on the old code must exist before the fix is done.
|
|
342
362
|
|
|
363
|
+
In a dirty or multi-agent checkout, a passing local build or test run is not proof your change is sound: unrelated WIP already in the tree can supply missing symbols, mask a break, or fail for reasons unrelated to you. Verify in isolation -- `git worktree add --detach <path> <known-good-commit>`, `git apply` only the diff of the files you own, then build/test there. The clean isolated pass is the real signal; the contaminated local pass is not.
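The isolation steps can be laid out as an explicit command plan before anything runs. The sketch below only builds the argument lists; the helper name, the patch-file handoff, and the use of `git diff <base> -- <files>` to produce the patch are assumptions about one reasonable way to do it. Note that `git worktree add` takes the new worktree path before the commit.

```python
def isolation_plan(base: str, worktree: str, patch: str, owned_files: list) -> list:
    """Return argv lists for an isolated verification run (nothing is executed here)."""
    return [
        # 1. Capture only the diff of the files you own; stdout is saved to `patch`.
        ["git", "diff", base, "--", *owned_files],
        # 2. Create a detached worktree at the known-good commit.
        ["git", "worktree", "add", "--detach", worktree, base],
        # 3. Apply the saved patch inside the clean worktree (use an absolute patch path).
        ["git", "-C", worktree, "apply", patch],
    ]
```

The project's own build or test command then runs inside the worktree directory; that pass, not the one in the dirty checkout, is the signal to cite.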
|
|
364
|
+
|
|
343
365
|
## Gotchas
|
|
344
366
|
|
|
345
367
|
| What happened | Rule |
|
|
@@ -21,11 +21,13 @@ Use this template to compress repository context before running Waza `/check`. T
|
|
|
21
21
|
- Generated or bundled artifacts that must stay in sync with source changes.
|
|
22
22
|
- Packaging source of truth: whether archives are built from `git ls-files`, explicit allowlists, generated manifests, or source directories.
|
|
23
23
|
- Delivery surfaces: whether generated outputs are tracked, ignored, external release assets, registry uploads, appcasts, installer metadata, checksums, or site/download copy; how they are regenerated, inspected, staged, or uploaded.
|
|
24
|
+
- Distribution lanes: preview, beta, nightly, stable, App Store, or registry channels, and which generated artifacts belong to each lane.
|
|
24
25
|
- CLI command surfaces: entrypoints, subcommands, flags, help/version behavior, exit codes, stdout/stderr contract, TTY and non-interactive paths, config/env precedence, and installed-runtime checks.
|
|
25
26
|
- Runtime dependencies introduced by the diff: Python packages, CLIs, network services, package managers, or platform tools that are not already declared in CI/docs.
|
|
26
27
|
- Domain-specific safety rules.
|
|
27
28
|
- Release artifacts that must exist.
|
|
28
29
|
- GitHub release reactions or other public release follow-through expected by the project.
|
|
30
|
+
- Release-asset verification method: download, archive entry comparison, checksum manifest, package metadata readback, appcast readback, or registry query.
|
|
29
31
|
- Public issue or PR reply conventions.
|
|
30
32
|
- Known CI or test flakes documented by the project and how to distinguish them from real failures.
|
|
31
33
|
- Release, publish, push, or issue-closure prerequisites documented by the project.
|
|
@@ -72,17 +74,15 @@ Use this template to compress repository context before running Waza `/check`. T
|
|
|
72
74
|
|
|
73
75
|
## Public Replies
|
|
74
76
|
|
|
75
|
-
-
|
|
76
|
-
- Do not post comments, close issues, or merge PRs without maintainer approval.
|
|
77
|
-
- For accepted PRs, prefer updating the contributor branch and merging the PR; close without merge only when the direction is rejected, unsafe, out of scope, or the branch cannot be updated and a maintainer commit is explicitly needed.
|
|
78
|
-
- Default reply shape: `@<user>` + thanks, brief reason/action, then update command, release/version, or next step.
|
|
79
|
-
- Keep shipped-fix replies to 1-2 natural sentences unless the project explicitly uses a longer template.
|
|
77
|
+
See `public-reply.md` for the full reply template (language match, `@user` + thanks, factual paragraphs, ship-state line, closure criteria). It is the single source; do not restate the rules here.
|
|
80
78
|
|
|
81
79
|
## Release Follow-through
|
|
82
80
|
|
|
83
81
|
- Version fields to check: `<manifest>`, `<app config>`, `<lockfile>`.
|
|
84
82
|
- Generated artifacts to check: `<artifact>` from `<source>`.
|
|
83
|
+
- Distribution lane: `<preview/beta/nightly/stable/etc.>` and which public surfaces it is allowed to touch.
|
|
85
84
|
- Dry-run command before publishing: `<command>`.
|
|
85
|
+
- Remote asset proof: `<download/readback command>` that checks content, manifest, digest, appcast, or registry state.
|
|
86
86
|
- GitHub release reactions to add after asset verification: `<+1/laugh/heart/hooray/rocket/eyes or none>`.
|
|
87
87
|
- Public state to re-read after publishing or closing: `<registry/release/issue URL or command>`.
|
|
88
88
|
```
|
|
@@ -99,10 +99,11 @@ Fill this before claiming a change is release-ready. Use "n/a" only when the pro
|
|
|
99
99
|
| Worktree state | Dirty, staged, and untracked files accounted for |
|
|
100
100
|
| Remote state | `origin/main` or release branch sync checked |
|
|
101
101
|
| Version fields | Manifest, app config, changelog, appcast, and lockfile versions aligned |
|
|
102
|
+
| Distribution lane | Preview, beta, nightly, stable, registry, or app-store lane named, with unrelated lanes left untouched |
|
|
102
103
|
| Runtime dependencies | Newly introduced Python packages, CLIs, package managers, and network tools declared and available in CI |
|
|
103
104
|
| Generated artifacts | Tracked archives, ignored dist outputs, bundled/minified files, appcasts, installer metadata, checksums, and site/download copy regenerated or proven not needed |
|
|
104
105
|
| Package/archive contents | Built package inspected for required files, newly introduced helpers/references, and missing extras |
|
|
105
|
-
| Release assets | GitHub release, appcast, download archive, checksum, or installer assets verified |
|
|
106
|
+
| Release assets | GitHub release, appcast, download archive, checksum, or installer assets downloaded or read back and verified beyond page text or file size |
|
|
106
107
|
| Registry/appcast | npm/crates/Homebrew/appcast/App Store or equivalent state re-read after publish |
|
|
107
108
|
| CI status | Latest required checks passed or blocker named |
|
|
108
109
|
| Issue/PR state | Target issue or PR re-read before commenting, closing, merging, or saying shipped |
|
|
@@ -85,6 +85,12 @@ CLI_CORE_BUCKETS = (
|
|
|
85
85
|
)
|
|
86
86
|
|
|
87
87
|
|
|
88
|
+
# The file-walk helpers below are deliberately duplicated in
|
|
89
|
+
# skills/health/scripts/check_maintainability.py. Both scripts ship
|
|
90
|
+
# standalone (see packaging.allowlist) and run inside an arbitrary target
|
|
91
|
+
# project, so they import only stdlib. Do not hoist them into a shared
|
|
92
|
+
# scripts/ module: it is dev-only, not on the ship allowlist, and would
|
|
93
|
+
# couple a standalone tool to the install layout.
|
|
88
94
|
def is_excluded(path: Path, root: Path) -> bool:
|
|
89
95
|
try:
|
|
90
96
|
parts = path.relative_to(root).parts
|
|
@@ -158,23 +164,18 @@ def status(label: str) -> None:
|
|
|
158
164
|
|
|
159
165
|
def block_hotspots(files: list[Path], root: Path) -> None:
|
|
160
166
|
header("FILE SIZE HOTSPOTS")
|
|
167
|
+
sized = ((p, line_count(p)) for p in files if p.suffix in SOURCE_EXTS)
|
|
161
168
|
big = sorted(
|
|
162
|
-
(
|
|
163
|
-
if p.suffix in SOURCE_EXTS and line_count(p) >= HOTSPOT_LINES),
|
|
169
|
+
(item for item in sized if item[1] >= HOTSPOT_LINES),
|
|
164
170
|
key=lambda x: -x[1],
|
|
165
171
|
)[:10]
|
|
166
172
|
if not big:
|
|
167
|
-
print("(no source files >=
|
|
173
|
+
print(f"(no source files >= {HOTSPOT_LINES} lines)")
|
|
168
174
|
status("PASS")
|
|
169
175
|
return
|
|
170
176
|
for path, n in big:
|
|
171
177
|
print(f" {n:>5} {rel(path, root)}")
|
|
172
|
-
if any(n >= HOTSPOT_FAIL for _, n in big)
|
|
173
|
-
status("FAIL")
|
|
174
|
-
elif len(big) > 3:
|
|
175
|
-
status("WARN")
|
|
176
|
-
else:
|
|
177
|
-
status("WARN")
|
|
178
|
+
status("FAIL" if any(n >= HOTSPOT_FAIL for _, n in big) else "WARN")
|
|
178
179
|
|
|
179
180
|
|
|
180
181
|
def block_heredoc(files: list[Path], root: Path) -> None:
|
package/skills/design/SKILL.md
CHANGED
|
@@ -22,6 +22,8 @@ If it could have been generated by a default prompt, it is not good enough.
|
|
|
22
22
|
|
|
23
23
|
**Chinese gut-feel complaints**: when the user says "很傻", "很怪", "突兀", "不协调", "不和谐" about a visual, treat it as an aesthetic rejection, not a debugging symptom. Route to Screenshot Iteration Mode, not to `/hunt`.
|
|
24
24
|
|
|
25
|
+
**Document & print typography → Kami.** When the deliverable is a shippable document rather than a product UI surface (report, slide deck, resume, long-form or print-oriented page, paged PDF), do not hand-roll an over-designed document layout here. Suggest the user run it through Kami (`tw93/Kami`), a document design system with a fixed constraint language and templates, and let Kami draft the detailed plan. Screen typography and layout (app surfaces, components, web pages) stays in this skill.
|
|
26
|
+
|
|
25
27
|
## Durable Context Preflight
|
|
26
28
|
|
|
27
29
|
See [rules/durable-context.md](../../rules/durable-context.md) for when to read durable context, the read-order budget, and the memory-type mapping.
|
|
@@ -123,7 +125,7 @@ Summarize the direction as three lines before writing any code:
|
|
|
123
125
|
|
|
124
126
|
For production or multi-page UIs, expand the thesis into the 9-section DESIGN.md scaffold in `references/design-reference.md` (theme, palette, typography, components, layout, depth, do/don't, responsive, prompt guide). For a single component, the three lines are sufficient.
|
|
125
127
|
|
|
126
|
-
##
|
|
128
|
+
## Hard Rules
|
|
127
129
|
|
|
128
130
|
`references/design-reference.md` is already loaded during direction lock. It owns the full rules: typography, OKLCH color, motion timings, layout defaults, CSS-pattern bans, accessibility baseline, and complexity matching. Apply them. Do not restate them here.
|
|
129
131
|
|
|
@@ -143,6 +145,7 @@ Give at least 3 variations across genuinely different dimensions (density, typog
|
|
|
143
145
|
| Fixed visual polish by redesigning the whole surface | Locate the concrete visual delta first, then make the smallest material, opacity, geometry, or typography change that addresses it. |
|
|
144
146
|
| Added a setting or louder control to solve UI noise | Remove the misleading affordance or choose a quiet default first |
|
|
145
147
|
| English looked fine, localized text overflowed | Test long words and localized strings before handoff, especially inside buttons, tabs, nav, and compact cards. |
|
|
148
|
+
| Relied on `…` truncation to fit text in a fixed-width slot | Guarantee fit instead: compact the format, cap to whole segments, or hard-trim with no glyph. Metric and label footers must never tail-truncate into an ellipsis. |
|
|
146
149
|
|
|
147
150
|
## Aesthetic Review
|
|
148
151
|
|
|
@@ -123,6 +123,13 @@ When extending an existing interface, first spend time understanding its visual
|
|
|
123
123
|
|
|
124
124
|
If swapping in different content would make the new component look out of place, the vocabulary was not matched closely enough.
|
|
125
125
|
|
|
126
|
+
### Responsive & Screen Verification
|
|
127
|
+
- Verify the rendered surface, not a type check or CSS-balance read. Several regressions (early wraps, orphaned separator dots, table overflow) are invisible in source and only show in the render. Screenshot at phone (375px, plus 320px for buttons) and desktop (1280px), in every shipped locale.
|
|
128
|
+
- Line widows: eliminate 1-2 word last lines by trimming the copy so the block rebalances, not by adding a `max-width` cap (a cap narrower than its container wraps early and leaves empty space on the right, which reads as a premature break). Detect objectively: flag any text block whose last line is under ~13% of its widest line; eyeballing misses them, and nested `<code>` hides them from greps.
|
|
129
|
+
- Mobile CTA resting state: natural width, left-aligned to the surrounding text edge, height unchanged. Centering reads as floating; full-width `flex: 1` reads heavy; dropping button height to relieve a "too full" feel treats a width problem as a height one.
|
|
130
|
+
- Spacing is a system, not a per-gap value. Run section spacing as one responsive ladder; when a page reads too airy or too tight, scale the whole set by a single factor across all breakpoints rather than tuning one gap. Asymmetry that survives tuning is structural.
|
|
131
|
+
- Long-form and documentation surfaces stay light: a borderless prev/next text pager (not bordered cards), a sidebar active state as a thin rail rather than a filled block, and build-time zero-runtime-JS code highlighting (bake static spans, plain code stays the source) over a shipped highlighter.
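The objective widow check in the list above (last line under roughly 13% of the widest line) reduces to a few lines once per-line widths are measured. The sketch assumes the widths were already collected from the rendered page, for example in pixels; how they are measured is outside this example.

```python
def has_widow(line_widths: list, threshold: float = 0.13) -> bool:
    """Flag a text block whose last rendered line is under `threshold` of its widest line."""
    if len(line_widths) < 2:
        return False  # a single-line block cannot have a widow
    widest = max(line_widths)
    return widest > 0 and line_widths[-1] < threshold * widest
```

For a block with line widths `[320, 318, 30]` the last line is about 9% of the widest, so it is flagged; the fix suggested above is to trim the copy, not to add a `max-width` cap.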
|
|
132
|
+
|
|
126
133
|
## Data Visualization Surfaces
|
|
127
134
|
|
|
128
135
|
For dashboards, analytics views, chart-heavy interfaces, or number-dense displays, load `references/design-data-viz.md`. It owns dashboard defaults, chart selection, number alignment, and product-benchmark extraction.
|
|
@@ -140,6 +147,16 @@ Reject: Inter, DM Sans, DM Serif Display, DM Serif Text, Outfit, Plus Jakarta Sa
|
|
|
140
147
|
3. Reject all three.
|
|
141
148
|
4. Pick a typeface from a named foundry (Klim, Commercial Type, Colophon, Grilli Type, OH no Type, Village, etc.) or an open-source option with a clear personality that matches the brand words. Be able to explain why that specific typeface in one sentence.
|
|
142
149
|
|
|
150
|
+
## CJK & Multilingual Type
|
|
151
|
+
|
|
152
|
+
When the interface mixes Chinese, Japanese, or Korean with Latin, Latin-only type rules silently break the CJK text. Apply these before handoff:
|
|
153
|
+
|
|
154
|
+
- **Latin face first, system CJK face after** in the stack, so each script renders with correct glyphs: `font-family: -apple-system, "SF Pro Text", "PingFang SC", "Noto Sans SC", sans-serif;`. Latin runs use the Latin face; Han characters fall through to the CJK face.
|
|
155
|
+
- **Give CJK body text more line-height than Latin**: roughly 1.7–1.8 for reading. Dense Hanzi needs more vertical room than the 1.4–1.5 that suits Latin body copy.
|
|
156
|
+
- **Tag runs with `lang="zh"` / `lang="ja"` / `lang="en"`** so the browser picks the right font and line-breaking. Mixed-language paragraphs break badly without it.
|
|
157
|
+
- **Serif reading modes need an explicit CJK serif fallback.** Most Latin "reading serif" webfonts carry no CJK glyphs, so a serif toggle silently drops Chinese back to a sans and looks broken. Pair them: `"Newsreader", "Songti SC", "Noto Serif SC", serif`.
|
|
158
|
+
- **Do not apply negative letter-spacing to CJK runs.** The display-type tracking rule above is Latin-only; tightening tracking on Hanzi cramps the glyphs and reads as a rendering bug. Scope tracking to `lang="en"` runs.
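The `lang` tagging rule can be approximated with a small run splitter. This is a deliberately simplified sketch: it treats only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as Han, tags every Han run with one assumed language, and leaves CJK punctuation, kana, and Hangul in the Latin bucket. Real content needs the author's knowledge of which language each run is in.

```python
from html import escape
from itertools import groupby


def is_han(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block only (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def tag_runs(text: str, cjk_lang: str = "zh", latin_lang: str = "en") -> str:
    """Wrap consecutive Han and non-Han runs in spans carrying a lang attribute."""
    parts = []
    for han, group in groupby(text, key=is_han):
        run = "".join(group)
        lang = cjk_lang if han else latin_lang
        parts.append(f'<span lang="{lang}">{escape(run)}</span>')
    return "".join(parts)
```

With runs tagged this way, tracking rules can be scoped to `lang="en"` spans so negative letter-spacing never reaches the Hanzi.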
|
|
159
|
+
|
|
143
160
|
## Color System: OKLCH Rules
|
|
144
161
|
|
|
145
162
|
- Use OKLCH instead of HSL. OKLCH is perceptually uniform: equal numeric changes produce equal perceived changes across the spectrum.
|
package/skills/health/SKILL.md
CHANGED
|
@@ -40,13 +40,11 @@ For `/health`, audit expectations are `decision`, `preference`, and `principle`
|
|
|
40
40
|
|
|
41
41
|
Pick one. Apply only that tier's requirements.
|
|
42
42
|
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
|
46
|
-
| **
|
|
47
|
-
| **
|
|
48
|
-
| **Complex** | >5K files, multi-contributor, active CI | Full six-layer setup required |
|
|
49
|
-
|
|
43
|
+
| Tier | Signal | What's expected |
|
|
44
|
+
|---|---|---|
|
|
45
|
+
| **Simple** | <500 files, 1 contributor, no CI | CLAUDE.md only; 0-1 skills; hooks optional |
|
|
46
|
+
| **Standard** | 500-5K files, small team or CI | CLAUDE.md + 1-2 rules; 2-4 skills; basic hooks |
|
|
47
|
+
| **Complex** | >5K files, multi-contributor, active CI | Full six-layer setup required |
|
|
50
48
|
|
|
51
49
|
## Step 1: Collect data
|
|
52
50
|
|
|
@@ -86,7 +84,11 @@ The collector includes both runtime-specific and agent-agnostic surfaces:
|
|
|
86
84
|
|
|
87
85
|
Test every MCP server: call one harmless tool per server. Record `live=yes/no` with error detail. Respect `enabled: false` (skip without flagging). For API keys, only check if the env var is set (`echo $VAR | head -c 5`), never print full keys.
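
A minimal sketch of the API-key check in Python, assuming the audit only needs a set/unset answer. The function name `key_status` is hypothetical and not part of the package; unlike a prefix echo, it prints no characters of the value.

```python
import os


def key_status(name, env=None):
    """Report whether an environment variable is set, without revealing its value."""
    env = os.environ if env is None else env
    # Only the truthiness of the value is inspected; the value is never formatted.
    return f"{name}: set" if env.get(name) else f"{name}: unset"
```

The optional `env` mapping exists only so the check can be exercised without touching the real environment.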
|
|
88
86
|
|
|
89
|
-
##
|
|
87
|
+
## Step 1c: Safety and security checks
|
|
88
|
+
|
|
89
|
+
These run after collection and before the Step 2 analysis. The first two apply to every audit; the third only to projects with long-running or autonomous agents.
|
|
90
|
+
|
|
91
|
+
### Security Baseline Checks
|
|
90
92
|
|
|
91
93
|
Run these on every audit, regardless of tier. They are the floor, not the ceiling.
|
|
92
94
|
|
|
@@ -94,15 +96,15 @@ Run these on every audit, regardless of tier. They are the floor, not the ceilin
|
|
|
94
96
|
|
|
95
97
|
**Environment override surface.** Treat the following as attack surface and report any that are set in tracked files or shipped settings without a justification comment: API base-URL overrides (which can redirect all traffic to a third party), auto-trust flags for project-local MCP servers, wildcard tool allowlists (`allowedTools: ["*"]`), and permission-skip flags (`--dangerously-skip-permissions` or equivalents). Print file:line and the key name only; never print secrets.
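
One way such a scan could look, as a sketch rather than the package's implementation. The `RISKY` patterns and the `scan` helper are hypothetical, and cover only the two literal markers quoted above; base-URL overrides and auto-trust flags vary by runtime and are left out.

```python
import re

# Hypothetical pattern table: finding name -> regex for the risky setting.
RISKY = {
    "wildcard-allowedTools": re.compile(r'allowedTools"?\s*:\s*\[\s*"\*"\s*\]'),
    "permission-skip-flag": re.compile(r"--dangerously-skip-permissions"),
}


def scan(path, text):
    """Return 'file:line name' findings; the matched line itself is never echoed."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for name, pattern in RISKY.items():
            if pattern.search(line):
                findings.append(f"{path}:{lineno} {name}")
    return findings
```

Reporting the finding name instead of the matched line keeps any adjacent secret out of the audit output.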
|
|
96
98
|
|
|
97
|
-
|
|
99
|
+
### Memory and Skill Supply Chain
|
|
98
100
|
|
|
99
101
|
Treat agent memory and third-party skills as supply-chain artifacts. They run with the user's privileges.
|
|
100
102
|
|
|
101
103
|
**Memory hygiene.** Audit the project's long-term agent memory store for secrets, tokens, or credentials (Critical), and for entries written by untrusted runs (subagent invoked on attacker-controlled input, /loop iteration over external content); recommend rotation after such runs. For high-risk one-off runs (untrusted PDFs, uncontrolled scraping, third-party scripts), recommend disabling memory persistence for that session entirely.
|
|
102
104
|
|
|
103
|
-
**Skill supply chain.** Third-party skills, plugins, and MCP servers run with the user's privileges. For each one not authored in this repo, check: source pinned to a release tag (not `main
|
|
105
|
+
**Skill supply chain.** Third-party skills, plugins, and MCP servers run with the user's privileges. For each one not authored in this repo, check: source pinned to a release tag or revision (not `main`, a branch, or a remote git marketplace left tracking its latest head), hook handlers do not write to credential directories, MCP servers have explicit user consent (not auto-trusted by wildcard). Report unpinned sources or unreviewed hook handlers as Structural, not Critical, unless an active exploit signal is present.
|
|
104
106
|
|
|
105
|
-
|
|
107
|
+
### Long-Running Agent Stop Conditions
|
|
106
108
|
|
|
107
109
|
For projects that use `/loop`, autonomous agents, or any long-running agent flow, the project must define explicit stop conditions. An agent that never stops is a budget and safety incident waiting to happen.
|
|
108
110
|
|
|
@@ -113,7 +115,7 @@ Audit for these four hard stop signals; flag the absence of each as a Structural
|
|
|
113
115
|
3. **Cost or token budget exceeded.** Project should declare a per-run budget (tokens, API spend, wall-clock minutes). Loop exits when the budget is hit, not when work is done.
|
|
114
116
|
4. **External blockers.** Merge conflict on the target branch, dependency lock the agent cannot resolve, missing credential, network unreachable. Any of these should halt the loop and ask the user rather than retry forever.
|
|
115
117
|
|
|
116
|
-
The stop conditions should live in tracked project docs (`AGENTS.md`, the loop's launch script, or a dedicated config), not only in the agent's prompt. Prompts are forgettable; tracked config is enforceable. Recommend hooks (PostToolUse on the relevant tools) over prompt instructions when the project supports them: a hook physically cannot be skipped, a prompt instruction can.
|
|
118
|
+
The stop conditions should live in tracked project docs (`AGENTS.md`, the loop's launch script, or a dedicated config), not only in the agent's prompt. Prompts are forgettable; tracked config is enforceable. Recommend hooks (PostToolUse on the relevant tools) over prompt instructions when the project supports them: a hook physically cannot be skipped, a prompt instruction can. Confirm the host's hook coverage before recommending one: some agents only fire PostToolUse for a subset of tools (for example, a runtime may match shell/Bash only), so a fixup that must run after file edits belongs on a Stop or session-end hook there instead.
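
A sketch of how the budget and blocker signals could be checked in a loop driver, assuming token and wall-clock budgets are declared in tracked config. `Budget` and `stop_reason` are hypothetical names, not part of the package.

```python
import time
from dataclasses import dataclass


@dataclass
class Budget:
    max_tokens: int
    max_seconds: float


def stop_reason(budget, tokens_used, started_at, now=None, blocker=None):
    """Return why the loop must stop, or None if it may continue."""
    now = time.monotonic() if now is None else now
    if blocker:
        # External blockers halt immediately and go back to the user.
        return f"external blocker: {blocker}"
    if tokens_used >= budget.max_tokens:
        return "token budget exceeded"
    if now - started_at >= budget.max_seconds:
        return "wall-clock budget exceeded"
    return None
```

The driver would call `stop_reason` before each iteration and exit on any non-`None` result, whether or not the work is done.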
|
|
117
119
|
|
|
118
120
|
## Step 2: Analyze
|
|
119
121
|
|
|
@@ -167,7 +169,19 @@ bash skills/health/scripts/check-agent-context.sh . summary
|
|
|
167
169
|
|
|
168
170
|
**AI-maintainability gaps.** Use `AI MAINTAINABILITY SUMMARY` in summary mode and `AI MAINTAINABILITY DETAIL` in deep mode. Report `FAIL` when the project has no executable verification command, no agent instruction surface for a non-trivial repo, or broken doc references. Report `WARN` when instructions exist but lack a project map, verification guidance, or boundary/non-goal language; when TODO/HACK markers are concentrated; when large source hotspots lack ownership/boundary and verification guidance; or when durable docs contain raw one-off review reports, scorecards, dated line references, or diagnostic dumps instead of stable invariants. Treat missing `docs/`, `specs/`, `.specify/`, `HANDOFF.md`, `CHANGELOG`, issue templates, and PR templates as informational unless project complexity makes them necessary for handoff. For stale reports, extract stable rules into public instructions, rules, references, or verifier scripts, then remove or archive the transient report.
|
|
169
171
|
|
|
170
|
-
**Conversation-derived guidance.** When a health audit reads recent agent conversations, do not recommend copying the conversation or a scorecard into docs. Recommend a candidate-matrix pass instead:
|
|
172
|
+
**Conversation-derived guidance.** When a health audit reads recent agent conversations, do not recommend copying the conversation or a scorecard into docs. Recommend a candidate-matrix pass instead:
|
|
173
|
+
|
|
174
|
+
| Field | Question |
|
|
175
|
+
|---|---|
|
|
176
|
+
| Repeated failure | Did this recur across fixes, releases, agents, or user reports? |
|
|
177
|
+
| Durable invariant | Can the lesson be stated as a stable rule, not a dated incident summary? |
|
|
178
|
+
| Target layer | Should it live in project instructions, a Waza skill, a global rule, or private memory? |
|
|
179
|
+
| Verifier | Is there a deterministic command, script, artifact check, or runtime smoke that can enforce it? |
|
|
180
|
+
| Redaction risk | Does the lesson require local paths, issue numbers, customer details, machine state, secrets, or unpublished release facts? |
|
|
181
|
+
|
|
182
|
+
Layering rule: project-specific commands, app names, artifact names, and release rituals stay in the project; reusable workflows such as cancelled-release review gates or native-freeze evidence ladders belong in Waza skills; universal honesty and verification rules belong in global CLAUDE/AGENTS; private user preferences and one-machine facts stay in memory. If the lesson cannot pass the redaction-risk field, keep it out of public guidance.
|
|
183
|
+
|
|
184
|
+
**Concentrated fix chains.** Run `git log --oneline --since='2 weeks ago' | grep -i fix` and group by area (the prefix before `:` or `(`). When the same area has 3+ fix commits in a short window, it signals a missing structural invariant: each fix is a guess at a rule that was never written down. Report a Structural `WARN` with the area name and fix count, and recommend adding an explicit rule to `AGENTS.md` / `CLAUDE.md` / project rules that captures the invariant those fixes were converging toward. A concentrated fix chain that touches the same file 4+ times is a stronger signal than scattered fixes across different files.
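
The grouping step can be sketched in Python, assuming the commit subjects are already available as strings without the leading hash (for example from `git log --format=%s`). `fix_areas` and `AREA_RE` are hypothetical names, and the area rule follows the text literally: everything before the first `:` or `(`.

```python
import re
from collections import Counter

# Area = text before the first ':' or '(' in the commit subject.
AREA_RE = re.compile(r"^([^:(]+)[:(]")


def fix_areas(subjects, threshold=3):
    """Count fix commits per area and return the areas at or above the threshold."""
    counts = Counter()
    for subject in subjects:
        if "fix" not in subject.lower():  # same filter as `grep -i fix`
            continue
        match = AREA_RE.match(subject)
        if match:
            counts[match.group(1).strip().lower()] += 1
    return {area: n for area, n in counts.items() if n >= threshold}
```

Each returned area and count maps onto one Structural `WARN` line.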
|
|
171
185
|
|
|
172
186
|
**Hotspot ownership gaps.** In deep mode, read `HOTSPOT OWNERSHIP SURFACE`. If a largest source file exceeds the hotspot threshold and `AGENTS.md` / `CLAUDE.md` / shared instruction files do not name who owns the hotspot, what boundary should stay stable, and which verification command covers it, report a Structural `WARN`. Do not treat documented large files as code rot by size alone; some modules are intentionally large.
|
|
173
187
|
|
|
@@ -231,15 +245,14 @@ If no issues: `All relevant checks passed. Nothing to fix.`
|
|
|
231
245
|
|
|
232
246
|
## Gotchas
|
|
233
247
|
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
|
237
|
-
|
|
|
238
|
-
|
|
|
239
|
-
|
|
|
240
|
-
| Flagged intentionally noisy hook as broken | Ask before calling a hook "broken" |
|
|
248
|
+
| What happened | Rule |
|
|
249
|
+
|---|---|
|
|
250
|
+
| Missed the local override | Always read `settings.local.json` too; it shadows the committed file |
|
|
251
|
+
| Subagent timeout reported as MCP failure | MCP failures come from the live probe, not data collection |
|
|
252
|
+
| Reported issues in wrong language | Honor CLAUDE.md Communication rule first |
|
|
253
|
+
| Flagged intentionally noisy hook as broken | Ask before calling a hook "broken" |
|
|
241
254
|
| Hook seemed not to fire, but it did -- a later UI element rendered above it | Hook firing order is not visual order. Before re-editing the hook config: (a) confirm with `--debug` or by piping output, (b) check whether a diff dialog, permission prompt, or other UI element rendered on top and pushed the hook output offscreen, (c) only then suspect the hook itself. |
|
|
242
|
-
| `/health` burned too much quota on first run
|
|
243
|
-
| Treated missing specs/docs as a failure
|
|
244
|
-
| Treated an ignored AGENTS/CLAUDE file as durable project truth
|
|
245
|
-
| Treated a review scorecard as maintainability documentation
|
|
255
|
+
| `/health` burned too much quota on first run | Stay in summary mode first. Full conversation extracts and inspector subagents are deep-audit tools, not the default path for Standard projects. |
|
|
256
|
+
| Treated missing specs/docs as a failure | Decision artifacts are optional by default. Escalate missing docs/specs only when the tier, active handoff risk, or user request makes them necessary. |
|
|
257
|
+
| Treated an ignored AGENTS/CLAUDE file as durable project truth | Report whether the rule is tracked and distributed. Local overlays can inform the audit, but durable fixes belong in public repo docs or shipped skill/rule files. |
|
|
258
|
+
| Treated a review scorecard as maintainability documentation | Scorecards are snapshots. Extract the invariant and verification path, then remove or archive the report instead of calling the score itself a durable rule. |
|
package/skills/health/scripts/check_agent_context.py
CHANGED
|
@@ -179,7 +179,7 @@ def parse_codex_config(
|
|
|
179
179
|
if "=" not in line:
|
|
180
180
|
continue
|
|
181
181
|
key, value = [part.strip() for part in line.split("=", 1)]
|
|
182
|
-
if section == "features" and value.lower() == "true":
|
|
182
|
+
if section == "features" and value.split("#", 1)[0].strip().strip('"').lower() == "true":
|
|
183
183
|
features.append(key)
|
|
184
184
|
elif section.startswith('projects."') and key == "trust_level":
|
|
185
185
|
project = section[len('projects."'): -1]
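
The revised `features` condition can be exercised in isolation. The wrapper name `is_enabled` is hypothetical; only the expression inside it comes from the changed line above.

```python
def is_enabled(value):
    """Drop an inline comment, surrounding whitespace, and quotes, then compare to 'true'."""
    return value.split("#", 1)[0].strip().strip('"').lower() == "true"
```

Under the old comparison (`value.lower() == "true"`), a value such as `true # enabled for tests` or `"true"` would not have counted as enabled.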
|
package/skills/health/scripts/check_maintainability.py
CHANGED
|
@@ -66,6 +66,12 @@ VERIFICATION_WORD_RE = re.compile(
|
|
|
66
66
|
)
|
|
67
67
|
|
|
68
68
|
|
|
69
|
+
# The file-walk helpers below are deliberately duplicated in
|
|
70
|
+
# skills/check/scripts/audit_signals.py. Both scripts ship standalone
|
|
71
|
+
# (see packaging.allowlist) and run inside an arbitrary target project, so
|
|
72
|
+
# they import only stdlib. Do not hoist them into a shared scripts/
|
|
73
|
+
# module: it is dev-only, not on the ship allowlist, and would couple a
|
|
74
|
+
# standalone tool to the install layout.
|
|
69
75
|
def rel(path: Path, root: Path) -> str:
|
|
70
76
|
try:
|
|
71
77
|
return path.resolve().relative_to(root).as_posix()
|
package/skills/health/scripts/collect-data.sh
CHANGED
|
@@ -8,7 +8,7 @@
|
|
|
8
8
|
# python3 not on PATH -> MCP/hooks/allowedTools sections print "(unavailable)"; do not flag those areas
|
|
9
9
|
# settings.local.json absent -> hooks, MCP, allowedTools all show "(unavailable)"; normal for global-settings-only projects
|
|
10
10
|
# MEMORY.md path -> built via sed on pwd; unusual chars produce wrong project key; verify manually if (none) seems wrong
|
|
11
|
-
# Conversation scope ->
|
|
11
|
+
# Conversation scope -> 2 most recent PREVIOUS .jsonl sampled (live session skipped); fewer than 2 = [LOW CONFIDENCE]
|
|
12
12
|
# MCP token estimate -> assumes ~25 tools/server, ~200 tokens/tool; treat as directional, not precise
|
|
13
13
|
# Tier misclassification -> .next/, __pycache__, .turbo/ can inflate file count; recheck manually if tier feels wrong
|
|
14
14
|
set -euo pipefail
|
|
@@ -293,9 +293,10 @@ sample_jsonl_prefix() {
|
|
|
293
293
|
' "$file"
|
|
294
294
|
}
|
|
295
295
|
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
|
|
296
|
+
# Shared jq filter: collapse one transcript record to a single trimmed text
|
|
297
|
+
# line, dropping meta and tool-result noise. Defined once and prepended to both
|
|
298
|
+
# extract_* programs below so the flattening logic lives in exactly one place.
|
|
299
|
+
JQ_FLATTEN='
|
|
299
300
|
def flatten:
|
|
300
301
|
if (.isMeta // false) or (.toolUseResult? != null) then
|
|
301
302
|
empty
|
|
@@ -311,6 +312,11 @@ extract_messages_from_file() {
|
|
|
311
312
|
| sub("^ "; "")
|
|
312
313
|
| sub(" $"; "")
|
|
313
314
|
end;
|
|
315
|
+
'
|
|
316
|
+
|
|
317
|
+
extract_messages_from_file() {
|
|
318
|
+
local file="$1"
|
|
319
|
+
sample_jsonl_prefix "$file" | jq -r "$JQ_FLATTEN"'
|
|
314
320
|
(.type // .role // "") as $kind
|
|
315
321
|
| (flatten) as $text
|
|
316
322
|
| if ($text | length) == 0 then
|
|
@@ -329,22 +335,7 @@ extract_messages_from_file() {
|
|
|
329
335
|
|
|
330
336
|
extract_signals_from_file() {
|
|
331
337
|
local file="$1"
|
|
332
|
-
sample_jsonl_prefix "$file" | jq -r '
|
|
333
|
-
def flatten:
|
|
334
|
-
if (.isMeta // false) or (.toolUseResult? != null) then
|
|
335
|
-
empty
|
|
336
|
-
else
|
|
337
|
-
(.message.content // .content // .text // "")
|
|
338
|
-
| if type == "array" then
|
|
339
|
-
[ .[] | if type == "object" and .type == "text" then .text elif type == "string" then . else empty end ] | join(" ")
|
|
340
|
-
elif type == "string" then .
|
|
341
|
-
else empty
|
|
342
|
-
end
|
|
343
|
-
| gsub("[\\r\\n]+"; " ")
|
|
344
|
-
| gsub(" +"; " ")
|
|
345
|
-
| sub("^ "; "")
|
|
346
|
-
| sub(" $"; "")
|
|
347
|
-
end;
|
|
338
|
+
sample_jsonl_prefix "$file" | jq -r "$JQ_FLATTEN"'
|
|
348
339
|
def is_correction:
|
|
349
340
|
test("(?i)(\\bdon'\''t\\b|\\bdo not\\b|\\bplease don'\''t\\b|\\binstead\\b|\\bnext time\\b|\\bremember\\b|\\buse\\b.*\\binstead\\b|\\bnot\\b.*\\bbut\\b)")
|
|
350
341
|
or test("(不要再|请不要|不要|别再|下次|记得|改成|改为|而不是|别用|去掉|统一成)");
|
package/skills/hunt/SKILL.md
CHANGED
|
@@ -46,7 +46,7 @@ For `/hunt`, diagnostic constraints are `decision`, `preference`, and `principle
|
|
|
46
46
|
- **System/tooling symptoms need a lower-layer baseline.** Before blaming the visible app, generated file, or top-level feature, measure the raw lower layer first: OS capture versus post-processing, runtime service versus UI, compiler/toolchain versus test assertion, network/API versus client handling. Retire hypotheses that the baseline disproves instead of circling them.
|
|
47
47
|
- **Pay attention to deflection.** When someone says "that part doesn't matter," treat it as a signal. The area someone avoids examining is often where the problem lives.
|
|
48
48
|
- **Visual/rendering bugs: static analysis first.** Trace paint layers, stacking contexts, and layer order in DevTools before adding console.log or visual debug overlays. Logs cannot capture what the compositor does. Only add instrumentation after static analysis fails.
|
|
49
|
-
- **Behavioral / lifecycle bugs:
|
|
49
|
+
- **Behavioral / lifecycle / async bugs: instrument first, not after failure.** Window lifecycle, event delivery, navigation, focus, timer, state-machine, and async-ordering bugs almost never yield to static reading alone. Do not wait for a failed fix to add logs. The moment your hypothesis involves "this callback fires before/after that one", "this state should be X when Y runs", or "this object should still be alive here", **add the log immediately as part of forming the hypothesis**, before writing any fix. A hypothesis without runtime evidence is a guess; two guesses in a row is the hard-stop signal. Distinguish from visual-rendering bugs (compositor behavior needs DevTools, not logs) and pure-logic bugs (wrong formula, off-by-one) where static analysis is sufficient.
|
|
50
50
|
- **Tuning magic numbers past round three: stop, unify.** When a spacing / sizing / threshold value has been adjusted three times and still looks wrong, the bug is structural, not numeric. Replace the N independent values with one named token (`Spacing.s4`, `--gap-content`, etc.) and verify the asymmetry was hiding a missing constraint. Asymmetry that survives tuning is structural; more tuning will not converge.
|
|
51
51
|
- **Fix the cause, not the symptom.** If the fix touches more than 5 files, pause and confirm scope with the user.
|
|
52
52
|
|
|
@@ -102,7 +102,7 @@ If the blast surfaces unrelated bugs, list them but do not fix in this PR unless
|
|
|
102
102
|
|
|
103
103
|
## Confirm or Discard
|
|
104
104
|
|
|
105
|
-
|
|
105
|
+
The instrument-first rule lives in Hard Rules (behavioral/async bugs) above; this is what to do with its result. Run the one probe that would fail if the hypothesis were wrong, then read it. If the evidence contradicts the hypothesis, discard it completely and re-orient on what the probe just showed. Do not stack a fix onto a disproven hypothesis, and do not keep one just because the code "looks like" the cause.
|
|
106
106
|
|
|
107
107
|
## Runtime Evidence Ladder
|
|
108
108
|
|
|
@@ -118,6 +118,26 @@ Compile-only is not enough for UI, native-app, visual, rendering, or generated-a
|
|
|
118
118
|
|
|
119
119
|
For recurring classes of failures, load `references/failure-patterns.md` before adding a second fix.
|
|
120
120
|
|
|
121
|
+
## Native App Freeze Mode
|
|
122
|
+
|
|
123
|
+
Activate when a desktop or mobile native app reports beachball, not responding, tab-switch freeze, first-open lag, idle wake stall, overlay lockup, or a screenshot shows a frozen app.
|
|
124
|
+
|
|
125
|
+
Evidence to collect before changing code:
|
|
126
|
+
|
|
127
|
+
1. Exact user path and version: first launch versus warm launch, the tab or window transition, idle duration, permissions, display count, and any setting that makes the freeze disappear.
|
|
128
|
+
2. Runtime capture while frozen: `sample <process>`, recent app logs, CPU and memory footprint, thread count, and whether the main thread is blocked, spinning, or allocating.
|
|
129
|
+
3. First-frame surface: view body work, first `.task`, synchronous icon or metadata lookup, filesystem scans, URL parent walks, notification callbacks, and app/window wake handlers.
|
|
130
|
+
4. Blast search after the fix: grep the same API shape across the repo, especially path parent walks, synchronous icon loading, metadata reads in render paths, and callbacks that run on the main thread.
|
|
131
|
+
|
|
132
|
+
Common native freeze traps:
|
|
133
|
+
|
|
134
|
+
- Launch, terminate, permission, audio, display, or workspace notifications doing path walks, icon lookup, filesystem scans, or process enumeration on the main thread.
|
|
135
|
+
- First paint hydrating a full app list, directory tree, media thumbnail set, or system status table before showing an interactive shell.
|
|
136
|
+
- An input-lock or full-screen overlay without a guaranteed teardown path for Escape, app deactivation, permission denial, process termination, and window close.
|
|
137
|
+
- Timer or sampler work that survives hidden windows, long idle periods, sleep/wake, or app reactivation.
|
|
138
|
+
|
|
139
|
+
Compile-only and source-only checks are insufficient for this mode. The outcome must include the runtime capture, the root-cause frame or state transition, the focused regression guard, and any sibling matches that were fixed or explicitly left safe.
|
|
140
|
+
|
|
121
141
|
## Targeted Logging
|
|
122
142
|
|
|
123
143
|
Use logs as a scalpel, not as noise. Before adding a log, write the question it answers:
|
package/skills/hunt/references/failure-patterns.md
CHANGED
|
@@ -83,6 +83,15 @@ Checks:
|
|
|
83
83
|
- Add test-mode or no-auth guards around real prompts and system changes.
|
|
84
84
|
- Stub external prompt tools through PATH when timeout wrappers exec real binaries.
|
|
85
85
|
|
|
86
|
+
## Subprocess Pipe Backpressure
|
|
87
|
+
|
|
88
|
+
Signals: a long-running child process hangs only on large output, small fixtures pass, or the parent waits for exit before reading stdout/stderr. The child may be blocked on a full pipe buffer while the parent is blocked on `wait`.
|
|
89
|
+
|
|
90
|
+
Checks:
|
|
91
|
+
- Drain stdout and stderr while the process runs, or explicitly inherit/redirect streams when output is not needed.
|
|
92
|
+
- Test with output larger than a typical pipe buffer, not only tiny fixtures.
|
|
93
|
+
- Preserve stderr tails or structured error output for diagnostics without holding the whole stream in memory.
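
A minimal Python sketch of the checks above, assuming line-oriented text output. `run_draining` is a hypothetical helper: it reads stdout in the caller's thread, drains stderr in a background thread, and keeps only a bounded tail of stderr for diagnostics.

```python
import collections
import subprocess
import sys
import threading


def run_draining(cmd, tail_lines=20):
    """Run cmd while draining both pipes; return (exit code, stdout line count, stderr tail)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    stderr_tail = collections.deque(maxlen=tail_lines)  # bounded memory

    def drain_stderr():
        for line in proc.stderr:
            stderr_tail.append(line.rstrip("\n"))

    reader = threading.Thread(target=drain_stderr, daemon=True)
    reader.start()

    stdout_lines = 0
    for _ in proc.stdout:  # consume stdout as it arrives, before waiting for exit
        stdout_lines += 1

    reader.join()
    return proc.wait(), stdout_lines, list(stderr_tail)


if __name__ == "__main__":
    # Child writes about 2 MB to stdout, well past a typical pipe buffer.
    child = "import sys\nfor _ in range(20000): print('x' * 100)\nprint('boom', file=sys.stderr)"
    print(run_draining([sys.executable, "-c", child]))
```

Calling `wait()` only after both streams reach end-of-file avoids the case where the child blocks on a full pipe while the parent blocks on exit.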
|
|
94
|
+
|
|
86
95
|
## Signal Or Partial-Failure Mapping
|
|
87
96
|
|
|
88
97
|
Signals: cancel, timeout, SIGINT, or SIGTERM is reported as success or as a normal business failure; temp files, locks, or operation logs make retries look complete.
|
|
@@ -118,3 +127,12 @@ Checks:
|
|
|
118
127
|
- Read the tool's man page for cold-start semantics. `top -l 2`, `iostat -d 2`, `vm_stat 1 2`, etc. all share this shape.
|
|
119
128
|
- Slice the output to the latest sample (`.suffix(perSampleSize)` on parsed lines, or look for the second instance of the header row).
|
|
120
129
|
- When in doubt, raise `-l` to 3 and confirm samples 2 and 3 agree; sample 1 stays zero.
|
|
130
|
+
|
|
131
|
+
## Aggregation Key Variant
|
|
132
|
+
|
|
133
|
+
Signals: a count, log roll-up, event tally, or per-category breakdown is short by some entries; the missing items share a trait (a system-derived path, a localized string, a prefixed command name); the base-form key matches but a derived variant (`<base>-system`, a suffix, a prefix) is silently dropped.
|
|
134
|
+
|
|
135
|
+
Checks:
|
|
136
|
+
- Before adding a category, grep every write site that produces this class of key and enumerate the real variants, not just the base form.
|
|
137
|
+
- Match with `hasPrefix` / a regex / an explicit variant list rather than exact equality on the base key.
|
|
138
|
+
- Add a fixture row for each known variant so a future key shape that escapes the matcher fails the test instead of the aggregate.
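
The matcher change can be sketched as follows, assuming derived keys take the form `<base>-<suffix>`. The `tally` helper and the sample keys in the usage note are hypothetical.

```python
from collections import Counter


def tally(keys, bases):
    """Count keys per base category, accepting '<base>-<suffix>' variants.

    Keys that match no base are returned separately instead of being dropped,
    so a new key shape shows up in tests rather than as a short aggregate.
    """
    counts = Counter()
    unmatched = []
    for key in keys:
        base = next(
            (b for b in bases if key == b or key.startswith(b + "-")), None
        )
        if base is None:
            unmatched.append(key)
        else:
            counts[base] += 1
    return counts, unmatched
```

With keys `["audio", "audio-system", "video", "mystery"]` and bases `["audio", "video"]`, exact equality would count `audio` once; this version counts it twice and surfaces `mystery` as unmatched.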
|
package/skills/read/scripts/fetch.sh
CHANGED
|
@@ -35,12 +35,15 @@ PROXY="${2:-}"
|
|
|
35
35
|
|
|
36
36
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
37
37
|
|
|
38
|
+
LOCAL_ERR="$(mktemp)"
|
|
39
|
+
trap 'rm -f "$LOCAL_ERR"' EXIT
|
|
40
|
+
|
|
38
41
|
# shellcheck disable=SC2329,SC2317 # called indirectly via _with_retry / _try_once
|
|
39
42
|
_curl() {
|
|
40
43
|
if [ -n "$PROXY" ]; then
|
|
41
|
-
https_proxy="$PROXY" http_proxy="$PROXY" curl -sfL "$@"
|
|
44
|
+
https_proxy="$PROXY" http_proxy="$PROXY" curl -sfL --connect-timeout 10 --max-time 30 "$@"
|
|
42
45
|
else
|
|
43
|
-
curl -sfL "$@"
|
|
46
|
+
curl -sfL --connect-timeout 10 --max-time 30 "$@"
|
|
44
47
|
fi
|
|
45
48
|
}
|
|
46
49
|
|
|
@@ -70,14 +73,12 @@ _with_retry() {
|
|
|
70
73
|
}
|
|
71
74
|
|
|
72
75
|
# Tier 1: local extractor. Always tried first.
|
|
73
|
-
if OUT=$(python3 "$SCRIPT_DIR/fetch_local.py" "$URL" 2
|
|
74
|
-
cat
|
|
76
|
+
if OUT=$(python3 "$SCRIPT_DIR/fetch_local.py" "$URL" 2>"$LOCAL_ERR"); then
|
|
77
|
+
cat "$LOCAL_ERR" >&2 2>/dev/null || true
|
|
75
78
|
echo "$OUT"
|
|
76
|
-
rm -f /tmp/fetch-local.err
|
|
77
79
|
exit 0
|
|
78
80
|
fi
|
|
79
|
-
cat
|
|
80
|
-
rm -f /tmp/fetch-local.err
|
|
81
|
+
cat "$LOCAL_ERR" >&2 2>/dev/null || true
|
|
81
82
|
|
|
82
83
|
# Without --use-proxy, stop here. URL never leaves the machine.
|
|
83
84
|
if [ "$USE_PROXY" -eq 0 ]; then
|