npm - @tw93/waza - Versions diffs - 3.27.0 → 3.28.0 - Mend

@tw93/waza 3.27.0 → 3.28.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32) hide show

package/README.md +24 -6
package/package.json +5 -3
package/rules/anti-patterns.md +24 -23
package/scripts/build_metadata.py +28 -16
package/scripts/check_routing_drift.py +8 -0
package/scripts/package-skill.sh +2 -3
package/scripts/setup-rule.sh +1 -1
package/scripts/setup-statusline.sh +1 -1
package/scripts/skill_checks.py +30 -2
package/scripts/statusline.sh +6 -14
package/scripts/validate_package.py +1 -1
package/skills/RESOLVER.md +1 -1
package/skills/check/SKILL.md +35 -16
package/skills/check/references/project-context.md +1 -5
package/skills/check/scripts/audit_signals.py +10 -9
package/skills/design/SKILL.md +4 -1
package/skills/design/references/design-reference.md +17 -0
package/skills/health/SKILL.md +37 -24
package/skills/health/scripts/check_agent_context.py +1 -1
package/skills/health/scripts/check_maintainability.py +6 -0
package/skills/health/scripts/collect-data.sh +11 -20
package/skills/hunt/SKILL.md +22 -2
package/skills/hunt/references/failure-patterns.md +9 -0
package/skills/read/scripts/fetch.sh +8 -7
package/skills/read/scripts/fetch_feishu.py +11 -6
package/skills/think/SKILL.md +24 -8
package/skills/write/SKILL.md +47 -6
package/skills/write/references/write-en.md +19 -17
package/skills/write/references/write-product-localization.md +43 -0
package/skills/write/references/write-zh-bilingual.md +2 -3
package/skills/write/references/write-zh-prose.md +2 -0
package/skills/write/references/write-zh.md +70 -71

package/skills/check/scripts/audit_signals.py CHANGED Viewed

@@ -85,6 +85,12 @@ CLI_CORE_BUCKETS = (
 )
+# The file-walk helpers below are deliberately duplicated in
+# skills/health/scripts/check_maintainability.py. Both scripts ship
+# standalone (see packaging.allowlist) and run inside an arbitrary target
+# project, so they import only stdlib. Do not hoist them into a shared
+# scripts/ module: it is dev-only, not on the ship allowlist, and would
+# couple a standalone tool to the install layout.
 def is_excluded(path: Path, root: Path) -> bool:
     try:
         parts = path.relative_to(root).parts
@@ -158,23 +164,18 @@ def status(label: str) -> None:
 def block_hotspots(files: list[Path], root: Path) -> None:
     header("FILE SIZE HOTSPOTS")
+    sized = ((p, line_count(p)) for p in files if p.suffix in SOURCE_EXTS)
     big = sorted(
-        ((p, line_count(p)) for p in files
-         if p.suffix in SOURCE_EXTS and line_count(p) >= HOTSPOT_LINES),
+        (item for item in sized if item[1] >= HOTSPOT_LINES),
         key=lambda x: -x[1],
     )[:10]
     if not big:
-        print("(no source files >= 800 lines)")
+        print(f"(no source files >= {HOTSPOT_LINES} lines)")
         status("PASS")
         return
     for path, n in big:
         print(f"  {n:>5}  {rel(path, root)}")
-    if any(n >= HOTSPOT_FAIL for _, n in big):
-        status("FAIL")
-    elif len(big) > 3:
-        status("WARN")
-    else:
-        status("WARN")
+    status("FAIL" if any(n >= HOTSPOT_FAIL for _, n in big) else "WARN")
 def block_heredoc(files: list[Path], root: Path) -> None:

package/skills/design/SKILL.md CHANGED Viewed

@@ -22,6 +22,8 @@ If it could have been generated by a default prompt, it is not good enough.
 **Chinese gut-feel complaints**: when the user says "很傻", "很怪", "突兀", "不协调", "不和谐" about a visual, treat it as an aesthetic rejection, not a debugging symptom. Route to Screenshot Iteration Mode, not to `/hunt`.
+**Document & print typography → Kami.** When the deliverable is a shippable document rather than a product UI surface (report, slide deck, resume, long-form or print-oriented page, paged PDF), do not hand-roll an over-designed document layout here. Suggest the user run it through Kami (`tw93/Kami`), a document design system with a fixed constraint language and templates, and let Kami draft the detailed plan. Screen 排版 (app surfaces, components, web pages) stays in this skill.
 ## Durable Context Preflight
 See [rules/durable-context.md](../../rules/durable-context.md) for when to read durable context, the read-order budget, and the memory-type mapping.
@@ -123,7 +125,7 @@ Summarize the direction as three lines before writing any code:
 For production or multi-page UIs, expand the thesis into the 9-section DESIGN.md scaffold in `references/design-reference.md` (theme, palette, typography, components, layout, depth, do/don't, responsive, prompt guide). For a single component, the three lines are sufficient.
-## Non-Negotiable Constraints
+## Hard Rules
 `references/design-reference.md` is already loaded during direction lock. It owns the full rules: typography, OKLCH color, motion timings, layout defaults, CSS-pattern bans, accessibility baseline, and complexity matching. Apply them. Do not restate them here.
@@ -143,6 +145,7 @@ Give at least 3 variations across genuinely different dimensions (density, typog
 | Fixed visual polish by redesigning the whole surface | Locate the concrete visual delta first, then make the smallest material, opacity, geometry, or typography change that addresses it. |
 | Added a setting or louder control to solve UI noise | Remove the misleading affordance or choose a quiet default first |
 | English looked fine, localized text overflowed | Test long words and localized strings before handoff, especially inside buttons, tabs, nav, and compact cards. |
+| Relied on `…` truncation to fit text in a fixed-width slot | Guarantee fit instead: compact the format, cap to whole segments, or hard-trim with no glyph. Metric and label footers must never tail-truncate into an ellipsis. |
 ## Aesthetic Review

package/skills/design/references/design-reference.md CHANGED Viewed

@@ -123,6 +123,13 @@ When extending an existing interface, first spend time understanding its visual
 If swapping in different content would make the new component look out of place, the vocabulary was not matched closely enough.
+### Responsive & Screen Verification
+- Verify the rendered surface, not a type check or CSS-balance read. Several regressions (early wraps, orphaned separator dots, table overflow) are invisible in source and only show in the render. Screenshot at phone (375px, plus 320px for buttons) and desktop (1280px), in every shipped locale.
+- Line widows: eliminate 1-2 word last lines by trimming the copy so the block rebalances, not by adding a `max-width` cap (a cap narrower than its container wraps early and leaves empty space on the right, which reads as a premature break). Detect objectively: flag any text block whose last line is under ~13% of its widest line; eyeballing misses them, and nested `<code>` hides them from greps.
+- Mobile CTA resting state: natural width, left-aligned to the surrounding text edge, height unchanged. Centering reads as floating; full-width `flex: 1` reads heavy; dropping button height to relieve a "too full" feel treats a width problem as a height one.
+- Spacing is a system, not a per-gap value. Run section spacing as one responsive ladder; when a page reads too airy or too tight, scale the whole set by a single factor across all breakpoints rather than tuning one gap. Asymmetry that survives tuning is structural.
+- Long-form and documentation surfaces stay light: a borderless prev/next text pager (not bordered cards), a sidebar active state as a thin rail rather than a filled block, and build-time zero-runtime-JS code highlighting (bake static spans, plain code stays the source) over a shipped highlighter.
 ## Data Visualization Surfaces
 For dashboards, analytics views, chart-heavy interfaces, or number-dense displays, load `references/design-data-viz.md`. It owns dashboard defaults, chart selection, number alignment, and product-benchmark extraction.
@@ -140,6 +147,16 @@ Reject: Inter, DM Sans, DM Serif Display, DM Serif Text, Outfit, Plus Jakarta Sa
 3. Reject all three.
 4. Pick a typeface from a named foundry (Klim, Commercial Type, Colophon, Grilli Type, OH no Type, Village, etc.) or an open-source option with a clear personality that matches the brand words. Be able to explain why that specific typeface in one sentence.
+## CJK & Multilingual Type
+When the interface mixes Chinese, Japanese, or Korean with Latin, Latin-only type rules silently break the CJK text. Apply these before handoff:
+- **Latin face first, system CJK face after** in the stack, so each script renders with correct glyphs: `font-family: -apple-system, "SF Pro Text", "PingFang SC", "Noto Sans SC", sans-serif;`. Latin runs use the Latin face; Han characters fall through to the CJK face.
+- **Give CJK body text more line-height than Latin**: roughly 1.7–1.8 for reading. Dense Hanzi needs more vertical room than the 1.4–1.5 that suits Latin body copy.
+- **Tag runs with `lang="zh"` / `lang="ja"` / `lang="en"`** so the browser picks the right font and line-breaking. Mixed-language paragraphs break badly without it.
+- **Serif reading modes need an explicit CJK serif fallback.** Most Latin "reading serif" webfonts carry no CJK glyphs, so a serif toggle silently drops Chinese back to a sans and looks broken. Pair them: `"Newsreader", "Songti SC", "Noto Serif SC", serif`.
+- **Do not apply negative letter-spacing to CJK runs.** The display-type tracking rule above is Latin-only; tightening tracking on Hanzi cramps the glyphs and reads as a rendering bug. Scope tracking to `lang="en"` runs.
 ## Color System: OKLCH Rules
 - Use OKLCH instead of HSL. OKLCH is perceptually uniform: equal numeric changes produce equal perceived changes across the spectrum.

package/skills/health/SKILL.md CHANGED Viewed

@@ -40,13 +40,11 @@ For `/health`, audit expectations are `decision`, `preference`, and `principle`
 Pick one. Apply only that tier's requirements.
-| Tier         | Signal                                  | What's expected                                |
-| ------------ | --------------------------------------- | ---------------------------------------------- |
-| **Simple**   | <500 files, 1 contributor, no CI        | CLAUDE.md only; 0-1 skills; hooks optional     |
-| **Standard** | 500-5K files, small team or CI          | CLAUDE.md + 1-2 rules; 2-4 skills; basic hooks |
-| **Complex**  | >5K files, multi-contributor, active CI | Full six-layer setup required                  |
+| Tier | Signal | What's expected |
+|---|---|---|
+| **Simple** | <500 files, 1 contributor, no CI | CLAUDE.md only; 0-1 skills; hooks optional |
+| **Standard** | 500-5K files, small team or CI | CLAUDE.md + 1-2 rules; 2-4 skills; basic hooks |
+| **Complex** | >5K files, multi-contributor, active CI | Full six-layer setup required |
 ## Step 1: Collect data
@@ -86,7 +84,11 @@ The collector includes both runtime-specific and agent-agnostic surfaces:
 Test every MCP server: call one harmless tool per server. Record `live=yes/no` with error detail. Respect `enabled: false` (skip without flagging). For API keys, only check if the env var is set (`echo $VAR | head -c 5`), never print full keys.
-## Security Baseline Checks
+## Step 1c: Safety and security checks
+These run after collection and before the Step 2 analysis. The first two apply to every audit; the third only to projects with long-running or autonomous agents.
+### Security Baseline Checks
 Run these on every audit, regardless of tier. They are the floor, not the ceiling.
@@ -94,15 +96,15 @@ Run these on every audit, regardless of tier. They are the floor, not the ceilin
 **Environment override surface.** Treat the following as attack surface, report when set in tracked files or shipped settings without a justification comment: API base-URL overrides (redirect all traffic to a third party), auto-trust flags for project-local MCP servers, wildcard tool allowlists (`allowedTools: ["*"]`), and permission-skip flags (`--dangerously-skip-permissions` or equivalents). Print file:line and the key name only; never print secrets.
-## Memory and Skill Supply Chain
+### Memory and Skill Supply Chain
 Treat agent memory and third-party skills as supply-chain artifacts. They run with the user's privileges.
 **Memory hygiene.** Audit the project's long-term agent memory store for secrets, tokens, or credentials (Critical), and for entries written by untrusted runs (subagent invoked on attacker-controlled input, /loop iteration over external content); recommend rotation after such runs. For high-risk one-off runs (untrusted PDFs, uncontrolled scraping, third-party scripts), recommend disabling memory persistence for that session entirely.
-**Skill supply chain.** Third-party skills, plugins, and MCP servers run with the user's privileges. For each one not authored in this repo, check: source pinned to a release tag (not `main` or a branch), hook handlers do not write to credential directories, MCP servers have explicit user consent (not auto-trusted by wildcard). Report unpinned sources or unreviewed hook handlers as Structural, not Critical, unless an active exploit signal is present.
+**Skill supply chain.** Third-party skills, plugins, and MCP servers run with the user's privileges. For each one not authored in this repo, check: source pinned to a release tag or revision (not `main`, a branch, or a remote git marketplace left tracking its latest head), hook handlers do not write to credential directories, MCP servers have explicit user consent (not auto-trusted by wildcard). Report unpinned sources or unreviewed hook handlers as Structural, not Critical, unless an active exploit signal is present.
-## Long-Running Agent Stop Conditions
+### Long-Running Agent Stop Conditions
 For projects that use `/loop`, autonomous agents, or any long-running agent flow, the project must define explicit stop conditions. An agent that never stops is a budget and safety incident waiting to happen.
@@ -113,7 +115,7 @@ Audit for these four hard stop signals; flag the absence of each as a Structural
 3. **Cost or token budget exceeded.** Project should declare a per-run budget (tokens, API spend, wall-clock minutes). Loop exits when the budget is hit, not when work is done.
 4. **External blockers.** Merge conflict on the target branch, dependency lock the agent cannot resolve, missing credential, network unreachable. Any of these halt the loop and ask the user, not retry forever.
-The stop conditions should live in tracked project docs (`AGENTS.md`, the loop's launch script, or a dedicated config), not only in the agent's prompt. Prompts are forgettable; tracked config is enforceable. Recommend hooks (PostToolUse on the relevant tools) over prompt instructions when the project supports them: a hook physically cannot be skipped, a prompt instruction can.
+The stop conditions should live in tracked project docs (`AGENTS.md`, the loop's launch script, or a dedicated config), not only in the agent's prompt. Prompts are forgettable; tracked config is enforceable. Recommend hooks (PostToolUse on the relevant tools) over prompt instructions when the project supports them: a hook physically cannot be skipped, a prompt instruction can. Confirm the host's hook coverage before recommending one: some agents only fire PostToolUse for a subset of tools (for example, a runtime may match shell/Bash only), so a fixup that must run after file edits belongs on a Stop or session-end hook there instead.
 ## Step 2: Analyze
@@ -167,7 +169,19 @@ bash skills/health/scripts/check-agent-context.sh . summary
 **AI-maintainability gaps.** Use `AI MAINTAINABILITY SUMMARY` in summary mode and `AI MAINTAINABILITY DETAIL` in deep mode. Report `FAIL` when the project has no executable verification command, no agent instruction surface for a non-trivial repo, or broken doc references. Report `WARN` when instructions exist but lack a project map, verification guidance, boundary/non-goal language, when TODO/HACK markers are concentrated, when large source hotspots lack ownership/boundary and verification guidance, or when durable docs contain raw one-off review reports, scorecards, dated line references, or diagnostic dumps instead of stable invariants. Treat missing `docs/`, `specs/`, `.specify/`, `HANDOFF.md`, `CHANGELOG`, issue templates, and PR templates as informational unless project complexity makes them necessary for handoff. The action for stale reports is to extract stable rules into public instructions, rules, references, or verifier scripts, then remove or archive the transient report.
-**Conversation-derived guidance.** When a health audit reads recent agent conversations, do not recommend copying the conversation or a scorecard into docs. Recommend a candidate-matrix pass instead: repeated failure, durable invariant, target public layer, verifier if deterministic, and redaction risk. If the lesson cannot be stated without local paths, issue numbers, customer details, or one-machine state, keep it out of public guidance and leave it as private context.
+**Conversation-derived guidance.** When a health audit reads recent agent conversations, do not recommend copying the conversation or a scorecard into docs. Recommend a candidate-matrix pass instead:
+| Field | Question |
+|---|---|
+| Repeated failure | Did this recur across fixes, releases, agents, or user reports? |
+| Durable invariant | Can the lesson be stated as a stable rule, not a dated incident summary? |
+| Target layer | Should it live in project instructions, a Waza skill, a global rule, or private memory? |
+| Verifier | Is there a deterministic command, script, artifact check, or runtime smoke that can enforce it? |
+| Redaction risk | Does the lesson require local paths, issue numbers, customer details, machine state, secrets, or unpublished release facts? |
+Layering rule: project-specific commands, app names, artifact names, and release rituals stay in the project; reusable workflows such as cancelled-release review gates or native-freeze evidence ladders belong in Waza skills; universal honesty and verification rules belong in global CLAUDE/AGENTS; private user preferences and one-machine facts stay in memory. If the lesson cannot pass the redaction-risk field, keep it out of public guidance.
+**Concentrated fix chains.** Run `git log --oneline --since='2 weeks ago' | grep -i fix` and group by area (the prefix before `:` or `(`). When the same area has 3+ fix commits in a short window, it signals a missing structural invariant: each fix is a guess at a rule that was never written down. Report a Structural `WARN` with the area name, fix count, and recommend adding an explicit rule to `AGENTS.md` / `CLAUDE.md` / project rules that captures the invariant those fixes were converging toward. A concentrated fix chain that touches the same file 4+ times is a stronger signal than scattered fixes across different files.
 **Hotspot ownership gaps.** In deep mode, read `HOTSPOT OWNERSHIP SURFACE`. If a largest source file exceeds the hotspot threshold and `AGENTS.md` / `CLAUDE.md` / shared instruction files do not name who owns the hotspot, what boundary should stay stable, and which verification command covers it, report a Structural `WARN`. Do not treat documented large files as code rot by size alone; some modules are intentionally large.
@@ -231,15 +245,14 @@ If no issues: `All relevant checks passed. Nothing to fix.`
 ## Gotchas
-| What happened                                                               | Rule                                                                                                                                                                                                                                                                                           |
-| --------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Missed the local override                                                   | Always read `settings.local.json` too; it shadows the committed file                                                                                                                                                                                                                           |
-| Subagent timeout reported as MCP failure                                    | MCP failures come from the live probe, not data collection                                                                                                                                                                                                                                     |
-| Reported issues in wrong language                                           | Honor CLAUDE.md Communication rule first                                                                                                                                                                                                                                                       |
-| Flagged intentionally noisy hook as broken                                  | Ask before calling a hook "broken"                                                                                                                                                                                                                                                             |
+| What happened | Rule |
+|---|---|
+| Missed the local override | Always read `settings.local.json` too; it shadows the committed file |
+| Subagent timeout reported as MCP failure | MCP failures come from the live probe, not data collection |
+| Reported issues in wrong language | Honor CLAUDE.md Communication rule first |
+| Flagged intentionally noisy hook as broken | Ask before calling a hook "broken" |
 | Hook seemed not to fire, but it did -- a later UI element rendered above it | Hook firing order is not visual order. Before re-editing the hook config: (a) confirm with `--debug` or by piping output, (b) check whether a diff dialog, permission prompt, or other UI element rendered on top and pushed the hook output offscreen, (c) only then suspect the hook itself. |
-| `/health` burned too much quota on first run                                | Stay in summary mode first. Full conversation extracts and inspector subagents are deep-audit tools, not the default path for Standard projects.                                                                                                                                                 |
-| Treated missing specs/docs as a failure                                     | Decision artifacts are optional by default. Escalate missing docs/specs only when the tier, active handoff risk, or user request makes them necessary.                                                                                                                                           |
-| Treated an ignored AGENTS/CLAUDE file as durable project truth              | Report whether the rule is tracked and distributed. Local overlays can inform the audit, but durable fixes belong in public repo docs or shipped skill/rule files.                                                                                                                               |
-| Treated a review scorecard as maintainability documentation                 | Scorecards are snapshots. Extract the invariant and verification path, then remove or archive the report instead of calling the score itself a durable rule.                                                                                                                                     |
+| `/health` burned too much quota on first run | Stay in summary mode first. Full conversation extracts and inspector subagents are deep-audit tools, not the default path for Standard projects. |
+| Treated missing specs/docs as a failure | Decision artifacts are optional by default. Escalate missing docs/specs only when the tier, active handoff risk, or user request makes them necessary. |
+| Treated an ignored AGENTS/CLAUDE file as durable project truth | Report whether the rule is tracked and distributed. Local overlays can inform the audit, but durable fixes belong in public repo docs or shipped skill/rule files. |
+| Treated a review scorecard as maintainability documentation | Scorecards are snapshots. Extract the invariant and verification path, then remove or archive the report instead of calling the score itself a durable rule. |

package/skills/health/scripts/check_agent_context.py CHANGED Viewed

@@ -179,7 +179,7 @@ def parse_codex_config(
         if "=" not in line:
             continue
         key, value = [part.strip() for part in line.split("=", 1)]
-        if section == "features" and value.lower() == "true":
+        if section == "features" and value.split("#", 1)[0].strip().strip('"').lower() == "true":
             features.append(key)
         elif section.startswith('projects."') and key == "trust_level":
             project = section[len('projects."'): -1]

package/skills/health/scripts/check_maintainability.py CHANGED Viewed

@@ -66,6 +66,12 @@ VERIFICATION_WORD_RE = re.compile(
 )
+# The file-walk helpers below are deliberately duplicated in
+# skills/check/scripts/audit_signals.py. Both scripts ship standalone
+# (see packaging.allowlist) and run inside an arbitrary target project, so
+# they import only stdlib. Do not hoist them into a shared scripts/
+# module: it is dev-only, not on the ship allowlist, and would couple a
+# standalone tool to the install layout.
 def rel(path: Path, root: Path) -> str:
     try:
         return path.resolve().relative_to(root).as_posix()

package/skills/health/scripts/collect-data.sh CHANGED Viewed

@@ -8,7 +8,7 @@
 #   python3 not on PATH     -> MCP/hooks/allowedTools sections print "(unavailable)"; do not flag those areas
 #   settings.local.json absent -> hooks, MCP, allowedTools all show "(unavailable)"; normal for global-settings-only projects
 #   MEMORY.md path          -> built via sed on pwd; unusual chars produce wrong project key; verify manually if (none) seems wrong
-#   Conversation scope      -> only 2 most recent .jsonl files sampled; fewer than 2 = [LOW CONFIDENCE]
+#   Conversation scope      -> 2 most recent PREVIOUS .jsonl sampled (live session skipped); fewer than 2 = [LOW CONFIDENCE]
 #   MCP token estimate      -> assumes ~25 tools/server, ~200 tokens/tool; treat as directional, not precise
 #   Tier misclassification  -> .next/, __pycache__, .turbo/ can inflate file count; recheck manually if tier feels wrong
 set -euo pipefail
@@ -293,9 +293,10 @@ sample_jsonl_prefix() {
   ' "$file"
 }
-extract_messages_from_file() {
-  local file="$1"
-  sample_jsonl_prefix "$file" | jq -r '
+# Shared jq filter: collapse one transcript record to a single trimmed text
+# line, dropping meta and tool-result noise. Defined once and prepended to both
+# extract_* programs below so the flattening logic lives in exactly one place.
+JQ_FLATTEN='
     def flatten:
       if (.isMeta // false) or (.toolUseResult? != null) then
         empty
@@ -311,6 +312,11 @@ extract_messages_from_file() {
         | sub("^ "; "")
         | sub(" $"; "")
       end;
+'
+extract_messages_from_file() {
+  local file="$1"
+  sample_jsonl_prefix "$file" | jq -r "$JQ_FLATTEN"'
     (.type // .role // "") as $kind
     | (flatten) as $text
     | if ($text | length) == 0 then
@@ -329,22 +335,7 @@ extract_messages_from_file() {
 extract_signals_from_file() {
   local file="$1"
-  sample_jsonl_prefix "$file" | jq -r '
-    def flatten:
-      if (.isMeta // false) or (.toolUseResult? != null) then
-        empty
-      else
-        (.message.content // .content // .text // "")
-        | if type == "array" then
-            [ .[] | if type == "object" and .type == "text" then .text elif type == "string" then . else empty end ] | join(" ")
-          elif type == "string" then .
-          else empty
-          end
-        | gsub("[\\r\\n]+"; " ")
-        | gsub("  +"; " ")
-        | sub("^ "; "")
-        | sub(" $"; "")
-      end;
+  sample_jsonl_prefix "$file" | jq -r "$JQ_FLATTEN"'
     def is_correction:
       test("(?i)(\\bdon'\''t\\b|\\bdo not\\b|\\bplease don'\''t\\b|\\binstead\\b|\\bnext time\\b|\\bremember\\b|\\buse\\b.*\\binstead\\b|\\bnot\\b.*\\bbut\\b)")
       or test("(不要再|请不要|不要|别再|下次|记得|改成|改为|而不是|别用|去掉|统一成)");

package/skills/hunt/SKILL.md CHANGED Viewed

@@ -46,7 +46,7 @@ For `/hunt`, diagnostic constraints are `decision`, `preference`, and `principle
 - **System/tooling symptoms need a lower-layer baseline.** Before blaming the visible app, generated file, or top-level feature, measure the raw lower layer first: OS capture versus post-processing, runtime service versus UI, compiler/toolchain versus test assertion, network/API versus client handling. Retire hypotheses that the baseline disproves instead of circling them.
 - **Pay attention to deflection.** When someone says "that part doesn't matter," treat it as a signal. The area someone avoids examining is often where the problem lives.
 - **Visual/rendering bugs: static analysis first.** Trace paint layers, stacking contexts, and layer order in DevTools before adding console.log or visual debug overlays. Logs cannot capture what the compositor does. Only add instrumentation after static analysis fails.
-- **Behavioral / lifecycle bugs: log-debug first after one failed fix.** Window lifecycle, event delivery, navigation, focus, timer, and async-state bugs do *not* yield to static reading alone. After static analysis plus one failed hypothesis, stop guessing and add a runtime probe (Swift `#if DEBUG NSLog("[App.stage] state=...")`, JS `console.log`, Rust `eprintln!` / `tracing::debug!`) at the suspect boundary. Read its output before changing code again. "Looks reasonable but unverified" twice in a row is the hard-stop signal. Distinguish from the visual-rendering rule above: a compositor bug needs DevTools; a "the function was never called" / "the object was already destroyed" bug needs a log.
+- **Behavioral / lifecycle / async bugs: instrument first, not after failure.** Window lifecycle, event delivery, navigation, focus, timer, state-machine, and async-ordering bugs almost never yield to static reading alone. Do not wait for a failed fix to add logs. The moment your hypothesis involves "this callback fires before/after that one", "this state should be X when Y runs", or "this object should still be alive here", **add the log immediately as part of forming the hypothesis**, before writing any fix. A hypothesis without runtime evidence is a guess; two guesses in a row is the hard-stop signal. Distinguish from visual-rendering bugs (compositor behavior needs DevTools, not logs) and pure-logic bugs (wrong formula, off-by-one) where static analysis is sufficient.
 - **Tuning magic numbers past round three: stop, unify.** When a spacing / sizing / threshold value has been adjusted three times and still looks wrong, the bug is structural, not numeric. Replace the N independent values with one named token (`Spacing.s4`, `--gap-content`, etc.) and verify the asymmetry was hiding a missing constraint. Asymmetry that survives tuning is structural; more tuning will not converge.
 - **Fix the cause, not the symptom.** If the fix touches more than 5 files, pause and confirm scope with the user.
@@ -102,7 +102,7 @@ If the blast surfaces unrelated bugs, list them but do not fix in this PR unless
 ## Confirm or Discard
-Add one targeted instrument: a log line, a failing assertion, or the smallest test that would fail if the hypothesis is correct. Run it. If the evidence contradicts the hypothesis, discard it completely and re-orient with what was just learned. Do not preserve a hypothesis the evidence disproves.
+The instrument-first rule lives in Hard Rules (behavioral/async bugs) above; this is what to do with its result. Run the one probe that would fail if the hypothesis were wrong, then read it. If the evidence contradicts the hypothesis, discard it completely and re-orient on what the probe just showed. Do not stack a fix onto a disproven hypothesis, and do not keep one just because the code "looks like" the cause.
 ## Runtime Evidence Ladder
@@ -118,6 +118,26 @@ Compile-only is not enough for UI, native-app, visual, rendering, or generated-a
 For recurring classes of failures, load `references/failure-patterns.md` before adding a second fix.
+## Native App Freeze Mode
+Activate when a desktop or mobile native app reports beachball, not responding, tab-switch freeze, first-open lag, idle wake stall, overlay lockup, or a screenshot shows a frozen app.
+Evidence to collect before changing code:
+1. Exact user path and version: first launch versus warm launch, the tab or window transition, idle duration, permissions, display count, and any setting that makes the freeze disappear.
+2. Runtime capture while frozen: `sample <process>`, recent app logs, CPU and memory footprint, thread count, and whether the main thread is blocked, spinning, or allocating.
+3. First-frame surface: view body work, first `.task`, synchronous icon or metadata lookup, filesystem scans, URL parent walks, notification callbacks, and app/window wake handlers.
+4. Blast search after the fix: grep the same API shape across the repo, especially path parent walks, synchronous icon loading, metadata reads in render paths, and callbacks that run on the main thread.
+Common native freeze traps:
+- Launch, terminate, permission, audio, display, or workspace notifications doing path walks, icon lookup, filesystem scans, or process enumeration on the main thread.
+- First paint hydrating a full app list, directory tree, media thumbnail set, or system status table before showing an interactive shell.
+- An input-lock or full-screen overlay without a guaranteed teardown path for Escape, app deactivation, permission denial, process termination, and window close.
+- Timer or sampler work that survives hidden windows, long idle periods, sleep/wake, or app reactivation.
+Compile-only and source-only checks are insufficient for this mode. The outcome must include the runtime capture, the root-cause frame or state transition, the focused regression guard, and any sibling matches that were fixed or explicitly left safe.
 ## Targeted Logging
 Use logs as a scalpel, not as noise. Before adding a log, write the question it answers:

package/skills/hunt/references/failure-patterns.md CHANGED Viewed

@@ -118,3 +118,12 @@ Checks:
 - Read the tool's man page for cold-start semantics. `top -l 2`, `iostat -d 2`, `vm_stat 1 2`, etc. all share this shape.
 - Slice the output to the latest sample (`.suffix(perSampleSize)` on parsed lines, or look for the second instance of the header row).
 - When in doubt, raise `-l` to 3 and confirm sample 2 and 3 agree; sample 1 stays zero.
+## Aggregation Key Variant
+Signals: a count, log roll-up, event tally, or per-category breakdown is short by some entries; the missing items share a trait (a system-derived path, a localized string, a prefixed command name); the base-form key matches but a derived variant (`<base>-system`, a suffix, a prefix) is silently dropped.
+Checks:
+- Before adding a category, grep every write site that produces this class of key and enumerate the real variants, not just the base form.
+- Match with `hasPrefix` / a regex / an explicit variant list rather than exact equality on the base key.
+- Add a fixture row for each known variant so a future key shape that escapes the matcher fails the test instead of the aggregate.

package/skills/read/scripts/fetch.sh CHANGED Viewed

@@ -35,12 +35,15 @@ PROXY="${2:-}"
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+LOCAL_ERR="$(mktemp)"
+trap 'rm -f "$LOCAL_ERR"' EXIT
 # shellcheck disable=SC2329,SC2317  # called indirectly via _with_retry / _try_once
 _curl() {
   if [ -n "$PROXY" ]; then
-    https_proxy="$PROXY" http_proxy="$PROXY" curl -sfL "$@"
+    https_proxy="$PROXY" http_proxy="$PROXY" curl -sfL --connect-timeout 10 --max-time 30 "$@"
   else
-    curl -sfL "$@"
+    curl -sfL --connect-timeout 10 --max-time 30 "$@"
   fi
 }
@@ -70,14 +73,12 @@ _with_retry() {
 }
 # Tier 1: local extractor. Always tried first.
-if OUT=$(python3 "$SCRIPT_DIR/fetch_local.py" "$URL" 2>/tmp/fetch-local.err); then
-  cat /tmp/fetch-local.err >&2 2>/dev/null || true
+if OUT=$(python3 "$SCRIPT_DIR/fetch_local.py" "$URL" 2>"$LOCAL_ERR"); then
+  cat "$LOCAL_ERR" >&2 2>/dev/null || true
   echo "$OUT"
-  rm -f /tmp/fetch-local.err
   exit 0
 fi
-cat /tmp/fetch-local.err >&2 2>/dev/null || true
-rm -f /tmp/fetch-local.err
+cat "$LOCAL_ERR" >&2 2>/dev/null || true
 # Without --use-proxy, stop here. URL never leaves the machine.
 if [ "$USE_PROXY" -eq 0 ]; then

package/skills/read/scripts/fetch_feishu.py CHANGED Viewed

@@ -31,6 +31,7 @@ except ImportError:
     sys.exit(1)
 API = "https://open.feishu.cn/open-apis"
+TIMEOUT = 20
 def yaml_string(value):
@@ -43,7 +44,8 @@ def get_token():
     if not app_id or not app_secret:
         return None, "FEISHU_APP_ID or FEISHU_APP_SECRET not set"
     resp = requests.post(f"{API}/auth/v3/tenant_access_token/internal",
-                         json={"app_id": app_id, "app_secret": app_secret})
+                         json={"app_id": app_id, "app_secret": app_secret},
+                         timeout=TIMEOUT)
     d = resp.json()
     if d.get("code") != 0:
         return None, f"Auth failed: {d.get('msg', resp.text)}"
@@ -69,7 +71,8 @@ def parse_url(url):
 def resolve_wiki(token, wiki_token):
     resp = requests.get(f"{API}/wiki/v2/spaces/get_node",
                         headers={"Authorization": f"Bearer {token}"},
-                        params={"token": wiki_token})
+                        params={"token": wiki_token},
+                        timeout=TIMEOUT)
     d = resp.json()
     if d.get("code") == 0:
         node = d["data"]["node"]
@@ -85,7 +88,8 @@ def get_blocks(token, doc_id):
             params["page_token"] = page_token
         resp = requests.get(f"{API}/docx/v1/documents/{doc_id}/blocks",
                             headers={"Authorization": f"Bearer {token}"},
-                            params=params)
+                            params=params,
+                            timeout=TIMEOUT)
         d = resp.json()
         if d.get("code") != 0:
             return None, f"Blocks fetch failed: {d.get('msg', resp.text)}"
@@ -141,7 +145,7 @@ def blocks_to_md(blocks):
             key = f"heading{level}"
             data = block.get(key) or block.get("heading", {})
             text = extract_text(data.get("elements", []))
-            lines.append(f"{'#' * level} {text}")
+            lines.append(f"{'#' * min(level, 6)} {text}")
         elif bt == 10:
             text = extract_text(block.get("bullet", {}).get("elements", []))
             lines.append(f"- {text}")
@@ -203,8 +207,9 @@ def fetch_feishu(url):
         doc_id, doc_type = real_id, real_type or "docx"
     info_resp = requests.get(f"{API}/docx/v1/documents/{doc_id}",
-                             headers={"Authorization": f"Bearer {token}"})
-    doc_info = info_resp.json().get("data", {}).get("document", {})
+                             headers={"Authorization": f"Bearer {token}"},
+                             timeout=TIMEOUT)
+    doc_info = (info_resp.json().get("data") or {}).get("document") or {}
     title = doc_info.get("title", "")
     blocks, err = get_blocks(token, doc_id)

package/skills/think/SKILL.md CHANGED Viewed

@@ -20,6 +20,14 @@ Give opinions directly. Take a position and state what evidence would change it.
 - Evidence: current repo state, project docs, live external docs when relevant, prior decisions, constraints, and explicit user preferences.
 - Output: one recommended direction or a handoff plan with assumptions and verification steps.
+## Durable Context Preflight
+See [rules/durable-context.md](../../rules/durable-context.md) for when to read durable context, the read-order budget, and the memory-type mapping (planning constraints, reusable patterns, facts that need re-verification against current state).
+For `/think`, planning constraints are `decision`, `preference`, and `principle` entries; current repo state, live docs, logs, tests, and remote state override memory. Lock durable decisions and preferences before asking questions. Do not ask the user to restate an intent that the durable context already establishes unless it is risky, stale, or contradicted by current state.
+Before outputting any plan, scan the project's `AGENTS.md`, `CLAUDE.md`, `.claude/rules/*.md`, and any local agent-memory summary if the user pointed at one. If the proposed plan contradicts a "hard rule", "never X", "must Y", or "prefer Z" stated in those files, surface the contradiction in the plan output (one sentence: which rule, which step contradicts it, recommended resolution). Do not silently override the rule. If the rule blocks the plan, stop and ask before continuing.
 ## Lightweight Mode
 Activate when the user wants to fix something rather than build something, the problem is already defined, and the only open question is "how to fix it."
@@ -50,6 +58,22 @@ Do not use a build-plan template here. Do not list options. Give one verdict.
 Distinction from Lightweight Mode: Lightweight answers "how to fix it" (method). Evaluation answers "should it exist" (value judgment).
+## Triage Mode
+Activate when the user forwards a bundle of asks: an issue with multiple requests, a batch of screenshots, a user saying "看看这几个需求", or any input containing 3+ distinct items that could each be accepted or rejected independently.
+Do not treat the bundle as a to-do list. Classify each item first:
+| Bucket | Meaning | Action |
+|--------|---------|--------|
+| **Bug** | Broken behavior with evidence | Fix |
+| **Already works** | The feature exists but the reporter missed it | Point to the existing affordance |
+| **Accepted improvement** | Genuine gap, low-risk, aligns with product direction | Implement |
+| **Cosmetic / preference** | Subjective, no functional impact | Note it, do not implement unless the maintainer agrees |
+| **Out of scope** | Conflicts with product boundary or adds unjustified complexity | Decline with one sentence |
+Output the classification table first. Wait for the user to confirm the accepted subset before implementing anything. "Already works" misidentified as missing is the most common waste; grep for the existing affordance before classifying an item as a gap.
 **Negative-user feedback is not automatic scope.** When a user evaluation is triggered by a refund customer, a churn report, or a "competitor X is more intuitive" comparison, do not convert the complaint into a rework plan by default. First check whether the current behavior is intentional product differentiation, not an oversight: read the project's own AGENTS.md / CLAUDE.md / product notes for phrases like "review-first", "verifiability over speed", "evidence-driven", "explicit confirmation". If the behavior the user criticized is named there as a deliberate choice, the verdict is **Keep**, with one sentence on why the differentiation matters, and a note that the maintainer can override. Do not write a "fix the friction" plan that quietly removes the differentiator. The signal-to-respect ratio for refund / competitor-comparison feedback on a deliberately-designed surface is low.
 ## Before Reading Any Code
@@ -58,14 +82,6 @@ Distinction from Lightweight Mode: Lightweight answers "how to fix it" (method).
 - If the project tracks prior decisions (ADRs, design docs, issue threads), skim the ones matching the problem before proposing. Skip if none exist.
 - If the plan involves a default value, env var, or config field, open the project's actual config file (e.g. `app.config.json`, `tauri.conf.json`, `package.json`, `.env`) and lift the live value. Never quote a default from memory or docs.
-## Durable Context Preflight
-See [rules/durable-context.md](../../rules/durable-context.md) for when to read durable context, the read-order budget, and the memory-type mapping (planning constraints, reusable patterns, facts that need re-verification against current state).
-For `/think`, planning constraints are `decision`, `preference`, and `principle` entries; current repo state, live docs, logs, tests, and remote state override memory. Lock durable decisions and preferences before asking questions. Do not ask the user to restate an intent that the durable context already establishes unless it is risky, stale, or contradicted by current state.
-Before outputting any plan, scan the project's `AGENTS.md`, `CLAUDE.md`, `.claude/rules/*.md`, and any local agent-memory summary if the user pointed at one. If the proposed plan contradicts a "hard rule", "never X", "must Y", or "prefer Z" stated in those files, surface the contradiction in the plan output (one sentence: which rule, which step contradicts it, recommended resolution). Do not silently override the rule. If the rule blocks the plan, stop and ask before continuing.
 ## Check for Official Solutions First
 Before proposing custom implementations, search for framework built-ins, official patterns, and ecosystem standards. Use Context7 MCP tools to query latest docs when available. If an official solution exists, it is the default recommendation unless you can articulate why it is insufficient for this specific case.