npm - cc-devflow - Versions diffs - 4.5.1 → 4.5.3 - Mend

cc-devflow 4.5.1 → 4.5.3

Files changed (81)

package/.claude/skills/cc-investigate/assets/ANALYSIS_TEMPLATE.md CHANGED

@@ -16,8 +16,25 @@
 - What the user saw:
 - Reproduction command / path:
+- Repro stability: `stable` | `intermittent` | `not-yet-reproduced` | `narrowed-only`
+- Matches reported symptom: `yes` | `no` | `partial` | `unknown`
+- Symptom match evidence:
 - Expected:
 - Actual:
+- Impact / blast radius:
+## Feedback Loop Contract
+- Loop type: `failing-test` | `http-script` | `cli-fixture` | `browser-script` | `trace-replay` | `throwaway-harness` | `property-fuzz` | `bisect` | `differential` | `hitl`
+- Command or manual driver:
+- Expected failing signal:
+- Actual failing signal:
+- Runtime:
+- Determinism: `deterministic` | `high-rate-flaky` | `low-rate-flaky` | `unknown`
+- Failure rate:
+- Signal specificity:
+- Sharpening plan:
+- If no loop, evidence request:
 ## Evidence Chain
@@ -25,25 +42,114 @@
 - Code path:
 - Recent changes:
 - Existing tests:
+- Prior investigations:
+- TODO / backlog / report-card signals:
+- Native domain / decision context:
-## Hypothesis Table
+## Boundary Probe Matrix
+| Component boundary | Input observed | Output observed | Config / env observed | State observed | Verdict |
+| --- | --- | --- | --- | --- | --- |
+| | | | | | unknown |
+## Backward Trace Chain
+- Immediate failure site:
+- Direct caller:
+- Caller chain:
+- Bad value origin:
+- Original trigger:
+- Why symptom-site fix is rejected:
+## Reference Comparison
+- Similar working example:
+- Broken path:
+- Differences found:
+- Differences accepted as hypothesis:
+- Differences ruled out:
+## Diagnostic Instrumentation Plan
+| Probe tag | Probe location | Question answered | Command to run | Expected signal | Actual signal | Cleanup requirement |
+| --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | |
+## Pattern Analysis
-| Hypothesis | Evidence for | Evidence against | Status |
+| Pattern | Evidence checked | Status | Notes |
 | --- | --- | --- | --- |
-| | | | pending |
+| race condition | | ruled-out | |
+| null propagation | | ruled-out | |
+| state corruption | | ruled-out | |
+| integration failure | | ruled-out | |
+| configuration drift | | ruled-out | |
+| stale cache | | ruled-out | |
+| resource leak | | ruled-out | |
+| performance regression | | ruled-out | |
+| trust boundary drift | | ruled-out | |
+| timing guess / flaky wait | | ruled-out | |
+## Candidate Hypotheses
+| Rank | Hypothesis | Why plausible | Prediction | Status |
+| --- | --- | --- | --- | --- |
+| 1 | | | | pending |
+## Research Evidence
+- External research used: `yes` | `no`
+- Sanitized query:
+- Source / result:
+- Applicability:
+- Accepted into hypothesis: `yes` | `no`
+- If skipped, reason:
+## Hypothesis Table
+| Hypothesis | Evidence for | Evidence against | Falsification method | Expected observation | Actual observation | Status |
+| --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | pending |
+## Escalation Decision
+- Failed hypothesis count:
+- Attempted evidence:
+- Why current entry is suspect:
+- Next option: `continue-with-new-hypothesis` | `instrument-and-wait` | `human-review` | `reroute-cc-plan`
+- Evidence request:
+- Recommendation:
 ## Root Cause
 - Confirmed root cause:
+- Root cause class: `code` | `config` | `environment` | `external` | `timing`
 - Broken contract:
 - Spec diagnosis: `implementation drift` | `missing spec truth` | `roadmap mismatch`
 - Why it escaped:
+- Why not code root cause:
+- Monitoring or future evidence needed:
+- Operator handling after fix:
+- Prior history relationship: `new` | `recurring` | `same-root-cause` | `architectural-smell-candidate`
+## Correct Test Seam
+- Test seam:
+- Public interface exercised:
+- Why this seam reaches the real trigger chain:
+- Why a shallower test would be false confidence:
+- If no correct seam exists:
 ## Repair Boundary
 - Fix strategy:
+- Affected module:
+- Allowed files:
 - Files likely touched:
 - Do not change:
+- Blast radius file count:
+- Blast radius risk: `low` | `medium` | `high`
+- Split / reroute decision if >5 files:
 - Expected spec delta:
 - Verification after fix:
 - Why this can enter `cc-do`:
@@ -51,6 +157,9 @@
 ## Review Gate
 - Repro stable:
+- Feedback loop trustworthy:
+- Symptom match confirmed:
 - Root cause confirmed:
+- Correct test seam identified:
 - Repair scope still belongs to this requirement:
 - If not, reroute:
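
The new Review Gate entries ("Feedback loop trustworthy", "Symptom match confirmed") can be read as a mechanical check over the Feedback Loop Contract fields. The following is a minimal sketch under stated assumptions: the type names, the `checkFeedbackLoop` function, and the use of `null` to model "no loop" are hypothetical and are not part of cc-devflow; only the enumerated values are taken from the template.

```typescript
// Hypothetical types; only the string literals come from the template above.
type LoopType =
  | "failing-test" | "http-script" | "cli-fixture" | "browser-script"
  | "trace-replay" | "throwaway-harness" | "property-fuzz"
  | "bisect" | "differential" | "hitl";

type SymptomMatch = "yes" | "no" | "partial" | "unknown";

interface FeedbackLoop {
  loopType: LoopType | null; // assumption: null stands for "no loop could be built"
  symptomMatch: SymptomMatch;
  evidenceRequest: string;   // mirrors "If no loop, evidence request"
}

type GateResult = { ok: true } | { ok: false; reasons: string[] };

// Returns ok only when a loop exists and it matches the reported symptom.
function checkFeedbackLoop(loop: FeedbackLoop): GateResult {
  const reasons: string[] = [];
  if (loop.loopType === null) {
    reasons.push(
      loop.evidenceRequest.trim() === ""
        ? "no loop and no evidence request recorded"
        : "no loop yet: waiting on the evidence request"
    );
  } else if (loop.symptomMatch !== "yes") {
    reasons.push(`loop does not confirm the reported symptom (match: ${loop.symptomMatch})`);
  }
  return reasons.length === 0 ? { ok: true } : { ok: false, reasons };
}
```

In this sketch a `partial` match still blocks the gate. That is a deliberately strict reading; a project could choose to treat `partial` differently.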

package/.claude/skills/cc-investigate/assets/TASKS_TEMPLATE.md CHANGED

@@ -15,22 +15,34 @@
 - Canonical change meta: `change-meta.json`
 - Execution mode: `single-path` | `parallel-ready`
 - Confirmed root cause:
+- Root-cause hypothesis:
+- Feedback loop:
+- Symptom match evidence:
 - Frozen repair boundary:
+- Boundary probes:
+- Backward trace:
+- Reference comparison:
+- Allowed files:
+- Forbidden files:
+- Blast radius:
 - Capability specs:
 - Read first:
 - Commands to trust:
 - Do not re-decide:
 - Parallel boundaries:
+- Correct test seam:
+- Evidence request if blocked:
-## Phase 1: Reproduce Guard
+## Phase 1: Reproduce And Probe Guard
 - [ ] T001 [TEST] Capture the failing behavior as a stable reproduction (dependsOn:none) `path/to/test`
-  Goal: First turn the bug into a re-runnable failing fact.
+  Goal: First turn the bug into a fast, precise, re-runnable failing fact that matches the user's symptom.
   Files: `path/to/test`
   Read first: `analysis.md`, `tasks.md`
   Verification: `npm test -- path/to/test`
-  Evidence: failing output or reproducible log
-  Ready when: the reproduction path is stable
+  Evidence: failing output or reproducible log + symptom match evidence
+  Correct seam: test must exercise the real trigger chain through a public interface
+  Ready when: the feedback loop is stable and the analysis records the necessary boundary / trace / comparison evidence
 ## Phase 2: Repair
@@ -40,7 +52,7 @@
   Read first: `analysis.md`, `path/to/test`
   Verification: `npm test -- path/to/test`
   Evidence: passing output + checkpoint
-  Ready when: T001 has proven the problem exists
+  Ready when: T001 has proven the same user symptom exists and the analysis has proven the origin of the root cause
 ## Phase 3: Verify
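
The task lines in this template follow a regular shape: checkbox, task id, bracketed kind, title, `(dependsOn:...)`, and a backticked path. A rough sketch of parsing one such line is shown below. The `TaskLine` interface, the `parseTaskLine` function, and the comma-separated reading of multiple dependencies are assumptions made for illustration; the template itself only shows `dependsOn:none`.

```typescript
// Hypothetical parsed form of a task line; field names are illustrative.
interface TaskLine {
  id: string;
  kind: string;
  title: string;
  dependsOn: string[];
  path: string;
  done: boolean;
}

// Matches: - [ ] T001 [TEST] Title text (dependsOn:none) `path/to/test`
const TASK_LINE = /^- \[( |x)\] (T\d+) \[([A-Z]+)\] (.+) \(dependsOn:([^)]*)\) `([^`]+)`$/;

function parseTaskLine(line: string): TaskLine | null {
  const m = TASK_LINE.exec(line.trim());
  if (m === null) return null;
  const [, mark, id, kind, title, deps, path] = m;
  // Assumption: "none" means no dependencies; otherwise ids are comma-separated.
  const dependsOn =
    deps === "none"
      ? []
      : deps.split(",").map((d) => d.trim()).filter((d) => d !== "");
  return { id, kind, title, dependsOn, path, done: mark === "x" };
}
```

Lines that do not fit the pattern return `null` rather than throwing, so a caller can skip the indented `Goal:` / `Files:` detail lines that follow each task.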

package/.claude/skills/cc-investigate/assets/TASK_MANIFEST_TEMPLATE.json CHANGED

@@ -20,12 +20,149 @@
     ]
   },
   "planningMeta": {
-    "ccInvestigateSkillVersion": "1.0.0",
+    "ccInvestigateSkillVersion": "1.1.6",
     "analysisVersion": "analysis.v1",
     "approvedAt": "2026-04-17T12:00:00.000Z",
     "approvedBy": "user",
     "basedOnRootCause": "Root cause sentence"
   },
+  "investigationMeta": {
+    "symptomStatus": "stable",
+    "reproductionPath": "npm test -- src/feature/feature.test.ts",
+    "feedbackLoop": {
+      "loopType": "failing-test",
+      "commandOrDriver": "npm test -- src/feature/feature.test.ts",
+      "expectedFailingSignal": "The test fails with the user-reported behavior",
+      "actualFailingSignal": "Observed failure output from the current repo",
+      "symptomMatchEvidence": "Failure output matches the reported symptom, not a nearby unrelated failure",
+      "runtime": "under 10s",
+      "determinism": "deterministic",
+      "failureRate": "100%",
+      "signalSpecificity": "asserts the exact broken behavior",
+      "sharpeningPlan": "Narrow setup or assertions if the loop becomes slow or broad",
+      "evidenceRequest": ""
+    },
+    "patternAnalysis": {
+      "selectedPattern": "null propagation",
+      "ruledOutPatterns": [
+        "race condition",
+        "performance regression",
+        "configuration drift",
+        "timing guess / flaky wait"
+      ],
+      "notes": "Pattern evidence belongs in planning/analysis.md"
+    },
+    "boundaryProbes": [
+      {
+        "componentBoundary": "api -> service",
+        "inputObserved": "Request payload matches the reproduced failure",
+        "outputObserved": "Service receives invalid state",
+        "configEnvObserved": "Relevant env/config values recorded in analysis.md",
+        "stateObserved": "State snapshot or log pointer",
+        "verdict": "fail"
+      }
+    ],
+    "backwardTrace": {
+      "immediateFailureSite": "file:line or operation where the symptom appears",
+      "directCaller": "caller that passed the bad value or state",
+      "callerChain": [
+        "entrypoint",
+        "intermediate caller",
+        "failure site"
+      ],
+      "badValueOrigin": "where the invalid data/state first appears",
+      "originalTrigger": "user action, command, event, config, or dependency response that starts the chain",
+      "symptomSiteFixRejectedBecause": "Guarding only the failure site would leave the bad upstream contract intact"
+    },
+    "referenceComparison": {
+      "similarWorkingExample": "path/to/working/example",
+      "brokenPath": "path/to/broken/path",
+      "differencesFound": [
+        "Working path validates input before persistence"
+      ],
+      "differencesAcceptedAsHypothesis": [
+        "Missing validation before persistence"
+      ],
+      "differencesRuledOut": []
+    },
+    "diagnosticInstrumentation": [
+      {
+        "probeTag": "[DEBUG-FIXXXX-a4f2]",
+        "probeLocation": "file:line or component boundary",
+        "questionAnswered": "Which boundary first emits the invalid value?",
+        "commandToRun": "npm test -- src/feature/feature.test.ts",
+        "expectedSignal": "Probe records invalid value before the failure site",
+        "actualSignal": "Observed evidence from the current repo",
+        "cleanupRequirement": "Remove temporary probe or convert it into a durable assertion/log"
+      }
+    ],
+    "candidateHypotheses": [
+      {
+        "rank": 1,
+        "statement": "Specific, testable root-cause claim",
+        "whyPlausible": "Reproduction output points to the affected contract",
+        "prediction": "The failing signal disappears when that contract is restored",
+        "status": "accepted-for-testing"
+      }
+    ],
+    "priorInvestigations": [],
+    "researchEvidence": [],
+    "domainDecisionContext": {
+      "contextFilesRead": [],
+      "adrFilesRead": [],
+      "vocabularyNotes": [],
+      "adrConflicts": []
+    },
+    "rootCauseHypothesis": {
+      "statement": "Specific, testable root-cause claim",
+      "falsificationMethod": "Command, log probe, assertion, or code-path check",
+      "expectedObservation": "What should be observed if the hypothesis is true",
+      "actualObservation": "Observed evidence from the current repo",
+      "status": "confirmed"
+    },
+    "rootCauseClass": "code",
+    "noCodeRootCause": {
+      "whyNotCodeRootCause": "",
+      "monitoringOrFutureEvidenceNeeded": "",
+      "operatorHandlingAfterFix": ""
+    },
+    "hypothesisAttempts": [
+      {
+        "statement": "Specific, testable root-cause claim",
+        "status": "confirmed",
+        "evidenceFor": [
+          "Reproduction output points to the affected code path"
+        ],
+        "evidenceAgainst": [],
+        "falsificationMethod": "Run the reproduction command"
+      }
+    ],
+    "escalationDecision": {
+      "failedHypothesisCount": 0,
+      "nextOption": "cc-do",
+      "recommendation": "Repair the confirmed root cause"
+    },
+    "correctTestSeam": {
+      "testSeam": "public interface or end-to-end path that reaches the real trigger chain",
+      "publicInterfaceExercised": "CLI/API/UI behavior observed by callers",
+      "realTriggerChainCoverage": "The test enters through the same trigger path as the bug",
+      "whyShallowTestRejected": "A lower-level unit test would not prove the upstream contract",
+      "ifNoCorrectSeam": ""
+    },
+    "repairBoundary": {
+      "affectedModule": "src/feature",
+      "allowedFiles": [
+        "src/feature/feature.ts",
+        "src/feature/feature.test.ts"
+      ],
+      "forbiddenFiles": [
+        "unrelated modules"
+      ],
+      "blastRadiusFileCount": 2,
+      "blastRadiusRisk": "low",
+      "splitOrRerouteDecision": "single focused repair"
+    }
+  },
   "status": "planned",
   "designMode": "cc-investigate",
   "approvedOption": "confirmed-root-cause",
@@ -52,6 +189,7 @@
   "activePhase": 1,
   "frozenDecisions": [
     "Fix only the confirmed root cause",
+    "Use planning/analysis.md as the canonical root-cause contract",
     "Do not widen scope without rerouting to cc-plan"
   ],
   "tasks": [
@@ -71,6 +209,8 @@
       ],
       "acceptance": [
         "The target bug is reproduced as a stable failure",
+        "The failing loop matches the user-reported symptom",
+        "The regression test uses the correct seam for the real trigger chain",
         "The failure output points to the confirmed root-cause path"
       ],
       "verification": [

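The `repairBoundary` object added to the manifest lends itself to a simple pre-flight check before a repair is handed off. The sketch below assumes a hypothetical `repairBoundaryIssues` helper and a caller-supplied list of touched files; neither exists in cc-devflow as far as this diff shows. The field names match the manifest template, and the 5-file threshold comes from the template's "Split / reroute decision if >5 files" line.

```typescript
// Field names follow the manifest template; the checker itself is hypothetical.
interface RepairBoundary {
  affectedModule: string;
  allowedFiles: string[];
  forbiddenFiles: string[];
  blastRadiusFileCount: number;
  blastRadiusRisk: "low" | "medium" | "high";
  splitOrRerouteDecision: string;
}

const MAX_FILES_WITHOUT_DECISION = 5;

// Returns a list of human-readable issues; an empty list means the boundary holds.
function repairBoundaryIssues(boundary: RepairBoundary, touchedFiles: string[]): string[] {
  const issues: string[] = [];
  const allowed = new Set(boundary.allowedFiles);
  for (const file of touchedFiles) {
    if (!allowed.has(file)) {
      issues.push(`touches a file outside allowedFiles: ${file}`);
    }
  }
  if (
    boundary.blastRadiusFileCount > MAX_FILES_WITHOUT_DECISION &&
    boundary.splitOrRerouteDecision.trim() === ""
  ) {
    issues.push("more than 5 files without a split / reroute decision");
  }
  return issues;
}
```

Entries such as `"unrelated modules"` in `forbiddenFiles` are descriptive rather than literal paths, so this sketch checks only against `allowedFiles`.
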
package/.claude/skills/cc-investigate/references/investigation-contract.md CHANGED

@@ -3,6 +3,7 @@
 ## Iron Law
 - No root cause, no bug fix.
+- No frozen root-cause contract, no repair task may be generated.
 ## Minimum Evidence
@@ -10,10 +11,25 @@
 - symptom
 - reproduction path
+- feedback loop contract
+- symptom match evidence
 - expected vs actual
 - code path
 - recent change signal
+- prior investigation signal
+- boundary probe matrix, when the failure crosses components
+- backward trace chain, when the error appears below the original trigger
+- reference comparison, when a similar working path exists
+- diagnostic instrumentation plan, when probes are needed
+- pattern analysis
+- ranked candidate hypotheses
+- root-cause hypothesis
+- falsification method
 - confirmed root cause
+- correct test seam
+- root cause class
+- repair boundary
+- blast radius
 ## Output Shape
@@ -21,6 +37,182 @@
 - `planning/tasks.md` is the repair handoff
 - `planning/task-manifest.json` is the execution source of truth
+## Root-Cause Hypothesis
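+Every hypothesis must be falsifiable:
+- `candidateRank`: ranking of candidate hypotheses, to avoid anchoring on the first intuition
+- `hypothesis`: states specifically what is broken and why it causes the symptom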
+- `evidenceFor`
+- `evidenceAgainst`
+- `falsificationMethod`
+- `expectedObservation`
+- `actualObservation`
+- `status`: `pending` / `confirmed` / `rejected` / `needs-more-evidence`
+Only a `confirmed` hypothesis may enter Root Cause.
+## Feedback Loop Contract
+An investigation must first construct a trustworthy pass/fail loop:
+- `loopType`: failing-test / http-script / cli-fixture / browser-script / trace-replay / throwaway-harness / property-fuzz / bisect / differential / hitl
+- `commandOrDriver`
+- `expectedFailingSignal`
+- `actualFailingSignal`
+- `symptomMatchEvidence`
+- `runtime`
+- `determinism`
+- `failureRate`
+- `sharpeningPlan`
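+The loop must reproduce the same failure the user reported. If no loop can be constructed, the investigation may only proceed to `Evidence Request`; the root cause cannot be frozen.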
+## Pattern Analysis
+An investigation must explicitly select or rule out the common patterns:
+- race condition
+- null propagation
+- state corruption
+- integration failure
+- configuration drift
+- stale cache
+- resource leak
+- performance regression
+- trust boundary drift
+- timing guess / flaky wait
+Pattern analysis is only a lookup index, not a root cause.
+## Boundary Probe Matrix
+A multi-component chain must record the facts at every boundary:
+- `componentBoundary`
+- `inputObserved`
+- `outputObserved`
+- `configEnvObserved`
+- `stateObserved`
+- `verdict`: `pass` / `fail` / `unknown`
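+The first failing boundary determines where the next round of investigation narrows; when several boundaries fail at once, trace their common upstream first.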
+## Backward Trace Chain
+When the stack is deep or the origin of a bad value is unclear, trace back to the source:
+- immediate failure site
+- direct caller
+- caller chain
+- bad value origin
+- original trigger
+- why symptom-site fix is rejected
+If the original trigger cannot be found, the root cause cannot be frozen.
+## Reference Comparison
+When a similar working implementation exists, record:
+- similar working example
+- broken path
+- differences found
+- differences accepted as hypothesis
+- differences ruled out
+Do not skip a difference on the grounds that it is "close enough".
+## Diagnostic Instrumentation
+A temporary probe must answer one clear question:
+- probe tag
+- probe location
+- question answered
+- command to run
+- expected signal
+- actual signal
+- cleanup requirement
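+A probe is not a fix. The handoff must state whether each probe is removed, kept as a permanent log, or converted into a test assertion.
+Debug logs must carry a unique prefix, for example `[DEBUG-FIX123-a4f2]`, so that cleanup can be verified with grep.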
+## Correct Test Seam
+The repair handoff must record whether the regression test covers the real trigger chain:
+- `testSeam`
+- `publicInterfaceExercised`
+- `realTriggerChainCoverage`
+- `whyShallowTestRejected`
+- `ifNoCorrectSeam`
+When no correct seam exists, record that as an architectural fact and keep the original feedback loop as the repair verification.
+## Domain And Decision Context
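+Before investigating, read the cc-devflow native context: `devflow/specs/INDEX.md`, the relevant capability specs, the roadmap/backlog handoff, historical `planning/design.md` / `planning/analysis.md`, and `change-meta.json`.
+- Domain concepts, hypothesis names, and test names in the output use the project's existing vocabulary
+- If the root cause or repair direction violates a capability spec, roadmap decision, or historical design decision, record the conflict and the reasoning explicitly
+- Missing domain vocabulary is an investigation signal; do not invent ad hoc synonyms that mask a contract gap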
+## Prior History
+An investigation must record whether it checked:
+- `git log --oneline -20 -- <affected-files>`
+- historical `planning/analysis.md`
+- `TODOS.md` / backlog / roadmap
+- previous `report-card.json` findings
+If bugs recur in the same area, mark it as an architectural smell candidate.
+## External Research
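+External research must be sanitized:
+- Do not search for hosts, IPs, tokens, customer ids, internal paths, SQL, or private repo names
+- Search only for generic error categories, framework / library names, versions, and component names
+- A research finding can only serve as a candidate hypothesis and must be verified against this repository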
+## No Code Root Cause
+If the conclusion is not a code root cause, state clearly:
+- `rootCauseClass`: `code` / `config` / `environment` / `external` / `timing`
+- why not code root cause
+- monitoring or future evidence needed
+- operator handling after fix
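+Environment, external-service, and timing-window causes still require evidence; an insufficient investigation must not be written up as an external cause.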
+## Repair Boundary
+The repair boundary records at least:
+- affected module
+- allowed files
+- forbidden files
+- expected spec delta
+- verification after fix
+- blast radius file count
+- blast radius risk
+When more than 5 files are expected to be touched, you must split / justify / reroute.
+## Escalation
+After three failed hypotheses, stop guessing. Record:
+- failed hypothesis count
+- attempted evidence
+- why current entry is suspect
+- recommended next option: continue / instrument-and-wait / human-review / reroute-cc-plan
+- evidence request: repro env / HAR / log dump / core dump / timestamped recording / temporary production instrumentation
 ## Reroute
 - Root cause is clear and the repair boundary is well defined -> `cc-do`
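
The contract's hypothesis and escalation rules (only a `confirmed` hypothesis enters Root Cause; stop after three failed hypotheses) can be sketched as a small decision function. Everything named here is hypothetical: `decideNextStep`, the `NextStep` values, and the choice to count `rejected` entries as "failed" are assumptions for illustration, not cc-devflow behavior. The status values are the ones listed in the contract.

```typescript
type HypothesisStatus = "pending" | "confirmed" | "rejected" | "needs-more-evidence";

interface HypothesisAttempt {
  statement: string;
  status: HypothesisStatus;
}

// Hypothetical outcome labels, not identifiers from the contract.
type NextStep = "freeze-root-cause" | "continue" | "escalate";

const MAX_FAILED_HYPOTHESES = 3;

function decideNextStep(
  attempts: HypothesisAttempt[]
): { step: NextStep; failedHypothesisCount: number } {
  // Assumption: a "failed" hypothesis is one whose status is "rejected".
  const failedHypothesisCount = attempts.filter((a) => a.status === "rejected").length;
  // Assumption: a confirmed hypothesis wins even if earlier ones were rejected.
  if (attempts.some((a) => a.status === "confirmed")) {
    return { step: "freeze-root-cause", failedHypothesisCount };
  }
  if (failedHypothesisCount >= MAX_FAILED_HYPOTHESES) {
    return { step: "escalate", failedHypothesisCount };
  }
  return { step: "continue", failedHypothesisCount };
}
```

An `escalate` result corresponds to filling in the escalation record (failed hypothesis count, attempted evidence, next option, evidence request) instead of proposing a fourth guess.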

package/.claude/skills/cc-plan/CHANGELOG.md CHANGED

@@ -1,5 +1,31 @@
 # CC-Plan Skill Changelog
+## v3.7.0 - 2026-04-28
+- add glossary delta capture for canonical terms, aliases to avoid, ambiguities, and relationship constraints during context sweep
+- require non-trivial public interfaces to compare deliberately different shapes before freezing the final seam
+- mark vertical slices as `AFK` or `HITL` and require durable design / issue handoffs to describe behavior contracts instead of stale file paths
+## v3.6.2 - 2026-04-28
+- clarify that canonical language and durable decisions come from cc-devflow native sources: `devflow/specs/`, roadmap/backlog handoff, planning design/analysis, and change metadata
+- remove external context/architecture-decision files from the standard planning contract so they are not implied as generated artifacts
+- route long-lived decisions into capability spec deltas, roadmap/backlog decision notes, or the current design decision log
+## v3.6.1 - 2026-04-28
+- require plans to freeze public test seams, behavior assertions, mock boundaries, and feedback loop types before handing Red tasks to `cc-do`
+- strengthen TDD planning so Red tasks reject implementation-detail tests, internal collaborator mocks, and fake seams
+- update design, tiny-design, tasks, and manifest templates with test quality fields inherited from the TDD workflow review
+## v3.6.0 - 2026-04-28
+- absorb grilling-session discipline into native planning: one decision branch at a time, recommended answer with evidence, and no user questions when repo evidence can answer
+- require domain language and durable decision scans before naming modules, interfaces, tests, or tasks
+- add interface/deep-module checks so new public surfaces identify callers, hidden complexity, misuse risk, and alternative shapes before task split
+- strengthen test-first planning around vertical tracer bullets so tasks do not become horizontal "all tests first, all implementation later" slices
+- update design, tiny-design, tasks, and manifest templates with language handoff, interface shape, and vertical slice fields
 ## v3.5.6 - 2026-04-28
 - require non-trivial plans to compare named option roles, including minimal viable and ideal architecture, before freezing a recommendation

package/.claude/skills/cc-plan/PLAYBOOK.md CHANGED

@@ -18,14 +18,16 @@
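 5. Versions, sources, and frozen decisions must be traceable.
 6. Mechanical decisions are persisted automatically; taste decisions and user challenges must be explicitly handed to the user to decide.
 7. Finish the complete boundary within the same blast radius first; defer only cross-system or evidence-free expansion.
-8. Concrete execution plans are test-first by default; without a Red/Green/Refactor chain or a TDD exception, the plan may not be handed to `cc-do`.
+8. Concrete execution plans are test-first by default; without a Red/Green/Refactor chain, a public test seam, behavior assertions, and mock boundaries, or else a TDD exception, the plan may not be handed to `cc-do`.
 9. New change directories must use `REQ-<number>-<description>` or `FIX-<number>-<description>`; legacy lowercase directories remain read-only compatible and are no longer produced as new output.
 10. When the original requirement spans multiple independent subsystems, split it back into the roadmap / multiple REQ/FIX entries first; do not squash a grab bag into a single plan.
 11. `tiny-design` must still be approved; it is a short design, not a skipped design.
 12. Non-trivial plans must compare at least the `minimal viable` and `ideal architecture` roles; the smaller option has no built-in priority.
 13. `full-design` must freeze the implementation decision horizon and the error/rescue map so that `cc-do` does not have to improvise design on the spot.
-14. The test framework source, coverage quality, and regression tests must be spelled out at planning time; do not leave them to guesswork during execution.
+14. The test framework source, coverage quality, test seam, mock boundaries, and regression tests must be spelled out at planning time; do not leave them to guesswork during execution.
 15. UI and developer/operator-facing scope trigger their gates only when applicable; do not turn every plan into one large review checklist.
+16. Align on project language and durable decisions before naming capabilities, modules, interfaces, tests, and tasks; terminology conflicts must be surfaced explicitly.
+17. Behavior changes advance as tracer-bullet vertical slices; do not slice tasks horizontally into "test layer first, then service layer, then UI layer".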
 ## Required Outputs
@@ -63,10 +65,14 @@
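 12. `full-design` must include the implementation decision horizon and the error/rescue map; when not applicable, state the N/A reason.
 13. New artifacts, CLIs, packages, containers, and documentation entry points must have distribution and discoverability spelled out at planning time; do not wait until `cc-act` to discover that nobody can use them.
 14. Behavior-change tasks must be split into `[TEST] -> [IMPL] -> [REFACTOR]` or carry an explicit TDD exception; do not merge them into a single "implement and test" task.
-15. Regression tests cannot be deferred. When existing behavior is modified and coverage is missing, plan the regression test first.
-16. UI scope must state a design completeness score and the loading / empty / error / success / partial states.
-17. Developer/operator-facing scope must state the target persona, time to first value, magic moment, and install / run / debug / upgrade risks.
-18. The review gate blocks only issues that would cause incorrect implementation, stalled execution, scope overrun, or missing verification; wording preferences and nice-to-haves are advisory only.
+15. Behavior-change tasks must be organized as one tracer-bullet chain per observable behavior; do not batch-write red tests first and batch-implement afterwards.
+16. Regression tests cannot be deferred. When existing behavior is modified and coverage is missing, plan the regression test first.
+17. Red tasks must verify behavior on the public interface, not private functions, internal call counts, or temporary data structures.
+18. Mocks belong only at system boundaries; if a test has to mock a module you control, the seam or interface design has not been ironed out yet.
+19. When no correct seam can be found, plan an exploratory spike or a design correction first; do not pass off a fake red test as TDD.
+17. UI scope must state a design completeness score and the loading / empty / error / success / partial states.
+18. Developer/operator-facing scope must state the target persona, time to first value, magic moment, and install / run / debug / upgrade risks.
+19. The review gate blocks only issues that would cause incorrect implementation, stalled execution, scope overrun, or missing verification; wording preferences and nice-to-haves are advisory only.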
 ## Approval Flow
@@ -86,9 +92,15 @@
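 - What is the responsibility of each file that will be touched, and why does the change belong in that file rather than in a parallel location?
 - Why does the recommended option beat the other end of the `minimal viable` / `ideal architecture` spectrum?
 - Across the foundation / core / integration / polish phases, which decisions are already frozen and which remain blocked questions?
+- Does the core language follow `devflow/specs/`, the roadmap handoff, or historical design/analysis, and is there any language conflict?
+- Is each new interface a small interface over a deep module, with complexity hidden behind the right boundary?
 - For each failure path, what are the rescue action, the user-visible result, and the test evidence?
 - What is the first failing test for each new code path / user flow / error path?
+- Through which public seam does the first failing test enter the system, and what observable behavior does it assert?
+- Which dependencies may be mocked, and which internal collaborators must not be mocked?
+- Is the feedback loop an automated test, HTTP, CLI, browser, trace replay, harness, property/fuzz, differential, or HITL, and why is it the shortest trustworthy loop available right now?
 - What is the test framework source, and is existing coverage strong, happy-path-only, smoke-only, or missing?
+- Are tasks organized around end-to-end tracer bullets rather than split horizontally by layer?
 - Which production failure modes are already handled, and which are deferred to the backlog?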
 ## Design Mode Switch
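
The playbook's vertical-slice rules (one `[TEST] -> [IMPL] -> [REFACTOR]` chain per observable behavior, no batch of red tests followed by a batch of implementations) could be checked over an ordered task list roughly as follows. This is a sketch under assumptions: `PlannedTask`, its `behavior` field, and `sliceViolations` are hypothetical names, and TDD exceptions, which the playbook allows, are not modeled.

```typescript
type TaskKind = "TEST" | "IMPL" | "REFACTOR";

// Hypothetical shape: each task is tagged with the observable behavior it serves.
interface PlannedTask {
  id: string;
  kind: TaskKind;
  behavior: string;
}

function sliceViolations(tasks: PlannedTask[]): string[] {
  const violations: string[] = [];

  // Rule 1: each behavior has exactly TEST, then IMPL, then REFACTOR.
  const kindsByBehavior = new Map<string, TaskKind[]>();
  for (const task of tasks) {
    const kinds = kindsByBehavior.get(task.behavior) ?? [];
    kinds.push(task.kind);
    kindsByBehavior.set(task.behavior, kinds);
  }
  kindsByBehavior.forEach((kinds, behavior) => {
    const chain = kinds.join(" -> ");
    if (chain !== "TEST -> IMPL -> REFACTOR") {
      violations.push(`${behavior}: expected TEST -> IMPL -> REFACTOR, got ${chain}`);
    }
  });

  // Rule 2: no new red test while another behavior's red test is still unimplemented.
  let pendingRed: string | null = null;
  for (const task of tasks) {
    if (task.kind === "TEST") {
      if (pendingRed !== null && pendingRed !== task.behavior) {
        violations.push(`${task.id}: red test batched before "${pendingRed}" was implemented`);
      }
      pendingRed = task.behavior;
    } else if (task.kind === "IMPL" && pendingRed === task.behavior) {
      pendingRed = null;
    }
  }
  return violations;
}
```

A plan ordered as TEST a, TEST b, IMPL a, IMPL b passes the per-behavior chain check but is flagged by the second rule, which is the horizontal slicing the playbook rejects.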