npm - cc-devflow - Versions diffs - 4.5.0 → 4.5.2 - Mend

cc-devflow 4.5.0 → 4.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (75)

package/.claude/skills/cc-investigate/assets/ANALYSIS_TEMPLATE.md CHANGED

@@ -16,8 +16,10 @@
 - What the user saw:
 - Reproduction command / path:
+- Repro stability: `stable` | `intermittent` | `not-yet-reproduced` | `narrowed-only`
 - Expected:
 - Actual:
+- Impact / blast radius:
 ## Evidence Chain
@@ -25,25 +27,97 @@
 - Code path:
 - Recent changes:
 - Existing tests:
+- Prior investigations:
+- TODO / backlog / report-card signals:
-## Hypothesis Table
+## Boundary Probe Matrix
+| Component boundary | Input observed | Output observed | Config / env observed | State observed | Verdict |
+| --- | --- | --- | --- | --- | --- |
+| | | | | | unknown |
+## Backward Trace Chain
+- Immediate failure site:
+- Direct caller:
+- Caller chain:
+- Bad value origin:
+- Original trigger:
+- Why symptom-site fix is rejected:
+## Reference Comparison
+- Similar working example:
+- Broken path:
+- Differences found:
+- Differences accepted as hypothesis:
+- Differences ruled out:
+## Diagnostic Instrumentation Plan
+| Probe location | Question answered | Command to run | Expected signal | Actual signal | Cleanup requirement |
+| --- | --- | --- | --- | --- | --- |
+| | | | | | |
-| Hypothesis | Evidence for | Evidence against | Status |
+## Pattern Analysis
+| Pattern | Evidence checked | Status | Notes |
 | --- | --- | --- | --- |
-| | | | pending |
+| race condition | | ruled-out | |
+| null propagation | | ruled-out | |
+| state corruption | | ruled-out | |
+| integration failure | | ruled-out | |
+| configuration drift | | ruled-out | |
+| stale cache | | ruled-out | |
+| resource leak | | ruled-out | |
+| trust boundary drift | | ruled-out | |
+| timing guess / flaky wait | | ruled-out | |
+## Research Evidence
+- External research used: `yes` | `no`
+- Sanitized query:
+- Source / result:
+- Applicability:
+- Accepted into hypothesis: `yes` | `no`
+- If skipped, reason:
+## Hypothesis Table
+| Hypothesis | Evidence for | Evidence against | Falsification method | Expected observation | Actual observation | Status |
+| --- | --- | --- | --- | --- | --- | --- |
+| | | | | | | pending |
+## Escalation Decision
+- Failed hypothesis count:
+- Attempted evidence:
+- Why current entry is suspect:
+- Next option: `continue-with-new-hypothesis` | `instrument-and-wait` | `human-review` | `reroute-cc-plan`
+- Recommendation:
 ## Root Cause
 - Confirmed root cause:
+- Root cause class: `code` | `config` | `environment` | `external` | `timing`
 - Broken contract:
 - Spec diagnosis: `implementation drift` | `missing spec truth` | `roadmap mismatch`
 - Why it escaped:
+- Why not code root cause:
+- Monitoring or future evidence needed:
+- Operator handling after fix:
+- Prior history relationship: `new` | `recurring` | `same-root-cause` | `architectural-smell-candidate`
 ## Repair Boundary
 - Fix strategy:
+- Affected module:
+- Allowed files:
 - Files likely touched:
 - Do not change:
+- Blast radius file count:
+- Blast radius risk: `low` | `medium` | `high`
+- Split / reroute decision if >5 files:
 - Expected spec delta:
 - Verification after fix:
 - Why this can enter `cc-do`:
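The new Repair Boundary fields imply a simple gate: once the blast radius exceeds 5 files, a split / reroute decision has to be recorded. A minimal TypeScript sketch of that check follows. The `RepairBoundary` interface and the `checkRepairBoundary` function are hypothetical names introduced for illustration; only the field meanings and the 5-file threshold come from the template.

```typescript
// Hypothetical shape mirroring the "Repair Boundary" fields of the template.
type BlastRadiusRisk = "low" | "medium" | "high";

interface RepairBoundary {
  allowedFiles: string[];
  blastRadiusFileCount: number;
  blastRadiusRisk: BlastRadiusRisk;
  // Empty or whitespace-only means the decision was not filled in.
  splitOrRerouteDecision: string;
}

// Threshold taken from the template line "Split / reroute decision if >5 files".
const MAX_FILES_WITHOUT_DECISION = 5;

// Returns a list of problems; an empty list means the gate is satisfied.
function checkRepairBoundary(boundary: RepairBoundary): string[] {
  const problems: string[] = [];
  const overThreshold = boundary.blastRadiusFileCount > MAX_FILES_WITHOUT_DECISION;
  const decisionMissing = boundary.splitOrRerouteDecision.trim() === "";
  if (overThreshold && decisionMissing) {
    problems.push(
      `blast radius is ${boundary.blastRadiusFileCount} files but no split / reroute decision is recorded`
    );
  }
  return problems;
}
```

A boundary of exactly 5 files passes without a decision under this reading, because the template says ">5".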

package/.claude/skills/cc-investigate/assets/TASKS_TEMPLATE.md CHANGED

@@ -15,14 +15,21 @@
 - Canonical change meta: `change-meta.json`
 - Execution mode: `single-path` | `parallel-ready`
 - Confirmed root cause:
+- Root-cause hypothesis:
 - Frozen repair boundary:
+- Boundary probes:
+- Backward trace:
+- Reference comparison:
+- Allowed files:
+- Forbidden files:
+- Blast radius:
 - Capability specs:
 - Read first:
 - Commands to trust:
 - Do not re-decide:
 - Parallel boundaries:
-## Phase 1: Reproduce Guard
+## Phase 1: Reproduce And Probe Guard
 - [ ] T001 [TEST] Capture the failing behavior as a stable reproduction (dependsOn:none) `path/to/test`
   Goal: First turn the bug into a re-runnable failing fact.
@@ -30,7 +37,7 @@
   Read first: `analysis.md`, `tasks.md`
   Verification: `npm test -- path/to/test`
   Evidence: failing output or reproducible log
-  Ready when: the reproduction path is stable
+  Ready when: the reproduction path is stable and the analysis records the necessary boundary / trace / comparison evidence
 ## Phase 2: Repair
@@ -40,7 +47,7 @@
   Read first: `analysis.md`, `path/to/test`
   Verification: `npm test -- path/to/test`
   Evidence: passing output + checkpoint
-  Ready when: T001 has proven the problem exists
+  Ready when: T001 has proven the problem exists and the analysis has proven the origin of the root cause
 ## Phase 3: Verify

package/.claude/skills/cc-investigate/assets/TASK_MANIFEST_TEMPLATE.json CHANGED

@@ -20,12 +20,112 @@
     ]
   },
   "planningMeta": {
-    "ccInvestigateSkillVersion": "1.0.0",
+    "ccInvestigateSkillVersion": "1.1.4",
     "analysisVersion": "analysis.v1",
     "approvedAt": "2026-04-17T12:00:00.000Z",
     "approvedBy": "user",
     "basedOnRootCause": "Root cause sentence"
   },
+  "investigationMeta": {
+    "symptomStatus": "stable",
+    "reproductionPath": "npm test -- src/feature/feature.test.ts",
+    "patternAnalysis": {
+      "selectedPattern": "implementation drift",
+      "ruledOutPatterns": [
+        "race condition",
+        "configuration drift",
+        "timing guess / flaky wait"
+      ],
+      "notes": "Pattern evidence belongs in planning/analysis.md"
+    },
+    "boundaryProbes": [
+      {
+        "componentBoundary": "api -> service",
+        "inputObserved": "Request payload matches the reproduced failure",
+        "outputObserved": "Service receives invalid state",
+        "configEnvObserved": "Relevant env/config values recorded in analysis.md",
+        "stateObserved": "State snapshot or log pointer",
+        "verdict": "fail"
+      }
+    ],
+    "backwardTrace": {
+      "immediateFailureSite": "file:line or operation where the symptom appears",
+      "directCaller": "caller that passed the bad value or state",
+      "callerChain": [
+        "entrypoint",
+        "intermediate caller",
+        "failure site"
+      ],
+      "badValueOrigin": "where the invalid data/state first appears",
+      "originalTrigger": "user action, command, event, config, or dependency response that starts the chain",
+      "symptomSiteFixRejectedBecause": "Guarding only the failure site would leave the bad upstream contract intact"
+    },
+    "referenceComparison": {
+      "similarWorkingExample": "path/to/working/example",
+      "brokenPath": "path/to/broken/path",
+      "differencesFound": [
+        "Working path validates input before persistence"
+      ],
+      "differencesAcceptedAsHypothesis": [
+        "Missing validation before persistence"
+      ],
+      "differencesRuledOut": []
+    },
+    "diagnosticInstrumentation": [
+      {
+        "probeLocation": "file:line or component boundary",
+        "questionAnswered": "Which boundary first emits the invalid value?",
+        "commandToRun": "npm test -- src/feature/feature.test.ts",
+        "expectedSignal": "Probe records invalid value before the failure site",
+        "actualSignal": "Observed evidence from the current repo",
+        "cleanupRequirement": "Remove temporary probe or convert it into a durable assertion/log"
+      }
+    ],
+    "priorInvestigations": [],
+    "researchEvidence": [],
+    "rootCauseHypothesis": {
+      "statement": "Specific, testable root-cause claim",
+      "falsificationMethod": "Command, log probe, assertion, or code-path check",
+      "expectedObservation": "What should be observed if the hypothesis is true",
+      "actualObservation": "Observed evidence from the current repo",
+      "status": "confirmed"
+    },
+    "rootCauseClass": "code",
+    "noCodeRootCause": {
+      "whyNotCodeRootCause": "",
+      "monitoringOrFutureEvidenceNeeded": "",
+      "operatorHandlingAfterFix": ""
+    },
+    "hypothesisAttempts": [
+      {
+        "statement": "Specific, testable root-cause claim",
+        "status": "confirmed",
+        "evidenceFor": [
+          "Reproduction output points to the affected code path"
+        ],
+        "evidenceAgainst": [],
+        "falsificationMethod": "Run the reproduction command"
+      }
+    ],
+    "escalationDecision": {
+      "failedHypothesisCount": 0,
+      "nextOption": "cc-do",
+      "recommendation": "Repair the confirmed root cause"
+    },
+    "repairBoundary": {
+      "affectedModule": "src/feature",
+      "allowedFiles": [
+        "src/feature/feature.ts",
+        "src/feature/feature.test.ts"
+      ],
+      "forbiddenFiles": [
+        "unrelated modules"
+      ],
+      "blastRadiusFileCount": 2,
+      "blastRadiusRisk": "low",
+      "splitOrRerouteDecision": "single focused repair"
+    }
+  },
   "status": "planned",
   "designMode": "cc-investigate",
   "approvedOption": "confirmed-root-cause",
@@ -52,6 +152,7 @@
   "activePhase": 1,
   "frozenDecisions": [
     "Fix only the confirmed root cause",
+    "Use planning/analysis.md as the canonical root-cause contract",
     "Do not widen scope without rerouting to cc-plan"
   ],
   "tasks": [

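The `boundaryProbes` array added to the manifest template carries a `verdict` per component boundary, and the accompanying investigation contract narrows the next round of investigation to the first failing boundary. A rough TypeScript sketch of that lookup is shown here. The `BoundaryProbe` interface is trimmed to two of the template's fields, `firstFailingBoundary` is a hypothetical helper, and the sketch assumes the probes are listed in call order from the entrypoint toward the failure site, which the template itself does not state.

```typescript
type Verdict = "pass" | "fail" | "unknown";

// Subset of the template's probe fields needed for this sketch.
interface BoundaryProbe {
  componentBoundary: string;
  verdict: Verdict;
}

// Assumption: probes are ordered from the entrypoint toward the failure site,
// so the first "fail" entry is the most upstream failing boundary.
function firstFailingBoundary(probes: BoundaryProbe[]): BoundaryProbe | undefined {
  return probes.find((probe) => probe.verdict === "fail");
}

// Example data; the boundary names other than "api -> service" are made up.
const exampleProbes: BoundaryProbe[] = [
  { componentBoundary: "client -> api", verdict: "pass" },
  { componentBoundary: "api -> service", verdict: "fail" },
  { componentBoundary: "service -> store", verdict: "fail" },
];

const narrowTo = firstFailingBoundary(exampleProbes);
```

Under the ordering assumption, picking the earliest failing entry also leans toward the shared upstream when several boundaries fail at once.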
package/.claude/skills/cc-investigate/references/investigation-contract.md CHANGED

@@ -3,6 +3,7 @@
 ## Iron Law
 - No root cause, no bug fix.
+- No frozen root-cause contract, no repair task may be generated.
 ## Minimum Evidence
@@ -13,7 +14,18 @@
 - expected vs actual
 - code path
 - recent change signal
+- prior investigation signal
+- boundary probe matrix, when the failure crosses components
+- backward trace chain, when the error appears below the original trigger
+- reference comparison, when a similar working path exists
+- diagnostic instrumentation plan, when probes are needed
+- pattern analysis
+- root-cause hypothesis
+- falsification method
 - confirmed root cause
+- root cause class
+- repair boundary
+- blast radius
 ## Output Shape
@@ -21,6 +33,140 @@
 - `planning/tasks.md` is the repair handoff
 - `planning/task-manifest.json` is the execution source of truth
+## Root-Cause Hypothesis
+Every hypothesis must be falsifiable:
+- `hypothesis`: state specifically what is broken and why it produces the symptom
+- `evidenceFor`
+- `evidenceAgainst`
+- `falsificationMethod`
+- `expectedObservation`
+- `actualObservation`
+- `status`: `pending` / `confirmed` / `rejected` / `needs-more-evidence`
+Only a `confirmed` hypothesis may enter Root Cause.
+## Pattern Analysis
+The investigation must explicitly select or rule out the common patterns:
+- race condition
+- null propagation
+- state corruption
+- integration failure
+- configuration drift
+- stale cache
+- resource leak
+- trust boundary drift
+- timing guess / flaky wait
+Pattern analysis is only a search index, not a root cause.
+## Boundary Probe Matrix
+A multi-component chain must record the facts at every boundary:
+- `componentBoundary`
+- `inputObserved`
+- `outputObserved`
+- `configEnvObserved`
+- `stateObserved`
+- `verdict`: `pass` / `fail` / `unknown`
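+The first failing boundary determines where the next round of investigation narrows; when several boundaries fail at once, trace their common upstream first.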
+## Backward Trace Chain
+When the stack is deep or the origin of a bad value is unclear, trace back to the source:
+- immediate failure site
+- direct caller
+- caller chain
+- bad value origin
+- original trigger
+- why symptom-site fix is rejected
+If the original trigger cannot be found, the root cause cannot be frozen.
+## Reference Comparison
+When a similar working implementation exists, record:
+- similar working example
+- broken path
+- differences found
+- differences accepted as hypothesis
+- differences ruled out
+Do not wave away differences as "close enough".
+## Diagnostic Instrumentation
+Every temporary probe must answer one explicit question:
+- probe location
+- question answered
+- command to run
+- expected signal
+- actual signal
+- cleanup requirement
+A probe is not a fix. The handoff must state whether each probe is removed, kept as a permanent log, or converted into a test assertion.
+## Prior History
+The investigation must record whether it checked:
+- `git log --oneline -20 -- <affected-files>`
+- historical `planning/analysis.md`
+- `TODOS.md` / backlog / roadmap
+- previous `report-card.json` findings
+If bugs recur in the same area, mark it as an architectural smell candidate.
+## External Research
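+External research must be sanitized:
+- Do not search for hosts, IPs, tokens, customer ids, internal paths, SQL, or private repo names
+- Search only for generic error classes, framework / library names, versions, and component names
+- A research finding is only a candidate hypothesis and must be verified against this repository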
+## No Code Root Cause
+If the conclusion is not a code root cause, state clearly:
+- `rootCauseClass`: `code` / `config` / `environment` / `external` / `timing`
+- why not code root cause
+- monitoring or future evidence needed
+- operator handling after fix
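+Environment, external-service, and timing-window causes still need evidence; an insufficient investigation must not be written up as an external cause.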
+## Repair Boundary
+The repair boundary must record at least:
+- affected module
+- allowed files
+- forbidden files
+- expected spec delta
+- verification after fix
+- blast radius file count
+- blast radius risk
+If the repair is expected to touch more than 5 files, it must split / justify / reroute.
+## Escalation
+After three failed hypotheses, stop guessing. Record:
+- failed hypothesis count
+- attempted evidence
+- why current entry is suspect
+- recommended next option: continue / instrument-and-wait / human-review / reroute-cc-plan
 ## Reroute
 - Root cause clear, repair boundary clear -> `cc-do`
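Two of the contract's rules combine into a small decision procedure: only a `confirmed` hypothesis may become the root cause, and three failed hypotheses trigger escalation instead of further guessing. The TypeScript sketch below illustrates one possible reading. The `Hypothesis` interface, the `NextStep` labels, and the `nextStep` function are hypothetical, and counting only `rejected` entries as "failed" is an assumption rather than something the contract spells out.

```typescript
type HypothesisStatus = "pending" | "confirmed" | "rejected" | "needs-more-evidence";

interface Hypothesis {
  statement: string;
  status: HypothesisStatus;
}

// Hypothetical labels for the outcome of the gate; not names from the package.
type NextStep = "freeze-root-cause" | "keep-investigating" | "escalate";

// The contract stops further guessing after three failed hypotheses.
const MAX_FAILED_HYPOTHESES = 3;

function nextStep(attempts: Hypothesis[]): NextStep {
  // Only a confirmed hypothesis may enter Root Cause.
  if (attempts.some((h) => h.status === "confirmed")) {
    return "freeze-root-cause";
  }
  // Assumption: "failed" means status === "rejected".
  const failedCount = attempts.filter((h) => h.status === "rejected").length;
  return failedCount >= MAX_FAILED_HYPOTHESES ? "escalate" : "keep-investigating";
}
```

When the result is `escalate`, the contract's Escalation section lists the fields to record before choosing among continue / instrument-and-wait / human-review / reroute-cc-plan.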

package/.claude/skills/cc-plan/CHANGELOG.md CHANGED

@@ -1,5 +1,19 @@
 # CC-Plan Skill Changelog
+## v3.5.6 - 2026-04-28
+- require non-trivial plans to compare named option roles, including minimal viable and ideal architecture, before freezing a recommendation
+- add implementation decision horizon and error/rescue mapping so full designs resolve implementation-time ambiguity before `cc-do`
+- strengthen test-first planning with test framework evidence, coverage quality mapping, and mandatory regression tests for changed existing behavior
+- add conditional UI and DX/operator gates for design completeness, interaction states, target persona, time to first value, and magic moment
+## v3.5.5 - 2026-04-28
+- require over-broad asks to split back into roadmap stages or separate REQ/FIX candidates before detailed planning
+- clarify that `tiny-design` is a short approved design, not permission to skip the design gate
+- add implementation surface mapping so file responsibilities are locked before task decomposition
+- add review calibration so only build-blocking scope, ambiguity, verification, or execution issues fail the planning gate
 ## v3.5.4 - 2026-04-27
 - require planning outputs to resolve the runtime output policy before writing `planning/design.md`, `planning/tasks.md`, or `change-meta.json`

package/.claude/skills/cc-plan/PLAYBOOK.md CHANGED

@@ -20,6 +20,12 @@
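 7. Finish the complete boundary within the same blast radius first; defer only cross-system or evidence-free expansion.
 8. Concrete execution plans are test-first by default; without a Red/Green/Refactor chain or a TDD exception, nothing may be handed to `cc-do`.
 9. New change directories must use `REQ-<number>-<description>` or `FIX-<number>-<description>`; old lowercase directories remain read-only compatible and are no longer produced as new output.
+10. When the raw ask spans multiple independent subsystems, split it back into the roadmap / multiple REQ/FIX first; do not squash a grab bag into a single plan.
+11. `tiny-design` must still be approved; it is a short design, not a skipped design.
+12. Non-trivial designs must compare at least the `minimal viable` and `ideal architecture` roles; the small option has no built-in priority.
+13. `full-design` must freeze the implementation decision horizon and the error/rescue map so that `cc-do` does not have to improvise design on the spot.
+14. Test framework source, coverage quality, and regression tests must be written down in the planning stage; they may not be left for the execution stage to guess.
+15. UI and developer/operator-facing scope trigger their gates only when applicable; do not bloat every plan into a giant review checklist.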
 ## Required Outputs
@@ -53,8 +59,14 @@
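 8. The top of `planning/tasks.md` must state the frozen decisions, commands to trust, and do-not-re-decide items.
 9. `planning/task-manifest.json` must be the source of truth for `cc-do`, not a decorative file.
 10. `planning/design.md` must include `Existing Leverage`, `NOT in scope`, `Failure Modes`, and `Test Diagram`, unless it explicitly explains why one does not apply.
-11. New artifacts, CLIs, packages, containers, and doc entry points must have distribution and discoverability written down in the planning stage; do not wait until `cc-act` to discover that nobody can use them.
-12. Behavior-change tasks must be split into `[TEST] -> [IMPL] -> [REFACTOR]` or carry a written TDD exception; "implement and test" may not be merged into one task.
+11. `planning/design.md` or `planning/tasks.md` must include an implementation surface map: file, responsibility, ownership rationale, coupling risk.
+12. `full-design` must include the implementation decision horizon and the error/rescue map; when not applicable, state the N/A reason.
+13. New artifacts, CLIs, packages, containers, and doc entry points must have distribution and discoverability written down in the planning stage; do not wait until `cc-act` to discover that nobody can use them.
+14. Behavior-change tasks must be split into `[TEST] -> [IMPL] -> [REFACTOR]` or carry a written TDD exception; "implement and test" may not be merged into one task.
+15. Regression tests cannot be deferred. When existing behavior changes and coverage is missing, a regression test must be planned first.
+16. UI scope must record a design completeness score and the loading / empty / error / success / partial states.
+17. Developer/operator-facing scope must record target persona, time to first value, magic moment, and install / run / debug / upgrade risks.
+18. The review gate blocks only issues that would cause a wrong implementation, stuck execution, scope overrun, or missing verification; wording preferences and nice-to-haves are advisory only.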
 ## Approval Flow
@@ -71,7 +83,12 @@
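 - Which sub-problems does the existing code already solve?
 - Which files does the minimal complete solution touch, and why is there no smaller boundary?
 - How does the data flow, state flow, or execution flow run?
+- What is the responsibility of each touched file, and why does the change belong in that file rather than a parallel location?
+- Why does the recommended option beat the other end of the `minimal viable` / `ideal architecture` range?
+- Which decisions in the foundation / core / integration / polish stages are already frozen, and which are still blocked questions?
+- For each failure path, what are the rescue action, the user-visible result, and the test evidence?
 - What is the first failing test for each new code path / user flow / error path?
+- What is the test framework source, and is existing coverage strong, happy-path-only, smoke-only, or missing?
 - Which production failure modes are already handled, and which are deferred to the backlog?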
 ## Design Mode Switch

package/.claude/skills/cc-plan/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: cc-plan
-version: 3.5.4
+version: 3.5.6
 description: Use when a requirement, roadmap item, or bug needs scope clarification, design decisions, and executable task breakdown before coding starts.
 triggers:
   - 帮我规划这个需求
@@ -34,6 +34,8 @@ writes:
 entry_gate:
   - Read roadmap handoff, current requirement files, code, docs, and tests before drafting design.
   - Freeze problem, constraints, non-goals, and success criteria before proposing implementation tasks.
+  - If the raw ask spans multiple independent subsystems, split it back into roadmap stages or separate REQ/FIX candidates before asking implementation details.
+  - "For non-trivial designs, compare named option roles: minimal viable, ideal architecture, and optional hybrid. Do not default to smallest unless it best serves the goal."
   - Plan executable work as Red/Green/Refactor by default; identify the first failing test before any production implementation task, or write an explicit TDD exception with replacement evidence.
   - Assign a canonical change key before writing artifacts; feature work must use `REQ-<number>-<description>`, and bug-fix work must use `FIX-<number>-<description>`.
   - Do not generate planning/tasks.md, planning/task-manifest.json, or change-meta.json until the recommended design is approved.
@@ -106,9 +108,11 @@ tool_budget:
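 Give the default planning shape first, then explain why it is not one of the other two. The first job of `cc-plan` is not to produce documents but to flatten planning density.
+`tiny-design` is a short design, not a design exemption. Even the smallest change must state its boundary, verification, and user-approval status in `planning/design.md`; "too simple" is not a reason to skip the design gate.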
 ## Harness Contract
-- Allowed actions: clarify scope, compare designs, freeze decisions, and write only `planning/design.md`, `planning/tasks.md`, `planning/task-manifest.json`, and `change-meta.json`.
+- Allowed actions: clarify scope, compare designs, split over-broad asks into separate planning candidates, freeze decisions, and write only `planning/design.md`, `planning/tasks.md`, `planning/task-manifest.json`, and `change-meta.json`.
 - Forbidden actions: writing production code, splitting planning into new side documents, or emitting tasks before approval.
 - Required evidence: design choices, task boundaries, and verification commands must point back to repo facts or explicit user approval.
 - Reroute rule: if the problem expands to project strategy go back to `roadmap`; if the plan is already frozen move straight to `cc-do`.
@@ -166,9 +170,10 @@ tool_budget:
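 1. First confirm that the current subject is a single requirement, not the whole project roadmap.
 2. If it comes from `roadmap`, first locate the matching `RM-ID` and read the version, evidence, constraints, success signal, next decision, primary capability, and expected spec delta in `devflow/ROADMAP.md` / `devflow/BACKLOG.md`.
-3. Read the current state of the change directory first. If the old directory still contains `BRAINSTORM.md` / `PLAN_REVIEW.md` / `context-package.md`, absorb the useful information into the new `planning/design.md`; do not keep multiplying documents.
-4. Look at the code, docs, tests, and recent commits before discussing task breakdown.
-5. Write what will not be done before writing what will be done.
+3. If the raw ask contains multiple independently deliverable subsystems, split it into separate `RM` or `REQ/FIX` candidates first; do not keep asking about implementation details inside a single `cc-plan`.
+4. Read the current state of the change directory first. If the old directory still contains `BRAINSTORM.md` / `PLAN_REVIEW.md` / `context-package.md`, absorb the useful information into the new `planning/design.md`; do not keep multiplying documents.
+5. Look at the code, docs, tests, and recent commits before discussing task breakdown.
+6. Write what will not be done before writing what will be done.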
 ## Context Sweep
@@ -180,6 +185,9 @@ tool_budget:
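 4. The existing `planning/design.md`, `planning/tasks.md`, `planning/task-manifest.json`, and `change-meta.json` in the current change directory, plus historical planning documents
 5. `CLAUDE.md`, README, related docs / specs / ADRs / recent commits
 6. The real boundaries of the current code, tests, releases, migrations, and dependencies
+7. Test framework source of truth: read the testing conventions in `CLAUDE.md` / project docs first, then corroborate with config files and directory structure.
+8. If there is UI scope, read the existing design system, components, page states, and interaction patterns.
+9. If the scope is API / CLI / SDK / developer-facing / operator-facing, read the README, docs, package metadata, install / run / debug entry points, and the current first-success path.
 Compress these materials into a `Source Handoff` first, then decide between discovery and planning.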
@@ -202,6 +210,9 @@ tool_budget:
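 2. When clarifying, ask only one key question at a time; do not bombard the user with questions.
 3. Write the problem, goals, constraints, non-goals, and success criteria before writing the solution.
 4. If the direction is still unsettled, offer 2-3 options with trade-offs and a recommendation, all written into `planning/design.md`.
+   - The options in a `full-design` must include at least the `minimal viable` and `ideal architecture` roles.
+   - Both roles carry equal weight; the small option is not the default answer, and the ideal architecture is not over-engineering by default.
+   - When only one option holds up, state why the others were ruled out.
 5. Until the user has explicitly approved the recommended option, `planning/tasks.md` may not be generated.
 6. After approval, first decide whether this change uses `tiny-design` or `full-design`.
 7. Freeze the single approved option into `planning/design.md`.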
@@ -215,11 +226,19 @@ tool_budget:
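 1. Existing leverage map: map each sub-problem to existing code, scripts, specs, templates, or tests first, to avoid reinventing the wheel.
 2. Scope challenge: when the change exceeds 8 files or 2 new services/classes, or cascades across modules, explain why it is not over-engineering.
-3. Architecture diagram: cross-module or state-flow changes need an ASCII data-flow / dependency diagram.
-4. Code quality scan: point out risks in DRY, naming, error handling, branching three or more levels deep, and hidden coupling.
-5. Test diagram: list the new code paths, user flows, error paths, and boundary states, and label the first failing test and unit / e2e / eval.
-6. Performance and distribution: when batching, I/O, release artifacts, CLIs, packages, or containers are involved, state the performance and distribution boundaries.
-7. NOT in scope: everything considered but deferred needs a written reason; it must not vanish into the chat.
+3. Implementation surface map: lock down every file to be added or modified, with its responsibility, ownership rationale, and coupling risk, before breaking down tasks.
+4. Option role check: non-trivial designs must compare `minimal viable` and `ideal architecture`, add `hybrid` when needed, and state why the recommended option serves the current goal.
+5. Implementation decision horizon: write out in advance the decisions the implementer will run into during the foundation, core logic, integration, and polish/tests stages; anything that can be frozen now must not be left for `cc-do` to guess on the spot.
+6. Architecture diagram: cross-module or state-flow changes need an ASCII data-flow / dependency diagram.
+7. Error & Rescue map: `full-design` must state failure, rescue, user sees, and test evidence per codepath; when not applicable, write the N/A reason.
+8. Code quality scan: point out risks in DRY, naming, error handling, branching three or more levels deep, and hidden coupling.
+9. Test diagram: list the new code paths, user flows, error paths, and boundary states, and label the first failing test and unit / e2e / eval.
+10. Test framework source: first record which piece of evidence in `CLAUDE.md` / docs / config / directory identifies the test framework; no guessing.
+11. UI state coverage: when there is UI / interaction scope, write the loading / empty / error / success / partial state table and the design completeness score.
+12. DX / operator coverage: developer-facing / operator-facing scope must state target persona, time to first value, magic moment, and install / run / debug / upgrade risks.
+13. Performance and distribution: when batching, I/O, release artifacts, CLIs, packages, or containers are involved, state the performance and distribution boundaries.
+14. NOT in scope: everything considered but deferred needs a written reason; it must not vanish into the chat.
+15. Review calibration: only issues that would make `cc-do` build the wrong thing, get stuck, overrun scope, or miss tests are blocking; wording preferences and non-blocking suggestions must not masquerade as gate failures.
 If any item cannot be completed from current evidence, write `assumption` or `blocked question`; do not pretend it has been reviewed.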
@@ -227,15 +246,21 @@ tool_budget:
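 When `cc-plan` generates a concrete plan, it applies test-first discipline by default. The plan must not be "implement first, add tests later", leaving the TDD pressure for `cc-do` to correct on the spot.
-1. Every observable behavior change is split into `Red -> Green -> Refactor` by default:
+1. First locate the test framework source of truth:
+   - Read the testing conventions in `CLAUDE.md` / project docs first.
+   - If there are none, identify the framework from config files and directory structure: `vitest` / `jest` / `pytest` / `go test` / `cargo test` / `rspec` / `playwright` / `cypress`, and so on.
+   - If there is still no framework, record `test framework unknown` and downgrade the verification plan to an exploratory spike or manual evidence; do not pretend an automated test path already exists.
+2. Every observable behavior change is split into `Red -> Green -> Refactor` by default:
    - Red: write the `[TEST]` task first; its goal is to prove with a minimal failing test that the target behavior is missing.
    - Green: then write the `[IMPL]` task, doing only the minimal production implementation that turns the matching red test green.
    - Refactor: finally write `[REFACTOR]`, or mark an explicit refactor checkpoint in the implementation task, stating when to clean up duplication, naming, structure, and code smells.
-2. `planning/tasks.md` must not pack testing and implementation into the same task. A task that says "implement and test" is a planning failure.
-3. `planning/task-manifest.json` must let `cc-do` see each task's `tddPhase`, dependencies, and evidence: `red` tasks produce failing output, `green` tasks produce passing output, and `refactor` tasks produce green evidence from a re-run.
-4. Only pure documentation, pure configuration, pure generated files, and throwaway prototypes may be exempt. Exceptions must be written into the `TDD exceptions` of `planning/design.md` and `planning/tasks.md`, including the reason, risk, replacement verification command, and the entry point for follow-up evidence.
-5. Parallelism is allowed only after upstream Red/Green dependencies are satisfied. Two `[P]` tasks that share the same red test or the same set of touched files cannot run in parallel.
-6. If no first failing test can be found for the current requirement, write it up as a blocked question or exploratory spike first; do not disguise it as an executable implementation task.
+3. `planning/tasks.md` must not pack testing and implementation into the same task. A task that says "implement and test" is a planning failure.
+4. `planning/task-manifest.json` must let `cc-do` see each task's `tddPhase`, dependencies, and evidence: `red` tasks produce failing output, `green` tasks produce passing output, and `refactor` tasks produce green evidence from a re-run.
+5. The test diagram must cover both code paths and user flows. Label each path `unit` / `integration` / `e2e` / `eval`, and grade existing test quality: `strong`, `happy-path-only`, `smoke-only`, `missing`.
+6. Regression tests are a hard gate. Whenever the plan changes existing behavior that current tests do not cover, a regression test must be written into `planning/tasks.md`; it cannot be deferred, and the user cannot be asked whether to skip it.
+7. Only pure documentation, pure configuration, pure generated files, and throwaway prototypes may be exempt. Exceptions must be written into the `TDD exceptions` of `planning/design.md` and `planning/tasks.md`, including the reason, risk, replacement verification command, and the entry point for follow-up evidence.
+8. Parallelism is allowed only after upstream Red/Green dependencies are satisfied. Two `[P]` tasks that share the same red test or the same set of touched files cannot run in parallel.
+9. If no first failing test can be found for the current requirement, write it up as a blocked question or exploratory spike first; do not disguise it as an executable implementation task.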
 ## Design Modes
@@ -251,6 +276,8 @@ tool_budget:
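 3. Involves no migration, complex state flow, permissions, security, or rollback orchestration
 4. The executor can land the change accurately after reading a single frozen card
+`tiny-design` must still have user approval, an implementation surface, the first piece of verification evidence, and the trigger conditions for upgrading to `full-design`. It removes lengthy documents, not the design.
 If any of the following occurs, upgrade directly to `full-design`:
 1. Cross-module coordination or a multi-stage rollout is needed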
@@ -268,11 +295,15 @@ tool_budget:
 4. Ambiguity scan: after reading, the implementer must not still be guessing
 5. Feasibility scan: the design must fit the existing code, dependencies, and time boundaries
 6. Source alignment: still aligned with the upstream roadmap's success signal, constraints, and non-goals
-7. Engineering scan: complete existing leverage, scope challenge, test diagram, failure modes, and NOT in scope
-8. Final gate: make explicit the auto-decided items, taste decisions, user challenges, and the final recommendation
+7. Engineering scan: complete existing leverage, scope challenge, implementation surface, test diagram, failure modes, and NOT in scope
+8. Decision horizon scan: whether the implementation decisions for foundation / core / integration / polish/tests are frozen or explicitly blocked.
+9. Error & rescue scan: whether `full-design` states failure -> rescue -> user sees -> test evidence.
+10. Test framework / regression scan: whether the test framework source, coverage quality, and regression tests are explicit.
+11. Review calibration: mark as blocking only issues that would cause a wrong implementation, stuck execution, scope overrun, or missing verification; non-blocking suggestions must be downgraded to advisory
+12. Final gate: make explicit the auto-decided items, taste decisions, user challenges, and the final recommendation
-If there is clear UI / interaction scope, add a design review conclusion to `planning/design.md`.
-If there is API / CLI / developer-facing scope, add a DX review conclusion to `planning/design.md`.
+If there is clear UI / interaction scope, add the design completeness score and state coverage table to `planning/design.md`.
+If there is API / CLI / developer-facing / operator-facing scope, add target persona, time to first value, magic moment, and the DX / operator review conclusion to `planning/design.md`.
 ## Good Output
@@ -305,6 +336,7 @@ tool_budget:
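 7. Concrete plans are test-first by default; without Red/Green/Refactor or a TDD exception, they cannot enter `cc-do`.
 8. Once a task exceeds 2-5 minute granularity, keep splitting until it can be handed to an executor reliably.
 9. Conditionals three or more levels deep mean the design is not yet flat; go back to `planning/design.md` and keep simplifying.
+10. `tiny-design` must not be treated as "approval-free"; whenever tasks are to be written, an approved design card must exist first.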
 ## Exit Criteria
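The test-first rules in this SKILL.md diff forbid running two `[P]` tasks in parallel when they share a red test or touched files. A possible TypeScript check for that rule is sketched below. The `PlannedTask` shape, including the `redTestId` field, and the `parallelConflicts` function are hypothetical and are not the schema of `planning/task-manifest.json`. The sketch also treats any single overlapping file as a conflict, which is a stricter reading than "the same set of touched files".

```typescript
// Hypothetical task shape for illustration only.
interface PlannedTask {
  id: string;
  parallel: boolean; // true when the task carries the [P] marker
  redTestId: string; // id of the red test this task builds on (assumed field)
  touchedFiles: string[];
}

// Returns every pair of [P] tasks that must not run in parallel.
function parallelConflicts(tasks: PlannedTask[]): Array<[string, string]> {
  const parallelTasks = tasks.filter((task) => task.parallel);
  const conflicts: Array<[string, string]> = [];
  parallelTasks.forEach((a, index) => {
    // Compare each task only with the tasks after it, so pairs are reported once.
    parallelTasks.slice(index + 1).forEach((b) => {
      const sharesRedTest = a.redTestId === b.redTestId;
      const sharesFile = a.touchedFiles.some((file) => b.touchedFiles.indexOf(file) !== -1);
      if (sharesRedTest || sharesFile) {
        conflicts.push([a.id, b.id]);
      }
    });
  });
  return conflicts;
}
```

An empty result means that no pair of `[P]` tasks collides under this reading; it does not confirm that the upstream Red/Green dependencies are satisfied, which the same rule also requires.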