npm - cc-devflow - Versions diffs - 4.5.1 → 4.5.3 - Mend

cc-devflow 4.5.1 → 4.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (81)

package/.claude/skills/cc-plan/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: cc-plan
-version: 3.5.6
+version: 3.7.0
 description: Use when a requirement, roadmap item, or bug needs scope clarification, design decisions, and executable task breakdown before coding starts.
 triggers:
   - 帮我规划这个需求
@@ -33,6 +33,7 @@ writes:
     required: true
 entry_gate:
   - Read roadmap handoff, current requirement files, code, docs, and tests before drafting design.
+  - Load cc-devflow native language and decision sources (`devflow/specs/`, roadmap/backlog handoff, current or prior `planning/design.md` / `planning/analysis.md`, and `change-meta.json`) before naming concepts, modules, tests, or tasks.
   - Freeze problem, constraints, non-goals, and success criteria before proposing implementation tasks.
   - If the raw ask spans multiple independent subsystems, split it back into roadmap stages or separate REQ/FIX candidates before asking implementation details.
   - "For non-trivial designs, compare named option roles: minimal viable, ideal architecture, and optional hybrid. Do not default to smallest unless it best serves the goal."
@@ -173,7 +174,8 @@ tool_budget:
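 3. If the original request contains multiple independently deliverable subsystems, first split it into separate `RM` or `REQ/FIX` candidates; do not keep probing implementation details within a single `cc-plan`.
 4. Read the current state of the change directory first. If the old directory still contains `BRAINSTORM.md` / `PLAN_REVIEW.md` / `context-package.md`, absorb the useful information into the new `planning/design.md`; do not let these files keep multiplying.
 5. Look at the code, docs, tests, and recent commits before discussing task breakdown.
-6. Write what will not be done before writing what will be done.
+6. Read cc-devflow's native project language and decision context first: `devflow/specs/INDEX.md`, related capability specs, roadmap/backlog handoff, current or historical `planning/design.md` / `planning/analysis.md`, and `change-meta.json`; silently skip any that do not exist, but any terminology conflict found must be written up as a blocked question or user challenge.
+7. Write what will not be done before writing what will be done.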
 ## Context Sweep
@@ -182,12 +184,14 @@ tool_budget:
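 1. The `RM-ID`, roadmap version, and roadmap skill version for the current item
 2. The item's stage source, evidence, dependencies, success signal, kill signal, next decision, and capability links in `devflow/ROADMAP.md` / `devflow/BACKLOG.md`
 3. `devflow/specs/INDEX.md` and related capability specs
-4. Existing `planning/design.md`, `planning/tasks.md`, `planning/task-manifest.json`, `change-meta.json`, and historical planning documents in the current change directory
-5. `CLAUDE.md`, README, related docs / specs / ADR / recent commits
-6. The real boundaries of the current code, tests, releases, migrations, and dependencies
-7. Source of truth for the test framework: read the testing conventions in `CLAUDE.md` / project docs first, then corroborate with config files and directory structure.
-8. If there is UI scope, read the existing design system, components, page states, and interaction patterns.
-9. If the scope is API / CLI / SDK / developer-facing / operator-facing, read the README, docs, package metadata, install / run / debug entry points, and the current first-success path.
+4. Project language / decision context: `devflow/specs/INDEX.md`, related capability specs, roadmap/backlog handoff, current or historical `planning/design.md` / `planning/analysis.md`, and `change-meta.json`
+5. Existing `planning/design.md`, `planning/tasks.md`, `planning/task-manifest.json`, `change-meta.json`, and historical planning documents in the current change directory
+6. `CLAUDE.md`, README, related docs / specs / recent commits
+7. The real boundaries of the current code, tests, releases, migrations, and dependencies
+8. Source of truth for the test framework: read the testing conventions in `CLAUDE.md` / project docs first, then corroborate with config files and directory structure.
+9. If there is UI scope, read the existing design system, components, page states, and interaction patterns.
+10. If the scope is API / CLI / SDK / developer-facing / operator-facing, read the README, docs, package metadata, install / run / debug entry points, and the current first-success path.
+11. If the existing language is still muddled, write a minimal glossary delta: canonical term, aliases to avoid, flagged ambiguity, and relationship constraints; record only domain or capability concepts, not short-lived class names.
 Compress these materials into a `Source Handoff` first, then decide between discovery and planning.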
@@ -201,9 +205,22 @@ tool_budget:
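 4. Narrowest wedge: what is the smallest deliverable boundary, and which problems in the same blast radius must be fixed along the way?
 5. Observation: are there logs, tests, real workflows, or recent commits that prove this problem exists?
 6. Future fit: will this still be the right boundary in 6 months, or will it create a second system?
+7. Language fit: are the core nouns used here already canonical terms in the project, or are they creating a second language?
+8. Interface fit: what is the smallest public interface the caller actually needs, and which complexity should be hidden inside the module?
 Ask about only one key unknown at a time. Do not ask the user anything that can be confirmed from code, docs, tests, or git history.
+## Grilling Protocol
+`cc-plan` may absorb the conclusions of brainstorm / grilling, but it no longer produces a standalone `BRAINSTORM.md`. When digging into a problem, follow these rules:
+1. Walk the decision tree one branch at a time. Each step resolves only one key branch that would change the design or the task split.
+2. Every question must come with a recommended answer, the evidence source, and which downstream decisions are affected if the user disagrees.
+3. When the answer can be obtained from code, docs, tests, git history, capability specs, roadmap handoff, or historical design/analysis, verify it first instead of asking the user.
+4. Vague words from the user or the docs must be pinned to a canonical term; if that conflicts with `devflow/specs/`, roadmap/backlog, or historical design/analysis, immediately flag it as a `language conflict`.
+5. Concrete scenarios take priority over abstract concepts. Stress-test every key boundary with at least one real codepath, user flow, operator flow, or failure path.
+6. Only decisions that meet all three conditions (hard to reverse, surprising without context, real trade-off) should be proposed for a capability spec delta or a roadmap/backlog decision note; otherwise they stay in this change's design decision log.
 ## Session Protocol
 1. Explore the context first, then write conclusions.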
@@ -228,17 +245,23 @@ tool_budget:
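 2. Scope challenge: when the change exceeds 8 files, 2 new services/classes, or cascades across modules, explain why it is not over-engineering.
 3. Implementation surface map: lock down every file to be added or modified, its responsibility, its ownership rationale, and its coupling risk before breaking down tasks.
 4. Option role check: non-trivial proposals must compare `minimal viable` and `ideal architecture`, adding `hybrid` when needed, and state why the recommended option serves the current goal.
-5. Implementation decision horizon: write down in advance the decisions implementers will hit in the foundation, core logic, integration, and polish/tests phases; freeze whatever can be frozen now rather than leaving `cc-do` to guess on the spot.
-6. Architecture diagram: cross-module or state-flow changes require an ASCII data-flow / dependency diagram.
-7. Error & Rescue map: `full-design` must spell out failure, rescue, user sees, and test evidence per codepath; where not applicable, give an N/A reason.
-8. Code quality scan: point out risks in DRY, naming, error handling, branches nested three or more levels deep, and hidden coupling.
-9. Test diagram: list new code paths, user flows, error paths, and boundary states, and mark the first failing test and unit / e2e / eval.
-10. Test framework source: first record which evidence from `CLAUDE.md` / docs / config / directory identifies the test framework; do not guess.
-11. UI state coverage: when there is UI / interaction scope, write a loading / empty / error / success / partial state table and a design completeness score.
-12. DX / operator coverage: developer-facing / operator-facing scope must state target persona, time to first value, magic moment, and install / run / debug / upgrade risks.
-13. Performance and distribution: when batches, I/O, release artifacts, CLI, packages, or containers are involved, spell out the performance and distribution boundaries.
-14. NOT in scope: everything considered but deferred must have a written reason; it must not vanish into chat.
-15. Review calibration: only issues that would cause `cc-do` to build the wrong thing, get stuck, overstep scope, or miss tests are blocking; wording preferences and non-blocking suggestions must not be dressed up as gate failures.
+5. Domain language check: core nouns, file names, test names, and task titles must align with `devflow/specs/`, roadmap handoff, or historical design/analysis; when there is no source, write an assumption instead of inventing a second language on the fly.
+6. Interface depth check: when adding or changing a module / API / CLI / SDK, first state the callers, public operations, hidden complexity, and likely misuse points; for a non-trivial public interface, compare at least two deliberately different shapes, such as `minimal/common-case` versus `flexible/general-purpose`, then explain why the final shape is deeper and harder to misuse.
+7. Implementation decision horizon: write down in advance the decisions implementers will hit in the foundation, core logic, integration, and polish/tests phases; freeze whatever can be frozen now rather than leaving `cc-do` to guess on the spot.
+8. Architecture diagram: cross-module or state-flow changes require an ASCII data-flow / dependency diagram.
+9. Error & Rescue map: `full-design` must spell out failure, rescue, user sees, and test evidence per codepath; where not applicable, give an N/A reason.
+10. Code quality scan: point out risks in DRY, naming, error handling, branches nested three or more levels deep, and hidden coupling.
+11. Test diagram: list new code paths, user flows, error paths, and boundary states, and mark the first failing test and unit / e2e / eval.
+12. Test seam check: every Red task must state which public interface, caller flow, or user-visible path proves the behavior; if only private functions, internal call counts, or temporary structures can be tested, change the design first or write a blocked question.
+13. Mock boundary check: mock only system boundaries, such as external APIs, time, randomness, the filesystem, and necessary database boundaries; do not mock internal modules you control.
+14. Feedback loop check: pick the shortest trustworthy feedback loop for each behavior, in this order of preference: automated test, curl/HTTP, CLI+fixture, browser script, trace replay, throwaway harness, property/fuzz, differential loop, HITL script.
+15. Test framework source: first record which evidence from `CLAUDE.md` / docs / config / directory identifies the test framework; do not guess.
+16. UI state coverage: when there is UI / interaction scope, write a loading / empty / error / success / partial state table and a design completeness score.
+17. DX / operator coverage: developer-facing / operator-facing scope must state target persona, time to first value, magic moment, and install / run / debug / upgrade risks.
+18. Performance and distribution: when batches, I/O, release artifacts, CLI, packages, or containers are involved, spell out the performance and distribution boundaries.
+19. NOT in scope: everything considered but deferred must have a written reason; it must not vanish into chat.
+20. Review calibration: only issues that would cause `cc-do` to build the wrong thing, get stuck, overstep scope, or miss tests are blocking; wording preferences and non-blocking suggestions must not be dressed up as gate failures.
+21. Durable brief check: design summaries, PRD-style descriptions, and issue / follow-up handoffs state only behavior, contracts, module responsibilities, and acceptance criteria; do not treat perishable file paths, line numbers, or current implementation details as long-term facts.
 If any item cannot be completed from current evidence, write an `assumption` or `blocked question`; do not pretend it has been reviewed.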
@@ -250,17 +273,24 @@ tool_budget:
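    - Prefer the testing conventions in `CLAUDE.md` / project docs.
    - If there are none, identify the framework from config files and directory structure: `vitest` / `jest` / `pytest` / `go test` / `cargo test` / `rspec` / `playwright` / `cypress`, etc.
    - If there is still no framework, record `test framework unknown` and downgrade the verification plan to an exploratory spike or manual evidence; do not pretend an automated test path already exists.
-2. Every observable behavior change is split into `Red -> Green -> Refactor` by default:
+2. Freeze the test seam and behavior assertions first:
+   - Red must prove the behavior is missing through a public interface, caller flow, CLI/API/UI path, or another real boundary.
+   - Test names, assertions, and fixtures must describe behavior the user / caller cares about, not internal implementation steps.
+   - If the right seam does not exist, plan an exploratory spike or an architecture follow-up first; do not pass off brittle unit tests as regression protection.
+3. Every observable behavior change is split into `Red -> Green -> Refactor` by default:
    - Red: write the `[TEST]` task first; the goal is to prove the target behavior is missing with the smallest failing test.
    - Green: then write the `[IMPL]` task, doing only the minimal production implementation that turns the corresponding red test green.
    - Refactor: finally write a `[REFACTOR]` task, or define an explicit refactor checkpoint inside the implementation task, stating when duplication, naming, structure, and code smells get cleaned up.
-3. `planning/tasks.md` must not put testing and implementation in the same task. A task that says "implement and test" is a planning failure.
-4. `planning/task-manifest.json` must let `cc-do` see each task's `tddPhase`, dependencies, and evidence: `red` tasks produce failing output, `green` tasks produce passing output, and `refactor` tasks produce green evidence from a rerun.
-5. The test diagram must cover both code paths and user flows. Label each path `unit` / `integration` / `e2e` / `eval`, and grade existing test quality: `strong`, `happy-path-only`, `smoke-only`, `missing`.
-6. Regression tests are a hard gate. Whenever the plan modifies existing behavior that existing tests do not cover, the regression test must be written into `planning/tasks.md`; it cannot be deferred, and the user must not be asked whether to skip it.
-7. Only pure docs, pure config, pure generated files, and throwaway prototypes may be exempt. Exemptions must be recorded under `TDD exceptions` in `planning/design.md` and `planning/tasks.md`, including the reason, risk, replacement verification command, and where follow-up evidence will be added.
-8. Parallelism is allowed only after upstream Red/Green dependencies are satisfied. Two `[P]` tasks that share the same red test or the same set of touched files cannot run in parallel.
-9. If no first failing test can be found for the current requirement, write it up as a blocked question or exploratory spike first; do not disguise it as an executable implementation task.
+4. Horizontal slicing is forbidden: do not write a batch of tests first and then a batch of implementations. The plan must be ordered as tracer-bullet vertical slices: one behavior goes red -> minimal implementation turns it green -> necessary refactor, and only then the next behavior.
+5. `planning/tasks.md` must not put testing and implementation in the same task. A task that says "implement and test" is a planning failure.
+6. Every `[TEST]` task in `planning/tasks.md` must state its test seam, behavior asserted, allowed mocks, feedback loop type, and implementation-detail risk.
+7. `planning/task-manifest.json` must let `cc-do` see each task's `tddPhase`, dependencies, test quality boundaries, and evidence: `red` tasks produce failing output, `green` tasks produce passing output, and `refactor` tasks produce green evidence from a rerun.
+8. The test diagram must cover both code paths and user flows. Label each path `unit` / `integration` / `e2e` / `eval`, and grade existing test quality: `strong`, `happy-path-only`, `smoke-only`, `missing`.
+9. Regression tests are a hard gate. Whenever the plan modifies existing behavior that existing tests do not cover, the regression test must be written into `planning/tasks.md`; it cannot be deferred, and the user must not be asked whether to skip it.
+10. Only pure docs, pure config, pure generated files, and throwaway prototypes may be exempt. Exemptions must be recorded under `TDD exceptions` in `planning/design.md` and `planning/tasks.md`, including the reason, risk, replacement verification command, and where follow-up evidence will be added.
+11. Parallelism is allowed only after upstream Red/Green dependencies are satisfied. Two `[P]` tasks that share the same red test or the same set of touched files cannot run in parallel.
+12. If no first failing test can be found for the current requirement, write it up as a blocked question or exploratory spike first; do not disguise it as an executable implementation task.
+13. Every vertical slice must be labeled `AFK` or `HITL`: `AFK` means the executor can complete and verify it independently under the existing contract; `HITL` means it still needs user judgment, external permissions, design trade-offs, or manual acceptance. Split down to `AFK` by default, and keep `HITL` only when evidence shows human involvement is required.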
 ## Design Modes
@@ -299,8 +329,14 @@ tool_budget:
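 8. Decision horizon scan: whether implementation decisions for foundation / core / integration / polish/tests are frozen or explicitly blocked.
 9. Error & rescue scan: whether `full-design` spells out failure -> rescue -> user sees -> test evidence.
 10. Test framework / regression scan: whether the test framework source, coverage quality, and regression tests are explicit.
-11. Review calibration: mark as blocking only issues that would cause implementation errors, stalled execution, scope overreach, or missing verification; non-blocking suggestions must be downgraded to advisory
-12. Final gate: state the auto-decided items, taste decisions, user challenges, and final recommendation
+11. Test seam / mock boundary scan: whether Red tasks prove behavior through a public seam, whether mocks occur only at system boundaries, and whether the feedback loop is repeatable.
+12. Domain language scan: whether core nouns, test names, and file responsibilities follow the project language; whether conflicts are written up as a blocked question / user challenge.
+13. Interface depth scan: whether new interfaces are small enough, whether hidden complexity is deep enough, and whether callers can easily use them correctly and not easily misuse them; whether non-trivial interfaces have been compared in at least two shapes.
+14. Tracer bullet scan: whether tasks are organized as one Red/Green/Refactor chain per behavior rather than stacked horizontally by test layer, service layer, and UI layer.
+15. Slice readiness scan: whether each slice can be demoed / verified independently, and whether it states `AFK` / `HITL`, dependencies, and blocking reasons.
+16. Durable handoff scan: whether design / issue / follow-up text is expressed in terms of behavior and contracts, without treating current file line numbers as long-term truth.
+17. Review calibration: mark as blocking only issues that would cause implementation errors, stalled execution, scope overreach, or missing verification; non-blocking suggestions must be downgraded to advisory
+18. Final gate: state the auto-decided items, taste decisions, user challenges, and final recommendation
 If there is clear UI / interaction scope, add a design completeness score and a state coverage table to `planning/design.md`.
 If there is API / CLI / developer-facing / operator-facing scope, add target persona, time to first value, magic moment, and the DX / operator review conclusion to `planning/design.md`.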
@@ -308,8 +344,9 @@ tool_budget:
 ## Good Output
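 - A single `planning/design.md` explains it all: why, what is being done, what is not, alternatives, the approved option, design mode, risks, review gate, and execution boundaries
-- `planning/tasks.md` keeps only directly executable tasks and handoff, and no longer carries repeated background; behavior changes are split into `[TEST] -> [IMPL] -> [REFACTOR]` by default
-- `planning/task-manifest.json` is the source of truth for `cc-do`; it must state `dependsOn`, `tddPhase`, parallel eligibility, touchpoints, verification commands, and which roadmap / design / spec version it inherits
+- `planning/design.md` must use the project's canonical language, record related capability spec / roadmap decision conflicts, and explain how new interfaces keep a small interface over a deep module
+- `planning/tasks.md` keeps only directly executable tasks and handoff, and no longer carries repeated background; behavior changes are split by default into tracer-bullet `[TEST] -> [IMPL] -> [REFACTOR]` chains, and each Red task states its public seam, behavior assertion, mock boundary, and feedback loop
+- `planning/task-manifest.json` is the source of truth for `cc-do`; it must state `dependsOn`, `tddPhase`, `verticalSlice`, test seam, allowed mocks, feedback loop, parallel eligibility, touchpoints, verification commands, and which roadmap / design / spec version it inherits
 - `change-meta.json` is the source of truth for capabilities; it must state how this change intends to alter the long-term spec
 - After reading the first screen, the executor knows whether this is `tiny-design` or `full-design`, and why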
@@ -334,9 +371,10 @@ tool_budget:
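 5. Versions, sources, and frozen decisions must be traceable.
 6. Few, solid tasks beat many vague ones.
 7. Concrete plans are test-first by default; without Red/Green/Refactor or a TDD exception, the plan cannot enter `cc-do`.
-8. Once a task exceeds 2-5 minute granularity, keep splitting until it can be handed to the executor reliably.
-9. Three or more levels of conditionals mean the design has not been flattened; go back to `planning/design.md` and keep simplifying.
-10. `tiny-design` must not be treated as "approval-free"; whenever tasks are to be written, an approved design card must come first.
+8. Tasks must be end-to-end verifiable vertical slices; unless it is a pure refactor, do not split by horizontal layers such as "model first, then service, then UI".
+9. Once a task exceeds 2-5 minute granularity, keep splitting until it can be handed to the executor reliably.
+10. Three or more levels of conditionals mean the design has not been flattened; go back to `planning/design.md` and keep simplifying.
+11. `tiny-design` must not be treated as "approval-free"; whenever tasks are to be written, an approved design card must come first.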
 ## Exit Criteria
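
The tracer-bullet and parallelism rules in the SKILL.md diff above (no horizontal slicing; two `[P]` tasks must not share touched files) can be sketched as a small plan checker. This is a minimal illustration, not part of cc-devflow: the `PlannedTask` shape and both function names are hypothetical, the field names only mirror `tddPhase`, `verticalSlice`, and `touches` from the manifest, and the ordering check takes a strict sequential reading that does not account for legitimately parallel `[P]` slices.

```typescript
// Hypothetical task shape; only the field names come from the manifest diff.
type TddPhase = "red" | "green" | "refactor" | "exception";

interface PlannedTask {
  id: string;
  tddPhase: TddPhase;
  verticalSlice: string;
  touches: string[];
}

// Returns ids of red tasks that start while another slice is still red,
// i.e. the "all tests first, all code later" shape the plan forbids.
function findHorizontalSlicing(tasks: PlannedTask[]): string[] {
  const offenders: string[] = [];
  const openSlices = new Set<string>(); // slices that are red but not yet green
  for (const task of tasks) {
    if (task.tddPhase === "red") {
      const pending = Array.from(openSlices).filter(
        (slice) => slice !== task.verticalSlice,
      );
      if (pending.length > 0) offenders.push(task.id);
      openSlices.add(task.verticalSlice);
    } else if (task.tddPhase === "green") {
      openSlices.delete(task.verticalSlice);
    }
  }
  return offenders;
}

// Two tasks that share any touched file should not both be marked [P].
function sharesTouchedFiles(a: PlannedTask, b: PlannedTask): boolean {
  const touched = new Set(a.touches);
  return b.touches.some((file) => touched.has(file));
}
```

A plan ordered red, green, red, green across two slices yields no offenders, while red, red, green, green flags the second red task.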

package/.claude/skills/cc-plan/assets/DESIGN_TEMPLATE.md CHANGED

@@ -40,6 +40,16 @@
 - Intentional gaps:
 - Spec sync target:
+## Domain Language & Durable Decisions
+- Language sources loaded:
+- Canonical terms used:
+- Terms avoided / aliases:
+- Language conflicts:
+- Native decision sources loaded:
+- Capability spec / roadmap decision conflicts:
+- Decisions worth long-term capability spec sync:
 ## Requirement Snapshot
 - Raw ask:
@@ -101,6 +111,14 @@
 - Error handling:
 - Rollout / migration:
+## Interface / Deep Module Check
+| Surface | Callers | Public operations | Complexity hidden | Misuse risk | Shape decision |
+|---------|---------|-------------------|-------------------|-------------|----------------|
+|  |  |  |  |  |  |
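+> When adding or changing a public interface, prefer a small interface over a deep module. If there are two reasonable shapes, state why the other was not chosen.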
 ## Implementation Decision Horizon
 | Phase | Decision `cc-do` would otherwise hit | Frozen answer | Evidence / owner |
@@ -142,6 +160,11 @@
 - Test framework source:
 - First failing tests:
+- Test seams / public interfaces:
+- Behavior assertions:
+- Mock boundaries:
+- Feedback loop types:
+- Tracer bullet order:
 - Red/Green/Refactor task chain:
 - TDD exceptions:
 - Regression tests required:
@@ -154,9 +177,9 @@
 ## Test Coverage Map
-| Code path / user flow | Existing coverage | Quality | Required test | Level | Regression? |
-|-----------------------|-------------------|---------|---------------|-------|-------------|
-|  |  | strong / happy-path-only / smoke-only / missing |  | unit / integration / e2e / eval | Yes / No |
+| Code path / user flow | Public seam | Behavior asserted | Existing coverage | Quality | Required test | Level | Mock boundary | Implementation-detail risk | Regression? |
+|-----------------------|-------------|-------------------|-------------------|---------|---------------|-------|---------------|----------------------------|-------------|
+|  |  |  |  | strong / happy-path-only / smoke-only / missing |  | unit / integration / e2e / eval | none / system boundary | low / medium / high | Yes / No |
 ## Error & Rescue Map
@@ -200,10 +223,14 @@
 - Ambiguity scan:
 - Feasibility scan:
 - Source alignment:
+- Domain language scan:
 - Implementation surface scan:
+- Interface depth scan:
 - Decision horizon scan:
 - Error & rescue scan:
 - Test framework / regression scan:
+- Test seam / mock boundary scan:
+- Tracer bullet scan:
 - UI / interaction review summary:
 - DX / operator review summary:
 - Test-first readiness:
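
The "Test seams / public interfaces" and "Mock boundaries" fields added to this template can be illustrated with a test that replaces only a system boundary (time) and asserts through a public method. The sketch below is an assumption-laden example: `Clock`, `SessionStore`, and `FakeClock` are hypothetical names invented for illustration and do not come from cc-devflow.

```typescript
// Hypothetical module for illustration; none of these names come from cc-devflow.
interface Clock {
  now(): Date;
}

class SessionStore {
  private readonly expiresAt = new Map<string, number>();
  private readonly clock: Clock;
  private readonly ttlMs: number;

  constructor(clock: Clock, ttlMs: number) {
    this.clock = clock;
    this.ttlMs = ttlMs;
  }

  // Public operation: callers never see how expiry is stored.
  start(sessionId: string): void {
    this.expiresAt.set(sessionId, this.clock.now().getTime() + this.ttlMs);
  }

  // Public seam a Red test would assert against.
  isActive(sessionId: string): boolean {
    const expiry = this.expiresAt.get(sessionId);
    return expiry !== undefined && this.clock.now().getTime() < expiry;
  }
}

// Test double for time only: the one system boundary this behavior depends on.
class FakeClock implements Clock {
  private current: number;

  constructor(startMs: number) {
    this.current = startMs;
  }

  now(): Date {
    return new Date(this.current);
  }

  advance(ms: number): void {
    this.current += ms;
  }
}
```

A test built this way asserts "a session is inactive once its TTL has elapsed" through `isActive`, so it keeps passing if the internal `Map` is later replaced by another structure.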

package/.claude/skills/cc-plan/assets/TASKS_TEMPLATE.md CHANGED

@@ -17,10 +17,15 @@
 - Execution mode: `single-path` | `parallel-ready`
 - Frozen decisions:
 - Capability specs:
+- Canonical language / terms:
 - Read first:
 - Commands to trust:
 - Test framework source:
+- Test seam policy: Red tasks verify behavior through public interfaces, caller flows, CLI/API/UI paths, or other real seams.
+- Mock boundary policy: mock only system boundaries; do not mock internal collaborators owned by this codebase.
+- Feedback loop ladder: automated test -> HTTP/curl -> CLI fixture -> browser script -> trace replay -> harness -> property/fuzz -> differential -> HITL.
 - TDD plan: `Red -> Green -> Refactor`
+- Tracer bullet plan: one observable behavior at a time; no horizontal "all tests first, all code later" slice
 - TDD exceptions: none | list exception reason, risk, replacement evidence, follow-up
 - Regression tests: required | not applicable, with reason
 - Do not re-decide:
@@ -36,6 +41,14 @@
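 > This table is the execution boundary, not decoration. Task splitting must follow these responsibilities; `cc-do` must not re-cut file ownership on the spot.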
+## Tracer Bullet Map
+| Slice | Observable behavior | Public test seam | Feedback loop | Red task | Green task | Refactor / evidence | Why vertical |
+|-------|---------------------|------------------|---------------|----------|------------|---------------------|--------------|
+| Slice 1 |  |  | automated test | T001 | T002 | T005 |  |
+> Every slice must independently prove one end-to-end behavior; do not cut horizontally as "data layer only / UI layer only".
 ## Phase 1: Foundation
 - [ ] T001 [TEST] Write the first failing test (dependsOn:none) `path/to/test`
@@ -46,6 +59,11 @@
   Verification: `npm test -- path/to/test`
   Evidence: failing output
   Coverage: unit / integration / e2e / eval; regression: yes / no
+  Test seam: public interface / caller flow / CLI / API / UI / trace replay / harness
+  Behavior asserted: describe behavior observable to the user or caller, not internal implementation steps
+  Allowed mocks: none / external API / time / randomness / filesystem / database boundary
+  Test quality guard: no private methods, no internal call-count assertions, no internal collaborator mocks
+  Vertical slice: Slice 1
   Ready when: no upstream dependencies, and the test path is settled
 - [ ] T002 [IMPL] Make the first test pass (dependsOn:T001) `path/to/file`
@@ -55,6 +73,7 @@
   Read first: `design.md`, `path/to/test`
   Verification: `npm test -- path/to/test`
   Evidence: passing output + checkpoint
+  Vertical slice: Slice 1
   Ready when: T001 has gone red, and the current touched files do not conflict with other parallel tasks
 ## Phase 2: Build
@@ -67,6 +86,11 @@
   Verification: `npm test -- path/to/other.test`
   Evidence: failing output
   Coverage: unit / integration / e2e / eval; regression: yes / no
+  Test seam: public interface / caller flow / CLI / API / UI / trace replay / harness
+  Behavior asserted: describe behavior observable to the user or caller, not internal implementation steps
+  Allowed mocks: none / external API / time / randomness / filesystem / database boundary
+  Test quality guard: no private methods, no internal call-count assertions, no internal collaborator mocks
+  Vertical slice: Slice 2
   Ready when: T002 is done, and this test covers an independent behavior
 - [ ] T004 [P] [IMPL] Make the independent test pass (dependsOn:T003) `path/to/other-file`
@@ -76,6 +100,7 @@
   Read first: `design.md`, `path/to/other.test`
   Verification: `npm test -- path/to/other.test`
   Evidence: passing output + review notes
+  Vertical slice: Slice 2
   Ready when: T003 has gone red, and file touchpoints do not conflict with other `[P]` tasks
 ## Phase 3: Verify
@@ -110,3 +135,6 @@
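 - What evidence to leave for `cc-check`
 - Whether it is in Red, Green, Refactor, or an explicit TDD exception
 - Where the test framework evidence comes from, and whether regression tests are explicitly handled
+- Which public seam the Red task uses to prove the behavior is missing, and which boundaries may be mocked
+- Whether the test will still hold after an internal refactor, rather than being bound to private functions, call counts, or temporary structures
+- Which tracer bullet vertical slice it belongs to, and which observable behavior is proven once it is done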
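
The task header lines in this template (for example ``- [ ] T004 [P] [IMPL] Make the independent test pass (dependsOn:T003) `path/to/other-file` ``) follow a regular enough shape to parse mechanically. The sketch below is a hypothetical parser, not something shipped in the package; in particular, treating multiple dependencies as comma-separated is an assumption, since the template only shows `none` and a single task id.

```typescript
// Hypothetical parsed form of one task header line from planning/tasks.md.
interface TaskLine {
  id: string;
  done: boolean;
  tags: string[];
  title: string;
  dependsOn: string[];
  path: string;
}

// Checkbox, task id, one or more [TAG] markers, title, dependsOn, backticked path.
const TASK_LINE =
  /^- \[( |x)\] (T\d+) ((?:\[[A-Z]+\] )+)(.+?) \(dependsOn:([^)]*)\) `([^`]+)`$/;

function parseTaskLine(line: string): TaskLine | null {
  const match = TASK_LINE.exec(line.trim());
  if (match === null) return null;
  const [, box, id, rawTags, title, rawDeps, path] = match;
  const tags: string[] = rawTags.match(/[A-Z]+/g) ?? [];
  // Assumption: several dependencies would be written as "T001,T002".
  const dependsOn =
    rawDeps.trim() === "none"
      ? []
      : rawDeps
          .split(",")
          .map((dep) => dep.trim())
          .filter((dep) => dep.length > 0);
  return { id, done: box === "x", tags, title, dependsOn, path };
}
```

Lines that do not match the header shape, such as the indented `Verification:` or `Test seam:` detail lines, return `null` and can be attached to the preceding task by the caller.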

package/.claude/skills/cc-plan/assets/TASK_MANIFEST_TEMPLATE.json CHANGED

@@ -20,16 +20,41 @@
     ]
   },
   "planningMeta": {
-    "reqPlanSkillVersion": "3.5.6",
+    "reqPlanSkillVersion": "3.7.0",
     "designVersion": "design.v1",
     "approvedAt": "2026-04-15T12:00:00.000Z",
     "approvedBy": "user",
     "basedOnOption": "Option A"
   },
+  "languageAndDecisions": {
+    "languageSources": [],
+    "canonicalTerms": [],
+    "languageConflicts": [],
+    "decisionDocs": [],
+    "adrOrSpecConflicts": []
+  },
   "executionDiscipline": {
     "default": "red-green-refactor",
+    "taskShape": "vertical-tracer-bullets",
     "testFirstRequired": true,
     "testFrameworkSource": "",
+    "testQualityPolicy": {
+      "publicInterfaceRequired": true,
+      "behaviorAssertionRequired": true,
+      "mockBoundary": "system-boundaries-only",
+      "implementationDetailTests": "blocked",
+      "feedbackLoopPreference": [
+        "automated-test",
+        "http-curl",
+        "cli-fixture",
+        "browser-script",
+        "trace-replay",
+        "throwaway-harness",
+        "property-fuzz",
+        "differential-loop",
+        "hitl-script"
+      ]
+    },
     "regressionTestsRequired": [],
     "tddExceptions": []
   },
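
The `feedbackLoopPreference` array in the hunk above is an ordered ladder, so choosing "the shortest trustworthy feedback loop" amounts to taking the first entry that is actually available for a behavior. A minimal sketch, assuming the caller already knows which loop types are available; `pickFeedbackLoop` is a hypothetical helper, not a cc-devflow API.

```typescript
// Ladder copied in order from `feedbackLoopPreference` in the template.
const FEEDBACK_LOOP_PREFERENCE = [
  "automated-test",
  "http-curl",
  "cli-fixture",
  "browser-script",
  "trace-replay",
  "throwaway-harness",
  "property-fuzz",
  "differential-loop",
  "hitl-script",
] as const;

type FeedbackLoop = (typeof FEEDBACK_LOOP_PREFERENCE)[number];

// Returns the most preferred loop among those available, or undefined if none is.
function pickFeedbackLoop(
  available: readonly FeedbackLoop[],
): FeedbackLoop | undefined {
  return FEEDBACK_LOOP_PREFERENCE.find(
    (loop) => available.indexOf(loop) !== -1,
  );
}
```

Returning `undefined` rather than defaulting to `hitl-script` keeps the "no loop available" case visible, which fits the rule that a missing first failing test becomes a blocked question or exploratory spike.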
@@ -67,6 +92,26 @@
       "phase": 1,
       "status": "pending",
       "tddPhase": "red",
+      "verticalSlice": "Slice 1",
+      "testSeam": {
+        "entry": "public interface / caller flow / CLI / API / UI / trace replay / harness",
+        "behaviorAsserted": "The user or caller observable behavior that should exist",
+        "implementationDetailRisk": "low"
+      },
+      "feedbackLoop": {
+        "type": "automated-test",
+        "determinism": "deterministic",
+        "expectedFailure": "Fails because the target behavior is missing"
+      },
+      "allowedMocks": [
+        "external API / time / randomness / filesystem / database boundary"
+      ],
+      "testQuality": {
+        "usesPublicInterface": true,
+        "describesBehavior": true,
+        "survivesInternalRefactor": true,
+        "mocksOnlySystemBoundaries": true
+      },
       "dependsOn": [],
       "parallel": false,
       "touches": [

package/.claude/skills/cc-plan/assets/TINY_DESIGN_TEMPLATE.md CHANGED

@@ -32,6 +32,13 @@
 - Current gaps:
 - Spec sync target:
+## Domain Language & Decisions
+- Language sources loaded:
+- Canonical terms used:
+- Language / spec decision conflicts:
+- Decisions worth long-term spec sync:
 ## Frozen Design Card
 - Change:
@@ -45,6 +52,14 @@
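 > `tiny-design` is a short design, not a design exemption. Without an explicit approval status, verification evidence, and escalation triggers, task breakdown cannot continue.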
+## Interface Shape
+- Callers:
+- Public operations:
+- Complexity hidden:
+- Misuse risk:
+- Why this stays simple:
 ## Implementation Surface Map
 | Surface | Responsibility | Why here | Coupling risk |
@@ -55,6 +70,11 @@
 - Test framework source:
 - First failing test:
+- Test seam / public interface:
+- Behavior asserted:
+- Mock boundary:
+- Feedback loop type:
+- Tracer bullet order:
 - Green implementation check:
 - Refactor checkpoint:
 - TDD exceptions:
@@ -84,8 +104,12 @@
 - Scope scan:
 - Ambiguity scan:
 - Feasibility scan:
+- Domain language scan:
 - Implementation surface scan:
+- Interface depth scan:
 - Test framework / regression scan:
+- Test seam / mock boundary scan:
+- Tracer bullet scan:
 - Test-first readiness:
 - Review calibration:
 - Final recommendation:

package/.claude/skills/cc-plan/references/planning-contract.md CHANGED

@@ -15,8 +15,12 @@
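 11. Every plan must look for existing leverage before deciding on new implementation; duplicating an existing capability is a planning failure.
 12. Complete boundaries within the same blast radius are included by default; anything deferred must be written into `NOT in scope` with a reason.
 13. If the recommended option challenges the user's original direction, it must be labeled a `user challenge`; the user's intent must not be silently rewritten.
-14. Concrete tasks for behavior changes are test-first by default; without a Red/Green/Refactor chain or a TDD exception, they must not be handed to `cc-do`.
+14. Concrete tasks for behavior changes are test-first by default; without a Red/Green/Refactor chain, public test seam, behavior assertions, mock boundary, or TDD exception, they must not be handed to `cc-do`.
 15. New change directories must be named `REQ-<number>-<description>` or `FIX-<number>-<description>`; lowercase `req-*` / `bug-*` or description-only directories are not allowed.
+16. Plan naming must follow the project's canonical language; terminology or capability spec / roadmap decision conflicts must be written into `planning/design.md`, and tasks must not invent a second language.
+17. Behavior-change tasks must be organized as tracer-bullet vertical slices: one observable behavior maps to one set of Red/Green/Refactor tasks.
+18. Red tasks must prove the behavior is missing through a public interface, caller flow, CLI/API/UI path, or another real seam.
+19. Mocks may occur only at system boundaries; mocking internal collaborators, private methods, or call counts is a test design failure.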
 ## Design Modes
@@ -43,11 +47,17 @@
 - Goal
 - TDD phase：`red` / `green` / `refactor` / `exception`
+- Vertical slice / tracer bullet
+- Test seam / public interface
+- Behavior asserted
+- Mock boundary
+- Feedback loop type
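 - Files involved
 - Verification method
 - Completion evidence
 Behavior-change tasks must have a `[TEST]` red task first, then an `[IMPL]` green task, and finally a `[REFACTOR]` task or an explicit refactor checkpoint. Pure docs, pure config, pure generated files, and throwaway prototypes may be exempt, but the reason, risk, and replacement verification must be stated.
+Do not split the plan into horizontal layers: a batch of tests, a batch of services, a batch of UI. After each slice is completed, it should be able to prove one real behavior.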
 ## Review Gate
@@ -62,9 +72,13 @@
 7. Existing leverage map
 8. Scope / complexity challenge
 9. Test diagram and failure modes
-10. NOT in scope
-11. Test-first readiness
-12. Final recommendation
+10. Domain language / spec decision conflict scan
+11. Interface depth scan
+12. Test seam / mock boundary scan
+13. Tracer bullet scan
+14. NOT in scope
+15. Test-first readiness
+16. Final recommendation
 If there is UI scope, also add the design review conclusion.
 If there is developer-facing scope, also add the DX review conclusion.
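
Rule 15 of the planning contract above (change directories must be `REQ-<number>-<description>` or `FIX-<number>-<description>`) lends itself to a one-line check. The sketch below is illustrative only: the contract does not define what counts as a valid `<number>` or `<description>`, so the digit-only number and hyphenated alphanumeric slug are assumptions, and `isValidChangeDir` is a hypothetical name.

```typescript
// Assumption: <number> is one or more digits and <description> is a non-empty
// slug of letters, digits, and hyphens; the contract does not spell out either.
const CHANGE_DIR = /^(REQ|FIX)-\d+-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$/;

// True only for uppercase REQ-/FIX- prefixed names that carry both a number
// and a description.
function isValidChangeDir(name: string): boolean {
  return CHANGE_DIR.test(name);
}
```

Under these assumptions `REQ-123-user-login` passes, while `req-123-user-login`, `bug-7-crash`, and a description-only name such as `user-login` are rejected, matching the three cases the rule calls out.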

package/.claude/skills/cc-roadmap/CHANGELOG.md CHANGED

@@ -1,5 +1,19 @@
 # Roadmap Skill Changelog
+## v4.4.1 - 2026-04-28
+- clarify that roadmap language and durable decisions come from cc-devflow native sources: `devflow/specs/`, roadmap/backlog, historical design/analysis, and change metadata
+- remove external context/architecture-decision files from the standard roadmap contract so non-native documentation ecosystems stay optional rather than canonical
+- update roadmap/backlog templates and dialogue prompts to route durable decisions into capability spec deltas, roadmap decision notes, and downstream design handoff
+## v4.4.0 - 2026-04-28
+- absorb strategic grilling discipline into roadmap work: one route-changing question at a time, recommended answer with evidence, and no user question when repo evidence can answer
+- require domain language and durable decision scans before naming stages, capabilities, roadmap items, or backlog handoffs
+- add language / spec decision conflict gates so route recommendations expose terminology and decision drift instead of creating a second conceptual system
+- update roadmap and backlog templates with domain-language and durable-decision handoff sections for downstream `cc-plan`
+- update tracking template skill version to match the enhanced roadmap contract
 ## v4.3.4 - 2026-04-28
 - add planning posture and evidence maturity routing so roadmap questions match idea, user, paying, infra, or recovery contexts

package/.claude/skills/cc-roadmap/PLAYBOOK.md CHANGED

@@ -19,6 +19,7 @@
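 7. When multiple independent subsystems are mixed into one goal, split them into stages and `RM` candidates first; do not keep probing implementation details.
 8. Determine planning posture and evidence maturity first, then decide which questions to ask; do not force the same question set onto idea, existing-user, paying-customer, infra, and recovery scenarios.
 9. Developer-facing / operator-facing routes must state target user, time to first value, magic moment, and adoption bottleneck.
+10. Align with `devflow/specs/`, roadmap/backlog, and historical design decisions before naming stages, capabilities, RMs, and backlog items; terminology or decision conflicts must become explicit route risks.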
 ## Local Kit
@@ -37,12 +38,13 @@
 1. Existing `devflow/ROADMAP.md` / `devflow/BACKLOG.md`
 2. `CLAUDE.md`, `README*`, `TODOS.md`
-3. Recent related docs / specs / plans
-4. Recent related commits, current working tree state, and requirements in progress
-5. Real forcing functions: deadline, distribution, resources, dependencies, current blockers
-6. planning posture: startup / internal / hackathon / OSS / research / learning / side-project / infrastructure
-7. evidence maturity: idea / has users / paying users / internal sponsor / infra-only / recovery
-8. developer / operator adoption signals: target person, first success path, TTHW / time to first value, debug / upgrade blockers
+3. Project language and durable decisions: `devflow/specs/INDEX.md`, related capability specs, current roadmap/backlog, historical `planning/design.md` / `planning/analysis.md`, `change-meta.json`, and long-term design decisions
+4. Recent related docs / specs / plans
+5. Recent related commits, current working tree state, and requirements in progress
+6. Real forcing functions: deadline, distribution, resources, dependencies, current blockers
+7. planning posture: startup / internal / hackathon / OSS / research / learning / side-project / infrastructure
+8. evidence maturity: idea / has users / paying users / internal sponsor / infra-only / recovery
+9. developer / operator adoption signals: target person, first success path, TTHW / time to first value, debug / upgrade blockers
 Compress these materials into a `Context Snapshot` first, then question the user.
@@ -63,7 +65,7 @@
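 8. What is the biggest current adoption / trust / delivery blocker
 9. What signals indicate success and failure
-After the first round of answers, do a framing check: are the terms concrete, can the user be named, does the pain come from real behavior, is the status quo clear, and is the demand evidence stronger than "interested". If the answers are vague, tighten the question first; do not rush to set the route.
+After the first round of answers, do a framing check: are the terms concrete, do they follow the project's canonical language, can the user be named, does the pain come from real behavior, is the status quo clear, and is the demand evidence stronger than "interested". If the answers are vague, tighten the question first; do not rush to set the route.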
 ## Evidence-Maturity Routing
@@ -146,6 +148,7 @@
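 6. What exactly changed in this version compared with the previous one
 7. Whether question routing matches planning posture / evidence maturity
 8. Whether developer-facing / operator-facing items can explain why first value will happen
+9. Whether stage / RM / capability naming follows the project language, or explicitly records the language / spec / roadmap decision conflicts that need to be reopened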
 ## Versioning