npm - @fitlab-ai/agent-infra - Versions diffs - 0.7.3 → 0.7.5 - Mend

@fitlab-ai/agent-infra 0.7.3 → 0.7.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (107)

package/templates/.agents/rules/cli-help-format.en.md ADDED

@@ -0,0 +1,49 @@
+# CLI help text conventions
+Unify the help text display structure, display name, and command ordering of the `ai` / `agent-infra` CLI so newly added subcommands follow the same conventions automatically and never drift across levels again. Read this file before adding or changing CLI help text.
+## Scope
+- **Display name `ai`**: applies to **all** user-facing help / usage / banner text — top-level, namespace-level, and the single-line usage / startup banners of leaf commands such as `merge` / `init` / `update`. The only exceptions: the top-level help first line keeps the brand + version line `agent-infra ${VERSION}`, and `@fitlab-ai/agent-infra` in package names / install commands / repo URLs stays as-is.
+- **Structure & ordering** (`Usage:` + `Commands:` structure, alphabetical command order): applies only to levels that carry a `Commands:` listing — top-level help (`bin/cli.ts`) and namespace-level help (e.g. `ai sandbox` / `ai task`). Leaf commands have only a single-line usage and need no `Commands:` structure.
+## Display name
+- Use **`ai`** as the command display name in help text (the recommended short form; `package.json`'s `bin` registers both `ai` and `agent-infra`).
+- Keep the top-level help first line as the brand + version line `agent-infra ${VERSION} - bootstrap ...` (it is the brand and version marker that several tests anchor on).
+- Keep `@fitlab-ai/agent-infra` in install methods, package names, and repo URLs as-is (those are package names, not command display names).
+## List structure
+Namespace-level and top-level help follow:
+```
+Usage: ai <ns> <command> [options]
+Commands:
+  <command>  <description aligned from two spaces>
+  ...
+Run 'ai <ns> <command> --help' for details.
+```
+- The `Commands:` block uses bare command names (no repeated binary name), two-space indent, descriptions aligned to the longest command name.
+- Namespace-level help ends with a `Run 'ai <ns> <command> --help' for details.` footer.
+- Top-level help has no uniform subcommand `--help` convention, so the footer is not required there; if an `Examples:` section exists, its command display name is also `ai`.
+## Ordering
+Command lists, `Examples`, and command enumerations embedded in descriptions are all sorted by the **first token of the command, in ascending alphabetical order**:
+- Multi-token commands (e.g. `vm status|start|stop`) sort by the first token (`vm`).
+- Commands with angle/square-bracket parameters sort by the command name (the bare word before the parameters).
+- Case-insensitive.
+## Checklist for adding a subcommand
+When adding a subcommand:
+1. Insert the command at the correct alphabetical position in `Commands:`.
+2. If it has examples, insert them at the alphabetical position in `Examples:`.
+3. If a top-level `task` / `sandbox` description has an embedded command enumeration, update its alphabetical order too.
+4. Sync the corresponding help test's **structural** assertions (whether the command appears, whether the `Usage:` / `Commands:` header exists); do not bind to full sentences (see [`testing-discipline.md`](testing-discipline.md)).
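
The ordering and list-structure rules above can be sketched as a small renderer. This is only an illustration under stated assumptions: `commandSortKey`, `sortCommands`, and `renderNamespaceHelp` are hypothetical names, the package's real help renderer is not part of this diff, and the `create` / `ls` entries in the usage note are made-up sample commands (only `vm status|start|stop` appears in the rule text).

```javascript
// Hypothetical helpers: the package's actual help renderer is not shown in this diff.

// Sort key: the first whitespace-separated token, lowercased (case-insensitive rule).
// For "vm status|start|stop" the key is "vm"; for "create <branch>" it is "create".
function commandSortKey(name) {
  return name.trim().split(/\s+/)[0].toLowerCase();
}

// Return a sorted copy; Array.prototype.sort is stable, so ties keep input order.
function sortCommands(commands) {
  return [...commands].sort((a, b) => {
    const ka = commandSortKey(a.name);
    const kb = commandSortKey(b.name);
    if (ka < kb) return -1;
    if (ka > kb) return 1;
    return 0;
  });
}

// Render namespace-level help: Usage line, Commands block with two-space indent
// and descriptions aligned to the longest command name, then the --help footer.
function renderNamespaceHelp(ns, commands) {
  const sorted = sortCommands(commands);
  const width = Math.max(0, ...sorted.map((c) => c.name.length));
  const rows = sorted.map((c) => `  ${c.name.padEnd(width)}  ${c.description}`);
  return [
    `Usage: ai ${ns} <command> [options]`,
    "Commands:",
    ...rows,
    `Run 'ai ${ns} <command> --help' for details.`
  ].join("\n");
}
```

Calling `renderNamespaceHelp("sandbox", [...])` with entries named `vm status|start|stop`, `Create <branch>`, and `ls` would list them in the order `Create`, `ls`, `vm`, regardless of input order or letter case. A structural help test in the spirit of checklist item 4 would then assert only on the `Usage:` / `Commands:` headers and on the presence of each command name.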

package/templates/.agents/rules/cli-help-format.zh-CN.md ADDED

@@ -0,0 +1,49 @@
+# CLI help 文案约定
+统一 `ai` / `agent-infra` CLI 的 help 文案展示结构、展示名与命令排序，让后续新增子命令自动遵守，避免跨层级再次漂移。新增或调整 CLI help 文案前先读取本文件。
+## 适用范围
+- **展示名 `ai`**：适用于**所有**面向用户的 help / usage / 交互横幅文案——顶层、命名空间级，以及 `merge` / `init` / `update` 等叶子命令的单行 usage 与启动横幅，统一用 `ai`。唯一例外：顶层 help 首行保留品牌 + 版本行 `agent-infra ${VERSION}`；包名 / 安装命令 / 仓库 URL 中的 `@fitlab-ai/agent-infra` 保持原样。
+- **结构与排序**（`Usage:` + `Commands:` 结构、命令按字母序）：仅适用于带 `Commands:` 子清单的层级——顶层 help（`bin/cli.ts`）与命名空间级 help（如 `ai sandbox` / `ai task`）。叶子命令只有单行 usage，无需 `Commands:` 结构。
+## 展示名
+- help 文案中的命令展示名统一用 **`ai`**（推荐简写，`package.json` 的 `bin` 同时注册 `ai` 与 `agent-infra`）。
+- 顶层 help 首行保留品牌 + 版本行 `agent-infra ${VERSION} - bootstrap ...`（这是品牌与版本标识，多处测试锚定它）。
+- 安装方式、包名、仓库 URL 中的 `@fitlab-ai/agent-infra` 等保持原样（是包名而非命令展示名）。
+## 列表结构
+命名空间级与顶层 help 统一为：
+```
+Usage: ai <ns> <command> [options]
+Commands:
+  <command>  <两空格起对齐的描述>
+  ...
+Run 'ai <ns> <command> --help' for details.
+```
+- `Commands:` 块用裸命令名（不重复二进制名），两空格缩进，描述按最长命令名对齐。
+- 命名空间级 help 末尾加 `Run 'ai <ns> <command> --help' for details.` footer。
+- 顶层 help 无统一子命令 `--help` 约定，故不强制加该 footer；如有 `Examples:` 段，命令展示名同样用 `ai`。
+## 排序
+命令清单、`Examples`、描述中内嵌的命令枚举，一律按**命令首 token 的字母升序**排列：
+- 多 token 命令（如 `vm status|start|stop`）按首 token（`vm`）排序。
+- 带尖括号 / 方括号参数的命令按命令名（参数前的裸词）排序。
+- 大小写不敏感。
+## 新增子命令检查清单
+新增一个子命令时：
+1. 把命令插入 `Commands:` 的字母序正确位置。
+2. 如有示例，插入 `Examples:` 的字母序位置。
+3. 若顶层 `task` / `sandbox` 等描述中有内嵌命令枚举，同步更新其字母序。
+4. 同步对应 help 测试的**结构性**断言（命令是否出现、`Usage:` / `Commands:` 头是否存在），不要绑定整句文案（见 [`testing-discipline.md`](testing-discipline.md)）。

package/templates/.agents/rules/debugging-guide.en.md ADDED

@@ -0,0 +1,25 @@
+# General Rule - Structured Debugging Guide
+> This file defines the structured triage flow for "test failure / behavior not as expected"; SKILLs that modify code in response to failures (e.g. `code-task`, `watch-pr`) load it on demand before attempting a fix.
+## Triggers
+When any of the following happens, run this flow before changing code:
+- A test fails, or a build / type-check / lint error appears
+- Runtime behavior differs from expectations (output, state, or side effects)
+## Core Anti-pattern: No Blind Patch-and-Retry
+The "tweak one spot → rerun → still broken → guess another spot" loop hides the real root cause, introduces new defects, and wastes time. A change with no supporting evidence is not a fix.
+## Four-phase Flow
+1. **Gather evidence**: Read the full error message and stack trace (not just the last line) and pinpoint where it fails; reproduce minimally when needed, and record "actual vs expected behavior".
+2. **Form a hypothesis**: From the evidence, propose a root-cause hypothesis that explains **all** the symptoms rather than a surface symptom; if there are several, rank them by likelihood and testability.
+3. **Verify the hypothesis**: Before changing anything, confirm the hypothesis cheaply—add logging, add a breakpoint, shrink the input, or write a failing test that reproduces it; if it is disproven, return to phase 2.
+4. **Fix the root cause**: Change only the verified root cause (not the symptom), then rerun the relevant tests to confirm they pass; if they still fail, return to phase 1 with the new evidence instead of trial-and-error without evidence.
+## Relation to Project Principles
+This flow is the debugging-specific form of AGENTS.md's "Think Before Coding" and "Goal-Driven Execution": pin the problem with a reproducible failing case first, then make the fix turn it green.

package/templates/.agents/rules/debugging-guide.zh-CN.md ADDED

@@ -0,0 +1,25 @@
+# 通用规则 - 结构化调试指导
+> 本文件定义「测试失败 / 行为不符合预期」时的结构化排查流程；`code-task`、`watch-pr` 等会因失败而修改代码的 SKILL 在动手修复前按需加载。
+## 触发条件
+出现以下任一情况时，先按本流程排查，再改代码：
+- 测试失败，或构建 / 类型检查 / lint 报错
+- 运行结果与预期不符（输出、状态或副作用异常）
+## 核心反模式：禁止盲目改代码重试
+「改一处 → 重跑 → 还错 → 再猜一处」的循环会掩盖真实根因、引入新缺陷、浪费时间。没有证据支撑的修改不算修复。
+## 四阶段流程
+1. **收集证据**：完整读取错误信息与堆栈（不要只看最后一行），定位失败的具体位置；必要时最小化复现，记录「实际行为 vs 预期行为」。
+2. **形成假设**：基于证据提出能解释**全部**现象的根因假设，而不是停留在表层症状；若有多个假设，按可能性与可验证性排序。
+3. **验证假设**：动手改之前，用最小代价确认假设成立——加日志、加断点、缩小输入，或写一个能复现的失败用例；假设被证伪就回到阶段 2。
+4. **修复根因**：只针对已验证的根因修改（而非症状），改完重跑相关测试确认通过；仍失败则带着新证据回到阶段 1，不在无证据时反复试错。
+## 与项目准则的关系
+本流程是 AGENTS.md「先思考再动手」「目标驱动执行」在调试场景的具体化：先用可复现的失败用例锁定问题，再让修复使其通过。

package/templates/.agents/rules/no-mid-flow-questions.en.md CHANGED

@@ -1,7 +1,7 @@
 # General Rule - No Mid-Flow Questions During SKILL Execution
 > **Scope**: this rule applies to **all SKILL** executions.
-> Only the two exemption categories below may ask the user; any other mid-flow question is a violation.
+> Only the exemption categories listed below may ask the user; any other mid-flow question is a violation.
 ## Exemption Categories
@@ -27,9 +27,21 @@ SKILLs currently covered by this exemption:
 - `init-labels`: may confirm before deleting legacy labels not in the final mapping
 - `commit`: may stop and confirm when its plan conflicts with the user's uncommitted changes
+### Exemption 3: Entry-point requirement-sufficiency clarification
+Allowed only when a SKILL judges, **at its entry point**, whether the current task's requirement information is sufficient for a reliable analysis; it may then ask the user about the **missing requirement information** to converge the requirements. Constraints:
+- Limited to the `analyze-task` entry point; ask one question at a time and wait for the answer before asking the next;
+- Used only to fill requirement-sufficiency gaps; it must **not** be used to solicit implementation / technical-choice preferences (those still go into the artifact's `## Open Questions` per the default clause);
+- Exit the questioning and proceed to normal analysis once the question budget is reached or the user says "just analyze / skip".
+SKILLs currently covered by this exemption:
+- `analyze-task`: when the task description/requirements are insufficient for a reliable analysis, it may ask questions one at a time at the entry point to converge the requirements
 ## No-Mid-Flow-Questions Clause (default behavior)
-For every SKILL execution context not covered by the two exemptions above, the default behavior is:
+For every SKILL execution context not covered by any exemption above, the default behavior is:
 1. Do not call any user-question tool, including but not limited to `AskUserQuestion` and equivalent mechanisms that ask the user to choose.
 2. When uncertain, proceed with the most robust option without interrupting the flow. Use this priority order:

package/templates/.agents/rules/no-mid-flow-questions.zh-CN.md CHANGED

@@ -1,7 +1,7 @@
 # 通用规则 - SKILL 执行禁言
 > **适用范围**：本规则适用于**所有 SKILL** 的执行过程。
-> 仅以下两类例外可向用户提问；不属于这两类的发问一律按违规处理。
+> 仅以下列出的例外可向用户提问；不属于这些例外的发问一律按违规处理。
 ## 例外类型
@@ -27,9 +27,21 @@
 - `init-labels`：删除不在最终映射中的旧 label 前可确认
 - `commit`：检测到与用户未提交改动冲突时可停下确认
+### 例外 3：入口式需求充分性澄清
+仅当 SKILL 在**入口处**判断「当前任务的需求信息是否充分到可以可靠分析」时，可就**缺失的需求信息**向用户提问以收敛需求。约束：
+- 仅限 `analyze-task` 入口；一次只问一个问题，等用户回答后再问下一个；
+- 仅用于补齐需求充分性，**不得**借此征求实现方案 / 技术选型偏好（这些仍按禁言铁律写入产物的 `## 未决问题`）；
+- 达到提问预算上限或用户表示「直接分析 / skip」即退出提问，进入正常分析。
+当前归入本例外的 SKILL：
+- `analyze-task`：任务描述/需求信息不足以支撑可靠分析时，可在入口处逐个提问收敛需求
 ## 禁言条款（默认行为）
-不属于上述两类例外的所有 SKILL 执行场景，遵循以下默认行为：
+不属于上述任一例外的所有 SKILL 执行场景，遵循以下默认行为：
 1. **禁止调用**任何向用户发问的工具（包括但不限于 `AskUserQuestion` 及等价的「征求用户选择」机制）。
 2. **不确定时**，按「最稳健方案」自主推进，不中断对话。最稳健方案的判定优先级：

package/templates/.agents/rules/pr-sync.github.en.md CHANGED

@@ -32,7 +32,11 @@ Aggregation rules:
 - build the review-history table from `review-code*` and `code*`
 - extract the test summary from `code*`
 - if one artifact class is missing, treat it as "no data for this stage" and continue
-- Manual verification section: extract items requiring human confirmation/fallback from the "Assumptions"/"Open Questions" of the latest `plan*` and the "Environment-Blocked Findings"/"Self-Doubt" sections (i.e. env-blocked items) of the latest `review-code*`; when there are none, write the explicit placeholder `- None — no items require manual verification`, never leave it empty
+- Manual verification section: include only post-code-stage checks that still require a human to execute or judge and that the AI cannot close on its own.
+  - **Admission boundary**: the verification result depends on a real environment, permissions, account, external system, or human judgment, and cannot be closed by an agent rerunning tests, adding checks, or continuing the fix loop.
+  - **Sources**: `review-code*` "Environment-Blocked Findings", plus `code*` items that satisfy the boundary above.
+  - **Wording**: each retained item must state at least "what to verify + location (file/change/scope) + why only a human can verify it".
+  - **Empty rendering**: when there are no retained items, do NOT use the ⚠️ alarm style (it falsely implies a problem). Render the whole block as: heading `### ✅ No Manual Verification Needed` and a single line `No items in this change require manual confirmation.`, with no item list. Only use the `### ⚠️ Manual Verification Required` heading + item list when retained items exist.
 ## Comment Body Template
@@ -47,11 +51,7 @@ Use this canonical comment body template:
 **Updated At**: {current-time}
-### ⚠️ Manual Verification Required
-> Items in this change that need human confirmation/fallback; reviewers can reply under this comment once verified.
-- {manual-verify-item}
+{manual-verify-section}
 ### Key Technical Decisions
@@ -72,6 +72,8 @@ Use this canonical comment body template:
 *Generated by {agent} · Internal tracking: {task-id}*
 ```
+> Render `{manual-verify-section}` per the "manual verification section" aggregation rule above: with retained items → `### ⚠️ Manual Verification Required` heading + quote + item list; with none → `### ✅ No Manual Verification Needed` heading + a single line `No items in this change require manual confirmation.` (no ⚠️, no list).
 ## Comment Lookup And Update
 Fetch existing comments through the Issues comments API, not the dedicated PR comments API.
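
The two renderings of `{manual-verify-section}` described in this hunk can be sketched as a single function. This is a minimal sketch, not the package's implementation: `renderManualVerifySection` and the item fields `what`, `location`, and `whyHuman` are hypothetical names, and reusing the quote line from the removed template is an assumption (the new rule only says "heading + quote + item list").

```javascript
// Hypothetical renderer for the {manual-verify-section} placeholder.
// Each item is assumed to look like { what, location, whyHuman }, mirroring the
// "what to verify + location + why only a human can verify it" wording rule.
function renderManualVerifySection(items) {
  if (items.length === 0) {
    // Empty rendering: no alarm emoji and no item list.
    return [
      "### ✅ No Manual Verification Needed",
      "",
      "No items in this change require manual confirmation."
    ].join("\n");
  }
  return [
    "### ⚠️ Manual Verification Required",
    "",
    // Assumption: the quote text is carried over from the removed template lines.
    "> Items in this change that need human confirmation/fallback; reviewers can reply under this comment once verified.",
    "",
    ...items.map((item) => `- ${item.what} (${item.location}): ${item.whyHuman}`)
  ].join("\n");
}
```

Keeping both branches in one function makes it hard to emit the warning heading with an empty list, which is the false-alarm case the rule change is meant to remove.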

package/templates/.agents/rules/pr-sync.github.zh-CN.md CHANGED

@@ -32,7 +32,11 @@
 - 用 `review-code*` 与 `code*` 构建审查历程表
 - 从 `code*` 提取测试结果摘要
 - 某一类产物缺失时，按“无该阶段数据”处理并继续生成
-- 需人工校验段落：从最新 `plan*` 的「假设」「未决问题」与最新 `review-code*` 的「环境性遗留」「自我质疑」提取需人工确认/兜底事项；无任何事项时写显式占位 `- 无需人工校验事项`，不得留空
+- 需人工校验段落：只收进入 code 阶段后仍需人实际执行或判断、AI 无法自行关闭的校验点。
+  - **准入边界**：校验结论依赖真实环境、权限、账号、外部系统或人工判断，且无法通过 agent 重跑测试、补充检查或继续修复自行关闭。
+  - **来源**：`review-code*` 的「环境性遗留」，以及 `code*` 中满足上述边界的校验点。
+  - **写法**：每条保留项至少写明「校验什么 + 定位（文件/改动/范围）+ 为什么只能由人校验」。
+  - **空集渲染**：无保留项时，不要使用 ⚠️ 告警样式（会让人误以为有问题）。整段降级渲染为：标题 `### ✅ 无需人工校验`，正文一行 `本次改动无需人工确认事项。`，不带条目列表。有保留项时才用 `### ⚠️ 需人工校验` 标题 + 条目列表。
 ## 评论体模板
@@ -47,11 +51,7 @@
 **更新时间**：{当前时间}
-### ⚠️ 需人工校验
-> 本次改动中需人工确认/兜底的事项；reviewer 校验后可在本评论下回复收尾。
-- {manual-verify-item}
+{manual-verify-section}
 ### 关键技术决策
@@ -72,6 +72,8 @@
 *由 {agent} 自动生成 · 内部追踪：{task-id}*
 ```
+> `{manual-verify-section}` 按上文「需人工校验段落」聚合规则渲染：有保留项 → `### ⚠️ 需人工校验` 标题 + 引用说明 + 条目列表；无保留项 → `### ✅ 无需人工校验` 标题 + 一行 `本次改动无需人工确认事项。`（不带 ⚠️、不带列表）。
 ## 评论查找与更新
 已有评论必须通过 Issues comments API 获取，而不是单独的 PR comments API。

package/templates/.agents/rules/review-handshake.en.md ADDED

@@ -0,0 +1,83 @@
+# Bidirectional Review Handshake Protocol
+> Shared by executor and reviewer across all three stages (analysis / plan / code) when running the `review-*` and `*-task` skills.
+> This file is the **single source of truth** for the protocol; each SKILL only `Read`s it and never re-copies the vocabulary.
+## Core principles
+- **A review finding is input to be verified, not a command to execute.** The executor must verify each finding before disposing of it — neither rubber-stamping nor blindly refuting.
+- **Symmetric evidence burden**: every disposition, whether accept or refute, must carry **commensurate evidence**. "Accept" is not a zero-cost default path.
+- **Converge before advancing**: while any unclosed disagreement, alternative fix, cannot-judge, or post-review commit exists, do not silently advance to the next stage, archive, or merge.
+## Executor four-state disposition (`*-task` skills, when responding to the prior review round in Round ≥ 2)
+For each finding in the latest `review-*`, first Read/Grep the cited `file:line` / command, then assign one status:
+| Status | Meaning | Required evidence |
+|--------|---------|-------------------|
+| `accepted` | Valid; will fix as suggested | `file:line` of the fix, or the change to be applied this round |
+| `adjusted` | Valid, but an alternative fix is used | the alternative + why it is better; awaits reviewer confirmation |
+| `refuted` | After verification, judged invalid / hallucinated / based on a wrong `file:line` | counter-evidence (`file:line` or raw command output); awaits reviewer confirmation |
+| `cannot-judge` | Insufficient evidence to decide | the verification path attempted; handed to reviewer/human |
+## Reviewer hand-back duty (`review-*` skills, when re-reviewing the executor response)
+After the executor gives `adjusted` / `refuted` / `cannot-judge`, the reviewer must respond per item, neither merely repeating the original finding nor ignoring the hand-back:
+- **Withdraw the finding** → set the ledger row to `confirmed` (accepts the refutation).
+- **Accept the alternative fix** → set to `confirmed`.
+- **Hold with new evidence** → set back to `open` (with new evidence, returned to the executor).
+- **Escalate to human** → set to `needs-human-decision`.
+## Convergence termination (loop guard)
+- The per-finding handshake round limit is `MAX_HANDSHAKE_ROUNDS`, default **3**, overridable via `review.maxHandshakeRounds` in `.agents/.airc.json`.
+- When a finding's `round` reaches the limit without entering a terminal state, it must be forced to `needs-human-decision`; the gate rejects rows that hit the limit without escalating.
+- `needs-human-decision` keeps blocking completion until a human records a ruling in the task.md `## 人工裁决` section and flips the row to `human-decided`.
+## Same-model convergence-bias mitigation (documentation-level discipline)
+The executor and reviewer are often the same/similar model and are naturally inclined to agree. When reviewing:
+1. **Read the evidence before the conclusion**: read the `git diff` / artifact itself and form findings independently **before** reading the executor's conclusions and responses, to avoid being anchored.
+2. **Default-skeptical framing**: treat "looks fine" as unverified; every clearance needs reproducible evidence (see the `Evidence` hard gate in each `review-*`).
+> The only mechanical lever is the **symmetric-evidence gate** (non-`open` ledger rows must carry evidence); model homogeneity itself is not mechanically checkable, so this section is discipline rather than a gate.
+## Mechanical ledger (task.md `## 审查分歧账本`)
+The single source of truth for disagreement state is the fixed `## 审查分歧账本` section in task.md — one parseable Markdown table. The phase-advance and `complete-task` gates read this section.
+```markdown
+## 审查分歧账本
+<!-- One row per review finding; state machine / evidence rules in .agents/rules/review-handshake.md. The phase-advance and complete-task gates read this section. -->
+| id | stage | round | severity | status | evidence |
+|----|-------|-------|----------|--------|----------|
+| CD-1 | code | 1 | blocker | open | review-code.md#1 |
+```
+- `id`: stage prefix + ordinal — analysis→`AN-`, plan→`PL-`, code→`CD-`.
+- `stage` ∈ `{analysis, plan, code}` (plus the reserved value `post-review-commit`, used only for post-review exemption rows).
+- `status` legal enum: `open` / `accepted` / `adjusted` / `refuted` / `cannot-judge` / `confirmed` / `needs-human-decision` / `closed` / `human-decided`.
+- **Terminal set (gate passes)**: `{confirmed, closed, human-decided}`; everything else is blocking.
+- **Write responsibility**: `review-*` raises a finding → upsert an `open` row; `*-task` responds → set four-state and fill `evidence`, `round` +1; next `review-*` → `confirmed` / back to `open` / `needs-human-decision`; an executor fix verified by the next review → `closed`; a human ruling → `human-decided`.
+- **Backward compatible**: when task.md has no such section the gate treats it as no open disagreements and passes.
+## post-review commit gate (code stage only)
+- The highest-round `review-code` report records `Review Baseline Commit` (R, `git rev-parse HEAD`) and `Reviewed Diff Fingerprint` (F, full worktree diff fingerprint).
+- `commit` reads only the highest-round `review-code` artifact. When that artifact is Approved, the pre-commit HEAD equals R, and the staged diff fingerprint equals F, task.md receives `last_reviewed_commit` (B, the new commit SHA).
+- The `complete-task` `post-review-commit` gate prefers B; when B is absent or invalid, it falls back to R from the highest-round `review-code` artifact.
+- If new commits touch code / rule paths after B / R, the gate blocks and requires a fresh `review-code`.
+- **Exemption**: append a ledger row `| PRC-1 | post-review-commit | - | - | human-decided | <ruling note> |` recording that a human explicitly allowed those commits without re-review.
+## Gate behavior cheat sheet
+| Caller | `review-ledger` scope | `post-review-commit` |
+|--------|-----------------------|----------------------|
+| `plan-task` | only `analysis`-stage rows must be terminal | not attached |
+| `code-task` | `analysis` + `plan`-stage rows must be terminal | not attached |
+| `complete-task` | all stage rows must be terminal | attached (see above) |
+| `analyze-task` | not attached (first stage) | not attached |
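
The ledger rules and the cheat sheet above can be sketched as a parser plus two checks. This is a sketch under stated assumptions rather than the gate shipped in the package: `parseLedger`, `blockingRows`, `unescalatedAtLimit`, and `LEDGER_SCOPE` are hypothetical names, "all stage rows" for `complete-task` is read here as the three regular stages (the reserved `post-review-commit` row is left to its own gate), and the round limit is passed in as a parameter instead of being read from `.agents/.airc.json`.

```javascript
// Terminal statuses, as listed under "Terminal set (gate passes)".
const TERMINAL_STATUSES = new Set(["confirmed", "closed", "human-decided"]);

// Stage scope per caller, following the gate cheat sheet (hypothetical constant).
const LEDGER_SCOPE = {
  "analyze-task": [],
  "plan-task": ["analysis"],
  "code-task": ["analysis", "plan"],
  "complete-task": ["analysis", "plan", "code"]
};

// Parse the table under "## 审查分歧账本". A missing section yields no rows,
// which matches the backward-compatibility rule (the gate passes).
function parseLedger(taskMd) {
  const lines = String(taskMd).split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === "## 审查分歧账本");
  if (start === -1) return [];
  const rows = [];
  for (let i = start + 1; i < lines.length; i += 1) {
    const line = lines[i].trim();
    if (/^##\s+/.test(line)) break; // next section ends the ledger
    if (!line.startsWith("|")) continue; // skip comments and blank lines
    const cells = line.split("|").slice(1, -1).map((cell) => cell.trim());
    // Skip malformed rows, the header row, and the |----| separator row.
    if (cells.length !== 6 || cells[0] === "id" || /^-+$/.test(cells[0])) continue;
    const [id, stage, round, severity, status, evidence] = cells;
    rows.push({ id, stage, round, severity, status, evidence });
  }
  return rows;
}

// Rows in the caller's scope that are not terminal block the phase advance.
function blockingRows(taskMd, caller) {
  const stages = LEDGER_SCOPE[caller] || [];
  return parseLedger(taskMd).filter(
    (row) => stages.includes(row.stage) && !TERMINAL_STATUSES.has(row.status)
  );
}

// Rows that reached the round limit without a terminal state or an escalation.
function unescalatedAtLimit(taskMd, maxRounds = 3) {
  return parseLedger(taskMd).filter(
    (row) =>
      Number(row.round) >= maxRounds &&
      !TERMINAL_STATUSES.has(row.status) &&
      row.status !== "needs-human-decision"
  );
}
```

With a ledger holding a `confirmed` analysis row, a `refuted` plan row at round 3, and an `open` code row, `blockingRows(taskMd, "plan-task")` is empty, `"code-task"` is blocked by the plan row, and `"complete-task"` is blocked by both the plan and code rows; `unescalatedAtLimit` additionally flags the plan row for forced escalation. Note that an evidence cell containing a literal `|` would break this naive split.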

package/templates/.agents/rules/review-handshake.zh-CN.md ADDED

@@ -0,0 +1,83 @@
+# 双向审查握手协议
+> 三阶段（analysis / plan / code）的执行方与检视方在执行 `review-*` 与 `*-task` 技能时共用本协议。
+> 这是协议的**单一事实源**；各 SKILL 只 `Read` 本文件，不重复抄写词表。
+## 核心原则
+- **检视意见是待验证输入，不是执行命令**。执行方必须逐条核实后再处置，不默认认账、不盲目反驳。
+- **对称证据负担**：无论接受还是反驳，每条处置都要附**相称证据**。"接受"不是零成本默认路径。
+- **达成一致再推进**：存在未关闭分歧、替代修法、无法判断或 review 后新增提交时，不得静默进入下一阶段、归档或合并。
+## 执行方四态处置（`*-task` 技能，Round ≥ 2 响应上一轮审查时）
+对上一轮 `review-*` 的每条 finding，先 Read/Grep 核实其引用的 `file:line` / 命令，再落一个状态：
+| 状态 | 含义 | 必附证据 |
+|------|------|----------|
+| `accepted` | 成立，将按建议修复 | 指向修复点的 `file:line` 或本轮将施加的改动说明 |
+| `adjusted` | 成立，但采用替代修法 | 替代修法说明 + 为何更优；待检视方确认 |
+| `refuted` | 核实后判定不成立 / 幻觉 / 基于错误 `file:line` | 反证（`file:line` 或命令原文）；待检视方确认 |
+| `cannot-judge` | 证据不足，无法判断 | 已尝试的核实路径；交检视方/人工 |
+## 检视方回交义务（`review-*` 技能，对执行方响应复核时）
+执行方给出 `adjusted` / `refuted` / `cannot-judge` 后，检视方必须逐条回应，不得复读原意见或无视：
+- **撤回 finding** → 账本置 `confirmed`（接受反驳）。
+- **接受替代修法** → 账本置 `confirmed`。
+- **补充新证据后坚持** → 账本置回 `open`（带新证据，回到执行方）。
+- **升级人工裁决** → 账本置 `needs-human-decision`。
+## 收敛终止语义（防死循环）
+- 单条 finding 的握手轮次上限 `MAX_HANDSHAKE_ROUNDS`，默认 **3**，可在 `.agents/.airc.json` 的 `review.maxHandshakeRounds` 覆盖。
+- 某条 finding 的 `round` 达到上限仍未进入终态，必须强制置 `needs-human-decision`；gate 会拦截"达限却未升级"的行。
+- `needs-human-decision` 持续阻塞完成，直到人工在 task.md `## 人工裁决` 段记录裁定并把该行翻为 `human-decided`。
+## 同源模型收敛偏差缓解（文档级纪律）
+执行方与检视方常由相近模型承担，天然容易互相同意。检视时遵守：
+1. **先看证据再看结论**：先读 `git diff` / 产物本体并独立形成 findings，**再**读执行方的结论与响应，避免被其结论锚定。
+2. **默认怀疑框架**：把"看起来没问题"视为未验证；每条放行都要有可复现证据支撑（见各 `review-*` 的 `证据原文` 段硬门禁）。
+> 唯一的机械杠杆是**对称证据 gate**（账本非 `open` 行必须有证据）；模型同源性本身不可机械校验，故本节为纪律而非门禁。
+## 机械账本（task.md `## 审查分歧账本`）
+分歧状态的**单一事实源**是 task.md 的固定段 `## 审查分歧账本`，单张可解析表。阶段推进与 `complete-task` 的 gate 读取本段。
+```markdown
+## 审查分歧账本
+<!-- 每条 review finding 一行；状态机/证据规则见 .agents/rules/review-handshake.md。阶段推进与 complete-task gate 读取本段。 -->
+| id | stage | round | severity | status | evidence |
+|----|-------|-------|----------|--------|----------|
+| CD-1 | code | 1 | blocker | open | review-code.md#1 |
+```
+- `id`：阶段前缀 + 序号——analysis→`AN-`、plan→`PL-`、code→`CD-`。
+- `stage` ∈ `{analysis, plan, code}`（外加保留值 `post-review-commit`，仅用于 post-review 豁免行）。
+- `status` 合法枚举：`open` / `accepted` / `adjusted` / `refuted` / `cannot-judge` / `confirmed` / `needs-human-decision` / `closed` / `human-decided`。
+- **终态集合（gate 放行）**：`{confirmed, closed, human-decided}`；其余为阻塞态。
+- **写入责任**：`review-*` 提 finding → upsert `open` 行；`*-task` 响应 → 改四态并填 `evidence`、`round` +1；下一轮 `review-*` → `confirmed` / 置回 `open` / `needs-human-decision`；执行方修复经下一轮 review 验证通过 → `closed`；人工裁决 → `human-decided`。
+- **向后兼容**：task.md 无此段时，gate 视为无未决分歧而放行。
+## post-review commit 门禁（仅 code 阶段）
+- `review-code` 在最高轮报告中记录 `审查基线提交`（R，`git rev-parse HEAD`）和 `审查差异指纹`（F，完整工作区 diff fingerprint）。
+- `commit` 只读取最高轮 `review-code` 产物；当该产物 Approved、提交前 HEAD 等于 R、且 staged diff fingerprint 等于 F 时，在 task.md 写入 `last_reviewed_commit`（B，新提交 SHA）。
+- `complete-task` 的 `post-review-commit` gate 优先使用 B；B 缺失或非法时回退最高轮 `review-code` 的 R。
+- 若 B / R 之后代码 / 规则路径出现新提交，gate 会拦截，要求重新 `review-code`。
+- **豁免**：在账本追加一行 `| PRC-1 | post-review-commit | - | - | human-decided | <裁定说明> |`，记录人工明确允许该批提交免复审。
+## gate 行为速查
+| 调用方 | `review-ledger` 作用域 | `post-review-commit` |
+|--------|------------------------|----------------------|
+| `plan-task` | 仅 `analysis` 阶段行须终态 | 不挂 |
+| `code-task` | `analysis` + `plan` 阶段行须终态 | 不挂 |
+| `complete-task` | 全部阶段行须终态 | 挂（见上） |
+| `analyze-task` | 不挂（首阶段） | 不挂 |

package/templates/.agents/scripts/lib/post-review-commit.js ADDED

@@ -0,0 +1,56 @@
+import fs from "node:fs";
+import path from "node:path";
+import { artifactName, maxRound } from "./review-artifacts.js";
+export const DEFAULT_POST_REVIEW_GLOBS = [
+  ".agents/skills",
+  ".agents/scripts",
+  ".agents/rules",
+  ".agents/workflows",
+  "bin",
+  "lib",
+  "src",
+  "templates"
+];
+export function resolvePostReviewGlobs(config = {}, reviewConfig = {}) {
+  if (Array.isArray(config.post_review_globs)) {
+    return config.post_review_globs;
+  }
+  if (Array.isArray(reviewConfig.post_review_globs)) {
+    return reviewConfig.post_review_globs;
+  }
+  return DEFAULT_POST_REVIEW_GLOBS;
+}
+export function findAuthoritativeReviewCodeArtifact(taskDir) {
+  const entries = fs.existsSync(taskDir) ? fs.readdirSync(taskDir) : [];
+  const round = maxRound(entries, "review-code");
+  if (round === 0) {
+    return { ok: false, round: 0, fileName: null, path: null };
+  }
+  const fileName = artifactName("review-code", round);
+  return {
+    ok: true,
+    round,
+    fileName,
+    path: path.join(taskDir, fileName)
+  };
+}
+export function extractReviewBaseline(content) {
+  const match = String(content).match(/^[-*]?\s*\*\*(?:审查基线提交|Review Baseline Commit)\*\*[:：]\s*(.*?)\s*$/m);
+  return match ? match[1].trim().replace(/`/g, "") : "";
+}
+export function extractReviewDiffFingerprint(content) {
+  const match = String(content).match(/^[-*]?\s*\*\*(?:审查差异指纹|Reviewed Diff Fingerprint)\*\*[:：]\s*(.*?)\s*$/m);
+  return match ? match[1].trim().replace(/`/g, "") : "";
+}
+export function parseReviewVerdict(content) {
+  const match = String(content).match(/^[-*]?\s*\*\*(?:总体结论|Overall Verdict)\*\*[:：]\s*(.*?)\s*$/m);
+  return match ? match[1].trim() : "";
+}
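
To illustrate how these extractors might feed the "prefer B, fall back to R" rule from the handshake protocol, here is a minimal sketch. `extractReviewBaseline` is copied from the module above so the snippet runs on its own; `isCommitSha` and `resolveBaseline` are hypothetical helpers, and treating a 7 to 40 character hex string as a valid SHA is an assumption, since this diff does not show how validity is decided.

```javascript
// Copied from post-review-commit.js above so this snippet is self-contained.
function extractReviewBaseline(content) {
  const match = String(content).match(/^[-*]?\s*\*\*(?:审查基线提交|Review Baseline Commit)\*\*[:：]\s*(.*?)\s*$/m);
  return match ? match[1].trim().replace(/`/g, "") : "";
}

// Hypothetical validity check: assumes a 7-40 character hexadecimal SHA.
function isCommitSha(value) {
  return /^[0-9a-f]{7,40}$/i.test(String(value || "").trim());
}

// Hypothetical helper: prefer B (last_reviewed_commit from task.md); when B is
// absent or invalid, fall back to R parsed from the highest-round review report.
function resolveBaseline(lastReviewedCommit, reviewContent) {
  if (isCommitSha(lastReviewedCommit)) {
    return { source: "B", sha: String(lastReviewedCommit).trim() };
  }
  const baseline = extractReviewBaseline(reviewContent);
  return isCommitSha(baseline)
    ? { source: "R", sha: baseline }
    : { source: null, sha: "" };
}

// Sample report content (made up for illustration).
const report = [
  "## Review Summary",
  "- **Overall Verdict**: Approved",
  "- **Review Baseline Commit**: `0a1b2c3d4e5f`"
].join("\n");

const resolved = resolveBaseline("", report);
// resolved.source === "R", resolved.sha === "0a1b2c3d4e5f"
```

Two properties of the pattern are worth keeping in mind when writing report templates: the colon must sit outside the bold markers (`**Review Baseline Commit**:`), and because `\s` also matches newlines, a label with an empty value appears to capture the text of the following line instead of returning an empty string.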

package/templates/.agents/scripts/lib/review-artifacts.js ADDED

@@ -0,0 +1,117 @@
+// Shared helpers for review-artifact parsing.
+// Imported by both .agents/skills/code-task/scripts/detect-mode.js and
+// .agents/scripts/validate-artifact.js so the round/verdict vocabulary stays
+// in a single source of truth (prevents the cross-file drift this lifecycle
+// is designed to eliminate).
+import fs from "node:fs";
+import path from "node:path";
+export function escapeRegExp(value) {
+  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+}
+export function maxRound(entries, stem) {
+  let max = 0;
+  for (const entry of entries) {
+    if (entry === `${stem}.md`) {
+      max = Math.max(max, 1);
+      continue;
+    }
+    const match = entry.match(new RegExp(`^${escapeRegExp(stem)}-r(\\d+)\\.md$`));
+    if (match) {
+      max = Math.max(max, Number(match[1]));
+    }
+  }
+  return max;
+}
+export function artifactName(stem, round) {
+  return round === 1 ? `${stem}.md` : `${stem}-r${round}.md`;
+}
+export function normalizeVerdict(raw) {
+  const value = String(raw).trim().toLowerCase();
+  if (value === "通过" || value === "approved") {
+    return "Approved";
+  }
+  if (value === "需要修改" || value === "changes requested") {
+    return "Changes Requested";
+  }
+  if (value === "拒绝" || value === "rejected") {
+    return "Rejected";
+  }
+  return "";
+}
+export function extractSection(content, names) {
+  const lines = content.split(/\r?\n/);
+  const nameSet = new Set(names);
+  const start = lines.findIndex((line) => {
+    const match = line.trim().match(/^##\s+(.+?)\s*$/);
+    return match ? nameSet.has(match[1]) : false;
+  });
+  if (start === -1) {
+    return "";
+  }
+  const sectionLines = [];
+  for (let index = start + 1; index < lines.length; index += 1) {
+    if (/^##\s+/.test(lines[index])) {
+      break;
+    }
+    sectionLines.push(lines[index]);
+  }
+  return sectionLines.join("\n");
+}
+// Parse the canonical verdict out of a review-* artifact.
+// Returns { ok, verdict, message }. Verdict collapses Approved into
+// "Approved-with-issues" when the findings counts are non-zero.
+export function parseVerdict(reviewPath) {
+  if (!fs.existsSync(reviewPath)) {
+    return { ok: false, verdict: null, message: `Review artifact not found: ${path.basename(reviewPath)}` };
+  }
+  const content = fs.readFileSync(reviewPath, "utf8");
+  const summary = extractSection(content, ["审查摘要", "Review Summary"]);
+  const fileName = path.basename(reviewPath);
+  if (!summary) {
+    return { ok: false, verdict: null, message: `cannot locate review summary section in ${fileName}` };
+  }
+  const verdictMatch = summary.match(/^[-*]?\s*\*\*(?:总体结论|Overall Verdict)\*\*[:：]\s*(.+?)\s*$/im);
+  if (!verdictMatch) {
+    return { ok: false, verdict: null, message: `cannot parse verdict in ${fileName}` };
+  }
+  const verdict = normalizeVerdict(verdictMatch[1]);
+  if (!verdict) {
+    return {
+      ok: false,
+      verdict: null,
+      message: `unrecognized verdict '${verdictMatch[1].trim()}' in ${fileName}`
+    };
+  }
+  if (verdict !== "Approved") {
+    return { ok: true, verdict };
+  }
+  const findingsMatch = summary.match(/^[-*]?\s*\*\*(?:发现（AI 可处理）|Findings \(AI-actionable\))\*\*[:：]\s*(.+?)\s*$/im);
+  if (!findingsMatch) {
+    return { ok: false, verdict, message: `cannot parse findings count in ${fileName}` };
+  }
+  const counts = findingsMatch[1].match(/(\d+)\s*(?:阻塞项|blockers?).*?(\d+)\s*(?:主要|majors?).*?(\d+)\s*(?:次要|minors?)/i);
+  if (!counts) {
+    return { ok: false, verdict, message: `cannot parse findings count in ${fileName}` };
+  }
+  const [, blockers, majors, minors] = counts.map(Number);
+  return {
+    ok: true,
+    verdict: blockers === 0 && majors === 0 && minors === 0 ? "Approved" : "Approved-with-issues"
+  };
+}
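
The round-naming convention implied by `maxRound` and `artifactName` (round 1 is `<stem>.md`, later rounds are `<stem>-rN.md`) can be exercised with a short sketch. The three functions are copied from the module above so the snippet is self-contained; `nextArtifactName` is a hypothetical helper added for illustration and the file names are sample data.

```javascript
// Copied from review-artifacts.js above so this snippet runs on its own.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function maxRound(entries, stem) {
  let max = 0;
  for (const entry of entries) {
    if (entry === `${stem}.md`) {
      max = Math.max(max, 1);
      continue;
    }
    const match = entry.match(new RegExp(`^${escapeRegExp(stem)}-r(\\d+)\\.md$`));
    if (match) {
      max = Math.max(max, Number(match[1]));
    }
  }
  return max;
}

function artifactName(stem, round) {
  return round === 1 ? `${stem}.md` : `${stem}-r${round}.md`;
}

// Hypothetical helper: the file name the next round of a given stem would use.
function nextArtifactName(entries, stem) {
  return artifactName(stem, maxRound(entries, stem) + 1);
}

// Sample directory listing (made up for illustration).
const entries = [
  "plan.md",
  "review-code.md",
  "review-code-r2.md",
  "review-code-r10.md",
  "review-code-r11.md.bak" // ignored: the pattern is anchored at both ends
];

const highest = maxRound(entries, "review-code"); // 10
const next = nextArtifactName(entries, "review-code"); // "review-code-r11.md"
```

Rounds are compared numerically through `Number(match[1])`, so `-r10` ranks above `-r2` even though it would sort lower as a string.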