npm - kc-beta - Versions diffs - 0.7.5 → 0.8.3 - Mend

kc-beta 0.7.5 → 0.8.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (81)

package/README.md +47 -0
package/package.json +3 -2
package/src/agent/context.js +17 -1
package/src/agent/engine.js +467 -100
package/src/agent/llm-client.js +24 -1
package/src/agent/pipelines/_advance-hints.js +92 -0
package/src/agent/pipelines/_milestone-derive.js +325 -20
package/src/agent/pipelines/skill-authoring.js +49 -3
package/src/agent/tools/agent-tool.js +2 -2
package/src/agent/tools/consult-skill.js +15 -0
package/src/agent/tools/dashboard-render.js +48 -1
package/src/agent/tools/document-parse.js +31 -2
package/src/agent/tools/phase-advance.js +17 -13
package/src/agent/tools/release.js +343 -7
package/src/agent/tools/sandbox-exec.js +65 -8
package/src/agent/tools/worker-llm-call.js +95 -15
package/src/agent/workspace.js +25 -4
package/src/cli/components.js +4 -1
package/src/cli/index.js +125 -8
package/src/config.js +19 -2
package/src/marathon/driver.js +217 -0
package/src/marathon/prompts.js +93 -0
package/template/.env.template +17 -1
package/template/AGENT.md +2 -2
package/template/skills/en/auto-model-selection/SKILL.md +55 -35
package/template/skills/en/bootstrap-workspace/SKILL.md +27 -0
package/template/skills/en/compliance-judgment/SKILL.md +14 -0
package/template/skills/en/confidence-system/SKILL.md +30 -8
package/template/skills/en/corner-case-management/SKILL.md +53 -33
package/template/skills/en/cross-document-verification/SKILL.md +88 -83
package/template/skills/en/dashboard-reporting/SKILL.md +91 -66
package/template/skills/en/dashboard-reporting/scripts/generate_dashboard.py +1 -1
package/template/skills/en/data-sensibility/SKILL.md +19 -12
package/template/skills/en/document-chunking/SKILL.md +99 -15
package/template/skills/en/entity-extraction/SKILL.md +14 -4
package/template/skills/en/quality-control/SKILL.md +23 -0
package/template/skills/en/rule-extraction/SKILL.md +92 -94
package/template/skills/en/rule-extraction/references/chunking-strategies.md +7 -78
package/template/skills/en/skill-authoring/SKILL.md +85 -2
package/template/skills/en/skill-creator/SKILL.md +25 -3
package/template/skills/en/skill-to-workflow/SKILL.md +73 -1
package/template/skills/en/task-decomposition/SKILL.md +1 -1
package/template/skills/en/tree-processing/SKILL.md +1 -1
package/template/skills/en/version-control/SKILL.md +15 -0
package/template/skills/en/work-decomposition/SKILL.md +52 -32
package/template/skills/phase_skills.yaml +5 -0
package/template/skills/zh/auto-model-selection/SKILL.md +54 -33
package/template/skills/zh/bootstrap-workspace/SKILL.md +27 -0
package/template/skills/zh/compliance-judgment/SKILL.md +51 -37
package/template/skills/zh/compliance-judgment/references/output-format.md +62 -62
package/template/skills/zh/confidence-system/SKILL.md +34 -9
package/template/skills/zh/corner-case-management/SKILL.md +71 -104
package/template/skills/zh/cross-document-verification/SKILL.md +90 -195
package/template/skills/zh/cross-document-verification/references/contradiction-taxonomy.md +36 -36
package/template/skills/zh/dashboard-reporting/SKILL.md +82 -232
package/template/skills/zh/dashboard-reporting/scripts/generate_dashboard.py +1 -1
package/template/skills/zh/data-sensibility/SKILL.md +13 -0
package/template/skills/zh/document-chunking/SKILL.md +101 -18
package/template/skills/zh/document-parsing/SKILL.md +65 -65
package/template/skills/zh/document-parsing/references/parser-catalog.md +26 -26
package/template/skills/zh/entity-extraction/SKILL.md +78 -68
package/template/skills/zh/evolution-loop/references/convergence-guide.md +38 -38
package/template/skills/zh/quality-control/SKILL.md +23 -0
package/template/skills/zh/quality-control/references/qa-layers.md +65 -65
package/template/skills/zh/quality-control/references/sampling-strategies.md +49 -49
package/template/skills/zh/rule-extraction/SKILL.md +199 -188
package/template/skills/zh/rule-extraction/references/chunking-strategies.md +5 -78
package/template/skills/zh/skill-authoring/SKILL.md +136 -58
package/template/skills/zh/skill-authoring/references/skill-format-spec.md +39 -39
package/template/skills/zh/skill-creator/SKILL.md +215 -201
package/template/skills/zh/skill-creator/references/schemas.md +60 -60
package/template/skills/zh/skill-to-workflow/SKILL.md +73 -1
package/template/skills/zh/skill-to-workflow/references/worker-llm-catalog.md +24 -24
package/template/skills/zh/task-decomposition/SKILL.md +1 -1
package/template/skills/zh/task-decomposition/references/decision-matrix.md +54 -54
package/template/skills/zh/tree-processing/SKILL.md +67 -63
package/template/skills/zh/version-control/SKILL.md +15 -0
package/template/skills/zh/version-control/references/trace-id-spec.md +34 -34
package/template/skills/zh/work-decomposition/SKILL.md +52 -30
package/template/workflows/common/llm_client.py +168 -0
package/template/workflows/common/utils.py +132 -0

package/template/skills/zh/tree-processing/SKILL.md CHANGED

@@ -12,62 +12,65 @@ description: >
 # Tree Processing
-Most verification rules do not need the entire document. They need a specific section, a specific table, a specific disclosure. The tree is your map for navigating large documents efficiently.
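+Most verification rules do not need the entire document. They need only a specific section, a specific table, or a specific disclosure. The tree is your map for navigating large documents efficiently. Putting a regulation that runs to hundreds or even thousands of pages in front of a worker LLM fails twice: it does not fit in the context window, and irrelevant content dilutes the key facts. Once the tree is built, verification narrows from "searching for a needle in the whole ocean" to "use the map to find the right room first, then search that room thoroughly."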
-## Production Chunking Methodology
+## Production Chunking Methodology
-For verification workflows that process many documents, the chunking mechanism must be precise, consistent, and fast. The approach:
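+For verification workflows that process many documents, the chunking mechanism must be precise, consistent, and fast. "Precise" means the boundaries cut from a given document always land in the right place; "consistent" means cutting it today and cutting it tomorrow give identical results; "fast" means chunking never becomes the bottleneck of the pipeline. Meeting all three at once leaves essentially one path: encode the structural regularities in code. The basic approach: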
-1. **Observe**: Read 3-5 sample documents. Note their structure — headers, numbering, section patterns.
-2. **Find patterns**: Identify what's consistent (header format, numbering convention, TOC structure).
-3. **Write code**: Design a chunking script (regex-based splitter, header detector, TOC parser) that captures the pattern.
-4. **Test**: Run the script on samples. Verify it produces correct, consistent chunks.
-5. **Deploy**: The script runs in production workflows. It's deterministic, free, and fast.
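+1. **Observe**: Read 3-5 sample documents. Note their structural features: header styles, numbering conventions, how sections are divided. Do not start coding after looking at a single document; the incidental formatting of a small sample will lead you to an overfitted, brittle regex.
+2. **Find patterns**: Identify the elements that stay consistent (header format, numbering convention, TOC structure). List the inconsistent parts separately as well; these are the edge cases the script will need fallbacks for.
+3. **Write code**: Design a chunking script (regex-based splitter, header detector, TOC parser) that encodes the patterns you found. The script should be deterministic, safe to re-run, and robust to small perturbations in the input text.
+4. **Test**: Run the script on the samples. Verify that the chunks it produces match the boundaries you annotated by hand. If you see wrong cuts, missed cuts, or cuts that overrun a boundary, go back to step 1 and observe more samples before adjusting the rules.
+5. **Deploy**: The script runs in production workflows. It is deterministic, free, and fast. Once written, one script serves all later processing of documents of the same type.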
-This is different from `document-chunking` (quick, cheap splits for exploration). Production chunking is a one-time design effort that pays off across all documents of the same type.
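+This is different from `document-chunking` (quick, cheap splits for the exploration phase). Production chunking is a one-time design investment whose payoff keeps accruing across every document of the same type. Put differently, the former is a stopgap for "let me get a rough idea of what this document looks like," while the latter is an engineering asset for "I process a thousand of these documents every day."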
-## Why Trees
+## Why Use Trees
-Two reasons:
+Two reasons, both solid:
-1. **Rules have scope.** "The risk disclosure in Chapter 5 must contain..." — you need to find Chapter 5, not read 1000 pages.
-2. **Worker LLMs have limits.** A 16K-32K context window cannot hold a 1000-page document. You must narrow to the relevant section.
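+1. **Rules have scope.** "The risk disclosure in Chapter 5 must contain...": you need to locate Chapter 5, not read all 1000 pages. Verification rules are scoped almost by nature: each targets a chapter, a table, or a specific clause. Feeding a rule together with the entire document forces the LLM to do the "find the location" work itself, which it should not have to do.
+2. **Worker LLMs have context limits.** A 16K-32K context window cannot hold a 1000-page document. You must narrow the scope to the relevant section. Even with a larger context window, feeding in irrelevant content dilutes accuracy, increases latency, and raises token cost.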
-The tree structure solves both: it tells you WHERE things are, and lets you extract JUST what you need.
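+The tree structure solves both problems: it tells you WHERE things are, and lets you extract JUST the part you actually need. From an engineering view, the tree is the document's index; from a semantic view, it is the bridge between rules and text. Build that bridge well, and verifying every later rule becomes simpler, more reliable, and more explainable.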
-## Building the Tree
+## Building the Tree
-### Step 1: Discover the Structure
+### Step 1: Discover the Structure
-Before building a tree parser, explore several sample documents to find structural patterns. Look for:
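+Before implementing a tree parser, explore several sample documents to find their structural regularities. This step looks like "just flipping through documents," but it sets the floor for every engineering decision that follows. Treat it as serious requirements research, not a quick skim. Look for: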
-- **Header conventions**: Do chapters start with "Chapter X"? "第X章"? "Part X"? A Roman numeral?
-- **Numbering systems**: "1.1.2", "Article 3", "(a)(i)", hierarchical numbering?
-- **Visual markers**: Bold text, larger font, horizontal rules, page breaks before chapters?
-- **Table of contents**: Most formal documents have one. It is the document's own tree.
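+- **Header conventions**: Do chapters start with "Chapter X"? "第X章"? "Part X"? A Roman numeral? Within one document, are top-level and sub-level header formats consistent? In mixed Chinese-English documents, do both conventions coexist?
+- **Numbering systems**: "1.1.2", "Article 3", "(a)(i)", or hierarchical numbering? Does numbering reset in each chapter? Is there global numbering shared across chapters? Are missing or skipped numbers one-offs or a pattern?
+- **Visual markers**: Bold text, larger font, horizontal rules, page breaks before chapters? Do these survive conversion to plain text? If the input is text parsed from PDF, do these markers need to be injected at an earlier stage?
+- **Table of contents (TOC)**: Most formal documents have one. It is the document's own tree. The TOC also tells you page ranges, the official hierarchy depth, and which headings are statutory versus inserted by typesetting.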
-Spend time here. The patterns you find determine whether the tree builder is a simple regex or a complex parser.
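+Spend time here. The patterns you find determine whether the tree builder is a simple regex or a complex parser. As a rule of thumb, regulatory documents issued by a regulator almost always follow one consistent typesetting convention, whereas documents merged from different institutions often force the parser to handle several conventions at once.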
-### Step 2: Choose the Parser
+### Step 2: Choose the Parser
-**If patterns are consistent** (they usually are in regulated documents):
-- Write a regex-based splitter. For example:
-  - `^第[一二三四五六七八九十百千]+章` for Chinese chapter headers
-  - `^Chapter \d+` for English
-  - `^\d+\.\d+(\.\d+)*\s` for numbered sections
-- This is fast, deterministic, and reliable. Prefer this when it works.
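+**If patterns are consistent enough** (they usually are in regulated documents):
+- Write a regex-based splitter. For example:
+  - `^第[一二三四五六七八九十百千]+章` for Chinese chapter headers
+  - `^Chapter \d+` for English chapter headers
+  - `^\d+\.\d+(\.\d+)*\s` for numbered sections
+- This is fast, deterministic, and reliable. Prefer it whenever the regex works. Do not give up a deterministic approach for something that "looks smarter"; determinism is the scarcest and most valuable property in production.
+- While debugging, write a small set of unit tests for each regex: typical positive cases, clear negative cases, and easily confused boundary cases (for example, full-width spaces or invisible control characters inside a header).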
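A minimal sketch of the regex-based splitter described above, using the three example patterns. It assumes plain-text input with one header per line; `split_by_headers` and `Chunk` are hypothetical names, and any lines before the first header (a cover page, say) are simply dropped in this sketch.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The three example header patterns from the text.
CN_CHAPTER = re.compile(r"^第[一二三四五六七八九十百千]+章")
EN_CHAPTER = re.compile(r"^Chapter \d+")
NUMBERED_SECTION = re.compile(r"^\d+\.\d+(\.\d+)*\s")

# Leading characters ignored before matching: ASCII space, tab, and the
# full-width ideographic space U+3000 that often sneaks into headers.
LEADING_JUNK = " \t\u3000"


@dataclass
class Chunk:
    header: str
    start_line: int  # 0-based index of the header line
    end_line: int    # exclusive: index of the next header, or the line count
    text: str


def split_by_headers(text: str, pattern: re.Pattern) -> list[Chunk]:
    """Split `text` into chunks that each start at a line matching `pattern`."""
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines)
              if pattern.match(line.lstrip(LEADING_JUNK))]
    chunks = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        chunks.append(Chunk(header=lines[start].lstrip(LEADING_JUNK),
                            start_line=start, end_line=end,
                            text="\n".join(lines[start:end])))
    return chunks
```

Because the splitter is pure string processing, the unit tests suggested above are cheap to write: a positive case such as `第五章 风险披露`, a negative case such as `第5章` (which this pattern deliberately does not match), and a header preceded by a full-width space.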
-**If patterns are inconsistent or absent**:
-- Use the LLM-guided wedge-driving approach (see `rule-extraction/references/chunking-strategies.md` for the full algorithm: rolling context window, K-token quoting, Levenshtein fuzzy matching).
-- This is slower and costs LLM calls, but handles unstructured documents. The rolling window means even very large unstructured leaf nodes can be chunked incrementally.
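+**If patterns are inconsistent or absent**:
+- Use the LLM-guided "wedge-driving" approach (see the `document-chunking` skill for the full algorithm: rolling context window, K-token quoting, Levenshtein fuzzy matching).
+- This is slower and costs LLM calls, but handles unstructured documents. The point of the rolling window is that even a very large unstructured leaf node can be chunked incrementally, one segment at a time.
+- A pragmatic compromise is a hybrid strategy: split with regex at every level the regex can handle, and hand only the child nodes it cannot crack to LLM-guided chunking. This concentrates the expensive LLM calls where semantic judgment is actually needed.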
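The Levenshtein fuzzy-matching step can be sketched on its own. The assumption here is that the LLM returns a short verbatim quote of where a boundary should fall, and the script must find that quote in the current window even if the LLM dropped or changed a few characters. This is textbook approximate substring matching (the Sellers variant of edit distance, where a match may start anywhere in the text); `fuzzy_find` is a hypothetical name, and the `document-chunking` skill's actual implementation may differ.

```python
def fuzzy_find(pattern: str, text: str) -> tuple:
    """Locate the substring of `text` with the smallest Levenshtein distance to `pattern`.

    Returns (start, end, distance), with text[start:end] being the best match.
    Ties keep the earliest end position.
    """
    m = len(pattern)
    # Column j = 0 (no text consumed yet): matching pattern[:i] costs i edits.
    prev = list(range(m + 1))
    prev_start = [0] * (m + 1)
    best = (prev[m], 0, 0)  # (distance, start, end)
    for j in range(1, len(text) + 1):
        cur = [0] * (m + 1)          # row 0 stays 0: a match may begin at any position
        cur_start = [j] * (m + 1)    # row 0 starts (and ends) at j
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[i], cur_start[i] = min(
                (prev[i - 1] + cost, prev_start[i - 1]),  # match or substitution
                (cur[i - 1] + 1, cur_start[i - 1]),       # pattern char missing from text
                (prev[i] + 1, prev_start[i]),             # extra char in text
            )
        if cur[m] < best[0]:
            best = (cur[m], cur_start[m], j)
        prev, prev_start = cur, cur_start
    distance, start, end = best
    return start, end, distance
```

The cost is O(len(quote) × len(window)), which is one more reason to run it over the rolling window rather than the whole document. A caller would typically also reject matches whose distance exceeds some fraction of the quote length, treating them as a failed quote rather than a boundary.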
-**If the document has a table of contents**:
-- Parse the TOC first. It gives you the tree structure and page numbers for free.
-- Then use the TOC-derived structure to split the document body.
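+**If the document has a table of contents**:
+- Parse the TOC first. It gives you the tree structure, plus each node's page number, for free.
+- Then use the TOC-derived structure to split the document body.
+- Watch for occasional mismatches between the TOC and the body (the TOC omits a section, or the body adds subsections the TOC does not list). Log these differences for later manual review instead of silently swallowing them.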
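A sketch of both TOC bullets, assuming a common TOC line shape (title, then dot leaders or spaces, then a page number). The line regex, `parse_toc`, and `reconcile` are illustrative assumptions; real TOCs vary, so the actual regex would come out of Step 1's observations.

```python
import logging
import re

logger = logging.getLogger("toc")

# Assumed TOC line shape: title, then 2+ dot leaders / spaces / ellipses, then a page number.
# Matches "第五章 风险披露 ........ 79" and "Chapter 5: Risk Disclosure ... 79".
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.·…]{2,}(?P<page>\d+)\s*$")


def parse_toc(toc_text: str) -> list:
    """Return [(title, page)] for every line that looks like a TOC entry."""
    entries = []
    for line in toc_text.splitlines():
        m = TOC_LINE.match(line.strip())
        if m:
            entries.append((m.group("title").strip(), int(m.group("page"))))
    return entries


def reconcile(toc_titles: list, body_headers: list) -> tuple:
    """Log, rather than silently drop, headings that appear on only one side."""
    toc_set, body_set = set(toc_titles), set(body_headers)
    missing_in_body = [t for t in toc_titles if t not in body_set]
    missing_in_toc = [h for h in body_headers if h not in toc_set]
    for title in missing_in_body:
        logger.warning("TOC entry not found in body: %s", title)
    for header in missing_in_toc:
        logger.warning("Body header not listed in TOC: %s", header)
    return missing_in_body, missing_in_toc
```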
-### Step 3: Build the Tree
+### Step 3: Build the Tree
-The tree is a simple nested structure:
+The tree itself is a simple nested structure:
 ```
 Document
@@ -83,40 +86,41 @@ Document
     └── Chapter 5: Risk Disclosure (pages 79-120)
 ```
-Each node stores: the header text, the level, the start/end positions in the document, and the content size (in tokens or characters).
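+Each node needs to store: the header text, its level, its start and end positions in the document, and its content size (in tokens or characters). In practice, also keep a stable node ID (for example, the numbering path from the root to the node) to support provenance tracking, cache hits, and reuse across rules. Link parent and child nodes with explicit pointers or IDs, so that each step of a top-down traversal or a bottom-up ancestor lookup costs constant time.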
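One way to hold these fields, sketched as a Python dataclass. The field names, the dotted-path ID scheme, and the `header_chain` helper are assumptions for illustration, not a format the skill prescribes. The parent pointer makes ancestor lookup one step per level, and `header_chain` produces the breadcrumb that Step 4 relies on.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality; avoids comparing whole subtrees
class TreeNode:
    node_id: str   # stable numbering path from the root, e.g. "2.1.1" ("" for the root)
    header: str    # header text as it appears in the document
    level: int     # 0 = document root, 1 = top-level part/chapter, ...
    start: int     # start offset in the parsed text (characters)
    end: int       # end offset, exclusive
    size: int      # content size; characters here, tokens work the same way
    parent: Optional[TreeNode] = field(default=None, repr=False)
    children: list[TreeNode] = field(default_factory=list, repr=False)

    def add_child(self, header: str, start: int, end: int) -> TreeNode:
        """Append a child whose ID extends this node's ID with its 1-based position."""
        position = str(len(self.children) + 1)
        child = TreeNode(
            node_id=f"{self.node_id}.{position}" if self.node_id else position,
            header=header, level=self.level + 1,
            start=start, end=end, size=end - start, parent=self,
        )
        self.children.append(child)
        return child

    def header_chain(self) -> str:
        """Headers from the top level down to this node, root excluded."""
        chain, node = [], self
        while node.parent is not None:
            chain.append(node.header)
            node = node.parent
        return " > ".join(reversed(chain))
```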
-### Step 4: Use the Tree
+### Step 4: Use the Tree
-Given a rule that says "check the risk disclosure section":
+Suppose a rule says "check the risk disclosure section":
-1. **Search the tree** for the relevant node. Match the rule's scope description against node headers.
-   - Exact match: "Chapter 5" → find node with "Chapter 5" header.
-   - Semantic match: "risk disclosure section" → find node whose header or content relates to risk disclosure. May need fuzzy matching or LLM classification.
-2. **Extract the content** of that node (and optionally its children).
-3. **Check the size.** If the content fits in the worker LLM's context window, use it directly. If not, descend to child nodes and find the specific subsection needed.
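+1. **Search the tree** for the target node. Match the scope described in the rule against node headers.
+   - Exact match: "Chapter 5" → find the node whose header is "Chapter 5". This is the ideal case: it lands directly on the node with no ambiguity.
+   - Semantic match: "risk disclosure section" → find the node whose header or content relates to risk disclosure. This may need fuzzy matching or LLM classification. In large documents, try semantic matching against headers first, and drop down to summaries or body text only when headers are not enough to decide.
+2. **Extract the content** of that node (and, if needed, its children). Also record the ID of the source node at extraction time, so the verification conclusion can be traced back to a specific location in the document instead of floating free as "the model said so."
+3. **Check the size.** If the content fits in the worker LLM's context window, use it directly. If not, descend into child nodes and locate the subsection actually needed. While descending, keep the chain of ancestor headers so the LLM always knows where in the document it is reading.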
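Steps 1 and 3 can be sketched over the tree once it is stored as plain dicts (the same shape it would have after JSON caching). Only the exact-match case is shown; semantic matching would replace the substring test with fuzzy matching or an LLM classifier. `find_node`, `narrow`, and the `header`/`size`/`children` keys are assumptions.

```python
def find_node(node: dict, scope: str, path: tuple = ()) -> tuple:
    """Depth-first search for the first node whose header contains `scope`.

    Returns (node, header_path) or (None, ()). Plain substring matching: a real
    matcher needs word boundaries, since "Chapter 1" also hits "Chapter 10".
    """
    path = path + (node["header"],)
    if scope in node["header"]:
        return node, path
    for child in node.get("children", []):
        hit, hit_path = find_node(child, scope, path)
        if hit is not None:
            return hit, hit_path
    return None, ()


def narrow(node: dict, budget: int, path: tuple, relevant=lambda n: True) -> list:
    """Descend until each returned piece fits `budget` (same unit as node["size"]).

    Returns [(node, "A > B > C")]. Only children accepted by `relevant` are
    visited; an oversized leaf is returned anyway and needs LLM-guided chunking.
    """
    if node["size"] <= budget or not node.get("children"):
        return [(node, " > ".join(path))]
    pieces = []
    for child in node["children"]:
        if relevant(child):
            pieces.extend(narrow(child, budget, path + (child["header"],), relevant))
    return pieces
```

The `relevant` predicate is where "locate the subsection actually needed" plugs in; with the default it simply returns every subsection, each small enough or flagged as an oversized leaf.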
-## The Full Context → Chapter → Entity Pipeline
+## "全文 → 章 → 实体" 流水线
-This is the standard narrowing funnel for extracting entities for verification:
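+This is the standard narrowing funnel for extracting the entities to be verified from a document, and the default verification orchestration KC recommends. Each step narrows the scope further, handing the task to a stage whose capability and context are just sufficient for it: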
-1. **Full context**: Use the tree to understand the document structure. Know where everything is.
-2. **Chapter**: Navigate to the specific section that the rule targets. Extract its content.
-3. **Entity**: Within the chapter content, extract the specific entity (number, text, clause) using the techniques from `entity-extraction`.
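+1. **Full context**: Use the tree to understand the structure of the whole document. Know where everything is. This step does not require the LLM to actually read the full text; it only needs to link the rules to the tree.
+2. **Chapter**: Navigate to the specific section the rule targets. Extract its content. Follow the tree's start and end positions strictly for section boundaries; do not take more or less by impression, or boundary noise will drag down verification accuracy.
+3. **Entity**: Within the chapter content, use the techniques from `entity-extraction` to extract the specific entity (a number, a text span, a clause). This is the step closest to the rule's atomic unit of comparison.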
-For worker LLMs with 16K-32K context:
-- The chapter content + the extraction prompt must fit in the context window.
-- If a chapter is too large, descend further in the tree.
-- Always include the parent header chain for context: "Part II > Chapter 3 > Section 3.1" so the LLM knows where this content sits in the document.
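+For worker LLMs with a 16K-32K context window:
+- The chapter content plus the extraction prompt must fit in the context window. Add up the estimated token counts of both, and keep at least 10% of the window free for the output.
+- If a chapter is too large, descend further into the tree. Prefer the smallest child node that fully covers the rule's scope.
+- Always include the parent header chain as location context, for example "Part II > Chapter 3 > Section 3.1", so the LLM knows where this content sits in the document. Without the chain, the LLM easily confuses subsections that share a name, especially in regulations structured as general provisions followed by specific provisions.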
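The budget rule above reduces to simple arithmetic. The sketch below assumes a crude token estimate (about one token per CJK character and four characters per token otherwise); that ratio is an assumption, and a real deployment would use the worker model's own tokenizer. `estimate_tokens`, `build_context`, and the `[Location]` prefix are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: CJK ideographs count 1 each, everything else 4 chars per token."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk + (len(text) - cjk + 3) // 4  # ceiling division for the non-CJK part


def build_context(header_chain: str, section: str, prompt: str,
                  window: int, output_reserve: float = 0.10):
    """Return the worker input, or None if it does not fit and the caller should descend."""
    body = f"[Location] {header_chain}\n\n{section}\n\n{prompt}"
    budget = int(window * (1 - output_reserve))  # keep >= 10% of the window for output
    return body if estimate_tokens(body) <= budget else None
```

With a 16K window the input budget is 14,400 tokens; a `None` result is the signal to descend one level in the tree.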
-## Caching and Reuse
+## Caching and Reuse
-Build the tree once per document, reuse across all rules:
-- Save the tree structure as JSON alongside the parsed document.
-- Multiple rules may need different sections of the same document. The tree lets each rule navigate directly to its section without re-parsing.
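+Build the tree once per document, then reuse it across all rules:
+- Save the tree structure as JSON alongside the parsed document. The filename can follow a stable pattern such as `<doc_id>_tree.json`, so it can later be loaded directly by document ID.
+- The same document is often hit by several rules targeting different sections. The tree lets each rule jump straight to the location it cares about without re-parsing the document. This matters most in batch verification: it cuts the cost from O(rules × document parses) down to one document parse plus O(rules × tree lookups).
+- When a document's version changes, compare the old and new trees to quickly see which sections were added, removed, merged, or reordered, and from that decide which earlier verification conclusions must be re-run and which can carry over.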
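A sketch of the caching convention and of the simplest version comparison. It assumes the tree is a dict with `header` and `children` keys; `save_tree`, `load_tree`, and `diff_trees` are hypothetical helpers. Comparing header paths catches added and removed sections; detecting merges or reorders would additionally need node content or positions.

```python
import json
from pathlib import Path


def save_tree(tree: dict, doc_id: str, out_dir) -> Path:
    """Write the tree next to the parsed document as <doc_id>_tree.json."""
    path = Path(out_dir) / f"{doc_id}_tree.json"
    path.write_text(json.dumps(tree, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_tree(doc_id: str, out_dir) -> dict:
    return json.loads((Path(out_dir) / f"{doc_id}_tree.json").read_text(encoding="utf-8"))


def header_paths(tree: dict) -> set:
    """Flatten a tree into header chains such as 'Chapter 5 > 5.1', root excluded."""
    paths = set()

    def walk(node, chain):
        for child in node.get("children", []):
            child_chain = chain + [child["header"]]
            paths.add(" > ".join(child_chain))
            walk(child, child_chain)

    walk(tree, [])
    return paths


def diff_trees(old: dict, new: dict) -> dict:
    """Sections present in only one version; merges and reorders need content matching."""
    old_paths, new_paths = header_paths(old), header_paths(new)
    return {"added": sorted(new_paths - old_paths),
            "removed": sorted(old_paths - new_paths)}
```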
-## Edge Cases
+## Edge Cases
-- **Flat documents**: Some documents have no structural hierarchy. Treat the entire document as one node. Use LLM-guided chunking if it exceeds the context window.
-- **Deeply nested structures**: Some legal documents have 6+ nesting levels. Build all levels but typically only navigate 2-3 levels deep for any given rule.
-- **Cross-section references**: A section might reference "as defined in Section 1.2." When extracting, you may need content from multiple tree nodes. Collect them into a single context for the LLM.
-- **Appendices and annexes**: Often contain critical tables and data. Include them as top-level nodes in the tree.
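+- **Flat documents**: Some documents have no structural hierarchy at all. Treat the entire document as one node. If it exceeds the context window, switch to LLM-guided chunking. In this case, take particular care to keep a continuous index into the original text, so extraction results can be mapped back to the correct character offsets.
+- **Deeply nested structures**: Some legal documents have 6 or more nesting levels. Build all the levels, but for any given rule you typically only need to navigate 2-3 levels deep. Drilling too deep strips the rule of the context it needs and can hide key qualifying language from the LLM.
+- **Cross-section references**: A section may say "as defined in Section 1.2." During extraction you may need content from several tree nodes at once. Combine them into a single context for the LLM. When recording the basis for a verification result, label each passage's source node separately, so "the conclusion in Chapter 5" is not conflated with "the definition in Section 1.2."
+- **Appendices and annexes**: Appendices and annexes often carry critical tables and data. Include them as top-level nodes in the tree; do not leave them out. For many disclosure rules, the hard numbers live precisely in the appendix tables, while the main text only introduces them.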

package/template/skills/zh/version-control/SKILL.md CHANGED

@@ -296,3 +296,18 @@ extract_dates_v2.md   # optimized prompt
 ```
 This comparison data is also important material for the dashboard.
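+## Per-rule check.py: keep v1 before rewriting to v2
+When you iterate a rule's verification logic from v1 (usually pure regex) to v2 (usually adding LLM judgment or a hybrid approach), **copy v1 to a sibling file in the same directory before rewriting**:
+```bash
+cp rule_skills/Rxx/check.py rule_skills/Rxx/check_v1.py
+# then write the new version to check.py
+```
+Conventions:
+- `check.py` always points to the current best version
+- `check_v1.py`, `check_v2.py`, ... keep each generation's history
+This keeps v1 and v2 side by side in the same directory, with no need to dig through the workspace's git history (`git log -- check.py` can recover it, but digging every time is friction in itself). At the engine level, `verify_engine_v1.py` / `verify_engine_v2.py` keep each generation of the orchestrator; per-rule check.py files need a naming convention of their own to match.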
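The `cp` step can be automated so that each rewrite archives to the next free number. This is a hypothetical helper (`next_archive_name` and `archive_current_check` are not part of KC); it only encodes the naming convention above, where `check.py` is current and `check_vN.py` files are history.

```python
import re
import shutil
from pathlib import Path

ARCHIVE_NAME = re.compile(r"check_v(\d+)\.py")


def next_archive_name(existing_names) -> str:
    """Pick check_v{N+1}.py, where N is the highest existing archive number (0 if none)."""
    versions = [int(m.group(1)) for name in existing_names
                if (m := ARCHIVE_NAME.fullmatch(name))]
    return f"check_v{max(versions, default=0) + 1}.py"


def archive_current_check(rule_dir) -> Path:
    """Copy rule_dir/check.py to the next check_vN.py before check.py is rewritten."""
    rule_dir = Path(rule_dir)
    target = rule_dir / next_archive_name(p.name for p in rule_dir.iterdir())
    shutil.copy2(rule_dir / "check.py", target)
    return target
```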

package/template/skills/zh/version-control/references/trace-id-spec.md CHANGED

@@ -1,63 +1,63 @@
-# Trace ID Specification
+# Trace ID Specification
-Trace IDs embed source evidence pointers directly inside verification results. This document defines the format, generation rules, and integration points.
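+Trace IDs embed source evidence pointers directly inside verification results. This document defines the format, generation rules, and integration points.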
-## Format
+## Format
 ```
 {rule_id}-{document_id}-P{page}-S{section}-C{char_start}:{char_end}
 ```
-| Segment | Description | Example |
+| Segment | Description | Example |
 |---------|-------------|---------|
-| `rule_id` | The rule that produced this result. Matches the ID in `rule-catalog.json`. | `R001` |
-| `document_id` | A short identifier for the source document. Derived from filename or batch assignment. | `DOC042` |
-| `P{page}` | The 1-indexed page number where the source evidence appears. | `P3` |
-| `S{section}` | The section number within the page, following the document's own numbering. | `S2` |
-| `C{char_start}:{char_end}` | Character offset range within the extracted text block that constitutes the evidence. | `C120:180` |
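+| `rule_id` | The rule that produced this result. Matches the ID in `rule-catalog.json`. | `R001` |
+| `document_id` | A short identifier for the source document. Derived from the filename or from the assignment within a batch. | `DOC042` |
+| `P{page}` | The page number where the source evidence appears (1-indexed). | `P3` |
+| `S{section}` | The section number within the page, following the document's own numbering system. | `S2` |
+| `C{char_start}:{char_end}` | The character offset range, within the extracted text block, that constitutes the evidence. | `C120:180` |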
-Full example: `R001-DOC042-P3-S2-C120:180`
+Full example: `R001-DOC042-P3-S2-C120:180`
-When a rule draws evidence from multiple locations, generate one trace ID per location and store them as an array in the result.
+When a rule draws evidence from multiple locations, generate one trace ID per location and store them as an array in the result.
-## Generation
+## Generation Rules
-Trace ID generation is **deterministic**: the same rule applied to the same document at the same location always produces the same trace ID. This is achieved by deriving every segment from stable inputs:
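+Trace ID generation is **deterministic**: the same rule applied to the same location in the same document always produces the same trace ID. This is achieved by deriving every segment from stable inputs: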
-- `rule_id` comes from the rule catalog.
-- `document_id` comes from the document's filename or a developer-user-assigned identifier.
-- Page, section, and character range come from the extraction step.
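+- `rule_id` comes from the rule catalog.
+- `document_id` comes from the document's filename or an identifier assigned by the developer user.
+- Page, section, and character range come from the extraction step.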
-Trace IDs are generated at verification time, immediately after entity extraction identifies the source location. They are never modified after creation. Re-verifying the same document produces new result records with new timestamps but identical trace IDs (because the source location has not changed). If the document is modified, the new version gets a new `document_id`, producing different trace IDs.
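+Trace IDs are generated at verification time, immediately after entity extraction locates the source. They are never modified after creation. Re-verifying the same document produces new result records with new timestamps, but the trace IDs stay the same (because the source location has not changed). If the document has been modified, the new version should get a new `document_id`, which produces different trace IDs.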
-## Collision Avoidance
+## Avoiding Collisions
-The combination of rule ID + document ID + page + section + character range makes collisions astronomically unlikely in practice. Two different pieces of evidence would need to match on all five segments simultaneously.
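+The combination of rule ID + document ID + page + section + character range makes collisions negligible in practice. Two different pieces of evidence would have to agree on all five segments at once to collide.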
-If document IDs are not guaranteed unique across batches (e.g., multiple batches contain files named `report.pdf`), prefix the document ID with the batch identifier: `B003-DOC042`. This extends the trace ID format to `R001-B003-DOC042-P3-S2-C120:180`.
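+If document IDs are not guaranteed unique across batches (for example, several batches contain a file named `report.pdf`), prefix the document ID with the batch identifier: `B003-DOC042`. The trace ID format then extends to `R001-B003-DOC042-P3-S2-C120:180`.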
-Do not use random UUIDs. Deterministic trace IDs allow deduplication and comparison across verification runs.
+Do not use random UUIDs. Only deterministic trace IDs support deduplication and comparison across verification runs.
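A minimal sketch of deterministic generation and parsing for the format above, including the optional batch prefix. It assumes that no segment contains a hyphen of its own and that batch IDs look like `B` followed by digits; a rule ID such as `D01-01` would break this parser and would need a different delimiter. `make_trace_id` and `parse_trace_id` are hypothetical names.

```python
import re

# {rule_id}-[{batch}-]{document_id}-P{page}-S{section}-C{char_start}:{char_end}
TRACE_RE = re.compile(
    r"^(?P<rule_id>[^-]+)-"
    r"(?:(?P<batch>B\d+)-)?"
    r"(?P<document_id>[^-]+)-"
    r"P(?P<page>\d+)-"
    r"S(?P<section>[^-]+)-"
    r"C(?P<char_start>\d+):(?P<char_end>\d+)$"
)


def make_trace_id(rule_id, document_id, page, section, char_start, char_end, batch=None):
    """Deterministic: the same inputs always give the same string (no UUIDs, no timestamps)."""
    doc = f"{batch}-{document_id}" if batch else document_id
    return f"{rule_id}-{doc}-P{page}-S{section}-C{char_start}:{char_end}"


def parse_trace_id(trace_id: str) -> dict:
    """Split a trace ID back into its named segments (values are strings; batch may be None)."""
    m = TRACE_RE.match(trace_id)
    if not m:
        raise ValueError(f"not a trace ID: {trace_id!r}")
    return m.groupdict()
```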
-## Storage Overhead
+## Storage Overhead
-A single trace ID string is approximately 30-50 bytes. The full trace ID object (including `source_location`, `rule_version`, `workflow_version`, and `model_tier`) is approximately 100-200 bytes in JSON.
+A single trace ID string is about 30-50 bytes. The full trace ID object (including `source_location`, `rule_version`, `workflow_version`, and `model_tier`) takes about 100-200 bytes as JSON.
-For a typical batch of 1000 verification results, trace IDs add roughly 100-200 KB of storage. This is negligible relative to the result data itself and the source documents.
+For a batch of 1000 verification results, trace IDs take roughly 100-200 KB of storage, which is negligible next to the result data itself and the source documents.
-## Surviving Export/Re-Import
+## Surviving Export and Re-Import
-Trace IDs are embedded in the result JSON structure, not stored in external metadata, sidecar files, or database columns that might be lost during export.
+Trace IDs are embedded inside the result JSON structure, not in external metadata, sidecar files, or database columns that might be lost during export.
-Any system that consumes the verification result JSON automatically receives the trace IDs. Specific scenarios:
+Any system that consumes the verification result JSON automatically receives the trace IDs. Specific scenarios:
-- **CSV export**: The `trace_id` field becomes a column. A developer user reviewing results in a spreadsheet can copy a trace ID and paste it back to locate the source evidence.
-- **Aggregation**: When results from multiple batches are merged, trace IDs remain attached to their individual results. No re-linking is needed.
-- **Downstream APIs**: Systems consuming verification results via API receive trace IDs as part of the payload. They can store, index, or display them without any awareness of the trace ID format.
-- **Archival**: Archived results retain full traceability years later, even if the original verification system has evolved.
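+- **CSV export**: The `trace_id` field becomes a column. A developer user reviewing results in a spreadsheet can copy a trace ID and paste it back into the tool to locate the source evidence.
+- **Aggregation**: When results from multiple batches are merged, trace IDs stay attached to their individual results. No re-linking is needed.
+- **Downstream APIs**: Systems consuming verification results via API receive trace IDs in the payload. They can store, index, or display these IDs without knowing anything about the format.
+- **Archival**: Archived results retain full traceability years later, even after the original verification system has evolved.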
-## Integration with Cross-Document Verification
+## Integration with Cross-Document Verification
-When `cross-document-verification` detects a contradiction between two documents, reference trace IDs from both sides:
+When `cross-document-verification` detects a contradiction between two documents, reference the trace IDs from both sides:
 ```json
 {
@@ -76,4 +76,4 @@ When `cross-document-verification` detects a contradiction between two documents
 }
 ```
-This creates a linked evidence chain: auditors can follow both trace IDs to the exact locations in both documents, verify the extracted values, and determine which document (if either) is correct. Without trace IDs, cross-document contradictions require manual search through both documents to find the relevant passages.
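+This forms a linked evidence chain: auditors can follow the two trace IDs to the exact locations in both documents, check the extracted values, and decide which document (if either) is correct. Without trace IDs, a cross-document contradiction requires manually searching both documents for the relevant passages.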

package/template/skills/zh/work-decomposition/SKILL.md CHANGED

@@ -6,7 +6,7 @@ description: At the rule_extraction → skill_authoring transition, decide how to
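 # Work Decomposition
-KC's main agent is the conductor. The conductor decides what to do next, and that decision governs every choice that follows. A bad decomposition makes the whole session expensive: with rules in the wrong order, the agent redesigns the same structure three times; with unrelated rules merged into one skill, check.py ends up as the "unified executor" anti-pattern seen in E2E #4; with related rules that should have been merged scattered across different skills, the agent re-derives the same chunker logic 17 times.
+KC's main agent is the conductor. The conductor decides what to do next, and that decision governs every choice that follows. A bad decomposition makes the whole session expensive: with rules in the wrong order, the agent redesigns the same structure three times; with unrelated rules merged into one skill, check.py drifts into the "unified executor" anti-pattern; with related rules that should have been merged scattered across different skills, the agent re-derives the same chunker logic over and over.
 This skill is the conductor's playbook for these decisions. Its tier marker is `tier: meta-meta`, because work decomposition is a system-level discipline, not a technique for a particular rule. The complementary `task-decomposition` (also `tier: meta-meta`) covers the structure **inside** a single rule: locate → extract → normalize → judge → comment. This skill covers how the **set** of rules should be split into TaskBoard tasks.
@@ -15,6 +15,14 @@ KC's main agent is the conductor. The conductor decides what to do next, and this
 - **On entering rule_extraction**. After reading the regulations and extracting the rules, and before declaring the phase complete, decide in what order the rules will be processed and whether to group them. Coverage audits and chunk refs are both downstream of these two decisions.
 - **On entering skill_authoring**. The TaskBoard is empty (the engine no longer auto-generates per-rule tasks). Read the rule list from `describeState`, decide grouping and order, then call `TaskCreate` for each unit of work.
 - **When the decomposition feels wrong mid-run**. If the TaskBoard gets stranger as you go (rules pile up in the wrong order, two rules that obviously belong together were split into two tasks), stop and re-decompose. The cost of pausing 5 minutes to replan is recovered within the next 2 rules through a more sensible shape.
+- **When running 3+ parallel sub-goals in any phase**. If you find yourself holding several parallel sub-goals in working memory at once (3+ rules × documents, several deliverables in the finalization phase, several QC batches in production_qc), put them on the TaskBoard and process them serially. From rule_extraction to finalization, any phase with parallel sub-goals benefits from explicit tasks; distillation and production_qc are no exception.
+## Rule of Thumb: When to Use the TaskBoard
+- Handling N+ rules or N+ documents at once? → `TaskCreate` each one as a task before you start.
+- A small request you can finish in one step? → Skip it; just do it.
+- Coordination inside a sub-agent? → Skip it; sub-agents don't expose the TaskBoard.
+- Anything you can only keep hold of by telling yourself "I'll come back to it later"? → TaskCreate it now. Over long turns, working memory drops half-finished work.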
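 ## Lock-In Principle
@@ -32,7 +40,7 @@ KC's main agent is the conductor. The conductor decides what to do next, and this
 Process the **hardest** rule first. Treat the chunker, verdict shape, and worker tier that this hard rule needs as the design floor. Process the remaining rules in decreasing order of difficulty; each one is a degenerate form of machinery already in place.
-**When to choose it**: The rule set is uneven in complexity, and you suspect a few hard rules will determine the overall shape (almost always true for compliance/regulatory work). In E2E #5, GLM took this path by happy accident and ended up with 0.6% ERROR on a real LLM-driven workflow; DS went bottom-up and ended with 78% of verdicts NOT_APPLICABLE.
+**When to choose it**: The rule set is uneven in complexity, and you suspect a few hard rules will determine the overall shape (almost always true for compliance/regulatory work). Done right, this approach can push the ERROR rate below 1% on real LLM-driven workflows; the opposite "bottom-up" route usually over-produces NOT_APPLICABLE, because the machinery built for the simple rules cannot handle the final batch of hard cases.
 **Why the "Huffman" analogy rather than "Shannon"**: Huffman coding builds an optimal prefix code by handling the lowest-frequency symbols first. KC's counterpart is the rule that is costly on its own and appears rarely, the R028 type: few in number, but dominating the entire design space. Tackle those first, and the simple rules inherit the framework cheaply.
@@ -97,10 +105,10 @@ KC's main agent is the conductor. The conductor decides what to do next, and this
 - The rules apply to different document types (one applies only to public fund reports, the other only to private fund reports)
 - One rule's failure mode is a special case of another rule's failure mode (do not merge parent and child rules: the child rule's checks would redundantly re-run the parent's checks)
-The v0.6.2 D2 anti-pattern statement already spells out the failure case clearly:
+The anti-pattern statement spells out the failure case clearly:
 > If you find yourself writing a catch-all like unified_qc.py that bypasses the single-rule skills, that means your per-rule skills are wrong. Fix them; don't replace them.
-That passage comes from E2E #4: a conductor model wrote a 2,400-line `unified_qc.py` that ran all rules in one pass. The result was 1,150 ERROR verdicts (16.6%), because each rule's failure dragged down the verdicts of every other rule. The per-rule skill is KC's unit of granularity for a reason.
+One failure mode to watch for: a conductor model writes a 2,000+ line `unified_qc.py` that runs every rule in one pass. The result is cascading errors: each rule's failure drags down the verdicts of every other rule, which can easily produce a 15%+ ERROR rate in production verification. The per-rule skill is KC's unit of granularity for a reason.
 ### Anti-pattern: check.py is a stub + workflow.py holds the real logic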
@@ -141,22 +149,13 @@ def run(text, llm_fn=None):
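 Skill iteration (changes in regulatory interpretation, edge cases found in production) needs a **canonical
 location** to receive the updates, namely the skill, rather than N workflows that have each drifted.
-E2E #6 v070 exposed this anti-pattern (DS wrote the check.py of every bundled skill
-as `{"pass": null, "method": "stub"}` and pushed the work into workflows/).
-v0.7.1 wrote this anti-pattern into the skill explicitly.
+Two failure modes to watch for:
+**Pure-stub failure**: every bundled skill's check.py is written as `{"pass": null, "method": "stub"}` and the work is pushed into `workflows/`. The methodology lives in SKILL.md, but the skill directory itself has no executable implementation.
-E2E #7 v071 showed this anti-stub guidance working on both conductors (neither run
-contained the `{"pass": null}` stub pattern), but **DS still inverted the "canonical vs.
-distilled" relationship**: DS wrote 6 topic-grouped skill folders, each with only
-SKILL.md (no check.py), while the real verification code lived in
-`workflows/<skill>/check.py`. No stubs is good; the inverted relationship is not:
-changing one rule's logic meant editing both SKILL.md (docs) and the workflow check.py
-(code), and the single source of truth was lost.
+**Inverted canonical/distilled relationship**: the agent avoids stubs (good) but inverts the "canonical vs. distilled" relationship: the topic-grouped skill folders contain only SKILL.md (no check.py), while the real verification code lives in `workflows/<skill>/check.py`. No stubs is good; the inverted relationship is not. Changing one rule's logic then means editing both SKILL.md (docs) and the workflow check.py (code), and the single source of truth is lost.
-GLM v071, by contrast, put the canonical pattern into practice: all 97/97 skills had both SKILL.md and
-a real `check.py` (regex plus applicability-check code, median 143 lines), while
-`workflows/<id>/workflow_v1.py` was a 50-line thin shell that just imports and
-calls the skill's check.py:
+This is what the canonical setup looks like: every skill has both a substantive SKILL.md and a real `check.py` (regex plus applicability-check code), while `workflows/<id>/workflow_v1.py` is a thin shell of about 50 lines that just imports and calls the skill's check.py: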
 ```python
 # workflows/D01-01/workflow_v1.py (thin shell, 52 lines)
@@ -173,10 +172,7 @@ def run(doc_text: str, meta: dict = None) -> dict:
     return result
 ```
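-This is the canonical pattern as of v0.7.2+: the workflow is a shell pointing at the skill's check.py.
-To iterate on a rule's verification logic, edit `rule_skills/<id>/check.py`; the workflow stays untouched.
-v0.7.2 states the guidance more clearly: no stubs, and also keep the canonical relationship (the skill is
-canonical, the workflow is a distilled thin shell).
+This is the canonical pattern: the workflow is a shell pointing at the skill's check.py. To iterate on a rule's verification logic, edit `rule_skills/<id>/check.py`; the workflow stays untouched. The guidance has two parts: no stubs, and keep the canonical relationship (the skill is canonical, the workflow is a distilled thin shell).
 ### Naming Convention for Merged Checks
@@ -340,7 +336,7 @@ Keep PATTERNS.md within about 5 KB in total. When it goes over, cut the least actionable
 ### Calling TaskCreate / TaskUpdate / TaskComplete
-The engine registers three task-board tools (v0.7.4):
+The engine registers three task-board tools:
 - `TaskCreate({id, title, phase, ruleId?})`: adds a task to `tasks.json`. `id` must be unique within the session; for per-rule tasks use a stable shape like `<rule_id>-<phase>`, and for grouped / non-rule tasks `<group-name>-<phase>`. `phase` is the current phase the task belongs to. `ruleId` is optional; once set, the engine can count that rule_id toward coverage when deriving milestones.
 - `TaskUpdate({id, status?, summary?})`: sets the task status to `pending` / `in_progress` / `completed` / `failed`, optionally with a one-line summary.
@@ -348,7 +344,7 @@ Keep PATTERNS.md within about 5 KB in total. When it goes over, cut the least actionable
 ### Ralph Loop Scope: Current Phase Only
-Important contract (adjusted in v0.7.4 after team feedback):
+Important contract:
 - **Loop scope = current phase only**. TaskCreate can only create tasks for the current phase, and the Ralph loop works through them one by one within the phase.
 - **Phase boundary = loop exit**. When all tasks in the current phase are complete, or the phase advances (you call `phase_advance`, or anything else changes `currentPhase`), the loop exits cleanly and control returns to the user.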
@@ -380,9 +376,9 @@ TaskComplete({ id: "R001-skill_authoring",
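 - **`rules/PATTERNS.md`**: concise, framework-level content only, updated as the project progresses. Suits fresh projects where assumptions can be stated up front and the structure is clear. Cap ~5 KB; entries are transferable shapes / project-level constraints / anti-patterns with reasons (see the "What to write" section above).
-- **A `logs/phase_<name>_complete.md` per phase**: incremental, recording what each phase produced, which decisions were made, and what the next phase inherits. Suits iterative work that settles its shape as it discovers. E2E #7 GLM used this pattern: 6 phase documents + `evolution_summary_v1.2.md`; the methodology was captured all the same, just without writing PATTERNS.md.
+- **A `logs/phase_<name>_complete.md` per phase**: incremental, recording what each phase produced, which decisions were made, and what the next phase inherits. Suits iterative work that settles its shape as it discovers. One pattern seen in practice: 6 phase documents plus an `evolution_summary_vN.md`, which captured the methodology even though PATTERNS.md was never written.
-- **`AGENT.md` decisions section + domain notes**: narrative style, a living document about "what we know" and "why." Suits projects that need to capture rich domain context (regulations, edge cases, thresholds, sample format distribution). GLM's AGENT.md in E2E #7 had regulation effective dates, product-type classification, threshold values, and sample format counts; that is perfectly OK, a different idiom for the same goal.
+- **`AGENT.md` decisions section + domain notes**: narrative style, a living document about "what we know" and "why." Suits projects that need to capture rich domain context (regulations, edge cases, thresholds, sample format distribution). An AGENT.md recording regulation effective dates, product-type classification, threshold values, and sample format counts is perfectly OK: a different idiom for the same goal.
 What not to do: skip persistence and live on conversation context alone. By the time you reach skill N without writing the methodology to disk, you have made N implicit decisions about verdict shape, chunker boundaries, and worker tier; every rule is derived from scratch, and a refactor touches N files instead of one.
@@ -390,8 +386,34 @@ TaskComplete({ id: "R001-skill_authoring",
 ✅ "Before each phase advance, write what this phase taught you into whichever persistence file suits this project's idiom, even if it is only a first draft."
-E2E history:
-- In E2E #6 v070, DS wrote PATTERNS.md only after the user intervened with a rollback. Before that, each skill's design decisions had solidified independently and had to be touched again afterward. v0.7.1 added the "PATTERNS.md FIRST" guidance.
-- In E2E #7 v071, neither DS nor GLM wrote PATTERNS.md, but GLM wrote 6 phase-completion logs and a detailed AGENT.md: the methodology *was captured*, just in different files. v0.7.2 wrote the broader principle into the skill: persist before advancing; the format is flexible.
+Failure modes to watch for:
+- The agent writes PATTERNS.md only after a rollback. Until then, each skill's design decisions solidify independently and later have to be touched again. The "PATTERNS.md FIRST" guidance exists because of this cost.
+- The agent never writes PATTERNS.md, but writes detailed phase-completion logs and a detailed AGENT.md: the methodology *was captured*, just in different files. That is fine. The broader principle: persist before advancing; the format is flexible.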
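+The engine's milestone derivation from the file system verifies coverage against on-disk facts, regardless of how you split the work. The TaskBoard is your scratch pad; the disk is the contract; the persistence files are the project's memory.
+## Sub-Agent Batching: Rolling-Window Writes
+When you dispatch N sub-agents for batch work (regression tests, batch verification, parallel rule processing), **do not** have them write to the same coordination file. One failure mode to watch for: sub-agents fight over locks on `tasks.json` / `rules/catalog.json` / `output/results/summary.json`, with one holding the workspace lock for minutes while the others wait silently.
+The correct pattern: each sub-agent writes to its **own** dedicated file with a known prefix. The parent agent aggregates after all sub-agents finish.
+```
+sub_agents/
+  batch-001-regression/
+    output/results/v2_regression.json       # ❌ shared by several sub-agents: lock contention
+  batch-002-regression/
+    output/results/v2_regression.json       # ❌ same path, contention
+# Instead:
+output/
+  batch_regression_001.json                 # ✓ one file per sub-agent
+  batch_regression_002.json                 # ✓
+  batch_regression_003.json                 # ✓
+# The parent agent reads all batch_regression_*.json and writes the summary.
+```
+Engine signal: if you see `lock_blocked` events in events.jsonl while sub-agents are working, that is the symptom; the engine emits this event so the parent agent sees the conflict before the sub-agents time out. When it appears, switch to rolling-window writes immediately.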
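-The engine's milestone derivation from the file system (v0.7.0 Group A) verifies coverage against on-disk facts, regardless of how you split the work. The TaskBoard is your scratch pad; the disk is the contract; the persistence files are the project's memory.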
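+Do not build sub-agent batching that "coordinates via file locks." The lock primitive is a safety mechanism against accidental concurrent writes, not a queue. Use the file-system layout as the coordination mechanism.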
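A sketch of the rolling-window layout in Python, assuming each sub-agent's results are a JSON list. The file names follow the example above; `write_batch_result` and `aggregate` are hypothetical helpers. Writing to a temporary name and then renaming means that, on POSIX filesystems, the parent never reads a half-written batch file.

```python
import json
from pathlib import Path


def write_batch_result(out_dir, batch_no: int, results: list) -> Path:
    """Sub-agent side: write only this batch's own file, so there is no shared file to lock."""
    path = Path(out_dir) / f"batch_regression_{batch_no:03d}.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(results, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)  # rename within one directory: readers see nothing or the full file
    return path


def aggregate(out_dir) -> Path:
    """Parent side: run after every sub-agent has finished; merge all batch files into one summary."""
    out_dir = Path(out_dir)
    batch_files = sorted(out_dir.glob("batch_regression_*.json"))
    merged = []
    for path in batch_files:
        merged.extend(json.loads(path.read_text(encoding="utf-8")))
    # Deliberately not named batch_regression_*.json, so a re-run does not glob it back in.
    summary = out_dir / "regression_summary.json"
    summary.write_text(json.dumps({"batch_files": len(batch_files), "results": merged},
                                  ensure_ascii=False, indent=2), encoding="utf-8")
    return summary
```

The only shared state is the directory listing, and the only writer of the summary is the parent, which matches the "file-system layout as the coordination mechanism" rule above.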