kc-beta 0.7.3 → 0.7.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +10 -4
- package/bin/kc-beta.js +20 -6
- package/package.json +1 -1
- package/src/agent/engine.js +131 -60
- package/src/agent/pipelines/_milestone-derive.js +140 -4
- package/src/agent/pipelines/initializer.js +4 -1
- package/src/agent/skill-loader.js +433 -111
- package/src/agent/tools/consult-skill.js +112 -0
- package/src/agent/tools/copy-to-workspace.js +4 -3
- package/src/agent/tools/release.js +128 -1
- package/src/agent/tools/workspace-file.js +7 -7
- package/src/config.js +1 -1
- package/template/AGENT.md +182 -7
- package/template/skills/en/{meta-meta/auto-model-selection → auto-model-selection}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/bootstrap-workspace → bootstrap-workspace}/SKILL.md +1 -0
- package/template/skills/{zh/meta → en}/compliance-judgment/SKILL.md +1 -0
- package/template/skills/en/{meta/confidence-system → confidence-system}/SKILL.md +1 -0
- package/template/skills/en/{meta/corner-case-management → corner-case-management}/SKILL.md +1 -0
- package/template/skills/en/{meta/cross-document-verification → cross-document-verification}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/dashboard-reporting → dashboard-reporting}/SKILL.md +1 -0
- package/template/skills/en/{meta/data-sensibility → data-sensibility}/SKILL.md +1 -0
- package/template/skills/{zh/meta → en}/document-chunking/SKILL.md +1 -0
- package/template/skills/en/{meta/document-parsing → document-parsing}/SKILL.md +1 -0
- package/template/skills/{zh/meta → en}/entity-extraction/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/evolution-loop → evolution-loop}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/quality-control → quality-control}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/rule-extraction → rule-extraction}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/rule-graph → rule-graph}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/skill-authoring → skill-authoring}/SKILL.md +1 -0
- package/template/skills/en/skill-creator/SKILL.md +2 -1
- package/template/skills/en/{meta-meta/skill-to-workflow → skill-to-workflow}/SKILL.md +5 -4
- package/template/skills/en/{meta-meta/task-decomposition → task-decomposition}/SKILL.md +1 -0
- package/template/skills/en/{meta/tree-processing → tree-processing}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/version-control → version-control}/SKILL.md +1 -0
- package/template/skills/en/{meta-meta/work-decomposition → work-decomposition}/SKILL.md +17 -6
- package/template/skills/phase_skills.yaml +107 -0
- package/template/skills/zh/{meta-meta/auto-model-selection → auto-model-selection}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/bootstrap-workspace → bootstrap-workspace}/SKILL.md +1 -0
- package/template/skills/{en/meta → zh}/compliance-judgment/SKILL.md +1 -0
- package/template/skills/zh/{meta/confidence-system → confidence-system}/SKILL.md +1 -0
- package/template/skills/zh/{meta/corner-case-management → corner-case-management}/SKILL.md +1 -0
- package/template/skills/zh/{meta/cross-document-verification → cross-document-verification}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/dashboard-reporting → dashboard-reporting}/SKILL.md +1 -0
- package/template/skills/zh/{meta/data-sensibility → data-sensibility}/SKILL.md +1 -0
- package/template/skills/{en/meta → zh}/document-chunking/SKILL.md +1 -0
- package/template/skills/zh/{meta/document-parsing → document-parsing}/SKILL.md +1 -0
- package/template/skills/{en/meta → zh}/entity-extraction/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/evolution-loop → evolution-loop}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/quality-control → quality-control}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/rule-extraction → rule-extraction}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/rule-graph → rule-graph}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/skill-authoring → skill-authoring}/SKILL.md +1 -0
- package/template/skills/zh/skill-creator/SKILL.md +2 -1
- package/template/skills/zh/skill-to-workflow/SKILL.md +190 -0
- package/template/skills/zh/{meta-meta/task-decomposition → task-decomposition}/SKILL.md +1 -0
- package/template/skills/zh/{meta/tree-processing → tree-processing}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/version-control → version-control}/SKILL.md +1 -0
- package/template/skills/zh/{meta-meta/work-decomposition → work-decomposition}/SKILL.md +15 -4
- package/template/CLAUDE.md +0 -150
- package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +0 -188
- /package/template/skills/en/{meta/compliance-judgment → compliance-judgment}/references/output-format.md +0 -0
- /package/template/skills/en/{meta/cross-document-verification → cross-document-verification}/references/contradiction-taxonomy.md +0 -0
- /package/template/skills/en/{meta-meta/dashboard-reporting → dashboard-reporting}/scripts/generate_dashboard.py +0 -0
- /package/template/skills/en/{meta/document-parsing → document-parsing}/references/parser-catalog.md +0 -0
- /package/template/skills/en/{meta-meta/evolution-loop → evolution-loop}/references/convergence-guide.md +0 -0
- /package/template/skills/en/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/scripts/generate_review.js +0 -0
- /package/template/skills/en/{meta-meta/quality-control → quality-control}/references/qa-layers.md +0 -0
- /package/template/skills/en/{meta-meta/quality-control → quality-control}/references/sampling-strategies.md +0 -0
- /package/template/skills/en/{meta-meta/rule-extraction → rule-extraction}/references/chunking-strategies.md +0 -0
- /package/template/skills/en/{meta-meta/skill-authoring → skill-authoring}/references/skill-format-spec.md +0 -0
- /package/template/skills/en/{meta-meta/skill-to-workflow → skill-to-workflow}/references/worker-llm-catalog.md +0 -0
- /package/template/skills/en/{meta-meta/task-decomposition → task-decomposition}/references/decision-matrix.md +0 -0
- /package/template/skills/en/{meta-meta/version-control → version-control}/references/trace-id-spec.md +0 -0
- /package/template/skills/zh/{meta/compliance-judgment → compliance-judgment}/references/output-format.md +0 -0
- /package/template/skills/zh/{meta/cross-document-verification → cross-document-verification}/references/contradiction-taxonomy.md +0 -0
- /package/template/skills/zh/{meta-meta/dashboard-reporting → dashboard-reporting}/scripts/generate_dashboard.py +0 -0
- /package/template/skills/zh/{meta/document-parsing → document-parsing}/references/parser-catalog.md +0 -0
- /package/template/skills/zh/{meta-meta/evolution-loop → evolution-loop}/references/convergence-guide.md +0 -0
- /package/template/skills/zh/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/scripts/generate_review.js +0 -0
- /package/template/skills/zh/{meta-meta/quality-control → quality-control}/references/qa-layers.md +0 -0
- /package/template/skills/zh/{meta-meta/quality-control → quality-control}/references/sampling-strategies.md +0 -0
- /package/template/skills/zh/{meta-meta/rule-extraction → rule-extraction}/references/chunking-strategies.md +0 -0
- /package/template/skills/zh/{meta-meta/skill-authoring → skill-authoring}/references/skill-format-spec.md +0 -0
- /package/template/skills/zh/{meta-meta/skill-to-workflow → skill-to-workflow}/references/worker-llm-catalog.md +0 -0
- /package/template/skills/zh/{meta-meta/task-decomposition → task-decomposition}/references/decision-matrix.md +0 -0
- /package/template/skills/zh/{meta-meta/version-control → version-control}/references/trace-id-spec.md +0 -0
@@ -1,5 +1,6 @@
 ---
 name: quality-control
+tier: meta-meta
 description: Design and execute quality control for production verification workflows. Use when workflows are deployed on Input/ documents and results need to be monitored, when designing the QC sampling strategy for a rule, or when evaluating whether monitoring can be reduced. Covers LLM-as-Judge evaluation, adaptive sampling strategies, confidence-based triage, and the transition from active monitoring to stable oversight. Also use when production quality drops and you need to diagnose whether to trigger the evolution loop.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: rule-extraction
+tier: meta
 description: Extract and organize business verification rules from regulation documents into discrete, testable units. Use when processing documents in Rules/ to identify individual verification rules, when decomposing a regulation into atomic checks, or when the developer user adds new regulation files. Covers reading regulation text, identifying rule boundaries, determining granularity, handling cross-references, and producing a rule catalog. Also use when rules are provided in structured formats like xlsx or csv.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: rule-graph
+tier: meta-meta
 description: Build and maintain a graph of relationships between verification rules — shared entities, logical dependencies, and conflicts. Use when analyzing the impact of a regulation change, when optimizing extraction to avoid duplicate work, when checking rule catalog completeness, or when rolling up document-level results into a summary. Critical constraint — the graph is an overlay for analysis, NOT a prerequisite for execution. Every rule must remain independently runnable.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: skill-authoring
+tier: meta
 description: Write each verification rule into a Claude Code skill folder following the official skill format. Use when converting extracted rules into skill folders, when iterating on existing rule skills after testing, or when the developer user wants to capture domain knowledge as a skill. Each skill folder must be self-contained with business logic in SKILL.md, code in scripts/, regulation context in references/, and sample data in assets/. Also use the bundled skill-creator for the full eval/iterate workflow.
 ---
 

@@ -1,6 +1,7 @@
 ---
 name: skill-creator
-
+tier: meta
+description: Anthropic's skill-scaffolding toolkit — use for iterating/improving existing skills or running evals on them, NOT as the primary reference for building KC's per-rule verification skills. For KC rule skills, consult `skill-authoring` first (canonical folder layout + granularity rules + KC-specific check.py entry-point conventions) and `work-decomposition` for ordering + grouping decisions. This skill applies once per-rule skills exist and the agent wants to optimize their description/triggering or run formal evals.
 ---
 
 # Skill Creator

@@ -1,5 +1,6 @@
 ---
 name: skill-to-workflow
+tier: meta
 description: Distill a proven verification skill into a Python workflow with worker LLM prompts. Use when a rule skill has been tested and reaches the SKILL_ACCURACY threshold defined in .env. Covers the decision of what to implement as code vs LLM calls, prompt engineering for small context windows, model tier selection and progressive downgrade, and testing workflows against the coding agent's own results as ground truth. Also use when optimizing existing workflows for cost or speed.
 ---
 
@@ -49,10 +50,10 @@ Most rules are a mix: regex extracts the number, Python compares it to the thres
 
 Before declaring distillation complete, audit each rule's `verification_type` / `metric` / `evidence_type` (or equivalent fields in your catalog). For rules where the required verification is one of:
 
-- **Semantic**
-- **Contextual**
-- **Counterfactual**
-- **Cross-field arithmetic**
+- **Semantic** judgment
+- **Contextual** interpretation
+- **Counterfactual** reasoning
+- **Cross-field arithmetic**
 
 regex alone rarely suffices. Three acceptable forms:
 
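The hybrid pattern this hunk refines (regex extracts the number, plain arithmetic compares it to the threshold, a worker LLM handles only the semantic remainder) can be sketched as below. The rule, the 24% threshold, and the function names are illustrative assumptions, not code from the kc-beta package:

```javascript
// Cheap-first pipeline sketch: deterministic steps run first, and the
// (stubbed) worker-LLM step fires only when regex cannot settle the check.
function checkInterestRateCap(docText, callWorkerLLM) {
  // Step 1 (free): regex extracts the annual rate from a known phrasing.
  const m = docText.match(/annual rate[:\s]*([\d.]+)\s*%/i);
  if (m) {
    // Step 2 (free): plain arithmetic against the hypothetical 24% cap.
    const rate = parseFloat(m[1]);
    return { pass: rate <= 24, rate, method: "regex+arith" };
  }
  // Step 3 (costs one worker-LLM call): only when the cheap steps fail.
  return { ...callWorkerLLM(docText), method: "llm" };
}

// Cheap path: the LLM stub must never be reached.
const cheap = checkInterestRateCap("Clause 4: annual rate: 18.5% fixed", () => {
  throw new Error("LLM should not be called on the cheap path");
});
// Fallback path: no regex match, so the stubbed LLM judgment is used.
const fallback = checkInterestRateCap("利率以大写中文数字表述", () => ({ pass: false, rate: null }));
```

The ordering is the point: the expensive step is a fallback, never the default.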
@@ -1,5 +1,6 @@
 ---
 name: task-decomposition
+tier: meta-meta
 description: Decompose each verification rule into independent sub-tasks and assign the optimal method (rule, code, LLM, manual) to each. Use when converting extracted rules into implementation plans, when a rule skill is too expensive or inaccurate and needs restructuring, or when designing a multi-step verification pipeline. Covers MECE decomposition, method selection via the four-dimension decision matrix, cost-benefit analysis, and source tagging. Also use when auditing an existing workflow for cost optimization opportunities.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: version-control
+tier: meta
 description: Manage versioning of skills, workflows, prompts, and system configuration throughout the lifecycle. Use when skills are modified, workflows are regenerated, prompts are updated, or any artifact needs rollback capability. Covers what to version, how to version with file-system conventions, maintaining a version manifest, and rollback procedures. Also use when comparing performance between versions or when production results need to trace back to the exact workflow version that produced them.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: work-decomposition
+tier: meta-meta
 description: Decide how to decompose the rule set into TaskBoard tasks during rule_extraction → skill_authoring transition. Covers ordering methodologies (difficulty-first / Shannon–Huffman, breadth-first, depth-first, binary partition), grouping rules (when to bundle multiple rules into one task vs. keep separate), three-axis difficulty estimation, and how to write PATTERNS.md project memory that stays useful across the run. Use when entering rule_extraction, when entering skill_authoring, or whenever the TaskBoard feels wrong and you want to re-decompose.
 ---
 
@@ -7,7 +8,7 @@ description: Decide how to decompose the rule set into TaskBoard tasks during ru
 
 KC's main agent is the conductor. The conductor decides what work to do next — and that decision is upstream of every other choice that follows. Wrong decomposition makes the rest of the run expensive: if rules are processed in the wrong order, the agent re-designs the same shape three times. If unrelated rules are bundled into one skill, the resulting check.py becomes the unified-runner anti-pattern from E2E #4. If related rules are split across separate skills, the agent re-derives the shared chunker logic 17 times.
 
-This skill is the conductor's playbook for that decision. It
+This skill is the conductor's playbook for that decision. It's tagged `tier: meta-meta` because work decomposition is a system-level discipline, not a per-rule technique. The complementary `task-decomposition` skill (also `tier: meta-meta`) covers the *internal* structure of one rule's check — locate, extract, normalize, judge, comment. This skill covers how the rule **set** should be split into TaskBoard items.
 
 ## When to use this skill
 
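The frontmatter above names difficulty-first ordering with three-axis difficulty estimation. One plausible reading, sketched here with hypothetical axis names and rule IDs (none of this is code from the package), is an additive score sorted hardest-first:

```javascript
// Difficulty-first ordering sketch: score each rule on three assumed
// axes, then process the hardest rules first so shared design work
// happens while budget and attention are highest.
function difficultyFirst(rules) {
  const score = (r) => r.extractionAxis + r.judgmentAxis + r.dataAxis;
  // Sort a copy descending by total difficulty; return just the IDs.
  return [...rules].sort((a, b) => score(b) - score(a)).map((r) => r.id);
}

const ordered = difficultyFirst([
  { id: "R-101", extractionAxis: 1, judgmentAxis: 1, dataAxis: 1 },
  { id: "R-102", extractionAxis: 3, judgmentAxis: 3, dataAxis: 2 },
  { id: "R-103", extractionAxis: 2, judgmentAxis: 1, dataAxis: 1 },
]);
// ordered → ["R-102", "R-103", "R-101"]
```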
@@ -346,13 +347,23 @@ When entering skill_authoring with an empty TaskBoard:
 
 ### Calling TaskCreate / TaskUpdate / TaskComplete
 
-The engine registers three task-board tools (v0.7.
+The engine registers three task-board tools (v0.7.4):
 
-- `TaskCreate({id, title, phase, ruleId?})` — adds a task to `tasks.json`. `id` must be unique within the session; pick a stable shape like `<rule_id>-<phase>` for per-rule tasks or `<group-name>-<phase>` for grouped / non-rule tasks. `phase` is the phase the task belongs to
-- `TaskUpdate({id, status?, summary?})` —
-- `TaskComplete({id, summary?})` — sugar for `TaskUpdate({id, status:"completed", summary})`. Use this
+- `TaskCreate({id, title, phase, ruleId?})` — adds a task to `tasks.json`. `id` must be unique within the session; pick a stable shape like `<rule_id>-<phase>` for per-rule tasks or `<group-name>-<phase>` for grouped / non-rule tasks. `phase` is the current phase the task belongs to. `ruleId` is optional — set it for per-rule tasks so the engine can credit the rule_id in milestone derivation.
+- `TaskUpdate({id, status?, summary?})` — change a task's status (`pending` / `in_progress` / `completed` / `failed`), optionally with a short summary.
+- `TaskComplete({id, summary?})` — sugar for `TaskUpdate({id, status:"completed", summary})`. Use this after finishing a unit of work.
 
-
+### Ralph loop scope — within a phase only
+
+Important contract (changed in v0.7.4 after team feedback):
+
+- **Loop scope = current phase only.** TaskCreate populates tasks for the CURRENT phase. The Ralph loop processes them one by one within the phase.
+- **Loop exits at phase boundaries.** When all current-phase tasks complete OR the phase advances (you call `phase_advance`, or anything else changes `currentPhase`), the loop exits cleanly. Control returns to the user.
+- **No engine auto-advance.** The engine does NOT auto-advance phases when tasks complete + exit criteria are met. Phase advance is YOUR explicit call (`phase_advance` tool) or the user's re-prompt.
+- **Don't pre-create tasks for future phases.** They'll be ignored — the loop exits at the phase boundary before processing them. Create tasks only for the phase you're currently in.
+- **Phase boundaries = user checkpoints.** This is intentional. The team needs visibility into progress at natural breakpoints. After your task batch + `phase_advance`, the loop exits, you summarize progress in your final message, the user prompts you to begin the next phase.
+
+End-to-end autonomous "run from bootstrap to finalization without stopping" is NOT the engine's job — when that capability ships, it'll be an external driver (`/loop`-style command) that calls the agent repeatedly across phases. Inside one invocation, work the current phase fully, advance, and return to the user.
 
 Examples:
 
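The task-board contract this hunk documents can be sketched against an in-memory stand-in for `tasks.json`. The three tool names and the "sugar" relationship come from the diff above; every implementation detail here is an assumption, not the engine's actual code:

```javascript
// In-memory stand-in for tasks.json (the real engine persists to disk).
const tasks = new Map();

function TaskCreate({ id, title, phase, ruleId }) {
  // ids must be unique within the session, per the contract above.
  if (tasks.has(id)) throw new Error(`duplicate task id: ${id}`);
  tasks.set(id, { id, title, phase, ruleId, status: "pending" });
}

function TaskUpdate({ id, status, summary }) {
  const t = tasks.get(id);
  if (!t) throw new Error(`unknown task id: ${id}`);
  if (status) t.status = status;
  if (summary) t.summary = summary;
}

// Sugar, exactly as the doc states: complete = update with status "completed".
function TaskComplete({ id, summary }) {
  TaskUpdate({ id, status: "completed", summary });
}

// Per-rule task using the recommended <rule_id>-<phase> id shape.
TaskCreate({ id: "R-007-skill_authoring", title: "Author skill for R-007", phase: "skill_authoring", ruleId: "R-007" });
TaskComplete({ id: "R-007-skill_authoring", summary: "check.py passing on all samples" });
```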
@@ -0,0 +1,107 @@
+# Phase × skills registry — single source of truth for KC's skill scoping.
+#
+# v0.7.5: edit this file once; SkillLoader propagates to system-prompt
+# injection (always_loaded bodies inline), workspace skills/ population
+# (available set symlinked into <workspace>/skills/), and audit-script
+# comparison.
+#
+# Schema:
+#   phases:
+#     <phase_name>:
+#       always_loaded: [<skill_name>, ...]  # bodies injected into system prompt
+#       available: [<skill_name>, ...]      # consultable via consult_skill tool
+#
+# Always-loaded skills are auto-added to `available` at load time
+# (always_loaded ⊆ available conceptually). The list in `available`
+# below excludes already-always-loaded entries for readability.
+#
+# When adjusting: skill names must match a directory under
+# template/skills/{lang}/<name>/ containing a SKILL.md.
+
+phases:
+  bootstrap:
+    always_loaded:
+      - bootstrap-workspace
+    available:
+      - auto-model-selection
+      - data-sensibility
+      - document-parsing
+      - document-chunking
+      - version-control
+
+  rule_extraction:
+    always_loaded:
+      - rule-extraction
+    available:
+      - work-decomposition
+      - rule-graph
+      - data-sensibility
+      - document-parsing
+      - document-chunking
+      - version-control
+
+  skill_authoring:
+    always_loaded:
+      - skill-authoring
+      - work-decomposition
+    available:
+      - data-sensibility
+      - entity-extraction
+      - tree-processing
+      - compliance-judgment
+      - rule-graph
+      - corner-case-management
+      - evolution-loop
+      - skill-to-workflow
+      - skill-creator
+      - version-control
+
+  skill_testing:
+    always_loaded:
+      - evolution-loop
+    available:
+      - skill-authoring
+      - skill-to-workflow
+      - tree-processing
+      - corner-case-management
+      - compliance-judgment
+      - data-sensibility
+      - rule-graph
+      - version-control
+
+  distillation:
+    always_loaded:
+      - skill-to-workflow
+      - evolution-loop
+    available:
+      - skill-authoring
+      - task-decomposition
+      - corner-case-management
+      - confidence-system
+      - entity-extraction
+      - compliance-judgment
+      - version-control
+
+  production_qc:
+    always_loaded:
+      - quality-control
+      - evolution-loop
+    available:
+      - skill-authoring
+      - skill-to-workflow
+      - confidence-system
+      - cross-document-verification
+      - corner-case-management
+      - compliance-judgment
+      - dashboard-reporting
+      - version-control
+
+  finalization:
+    always_loaded:
+      - quality-control
+    available:
+      - skill-authoring
+      - skill-to-workflow
+      - dashboard-reporting
+      - version-control
+      - pdf-review-dashboard
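The registry's header comment states that always-loaded skills are auto-added to `available` at load time (always_loaded ⊆ available). That union rule can be sketched as below; the function name is hypothetical, not SkillLoader's actual API, and the phase object mirrors the `bootstrap` entry from the file:

```javascript
// Effective consultable set for a phase: the union of always_loaded
// (injected into the system prompt) and available (consult_skill only),
// deduplicated, per the registry's documented load-time behavior.
function effectiveAvailable(phase) {
  return [...new Set([...(phase.always_loaded ?? []), ...(phase.available ?? [])])];
}

// The bootstrap phase as written in phase_skills.yaml above.
const bootstrap = {
  always_loaded: ["bootstrap-workspace"],
  available: ["auto-model-selection", "data-sensibility", "document-parsing", "document-chunking", "version-control"],
};
const names = effectiveAvailable(bootstrap);
// names includes bootstrap-workspace plus the five listed available skills.
```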
@@ -1,5 +1,6 @@
 ---
 name: bootstrap-workspace
+tier: meta-meta
 description: Initialize and configure a document verification workspace. Use when a developer user first opens this workspace, when .env needs configuration, or when the business scenario needs to be understood. Guides the coding agent through reading regulation documents, understanding the developer user's business context, configuring model tiers and thresholds, and establishing the working relationship. Covers initial conversation with developer user to scope the verification task, set expectations, and agree on checkpoints.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: compliance-judgment
+tier: meta
 description: Determine whether extracted entities comply with verification rules. Use after entity extraction to make the pass/fail judgment for each rule on each document. Covers translating natural language rules into executable logic, choosing between Python calculation and LLM semantic judgment, and producing actionable comments on failures. Also use when designing the judgment step of a workflow or when a rule's judgment logic needs debugging.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: confidence-system
+tier: meta
 description: Design and calibrate confidence scoring for extraction and verification results. Use when building any workflow that needs to quantify trust in its output, when setting up quality control sampling thresholds, or when calibrating existing confidence scores against actual accuracy. Confidence is the bridge between workflows and quality control. Also use when the quality control skill reports that confidence scores do not correlate with actual correctness.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: corner-case-management
+tier: meta
 description: Identify, catalog, and handle corner cases that do not fit the mainstream verification workflow. Use when the evolution loop classifies a failure as a corner case (affecting less than ~10% of documents), when adding a new edge case to the registry, or when deciding whether a corner case should be promoted to a systemic fix. Also use when designing the corner case detection mechanism for a workflow.
 ---
 
package/template/skills/zh/{meta/cross-document-verification → cross-document-verification}/SKILL.md
RENAMED
@@ -1,5 +1,6 @@
 ---
 name: cross-document-verification
+tier: meta
 description: Perform case-level analysis across multiple documents for the same transaction. Use when documents do not exist in isolation — main contracts have appendices, loan applications come bundled with income certificates, bank statements, credit reports, and property appraisals. Use to build comparison matrices, detect contradictions (hard mismatches and soft implausibilities), classify severity, and flag fraud signals. Also use when user or end-user reports a cross-document inconsistency — these reports are ground truth and take priority over agent judgment.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: dashboard-reporting
+tier: meta-meta
 description: Generate HTML dashboards for developer users to visualize verification results, system progress, and quality metrics. Use when a testing round completes, when production batches finish processing, when the developer user wants to see the system's status, or at any point where visual reporting would help communicate progress. Dashboards should be self-contained HTML files that can be opened by double-clicking. Also use when the developer user asks about results, accuracy, or system health.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: data-sensibility
+tier: meta
 description: Build intuition about document data before writing extraction logic. Use before designing any extraction schema or regex pattern, when onboarding a new document type, or when extraction accuracy is unexpectedly low and you suspect a data assumption is wrong. Covers systematic observation of raw documents, spot-checking extracted results, distribution analysis, and recognizing suspicious patterns. If you are about to write code that touches document data and you have not read at least five documents end-to-end, stop and use this skill first.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: document-parsing
+tier: meta
 description: Parse source documents into machine-readable text with maximum fidelity. Use when processing any document in Samples/ or Input/ for the first time, when parsed text quality is poor, or when tables and charts need special handling. Covers multi-level parser selection from simple text extraction to OCR and vision models. Also use when a verification rule fails due to parsing issues (garbled text, missing tables, mangled layouts) and the parser needs to be upgraded for that document type.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: entity-extraction
+tier: meta
 description: Extract specific entities, values, and text segments from documents as required by verification rules. Use after tree processing has located the relevant section, when a rule needs a specific number, date, name, amount, clause, or any domain-specific entity extracted. Covers extraction method selection (regex vs LLM), schema design, postprocessing, and confidence annotation. Also use when designing the extraction step of a workflow for worker LLMs.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: evolution-loop
+tier: meta-meta
 description: Drive continuous improvement of skills and workflows through the diagnose-classify-fix-retest cycle. Use after any testing round reveals failures, when production quality control flags issues, or when accuracy drops below thresholds. Covers failure analysis, distinguishing systemic issues from corner cases, deciding whether to rewrite or patch, and knowing when to stop iterating. The evolution loop is the heartbeat of the system. Also use when transitioning between lifecycle phases (skill testing, workflow testing, production monitoring).
 ---
 
@@ -1,5 +1,6 @@
 ---
 name: quality-control
+tier: meta-meta
 description: Design and execute quality control for production verification workflows. Use when workflows are deployed on Input/ documents and results need to be monitored, when designing the QC sampling strategy for a rule, or when evaluating whether monitoring can be reduced. Covers LLM-as-Judge evaluation, adaptive sampling strategies, confidence-based triage, and the transition from active monitoring to stable oversight. Also use when production quality drops and you need to diagnose whether to trigger the evolution loop.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: rule-extraction
+tier: meta
 description: Extract and organize business verification rules from regulation documents into discrete, testable units. Use when processing documents in Rules/ to identify individual verification rules, when decomposing a regulation into atomic checks, or when the developer user adds new regulation files. Covers reading regulation text, identifying rule boundaries, determining granularity, handling cross-references, and producing a rule catalog. Also use when rules are provided in structured formats like xlsx or csv.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: rule-graph
+tier: meta-meta
 description: Build and maintain a graph of relationships between verification rules — shared entities, logical dependencies, and conflicts. Use when analyzing the impact of a regulation change, when optimizing extraction to avoid duplicate work, when checking rule catalog completeness, or when rolling up document-level results into a summary. Critical constraint — the graph is an overlay for analysis, NOT a prerequisite for execution. Every rule must remain independently runnable.
 ---
 

@@ -1,5 +1,6 @@
 ---
 name: skill-authoring
+tier: meta
 description: Write each verification rule into a Claude Code skill folder following the official skill format. Use when converting extracted rules into skill folders, when iterating on existing rule skills after testing, or when the developer user wants to capture domain knowledge as a skill. Each skill folder must be self-contained with business logic in SKILL.md, code in scripts/, regulation context in references/, and sample data in assets/. Also use the bundled skill-creator for the full eval/iterate workflow.
 ---
 

@@ -1,6 +1,7 @@
 ---
 name: skill-creator
-
+tier: meta
+description: Anthropic 官方 skill 脚手架工具——用于迭代/优化已有 skill 或对其运行 evaluation,不是构建 KC per-rule 核查 skill 的首选参考。要写 KC 规则 skill,先 consult `skill-authoring`(规范目录结构 + 粒度规则 + KC 特定的 check.py 入口约定)和 `work-decomposition`(排序与分组决策)。本 skill 适用于:per-rule skill 已经存在、agent 想优化其 description/触发或跑正式 evaluation 时。
 ---
 
 # Skill Creator
@@ -0,0 +1,190 @@
+---
+name: skill-to-workflow
+tier: meta
+description: Distill a verification skill that has passed testing into a Python workflow with worker LLM prompts. Use when a rule skill has been tested and meets the SKILL_ACCURACY threshold defined in `.env`. Covers the decisions involved: which parts to implement as code and which as LLM calls; prompt engineering for small context windows; model tier selection and progressive downgrade; and how to test the workflow using the coding agent's own skill results as ground truth. Also use when optimizing an existing workflow for cost or speed.
+---
+
+# Skill to Workflow
+
+The skill is the ground truth. The workflow is a cheaper, faster approximation. Your job is to make that approximation as cheap as possible while keeping its accuracy close to the original's.
+
+## Engineering goal
+
+Optimize the whole chain: **shortest workflow** (fewest nodes) → **smallest model per node** (the cheapest tier that still meets the accuracy bar) → **shortest prompt per model** (fewest tokens). That is the engineering goal, not how polished the prompt templates look, nor conformance to some framework.
+
+## When to start
+
+A skill is ready to be distilled into a workflow only when all of the following hold:
+
+- It has been tested against every document under Samples/.
+- Its accuracy meets or exceeds the SKILL_ACCURACY threshold in `.env`.
+- Its edge cases are recorded in the skill's `assets/corner_cases.json`.
+- You understand the rule well enough to spell out, word for word, how you verify it.
+
+If any of these fails, go back and keep iterating on the skill; do not start distilling yet.
+
+## Distillation decisions
+
+For each step in the skill-based verification flow, ask yourself:
+
+### Can this step be done with a regex or plain Python? (cost: zero)
+- Date extraction in a known format → regex
+- Numeric comparison against a threshold → Python arithmetic
+- Chinese numeral conversion → Python lookup table
+- Format validation (ID numbers, codes) → regex
+- Pulling a table cell out of structured markdown → string handling
+
+If so, write it as code. These operations are free, fast, and deterministic.
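The zero-cost tier listed above can be sketched in a few lines; the date format and the 8% adequacy threshold here are hypothetical placeholders, not prescribed by this skill:

```python
import re

# Hypothetical zero-cost checks: date extraction in a known format,
# plus a threshold comparison. Deterministic, no LLM calls.
DATE_RE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")  # e.g. 2024年3月5日

def extract_date(text: str):
    """Return (year, month, day) for the first date in the known format, or None."""
    m = DATE_RE.search(text)
    return tuple(int(g) for g in m.groups()) if m else None

def check_threshold(value: float, threshold: float = 0.08) -> str:
    """Compare an extracted ratio against a hypothetical 8% threshold."""
    return "pass" if value >= threshold else "fail"

print(extract_date("报告期为2024年3月5日至2024年9月30日"))  # (2024, 3, 5)
print(check_threshold(0.105))                                # pass
```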
+
+### Does this step need language understanding? (cost: one worker LLM call)
+- Locating the relevant passage in a document → LLM
+- Extracting an entity described in natural language → LLM
+- Judging semantic sufficiency ("is the disclosure adequate") → LLM
+- Resolving an ambiguous reference → LLM
+
+If so, design a worker LLM prompt. Use the smallest model tier that preserves accuracy.
+
+### Hybrid (the most common case)
+Most rules are hybrids: a regex extracts the number, Python compares it against the threshold, and an LLM handles the few special cases. Design the workflow as a pipeline: cheap steps run first, expensive steps run only when needed.
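The cheap-first pipeline can be sketched as a chain of guarded steps. `worker_llm_call` stands in for the project's LLM helper and is stubbed here; the rule, regex, and threshold are hypothetical:

```python
import re

def worker_llm_call(prompt: str, tier: str = "TIER2") -> str:
    """Stub for the project's worker LLM helper (assumed interface)."""
    raise NotImplementedError

RATIO_RE = re.compile(r"资本充足率[为是:]\s*([\d.]+)%")

def verify_hybrid(document_text: str) -> dict:
    # Step 1 (free): try the regex baseline first.
    m = RATIO_RE.search(document_text)
    if m:
        value = float(m.group(1)) / 100
        return {"result": "pass" if value >= 0.08 else "fail",
                "extracted_value": value, "llm_calls": 0}
    # Step 2 (one LLM call): escalate only when the cheap path fails.
    answer = worker_llm_call(
        "Extract the capital adequacy ratio as a decimal, or answer 'missing':\n"
        + document_text[:4000])
    if answer.strip() == "missing":
        return {"result": "missing", "extracted_value": None, "llm_calls": 1}
    value = float(answer)
    return {"result": "pass" if value >= 0.08 else "fail",
            "extracted_value": value, "llm_calls": 1}

print(verify_hybrid("本报告期末资本充足率为10.5%,满足监管要求。"))
# {'result': 'pass', 'extracted_value': 0.105, 'llm_calls': 0}
```

On documents the regex covers, the LLM is never invoked, which is exactly the cost profile the pipeline design aims for.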
+
+### When regex is not enough: decision criteria
+
+Before declaring distillation complete, audit each rule's `verification_type` / `metric` / `evidence_type` (or the corresponding fields in your catalog). If the verification a rule needs falls into any of these types:
+
+- **semantic** judgment
+- **contextual** interpretation
+- **counterfactual** reasoning
+- **cross-field arithmetic**
+
+then regex alone is almost certainly insufficient. Three forms are acceptable:
+
+1. **Regex only, with an explicit limitation note**: write the regex check and document its fragility in a comment (for example: "matches syntactic patterns only; cannot detect semantic guarantees").
+2. **Regex + LLM hybrid**: a regex baseline handles the obvious cases and `worker_llm_call` (tier1-2) handles the ambiguous ones. A hybrid workflow must state explicitly which rule_ids escalate to the LLM.
+3. **Pure LLM via `worker_llm_call`**: for rules that are entirely semantic, with no meaningful regex baseline.
+
+For a rule whose `verification_type` is `judgment` / `semantic`, never ship bare regex without the explicit limitation note. Future you, or a colleague, will assume the regex is enough; that kind of bug can stay buried for months.
+
+### Cost-aware tier selection for the worker LLM
+
+If an LLM call really is needed:
+- **tier1** (most capable, ~¥0.001-0.002/doc): cross-field reasoning, ambiguity resolution, rules that benefit from chain-of-thought
+- **tier2-3**: bulk extraction plus simple semantic checks
+- **tier4** (cheapest): high-volume keyword recognition that regex cannot cover. Caveat: the tier4 model on SiliconFlow is Qwen3.5 in thinking mode; if `reasoning_content` exhausts max_tokens, `content` can come back as an empty string. Test with real prompts before depending on it. If you see empty responses, raise max_tokens to ≥8192, shorten the prompt, or fall back to tier1-2.
+
+In v0.7.1, both audit conductors (DS and GLM) defaulted to all-regex distillation and only added the LLM escalation path when the user explicitly asked for "V2, with worker LLM". If any rule in your catalog is inherently semantic in what it verifies, reach for `worker_llm_call` proactively; do not wait to be asked.
+
+## Workflow structure
+
+A workflow is a Python file (or a few small related files) under `workflows/`:
+
+```
+workflows/
+  rule_001_capital_adequacy/
+    workflow_v1.py      # The main workflow script
+    prompts/
+      extract.txt       # Worker LLM prompt for extraction
+      judge.txt         # Worker LLM prompt for judgment (if needed)
+    config.json         # Model assignments, thresholds
+```
+
+The workflow file should have a clear entry point:
+
+```python
+def verify(document_text: str, config: dict) -> dict:
+    """
+    Returns:
+        {
+            "rule_id": "R001",
+            "result": "pass" | "fail" | "missing" | "error",
+            "extracted_value": ...,
+            "confidence": 0.0-1.0,
+            "comment": "..." (only when fail),
+            "model_used": "...",
+            "llm_calls": int,
+            "llm_tokens": int
+        }
+    """
+```
+
+This is a reference, not a rigid contract. Adapt the structure to what each rule needs. What matters is that every workflow produces results that can be compared against the skill's ground truth.
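A minimal entry point honoring that return shape might look like the following; the rule logic (an English-language regex plus threshold) is a hypothetical placeholder:

```python
import json
import re

def verify(document_text: str, config: dict) -> dict:
    """Minimal verify() honoring the documented return contract.

    The rule itself (regex + threshold) is a hypothetical placeholder.
    """
    m = re.search(r"capital adequacy ratio[: ]+([\d.]+)%", document_text, re.I)
    if not m:
        return {"rule_id": config["rule_id"], "result": "missing",
                "extracted_value": None, "confidence": 0.9,
                "model_used": None, "llm_calls": 0, "llm_tokens": 0}
    value = float(m.group(1)) / 100
    passed = value >= config.get("threshold", 0.08)
    out = {"rule_id": config["rule_id"],
           "result": "pass" if passed else "fail",
           "extracted_value": value, "confidence": 0.95,
           "model_used": None, "llm_calls": 0, "llm_tokens": 0}
    if not passed:
        out["comment"] = f"ratio {value:.3f} below threshold"
    return out

result = verify("The capital adequacy ratio: 10.5% as of year end.",
                {"rule_id": "R001", "threshold": 0.08})
print(json.dumps(result))
```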
+
+## Prompt engineering for the worker LLM
+
+Worker LLMs have small context windows (typically 16K-32K tokens). Design prompts so that:
+
+1. **They are self-contained.** Everything the model needs goes into the prompt. Do not assume the model remembers context from earlier calls.
+2. **The output format is specified.** "Return a JSON object with fields: value, confidence, reasoning." Structured output reduces parsing errors.
+3. **Only narrowed context goes in.** Do not feed it the whole document. Use a tree-shaped processing pipeline (whole document → relevant chapter → relevant section) to narrow the context before calling the worker LLM.
+4. **The prompt is in the document's language.** Chinese prompts for Chinese documents, English prompts for English documents. Do not mix the two languages in one prompt.
+5. **Examples are used sparingly.** One or two examples help; ten waste the context window and invite overfitting.
+
+## Model tier selection
+
+For each step, start with the highest tier (TIER1). Measure accuracy. Then try lower tiers:
+
+1. Run the workflow over all of Samples/ with TIER1 and record per-step accuracy.
+2. For each step, try TIER2. If accuracy stays above WORKFLOW_ACCURACY, keep TIER2.
+3. Keep stepping down until accuracy drops below the threshold.
+4. Record the best tier for each step in `config.json`.
+
+Different steps within the same workflow can use different tiers. Extraction may need TIER2 while judgment is fine on TIER3.
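The stepping-down procedure can be sketched as a loop; `run_step_accuracy` stands in for whatever evaluation harness the project uses, and the 0.95 threshold is an illustrative stand-in for WORKFLOW_ACCURACY:

```python
def choose_tiers(steps, samples, run_step_accuracy, workflow_accuracy=0.95):
    """For each step, walk TIER1 → TIER4 and keep the cheapest tier
    whose measured accuracy stays at or above the threshold.

    run_step_accuracy(step, tier, samples) -> float is an assumed harness.
    """
    tiers = ["TIER1", "TIER2", "TIER3", "TIER4"]
    chosen = {}
    for step in steps:
        best = "TIER1"  # establish the accuracy ceiling first
        for tier in tiers[1:]:
            if run_step_accuracy(step, tier, samples) >= workflow_accuracy:
                best = tier   # cheaper tier still meets the bar
            else:
                break         # accuracy fell below threshold; stop descending
        chosen[step] = best
    return chosen

# Toy harness: pretend extraction degrades at TIER4 and judgment at TIER3.
fake = {("extract", "TIER1"): 0.99, ("extract", "TIER2"): 0.97,
        ("extract", "TIER3"): 0.96, ("extract", "TIER4"): 0.80,
        ("judge", "TIER1"): 0.98, ("judge", "TIER2"): 0.96,
        ("judge", "TIER3"): 0.85, ("judge", "TIER4"): 0.70}
print(choose_tiers(["extract", "judge"], [], lambda s, t, _: fake[(s, t)]))
# {'extract': 'TIER3', 'judge': 'TIER2'}
```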
+
+### Formal downgrade protocol
+
+The basic approach above works, but a stricter protocol avoids locking in a tier prematurely:
+
+**Direction**: top-down (TIER1 → TIER4), establishing the accuracy ceiling first. You need to know how good the best accuracy can be before you start trading it for cost.
+
+**Minimum test sample**: before any tier decision, run each candidate tier over enough documents (for example `min(10, total_samples)`). Small samples are unreliable; a 3-document test can be completely misleading.
+
+**Accuracy-gap trigger**: if a lower tier's accuracy is clearly below a higher tier's (for example by more than 5 percentage points), keep the higher tier for that step. If the gap is within tolerance, use the cheaper tier.
+
+**Per-step independence**: evaluate each workflow step separately and record the best tier for each step in `config.json`. Do not assume the whole workflow must run on a single tier.
+
+**Re-evaluation trigger**: if production QC shows a step's accuracy degrading (for example, documents in a new format start appearing), rerun the tier evaluation for that step.
+
+**Model-task recommendation table**: maintain a project-level mapping of task type → recommended tier, based on your own test results. Over time these tables can be aggregated across projects into general tier guidance.
+
+Every number here (10 documents, 5 percentage points, and so on) is only a suggested starting point. The coding agent and the developer user should calibrate them against the actual volume, accuracy requirements, and cost constraints, or replace the evaluation method outright. What matters is the pattern: **test at each tier → compare accuracy → lock in when within tolerance → re-evaluate on degradation**.
+
+This is the same tier-transition framework as parser escalation in `document-parsing`: a quality/accuracy score drives the keep / escalate / skip decision.
+
+## Testing against ground truth
+
+The coding agent's skill-based results are the ground truth. For each document under Samples/:
+
+1. Run the workflow.
+2. Compare the workflow's results against the skill's results.
+3. Record divergences: which step failed, expected vs. actual.
+4. Compute accuracy: `(matching results) / (total documents)`.
+5. If accuracy < WORKFLOW_ACCURACY, diagnose and fix. Use the `evolution-loop` methodology.
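Steps 2-4 above amount to a per-document comparison. A sketch, assuming the result schema documented in this skill (the file names are illustrative):

```python
def score_against_ground_truth(workflow_results, skill_results):
    """Compare workflow output to the skill's ground truth, per document.

    Both arguments map document name -> result dict with a "result" field.
    Returns (accuracy, divergences).
    """
    divergences = []
    for doc, truth in skill_results.items():
        got = workflow_results.get(doc, {}).get("result", "error")
        if got != truth["result"]:
            divergences.append({"doc": doc,
                                "expected": truth["result"],
                                "actual": got})
    accuracy = 1 - len(divergences) / len(skill_results)
    return accuracy, divergences

truth = {"a.pdf": {"result": "pass"}, "b.pdf": {"result": "fail"},
         "c.pdf": {"result": "pass"}, "d.pdf": {"result": "missing"}}
got = {"a.pdf": {"result": "pass"}, "b.pdf": {"result": "pass"},
       "c.pdf": {"result": "pass"}, "d.pdf": {"result": "missing"}}
acc, diffs = score_against_ground_truth(got, truth)
print(acc)    # 0.75
print(diffs)  # [{'doc': 'b.pdf', 'expected': 'fail', 'actual': 'pass'}]
```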
+
+## Versioning
+
+Each iteration is a new version file: `workflow_v1.py`, `workflow_v2.py`, and so on. Track the currently active version in `config.json`. See the `version-control` skill for the full methodology.
+
+## Workflow releases
+
+Once a workflow meets its accuracy threshold, it can be packaged for end users with the `release` tool. Each release is a self-contained directory under `output/releases/<slug>/` holding the pinned workflows, a Python runner, a confidence scorer, an HTML dashboard generator, and a `serve.sh` launch script. The package has no dependency on kc-beta: anyone with Python and a worker LLM API key can run `python run.py <doc>` and get verification results.
+
+What to package is your call: every rule in the catalog, or a subset picked with the `include` parameter; and whether to bundle 1-3 representative samples into `fixtures/` so the recipient can dry-run the package without data of their own.
+
+The `release` tool first takes a git snapshot of the workspace (tagged `snap/release-<slug>`), so even if `output/releases/` is cleaned up later, the whole package can be regenerated from git. When to release is also your call: there is no automation and no enforced cadence. Common triggers: a workflow reaches the SKILL/WORKFLOW_ACCURACY thresholds; a stakeholder needs a handoff; a production cron should run a pinned version instead of the latest. Decide together with the developer user.
+
+## Cost tracking
+
+Track the cost of every workflow run:
+- LLM calls per document.
+- Total tokens consumed per document.
+- Model tier used for each call.
+
+This data helps the developer user understand production costs and informs later optimization.
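A per-run accumulator covering the three metrics above can be as small as this (the class and method names are illustrative, not part of the package's API):

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CostTracker:
    """Accumulates per-document LLM usage for one workflow run."""
    llm_calls: int = 0
    llm_tokens: int = 0
    tier_usage: Counter = field(default_factory=Counter)

    def record(self, tier: str, tokens: int) -> None:
        self.llm_calls += 1
        self.llm_tokens += tokens
        self.tier_usage[tier] += 1

    def summary(self) -> dict:
        return {"llm_calls": self.llm_calls,
                "llm_tokens": self.llm_tokens,
                "tier_usage": dict(self.tier_usage)}

tracker = CostTracker()
tracker.record("TIER2", 1200)   # extraction call
tracker.record("TIER3", 400)    # judgment call
print(tracker.summary())
# {'llm_calls': 2, 'llm_tokens': 1600, 'tier_usage': {'TIER2': 1, 'TIER3': 1}}
```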
+
+## Worker LLM API
+
+Worker LLMs are reached through the SiliconFlow API. Connection details live in `.env`:
+- `SILICONFLOW_API_KEY`: authentication
+- `SILICONFLOW_BASE_URL`: API endpoint
+- `TIER1` through `TIER4`: the model name for each tier
+
+For each model's current capabilities and context window sizes, see `references/worker-llm-catalog.md`.
@@ -1,5 +1,6 @@
 ---
 name: task-decomposition
+tier: meta-meta
 description: Decompose each verification rule into independent sub-tasks and assign the optimal method (rule, code, LLM, manual) to each. Use when converting extracted rules into implementation plans, when a rule skill is too expensive or inaccurate and needs restructuring, or when designing a multi-step verification pipeline. Covers MECE decomposition, method selection via the four-dimension decision matrix, cost-benefit analysis, and source tagging. Also use when auditing an existing workflow for cost optimization opportunities.
 ---
@@ -1,5 +1,6 @@
 ---
 name: version-control
+tier: meta
 description: Manage versioning of skills, workflows, prompts, and system configuration throughout the lifecycle. Use when skills are modified, workflows are regenerated, prompts are updated, or any artifact needs rollback capability. Covers what to version, how to version with file-system conventions, maintaining a version manifest, and rollback procedures. Also use when comparing performance between versions or when production results need to trace back to the exact workflow version that produced them.
 ---