kc-beta 0.7.2 → 0.7.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (90)
  1. package/README.md +21 -8
  2. package/bin/kc-beta.js +20 -6
  3. package/package.json +1 -1
  4. package/src/agent/engine.js +138 -55
  5. package/src/agent/pipelines/_milestone-derive.js +140 -4
  6. package/src/agent/pipelines/initializer.js +4 -1
  7. package/src/agent/skill-loader.js +433 -111
  8. package/src/agent/tools/consult-skill.js +112 -0
  9. package/src/agent/tools/copy-to-workspace.js +18 -12
  10. package/src/agent/tools/release.js +128 -1
  11. package/src/agent/tools/sandbox-exec.js +4 -1
  12. package/src/agent/tools/task-board.js +194 -0
  13. package/src/agent/tools/workspace-file.js +57 -43
  14. package/src/config.js +6 -4
  15. package/template/AGENT.md +182 -7
  16. package/template/skills/en/{meta-meta/auto-model-selection → auto-model-selection}/SKILL.md +1 -0
  17. package/template/skills/en/{meta-meta/bootstrap-workspace → bootstrap-workspace}/SKILL.md +1 -0
  18. package/template/skills/{zh/meta → en}/compliance-judgment/SKILL.md +1 -0
  19. package/template/skills/en/{meta/confidence-system → confidence-system}/SKILL.md +1 -0
  20. package/template/skills/en/{meta/corner-case-management → corner-case-management}/SKILL.md +1 -0
  21. package/template/skills/en/{meta/cross-document-verification → cross-document-verification}/SKILL.md +1 -0
  22. package/template/skills/en/{meta-meta/dashboard-reporting → dashboard-reporting}/SKILL.md +1 -0
  23. package/template/skills/en/{meta/data-sensibility → data-sensibility}/SKILL.md +1 -0
  24. package/template/skills/{zh/meta → en}/document-chunking/SKILL.md +1 -0
  25. package/template/skills/en/{meta/document-parsing → document-parsing}/SKILL.md +1 -0
  26. package/template/skills/{zh/meta → en}/entity-extraction/SKILL.md +1 -0
  27. package/template/skills/en/{meta-meta/evolution-loop → evolution-loop}/SKILL.md +1 -0
  28. package/template/skills/en/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/SKILL.md +1 -0
  29. package/template/skills/en/{meta-meta/quality-control → quality-control}/SKILL.md +1 -0
  30. package/template/skills/en/{meta-meta/rule-extraction → rule-extraction}/SKILL.md +60 -0
  31. package/template/skills/en/{meta-meta/rule-graph → rule-graph}/SKILL.md +1 -0
  32. package/template/skills/en/{meta-meta/skill-authoring → skill-authoring}/SKILL.md +1 -0
  33. package/template/skills/en/skill-creator/SKILL.md +2 -1
  34. package/template/skills/en/{meta-meta/skill-to-workflow → skill-to-workflow}/SKILL.md +5 -4
  35. package/template/skills/en/{meta-meta/task-decomposition → task-decomposition}/SKILL.md +1 -0
  36. package/template/skills/en/{meta/tree-processing → tree-processing}/SKILL.md +1 -0
  37. package/template/skills/en/{meta-meta/version-control → version-control}/SKILL.md +1 -0
  38. package/template/skills/en/{meta-meta/work-decomposition → work-decomposition}/SKILL.md +37 -2
  39. package/template/skills/phase_skills.yaml +107 -0
  40. package/template/skills/zh/{meta-meta/auto-model-selection → auto-model-selection}/SKILL.md +1 -0
  41. package/template/skills/zh/{meta-meta/bootstrap-workspace → bootstrap-workspace}/SKILL.md +1 -0
  42. package/template/skills/{en/meta → zh}/compliance-judgment/SKILL.md +1 -0
  43. package/template/skills/zh/{meta/confidence-system → confidence-system}/SKILL.md +1 -0
  44. package/template/skills/zh/{meta/corner-case-management → corner-case-management}/SKILL.md +1 -0
  45. package/template/skills/zh/{meta/cross-document-verification → cross-document-verification}/SKILL.md +1 -0
  46. package/template/skills/zh/{meta-meta/dashboard-reporting → dashboard-reporting}/SKILL.md +1 -0
  47. package/template/skills/zh/{meta/data-sensibility → data-sensibility}/SKILL.md +1 -0
  48. package/template/skills/{en/meta → zh}/document-chunking/SKILL.md +1 -0
  49. package/template/skills/zh/{meta/document-parsing → document-parsing}/SKILL.md +1 -0
  50. package/template/skills/{en/meta → zh}/entity-extraction/SKILL.md +1 -0
  51. package/template/skills/zh/{meta-meta/evolution-loop → evolution-loop}/SKILL.md +1 -0
  52. package/template/skills/zh/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/SKILL.md +1 -0
  53. package/template/skills/zh/{meta-meta/quality-control → quality-control}/SKILL.md +1 -0
  54. package/template/skills/zh/{meta-meta/rule-extraction → rule-extraction}/SKILL.md +48 -0
  55. package/template/skills/zh/{meta-meta/rule-graph → rule-graph}/SKILL.md +1 -0
  56. package/template/skills/zh/{meta-meta/skill-authoring → skill-authoring}/SKILL.md +1 -0
  57. package/template/skills/zh/skill-creator/SKILL.md +2 -1
  58. package/template/skills/zh/skill-to-workflow/SKILL.md +190 -0
  59. package/template/skills/zh/{meta-meta/task-decomposition → task-decomposition}/SKILL.md +1 -0
  60. package/template/skills/zh/{meta/tree-processing → tree-processing}/SKILL.md +1 -0
  61. package/template/skills/zh/{meta-meta/version-control → version-control}/SKILL.md +1 -0
  62. package/template/skills/zh/{meta-meta/work-decomposition → work-decomposition}/SKILL.md +37 -2
  63. package/template/CLAUDE.md +0 -137
  64. package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +0 -188
  65. package/template/skills/en/{meta/compliance-judgment → compliance-judgment}/references/output-format.md +0 -0
  66. package/template/skills/en/{meta/cross-document-verification → cross-document-verification}/references/contradiction-taxonomy.md +0 -0
  67. package/template/skills/en/{meta-meta/dashboard-reporting → dashboard-reporting}/scripts/generate_dashboard.py +0 -0
  68. package/template/skills/en/{meta/document-parsing → document-parsing}/references/parser-catalog.md +0 -0
  69. package/template/skills/en/{meta-meta/evolution-loop → evolution-loop}/references/convergence-guide.md +0 -0
  70. package/template/skills/en/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/scripts/generate_review.js +0 -0
  71. package/template/skills/en/{meta-meta/quality-control → quality-control}/references/qa-layers.md +0 -0
  72. package/template/skills/en/{meta-meta/quality-control → quality-control}/references/sampling-strategies.md +0 -0
  73. package/template/skills/en/{meta-meta/rule-extraction → rule-extraction}/references/chunking-strategies.md +0 -0
  74. package/template/skills/en/{meta-meta/skill-authoring → skill-authoring}/references/skill-format-spec.md +0 -0
  75. package/template/skills/en/{meta-meta/skill-to-workflow → skill-to-workflow}/references/worker-llm-catalog.md +0 -0
  76. package/template/skills/en/{meta-meta/task-decomposition → task-decomposition}/references/decision-matrix.md +0 -0
  77. package/template/skills/en/{meta-meta/version-control → version-control}/references/trace-id-spec.md +0 -0
  78. package/template/skills/zh/{meta/compliance-judgment → compliance-judgment}/references/output-format.md +0 -0
  79. package/template/skills/zh/{meta/cross-document-verification → cross-document-verification}/references/contradiction-taxonomy.md +0 -0
  80. package/template/skills/zh/{meta-meta/dashboard-reporting → dashboard-reporting}/scripts/generate_dashboard.py +0 -0
  81. package/template/skills/zh/{meta/document-parsing → document-parsing}/references/parser-catalog.md +0 -0
  82. package/template/skills/zh/{meta-meta/evolution-loop → evolution-loop}/references/convergence-guide.md +0 -0
  83. package/template/skills/zh/{meta-meta/pdf-review-dashboard → pdf-review-dashboard}/scripts/generate_review.js +0 -0
  84. package/template/skills/zh/{meta-meta/quality-control → quality-control}/references/qa-layers.md +0 -0
  85. package/template/skills/zh/{meta-meta/quality-control → quality-control}/references/sampling-strategies.md +0 -0
  86. package/template/skills/zh/{meta-meta/rule-extraction → rule-extraction}/references/chunking-strategies.md +0 -0
  87. package/template/skills/zh/{meta-meta/skill-authoring → skill-authoring}/references/skill-format-spec.md +0 -0
  88. package/template/skills/zh/{meta-meta/skill-to-workflow → skill-to-workflow}/references/worker-llm-catalog.md +0 -0
  89. package/template/skills/zh/{meta-meta/task-decomposition → task-decomposition}/references/decision-matrix.md +0 -0
  90. package/template/skills/zh/{meta-meta/version-control → version-control}/references/trace-id-spec.md +0 -0
package/template/AGENT.md CHANGED
@@ -1,20 +1,195 @@
- # AGENT.md — Project Context
+ # AGENT.md — KC Project Context
 
- This file is your per-project memory. Update it as you learn about the project.
- The content here is injected into your system prompt on every turn.
+ This file is injected into the agent's system prompt every turn. The
+ top sections describe KC's design philosophy + your mission (static
+ across sessions); the bottom sections are per-project memory you
+ update as you learn about this specific business scenario.
 
- ## Project
+ > **Skill priority**: meta-meta skills are architectural — they
+ > override meta (how-to) skills when guidance conflicts. The
+ > architect's frame bounds the technique. If you find yourself
+ > rationalizing past a meta-meta principle to follow a meta procedure,
+ > stop — the frame should bound the technique, not the other way
+ > around. Each skill declares its tier in YAML frontmatter (`tier:
+ > meta-meta` or `tier: meta`).
+
+ ---
+
+ # KC Reborn — Document Verification Workspace
+
+ ## What This Workspace Is
+
+ You are a coding agent tasked with building a document verification app for the developer user's specific business scenario. The meta skills in `skills/` encode the methodology of experienced verification system architects and business analysts. You bring the intelligence and judgment to apply this methodology to the specific case at hand.
+
+ Your goal: build a verification system that starts with you doing the work, then gradually distills your capability into cheap, fast workflows powered by worker LLMs. You are the ground truth. The workflows you create are the deliverables.
+
+ ## Roles
+
+ - **Developer user**: The human you serve. They are a domain expert (e.g., tech lead at a bank's loan department). They provide the rules, the documents, and the business context. Discuss decisions with them.
+ - **You (the coding agent)**: You are both the Builder (creating skills and workflows) and the Observer (judging quality). You do the verification first, prove it works, then teach smaller models to replicate your results.
+ - **Worker LLMs**: The performers. Models configured in `.env` (TIER1 through TIER4) that will execute the workflows you build. Your job is to find the smallest model that works for each task.
+
+ ## Workspace Layout
+
+ ```
+ Rules/ — Regulation documents, compliance notes from the developer user
+ Samples/ — Sample documents for testing (your training set)
+ Input/ — Production document batches awaiting verification
+ Output/ — Verification results
+ skills/ — Methodology skills (current phase's available set)
+ .env — Configuration: API keys, model tiers, thresholds, language
+ ```
+
+ Note: KC's session workspace under `~/.kc_agent/workspaces/<sessionId>/`
+ uses lowercase counterparts (`rules/`, `samples/`, `input/`, `output/`,
+ `logs/`, `workflows/`, `rule_skills/`) — these are runtime-internal and
+ separate from this project's user-facing folders above. The asymmetry
+ is intentional: title-case for human-facing project dirs, lowercase for
+ KC's working state.
+
+ ## Your Mission
+
+ Follow this lifecycle. Each step references the skill(s) to consult.
+ Always-loaded skills are already in your system prompt (above); other
+ skills are listed under "Available Methodology Skills" and require
+ `consult_skill(name)` to load the body.
+
+ 1. **Bootstrap** → `bootstrap-workspace` (always loaded). Understand the business scenario, read Rules/, scan Samples/, configure .env with the developer user.
+ 2. **Extract Rules** → `rule-extraction` (always loaded). Decompose regulation documents into atomic, testable verification rules.
+ 3. **Decompose Tasks** → `work-decomposition` (always loaded in skill_authoring). Decide ordering, grouping, and TaskBoard structure.
+ 4. **Map Rule Relationships** → `consult_skill("rule-graph")`. Identify shared entities, dependencies, and conflicts between rules. Each rule stays independently executable.
+ 5. **Write Rule Skills** → `skill-authoring` (always loaded in skill_authoring). Write each rule into a skill folder. Before writing extraction logic for a new document type, `consult_skill("data-sensibility")` to observe the data first.
+ 6. **Test Skills** → Apply each skill to Samples/. `evolution-loop` is always loaded in skill_testing — use it to diagnose failures and iterate. Continue until accuracy meets SKILL_ACCURACY threshold in .env.
+ 7. **Distill to Workflows** → `skill-to-workflow` (always loaded in distillation). Convert proven skills into Python code + worker LLM prompts. Test workflows against your own results as ground truth. Iterate until WORKFLOW_ACCURACY is met.
+ 8. **Production QC** → `quality-control` (always loaded in production_qc). Run workflows on Input/. Sample and review results based on confidence scores. For multi-document cases, `consult_skill("cross-document-verification")`. Use `evolution-loop` when quality drops.
+ 9. **Stabilize** → Gradually reduce monitoring as workflows prove reliable. Only intervene when rules change or quality drops.
+ 10. **Report** → `consult_skill("dashboard-reporting")`. Generate HTML dashboards so the developer user can see results, progress, and issues. Ensure dashboards include feedback collection mechanisms for users.
+
+ Throughout: `consult_skill("version-control")` to track changes. `consult_skill("corner-case-management")` to handle edge cases without polluting workflows.
+
+ ## Core Principles
+
+ - **Minimum viable model**: Always use the smallest, cheapest, fastest model that meets the accuracy threshold. Start simple, escalate only when necessary.
+ - **JIT structure**: Do not design schemas or formats prematurely. Define them when needed, keep them consistent once defined.
+ - **OTF evolution**: The system you build today may look completely different tomorrow. Embrace change.
+ - **Skills before workflows**: Prove each rule works as a skill (you executing it) before distilling into code + worker LLM prompts.
+ - **Log everything**: Every test iteration, every evolution decision, every version change. Both JSON (machine-readable) and plain text (human-readable).
+
+ ## How to Use Skills
+
+ Skills are loaded in two ways:
+
+ 1. **Always loaded** — bodies are inline in this system prompt above the project orientation. These are the architecturally-required skills for the current phase. Treat them as authoritative.
+ 2. **Available — call consult_skill(name)** — listed by name + description in the system prompt under "Available Methodology Skills." Call `consult_skill("<name>")` to load the body into your conversation history when the description tease isn't enough.
+
+ The skill body is the methodology. Skills convey philosophy and decision frameworks. Adapt them to the specific business case. Do not follow them rigidly.
+
+ ## Communication with Developer User
+
+ - **Proactively discuss**: rule granularity, accuracy thresholds, model selection, edge cases.
+ - **Report progress**: after each testing round, share results and next steps.
+ - **Escalate**: when you cannot resolve an issue after iterating, surface it with evidence.
+ - **Ask**: the developer user is a domain expert. When in doubt about a rule's intent, ask.
+
+ ---
+
+ # KC Reborn — 文档核查工作区
+
+ > **技能优先级**: meta-meta 技能是架构层面 —— 当指导冲突时,
+ > meta-meta 凌驾于 meta (技法层面) 之上。架构师的框架约束技法。
+ > 如果你发现自己在为了遵循一条 meta 程序而绕开一条 meta-meta
+ > 原则,停下 —— 框架应当约束技法,而不是反过来。每个技能在
+ > YAML frontmatter 中声明自己的层级 (`tier: meta-meta` 或
+ > `tier: meta`)。
+
+ ## 这是什么
+
+ 你是一个编程智能体,负责为开发者用户的具体业务场景构建文档核查应用。`skills/` 中的元技能编码了资深核查系统架构师和业务分析师的方法论。你负责运用智慧和判断力,将这些方法论应用到具体场景中。
+
+ 你的目标:构建一个核查系统,先由你亲自执行核查工作,然后逐步将你的能力蒸馏为由 Worker LLM(执行模型)驱动的低成本、高速度的工作流。你是基准真值。你创建的工作流是最终交付物。
+
+ ## 角色定义
+
+ - **开发者用户**:你服务的人。他们是领域专家(如银行信贷部门的技术负责人)。他们提供规则、文档和业务背景。与他们讨论决策。
+ - **你(编程智能体)**:你既是构建者(创建技能和工作流),也是观察者(评判质量)。你先执行核查,证明方法可行,再教小模型复现你的结果。
+ - **Worker LLM**:执行者。在 `.env` 中配置的模型(TIER1到TIER4),将执行你构建的工作流。你的任务是为每项工作找到能胜任的最小模型。
+
+ ## 工作区结构
+
+ ```
+ Rules/ — 法规文件、开发者用户的合规注释
+ Samples/ — 用于测试的样本文件(你的训练集)
+ Input/ — 等待核查的生产批次文件
+ Output/ — 核查结果
+ skills/ — 当前阶段可用的方法论技能
+ .env — 配置:API密钥、模型层级、阈值、语言
+ ```
+
+ 注:KC 在 `~/.kc_agent/workspaces/<sessionId>/` 下的会话工作区使用
+ 小写对应目录(`rules/`、`samples/`、`input/`、`output/`、`logs/`、
+ `workflows/`、`rule_skills/`)—— 这些是运行时内部目录,与本项目上面
+ 那些用户可见的目录是分开的。这种大小写不对称是有意的:项目里给人看
+ 的目录用首字母大写;KC 自己的工作状态用小写。
+
+ ## 你的使命
+
+ 遵循以下生命周期。常驻加载的技能已经在你的系统提示词中;其他技能在"可用方法论技能"清单里列出,调 `consult_skill(name)` 才能加载正文。
+
+ 1. **初始化** → `bootstrap-workspace`(常驻)。理解业务场景,阅读 Rules/,浏览 Samples/,与开发者用户配置 .env。
+ 2. **提取规则** → `rule-extraction`(常驻)。将法规文件分解为原子级、可测试的核查规则。
+ 3. **任务分解** → `work-decomposition`(skill_authoring 常驻)。决定顺序、分组以及 TaskBoard 结构。
+ 4. **构建规则图谱** → `consult_skill("rule-graph")`。识别规则间的共享实体、依赖关系和潜在冲突。每条规则保持独立可执行。
+ 5. **编写规则技能** → `skill-authoring`(skill_authoring 常驻)。将每条规则写入技能文件夹。编写新文档类型的提取逻辑前,先 `consult_skill("data-sensibility")` 观察数据。
+ 6. **测试技能** → 在 Samples/ 上应用每个技能。`evolution-loop` 在 skill_testing 常驻 —— 用它诊断失败并迭代。直到准确率达到 .env 中的 SKILL_ACCURACY 阈值。
+ 7. **蒸馏为工作流** → `skill-to-workflow`(distillation 常驻)。将验证过的技能转化为 Python 代码 + Worker LLM 提示词。用你自己的结果作为基准测试工作流。迭代直到达到 WORKFLOW_ACCURACY。
+ 8. **生产质控** → `quality-control`(production_qc 常驻)。在 Input/ 上运行工作流。根据置信度分数抽样审查结果。涉及多文档案件时,`consult_skill("cross-document-verification")`。质量下降时使用 `evolution-loop`。
+ 9. **稳定运行** → 随着工作流稳定,逐步降低监控频率。仅在规则变更或质量下降时介入。
+ 10. **报告** → `consult_skill("dashboard-reporting")`。生成 HTML 仪表板,让开发者用户直观地看到结果、进度和问题。确保仪表盘内置用户反馈收集机制。
+
+ 全程:用 `consult_skill("version-control")` 跟踪所有变更,用 `consult_skill("corner-case-management")` 处理边缘案例,不要污染主工作流。
+
+ ## 核心原则
+
+ - **最小可用模型**:始终使用能达到准确率阈值的最小、最便宜、最快的模型。从简单开始,必要时才升级。
+ - **即时结构(JIT)**:不要过早设计数据结构或格式。需要时定义,定义后保持一致。
+ - **即时演进(OTF)**:你今天构建的系统明天可能面目全非。拥抱变化。
+ - **先技能后工作流**:先证明每条规则作为技能(你执行)可行,再蒸馏为代码 + Worker LLM 提示词。
+ - **记录一切**:每次测试迭代、每个演进决策、每次版本变更。同时保存 JSON(机器可读)和纯文本(人类可读)。
+
+ ## 如何使用技能
+
+ 技能通过两种方式加载:
+
+ 1. **常驻加载** —— 技能正文直接出现在本系统提示词里、项目说明的上方。这些是当前阶段架构上必需的技能,把它们的内容当作权威指导。
+ 2. **可用 —— 调 consult_skill(name)** —— 在系统提示词的"可用方法论技能"清单里按名字 + 描述列出。当描述简介不够用时,调 `consult_skill("<名字>")` 把技能正文加载到你的对话历史里。
+
+ 技能正文是方法论本身。技能传达的是理念和决策框架。请根据具体业务场景灵活运用,不要机械照搬。
+
+ ## 与开发者用户的沟通
+
+ - **主动讨论**:规则粒度、准确率阈值、模型选择、边缘案例。
+ - **汇报进度**:每轮测试后,分享结果和下一步计划。
+ - **升级问题**:迭代后仍无法解决的问题,附带证据提交给开发者用户。
+ - **多问**:开发者用户是领域专家。对规则意图有疑问时,问他们。
+
+ ---
+
+
+ ## Per-project memory (you maintain this section)
+
+ The sections below are your scratchpad for this specific project. Update them as you learn about the business scenario, decisions, and edge cases. They persist across your sessions on this project.
+ ### Project
 
  <!-- What domain? What regulations? What documents? Fill this in during bootstrap. -->
 
- ## Decisions
+ ### Decisions
 
  <!-- Key decisions made with the developer user. Rule granularity, accuracy targets, model choices, scope boundaries. -->
 
- ## Domain Notes
+ ### Domain Notes
 
  <!-- Terminology, document formats, naming conventions, edge cases specific to this domain. -->
 
- ## User Preferences
+ ### User Preferences
 
  <!-- How the developer user prefers to communicate. Reporting format, language, level of detail. -->
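To make the two loading modes above concrete, a minimal sketch from the agent's side. The call shape `consult_skill("<name>")` and the skill names are taken from AGENT.md itself; nothing else is assumed:

```js
// Always-loaded skills need no call: their bodies are already inline in the
// system prompt for the current phase.

// On-demand skills are loaded by name when the one-line description in
// "Available Methodology Skills" isn't enough; the body lands in the
// conversation history.
consult_skill("rule-graph");                  // step 4: map rule relationships
consult_skill("cross-document-verification"); // step 8: multi-document cases
consult_skill("dashboard-reporting");         // step 10: generate the dashboard
```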
@@ -1,5 +1,6 @@
  ---
  name: auto-model-selection
+ tier: meta
  description: >
  Use Context7 CLI to get up-to-date LLM model information. Use whenever you need to
  know about available models, model capabilities, pricing, context window sizes, or

@@ -1,5 +1,6 @@
  ---
  name: bootstrap-workspace
+ tier: meta-meta
  description: Initialize and configure a document verification workspace. Use when a developer user first opens this workspace, when .env needs configuration, or when the business scenario needs to be understood. Guides the coding agent through reading regulation documents, understanding the developer user's business context, configuring model tiers and thresholds, and establishing the working relationship. Covers initial conversation with developer user to scope the verification task, set expectations, and agree on checkpoints.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: compliance-judgment
+ tier: meta
  description: Determine whether extracted entities comply with verification rules. Use after entity extraction to make the pass/fail judgment for each rule on each document. Covers translating natural language rules into executable logic, choosing between Python calculation and LLM semantic judgment, and producing actionable comments on failures. Also use when designing the judgment step of a workflow or when a rule's judgment logic needs debugging.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: confidence-system
+ tier: meta
  description: Design and calibrate confidence scoring for extraction and verification results. Use when building any workflow that needs to quantify trust in its output, when setting up quality control sampling thresholds, or when calibrating existing confidence scores against actual accuracy. Confidence is the bridge between workflows and quality control — high confidence means less review, low confidence means more review. Also use when the quality control skill reports that confidence scores do not correlate with actual correctness.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: corner-case-management
+ tier: meta
  description: Identify, catalog, and handle corner cases that do not fit the mainstream verification workflow. Use when the evolution loop classifies a failure as a corner case (affecting less than ~10% of documents), when adding a new edge case to the registry, or when deciding whether a corner case should be promoted to a systemic fix. Also use when designing the corner case detection mechanism for a workflow.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: cross-document-verification
+ tier: meta
  description: Perform case-level analysis across multiple documents for the same transaction. Use when documents do not exist in isolation — main contracts have appendices, loan applications come bundled with income certificates, bank statements, credit reports, and property appraisals. Use to build comparison matrices, detect contradictions (hard mismatches and soft implausibilities), classify severity, and flag fraud signals. Also use when user or end-user reports a cross-document inconsistency — these reports are ground truth and take priority over agent judgment.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: dashboard-reporting
+ tier: meta-meta
  description: Generate HTML dashboards for developer users to visualize verification results, system progress, and quality metrics. Use when a testing round completes, when production batches finish processing, when the developer user wants to see the system's status, or at any point where visual reporting would help communicate progress. Dashboards should be self-contained HTML files that can be opened by double-clicking. Also use when the developer user asks about results, accuracy, or system health.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: data-sensibility
+ tier: meta
  description: Build intuition about document data before writing extraction logic. Use before designing any extraction schema or regex pattern, when onboarding a new document type, or when extraction accuracy is unexpectedly low and you suspect a data assumption is wrong. Covers systematic observation of raw documents, spot-checking extracted results, distribution analysis, and recognizing suspicious patterns. If you are about to write code that touches document data and you have not read at least five documents end-to-end, stop and use this skill first.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: document-chunking
+ tier: meta
  description: >
  Fast, cheap chunking for processing batches of sample and input documents.
  Use when you need to split documents into manageable pieces for initial observation,

@@ -1,5 +1,6 @@
  ---
  name: document-parsing
+ tier: meta
  description: Parse source documents into machine-readable text with maximum fidelity. Use when processing any document in Samples/ or Input/ for the first time, when parsed text quality is poor, or when tables and charts need special handling. Covers multi-level parser selection from simple text extraction to OCR and vision models. Also use when a verification rule fails due to parsing issues (garbled text, missing tables, mangled layouts) and the parser needs to be upgraded for that document type.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: entity-extraction
+ tier: meta
  description: Extract specific entities, values, and text segments from documents as required by verification rules. Use after tree processing has located the relevant section, when a rule needs a specific number, date, name, amount, clause, or any domain-specific entity extracted. Covers extraction method selection (regex vs LLM), schema design, postprocessing, and confidence annotation. Also use when designing the extraction step of a workflow for worker LLMs.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: evolution-loop
+ tier: meta-meta
  description: Drive continuous improvement of skills and workflows through the diagnose-classify-fix-retest cycle. Use after any testing round reveals failures, when production quality control flags issues, or when accuracy drops below thresholds. Covers failure analysis, distinguishing systemic issues from corner cases, deciding whether to rewrite or patch, and knowing when to stop iterating. The evolution loop is the heartbeat of the system. Also use when transitioning between lifecycle phases (skill testing, workflow testing, production monitoring).
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: pdf-review-dashboard
+ tier: meta
  description: >
  Generate a two-column PDF review dashboard for manual verification result checking.
  Left panel shows the original PDF document, right panel shows verification results.

@@ -1,5 +1,6 @@
  ---
  name: quality-control
+ tier: meta-meta
  description: Design and execute quality control for production verification workflows. Use when workflows are deployed on Input/ documents and results need to be monitored, when designing the QC sampling strategy for a rule, or when evaluating whether monitoring can be reduced. Covers LLM-as-Judge evaluation, adaptive sampling strategies, confidence-based triage, and the transition from active monitoring to stable oversight. Also use when production quality drops and you need to diagnose whether to trigger the evolution loop.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: rule-extraction
+ tier: meta
  description: Extract and organize business verification rules from regulation documents into discrete, testable units. Use when processing documents in Rules/ to identify individual verification rules, when decomposing a regulation into atomic checks, or when the developer user adds new regulation files. Covers reading regulation text, identifying rule boundaries, determining granularity, handling cross-references, and producing a rule catalog. Also use when rules are provided in structured formats like xlsx or csv.
  ---
 
@@ -133,6 +134,65 @@ conversation or existing catalog. Therefore, when composing the brief:
  catalog.json.** rule_catalog uses workspace file locking;
  sandbox_exec bypasses it and races with other writers.
 
+ ## How to read regulation files (default: read whole)
+
+ Regulations are the audit's authoritative basis. Every `source_ref`
+ in your extracted rules must be verifiable against the source text.
+ For typical regulation documents (a single file under ~50 KB / under
+ ~100 pages), **read each regulation file whole using `workspace_file`
+ (operation=read) in a single call**:
+
+ ```js
+ workspace_file({ operation: "read", scope: "project", path: "Rules/01_some_regulation.md" })
+ ```
+
+ `workspace_file.read` is capped at 50,000 chars per call, which
+ covers virtually every individual regulation document. This is the
+ default. **Read every regulation file whole before you start
+ extracting rules from any of them.**
+
+ ### Tool choice — `workspace_file` vs `sandbox_exec`
+
+ | Tool | Per-call cap | Use for |
+ |---|---:|---|
+ | `workspace_file` (read) | 50,000 chars | **full reads of regulation / rule documents** |
+ | `sandbox_exec` (cat/head/etc) | 10,000 chars | shell commands, **not** full file reads |
+
+ `sandbox_exec` is designed for shell commands; its 10K cap is too
+ small for most regulations. `cat rules/01_*.md` returns only the
+ first ~10 KB followed by `\n[truncated]`. Re-issuing with `head -N` /
+ `tail -M` to scroll the window loses positional precision and burns
+ turns. **When you see truncation, don't fight the cap — switch
+ tools.**
+
+ ### Asymmetry — regs read whole, samples sampled
+
+ Regulations are limited (typically 1-10 files), authoritative, and
+ read once. Read every regulation whole.
+
+ Sample documents may number 30 to 1000+, are heterogeneous, and get
+ read many times during testing. **Don't try to read every sample
+ whole.** Use rule-applicability filters or sampled subsets to focus
+ attention.
+
+ ### Escape valve — when a single reg exceeds ~200K chars
+
+ Rare in practice. The largest regulation in `test_data_4` is 42 KB;
+ typical Chinese banking regs (资管新规, 信披办法, etc.) all fit
+ under 50 KB. But if you do encounter a single regulation so large
+ that reading it whole would crowd the context window — heuristic:
+ the file exceeds ~200,000 chars or ~25% of your context budget —
+ use your own judgment:
+
+ - Read by chapter (e.g., `第X章` / `Chapter X`) using `document_parse`
+ or paginated `workspace_file` reads
+ - Or build an in-workspace index file pointing to chapter offsets and
+ read on-demand per rule being extracted
+
+ The 50 KB cap is high enough that this almost never triggers. **The
+ default is read whole; deviate only when the file genuinely doesn't
+ fit.**
+
  ## Extraction Strategies
 
  ### Strategy 1: Structured Input (Developer User Provides Rules)
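A sketch of the read-whole default and the escape valve just described. The whole-file `workspace_file` read is confirmed above; the `offset`/`limit` parameters and the `content` return field are hypothetical stand-ins for whatever paginated reads the tool actually exposes:

```js
// Default: one call reads a typical regulation whole (50,000-char cap).
const reg = workspace_file({
  operation: "read",
  scope: "project",
  path: "Rules/01_some_regulation.md",
});

// Escape valve (rare, over ~200K chars): scan window by window, index the
// chapter offsets once, then re-read only the chapter the current rule needs.
const WINDOW = 50000;
const index = [];
for (let offset = 0; ; offset += WINDOW) {
  const page = workspace_file({
    operation: "read", scope: "project",
    path: "Rules/big_regulation.md",
    offset, limit: WINDOW, // hypothetical pagination parameters
  }).content;
  // Record each chapter heading ("第X章" / "Chapter N") with its offset.
  for (const m of page.matchAll(/^(第.{1,4}章|Chapter \d+).*$/gm)) {
    index.push({ title: m[0].trim(), offset: offset + m.index });
  }
  if (page.length < WINDOW) break; // last window reached
}
```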
@@ -1,5 +1,6 @@
  ---
  name: rule-graph
+ tier: meta-meta
  description: Build and maintain a graph of relationships between verification rules — shared entities, logical dependencies, and conflicts. Use when analyzing the impact of a regulation change, when optimizing extraction to avoid duplicate work, when checking rule catalog completeness, or when rolling up document-level results into a summary. Critical constraint — the graph is an overlay for analysis, NOT a prerequisite for execution. Every rule must remain independently runnable.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: skill-authoring
+ tier: meta
  description: Write each verification rule into a Claude Code skill folder following the official skill format. Use when converting extracted rules into skill folders, when iterating on existing rule skills after testing, or when the developer user wants to capture domain knowledge as a skill. Each skill folder must be self-contained with business logic in SKILL.md, code in scripts/, regulation context in references/, and sample data in assets/. Also use the bundled skill-creator for the full eval/iterate workflow.
  ---
 
@@ -1,6 +1,7 @@
  ---
  name: skill-creator
- description: Anthropic's skill-scaffolding toolkit — use for iterating/improving existing skills or running evals on them, NOT as the primary reference for building KC's per-rule verification skills. For KC rule skills, read `meta-meta/skill-authoring` first (canonical folder layout + granularity rules + KC-specific check.py entry-point conventions) and `meta-meta/work-decomposition` for ordering + grouping decisions. This skill applies once per-rule skills exist and the agent wants to optimize their description/triggering or run formal evals.
+ tier: meta
+ description: Anthropic's skill-scaffolding toolkit — use for iterating/improving existing skills or running evals on them, NOT as the primary reference for building KC's per-rule verification skills. For KC rule skills, consult `skill-authoring` first (canonical folder layout + granularity rules + KC-specific check.py entry-point conventions) and `work-decomposition` for ordering + grouping decisions. This skill applies once per-rule skills exist and the agent wants to optimize their description/triggering or run formal evals.
  ---
 
  # Skill Creator

@@ -1,5 +1,6 @@
  ---
  name: skill-to-workflow
+ tier: meta
  description: Distill a proven verification skill into a Python workflow with worker LLM prompts. Use when a rule skill has been tested and reaches the SKILL_ACCURACY threshold defined in .env. Covers the decision of what to implement as code vs LLM calls, prompt engineering for small context windows, model tier selection and progressive downgrade, and testing workflows against the coding agent's own results as ground truth. Also use when optimizing existing workflows for cost or speed.
  ---
 
@@ -49,10 +50,10 @@ Most rules are a mix: regex extracts the number, Python compares it to the thres
 
  Before declaring distillation complete, audit each rule's `verification_type` / `metric` / `evidence_type` (or equivalent fields in your catalog). For rules where the required verification is one of:
 
- - **Semantic** ("is this a positive guarantee or a disclaimer?")
- - **Contextual** ("interpret this in light of the document's product type")
- - **Counterfactual** ("what should this value be, given the other fields?")
- - **Cross-field arithmetic** ("does 期初 + 收益 - 分配 = 期末?")
+ - **Semantic** judgment
+ - **Contextual** interpretation
+ - **Counterfactual** reasoning
+ - **Cross-field arithmetic**
 
  regex alone rarely suffices. Three acceptable forms:
 
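For the cross-field arithmetic case above (the deleted bullet spelled it out as 期初 + 收益 - 分配 = 期末, i.e., opening + income - distributions = closing), the mixed form usually means code does the judgment once extraction has produced the numbers. A sketch in JS to match this file's other examples, though distilled workflows are Python per the skill description; the field names and tolerance are illustrative assumptions:

```js
// Cross-field arithmetic as code: opening + income - distributions should
// equal closing. Field names and the 0.01 tolerance are assumptions.
function checkRollforward({ opening, income, distributions, closing }) {
  const expected = opening + income - distributions;
  const pass = Math.abs(expected - closing) < 0.01;
  return {
    verdict: pass ? "pass" : "fail",
    comment: pass ? "" :
      `closing ${closing} != expected ${expected} (opening + income - distributions)`,
  };
}
```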
@@ -1,5 +1,6 @@
  ---
  name: task-decomposition
+ tier: meta-meta
  description: Decompose each verification rule into independent sub-tasks and assign the optimal method (rule, code, LLM, manual) to each. Use when converting extracted rules into implementation plans, when a rule skill is too expensive or inaccurate and needs restructuring, or when designing a multi-step verification pipeline. Covers MECE decomposition, method selection via the four-dimension decision matrix, cost-benefit analysis, and source tagging. Also use when auditing an existing workflow for cost optimization opportunities.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: tree-processing
+ tier: meta
  description: >
  Design production-grade document chunking mechanisms for verification workflows. Use when
  building the chunking step of a workflow that will run repeatedly on many documents.

@@ -1,5 +1,6 @@
  ---
  name: version-control
+ tier: meta
  description: Manage versioning of skills, workflows, prompts, and system configuration throughout the lifecycle. Use when skills are modified, workflows are regenerated, prompts are updated, or any artifact needs rollback capability. Covers what to version, how to version with file-system conventions, maintaining a version manifest, and rollback procedures. Also use when comparing performance between versions or when production results need to trace back to the exact workflow version that produced them.
  ---
 
@@ -1,5 +1,6 @@
  ---
  name: work-decomposition
+ tier: meta-meta
  description: Decide how to decompose the rule set into TaskBoard tasks during rule_extraction → skill_authoring transition. Covers ordering methodologies (difficulty-first / Shannon–Huffman, breadth-first, depth-first, binary partition), grouping rules (when to bundle multiple rules into one task vs. keep separate), three-axis difficulty estimation, and how to write PATTERNS.md project memory that stays useful across the run. Use when entering rule_extraction, when entering skill_authoring, or whenever the TaskBoard feels wrong and you want to re-decompose.
  ---
 
@@ -7,7 +8,7 @@ description: Decide how to decompose the rule set into TaskBoard tasks during ru
 
  KC's main agent is the conductor. The conductor decides what work to do next — and that decision is upstream of every other choice that follows. Wrong decomposition makes the rest of the run expensive: if rules are processed in the wrong order, the agent re-designs the same shape three times. If unrelated rules are bundled into one skill, the resulting check.py becomes the unified-runner anti-pattern from E2E #4. If related rules are split across separate skills, the agent re-derives the shared chunker logic 17 times.
 
- This skill is the conductor's playbook for that decision. It ships under `meta-meta/` because work decomposition is a system-level discipline, not a per-rule technique. The complementary `task-decomposition` skill (also under `meta-meta/`) covers the *internal* structure of one rule's check — locate, extract, normalize, judge, comment. This skill covers how the rule **set** should be split into TaskBoard items.
+ This skill is the conductor's playbook for that decision. It's tagged `tier: meta-meta` because work decomposition is a system-level discipline, not a per-rule technique. The complementary `task-decomposition` skill (also `tier: meta-meta`) covers the *internal* structure of one rule's check — locate, extract, normalize, judge, comment. This skill covers how the rule **set** should be split into TaskBoard items.
 
  ## When to use this skill
 
@@ -85,7 +86,7 @@ Bundle multiple rules into a single task (and a single check_r###_r###.py file)
  - The judgment logic for one rule is a substring or close variant of the next
  - A single failure typically implies multiple failures (you can't pass R013 if R015 fails)
 
- Example: R013 / R015 / R017 all check that a specific table on page 3 of the report contains certain mandatory fields. Same chunk, same parse, same verdict shape. Bundle as `check_r013_r015_r017.py` and create a single TaskCreate task `R013/R015/R017 — required-fields table`. The engine's filesystem-derived milestones recognize the grouped check.py and credit all three rule_ids.
+ Example: R013 / R015 / R017 all check that a specific table on page 3 of the report contains certain mandatory fields. Same chunk, same parse, same verdict shape. Bundle as `check_r013_r015_r017.py` and create a single task: `TaskCreate({id: "R013-R015-R017-skill_authoring", title: "R013/R015/R017 — required-fields table", phase: "skill_authoring"})`. The engine's filesystem-derived milestones recognize the grouped check.py and credit all three rule_ids.
 
  ### When to keep separate
 
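The crediting mentioned above is easy to picture. A sketch of how rule IDs might be recovered from a grouped check filename; the actual logic in `_milestone-derive.js` may well differ:

```js
// "check_r013_r015_r017.py" -> ["R013", "R015", "R017"], so one grouped
// file can credit several rules during milestone derivation (sketch only).
function ruleIdsFromCheckFile(filename) {
  const m = filename.match(/^check((?:_r\d{3})+)\.py$/i);
  if (!m) return [];
  return m[1].split("_").filter(Boolean).map((s) => s.toUpperCase());
}
```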
@@ -344,6 +345,40 @@ When entering skill_authoring with an empty TaskBoard:
  5. **Pick the first task.** Work it to completion (skill + check + at least one local test). Update PATTERNS.md with whatever you learned. Move to the next task.
  6. **At task ~5 and task ~10:** stop and re-read PATTERNS.md. If patterns suggest a refactor of earlier work, do it now (cheap) rather than later (expensive).
 
+ ### Calling TaskCreate / TaskUpdate / TaskComplete
+
+ The engine registers three task-board tools (v0.7.4):
+
+ - `TaskCreate({id, title, phase, ruleId?})` — adds a task to `tasks.json`. `id` must be unique within the session; pick a stable shape like `<rule_id>-<phase>` for per-rule tasks or `<group-name>-<phase>` for grouped / non-rule tasks. `phase` is the current phase the task belongs to. `ruleId` is optional — set it for per-rule tasks so the engine can credit the rule_id in milestone derivation.
+ - `TaskUpdate({id, status?, summary?})` — change a task's status (`pending` / `in_progress` / `completed` / `failed`), optionally with a short summary.
+ - `TaskComplete({id, summary?})` — sugar for `TaskUpdate({id, status:"completed", summary})`. Use this after finishing a unit of work.
+
+ ### Ralph loop scope — within a phase only
+
+ Important contract (changed in v0.7.4 after team feedback):
+
+ - **Loop scope = current phase only.** TaskCreate populates tasks for the CURRENT phase. The Ralph loop processes them one by one within the phase.
+ - **Loop exits at phase boundaries.** When all current-phase tasks complete OR the phase advances (you call `phase_advance`, or anything else changes `currentPhase`), the loop exits cleanly. Control returns to the user.
+ - **No engine auto-advance.** The engine does NOT auto-advance phases when tasks complete + exit criteria are met. Phase advance is YOUR explicit call (`phase_advance` tool) or the user's re-prompt.
+ - **Don't pre-create tasks for future phases.** They'll be ignored — the loop exits at the phase boundary before processing them. Create tasks only for the phase you're currently in.
+ - **Phase boundaries = user checkpoints.** This is intentional. The team needs visibility into progress at natural breakpoints. After your task batch + `phase_advance`, the loop exits, you summarize progress in your final message, the user prompts you to begin the next phase.
+
+ End-to-end autonomous "run from bootstrap to finalization without stopping" is NOT the engine's job — when that capability ships, it'll be an external driver (`/loop`-style command) that calls the agent repeatedly across phases. Inside one invocation, work the current phase fully, advance, and return to the user.
+
+ Examples:
+
+ ```
+ TaskCreate({ id: "R001-skill_authoring", title: "Author skill for R001",
+              phase: "skill_authoring", ruleId: "R001" })
+
+ TaskCreate({ id: "trust-bundle-skill_authoring",
+              title: "R013/R015/R017 — required-fields table",
+              phase: "skill_authoring" })
+
+ TaskComplete({ id: "R001-skill_authoring",
+                summary: "regex check passes 89/90; R001 done" })
+ ```
+
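Putting the contract and the examples together, one phase looks like this from the agent's side. A sketch: the tool calls mirror the examples above, and the argument shape of `phase_advance` is not specified here, so call it per the engine's tool schema:

```js
// Populate the CURRENT phase only, then work the tasks one by one.
TaskCreate({ id: "R001-skill_authoring", title: "Author skill for R001",
             phase: "skill_authoring", ruleId: "R001" });
TaskCreate({ id: "R002-skill_authoring", title: "Author skill for R002",
             phase: "skill_authoring", ruleId: "R002" });

// ... author, test, iterate ...
TaskComplete({ id: "R001-skill_authoring", summary: "check passes 89/90" });
TaskComplete({ id: "R002-skill_authoring", summary: "check passes 90/90" });

// All current-phase tasks done and exit criteria met: advance explicitly.
// The loop then exits at the phase boundary and control returns to the user,
// who prompts the next phase.
phase_advance();
```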
  ### Persisted methodology — PATTERNS.md OR phase logs OR AGENT.md decisions
 
  The principle: capture framework-level decisions to disk before each phase advance. The conversation will compact, agents will restart, the next phase will lose grounding. Whichever format you pick, write to disk — don't rely on conversation context that disappears.