kc-beta 0.1.2 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/bin/kc-beta.js +14 -2
  2. package/package.json +1 -1
  3. package/src/agent/context-window.js +151 -0
  4. package/src/agent/context.js +8 -4
  5. package/src/agent/engine.js +261 -8
  6. package/src/agent/event-log.js +111 -0
  7. package/src/agent/llm-client.js +352 -59
  8. package/src/agent/pipelines/base.js +6 -0
  9. package/src/agent/pipelines/distillation.js +18 -0
  10. package/src/agent/pipelines/extraction.js +21 -0
  11. package/src/agent/pipelines/initializer.js +75 -14
  12. package/src/agent/pipelines/production-qc.js +19 -0
  13. package/src/agent/pipelines/skill-authoring.js +14 -0
  14. package/src/agent/pipelines/skill-testing.js +20 -0
  15. package/src/agent/retry.js +83 -0
  16. package/src/agent/session-state.js +79 -0
  17. package/src/agent/skill-loader.js +13 -1
  18. package/src/agent/token-counter.js +62 -0
  19. package/src/agent/tools/document-parse.js +104 -21
  20. package/src/agent/tools/document-search.js +24 -8
  21. package/src/agent/tools/sandbox-exec.js +16 -5
  22. package/src/agent/tools/web-search.js +107 -0
  23. package/src/agent/tools/worker-llm-call.js +14 -5
  24. package/src/agent/tools/workspace-file.js +47 -20
  25. package/src/agent/workspace.js +24 -1
  26. package/src/cli/components.js +24 -5
  27. package/src/cli/config.js +340 -0
  28. package/src/cli/index.js +113 -11
  29. package/src/cli/onboard.js +216 -53
  30. package/src/config.js +63 -10
  31. package/src/model-tiers.json +153 -0
  32. package/src/providers.js +367 -0
  33. package/template/AGENT.md +20 -0
  34. package/template/skills/en/meta/compliance-judgment/SKILL.md +10 -42
  35. package/template/skills/en/meta/document-chunking/SKILL.md +32 -0
  36. package/template/skills/en/meta/document-parsing/SKILL.md +11 -18
  37. package/template/skills/en/meta/entity-extraction/SKILL.md +13 -28
  38. package/template/skills/en/meta/tree-processing/SKILL.md +19 -1
  39. package/template/skills/en/meta-meta/auto-model-selection/SKILL.md +53 -0
  40. package/template/skills/en/meta-meta/pdf-review-dashboard/SKILL.md +57 -0
  41. package/template/skills/en/meta-meta/pdf-review-dashboard/scripts/generate_review.js +262 -0
  42. package/template/skills/en/meta-meta/rule-extraction/SKILL.md +24 -1
  43. package/template/skills/en/meta-meta/skill-authoring/SKILL.md +6 -0
  44. package/template/skills/en/meta-meta/skill-to-workflow/SKILL.md +4 -0
  45. package/template/skills/zh/meta/compliance-judgment/SKILL.md +41 -262
  46. package/template/skills/zh/meta/document-chunking/SKILL.md +32 -0
  47. package/template/skills/zh/meta/document-parsing/SKILL.md +65 -132
  48. package/template/skills/zh/meta/entity-extraction/SKILL.md +68 -230
  49. package/template/skills/zh/meta/tree-processing/SKILL.md +82 -194
  50. package/template/skills/zh/meta-meta/auto-model-selection/SKILL.md +51 -0
  51. package/template/skills/zh/meta-meta/pdf-review-dashboard/SKILL.md +55 -0
  52. package/template/skills/zh/meta-meta/pdf-review-dashboard/scripts/generate_review.js +262 -0
  53. package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +79 -164
  54. package/template/skills/zh/meta-meta/skill-authoring/SKILL.md +64 -185
  55. package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +95 -216
@@ -3,233 +3,112 @@ name: skill-authoring
  description: Write each verification rule into a Claude Code skill folder following the official skill format. Use when converting extracted rules into skill folders, when iterating on existing rule skills after testing, or when the developer user wants to capture domain knowledge as a skill. Each skill folder must be self-contained with business logic in SKILL.md, code in scripts/, regulation context in references/, and sample data in assets/. Also use the bundled skill-creator for the full eval/iterate workflow.
  ---
 
- # Authoring Skill Folders for Verification Rules
+ # Skill Authoring
 
- ## Core Principle
+ Each verification rule becomes a skill folder. The skill must be self-contained: anyone (or any agent) reading just this folder should have everything needed to verify compliance with that one rule.
 
- Each rule becomes one skill folder. The folder must be self-contained: put in everything needed to run this check. Ask yourself: if another coding agent read only this folder and nothing else, could it run the check correctly? If not, the folder is incomplete.
+ ## Skill Folder Structure
 
- ## Skill Folder Structure
-
- Each rule's skill folder lives under `rule-skills/` and is named `R{number}-{short-english-name}/`:
+ Follow the official Claude Code skill format strictly. See `references/skill-format-spec.md` for the complete specification.
 
  ```
- rule-skills/R001-invoice-date-validity/
- ├── SKILL.md                 # Full description of the verification logic (main skill file)
- ├── scripts/                 # Code for deterministic operations
- │   ├── extract_date.py      # Date field extraction
- │   └── validate.py          # Format validation logic
- ├── references/              # Regulation text and interpretation
- │   └── regulation.md        # Verbatim excerpts of the relevant regulation clauses
- ├── assets/                  # Sample data and corner cases
- │   ├── samples.json         # Test samples (with expected results)
- │   └── corner_cases.json    # Corner case collection
- └── CHANGELOG.md             # Change log
+ rule-skills/
+   rule-001-capital-adequacy/
+     SKILL.md                 # The verification logic and methodology
+     scripts/
+       check.py               # Deterministic checks (regex, calculations)
+     references/
+       regulation.md          # Original regulation text, verbatim
+       interpretation.md      # Expert notes on how to interpret edge cases
+     assets/
+       samples.json           # Annotated sample extractions with expected results
+       corner_cases.json      # Known edge cases with their resolutions
  ```
 
- ## Writing SKILL.md
+ Not every rule needs all of these. A simple threshold check might only need SKILL.md and a script. A complex semantic rule might need detailed references and many samples. Start minimal, add as needed during testing.
 
- SKILL.md is the heart of the skill folder. The coding agent reads this file first when running a check, so it must give clear enough guidance.
+ ## Writing SKILL.md
 
- ### Frontmatter
+ ### Frontmatter
 
  ```yaml
  ---
- name: R001-invoice-date-validity
- description: Verify that invoice date falls within the contract validity period and complies with statutory time limits. Use when processing invoices against their corresponding contracts. Checks date format, date range, and cross-references with contract effective/expiry dates.
+ name: rule-001-capital-adequacy
+ description: Verify that the capital adequacy ratio reported in the document meets the regulatory minimum of 8%. Use when checking capital adequacy compliance in bank financial reports. Check the capital adequacy section or table for the reported ratio and compare against the threshold.
  ---
  ```
 
- **name must match the folder name.**
-
- **Write the description "pushy"**: tell the system explicitly when to invoke this skill. Do not be vague; list the concrete trigger scenarios. Keep the description in English for compatibility with system dispatch.
-
- ### Body Structure
+ - **name**: Must match the directory name exactly. Use lowercase, hyphens, no spaces. Prefix with the rule ID from your catalog.
+ - **description**: Write it as if explaining to another coding agent when they should use this skill. Be specific about what the rule checks, where to look in the document, and what constitutes pass/fail. Be pushy: include trigger keywords.
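
The naming rule for the frontmatter can be checked mechanically. The following is a minimal sketch, not part of the package: `read_frontmatter_name` and `validate_skill_name` are hypothetical helpers, and the pattern assumes that "lowercase, hyphens, no spaces" means lowercase alphanumeric words joined by single hyphens.

```python
import re
from typing import List, Optional

# Assumption: a valid name is lowercase alphanumeric words joined by single hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def read_frontmatter_name(skill_md_text: str) -> Optional[str]:
    """Return the `name:` value from the leading '---' block, or None if absent."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip()
    return None


def validate_skill_name(folder_name: str, skill_md_text: str) -> List[str]:
    """Return a list of problems; an empty list means the name looks valid."""
    name = read_frontmatter_name(skill_md_text)
    if name is None:
        return ["frontmatter has no name field"]
    problems = []
    if name != folder_name:
        problems.append(f"name {name!r} does not match folder {folder_name!r}")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} must be lowercase words joined by hyphens")
    return problems
```

Calling `validate_skill_name("rule-001-capital-adequacy", text)` on a SKILL.md whose frontmatter carries the same name returns an empty list; any mismatch or uppercase character is reported as a separate problem string.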
 
- Write the body in the language of the documents being checked. If the documents are in Chinese, write the body in Chinese; if they are in English, write it in English. The example below assumes Chinese documents.
+ ### Body Content
 
- #### 1. Verification Objective
+ The body should cover:
 
- State in one or two sentences what this rule verifies.
+ 1. **What this rule checks**: one paragraph explaining the rule in plain language. Include the regulatory source and intent.
 
- ```
- Verify that the invoice issue date falls within the validity period of the corresponding contract,
- and that it meets the statutory invoicing deadline.
- ```
+ 2. **Where to look**: which section, chapter, table, or part of the document contains the relevant information. Be specific. "The capital adequacy ratio is typically found in Chapter 2, Section 'Key Regulatory Metrics' or in the summary table on page 1."
 
- #### 2. Locating the Fields to Check
+ 3. **What to extract**: the specific entities needed. "Extract the reported capital adequacy ratio as a percentage." Define the expected format and any normalization needed.
 
- Tell the coding agent exactly where in the document to find the fields to check:
+ 4. **How to judge**: the logic for pass/fail. "The ratio must be >= 8.0%. If the ratio is missing, flag as MISSING rather than FAIL." For semantic judgments, describe the criteria in natural language.
 
- - The "Issue Date" field on the invoice, usually in the upper-right area of the invoice
- - The "Contract Effective Date" and "Contract Expiry Date" in the contract, usually on the first or last page
- - For electronic invoices, the JSON path of the date field is `invoice.issue_date`
+ 5. **Edge cases**: known tricky situations. "Some reports express the ratio as a decimal (0.12) rather than a percentage (12%). Normalize before comparing."
 
- #### 3. Extraction Specification
+ 6. **Comment format**: what to say when the rule fails. Keep it concise and actionable. "Capital adequacy ratio is X%, which is below the regulatory minimum of 8%."
 
- Specify the normalized format of fields after extraction:
+ ### Length and Style
 
- - Convert all dates to `YYYY-MM-DD` format
- - If the original date is in Chinese format (e.g. "2025年3月15日"), parse it into the standard format first
- - Mark missing fields as `null`; do not guess or fill in defaults
+ - Keep SKILL.md under 500 lines. Most rules should be 100-200 lines.
+ - Explain the WHY behind the rule, not just the mechanics. Understanding intent helps handle edge cases.
+ - Write in imperative form: "Extract the ratio" not "The ratio should be extracted."
+ - If detailed regulation text is long, put it in `references/regulation.md` and reference it from SKILL.md.
 
- #### 4. Judgment Logic
+ ## Pipeline Node Design
 
- Describe the verification logic with clear conditional expressions:
+ When a skill's workflow has multiple steps, decompose into nodes where each node does one thing well. Each node's difficulty should be well within the model's capability; do not cram location + extraction + judgment into a single LLM call.
 
- ```
- IF invoice_date IS NULL:
-     verdict = "unable to verify", reason = "invoice date missing"
- ELIF contract_start_date IS NULL OR contract_end_date IS NULL:
-     verdict = "unable to verify", reason = "contract term information incomplete"
- ELIF invoice_date < contract_start_date:
-     verdict = "fail", reason = "invoice issue date is earlier than the contract effective date"
- ELIF invoice_date > contract_end_date:
-     verdict = "fail", reason = "invoice issue date is later than the contract expiry date"
- ELSE:
-     verdict = "pass"
- ```
+ Pre-processing (text cleaning, format normalization) and post-processing (output parsing, value normalization) are separate nodes, not embedded in the LLM prompt. This keeps prompts clean and makes each step independently testable.
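
One way to picture the node separation described above is a chain of small functions with the model call isolated in the middle. This is only an illustrative sketch: `clean_text`, `parse_ratio`, `run_pipeline`, and `fake_extract` are hypothetical names, and `fake_extract` is a stand-in for a real LLM call, which the package would make through its own tooling.

```python
import re
from typing import Callable, Optional


def clean_text(raw: str) -> str:
    """Pre-processing node: collapse runs of whitespace; no model involved."""
    return re.sub(r"\s+", " ", raw).strip()


def parse_ratio(model_output: str) -> Optional[float]:
    """Post-processing node: pull the first percentage out of the model's reply."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", model_output)
    return float(match.group(1)) if match else None


def run_pipeline(raw: str, extract: Callable[[str], str]) -> Optional[float]:
    """Chain the nodes; `extract` is the only step that would call a model."""
    cleaned = clean_text(raw)
    reply = extract(cleaned)
    return parse_ratio(reply)


def fake_extract(text: str) -> str:
    # Hypothetical stand-in for an LLM extraction call, used for illustration only.
    return "The reported capital adequacy ratio is 12.5%."
```

Because the cleaning and parsing nodes are plain functions, each can be unit-tested without a model, and the extraction prompt only has to do one job.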
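 
- #### 5. Edge Cases and Exceptions
-
- List the known special situations:
-
- - Contract extended or renewed: use the latest expiry date
- - Issue date exactly equal to the contract effective or expiry date: treat as pass
- - Order invoices under a framework contract: use the execution period of the corresponding order, not the overall term of the framework contract
- - Late-issued invoices: some regulations allow invoices to be issued within a certain period after the contract expires
-
- #### 6. Output Format
-
- Specify the standard output structure of the verification result:
-
- ```json
- {
-   "rule_id": "R001",
-   "rule_name": "Invoice date validity",
-   "verdict": "pass | fail | unable_to_verify",
-   "confidence": 0.95,
-   "details": {
-     "invoice_date": "2025-03-15",
-     "contract_start": "2025-01-01",
-     "contract_end": "2025-12-31"
-   },
-   "comment": "Invoice date is within the contract validity period",
-   "source_ref": "《增值税发票管理办法》, Article 15"
- }
- ```
+ ## Writing Scripts
 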
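- #### 7. Comment Guidance
-
- If the check finds a problem, the comment should be professional, concise, and actionable:
-
- - Good: "Invoice date 2025-03-15 is later than contract expiry date 2025-02-28, a difference of 15 days"
- - Bad: "The date is wrong" (too little information, not actionable)
-
- ## Writing scripts/
-
- The scripts directory holds Python scripts for deterministic operations. The rule: **anything that does not need LLM judgment is implemented in code.**
-
- Operations suited to code:
- - Date format parsing and normalization
- - Checking amounts in digits against amounts in Chinese capital numerals
- - Tax rate calculation and validation
- - Regex matching (invoice number format, unified social credit code format)
- - Field presence checks
-
- Operations not suited to code:
- - Extracting semantic information from unstructured text
- - Judging whether descriptive content is compliant
- - Understanding the business meaning of natural-language statements
-
- Script requirements:
- - Pure Python with no heavy third-party dependencies (standard-library modules such as `re`, `json`, and `datetime` are allowed)
- - Functions carry clear input and output type annotations
- - Include basic error handling
- - Can be run and tested standalone
-
- ## Writing references/
-
- Copy the regulation text directly relevant to this rule, verbatim, into `references/regulation.md`.
-
- Requirements:
- - Quote verbatim; do not paraphrase or summarize
- - Cite the source: regulation name, issuing body, document number, article number
- - If the developer user has interpreted an ambiguous clause, record it under the label "Business interpretation"
- - If the rule draws on several regulations, quote each in its own section and note its scope of application
-
- The purpose of this file is to make verification conclusions traceable. Every conclusion should have a regulatory basis that can be found in references.
-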
- ## Writing assets/
-
- ### samples.json
-
- Provide at least 3-5 test samples covering the following scenarios:
-
- ```json
- [
-   {
-     "id": "S001",
-     "description": "Standard passing case",
-     "input": { "invoice_date": "2025-06-15", "contract_start": "2025-01-01", "contract_end": "2025-12-31" },
-     "expected_verdict": "pass"
-   },
-   {
-     "id": "S002",
-     "description": "Invoice date outside the contract term",
-     "input": { "invoice_date": "2026-02-01", "contract_start": "2025-01-01", "contract_end": "2025-12-31" },
-     "expected_verdict": "fail"
-   },
-   {
-     "id": "S003",
-     "description": "Contract dates missing",
-     "input": { "invoice_date": "2025-06-15", "contract_start": null, "contract_end": null },
-     "expected_verdict": "unable_to_verify"
-   }
- ]
- ```
+ Scripts in `scripts/` handle deterministic operations:
 
- ### corner_cases.json
+ - **Regex patterns** for entity extraction (dates, amounts, ratios, identifiers).
+ - **Calculation logic** for threshold checks, ratio computations, cross-field validation.
+ - **Format normalization** (Chinese numerals → digits, date format standardization, unit conversion).
 
- Record corner cases discovered during testing. The format matches samples.json, plus a `discovered_in` field noting where the case was found (test round, production check, etc.).
+ Scripts should be self-contained Python files that can be imported or executed. Include clear input/output documentation in the script's docstring.
 
- ## Skill Iteration
+ Do not put LLM prompts in scripts. LLM interactions belong in the SKILL.md body or in the workflow (later phase).
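
As an illustration of what a deterministic `scripts/check.py` could contain for the capital adequacy example, here is a hedged sketch. The function names, the result dictionary shape, and the rule that a bare value of 1 or less is a decimal fraction are all assumptions made for this example; the source only specifies the 8% threshold, the MISSING outcome, and the failure comment wording. The input is assumed to be the already-extracted value string, not a full page of text.

```python
"""Deterministic check for the capital adequacy example (illustrative sketch).

Input:  the extracted ratio as text, e.g. "12%", "0.12", or None.
Output: {"result": "PASS" | "FAIL" | "MISSING", "comment": str}
"""
import re
from typing import Optional

THRESHOLD_PCT = 8.0  # regulatory minimum quoted in the skill description


def normalize_ratio(raw: Optional[str]) -> Optional[float]:
    """Return the ratio as a percentage, or None if no number is present."""
    if raw is None:
        return None
    match = re.search(r"(\d+(?:\.\d+)?)\s*(%?)", raw)
    if not match:
        return None
    value = float(match.group(1))
    has_percent = match.group(2) == "%"
    # Assumption: a bare value of 1 or less is a decimal fraction (0.12 means 12%).
    if not has_percent and value <= 1.0:
        value *= 100.0
    return value


def check(raw: Optional[str]) -> dict:
    """Apply the threshold; a missing value is MISSING, never FAIL."""
    ratio = normalize_ratio(raw)
    if ratio is None:
        return {"result": "MISSING", "comment": "Capital adequacy ratio not found."}
    if ratio >= THRESHOLD_PCT:
        return {"result": "PASS", "comment": ""}
    comment = (
        f"Capital adequacy ratio is {ratio:g}%, which is below "
        f"the regulatory minimum of {THRESHOLD_PCT:g}%."
    )
    return {"result": "FAIL", "comment": comment}
```

Keeping normalization in its own function means the decimal-versus-percentage edge case can be tested separately from the threshold comparison.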
 
- A skill is not finished after one pass. It keeps evolving through testing and real verification work.
+ ## Writing References
 
- ### Iteration Triggers
+ `references/` holds content that the coding agent reads on demand:
 
- - A misjudgment found in testing (false negative or false positive)
- - The developer user reports that the rule description is inaccurate
- - A new corner case is discovered
- - A regulatory change affects this rule
+ - **regulation.md**: The original regulation text, verbatim. Include the source, date, and version. This is the ground truth that the rule is derived from.
+ - **interpretation.md**: Expert notes from the developer user or from the coding agent's own analysis. "When the regulation says 'adequate disclosure', in practice this means the section must be at least 2 paragraphs and cover risks A, B, and C."
 
- ### Iteration Procedure
+ Keep references factual and sourced. They are evidence, not instructions.
 
- 1. Record the change, its reason, and the date in CHANGELOG.md
- 2. Update the relevant paragraphs in SKILL.md
- 3. If the change touches the judgment logic, update scripts/ to match
- 4. Add newly discovered corner cases to corner_cases.json
- 5. Re-run all test samples and confirm there are no regressions
+ ## Writing Assets
 
- ### CHANGELOG.md Format
+ `assets/` holds data that supports testing and edge case handling:
 
- ```markdown
- ## v1.1 - 2025-04-01
- - Added corner case: handling of order invoices under a framework contract
- - Fixed judgment logic: an issue date equal to the contract expiry date counts as pass
- - Source: feedback from the second test round
+ - **samples.json**: Annotated examples. Each entry: the input (extracted text or entity), the expected result (pass/fail/missing), and the expected comment. Build this incrementally as you test.
+ - **corner_cases.json**: Edge cases that the standard logic does not handle. Each entry: description, detection pattern, resolution, and confidence threshold. See the `corner-case-management` skill for the methodology.
 
- ## v1.0 - 2025-03-20
- - Initial version, extracted from 《增值税发票管理办法》, Article 15
- ```
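
A small runner can replay the annotated entries of `samples.json` against a judging function and report mismatches. The sketch below assumes a hypothetical entry schema with `input` and `expected_result` keys; the new version of the skill names the fields in prose only, so the exact key names are not confirmed by the source. `threshold_judge` is a toy judge for the 8% example.

```python
import json
from typing import Any, Callable, Dict, List


def run_samples(samples_json: str, judge: Callable[[Any], str]) -> List[Dict[str, Any]]:
    """Return one record per sample whose actual result differs from the expected one."""
    mismatches = []
    for index, sample in enumerate(json.loads(samples_json)):
        actual = judge(sample["input"])
        if actual != sample["expected_result"]:
            mismatches.append(
                {"index": index, "expected": sample["expected_result"], "actual": actual}
            )
    return mismatches


def threshold_judge(ratio_pct: Any) -> str:
    """Toy judge: pass at or above 8%, fail below, missing when absent."""
    if ratio_pct is None:
        return "missing"
    return "pass" if ratio_pct >= 8.0 else "fail"


# Hypothetical sample data in the assumed schema.
SAMPLES = """[
  {"input": 12.5, "expected_result": "pass"},
  {"input": 7.9, "expected_result": "fail"},
  {"input": null, "expected_result": "missing"}
]"""
```

An empty list from `run_samples(SAMPLES, threshold_judge)` means every annotated sample still produces its expected result, which is a quick regression check after each change to the skill.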
+ ## Iteration
 
- ## Language Selection
+ Skills evolve through testing. After each test iteration:
+ 1. Update SKILL.md if the logic needs adjustment.
+ 2. Add failing cases to `assets/samples.json`.
+ 3. Add newly discovered edge cases to `assets/corner_cases.json`.
+ 4. Update `references/interpretation.md` with new insights.
+ 5. Log what changed and why.
 
- Write the SKILL.md body in the language of the documents being checked:
+ Use the bundled `skill-creator` skill if you want to run the full eval/iterate workflow with quantitative benchmarks.
 
- - Chinese documents → body in Chinese
- - English documents → body in English
- - Bilingual Chinese-English documents → body in Chinese, keeping key English terms in the original
+ ## Bilingual Skills
 
- The name and description in the frontmatter are always in English to ensure system-level compatibility.
+ Write skills in the language matching the LANGUAGE setting in `.env`. If rules and documents are in Chinese, write the SKILL.md body in Chinese using proper financial/regulatory terminology. The frontmatter (name, description) stays in English for system compatibility.