kc-beta 0.1.2 → 0.3.0
- package/bin/kc-beta.js +14 -2
- package/package.json +1 -1
- package/src/agent/context-window.js +151 -0
- package/src/agent/context.js +8 -4
- package/src/agent/engine.js +261 -8
- package/src/agent/event-log.js +111 -0
- package/src/agent/llm-client.js +352 -59
- package/src/agent/pipelines/base.js +6 -0
- package/src/agent/pipelines/distillation.js +18 -0
- package/src/agent/pipelines/extraction.js +21 -0
- package/src/agent/pipelines/initializer.js +75 -14
- package/src/agent/pipelines/production-qc.js +19 -0
- package/src/agent/pipelines/skill-authoring.js +14 -0
- package/src/agent/pipelines/skill-testing.js +20 -0
- package/src/agent/retry.js +83 -0
- package/src/agent/session-state.js +79 -0
- package/src/agent/skill-loader.js +13 -1
- package/src/agent/token-counter.js +62 -0
- package/src/agent/tools/document-parse.js +104 -21
- package/src/agent/tools/document-search.js +24 -8
- package/src/agent/tools/sandbox-exec.js +16 -5
- package/src/agent/tools/web-search.js +107 -0
- package/src/agent/tools/worker-llm-call.js +14 -5
- package/src/agent/tools/workspace-file.js +47 -20
- package/src/agent/workspace.js +24 -1
- package/src/cli/components.js +24 -5
- package/src/cli/config.js +340 -0
- package/src/cli/index.js +113 -11
- package/src/cli/onboard.js +216 -53
- package/src/config.js +63 -10
- package/src/model-tiers.json +153 -0
- package/src/providers.js +367 -0
- package/template/AGENT.md +20 -0
- package/template/skills/en/meta/compliance-judgment/SKILL.md +10 -42
- package/template/skills/en/meta/document-chunking/SKILL.md +32 -0
- package/template/skills/en/meta/document-parsing/SKILL.md +11 -18
- package/template/skills/en/meta/entity-extraction/SKILL.md +13 -28
- package/template/skills/en/meta/tree-processing/SKILL.md +19 -1
- package/template/skills/en/meta-meta/auto-model-selection/SKILL.md +53 -0
- package/template/skills/en/meta-meta/pdf-review-dashboard/SKILL.md +57 -0
- package/template/skills/en/meta-meta/pdf-review-dashboard/scripts/generate_review.js +262 -0
- package/template/skills/en/meta-meta/rule-extraction/SKILL.md +24 -1
- package/template/skills/en/meta-meta/skill-authoring/SKILL.md +6 -0
- package/template/skills/en/meta-meta/skill-to-workflow/SKILL.md +4 -0
- package/template/skills/zh/meta/compliance-judgment/SKILL.md +41 -262
- package/template/skills/zh/meta/document-chunking/SKILL.md +32 -0
- package/template/skills/zh/meta/document-parsing/SKILL.md +65 -132
- package/template/skills/zh/meta/entity-extraction/SKILL.md +68 -230
- package/template/skills/zh/meta/tree-processing/SKILL.md +82 -194
- package/template/skills/zh/meta-meta/auto-model-selection/SKILL.md +51 -0
- package/template/skills/zh/meta-meta/pdf-review-dashboard/SKILL.md +55 -0
- package/template/skills/zh/meta-meta/pdf-review-dashboard/scripts/generate_review.js +262 -0
- package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +79 -164
- package/template/skills/zh/meta-meta/skill-authoring/SKILL.md +64 -185
- package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +95 -216
@@ -3,233 +3,112 @@ name: skill-authoring
 description: Write each verification rule into a Claude Code skill folder following the official skill format. Use when converting extracted rules into skill folders, when iterating on existing rule skills after testing, or when the developer user wants to capture domain knowledge as a skill. Each skill folder must be self-contained with business logic in SKILL.md, code in scripts/, regulation context in references/, and sample data in assets/. Also use the bundled skill-creator for the full eval/iterate workflow.
 ---

-#
+# Skill Authoring

-
+Each verification rule becomes a skill folder. The skill must be self-contained: anyone (or any agent) reading just this folder should have everything needed to verify compliance with that one rule.

-
+## Skill Folder Structure

-
-
-Each rule's skill folder lives under the `rule-skills/` directory and is named `R{number}-{english-short-name}/`:
+Follow the official Claude Code skill format strictly. See `references/skill-format-spec.md` for the complete specification.

 ```
-rule-skills/
-
-
-
-
-
-
-
-
-
-
+rule-skills/
+  rule-001-capital-adequacy/
+    SKILL.md               # The verification logic and methodology
+    scripts/
+      check.py             # Deterministic checks (regex, calculations)
+    references/
+      regulation.md        # Original regulation text, verbatim
+      interpretation.md    # Expert notes on how to interpret edge cases
+    assets/
+      samples.json         # Annotated sample extractions with expected results
+      corner_cases.json    # Known edge cases with their resolutions
 ```

-
+Not every rule needs all of these. A simple threshold check might only need SKILL.md and a script. A complex semantic rule might need detailed references and many samples. Start minimal, add as needed during testing.

-SKILL.md
+## Writing SKILL.md

-###
+### Frontmatter

 ```yaml
 ---
-name:
-description: Verify that
+name: rule-001-capital-adequacy
+description: Verify that the capital adequacy ratio reported in the document meets the regulatory minimum of 8%. Use when checking capital adequacy compliance in bank financial reports. Check the capital adequacy section or table for the reported ratio and compare against the threshold.
 ---
 ```

-**name
-
-**Write the description "pushy"**: tell the system explicitly when it should invoke this skill. Do not be vague; list the concrete trigger scenarios. Keep the description in English for compatibility with system dispatch.
-
-### Body Structure
+- **name**: Must match the directory name exactly. Use lowercase, hyphens, no spaces. Prefix with the rule ID from your catalog.
+- **description**: Write it as if explaining to another coding agent when they should use this skill. Be specific about what the rule checks, where to look in the document, and what constitutes pass/fail. Be pushy: include trigger keywords.
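The naming constraints in the added `name` bullet (match the directory, lowercase, hyphens, no spaces) can be checked mechanically. The following is a minimal sketch, not part of the package: the function name `check_skill_name` and the exact regular expression are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed pattern: lowercase letters/digits in groups joined by single hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_skill_name(skill_dir: str, frontmatter_name: str) -> list:
    """Return a list of problems with a skill's frontmatter name (empty if none)."""
    problems = []
    dir_name = Path(skill_dir).name  # last path component, trailing slash ignored
    if frontmatter_name != dir_name:
        problems.append(
            f"name {frontmatter_name!r} does not match directory {dir_name!r}"
        )
    if not NAME_PATTERN.fullmatch(frontmatter_name):
        problems.append(f"name {frontmatter_name!r} is not lowercase-hyphenated")
    return problems
```

A check like this would most naturally run before testing a skill, so that a mismatched folder name is caught before anything tries to load it.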

-
+### Body Content

-
+The body should cover:

-
+1. **What this rule checks**: one paragraph explaining the rule in plain language. Include the regulatory source and intent.

-
-Check whether the invoice issue date falls within the validity period of the corresponding contract,
-and whether it meets the statutory invoicing deadline.
-```
+2. **Where to look**: which section, chapter, table, or part of the document contains the relevant information. Be specific. "The capital adequacy ratio is typically found in Chapter 2, Section 'Key Regulatory Metrics' or in the summary table on page 1."

-
+3. **What to extract**: the specific entities needed. "Extract the reported capital adequacy ratio as a percentage." Define the expected format and any normalization needed.

-
+4. **How to judge**: the logic for pass/fail. "The ratio must be >= 8.0%. If the ratio is missing, flag as MISSING rather than FAIL." For semantic judgments, describe the criteria in natural language.

-
-- The "contract effective date" and "contract expiry date" in the contract, usually found on its first or last page
-- For an electronic invoice, the JSON path of the date field is `invoice.issue_date`
+5. **Edge cases**: known tricky situations. "Some reports express the ratio as a decimal (0.12) rather than a percentage (12%). Normalize before comparing."

-
+6. **Comment format**: what to say when the rule fails. Keep it concise and actionable. "Capital adequacy ratio is X%, which is below the regulatory minimum of 8%."
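Items 4 to 6 of the added list (threshold, MISSING versus FAIL, decimal normalization, comment wording) translate directly into a deterministic check. The sketch below is illustrative only: the names `normalize_ratio` and `judge`, the result dictionary shape, and the rule "values at or below 1.0 are decimal fractions" are assumptions, not something the diff specifies.

```python
from typing import Optional

MIN_RATIO_PCT = 8.0  # regulatory minimum quoted in the skill text


def normalize_ratio(raw: Optional[float]) -> Optional[float]:
    """Convert a decimal-style ratio (0.12) to a percentage (12.0).

    Assumption: any value at or below 1.0 is a decimal fraction.
    """
    if raw is None:
        return None
    # Round to hide floating-point noise such as 0.07 * 100 == 7.000000000000001.
    return round(raw * 100, 6) if raw <= 1.0 else raw


def judge(raw_ratio: Optional[float]) -> dict:
    """Return the result and comment for the capital adequacy rule."""
    ratio = normalize_ratio(raw_ratio)
    if ratio is None:
        # A missing value is reported as MISSING, not as a failure.
        return {"result": "MISSING", "comment": "Capital adequacy ratio not found."}
    if ratio >= MIN_RATIO_PCT:
        return {"result": "PASS", "comment": ""}
    return {
        "result": "FAIL",
        "comment": f"Capital adequacy ratio is {ratio:g}%, "
        "which is below the regulatory minimum of 8%.",
    }
```

For example, `judge(0.12)` is normalized to 12% and passes, while `judge(None)` yields `MISSING` rather than `FAIL`.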

-
+### Length and Style

--
--
--
+- Keep SKILL.md under 500 lines. Most rules should be 100-200 lines.
+- Explain the WHY behind the rule, not just the mechanics. Understanding intent helps handle edge cases.
+- Write in imperative form: "Extract the ratio" not "The ratio should be extracted."
+- If detailed regulation text is long, put it in `references/regulation.md` and reference it from SKILL.md.

-
+## Pipeline Node Design

-
+When a skill's workflow has multiple steps, decompose into nodes where each node does one thing well. Each node's difficulty should be well within the model's capability; don't cram location + extraction + judgment into a single LLM call.

-
-IF invoice_date IS NULL:
-    verdict = "unable to verify", reason = "invoice date is missing"
-ELIF contract_start_date IS NULL OR contract_end_date IS NULL:
-    verdict = "unable to verify", reason = "contract term information is incomplete"
-ELIF invoice_date < contract_start_date:
-    verdict = "fail", reason = "invoice issue date is earlier than the contract effective date"
-ELIF invoice_date > contract_end_date:
-    verdict = "fail", reason = "invoice issue date is later than the contract expiry date"
-ELSE:
-    verdict = "pass"
-```
+Pre-processing (text cleaning, format normalization) and post-processing (output parsing, value normalization) are separate nodes, not embedded in the LLM prompt. This keeps prompts clean and makes each step independently testable.
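One way to read the node decomposition described in the added lines is as three small functions, with the LLM call isolated in the middle. This is a sketch under stated assumptions: the function names, the prompt wording, and the `call_llm` callable are hypothetical stand-ins, not the package's pipeline API.

```python
import re
from typing import Callable, Optional


def preprocess(text: str) -> str:
    """Node 1: collapse whitespace so the prompt stays clean."""
    return re.sub(r"\s+", " ", text).strip()


def extract_with_llm(clean_text: str, call_llm: Callable[[str], str]) -> str:
    """Node 2: a single, narrow LLM task (extraction only, no judgment)."""
    prompt = f"Return only the capital adequacy ratio from this text:\n{clean_text}"
    return call_llm(prompt)


def postprocess(raw_output: str) -> Optional[float]:
    """Node 3: parse the model output into a percentage, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", raw_output)
    return float(match.group(1)) if match else None


def run_pipeline(text: str, call_llm: Callable[[str], str]) -> Optional[float]:
    """Chain the three nodes; each one can also be tested on its own."""
    return postprocess(extract_with_llm(preprocess(text), call_llm))
```

Because `call_llm` is passed in, the pre- and post-processing nodes can be exercised with a stub such as `lambda prompt: "The ratio is 12.5 %"` without touching a real model.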

-
-
-List the known special cases:
-
-- The contract has been extended or renewed: use the latest expiry date
-- The invoice date is exactly the contract effective date or expiry date: treat as pass
-- Order invoices under a framework contract: use the execution period of the order, not the overall term of the framework contract
-- Late-issued invoices: some regulations allow issuing an invoice within a set period after the contract expires
-
-#### 6. Output Format
-
-Define the standard output structure for verification results:
-
-```json
-{
-  "rule_id": "R001",
-  "rule_name": "Invoice date validity",
-  "verdict": "pass | fail | unable_to_verify",
-  "confidence": 0.95,
-  "details": {
-    "invoice_date": "2025-03-15",
-    "contract_start": "2025-01-01",
-    "contract_end": "2025-12-31"
-  },
-  "comment": "Invoice date is within the contract validity period",
-  "source_ref": "《增值税发票管理办法》 Article 15"
-}
-```
+## Writing Scripts

-
-
-If verification finds a problem, the comment should be professional, concise, and actionable:
-
-- Good: "Invoice date 2025-03-15 is later than contract expiry date 2025-02-28, a difference of 15 days"
-- Bad: "There is a problem with the date" (too little information, not actionable)
-
-## Writing scripts/
-
-The scripts directory holds Python scripts for deterministic operations. The rule: **anything that does not need LLM judgment is implemented in code.**
-
-Operations suited to code:
-- Date format parsing and standardization
-- Cross-checking amounts between numeric form and Chinese capital-numeral form
-- Tax rate calculation and verification
-- Regex matching (invoice number format, Unified Social Credit Code format)
-- Field presence checks
-
-Operations not suited to code:
-- Extracting semantic information from unstructured text
-- Judging whether descriptive content is compliant
-- Understanding the business meaning of natural-language statements
-
-Script requirements:
-- Pure Python with no heavy third-party dependencies (standard-library modules such as `re`, `json`, and `datetime` are allowed)
-- Functions have clear input/output type annotations
-- Include basic error handling
-- Can be run and tested standalone
-
-## Writing references/
-
-Copy the regulation text directly relevant to this rule, verbatim, into `references/regulation.md`.
-
-Requirements:
-- Quote verbatim; do not paraphrase or summarize
-- Cite the source: regulation name, issuing body, document number, article number
-- If the developer user has given an interpretation of an ambiguous provision, record it under the label "business interpretation"
-- If the rule involves multiple regulations, quote them in separate sections and note the scope of each
-
-The purpose of this file is to make verification conclusions traceable. Every conclusion should have a regulatory basis that can be found in references.
-
-## Writing assets/
-
-### samples.json
-
-Provide at least 3-5 test samples covering the following scenarios:
-
-```json
-[
-  {
-    "id": "S001",
-    "description": "Standard passing case",
-    "input": { "invoice_date": "2025-06-15", "contract_start": "2025-01-01", "contract_end": "2025-12-31" },
-    "expected_verdict": "pass"
-  },
-  {
-    "id": "S002",
-    "description": "Invoice date outside the contract period",
-    "input": { "invoice_date": "2026-02-01", "contract_start": "2025-01-01", "contract_end": "2025-12-31" },
-    "expected_verdict": "fail"
-  },
-  {
-    "id": "S003",
-    "description": "Contract dates missing",
-    "input": { "invoice_date": "2025-06-15", "contract_start": null, "contract_end": null },
-    "expected_verdict": "unable_to_verify"
-  }
-]
-```
+Scripts in `scripts/` handle deterministic operations:

-
+- **Regex patterns** for entity extraction (dates, amounts, ratios, identifiers).
+- **Calculation logic** for threshold checks, ratio computations, cross-field validation.
+- **Format normalization** (Chinese numerals → digits, date format standardization, unit conversion).

-
+Scripts should be self-contained Python files that can be imported or executed. Include clear input/output documentation in the script's docstring.

-
+Do not put LLM prompts in scripts. LLM interactions belong in the SKILL.md body or in the workflow (later phase).
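As an illustration of the kind of script the added lines describe (self-contained, importable or executable, input and output documented in the docstring, no LLM prompt), here is a hedged sketch of a date normalizer. The file's purpose, the name `normalize_date`, and the set of accepted separators are assumptions chosen for the example.

```python
"""Normalize date strings to ISO 8601 (YYYY-MM-DD).

Input:  a string containing a date such as "2025/3/15", "2025-03-15",
        or "2025年3月15日".
Output: the ISO date string, or None if no supported pattern matches
        or the matched numbers do not form a real calendar date.
"""
import re
import sys
from datetime import date
from typing import Optional

# Year, month, day separated by "-", "/", "." or the Chinese date markers.
DATE_PATTERN = re.compile(
    r"(\d{4})\s*[-/.年]\s*(\d{1,2})\s*[-/.月]\s*(\d{1,2})\s*日?"
)


def normalize_date(text: str) -> Optional[str]:
    match = DATE_PATTERN.search(text)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:  # e.g. month 13 or day 32
        return None


if __name__ == "__main__":
    # Executed directly: normalize each command-line argument.
    for arg in sys.argv[1:]:
        print(arg, "->", normalize_date(arg))
```

The module uses only the standard library, so it can be imported by a test harness or run from the command line without extra setup.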

-
+## Writing References

-
+`references/` holds content that the coding agent reads on demand:

--
--
-- A new edge case is discovered
-- A regulatory change affects this rule
+- **regulation.md**: The original regulation text, verbatim. Include the source, date, and version. This is the ground truth that the rule is derived from.
+- **interpretation.md**: Expert notes from the developer user or from the coding agent's own analysis. "When the regulation says 'adequate disclosure', in practice this means the section must be at least 2 paragraphs and cover risks A, B, and C."

-
+Keep references factual and sourced. They are evidence, not instructions.

-
-2. Update the relevant paragraphs in SKILL.md
-3. If the change affects the judgment logic, update scripts/ at the same time
-4. Add newly discovered edge cases to corner_cases.json
-5. Re-run all test samples and confirm there are no regressions
+## Writing Assets

-
+`assets/` holds data that supports testing and edge case handling:

-
-
-- Added edge case: handling logic for order invoices under a framework contract
-- Fixed judgment logic: an invoice date equal to the contract expiry date counts as pass
-- Source: second-round testing feedback
+- **samples.json**: Annotated examples. Each entry: the input (extracted text or entity), the expected result (pass/fail/missing), and the expected comment. Build this incrementally as you test.
+- **corner_cases.json**: Edge cases that the standard logic does not handle. Each entry: description, detection pattern, resolution, and confidence threshold. See the `corner-case-management` skill for the methodology.
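The added `samples.json` bullet names three parts per entry (input, expected result, expected comment) but not the field names. The sketch below assumes the hypothetical keys `input`, `expected_result`, and `expected_comment`, uses made-up sample data, and shows how such a file could be sanity-checked before a test run.

```python
import json

# Hypothetical field names; the diff only describes what each entry contains.
REQUIRED_KEYS = {"input", "expected_result", "expected_comment"}
ALLOWED_RESULTS = {"pass", "fail", "missing"}


def validate_samples(samples: list) -> list:
    """Return human-readable problems found in a list of sample entries."""
    problems = []
    for index, entry in enumerate(samples):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
        elif entry["expected_result"] not in ALLOWED_RESULTS:
            problems.append(
                f"entry {index}: unknown result {entry['expected_result']!r}"
            )
    return problems


# Illustrative content only, standing in for assets/samples.json.
SAMPLES_JSON = """
[
  {"input": "Capital adequacy ratio: 12.5%",
   "expected_result": "pass",
   "expected_comment": ""},
  {"input": "Capital adequacy ratio: 7.5%",
   "expected_result": "fail",
   "expected_comment": "Capital adequacy ratio is 7.5%, which is below the regulatory minimum of 8%."}
]
"""
samples = json.loads(SAMPLES_JSON)
```

Calling `validate_samples(samples)` on the entries above returns an empty list; an entry lacking the expected fields is reported by index.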

-##
-- Initial version, extracted from 《增值税发票管理办法》 Article 15
-```
+## Iteration

-
+Skills evolve through testing. After each test iteration:
+1. Update SKILL.md if the logic needs adjustment.
+2. Add failing cases to `assets/samples.json`.
+3. Add newly discovered edge cases to `assets/corner_cases.json`.
+4. Update `references/interpretation.md` with new insights.
+5. Log what changed and why.

-
+Use the bundled `skill-creator` skill if you want to run the full eval/iterate workflow with quantitative benchmarks.

-
-- Verifying English documents → write the body in English
-- Verifying bilingual Chinese-English documents → write the body in Chinese, keeping key English terms in the original
+## Bilingual Skills

-frontmatter
+Write skills in the language matching the LANGUAGE setting in `.env`. If rules and documents are in Chinese, write the SKILL.md body in Chinese using proper financial/regulatory terminology. The frontmatter (name, description) stays in English for system compatibility.