kc-beta 0.5.5 → 0.6.0
- package/QUICKSTART.md +17 -4
- package/README.md +58 -11
- package/bin/kc-beta.js +35 -1
- package/package.json +1 -1
- package/src/agent/bundle-tree.js +553 -0
- package/src/agent/context.js +40 -1
- package/src/agent/engine.js +644 -28
- package/src/agent/llm-client.js +67 -18
- package/src/agent/pipelines/finalization.js +186 -0
- package/src/agent/pipelines/index.js +8 -0
- package/src/agent/pipelines/initializer.js +40 -0
- package/src/agent/pipelines/skill-authoring.js +100 -6
- package/src/agent/skill-loader.js +54 -4
- package/src/agent/task-manager.js +66 -3
- package/src/agent/tools/agent-tool.js +283 -35
- package/src/agent/tools/bundle-search.js +146 -0
- package/src/agent/tools/document-chunk.js +246 -0
- package/src/agent/tools/document-classify.js +311 -0
- package/src/agent/tools/document-parse.js +8 -1
- package/src/agent/tools/phase-advance.js +30 -7
- package/src/agent/tools/registry.js +10 -0
- package/src/agent/tools/rule-catalog.js +17 -3
- package/src/agent/tools/sandbox-exec.js +30 -0
- package/src/agent/workspace.js +168 -14
- package/src/cli/components.js +165 -17
- package/src/cli/index.js +166 -19
- package/src/cli/meme.js +58 -0
- package/src/config.js +39 -2
- package/src/model-tiers.json +3 -2
- package/src/providers.js +34 -1
- package/template/skills/en/meta-meta/evolution-loop/SKILL.md +13 -1
- package/template/skills/en/meta-meta/rule-extraction/SKILL.md +74 -0
- package/template/skills/zh/meta-meta/evolution-loop/SKILL.md +7 -1
- package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +73 -0
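@@ -241,12 +241,18 @@ description: Drive continuous improvement of skills and workflows through the di

 ### Stop conditions

-
+Stop the loop when a single iteration satisfies **all three** of the following conditions:

 1. Corrections < 5% of total test cases.
 2. New patterns = 0.
 3. Regressions = 0.

+**Or** when the separate accuracy-convergence condition is met (D5, added 2026-04-23):
+
+4. Overall accuracy changes by less than 1% between two consecutive iterations, i.e. `|accuracy[N+1] - accuracy[N]| < 0.01`.
+
+Condition 4 guards against an observed over-iteration pattern: iterating from v5 all the way to v12 while accuracy swung back and forth within a 0.5% band each round. Once the model is "good enough", further iterations only burn tokens without real improvement. When accuracy plateaus, move on to the next phase (distillation/production).
+
 If the number of corrections **increases** between two consecutive iterations, that is a regression signal. Pause the loop and diagnose the cause before continuing; the previous round's fixes may be destabilizing the system.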
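The stop conditions 1 to 4 and the regression signal can be sketched as two small predicates. This is a minimal illustration, not code from the package: the function names, argument names, and the use of plain lists for per-iteration history are all assumptions, and accuracy is assumed to be a fraction in [0, 1].

```python
def should_stop(corrections, total_cases, new_patterns, regressions, accuracy_history):
    """Return True when either stop rule holds (hypothetical helper)."""
    # Conditions 1-3 must hold together within the same iteration.
    converged_by_counts = (
        total_cases > 0
        and corrections < 0.05 * total_cases
        and new_patterns == 0
        and regressions == 0
    )
    # Condition 4 stands alone: accuracy moved by less than 0.01
    # between the two most recent iterations.
    plateaued = (
        len(accuracy_history) >= 2
        and abs(accuracy_history[-1] - accuracy_history[-2]) < 0.01
    )
    return converged_by_counts or plateaued


def is_regression_signal(corrections_history):
    """Return True when corrections increased between the last two iterations."""
    return (
        len(corrections_history) >= 2
        and corrections_history[-1] > corrections_history[-2]
    )
```

A caller would typically test `is_regression_signal` first, since a rising correction count means pausing to diagnose rather than declaring convergence.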

 ### Expected convergence speed
@@ -59,6 +59,79 @@ Rules will be distilled into workflows (see `skill-to-workflow`). Design with di
 ### Catalog Versioning
 When rules change (additions, modifications, deprecations), version the entire rule catalog as a unit. Individual rule versions track specific rules; the catalog version tracks the coherent set. Record the catalog version in `versions.json` alongside individual rule versions.

+## Granularity Calibration (read before extracting)
+
+A well-extracted rule catalog has **10-20 rules per typical regulation PDF**
+(2025 banking/insurance disclosure regs, 30-80 pages). Over-extraction into
+60-100 rules per regulation signals you're treating every clause as its own
+rule, which leaves downstream consumers (skill-authoring, workflow-run) unable
+to distinguish meaningful checks from boilerplate.
+
+If your first pass produces more than ~25 rules for a single regulation:
+- **Merge rules that share evidence and fail together** (e.g., "must disclose
+X" and "must disclose Y" where both come from the same required-fields
+table → one rule: "must disclose the required-fields list including X, Y").
+- **Drop procedural language** that isn't checkable against a report
+(definitions, scope statements, references to other regs that just
+transitively apply).
+- **Keep only checkable obligations, prohibitions, and thresholds**: the
+things where you can read a sample report and say pass or fail.
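The calibration advice above can be turned into a quick self-check on a first-pass catalog. The sketch below is illustrative only: `calibration_report` is a hypothetical helper, and grouping by identical `source_ref` is just a rough proxy for "rules that share evidence"; deciding whether two rules really fail together still needs judgment.

```python
from collections import defaultdict

TARGET_BAND = (10, 20)  # rules per typical regulation, from the text
SOFT_LIMIT = 25         # the "more than ~25 rules" trigger, from the text


def calibration_report(rules):
    """Summarize whether a first-pass rule list looks over-extracted."""
    by_source = defaultdict(list)
    for rule in rules:
        by_source[rule["source_ref"]].append(rule["id"])
    # Rules citing the same clause are candidates for merging (assumption).
    merge_candidates = {src: ids for src, ids in by_source.items() if len(ids) > 1}
    count = len(rules)
    return {
        "count": count,
        "in_target_band": TARGET_BAND[0] <= count <= TARGET_BAND[1],
        "over_soft_limit": count > SOFT_LIMIT,
        "merge_candidates": merge_candidates,
    }
```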
+
+### Sample "good" rule
+
+```json
+{
+  "id": "R014",
+  "source_ref": "信披办法 §15.2",
+  "description": "季报应在季度结束后 15 个工作日内披露。",
+  "applicable_sections": ["公募产品"],
+  "severity": "high",
+  "machine_checkable": true,
+  "falsifiability_statement": "披露日期晚于季度结束后第 15 个工作日,则不合规",
+  "test_case_stub": "读取季报的披露日期 + 对应季度末日期,计算工作日差值"
+}
+```
+
+Note: one pass/fail outcome, a single `source_ref` to a specific clause,
+clear applicability scope. Skill-authoring can write `check_r014.py` from
+this alone.
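To make the "one pass/fail outcome" point concrete, here is a rough idea of what a check derived from R014 might look like. It is not the package's actual `check_r014.py`; the function names and the `holidays` parameter are assumptions. It counts Monday to Friday as business days and skips only the holidays passed in, so it ignores any official make-up working days.

```python
from datetime import date, timedelta


def business_days_after(start, end, holidays=frozenset()):
    """Count business days in the interval (start, end]."""
    count = 0
    day = start
    while day < end:
        day += timedelta(days=1)
        # weekday() < 5 means Monday..Friday; holidays are caller-supplied.
        if day.weekday() < 5 and day not in holidays:
            count += 1
    return count


def check_r014(quarter_end, disclosure_date, holidays=frozenset(), limit=15):
    """Pass when disclosure falls on or before the 15th business day."""
    return business_days_after(quarter_end, disclosure_date, holidays) <= limit
```

For a quarter ending on 2025-03-31, `check_r014(date(2025, 3, 31), date(2025, 4, 21))` passes and the same call with `date(2025, 4, 22)` fails, as long as no holidays are supplied.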
+
+### Cross-regulation dedup (when working across multiple PDFs)
+
+If the developer user provides N regulations, rules from later regs often
+duplicate cross-cutting requirements already captured by earlier ones
+(e.g., 资管新规 2018 generic disclosure rule vs. 信披办法 2025's specific
+version). Before emitting a rule from reg-N:
+
+1. **Check the existing catalog.** Use `rule_catalog` (operation: list) to
+see what's already there. Skip if a rule with equivalent scope + intent
+exists.
+2. **Prefer the newer / more specific source_ref** when rules overlap.
+3. **If you merged rules**, record the consolidated sources in `source_ref`:
+e.g., `"信披办法 §15.2 + 资管新规 §24"`.
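The three dedup steps can be sketched against an in-memory list standing in for the catalog. Nothing here calls the real `rule_catalog` tool: `emit_or_merge` and `merge_source_refs` are hypothetical helpers, the equivalence test is supplied by the caller because "equivalent scope + intent" is a judgment call, and the candidate is assumed to be the newer or more specific source.

```python
def merge_source_refs(preferred_ref, other_ref):
    """Join source refs as "A + B", preferred first, without duplicates."""
    parts = [p.strip() for p in preferred_ref.split("+") if p.strip()]
    for part in other_ref.split("+"):
        part = part.strip()
        if part and part not in parts:
            parts.append(part)
    return " + ".join(parts)


def emit_or_merge(catalog, candidate, is_equivalent):
    """Add candidate unless an equivalent rule exists; then merge sources."""
    for rule in catalog:
        if is_equivalent(rule, candidate):
            # Assumption: the candidate's clause is the preferred citation.
            rule["source_ref"] = merge_source_refs(
                candidate["source_ref"], rule["source_ref"]
            )
            return rule
    catalog.append(candidate)
    return candidate
```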
+
+### Delegation to sub-agents
+
+If you dispatch extraction to sub-agents (one per regulation), the sub-agent
+inherits ONLY its `task_description`; it cannot see your conversation or
+existing catalog. Therefore, when composing the brief:
+
+- **Specify the target count band** explicitly: "Extract 10-20 atomic
+rules from this regulation."
+- **Include a sample rule** in the brief body (paste the JSON above
+verbatim) so the sub-agent's calibration matches yours.
+- **Name every regulation the sub-agent should process.** If AGENT.md
+lists 10 core regulations, the brief must list all 10 by name, not
+a shorthand such as "the core regs". LLMs composing long structured briefs
+frequently drop items (observed in session 6304673afaa0 where reg 02
+was silently omitted).
+- **State the dedup contract**: "Rules already in the parent's catalog
+(R001–Rnnn) should NOT be re-extracted. If a requirement is already
+covered, skip it." Then pass the current catalog's ID ranges.
+- **Prefer `rule_catalog` create operations over sandbox_exec writes to
+catalog.json.** rule_catalog uses workspace file locking (B9);
+sandbox_exec bypasses it and races with other writers.
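One way to guard against silently dropped items is to build the brief from data and verify it before dispatch. The sketch below assumes nothing about how the package composes `task_description`; `compose_brief` and `missing_regulations` are hypothetical names, and the containment check is a plain substring test.

```python
import json


def compose_brief(regulations, sample_rule, catalog_id_range, band=(10, 20)):
    """Assemble a sub-agent brief covering the checklist items above."""
    lines = [
        f"Extract {band[0]}-{band[1]} atomic rules from each regulation listed below.",
        "Regulations to process (all of them, by name):",
    ]
    lines += [f"{i}. {name}" for i, name in enumerate(regulations, start=1)]
    lines += [
        "Sample rule (match this granularity):",
        json.dumps(sample_rule, ensure_ascii=False, indent=2),
        f"Rules already in the parent's catalog ({catalog_id_range}) should NOT be "
        "re-extracted. If a requirement is already covered, skip it.",
        "Write rules with rule_catalog create operations, "
        "not sandbox_exec writes to catalog.json.",
    ]
    return "\n".join(lines)


def missing_regulations(brief, regulations):
    """Return the regulation names that do not appear in the brief."""
    return [name for name in regulations if name not in brief]
```

Running `missing_regulations` on the final brief text, after any manual or model edits, gives a cheap check that every named regulation is still present. Because it is a substring test, names that are prefixes of other names would need stricter matching.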
+
 ## Extraction Strategies

 ### Strategy 1: Structured Input (Developer User Provides Rules)