kc-beta 0.5.5 → 0.6.0

Files changed (34)
  1. package/QUICKSTART.md +17 -4
  2. package/README.md +58 -11
  3. package/bin/kc-beta.js +35 -1
  4. package/package.json +1 -1
  5. package/src/agent/bundle-tree.js +553 -0
  6. package/src/agent/context.js +40 -1
  7. package/src/agent/engine.js +644 -28
  8. package/src/agent/llm-client.js +67 -18
  9. package/src/agent/pipelines/finalization.js +186 -0
  10. package/src/agent/pipelines/index.js +8 -0
  11. package/src/agent/pipelines/initializer.js +40 -0
  12. package/src/agent/pipelines/skill-authoring.js +100 -6
  13. package/src/agent/skill-loader.js +54 -4
  14. package/src/agent/task-manager.js +66 -3
  15. package/src/agent/tools/agent-tool.js +283 -35
  16. package/src/agent/tools/bundle-search.js +146 -0
  17. package/src/agent/tools/document-chunk.js +246 -0
  18. package/src/agent/tools/document-classify.js +311 -0
  19. package/src/agent/tools/document-parse.js +8 -1
  20. package/src/agent/tools/phase-advance.js +30 -7
  21. package/src/agent/tools/registry.js +10 -0
  22. package/src/agent/tools/rule-catalog.js +17 -3
  23. package/src/agent/tools/sandbox-exec.js +30 -0
  24. package/src/agent/workspace.js +168 -14
  25. package/src/cli/components.js +165 -17
  26. package/src/cli/index.js +166 -19
  27. package/src/cli/meme.js +58 -0
  28. package/src/config.js +39 -2
  29. package/src/model-tiers.json +3 -2
  30. package/src/providers.js +34 -1
  31. package/template/skills/en/meta-meta/evolution-loop/SKILL.md +13 -1
  32. package/template/skills/en/meta-meta/rule-extraction/SKILL.md +74 -0
  33. package/template/skills/zh/meta-meta/evolution-loop/SKILL.md +7 -1
  34. package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +73 -0
@@ -241,12 +241,18 @@ description: Drive continuous improvement of skills and workflows through the di
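 
  ### Stopping Conditions
 
- Stop the loop when a single iteration satisfies all three of the following conditions simultaneously:
+ Stop the loop when a single iteration satisfies all three of the following conditions **simultaneously**:
 
  1. Correction count < 5% of total test cases.
  2. Number of new patterns = 0.
  3. Number of regressions = 0.
 
+ **Or** when the standalone accuracy-convergence condition is met (D5, added 2026-04-23):
+
+ 4. Overall accuracy changes by less than 1% between two consecutive iterations, i.e. `|accuracy[N+1] - accuracy[N]| < 0.01`.
+
+ Condition 4 guards against an observed over-iteration pattern: iterating all the way from v5 to v12, with each round fluctuating within a 0.5% accuracy band. Once the model is "good enough", further iteration only consumes tokens without bringing substantive improvement. When accuracy plateaus, proceed to the next phase (distillation/production).
+
  If the correction count **increases** between two consecutive iterations, that is a regression signal. Pause the loop and diagnose the cause before continuing: the previous round's fix may be destabilizing the system.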
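The stopping rule in this hunk has two independent exits: conditions 1-3 must all hold in the same iteration, or condition 4 alone suffices. A minimal Python sketch, assuming each iteration's statistics are available as plain numbers; `IterationStats`, `should_stop` and `is_regression_signal` are hypothetical names for illustration, not part of kc-beta:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IterationStats:
    # Hypothetical per-iteration summary; field names are illustrative only.
    corrections: int    # correction count in this round
    total_cases: int    # total number of test cases
    new_patterns: int   # newly observed failure patterns
    regressions: int    # previously passing cases that now fail
    accuracy: float     # overall accuracy in [0, 1]


def should_stop(prev: Optional[IterationStats], curr: IterationStats) -> bool:
    # Conditions 1-3: all three must hold in the same iteration.
    all_three = (
        curr.corrections < 0.05 * curr.total_cases
        and curr.new_patterns == 0
        and curr.regressions == 0
    )
    # Condition 4 (D5): accuracy moved by less than 0.01 since the previous round.
    converged = prev is not None and abs(curr.accuracy - prev.accuracy) < 0.01
    return all_three or converged


def is_regression_signal(prev: IterationStats, curr: IterationStats) -> bool:
    # A rising correction count between consecutive rounds means: pause and diagnose.
    return curr.corrections > prev.corrections
```

Because condition 4 compares against the previous round, the first iteration can only stop via conditions 1-3 in this sketch.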
 
  ### Expected Convergence Speed
@@ -59,6 +59,79 @@ Rules will be distilled into workflows (see `skill-to-workflow`). Design with di
  ### Catalog Versioning
  When rules change (additions, modifications, deprecations), version the entire rule catalog as a unit. Individual rule versions track specific rules; the catalog version tracks the coherent set. Record the catalog version in `versions.json` alongside individual rule versions.
 
+ ## Granularity Calibration (read before extracting)
+
+ A well-extracted rule catalog has **10-20 rules per typical regulation PDF**
+ (2025 banking/insurance disclosure regs, 30-80 pages). Over-extraction into
+ 60-100 rules per regulation signals that you're treating every clause as its
+ own rule, and downstream consumers (skill-authoring, workflow-run) then can't
+ distinguish meaningful checks from boilerplate.
+
+ If your first pass produces more than ~25 rules for a single regulation:
+ - **Merge rules that share evidence and fail together** (e.g., "must disclose
+ X" and "must disclose Y" where both come from the same required-fields
+ table → one rule: "must disclose the required-fields list including X, Y").
+ - **Drop procedural language** that isn't checkable against a report
+ (definitions, scope statements, references to other regs that only apply
+ transitively).
+ - **Keep only checkable obligations, prohibitions, and thresholds**: the
+ things where you can read a sample report and say pass or fail.
+
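Parts of this calibration pass can be mechanized. A Python sketch, assuming rules are dicts shaped like the sample rule in this file plus a hypothetical `evidence` field naming the report table or section the rule reads. Whether two rules "fail together" cannot be read off the schema, so shared evidence only surfaces merge *candidates* for a human or LLM to review; all function names here are illustrative:

```python
from collections import defaultdict

# "more than ~25 rules for a single regulation" triggers a consolidation pass
CONSOLIDATION_THRESHOLD = 25


def needs_consolidation(rules: list) -> bool:
    return len(rules) > CONSOLIDATION_THRESHOLD


def drop_uncheckable(rules: list) -> list:
    # Keep only rules that can be judged pass/fail against a report.
    return [r for r in rules if r.get("machine_checkable", False)]


def merge_candidates(rules: list) -> dict:
    # Group rule IDs by the (hypothetical) `evidence` field; any group with
    # two or more rules is a candidate to collapse into a single rule.
    groups = defaultdict(list)
    for rule in rules:
        groups[rule.get("evidence")].append(rule["id"])
    return {ev: ids for ev, ids in groups.items() if ev is not None and len(ids) > 1}
```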
+ ### Sample "good" rule
+
+ ```json
+ {
+   "id": "R014",
+   "source_ref": "信披办法 §15.2",
+   "description": "季报应在季度结束后 15 个工作日内披露。",
+   "applicable_sections": ["公募产品"],
+   "severity": "high",
+   "machine_checkable": true,
+   "falsifiability_statement": "披露日期晚于季度结束后第 15 个工作日,则不合规",
+   "test_case_stub": "读取季报的披露日期 + 对应季度末日期,计算工作日差值"
+ }
+ ```
+
+ Note what makes it good: one pass/fail outcome, a single `source_ref` to a
+ specific clause, and a clear applicability scope. Skill-authoring can write
+ `check_r014.py` from this alone.
+
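To illustrate what "skill-authoring can write `check_r014.py` from this alone" might produce, here is one possible shape of that check in Python, following the rule's `test_case_stub` (disclosure date vs. quarter end, counted in working days). It is a sketch, not the script kc-beta generates: it assumes Monday-Friday working days and accepts an optional set of holiday dates, while make-up workdays on weekends, which an official working-day calendar would require, are not modeled:

```python
from datetime import date, timedelta


def working_days_after(start: date, end: date, holidays: frozenset = frozenset()) -> int:
    """Count working days in the interval (start, end]."""
    count = 0
    day = start
    while day < end:
        day += timedelta(days=1)
        # Assumption: Monday-Friday are working days unless listed in `holidays`.
        if day.weekday() < 5 and day not in holidays:
            count += 1
    return count


def check_r014(disclosure_date: date, quarter_end: date,
               holidays: frozenset = frozenset()) -> bool:
    """Pass iff the quarterly report was disclosed within 15 working days of quarter end."""
    return working_days_after(quarter_end, disclosure_date, holidays) <= 15
```

For example, with quarter end on Monday 2025-03-31 and no holidays, a report disclosed on 2025-04-21 falls on the 15th working day and passes, while 2025-04-22 fails.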
+ ### Cross-regulation dedup (when working across multiple PDFs)
+
+ If the developer user provides N regulations, rules from later regs often
+ duplicate cross-cutting requirements already captured by earlier ones
+ (e.g., a generic disclosure rule from 资管新规 2018 vs. the specific version
+ in 信披办法 2025). Before emitting a rule from reg-N:
+
+ 1. **Check the existing catalog.** Use `rule_catalog` (operation: list) to
+ see what's already there. Skip if a rule with equivalent scope + intent
+ exists.
+ 2. **Prefer the newer / more specific source_ref** when rules overlap.
+ 3. **If you merged rules**, record the consolidated sources in `source_ref`,
+ e.g., `"信披办法 §15.2 + 资管新规 §24"`.
+
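The three dedup steps can be folded into one merge function. A hedged Python sketch: "equivalent scope + intent" is approximated by comparing `applicable_sections` plus a hypothetical `intent` label, and "newer" by a hypothetical `year` field. Neither field is in the sample schema, and `add_with_dedup` is an illustrative helper, not a `rule_catalog` operation:

```python
def rule_key(rule: dict) -> tuple:
    # Approximation of "equivalent scope + intent" (hypothetical `intent` field).
    return (tuple(sorted(rule["applicable_sections"])), rule["intent"])


def add_with_dedup(catalog: list, candidate: dict) -> list:
    """Add `candidate` to `catalog` in place, applying the dedup contract."""
    for i, existing in enumerate(catalog):
        if rule_key(existing) != rule_key(candidate):
            continue
        if candidate["year"] <= existing["year"]:
            # Step 1: an equivalent rule already exists and is not older; skip.
            return catalog
        # Steps 2-3: prefer the newer source, keep the existing ID, and record
        # both sources in source_ref (newer first).
        merged = dict(candidate, id=existing["id"])
        merged["source_ref"] = f'{candidate["source_ref"]} + {existing["source_ref"]}'
        catalog[i] = merged
        return catalog
    catalog.append(candidate)
    return catalog
```

Keeping the existing ID on merge means downstream references (for example a `check_r003.py`) stay valid while the rule text moves to the newer source.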
+ ### Delegation to sub-agents
+
+ If you dispatch extraction to sub-agents (one per regulation), each sub-agent
+ inherits ONLY its `task_description`: it cannot see your conversation or the
+ existing catalog. Therefore, when composing the brief:
+
+ - **Specify the target count band** explicitly: "Extract 10-20 atomic
+ rules from this regulation."
+ - **Include a sample rule** in the brief body (paste the JSON above
+ verbatim) so the sub-agent's calibration matches yours.
+ - **Name every regulation the sub-agent should process.** If AGENT.md
+ lists 10 core regulations, the brief must list all 10 by name rather than
+ referring to them collectively as "the core regs"; LLMs composing long
+ structured briefs frequently drop items (observed in session 6304673afaa0,
+ where reg 02 was silently omitted).
+ - **State the dedup contract**: "Rules already in the parent's catalog
+ (R001–Rnnn) should NOT be re-extracted. If a requirement is already
+ covered, skip it." Then pass the current catalog's ID ranges.
+ - **Prefer `rule_catalog` create operations over `sandbox_exec` writes to
+ `catalog.json`.** `rule_catalog` uses workspace file locking (B9);
+ `sandbox_exec` bypasses it and races with other writers.
+
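The delegation checklist lends itself to a small brief builder plus a completeness check for briefs that an LLM composed. A Python sketch under stated assumptions: rule IDs are `R` followed by a zero-padded number, the brief wording paraphrases the bullets above, and `id_ranges`, `build_brief` and `missing_regulations` are hypothetical helpers, not kc-beta APIs:

```python
import json


def id_ranges(ids: list) -> str:
    # Collapse IDs like R001, R002, R003, R007 into "R001–R003, R007".
    nums = sorted(int(rule_id[1:]) for rule_id in ids)
    if not nums:
        return ""
    spans = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n != prev + 1:
            spans.append((start, prev))
            start = n
        prev = n
    spans.append((start, prev))
    return ", ".join(
        f"R{a:03d}" if a == b else f"R{a:03d}–R{b:03d}" for a, b in spans
    )


def build_brief(regulations: list, catalog_ids: list, sample_rule: dict) -> str:
    parts = [
        "Extract 10-20 atomic rules from each regulation listed below.",
        "Regulations to process (all of them):\n"
        + "\n".join(f"{i}. {name}" for i, name in enumerate(regulations, start=1)),
        "Sample rule, match this granularity:\n"
        + json.dumps(sample_rule, ensure_ascii=False, indent=2),
        "Write rules with rule_catalog create operations, not sandbox_exec writes to catalog.json.",
    ]
    if catalog_ids:
        parts.append(
            f"Rules already in the parent's catalog ({id_ranges(catalog_ids)}) should NOT "
            "be re-extracted. If a requirement is already covered, skip it."
        )
    return "\n\n".join(parts)


def missing_regulations(brief: str, regulations: list) -> list:
    # Catch the failure mode where a long brief silently drops an item.
    return [name for name in regulations if name not in brief]
```

Running `missing_regulations` on an LLM-composed brief before dispatch is a cheap guard against the dropped-regulation failure described above.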
  ## Extraction Strategies
 
  ### Strategy 1: Structured Input (Developer User Provides Rules)