kc-beta 0.3.1 → 0.5.3

Files changed (49)
  1. package/package.json +1 -1
  2. package/src/agent/confidence-scorer.js +8 -0
  3. package/src/agent/context.js +25 -0
  4. package/src/agent/corner-case-registry.js +5 -0
  5. package/src/agent/engine.js +514 -75
  6. package/src/agent/event-log.js +15 -2
  7. package/src/agent/history.js +91 -23
  8. package/src/agent/pipelines/initializer.js +3 -6
  9. package/src/agent/retry.js +9 -1
  10. package/src/agent/scheduler.js +276 -0
  11. package/src/agent/session-state.js +11 -2
  12. package/src/agent/task-manager.js +5 -0
  13. package/src/agent/tools/agent-tool.js +57 -14
  14. package/src/agent/tools/archive-file.js +94 -0
  15. package/src/agent/tools/copy-to-workspace.js +140 -0
  16. package/src/agent/tools/phase-advance.js +60 -0
  17. package/src/agent/tools/release.js +322 -0
  18. package/src/agent/tools/schedule-fetch.js +118 -0
  19. package/src/agent/tools/snapshot.js +101 -0
  20. package/src/agent/tools/workspace-file.js +10 -7
  21. package/src/agent/version-manager.js +29 -120
  22. package/src/agent/workspace.js +127 -4
  23. package/src/cli/components.js +4 -1
  24. package/src/cli/index.js +57 -4
  25. package/src/config.js +10 -1
  26. package/template/release-runtime/README.md.tmpl +84 -0
  27. package/template/release-runtime/kc_runtime/__init__.py +2 -0
  28. package/template/release-runtime/kc_runtime/confidence.py +93 -0
  29. package/template/release-runtime/kc_runtime/dashboard.py +208 -0
  30. package/template/release-runtime/render_dashboard.py +49 -0
  31. package/template/release-runtime/run.py +230 -0
  32. package/template/release-runtime/serve.sh +15 -0
  33. package/template/skills/en/meta/entity-extraction/SKILL.md +6 -0
  34. package/template/skills/en/meta-meta/bootstrap-workspace/SKILL.md +11 -0
  35. package/template/skills/en/meta-meta/quality-control/SKILL.md +13 -1
  36. package/template/skills/en/meta-meta/rule-extraction/SKILL.md +35 -0
  37. package/template/skills/en/meta-meta/rule-graph/SKILL.md +16 -0
  38. package/template/skills/en/meta-meta/skill-to-workflow/SKILL.md +8 -0
  39. package/template/skills/en/meta-meta/task-decomposition/SKILL.md +13 -0
  40. package/template/skills/en/meta-meta/version-control/SKILL.md +13 -0
  41. package/template/skills/zh/meta/entity-extraction/SKILL.md +6 -0
  42. package/template/skills/zh/meta-meta/bootstrap-workspace/SKILL.md +11 -0
  43. package/template/skills/zh/meta-meta/quality-control/SKILL.md +12 -0
  44. package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +35 -0
  45. package/template/skills/zh/meta-meta/rule-graph/SKILL.md +16 -0
  46. package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +8 -0
  47. package/template/skills/zh/meta-meta/task-decomposition/SKILL.md +16 -0
  48. package/template/skills/zh/meta-meta/version-control/SKILL.md +13 -0
  49. package/template/workspace.gitignore +22 -0
@@ -7,6 +7,19 @@ description: Manage versioning of skills, workflows, prompts, and system configu
7
7
 
8
8
  Version control here is about auditability and rollback, not collaboration. You need to know what changed, when, why, and be able to undo it if the change made things worse.
9
9
 
10
+ ## Git Is the Source of Truth
11
+
12
+ The workspace is a git repository. Every workspace write to a tracked path (skills, workflows, rules, glossary, AGENT.md, tasks.json) is auto-committed by KC with a trace ID in the commit message. This means:
13
+
14
+ - `git log --oneline` is the timeline of every meaningful change in this session.
15
+ - `git diff HEAD~3 -- rule_skills/R001/` shows what changed in a skill across the last three meaningful writes.
16
+ - `git checkout HEAD~5 -- workflows/R001/` rolls back a workflow without touching anything else.
17
+ - The `snapshot` tool tags moments worth remembering (releases, "before risky operation"); restore with `git checkout snap/<label>`.
18
+
19
+ Use `sandbox_exec` with `cwd: "workspace"` to run git commands directly. Don't fight git — it's the audit trail.
20
+
21
+ The conventions below (per-version filename copies, `CHANGELOG.md`) are still useful for *human readability inside a single skill folder* — having `workflow_v1.py` and `workflow_v3.py` side-by-side lets the agent compare them without reading git history. But the system of record is git, not the deprecated `versions.json` manifest (which is no longer written for new workspaces).
22
+
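The git commands listed above can be wrapped in small helpers. The following is a minimal sketch, assuming Python's standard `subprocess` module and a workspace directory that is already a git repository; the helper names (`diff_args`, `rollback_args`, `run_git`) are hypothetical and are not part of KC.

```python
import subprocess
from pathlib import Path


def diff_args(path: str, commits_back: int) -> list[str]:
    """Build argv for: git diff HEAD~N -- <path>."""
    return ["git", "diff", f"HEAD~{commits_back}", "--", path]


def rollback_args(path: str, commits_back: int) -> list[str]:
    """Build argv for: git checkout HEAD~N -- <path> (restores only that path)."""
    return ["git", "checkout", f"HEAD~{commits_back}", "--", path]


def run_git(args: list[str], workspace: Path) -> str:
    """Run a git command inside the workspace and return its stdout.

    Raises subprocess.CalledProcessError if git exits with a non-zero status.
    """
    result = subprocess.run(
        args, cwd=workspace, check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `run_git(diff_args("rule_skills/R001/", 3), Path("workspace"))` would return the diff text for that skill folder. Keeping argv construction separate from execution makes the commands easy to inspect before they touch the repository.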
10
23
  ## What to Version
11
24
 
12
25
  Everything that affects verification results:
@@ -49,6 +49,12 @@ Many real verification tasks require semantic understanding — "is this descrip
49
49
 
50
50
  If a method's results fall below the accuracy threshold, try a different method or a more capable model. If regex works and meets accuracy — keep it, it's free. If regex produces results below threshold, escalate to worker LLM. If a cheap worker LLM isn't accurate enough, try a more capable tier. Record what works for each extraction type in AGENT.md for future reference.
51
51
 
52
+ ## Project Glossary
53
+
54
+ The project glossary (built and maintained by `rule-extraction`, stored at `rules/glossary.json`) is a useful resource when designing extraction. It records canonical names and known aliases for entities that appear across rules. Reading it before extracting helps keep entity names schema-aligned and avoids parallel labels for the same thing.
55
+
56
+ Whether the glossary becomes more than a naming convention — for instance, driving cheap pattern matching for entities with stable surface forms — is a per-project judgment. Apply the same cost-accuracy logic as elsewhere: whatever method meets the accuracy threshold for the task at hand.
57
+
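As an illustration of the "cheap pattern matching" option mentioned above, here is a minimal sketch that turns glossary aliases into a single regular expression. It assumes entries carry `canonical` and `aliases` fields as in `rules/glossary.json`; the function names and the sample text are hypothetical, and whether this approach meets the accuracy threshold remains a per-project judgment.

```python
import re

# Sample entry shaped like a rules/glossary.json record (trimmed to two fields).
GLOSSARY = [
    {
        "canonical": "registered_capital",
        "aliases": ["注册资本", "registered capital", "实收资本"],
    },
]


def build_alias_index(entries):
    """Return (compiled pattern, alias -> canonical map); pattern is None if empty."""
    alias_to_canonical = {}
    for entry in entries:
        for alias in entry["aliases"]:
            alias_to_canonical[alias.lower()] = entry["canonical"]
    if not alias_to_canonical:
        return None, alias_to_canonical
    # Longest aliases first, so a long alias wins over a shorter one it contains.
    alternatives = sorted(alias_to_canonical, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(a) for a in alternatives), re.IGNORECASE)
    return pattern, alias_to_canonical


def find_entities(text, pattern, alias_to_canonical):
    """Return (canonical name, start offset) for every alias found in text."""
    if pattern is None:
        return []
    hits = []
    for match in pattern.finditer(text):
        canonical = alias_to_canonical.get(match.group(0).lower())
        if canonical is not None:
            hits.append((canonical, match.start()))
    return hits
```

Every hit is reported under the canonical name regardless of which alias appeared in the text, which is what keeps downstream labels aligned with the schema.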
52
58
  ## Schema Design
53
59
 
54
60
  Define the expected output for each extraction. Keep it simple and JIT:
@@ -122,6 +122,17 @@ versions.json # Version manifest (workspace root)
122
122
  }
123
123
  ```
124
124
 
125
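+ ## Scheduled Ingestion in Production
126
+
127
+ Once a project is in production, new documents usually arrive on a fixed cadence: daily regulator publications, hourly API pulls, batch uploads from upstream systems. Register ingestion jobs with the `schedule_fetch` tool so that the OS scheduler can run them even when kc-beta is not running:
128
+
129
+ - Each job is a single shell command (rsync, curl, a custom script) that drops files into `$INPUT_DIR`.
130
+ - KC generates a wrapper script at `scripts/ingest/<job-id>.sh`; the user adds the corresponding line to their own crontab via `crontab -e`.
131
+ - Newly arrived files are automatically prefixed with `<job-id>_<UTC-timestamp>_`, so the filename itself tells you the source and the arrival time.
132
+ - Check status with `/schedule` or `schedule_fetch list`; the last few lines of `logs/ingest.log` show the most recent runs.
133
+
134
+ Discuss this cadence with the developer user during initialization: the production input cadence directly shapes how skills and workflows are written (batch vs. streaming, idempotency requirements, and so on).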
135
+
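The provenance prefix described above can be illustrated with a short sketch. The exact timestamp format used by the KC wrapper is not stated in this diff, so the compact ISO-8601 format below is an assumption, as are the function names and the requirement that job IDs contain no underscore.

```python
from datetime import datetime, timezone

# Assumed timestamp layout; the real wrapper may use a different one.
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def prefixed_name(job_id: str, original_name: str, arrived_at: datetime) -> str:
    """Build '<job-id>_<UTC-timestamp>_<original name>'."""
    stamp = arrived_at.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    return f"{job_id}_{stamp}_{original_name}"


def parse_prefixed_name(name: str) -> tuple[str, datetime, str]:
    """Split a prefixed filename back into (job id, arrival time, original name).

    Assumes the job id itself contains no underscore; the original name may.
    """
    job_id, stamp, original_name = name.split("_", 2)
    arrived_at = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return job_id, arrived_at, original_name
```

Parsing with `split("_", 2)` keeps any underscores in the original filename intact, so a failing batch can be traced back to one job and one arrival time from the filename alone.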
125
136
  ## When to Re-initialize
126
137
 
127
138
  Re-run this skill in the following situations:
@@ -167,8 +167,11 @@ IF current batch accuracy < WORKFLOW_ACCURACY:
167
167
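  5. Aggregate the review results
168
168
  6. Decide whether the evolution loop needs to be triggered
169
169
  7. Generate the QC report
170
+ 8. Move processed input documents to `input/archived/` via `archive_file`, so the next session sees only newly arrived batches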
170
171
  ```
171
172
 
173
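+ Production inputs usually arrive on a cadence (see the section on scheduled ingestion in production in `bootstrap-workspace`). Files in `input/` are automatically prefixed with `<job-id>_<UTC-timestamp>_` by the ingestion wrapper, so every batch's filenames carry provenance information. When a batch fails QC, the prefix helps you pinpoint which scheduled fetch went wrong.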
174
+
172
175
  ### Output Structure
173
176
 
174
177
  ```
@@ -235,6 +238,15 @@ logs/qc/
235
238
  }
236
239
  ```
237
240
 
241
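+ ## Two Kinds of Dashboards
242
+
243
+ The system has two independent dashboards:
244
+
245
+ - **Developer dashboard**: the `dashboard_render` tool, generated inside the workspace from `output/results/`, `logs/evolution/`, and `output/qc/`. Use it for your own auditing and for the developer user's day-to-day monitoring during the BUILD/DISTILL phases.
246
+ - **End-user dashboard**: the `render_dashboard.py` script shipped inside the release bundle (produced by the `release` tool). It targets non-developer recipients and renders from the results of a single `run.py` invocation, independent of the workspace.
247
+
248
+ After publishing a release, point end users to the dashboard inside the release bundle, not the workspace one. The workspace dashboard is your own developer view.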
249
+
238
250
  ## Developer User Involvement
239
251
 
240
252
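  Quality monitoring should not require the developer user to read JSON files. Generate visual reports through the dashboard skill, so the developer user only needs to focus on: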
@@ -104,6 +104,41 @@ Maintain a lightweight catalog of all extracted rules. This is your index, not t
104
104
 
105
105
  Format: a simple markdown table or JSON file. Do not over-engineer this. The catalog exists to give you and the developer user an overview of progress.
106
106
 
107
+ ## Project Glossary
108
+
109
+ Alongside the rule catalog, build a project glossary — a living vocabulary of the entities, terms, and patterns the verification system encounters. The glossary is what keeps entity names consistent across rules: without it, the same balance-sheet item might be named "注册资本", "registered capital", and "paid-in capital" by three different rule skills, breaking shared-entity matching and producing inconsistent extraction outputs.
110
+
111
+ The glossary is not frozen at the end of extraction. It is a living document. Update it when you discover new aliases in samples, when a worker LLM extraction reveals a variant phrasing, when corner cases surface unfamiliar terminology. Both the coding agent and any operator can edit it.
112
+
113
+ ### When to seed it
114
+
115
+ During rule extraction. As you decompose each rule, note the entities the rule references — capital ratios, signature pages, related-party transactions, dates, parties, monetary values. Seed the glossary with the canonical name and any aliases already visible in the source documents.
116
+
117
+ ### Storage and shape
118
+
119
+ Save as `rules/glossary.json` next to `catalog.json`. Each entry is small:
120
+
121
+ ```json
122
+ {
123
+ "canonical": "registered_capital",
124
+ "aliases": ["注册资本", "registered capital", "实收资本"],
125
+ "definition": "The capital amount registered with regulators",
126
+ "entity_type": "monetary_value",
127
+ "seen_in": ["rules/regulation_A.pdf:p12", "samples/annual_report_2024.pdf:p3"],
128
+ "status": "extracted"
129
+ }
130
+ ```
131
+
132
+ The `status` field tracks maturity: `extracted` (from rules), `validated` (confirmed in samples), `production` (used by deployed workflows). Add or drop fields as the project demands, following the same JIT philosophy as the rule schema.
133
+
134
+ ### How it integrates
135
+
136
+ - `rule-graph` consumes the glossary so `shares_entity` edges reference canonical labels rather than free-text strings.
137
+ - `entity-extraction` references the glossary for canonical names and known aliases when designing extraction logic.
138
+ - Skills authored under `skill-authoring` should use canonical names in their schemas.
139
+
140
+ How the glossary is used downstream is a per-project judgment. A mature glossary may enable cheap pattern-based matching for some entities; for others it just keeps naming consistent. Let the cost-accuracy logic in `entity-extraction` decide per case.
141
+
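Keeping the glossary "living" mostly means small, idempotent updates to entries. The sketch below assumes the entry shape shown above; the helper names are hypothetical, and the one-step promotion order (`extracted`, then `validated`, then `production`) is an assumption inferred from the status descriptions rather than a rule KC enforces.

```python
# Assumed maturity order, inferred from the status descriptions.
STATUS_ORDER = ["extracted", "validated", "production"]


def add_alias(entry: dict, alias: str, seen_in: str) -> dict:
    """Record a newly discovered alias and where it was seen, without duplicates."""
    if alias not in entry["aliases"]:
        entry["aliases"].append(alias)
    if seen_in not in entry["seen_in"]:
        entry["seen_in"].append(seen_in)
    return entry


def promote(entry: dict) -> dict:
    """Move the entry one step up the maturity ladder; 'production' is terminal."""
    current = STATUS_ORDER.index(entry["status"])
    entry["status"] = STATUS_ORDER[min(current + 1, len(STATUS_ORDER) - 1)]
    return entry
```

Because both helpers are idempotent at their limits (a repeated alias is ignored, and `production` stays `production`), the coding agent and an operator can apply the same update twice without corrupting the entry.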
107
142
  ## Handling Ambiguity
108
143
 
109
144
  Regulations are often ambiguous. When you encounter ambiguity:
@@ -87,6 +87,22 @@ description: Build and maintain a graph of relationships between verification ru
87
87
 
88
88
  The graph links these rules to their shared corner cases, so a fix in one place is visible everywhere it matters.
89
89
 
90
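+ ## Project Glossary
91
+
92
+ The glossary is built and maintained by `rule-extraction` and stored at `rules/glossary.json`. It is the registry of canonical labels, and `shares_entity` edges depend on it entirely. Without a glossary, two rules may target the same entity under different names, and the edge between them never gets drawn.
93
+
94
+ Edges that involve entities should reference the canonical label from the glossary, not free text copied from rule descriptions: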
95
+
96
+ ```json
97
+ {"from": "R001", "to": "R004", "type": "shares_entity", "entity": "registered_capital"}
98
+ ```
99
+
100
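+ Here `registered_capital` is the canonical name in `glossary.json`, and `注册资本`, `paid-in capital`, and `实收资本` are recorded as aliases under that entry.
101
+
102
+ When the glossary is updated (a new alias is discovered, two entries are merged, a definition is revised), go back and check the affected `shares_entity` edges. A new alias can surface cross-rule relationships that were previously hidden; merged entries collapse parallel edges into one.
103
+
104
+ The glossary is built and owned by rule-extraction; rule-graph is only a consumer.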
105
+
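The effect of merging two glossary entries on `shares_entity` edges can be sketched as follows. This is an illustrative example only: the function name, the `merged` mapping (retired canonical label to surviving label), and the `paid_in_capital` label are hypothetical, and edges are treated as directed exactly as written.

```python
def canonicalize_edges(edges: list[dict], merged: dict[str, str]) -> list[dict]:
    """Rewrite retired entity labels and drop edges that become duplicates.

    `merged` maps a retired canonical label to the label that survived the merge.
    """
    seen = set()
    result = []
    for edge in edges:
        entity = merged.get(edge["entity"], edge["entity"])
        key = (edge["from"], edge["to"], edge["type"], entity)
        if key in seen:
            continue  # a parallel edge collapsed into one we already kept
        seen.add(key)
        result.append({**edge, "entity": entity})
    return result
```

After a merge, two edges between the same pair of rules that previously named different labels resolve to one label and collapse into a single edge.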
90
106
  ## Four Uses
91
107
 
92
108
  ### 1. Impact Analysis
@@ -135,6 +135,14 @@ The coding agent's skill-based results are the ground truth. For each document i
135
135
 
136
136
  Each iteration of a workflow is a new version file: `workflow_v1.py`, `workflow_v2.py`, etc. Track which version is active in `config.json`. See `version-control` skill for the full methodology.
137
137
 
138
+ ## Releasing Workflows
139
+
140
+ Once workflows hit the accuracy threshold, they can be packaged for end users via the `release` tool. Each release is a self-contained directory under `output/releases/<slug>/` with the pinned workflows, a Python runner, a confidence scorer, an HTML dashboard generator, and a `serve.sh` helper. The bundle has no kc-beta dependency: anyone with Python and a worker LLM API key can run `python run.py <doc>` and produce verification results.
141
+
142
+ What to include is your call: all rules in the catalog or a curated subset via the `include` parameter, and optionally 1-3 representative samples bundled as `fixtures/` so the recipient can dry-run without their own data.
143
+
144
+ The `release` tool snapshots the workspace first (git tag `snap/release-<slug>`), so the bundle is regenerable from git even if `output/releases/` is later cleaned. Decide when to release — there's no automation, no forced cadence. Typical triggers: workflows reach SKILL/WORKFLOW_ACCURACY thresholds, a stakeholder needs a hand-off, a production cron should run pinned versions instead of latest. Discuss with the developer user.
145
+
138
146
  ## Cost Tracking
139
147
 
140
148
  Track the cost of each workflow run:
@@ -168,6 +168,22 @@ description: Decompose each verification rule into independent sub-tasks and ass
168
168
  | `template` | Template filling (annotation generation, etc.) |
169
169
  | `manual_review` | Manual review |
170
170
 
171
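+ ## Multi-Agent Coordination: Don't Use Locks
172
+
173
+ If a task is large enough that you plan to spawn multiple sub-agents in parallel with `agent_tool`, shard it by independent unit (one sub-agent per rule, one sub-agent per document) so that sub-agents never need to coordinate through shared mutable files.
174
+
175
+ A cautionary tale from a neighboring team: they made all sub-agents equal peers that claimed tasks through a shared coordination file, with locks to prevent double-claiming. Two kinds of failure inevitably followed:
176
+
177
+ 1. Locks were held too long, or never released at all. Even when the locking mechanism worked, the throughput of twenty agents dropped to that of two or three, because most of the time was spent waiting.
178
+ 2. The system was fragile: an agent could crash while holding a lock, re-acquire a lock it already held, or update the coordination file without acquiring the lock at all.
179
+
180
+ KC prefers two patterns:
181
+
182
+ - **Single scheduler**: `TaskManager` hands out one task at a time to the main conductor. No locks, no peer coordination. This is the default architecture of the ralph-loop.
183
+ - **Shard by unit**: when spawning sub-agents with `agent_tool`, each sub-agent owns exactly one non-overlapping slice (one rule, one document). Sub-agent state is written under its own `sub_agents/<taskId>/`, and shared artifacts (`rule_skills/<id>/`, `workflows/<id>/`) are separated by per-rule path. The git auto-commit from Block 11 serializes writes to shared paths, and per-rule sharding makes last-write-wins a non-issue.
184
+
185
+ If two supposedly parallel sub-agents must communicate with each other to make progress, they should really be a single task (run sequentially) or a pipeline (the parent agent dispatches A, then B), not parallel peers.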
186
+
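The shard-by-unit pattern can be illustrated with plain Python threads standing in for sub-agents. This is a sketch under stated assumptions: `process_rule`, `run_sharded`, and the `result.txt` filename are hypothetical, and real sub-agents would be spawned through `agent_tool` rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_rule(rule_id: str, workspace: Path) -> Path:
    """Stand-in for one sub-agent: writes only under its own rule directory."""
    out_dir = workspace / "rule_skills" / rule_id
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "result.txt"
    out_file.write_text(f"processed {rule_id}\n", encoding="utf-8")
    return out_file


def run_sharded(rule_ids: list[str], workspace: Path, max_workers: int = 4) -> list[Path]:
    """Give each distinct rule to exactly one worker; no locks, no shared file."""
    unique_ids = sorted(set(rule_ids))  # de-duplicate so slices never overlap
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda rid: process_rule(rid, workspace), unique_ids))
```

No worker ever reads or writes another worker's directory, so there is nothing to lock; the only coordination is the parent handing out non-overlapping rule IDs up front.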
171
187
  ## Anti-Patterns
172
188
 
173
189
  ### The LLM-as-Cure-All Fallacy
@@ -5,6 +5,19 @@ description: Manage versioning of skills, workflows, prompts, and system configu
5
5
 
6
6
  # Version Control and Artifact Provenance
7
7
 
8
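+ ## Git Is the Single Source of Truth
9
+
10
+ The workspace is a git repository. Every write to a tracked path (skills, workflows, rules, glossary, AGENT.md, tasks.json) is auto-committed by KC, with a trace ID in the commit message. This means:
11
+
12
+ - `git log --oneline` is the timeline of every meaningful change in this session.
13
+ - `git diff HEAD~3 -- rule_skills/R001/` shows how a skill changed across the last three meaningful writes.
14
+ - `git checkout HEAD~5 -- workflows/R001/` rolls back a workflow without affecting anything else.
15
+ - The `snapshot` tool tags moments worth remembering (releases, "before a risky operation"); restore with `git checkout snap/<label>`.
16
+
17
+ Run git commands directly through `sandbox_exec` with `cwd: "workspace"`. Don't work around git; it is the audit trail itself.
18
+
19
+ The conventions below (per-version filename copies such as `workflow_v1.py` and `workflow_v3.py`, plus CHANGELOG.md) are still useful, but their role is *human readability inside a single skill folder*: they let the agent compare versions directly without digging through git history every time. The system of record is git, not the deprecated `versions.json` manifest (which is no longer written for new workspaces).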
20
+
8
21
  ## Design Goals
9
22
 
10
23
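  This version control mechanism is not designed for multi-person collaboration; in this system, the coding agent is the only actor. The purpose of version control is: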
@@ -0,0 +1,22 @@
1
+ # Auto-installed by KC at session start. Do not commit secrets, runtime
2
+ # noise, or user source documents — git tracks KC's outputs only.
3
+
4
+ # Secrets
5
+ .env
6
+
7
+ # Runtime / session noise
8
+ logs/
9
+ sub_agents/
10
+ session-state.json
11
+ .DS_Store
12
+ *.log
13
+
14
+ # User-provided source documents (read in place, not tracked)
15
+ samples/
16
+
17
+ # High-volume IO
18
+ input/
19
+ output/
20
+
21
+ # Deprecated metadata manifest (old workspaces only — replaced by git)
22
+ versions.json