kc-beta 0.7.2 → 0.7.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -216,28 +216,35 @@ Quality Thresholds, Language.
  
  ## Status
  
- **v0.6.0 — first architectural beta.** This release lands:
+ **v0.7.3 — codex review patch release.** Latest line in the v0.7.x
+ hardening track. Architectural payload from v0.6.0+ is still in place:
  
  - Parallel ralph-loop (up to 8 concurrent workers) with a heap-safety
    conformance gate
  - Native chunker + RAG (onion-peeler + CJK bigram keyword index +
    one-shot LLM bundle classifier, ported from the AMC verification app)
- - Source-context auto-attach on skill_authoring tasks (rule NL + evidence
-   chunks + sibling rules injected into the prompt, no manual search needed)
+ - Agent-owned task board: the agent reads the rule list from
+   `describeState`, decides decomposition (per-rule / grouped / range),
+   and calls `TaskCreate` / `TaskUpdate` / `TaskComplete` to drive the
+   Ralph loop. Source-context auto-attach pulls rule NL + evidence chunks
+   + sibling rules into the prompt of each task as it runs.
  - Workspace file locking for shared coordination files (`rules/catalog.json`,
-   `rules/manifest.json`, `tasks.json`, etc.)
+   `rules/manifest.json`, `refs/manifest.json`, `tasks.json`,
+   `session-state.json`) — every writer goes through `withFileLock`.
  - `agent_tool` gets `wait` / `poll` / `list` / `kill` operations +
    `stale_subagents` phase-advance signal
- - New FINALIZATION phase packages the session into a shippable deliverable
+ - FINALIZATION phase packages the session into a shippable deliverable
    (canonical `rule_skills/` layout + README + coverage report + final
    dashboard)
+ - Filesystem-derived phase milestones (v0.7.0+): the engine reads disk
+   artifacts for advance criteria, never trusts tool-call assertions
  - Input stays active during streaming (type-ahead queue), arrow keys +
    history recall, CTX smoothing + peak, per-provider context-limit caps,
    `/tools`, `/parallelism`, and more
  
- See [DEV_LOG.md](./DEV_LOG.md) for the full v0.6.0 change breakdown and
- [docs/update_design_v5.md](./docs/update_design_v5.md) for the plan that
- drove it.
+ See [DEV_LOG.md](./DEV_LOG.md) for the per-release change breakdowns and
+ [docs/update_design_v7.md](./docs/update_design_v7.md) for the v0.7.x
+ plan and patch notes.
  
  Bug reports and PRs welcome at <https://github.com/kitchen-engineer42/kc-cli>.
  
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "kc-beta",
-   "version": "0.7.2",
+   "version": "0.7.3",
    "description": "KC Agent — LLM document verification agent (pure Node.js CLI). Dual-licensed: PolyForm Noncommercial 1.0.0 for personal/noncommercial use; commercial license required for enterprise production. See LICENSE and LICENSE-COMMERCIAL.md.",
    "type": "module",
    "bin": {
@@ -37,6 +37,7 @@ import { EvolutionCycleTool } from "./tools/evolution-cycle.js";
  import { TierDowngradeTool } from "./tools/tier-downgrade.js";
  import { AgentTool } from "./tools/agent-tool.js";
  import { WebSearchTool } from "./tools/web-search.js";
+ import { TaskCreateTool, TaskUpdateTool, TaskCompleteTool } from "./tools/task-board.js";
  import { SkillLoader } from "./skill-loader.js";
  import { TaskManager } from "./task-manager.js";
  import { Scheduler } from "./scheduler.js";
@@ -475,6 +476,16 @@ export class AgentEngine {
          () => this.currentPhase,
        ),
        new WebSearchTool(this.config.tavilyApiKey),
+       // v0.7.3: completes the v0.7.0 "agent owns TaskBoard" design.
+       // Skills already reference TaskCreate by name; these tools make
+       // that contract truthful. See task-board.js + work-decomposition
+       // SKILL.md. Skipped for subagents — they don't own a task board
+       // (taskManager is null in subagent scope, line 216).
+       ...(this.taskManager ? [
+         new TaskCreateTool(this.workspace, this.taskManager),
+         new TaskUpdateTool(this.workspace, this.taskManager),
+         new TaskCompleteTool(this.workspace, this.taskManager),
+       ] : []),
      ],
      // Distillation+ only (DISTILL mode)
      distill: [
@@ -1308,6 +1319,7 @@ export class AgentEngine {
      yield new AgentEvent({
        type: "tool_result",
        name: tc.name,
+       input: inputData,
        output: historyContent,
        isError: result.isError,
      });
@@ -94,7 +94,7 @@ export class CopyToWorkspaceTool extends BaseTool {
      this._appendGitignore(`refs/${targetName}`);
    }
  
-   this._appendManifest({
+   await this._appendManifest({
      target: targetRel,
      source: sourcePath,
      size: stat.size,
@@ -113,17 +113,22 @@ export class CopyToWorkspaceTool extends BaseTool {
    );
  }
  
- _appendManifest(entry) {
-   const manifestAbs = this._workspace.resolvePath(MANIFEST_REL);
-   fs.mkdirSync(path.dirname(manifestAbs), { recursive: true });
-   let entries = [];
-   if (fs.existsSync(manifestAbs)) {
-     try { entries = JSON.parse(fs.readFileSync(manifestAbs, "utf-8")); }
-     catch { entries = []; }
-   }
-   if (!Array.isArray(entries)) entries = [];
-   entries.push(entry);
-   fs.writeFileSync(manifestAbs, JSON.stringify(entries, null, 2), "utf-8");
+ async _appendManifest(entry) {
+   // v0.7.3: refs/manifest.json is a shared coordination path — wrap the
+   // whole read-modify-write under the workspace lock so two parallel
+   // copy_to_workspace calls (main agent + subagent) don't lose entries.
+   return await this._workspace.withSharedLockIfApplicable(MANIFEST_REL, () => {
+     const manifestAbs = this._workspace.resolvePath(MANIFEST_REL);
+     fs.mkdirSync(path.dirname(manifestAbs), { recursive: true });
+     let entries = [];
+     if (fs.existsSync(manifestAbs)) {
+       try { entries = JSON.parse(fs.readFileSync(manifestAbs, "utf-8")); }
+       catch { entries = []; }
+     }
+     if (!Array.isArray(entries)) entries = [];
+     entries.push(entry);
+     fs.writeFileSync(manifestAbs, JSON.stringify(entries, null, 2), "utf-8");
+   });
  }
  
  _appendGitignore(line) {
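The `_appendManifest` hunk above serializes a read-modify-write under the workspace lock. As a reviewer's aside, the lost-update hazard it closes can be sketched with an in-memory stand-in: the promise-chain mutex below is illustrative only, not kc-beta's `withSharedLockIfApplicable` implementation, and the names are made up for the sketch.

```javascript
// Minimal sketch of serializing read-modify-write pairs, analogous in
// spirit to wrapping _appendManifest in the workspace file lock.
// All names here are illustrative, not taken from kc-beta's source.
function makeLock() {
  let tail = Promise.resolve();
  return (fn) => {
    const run = tail.then(fn);
    tail = run.catch(() => {}); // keep the chain alive after errors
    return run;
  };
}

const withLock = makeLock();
const manifest = []; // stands in for refs/manifest.json

// Two "parallel" writers: each reads the current length, then appends.
// Without the lock, both could observe the same snapshot and one entry
// could be clobbered; the lock forces the pairs to run back-to-back.
async function appendEntry(entry) {
  return withLock(async () => {
    const snapshot = manifest.length;            // read
    await Promise.resolve();                     // yield, as real fs I/O would
    manifest.push({ ...entry, seq: snapshot });  // modify-write
  });
}

const done = Promise.all([
  appendEntry({ target: "refs/a.md" }),
  appendEntry({ target: "refs/b.md" }),
]).then(() => manifest.map((e) => e.seq));
```

With the lock, the second writer always observes the first writer's entry, so the sequence numbers come out distinct.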
@@ -44,7 +44,10 @@ export class SandboxExecTool extends BaseTool {
      "Execute a shell command. " +
      "cwd='workspace' (default) runs in KC's workspace. " +
      "cwd='project' runs in the user's project directory. " +
-     "Pipes, redirects, and chained commands (&&) are supported."
+     "Pipes, redirects, and chained commands (&&) are supported. " +
+     "stdout + stderr combined are capped at 10,000 chars; longer output is truncated. " +
+     "For reading individual files larger than ~10 KB (e.g. regulation documents), " +
+     "prefer workspace_file (operation=read) which has a larger 50 KB cap."
    );
  }
  
@@ -0,0 +1,194 @@
+ import { BaseTool, ToolResult } from "./base.js";
+ 
+ const TASKS_REL = "tasks.json";
+ 
+ /**
+  * v0.7.3 — TaskCreate / TaskUpdate / TaskComplete tools.
+  *
+  * Completes the v0.7.0 "agent owns TaskBoard" design. The engine no longer
+  * auto-populates per-rule tasks on phase entry (PER_RULE_PHASES is empty by
+  * default — see task-manager.js); the agent reads the rule list via
+  * describeState, picks a decomposition (single / grouped / range / non-rule),
+  * and calls these tools to populate tasks.json. The Ralph loop in
+  * AgentEngine._runTaskLoopSerial then walks pending tasks one at a time.
+  *
+  * Skill teaching for these tools lives in
+  * template/skills/{en,zh}/meta-meta/work-decomposition/SKILL.md.
+  *
+  * tasks.json is a shared-coordination path (workspace.js
+  * SHARED_COORDINATION_PATHS) — every write goes through
+  * withSharedLockIfApplicable so two writers (main + subagent) serialize.
+  */
+ 
+ export class TaskCreateTool extends BaseTool {
+   constructor(workspace, taskManager) {
+     super();
+     this._workspace = workspace;
+     this._taskManager = taskManager;
+   }
+ 
+   get name() { return "TaskCreate"; }
+ 
+   get description() {
+     return (
+       "Add a task to the session task board. Tasks gate the Ralph loop — " +
+       "after the current turn ends, the engine pulls the next pending task " +
+       "and runs it. Use one task per unit of work you want to iterate on " +
+       "(per-rule, per-group, per-document — your decomposition). " +
+       "Call this on phase entry after reading describeState."
+     );
+   }
+ 
+   get inputSchema() {
+     return {
+       type: "object",
+       properties: {
+         id: {
+           type: "string",
+           description: "Unique task ID within this session (e.g. 'R001-skill_authoring' or 'group-trust-1').",
+         },
+         title: {
+           type: "string",
+           description: "Short human-readable title for the task.",
+         },
+         phase: {
+           type: "string",
+           description: "Phase this task belongs to (e.g. 'skill_authoring', 'skill_testing', 'distillation').",
+         },
+         ruleId: {
+           type: "string",
+           description: "Optional rule_id if this is a per-rule task. Omit for grouped or non-rule tasks.",
+         },
+       },
+       required: ["id", "title", "phase"],
+     };
+   }
+ 
+   async execute(input) {
+     const id = input.id || "";
+     const title = input.title || "";
+     const phase = input.phase || "";
+     const ruleId = input.ruleId || null;
+ 
+     if (!id) return new ToolResult("id required", true);
+     if (!title) return new ToolResult("title required", true);
+     if (!phase) return new ToolResult("phase required", true);
+ 
+     return await this._workspace.withSharedLockIfApplicable(TASKS_REL, () => {
+       const before = this._taskManager.getAllTasks().some((t) => t.id === id);
+       this._taskManager.addTask({ id, title, phase, ruleId });
+       if (before) {
+         return new ToolResult(`Task ${id} already existed (no-op).`);
+       }
+       const p = this._taskManager.progress;
+       return new ToolResult(
+         `Task ${id} created. Board: ${p.pending} pending, ${p.inProgress} in_progress, ${p.completed} completed.`,
+       );
+     });
+   }
+ }
+ 
+ export class TaskUpdateTool extends BaseTool {
+   constructor(workspace, taskManager) {
+     super();
+     this._workspace = workspace;
+     this._taskManager = taskManager;
+   }
+ 
+   get name() { return "TaskUpdate"; }
+ 
+   get description() {
+     return (
+       "Update a task's status and optional summary. Status: 'pending', " +
+       "'in_progress', 'completed', or 'failed'. Use TaskComplete instead " +
+       "for the common case of marking a task done with a summary."
+     );
+   }
+ 
+   get inputSchema() {
+     return {
+       type: "object",
+       properties: {
+         id: { type: "string", description: "Task ID to update." },
+         status: {
+           type: "string",
+           enum: ["pending", "in_progress", "completed", "failed"],
+           description: "New status for the task.",
+         },
+         summary: {
+           type: "string",
+           description: "Optional short summary (e.g. why the task failed, what was produced).",
+         },
+       },
+       required: ["id"],
+     };
+   }
+ 
+   async execute(input) {
+     const id = input.id || "";
+     const status = input.status;
+     const summary = input.summary;
+ 
+     if (!id) return new ToolResult("id required", true);
+ 
+     return await this._workspace.withSharedLockIfApplicable(TASKS_REL, () => {
+       const exists = this._taskManager.getAllTasks().some((t) => t.id === id);
+       if (!exists) return new ToolResult(`Task ${id} not found.`, true);
+       this._taskManager.updateTask(id, { status, summary });
+       const p = this._taskManager.progress;
+       return new ToolResult(
+         `Task ${id} updated${status ? ` to ${status}` : ""}. ` +
+         `Board: ${p.pending} pending, ${p.inProgress} in_progress, ${p.completed} completed, ${p.failed} failed.`,
+       );
+     });
+   }
+ }
+ 
+ export class TaskCompleteTool extends BaseTool {
+   constructor(workspace, taskManager) {
+     super();
+     this._workspace = workspace;
+     this._taskManager = taskManager;
+   }
+ 
+   get name() { return "TaskComplete"; }
+ 
+   get description() {
+     return (
+       "Mark a task as completed with an optional summary. Sugar for " +
+       "TaskUpdate({id, status: 'completed', summary}). The Ralph loop " +
+       "advances to the next pending task after this returns."
+     );
+   }
+ 
+   get inputSchema() {
+     return {
+       type: "object",
+       properties: {
+         id: { type: "string", description: "Task ID to complete." },
+         summary: {
+           type: "string",
+           description: "Optional short summary of what was produced.",
+         },
+       },
+       required: ["id"],
+     };
+   }
+ 
+   async execute(input) {
+     const id = input.id || "";
+     const summary = input.summary;
+ 
+     if (!id) return new ToolResult("id required", true);
+ 
+     return await this._workspace.withSharedLockIfApplicable(TASKS_REL, () => {
+       const exists = this._taskManager.getAllTasks().some((t) => t.id === id);
+       if (!exists) return new ToolResult(`Task ${id} not found.`, true);
+       this._taskManager.markDone(id, summary);
+       const p = this._taskManager.progress;
+       return new ToolResult(
+         `Task ${id} completed. Board: ${p.pending} pending, ${p.inProgress} in_progress, ${p.completed} completed.`,
+       );
+     });
+   }
+ }
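To make the contract of the new file concrete: the sketch below exercises a TaskCreate-shaped flow with stub stand-ins for the workspace and task manager. The stub classes and the free-standing `taskCreate` function are invented for this sketch; the real classes live in workspace.js / task-manager.js and may behave differently in detail.

```javascript
// Self-contained sketch of the TaskCreate contract above, with hypothetical
// stubs in place of kc-beta's Workspace and TaskManager.
class StubWorkspace {
  // The real implementation locks shared coordination paths; the stub
  // just runs the callback and resolves with its result.
  withSharedLockIfApplicable(_rel, fn) { return Promise.resolve(fn()); }
}

class StubTaskManager {
  constructor() { this._tasks = []; }
  getAllTasks() { return this._tasks; }
  addTask(t) {
    // Assumption for the sketch: duplicate ids are a no-op.
    if (!this._tasks.some((x) => x.id === t.id)) {
      this._tasks.push({ ...t, status: "pending" });
    }
  }
  get progress() {
    const by = (s) => this._tasks.filter((t) => t.status === s).length;
    return { pending: by("pending"), inProgress: by("in_progress"), completed: by("completed") };
  }
}

// Mirrors TaskCreateTool.execute's flow: validate, lock, detect duplicates.
async function taskCreate(workspace, tm, input) {
  const { id, title, phase } = input;
  if (!id) return { text: "id required", isError: true };
  return workspace.withSharedLockIfApplicable("tasks.json", () => {
    const existed = tm.getAllTasks().some((t) => t.id === id);
    tm.addTask({ id, title, phase, ruleId: input.ruleId ?? null });
    if (existed) return { text: `Task ${id} already existed (no-op).`, isError: false };
    const p = tm.progress;
    return { text: `Task ${id} created. Board: ${p.pending} pending.`, isError: false };
  });
}
```

Calling it twice with the same id shows the idempotent-create behavior the tool description promises: the second call reports a no-op and the board still holds one task.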
@@ -30,7 +30,9 @@ export class WorkspaceFileTool extends BaseTool {
      "Read, write, or list files. " +
      "scope='workspace' (default): KC's working directory for rules, skills, workflows, results. " +
      "scope='project': the user's project folder where KC was launched — source regulations and samples live here. " +
-     "Operations: read (returns file content), write (creates/overwrites a file), list (shows directory contents)."
+     "Operations: read (returns file content), write (creates/overwrites a file), list (shows directory contents). " +
+     "read returns up to 50,000 chars per call; longer files are truncated. " +
+     "For full reads of regulation/rule documents (typically smaller than this cap), prefer this tool over sandbox_exec."
    );
  }
  
@@ -87,7 +89,7 @@ export class WorkspaceFileTool extends BaseTool {
  
    try {
      if (op === "read") return this._read(filePath, scope);
-     if (op === "write") return this._write(filePath, content, scope);
+     if (op === "write") return await this._write(filePath, content, scope);
      if (op === "list") return this._list(filePath, scope);
      return new ToolResult(`Unknown operation: ${op}`, true);
    } catch (err) {
@@ -107,56 +109,68 @@ export class WorkspaceFileTool extends BaseTool {
    return new ToolResult(text);
  }
  
- _write(filePath, content, scope) {
+ async _write(filePath, content, scope) {
    if (!filePath || filePath === ".") {
      return new ToolResult("Path required for write operation", true);
    }
    const resolved = this._resolveForScope(filePath, scope);
-   fs.mkdirSync(path.dirname(resolved), { recursive: true });
- 
-   // v0.7.0 Group M (#84 remainder): on case-insensitive filesystems
-   // (macOS/Windows defaults), warn when the target's basename collides
-   // with an existing sibling differing only in case. Write proceeds
-   // — agents may legitimately overwrite — but the agent gets visible
-   // signal so it doesn't end up confused like E2E #5 GLM ("SKILL.md
-   // disappeared" when the inode was shared with skill.md). Workspace-
-   // scope only; project-dir scope is the user's territory.
-   let collisionNote = "";
-   if (
-     scope === "workspace" &&
-     this._workspace.fsCaseSensitive === false
-   ) {
-     try {
-       const parent = path.dirname(resolved);
-       const targetBase = path.basename(resolved);
-       const targetLower = targetBase.toLowerCase();
-       const siblings = fs.readdirSync(parent);
-       const collision = siblings.find(
-         (s) => s !== targetBase && s.toLowerCase() === targetLower,
-       );
-       if (collision) {
-         collisionNote =
-           ` ⚠ case-collision: case-insensitive filesystem already has '${collision}'` +
-           ` at this path; both names resolve to the same inode. Pick one canonical case` +
-           ` (lowercase preferred for skill files) and use it consistently — otherwise` +
-           ` archive_file / Read on either name affects the other.`;
-       }
-     } catch { /* readdirSync may fail on a fresh dir; that's fine, no collision possible */ }
-   }
  
+   const doWrite = () => {
+     fs.mkdirSync(path.dirname(resolved), { recursive: true });
+ 
+     // v0.7.0 Group M (#84 remainder): on case-insensitive filesystems
+     // (macOS/Windows defaults), warn when the target's basename collides
+     // with an existing sibling differing only in case. Write proceeds
+     // — agents may legitimately overwrite — but the agent gets visible
+     // signal so it doesn't end up confused like E2E #5 GLM ("SKILL.md
+     // disappeared" when the inode was shared with skill.md). Workspace-
+     // scope only; project-dir scope is the user's territory.
+     let collisionNote = "";
+     if (
+       scope === "workspace" &&
+       this._workspace.fsCaseSensitive === false
+     ) {
+       try {
+         const parent = path.dirname(resolved);
+         const targetBase = path.basename(resolved);
+         const targetLower = targetBase.toLowerCase();
+         const siblings = fs.readdirSync(parent);
+         const collision = siblings.find(
+           (s) => s !== targetBase && s.toLowerCase() === targetLower,
+         );
+         if (collision) {
+           collisionNote =
+             ` ⚠ case-collision: case-insensitive filesystem already has '${collision}'` +
+             ` at this path; both names resolve to the same inode. Pick one canonical case` +
+             ` (lowercase preferred for skill files) and use it consistently — otherwise` +
+             ` archive_file / Read on either name affects the other.`;
+         }
+       } catch { /* readdirSync may fail on a fresh dir; that's fine, no collision possible */ }
+     }
+ 
+     fs.writeFileSync(resolved, content, "utf-8");
+ 
+     // Auto-commit to git for workspace writes (silently no-ops if gitignored or git unavailable)
+     let traceId = null;
+     if (scope === "workspace") {
+       traceId = this._workspace.autoCommit(filePath, "update");
+     }
+ 
+     const label = scope === "project" ? `[project] ${filePath}` : filePath;
+     let msg = `Wrote ${content.length} chars to ${label}`;
+     if (traceId) msg += ` [trace: ${traceId}]`;
+     if (collisionNote) msg += collisionNote;
+     return new ToolResult(msg);
+   };
  
-   fs.writeFileSync(resolved, content, "utf-8");
- 
-   // Auto-commit to git for workspace writes (silently no-ops if gitignored or git unavailable)
-   let traceId = null;
+   // v0.7.3: route writes to shared coordination paths (rules/catalog.json,
+   // tasks.json, refs/manifest.json, etc.) through the workspace lock so
+   // concurrent writers serialize. No-op for non-shared paths and for
+   // project-scope writes (project dir is the user's, not shared engine state).
    if (scope === "workspace") {
-     traceId = this._workspace.autoCommit(filePath, "update");
+     return await this._workspace.withSharedLockIfApplicable(filePath, doWrite);
    }
- 
-   const label = scope === "project" ? `[project] ${filePath}` : filePath;
-   let msg = `Wrote ${content.length} chars to ${label}`;
-   if (traceId) msg += ` [trace: ${traceId}]`;
-   if (collisionNote) msg += collisionNote;
-   return new ToolResult(msg);
+   return doWrite();
  }
  
  _list(filePath, scope) {
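Both tool-description hunks in this release document output caps: 10,000 chars for `sandbox_exec` and 50,000 for `workspace_file` reads. The truncation shape they describe can be sketched as below; the helper name is hypothetical, though the `[truncated]` marker does appear in this release's skill text.

```javascript
// Hypothetical helper mirroring the two caps this diff documents.
// Not lifted from kc-beta's source — an illustration of the behavior
// the tool descriptions promise.
const SANDBOX_CAP = 10_000; // sandbox_exec: stdout + stderr combined
const READ_CAP = 50_000;    // workspace_file operation=read

function capOutput(text, cap) {
  return text.length <= cap ? text : text.slice(0, cap) + "\n[truncated]";
}
```

This is why the skill docs later in the diff say to switch tools on truncation rather than scroll with `head`/`tail`: a regulation over 10 KB fits comfortably under the 50 KB read cap.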
package/src/config.js CHANGED
@@ -90,10 +90,12 @@ export function loadSettings(workspacePath) {
    tier3: env.TIER3 || gc.tiers?.tier3 || "",
    tier4: env.TIER4 || gc.tiers?.tier4 || "",
  
-   // VLM tiers (vision/OCR models)
-   vlmTier1: env.VLM_TIER1 || gc.vlm_tiers?.tier1 || "",
-   vlmTier2: env.VLM_TIER2 || gc.vlm_tiers?.tier2 || "",
-   vlmTier3: env.VLM_TIER3 || gc.vlm_tiers?.tier3 || "",
+   // VLM tiers (vision/OCR models). v0.7.3: accept OCR_MODEL_TIER* as
+   // alias since template/.env.template + initializer.js seed that name.
+   // VLM_TIER* takes precedence when both are set.
+   vlmTier1: env.VLM_TIER1 || env.OCR_MODEL_TIER1 || gc.vlm_tiers?.tier1 || "",
+   vlmTier2: env.VLM_TIER2 || env.OCR_MODEL_TIER2 || gc.vlm_tiers?.tier2 || "",
+   vlmTier3: env.VLM_TIER3 || env.OCR_MODEL_TIER3 || gc.vlm_tiers?.tier3 || "",
  
    // Worker LLM — optional, defaults to conductor config (process.env wins)
    workerProvider: penv.KC_WORKER_PROVIDER || gc.worker_provider || "",
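The alias precedence the config.js hunk establishes can be stated as a small function. The function name is invented for illustration; config.js inlines this chain per field rather than factoring it out.

```javascript
// Sketch of the v0.7.3 alias precedence: VLM_TIER* beats OCR_MODEL_TIER*,
// which beats the global-config vlm_tiers entry, which falls back to "".
// Hypothetical helper — config.js writes the || chain inline per tier.
function resolveVlmTier(env, gc, n) {
  return (
    env[`VLM_TIER${n}`] ||
    env[`OCR_MODEL_TIER${n}`] ||
    gc.vlm_tiers?.[`tier${n}`] ||
    ""
  );
}
```

So an `.env` seeded by the template with only `OCR_MODEL_TIER1` now resolves, while an explicit `VLM_TIER1` still wins when both names are set.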
@@ -23,6 +23,13 @@ skills/ — Meta skills encoding verification methodology
  .env — Configuration: API keys, model tiers, thresholds, language
  ```
  
+ Note: KC's session workspace under `~/.kc_agent/workspaces/<sessionId>/`
+ uses lowercase counterparts (`rules/`, `samples/`, `input/`, `output/`,
+ `logs/`, `workflows/`, `rule_skills/`) — these are runtime-internal and
+ separate from this project's user-facing folders above. The asymmetry
+ is intentional: title-case for human-facing project dirs, lowercase for
+ KC's working state.
+ 
  ## Your Mission
  
  Follow this lifecycle. Each step references the skill(s) to consult:
@@ -93,6 +100,12 @@ skills/ — 编码核查方法论的元技能
  .env — 配置:API密钥、模型层级、阈值、语言
  ```
  
+ 注:KC 在 `~/.kc_agent/workspaces/<sessionId>/` 下的会话工作区使用
+ 小写对应目录(`rules/`、`samples/`、`input/`、`output/`、`logs/`、
+ `workflows/`、`rule_skills/`)—— 这些是运行时内部目录,与本项目上面
+ 那些用户可见的目录是分开的。这种大小写不对称是有意的:项目里给人看
+ 的目录用首字母大写;KC 自己的工作状态用小写。
+ 
  ## 你的使命
  
  遵循以下生命周期。每一步标注了需要参考的技能:
@@ -133,6 +133,65 @@ conversation or existing catalog. Therefore, when composing the brief:
  catalog.json.** rule_catalog uses workspace file locking;
  sandbox_exec bypasses it and races with other writers.
  
+ ## How to read regulation files (default: read whole)
+ 
+ Regulations are the audit's authoritative basis. Every `source_ref`
+ in your extracted rules must be verifiable against the source text.
+ For typical regulation documents (a single file under ~50 KB / under
+ ~100 pages), **read each regulation file whole using `workspace_file`
+ (operation=read) in a single call**:
+ 
+ ```js
+ workspace_file({ operation: "read", scope: "project", path: "Rules/01_some_regulation.md" })
+ ```
+ 
+ `workspace_file.read` is capped at 50,000 chars per call, which
+ covers virtually every individual regulation document. This is the
+ default. **Read every regulation file whole before you start
+ extracting rules from any of them.**
+ 
+ ### Tool choice — `workspace_file` vs `sandbox_exec`
+ 
+ | Tool | Per-call cap | Use for |
+ |---|---:|---|
+ | `workspace_file` (read) | 50,000 chars | **full reads of regulation / rule documents** |
+ | `sandbox_exec` (cat/head/etc) | 10,000 chars | shell commands, **not** full file reads |
+ 
+ `sandbox_exec` is designed for shell commands; its 10K cap is too
+ small for most regulations. `cat rules/01_*.md` returns only the
+ first ~10 KB followed by `\n[truncated]`. Re-issuing with `head -N` /
+ `tail -M` to scroll the window loses positional precision and burns
+ turns. **When you see truncation, don't fight the cap — switch
+ tools.**
+ 
+ ### Asymmetry — regs read whole, samples sampled
+ 
+ Regulations are limited (typically 1-10 files), authoritative, and
+ read once. Read every regulation whole.
+ 
+ Sample documents may number 30 to 1000+, are heterogeneous, and get
+ read many times during testing. **Don't try to read every sample
+ whole.** Use rule-applicability filters or sampled subsets to focus
+ attention.
+ 
+ ### Escape valve — when a single reg exceeds ~200K chars
+ 
+ Rare in practice. The largest regulation in `test_data_4` is 42 KB;
+ typical Chinese banking regs (资管新规, 信披办法, etc.) all fit
+ under 50 KB. But if you do encounter a single regulation so large
+ that reading it whole would crowd the context window — heuristic:
+ the file exceeds ~200,000 chars or ~25% of your context budget —
+ use your own judgment:
+ 
+ - Read by chapter (e.g., `第X章` / `Chapter X`) using `document_parse`
+   or paginated `workspace_file` reads
+ - Or build an in-workspace index file pointing to chapter offsets and
+   read on-demand per rule being extracted
+ 
+ The 50 KB cap is high enough that this almost never triggers. **The
+ default is read whole; deviate only when the file genuinely doesn't
+ fit.**
+ 
  ## Extraction Strategies
  
  ### Strategy 1: Structured Input (Developer User Provides Rules)
@@ -85,7 +85,7 @@ Bundle multiple rules into a single task (and a single check_r###_r###.py file)
  - The judgment logic for one rule is a substring or close variant of the next
  - A single failure typically implies multiple failures (you can't pass R013 if R015 fails)
  
- Example: R013 / R015 / R017 all check that a specific table on page 3 of the report contains certain mandatory fields. Same chunk, same parse, same verdict shape. Bundle as `check_r013_r015_r017.py` and create a single TaskCreate task `R013/R015/R017 — required-fields table`. The engine's filesystem-derived milestones recognize the grouped check.py and credit all three rule_ids.
+ Example: R013 / R015 / R017 all check that a specific table on page 3 of the report contains certain mandatory fields. Same chunk, same parse, same verdict shape. Bundle as `check_r013_r015_r017.py` and create a single task: `TaskCreate({id: "R013-R015-R017-skill_authoring", title: "R013/R015/R017 — required-fields table", phase: "skill_authoring"})`. The engine's filesystem-derived milestones recognize the grouped check.py and credit all three rule_ids.
  
  ### When to keep separate
  
@@ -344,6 +344,30 @@ When entering skill_authoring with an empty TaskBoard:
  5. **Pick the first task.** Work it to completion (skill + check + at least one local test). Update PATTERNS.md with whatever you learned. Move to the next task.
  6. **At task ~5 and task ~10:** stop and re-read PATTERNS.md. If patterns suggest a refactor of earlier work, do it now (cheap) rather than later (expensive).
  
+ ### Calling TaskCreate / TaskUpdate / TaskComplete
+ 
+ The engine registers three task-board tools (v0.7.3+):
+ 
+ - `TaskCreate({id, title, phase, ruleId?})` — adds a task to `tasks.json`. `id` must be unique within the session; pick a stable shape like `<rule_id>-<phase>` for per-rule tasks or `<group-name>-<phase>` for grouped / non-rule tasks. `phase` is the phase the task belongs to (current phase or a future phase you're pre-populating). `ruleId` is optional — set it for per-rule tasks so the engine can credit the rule_id in milestone derivation.
+ - `TaskUpdate({id, status?, summary?})` — updates a task's status to `pending` / `in_progress` / `completed` / `failed`, optionally with a short summary.
+ - `TaskComplete({id, summary?})` — sugar for `TaskUpdate({id, status:"completed", summary})`. Use this for the common path after finishing a unit of work.
+ 
+ After you call `TaskCreate` for your decomposition and exit the current turn, the Ralph loop pulls the next pending task and runs it. Finish the work, call `TaskComplete`, and the loop advances. If a task can't be completed (irrecoverable error), call `TaskUpdate({id, status:"failed", summary:"reason"})` so the queue moves on rather than blocking on the failed task.
+ 
+ Examples:
+ 
+ ```
+ TaskCreate({ id: "R001-skill_authoring", title: "Author skill for R001",
+              phase: "skill_authoring", ruleId: "R001" })
+ 
+ TaskCreate({ id: "trust-bundle-skill_authoring",
+              title: "R013/R015/R017 — required-fields table",
+              phase: "skill_authoring" })
+ 
+ TaskComplete({ id: "R001-skill_authoring",
+                summary: "regex check passes 89/90; R001 done" })
+ ```
+ 
  ### Persisted methodology — PATTERNS.md OR phase logs OR AGENT.md decisions
  
  The principle: capture framework-level decisions to disk before each phase advance. The conversation will compact, agents will restart, the next phase will lose grounding. Whichever format you pick, write to disk — don't rely on conversation context that disappears.
@@ -132,6 +132,53 @@ existing catalog. Therefore, when composing the brief:
  catalog.json.** rule_catalog uses workspace file locking (B9);
  sandbox_exec bypasses it and races with other writers.
  
+ ## 如何读取规则文件 (默认整本读取)
+ 
+ 法规文件是审核的权威依据。你为每条规则记录的 `source_ref` 都要能在
+ 原文中复核。对于绝大多数规则文件 (单个文件 < 50 KB / < ~100 页),
+ **用 `workspace_file` (operation=read) 一次性整本读取**:
+ 
+ ```js
+ workspace_file({ operation: "read", scope: "project", path: "Rules/01_某某办法.md" })
+ ```
+ 
+ `workspace_file.read` 单次上限 50,000 字符, 足以覆盖几乎所有单个法规
+ 文件。这是默认行为: **在抽取规则之前, 把每一份法规文件都整本读一遍。**
+ 
+ ### 工具选择 — `workspace_file` 还是 `sandbox_exec`
+ 
+ | 工具 | 单次上限 | 适用 |
+ |---|---:|---|
+ | `workspace_file` (read) | 50,000 字符 | **整本读取法规/规则文件** |
+ | `sandbox_exec` (cat/head) | 10,000 字符 | 短命令, 不适合整文件读取 |
+ 
+ `sandbox_exec` 是为执行 shell 命令设计的, 10K 上限对绝大多数法规太小。
+ `cat rules/01_*.md` 只会返回前 ~10 KB, 后面被截断为 `\n[truncated]`。
+ 反复用 `head -N` / `tail -M` 滑动窗口会丢失行号位置信息, 也浪费交互
+ 回合。**遇到截断, 别和上限较劲——换工具。**
+ 
+ ### 法规与样本的不对称 — 法规整本读, 样本按需抽样
+ 
+ 法规通常只有 1–10 份, 权威性强, 只需读一次。每一份法规都整本读取,
+ 作为后续所有规则抽取与引用的基础。
+ 
+ 样本文档可能 30 份甚至 1000+ 份, 异质性强, 在测试阶段会被多次读取。
+ **不要试图把每个样本都整本读一遍**——用规则适用性过滤、抽样子集来
+ 聚焦注意力。
+ 
+ ### 例外 — 单个法规超过 200K 字符时
+ 
+ 实践中极少见。test_data_4 中最大的法规 42 KB; 银行业 资管新规 +
+ 信披办法 等典型法规也都在 50 KB 以内。但如果你确实遇到一份超大法规,
+ 读取整本会挤压上下文窗口 (启发式: 单文件超过 ~200,000 字符 或超过你
+ 上下文预算的 ~25%), 此时由你判断:
+ 
+ - 按章 (`第X章`) 分段读, 用 `document_parse` 或分页的 `workspace_file`
+ - 或建立工作区内的索引文件, 标注每章的偏移位置, 抽取规则时按需读取
+ 
+ 50 KB 的上限已经足够高, 上述例外情形几乎不会触发。**默认就是整本读;
+ 只有当文件确实太大时才偏离这一默认。**
+ 
  ## Extraction Strategies
  
  ### Strategy 1: Structured Input (Developer User Provides Rules)
@@ -85,7 +85,7 @@ KC 的 main agent 是指挥者。指挥者决定下一步做什么——而这
  - 一条规则的判断逻辑是另一条的子串或近似变体
  - 一次失败通常意味着多条规则同时失败(R013 不可能在 R015 失败的情况下通过)
  
- 例:R013 / R015 / R017 都在检查报告第 3 页那张表是否包含某些必填字段。同一个 chunk、同一次 parse、同一种 verdict 形状。合并为 `check_r013_r015_r017.py`,并创建一个 TaskCreate 任务 `R013/R015/R017 — 必填字段表`。引擎从文件系统推导里程碑时会识别这个合并 check.py,给三个 rule_id 都计入覆盖。
+ 例:R013 / R015 / R017 都在检查报告第 3 页那张表是否包含某些必填字段。同一个 chunk、同一次 parse、同一种 verdict 形状。合并为 `check_r013_r015_r017.py`,并创建一个任务:`TaskCreate({id: "R013-R015-R017-skill_authoring", title: "R013/R015/R017 — 必填字段表", phase: "skill_authoring"})`。引擎从文件系统推导里程碑时会识别这个合并 check.py,给三个 rule_id 都计入覆盖。
  
  ### 何时保持独立
  
@@ -337,6 +337,30 @@ PATTERNS.md 全文控制在约 5 KB 之内。超过时,剪掉最不可执行
  5. **挑第一个任务**。做到完整(skill + check + 至少一次本地测试)。把学到的写进 PATTERNS.md。换下一个任务。
  6. **任务做到第 5 个、第 10 个时**:停下来重读 PATTERNS.md。如果新积累的 pattern 暗示要重构早期工作,**现在做**(便宜)而不是更晚(昂贵)。
  
+ ### 调用 TaskCreate / TaskUpdate / TaskComplete
+ 
+ 引擎注册了三个任务面板工具(v0.7.3+):
+ 
+ - `TaskCreate({id, title, phase, ruleId?})` —— 在 `tasks.json` 中新增一条任务。`id` 在本会话内必须唯一;per-rule 任务建议用 `<rule_id>-<phase>` 这种稳定形状,分组 / 非规则任务用 `<group-name>-<phase>`。`phase` 是该任务所属的阶段(当前阶段或你预先排好的未来阶段)。`ruleId` 可选 —— 设上之后,引擎在里程碑推导时能把这个 rule_id 计入覆盖。
+ - `TaskUpdate({id, status?, summary?})` —— 把任务状态改为 `pending` / `in_progress` / `completed` / `failed`,可选附一行简要 summary。
+ - `TaskComplete({id, summary?})` —— `TaskUpdate({id, status:"completed", summary})` 的语法糖。完成一个工作单元后走这条最常用的路径。
+ 
+ 调用 `TaskCreate` 把你的拆分写进面板、本回合结束之后,Ralph 循环会取下一条 pending 任务执行。完成工作、调 `TaskComplete`,循环再前进。如果一条任务无法完成(不可恢复的错误),调 `TaskUpdate({id, status:"failed", summary:"原因"})`,让队列继续推进而不是被堵在那里。
+ 
+ 示例:
+ 
+ ```
+ TaskCreate({ id: "R001-skill_authoring", title: "为 R001 撰写 skill",
+              phase: "skill_authoring", ruleId: "R001" })
+ 
+ TaskCreate({ id: "trust-bundle-skill_authoring",
+              title: "R013/R015/R017 — 必填字段表",
+              phase: "skill_authoring" })
+ 
+ TaskComplete({ id: "R001-skill_authoring",
+                summary: "正则核查在 89/90 通过;R001 完成" })
+ ```
+ 
  ### 持久化方法论 —— PATTERNS.md 或 phase 日志 或 AGENT.md decisions
  
  原则:在每次 phase 推进之前,把框架级的决定写到磁盘。对话会被 compact、agent 会重启、下一个 phase 会失去上下文。不管你选哪种格式,**写到磁盘** —— 不要依赖会消失的对话上下文。