kc-beta 0.7.2 → 0.7.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +15 -8
- package/package.json +1 -1
- package/src/agent/engine.js +12 -0
- package/src/agent/tools/copy-to-workspace.js +17 -12
- package/src/agent/tools/sandbox-exec.js +4 -1
- package/src/agent/tools/task-board.js +194 -0
- package/src/agent/tools/workspace-file.js +58 -44
- package/src/config.js +6 -4
- package/template/CLAUDE.md +13 -0
- package/template/skills/en/meta-meta/rule-extraction/SKILL.md +59 -0
- package/template/skills/en/meta-meta/work-decomposition/SKILL.md +25 -1
- package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +47 -0
- package/template/skills/zh/meta-meta/work-decomposition/SKILL.md +25 -1
package/README.md
CHANGED

@@ -216,28 +216,35 @@ Quality Thresholds, Language.
 
 ## Status
 
-**v0.
+**v0.7.3 — codex review patch release.** Latest line in the v0.7.x
+hardening track. Architectural payload from v0.6.0+ is still in place:
 
 - Parallel ralph-loop (up to 8 concurrent workers) with a heap-safety
   conformance gate
 - Native chunker + RAG (onion-peeler + CJK bigram keyword index +
   one-shot LLM bundle classifier, ported from the AMC verification app)
-
-
+- Agent-owned task board: the agent reads the rule list from
+  `describeState`, decides decomposition (per-rule / grouped / range),
+  and calls `TaskCreate` / `TaskUpdate` / `TaskComplete` to drive the
+  Ralph loop. Source-context auto-attach pulls rule NL + evidence chunks
+  + sibling rules into the prompt of each task as it runs.
 - Workspace file locking for shared coordination files (`rules/catalog.json`,
-  `rules/manifest.json`, `
+  `rules/manifest.json`, `refs/manifest.json`, `tasks.json`,
+  `session-state.json`) — every writer goes through `withFileLock`.
 - `agent_tool` gets `wait` / `poll` / `list` / `kill` operations +
   `stale_subagents` phase-advance signal
-
+- FINALIZATION phase packages the session into a shippable deliverable
   (canonical `rule_skills/` layout + README + coverage report + final
   dashboard)
+- Filesystem-derived phase milestones (v0.7.0+): the engine reads disk
+  artifacts for advance criteria, never trusts tool-call assertions
 - Input stays active during streaming (type-ahead queue), arrow keys +
   history recall, CTX smoothing + peak, per-provider context-limit caps,
   `/tools`, `/parallelism`, and more
 
-See [DEV_LOG.md](./DEV_LOG.md) for the
-[docs/
-
+See [DEV_LOG.md](./DEV_LOG.md) for the per-release change breakdowns and
+[docs/update_design_v7.md](./docs/update_design_v7.md) for the v0.7.x
+plan and patch notes.
 
 Bug reports and PRs welcome at <https://github.com/kitchen-engineer42/kc-cli>.
 
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "kc-beta",
-  "version": "0.7.
+  "version": "0.7.3",
   "description": "KC Agent — LLM document verification agent (pure Node.js CLI). Dual-licensed: PolyForm Noncommercial 1.0.0 for personal/noncommercial use; commercial license required for enterprise production. See LICENSE and LICENSE-COMMERCIAL.md.",
   "type": "module",
   "bin": {
package/src/agent/engine.js
CHANGED

@@ -37,6 +37,7 @@ import { EvolutionCycleTool } from "./tools/evolution-cycle.js";
 import { TierDowngradeTool } from "./tools/tier-downgrade.js";
 import { AgentTool } from "./tools/agent-tool.js";
 import { WebSearchTool } from "./tools/web-search.js";
+import { TaskCreateTool, TaskUpdateTool, TaskCompleteTool } from "./tools/task-board.js";
 import { SkillLoader } from "./skill-loader.js";
 import { TaskManager } from "./task-manager.js";
 import { Scheduler } from "./scheduler.js";

@@ -475,6 +476,16 @@ export class AgentEngine {
         () => this.currentPhase,
       ),
       new WebSearchTool(this.config.tavilyApiKey),
+      // v0.7.3: completes the v0.7.0 "agent owns TaskBoard" design.
+      // Skills already reference TaskCreate by name; these tools make
+      // that contract truthful. See task-board.js + work-decomposition
+      // SKILL.md. Skipped for subagents — they don't own a task board
+      // (taskManager is null in subagent scope, line 216).
+      ...(this.taskManager ? [
+        new TaskCreateTool(this.workspace, this.taskManager),
+        new TaskUpdateTool(this.workspace, this.taskManager),
+        new TaskCompleteTool(this.workspace, this.taskManager),
+      ] : []),
     ],
     // Distillation+ only (DISTILL mode)
     distill: [

@@ -1308,6 +1319,7 @@ export class AgentEngine {
       yield new AgentEvent({
         type: "tool_result",
         name: tc.name,
+        input: inputData,
         output: historyContent,
         isError: result.isError,
       });
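The `...(cond ? [...] : [])` spread in the tool registration above is the whole gating mechanism: when no task manager exists (subagent scope), the spread contributes an empty array and the three tools are simply never registered. A minimal sketch of the pattern — `FakeTool` and the tool names here are illustrative stand-ins, not kc-beta's classes:

```javascript
// Stand-in for a tool class; only the name matters for this sketch.
class FakeTool {
  constructor(name) { this.name = name; }
}

// Conditional registration via array spread: spreading an empty array
// adds nothing, so callers without a taskManager never see the
// task-board tools.
function buildTools(taskManager) {
  return [
    new FakeTool("web_search"),
    ...(taskManager ? [
      new FakeTool("TaskCreate"),
      new FakeTool("TaskUpdate"),
      new FakeTool("TaskComplete"),
    ] : []),
  ];
}

console.log(buildTools({}).length);    // → 4 (main agent: all tools)
console.log(buildTools(null).length);  // → 1 (subagent: no task board)
```

The same list literal serves both scopes, which keeps the registration in one place instead of forking the whole array per agent kind.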
package/src/agent/tools/copy-to-workspace.js
CHANGED

@@ -94,7 +94,7 @@ export class CopyToWorkspaceTool extends BaseTool {
       this._appendGitignore(`refs/${targetName}`);
     }
 
-    this._appendManifest({
+    await this._appendManifest({
       target: targetRel,
       source: sourcePath,
       size: stat.size,

@@ -113,17 +113,22 @@ export class CopyToWorkspaceTool extends BaseTool {
     );
   }
 
-  _appendManifest(entry) {
-
-
-
-
-
-
-
-
-
+  async _appendManifest(entry) {
+    // v0.7.3: refs/manifest.json is a shared coordination path — wrap the
+    // whole read-modify-write under the workspace lock so two parallel
+    // copy_to_workspace calls (main agent + subagent) don't lose entries.
+    return await this._workspace.withSharedLockIfApplicable(MANIFEST_REL, () => {
+      const manifestAbs = this._workspace.resolvePath(MANIFEST_REL);
+      fs.mkdirSync(path.dirname(manifestAbs), { recursive: true });
+      let entries = [];
+      if (fs.existsSync(manifestAbs)) {
+        try { entries = JSON.parse(fs.readFileSync(manifestAbs, "utf-8")); }
+        catch { entries = []; }
+      }
+      if (!Array.isArray(entries)) entries = [];
+      entries.push(entry);
+      fs.writeFileSync(manifestAbs, JSON.stringify(entries, null, 2), "utf-8");
+    });
   }
 
   _appendGitignore(line) {
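The locked read-modify-write added above can be sketched generically. `withSharedLockIfApplicable` is engine-internal, so this sketch substitutes a simple in-process promise-chain mutex (an assumption — per the diff, the real lock also covers writers in other processes, which this does not); the manifest-append body mirrors the quoted code:

```javascript
import fs from "node:fs";
import path from "node:path";

// In-process stand-in for the workspace lock: critical sections run one
// at a time by chaining onto a shared promise.
let chain = Promise.resolve();
function withLock(fn) {
  const run = chain.then(fn);
  chain = run.catch(() => {}); // keep the chain alive if fn throws
  return run;
}

// Lock-guarded append: read, parse (tolerating a corrupt file), push,
// rewrite — all inside one critical section, so two concurrent appends
// can't both read the same snapshot and drop an entry.
function appendManifest(manifestAbs, entry) {
  return withLock(() => {
    fs.mkdirSync(path.dirname(manifestAbs), { recursive: true });
    let entries = [];
    if (fs.existsSync(manifestAbs)) {
      try { entries = JSON.parse(fs.readFileSync(manifestAbs, "utf-8")); }
      catch { entries = []; }
    }
    if (!Array.isArray(entries)) entries = [];
    entries.push(entry);
    fs.writeFileSync(manifestAbs, JSON.stringify(entries, null, 2), "utf-8");
    return entries.length;
  });
}
```

Firing two appends concurrently and awaiting both yields lengths 1 and 2: neither entry is lost, which is exactly the race the v0.7.3 change closes.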
package/src/agent/tools/sandbox-exec.js
CHANGED

@@ -44,7 +44,10 @@ export class SandboxExecTool extends BaseTool {
       "Execute a shell command. " +
       "cwd='workspace' (default) runs in KC's workspace. " +
       "cwd='project' runs in the user's project directory. " +
-      "Pipes, redirects, and chained commands (&&) are supported."
+      "Pipes, redirects, and chained commands (&&) are supported. " +
+      "stdout + stderr combined are capped at 10,000 chars; longer output is truncated. " +
+      "For reading individual files larger than ~10 KB (e.g. regulation documents), " +
+      "prefer workspace_file (operation=read) which has a larger 50 KB cap."
     );
   }
 
package/src/agent/tools/task-board.js
ADDED

@@ -0,0 +1,194 @@
+import { BaseTool, ToolResult } from "./base.js";
+
+const TASKS_REL = "tasks.json";
+
+/**
+ * v0.7.3 — TaskCreate / TaskUpdate / TaskComplete tools.
+ *
+ * Completes the v0.7.0 "agent owns TaskBoard" design. The engine no longer
+ * auto-populates per-rule tasks on phase entry (PER_RULE_PHASES is empty by
+ * default — see task-manager.js); the agent reads the rule list via
+ * describeState, picks a decomposition (single / grouped / range / non-rule),
+ * and calls these tools to populate tasks.json. The Ralph loop in
+ * AgentEngine._runTaskLoopSerial then walks pending tasks one at a time.
+ *
+ * Skill teaching for these tools lives in
+ * template/skills/{en,zh}/meta-meta/work-decomposition/SKILL.md.
+ *
+ * tasks.json is a shared-coordination path (workspace.js
+ * SHARED_COORDINATION_PATHS) — every write goes through
+ * withSharedLockIfApplicable so two writers (main + subagent) serialize.
+ */
+
+export class TaskCreateTool extends BaseTool {
+  constructor(workspace, taskManager) {
+    super();
+    this._workspace = workspace;
+    this._taskManager = taskManager;
+  }
+
+  get name() { return "TaskCreate"; }
+
+  get description() {
+    return (
+      "Add a task to the session task board. Tasks gate the Ralph loop — " +
+      "after the current turn ends, the engine pulls the next pending task " +
+      "and runs it. Use one task per unit of work you want to iterate on " +
+      "(per-rule, per-group, per-document — your decomposition). " +
+      "Call this on phase entry after reading describeState."
+    );
+  }
+
+  get inputSchema() {
+    return {
+      type: "object",
+      properties: {
+        id: {
+          type: "string",
+          description: "Unique task ID within this session (e.g. 'R001-skill_authoring' or 'group-trust-1').",
+        },
+        title: {
+          type: "string",
+          description: "Short human-readable title for the task.",
+        },
+        phase: {
+          type: "string",
+          description: "Phase this task belongs to (e.g. 'skill_authoring', 'skill_testing', 'distillation').",
+        },
+        ruleId: {
+          type: "string",
+          description: "Optional rule_id if this is a per-rule task. Omit for grouped or non-rule tasks.",
+        },
+      },
+      required: ["id", "title", "phase"],
+    };
+  }
+
+  async execute(input) {
+    const id = input.id || "";
+    const title = input.title || "";
+    const phase = input.phase || "";
+    const ruleId = input.ruleId || null;
+
+    if (!id) return new ToolResult("id required", true);
+    if (!title) return new ToolResult("title required", true);
+    if (!phase) return new ToolResult("phase required", true);
+
+    return await this._workspace.withSharedLockIfApplicable(TASKS_REL, () => {
+      const before = this._taskManager.getAllTasks().some((t) => t.id === id);
+      this._taskManager.addTask({ id, title, phase, ruleId });
+      if (before) {
+        return new ToolResult(`Task ${id} already existed (no-op).`);
+      }
+      const p = this._taskManager.progress;
+      return new ToolResult(
+        `Task ${id} created. Board: ${p.pending} pending, ${p.inProgress} in_progress, ${p.completed} completed.`,
+      );
+    });
+  }
+}
+
+export class TaskUpdateTool extends BaseTool {
+  constructor(workspace, taskManager) {
+    super();
+    this._workspace = workspace;
+    this._taskManager = taskManager;
+  }
+
+  get name() { return "TaskUpdate"; }
+
+  get description() {
+    return (
+      "Update a task's status and optional summary. Status: 'pending', " +
+      "'in_progress', 'completed', or 'failed'. Use TaskComplete instead " +
+      "for the common case of marking a task done with a summary."
+    );
+  }
+
+  get inputSchema() {
+    return {
+      type: "object",
+      properties: {
+        id: { type: "string", description: "Task ID to update." },
+        status: {
+          type: "string",
+          enum: ["pending", "in_progress", "completed", "failed"],
+          description: "New status for the task.",
+        },
+        summary: {
+          type: "string",
+          description: "Optional short summary (e.g. why the task failed, what was produced).",
+        },
+      },
+      required: ["id"],
+    };
+  }
+
+  async execute(input) {
+    const id = input.id || "";
+    const status = input.status;
+    const summary = input.summary;
+
+    if (!id) return new ToolResult("id required", true);
+
+    return await this._workspace.withSharedLockIfApplicable(TASKS_REL, () => {
+      const exists = this._taskManager.getAllTasks().some((t) => t.id === id);
+      if (!exists) return new ToolResult(`Task ${id} not found.`, true);
+      this._taskManager.updateTask(id, { status, summary });
+      const p = this._taskManager.progress;
+      return new ToolResult(
+        `Task ${id} updated${status ? ` to ${status}` : ""}. ` +
+        `Board: ${p.pending} pending, ${p.inProgress} in_progress, ${p.completed} completed, ${p.failed} failed.`,
+      );
+    });
+  }
+}
+
+export class TaskCompleteTool extends BaseTool {
+  constructor(workspace, taskManager) {
+    super();
+    this._workspace = workspace;
+    this._taskManager = taskManager;
+  }
+
+  get name() { return "TaskComplete"; }
+
+  get description() {
+    return (
+      "Mark a task as completed with an optional summary. Sugar for " +
+      "TaskUpdate({id, status: 'completed', summary}). The Ralph loop " +
+      "advances to the next pending task after this returns."
+    );
+  }
+
+  get inputSchema() {
+    return {
+      type: "object",
+      properties: {
+        id: { type: "string", description: "Task ID to complete." },
+        summary: {
+          type: "string",
+          description: "Optional short summary of what was produced.",
+        },
+      },
+      required: ["id"],
+    };
+  }
+
+  async execute(input) {
+    const id = input.id || "";
+    const summary = input.summary;
+
+    if (!id) return new ToolResult("id required", true);
+
+    return await this._workspace.withSharedLockIfApplicable(TASKS_REL, () => {
+      const exists = this._taskManager.getAllTasks().some((t) => t.id === id);
+      if (!exists) return new ToolResult(`Task ${id} not found.`, true);
+      this._taskManager.markDone(id, summary);
+      const p = this._taskManager.progress;
+      return new ToolResult(
+        `Task ${id} completed. Board: ${p.pending} pending, ${p.inProgress} in_progress, ${p.completed} completed.`,
+      );
+    });
+  }
+}
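A runnable sketch of the board lifecycle these three tools drive: create (idempotently), work, complete. `MiniTaskManager` is a hypothetical in-memory stand-in for the real `task-manager.js`, mirroring only the calls the tools make (`getAllTasks`, `addTask`, `updateTask`, `markDone`, `progress`); how the real class persists to `tasks.json` is not shown here.

```javascript
// Hypothetical in-memory stand-in for TaskManager (illustration only).
class MiniTaskManager {
  constructor() { this._tasks = []; }
  getAllTasks() { return this._tasks; }
  addTask({ id, title, phase, ruleId }) {
    if (this._tasks.some((t) => t.id === id)) return; // idempotent create
    this._tasks.push({ id, title, phase, ruleId, status: "pending" });
  }
  updateTask(id, { status, summary }) {
    const t = this._tasks.find((x) => x.id === id);
    if (!t) return;
    if (status) t.status = status;
    if (summary) t.summary = summary;
  }
  markDone(id, summary) { this.updateTask(id, { status: "completed", summary }); }
  get progress() {
    const count = (s) => this._tasks.filter((t) => t.status === s).length;
    return {
      pending: count("pending"), inProgress: count("in_progress"),
      completed: count("completed"), failed: count("failed"),
    };
  }
}

// Board lifecycle as the agent drives it through the tools: decompose
// on phase entry, then complete each unit of work.
const tm = new MiniTaskManager();
tm.addTask({ id: "R001-skill_authoring", title: "Author skill for R001",
             phase: "skill_authoring", ruleId: "R001" });
tm.addTask({ id: "R001-skill_authoring", title: "dup", phase: "skill_authoring" }); // no-op
tm.markDone("R001-skill_authoring", "check passes locally");
console.log(tm.progress.completed); // → 1
```

The duplicate `addTask` becoming a no-op is what lets `TaskCreateTool` report "already existed (no-op)" instead of corrupting the board.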
package/src/agent/tools/workspace-file.js
CHANGED

@@ -30,7 +30,9 @@ export class WorkspaceFileTool extends BaseTool {
       "Read, write, or list files. " +
       "scope='workspace' (default): KC's working directory for rules, skills, workflows, results. " +
       "scope='project': the user's project folder where KC was launched — source regulations and samples live here. " +
-      "Operations: read (returns file content), write (creates/overwrites a file), list (shows directory contents)."
+      "Operations: read (returns file content), write (creates/overwrites a file), list (shows directory contents). " +
+      "read returns up to 50,000 chars per call; longer files are truncated. " +
+      "For full reads of regulation/rule documents (typically smaller than this cap), prefer this tool over sandbox_exec."
     );
   }
 

@@ -87,7 +89,7 @@ export class WorkspaceFileTool extends BaseTool {
 
     try {
       if (op === "read") return this._read(filePath, scope);
-      if (op === "write") return this._write(filePath, content, scope);
+      if (op === "write") return await this._write(filePath, content, scope);
       if (op === "list") return this._list(filePath, scope);
       return new ToolResult(`Unknown operation: ${op}`, true);
     } catch (err) {

@@ -107,56 +109,68 @@ export class WorkspaceFileTool extends BaseTool {
     return new ToolResult(text);
   }
 
-  _write(filePath, content, scope) {
+  async _write(filePath, content, scope) {
     if (!filePath || filePath === ".") {
       return new ToolResult("Path required for write operation", true);
     }
     const resolved = this._resolveForScope(filePath, scope);
-    fs.mkdirSync(path.dirname(resolved), { recursive: true });
-
-    // v0.7.0 Group M (#84 remainder): on case-insensitive filesystems
-    // (macOS/Windows defaults), warn when the target's basename collides
-    // with an existing sibling differing only in case. Write proceeds
-    // — agents may legitimately overwrite — but the agent gets visible
-    // signal so it doesn't end up confused like E2E #5 GLM ("SKILL.md
-    // disappeared" when the inode was shared with skill.md). Workspace-
-    // scope only; project-dir scope is the user's territory.
-    let collisionNote = "";
-    if (
-      scope === "workspace" &&
-      this._workspace.fsCaseSensitive === false
-    ) {
-      try {
-        const parent = path.dirname(resolved);
-        const targetBase = path.basename(resolved);
-        const targetLower = targetBase.toLowerCase();
-        const siblings = fs.readdirSync(parent);
-        const collision = siblings.find(
-          (s) => s !== targetBase && s.toLowerCase() === targetLower,
-        );
-        if (collision) {
-          collisionNote =
-            ` ⚠ case-collision: case-insensitive filesystem already has '${collision}'` +
-            ` at this path; both names resolve to the same inode. Pick one canonical case` +
-            ` (lowercase preferred for skill files) and use it consistently — otherwise` +
-            ` archive_file / Read on either name affects the other.`;
-        }
-      } catch { /* readdirSync may fail on a fresh dir; that's fine, no collision possible */ }
-    }
 
-
+    const doWrite = () => {
+      fs.mkdirSync(path.dirname(resolved), { recursive: true });
+
+      // v0.7.0 Group M (#84 remainder): on case-insensitive filesystems
+      // (macOS/Windows defaults), warn when the target's basename collides
+      // with an existing sibling differing only in case. Write proceeds
+      // — agents may legitimately overwrite — but the agent gets visible
+      // signal so it doesn't end up confused like E2E #5 GLM ("SKILL.md
+      // disappeared" when the inode was shared with skill.md). Workspace-
+      // scope only; project-dir scope is the user's territory.
+      let collisionNote = "";
+      if (
+        scope === "workspace" &&
+        this._workspace.fsCaseSensitive === false
+      ) {
+        try {
+          const parent = path.dirname(resolved);
+          const targetBase = path.basename(resolved);
+          const targetLower = targetBase.toLowerCase();
+          const siblings = fs.readdirSync(parent);
+          const collision = siblings.find(
+            (s) => s !== targetBase && s.toLowerCase() === targetLower,
+          );
+          if (collision) {
+            collisionNote =
+              ` ⚠ case-collision: case-insensitive filesystem already has '${collision}'` +
+              ` at this path; both names resolve to the same inode. Pick one canonical case` +
+              ` (lowercase preferred for skill files) and use it consistently — otherwise` +
+              ` archive_file / Read on either name affects the other.`;
+          }
+        } catch { /* readdirSync may fail on a fresh dir; that's fine, no collision possible */ }
+      }
+
+      fs.writeFileSync(resolved, content, "utf-8");
+
+      // Auto-commit to git for workspace writes (silently no-ops if gitignored or git unavailable)
+      let traceId = null;
+      if (scope === "workspace") {
+        traceId = this._workspace.autoCommit(filePath, "update");
+      }
+
+      const label = scope === "project" ? `[project] ${filePath}` : filePath;
+      let msg = `Wrote ${content.length} chars to ${label}`;
+      if (traceId) msg += ` [trace: ${traceId}]`;
+      if (collisionNote) msg += collisionNote;
+      return new ToolResult(msg);
+    };
 
-    //
-
+    // v0.7.3: route writes to shared coordination paths (rules/catalog.json,
+    // tasks.json, refs/manifest.json, etc.) through the workspace lock so
+    // concurrent writers serialize. No-op for non-shared paths and for
+    // project-scope writes (project dir is the user's, not shared engine state).
     if (scope === "workspace") {
-
+      return await this._workspace.withSharedLockIfApplicable(filePath, doWrite);
     }
-
-    const label = scope === "project" ? `[project] ${filePath}` : filePath;
-    let msg = `Wrote ${content.length} chars to ${label}`;
-    if (traceId) msg += ` [trace: ${traceId}]`;
-    if (collisionNote) msg += collisionNote;
-    return new ToolResult(msg);
+    return doWrite();
   }
 
   _list(filePath, scope) {
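The case-collision probe quoted in `_write` is self-contained enough to lift into a sketch. `findCaseCollision` is a hypothetical extraction of that logic, not an exported kc-beta function; it only compares names, so it reports the collision on any filesystem even though the inode-sharing it warns about happens only on case-insensitive ones:

```javascript
import fs from "node:fs";
import path from "node:path";
import os from "node:os";

// Look for a sibling whose name differs from the target only by case.
// On macOS/Windows defaults both names resolve to the same inode — the
// "SKILL.md disappeared" confusion the tool's warning guards against.
function findCaseCollision(targetAbs) {
  const targetBase = path.basename(targetAbs);
  const targetLower = targetBase.toLowerCase();
  try {
    const sibling = fs.readdirSync(path.dirname(targetAbs))
      .find((s) => s !== targetBase && s.toLowerCase() === targetLower);
    return sibling ?? null;
  } catch {
    return null; // parent doesn't exist yet — no collision possible
  }
}

// Demo: a directory already holding skill.md collides with SKILL.md.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "kc-case-"));
fs.writeFileSync(path.join(dir, "skill.md"), "x", "utf-8");
console.log(findCaseCollision(path.join(dir, "SKILL.md"))); // → skill.md
```

Swallowing the `readdirSync` failure mirrors the original: a missing parent directory means there is nothing to collide with yet.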
package/src/config.js
CHANGED

@@ -90,10 +90,12 @@ export function loadSettings(workspacePath) {
     tier3: env.TIER3 || gc.tiers?.tier3 || "",
     tier4: env.TIER4 || gc.tiers?.tier4 || "",
 
-    // VLM tiers (vision/OCR models)
-
-
-
+    // VLM tiers (vision/OCR models). v0.7.3: accept OCR_MODEL_TIER* as
+    // alias since template/.env.template + initializer.js seed that name.
+    // VLM_TIER* takes precedence when both are set.
+    vlmTier1: env.VLM_TIER1 || env.OCR_MODEL_TIER1 || gc.vlm_tiers?.tier1 || "",
+    vlmTier2: env.VLM_TIER2 || env.OCR_MODEL_TIER2 || gc.vlm_tiers?.tier2 || "",
+    vlmTier3: env.VLM_TIER3 || env.OCR_MODEL_TIER3 || gc.vlm_tiers?.tier3 || "",
 
     // Worker LLM — optional, defaults to conductor config (process.env wins)
     workerProvider: penv.KC_WORKER_PROVIDER || gc.worker_provider || "",
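The alias fallback per tier is a plain `||` chain: the canonical `VLM_TIER*` wins over the `OCR_MODEL_TIER*` alias, which wins over the global-config value, with empty strings counting as unset. A sketch for tier 1 — the function name and model strings are illustrative, the precedence is the diff's:

```javascript
// VLM_TIER1 > OCR_MODEL_TIER1 alias > global config > "" (unset).
// `||` treats "" as falsy, so an empty env var falls through.
function resolveVlmTier1(env, globalTier) {
  return env.VLM_TIER1 || env.OCR_MODEL_TIER1 || globalTier || "";
}

console.log(resolveVlmTier1({ VLM_TIER1: "vlm-a", OCR_MODEL_TIER1: "ocr-b" }, "cfg-c")); // → vlm-a
console.log(resolveVlmTier1({ OCR_MODEL_TIER1: "ocr-b" }, "cfg-c")); // → ocr-b
console.log(resolveVlmTier1({}, "cfg-c")); // → cfg-c
```

Because `||` skips empty strings, a `.env` line like `VLM_TIER1=` does not shadow the alias, which is what makes the seeded `OCR_MODEL_TIER*` names usable as-is.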
package/template/CLAUDE.md
CHANGED

@@ -23,6 +23,13 @@ skills/ — Meta skills encoding verification methodology
 .env — Configuration: API keys, model tiers, thresholds, language
 ```
 
+Note: KC's session workspace under `~/.kc_agent/workspaces/<sessionId>/`
+uses lowercase counterparts (`rules/`, `samples/`, `input/`, `output/`,
+`logs/`, `workflows/`, `rule_skills/`) — these are runtime-internal and
+separate from this project's user-facing folders above. The asymmetry
+is intentional: title-case for human-facing project dirs, lowercase for
+KC's working state.
+
 ## Your Mission
 
 Follow this lifecycle. Each step references the skill(s) to consult:

@@ -93,6 +100,12 @@ skills/ — 编码核查方法论的元技能
 .env — 配置:API密钥、模型层级、阈值、语言
 ```
 
+注:KC 在 `~/.kc_agent/workspaces/<sessionId>/` 下的会话工作区使用
+小写对应目录(`rules/`、`samples/`、`input/`、`output/`、`logs/`、
+`workflows/`、`rule_skills/`)—— 这些是运行时内部目录,与本项目上面
+那些用户可见的目录是分开的。这种大小写不对称是有意的:项目里给人看
+的目录用首字母大写;KC 自己的工作状态用小写。
+
 ## 你的使命
 
 遵循以下生命周期。每一步标注了需要参考的技能:
package/template/skills/en/meta-meta/rule-extraction/SKILL.md
CHANGED

@@ -133,6 +133,65 @@ conversation or existing catalog. Therefore, when composing the brief:
 catalog.json.** rule_catalog uses workspace file locking;
 sandbox_exec bypasses it and races with other writers.
 
+## How to read regulation files (default: read whole)
+
+Regulations are the audit's authoritative basis. Every `source_ref`
+in your extracted rules must be verifiable against the source text.
+For typical regulation documents (a single file under ~50 KB / under
+~100 pages), **read each regulation file whole using `workspace_file`
+(operation=read) in a single call**:
+
+```js
+workspace_file({ operation: "read", scope: "project", path: "Rules/01_some_regulation.md" })
+```
+
+`workspace_file.read` is capped at 50,000 chars per call, which
+covers virtually every individual regulation document. This is the
+default. **Read every regulation file whole before you start
+extracting rules from any of them.**
+
+### Tool choice — `workspace_file` vs `sandbox_exec`
+
+| Tool | Per-call cap | Use for |
+|---|---:|---|
+| `workspace_file` (read) | 50,000 chars | **full reads of regulation / rule documents** |
+| `sandbox_exec` (cat/head/etc) | 10,000 chars | shell commands, **not** full file reads |
+
+`sandbox_exec` is designed for shell commands; its 10K cap is too
+small for most regulations. `cat rules/01_*.md` returns only the
+first ~10 KB followed by `\n[truncated]`. Re-issuing with `head -N` /
+`tail -M` to scroll the window loses positional precision and burns
+turns. **When you see truncation, don't fight the cap — switch
+tools.**
+
+### Asymmetry — regs read whole, samples sampled
+
+Regulations are limited (typically 1-10 files), authoritative, and
+read once. Read every regulation whole.
+
+Sample documents may number 30 to 1000+, are heterogeneous, and get
+read many times during testing. **Don't try to read every sample
+whole.** Use rule-applicability filters or sampled subsets to focus
+attention.
+
+### Escape valve — when a single reg exceeds ~200K chars
+
+Rare in practice. The largest regulation in `test_data_4` is 42 KB;
+typical Chinese banking regs (资管新规, 信披办法, etc.) all fit
+under 50 KB. But if you do encounter a single regulation so large
+that reading it whole would crowd the context window — heuristic:
+the file exceeds ~200,000 chars or ~25% of your context budget —
+use your own judgment:
+
+- Read by chapter (e.g., `第X章` / `Chapter X`) using `document_parse`
+  or paginated `workspace_file` reads
+- Or build an in-workspace index file pointing to chapter offsets and
+  read on-demand per rule being extracted
+
+The 50 KB cap is high enough that this almost never triggers. **The
+default is read whole; deviate only when the file genuinely doesn't
+fit.**
+
 ## Extraction Strategies
 
 ### Strategy 1: Structured Input (Developer User Provides Rules)
package/template/skills/en/meta-meta/work-decomposition/SKILL.md
CHANGED

@@ -85,7 +85,7 @@ Bundle multiple rules into a single task (and a single check_r###_r###.py file)
 - The judgment logic for one rule is a substring or close variant of the next
 - A single failure typically implies multiple failures (you can't pass R013 if R015 fails)
 
-Example: R013 / R015 / R017 all check that a specific table on page 3 of the report contains certain mandatory fields. Same chunk, same parse, same verdict shape. Bundle as `check_r013_r015_r017.py` and create a single
+Example: R013 / R015 / R017 all check that a specific table on page 3 of the report contains certain mandatory fields. Same chunk, same parse, same verdict shape. Bundle as `check_r013_r015_r017.py` and create a single task: `TaskCreate({id: "R013-R015-R017-skill_authoring", title: "R013/R015/R017 — required-fields table", phase: "skill_authoring"})`. The engine's filesystem-derived milestones recognize the grouped check.py and credit all three rule_ids.
 
 ### When to keep separate
 

@@ -344,6 +344,30 @@ When entering skill_authoring with an empty TaskBoard:
 5. **Pick the first task.** Work it to completion (skill + check + at least one local test). Update PATTERNS.md with whatever you learned. Move to the next task.
 6. **At task ~5 and task ~10:** stop and re-read PATTERNS.md. If patterns suggest a refactor of earlier work, do it now (cheap) rather than later (expensive).
 
+### Calling TaskCreate / TaskUpdate / TaskComplete
+
+The engine registers three task-board tools (v0.7.3+):
+
+- `TaskCreate({id, title, phase, ruleId?})` — adds a task to `tasks.json`. `id` must be unique within the session; pick a stable shape like `<rule_id>-<phase>` for per-rule tasks or `<group-name>-<phase>` for grouped / non-rule tasks. `phase` is the phase the task belongs to (current phase or a future phase you're pre-populating). `ruleId` is optional — set it for per-rule tasks so the engine can credit the rule_id in milestone derivation.
+- `TaskUpdate({id, status?, summary?})` — updates a task's status to `pending` / `in_progress` / `completed` / `failed`, optionally with a short summary.
+- `TaskComplete({id, summary?})` — sugar for `TaskUpdate({id, status:"completed", summary})`. Use this for the common path after finishing a unit of work.
+
+After you call `TaskCreate` for your decomposition and exit the current turn, the Ralph loop pulls the next pending task and runs it. Finish the work, call `TaskComplete`, and the loop advances. If a task can't be completed (irrecoverable error), call `TaskUpdate({id, status:"failed", summary:"reason"})` so the queue moves on rather than blocking on the failed task.
+
+Examples:
+
+```
+TaskCreate({ id: "R001-skill_authoring", title: "Author skill for R001",
+             phase: "skill_authoring", ruleId: "R001" })
+
+TaskCreate({ id: "trust-bundle-skill_authoring",
+             title: "R013/R015/R017 — required-fields table",
+             phase: "skill_authoring" })
+
+TaskComplete({ id: "R001-skill_authoring",
+               summary: "regex check passes 89/90; R001 done" })
+```
+
 ### Persisted methodology — PATTERNS.md OR phase logs OR AGENT.md decisions
 
 The principle: capture framework-level decisions to disk before each phase advance. The conversation will compact, agents will restart, the next phase will lose grounding. Whichever format you pick, write to disk — don't rely on conversation context that disappears.
@@ -132,6 +132,53 @@ existing catalog. Therefore, when composing the brief:

catalog.json.** rule_catalog uses workspace file locking (B9);
sandbox_exec bypasses it and races with other writers.

## How to read regulation files (read the whole file by default)

Regulation files are the authoritative basis for review. Every `source_ref`
you record for a rule must be verifiable against the original text. For the
vast majority of regulation files (a single file < 50 KB / < ~100 pages),
**read the whole file in one call with `workspace_file` (operation=read)**:

```js
workspace_file({ operation: "read", scope: "project", path: "Rules/01_某某办法.md" })
```

`workspace_file.read` has a per-call limit of 50,000 characters, enough to
cover almost every individual regulation file. This is the default behavior:
**before extracting rules, read every regulation file cover to cover.**

### Tool choice — `workspace_file` or `sandbox_exec`

| Tool | Per-call limit | Use for |
|---|---:|---|
| `workspace_file` (read) | 50,000 chars | **Reading regulation/rule files whole** |
| `sandbox_exec` (cat/head) | 10,000 chars | Short commands; not whole-file reads |

`sandbox_exec` is designed for running shell commands; its 10K limit is too
small for most regulations. `cat rules/01_*.md` returns only the first
~10 KB and truncates the rest to `\n[truncated]`. Sliding a `head -N` /
`tail -M` window over the file loses line-position information and wastes
interaction turns. **When you hit truncation, don't fight the limit — switch tools.**

### Regulations vs. samples — read regulations whole, sample the samples

There are usually only 1–10 regulation files; they carry strong authority
and only need to be read once. Read each one in full as the foundation for
all subsequent rule extraction and citation.

Sample documents may number 30 or even 1000+, are highly heterogeneous, and
get read repeatedly during the testing phase. **Don't try to read every
sample cover to cover** — use rule-applicability filtering and sampled
subsets to focus attention.

### Exception — a single regulation over 200K characters

Rare in practice. The largest regulation in test_data_4 is 42 KB; typical
banking regulations (资管新规, 信披办法, and the like) also stay under
50 KB. But if you do hit an oversized regulation, reading it whole would
crowd out the context window (heuristic: a single file over ~200,000
characters, or over ~25% of your context budget). In that case, use your
judgment:

- Read it chapter by chapter (`第X章`), via `document_parse` or a paginated `workspace_file`
- Or build an index file in the workspace recording each chapter's offset, and read on demand during rule extraction

The 50 KB limit is already generous; the exception above almost never
triggers. **The default is to read the whole file; deviate from that
default only when the file is genuinely too large.**
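
The chapter-index option above can be sketched in plain JavaScript, assuming the regulation text is already in memory. The `第X章` heading convention comes from the section above; the function name and index shape are illustrative assumptions.

```javascript
// Build a per-chapter offset index for an oversized regulation, so later
// extraction steps can read only the chapter they need. Illustrative sketch.
function buildChapterIndex(text) {
  const index = [];
  // Chapter headings like "第三章 …" at the start of a line.
  const re = /^第[一二三四五六七八九十百\d]+章.*$/gm;
  let m;
  while ((m = re.exec(text)) !== null) {
    index.push({ title: m[0].trim(), offset: m.index });
  }
  // Each entry spans from its own offset to the next chapter's offset.
  return index.map((e, i) => ({
    ...e,
    end: i + 1 < index.length ? index[i + 1].offset : text.length,
  }));
}

const doc = "第一章 总则\n第一条 …\n第二章 信息披露\n第二条 …\n";
const idx = buildChapterIndex(doc);
// doc.slice(idx[1].offset, idx[1].end) now yields just chapter 2.
```

Persisting `idx` to a workspace file gives later extraction turns a cheap map from chapter to byte range, so no single read has to pull the whole document.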

## Extraction Strategies

### Strategy 1: Structured Input (Developer User Provides Rules)
|
@@ -85,7 +85,7 @@ KC's main agent is the conductor. The conductor decides what happens next — and this

- One rule's judgment logic is a substring or near-variant of another's
- One failure usually implies several rules failing together (R013 cannot pass while R015 fails)

Example: R013 / R015 / R017 all check whether the table on page 3 of the report contains certain required fields. Same chunk, same parse, same verdict shape. Merge them into `check_r013_r015_r017.py` and create a single task: `TaskCreate({id: "R013-R015-R017-skill_authoring", title: "R013/R015/R017 — required-fields table", phase: "skill_authoring"})`. When the engine derives milestones from the filesystem, it recognizes this merged check.py and credits coverage to all three rule_ids.
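
How crediting a merged check file might work can be sketched from the filename alone. The `check_r013_r015_r017.py` naming convention comes from the example above; the parsing logic below is an assumption for illustration, not the engine's actual milestone-derivation code.

```javascript
// Derive which rule_ids a check file covers from its filename.
// Assumed convention: check_<id>_<id>_....py with lowercase rule ids.
function ruleIdsFromCheckFile(filename) {
  const m = filename.match(/^check_((?:r\d+_?)+)\.py$/i);
  if (!m) return []; // not a check file; credits nothing
  return m[1]
    .split("_")
    .filter(Boolean)
    .map((id) => id.toUpperCase()); // normalize to R013, R015, R017
}

// ruleIdsFromCheckFile("check_r013_r015_r017.py") → ["R013", "R015", "R017"]
```

With this shape, one merged file moves three rules toward coverage, which is exactly why the task id and title should also name all three.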

### When to keep rules separate
@@ -337,6 +337,30 @@ Keep PATTERNS.md to about 5 KB in total. When it grows past that, trim the least actionable

5. **Pick the first task.** Do it completely (skill + check + at least one local test). Write what you learned into PATTERNS.md. Move on to the next task.
6. **Around task 5 and task 10:** stop and reread PATTERNS.md. If newly accumulated patterns suggest refactoring earlier work, **do it now** (cheap) rather than later (expensive).

### Calling TaskCreate / TaskUpdate / TaskComplete

The engine registers three task-board tools (v0.7.3+):

- `TaskCreate({id, title, phase, ruleId?})` — adds a task to `tasks.json`. `id` must be unique within the session; for per-rule tasks use a stable shape like `<rule_id>-<phase>`, for grouped / non-rule tasks `<group-name>-<phase>`. `phase` is the phase the task belongs to (the current phase or a future phase you're queueing up). `ruleId` is optional — when set, the engine can credit that rule_id to coverage during milestone derivation.
- `TaskUpdate({id, status?, summary?})` — sets the task's status to `pending` / `in_progress` / `completed` / `failed`, optionally with a one-line summary.
- `TaskComplete({id, summary?})` — sugar for `TaskUpdate({id, status:"completed", summary})`. The common path after finishing a unit of work.

After you call `TaskCreate` to write your decomposition onto the board and the current turn ends, the Ralph loop pulls the next pending task and runs it. Finish the work, call `TaskComplete`, and the loop advances. If a task cannot be completed (an irrecoverable error), call `TaskUpdate({id, status:"failed", summary:"reason"})` so the queue keeps moving instead of blocking on it.

Examples:

```
TaskCreate({ id: "R001-skill_authoring", title: "Author the skill for R001",
             phase: "skill_authoring", ruleId: "R001" })

TaskCreate({ id: "trust-bundle-skill_authoring",
             title: "R013/R015/R017 — required-fields table",
             phase: "skill_authoring" })

TaskComplete({ id: "R001-skill_authoring",
               summary: "regex check passes on 89/90; R001 done" })
```
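
The records these calls produce can be pictured roughly as follows. The field names are inferred from the tool parameters above; the engine's actual on-disk `tasks.json` schema is an assumption here, not a guarantee.

```javascript
// Rough shape of task records implied by the tool parameters above.
// The engine's real tasks.json schema may differ; this is an assumption.
const tasks = [
  { id: "R001-skill_authoring", title: "Author the skill for R001",
    phase: "skill_authoring", ruleId: "R001", status: "completed",
    summary: "regex check passes on 89/90; R001 done" },
  { id: "trust-bundle-skill_authoring",
    title: "R013/R015/R017 — required-fields table",
    phase: "skill_authoring", status: "pending" },
];

// What a Ralph-loop iteration would pull next: the first pending task.
const next = tasks.find((t) => t.status === "pending");
// next.id === "trust-bundle-skill_authoring"
```

This is also why `failed` matters: a task stuck in `pending` or `in_progress` would keep being selected, while marking it `failed` lets the queue move past it.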

### Persisted methodology — PATTERNS.md or phase logs or AGENT.md decisions

The principle: write framework-level decisions to disk before each phase advance. The conversation will get compacted, agents will restart, and the next phase will lose its context. Whichever format you choose, **write to disk** — don't rely on conversation context that disappears.