kc-beta 0.1.0
- package/bin/kc-beta.js +16 -0
- package/package.json +32 -0
- package/src/agent/confidence-scorer.js +120 -0
- package/src/agent/context.js +124 -0
- package/src/agent/corner-case-registry.js +119 -0
- package/src/agent/engine.js +224 -0
- package/src/agent/events.js +27 -0
- package/src/agent/history.js +101 -0
- package/src/agent/llm-client.js +131 -0
- package/src/agent/pipelines/base.js +14 -0
- package/src/agent/pipelines/distillation.js +113 -0
- package/src/agent/pipelines/extraction.js +92 -0
- package/src/agent/pipelines/index.js +23 -0
- package/src/agent/pipelines/initializer.js +163 -0
- package/src/agent/pipelines/production-qc.js +99 -0
- package/src/agent/pipelines/skill-authoring.js +83 -0
- package/src/agent/pipelines/skill-testing.js +111 -0
- package/src/agent/tools/agent-tool.js +100 -0
- package/src/agent/tools/base.js +35 -0
- package/src/agent/tools/dashboard-render.js +146 -0
- package/src/agent/tools/document-parse.js +184 -0
- package/src/agent/tools/document-search.js +111 -0
- package/src/agent/tools/evolution-cycle.js +150 -0
- package/src/agent/tools/qc-sample.js +94 -0
- package/src/agent/tools/registry.js +55 -0
- package/src/agent/tools/rule-catalog.js +113 -0
- package/src/agent/tools/sandbox-exec.js +106 -0
- package/src/agent/tools/tier-downgrade.js +114 -0
- package/src/agent/tools/worker-llm-call.js +109 -0
- package/src/agent/tools/workflow-run.js +138 -0
- package/src/agent/tools/workspace-file.js +122 -0
- package/src/agent/version-manager.js +130 -0
- package/src/agent/workspace.js +82 -0
- package/src/cli/components.js +164 -0
- package/src/cli/index.js +329 -0
- package/src/cli/init.js +80 -0
- package/src/cli/onboard.js +182 -0
- package/src/cli/terminal.js +143 -0
- package/src/config.js +93 -0
- package/template/.env.template +31 -0
- package/template/CLAUDE.md +137 -0
- package/template/Input/.gitkeep +0 -0
- package/template/Output/.gitkeep +0 -0
- package/template/Rules/.gitkeep +0 -0
- package/template/Samples/.gitkeep +0 -0
- package/template/skills/en/meta/compliance-judgment/SKILL.md +114 -0
- package/template/skills/en/meta/compliance-judgment/references/output-format.md +151 -0
- package/template/skills/en/meta/confidence-system/SKILL.md +117 -0
- package/template/skills/en/meta/corner-case-management/SKILL.md +111 -0
- package/template/skills/en/meta/cross-document-verification/SKILL.md +131 -0
- package/template/skills/en/meta/cross-document-verification/references/contradiction-taxonomy.md +73 -0
- package/template/skills/en/meta/data-sensibility/SKILL.md +115 -0
- package/template/skills/en/meta/document-parsing/SKILL.md +108 -0
- package/template/skills/en/meta/document-parsing/references/parser-catalog.md +40 -0
- package/template/skills/en/meta/entity-extraction/SKILL.md +129 -0
- package/template/skills/en/meta/tree-processing/SKILL.md +103 -0
- package/template/skills/en/meta-meta/bootstrap-workspace/SKILL.md +70 -0
- package/template/skills/en/meta-meta/dashboard-reporting/SKILL.md +106 -0
- package/template/skills/en/meta-meta/dashboard-reporting/scripts/generate_dashboard.py +178 -0
- package/template/skills/en/meta-meta/evolution-loop/SKILL.md +210 -0
- package/template/skills/en/meta-meta/evolution-loop/references/convergence-guide.md +62 -0
- package/template/skills/en/meta-meta/quality-control/SKILL.md +138 -0
- package/template/skills/en/meta-meta/quality-control/references/qa-layers.md +92 -0
- package/template/skills/en/meta-meta/quality-control/references/sampling-strategies.md +76 -0
- package/template/skills/en/meta-meta/rule-extraction/SKILL.md +100 -0
- package/template/skills/en/meta-meta/rule-extraction/references/chunking-strategies.md +80 -0
- package/template/skills/en/meta-meta/rule-graph/SKILL.md +118 -0
- package/template/skills/en/meta-meta/skill-authoring/SKILL.md +108 -0
- package/template/skills/en/meta-meta/skill-authoring/references/skill-format-spec.md +78 -0
- package/template/skills/en/meta-meta/skill-to-workflow/SKILL.md +150 -0
- package/template/skills/en/meta-meta/skill-to-workflow/references/worker-llm-catalog.md +50 -0
- package/template/skills/en/meta-meta/task-decomposition/SKILL.md +129 -0
- package/template/skills/en/meta-meta/task-decomposition/references/decision-matrix.md +81 -0
- package/template/skills/en/meta-meta/version-control/SKILL.md +152 -0
- package/template/skills/en/meta-meta/version-control/references/trace-id-spec.md +79 -0
- package/template/skills/en/skill-creator/LICENSE.txt +202 -0
- package/template/skills/en/skill-creator/SKILL.md +479 -0
- package/template/skills/en/skill-creator/agents/analyzer.md +274 -0
- package/template/skills/en/skill-creator/agents/comparator.md +202 -0
- package/template/skills/en/skill-creator/agents/grader.md +223 -0
- package/template/skills/en/skill-creator/assets/eval_review.html +146 -0
- package/template/skills/en/skill-creator/eval-viewer/generate_review.py +471 -0
- package/template/skills/en/skill-creator/eval-viewer/viewer.html +1325 -0
- package/template/skills/en/skill-creator/references/schemas.md +430 -0
- package/template/skills/en/skill-creator/scripts/__init__.py +0 -0
- package/template/skills/en/skill-creator/scripts/aggregate_benchmark.py +401 -0
- package/template/skills/en/skill-creator/scripts/generate_report.py +326 -0
- package/template/skills/en/skill-creator/scripts/improve_description.py +248 -0
- package/template/skills/en/skill-creator/scripts/package_skill.py +136 -0
- package/template/skills/en/skill-creator/scripts/quick_validate.py +103 -0
- package/template/skills/en/skill-creator/scripts/run_eval.py +310 -0
- package/template/skills/en/skill-creator/scripts/run_loop.py +332 -0
- package/template/skills/en/skill-creator/scripts/utils.py +47 -0
- package/template/skills/zh/meta/compliance-judgment/SKILL.md +303 -0
- package/template/skills/zh/meta/compliance-judgment/references/output-format.md +151 -0
- package/template/skills/zh/meta/confidence-system/SKILL.md +228 -0
- package/template/skills/zh/meta/corner-case-management/SKILL.md +235 -0
- package/template/skills/zh/meta/cross-document-verification/SKILL.md +241 -0
- package/template/skills/zh/meta/cross-document-verification/references/contradiction-taxonomy.md +73 -0
- package/template/skills/zh/meta/data-sensibility/SKILL.md +235 -0
- package/template/skills/zh/meta/document-parsing/SKILL.md +168 -0
- package/template/skills/zh/meta/document-parsing/references/parser-catalog.md +40 -0
- package/template/skills/zh/meta/entity-extraction/SKILL.md +276 -0
- package/template/skills/zh/meta/tree-processing/SKILL.md +233 -0
- package/template/skills/zh/meta-meta/bootstrap-workspace/SKILL.md +147 -0
- package/template/skills/zh/meta-meta/dashboard-reporting/SKILL.md +281 -0
- package/template/skills/zh/meta-meta/dashboard-reporting/scripts/generate_dashboard.py +178 -0
- package/template/skills/zh/meta-meta/evolution-loop/SKILL.md +302 -0
- package/template/skills/zh/meta-meta/evolution-loop/references/convergence-guide.md +62 -0
- package/template/skills/zh/meta-meta/quality-control/SKILL.md +269 -0
- package/template/skills/zh/meta-meta/quality-control/references/qa-layers.md +92 -0
- package/template/skills/zh/meta-meta/quality-control/references/sampling-strategies.md +76 -0
- package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +208 -0
- package/template/skills/zh/meta-meta/rule-extraction/references/chunking-strategies.md +80 -0
- package/template/skills/zh/meta-meta/rule-graph/SKILL.md +203 -0
- package/template/skills/zh/meta-meta/skill-authoring/SKILL.md +235 -0
- package/template/skills/zh/meta-meta/skill-authoring/references/skill-format-spec.md +78 -0
- package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +275 -0
- package/template/skills/zh/meta-meta/skill-to-workflow/references/worker-llm-catalog.md +50 -0
- package/template/skills/zh/meta-meta/task-decomposition/SKILL.md +224 -0
- package/template/skills/zh/meta-meta/task-decomposition/references/decision-matrix.md +81 -0
- package/template/skills/zh/meta-meta/version-control/SKILL.md +284 -0
- package/template/skills/zh/meta-meta/version-control/references/trace-id-spec.md +79 -0
- package/template/skills/zh/skill-creator/LICENSE.txt +202 -0
- package/template/skills/zh/skill-creator/SKILL.md +479 -0
- package/template/skills/zh/skill-creator/agents/analyzer.md +274 -0
- package/template/skills/zh/skill-creator/agents/comparator.md +202 -0
- package/template/skills/zh/skill-creator/agents/grader.md +223 -0
- package/template/skills/zh/skill-creator/assets/eval_review.html +146 -0
- package/template/skills/zh/skill-creator/eval-viewer/generate_review.py +471 -0
- package/template/skills/zh/skill-creator/eval-viewer/viewer.html +1325 -0
- package/template/skills/zh/skill-creator/references/schemas.md +430 -0
- package/template/skills/zh/skill-creator/scripts/__init__.py +0 -0
- package/template/skills/zh/skill-creator/scripts/aggregate_benchmark.py +401 -0
- package/template/skills/zh/skill-creator/scripts/generate_report.py +326 -0
- package/template/skills/zh/skill-creator/scripts/improve_description.py +248 -0
- package/template/skills/zh/skill-creator/scripts/package_skill.py +136 -0
- package/template/skills/zh/skill-creator/scripts/quick_validate.py +103 -0
- package/template/skills/zh/skill-creator/scripts/run_eval.py +310 -0
- package/template/skills/zh/skill-creator/scripts/run_loop.py +332 -0
- package/template/skills/zh/skill-creator/scripts/utils.py +47 -0
package/src/cli/terminal.js
ADDED
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
// ANSI colors: no dependencies and no scroll regions. Cursor movement is used only to return to the input line.
|
|
2
|
+
// Everything appends to stdout naturally. Simple and reliable.
|
|
3
|
+
|
|
4
|
+
const ESC = "\x1b[";
|
|
5
|
+
const RESET = `${ESC}0m`;
|
|
6
|
+
const DIM = `${ESC}2m`;
|
|
7
|
+
const BOLD = `${ESC}1m`;
|
|
8
|
+
const GREEN = `${ESC}32m`;
|
|
9
|
+
const YELLOW = `${ESC}33m`;
|
|
10
|
+
const RED = `${ESC}31m`;
|
|
11
|
+
const CYAN = `${ESC}36m`;
|
|
12
|
+
const GRAY = `${ESC}90m`;
|
|
13
|
+
const CLEAR_LINE = `\r${ESC}2K`;
|
|
14
|
+
|
|
15
|
+
const COOKING_WORDS = [
|
|
16
|
+
"Baking", "Blanching", "Brewing", "Caramelizing", "Cooking",
|
|
17
|
+
"Fermenting", "Flambéing", "Julienning", "Kneading", "Leavening",
|
|
18
|
+
"Marinating", "Proofing", "Sautéing", "Seasoning", "Simmering",
|
|
19
|
+
"Stewing", "Tempering", "Whisking", "Zesting", "Garnishing", "Drizzling",
|
|
20
|
+
];
|
|
21
|
+
|
|
22
|
+
const LENAT_QUOTE = "Intelligence is ten million rules.";
|
|
23
|
+
|
|
24
|
+
let spinnerInterval = null;
|
|
25
|
+
let _sessionId = "";
|
|
26
|
+
let _phase = "";
|
|
27
|
+
|
|
28
|
+
function cols() {
|
|
29
|
+
return process.stdout.columns || 80;
|
|
30
|
+
}
|
|
31
|
+
|
|
32
|
+
function hrule() {
|
|
33
|
+
return `${GRAY}${"─".repeat(cols())}${RESET}`;
|
|
34
|
+
}
|
|
35
|
+
|
|
36
|
+
// --- Welcome screen ---
|
|
37
|
+
|
|
38
|
+
export function printWelcome() {
|
|
39
|
+
const art = `
|
|
40
|
+
${BOLD}KC AGENT CLI${RESET} ${DIM}(beta)${RESET}
|
|
41
|
+
|
|
42
|
+
${DIM}Hope you never know what KC was.${RESET}
|
|
43
|
+
|
|
44
|
+
${GRAY}Product of Memium${RESET}
|
|
45
|
+
${GRAY}kitchen-engineer42${RESET}
|
|
46
|
+
`;
|
|
47
|
+
process.stdout.write(art + "\n");
|
|
48
|
+
}
|
|
49
|
+
|
|
50
|
+
// --- Status bar ---
|
|
51
|
+
|
|
52
|
+
function _statusLine() {
|
|
53
|
+
const session = _sessionId ? `[${_sessionId}]` : "";
|
|
54
|
+
const phasePart = _phase ? ` ${CYAN}${_phase.toUpperCase()}${RESET}` : "";
|
|
55
|
+
return ` ${GRAY}⏵⏵${RESET} KC Agent CLI ${GRAY}${session}${RESET}${phasePart} ${GREEN}●${RESET} ${DIM}· ${LENAT_QUOTE}${RESET}`;
|
|
56
|
+
}
|
|
57
|
+
|
|
58
|
+
export function updateStatus(sessionId, phase) {
|
|
59
|
+
if (sessionId != null) _sessionId = sessionId;
|
|
60
|
+
if (phase != null) _phase = phase;
|
|
61
|
+
}
|
|
62
|
+
|
|
63
|
+
// --- Prompt (printed after each turn) ---
|
|
64
|
+
|
|
65
|
+
export function printPrompt() {
|
|
66
|
+
process.stdout.write(`${hrule()}\n${GRAY}❯${RESET} `);
|
|
67
|
+
}
|
|
68
|
+
|
|
69
|
+
export function printPromptWithStatus() {
|
|
70
|
+
process.stdout.write(`${hrule()}\n${GRAY}❯${RESET} \n${hrule()}\n${_statusLine()}\n`);
|
|
71
|
+
// Move cursor back up to the input line
|
|
72
|
+
process.stdout.write(`${ESC}3A${ESC}3C`);
|
|
73
|
+
}
|
|
74
|
+
|
|
75
|
+
// --- Transcript output (just appends, scrolls naturally) ---
|
|
76
|
+
|
|
77
|
+
export function printTextDelta(text) {
|
|
78
|
+
process.stdout.write(text);
|
|
79
|
+
}
|
|
80
|
+
|
|
81
|
+
export function printTurnComplete() {
|
|
82
|
+
process.stdout.write("\n");
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
export function printToolStart(name, input) {
|
|
86
|
+
const inputStr = input ? ` ${GRAY}${JSON.stringify(input)}${RESET}` : "";
|
|
87
|
+
process.stdout.write(`\n ${YELLOW}┃${RESET} ${DIM}${name}${RESET}${inputStr}\n`);
|
|
88
|
+
}
|
|
89
|
+
|
|
90
|
+
export function printToolResult(name, output, isError) {
|
|
91
|
+
const color = isError ? RED : GREEN;
|
|
92
|
+
const prefix = ` ${color}┃${RESET} `;
|
|
93
|
+
if (output) {
|
|
94
|
+
const lines = output.split("\n");
|
|
95
|
+
const show = lines.slice(0, 20);
|
|
96
|
+
for (const l of show) {
|
|
97
|
+
process.stdout.write(`${prefix}${l}\n`);
|
|
98
|
+
}
|
|
99
|
+
if (lines.length > 20) {
|
|
100
|
+
process.stdout.write(`${prefix}${GRAY}... ${lines.length - 20} more lines${RESET}\n`);
|
|
101
|
+
}
|
|
102
|
+
}
|
|
103
|
+
process.stdout.write("\n");
|
|
104
|
+
}
|
|
105
|
+
|
|
106
|
+
export function printSystemMessage(message) {
|
|
107
|
+
process.stdout.write(`${GRAY}${message}${RESET}\n`);
|
|
108
|
+
}
|
|
109
|
+
|
|
110
|
+
export function printError(message) {
|
|
111
|
+
process.stdout.write(`${RED}Error: ${message}${RESET}\n`);
|
|
112
|
+
}
|
|
113
|
+
|
|
114
|
+
export function printUserMessage(text) {
|
|
115
|
+
process.stdout.write(`${CLEAR_LINE}${GRAY}❯${RESET} ${text}\n`);
|
|
116
|
+
}
|
|
117
|
+
|
|
118
|
+
// --- Redraw input (for raw mode keystroke rendering) ---
|
|
119
|
+
|
|
120
|
+
export function redrawInput(text) {
|
|
121
|
+
process.stdout.write(`${CLEAR_LINE}${GRAY}❯${RESET} ${text}`);
|
|
122
|
+
}
|
|
123
|
+
|
|
124
|
+
// --- Spinner ---
|
|
125
|
+
|
|
126
|
+
export function startSpinner() {
|
|
127
|
+
if (spinnerInterval) return;
|
|
128
|
+
let idx = Math.floor(Math.random() * COOKING_WORDS.length);
|
|
129
|
+
const render = () => {
|
|
130
|
+
const word = COOKING_WORDS[idx % COOKING_WORDS.length];
|
|
131
|
+
process.stdout.write(`${CLEAR_LINE} ${YELLOW}*${RESET} ${DIM}${word}...${RESET}`);
|
|
132
|
+
idx++;
|
|
133
|
+
};
|
|
134
|
+
render();
|
|
135
|
+
spinnerInterval = setInterval(render, 2000);
|
|
136
|
+
}
|
|
137
|
+
|
|
138
|
+
export function stopSpinner() {
|
|
139
|
+
if (!spinnerInterval) return;
|
|
140
|
+
clearInterval(spinnerInterval);
|
|
141
|
+
spinnerInterval = null;
|
|
142
|
+
process.stdout.write(`${CLEAR_LINE}`);
|
|
143
|
+
}
|
package/src/config.js
ADDED
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
import fs from "node:fs";
|
|
2
|
+
import path from "node:path";
|
|
3
|
+
import os from "node:os";
|
|
4
|
+
|
|
5
|
+
const GLOBAL_CONFIG_DIR = path.join(os.homedir(), ".kc_agent");
|
|
6
|
+
const GLOBAL_CONFIG_PATH = path.join(GLOBAL_CONFIG_DIR, "config.json");
|
|
7
|
+
|
|
8
|
+
function loadGlobalConfig() {
|
|
9
|
+
if (fs.existsSync(GLOBAL_CONFIG_PATH)) {
|
|
10
|
+
try {
|
|
11
|
+
return JSON.parse(fs.readFileSync(GLOBAL_CONFIG_PATH, "utf-8"));
|
|
12
|
+
} catch {
|
|
13
|
+
return {};
|
|
14
|
+
}
|
|
15
|
+
}
|
|
16
|
+
return {};
|
|
17
|
+
}
|
|
18
|
+
|
|
19
|
+
/**
|
|
20
|
+
* Parse a .env file into a key-value object.
|
|
21
|
+
* Handles KEY=VALUE lines, ignores comments and blank lines.
|
|
22
|
+
*/
|
|
23
|
+
function loadEnvFile(envPath) {
|
|
24
|
+
if (!fs.existsSync(envPath)) return {};
|
|
25
|
+
const env = {};
|
|
26
|
+
const lines = fs.readFileSync(envPath, "utf-8").split("\n");
|
|
27
|
+
for (const line of lines) {
|
|
28
|
+
const trimmed = line.trim();
|
|
29
|
+
if (!trimmed || trimmed.startsWith("#")) continue;
|
|
30
|
+
const eqIdx = trimmed.indexOf("=");
|
|
31
|
+
if (eqIdx === -1) continue;
|
|
32
|
+
const key = trimmed.slice(0, eqIdx).trim();
|
|
33
|
+
let value = trimmed.slice(eqIdx + 1).trim();
|
|
34
|
+
// Strip surrounding quotes
|
|
35
|
+
if ((value.startsWith('"') && value.endsWith('"')) ||
|
|
36
|
+
(value.startsWith("'") && value.endsWith("'"))) {
|
|
37
|
+
value = value.slice(1, -1);
|
|
38
|
+
}
|
|
39
|
+
env[key] = value;
|
|
40
|
+
}
|
|
41
|
+
return env;
|
|
42
|
+
}
|
|
43
|
+
|
|
44
|
+
/**
|
|
45
|
+
* Load settings by merging: global config (lowest) -> workspace .env (highest).
|
|
46
|
+
* @param {string} [workspacePath] - Optional workspace directory for .env override
|
|
47
|
+
*/
|
|
48
|
+
export function loadSettings(workspacePath) {
|
|
49
|
+
const gc = loadGlobalConfig();
|
|
50
|
+
const env = workspacePath ? loadEnvFile(path.join(workspacePath, ".env")) : {};
|
|
51
|
+
|
|
52
|
+
return {
|
|
53
|
+
// Conductor LLM
|
|
54
|
+
llmApiKey: env.SILICONFLOW_API_KEY || gc.api_key || "",
|
|
55
|
+
llmBaseUrl: env.SILICONFLOW_BASE_URL || gc.base_url || "https://api.siliconflow.cn/v1",
|
|
56
|
+
kcModel: gc.conductor_model || "glm-5",
|
|
57
|
+
kcMaxTokens: 65536,
|
|
58
|
+
|
|
59
|
+
// Worker LLMs (SiliconFlow)
|
|
60
|
+
siliconflowApiKey: env.SILICONFLOW_API_KEY || gc.api_key || "",
|
|
61
|
+
siliconflowBaseUrl: env.SILICONFLOW_BASE_URL || gc.base_url || "https://api.siliconflow.cn/v1",
|
|
62
|
+
|
|
63
|
+
// Tier models (from .env or global config tiers)
|
|
64
|
+
tier1: env.TIER1 || gc.tiers?.tier1 || "",
|
|
65
|
+
tier2: env.TIER2 || gc.tiers?.tier2 || "",
|
|
66
|
+
tier3: env.TIER3 || gc.tiers?.tier3 || "",
|
|
67
|
+
tier4: env.TIER4 || gc.tiers?.tier4 || "",
|
|
68
|
+
|
|
69
|
+
// OCR models
|
|
70
|
+
ocrModelTier1: env.OCR_MODEL_TIER1 || "zai-org/GLM-4.6V",
|
|
71
|
+
|
|
72
|
+
// Document parsing
|
|
73
|
+
mineruApiUrl: env.MINERU_API_URL || "",
|
|
74
|
+
mineruApiKey: env.MINERU_API_KEY || "",
|
|
75
|
+
|
|
76
|
+
// Workspace
|
|
77
|
+
kcWorkspaceRoot: gc.workspace_root || path.join(os.homedir(), ".kc_agent", "workspaces"),
|
|
78
|
+
kcExecTimeout: parseInt(env.KC_EXEC_TIMEOUT || "30", 10),
|
|
79
|
+
|
|
80
|
+
// Accuracy thresholds
|
|
81
|
+
skillAccuracy: parseFloat(env.SKILL_ACCURACY || gc.accuracy_threshold?.toString() || "0.9"),
|
|
82
|
+
workflowAccuracy: parseFloat(env.WORKFLOW_ACCURACY || "0.9"),
|
|
83
|
+
|
|
84
|
+
// Evolution
|
|
85
|
+
maxIterations: parseInt(env.MAX_ITERATIONS || "20", 10),
|
|
86
|
+
monitorFrequency: env.MONITOR_FREQUENCY || "mid",
|
|
87
|
+
|
|
88
|
+
// Language
|
|
89
|
+
language: env.LANGUAGE || gc.language || "en",
|
|
90
|
+
};
|
|
91
|
+
}
|
|
92
|
+
|
|
93
|
+
export { GLOBAL_CONFIG_DIR, GLOBAL_CONFIG_PATH };
|
package/template/.env.template
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
# === KC Reborn Configuration ===
|
|
2
|
+
|
|
3
|
+
# Language: en | zh
|
|
4
|
+
LANGUAGE=en
|
|
5
|
+
|
|
6
|
+
# === Worker LLM API ===
|
|
7
|
+
SILICONFLOW_API_KEY=
|
|
8
|
+
SILICONFLOW_BASE_URL=https://api.siliconflow.cn/v1
|
|
9
|
+
|
|
10
|
+
# === Worker LLM Tiers (highest capability to lowest) ===
|
|
11
|
+
TIER1=Pro/zai-org/GLM-5, Pro/moonshotai/Kimi-K2.5
|
|
12
|
+
TIER2=Pro/deepseek-ai/DeepSeek-V3.2, Pro/MiniMaxAI/MiniMax-M2.5, Qwen/Qwen3.5-397B-A17B
|
|
13
|
+
TIER3=Qwen/Qwen3.5-122B-A10B
|
|
14
|
+
TIER4=Qwen/Qwen3.5-35B-A3B
|
|
15
|
+
|
|
16
|
+
# === OCR Model Tiers ===
|
|
17
|
+
OCR_MODEL_TIER1=zai-org/GLM-4.6V
|
|
18
|
+
OCR_MODEL_TIER2=Qwen/Qwen3.5-397B-A17B
|
|
19
|
+
OCR_MODEL_TIER3=PaddlePaddle/PaddleOCR-VL-1.5
|
|
20
|
+
|
|
21
|
+
# === Quality Thresholds ===
|
|
22
|
+
# Accuracy required before moving skill to workflow phase (0.0-1.0)
|
|
23
|
+
SKILL_ACCURACY=0.9
|
|
24
|
+
# Accuracy required before deploying workflow to production (0.0-1.0)
|
|
25
|
+
WORKFLOW_ACCURACY=0.9
|
|
26
|
+
# Monitoring frequency for production QC: low | mid | high
|
|
27
|
+
MONITOR_FREQUENCY=mid
|
|
28
|
+
|
|
29
|
+
# === Evolution Control ===
|
|
30
|
+
# Maximum evolution iterations per rule before escalating to developer user
|
|
31
|
+
MAX_ITERATIONS=20
|
package/template/CLAUDE.md
ADDED
|
@@ -0,0 +1,137 @@
|
|
|
1
|
+
# KC Reborn — Document Verification Workspace
|
|
2
|
+
|
|
3
|
+
## What This Workspace Is
|
|
4
|
+
|
|
5
|
+
You are a coding agent tasked with building a document verification app for the developer user's specific business scenario. The meta skills in `skills/` encode the methodology of experienced verification system architects and business analysts. You bring the intelligence and judgment to apply this methodology to the specific case at hand.
|
|
6
|
+
|
|
7
|
+
Your goal: build a verification system that starts with you doing the work, then gradually distills your capability into cheap, fast workflows powered by worker LLMs. You are the ground truth. The workflows you create are the deliverables.
|
|
8
|
+
|
|
9
|
+
## Roles
|
|
10
|
+
|
|
11
|
+
- **Developer user**: The human you serve. They are a domain expert (e.g., tech lead at a bank's loan department). They provide the rules, the documents, and the business context. Discuss decisions with them.
|
|
12
|
+
- **You (the coding agent)**: You are both the Builder (creating skills and workflows) and the Observer (judging quality). You do the verification first, prove it works, then teach smaller models to replicate your results.
|
|
13
|
+
- **Worker LLMs**: The performers. Models configured in `.env` (TIER1 through TIER4) that will execute the workflows you build. Your job is to find the smallest model that works for each task.
|
|
14
|
+
|
|
15
|
+
## Workspace Layout
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
Rules/ — Regulation documents, compliance notes from the developer user
|
|
19
|
+
Samples/ — Sample documents for testing (your training set)
|
|
20
|
+
Input/ — Production document batches awaiting verification
|
|
21
|
+
Output/ — Verification results
|
|
22
|
+
skills/ — Meta skills encoding verification methodology
|
|
23
|
+
.env — Configuration: API keys, model tiers, thresholds, language
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
## Your Mission
|
|
27
|
+
|
|
28
|
+
Follow this lifecycle. Each step references the skill(s) to consult:
|
|
29
|
+
|
|
30
|
+
1. **Bootstrap** → Read `bootstrap-workspace`. Understand the business scenario, read Rules/, scan Samples/, configure .env with the developer user.
|
|
31
|
+
2. **Extract Rules** → Read `rule-extraction`. Decompose regulation documents into atomic, testable verification rules.
|
|
32
|
+
3. **Decompose Tasks** → Read `task-decomposition`. For each rule, break the verification into sub-tasks and assign the optimal method (rule, code, LLM, or manual) to each.
|
|
33
|
+
4. **Map Rule Relationships** → Read `rule-graph`. Identify shared entities, dependencies, and conflicts between rules. Each rule stays independently executable.
|
|
34
|
+
5. **Write Rule Skills** → Read `skill-authoring`. Write each rule into a skill folder. Before writing extraction logic for a new document type, consult `data-sensibility` to observe the data first.
|
|
35
|
+
6. **Test Skills** → Apply each skill to Samples/. Use `evolution-loop` to diagnose failures and iterate. Continue until accuracy meets SKILL_ACCURACY threshold in .env.
|
|
36
|
+
7. **Distill to Workflows** → Read `skill-to-workflow`. Convert proven skills into Python code + worker LLM prompts. Test workflows against your own results as ground truth. Iterate until WORKFLOW_ACCURACY is met.
|
|
37
|
+
8. **Production QC** → Read `quality-control` and `confidence-system`. Run workflows on Input/. Sample and review results based on confidence scores. For multi-document cases, read `cross-document-verification`. Use `evolution-loop` when quality drops.
|
|
38
|
+
9. **Stabilize** → Gradually reduce monitoring as workflows prove reliable. Only intervene when rules change or quality drops.
|
|
39
|
+
10. **Report** → Read `dashboard-reporting`. Generate HTML dashboards so the developer user can see results, progress, and issues. Ensure dashboards include feedback collection mechanisms for users.
|
|
40
|
+
|
|
41
|
+
Throughout: use `version-control` to track all changes. Use `corner-case-management` to handle edge cases without polluting workflows. Use `task-decomposition` and `rule-graph` to inform optimization decisions.
|
|
42
|
+
|
|
43
|
+
## Core Principles
|
|
44
|
+
|
|
45
|
+
- **Minimum viable model**: Always use the smallest, cheapest, fastest model that meets the accuracy threshold. Start simple, escalate only when necessary.
|
|
46
|
+
- **JIT structure**: Do not design schemas or formats prematurely. Define them when needed, keep them consistent once defined.
|
|
47
|
+
- **OTF evolution**: The system you build today may look completely different tomorrow. Embrace change.
|
|
48
|
+
- **Skills before workflows**: Prove each rule works as a skill (you executing it) before distilling into code + worker LLM prompts.
|
|
49
|
+
- **Log everything**: Every test iteration, every evolution decision, every version change. Both JSON (machine-readable) and plain text (human-readable).
|
|
50
|
+
|
|
51
|
+
## How to Read Skills
|
|
52
|
+
|
|
53
|
+
Skills use progressive disclosure:
|
|
54
|
+
1. **Frontmatter** (name + description) — always visible, ~100 words. Tells you WHEN to use the skill.
|
|
55
|
+
2. **SKILL.md body** — read when the skill is relevant. Under 500 lines. Conveys methodology, not recipes.
|
|
56
|
+
3. **references/** — read on demand for detailed technical reference.
|
|
57
|
+
4. **scripts/** — executable code you can run or adapt.
|
|
58
|
+
5. **assets/** — data files, templates, examples.
|
|
59
|
+
|
|
60
|
+
Skills convey philosophy and decision frameworks. Adapt them to the specific business case. Do not follow them rigidly.
|
|
61
|
+
|
|
62
|
+
## Communication with Developer User
|
|
63
|
+
|
|
64
|
+
- **Proactively discuss**: rule granularity, accuracy thresholds, model selection, edge cases.
|
|
65
|
+
- **Report progress**: after each testing round, share results and next steps.
|
|
66
|
+
- **Escalate**: when you cannot resolve an issue after iterating, surface it with evidence.
|
|
67
|
+
- **Ask**: the developer user is a domain expert. When in doubt about a rule's intent, ask.
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
# KC Reborn — 文档核查工作区
|
|
72
|
+
|
|
73
|
+
## 这是什么
|
|
74
|
+
|
|
75
|
+
你是一个编程智能体,负责为开发者用户的具体业务场景构建文档核查应用。`skills/` 中的元技能编码了资深核查系统架构师和业务分析师的方法论。你负责运用智慧和判断力,将这些方法论应用到具体场景中。
|
|
76
|
+
|
|
77
|
+
你的目标:构建一个核查系统,先由你亲自执行核查工作,然后逐步将你的能力蒸馏为由 Worker LLM(执行模型)驱动的低成本、高速度的工作流。你是基准真值。你创建的工作流是最终交付物。
|
|
78
|
+
|
|
79
|
+
## 角色定义
|
|
80
|
+
|
|
81
|
+
- **开发者用户**:你服务的人。他们是领域专家(如银行信贷部门的技术负责人)。他们提供规则、文档和业务背景。与他们讨论决策。
|
|
82
|
+
- **你(编程智能体)**:你既是构建者(创建技能和工作流),也是观察者(评判质量)。你先执行核查,证明方法可行,再教小模型复现你的结果。
|
|
83
|
+
- **Worker LLM**:执行者。在 `.env` 中配置的模型(TIER1到TIER4),将执行你构建的工作流。你的任务是为每项工作找到能胜任的最小模型。
|
|
84
|
+
|
|
85
|
+
## 工作区结构
|
|
86
|
+
|
|
87
|
+
```
|
|
88
|
+
Rules/ — 法规文件、开发者用户的合规注释
|
|
89
|
+
Samples/ — 用于测试的样本文件(你的训练集)
|
|
90
|
+
Input/ — 等待核查的生产批次文件
|
|
91
|
+
Output/ — 核查结果
|
|
92
|
+
skills/ — 编码核查方法论的元技能
|
|
93
|
+
.env — 配置:API密钥、模型层级、阈值、语言
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
## 你的使命
|
|
97
|
+
|
|
98
|
+
遵循以下生命周期。每一步标注了需要参考的技能:
|
|
99
|
+
|
|
100
|
+
1. **初始化** → 阅读 `bootstrap-workspace`。理解业务场景,阅读 Rules/,浏览 Samples/,与开发者用户配置 .env。
|
|
101
|
+
2. **提取规则** → 阅读 `rule-extraction`。将法规文件分解为原子级、可测试的核查规则。
|
|
102
|
+
3. **任务分解** → 阅读 `task-decomposition`。对每条规则,将核查过程拆解为子任务,为每个子任务分配最优方法(规则、代码、LLM 或人工)。
|
|
103
|
+
4. **构建规则图谱** → 阅读 `rule-graph`。识别规则间的共享实体、依赖关系和潜在冲突。每条规则保持独立可执行。
|
|
104
|
+
5. **编写规则技能** → 阅读 `skill-authoring`。将每条规则写入技能文件夹。编写新文档类型的提取逻辑前,先阅读 `data-sensibility` 观察数据。
|
|
105
|
+
6. **测试技能** → 在 Samples/ 上应用每个技能。使用 `evolution-loop` 诊断失败并迭代。直到准确率达到 .env 中的 SKILL_ACCURACY 阈值。
|
|
106
|
+
7. **蒸馏为工作流** → 阅读 `skill-to-workflow`。将验证过的技能转化为 Python 代码 + Worker LLM 提示词。用你自己的结果作为基准测试工作流。迭代直到达到 WORKFLOW_ACCURACY。
|
|
107
|
+
8. **生产质控** → 阅读 `quality-control` 和 `confidence-system`。在 Input/ 上运行工作流。根据置信度分数抽样审查结果。涉及多文档案件时,阅读 `cross-document-verification`。质量下降时使用 `evolution-loop`。
|
|
108
|
+
9. **稳定运行** → 随着工作流稳定,逐步降低监控频率。仅在规则变更或质量下降时介入。
|
|
109
|
+
10. **报告** → 阅读 `dashboard-reporting`。生成 HTML 仪表板,让开发者用户直观地看到结果、进度和问题。确保仪表盘内置用户反馈收集机制。
|
|
110
|
+
|
|
111
|
+
全程使用 `version-control` 跟踪所有变更。使用 `corner-case-management` 处理边缘案例,不要污染主工作流。使用 `task-decomposition` 和 `rule-graph` 指导优化决策。
|
|
112
|
+
|
|
113
|
+
## 核心原则
|
|
114
|
+
|
|
115
|
+
- **最小可用模型**:始终使用能达到准确率阈值的最小、最便宜、最快的模型。从简单开始,必要时才升级。
|
|
116
|
+
- **即时结构(JIT)**:不要过早设计数据结构或格式。需要时定义,定义后保持一致。
|
|
117
|
+
- **即时演进(OTF)**:你今天构建的系统明天可能面目全非。拥抱变化。
|
|
118
|
+
- **先技能后工作流**:先证明每条规则作为技能(你执行)可行,再蒸馏为代码 + Worker LLM 提示词。
|
|
119
|
+
- **记录一切**:每次测试迭代、每个演进决策、每次版本变更。同时保存 JSON(机器可读)和纯文本(人类可读)。
|
|
120
|
+
|
|
121
|
+
## 如何阅读技能
|
|
122
|
+
|
|
123
|
+
技能采用渐进式披露:
|
|
124
|
+
1. **前置元数据**(名称 + 描述)— 始终可见,约100字。告诉你何时使用该技能。
|
|
125
|
+
2. **SKILL.md 正文** — 技能相关时阅读。500行以内。传达方法论,而非配方。
|
|
126
|
+
3. **references/** — 按需阅读,获取详细技术参考。
|
|
127
|
+
4. **scripts/** — 可执行代码,可直接运行或修改。
|
|
128
|
+
5. **assets/** — 数据文件、模板、示例。
|
|
129
|
+
|
|
130
|
+
技能传达的是理念和决策框架。请根据具体业务场景灵活运用,不要机械照搬。
|
|
131
|
+
|
|
132
|
+
## 与开发者用户的沟通
|
|
133
|
+
|
|
134
|
+
- **主动讨论**:规则粒度、准确率阈值、模型选择、边缘案例。
|
|
135
|
+
- **汇报进度**:每轮测试后,分享结果和下一步计划。
|
|
136
|
+
- **升级问题**:迭代后仍无法解决的问题,附带证据提交给开发者用户。
|
|
137
|
+
- **多问**:开发者用户是领域专家。对规则意图有疑问时,问他们。
|
|
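package/template/Input/.gitkeep
ADDED
|
File without changes
|
|
package/template/Output/.gitkeep
ADDED
|
File without changes
|
|
package/template/Rules/.gitkeep
ADDED
|
File without changes
|
|
package/template/Samples/.gitkeep
ADDED
|
File without changes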
|
package/template/skills/en/meta/compliance-judgment/SKILL.md
ADDED
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: compliance-judgment
|
|
3
|
+
description: Determine whether extracted entities comply with verification rules. Use after entity extraction to make the pass/fail judgment for each rule on each document. Covers translating natural language rules into executable logic, choosing between Python calculation and LLM semantic judgment, and producing actionable comments on failures. Also use when designing the judgment step of a workflow or when a rule's judgment logic needs debugging.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Compliance Judgment
|
|
7
|
+
|
|
8
|
+
Judgment is the moment of truth. You have the extracted entity. You have the rule. Do they comply? The answer must be clear, correct, and — when the answer is no — accompanied by a concise, actionable comment.
|
|
9
|
+
|
|
10
|
+
## The Judgment Spectrum
|
|
11
|
+
|
|
12
|
+
Rules fall on a spectrum from fully deterministic to fully semantic:
|
|
13
|
+
|
|
14
|
+
### Deterministic Judgments (Use Python)
|
|
15
|
+
|
|
16
|
+
Rules with clear, computable criteria:
|
|
17
|
+
|
|
18
|
+
- **Threshold checks**: "The capital adequacy ratio must be >= 8%."
|
|
19
|
+
```python
|
|
20
|
+
result = "pass" if extracted_ratio >= 8.0 else "fail"
|
|
21
|
+
```
|
|
22
|
+
- **Format validation**: "The loan number must match pattern XX-YYYY-ZZZZZZ."
|
|
23
|
+
```python
|
|
24
|
+
result = "pass" if re.fullmatch(r"[A-Z]{2}-\d{4}-\d{6}", loan_number) else "fail"
|
|
25
|
+
```
|
|
26
|
+
- **Date arithmetic**: "The contract must be signed within 30 days of application."
|
|
27
|
+
```python
|
|
28
|
+
result = "pass" if (sign_date - app_date).days <= 30 else "fail"
|
|
29
|
+
```
|
|
30
|
+
- **Cross-field consistency**: "The total must equal the sum of line items."
|
|
31
|
+
```python
|
|
32
|
+
result = "pass" if abs(total - sum(items)) < 0.01 else "fail"
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
These are best implemented as pure Python. They are free, instant, and deterministic. When possible, prefer this form.
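
The four checks above could be collected into small helper functions, roughly as follows. This is a minimal sketch rather than code from the package: the function names, the default values, the `tolerance` parameter, and the decision to reject a signing date earlier than the application date are all assumptions made for illustration. The format check uses `re.fullmatch` so that trailing characters after an otherwise valid loan number count as a failure.

```python
import re
from datetime import date

# Pattern taken from the format-validation example: XX-YYYY-ZZZZZZ.
LOAN_NUMBER_PATTERN = re.compile(r"[A-Z]{2}-\d{4}-\d{6}")


def judge(condition: bool) -> str:
    """Map a boolean check onto the pass/fail vocabulary."""
    return "pass" if condition else "fail"


def check_threshold(extracted_ratio: float, minimum: float = 8.0) -> str:
    """Threshold check: the ratio must be at least `minimum` percent."""
    return judge(extracted_ratio >= minimum)


def check_loan_number(loan_number: str) -> str:
    """Format validation: the whole string must match the pattern."""
    return judge(LOAN_NUMBER_PATTERN.fullmatch(loan_number) is not None)


def check_signing_window(app_date: date, sign_date: date, max_days: int = 30) -> str:
    """Date arithmetic: signed within `max_days` days of application.

    Assumption: a signature dated before the application is a failure.
    """
    elapsed = (sign_date - app_date).days
    return judge(0 <= elapsed <= max_days)


def check_total(total: float, items: list[float], tolerance: float = 0.01) -> str:
    """Cross-field consistency: total equals the sum of line items."""
    return judge(abs(total - sum(items)) < tolerance)
```

Keeping each check a pure function of already-extracted values makes it easy to unit-test against Samples/ without involving any model.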

### Semantic Judgments (Use LLM)

Rules requiring language understanding:

- **Adequacy**: "The risk disclosure must adequately describe the key risks."
- **Completeness**: "The management discussion must address financial performance, strategic outlook, and market conditions."
- **Consistency**: "The executive summary must be consistent with the detailed findings."
- **Compliance with template**: "The report must follow the format specified in Regulation Appendix A."

For these, design an LLM prompt:
1. Provide the rule text (what constitutes compliance).
2. Provide the extracted content (what the document says).
3. Ask for a structured verdict: pass/fail, reasoning, and comment.
4. Ask the model to be conservative: flag as fail only when clearly non-compliant. When truly ambiguous, use an `uncertain` result rather than a hard fail.
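
The four prompt-design steps above can be sketched as a prompt builder plus a reply validator. This is only an illustration: the template wording, the function names, and the JSON reply shape are assumptions, and the actual model call is deliberately left out because it depends on the client in use.

```python
import json

# Assumed verdict vocabulary for semantic rules (a subset of the result values).
ALLOWED_RESULTS = ("pass", "fail", "uncertain")

# Hypothetical template: rule text, extracted content, conservative instruction,
# and a request for a structured verdict.
PROMPT_TEMPLATE = """You are checking one compliance rule against one document excerpt.

Rule:
{rule_text}

Extracted content:
{extracted_content}

Be conservative: answer "fail" only when the content is clearly non-compliant.
If the evidence is ambiguous, answer "uncertain".

Reply with JSON only, using exactly these keys:
{{"result": "pass" | "fail" | "uncertain", "reasoning": "...", "comment": "..."}}
"""

def build_judgment_prompt(rule_text: str, extracted_content: str) -> str:
    return PROMPT_TEMPLATE.format(rule_text=rule_text, extracted_content=extracted_content)

def parse_verdict(raw_reply: str) -> dict:
    """Validate the model reply; a malformed reply becomes `error`, never `fail`."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return {"result": "error", "reasoning": "", "comment": "Reply was not valid JSON."}
    if not isinstance(data, dict) or data.get("result") not in ALLOWED_RESULTS:
        return {"result": "error", "reasoning": "", "comment": "Reply had no recognised result."}
    return {
        "result": data["result"],
        "reasoning": str(data.get("reasoning", "")),
        "comment": str(data.get("comment", "")),
    }
```

Routing unparseable replies to `error` keeps a model formatting problem from being reported as a compliance failure.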

### Hybrid Judgments (Most Common)

Most rules combine deterministic and semantic elements:
- Extract the number (regex) → compare to threshold (Python) → if borderline, assess the explanation (LLM).
- Check that a section exists (deterministic) → check that it covers required topics (semantic).

Design the pipeline to run cheap steps first. Only invoke the LLM when the deterministic check is insufficient.
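
A minimal sketch of the cheap-steps-first idea follows, assuming a hypothetical `semantic_judge` callable that wraps whatever LLM call the workflow uses. The regex, the 8.0 threshold, and the 0.5-point borderline margin are illustrative choices, not values taken from this package.

```python
import re
from typing import Callable

# Assumed pattern: the first percentage figure in the excerpt is the ratio.
RATIO_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%")

def judge_ratio(
    text: str,
    semantic_judge: Callable[[str], str],  # hypothetical LLM-backed fallback
    threshold: float = 8.0,
    margin: float = 0.5,
) -> str:
    # Step 1 (regex): extract the number, or report that it is absent.
    match = RATIO_PATTERN.search(text)
    if match is None:
        return "missing"
    ratio = float(match.group(1))
    # Step 2 (Python): clear-cut cases are decided without any model call.
    if ratio >= threshold + margin:
        return "pass"
    if ratio < threshold - margin:
        return "fail"
    # Step 3 (LLM): only borderline values reach the expensive step.
    return semantic_judge(text)
```

With this shape, the number of LLM calls scales with the number of borderline documents rather than with the size of the corpus.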

## Output Format

For each rule × document combination:

```json
{
  "rule_id": "R001",
  "document": "report_2024_q1.pdf",
  "result": "pass | fail | missing | error | uncertain",
  "extracted_value": "12.5%",
  "expected": ">= 8.0%",
  "comment": "",
  "confidence": 0.95
}
```

**Result values:**
- **pass**: Entity complies with the rule.
- **fail**: Entity does not comply. Comment is required.
- **missing**: The entity could not be found in the document. This is different from fail: the information is absent, not non-compliant.
- **error**: Something went wrong during extraction or judgment (parsing failure, API error). Needs investigation.
- **uncertain**: The judgment is ambiguous. May need human review.

**Comments:**
- Required only when result is `fail`. Skip for `pass` unless the developer user specifically requests pass comments.
- Be concise and factual: "Capital adequacy ratio is 7.2%, below the regulatory minimum of 8.0%."
- Do not editorialize: not "This is a serious violation that could result in penalties." Just state the facts.
- Include the extracted value and the expected value/condition for context.
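
One way to enforce the result vocabulary and the comment rule is to validate each record when it is created. The sketch below is an assumption-laden illustration: the `Judgment` class name is hypothetical, and the 0 to 1 confidence range is inferred from the `0.95` in the example rather than stated by the source.

```python
import json
from dataclasses import dataclass, asdict

RESULT_VALUES = ("pass", "fail", "missing", "error", "uncertain")

@dataclass
class Judgment:  # hypothetical container mirroring the JSON fields above
    rule_id: str
    document: str
    result: str
    extracted_value: str = ""
    expected: str = ""
    comment: str = ""
    confidence: float = 1.0

    def __post_init__(self) -> None:
        if self.result not in RESULT_VALUES:
            raise ValueError(f"unknown result: {self.result!r}")
        # A fail without a comment is rejected at construction time.
        if self.result == "fail" and not self.comment.strip():
            raise ValueError("a fail result requires a comment")
        # Assumed range; the source only shows values such as 0.95.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)
```

Rejecting malformed records early means downstream reporting never has to guess why a `fail` carries no explanation.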

### Lightweight Annotation Markup

For human review, token-efficient logging, and clean diff comparisons, results can also be expressed in compact text markup:

```
[PASS] capital_adequacy <- 12.5% (>= 8.0%) | conf:0.95 | src:p3-s2
[FAIL] sign_date_gap <- 75d (<= 30d) | conf:0.90 | src:p1-s4 | note:Signing overdue by 45 days
[MISSING] collateral_value | conf:0.60 | note:Collateral valuation not found in document
```

This format is losslessly convertible to and from the JSON format above. Use it when presenting results to the developer user for quick review, logging to evolution iteration summaries where token economy matters, or computing diffs between verification runs. See `references/output-format.md` for the full specification and conversion rules.
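
As a rough illustration, a renderer for the markup could look like the sketch below. It is inferred only from the three example lines above, not from `references/output-format.md`, so the field order, the two-decimal confidence, and the `to_markup` name and parameters are all assumptions.

```python
from typing import Optional

def to_markup(
    result: str,
    name: str,
    confidence: float,
    extracted_value: Optional[str] = None,
    expected: Optional[str] = None,
    source: Optional[str] = None,
    note: Optional[str] = None,
) -> str:
    """Render one judgment as a single markup line (hypothetical helper)."""
    head = f"[{result.upper()}] {name}"
    if extracted_value is not None:
        head += f" <- {extracted_value}"
        if expected is not None:
            head += f" ({expected})"
    # Optional segments are appended only when present, as in the examples.
    parts = [head, f"conf:{confidence:.2f}"]
    if source:
        parts.append(f"src:{source}")
    if note:
        parts.append(f"note:{note}")
    return " | ".join(parts)

print(to_markup("fail", "sign_date_gap", 0.90, "75d", "<= 30d", "p1-s4",
                "Signing overdue by 45 days"))
```

A note that itself contains ` | ` would make the line ambiguous to parse back; the full specification should be consulted for how such cases are escaped.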

## Judgment Ordering

Some rules depend on the results of other rules:
- Rule B might only apply if Rule A passes. "If the borrower is a new customer (Rule A), then additional documentation is required (Rule B)."
- Rule C might use a value computed by Rule A. "The risk-weighted capital ratio (Rule A) determines the required reserve level (Rule C)."

Map these dependencies in the rule catalog. Execute rules in dependency order. Pass upstream results as context to downstream rules.
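
Dependency-ordered execution can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The shapes of `rules` and `dependencies` here are assumptions made for the example; the actual rule catalog format is not shown in this section.

```python
from graphlib import TopologicalSorter

def run_in_dependency_order(rules: dict, dependencies: dict) -> dict:
    """Run every rule after the rules it depends on.

    rules:        rule_id -> callable(upstream_results: dict) -> result string
    dependencies: rule_id -> set of rule_ids it depends on
    Every dependency must itself be a key of `rules` (assumed, not checked).
    """
    graph = {rule_id: dependencies.get(rule_id, set()) for rule_id in rules}
    results = {}
    # static_order() yields each rule only after all of its dependencies
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for rule_id in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[rule_id]}
        results[rule_id] = rules[rule_id](upstream)
    return results

# Usage with toy rules: B only applies when A passes; C reads A's result.
toy_rules = {
    "B": lambda up: "fail" if up["A"] == "pass" else "pass",
    "A": lambda up: "pass",
    "C": lambda up: "pass" if up["A"] == "pass" else "uncertain",
}
toy_results = run_in_dependency_order(toy_rules, {"B": {"A"}, "C": {"A"}})
```

Each rule receives only the results of its declared dependencies, which keeps hidden coupling between rules out of the pipeline.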

## Handling Edge Cases

- **Null extraction**: The entity was not found. Default to `missing`, not `fail`. A missing value is an extraction problem, not a compliance problem.
- **Multiple values**: The document contains the entity in multiple places with different values. Flag as `uncertain`. Report all found values.
- **Conditional rules**: "If the loan exceeds 1M, then collateral is required." Check the condition before applying the rule. If the condition is not met, the rule does not apply, so the result is `pass` (or `not_applicable` if you add that category).
- **Negative results**: Some rules check for absence. "The document must NOT contain guarantees to related parties." Searching for absence is harder than searching for presence. Be thorough in the search, then be confident in the negative.
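
The first three edge cases above can be handled by a small guard placed in front of the actual check. The sketch below is illustrative: the function name, its parameters, and the returned dictionary shape are assumptions, and the negative-result case is omitted because it depends on how the document search is implemented.

```python
from typing import Callable, Sequence

def judge_with_edge_cases(
    values: Sequence[str],          # every value found for the entity
    check: Callable[[str], bool],   # the rule's pass/fail test
    fail_comment: str,              # factual comment to attach on fail
    applies: bool = True,           # outcome of the rule's precondition
) -> dict:
    # Conditional rule whose condition is not met: the rule does not apply.
    if not applies:
        return {"result": "pass", "comment": "", "values": []}
    distinct = sorted(set(values))
    # Null extraction: an extraction problem, not a compliance problem.
    if not distinct:
        return {"result": "missing", "comment": "", "values": []}
    # Conflicting values: report all of them and defer to review.
    if len(distinct) > 1:
        return {
            "result": "uncertain",
            "comment": "Conflicting values found: " + ", ".join(distinct),
            "values": distinct,
        }
    ok = check(distinct[0])
    return {
        "result": "pass" if ok else "fail",
        "comment": "" if ok else fail_comment,
        "values": distinct,
    }
```

Repeated occurrences of the same value collapse to one, so only genuinely conflicting extractions are flagged as `uncertain`.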