kc-beta 0.1.2 → 0.3.0

Files changed (55)
  1. package/bin/kc-beta.js +14 -2
  2. package/package.json +1 -1
  3. package/src/agent/context-window.js +151 -0
  4. package/src/agent/context.js +8 -4
  5. package/src/agent/engine.js +261 -8
  6. package/src/agent/event-log.js +111 -0
  7. package/src/agent/llm-client.js +352 -59
  8. package/src/agent/pipelines/base.js +6 -0
  9. package/src/agent/pipelines/distillation.js +18 -0
  10. package/src/agent/pipelines/extraction.js +21 -0
  11. package/src/agent/pipelines/initializer.js +75 -14
  12. package/src/agent/pipelines/production-qc.js +19 -0
  13. package/src/agent/pipelines/skill-authoring.js +14 -0
  14. package/src/agent/pipelines/skill-testing.js +20 -0
  15. package/src/agent/retry.js +83 -0
  16. package/src/agent/session-state.js +79 -0
  17. package/src/agent/skill-loader.js +13 -1
  18. package/src/agent/token-counter.js +62 -0
  19. package/src/agent/tools/document-parse.js +104 -21
  20. package/src/agent/tools/document-search.js +24 -8
  21. package/src/agent/tools/sandbox-exec.js +16 -5
  22. package/src/agent/tools/web-search.js +107 -0
  23. package/src/agent/tools/worker-llm-call.js +14 -5
  24. package/src/agent/tools/workspace-file.js +47 -20
  25. package/src/agent/workspace.js +24 -1
  26. package/src/cli/components.js +24 -5
  27. package/src/cli/config.js +340 -0
  28. package/src/cli/index.js +113 -11
  29. package/src/cli/onboard.js +216 -53
  30. package/src/config.js +63 -10
  31. package/src/model-tiers.json +153 -0
  32. package/src/providers.js +367 -0
  33. package/template/AGENT.md +20 -0
  34. package/template/skills/en/meta/compliance-judgment/SKILL.md +10 -42
  35. package/template/skills/en/meta/document-chunking/SKILL.md +32 -0
  36. package/template/skills/en/meta/document-parsing/SKILL.md +11 -18
  37. package/template/skills/en/meta/entity-extraction/SKILL.md +13 -28
  38. package/template/skills/en/meta/tree-processing/SKILL.md +19 -1
  39. package/template/skills/en/meta-meta/auto-model-selection/SKILL.md +53 -0
  40. package/template/skills/en/meta-meta/pdf-review-dashboard/SKILL.md +57 -0
  41. package/template/skills/en/meta-meta/pdf-review-dashboard/scripts/generate_review.js +262 -0
  42. package/template/skills/en/meta-meta/rule-extraction/SKILL.md +24 -1
  43. package/template/skills/en/meta-meta/skill-authoring/SKILL.md +6 -0
  44. package/template/skills/en/meta-meta/skill-to-workflow/SKILL.md +4 -0
  45. package/template/skills/zh/meta/compliance-judgment/SKILL.md +41 -262
  46. package/template/skills/zh/meta/document-chunking/SKILL.md +32 -0
  47. package/template/skills/zh/meta/document-parsing/SKILL.md +65 -132
  48. package/template/skills/zh/meta/entity-extraction/SKILL.md +68 -230
  49. package/template/skills/zh/meta/tree-processing/SKILL.md +82 -194
  50. package/template/skills/zh/meta-meta/auto-model-selection/SKILL.md +51 -0
  51. package/template/skills/zh/meta-meta/pdf-review-dashboard/SKILL.md +55 -0
  52. package/template/skills/zh/meta-meta/pdf-review-dashboard/scripts/generate_review.js +262 -0
  53. package/template/skills/zh/meta-meta/rule-extraction/SKILL.md +79 -164
  54. package/template/skills/zh/meta-meta/skill-authoring/SKILL.md +64 -185
  55. package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md +95 -216
@@ -0,0 +1,53 @@
+ ---
+ name: auto-model-selection
+ description: >
+   Use Context7 CLI to get up-to-date LLM model information. Use whenever you need to
+   know about available models, model capabilities, pricing, context window sizes, or
+   which model is suitable for a task — including tier assignment, worker LLM workflow
+   design, model comparison, and provider-specific API usage. Context7 gives you current
+   information that your training data may not have. Requires context7 CLI installed
+   (npm i -g context7). Optional plugin.
+ ---
+
+ # Auto Model Selection via Context7
+
+ ## What Context7 Is
+
+ Context7 (`c7`) is a lightweight CLI tool that fetches up-to-date documentation for libraries and APIs. Install: `npm i -g context7`. Two commands:
+ - `c7 library <query>` — search for a library/provider by name
+ - `c7 docs <libraryId> <query>` — get specific documentation and code examples
+
+ ## When to Use
+
+ - User's `model-tiers.json` is outdated (KC hasn't been updated)
+ - User switched to a new provider and needs model discovery
+ - User explicitly asks to update model selections
+ - Onboarding `/models` endpoint failed and curated list is stale
+
+ ## How It Works
+
+ 1. User chooses provider and provides API key (or coding plan)
+ 2. Use `c7 library <provider-name>` to find the provider's library ID
+ 3. Use `c7 docs <id> "available models"` to get current model listings
+ 4. From the docs, identify: model names, capabilities (reasoning, coding, vision), context window sizes, pricing tiers
+ 5. Assign models to tiers based on capability and cost:
+    - LLM tier1: most capable (complex judgment, extraction)
+    - LLM tier2-3: mid-range (routine extraction, simple judgment)
+    - LLM tier4: cheapest (high-volume simple tasks)
+    - VLM tier1-3: vision models for document parsing/OCR
+ 6. Update `model-tiers.json` or workspace `.env` with assignments
+
+ ## Tier Assignment Principles
+
+ - Cheapest model that meets accuracy threshold for the task
+ - Regex is tier0 — smaller than any LLM
+ - Not all tiers need to be filled — blank tiers are fine if the provider lacks suitable models
+ - Record what works in AGENT.md for future reference
+
+ ## Prerequisites
+
+ ```bash
+ npm i -g context7
+ ```
+
+ Verify: `c7 library openai` should return results.
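The "cheapest model that meets accuracy threshold" principle in the skill above can be sketched as a small selection function. This is a minimal illustration and not code from the package: the `candidates` shape, the `accuracy` and `costPer1kTokens` fields, and the model names are all hypothetical, and the accuracy figures would have to come from your own evaluation runs.

```javascript
// Hypothetical candidates; names, accuracy and cost are invented for illustration.
const candidates = [
  { model: "provider-large", accuracy: 0.97, costPer1kTokens: 0.01 },
  { model: "provider-medium", accuracy: 0.93, costPer1kTokens: 0.002 },
  { model: "provider-small", accuracy: 0.81, costPer1kTokens: 0.0004 },
];

// Return the cheapest candidate whose measured accuracy meets the threshold,
// or null when none qualifies (the skill allows a tier to stay blank).
function pickCheapestModel(models, minAccuracy) {
  const qualifying = models.filter((m) => m.accuracy >= minAccuracy);
  if (qualifying.length === 0) return null;
  return qualifying.reduce((best, m) =>
    m.costPer1kTokens < best.costPer1kTokens ? m : best
  );
}

console.log(pickCheapestModel(candidates, 0.9).model); // provider-medium
console.log(pickCheapestModel(candidates, 0.99)); // null
```

Returning `null` instead of falling back to the most expensive model keeps "no suitable model" visible to the caller, which matches the note that blank tiers are fine.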
@@ -0,0 +1,57 @@
+ ---
+ name: pdf-review-dashboard
+ description: >
+   Generate a two-column PDF review dashboard for manual verification result checking.
+   Left panel shows the original PDF document, right panel shows verification results.
+   Clicking a result jumps the PDF to the relevant page. Use this when the developer user
+   needs to visually compare verification outputs against source documents, or when
+   collecting ground truth for the evolution loop. Output is a single self-contained HTML file.
+ ---
+
+ ## What It Does
+
+ Generates a single self-contained HTML file that displays:
+ - Left: original PDF rendered in-browser
+ - Right: verification results in an interactive list
+ - Click-to-jump: selecting a result scrolls the PDF to the referenced page
+
+ The developer user opens this HTML in a browser to manually review verification quality.
+
+ ## Tech Stack
+
+ - Single HTML file, no server required
+ - PDF embedded as base64 (fully self-contained, shareable)
+ - pdf.js via CDN for in-browser PDF rendering
+ - Vanilla JS + inline CSS, no framework dependencies
+ - Dark theme consistent with KC dashboard style
+
+ ## Layout
+
+ - Resizable split pane with draggable divider
+ - Left: PDF viewer with page navigation (prev/next/go-to-page) and zoom controls (+/-/fit-width)
+ - Right: results list with filter buttons, click to expand details and jump to PDF page
+ - Page highlight animation on jump
+
+ ## Data Format
+
+ The generator script reads a PDF file and a results JSON, then produces the HTML.
+
+ Input to the script:
+ - `pdf_path` — path to the source PDF document
+ - `results_path` — path to a JSON file containing verification results
+
+ The results JSON is an array of objects. Each object should have at minimum:
+ - A page reference (which page in the PDF this result relates to)
+ - A result status (pass/fail/warning or equivalent)
+
+ The right panel columns and detail fields adapt to whatever data the verification workflow produces. The script in `scripts/generate_review.js` is a reference implementation — adapt the data mapping to match your project's output format.
+
+ ## When to Use
+
+ - After a verification workflow completes, to let the developer user visually audit results
+ - When collecting ground truth corrections for the evolution loop
+ - When presenting results to stakeholders who need to see source evidence
+
+ ## Generator Script
+
+ See `scripts/generate_review.js` — a Node.js script that takes a PDF path and outputs the review HTML. Adapt the results data mapping section to match your project's verification output format.
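To make the Data Format section concrete, here is a hedged sketch of a results array and a pre-flight check for the two minimum fields. The `page` and `result` keys are among those the reference script's mapping reads; the `rule` and `evidence` fields, the sample values, and the `findIncomplete` helper are hypothetical.

```javascript
// Sample results array. Only `page` and `result` matter for this check;
// `rule` and `evidence` are illustrative extras.
const sampleResults = [
  { id: "R001", rule: "Signature present", result: "pass", page: 3 },
  { id: "R002", rule: "Total matches line items", result: "fail", page: 7, evidence: "sum differs" },
  { id: "R003", rule: "Date within period", result: "warning" },
];

// Return the ids of items that lack a positive integer page or a string status.
function findIncomplete(items) {
  return items
    .filter((r) => !Number.isInteger(r.page) || r.page < 1 || typeof r.result !== "string")
    .map((r) => r.id);
}

console.log(findIncomplete(sampleResults)); // [ 'R003' ]
```

A check like this is useful because the reference mapping falls back to page 1 and status "unknown" when those fields are absent, so incomplete items would otherwise pass through silently.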
@@ -0,0 +1,262 @@
+ #!/usr/bin/env node
+ /**
+  * PDF Review Dashboard Generator
+  *
+  * Generates a single self-contained HTML file with:
+  *   - Left: PDF viewer (pdf.js CDN, base64 embedded)
+  *   - Right: interactive verification results list
+  *   - Click result → jump to PDF page
+  *
+  * Usage:
+  *   node generate_review.js <pdf_path> <results_json_path> [output_html_path]
+  *
+  * The results JSON should be an array of objects. Adapt the DATA MAPPING
+  * section below to match your project's verification output format.
+  */
+ import fs from "node:fs";
+ import path from "node:path";
+
+ const pdfPath = process.argv[2];
+ const resultsPath = process.argv[3];
+ const outputPath = process.argv[4] || "review_dashboard.html";
+
+ if (!pdfPath || !resultsPath) {
+   console.error("Usage: node generate_review.js <pdf_path> <results_json_path> [output_html_path]");
+   process.exit(1);
+ }
+
+ // Read inputs
+ const pdfBuffer = fs.readFileSync(pdfPath);
+ const pdfBase64 = pdfBuffer.toString("base64");
+ const pdfFileName = path.basename(pdfPath);
+ const rawResults = JSON.parse(fs.readFileSync(resultsPath, "utf-8"));
+
+ // ============================================================
+ // DATA MAPPING — adapt this section to your verification output
+ // ============================================================
+ // Map your raw results into the format the dashboard expects.
+ // Each item needs at minimum: id, label, result, page.
+ // Add any extra fields you want shown in the detail panel.
+ const results = Array.isArray(rawResults) ? rawResults : rawResults.results || [];
+ const mappedResults = results.map((r, i) => ({
+   id: r.id || r.rule_id || `R${String(i + 1).padStart(3, "0")}`,
+   label: r.rule || r.label || r.name || r.description || `Item ${i + 1}`,
+   result: r.result || r.status || "unknown",
+   confidence: r.confidence ?? r.score ?? null,
+   page: r.page || r.page_ref || 1,
+   // Detail fields — include whatever your workflow outputs
+   detail: r.detail || Object.fromEntries(
+     Object.entries(r).filter(([k]) => !["id","rule_id","rule","label","name","result","status","confidence","score","page","page_ref"].includes(k))
+   ),
+ }));
+ // ============================================================
+
+ console.log(`PDF: ${pdfFileName} (${(pdfBuffer.length / 1024 / 1024).toFixed(1)}MB)`);
+ console.log(`Results: ${mappedResults.length} items`);
+
+ // Generate HTML
+ const html = buildHTML(pdfBase64, pdfFileName, mappedResults);
+ fs.writeFileSync(outputPath, html, "utf-8");
+ console.log(`Output: ${outputPath} (${(Buffer.byteLength(html) / 1024 / 1024).toFixed(1)}MB)`);
+
+ function buildHTML(pdfB64, fileName, items) {
+   const resultsJSON = JSON.stringify(items);
+   return `<!DOCTYPE html>
+ <html lang="zh-CN">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>KC Review — ${fileName}</title>
+ <style>
+ * { margin: 0; padding: 0; box-sizing: border-box; }
+ :root {
+   --bg: #0a0a0a; --bg2: #141414; --bg3: #1e1e1e;
+   --text: #e5e5e5; --dim: #888; --border: #2a2a2a;
+   --green: #22c55e; --yellow: #eab308; --red: #ef4444;
+   --blue: #3b82f6; --orange: #f97316;
+ }
+ body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; background: var(--bg); color: var(--text); height: 100vh; overflow: hidden; }
+ #app { display: flex; height: 100vh; }
+ #pdf-panel { flex: 1; display: flex; flex-direction: column; border-right: 1px solid var(--border); min-width: 300px; }
+ #pdf-toolbar { display: flex; align-items: center; gap: 8px; padding: 8px 12px; background: var(--bg2); border-bottom: 1px solid var(--border); flex-shrink: 0; }
+ #pdf-toolbar button { background: var(--bg3); color: var(--text); border: 1px solid var(--border); border-radius: 4px; padding: 4px 10px; cursor: pointer; font-size: 13px; }
+ #pdf-toolbar button:hover { background: var(--border); }
+ #pdf-toolbar span { color: var(--dim); font-size: 13px; }
+ #pdf-toolbar input[type=number] { width: 50px; background: var(--bg3); color: var(--text); border: 1px solid var(--border); border-radius: 4px; padding: 4px; text-align: center; font-size: 13px; }
+ #pdf-container { flex: 1; overflow: auto; display: flex; flex-direction: column; align-items: center; padding: 16px; gap: 8px; }
+ .pdf-page-wrapper { position: relative; box-shadow: 0 2px 8px rgba(0,0,0,0.5); }
+ .pdf-page-wrapper canvas { display: block; }
+ .page-highlight { position: absolute; inset: 0; background: rgba(59,130,246,0.12); border: 2px solid var(--blue); pointer-events: none; opacity: 0; transition: opacity 0.3s; }
+ .page-highlight.active { opacity: 1; animation: pulse-border 1.5s ease-out; }
+ @keyframes pulse-border { 0% { border-color: var(--orange); box-shadow: 0 0 20px rgba(249,115,22,0.4); } 100% { border-color: var(--blue); box-shadow: none; } }
+ #drag-handle { width: 5px; background: var(--border); cursor: col-resize; flex-shrink: 0; transition: background 0.2s; }
+ #drag-handle:hover, #drag-handle.dragging { background: var(--blue); }
+ #results-panel { flex: 1; display: flex; flex-direction: column; min-width: 350px; }
+ #results-toolbar { display: flex; align-items: center; gap: 8px; padding: 8px 12px; background: var(--bg2); border-bottom: 1px solid var(--border); flex-shrink: 0; flex-wrap: wrap; }
+ #results-toolbar .filter-btn { background: var(--bg3); color: var(--dim); border: 1px solid var(--border); border-radius: 12px; padding: 3px 10px; cursor: pointer; font-size: 12px; transition: all 0.2s; }
+ #results-toolbar .filter-btn.active { color: var(--text); border-color: var(--blue); background: rgba(59,130,246,0.15); }
+ #results-toolbar .summary { margin-left: auto; font-size: 12px; color: var(--dim); }
+ #results-list { flex: 1; overflow: auto; }
+ .result-item { border-bottom: 1px solid var(--border); cursor: pointer; transition: background 0.15s; }
+ .result-item:hover { background: var(--bg3); }
+ .result-item.selected { background: rgba(59,130,246,0.1); border-left: 3px solid var(--blue); }
+ .result-row { display: flex; align-items: center; padding: 10px 12px; gap: 10px; }
+ .result-id { font-size: 11px; color: var(--dim); min-width: 40px; font-family: monospace; }
+ .result-label { flex: 1; font-size: 13px; }
+ .result-badge { font-size: 11px; font-weight: 600; padding: 2px 8px; border-radius: 10px; text-transform: uppercase; }
+ .badge-pass { background: rgba(34,197,94,0.15); color: var(--green); }
+ .badge-fail { background: rgba(239,68,68,0.15); color: var(--red); }
+ .badge-warning { background: rgba(234,179,8,0.15); color: var(--yellow); }
+ .badge-unknown { background: rgba(136,136,136,0.15); color: var(--dim); }
+ .result-confidence { font-size: 12px; color: var(--dim); min-width: 40px; text-align: right; }
+ .result-page { font-size: 11px; color: var(--dim); min-width: 30px; text-align: right; }
+ .result-detail { display: none; padding: 8px 12px 14px 62px; font-size: 12px; line-height: 1.6; color: var(--dim); border-top: 1px dashed var(--border); }
+ .result-item.expanded .result-detail { display: block; }
+ .detail-row { margin-bottom: 4px; }
+ .detail-key { color: var(--text); font-weight: 500; }
+ </style>
+ </head>
+ <body>
+ <div id="app">
+ <div id="pdf-panel">
+ <div id="pdf-toolbar">
+ <button onclick="prevPage()">◀</button>
+ <span>Page</span>
+ <input type="number" id="page-input" value="1" min="1" onchange="goToPage(this.value)">
+ <span id="page-count">/ ?</span>
+ <button onclick="nextPage()">▶</button>
+ <span style="margin-left:8px">|</span>
+ <button onclick="zoomOut()">−</button>
+ <span id="zoom-label">100%</span>
+ <button onclick="zoomIn()">+</button>
+ <button onclick="fitWidth()">Fit</button>
+ </div>
+ <div id="pdf-container"></div>
+ </div>
+ <div id="drag-handle"></div>
+ <div id="results-panel">
+ <div id="results-toolbar">
+ <span class="summary" id="results-summary"></span>
+ </div>
+ <div id="results-list"></div>
+ </div>
+ </div>
+ <script type="module">
+ const PDF_B64 = "${pdfB64}";
+ const RESULTS = ${resultsJSON};
+
+ // PDF setup
+ const pdfjsLib = await import("https://cdnjs.cloudflare.com/ajax/libs/pdf.js/4.10.38/pdf.min.mjs");
+ pdfjsLib.GlobalWorkerOptions.workerSrc = "https://cdnjs.cloudflare.com/ajax/libs/pdf.js/4.10.38/pdf.worker.min.mjs";
+ const pdfData = Uint8Array.from(atob(PDF_B64), c => c.charCodeAt(0));
+ const pdf = await pdfjsLib.getDocument({ data: pdfData }).promise;
+ const totalPages = pdf.numPages;
+ document.getElementById("page-count").textContent = "/ " + totalPages;
+ document.getElementById("page-input").max = totalPages;
+
+ let scale = 1.2, currentPage = 1;
+ const container = document.getElementById("pdf-container");
+ const pageCanvases = new Map();
+
+ async function renderAllPages() {
+   container.innerHTML = ""; pageCanvases.clear();
+   for (let i = 1; i <= totalPages; i++) {
+     const page = await pdf.getPage(i);
+     const vp = page.getViewport({ scale });
+     const w = document.createElement("div");
+     w.className = "pdf-page-wrapper"; w.id = "page-" + i;
+     w.style.width = vp.width + "px"; w.style.height = vp.height + "px";
+     const c = document.createElement("canvas");
+     c.width = vp.width; c.height = vp.height;
+     await page.render({ canvasContext: c.getContext("2d"), viewport: vp }).promise;
+     const hl = document.createElement("div"); hl.className = "page-highlight";
+     w.appendChild(c); w.appendChild(hl); container.appendChild(w);
+     pageCanvases.set(i, w);
+   }
+ }
+ await renderAllPages();
+
+ function goToPage(n) { n = Math.max(1, Math.min(parseInt(n)||1, totalPages)); currentPage = n; document.getElementById("page-input").value = n; const el = document.getElementById("page-"+n); if(el) el.scrollIntoView({behavior:"smooth",block:"start"}); }
+ function prevPage() { goToPage(currentPage-1); }
+ function nextPage() { goToPage(currentPage+1); }
+ function zoomIn() { scale = Math.min(scale+0.2, 3); updateZoom(); }
+ function zoomOut() { scale = Math.max(scale-0.2, 0.4); updateZoom(); }
+ function fitWidth() { pdf.getPage(1).then(p => { scale = (document.getElementById("pdf-panel").clientWidth-40)/p.getViewport({scale:1}).width; updateZoom(); }); }
+ function updateZoom() { document.getElementById("zoom-label").textContent = Math.round(scale*100)+"%"; renderAllPages(); }
+ window.goToPage=goToPage; window.prevPage=prevPage; window.nextPage=nextPage;
+ window.zoomIn=zoomIn; window.zoomOut=zoomOut; window.fitWidth=fitWidth;
+
+ // Detect unique result statuses for filter buttons
+ const statuses = [...new Set(RESULTS.map(r => r.result))];
+ const toolbar = document.getElementById("results-toolbar");
+ const filterHTML = '<button class="filter-btn active" data-filter="all">All</button>' +
+   statuses.map(s => '<button class="filter-btn" data-filter="'+s+'">'+s.charAt(0).toUpperCase()+s.slice(1)+'</button>').join("");
+ toolbar.insertAdjacentHTML("afterbegin", filterHTML);
+ let activeFilter = "all", selectedId = null;
+ toolbar.querySelectorAll(".filter-btn").forEach(b => b.addEventListener("click", () => {
+   activeFilter = b.dataset.filter;
+   toolbar.querySelectorAll(".filter-btn").forEach(x => x.classList.toggle("active", x.dataset.filter===activeFilter));
+   selectedId = null; renderResults();
+ }));
+
+ function renderResults() {
+   const list = document.getElementById("results-list");
+   const filtered = activeFilter === "all" ? RESULTS : RESULTS.filter(r => r.result === activeFilter);
+   const counts = statuses.map(s => RESULTS.filter(r=>r.result===s).length + " " + s).join(" · ");
+   document.getElementById("results-summary").textContent = counts;
+
+   list.innerHTML = filtered.map(r => {
+     const bc = ["pass","fail","warning"].includes(r.result) ? "badge-"+r.result : "badge-unknown";
+     const sel = r.id === selectedId ? " selected expanded" : "";
+     const conf = r.confidence != null ? Math.round(r.confidence*100)+"%" : "";
+     let detailHTML = "";
+     if (r.detail && typeof r.detail === "object") {
+       detailHTML = Object.entries(r.detail).map(([k,v]) =>
+         '<div class="detail-row"><span class="detail-key">'+k+': </span>'+String(v)+'</div>'
+       ).join("");
+     }
+     return '<div class="result-item'+sel+'" data-id="'+r.id+'" data-page="'+r.page+'">' +
+       '<div class="result-row">' +
+       '<span class="result-id">'+r.id+'</span>' +
+       '<span class="result-label">'+r.label+'</span>' +
+       '<span class="result-badge '+bc+'">'+r.result+'</span>' +
+       (conf ? '<span class="result-confidence">'+conf+'</span>' : '') +
+       '<span class="result-page">p.'+r.page+'</span>' +
+       '</div>' +
+       (detailHTML ? '<div class="result-detail">'+detailHTML+'</div>' : '') +
+       '</div>';
+   }).join("");
+
+   list.querySelectorAll(".result-item").forEach(el => el.addEventListener("click", () => {
+     const id = el.dataset.id, page = parseInt(el.dataset.page);
+     if (selectedId === id) { selectedId = null; el.classList.remove("selected","expanded"); }
+     else { list.querySelectorAll(".result-item").forEach(e=>e.classList.remove("selected","expanded")); selectedId = id; el.classList.add("selected","expanded"); }
+     jumpToPage(page);
+   }));
+ }
+
+ function jumpToPage(page) {
+   currentPage = page; document.getElementById("page-input").value = page;
+   const el = document.getElementById("page-"+page);
+   if(el) { el.scrollIntoView({behavior:"smooth",block:"center"});
+     const hl = el.querySelector(".page-highlight"); hl.classList.remove("active");
+     void hl.offsetWidth; hl.classList.add("active"); setTimeout(()=>hl.classList.remove("active"),2000); }
+ }
+ renderResults();
+
+ // Drag handle
+ const handle = document.getElementById("drag-handle");
+ let dragging = false;
+ handle.addEventListener("mousedown", e => { dragging=true; handle.classList.add("dragging"); e.preventDefault(); });
+ document.addEventListener("mousemove", e => { if(!dragging) return; const r=e.clientX/document.getElementById("app").clientWidth; const c=Math.max(0.2,Math.min(0.8,r)); document.getElementById("pdf-panel").style.flex="0 0 "+(c*100)+"%"; document.getElementById("results-panel").style.flex="1"; });
+ document.addEventListener("mouseup", () => { dragging=false; handle.classList.remove("dragging"); });
+
+ container.addEventListener("scroll", () => {
+   const cr = container.getBoundingClientRect(); let closest=1, cd=Infinity;
+   pageCanvases.forEach((w,n) => { const d=Math.abs(w.getBoundingClientRect().top-cr.top); if(d<cd){cd=d;closest=n;} });
+   if(closest!==currentPage){currentPage=closest;document.getElementById("page-input").value=closest;}
+ });
+ </script>
+ </body>
+ </html>`;
+ }
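The generator above interpolates `fileName`, result labels, and detail values into HTML strings as-is, and embeds the results JSON directly inside a `<script>` element. If the inputs may contain characters such as `<`, or a literal `</script>`, one option when adapting the script is to escape them before interpolation. The helpers below are a hedged sketch and not part of the package; `escapeHtml` and `safeJsonForScript` are hypothetical names.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Serialize data for inline use in a <script> block. Writing "<" as \u003c
// keeps a "</script>" inside a string value from closing the element early,
// and the result is still valid JSON that parses back to the same data.
function safeJsonForScript(data) {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}

console.log(escapeHtml('<b>"x"</b>')); // &lt;b&gt;&quot;x&quot;&lt;/b&gt;
console.log(safeJsonForScript({ label: "</script>" })); // {"label":"\u003c/script>"}
```

Under these assumptions, `escapeHtml` would wrap values such as `r.label` where the markup is built, and `safeJsonForScript(items)` would replace the plain `JSON.stringify(items)` call.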
@@ -5,7 +5,30 @@ description: Extract and organize business verification rules from regulation do
 
  # Rule Extraction
 
- Rules are the atoms of verification. Each rule you extract will become its own skill folder, its own workflow, and its own production pipeline. The quality of your extraction determines everything downstream.
+ Rules are the atoms of verification. Each rule you extract will become its own skill folder, its own workflow, and its own production pipeline.
+
+ ## How This Differs from Data Extraction
+
+ Rule extraction is a **one-off task** at the start of a project. You read regulation documents and decompose them into discrete, testable rules. This is fuzzy, agile work — rules are read by you (a SOTA agent), so the schema can be messy and evolve freely.
+
+ Data/entity extraction (`entity-extraction`) is the **repeating task** that runs on every document being verified. It must fit a unified, stable schema because it feeds into automated workflows.
+
+ Don't conflate the two. Rule extraction happens once; data extraction happens on every document.
+
+ ## Rule Structure: Location → Extraction → Judgment
+
+ Every verification rule decomposes into three parts:
+
+ 1. **Location**: Where in the document to look (which chapter, section, table, or full document).
+ 2. **Extraction**: What data to pull from that location (a number, a date, a clause, a description).
+ 3. **Judgment**: How to determine pass/fail (threshold comparison, semantic assessment, cross-field check).
+
+ When extracting a rule, explicitly note all three parts. This determines the downstream pipeline structure:
+ - Full-document rules need no location step.
+ - Single-section rules need one location step.
+ - Cross-section rules (comparing values across chapters) need multiple location steps.
+
+ Classify each rule's scope accordingly — it affects how the verification workflow is structured.
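The Location → Extraction → Judgment split and the scope classification described in this hunk can be sketched as a plain record plus a classifier. This is only an illustration: the field names (`locations`, `extraction`, `judgment`), the sample rule, and the `classifyScope` helper are hypothetical and not taken from the package.

```javascript
// A rule record with the three parts made explicit. Field names are illustrative.
const rule = {
  id: "R001",
  locations: ["Chapter 3, Table 2", "Chapter 5, Section 1"],
  extraction: "Reported capital ratio in each location",
  judgment: "Values must match across both locations",
};

// Classify scope by the number of location steps the rule needs.
function classifyScope(r) {
  const n = r.locations.length;
  if (n === 0) return "full-document";
  if (n === 1) return "single-section";
  return "cross-section";
}

console.log(classifyScope(rule)); // cross-section
console.log(classifyScope({ ...rule, locations: [] })); // full-document
```

Keeping `locations` as a list, empty for full-document rules, lets the number of location steps in the downstream workflow be read directly off the rule.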
 
  ## Philosophy
 
@@ -64,6 +64,12 @@ The body should cover:
  - Write in imperative form: "Extract the ratio" not "The ratio should be extracted."
  - If detailed regulation text is long, put it in `references/regulation.md` and reference it from SKILL.md.
 
+ ## Pipeline Node Design
+
+ When a skill's workflow has multiple steps, decompose into nodes where each node does one thing well. Each node's difficulty should be well within the model's capability — don't cram location + extraction + judgment into a single LLM call.
+
+ Pre-processing (text cleaning, format normalization) and post-processing (output parsing, value normalization) are separate nodes, not embedded in the LLM prompt. This keeps prompts clean and makes each step independently testable.
+
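The node separation described in "Pipeline Node Design" can be sketched as three small functions composed into a pipeline. This is a minimal sketch under stated assumptions: all function names are hypothetical, and the LLM node is a stub that returns a canned answer so the example runs offline; a real node would call a worker model instead.

```javascript
// Pre-processing node: normalize whitespace before the text reaches the model.
function cleanText(raw) {
  return raw.replace(/\s+/g, " ").trim();
}

// LLM node (stub): stands in for a worker model call and returns a canned
// free-text answer when the cleaned text mentions a ratio.
function extractRatioStub(text) {
  return text.includes("ratio") ? "Ratio: 12.5 %" : "";
}

// Post-processing node: parse the free-text answer into a fraction, or null.
function parseRatio(answer) {
  const match = answer.match(/(\d+(?:\.\d+)?)\s*%/);
  return match ? Number(match[1]) / 100 : null;
}

// Compose the nodes; each one can also be tested on its own.
function runPipeline(raw) {
  return parseRatio(extractRatioStub(cleanText(raw)));
}

console.log(runPipeline("  The capital\n ratio is reported below. ")); // 0.125
```

Because parsing lives in `parseRatio` and not in the prompt, a malformed model answer shows up as `null` at a single, testable point.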
  ## Writing Scripts
 
  Scripts in `scripts/` handle deterministic operations:
@@ -7,6 +7,10 @@ description: Distill a proven verification skill into a Python workflow with wor
 
  The skill is the ground truth. The workflow is a cheaper, faster approximation. Your job is to make the approximation as good as the original while being as cheap as possible.
 
+ ## Engineering Goal
+
+ Optimize the full chain: **shortest workflow** (fewest nodes) → **smallest model per node** (cheapest tier that meets accuracy) → **shortest prompt per model** (minimum tokens). This is the engineering objective — not prompt template sophistication or framework compliance.
+
  ## When to Start
 
  A skill is ready for workflow distillation when: