@noobdemon/noob-cli 1.12.4 → 1.12.6

package/CHANGELOG.md CHANGED
@@ -2,6 +2,16 @@
 
  All notable changes to `@noobdemon/noob-cli` are recorded in this file.
 
+ ## [1.12.6] - 2026-06-12
+
+ ### Added
+ - **Tool `write_todos`** (`src/repl/agent-dispatch.js` + `src/tools.js` + `src/agent.js`): a VIRTUAL tool that lets the model declare a structured todo list instead of writing markdown `- [ ]`. Shape `{todos: [{text, done}]}`; each call REPLACES the entire list (no patching). The dispatcher intercepts it BEFORE `execTool`: it sets `state.todos` and calls `tui.setTodos` directly, then sets the flag `state._todosFromTool=true` so that `repl.js` skips markdown parsing after the turn (avoiding an overwrite of the structured state). It prints a compact box on the first call and a diff (changed lines only) on later calls. The SYSTEM prompt rule TODO-BASED EXECUTION has been updated: the model MUST use `write_todos` and must not write markdown. Rationale: the old markdown parser (`parseTodosFromHistory`) is fragile when the model formats the list incorrectly (wrong indent, `*` instead of `-`, missing space). With a structured tool call the CLI always renders correctly and the progress bar on the status line updates immediately. A stub in `TOOLS.write_todos` acts as a fail-safe in case a call reaches `runTool` directly. Smoke `scripts/smoke-write-todos.mjs` 27/27 pass + regression `smoke-dispatch.mjs` 23/23 pass.
+
+ ## [1.12.5] - 2026-06-12
+
+ ### Added
+ - **Rule VERIFY-BEFORE-DISMISS** (`src/agent.js` SYSTEM + `noob.md` Rules): guards against over-confidence in the opposite direction from ANTI-HALLUCINATION. Previously, when the model met a TOOL RESULT that looked odd (output from an old session, a command that did not match), it would declare it "fake/noise/injection" on its own and ignore it, which is the same failure as hallucinating success (replacing evidence with memory). Now the default is to trust the runtime and doubt itself: see an odd result → treat it as REAL → run 1 verification tool (`read_file`/`grep`/`list_dir`/re-run) → only when the verification tool CONTRADICTS it may the result be called stale, and work proceeds from the NEW result.
+
  ## [1.12.4] - 2026-06-12
 
  ### Fixed
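The rationale in the 1.12.6 entry (markdown checklists break on small formatting slips, a structured payload does not) can be illustrated with a small sketch. `parseMarkdownTodos` below is a hypothetical strict parser written for this note. It is not the package's `parseTodosFromHistory`, whose source is not part of this diff, so the real parser may be more or less tolerant.

```javascript
// Hypothetical strict checklist parser (NOT the package's parseTodosFromHistory).
// It accepts only lines of the exact form "- [ ] text" or "- [x] text".
function parseMarkdownTodos(markdown) {
  const todos = [];
  for (const line of markdown.split('\n')) {
    const m = /^- \[( |x)\] (.+)$/.exec(line);
    if (m) todos.push({ text: m[2].trim(), done: m[1] === 'x' });
  }
  return todos;
}

// Three items, two of them with slips the changelog lists:
// a "*" bullet and a missing space after the checkbox.
const sloppy = ['- [x] read config', '* [ ] patch loader', '- [ ]run tests'].join('\n');

// The same list as a write_todos payload, shape {todos: [{text, done}]}.
const structured = {
  todos: [
    { text: 'read config', done: true },
    { text: 'patch loader', done: false },
    { text: 'run tests', done: false },
  ],
};

console.log(parseMarkdownTodos(sloppy).length); // 1: two items were silently dropped
console.log(structured.todos.length);           // 3: nothing to mis-parse
```

Under these assumptions the strict parser keeps one of three items, while the structured payload carries all three without any text parsing.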
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@noobdemon/noob-cli",
-   "version": "1.12.4",
+   "version": "1.12.6",
    "publishConfig": {
      "access": "public"
    },
package/src/agent.js CHANGED
@@ -29,13 +29,15 @@ Available tools (each is self-contained; pick the SMALLEST tool that answers the
  - run_command {"command": str, "timeout"?: int, "background"?: bool} — run a shell command in the cwd. Foreground commands are killed after ~60s (override with "timeout" ms). For long-running processes — dev servers, watchers, \`python -m http.server\`, \`npm run dev\`, \`flask run\` — set "background": true: starts the process, returns immediately, keeps running WITHOUT blocking next steps. Never start a server in the foreground (it will hang then be killed).
  - bg_output {"id"?: int} — no id: list background processes + status; with id: show that process's captured output so far (poll after starting a server to confirm it came up).
  - kill_bg {"id": int} — stop a background process started with run_command background:true.
+ - write_todos {"todos": [{"text": str, "done": bool}, ...]} — declare/update structured TODO list. REPLACES the entire list every call (no patching individual items). To check an item off, resend the FULL list with done:true on that item. Use this INSTEAD of writing markdown \`- [ ]\` lines: the CLI renders it as a progress bar on the status line AND prints a compact box, no fragile markdown parsing. Call ONCE at the start of any multi-step task with all items done:false, then call AGAIN after each step with the just-finished item flipped to done:true.
 
  # Retrieval strategy (just-in-time, not bulk)
  Context is finite. Don't slurp the whole repo up front. Discover information progressively: list_dir/glob to map → grep to locate → read_file (with offset+limit for big files) to inspect only what matters. Each tool result spends your attention budget — make every call earn it. When a tool returns a huge blob, extract the few facts you need, then move on; don't re-read it later (the result stays in history).
 
  # Rules
- - TODO-BASED EXECUTION: For multi-step tasks, you MUST keep going until ALL items are "- [x]". NEVER stop mid-list. Flow: (1) write todo list, (2) start first item, (3) after EVERY tool result, check off the completed item AND IMMEDIATELY start the next unchecked item, (4) repeat until all done. Your response is NOT finished until ALL items are checked. The ONLY valid reason to stop is: (a) all items done, or (b) you are WAITING for a user reply. If you just got a tool result, you MUST continue — do NOT output a summary, do NOT ask "what next", do NOT stop. After write_file/edit_file returns, immediately do the next item.
+ - TODO-BASED EXECUTION: For any multi-step task (3+ actions), you MUST call \`write_todos\` FIRST with all items done:false, then call it AGAIN after every completed step with that item flipped to done:true (resend the full list). NEVER write markdown \`- [ ]\` lines the runtime parses \`write_todos\` calls, not markdown. Your response is NOT finished until all items are done:true. The ONLY valid reason to stop is: (a) all items done, or (b) you are WAITING for a user reply. If you just got a tool result, you MUST continue — do NOT output a summary, do NOT ask "what next", do NOT stop. After write_file/edit_file returns, call write_todos to tick the just-finished item, then immediately start the next.
  - GROUND TRUTH = real TOOL RESULTs in this conversation, not your memory or what you intended to do. A file changed only if a write_file/edit_file result confirms it (see the FILES CHANGED list). A test passed / build succeeded / command worked only if a run_command result above shows it. Never narrate outcomes you didn't observe; if you haven't checked, say so and check now (read_file / list_dir / run the command). Before any "done/summary" reply, reconcile every file and result you're about to claim against the actual tool results above — if it isn't there, you didn't do it yet.
+ - VERIFY BEFORE DISMISSING: never declare a TOOL RESULT "fake", "spurious", "injected", "unrelated", or "from a previous turn" without first verifying with a fresh tool call. If a result looks off (unexpected content, output you didn't ask for, weird command), your DEFAULT is: treat it as REAL runtime output, then run a small verification (read_file the affected path, grep for the symbol, list_dir, re-run the command) to confirm actual state. Only after the verification tool result contradicts the suspicious one may you call it stale/leftover — and even then, work from the FRESH result, never from your guess. Trusting your own skepticism over the runtime is the same over-confidence bug as hallucinating success: both substitute memory for evidence.
  - Investigate before editing: read the relevant files first; never invent file contents.
  - Make the smallest change that fully solves the task. Match the surrounding code style.
  - Prefer edit_file over write_file for existing files.
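The replace-not-patch contract described in the `write_todos` tool line and the TODO-BASED EXECUTION rule can be sketched as two successive payloads. `tickTodo` is a hypothetical helper invented for this illustration, and the todo texts are made up. Only the `{todos: [{text, done}]}` shape comes from the diff.

```javascript
// Hypothetical helper: build the next write_todos payload by resending the
// FULL list with one item flipped to done:true (the tool has no patch mode).
function tickTodo(todos, text) {
  return {
    todos: todos.map((t) => (t.text === text ? { ...t, done: true } : { ...t })),
  };
}

// Call 1, at the start of the task: every item done:false.
const first = {
  todos: [
    { text: 'read src/tools.js', done: false },
    { text: 'add the new tool', done: false },
    { text: 'run smoke tests', done: false },
  ],
};

// Call 2, after step one completes: the same three items, first one ticked.
const second = tickTodo(first.todos, 'read src/tools.js');

console.log(JSON.stringify(second, null, 2));
```

Sending only the finished item in call 2 would, per the tool description, replace the whole list with that single item, which is why the full list is resent each time.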
package/src/repl/agent-dispatch.js CHANGED
@@ -32,7 +32,7 @@ import { t } from '../i18n.js';
   * @returns {function} dispatchTool(name, input, depth=0) → {allow, result}
   */
  export function createAgentDispatcher(deps) {
-   const { state, abort, tokenMeter, stopSpin, startSpin, execTool } = deps;
+   const { state, abort, tokenMeter, stopSpin, startSpin, execTool, tui, c } = deps;
    // Test injection points: production always uses the defaults; smoke tests pass mocks.
    const runSubAgent = deps.runSubAgent || defaultRunSubAgent;
    const findModel = deps.findModel || defaultFindModel;
@@ -44,6 +44,57 @@ export function createAgentDispatcher(deps) {
    const recordWorkflowTaskFailed = j.recordTaskFailed;
 
    const dispatchTool = async (name, input, depth = 0) => {
+     // write_todos: VIRTUAL tool that updates state.todos + the TUI directly. It skips execTool
+     // because it is not fs/shell; it is just how the model declares a structured todo list
+     // instead of writing markdown `- [ ]` (the markdown parser is fragile on bad formatting).
+     // Each call REPLACES the entire list (no per-item patching: the model resends the full
+     // list with done:true on the item it just finished). state.todos is set IMMEDIATELY, so
+     // the TUI renders accurately with no need to parse history.
+     if (name === 'write_todos') {
+       const todosIn = Array.isArray(input?.todos) ? input.todos : null;
+       if (!todosIn)
+         return { allow: true, result: 'ERROR: write_todos requires the field "todos": [{text, done}].' };
+       const todos = todosIn
+         .filter((it) => it && typeof it.text === 'string' && it.text.trim())
+         .map((it) => ({ text: String(it.text).trim(), done: !!it.done }));
+       if (!todos.length)
+         return { allow: true, result: 'ERROR: todos is empty; send at least 1 item {text, done}.' };
+       const prev = Array.isArray(state.todos) ? state.todos : [];
+       const prevByText = new Map(prev.map((p) => [p.text, !!p.done]));
+       state.todos = todos;
+       // Flag: the model used write_todos this turn → repl skips markdown parsing so the
+       // fragile parser does not overwrite the structured state. Reset at the start of each turn.
+       state._todosFromTool = true;
+       try { tui?.setTodos?.(todos); } catch {}
+       const done = todos.filter((t) => t.done).length;
+       // Compact printing: first call (prev empty) or the set of texts changed → print in full.
+       // Same set of texts with only the done states differing → print a diff (toggled lines).
+       const sameSet = prev.length === todos.length && todos.every((t) => prevByText.has(t.text));
+       stopSpin?.();
+       if (!sameSet) {
+         const lines = todos.map((t) => ' ' + (t.done ? '✓ ' : '☐ ') + t.text);
+         console.log((c?.tool || ((s) => s))(` 📋 todo (${done}/${todos.length})`));
+         console.log(lines.join('\n'));
+       } else {
+         // diff: print the lines whose done flag changed (both false→true and true→false).
+         const changes = todos.filter((t) => prevByText.get(t.text) !== t.done);
+         if (changes.length === 0) {
+           console.log((c?.dim || ((s) => s))(` 📋 todo (${done}/${todos.length}) · unchanged`));
+         } else {
+           console.log((c?.tool || ((s) => s))(` 📋 todo (${done}/${todos.length})`));
+           for (const ch of changes) {
+             const mark = ch.done ? '✓' : '☐';
+             console.log(' ' + mark + ' ' + ch.text);
+           }
+         }
+       }
+       startSpin?.();
+       return {
+         allow: true,
+         result: `Updated ${todos.length} todos (${done} done, ${todos.length - done} remaining). Continue with the items not yet done; if all are done, finish the reply.`,
+       };
+     }
+
      // spawn_agent / spawn_agents are only allowed when agentMode is on; depth is limited
      // by MAX_SUBAGENT_DEPTH to avoid runaway recursion.
      if (name === 'spawn_agent' || name === 'spawn_agents') {
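The normalization and full-versus-diff printing decision in the `write_todos` branch can be isolated into two pure functions for easier reading. This is a sketch only: `normalizeTodos` and `todosToPrint` are hypothetical names, and the real branch also updates `state`, the TUI, and the spinner.

```javascript
// Same filter/map step as the hunk: keep items with non-blank string text,
// trim the text, coerce done to a boolean. Returns null for a non-array.
function normalizeTodos(input) {
  if (!Array.isArray(input?.todos)) return null;
  return input.todos
    .filter((it) => it && typeof it.text === 'string' && it.text.trim())
    .map((it) => ({ text: it.text.trim(), done: !!it.done }));
}

// Decide what to print: the full list when the set of texts changed,
// otherwise only the items whose done flag differs from the previous call.
function todosToPrint(prev, next) {
  const prevByText = new Map(prev.map((p) => [p.text, !!p.done]));
  const sameSet = prev.length === next.length && next.every((t) => prevByText.has(t.text));
  if (!sameSet) return { mode: 'full', items: next };
  return { mode: 'diff', items: next.filter((t) => prevByText.get(t.text) !== t.done) };
}

// Blank text and null entries are dropped; a missing done becomes false.
const a = normalizeTodos({ todos: [{ text: ' plan ', done: 0 }, { text: 'build' }, { text: '   ' }, null] });
const b = normalizeTodos({ todos: [{ text: 'plan', done: true }, { text: 'build', done: false }] });

console.log(todosToPrint([], a).mode); // full (first call, nothing to diff against)
console.log(todosToPrint(a, b).mode);  // diff (same texts, only "plan" toggled)
```

Because items are matched by their trimmed `text`, rewording an item between calls changes the set of texts and falls back to a full print.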
package/src/repl.js CHANGED
@@ -1418,7 +1418,7 @@ NGUYÊN TẮC:
    // src/repl/agent-dispatch.js (v1.12.x). The factory is called EVERY turn because abort
    // is rebound in handle(); do not cache it.
    const dispatchTool = createAgentDispatcher({
-     state, abort, tokenMeter, stopSpin, startSpin, execTool,
+     state, abort, tokenMeter, stopSpin, startSpin, execTool, tui, c,
    });
 
    const answer = await runAgent({
@@ -1461,8 +1461,15 @@ NGUYÊN TẮC:
    printAnswer(answer, state.model.name, providerColor(state.model.provider));
 
    // Parse todos from the model output → render on the status bar.
-   state.todos = parseTodosFromHistory(state.history);
-   tui.setTodos(state.todos);
+   // If the model called write_todos this turn (state._todosFromTool = true),
+   // state.todos + the TUI were already set directly in the dispatcher, so SKIP markdown
+   // parsing to keep the fragile parser from overwriting the structured state.
+   if (state._todosFromTool) {
+     state._todosFromTool = false; // reset for the next turn
+   } else {
+     state.todos = parseTodosFromHistory(state.history);
+     tui.setTodos(state.todos);
+   }
    return answer; // the ULTRA loop needs this text to detect the completion token
  } catch (err) {
    stopSpin();
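The flag handoff between the dispatcher and the end of the turn can be sketched in isolation. `finishTurn` is a hypothetical stand-in for the post-turn step in this hunk. Its `parseFallback` and `setTodos` parameters play the roles of `parseTodosFromHistory` and `tui.setTodos`, and the return strings exist only to make the branch visible.

```javascript
// Hypothetical stand-in for the end-of-turn step shown in the repl.js hunk.
function finishTurn(state, parseFallback, setTodos) {
  if (state._todosFromTool) {
    state._todosFromTool = false; // consume the flag so the next turn starts clean
    return 'kept-structured';
  }
  state.todos = parseFallback(state.history);
  setTodos(state.todos);
  return 'parsed-markdown';
}

// A turn in which the dispatcher already stored a structured list.
const state = { history: [], todos: [{ text: 'ship', done: true }], _todosFromTool: true };
const rendered = [];
const fallback = () => [];                 // a parser that finds no checklist
const setTodos = (todos) => rendered.push(todos);

console.log(finishTurn(state, fallback, setTodos)); // kept-structured
console.log(finishTurn(state, fallback, setTodos)); // parsed-markdown
```

The second call shows a consequence of the one-shot flag as written in the hunk: on a later turn with no `write_todos` call, the markdown fallback runs again and `state.todos` becomes whatever it parses from history.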
package/src/tools.js CHANGED
@@ -519,6 +519,17 @@ export const TOOLS = {
      return `Killed background process #${id} (${p.command}).`;
    },
 
+   // write_todos is a VIRTUAL tool: the dispatcher (src/repl/agent-dispatch.js) intercepts it
+   // BEFORE execTool, so this stub normally never runs. It is kept as a fail-safe: if some
+   // code path calls runTool('write_todos', ...) directly, it at least returns OK instead
+   // of throwing Unknown tool, and the stub's message says clearly how to fix it.
+   async write_todos({ todos }, { signal } = {}) {
+     if (signal?.aborted) throw new Error('aborted');
+     if (!Array.isArray(todos)) throw new Error('write_todos: todos must be an array');
+     const done = todos.filter((it) => it && it.done).length;
+     return `(stub) write_todos received ${todos.length} items (${done} done); the dispatcher should have intercepted this first. If you see this line, report a bug.`;
+   },
+
    // Knowledge graph tools: do NOT ask for permission (the user chooses freely).
    // Storage: <cwd>/.noob/kg.jsonl. Logic lives in src/kg.js.
    async kg_search({ query }, { signal } = {}) {
@@ -679,6 +690,11 @@ export function describe(name, input) {
        return `↳ sub-agent: ${String(input.task || '').slice(0, 80)}`;
      case 'spawn_agents':
        return `↳ ${(input.tasks || []).length} parallel sub-agents`;
+     case 'write_todos': {
+       const items = Array.isArray(input?.todos) ? input.todos : [];
+       const done = items.filter((it) => it && it.done).length;
+       return `todo ${done}/${items.length}`;
+     }
      case 'kg_search':
        return `kg search "${input.query || ''}"`;
      case 'kg_add':
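The relationship between the dispatcher's interception and the `TOOLS.write_todos` fail-safe stub can be sketched as two layers. Everything here is a simplified assumption: the functions are synchronous for brevity (the tools in the diff are async), and `registry`, `execTool`, and `createDispatcher` are miniature stand-ins, not the package's implementations.

```javascript
// Hypothetical registry with a fail-safe stub, mirroring the role of TOOLS.write_todos.
const registry = {
  write_todos({ todos }) {
    if (!Array.isArray(todos)) throw new Error('write_todos: todos must be an array');
    return '(stub) reached the registry; the dispatcher should have intercepted first';
  },
};

// Stand-in for execTool/runTool: look a tool up by name and run it.
function execTool(name, input) {
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return registry[name](input);
}

// Stand-in for the dispatcher: the virtual tool is handled before execTool.
function createDispatcher(state) {
  return (name, input) => {
    if (name === 'write_todos' && Array.isArray(input?.todos)) {
      state.todos = input.todos; // state only, no fs/shell work
      return { allow: true, result: `updated ${input.todos.length} todos` };
    }
    return { allow: true, result: execTool(name, input) };
  };
}

const state = { todos: [] };
const dispatch = createDispatcher(state);
const payload = { todos: [{ text: 'ship', done: false }] };

console.log(dispatch('write_todos', payload).result); // normal path: intercepted
console.log(execTool('write_todos', payload));        // bypass path: the stub answers
```

In this sketch the stub never touches state, so a stub reply signals that a call skipped the dispatcher, which matches the "report a bug" wording in the stub's return string.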