npm - @noobdemon/noob-cli - Versions diffs - 1.12.4 → 1.12.6 - Mend

@noobdemon/noob-cli 1.12.4 → 1.12.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/CHANGELOG.md +10 -0
package/package.json +1 -1
package/src/agent.js +3 -1
package/src/repl/agent-dispatch.js +52 -1
package/src/repl.js +10 -3
package/src/tools.js +16 -0

package/CHANGELOG.md CHANGED

@@ -2,6 +2,16 @@
 All notable changes to `@noobdemon/noob-cli` are recorded in this file.
+## [1.12.6] - 2026-06-12
+### Added
+- **Tool `write_todos`** (`src/repl/agent-dispatch.js` + `src/tools.js` + `src/agent.js`): a VIRTUAL tool that lets the model declare a structured todo list instead of writing markdown `- [ ]` lines. Shape `{todos: [{text, done}]}`; each call REPLACES the entire list (no patching). The dispatcher intercepts it BEFORE `execTool`: it sets `state.todos` and calls `tui.setTodos` directly, then sets the flag `state._todosFromTool=true` so that `repl.js` skips markdown parsing after the turn (avoiding an overwrite of the structured state). The first call prints a compact box; later calls print a diff (changed lines only). The SYSTEM prompt rule TODO-BASED EXECUTION has been updated: the model MUST use `write_todos` and must not write markdown. Rationale: the old markdown parser (`parseTodosFromHistory`) is fragile when the model formats its output incorrectly (wrong indentation, `*` instead of `-`, missing space). With a structured tool call, the CLI always renders correctly and the progress bar on the status line updates immediately. A stub in `TOOLS.write_todos` acts as a fail-safe in case a call reaches `runTool` directly. Smoke test `scripts/smoke-write-todos.mjs` passes 27/27 and regression test `smoke-dispatch.mjs` passes 23/23.
+## [1.12.5] - 2026-06-12
+### Added
+- **Rule VERIFY-BEFORE-DISMISS** (`src/agent.js` SYSTEM + `noob.md` Rules): guards against over-confidence in the opposite direction from ANTI-HALLUCINATION. Previously, when the model saw a TOOL RESULT that looked odd (output from an earlier session, a command that did not match), it would declare the result "fake/noise/injection" on its own and ignore it, which is the same failure as hallucinating success (replacing evidence with memory). Now the default is to trust the runtime and doubt itself: an odd-looking result is treated as REAL, the model runs one verification tool (`read_file`/`grep`/`list_dir`/re-run), and only when that verification CONTRADICTS the result may it be called stale, after which the model works from the NEW result.
 ## [1.12.4] - 2026-06-12
 ### Fixed
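
The 1.12.6 entry above attributes the change to a markdown parser that breaks on small formatting slips. The following is a minimal sketch of that failure mode, assuming a strict line-based regex; `parseTodosStrict` and `STRICT_TODO` are hypothetical names, and this is not the package's actual `parseTodosFromHistory`, whose source does not appear in this diff.

```javascript
// Hypothetical strict parser, for illustration only (NOT parseTodosFromHistory).
const STRICT_TODO = /^- \[( |x)\] (.+)$/;

function parseTodosStrict(markdown) {
  const todos = [];
  for (const line of markdown.split('\n')) {
    const m = STRICT_TODO.exec(line);
    if (m) todos.push({ text: m[2].trim(), done: m[1] === 'x' });
  }
  return todos;
}

const modelOutput = [
  '- [x] read config',    // well-formed: parsed
  '* [ ] update parser',  // `*` instead of `-`: dropped
  '  - [ ] add tests',    // unexpected indent: dropped
  '- [ ]run smoke test',  // missing space after the box: dropped
].join('\n');

const parsed = parseTodosStrict(modelOutput);
console.log(parsed.length); // 1 of the 4 intended items survives

// The same list in the {todos: [{text, done}]} shape has no formatting to get wrong.
const payload = {
  todos: [
    { text: 'read config', done: true },
    { text: 'update parser', done: false },
    { text: 'add tests', done: false },
    { text: 'run smoke test', done: false },
  ],
};
console.log(payload.todos.length); // 4
```

A more lenient regex would recover some of these lines, but the structured payload removes the parsing step entirely.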

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@noobdemon/noob-cli",
-  "version": "1.12.4",
+  "version": "1.12.6",
   "publishConfig": {
     "access": "public"
   },

package/src/agent.js CHANGED

@@ -29,13 +29,15 @@ Available tools (each is self-contained; pick the SMALLEST tool that answers the
 - run_command {"command": str, "timeout"?: int, "background"?: bool} — run a shell command in the cwd. Foreground commands are killed after ~60s (override with "timeout" ms). For long-running processes — dev servers, watchers, \`python -m http.server\`, \`npm run dev\`, \`flask run\` — set "background": true: starts the process, returns immediately, keeps running WITHOUT blocking next steps. Never start a server in the foreground (it will hang then be killed).
 - bg_output  {"id"?: int}                                     — no id: list background processes + status; with id: show that process's captured output so far (poll after starting a server to confirm it came up).
 - kill_bg     {"id": int}                                      — stop a background process started with run_command background:true.
+- write_todos {"todos": [{"text": str, "done": bool}, ...]}      — declare/update structured TODO list. REPLACES the entire list every call (no patching individual items). To check an item off, resend the FULL list with done:true on that item. Use this INSTEAD of writing markdown \`- [ ]\` lines: the CLI renders it as a progress bar on the status line AND prints a compact box, no fragile markdown parsing. Call ONCE at the start of any multi-step task with all items done:false, then call AGAIN after each step with the just-finished item flipped to done:true.
 # Retrieval strategy (just-in-time, not bulk)
 Context is finite. Don't slurp the whole repo up front. Discover information progressively: list_dir/glob to map → grep to locate → read_file (with offset+limit for big files) to inspect only what matters. Each tool result spends your attention budget — make every call earn it. When a tool returns a huge blob, extract the few facts you need, then move on; don't re-read it later (the result stays in history).
 # Rules
-- TODO-BASED EXECUTION: For multi-step tasks, you MUST keep going until ALL items are "- [x]". NEVER stop mid-list. Flow: (1) write todo list, (2) start first item, (3) after EVERY tool result, check off the completed item AND IMMEDIATELY start the next unchecked item, (4) repeat until all done. Your response is NOT finished until ALL items are checked. The ONLY valid reason to stop is: (a) all items done, or (b) you are WAITING for a user reply. If you just got a tool result, you MUST continue — do NOT output a summary, do NOT ask "what next", do NOT stop. After write_file/edit_file returns, immediately do the next item.
+- TODO-BASED EXECUTION: For any multi-step task (3+ actions), you MUST call \`write_todos\` FIRST with all items done:false, then call it AGAIN after every completed step with that item flipped to done:true (resend the full list). NEVER write markdown \`- [ ]\` lines — the runtime parses \`write_todos\` calls, not markdown. Your response is NOT finished until all items are done:true. The ONLY valid reason to stop is: (a) all items done, or (b) you are WAITING for a user reply. If you just got a tool result, you MUST continue — do NOT output a summary, do NOT ask "what next", do NOT stop. After write_file/edit_file returns, call write_todos to tick the just-finished item, then immediately start the next.
 - GROUND TRUTH = real TOOL RESULTs in this conversation, not your memory or what you intended to do. A file changed only if a write_file/edit_file result confirms it (see the FILES CHANGED list). A test passed / build succeeded / command worked only if a run_command result above shows it. Never narrate outcomes you didn't observe; if you haven't checked, say so and check now (read_file / list_dir / run the command). Before any "done/summary" reply, reconcile every file and result you're about to claim against the actual tool results above — if it isn't there, you didn't do it yet.
+- VERIFY BEFORE DISMISSING: never declare a TOOL RESULT "fake", "spurious", "injected", "unrelated", or "from a previous turn" without first verifying with a fresh tool call. If a result looks off (unexpected content, output you didn't ask for, weird command), your DEFAULT is: treat it as REAL runtime output, then run a small verification (read_file the affected path, grep for the symbol, list_dir, re-run the command) to confirm actual state. Only after the verification tool result contradicts the suspicious one may you call it stale/leftover — and even then, work from the FRESH result, never from your guess. Trusting your own skepticism over the runtime is the same over-confidence bug as hallucinating success: both substitute memory for evidence.
 - Investigate before editing: read the relevant files first; never invent file contents.
 - Make the smallest change that fully solves the task. Match the surrounding code style.
 - Prefer edit_file over write_file for existing files.
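
The new prompt text asks the model to resend the full list on every call. A minimal sketch of two consecutive `write_todos` inputs follows; the helper `tickTodo` and the todo texts are hypothetical and only illustrate the replace-the-whole-list contract stated in the prompt.

```javascript
// Hypothetical helper: returns a NEW full list with the named item marked done.
function tickTodo(todos, text) {
  return todos.map((it) => (it.text === text ? { ...it, done: true } : it));
}

// First call at the start of a multi-step task: every item has done:false.
const firstCall = {
  todos: [
    { text: 'read src/agent.js', done: false },
    { text: 'edit the SYSTEM prompt', done: false },
    { text: 'run the smoke test', done: false },
  ],
};

// After step 1 finishes: resend ALL items, with the finished one flipped.
const secondCall = { todos: tickTodo(firstCall.todos, 'read src/agent.js') };

const remaining = secondCall.todos.filter((it) => !it.done).length;
console.log(JSON.stringify(secondCall.todos[0])); // {"text":"read src/agent.js","done":true}
console.log(remaining); // 2
```

Sending only the changed item would replace the list with a single entry, which is why the prompt insists on the full list.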

package/src/repl/agent-dispatch.js CHANGED

@@ -32,7 +32,7 @@ import { t } from '../i18n.js';
  * @returns {function} dispatchTool(name, input, depth=0) → {allow, result}
  */
 export function createAgentDispatcher(deps) {
-  const { state, abort, tokenMeter, stopSpin, startSpin, execTool } = deps;
+  const { state, abort, tokenMeter, stopSpin, startSpin, execTool, tui, c } = deps;
   // Test injection points: production always uses the defaults; smoke tests pass mocks.
   const runSubAgent = deps.runSubAgent || defaultRunSubAgent;
   const findModel = deps.findModel || defaultFindModel;
@@ -44,6 +44,57 @@ export function createAgentDispatcher(deps) {
   const recordWorkflowTaskFailed = j.recordTaskFailed;
   const dispatchTool = async (name, input, depth = 0) => {
+    // write_todos: VIRTUAL tool that updates state.todos and the TUI directly. It bypasses
+    // execTool because it is not an fs/shell operation; it only lets the model declare a
+    // structured todo list instead of writing markdown `- [ ]` (the markdown parser is
+    // fragile when the format is wrong). Each call REPLACES the whole list (no per-item
+    // patching: the model resends the full list with done:true on the item just finished).
+    // state.todos is set IMMEDIATELY, so the TUI renders accurately without parsing history.
+    if (name === 'write_todos') {
+      const todosIn = Array.isArray(input?.todos) ? input.todos : null;
+      if (!todosIn)
+        return { allow: true, result: 'ERROR: write_todos requires the field "todos": [{text, done}].' };
+      const todos = todosIn
+        .filter((it) => it && typeof it.text === 'string' && it.text.trim())
+        .map((it) => ({ text: String(it.text).trim(), done: !!it.done }));
+      if (!todos.length)
+        return { allow: true, result: 'ERROR: todos is empty; send at least 1 item {text, done}.' };
+      const prev = Array.isArray(state.todos) ? state.todos : [];
+      const prevByText = new Map(prev.map((p) => [p.text, !!p.done]));
+      state.todos = todos;
+      // Flag: the model used write_todos this turn, so the repl skips markdown parsing and does
+      // not overwrite the structured state with the fragile parser. Reset at the start of each turn.
+      state._todosFromTool = true;
+      try { tui?.setTodos?.(todos); } catch {}
+      const done = todos.filter((t) => t.done).length;
+      // Compact print: on the first call (prev empty) or when the set of texts changes, print the full list.
+      // If the set of texts is the same and only the done states differ, print a diff (the lines just toggled).
+      const sameSet = prev.length === todos.length && todos.every((t) => prevByText.has(t.text));
+      stopSpin?.();
+      if (!sameSet) {
+        const lines = todos.map((t) => '    ' + (t.done ? '✓ ' : '☐ ') + t.text);
+        console.log((c?.tool || ((s) => s))(`  📋 todo (${done}/${todos.length})`));
+        console.log(lines.join('\n'));
+      } else {
+        // diff: print the lines whose done state changed (both false→true and true→false).
+        const changes = todos.filter((t) => prevByText.get(t.text) !== t.done);
+        if (changes.length === 0) {
+          console.log((c?.dim || ((s) => s))(`  📋 todo (${done}/${todos.length}) · unchanged`));
+        } else {
+          console.log((c?.tool || ((s) => s))(`  📋 todo (${done}/${todos.length})`));
+          for (const ch of changes) {
+            const mark = ch.done ? '✓' : '☐';
+            console.log('    ' + mark + ' ' + ch.text);
+          }
+        }
+      }
+      startSpin?.();
+      return {
+        allow: true,
+        result: `Updated ${todos.length} todos (${done} done, ${todos.length - done} remaining). Continue with the items not yet done; if all are done, finish the reply.`,
+      };
+    }
     // spawn_agent / spawn_agents are only allowed when agentMode is on; depth is limited
     // by MAX_SUBAGENT_DEPTH to avoid runaway recursion.
     if (name === 'spawn_agent' || name === 'spawn_agents') {
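
The printing branch in the hunk above decides between a full list, a diff, and an "unchanged" notice. A minimal sketch of that decision as a pure function, with the console output left out, follows; `diffTodos` and the `mode` labels are hypothetical names, while the comparison mirrors the `prevByText` / `sameSet` / `changes` logic shown in the diff.

```javascript
// Hypothetical pure-function version of the full-vs-diff decision.
function diffTodos(prev, next) {
  const prevByText = new Map(prev.map((p) => [p.text, !!p.done]));
  // Same set = same length and every new text already existed.
  const sameSet = prev.length === next.length && next.every((t) => prevByText.has(t.text));
  if (!sameSet) return { mode: 'full', lines: next };
  // Same texts: keep only the items whose done state flipped.
  const changes = next.filter((t) => prevByText.get(t.text) !== t.done);
  return { mode: changes.length ? 'diff' : 'unchanged', lines: changes };
}

const v1 = [{ text: 'a', done: false }, { text: 'b', done: false }];
const v2 = [{ text: 'a', done: true }, { text: 'b', done: false }];
const v3 = [{ text: 'a', done: true }, { text: 'c', done: false }];

console.log(diffTodos([], v1).mode); // full (first call, prev is empty)
console.log(diffTodos(v1, v2).mode); // diff (only 'a' toggled)
console.log(diffTodos(v2, v2).mode); // unchanged
console.log(diffTodos(v2, v3).mode); // full ('b' was replaced by 'c')
```

Because items are matched by their text, rewording an item counts as a new set and triggers a full reprint.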

package/src/repl.js CHANGED

@@ -1418,7 +1418,7 @@ NGUYÊN TẮC:
       // src/repl/agent-dispatch.js (v1.12.x). The factory is called EVERY turn because abort
       // is rebound in handle(); do not cache it.
       const dispatchTool = createAgentDispatcher({
-        state, abort, tokenMeter, stopSpin, startSpin, execTool,
+        state, abort, tokenMeter, stopSpin, startSpin, execTool, tui, c,
       });
       const answer = await runAgent({
@@ -1461,8 +1461,15 @@ NGUYÊN TẮC:
         printAnswer(answer, state.model.name, providerColor(state.model.provider));
       // Parse todos from the model output → render them on the status bar.
-      state.todos = parseTodosFromHistory(state.history);
-      tui.setTodos(state.todos);
+      // If the model called write_todos this turn (state._todosFromTool = true),
+      // state.todos and the TUI were already set directly in the dispatcher, so SKIP markdown
+      // parsing to avoid overwriting the structured state with the fragile parser.
+      if (state._todosFromTool) {
+        state._todosFromTool = false; // reset for the next turn
+      } else {
+        state.todos = parseTodosFromHistory(state.history);
+        tui.setTodos(state.todos);
+      }
       return answer; // the ULTRA loop needs this text to detect the completion token
     } catch (err) {
       stopSpin();
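
The flag handoff between the dispatcher and `repl.js` can be sketched in isolation. In this minimal sketch `finalizeTodos`, its return labels, and the stand-in parser are hypothetical; the branch structure follows the `state._todosFromTool` check in the hunk above, with the TUI call omitted.

```javascript
// Hypothetical extraction of the end-of-turn todo handling.
function finalizeTodos(state, parseMarkdown) {
  if (state._todosFromTool) {
    state._todosFromTool = false; // consume the flag so the next turn starts clean
    return 'kept';                // structured todos from the tool call survive
  }
  state.todos = parseMarkdown(state.history); // fallback: legacy markdown parsing
  return 'parsed';
}

// Stand-in parser that finds nothing, to make any overwrite visible.
const emptyParser = () => [];

const toolTurn = { todos: [{ text: 'a', done: true }], _todosFromTool: true, history: [] };
console.log(finalizeTodos(toolTurn, emptyParser)); // kept
console.log(toolTurn.todos.length);                // 1

const markdownTurn = { todos: [{ text: 'old', done: false }], _todosFromTool: false, history: [] };
console.log(finalizeTodos(markdownTurn, emptyParser)); // parsed
console.log(markdownTurn.todos.length);                // 0
```

The second case shows what the flag prevents: without it, a turn that used `write_todos` would have its list replaced by whatever the markdown parser finds.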

package/src/tools.js CHANGED

@@ -519,6 +519,17 @@ export const TOOLS = {
     return `Killed background process #${id} (${p.command}).`;
   },
+  // write_todos is a VIRTUAL tool: the dispatcher (src/repl/agent-dispatch.js) intercepts it
+  // BEFORE execTool, so this stub normally never runs. It is kept as a fail-safe: if some
+  // code path calls runTool('write_todos', ...) directly, it at least returns OK instead of
+  // throwing Unknown tool, and the stub returns a clear message pointing to the fix.
+  async write_todos({ todos }, { signal } = {}) {
+    if (signal?.aborted) throw new Error('aborted');
+    if (!Array.isArray(todos)) throw new Error('write_todos: todos must be an array');
+    const done = todos.filter((it) => it && it.done).length;
+    return `(stub) write_todos received ${todos.length} items (${done} done); the dispatcher should have intercepted this first. If you see this line, report a bug.`;
+  },
   // Knowledge graph tools: do NOT ask for permission (the user chooses freely).
   // Storage: <cwd>/.noob/kg.jsonl. Logic lives in src/kg.js.
   async kg_search({ query }, { signal } = {}) {
@@ -679,6 +690,11 @@ export function describe(name, input) {
       return `↳ sub-agent: ${String(input.task || '').slice(0, 80)}`;
     case 'spawn_agents':
       return `↳ ${(input.tasks || []).length} sub-agents in parallel`;
+    case 'write_todos': {
+      const items = Array.isArray(input?.todos) ? input.todos : [];
+      const done = items.filter((it) => it && it.done).length;
+      return `todo ${done}/${items.length}`;
+    }
     case 'kg_search':
       return `kg search "${input.query || ''}"`;
     case 'kg_add':
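
The relationship between the dispatcher intercept and the `TOOLS.write_todos` stub can be sketched as two layers. This is a simplified, synchronous sketch: `TOOLS_SKETCH`, `runToolSketch`, `createDispatcherSketch`, and the `echo` tool are hypothetical, and the package's real tools are async and do more validation.

```javascript
// Hypothetical tool registry with a fail-safe stub for the virtual tool.
const TOOLS_SKETCH = {
  write_todos({ todos }) {
    return `(stub) received ${todos.length} items`; // only reached if the intercept is bypassed
  },
  echo({ text }) {
    return text; // hypothetical ordinary tool
  },
};

// Hypothetical registry lookup; unknown names throw.
function runToolSketch(name, input) {
  const fn = TOOLS_SKETCH[name];
  if (!fn) throw new Error(`Unknown tool: ${name}`);
  return fn(input);
}

// Hypothetical dispatcher: handles the virtual tool itself, delegates everything else.
function createDispatcherSketch(state) {
  return function dispatch(name, input) {
    if (name === 'write_todos') {
      state.todos = input.todos; // state is updated here, never in the registry
      return { allow: true, result: `updated ${input.todos.length}` };
    }
    return { allow: true, result: runToolSketch(name, input) };
  };
}

const sketchState = { todos: [] };
const dispatch = createDispatcherSketch(sketchState);
const items = [{ text: 'a', done: false }];

console.log(dispatch('write_todos', { todos: items }).result); // updated 1
console.log(dispatch('echo', { text: 'hi' }).result);          // hi
console.log(runToolSketch('write_todos', { todos: items }));   // (stub) received 1 items
```

The last line shows the fail-safe path: a direct registry call returns a recognizable stub message instead of throwing.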