npm - @noobdemon/noob-cli - Versions diffs - 1.12.5 → 1.12.7 - Mend

@noobdemon/noob-cli 1.12.5 → 1.12.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/CHANGELOG.md +8 -0
package/package.json +1 -1
package/src/agent.js +11 -8
package/src/repl/agent-dispatch.js +52 -1
package/src/repl.js +23 -45
package/src/tokens.js +6 -2
package/src/tools.js +16 -0

package/CHANGELOG.md CHANGED

@@ -2,6 +2,14 @@
 All notable changes to `@noobdemon/noob-cli` are recorded in this file.
+## [1.12.7] - 2026-06-12
+### Changed
+- **Removed auto-compact, switched to MANUAL only** (`src/repl.js` + `src/tokens.js` + `src/agent.js`): previously the CLI called `maybeSummarize({force:true})` on its own when context reached 75%, which interrupted the workflow midway and could produce a summary that dropped details the user needed. The user now decides when to summarize, via `/compact`. Only 2 WARNING thresholds remain (no auto-action): **60% (120k tokens)** gives a gentle one-time reminder, and **80% (160k tokens)** gives a strong warning suggesting `/compact` before the provider rejects the prompt at ~200k. Also fixes the bug `CONTEXT_WINDOW=2_000_000` → `200_000` (matching Claude Opus 4.7 + GPT-4o); with the old 2M value, 75% = 1.5M tokens was never reached, so users reported `/compact doesn't work`. `SUMMARIZE_THRESHOLD_CHARS` 6M → 600k, `MAX_PROMPT_CHARS` 1.2M → 800k, `keepTail` 16/24 → 12/16 to match the real window.
+### Added
+- **`write_todos` tool** (`src/repl/agent-dispatch.js` + `src/tools.js` + `src/agent.js`): a VIRTUAL tool that lets the model declare a structured todo list instead of writing markdown `- [ ]`. Shape `{todos: [{text, done}]}`; each call REPLACES the whole list (no patch). The dispatcher intercepts it BEFORE `execTool`: it sets `state.todos` and calls `tui.setTodos` directly, and sets the flag `state._todosFromTool=true` so that `repl.js` skips markdown parsing after the turn (avoiding an overwrite of the structured state). It prints a compact box the first time and a diff (changed lines only) on later calls. The SYSTEM prompt rule TODO-BASED EXECUTION has been updated: the model MUST use `write_todos` and must not write markdown. Reason: the old markdown parser (`parseTodosFromHistory`) was fragile when the model formatted incorrectly (wrong indent, `*` instead of `-`, missing space). With a structured tool call the CLI always renders correctly, and the progress bar on the status line updates immediately. A stub in `TOOLS.write_todos` serves as a fail-safe in case a call goes through `runTool` directly. Smoke `scripts/smoke-write-todos.mjs` 27/27 pass + regression `smoke-dispatch.mjs` 23/23 pass.
 ## [1.12.5] - 2026-06-12
 ### Added
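
The character thresholds in the 1.12.7 entry can be related to token counts with a quick calculation. The sketch below assumes roughly 4 characters per token, a ratio inferred from the source comment "600k chars ≈ 150k tokens" rather than taken from the package's tokenizer; `CHARS_PER_TOKEN`, `charsToTokens`, and `pctOfWindow` are hypothetical names used only for illustration.

```javascript
// Assumption: ~4 chars per token, inferred from "600k chars ≈ 150k tokens".
const CHARS_PER_TOKEN = 4;
const CONTEXT_WINDOW = 200_000; // value introduced in 1.12.7

// Convert a character budget to an approximate token count.
function charsToTokens(chars) {
  return Math.round(chars / CHARS_PER_TOKEN);
}

// Express a token count as a rounded percentage of the context window.
function pctOfWindow(tokens) {
  return Math.round((tokens / CONTEXT_WINDOW) * 100);
}

const budgets = [
  ['SUMMARIZE_THRESHOLD_CHARS', 600_000],
  ['MAX_PROMPT_CHARS', 800_000],
];
for (const [name, chars] of budgets) {
  const tokens = charsToTokens(chars);
  console.log(`${name}: ${chars} chars ≈ ${tokens} tokens (${pctOfWindow(tokens)}% of window)`);
}
```

Under that assumption the two constants land at about 75% and 100% of the 200k window, which agrees with the figures quoted in the changelog.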

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@noobdemon/noob-cli",
-  "version": "1.12.5",
+  "version": "1.12.7",
   "publishConfig": {
     "access": "public"
   },

package/src/agent.js CHANGED

@@ -29,12 +29,13 @@ Available tools (each is self-contained; pick the SMALLEST tool that answers the
 - run_command {"command": str, "timeout"?: int, "background"?: bool} — run a shell command in the cwd. Foreground commands are killed after ~60s (override with "timeout" ms). For long-running processes — dev servers, watchers, \`python -m http.server\`, \`npm run dev\`, \`flask run\` — set "background": true: starts the process, returns immediately, keeps running WITHOUT blocking next steps. Never start a server in the foreground (it will hang then be killed).
 - bg_output  {"id"?: int}                                     — no id: list background processes + status; with id: show that process's captured output so far (poll after starting a server to confirm it came up).
 - kill_bg     {"id": int}                                      — stop a background process started with run_command background:true.
+- write_todos {"todos": [{"text": str, "done": bool}, ...]}      — declare/update structured TODO list. REPLACES the entire list every call (no patching individual items). To check an item off, resend the FULL list with done:true on that item. Use this INSTEAD of writing markdown \`- [ ]\` lines: the CLI renders it as a progress bar on the status line AND prints a compact box, no fragile markdown parsing. Call ONCE at the start of any multi-step task with all items done:false, then call AGAIN after each step with the just-finished item flipped to done:true.
 # Retrieval strategy (just-in-time, not bulk)
 Context is finite. Don't slurp the whole repo up front. Discover information progressively: list_dir/glob to map → grep to locate → read_file (with offset+limit for big files) to inspect only what matters. Each tool result spends your attention budget — make every call earn it. When a tool returns a huge blob, extract the few facts you need, then move on; don't re-read it later (the result stays in history).
 # Rules
-- TODO-BASED EXECUTION: For multi-step tasks, you MUST keep going until ALL items are "- [x]". NEVER stop mid-list. Flow: (1) write todo list, (2) start first item, (3) after EVERY tool result, check off the completed item AND IMMEDIATELY start the next unchecked item, (4) repeat until all done. Your response is NOT finished until ALL items are checked. The ONLY valid reason to stop is: (a) all items done, or (b) you are WAITING for a user reply. If you just got a tool result, you MUST continue — do NOT output a summary, do NOT ask "what next", do NOT stop. After write_file/edit_file returns, immediately do the next item.
+- TODO-BASED EXECUTION: For any multi-step task (3+ actions), you MUST call \`write_todos\` FIRST with all items done:false, then call it AGAIN after every completed step with that item flipped to done:true (resend the full list). NEVER write markdown \`- [ ]\` lines — the runtime parses \`write_todos\` calls, not markdown. Your response is NOT finished until all items are done:true. The ONLY valid reason to stop is: (a) all items done, or (b) you are WAITING for a user reply. If you just got a tool result, you MUST continue — do NOT output a summary, do NOT ask "what next", do NOT stop. After write_file/edit_file returns, call write_todos to tick the just-finished item, then immediately start the next.
 - GROUND TRUTH = real TOOL RESULTs in this conversation, not your memory or what you intended to do. A file changed only if a write_file/edit_file result confirms it (see the FILES CHANGED list). A test passed / build succeeded / command worked only if a run_command result above shows it. Never narrate outcomes you didn't observe; if you haven't checked, say so and check now (read_file / list_dir / run the command). Before any "done/summary" reply, reconcile every file and result you're about to claim against the actual tool results above — if it isn't there, you didn't do it yet.
 - VERIFY BEFORE DISMISSING: never declare a TOOL RESULT "fake", "spurious", "injected", "unrelated", or "from a previous turn" without first verifying with a fresh tool call. If a result looks off (unexpected content, output you didn't ask for, weird command), your DEFAULT is: treat it as REAL runtime output, then run a small verification (read_file the affected path, grep for the symbol, list_dir, re-run the command) to confirm actual state. Only after the verification tool result contradicts the suspicious one may you call it stale/leftover — and even then, work from the FRESH result, never from your guess. Trusting your own skepticism over the runtime is the same over-confidence bug as hallucinating success: both substitute memory for evidence.
 - Investigate before editing: read the relevant files first; never invent file contents.
@@ -145,10 +146,12 @@ const MAX_STEPS = 10000;
 // old loop detection by interleaving 2-3 different tool calls.
 const LOOP_DETECT_WINDOW = 6;
 const LOOP_DETECT_THRESHOLD = 2;
-const MAX_PROMPT_CHARS = 1200000; // ~300k tokens (on par with the context window); compact() does NOT run before repl.js's auto-compact at 80% (240k tokens)
+const MAX_PROMPT_CHARS = 800000; // ~200k tokens (on par with CONTEXT_WINDOW); compact() is the last safety net, repl.js auto-compact at 75% (150k tokens) runs first.
 // When history exceeds this threshold, call a secondary model to summarize older turns instead of truncating
 // → keeps "long-term memory" within the session without blowing up the context.
-const SUMMARIZE_THRESHOLD_CHARS = 6000000; // ~1.5M tokens (75% window); summarize only runs after the 75% auto-compact with CONTEXT_WINDOW=2M
+// 600k chars ≈ 150k tokens = same as repl.js's 75% auto-compact threshold. When
+// manual /compact or auto-compact calls with force=true, this threshold is bypassed.
+const SUMMARIZE_THRESHOLD_CHARS = 600000;
 // HARD GOAL block (set by /goal <text>): inserted right after memoryBlock, high
 // attention. Purpose: counter the 3 failure modes from Anthropic's "dynamic workflows" article
@@ -261,11 +264,11 @@ export async function maybeSummarize(history, { model, signal, force = false } =
   const totalChars = history.reduce((s, m) => s + (m.content?.length || 0), 0);
   if (!force && totalChars < SUMMARIZE_THRESHOLD_CHARS) return false;
   // Keep the tail intact; summarize the part before it.
-  // With CONTEXT_WINDOW = 2M tokens, the tail must be large enough to keep the most recent
-  // tool-result context (e.g. the last 10 turns may be an in-progress edit_file + run_command chain).
-  // force (called from /compact or the 75% auto-compact): keep 16 tail.
-  // non-force: keep 24 tail (more generous, since only very long sessions trigger it).
-  const keepTail = force ? 16 : 24;
+  // With CONTEXT_WINDOW = 200k tokens, the tail must be large enough to keep the last few
+  // tool-result turns (an in-progress edit_file + run_command chain).
+  // force (called from /compact or the 75% auto-compact): keep 12 tail.
+  // non-force: keep 16 tail (more generous, since only long sessions trigger it).
+  const keepTail = force ? 12 : 16;
   if (history.length <= keepTail + 2) return false;
   // If the first turn is already a summary (role=system, name=summary) → summarize further.
   const head = history.slice(0, history.length - keepTail);
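
The head/tail split in this hunk can be illustrated in isolation. This is a minimal sketch assuming the 1.12.7 values (`keepTail` of 12 when forced, 16 otherwise); `splitForSummary` is a hypothetical helper, not a function in the package, and it leaves out the `SUMMARIZE_THRESHOLD_CHARS` check that `maybeSummarize` performs first.

```javascript
// Hypothetical helper mirroring the keepTail logic shown in the hunk above.
// Returns null when the history is too short to be worth summarizing.
function splitForSummary(history, { force = false } = {}) {
  const keepTail = force ? 12 : 16;
  if (history.length <= keepTail + 2) return null;
  return {
    head: history.slice(0, history.length - keepTail), // candidates for summarization
    tail: history.slice(history.length - keepTail),    // kept verbatim
  };
}

// Build a dummy 20-message history for demonstration.
const history = Array.from({ length: 20 }, (_, i) => ({ role: 'user', content: `msg ${i}` }));

const forced = splitForSummary(history, { force: true });
console.log(forced.head.length, forced.tail.length);

const short = splitForSummary(history.slice(0, 14), { force: true });
console.log(short);
```

With 20 messages and `force: true`, the first 8 messages become the head and the last 12 stay untouched; a 14-message history is returned as `null` because it does not exceed `keepTail + 2`.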

package/src/repl/agent-dispatch.js CHANGED

@@ -32,7 +32,7 @@ import { t } from '../i18n.js';
  * @returns {function} dispatchTool(name, input, depth=0) → {allow, result}
  */
 export function createAgentDispatcher(deps) {
-  const { state, abort, tokenMeter, stopSpin, startSpin, execTool } = deps;
+  const { state, abort, tokenMeter, stopSpin, startSpin, execTool, tui, c } = deps;
   // Test injection points: production always uses the defaults; smoke tests pass mocks.
   const runSubAgent = deps.runSubAgent || defaultRunSubAgent;
   const findModel = deps.findModel || defaultFindModel;
@@ -44,6 +44,57 @@ export function createAgentDispatcher(deps) {
   const recordWorkflowTaskFailed = j.recordTaskFailed;
   const dispatchTool = async (name, input, depth = 0) => {
+    // write_todos: a VIRTUAL tool that updates state.todos + the TUI directly. It bypasses execTool
+    // because it is not fs/shell; it is just a way for the model to declare a structured todo list instead
+    // of writing markdown `- [ ]` (the markdown parser is fragile when the format is off). Each
+    // call REPLACES the whole list (no per-item patching: the model resends the full list
+    // with done:true on the item just finished). state.todos is set IMMEDIATELY, so the TUI renders
+    // accurately without parsing history.
+    if (name === 'write_todos') {
+      const todosIn = Array.isArray(input?.todos) ? input.todos : null;
+      if (!todosIn)
+        return { allow: true, result: 'ERROR: write_todos requires the field "todos": [{text, done}].' };
+      const todos = todosIn
+        .filter((it) => it && typeof it.text === 'string' && it.text.trim())
+        .map((it) => ({ text: String(it.text).trim(), done: !!it.done }));
+      if (!todos.length)
+        return { allow: true, result: 'ERROR: todos is empty; send at least 1 item {text, done}.' };
+      const prev = Array.isArray(state.todos) ? state.todos : [];
+      const prevByText = new Map(prev.map((p) => [p.text, !!p.done]));
+      state.todos = todos;
+      // Flag: the model used write_todos this turn → repl skips markdown parsing so the
+      // fragile parser does not overwrite structured state. Reset at the start of each turn.
+      state._todosFromTool = true;
+      try { tui?.setTodos?.(todos); } catch {}
+      const done = todos.filter((t) => t.done).length;
+      // Compact print: first call (prev empty) or the set of texts changed → print in full.
+      // Same set of texts with only done states differing → print a diff (the lines just toggled).
+      const sameSet = prev.length === todos.length && todos.every((t) => prevByText.has(t.text));
+      stopSpin?.();
+      if (!sameSet) {
+        const lines = todos.map((t) => '    ' + (t.done ? '✓ ' : '☐ ') + t.text);
+        console.log((c?.tool || ((s) => s))(`  📋 todo (${done}/${todos.length})`));
+        console.log(lines.join('\n'));
+      } else {
+        // diff: print lines whose done changed (both false→true and true→false).
+        const changes = todos.filter((t) => prevByText.get(t.text) !== t.done);
+        if (changes.length === 0) {
+          console.log((c?.dim || ((s) => s))(`  📋 todo (${done}/${todos.length}) · unchanged`));
+        } else {
+          console.log((c?.tool || ((s) => s))(`  📋 todo (${done}/${todos.length})`));
+          for (const ch of changes) {
+            const mark = ch.done ? '✓' : '☐';
+            console.log('    ' + mark + ' ' + ch.text);
+          }
+        }
+      }
+      startSpin?.();
+      return {
+        allow: true,
+        result: `Updated ${todos.length} todos (${done} done, ${todos.length - done} remaining). Continue with the items not yet done; if all are done, finish the reply.`,
+      };
+    }
     // spawn_agent / spawn_agents are only allowed when agentMode is on; depth is limited
     // by MAX_SUBAGENT_DEPTH to avoid runaway recursion.
     if (name === 'spawn_agent' || name === 'spawn_agents') {
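
The `write_todos` branch above does two separable things: it normalizes the incoming list, then decides between a full print and a diff. The sketch below restates that logic as pure functions so it can be exercised without the REPL; `normalizeTodos` and `diffTodos` are hypothetical names, and the console output and TUI calls of the real dispatcher are left out.

```javascript
// Hypothetical pure-function restatement of the normalize + diff steps.
// Returns null for a missing/invalid "todos" field, otherwise a cleaned list.
function normalizeTodos(input) {
  if (!Array.isArray(input?.todos)) return null;
  return input.todos
    .filter((it) => it && typeof it.text === 'string' && it.text.trim())
    .map((it) => ({ text: it.text.trim(), done: !!it.done }));
}

// 'full' when the set of texts changed (or on first call), else 'diff'
// listing only items whose done flag flipped in either direction.
function diffTodos(prev, next) {
  const prevByText = new Map(prev.map((p) => [p.text, !!p.done]));
  const sameSet = prev.length === next.length && next.every((it) => prevByText.has(it.text));
  if (!sameSet) return { mode: 'full', lines: next };
  return { mode: 'diff', lines: next.filter((it) => prevByText.get(it.text) !== it.done) };
}

// First call: the whole list is new, so it is printed in full.
const first = normalizeTodos({
  todos: [{ text: ' read config ', done: false }, { text: 'patch parser', done: 0 }, { text: '', done: true }],
});
const firstResult = diffTodos([], first);

// Second call: same texts, one item flipped to done, so only that line is reported.
const second = normalizeTodos({
  todos: [{ text: 'read config', done: true }, { text: 'patch parser', done: false }],
});
const secondResult = diffTodos(first, second);

console.log(firstResult.mode, firstResult.lines.length);
console.log(secondResult.mode, secondResult.lines.map((it) => it.text));
```

Because items are matched by their trimmed text, rewording an item between calls makes the list count as a new set and triggers a full print rather than a diff.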

package/src/repl.js CHANGED

@@ -1418,7 +1418,7 @@ NGUYÊN TẮC:
       // src/repl/agent-dispatch.js (v1.12.x). The factory is called EVERY turn because abort
       // is rebound in handle(), so it is not cached.
       const dispatchTool = createAgentDispatcher({
-        state, abort, tokenMeter, stopSpin, startSpin, execTool,
+        state, abort, tokenMeter, stopSpin, startSpin, execTool, tui, c,
       });
       const answer = await runAgent({
@@ -1461,8 +1461,15 @@ NGUYÊN TẮC:
         printAnswer(answer, state.model.name, providerColor(state.model.provider));
       // Parse todos from model output → render on the status bar.
-      state.todos = parseTodosFromHistory(state.history);
-      tui.setTodos(state.todos);
+      // If the model called write_todos this turn (state._todosFromTool = true),
+      // state.todos + the TUI were already set directly in the dispatcher, so SKIP
+      // markdown parsing to avoid overwriting structured state with the fragile parser.
+      if (state._todosFromTool) {
+        state._todosFromTool = false; // reset for the next turn
+      } else {
+        state.todos = parseTodosFromHistory(state.history);
+        tui.setTodos(state.todos);
+      }
       return answer; // the ULTRA loop needs this text to detect the completion token
     } catch (err) {
       stopSpin();
@@ -1474,53 +1481,24 @@ NGUYÊN TẮC:
       // Reset turn-scoped auto-approve: it only applies to the runAgent call that just finished.
       // (autoApprove + autoApproveFile stay in place for the session.)
       state.autoApproveTurn.clear();
-      // Auto-compact based on context tokens instead of chars.
-      // With CONTEXT_WINDOW = 2M tokens (see src/tokens.js):
-      //   75% (1.5M tokens) → auto compact
-      //   60% (1.2M tokens) → strong warning
-      //   40% (800k tokens) → gentle reminder
-      // Thresholds were lowered because, with today's long model contexts, waiting until 80% to compact
-      // means each final turn already eats 200k+ tokens; compacting earlier keeps the session smooth.
+      // [2026-06-12] AUTO-COMPACT REMOVED: the user controls compaction manually with /compact.
+      // Reason: auto-compact interrupted the workflow midway, and the summary could lose
+      // details the user needs. Two WARNING thresholds (60% / 80%) remain so the user knows when
+      // to run /compact, but it is NO LONGER run automatically.
+      // With CONTEXT_WINDOW = 200k tokens:
+      //   60% (120k) → gentle one-time reminder
+      //   80% (160k) → strong warning: run /compact now, before the provider rejects
       try {
         const totalTokens = countMessages(state.history);
         const k = Math.round(totalTokens / 1000);
         const pct = Math.round((totalTokens / CONTEXT_WINDOW) * 100);
-        // Threshold 3 (≥75%, 1.5M tokens): AUTOMATIC compact.
-        if (totalTokens >= CONTEXT_WINDOW * 0.75 && !state._autoCompacting) {
-          state._autoCompacting = true;
-          console.log(c.accent(`  ⚡ ${t.autoCompactTrigger(k)} (${pct}% context)`));
-          tui.setBusy(true, t.compactRunning);
-          try {
-            const ok = await maybeSummarize(state.history, { model: state.model, force: true });
-            tui.setBusy(false);
-            if (ok) {
-              const afterTokens = countMessages(state.history);
-              const aK = Math.round(afterTokens / 1000);
-              const saved =
-                totalTokens > 0 ? Math.round(((totalTokens - afterTokens) / totalTokens) * 100) : 0;
-              console.log(
-                c.ok(
-                  `  ${t.autoCompactDone(k, aK, saved)} (${Math.round((afterTokens / CONTEXT_WINDOW) * 100)}% context)`
-                )
-              );
-              state._longSessionWarned = false;
-              persist();
-            } else {
-              console.log(c.err('  ' + t.autoCompactFail));
-            }
-          } catch (e) {
-            tui.setBusy(false);
-            console.log(c.err('  ' + t.autoCompactFail));
-          } finally {
-            state._autoCompacting = false;
-          }
-        } else if (totalTokens >= CONTEXT_WINDOW * 0.6) {
-          // Threshold 2 (≥60%, 1.2M tokens): strong warning.
-          console.log(c.err(`  ⚠ ${t.veryLongSession(k)} (${pct}% context)`));
+        if (totalTokens >= CONTEXT_WINDOW * 0.8) {
+          // Threshold 2 (≥80%, 160k tokens): strong warning, suggest /compact now.
+          console.log(c.err(`  ⚠ ${t.veryLongSession(k)} (${pct}% context): type /compact to summarize and avoid a provider reject at ~200k.`));
           state._longSessionWarned = true;
-        } else if (totalTokens >= CONTEXT_WINDOW * 0.4 && !state._longSessionWarned) {
-          // Threshold 1 (≥40%, 800k tokens): gentle one-time reminder.
-          console.log(c.dim(`  ⓘ ${t.longSession(k)} (${pct}% context)`));
+        } else if (totalTokens >= CONTEXT_WINDOW * 0.6 && !state._longSessionWarned) {
+          // Threshold 1 (≥60%, 120k tokens): gentle one-time reminder.
+          console.log(c.dim(`  ⓘ ${t.longSession(k)} (${pct}% context): consider /compact if the session will run long.`));
           state._longSessionWarned = true;
         }
       } catch {}
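
The warning branch that replaces auto-compact reduces to a small decision function. The sketch below is one way to express it, assuming the 1.12.7 values (200k window, 60% and 80% thresholds); `warningLevel` and its return labels are hypothetical, and the real code prints messages and sets `state._longSessionWarned` instead of returning a value.

```javascript
const CONTEXT_WINDOW = 200_000; // value from src/tokens.js in 1.12.7

// Hypothetical classifier for the two warning thresholds.
// alreadyWarned corresponds to state._longSessionWarned in the diff.
function warningLevel(totalTokens, alreadyWarned) {
  if (totalTokens >= CONTEXT_WINDOW * 0.8) return 'strong'; // shown every turn
  if (totalTokens >= CONTEXT_WINDOW * 0.6 && !alreadyWarned) return 'gentle'; // shown once
  return 'none';
}

for (const [tokens, warned] of [[100_000, false], [130_000, false], [130_000, true], [170_000, true]]) {
  console.log(tokens, warned, warningLevel(tokens, warned));
}
```

The strong warning ignores the `alreadyWarned` flag, so it repeats on every turn above 80%, while the gentle reminder is suppressed once the flag is set.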

package/src/tokens.js CHANGED

@@ -57,8 +57,12 @@ export function countMessages(messages = []) {
 // the window is wide enough (256 chars) to span every realistic token boundary of cl100k/o200k
 // (the longest token is ~ a few dozen bytes).
 const TAIL_WINDOW = 256;
-// Maximum context window of the model (2M tokens). Used to compute % usage in real time.
-export const CONTEXT_WINDOW = 2_000_000;
+// Maximum context window of the model. Set to 200k tokens: matches Claude 3.5/Opus 4,
+// GPT-4o, and is safe for every popular model behind the gateway (Gemini 1M, DeepSeek
+// 128k, Grok 128k...). Setting it above 200k is pointless: the provider would reject the prompt
+// BEFORE repl.js's auto-compact has a chance to trigger → the user sees 'compact
+// does not work' even though the compact logic is still correct.
+export const CONTEXT_WINDOW = 200_000;
 export class TokenMeter {
   constructor() {

package/src/tools.js CHANGED

@@ -519,6 +519,17 @@ export const TOOLS = {
     return `Killed background process #${id} (${p.command}).`;
   },
+  // write_todos is a VIRTUAL tool: the dispatcher (src/repl/agent-dispatch.js) intercepts it BEFORE
+  // it reaches execTool, so this stub normally never runs. It is kept as a fail-safe: if some
+  // code path happens to call runTool('write_todos', ...) directly, it at least returns OK
+  // instead of throwing Unknown tool, and the stub's message explains how to fix it.
+  async write_todos({ todos }, { signal } = {}) {
+    if (signal?.aborted) throw new Error('aborted');
+    if (!Array.isArray(todos)) throw new Error('write_todos: todos must be an array');
+    const done = todos.filter((it) => it && it.done).length;
+    return `(stub) write_todos received ${todos.length} items (${done} done); the dispatcher should have intercepted this first, so if you see this line, report a bug.`;
+  },
   // Knowledge graph tools: NO permission prompt (the user chooses freely).
   // Storage: <cwd>/.noob/kg.jsonl. Logic lives in src/kg.js.
   async kg_search({ query }, { signal } = {}) {
@@ -679,6 +690,11 @@ export function describe(name, input) {
       return `↳ sub-agent: ${String(input.task || '').slice(0, 80)}`;
     case 'spawn_agents':
       return `↳ ${(input.tasks || []).length} parallel sub-agents`;
+    case 'write_todos': {
+      const items = Array.isArray(input?.todos) ? input.todos : [];
+      const done = items.filter((it) => it && it.done).length;
+      return `todo ${done}/${items.length}`;
+    }
     case 'kg_search':
       return `kg search "${input.query || ''}"`;
     case 'kg_add':