@pentoshi/clai 1.0.4 → 1.0.7

package/README.md CHANGED
@@ -66,10 +66,11 @@ clai -y "list the 10 largest files in my home directory"
66
66
  ## Features
67
67
 
68
68
  - **`/ask` mode** — Read-only. AI explains, gives commands & step-by-step guidance, but does NOT execute anything.
69
- - **`/agent` mode** — Agentic. AI plans, then executes shell commands, edits files, installs missing tools, parses output, and continues until the goal is met.
69
+ - **`/agent` mode** — Agentic. AI plans, waits for approval, then executes shell commands, edits files, installs missing tools, parses output, and continues until the goal is met. Tasks run on an approve/refine/discard plan workflow (`/implement`, free-text to refine, `/discard` to cancel).
70
70
  - **7 LLM providers** — Groq, Google Gemini, OpenRouter, OpenAI, Anthropic, NVIDIA NIM, and Ollama (local). All with streaming.
71
71
  - **10 built-in tools** — `shell.exec`, `fs.read`, `fs.write`, `fs.list`, `fs.search`, `pkg.install`, `net.scan`, `http.fetch`, `sysinfo`, `pentest.recon`.
72
72
  - **Smart safety gate** — Read-only commands auto-execute; mutating commands require confirmation; destructive patterns are blocked.
73
+ - **OS-aware & tool-frugal** — Picks the best approach for your OS, prefers tools already installed (installs only when nothing suitable exists), broadens its approach and escalates privileges as needed to finish the task.
73
74
  - **Cross-platform** — macOS, Linux, and Windows. Detects OS-native package managers (brew, apt, dnf, pacman, winget, choco).
74
75
  - **Pentest-aware** — nmap, nikto, sqlmap, gobuster, ffuf, hydra, masscan, whois, dig, netcat, tshark.
75
76
  - **Auto-update** — Checks for new versions on startup; run `/update` or `clai update` to upgrade.
@@ -153,6 +154,9 @@ export OLLAMA_HOST=http://localhost:11434
153
154
  | `/reset` | Clear all saved history |
154
155
  | `/cwd <path>` | Change working directory |
155
156
  | `/allow <tool>` | Whitelist a tool for the session |
157
+ | `/plan` | View the current session plan (also `Ctrl+P`) |
158
+ | `/implement` | Approve the current plan and have clai execute it |
159
+ | `/discard` | Discard the current plan so later messages ignore it |
156
160
  | `/scope add <targets>` | Add authorized pentest targets |
157
161
  | `/fallback [on|off]` | Try other configured providers after a failure |
158
162
  | `/update` | Check for updates |
@@ -160,6 +164,21 @@ export OLLAMA_HOST=http://localhost:11434
160
164
  | `/help` | List commands |
161
165
  | `Ctrl+C` | Abort current response (second Ctrl+C exits) |
162
166
  | `Ctrl+O` | Toggle full tool output (same keys on all OSes) |
167
+ | `Ctrl+P` | View the current session plan |
168
+
169
+ ### Plan → Implement workflow
170
+
171
+ For multi-step coding or pentest tasks, clai first proposes a **plan** (a goal,
172
+ an approach, and an ordered task checklist) and then waits. Nothing runs until
173
+ you approve it.
174
+
175
+ - **Approve** — type `/implement` to execute the plan task by task.
176
+ - **Refine** — type any normal message (e.g. "use only installed tools",
177
+ "skip task 2", "also enumerate subdomains") and clai produces a **revised
178
+ plan**, then waits again. While a plan is awaiting approval, free-text is
179
+ treated as plan feedback, not as a signal to start running.
180
+ - **Cancel** — type `/discard` to drop the plan. After discarding, later
181
+ messages are independent of it.
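The approve / refine / cancel rules in this section amount to a small routing decision on each user message. A minimal sketch in JavaScript, assuming a hypothetical `routePlanInput` helper and a plain `{ hasPlan, approved }` state object; neither is taken from clai's source:

```javascript
// Hypothetical helper: decide what a user message means while a plan may be
// awaiting approval. Names and shapes are illustrative assumptions.
function routePlanInput(input, state) {
  const text = input.trim();
  if (!state.hasPlan || state.approved) {
    // No pending plan: the message is an ordinary prompt.
    return { action: "prompt", state };
  }
  if (text === "/implement") {
    return { action: "execute", state: { ...state, approved: true } };
  }
  if (text === "/discard") {
    return { action: "discard", state: { hasPlan: false, approved: false } };
  }
  // Anything else is feedback on the pending plan, never a "go" signal.
  return { action: "revise", state };
}

const pending = { hasPlan: true, approved: false };
console.log(routePlanInput("/implement", pending).action); // execute
console.log(routePlanInput("skip task 2", pending).action); // revise
console.log(routePlanInput("/discard", pending).action); // discard
```

The point of the sketch is the last branch: while a plan is pending, free text can only revise it.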
163
182
 
164
183
  ## Built-in Tools (Agent Mode)
165
184
 
@@ -171,10 +190,10 @@ export OLLAMA_HOST=http://localhost:11434
171
190
  | `fs.list` | List directory contents | safe |
172
191
  | `fs.search` | Search files with ripgrep (falls back to grep) | safe |
173
192
  | `pkg.install` | Install packages via detected OS package manager | confirm |
174
- | `net.scan` | Nmap wrapper for port scanning | confirm |
193
+ | `net.scan` | Nmap wrapper. Defaults to a stealth SYN scan, auto-elevates (sudo/doas/gsudo) and falls back to an unprivileged TCP connect scan | confirm |
175
194
  | `http.fetch` | HTTP GET/POST with response size limits | safe |
176
195
  | `sysinfo` | OS, architecture, shell, and working directory info | safe |
177
- | `pentest.recon` | Composite: whois + dig + nmap top-100 ports | confirm |
196
+ | `pentest.recon` | Composite: whois + dig + stealth nmap top-100 ports | confirm |
178
197
 
179
198
  > \* **smart** = read-only commands (`curl`, `ls`, `whoami`, `gobuster`, `dirb`, etc.) auto-execute; mutating commands require confirmation.
180
199
 
@@ -284,6 +303,55 @@ npm test # Run tests (39 tests)
284
303
  npm run compile # Build native binaries (requires Bun)
285
304
  ```
286
305
 
306
+ ## Releasing
307
+
308
+ Releases are fully automated by `.github/workflows/release.yml`, triggered when
309
+ you push a `v*.*.*` tag. To cut a release:
310
+
311
+ ```sh
312
+ npm version 1.0.6 --no-git-tag-version # bump package.json + lockfile
313
+ # also bump: src/commands/update.ts (FALLBACK_VERSION),
314
+ # manifests/homebrew/clai.rb, manifests/scoop/clai.json
315
+ git commit -am "v1.0.6"
316
+ git push origin main
317
+ git tag -a v1.0.6 -m "clai v1.0.6"
318
+ git push origin v1.0.6 # this triggers the workflow
319
+ ```
320
+
321
+ On the tag push the workflow:
322
+
323
+ 1. **build** — runs typecheck + tests and compiles native binaries for all platforms.
324
+ 2. **publish** — creates the GitHub Release with the binaries and SHA256 sidecars.
325
+ 3. **publish-npm** — publishes `@pentoshi/clai` to npm.
326
+ 4. **sync-tap** — regenerates the Homebrew formula in `pentoshi007/homebrew-clai`.
327
+
328
+ > **Reruns don't pick up newer workflow code.** Re-running a workflow runs it
329
+ > against the commit the tag points to. If you change `release.yml` after
330
+ > tagging, you must move/recreate the tag (or cut a new version) for the change
331
+ > to take effect.
332
+
333
+ Required repository secrets (Settings → Secrets and variables → Actions). Each
334
+ job skips gracefully if its secret is absent:
335
+
336
+ | Secret | Used by | How to create |
337
+ |--------------------|---------------|-------------------------------------------------------------------------------|
338
+ | `NPM_TOKEN` | `publish-npm` | npm → Access Tokens → **Granular** (Read and write on `@pentoshi/clai`) or classic **Automation** token. These bypass the interactive OTP prompt that blocks CI. |
339
+ | `TAP_GITHUB_TOKEN` | `sync-tap` | A GitHub PAT with `contents:write` on the `pentoshi007/homebrew-clai` repo |
340
+
341
+ Optional repository **variable** (not a secret):
342
+
343
+ | Variable | Effect |
344
+ |------------------|---------------------------------------------------------------------------------|
345
+ | `NPM_PROVENANCE` | Set to `true` to publish with `--provenance`. Only works if the npm account's 2FA is set to **"authorization only"**. Leave unset otherwise — the job publishes without provenance. |
346
+
347
+ The `publish-npm` job verifies the tag matches `package.json` version and skips
348
+ if that version is already on npm, so re-running a tag is safe.
349
+
350
+ > A normal account with 2FA set to **"auth and writes"** prompts for a one-time
351
+ > password on every publish, which fails in CI. Use a Granular/Automation
352
+ > `NPM_TOKEN` (token-level auth) so CI can publish without an OTP — you can keep
353
+ > 2FA enabled on the account.
354
+
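The tag check and the already-published skip described for `publish-npm` could be expressed roughly as follows. This is only a sketch: `decidePublish`, its arguments, and its return values are hypothetical, and the real logic lives in `release.yml`, which this diff does not show. Treating a tag mismatch as a failure is also an assumption.

```javascript
// Hypothetical decision helper for a publish job.
// tag: the pushed git tag, e.g. "v1.0.6"
// packageVersion: the "version" field from package.json
// publishedVersions: versions already present on the registry
function decidePublish(tag, packageVersion, publishedVersions) {
  // Tags carry a leading "v"; strip it before comparing.
  const tagVersion = tag.startsWith("v") ? tag.slice(1) : tag;
  if (tagVersion !== packageVersion) {
    return { status: "fail", reason: `tag ${tag} does not match package.json ${packageVersion}` };
  }
  if (publishedVersions.includes(packageVersion)) {
    // Re-running the same tag is harmless: nothing new to publish.
    return { status: "skip", reason: `${packageVersion} is already published` };
  }
  return { status: "publish", reason: "new version" };
}

console.log(decidePublish("v1.0.6", "1.0.6", ["1.0.4"]).status); // publish
console.log(decidePublish("v1.0.6", "1.0.6", ["1.0.6"]).status); // skip
console.log(decidePublish("v1.0.6", "1.0.7", []).status); // fail
```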
287
355
  ## Architecture
288
356
 
289
357
  ```
@@ -72,6 +72,16 @@ export declare function recognizeBareToolJson(text: string): {
72
72
  * multi-file fs.writeMany scaffold) silently never runs.
73
73
  */
74
74
  export declare function looksLikeTruncatedToolCall(text: string): boolean;
75
+ /**
76
+ * Count the number of ```tool fenced blocks in a message. Models sometimes
77
+ * emit MULTIPLE tool calls in one response (e.g. fs.writeMany + npm install +
78
+ * npm run dev). Only the FIRST is parsed and executed; the rest are silently
79
+ * dropped and leak to the screen as code fences, while the model believes it
80
+ * ran all of them — a major cause of "everything is done" fabrications. We
81
+ * detect this so the runner can run the first and explicitly tell the model
82
+ * the others did NOT run and must be re-sent one at a time.
83
+ */
84
+ export declare function countToolFences(text: string): number;
75
85
  /**
76
86
  * Decide whether this turn should get a generous step budget because it is
77
87
  * a multi-file build, a continuation of one, or a "it's not done yet" nudge.
@@ -88,4 +98,5 @@ export declare function requiresFreshWebSearch(prompt: string): boolean;
88
98
  */
89
99
  export declare function isLumpedSingleTask(taskTitles: string[]): boolean;
90
100
  export declare function shouldDimToolChatter(call: ToolCall): boolean;
101
+ export declare function isPreApprovalAllowedTool(name: string): boolean;
91
102
  export declare function runAgentLoop(prompt: string, options?: AgentRunOptions): Promise<string>;
@@ -367,6 +367,19 @@ export function looksLikeTruncatedToolCall(text) {
367
367
  }
368
368
  return false;
369
369
  }
370
+ /**
371
+ * Count the number of ```tool fenced blocks in a message. Models sometimes
372
+ * emit MULTIPLE tool calls in one response (e.g. fs.writeMany + npm install +
373
+ * npm run dev). Only the FIRST is parsed and executed; the rest are silently
374
+ * dropped and leak to the screen as code fences, while the model believes it
375
+ * ran all of them — a major cause of "everything is done" fabrications. We
376
+ * detect this so the runner can run the first and explicitly tell the model
377
+ * the others did NOT run and must be re-sent one at a time.
378
+ */
379
+ export function countToolFences(text) {
380
+ const matches = text.match(/```tool\s*\n[\s\S]*?```/gi);
381
+ return matches ? matches.length : 0;
382
+ }
370
383
  /** Extract the text before the tool call block for display purposes */
371
384
  function textBeforeToolCall(text) {
372
385
  const patterns = [
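To see what `countToolFences` reports for a reply that packs several tool calls into one message, here is a small self-contained sketch. The regular expression is the one from the hunk above, but built from a `FENCE` constant instead of a literal so the example does not contain raw triple backticks; the sample reply and the `toolBlock` helper are made up for illustration.

```javascript
// Three backticks, built indirectly so this example stays fence-safe.
const FENCE = "`".repeat(3);

function countToolFences(text) {
  // Equivalent to the literal pattern in the diff: a fence tagged "tool",
  // a newline, then the shortest possible body up to the next fence.
  const pattern = new RegExp(FENCE + "tool\\s*\\n[\\s\\S]*?" + FENCE, "gi");
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}

// Illustrative helper: wrap a JSON payload in a tool fence.
const toolBlock = (json) => `${FENCE}tool\n${json}\n${FENCE}`;

const reply = [
  "Scaffolding the app now.",
  toolBlock('{"name":"fs.writeMany"}'),
  toolBlock('{"name":"shell.exec"}'),
  `${FENCE}sh\nnpm run dev\n${FENCE}`, // ordinary code fence, not a tool call
].join("\n");

const total = countToolFences(reply);
const extra = Math.max(0, total - 1); // blocks beyond the first are dropped
console.log(total, extra); // 2 1
```

Only fences tagged `tool` are counted, so the plain `sh` fence in the sample does not inflate the number.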
@@ -527,15 +540,23 @@ function buildWorkflowDirective() {
527
540
  return [
528
541
  "BUILD WORKFLOW (this is a build/scaffold/feature task — follow this order EXACTLY; deviation is a failure):",
529
542
  "1. EXPLORE: fs.list the working directory (and key subdirs) to see what already exists. Use tool.batch to parallelize reads.",
530
- "2. UNDERSTAND: fs.read the files that matter (like package.json for js related and same for other languages too, config, entry points, existing components). Detect the existing stack/tooling and MATCH it. If the dir is empty or only has a stub, start fresh with a sensible modern default (e.g. Vite + React) and say so.",
531
- "3. PLAN: call plan.create with a COMPREHENSIVE plan — a detailed `detail` (stack chosen and WHY, architecture, how you'll verify) and 4-8 SEPARATE, ordered, high-quality tasks. NEVER cram everything into one task (e.g. one task that lists 8 files is rejected). Each task is one distinct, verifiable action. Then STOP and wait for the user to /implement.",
532
- "4. IMPLEMENT: once approved, work task by task in STRICT ORDER across MULTIPLE steps. Start with the FIRST pending task. For each task: call task.update {taskId, state:'in_progress'} → do the real work (fs.writeMany for files, pkg.install / npm install, shell.start for the dev server) → VERIFY it succeeded → call task.update {taskId, state:'done'}, then move to the NEXT task. Keep going until EVERY task is done and the goal is achieved. Do NOT stop after one file or one step, and do NOT claim work you didn't actually run.",
543
+ "2. UNDERSTAND: fs.read the files that matter (like package.json for js related and same for other languages too, config, entry points, existing components). Detect the existing stack/tooling and MATCH it. If the dir is empty or only has a stub, start fresh with a sensible modern default and say so.",
544
+ "3. PLAN: call plan.create with a COMPREHENSIVE plan — a detailed `detail` (stack chosen and WHY, architecture, how you'll verify) and 4-8 SEPARATE, ordered, high-quality tasks. The FIRST task must be to INITIALIZE the project with its official scaffolder (NOT hand-writing package.json). Each task is one distinct, verifiable action. Then STOP and wait for the user to /implement.",
545
+ "4. IMPLEMENT: once approved, work task by task in STRICT ORDER across MULTIPLE steps, ONE tool call per turn. For each task: call task.update {taskId, state:'in_progress'} → do the real work → VERIFY it actually succeeded (read a file you wrote, check the command's exit/output) → call task.update {taskId, state:'done'}, then move to the NEXT task. Keep going until EVERY task is done. Do NOT stop after one step, and do NOT claim work you didn't actually run.",
546
+ "",
547
+ "INITIALIZE WITH THE OFFICIAL SCAFFOLDER FIRST (do NOT hand-write build configs):",
548
+ "- React/Vue/Svelte/vanilla → `npm create vite@latest <name> -- --template react` (or react-ts, vue, svelte). Next.js → `npx create-next-app@latest <name> --yes`. Vue → `npm create vue@latest`. Astro → `npm create astro@latest`. Node API → `npm init -y` then add deps. Python → `uv init` / `python -m venv`. Use the ecosystem's standard initializer for the framework.",
549
+ "- Run the scaffolder NON-INTERACTIVELY (pass flags/--yes) via shell.exec, into the current directory. THEN run the install (npm install) and adapt/add only the files the app needs (components, routes, styles) with fs.write/fs.writeMany. Do NOT recreate what the scaffolder already generated.",
550
+ "- Only hand-write package.json/config when there is genuinely no suitable scaffolder for the stack.",
533
551
  "",
534
552
  "CRITICAL RULES during IMPLEMENTATION:",
553
+ "- EXACTLY ONE ```tool block per message. NEVER put several tool calls (e.g. fs.writeMany + npm install + npm run dev) in one response — only the first runs and the rest are silently discarded, which is how false 'all done' claims happen.",
535
554
  "- Do NOT re-explore. Step 1 (EXPLORE) was already completed during planning. Start executing the first pending task immediately.",
536
555
  "- ONE task at a time, in ORDER. Do NOT skip ahead to task 3 before task 2 is done.",
537
- "- If a tool call FAILS (error output, non-zero exit, file missing), the task is NOT done. Mark it 'failed', diagnose WHY it failed, fix the problem, and retry until it succeeds.",
538
- "- NEVER claim a task is done, a dependency is installed, or a server is running unless the tool call actually succeeded and you saw the success output.",
556
+ "- VERIFY each step before marking it done: after writing files, fs.list/fs.read to confirm they exist; after an install, check it exited 0; after starting the dev server with shell.start, confirm the job is running. Marking a task done without a successful tool call is the worst failure.",
557
+ "- If a tool call FAILS (error output, non-zero exit, file missing), the task is NOT done. Mark it 'failed', diagnose WHY, fix it, and retry until it succeeds.",
558
+ "- NEVER claim a task is done, files were created, a dependency is installed, or a server is running unless the tool call ACTUALLY succeeded and you saw the success output. If you have not run it, say so.",
559
+ "- Start a dev server with shell.start (background job), NOT `npm run dev &` via shell.exec.",
539
560
  "",
540
561
  "FORBIDDEN before plan approval (/implement): you MUST NOT use fs.write, fs.writeMany, fs.edit, shell.exec, shell.start, pkg.install, or pkg.uninstall. The ONLY tool allowed before approval is plan.create (and the read/list tools for exploration). If you are nudged to 'take action' before a plan exists, your action MUST be plan.create.",
541
562
  "If the task is genuinely trivial (a single tiny file), you may skip the plan — but for an app/feature, ALWAYS plan first.",
@@ -544,6 +565,48 @@ function buildWorkflowDirective() {
544
565
  export function shouldDimToolChatter(call) {
545
566
  return call.name === "web.search";
546
567
  }
568
+ /**
569
+ * Re-assert raw mode AND resume stdin after an inquirer prompt
570
+ * (confirm/password). inquirer's readline interface pauses stdin and
571
+ * switches it to cooked mode when it closes; if we only flip raw mode back
572
+ * on but leave stdin paused, no `keypress`/`data` events flow to the REPL's
573
+ * ESC/Ctrl+C abort handler — so a long-running tool launched right after a
574
+ * confirmation can no longer be aborted (the user had to kill the terminal).
575
+ * Calling resume() restores the event flow.
576
+ */
577
+ function restoreInteractiveStdin() {
578
+ if (!process.stdin.isTTY)
579
+ return;
580
+ try {
581
+ if (!process.stdin.isRaw) {
582
+ process.stdin.setRawMode(true);
583
+ }
584
+ process.stdin.resume();
585
+ }
586
+ catch {
587
+ /* ignore */
588
+ }
589
+ }
590
+ /**
591
+ * Tools allowed while an UN-approved plan is active. Before the user runs
592
+ * /implement, the agent may only (re)create the plan and do read-only
593
+ * exploration to refine it — never execute. Everything else is blocked by
594
+ * the plan-awaiting-approval gate so a stray/recovered tool call can't start
595
+ * running the plan, and so free-text after a plan is treated as a revision.
596
+ */
597
+ const PRE_APPROVAL_ALLOWED_TOOLS = new Set([
598
+ "plan.create",
599
+ "task.update",
600
+ "fs.read",
601
+ "fs.list",
602
+ "fs.search",
603
+ "sysinfo",
604
+ "tool.batch",
605
+ "net.context",
606
+ ]);
607
+ export function isPreApprovalAllowedTool(name) {
608
+ return PRE_APPROVAL_ALLOWED_TOOLS.has(name);
609
+ }
547
610
  function styleToolChatter(call, text) {
548
611
  return shouldDimToolChatter(call) ? chalk.dim(text) : text;
549
612
  }
@@ -687,8 +750,13 @@ function planContextMessage(plan, approved) {
687
750
  "Never claim something ran without a successful tool call.");
688
751
  }
689
752
  else {
690
- lines.push("This plan is NOT yet approved. If the user is refining it, update it with plan.create again. " +
691
- "Do NOT execute tasks until the user runs /implement.");
753
+ lines.push("This plan is NOT yet approved, so you MUST NOT execute any of its tasks yet. " +
754
+ "Any new free-text message from the user right now is a PLAN REVISION, not approval — even if it " +
755
+ "sounds like an instruction (e.g. 'do not install new tools', 'use only X', 'also add Y', 'skip task 2'). " +
756
+ "Treat it as feedback: call plan.create AGAIN with the revised goal/detail/tasks to produce an updated " +
757
+ "plan, then STOP and wait. Do NOT call shell.exec, pkg.install, net.scan, tool.check, fs.write, or any " +
758
+ "other execution tool. The user will APPROVE with /implement, or CANCEL with /discard. Only after " +
759
+ "/implement may you begin executing.");
692
760
  }
693
761
  return lines.join("\n");
694
762
  }
@@ -744,15 +812,18 @@ async function handlePlanTool(call, session, ctx) {
744
812
  const display = chalk.cyan(" ● planning\n") +
745
813
  checklist +
746
814
  "\n" +
747
- chalk.dim(" ✦ plan created — press Ctrl+P to view it, or type /implement to approve and run it\n");
815
+ chalk.dim(" ✦ plan created — press Ctrl+P to view it, /implement to approve and run it,\n" +
816
+ " or /discard to cancel it. Any other message refines this plan.\n");
748
817
  return {
749
818
  handled: true,
750
819
  ok: true,
751
820
  display,
752
- modelNote: `Plan saved with ${plan.tasks.length} task(s). STOP here and wait. ` +
821
+ modelNote: `Plan saved with ${plan.tasks.length} task(s). STOP here and wait — produce NO other tool calls now. ` +
753
822
  "Do NOT start executing tasks until the user approves with /implement. " +
754
- "When approved you will receive a message telling you to begin; then work task by task, " +
755
- "calling task.update to mark each in_progress before and done after you finish it.",
823
+ "If the user's next message gives feedback instead of /implement, that is a REVISION: call plan.create " +
824
+ "again with the updated plan and STOP again. The user may cancel the whole plan with /discard. " +
825
+ "Only after /implement do you begin, working task by task, calling task.update to mark each " +
826
+ "in_progress before and done after you finish it.",
756
827
  };
757
828
  }
758
829
  // task.update
@@ -891,6 +962,10 @@ export async function runAgentLoop(prompt, options = {}) {
891
962
  // ignores the freshness guard and tries to answer from stale memory.
892
963
  let sawFreshWebSearch = false;
893
964
  let freshnessRetryUsed = false;
965
+ // Guard against a model that declares an approved plan "complete" while
966
+ // tasks are still pending and it never ran the work. We nudge it back to
967
+ // executing the next task a bounded number of times before giving up.
968
+ let prematureCompletionRetries = 0;
894
969
  // ── Step budget ───────────────────────────────────────────────────
895
970
  // The budget governs how many *productive* steps (a tool execution or a
896
971
  // final answer) the agent may take. Recovery iterations — nudging a model
@@ -1142,6 +1217,33 @@ export async function runAgentLoop(prompt, options = {}) {
1142
1217
  });
1143
1218
  continue;
1144
1219
  }
1220
+ // ── Premature-completion guard (approved plan still has work) ──────
1221
+ // If the user approved a plan and the model now gives a final answer
1222
+ // while tasks are still pending/in_progress — without having run the
1223
+ // work — it is fabricating completion (the exact "all tasks completed,
1224
+ // running at localhost:5173" failure). Force it back to executing the
1225
+ // next real task instead of accepting the false claim.
1226
+ if (session.planApproved.value && prematureCompletionRetries < 3) {
1227
+ const livePlan = await loadPlan(session.sessionId).catch(() => undefined);
1228
+ const unfinished = livePlan?.tasks.filter((t) => t.state === "pending" || t.state === "in_progress");
1229
+ if (livePlan && unfinished && unfinished.length > 0) {
1230
+ prematureCompletionRetries += 1;
1231
+ const next = unfinished[0];
1232
+ process.stdout.write(chalk.yellow(` ⚠ ${unfinished.length} plan task(s) still unfinished — not accepting a "done" claim; resuming execution\n`));
1233
+ messages.push({ role: "assistant", content: assistantText.visible });
1234
+ messages.push({
1235
+ role: "user",
1236
+ content: `You have NOT finished the approved plan: ${unfinished.length} task(s) remain ` +
1237
+ `(${unfinished.map((t) => `[${t.id}] ${t.title}`).join("; ")}). ` +
1238
+ `Do NOT claim the work is complete, that files were created, or that a server is running ` +
1239
+ `unless a tool call actually succeeded and you saw the output. ` +
1240
+ `Resume now with the NEXT task ${next.id} ("${next.title}"): call task.update {taskId:"${next.id}", state:"in_progress"}, ` +
1241
+ `then do the real work with a tool call (fs.writeMany / shell.exec / shell.start), VERIFY it, and mark it done. ` +
1242
+ `Continue task by task until EVERY task is actually finished.`,
1243
+ });
1244
+ continue;
1245
+ }
1246
+ }
1145
1247
  if (cleaned) {
1146
1248
  process.stdout.write(renderMarkdown(cleaned));
1147
1249
  if (!cleaned.endsWith("\n"))
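The premature-completion guard above can be read as a pure decision: given the plan, the approval flag, and the retries used so far, either accept the final answer or produce a nudge. A minimal sketch under that reading; `checkPrematureCompletion`, `MAX_PREMATURE_RETRIES`, and the sample plan are hypothetical names, and the nudge text is shortened from the original:

```javascript
const MAX_PREMATURE_RETRIES = 3; // mirrors the `< 3` bound in the diff

// Hypothetical helper: returns a nudge to send back to the model, or null
// when the final answer should be accepted as-is.
function checkPrematureCompletion(plan, approved, retriesUsed) {
  if (!approved || !plan || retriesUsed >= MAX_PREMATURE_RETRIES) return null;
  const unfinished = plan.tasks.filter(
    (t) => t.state === "pending" || t.state === "in_progress",
  );
  if (unfinished.length === 0) return null;
  const next = unfinished[0];
  return {
    retriesUsed: retriesUsed + 1,
    nextTaskId: next.id,
    nudge: `${unfinished.length} task(s) remain; resume with ${next.id} ("${next.title}")`,
  };
}

const samplePlan = {
  tasks: [
    { id: "t1", title: "Scaffold project", state: "done" },
    { id: "t2", title: "Install dependencies", state: "in_progress" },
    { id: "t3", title: "Start dev server", state: "pending" },
  ],
};

console.log(checkPrematureCompletion(samplePlan, true, 0).nudge);
console.log(checkPrematureCompletion(samplePlan, true, 3)); // null: retries exhausted
```

As in the diff, only `pending` and `in_progress` tasks count as unfinished, so a task in any other state (for example `failed`) does not trigger the guard.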
@@ -1192,6 +1294,14 @@ export async function runAgentLoop(prompt, options = {}) {
1192
1294
  process.stdout.write(`${renderThinkingSummary(assistantText.thinkContent)}\n`);
1193
1295
  }
1194
1296
  messages.push({ role: "assistant", content: assistantText.visible });
1297
+ // Detect a model that crammed MULTIPLE tool calls into one response.
1298
+ // Only `call` (the first block) will run this turn; the rest are dropped.
1299
+ // We flag it so that after the first tool executes we explicitly tell the
1300
+ // model the others did NOT run — preventing the "I ran everything" lie.
1301
+ const extraToolBlocks = Math.max(0, countToolFences(assistantText.visible) - 1);
1302
+ if (extraToolBlocks > 0) {
1303
+ process.stdout.write(chalk.yellow(` ⚠ ${extraToolBlocks} extra tool block(s) in one message were ignored — only the first ran. One tool per turn.\n`));
1304
+ }
1195
1305
  // ── Plan / task tools (session-scoped, handled inline) ─────────────
1196
1306
  // These don't go through the generic registry because they need the
1197
1307
  // session id and mutate the live plan that the user can view (Ctrl+P).
@@ -1218,6 +1328,29 @@ export async function runAgentLoop(prompt, options = {}) {
1218
1328
  decision,
1219
1329
  scope: isScopeActive(scope) ? (scope.name ?? "(unnamed)") : "(none)",
1220
1330
  });
1331
+ // ── Plan-awaiting-approval gate ────────────────────────────────────
1332
+ // When an active plan exists but the user has NOT approved it with
1333
+ // /implement, the agent must NOT execute the plan. Any free-text the
1334
+ // user typed after the plan was shown is a PLAN REVISION, not a "go"
1335
+ // signal — the agent should re-plan (plan.create) and wait again. We
1336
+ // hard-block execution tools here so a model that ignores the prompt
1337
+ // directive (or recovers a stray tool call) can't start running the
1338
+ // plan. Read-only exploration is still allowed so it can refine the
1339
+ // plan intelligently.
1340
+ if (activePlan &&
1341
+ !session.planApproved.value &&
1342
+ !isPreApprovalAllowedTool(call.name)) {
1343
+ process.stdout.write(chalk.yellow(` ⚠ plan awaiting approval — ${call.name} is blocked until you /implement (or /discard)\n`));
1344
+ messages.push({
1345
+ role: "user",
1346
+ content: `There is an ACTIVE PLAN that has NOT been approved yet, so you must NOT execute it — ` +
1347
+ `you tried to call ${call.name}, which is blocked. The user's latest message is a PLAN REVISION, ` +
1348
+ `not approval. Update the plan to incorporate their feedback by calling plan.create again with the ` +
1349
+ `revised goal/detail/tasks, then STOP and wait. The user approves with /implement or cancels with /discard. ` +
1350
+ `Do NOT run any execution tool (shell.exec, pkg.install, fs.write, net.scan, tool.check, etc.) until they /implement.`,
1351
+ });
1352
+ continue;
1353
+ }
1221
1354
  if (call.name === "web.search") {
1222
1355
  sawFreshWebSearch = true;
1223
1356
  }
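The gate condition above combines three facts: a plan exists, it is not approved, and the tool is not on the pre-approval allowlist. The sketch below reproduces the allowlist from this diff and wraps the condition in a hypothetical `isBlockedByPlanGate` function (that wrapper name is an assumption, not part of the package):

```javascript
// Allowlist as it appears in the diff.
const PRE_APPROVAL_ALLOWED_TOOLS = new Set([
  "plan.create",
  "task.update",
  "fs.read",
  "fs.list",
  "fs.search",
  "sysinfo",
  "tool.batch",
  "net.context",
]);

function isPreApprovalAllowedTool(name) {
  return PRE_APPROVAL_ALLOWED_TOOLS.has(name);
}

// Hypothetical wrapper around the gate's three-part condition.
function isBlockedByPlanGate(toolName, hasActivePlan, planApproved) {
  return hasActivePlan && !planApproved && !isPreApprovalAllowedTool(toolName);
}

console.log(isBlockedByPlanGate("shell.exec", true, false)); // true: execution tool, plan pending
console.log(isBlockedByPlanGate("fs.read", true, false)); // false: read-only exploration
console.log(isBlockedByPlanGate("shell.exec", true, true)); // false: plan approved
console.log(isBlockedByPlanGate("shell.exec", false, false)); // false: no plan at all
```

Because the check is an allowlist, any tool not named in the set is blocked while a plan is pending, including tools added later.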
@@ -1241,18 +1374,10 @@ export async function runAgentLoop(prompt, options = {}) {
1241
1374
  !session.pentestAuthorized.value;
1242
1375
  const authorized = await ensurePentestAuthorization(call, Boolean(options.autoConfirm), session);
1243
1376
  // inquirer's confirm() creates its own readline interface which resets
1244
- // raw mode when it finishes. Re-assert raw mode so the outer keypress
1245
- // handler (ESC/Ctrl+C abort, Ctrl+O output pane) keeps working during
1246
- // the next streaming phase.
1247
- if (process.stdin.isTTY &&
1248
- !process.stdin.isRaw) {
1249
- try {
1250
- process.stdin.setRawMode(true);
1251
- }
1252
- catch {
1253
- /* ignore */
1254
- }
1255
- }
1377
+ // raw mode AND pauses stdin when it finishes. Re-assert raw mode and
1378
+ // resume stdin so the outer keypress handler (ESC/Ctrl+C abort, Ctrl+O
1379
+ // output pane) keeps working during the next streaming/tool phase.
1380
+ restoreInteractiveStdin();
1256
1381
  if (!authorized) {
1257
1382
  lastAnswer = "Pentest authorization not confirmed.";
1258
1383
  process.stdout.write(chalk.red(` ✗ ${lastAnswer}`) + "\n");
@@ -1266,16 +1391,9 @@ export async function runAgentLoop(prompt, options = {}) {
1266
1391
  const forceManualConfirm = call.name === "fs.delete";
1267
1392
  if (decision.level === "confirm" && !pentestJustConfirmed) {
1268
1393
  const ok = await confirmToolExecution(call, forceManualConfirm ? false : Boolean(options.autoConfirm), session);
1269
- // Re-assert raw mode after inquirer's confirm() (see comment above).
1270
- if (process.stdin.isTTY &&
1271
- !process.stdin.isRaw) {
1272
- try {
1273
- process.stdin.setRawMode(true);
1274
- }
1275
- catch {
1276
- /* ignore */
1277
- }
1278
- }
1394
+ // Re-assert raw mode and resume stdin after inquirer's confirm()
1395
+ // (see restoreInteractiveStdin / the comment above).
1396
+ restoreInteractiveStdin();
1279
1397
  if (!ok) {
1280
1398
  lastAnswer = "Cancelled.";
1281
1399
  process.stdout.write(chalk.yellow(` ✗ cancelled`) + "\n");
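The behaviour of `restoreInteractiveStdin` is hard to observe without a real terminal. The sketch below is a variant that takes the stream as a parameter (an assumption made here so it can be exercised with a fake object; the function in the diff always uses `process.stdin`). `makeFakeTty` is a hypothetical test double.

```javascript
// Variant with an injectable stream; defaults to the real stdin.
function restoreInteractiveStdin(stream = process.stdin) {
  if (!stream.isTTY) return; // piped input: nothing to restore
  try {
    if (!stream.isRaw) stream.setRawMode(true); // back to raw mode if a prompt left it cooked
    stream.resume(); // always resume, otherwise key events stop flowing
  } catch {
    /* ignore: the terminal may have gone away */
  }
}

// Fake TTY that records the calls made on it.
function makeFakeTty(isRaw) {
  const calls = [];
  return {
    isTTY: true,
    isRaw,
    calls,
    setRawMode(value) {
      calls.push(`setRawMode(${value})`);
      this.isRaw = value;
    },
    resume() {
      calls.push("resume()");
    },
  };
}

const cooked = makeFakeTty(false);
restoreInteractiveStdin(cooked);
console.log(cooked.calls.join(", ")); // setRawMode(true), resume()

const alreadyRaw = makeFakeTty(true);
restoreInteractiveStdin(alreadyRaw);
console.log(alreadyRaw.calls.join(", ")); // resume()
```

The second case is the one the earlier inline code missed: even when raw mode is still on, stdin must be resumed.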
@@ -1476,7 +1594,10 @@ export async function runAgentLoop(prompt, options = {}) {
1476
1594
  }
1477
1595
  messages.push({
1478
1596
  role: "tool",
1479
- content: `Tool ${call.name} result (exit=${result.exitCode ?? 0}, ok=${result.ok}):\n${contextOutput}`,
1597
+ content: `Tool ${call.name} result (exit=${result.exitCode ?? 0}, ok=${result.ok}):\n${contextOutput}` +
1598
+ (extraToolBlocks > 0
1599
+ ? `\n\nIMPORTANT: your previous message contained ${extraToolBlocks + 1} tool blocks, but ONLY this first one (${call.name}) actually ran. The other ${extraToolBlocks} did NOT execute and were discarded. Emit EXACTLY ONE tool block per message. Send the next tool call now — and do NOT assume any of the dropped calls happened.`
1600
+ : ""),
1480
1601
  });
1481
1602
  // Compact older messages when the running estimate exceeds budget so
1482
1603
  // free-tier context windows are not blown by long pentest sessions.
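The tool-result message in this last hunk can be isolated as a small string builder. A sketch, assuming a hypothetical `buildToolResultContent` function and a shortened version of the notice (the full wording is in the diff above):

```javascript
// Hypothetical helper: format a tool result for the model, appending a
// notice when extra tool blocks in the same message were dropped.
function buildToolResultContent(toolName, result, output, extraToolBlocks) {
  // A missing exit code is reported as 0, as in the original template.
  const base = `Tool ${toolName} result (exit=${result.exitCode ?? 0}, ok=${result.ok}):\n${output}`;
  if (extraToolBlocks <= 0) return base;
  return (
    base +
    `\n\nIMPORTANT: your previous message contained ${extraToolBlocks + 1} tool blocks, ` +
    `but ONLY this first one (${toolName}) actually ran. ` +
    `The other ${extraToolBlocks} did NOT execute and were discarded.`
  );
}

const single = buildToolResultContent("fs.list", { ok: true }, "src/\npackage.json", 0);
const crowded = buildToolResultContent("fs.writeMany", { ok: true, exitCode: 0 }, "wrote 3 files", 2);
console.log(single);
console.log(crowded);
```

Attaching the notice to the tool result, rather than sending it as a separate message, keeps the correction next to the only output the model can trust.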