npm - @pentoshi/clai - Versions diffs - 1.0.4 → 1.0.7 - Mend

@pentoshi/clai 1.0.4 → 1.0.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/README.md +71 -3
package/dist/agent/runner.d.ts +11 -0
package/dist/agent/runner.js +155 -34
package/dist/agent/runner.js.map +1 -1
package/dist/commands/update.js +1 -1
package/dist/prompts/index.d.ts +1 -1
package/dist/prompts/index.js +36 -10
package/dist/prompts/index.js.map +1 -1
package/dist/repl.js +27 -2
package/dist/repl.js.map +1 -1
package/package.json +1 -1
package/dist/context/manager.d.ts +0 -4
package/dist/context/manager.js +0 -48
package/dist/context/manager.js.map +0 -1
package/dist/tools/artifacts.d.ts +0 -9
package/dist/tools/artifacts.js +0 -38
package/dist/tools/artifacts.js.map +0 -1
package/dist/ui/tool-output.d.ts +0 -18
package/dist/ui/tool-output.js +0 -135
package/dist/ui/tool-output.js.map +0 -1

package/README.md CHANGED

@@ -66,10 +66,11 @@ clai -y "list the 10 largest files in my home directory"
 ## Features
 - **`/ask` mode** — Read-only. AI explains, gives commands & step-by-step guidance, but does NOT execute anything.
-- **`/agent` mode** — Agentic. AI plans, then executes shell commands, edits files, installs missing tools, parses output, and continues until the goal is met.
+- **`/agent` mode** — Agentic. AI plans, waits for approval, then executes shell commands, edits files, installs missing tools, parses output, and continues until the goal is met. Tasks run on an approve/refine/discard plan workflow (`/implement`, free-text to refine, `/discard` to cancel).
 - **7 LLM providers** — Groq, Google Gemini, OpenRouter, OpenAI, Anthropic, NVIDIA NIM, and Ollama (local). All with streaming.
 - **10 built-in tools** — `shell.exec`, `fs.read`, `fs.write`, `fs.list`, `fs.search`, `pkg.install`, `net.scan`, `http.fetch`, `sysinfo`, `pentest.recon`.
 - **Smart safety gate** — Read-only commands auto-execute; mutating commands require confirmation; destructive patterns are blocked.
+- **OS-aware & tool-frugal** — Picks the best approach for your OS, prefers tools already installed (installs only when nothing suitable exists), broadens its approach and escalates privileges as needed to finish the task.
 - **Cross-platform** — macOS, Linux, and Windows. Detects OS-native package managers (brew, apt, dnf, pacman, winget, choco).
 - **Pentest-aware** — nmap, nikto, sqlmap, gobuster, ffuf, hydra, masscan, whois, dig, netcat, tshark.
 - **Auto-update** — Checks for new versions on startup; run `/update` or `clai update` to upgrade.
@@ -153,6 +154,9 @@ export OLLAMA_HOST=http://localhost:11434
 | `/reset`                | Clear all saved history                            |
 | `/cwd <path>`           | Change working directory                           |
 | `/allow <tool>`         | Whitelist a tool for the session                   |
+| `/plan`                 | View the current session plan (also `Ctrl+P`)      |
+| `/implement`            | Approve the current plan and have clai execute it  |
+| `/discard`              | Discard the current plan so later messages ignore it |
 | `/scope add <targets>`  | Add authorized pentest targets                     |
 | `/fallback [on|off]`    | Try other configured providers after a failure     |
 | `/update`               | Check for updates                                  |
@@ -160,6 +164,21 @@ export OLLAMA_HOST=http://localhost:11434
 | `/help`                 | List commands                                      |
 | `Ctrl+C`                | Abort current response (second Ctrl+C exits)       |
 | `Ctrl+O`                | Toggle full tool output (same keys on all OSes)    |
+| `Ctrl+P`                | View the current session plan                      |
+### Plan → Implement workflow
+For multi-step coding or pentest tasks, clai first proposes a **plan** (a goal,
+an approach, and an ordered task checklist) and then waits. Nothing runs until
+you approve it.
+- **Approve** — type `/implement` to execute the plan task by task.
+- **Refine** — type any normal message (e.g. "use only installed tools",
+  "skip task 2", "also enumerate subdomains") and clai produces a **revised
+  plan**, then waits again. While a plan is awaiting approval, free-text is
+  treated as plan feedback, not as a signal to start running.
+- **Cancel** — type `/discard` to drop the plan. After discarding, later
+  messages are independent of it.
 ## Built-in Tools (Agent Mode)
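
The approve/refine/discard rules in the new "Plan → Implement workflow" section behave like a small state machine. The following is a hypothetical JavaScript sketch, not clai's implementation: the state names (`none`, `awaiting`, `approved`) and `nextPlanState` are invented for illustration. It assumes the session enters `awaiting` as soon as clai proposes a plan.

```javascript
// Hypothetical sketch of the plan lifecycle described in the README.
// State names and this function are illustrative, not clai's internal API.
function nextPlanState(state, input) {
  const text = input.trim();

  // /discard drops a plan whether or not it was approved.
  if (text === "/discard" && state !== "none") {
    return { state: "none", action: "drop-plan" };
  }

  if (state === "awaiting") {
    // Only an explicit /implement starts execution.
    if (text === "/implement") {
      return { state: "approved", action: "execute" };
    }
    // Any other message is feedback: clai revises the plan and waits again.
    return { state: "awaiting", action: "revise-plan" };
  }

  // No plan is pending approval, so the message is handled normally.
  return { state, action: "message" };
}

console.log(nextPlanState("awaiting", "use only installed tools").action); // revise-plan
console.log(nextPlanState("awaiting", "/implement").action);               // execute
console.log(nextPlanState("approved", "/discard").state);                  // none
```

The key design point is the `awaiting` branch: free text never falls through to execution, which matches the README's statement that feedback is treated as a revision rather than a signal to start running.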
@@ -171,10 +190,10 @@ export OLLAMA_HOST=http://localhost:11434
 | `fs.list`        | List directory contents                                            | safe       |
 | `fs.search`      | Search files with ripgrep (falls back to grep)                     | safe       |
 | `pkg.install`    | Install packages via detected OS package manager                   | confirm    |
-| `net.scan`       | Nmap wrapper for port scanning                                     | confirm    |
+| `net.scan`       | Nmap wrapper. Defaults to a stealth SYN scan, auto-elevates (sudo/doas/gsudo) and falls back to an unprivileged TCP connect scan | confirm    |
 | `http.fetch`     | HTTP GET/POST with response size limits                            | safe       |
 | `sysinfo`        | OS, architecture, shell, and working directory info                | safe       |
-| `pentest.recon`  | Composite: whois + dig + nmap top-100 ports                       | confirm    |
+| `pentest.recon`  | Composite: whois + dig + stealth nmap top-100 ports               | confirm    |
 > \* **smart** = read-only commands (`curl`, `ls`, `whoami`, `gobuster`, `dirb`, etc.) auto-execute; mutating commands require confirmation.
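
The updated `net.scan` row describes a three-way choice: a SYN scan when privileged, a SYN scan through an elevation helper otherwise, and an unprivileged TCP connect scan as the last resort. A minimal sketch of that choice follows, assuming the caller already knows whether it is privileged and which helper (`sudo`, `doas` or `gsudo`) is available. `buildScanCommand` and its option names are hypothetical. `-sS`, `-sT` and `--top-ports` are standard nmap flags, and the default of 100 ports mirrors `pentest.recon`'s top-100 scan, since the README does not state `net.scan`'s own default.

```javascript
// Hypothetical sketch of net.scan's mode selection, based only on the README.
// Option names are assumptions; -sS, -sT and --top-ports are real nmap flags.
function buildScanCommand({ target, isPrivileged, elevator = null, topPorts = 100 }) {
  const common = ["--top-ports", String(topPorts), target];

  if (isPrivileged) {
    // Root/Administrator can open raw sockets, so use a SYN ("stealth") scan.
    return ["nmap", "-sS", ...common];
  }
  if (elevator) {
    // Not privileged, but an elevation helper exists: run the SYN scan via it.
    return [elevator, "nmap", "-sS", ...common];
  }
  // No raw-socket access available: fall back to a TCP connect() scan.
  return ["nmap", "-sT", ...common];
}

console.log(
  buildScanCommand({ target: "scanme.nmap.org", isPrivileged: false, elevator: "doas" }).join(" "),
);
// doas nmap -sS --top-ports 100 scanme.nmap.org
```

This sketch decides up front. The real tool may instead attempt elevation and fall back to `-sT` only after that attempt fails.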
@@ -284,6 +303,55 @@ npm test           # Run tests (39 tests)
 npm run compile    # Build native binaries (requires Bun)
 ```
+## Releasing
+Releases are fully automated by `.github/workflows/release.yml`, triggered when
+you push a `v*.*.*` tag. To cut a release:
+```sh
+npm version 1.0.6 --no-git-tag-version   # bump package.json + lockfile
+# also bump: src/commands/update.ts (FALLBACK_VERSION),
+#            manifests/homebrew/clai.rb, manifests/scoop/clai.json
+git commit -am "v1.0.6"
+git push origin main
+git tag -a v1.0.6 -m "clai v1.0.6"
+git push origin v1.0.6                   # this triggers the workflow
+```
+On the tag push the workflow:
+1. **build** — runs typecheck + tests and compiles native binaries for all platforms.
+2. **publish** — creates the GitHub Release with the binaries and SHA256 sidecars.
+3. **publish-npm** — publishes `@pentoshi/clai` to npm.
+4. **sync-tap** — regenerates the Homebrew formula in `pentoshi007/homebrew-clai`.
+> **Reruns don't pick up newer workflow code.** Re-running a workflow runs it
+> against the commit the tag points to. If you change `release.yml` after
+> tagging, you must move/recreate the tag (or cut a new version) for the change
+> to take effect.
+Required repository secrets (Settings → Secrets and variables → Actions). Each
+job skips gracefully if its secret is absent:
+| Secret             | Used by       | How to create                                                                 |
+|--------------------|---------------|-------------------------------------------------------------------------------|
+| `NPM_TOKEN`        | `publish-npm` | npm → Access Tokens → **Granular** (Read and write on `@pentoshi/clai`) or classic **Automation** token. These bypass the interactive OTP prompt that blocks CI. |
+| `TAP_GITHUB_TOKEN` | `sync-tap`    | A GitHub PAT with `contents:write` on the `pentoshi007/homebrew-clai` repo     |
+Optional repository **variable** (not a secret):
+| Variable         | Effect                                                                          |
+|------------------|---------------------------------------------------------------------------------|
+| `NPM_PROVENANCE` | Set to `true` to publish with `--provenance`. Only works if the npm account's 2FA is set to **"authorization only"**. Leave unset otherwise — the job publishes without provenance. |
+The `publish-npm` job verifies the tag matches `package.json` version and skips
+if that version is already on npm, so re-running a tag is safe.
+> A normal account with 2FA set to **"auth and writes"** prompts for a one-time
+> password on every publish, which fails in CI. Use a Granular/Automation
+> `NPM_TOKEN` (token-level auth) so CI can publish without an OTP — you can keep
+> 2FA enabled on the account.
 ## Architecture
 ```
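
The new "Releasing" section says `publish-npm` checks that the tag matches the `package.json` version and skips versions already on npm. The actual check lives in `.github/workflows/release.yml`, which this diff does not include, so the following is only a hypothetical JavaScript restatement of that rule. In a workflow, `tag` could come from the `GITHUB_REF_NAME` environment variable and `publishedVersions` from `npm view @pentoshi/clai versions --json` (normalized to an array). Both wirings and the function name `decideNpmPublish` are assumptions.

```javascript
// Hypothetical sketch of the publish-npm guard described in the README.
function decideNpmPublish({ tag, packageVersion, publishedVersions }) {
  // Release tags look like v1.0.6; compare without the leading "v".
  const tagVersion = tag.replace(/^v/, "");
  if (tagVersion !== packageVersion) {
    throw new Error(`tag ${tag} does not match package.json version ${packageVersion}`);
  }
  // If this exact version is already on npm, skip instead of failing.
  // This is what makes re-running a tag safe.
  return publishedVersions.includes(packageVersion) ? "skip" : "publish";
}

console.log(
  decideNpmPublish({ tag: "v1.0.7", packageVersion: "1.0.7", publishedVersions: ["1.0.4", "1.0.6"] }),
);
// publish
```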

package/dist/agent/runner.d.ts CHANGED

@@ -72,6 +72,16 @@ export declare function recognizeBareToolJson(text: string): {
  * multi-file fs.writeMany scaffold) silently never runs.
  */
 export declare function looksLikeTruncatedToolCall(text: string): boolean;
+/**
+ * Count the number of ```tool fenced blocks in a message. Models sometimes
+ * emit MULTIPLE tool calls in one response (e.g. fs.writeMany + npm install +
+ * npm run dev). Only the FIRST is parsed and executed; the rest are silently
+ * dropped and leak to the screen as code fences, while the model believes it
+ * ran all of them — a major cause of "everything is done" fabrications. We
+ * detect this so the runner can run the first and explicitly tell the model
+ * the others did NOT run and must be re-sent one at a time.
+ */
+export declare function countToolFences(text: string): number;
 /**
  * Decide whether this turn should get a generous step budget because it is
  * a multi-file build, a continuation of one, or a "it's not done yet" nudge.
@@ -88,4 +98,5 @@ export declare function requiresFreshWebSearch(prompt: string): boolean;
  */
 export declare function isLumpedSingleTask(taskTitles: string[]): boolean;
 export declare function shouldDimToolChatter(call: ToolCall): boolean;
+export declare function isPreApprovalAllowedTool(name: string): boolean;
 export declare function runAgentLoop(prompt: string, options?: AgentRunOptions): Promise<string>;
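
This release adds `countToolFences`, declared above. The snippet below copies its body verbatim from the `runner.js` diff and runs it on a hypothetical model reply that crams three tool calls into one message. The `FENCE` and `block` helpers and the reply text are illustrative only. The `Math.max(0, count - 1)` line mirrors how `runAgentLoop` computes `extraToolBlocks` in the diff.

```javascript
// countToolFences, copied from runner.js in 1.0.7.
function countToolFences(text) {
  const matches = text.match(/```tool\s*\n[\s\S]*?```/gi);
  return matches ? matches.length : 0;
}

// Illustrative helpers for building fenced tool blocks without literal fences.
const FENCE = "`".repeat(3);
const block = (json) => `${FENCE}tool\n${json}\n${FENCE}`;

// Hypothetical reply containing three tool calls in a single message.
const reply = [
  "Scaffolding the app now.",
  block('{"name":"fs.writeMany","args":{}}'),
  block('{"name":"shell.exec","args":{"cmd":"npm install"}}'),
  block('{"name":"shell.exec","args":{"cmd":"npm run dev"}}'),
].join("\n");

// Only the first block runs; the rest are reported back to the model as dropped.
const extraToolBlocks = Math.max(0, countToolFences(reply) - 1);
console.log(extraToolBlocks); // 2
```

Because the match is lazy (`[\s\S]*?`), each block ends at the first following triple backtick. Consecutive blocks are therefore counted separately instead of being merged into one long match.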

package/dist/agent/runner.js CHANGED

@@ -367,6 +367,19 @@ export function looksLikeTruncatedToolCall(text) {
     }
     return false;
 }
+/**
+ * Count the number of ```tool fenced blocks in a message. Models sometimes
+ * emit MULTIPLE tool calls in one response (e.g. fs.writeMany + npm install +
+ * npm run dev). Only the FIRST is parsed and executed; the rest are silently
+ * dropped and leak to the screen as code fences, while the model believes it
+ * ran all of them — a major cause of "everything is done" fabrications. We
+ * detect this so the runner can run the first and explicitly tell the model
+ * the others did NOT run and must be re-sent one at a time.
+ */
+export function countToolFences(text) {
+    const matches = text.match(/```tool\s*\n[\s\S]*?```/gi);
+    return matches ? matches.length : 0;
+}
 /** Extract the text before the tool call block for display purposes */
 function textBeforeToolCall(text) {
     const patterns = [
@@ -527,15 +540,23 @@ function buildWorkflowDirective() {
     return [
         "BUILD WORKFLOW (this is a build/scaffold/feature task — follow this order EXACTLY; deviation is a failure):",
         "1. EXPLORE: fs.list the working directory (and key subdirs) to see what already exists. Use tool.batch to parallelize reads.",
-        "2. UNDERSTAND: fs.read the files that matter (like package.json for js related and same for other languages too, config, entry points, existing components). Detect the existing stack/tooling and MATCH it. If the dir is empty or only has a stub, start fresh with a sensible modern default (e.g. Vite + React) and say so.",
-        "3. PLAN: call plan.create with a COMPREHENSIVE plan — a detailed `detail` (stack chosen and WHY, architecture, how you'll verify) and 4-8 SEPARATE, ordered, high-quality tasks. NEVER cram everything into one task (e.g. one task that lists 8 files is rejected). Each task is one distinct, verifiable action. Then STOP and wait for the user to /implement.",
-        "4. IMPLEMENT: once approved, work task by task in STRICT ORDER across MULTIPLE steps. Start with the FIRST pending task. For each task: call task.update {taskId, state:'in_progress'} → do the real work (fs.writeMany for files, pkg.install / npm install, shell.start for the dev server) → VERIFY it succeeded → call task.update {taskId, state:'done'}, then move to the NEXT task. Keep going until EVERY task is done and the goal is achieved. Do NOT stop after one file or one step, and do NOT claim work you didn't actually run.",
+        "2. UNDERSTAND: fs.read the files that matter (like package.json for js related and same for other languages too, config, entry points, existing components). Detect the existing stack/tooling and MATCH it. If the dir is empty or only has a stub, start fresh with a sensible modern default and say so.",
+        "3. PLAN: call plan.create with a COMPREHENSIVE plan — a detailed `detail` (stack chosen and WHY, architecture, how you'll verify) and 4-8 SEPARATE, ordered, high-quality tasks. The FIRST task must be to INITIALIZE the project with its official scaffolder (NOT hand-writing package.json). Each task is one distinct, verifiable action. Then STOP and wait for the user to /implement.",
+        "4. IMPLEMENT: once approved, work task by task in STRICT ORDER across MULTIPLE steps, ONE tool call per turn. For each task: call task.update {taskId, state:'in_progress'} → do the real work → VERIFY it actually succeeded (read a file you wrote, check the command's exit/output) → call task.update {taskId, state:'done'}, then move to the NEXT task. Keep going until EVERY task is done. Do NOT stop after one step, and do NOT claim work you didn't actually run.",
+        "",
+        "INITIALIZE WITH THE OFFICIAL SCAFFOLDER FIRST (do NOT hand-write build configs):",
+        "- React/Vue/Svelte/vanilla → `npm create vite@latest <name> -- --template react` (or react-ts, vue, svelte). Next.js → `npx create-next-app@latest <name> --yes`. Vue → `npm create vue@latest`. Astro → `npm create astro@latest`. Node API → `npm init -y` then add deps. Python → `uv init` / `python -m venv`. Use the ecosystem's standard initializer for the framework.",
+        "- Run the scaffolder NON-INTERACTIVELY (pass flags/--yes) via shell.exec, into the current directory. THEN run the install (npm install) and adapt/add only the files the app needs (components, routes, styles) with fs.write/fs.writeMany. Do NOT recreate what the scaffolder already generated.",
+        "- Only hand-write package.json/config when there is genuinely no suitable scaffolder for the stack.",
         "",
         "CRITICAL RULES during IMPLEMENTATION:",
+        "- EXACTLY ONE ```tool block per message. NEVER put several tool calls (e.g. fs.writeMany + npm install + npm run dev) in one response — only the first runs and the rest are silently discarded, which is how false 'all done' claims happen.",
         "- Do NOT re-explore. Step 1 (EXPLORE) was already completed during planning. Start executing the first pending task immediately.",
         "- ONE task at a time, in ORDER. Do NOT skip ahead to task 3 before task 2 is done.",
-        "- If a tool call FAILS (error output, non-zero exit, file missing), the task is NOT done. Mark it 'failed', diagnose WHY it failed, fix the problem, and retry until it succeeds.",
-        "- NEVER claim a task is done, a dependency is installed, or a server is running unless the tool call actually succeeded and you saw the success output.",
+        "- VERIFY each step before marking it done: after writing files, fs.list/fs.read to confirm they exist; after an install, check it exited 0; after starting the dev server with shell.start, confirm the job is running. Marking a task done without a successful tool call is the worst failure.",
+        "- If a tool call FAILS (error output, non-zero exit, file missing), the task is NOT done. Mark it 'failed', diagnose WHY, fix it, and retry until it succeeds.",
+        "- NEVER claim a task is done, files were created, a dependency is installed, or a server is running unless the tool call ACTUALLY succeeded and you saw the success output. If you have not run it, say so.",
+        "- Start a dev server with shell.start (background job), NOT `npm run dev &` via shell.exec.",
         "",
         "FORBIDDEN before plan approval (/implement): you MUST NOT use fs.write, fs.writeMany, fs.edit, shell.exec, shell.start, pkg.install, or pkg.uninstall. The ONLY tool allowed before approval is plan.create (and the read/list tools for exploration). If you are nudged to 'take action' before a plan exists, your action MUST be plan.create.",
         "If the task is genuinely trivial (a single tiny file), you may skip the plan — but for an app/feature, ALWAYS plan first.",
@@ -544,6 +565,48 @@ function buildWorkflowDirective() {
 export function shouldDimToolChatter(call) {
     return call.name === "web.search";
 }
+/**
+ * Re-assert raw mode AND resume stdin after an inquirer prompt
+ * (confirm/password). inquirer's readline interface pauses stdin and
+ * switches it to cooked mode when it closes; if we only flip raw mode back
+ * on but leave stdin paused, no `keypress`/`data` events flow to the REPL's
+ * ESC/Ctrl+C abort handler — so a long-running tool launched right after a
+ * confirmation can no longer be aborted (the user had to kill the terminal).
+ * Calling resume() restores the event flow.
+ */
+function restoreInteractiveStdin() {
+    if (!process.stdin.isTTY)
+        return;
+    try {
+        if (!process.stdin.isRaw) {
+            process.stdin.setRawMode(true);
+        }
+        process.stdin.resume();
+    }
+    catch {
+        /* ignore */
+    }
+}
+/**
+ * Tools allowed while an UN-approved plan is active. Before the user runs
+ * /implement, the agent may only (re)create the plan and do read-only
+ * exploration to refine it — never execute. Everything else is blocked by
+ * the plan-awaiting-approval gate so a stray/recovered tool call can't start
+ * running the plan, and so free-text after a plan is treated as a revision.
+ */
+const PRE_APPROVAL_ALLOWED_TOOLS = new Set([
+    "plan.create",
+    "task.update",
+    "fs.read",
+    "fs.list",
+    "fs.search",
+    "sysinfo",
+    "tool.batch",
+    "net.context",
+]);
+export function isPreApprovalAllowedTool(name) {
+    return PRE_APPROVAL_ALLOWED_TOOLS.has(name);
+}
 function styleToolChatter(call, text) {
     return shouldDimToolChatter(call) ? chalk.dim(text) : text;
 }
@@ -687,8 +750,13 @@ function planContextMessage(plan, approved) {
             "Never claim something ran without a successful tool call.");
     }
     else {
-        lines.push("This plan is NOT yet approved. If the user is refining it, update it with plan.create again. " +
-            "Do NOT execute tasks until the user runs /implement.");
+        lines.push("This plan is NOT yet approved, so you MUST NOT execute any of its tasks yet. " +
+            "Any new free-text message from the user right now is a PLAN REVISION, not approval — even if it " +
+            "sounds like an instruction (e.g. 'do not install new tools', 'use only X', 'also add Y', 'skip task 2'). " +
+            "Treat it as feedback: call plan.create AGAIN with the revised goal/detail/tasks to produce an updated " +
+            "plan, then STOP and wait. Do NOT call shell.exec, pkg.install, net.scan, tool.check, fs.write, or any " +
+            "other execution tool. The user will APPROVE with /implement, or CANCEL with /discard. Only after " +
+            "/implement may you begin executing.");
     }
     return lines.join("\n");
 }
@@ -744,15 +812,18 @@ async function handlePlanTool(call, session, ctx) {
         const display = chalk.cyan("  ● planning\n") +
             checklist +
             "\n" +
-            chalk.dim("  ✦ plan created — press Ctrl+P to view it, or type /implement to approve and run it\n");
+            chalk.dim("  ✦ plan created — press Ctrl+P to view it, /implement to approve and run it,\n" +
+                "    or /discard to cancel it. Any other message refines this plan.\n");
         return {
             handled: true,
             ok: true,
             display,
-            modelNote: `Plan saved with ${plan.tasks.length} task(s). STOP here and wait. ` +
+            modelNote: `Plan saved with ${plan.tasks.length} task(s). STOP here and wait — produce NO other tool calls now. ` +
                 "Do NOT start executing tasks until the user approves with /implement. " +
-                "When approved you will receive a message telling you to begin; then work task by task, " +
-                "calling task.update to mark each in_progress before and done after you finish it.",
+                "If the user's next message gives feedback instead of /implement, that is a REVISION: call plan.create " +
+                "again with the updated plan and STOP again. The user may cancel the whole plan with /discard. " +
+                "Only after /implement do you begin, working task by task, calling task.update to mark each " +
+                "in_progress before and done after you finish it.",
         };
     }
     // task.update
@@ -891,6 +962,10 @@ export async function runAgentLoop(prompt, options = {}) {
     // ignores the freshness guard and tries to answer from stale memory.
     let sawFreshWebSearch = false;
     let freshnessRetryUsed = false;
+    // Guard against a model that declares an approved plan "complete" while
+    // tasks are still pending and it never ran the work. We nudge it back to
+    // executing the next task a bounded number of times before giving up.
+    let prematureCompletionRetries = 0;
     // ── Step budget ───────────────────────────────────────────────────
     // The budget governs how many *productive* steps (a tool execution or a
     // final answer) the agent may take. Recovery iterations — nudging a model
@@ -1142,6 +1217,33 @@ export async function runAgentLoop(prompt, options = {}) {
                 });
                 continue;
             }
+            // ── Premature-completion guard (approved plan still has work) ──────
+            // If the user approved a plan and the model now gives a final answer
+            // while tasks are still pending/in_progress — without having run the
+            // work — it is fabricating completion (the exact "all tasks completed,
+            // running at localhost:5173" failure). Force it back to executing the
+            // next real task instead of accepting the false claim.
+            if (session.planApproved.value && prematureCompletionRetries < 3) {
+                const livePlan = await loadPlan(session.sessionId).catch(() => undefined);
+                const unfinished = livePlan?.tasks.filter((t) => t.state === "pending" || t.state === "in_progress");
+                if (livePlan && unfinished && unfinished.length > 0) {
+                    prematureCompletionRetries += 1;
+                    const next = unfinished[0];
+                    process.stdout.write(chalk.yellow(`  ⚠ ${unfinished.length} plan task(s) still unfinished — not accepting a "done" claim; resuming execution\n`));
+                    messages.push({ role: "assistant", content: assistantText.visible });
+                    messages.push({
+                        role: "user",
+                        content: `You have NOT finished the approved plan: ${unfinished.length} task(s) remain ` +
+                            `(${unfinished.map((t) => `[${t.id}] ${t.title}`).join("; ")}). ` +
+                            `Do NOT claim the work is complete, that files were created, or that a server is running ` +
+                            `unless a tool call actually succeeded and you saw the output. ` +
+                            `Resume now with the NEXT task ${next.id} ("${next.title}"): call task.update {taskId:"${next.id}", state:"in_progress"}, ` +
+                            `then do the real work with a tool call (fs.writeMany / shell.exec / shell.start), VERIFY it, and mark it done. ` +
+                            `Continue task by task until EVERY task is actually finished.`,
+                    });
+                    continue;
+                }
+            }
             if (cleaned) {
                 process.stdout.write(renderMarkdown(cleaned));
                 if (!cleaned.endsWith("\n"))
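
The premature-completion guard above lets a final answer through only when the approved plan has no `pending` or `in_progress` tasks, and it re-prompts the model at most three times. The following is a minimal pure-function restatement. It assumes a plan shaped like `{ tasks: [{ id, title, state }] }`, which is how the diff uses the `loadPlan` result. `checkPrematureCompletion`, `nextTaskId` and the shortened nudge text are hypothetical. As in the diff, a task in the `failed` state does not count as unfinished, so a plan whose remaining tasks all failed does not trigger the guard.

```javascript
// Bound taken from the diff (prematureCompletionRetries < 3).
const MAX_PREMATURE_COMPLETION_RETRIES = 3;

// Hypothetical helper distilled from the guard: accept the final answer, or
// push the model back to the first unfinished task. Nudge text is abbreviated.
function checkPrematureCompletion(plan, { approved, retries }) {
  if (!approved || !plan || retries >= MAX_PREMATURE_COMPLETION_RETRIES) {
    // No approved plan, or the retry budget is spent: accept the answer.
    return { accept: true };
  }
  const unfinished = plan.tasks.filter(
    (t) => t.state === "pending" || t.state === "in_progress",
  );
  if (unfinished.length === 0) {
    return { accept: true };
  }
  const next = unfinished[0];
  return {
    accept: false,
    nextTaskId: next.id,
    nudge:
      `You have NOT finished the approved plan: ${unfinished.length} task(s) remain. ` +
      `Resume now with the NEXT task ${next.id} ("${next.title}").`,
  };
}

const plan = {
  tasks: [
    { id: "t1", title: "Scaffold with Vite", state: "done" },
    { id: "t2", title: "Install dependencies", state: "in_progress" },
    { id: "t3", title: "Start the dev server", state: "pending" },
  ],
};
console.log(checkPrematureCompletion(plan, { approved: true, retries: 0 }).nextTaskId); // t2
```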
@@ -1192,6 +1294,14 @@ export async function runAgentLoop(prompt, options = {}) {
             process.stdout.write(`${renderThinkingSummary(assistantText.thinkContent)}\n`);
         }
         messages.push({ role: "assistant", content: assistantText.visible });
+        // Detect a model that crammed MULTIPLE tool calls into one response.
+        // Only `call` (the first block) will run this turn; the rest are dropped.
+        // We flag it so that after the first tool executes we explicitly tell the
+        // model the others did NOT run — preventing the "I ran everything" lie.
+        const extraToolBlocks = Math.max(0, countToolFences(assistantText.visible) - 1);
+        if (extraToolBlocks > 0) {
+            process.stdout.write(chalk.yellow(`  ⚠ ${extraToolBlocks} extra tool block(s) in one message were ignored — only the first ran. One tool per turn.\n`));
+        }
         // ── Plan / task tools (session-scoped, handled inline) ─────────────
         // These don't go through the generic registry because they need the
         // session id and mutate the live plan that the user can view (Ctrl+P).
@@ -1218,6 +1328,29 @@ export async function runAgentLoop(prompt, options = {}) {
             decision,
             scope: isScopeActive(scope) ? (scope.name ?? "(unnamed)") : "(none)",
         });
+        // ── Plan-awaiting-approval gate ────────────────────────────────────
+        // When an active plan exists but the user has NOT approved it with
+        // /implement, the agent must NOT execute the plan. Any free-text the
+        // user typed after the plan was shown is a PLAN REVISION, not a "go"
+        // signal — the agent should re-plan (plan.create) and wait again. We
+        // hard-block execution tools here so a model that ignores the prompt
+        // directive (or recovers a stray tool call) can't start running the
+        // plan. Read-only exploration is still allowed so it can refine the
+        // plan intelligently.
+        if (activePlan &&
+            !session.planApproved.value &&
+            !isPreApprovalAllowedTool(call.name)) {
+            process.stdout.write(chalk.yellow(`  ⚠ plan awaiting approval — ${call.name} is blocked until you /implement (or /discard)\n`));
+            messages.push({
+                role: "user",
+                content: `There is an ACTIVE PLAN that has NOT been approved yet, so you must NOT execute it — ` +
+                    `you tried to call ${call.name}, which is blocked. The user's latest message is a PLAN REVISION, ` +
+                    `not approval. Update the plan to incorporate their feedback by calling plan.create again with the ` +
+                    `revised goal/detail/tasks, then STOP and wait. The user approves with /implement or cancels with /discard. ` +
+                    `Do NOT run any execution tool (shell.exec, pkg.install, fs.write, net.scan, tool.check, etc.) until they /implement.`,
+            });
+            continue;
+        }
         if (call.name === "web.search") {
             sawFreshWebSearch = true;
         }
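
The plan-awaiting-approval gate blocks a tool call only when all three conditions hold: a plan is active, the plan is unapproved, and the tool is not on `PRE_APPROVAL_ALLOWED_TOOLS`. The allowlist below is copied from the diff. `isBlockedByPlanGate` is a hypothetical helper that restates the condition so it can be checked in isolation. Passing this gate does not mean a call runs unprompted, because the pentest-authorization and confirmation checks still run after it. Note that the allowlist also admits `task.update` and `net.context`. The prompt's "FORBIDDEN before plan approval" line does not list either of them as allowed.

```javascript
// Allowlist copied from PRE_APPROVAL_ALLOWED_TOOLS in runner.js (1.0.7).
const PRE_APPROVAL_ALLOWED_TOOLS = new Set([
  "plan.create",
  "task.update",
  "fs.read",
  "fs.list",
  "fs.search",
  "sysinfo",
  "tool.batch",
  "net.context",
]);

// Hypothetical helper: true when the gate in runAgentLoop would block the call.
function isBlockedByPlanGate(toolName, { hasActivePlan, approved }) {
  return hasActivePlan && !approved && !PRE_APPROVAL_ALLOWED_TOOLS.has(toolName);
}

const pendingPlan = { hasActivePlan: true, approved: false };
console.log(isBlockedByPlanGate("shell.exec", pendingPlan)); // true
console.log(isBlockedByPlanGate("fs.read", pendingPlan));    // false
console.log(isBlockedByPlanGate("shell.exec", { hasActivePlan: true, approved: true })); // false
```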
@@ -1241,18 +1374,10 @@ export async function runAgentLoop(prompt, options = {}) {
             !session.pentestAuthorized.value;
         const authorized = await ensurePentestAuthorization(call, Boolean(options.autoConfirm), session);
         // inquirer's confirm() creates its own readline interface which resets
-        // raw mode when it finishes. Re-assert raw mode so the outer keypress
-        // handler (ESC/Ctrl+C abort, Ctrl+O output pane) keeps working during
-        // the next streaming phase.
-        if (process.stdin.isTTY &&
-            !process.stdin.isRaw) {
-            try {
-                process.stdin.setRawMode(true);
-            }
-            catch {
-                /* ignore */
-            }
-        }
+        // raw mode AND pauses stdin when it finishes. Re-assert raw mode and
+        // resume stdin so the outer keypress handler (ESC/Ctrl+C abort, Ctrl+O
+        // output pane) keeps working during the next streaming/tool phase.
+        restoreInteractiveStdin();
         if (!authorized) {
             lastAnswer = "Pentest authorization not confirmed.";
             process.stdout.write(chalk.red(`  ✗ ${lastAnswer}`) + "\n");
@@ -1266,16 +1391,9 @@ export async function runAgentLoop(prompt, options = {}) {
         const forceManualConfirm = call.name === "fs.delete";
         if (decision.level === "confirm" && !pentestJustConfirmed) {
             const ok = await confirmToolExecution(call, forceManualConfirm ? false : Boolean(options.autoConfirm), session);
-            // Re-assert raw mode after inquirer's confirm() (see comment above).
-            if (process.stdin.isTTY &&
-                !process.stdin.isRaw) {
-                try {
-                    process.stdin.setRawMode(true);
-                }
-                catch {
-                    /* ignore */
-                }
-            }
+            // Re-assert raw mode and resume stdin after inquirer's confirm()
+            // (see restoreInteractiveStdin / the comment above).
+            restoreInteractiveStdin();
             if (!ok) {
                 lastAnswer = "Cancelled.";
                 process.stdout.write(chalk.yellow(`  ✗ cancelled`) + "\n");
@@ -1476,7 +1594,10 @@ export async function runAgentLoop(prompt, options = {}) {
         }
         messages.push({
             role: "tool",
-            content: `Tool ${call.name} result (exit=${result.exitCode ?? 0}, ok=${result.ok}):\n${contextOutput}`,
+            content: `Tool ${call.name} result (exit=${result.exitCode ?? 0}, ok=${result.ok}):\n${contextOutput}` +
+                (extraToolBlocks > 0
+                    ? `\n\nIMPORTANT: your previous message contained ${extraToolBlocks + 1} tool blocks, but ONLY this first one (${call.name}) actually ran. The other ${extraToolBlocks} did NOT execute and were discarded. Emit EXACTLY ONE tool block per message. Send the next tool call now — and do NOT assume any of the dropped calls happened.`
+                    : ""),
         });
         // Compact older messages when the running estimate exceeds budget so
         // free-tier context windows are not blown by long pentest sessions.