npm - fluxflow-cli - Versions diffs - 1.8.18 → 1.8.20 - Mend

fluxflow-cli 1.8.18 → 1.8.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/ARCHITECTURE.md CHANGED

@@ -1,65 +1,65 @@
-# 🏛️ Architecture & Design
-Flux Flow is built on a modern, reactive stack that brings web-like development paradigms to the terminal. It utilizes a custom agentic loop for reasoning and a unique dual-model system for background processing.
-## UI Layer: React & Ink
-The entire terminal interface is built using **React** via the [Ink](https://github.com/vadimdemedes/ink) renderer.
-- **Component-Based**: The UI is composed of isolated, reusable React components (`ChatLayout`, `StatusBar`, `CommandMenu`, `TerminalBox`, `ProfileForm`).
-- **Reactive State**: The application uses React hooks (`useState`, `useEffect`) to manage user input, application mode, model selection, and the terminal's resizing events.
-- **Zero-Render Overheads**: Critical performance trackers, like the session start time, are kept outside the React render cycle to maintain terminal responsiveness during high-speed AI text streaming.
-## The Agentic Loop
-The core intelligence of Flux Flow resides in `src/utils/ai.js`. It does not rely on opaque third-party agent frameworks; instead, it uses a custom, highly transparent string-based protocol powered by an asynchronous generator (`async function*`). This approach allows for real-time UI updates while managing complex multi-step reasoning.
-The execution flow of a single user prompt follows this loop:
-1. **Context Assembly**: The user's prompt is combined with the system instructions, temporary session context, persistent user memories, and the current chat history. If the history gets too large (e.g., >128k tokens) and compression is disabled, it is gracefully truncated.
-2. **Stream Processing**: The main loop initiates a streaming request to the Gemini API (`client.models.generateContentStream`). It yields chunks of text and status updates directly back to the React UI as they arrive.
-3. **Detection & Tool Execution**: Once the stream completes for a given turn, the entire response is scanned for tool calls using a custom regex and bracket-balancing parser (looking for `tool:functions.tool_name(args...)`).
-   - If tools are found, the loop pauses.
-   - Each tool is dispatched to its respective handler in `src/tools/`.
-   - Tool outputs are collected and appended to the context as `[TOOL_RESULT]: ...`.
-4. **Security Governance**: During tool execution, the loop enforces security checks (e.g., blocking `exec_command` from accessing system root drives if "External Workspace Access" is off) and pauses for Human-in-the-Loop (HITL) approval if necessary.
-5. **Turn Management & Continuation**: The model is instructed to append `[turn: finish]` if its goal is complete, or `[turn: continue]` if it expects tool results.
-   - If tools were called or `[turn: continue]` is present, the loop increments and re-prompts the model with the newly gathered `[TOOL_RESULT]` data.
-   - If `[turn: finish]` is detected and no further tools were called, the main loop terminates, passing the final synthesized context to the background Janitor process.
-6. **Loop Limits & Resilience**: To prevent infinite loops or excessive API usage, **Flux mode** is capped at 50 iterations per user prompt, while **Flow mode** is capped at 5.
-   - **Multi-Stage Failover**: The loop features a sophisticated 8-attempt retry engine with random backoff (800ms - 2s).
-   - **Critical Fallback Pivot**: If the primary model fails 5 consecutive times, the agent surgically pivots to a lighter, high-concurrency fallback model (`gemini-3.1-flash-lite-preview`) for the final 3 attempts to ensure session navigation through API congestion.
-## Multimodal Pipeline
-Flux Flow implements a native multimodal processing engine in `src/tools/view_file.js`. This allows the agent to move beyond text-based reasoning and analyze visual assets directly.
-- **Binary Detection**: The pipeline uses `is-binary-path` to distinguish between text and binary files.
-- **Visual Encoding**: If an image or PDF is detected, the engine reads the raw bytes and converts them into base64-encoded `InlineData` objects.
-- **PDF Extraction**: For PDF documents, the engine extracts visual representation of pages to provide the model with high-fidelity spatial and textual context simultaneously.
-- **Context Injection**: These multimodal assets are injected directly into the Gemini model's multimodal part array, allowing the model to "see" the file as if it were looking at a screenshot.
-## The Dual-Model System
-To maintain a fast, snappy UI while still performing complex data management, Flux Flow employs two separate AI models for every interaction:
-### 1. The Main Agent
-- **Responsibility**: Direct user interaction, reasoning, and tool execution.
-- **Behavior**: Streams text directly to the UI. It focuses entirely on solving the user's immediate problem or answering their question.
-### 2. The Janitor (Background Process)
-- **Responsibility**: System maintenance, long-term memory extraction, and chat summarization.
-- **Behavior**: After the Main Agent finishes its loop, the entire context (User Prompt + Agent Raws) is sent to the Janitor model.
-- **Headless Operation**: The Janitor is explicitly instructed to be a "silent background system process" with "no mouth." It *only* outputs valid tool calls (e.g., updating the chat title or saving a new user preference to the persistent memory vault).
-## Data Persistence & Safety
-- **High-Fidelity Lock**: Because both the UI and the Janitor model may attempt to write to the `history.json` file simultaneously, a Promise-based `WRITE_LOCK` (`src/utils/history.js`) is utilized. This prevents race conditions and ensures data integrity.
-- **Encryption**: User secrets and persistent memories (`secret/memories.json`) are handled by `src/utils/crypto.js` to ensure local privacy.
-## Redirection & The Anchor Strategy
-To support data portability (e.g., storing all app data on an external encrypted drive), Flux Flow utilizes a synchronous "Anchor" strategy in `src/utils/paths.js`.
-- **Synchronous Pivot**: Because many core modules (History, Secrets, Usage) initialize their file paths as constants during module loading, the application must determine the "Actual" data root before anything else.
-- **Boot-Sequence Priority**: On every launch, `paths.js` performs a synchronous file system check for `~/.fluxflow/settings.json`. If a redirection path is found (`useExternalData: true`), it immediately overrides the global `DATA_DIR` constant for the entire process.
-- **Sub-Coordinate Resolution**: All secondary directories (`LOGS_DIR`, `SECRET_DIR`) are derived dynamically from the redirected `DATA_DIR`, ensuring that all session data flows to the external sanctuary without requiring individual configuration updates across the codebase.
+# 🏛️ Architecture & Design
+Flux Flow is built on a modern, reactive stack that brings web-like development paradigms to the terminal. It utilizes a custom agentic loop for reasoning and a unique dual-model system for background processing.
+## UI Layer: React & Ink
+The entire terminal interface is built using **React** via the [Ink](https://github.com/vadimdemedes/ink) renderer.
+- **Component-Based**: The UI is composed of isolated, reusable React components (`ChatLayout`, `StatusBar`, `CommandMenu`, `TerminalBox`, `ProfileForm`).
+- **Reactive State**: The application uses React hooks (`useState`, `useEffect`) to manage user input, application mode, model selection, and the terminal's resizing events.
+- **Zero-Render Overheads**: Critical performance trackers, like the session start time, are kept outside the React render cycle to maintain terminal responsiveness during high-speed AI text streaming.
+## The Agentic Loop
+The core intelligence of Flux Flow resides in `src/utils/ai.js`. It does not rely on opaque third-party agent frameworks; instead, it uses a custom, highly transparent string-based protocol powered by an asynchronous generator (`async function*`). This approach allows for real-time UI updates while managing complex multi-step reasoning.
+The execution flow of a single user prompt follows this loop:
+1. **Context Assembly**: The user's prompt is combined with the system instructions, temporary session context, persistent user memories, and the current chat history. If the history gets too large (e.g., >128k tokens) and compression is disabled, it is gracefully truncated.
+2. **Stream Processing**: The main loop initiates a streaming request to the Gemini API (`client.models.generateContentStream`). It yields chunks of text and status updates directly back to the React UI as they arrive.
+3. **Detection & Tool Execution**: Once the stream completes for a given turn, the entire response is scanned for tool calls using a custom regex and bracket-balancing parser (looking for `tool:functions.tool_name(args...)`).
+   - If tools are found, the loop pauses.
+   - Each tool is dispatched to its respective handler in `src/tools/`.
+   - Tool outputs are collected and appended to the context as `[TOOL_RESULT]: ...`.
+4. **Security Governance**: During tool execution, the loop enforces security checks (e.g., blocking `exec_command` from accessing system root drives if "External Workspace Access" is off) and pauses for Human-in-the-Loop (HITL) approval if necessary.
+5. **Turn Management & Continuation**: The model is instructed to append `[turn: finish]` if its goal is complete, or `[turn: continue]` if it expects tool results.
+   - If tools were called or `[turn: continue]` is present, the loop increments and re-prompts the model with the newly gathered `[TOOL_RESULT]` data.
+   - If `[turn: finish]` is detected and no further tools were called, the main loop terminates, passing the final synthesized context to the background Janitor process.
+6. **Loop Limits & Resilience**: To prevent infinite loops or excessive API usage, **Flux mode** is capped at 50 iterations per user prompt, while **Flow mode** is capped at 5.
+   - **Multi-Stage Failover**: The loop features a sophisticated 8-attempt retry engine with random backoff (800ms - 2s).
+   - **Critical Fallback Pivot**: If the primary model fails 5 consecutive times, the agent surgically pivots to a lighter, high-concurrency fallback model (`gemini-3.1-flash-lite-preview`) for the final 3 attempts to ensure session navigation through API congestion.
+## Multimodal Pipeline
+Flux Flow implements a native multimodal processing engine in `src/tools/view_file.js`. This allows the agent to move beyond text-based reasoning and analyze visual assets directly.
+- **Binary Detection**: The pipeline uses `is-binary-path` to distinguish between text and binary files.
+- **Visual Encoding**: If an image or PDF is detected, the engine reads the raw bytes and converts them into base64-encoded `InlineData` objects.
+- **PDF Extraction**: For PDF documents, the engine extracts visual representation of pages to provide the model with high-fidelity spatial and textual context simultaneously.
+- **Context Injection**: These multimodal assets are injected directly into the Gemini model's multimodal part array, allowing the model to "see" the file as if it were looking at a screenshot.
+## The Dual-Model System
+To maintain a fast, snappy UI while still performing complex data management, Flux Flow employs two separate AI models for every interaction:
+### 1. The Main Agent
+- **Responsibility**: Direct user interaction, reasoning, and tool execution.
+- **Behavior**: Streams text directly to the UI. It focuses entirely on solving the user's immediate problem or answering their question.
+### 2. The Janitor (Background Process)
+- **Responsibility**: System maintenance, long-term memory extraction, and chat summarization.
+- **Behavior**: After the Main Agent finishes its loop, the entire context (User Prompt + Agent Raws) is sent to the Janitor model.
+- **Headless Operation**: The Janitor is explicitly instructed to be a "silent background system process" with "no mouth." It *only* outputs valid tool calls (e.g., updating the chat title or saving a new user preference to the persistent memory vault).
+## Data Persistence & Safety
+- **High-Fidelity Lock**: Because both the UI and the Janitor model may attempt to write to the `history.json` file simultaneously, a Promise-based `WRITE_LOCK` (`src/utils/history.js`) is utilized. This prevents race conditions and ensures data integrity.
+- **Encryption**: User secrets and persistent memories (`secret/memories.json`) are handled by `src/utils/crypto.js` to ensure local privacy.
+## Redirection & The Anchor Strategy
+To support data portability (e.g., storing all app data on an external encrypted drive), Flux Flow utilizes a synchronous "Anchor" strategy in `src/utils/paths.js`.
+- **Synchronous Pivot**: Because many core modules (History, Secrets, Usage) initialize their file paths as constants during module loading, the application must determine the "Actual" data root before anything else.
+- **Boot-Sequence Priority**: On every launch, `paths.js` performs a synchronous file system check for `~/.fluxflow/settings.json`. If a redirection path is found (`useExternalData: true`), it immediately overrides the global `DATA_DIR` constant for the entire process.
+- **Sub-Coordinate Resolution**: All secondary directories (`LOGS_DIR`, `SECRET_DIR`) are derived dynamically from the redirected `DATA_DIR`, ensuring that all session data flows to the external sanctuary without requiring individual configuration updates across the codebase.
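
Step 3 of the agentic loop in ARCHITECTURE.md describes scanning a finished response for `tool:functions.tool_name(args...)` with a regex plus a bracket-balancing parser. The sketch below shows one way that technique could work. It is not taken from `src/utils/ai.js`: the helper name `findToolCalls`, the returned object shape, and the handling of quoted strings are assumptions made for illustration.

```javascript
// Hypothetical helper, not the package's actual parser.
// A regex locates each call opener, then a manual scan balances parentheses
// so that nested "(" / ")" inside the arguments do not end the call early.
function findToolCalls(text) {
  const calls = [];
  const opener = /tool:functions\.([A-Za-z_]\w*)\(/g;
  let match;
  while ((match = opener.exec(text)) !== null) {
    const argsStart = opener.lastIndex; // index just after the opening "("
    let depth = 1;
    let quote = null; // active string delimiter while inside a string literal
    let i = argsStart;
    for (; i < text.length && depth > 0; i++) {
      const ch = text[i];
      if (quote) {
        if (ch === "\\") i++; // skip the escaped character
        else if (ch === quote) quote = null;
      } else if (ch === '"' || ch === "'" || ch === "`") {
        quote = ch;
      } else if (ch === "(") {
        depth++;
      } else if (ch === ")") {
        depth--;
      }
    }
    if (depth !== 0) break; // unbalanced call: stop scanning
    calls.push({ name: match[1], args: text.slice(argsStart, i - 1) });
    opener.lastIndex = i; // resume after the closing ")"
  }
  return calls;
}

const sample = 'ok tool:functions.read_file(path="a(b).txt") then tool:functions.exec_command(cmd=(1+2))';
const found = findToolCalls(sample);
```

Tracking quotes matters because a path such as `"a(b).txt"` would otherwise throw the depth counter off. Whether the real parser handles that case is not visible in this diff.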

package/dist/fluxflow.js CHANGED

@@ -770,19 +770,19 @@ var init_thinking_prompts = __esm({
     thinking_prompts_default = {
       Max: `EFFORT_LEVEL: MAX
 Think in a continuous, fluid analytical monologue within the <think>...</think> block. Do NOT use headings, bullet points, or artificial sections. Engage in a deep "Stream of Consciousness" that follows this cognitive path:
-1. **Deep Analysis**: Deconstruct the request into its core technical and logic requirements.
-2. **Hypothesis & Test**: Propose multiple solutions mentally and critique them for edge cases or security risks.
-3. **Architectural Planning**: Consider the long-term impact on the project structure and maintainability.
-4. **Refinement**: Iterate on the chosen path until it is bulletproof.
+Deep Analysis: Deconstruct the request into its core technical and logic requirements.
+Hypothesis & Test: Propose multiple solutions mentally and critique them for edge cases or security risks.
+Architectural Planning: Consider the long-term impact on the project structure and maintainability.
+Refinement: Iterate on the chosen path until it is bulletproof.
 RULES:
 - NO HEADINGS. Just a solid, stable analytical monologue.
 - Be thorough and exhaustive. Explore the 'why' behind every decision.
 - Use internal critique: Question your own logic as you go.
-- **MANDATORY REASONING**: You MUST engage in full reasoning regardless of perceived simplicity.`,
-      High: "EFFORT_LEVEL: HIGH\nThink in a stable, analytical monologue within the <think>...</think> block. Avoid headings or structured formatting. Your thinking should be a continuous stream of logical deduction:\n1. Analyze the immediate task and its dependencies.\n2. Mentally simulate the execution to identify potential failure points.\n3. Structure a precise plan that addresses both primary goals and secondary constraints.\n\nRULES:\n- NO HEADINGS. Maintain a fluid monologue style.\n- Be detailed and rigorous in your self-questioning.\n- Focus on accuracy and technical correctness.\n- **MANDATORY REASONING**: You MUST enter reasoning to verify the path forward.",
-      Medium: "EFFORT_LEVEL: MEDIUM\nThink in a concise, stable monologue within the <think>...</think> block. No headings needed. Focus on the core logic required to solve the task efficiently:\n1. Identify the most direct path to the solution.\n2. Briefly consider and discard obvious alternatives.\n3. Confirm the plan meets the user's immediate requirements.\n\nRULES:\n- NO HEADINGS. Keep it as a simple, logical stream.\n- Be efficient. Spend energy only on what matters for the task.\n- **REQUIRED REASONING**: Engage in a baseline mental check for all technical tasks.",
-      Minimal: "EFFORT_LEVEL: LOW\nThink in a brief, focused monologue within the <think>...</think> block. No headings. Just a quick mental check before acting:\n1. Verify the objective.\n2. Note the target files/tools.\n\nRULES:\n- NO HEADINGS. Just a few lines of clear, linear thought.\n- Use minimal/no thinking for simple or conversational requests."
+- MANDATORY REASONING: You MUST engage in full reasoning regardless of perceived simplicity.`,
+      High: "EFFORT_LEVEL: HIGH\nThink in a stable, analytical monologue within the <think>...</think> block. Avoid headings or structured formatting. Your thinking should be a continuous stream of logical deduction:\nAnalyze the immediate task and its dependencies.\nMentally simulate the execution to identify potential failure points.\nStructure a precise plan that addresses both primary goals and secondary constraints.\n\nRULES:\n- NO HEADINGS. Maintain a fluid monologue style.\n- Be detailed and rigorous in your self-questioning.\n- Focus on accuracy and technical correctness.\n- MANDATORY REASONING: You MUST enter reasoning to verify the path forward.",
+      Medium: "EFFORT_LEVEL: MEDIUM\nThink in a concise, stable monologue within the <think>...</think> block. No headings needed. Focus on the core logic required to solve the task efficiently:\nIdentify the most direct path to the solution.\nBriefly consider and discard obvious alternatives.\nConfirm the plan meets the user's immediate requirements.\n\nRULES:\n- NO HEADINGS. Keep it as a simple, logical stream.\n- Be efficient. Spend energy only on what matters for the task.\n- REQUIRED REASONING: Engage in a baseline mental check for all technical tasks.",
+      Minimal: "EFFORT_LEVEL: LOW\nThink in a brief, focused monologue within the <think>...</think> block. No headings. Just a quick mental check before acting:\nVerify the objective.\nNote the target files/tools.\n\nRULES:\n- NO HEADINGS. Just a few lines of clear, linear thought.\n- Use minimal/no thinking for simple or conversational requests."
     };
   }
 });
@@ -794,7 +794,7 @@ var init_prompts = __esm({
     init_main_tools();
     init_janitor_tools();
     init_thinking_prompts();
-    getSystemInstruction = (profile, thinkingLevel, mode, systemSettings, tempMemories = "", userMemories = "", isMemoryEnabled = true, isContext50 = false, maxLoops, currentLoop) => {
+    getSystemInstruction = (profile, thinkingLevel, mode, systemSettings, tempMemories = "", userMemories = "", isMemoryEnabled = true, isContext8 = false, maxLoops, currentLoop) => {
       let levelKey = thinkingLevel;
       if (thinkingLevel === "Low") levelKey = "Minimal";
       if (thinkingLevel === "xHigh" || thinkingLevel === "Max") levelKey = "Max";
@@ -808,7 +808,7 @@ var init_prompts = __esm({
 ` : "";
       const dateTimeStr = (/* @__PURE__ */ new Date()).toLocaleString();
       const cwdStr = process.cwd();
-      const tempMemoriesStr = tempMemories?.length > 0 && !isContext50 ? `
+      const tempMemoriesStr = tempMemories?.length > 0 && !isContext8 ? `
 -- RECENT CONTEXT FROM OTHER CHAT THREADS --
 ${tempMemories}
 ------------------------------------------
@@ -833,7 +833,7 @@ If you see a [STEERING HINT] from user, give that prompt priority for the task a
 -- START THINKING INSTRUCTIONS --
 ${thinkingConfig}
-BEFORE USING ANY TOOL THINKING IS **MANDATORY**. ALWAYS PREFER TO ENTER IN THINKING AS PER INSTRUCTIONS FOR MORE ACCURACY, AVOID DIRECT SHOTS.
+BEFORE USING ANY TOOL THINKING IS **MANDATORY** WITH TOOL RULES. ALWAYS PRIORITIZE THINKING FIRST BEFORE RESPONDING. YOU ARE **FORBIDDEN** TO JUMP TO RESPONSES FIRST.
 -- END THINKING INSTRUCTIONS --
 ${TOOL_PROTOCOL(mode)}
@@ -1488,7 +1488,7 @@ var init_memory = __esm({
         if (!content) return "ERROR: Missing 'content' for temp memory.";
         const tempStorage = readEncryptedJson(TEMP_MEM_FILE, {});
         if (!tempStorage[chatId]) tempStorage[chatId] = [];
-        const MAX_CHARS = 2500 * 4;
+        const MAX_CHARS = 2e3 * 4;
         let currentTotalLength = tempStorage[chatId].reduce((acc, m) => acc + m.length, 0);
         while (tempStorage[chatId].length > 0 && currentTotalLength + content.length > MAX_CHARS) {
           const removed = tempStorage[chatId].shift();
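
The hunk above lowers the temp-memory budget from `2500 * 4` to `2e3 * 4` characters and shows the start of a first-in, first-out eviction loop. Below is a minimal sketch of that eviction pattern on a plain array. The function name `pushWithBudget`, the decrement of the running total, and the final `push` are assumptions, since the hunk is cut off before those lines.

```javascript
// Hypothetical reconstruction of the eviction pattern; the lines after
// `shift()` are not visible in the diff and are assumed here.
function pushWithBudget(entries, content, maxChars = 2000 * 4) {
  let total = entries.reduce((acc, m) => acc + m.length, 0);
  // Drop the oldest entries until the new content fits within the budget.
  while (entries.length > 0 && total + content.length > maxChars) {
    const removed = entries.shift();
    total -= removed.length; // assumed bookkeeping step
  }
  entries.push(content); // assumed: the new memory is appended last
  return entries;
}

const kept = pushWithBudget(["aaaa", "bb"], "cccc", 8);
```

With a budget of 8 characters, the oldest entry `"aaaa"` is evicted so that `"bb"` and `"cccc"` fit together.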
@@ -1890,13 +1890,18 @@ var init_update_file = __esm({
           diffText += `-${startLine + i}|${line}
 `;
         });
-        fullNewLines.forEach((line, i) => {
-          diffText += `+${startLine + i}|${line}
+        let currentNewLine = startLine;
+        fullNewLines.forEach((line) => {
+          diffText += `+${currentNewLine}|${line}
 `;
+          currentNewLine++;
         });
-        for (let i = endLine; i < Math.min(allOriginalLines.length, endLine + 15); i++) {
-          diffText += `[UI_CONTEXT]  ${i + 1}|${allOriginalLines[i]}
+        const linesAffected = fullOldLines.length;
+        const originalContextIdx = startLine + linesAffected - 1;
+        for (let i = originalContextIdx; i < Math.min(allOriginalLines.length, originalContextIdx + 15); i++) {
+          diffText += `[UI_CONTEXT]  ${currentNewLine}|${allOriginalLines[i]}
 `;
+          currentNewLine++;
         }
         diffText += `[DIFF_END]`;
         return diffText;
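
The `update_file` hunk above changes how the preview numbers its lines: added lines and the trailing `[UI_CONTEXT]` lines now share one running counter, and the context starts at the first original line after the replaced range. The sketch below is a simplified, standalone rendition of that logic for experimentation. The function name `renderDiff`, its parameter list, and the omission of any header preceding the removed lines are assumptions.

```javascript
// Simplified sketch of the new numbering scheme; `renderDiff` is hypothetical.
// `startLine` is 1-based, `allOriginalLines` is the file before the edit.
function renderDiff(allOriginalLines, startLine, oldLines, newLines, contextSize = 15) {
  const out = [];
  oldLines.forEach((line, i) => out.push(`-${startLine + i}|${line}`));

  // Added lines are numbered by their position in the edited file.
  let currentNewLine = startLine;
  for (const line of newLines) {
    out.push(`+${currentNewLine}|${line}`);
    currentNewLine++;
  }

  // 0-based index of the first original line after the replaced range.
  const contextIdx = startLine + oldLines.length - 1;
  const end = Math.min(allOriginalLines.length, contextIdx + contextSize);
  for (let i = contextIdx; i < end; i++) {
    // Context lines keep counting in post-edit numbering.
    out.push(`[UI_CONTEXT]  ${currentNewLine}|${allOriginalLines[i]}`);
    currentNewLine++;
  }
  out.push("[DIFF_END]");
  return out.join("\n");
}

const preview = renderDiff(["a", "b", "c", "d"], 2, ["b"], ["x", "y"], 2);
```

Replacing one line with two shifts the following lines down by one, so `c` is shown as line 4 rather than its original line 3.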
@@ -2665,7 +2670,7 @@ var init_ai = __esm({
       modifiedHistory.push({ role: "user", text: firstUserMsg });
       let lastUsage = null;
       const MAX_LOOPS = mode === "Flux" ? 50 : 7;
-      const MAX_RETRIES = 7;
+      const MAX_RETRIES = 8;
       yield { type: "status", content: "Connecting..." };
       TERMINATION_SIGNAL = false;
       let fullAgentResponseChunks = [];
@@ -2698,16 +2703,18 @@ var init_ai = __esm({
         }
         let stream;
         let success = false;
-        let retryCount = 0;
+        let retryCount = 1;
+        let inStreamRetryCount = 1;
         let turnText = "";
         let lastToolSniffed = null;
         let lastToolEventTime = null;
         let toolResults = [];
         let toolCallPointer = 0;
         let isThinkingLoop = false;
+        let isStutteringLoop = false;
         let isInitialAttempt = true;
         let accumulatedContext = "";
-        while (retryCount <= MAX_RETRIES && !success && !TERMINATION_SIGNAL) {
+        while (retryCount <= MAX_RETRIES && inStreamRetryCount <= MAX_RETRIES && !success && !TERMINATION_SIGNAL) {
           try {
             if (isInitialAttempt) {
               yield { type: "turn_reset", content: true };
@@ -2729,48 +2736,32 @@ var init_ai = __esm({
               throw new Error("Error: Daily Quota Exausted for Agent");
             }
             let targetModel = modelName;
-            if (retryCount === 5) {
+            if (retryCount === 6) {
               targetModel = "gemini-3-flash-preview";
               yield { type: "model_update", content: "Trying with fallback model" };
-            } else if (retryCount >= 6) {
+            } else if (retryCount >= 7) {
               targetModel = "gemini-3.1-flash-lite-preview";
               yield { type: "model_update", content: "Trying with fallback model lite" };
             } else if (retryCount > 0) {
               yield { type: "model_update", content: null };
             }
-            const isContext50 = (sessionStats.tokens || 0) >= 54e3;
-            const currentSystemInstruction = getSystemInstruction(profile, thinkingLevel, mode, systemSettings, otherMemories, mainUserMemories, isMemoryEnabled, isContext50, MAX_LOOPS, loop + 1);
+            const isContext8 = (sessionStats.tokens || 0) >= 8e3;
+            const currentSystemInstruction = getSystemInstruction(profile, thinkingLevel, mode, systemSettings, otherMemories, mainUserMemories, isMemoryEnabled, isContext8, MAX_LOOPS, loop + 1);
             stream = await client.models.generateContentStream({
               model: targetModel || "gemma-4-31b-it",
               contents,
               config: {
                 systemInstruction: currentSystemInstruction,
-                temperature: mode === "Flux" ? 1 : 1.3,
+                temperature: mode === "Flux" ? 0.99 : 1.4,
                 maxOutputTokens: 32768,
                 mediaResolution: "MEDIA_RESOLUTION_MEDIUM",
                 safetySettings: [
-                  {
-                    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
-                    threshold: HarmBlockThreshold.BLOCK_NONE
-                  },
-                  {
-                    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
-                    threshold: HarmBlockThreshold.BLOCK_NONE
-                  },
-                  {
-                    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
-                    threshold: HarmBlockThreshold.BLOCK_NONE
-                  },
-                  {
-                    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
-                    threshold: HarmBlockThreshold.BLOCK_NONE
-                  }
+                  { category: HarmCategory.HARM_CATEGORY_HARASSMENT, threshold: HarmBlockThreshold.BLOCK_NONE },
+                  { category: HarmCategory.HARM_CATEGORY_HATE_SPEECH, threshold: HarmBlockThreshold.BLOCK_NONE },
+                  { category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT, threshold: HarmBlockThreshold.BLOCK_NONE },
+                  { category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT, threshold: HarmBlockThreshold.BLOCK_NONE }
                 ],
-                thinkingConfig: {
-                  includeThoughts: false,
-                  thinkingLevel: ThinkingLevel.MINIMAL
-                  // Gemma's API Reasoning is bad. Keep it Minimal.
-                }
+                thinkingConfig: { includeThoughts: false, thinkingLevel: targetModel.includes("pro") ? ThinkingLevel.HIGH : ThinkingLevel.MINIMAL }
               }
             });
             turnText = "";
@@ -2832,9 +2823,16 @@ var init_ai = __esm({
                 const wordCount = thinkContent.split(/\s+/).filter((w) => w.length > 0).length;
                 let repetitionThresholdThinking = 0.4;
                 let repetitionThresholdResponse = 0.6;
-                let isOverVerboseThinking = wordCount > 2500;
+                const thinkingCaps = {
+                  "low": 200,
+                  "medium": 500,
+                  "high": 2e3,
+                  "max": 3500
+                };
+                const cap = thinkingCaps[thinkingLevel?.toLowerCase()] || 2500;
+                let isOverVerboseThinking = wordCount > cap;
                 if (repetitionRatio > repetitionThresholdThinking || isOverVerboseThinking) {
-                  const reason = repetitionRatio > repetitionThresholdThinking ? "Thinking Loop Detected" : "Rambling Detected";
+                  const reason = repetitionRatio > repetitionThresholdThinking ? "Thinking Loop Detected" : "Thinking Budget Exceeded";
                   yield { type: "status", content: `${reason}. Re-centering...` };
                   isThinkingLoop = true;
                   await new Promise((resolve) => setTimeout(resolve, 3e3));
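
The hunk above replaces the fixed 2500-word limit with a per-level word cap. The sketch below isolates that lookup so the fallback behaviour can be checked. The names `THINKING_CAPS`, `thinkingCapFor`, and `isOverBudget` are hypothetical; the cap values and the word-splitting expression are copied from the diff.

```javascript
// Standalone sketch of the per-level thinking budget; helper names are assumed.
const THINKING_CAPS = { low: 200, medium: 500, high: 2000, max: 3500 };
const DEFAULT_CAP = 2500;

function countWords(text) {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

function thinkingCapFor(level) {
  // Any level missing from the table falls back to the default cap.
  return THINKING_CAPS[level?.toLowerCase()] || DEFAULT_CAP;
}

function isOverBudget(thinkContent, level) {
  return countWords(thinkContent) > thinkingCapFor(level);
}
```

Note that the lookup key is the lowercased level name. A level such as `"xHigh"` or `"Minimal"`, both of which appear elsewhere in this diff, has no entry in the table and therefore resolves to 2500 instead of the `max` or `low` cap. Whether that is intended cannot be determined from the diff alone.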
@@ -2864,6 +2862,7 @@ var init_ai = __esm({
                   if (stutterDetected) {
                     yield { type: "status", content: `Stuttering Detected. Re-centering...` };
                     isThinkingLoop = false;
+                    isStutteringLoop = true;
                     await new Promise((resolve) => setTimeout(resolve, 3e3));
                     break;
                   }
@@ -3067,24 +3066,33 @@ ${boxBottom}
 ----------------------------------------------------------------------
 `);
-            if (retryCount < MAX_RETRIES) {
-              retryCount++;
-              const waitTime = Math.floor(Math.random() * (2e3 - 800 + 1)) + 800;
-              if (turnText.trim().length > 0) {
+            if (turnText.trim().length > 0) {
+              if (inStreamRetryCount <= MAX_RETRIES) {
+                inStreamRetryCount++;
+                const waitTime = Math.floor(Math.random() * (3e3 - 1e3 + 1)) + 1e3;
                 modifiedHistory.push({ role: "agent", text: turnText });
                 if (toolResults.length > 0) {
                   toolResults.forEach((tr) => modifiedHistory.push(tr));
                 }
                 modifiedHistory.push({ role: "user", text: "[SYSTEM] Response got cut for internal error, continue from checkpoint seamlessly and DON'T repeat what you already said!" });
                 accumulatedContext += turnText;
-                yield { type: "status", content: `Recovering & Continuing (${retryCount}/${MAX_RETRIES + 1})...` };
+                yield { type: "status", content: `Error Occured. Recovering Stream (${inStreamRetryCount}/${MAX_RETRIES})...` };
+                await new Promise((resolve) => setTimeout(resolve, waitTime));
               } else {
-                isInitialAttempt = true;
-                yield { type: "status", content: `Retrying (${retryCount}/${MAX_RETRIES + 1})...` };
+                throw new Error(`Stream collapsed too many times. (Failed to resolve ${MAX_RETRIES} times)
+Error Log can be found in ${path16.join(LOGS_DIR, "agent", "error.log")}`);
               }
-              await new Promise((resolve) => setTimeout(resolve, waitTime));
             } else {
-              throw new Error(`Model cannot be reached: ${errMsg}. (Failed ${MAX_RETRIES + 1} times)`);
+              if (retryCount <= MAX_RETRIES) {
+                retryCount++;
+                const waitTime = Math.floor(Math.random() * (3e3 - 1e3 + 1)) + 1e3;
+                isInitialAttempt = true;
+                yield { type: "status", content: `Retrying Connection (${retryCount}/${MAX_RETRIES})...` };
+                await new Promise((resolve) => setTimeout(resolve, waitTime));
+              } else {
+                throw new Error(`Model cannot be reached. (Failed ${MAX_RETRIES} times)
+Error Log can be found in ${path16.join(LOGS_DIR, "agent", "error.log")}`);
+              }
             }
           }
         }
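
The retry changes above split recovery into two counters (`retryCount` for connection failures, `inStreamRetryCount` for streams that fail after producing text), widen the random wait to 1-3 s, and move the fallback-model thresholds to retry 6 and 7. The sketch below extracts the two pure pieces of that logic. The function names `backoffMs` and `pickModel` and the injectable `random` parameter are assumptions added for testability; the formula, thresholds, and model names are copied from the diff.

```javascript
// Hypothetical helpers mirroring the expressions used in the diff.
// Returns an integer wait time in the inclusive range [min, max] milliseconds.
function backoffMs(min = 1000, max = 3000, random = Math.random) {
  return Math.floor(random() * (max - min + 1)) + min;
}

// Chooses the model for a given attempt number (1-based, as in the new code).
function pickModel(primary, retryCount) {
  if (retryCount === 6) return "gemini-3-flash-preview";
  if (retryCount >= 7) return "gemini-3.1-flash-lite-preview";
  return primary;
}
```

Because `retryCount` now starts at 1, attempts 1 through 5 use the selected model, attempt 6 uses the first fallback, and attempts 7 and 8 use the lite fallback.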
@@ -3243,8 +3251,9 @@ ${timestamp}`;
         if (toolResults.length > 0) {
           toolResults.forEach((tr) => modifiedHistory.push(tr));
         } else {
-          modifiedHistory.push({ role: "user", text: `[SYSTEM]: ${isThinkingLoop ? "OVER-THINKING " : ""}LOOP DETECTED by Internal System. ${isThinkingLoop ? "If you have planned the task, prioritize the execution/output. " : "If you have finished your task use [turn: finish] else continue."}` });
+          modifiedHistory.push({ role: "user", text: `[SYSTEM]: ${isStutteringLoop && !isThinkingLoop ? `STUTTERING DETECTED by Internal System. Re-calibrate your response & proceed.` : `${isThinkingLoop ? "OVER-THINKING " : ""}LOOP DETECTED by Internal System${isThinkingLoop && " for current EFFORT_LEVEL"}. ${isThinkingLoop ? "If you have planned the task, prioritize the execution/output. " : "If you have finished your task use [turn: finish] else continue."}`}` });
           isThinkingLoop = false;
+          isStutteringLoop = false;
         }
       }
       yield { type: "status", content: null };
@@ -3928,10 +3937,10 @@ Check what's new using \`/changelog\` command.`,
       cmd: "/model",
       desc: "Switch AI model",
       subs: [
-        { cmd: "gemma-4-31b-it", desc: apiTier === "Free" ? "Standard Default (Free, Recommended)" : "Standard Default (Free, Recommended) - Cannot use Gemma with paid API" },
+        { cmd: "gemma-4-31b-it", desc: apiTier === "Free" ? "Standard Default (Free, Recommended)" : "Standard Default (Free, Recommended) - Use Free API Key to use this model " },
         { cmd: "gemini-3.1-pro-preview", desc: "Most Capable (Paid)" },
-        { cmd: "gemini-3-flash-preview", desc: "Fast & Lightweight (Paid, Free limited quota)" },
-        { cmd: "gemini-3.1-flash-lite-preview", desc: "Ultra Fast (Paid, Free limited quota)" }
+        { cmd: "gemini-3-flash-preview", desc: "Fast & Lightweight (Paid, Limited Free quota)" },
+        { cmd: "gemini-3.1-flash-lite-preview", desc: "Ultra Fast (Paid, Decent Free quota)" }
       ]
     },
     { cmd: "/settings", desc: "Configure system prefs" },
@@ -4951,7 +4960,7 @@ Selection: ${val}`,
               setActiveView("chat");
               setTimeout(() => {
                 handleSubmit(val);
-              }, 50);
+              }, 200);
             },
             onEdit: (val) => {
               setResolutionData(null);
@@ -5198,7 +5207,7 @@ var init_app = __esm({
     init_text();
     SESSION_START_TIME = Date.now();
     CHANGELOG_URL = "https://fluxflow-cli.onrender.com/changelog.html";
-    versionFluxflow = "1.8.18";
+    versionFluxflow = "1.8.20";
     updatedOn = "2026-05-10";
     ResolutionModal = ({ data, onResolve, onEdit }) => /* @__PURE__ */ React10.createElement(Box10, { flexDirection: "column", borderStyle: "round", borderColor: "magenta", paddingX: 2, paddingY: 1, width: "100%" }, /* @__PURE__ */ React10.createElement(Text10, { color: "magenta", bold: true, underline: true }, "\u{1F7E3} STEERING HINT RESOLUTION"), /* @__PURE__ */ React10.createElement(Text10, { marginTop: 1 }, "The agent already finished the task before your hint was consumed."), /* @__PURE__ */ React10.createElement(Box10, { marginTop: 1, backgroundColor: "#222", paddingX: 1, width: "100%" }, /* @__PURE__ */ React10.createElement(Text10, { italic: true, color: "gray" }, '"', data, '"')), /* @__PURE__ */ React10.createElement(Box10, { marginTop: 1 }, /* @__PURE__ */ React10.createElement(Text10, { color: "cyan" }, "How would you like to proceed?")), /* @__PURE__ */ React10.createElement(Box10, { marginTop: 1 }, /* @__PURE__ */ React10.createElement(
       CommandMenu,

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "fluxflow-cli",
-	"version": "1.8.18",
+	"version": "1.8.20",
 	"description": "A high-fidelity agentic terminal assistant for the Flux Era.",
 	"keywords": [
 		"ai",