npm - scai - Versions diffs - 0.1.164 → 0.1.166 - Mend

scai 0.1.164 → 0.1.166

Files changed (5)

package/README.md +203 -133
package/dist/agents/reasonNextTaskStep.js +45 -0
package/dist/db/fileIndex.js +91 -146
package/dist/pipeline/modules/finalAnswerModule.js +16 -4
package/package.json +1 -1

package/README.md CHANGED

@@ -1,98 +1,196 @@
 # ⚙️ SCAI — Source Code AI 🌿
-> **AI-powered CLI for local code analysis, commit message suggestions, and natural-language queries.**
-> **100% local • No token cost • Private by design • GDPR-friendly** — made in Denmark/EU with ❤️.
+> **A local-first AI CLI for understanding, querying, and iterating on large codebases.**
+> **100% local • No token costs • No cloud • No prompt injection • Private by design**
 🔗 **Website:** [https://scai.dk](https://scai.dk)
+🇪🇺 Built in Denmark / EU
-SCAI is your AI coding companion in the terminal. Stay focused on coding while SCAI helps you understand, analyze, and reason about your codebase using local language models.
+---
+## What is SCAI?
+**SCAI** is an AI-powered command-line tool that helps developers explore and reason about source code using **local large language models only**.
+Inspired by tools such as *Claude Code* and *Gemini CLI*, SCAI is designed to feel like a natural extension of the terminal. It enables natural-language interaction with your codebase while deliberately avoiding cloud dependencies and network-connected agents.
+SCAI runs entirely on your local system:
+* **No token costs** — no usage-based pricing
+* **No internet access for agents**
+* **No prompt injection from web content**
+* No external AI APIs
+* No telemetry or tracking
+* No API keys
+Your code never leaves your machine. All analysis and reasoning happens locally.
+> **Local model tradeoff**
+> SCAI uses local LLMs. Output quality depends on your hardware and selected model. Cloud-hosted systems may perform better on general reasoning tasks, but SCAI prioritizes privacy, predictability, and control.
+---
+## ⚠️ Alpha Status
+SCAI is currently in **alpha**.
+If you have previously installed SCAI, reset the local database before upgrading:
+```bash
+scai db reset
+scai index start
+```
+Breaking changes and evolving behavior should be expected.
+---
+## Why SCAI?
+### 🔐 Local-Only by Design
+SCAI agents operate **entirely offline**.
+They do not:
+* Browse the web
+* Fetch URLs
+* Ingest external documents
+* Execute remote prompts
-**Local Model Note:** SCAI runs entirely on local LLMs. This means **no API keys, no token cost, and full privacy**, but also **more limited capabilities** compared to cloud-hosted AI.
+**Security implications:**
-> ⚠️ **Alpha Version Notice**
-> If you have previously installed SCAI, please run:
->
-> bash
-> scai db reset && scai index start
->
->
-> before using this version.
+* No prompt injection via web content
+* No data exfiltration
+* No hidden network calls
+* Fully auditable execution
+This makes SCAI suitable for **private repositories, regulated environments, and GDPR-compliant workflows**.
 ---
-## 🧠 Why SCAI?
+### 🧠 Codebase-Aware Analysis
+SCAI builds and maintains a structured internal representation of your repository using:
+* Language-aware parsing
+* Symbol and dependency indexing
+* Static and heuristic analysis
+* Cross-file context tracking
+This enables repository-level questions that go beyond single-file inspection.
-SCAI is not just another AI tool — it's a **developer-first**, **privacy-focused**, and **local-first** coding companion. Here's why SCAI stands out:
+---
-### 🔐 **100% Local & Private by Design**
+### ✂️ Assisted Code Iteration (Early)
-Unlike cloud-based AI tools, SCAI runs entirely on your machine. No data leaves your environment — making it ideal for **sensitive codebases** and **GDPR-compliant workflows**.
+SCAI can assist with **lightweight, example-driven code iteration**, primarily focused on understanding and improving existing code rather than large-scale automated refactoring.
-### 🧠 **Deep Code Understanding**
+Current strengths include:
-SCAI doesn't just parse code — it **understands** it. With background indexing, static analysis, and language-aware parsing, SCAI helps you explore, refactor, and debug with confidence.
+* Explaining what functions, files, or modules do
+* Identifying patterns and responsibilities across files
+* Generating or improving comments and documentation
+* Highlighting structural or readability issues
+* Suggesting small, localized improvements
-### 📦 **No Token Costs or API Keys**
+Changes are **guided by indexed context and user prompts**, and are intended to support human review and decision-making.
-SCAI works offline, with **zero token usage**. You don’t pay for API calls or subscribe to cloud services. Just install and go.
+Large-scale or fully automated repository-wide refactoring should currently be considered **experimental**.
-### 🛠️ **Developer-Focused Toolset**
+---
-From commit message generation to architecture summaries, SCAI integrates directly into your workflow. It's built for developers, by developers.
+### 🛠 Built for Developer Workflows
-### 🇩🇰 **Built in Denmark/EU**
+SCAI is a **terminal-native tool** designed to integrate cleanly into daily development:
-SCAI is developed in the European Union, ensuring compliance with data protection laws and a focus on privacy-first development.
+* Natural-language queries over your codebase
+* Code understanding and exploration
+* Assisted iteration and suggestions
+* Commit message generation
+* Background indexing and analysis
+* Interactive REPL
-> ✅ SCAI is your **AI coding assistant that respects your privacy**, enhances your productivity, and works **entirely offline**.
+No browser UI. No cloud login. No vendor lock-in.
 ---
-## 🗣️ Language Support (Important)
+### 🇪🇺 Privacy & Compliance First
-SCAI is currently **tested and validated only on the following languages**:
+* Fully local execution
+* No telemetry
+* No cloud services
+* Developed in Denmark / EU
+* GDPR-friendly by default
+---
-* **JavaScript (JS)**
-* **TypeScript (TS)**
+## Language Support
+SCAI is currently **tested and supported** for:
+* **JavaScript**
+* **TypeScript**
 * **Java**
-Other languages may work partially, but analysis quality, indexing accuracy, and agent behavior are **not guaranteed** outside these languages. Broader language support is planned, but for now SCAI should be considered **JS/TS/Java-first**.
+Other languages may work partially, but indexing quality, analysis accuracy, and agent behavior are **not guaranteed**.
+SCAI should currently be considered **JS / TS / Java-first**.
 ---
-## 💻 Getting Started
+## Getting Started
-### 1️⃣ Install & Initialize
+### Install & Initialize
-bash
+```bash
 npm install -g scai
 scai init
 scai index start
+```
+This:
-This initializes local models (recommended: `qwen3-coder:30b`) and starts indexing your code repository.
+* Initializes local configuration
+* Starts the background daemon
+* Begins indexing the current repository
-> ⏳ **Note:** Initial indexing and analysis can take **minutes to hours** depending on repository size and enabled analysis tools.
+> **Note**
+> Initial indexing can take **minutes to hours**, depending on repository size and enabled analysis.
-### 2️⃣ Check Available Commands
+---
+### Starting SCAI
+Running the `scai` command with no arguments starts the interactive shell:
 ```bash
-scai --help
+scai
 ```
----
+You can also start it explicitly:
+```bash
+scai shell
+```
-## 🏠 REPL Mode (Local AI Queries)
+---
-Start an interactive REPL to ask natural-language questions about your code:
+### View Available Commands
 ```bash
-scai shell
+scai --help
 ```
-Once in the REPL, you can:
+---
+## Interactive REPL
+The REPL is the primary interface for working with SCAI.
+### Ask questions about your codebase
-### Ask questions about your codebase. Be specific for better results.
+Be specific for better results.
 ```text
 scai> what does withContext function do in index.ts file?
@@ -103,32 +201,28 @@ scai> Where are all the database queries defined?
 scai> List files involved in authentication
 ```
-### Run CLI commands from inside the REPL
+### Run CLI commands inside the REPL
 ```text
-scai> /git commit
-scai> /index list
-scai> /index set /path/to/repo
-scai> /index switch
-scai> /index delete
+/index list
+/index switch
+/git commit
 ```
 ### Execute shell commands
 ```text
-scai> !ls -la
-scai> !git status
+!git status
+!ls -la
 ```
-> ✅ REPL queries are free, offline, GDPR-friendly, and **no token cost**.
+All interactions remain **offline and free**, with **no token usage**.
 ---
-## 📦 Repository Indexing
+## Repository Indexing
-Before SCAI can answer questions, your repository must be indexed.
-### Common Index Commands
+Repositories must be indexed before querying:
 ```bash
 scai index set /path/to/repo
@@ -138,47 +232,43 @@ scai index switch
 scai index delete
 ```
-Only indexed repositories can be queried.
+Only indexed repositories are accessible to agents.
 ---
-## 🧠 Background Indexing & Analysis (Daemon)
-SCAI performs **deep repository indexing and static analysis** using background workers. This includes:
+## Background Analysis (Daemon)
-* File structure discovery
-* Language-aware parsing (JS / TS / Java)
-* Symbol and dependency mapping
-* Heuristic analysis for tests, architecture, and patterns
+SCAI performs deep analysis in the background, including:
-⚠️ **Important:** On first install or on large repositories, this process can take **several hours**.
+* File discovery
+* AST parsing
+* Dependency graph construction
+* Symbol resolution
+* Heuristic structure analysis
-All background work is handled by the **SCAI daemon**, which can be fully controlled from the CLI.
-### Daemon Commands
+Daemon control:
 ```bash
 scai daemon start
 scai daemon stop
 scai daemon restart
 scai daemon status
-scai daemon unlock
 scai daemon logs
 ```
-You can safely stop the daemon at any time. Indexing and analysis will resume when restarted.
+Indexing progress resumes automatically after restart.
 ---
-## ⚙️ Configuration
+## Configuration
-Set the local AI model (recommended):
+Set a local model (recommended):
 ```bash
 scai config set-model qwen3-coder:30b
 ```
-View current configuration:
+View configuration:
 ```bash
 scai config show --raw
@@ -186,22 +276,22 @@ scai config show --raw
 ---
-## 🔧 Git Commit Assistant
+## Git Commit Assistant
-Generate meaningful commit messages based on staged changes:
+Generate commit messages from staged changes:
 ```bash
 git add .
 scai git commit
 ```
-All analysis is performed locally — **no token usage, no cloud calls**.
+All diff inspection and reasoning is performed locally.
 ---
-## 🔑 GitHub Authentication
+## GitHub Authentication
-For GitHub-related features, SCAI requires a Personal Access Token.
+Required only for GitHub-related features:
 ```bash
 scai auth set
@@ -211,98 +301,78 @@ scai auth reset
 ---
-## 🧠 Example Queries
-* `Summarize codeTransform.js`
-* `Explain utils/helpers.ts architecture`
-* `List all functions without tests in services/`
-* `Show where database queries are defined`
-* `Highlight potential memory leaks`
-* `Describe how authentication works`
-* `Summarize repo architecture`
----
-## 🔐 Privacy & GDPR
+## Privacy & Security Summary
-* Fully local — no cloud calls
+* 100% local execution
+* No internet access for agents
+* No prompt injection from web content
 * No API keys
-* **No token cost**
-* GDPR-friendly, built in Denmark/EU 🇩🇰
+* No token costs
+* GDPR-friendly by default
 ---
-## 🙌 Feedback & Support
-Feedback, bugs, and ideas are very welcome:
+## Feedback & Community
-* 🌍 Website: [https://scai.dk](https://scai.dk)
-* 🧵 Threads: [@scai.dk](https://threads.net/@scai.dk)
-<br>
+* 🌍 [https://scai.dk](https://scai.dk)
+* 🧵 [https://threads.net/@scai.dk](https://threads.net/@scai.dk)
 ---
-<br>
-<br>
-## 🔐 License & Usage Terms
+# License & Usage Terms
-Copyright © SCAI
-All rights reserved.
+© SCAI — All rights reserved.
-SCAI is **free to use for non-commercial purposes only**, subject to the terms below.
+SCAI is **free for non-commercial use only**.
 ---
-## ✅ Permitted Use
+## Permitted Use
-You may use SCAI **without charge** for the following purposes:
+You may use SCAI free of charge for:
 * Personal projects
 * Educational use
 * Research and experimentation
-* Non-commercial open-source contributions
-* Internal evaluation or proof-of-concept work
+* Non-commercial open-source work
+* Internal evaluation or proof-of-concepts
-You may also **fork, modify, and redistribute** the source code, provided that such use remains **strictly non-commercial**.
+You may fork and modify the source code **for non-commercial purposes only**.
 ---
-## 🚫 Restricted Use
+## Restricted Use
-The following uses are **not permitted without a commercial license**:
+The following require a **commercial license**:
-* Commercial use of any kind
-* Enterprise or organizational deployment
-* Use as part of a paid product, service, or subscription
-* Use in consultancy, client work, or billable services
-* Bundling or integrating SCAI into commercial software or internal enterprise tooling
-* Resale, sublicensing, or redistribution for commercial purposes
+* Any commercial or enterprise use
+* Consultancy or client work
+* Paid products or services
+* Internal enterprise tooling
+* Commercial redistribution or resale
 ---
-## 🏢 Commercial & Enterprise Licensing
+## Commercial Licensing
-**Commercial and enterprise use requires a paid license and explicit permission from the author.**
+Commercial and enterprise use requires a **paid license** and explicit permission from the author.
-Organizations or individuals wishing to use SCAI in a commercial context must obtain a commercial license **prior to such use**.
-Please contact the author to discuss commercial licensing terms.
+Please contact the author to discuss licensing terms.
 ---
-## ⚖️ Disclaimer
+## Disclaimer
-This software is provided **"as is"**, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement.
+This software is provided **“as is”**, without warranty of any kind.
-In no event shall the author be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.
+The author is not liable for any damages arising from its use.
 ---
-### 📄 Summary (Non-Binding)
+### Non-Binding Summary
 * Free for personal and non-commercial use
-* Commercial and enterprise use requires a paid license
-* Commercial redistribution is prohibited
+* Fully local, offline AI
+* No token costs
+* No prompt injection surface
+* Commercial use requires a license

package/dist/agents/reasonNextTaskStep.js CHANGED

@@ -1,3 +1,6 @@
+// File: src/agents/reasonNextTaskStep.ts
+import { generate } from "../lib/generate.js";
+import { cleanupModule } from "../pipeline/modules/cleanupModule.js";
 import { logInputOutput } from "../utils/promptLogHelper.js";
 /**
  * REASON NEXT TASK STEP
@@ -95,6 +98,48 @@ export const reasonNextTaskStep = {
             confidence = 0.98;
         }
         // ---------------------------
+        // 6.5️⃣ Optional: Reason over known risks
+        // ---------------------------
+        const knownRisks = context.analysis.understanding?.risks ?? [];
+        if (knownRisks.length > 0) {
+            // Optionally call the LLM with constrained instructions
+            const riskPrompt = `
+You are given the following KNOWN RISKS (authoritative, do not invent new ones):
+${knownRisks.map(r => "- " + r).join("\n")}
+Task:
+- Decide whether it is reasonable to ask the user for clarification before proceeding.
+- Return STRICT JSON: { askUser: true|false, rationale: string }
+`;
+            try {
+                const aiResponse = await generate({
+                    query: context.initContext?.userQuery ?? "",
+                    content: riskPrompt
+                });
+                const cleaned = await cleanupModule.run({
+                    query: context.initContext?.userQuery ?? "",
+                    content: aiResponse.data ?? ""
+                });
+                const parsed = cleaned.data;
+                // type guard
+                if (parsed &&
+                    typeof parsed === "object" &&
+                    "askUser" in parsed &&
+                    "rationale" in parsed &&
+                    typeof parsed.rationale === "string") {
+                    if (parsed.askUser) {
+                        nextAction = "request-feedback";
+                        rationale += `\nUser clarification recommended due to known risks: ${parsed.rationale}`;
+                        confidence = Math.min(confidence, 0.8); // slightly lower because human needed
+                    }
+                }
+            }
+            catch (err) {
+                console.warn("[reasonNextTaskStep] Risk reasoning failed", err);
+                // fallback: ignore, keep deterministic nextAction
+            }
+        }
+        // ---------------------------
         // 7️⃣ Ensure a TaskStep exists for nextFile
         // ---------------------------
         if (nextFile) {
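
Note on the added risk-reasoning block: its type guard checks that `askUser` is present but not that it is a boolean, so a model reply such as `{ "askUser": "false", ... }` would be treated as truthy. Below is a minimal sketch of a stricter guard, assuming the same `{ askUser, rationale }` shape. `isRiskDecision` and `applyRiskDecision` are hypothetical names used for illustration and are not part of the package.

```javascript
// Hypothetical helper: true only for { askUser: boolean, rationale: string }.
function isRiskDecision(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.askUser === "boolean" &&
    typeof value.rationale === "string"
  );
}

// Hypothetical helper mirroring the diff's update logic on a plain state object.
// Returns the same state object untouched when the reply is invalid or askUser is false.
function applyRiskDecision(state, parsed) {
  if (!isRiskDecision(parsed) || !parsed.askUser) {
    return state;
  }
  return {
    nextAction: "request-feedback",
    rationale: `${state.rationale}\nUser clarification recommended due to known risks: ${parsed.rationale}`,
    // Cap confidence, as the diff does, because a human is needed.
    confidence: Math.min(state.confidence, 0.8),
  };
}
```

With this guard, a malformed reply leaves the deterministic `nextAction` in place, which matches the fallback behavior described in the diff's `catch` branch.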

package/dist/db/fileIndex.js CHANGED

@@ -11,12 +11,15 @@ import { IGNORED_FOLDER_GLOBS } from '../fileRules/ignoredPaths.js';
 import { Config } from '../config.js';
 import { log } from '../utils/log.js';
 import { startDaemon } from '../commands/DaemonCmd.js';
-import { sanitizeQueryForFts } from '../utils/sanitizeQuery.js';
 import * as sqlTemplates from '../db/sqlTemplates.js';
 import { RELATED_FILES_LIMIT } from '../constants.js';
 import { generate } from '../lib/generate.js';
-import { cleanupModule } from '../pipeline/modules/cleanupModule.js';
 import { logInputOutput } from '../utils/promptLogHelper.js';
+import { sanitizeQueryForFts } from '../utils/sanitizeQuery.js';
+import { extractTaggedContent } from '../utils/parseTaggedContent.js';
+/* -------------------------------------------------- */
+/* DB LOCK                                             */
+/* -------------------------------------------------- */
 async function lockDb() {
     try {
         return await lockfile.lock(getDbPathForRepo());
@@ -26,6 +29,9 @@ async function lockDb() {
         throw err;
     }
 }
+/* -------------------------------------------------- */
+/* INDEX COMMAND                                       */
+/* -------------------------------------------------- */
 export async function runIndexCommand() {
     try {
         initSchema();
@@ -57,9 +63,6 @@ export async function runIndexCommand() {
                 const type = detectFileType(file);
                 const normalizedPath = path.normalize(file).replace(/\\/g, '/');
                 const filename = path.basename(normalizedPath);
-                // --------------------------------------------------
-                // Enqueue file for daemon processing
-                // --------------------------------------------------
                 db.prepare(upsertFileTemplate).run({
                     path: normalizedPath,
                     filename,
@@ -73,7 +76,7 @@ export async function runIndexCommand() {
                 count++;
             }
             catch (err) {
-                log(`⚠️ Skipped in indexCmd ${file}: ${err instanceof Error ? err.message : err}`);
+                log(`⚠️ Skipped in indexCmd ${file}: ${String(err)}`);
             }
         }
     }
@@ -82,110 +85,82 @@ export async function runIndexCommand() {
     }
     log('📊 Discovered files by extension:', JSON.stringify(countByExt, null, 2));
     log(`✅ Done. Enqueued ${count} files for indexing.`);
-    // Kick the daemon — it now owns all processing
     startDaemon();
 }
-// --------------------------------------------------
-// QUERY API (read-only, used by CLI / raw search)
-// --------------------------------------------------
+/* -------------------------------------------------- */
+/* QUERY API                                           */
+/* -------------------------------------------------- */
 export function queryFiles(safeQuery, limit = 10) {
     const db = getDbForRepo();
     return db
         .prepare(sqlTemplates.queryFilesTemplate)
         .all(safeQuery, limit);
 }
-// --------------------------------------------------
-// SEMANTIC SEARCH (AskCmd, answering user directly)
-// - Discards noisy FTS
-// - Uses LLM aggressively
-// - Optimizes for precision
-// --------------------------------------------------
-export async function semanticSearchFiles(originalQuery, _query, // ignored now – LLM owns query construction
-topK = 5) {
+/* -------------------------------------------------- */
+/* SEMANTIC SEARCH                                     */
+/* -------------------------------------------------- */
+export async function semanticSearchFiles(originalQuery, _query, topK = 5) {
     const db = getDbForRepo();
-    // --------------------------------------------------
-    // 1. LLM → primary FTS query (always)
-    // --------------------------------------------------
     const primaryFtsQuery = await generatePrimaryFtsQuery(originalQuery);
     logInputOutput("semanticSearchFiles LLM primary query", "output", {
         originalQuery,
         ftsQuery: primaryFtsQuery,
     });
-    // --------------------------------------------------
-    // 2. Run primary FTS once
-    // --------------------------------------------------
     const primaryResults = db
         .prepare(sqlTemplates.searchFilesTemplate)
         .all(primaryFtsQuery, RELATED_FILES_LIMIT);
     if (primaryResults.length > 0) {
         return rankAndMap(new Map(primaryResults.map(r => [r.id, r])), topK);
     }
-    // --------------------------------------------------
-    // 3. Fallback: LLM → 2–3 subqueries (ONLY if zero results)
-    // --------------------------------------------------
-    const subQueries = await generateFallbackFtsQueries(originalQuery, primaryFtsQuery);
-    logInputOutput("semanticSearchFiles LLM fallback queries", "output", {
+    const fallbackQuery = await generateFallbackFtsQueries(originalQuery, primaryFtsQuery);
+    logInputOutput("semanticSearchFiles LLM fallback query", "output", {
         originalQuery,
         primaryFtsQuery,
-        subQueries,
+        fallbackQuery,
     });
-    // --------------------------------------------------
-    // 4. Execute fallback queries sequentially
-    // --------------------------------------------------
-    for (const subQuery of subQueries) {
-        const rows = db
-            .prepare(sqlTemplates.searchFilesTemplate)
-            .all(subQuery, RELATED_FILES_LIMIT);
-        if (rows.length > 0) {
-            return rankAndMap(new Map(rows.map(r => [r.id, r])), topK);
+    if (fallbackQuery && fallbackQuery.length > 0) {
+        const stmt = db.prepare(sqlTemplates.searchFilesTemplate);
+        for (const query of fallbackQuery) {
+            const rows = stmt.all(query, RELATED_FILES_LIMIT);
+            if (rows.length > 0) {
+                return rankAndMap(new Map(rows.map(r => [r.id, r])), topK);
+            }
         }
     }
-    // --------------------------------------------------
-    // 5. Hard stop
-    // --------------------------------------------------
     return [];
 }
+/* -------------------------------------------------- */
+/* LLM → FTS QUERY GENERATION (TAG-BASED)               */
+/* -------------------------------------------------- */
 async function generatePrimaryFtsQuery(userQuery) {
     const prompt = `
-You are generating a SQLite FTS query for searching a source code repository.
+Generate a SQLite FTS query for searching a source code repository.
-Input (natural language):
+Input:
 "${userQuery}"
-Task:
-- Produce ONE concise FTS query
-- Focus on filenames, symbols, module names, domain nouns
-- Prefer literal identifiers likely to exist in code
-- NO sentences
-- NO stopwords
-- NO explanations
-- NO wildcards unless absolutely necessary
+Rules:
+- Output ONLY the query terms
 - Use OR between terms
-- **MAX 10 terms only** — be selective and concise
+- Max 10 terms
+- No explanations
+- No sentences
-Output JSON ONLY:
-{
-  "ftsQuery": "term1 OR term2 OR term3"
-}
+Wrap the result in <FILE_CONTENT> tags.
+<FILE_CONTENT>
+term1 OR term2 OR term3
+</FILE_CONTENT>
 `.trim();
     try {
         const response = await generate({ content: prompt, query: "" });
-        const cleaned = await cleanupModule.run({
-            query: userQuery,
-            content: response.data,
-        });
-        if (cleaned.data &&
-            typeof cleaned.data === "object" &&
-            "ftsQuery" in cleaned.data &&
-            typeof cleaned.data.ftsQuery === "string") {
-            return cleaned.data.ftsQuery;
-        }
+        const rawText = String(response.data ?? "");
+        const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
+        return sanitizeQueryForFts(content);
     }
     catch (err) {
-        log(`⚠️ [semanticSearchFiles] Failed to generate primary FTS query: ${String(err)}`);
+        return sanitizeQueryForFts(userQuery);
     }
-    // Absolute safety fallback — never explode
-    return sanitizeQueryForFts(userQuery);
 }
 async function generateFallbackFtsQueries(userQuery, failedQuery) {
     const prompt = `
@@ -199,57 +174,44 @@ Primary FTS query returned ZERO results:
 Task:
 - Generate 2–3 independent FTS queries (MAX 3)
-- Each query should be concise: no more than 10 OR-joined search terms
+- Each query must be a single OR-joined expression
+- Max 10 terms per query
 - Focus on filenames, symbols, module names
-- Avoid natural-language sentences
-- Avoid recursion or refinement loops
-- Use OR between terms
+- Avoid natural language sentences
+- Avoid explanations or commentary
-Output JSON ONLY:
-{
-  "subQueries": [
-    "query1",
-    "query2",
-    "query3"
-  ]
-}
+Output format (STRICT):
+<FILE_CONTENT>
+query1
+query2
+query3
+</FILE_CONTENT>
 `.trim();
     try {
         const response = await generate({ content: prompt, query: "" });
-        const cleaned = await cleanupModule.run({
-            query: userQuery,
-            content: response.data,
-        });
-        if (cleaned.data &&
-            typeof cleaned.data === "object" &&
-            Array.isArray(cleaned.data.subQueries)) {
-            return cleaned.data.subQueries
-                .filter((q) => typeof q === "string")
-                .slice(0, 3) // cap to 3 queries
-                .map((q) => q
-                .split(' OR ')
-                .map(term => sanitizeQueryForFts(term)) // sanitize each term individually
-                .slice(0, 10) // cap terms per query
-                .join(' OR '));
+        const rawText = String(response.data ?? "");
+        const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
+        const subQueries = content
+            .split(/\r?\n/)
+            .map(q => sanitizeQueryForFts(q.trim()))
+            .filter(Boolean)
+            .slice(0, 3);
+        if (!subQueries.length) {
+            throw new Error("No fallback subqueries generated");
         }
+        return subQueries;
     }
     catch (err) {
-        log(`⚠️ [semanticSearchFiles] Failed to generate fallback queries: ${String(err)}`);
+        log(`⚠️ [semanticSearchFiles] Fallback FTS generation failed: ${String(err)}`);
+        return null;
     }
-    return [];
 }
-// --------------------------------------------------
-// PLANNER SEARCH (fileSearchModule, discovery)
-// - Never discards FTS
-// - LLM ONLY if FTS is empty
-// - Optimizes for recall
-// --------------------------------------------------
+/* -------------------------------------------------- */
+/* PLANNER SEARCH                                      */
+/* -------------------------------------------------- */
 export async function plannerSearchFiles(originalQuery, query, topK = 5) {
     const db = getDbForRepo();
     const seen = new Map();
-    // -----------------------------
-    // Primary FTS (always trusted)
-    // -----------------------------
     const safeQuery = sanitizeQueryForFts(query);
     const primaryResults = db
         .prepare(sqlTemplates.searchFilesTemplate)
@@ -259,36 +221,31 @@ export async function plannerSearchFiles(originalQuery, query, topK = 5) {
         safeQuery,
         count: primaryResults.length,
     });
-    // -----------------------------
-    // Only call LLM if FTS is empty
-    // -----------------------------
     if (primaryResults.length === 0) {
-        const llmTerms = await expandQueryWithModel(originalQuery);
-        logInputOutput("plannerSearchFiles LLM terms (FTS empty)", "output", {
-            originalQuery,
-            suggestedTerms: llmTerms,
-        });
-        for (const term of llmTerms) {
-            const safeTerm = sanitizeQueryForFts(term);
+        const expanded = await expandQueryWithModel(originalQuery);
+        if (expanded) {
+            const safeTerm = sanitizeQueryForFts(expanded);
             const rows = db
                 .prepare(sqlTemplates.searchFilesTemplate)
                 .all(safeTerm, RELATED_FILES_LIMIT);
-            for (const row of rows) {
-                if (!seen.has(row.id))
-                    seen.set(row.id, row);
-            }
+            rows.forEach(r => {
+                if (!seen.has(r.id))
+                    seen.set(r.id, r);
+            });
         }
     }
     if (seen.size === 0)
         return [];
     return rankAndMap(seen, topK);
 }
-// --------------------------------------------------
-// Helpers
-// --------------------------------------------------
+/* -------------------------------------------------- */
+/* HELPERS                                             */
+/* -------------------------------------------------- */
 function rankAndMap(seen, topK) {
-    const merged = Array.from(seen.values()).sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0));
-    return merged.slice(0, topK).map(r => ({
+    return Array.from(seen.values())
+        .sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0))
+        .slice(0, topK)
+        .map(r => ({
         id: r.id,
         path: r.path,
         filename: r.filename,
@@ -300,32 +257,20 @@ function rankAndMap(seen, topK) {
 }
 async function expandQueryWithModel(query) {
     const prompt = `
-You are assisting a code search system.
-Given a natural-language question about a codebase, return a JSON array
-of 3–8 concrete search terms that are likely to appear literally in source code.
+Return concrete search terms likely to appear in source code.
-Rules:
-- Return ONLY a JSON array of strings
-- No explanations
-- Prefer filenames, function names, symbols, library names
+Wrap the result in <FILE_CONTENT> tags.
 Question:
 "${query}"
 `.trim();
     try {
         const response = await generate({ content: prompt, query: "" });
-        const cleaned = await cleanupModule.run({
-            query,
-            content: response.data,
-        });
-        const terms = Array.isArray(cleaned.data)
-            ? cleaned.data.filter((t) => typeof t === "string")
-            : [];
-        return terms;
+        const rawText = String(response.data ?? "");
+        const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
+        return sanitizeQueryForFts(content);
     }
-    catch (err) {
-        log(`⚠️ [searchFiles] Failed to expand query: ${String(err)}`);
-        return [];
+    catch {
+        return null;
     }
 }
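
Note on the tag-based parsing introduced in this file: the three query generators now read the model reply through `extractTaggedContent(rawText, "FILE_CONTENT")` instead of parsing JSON via `cleanupModule`. The implementation of `extractTaggedContent` is not included in this diff, so the sketch below is only an assumption about how such a helper could behave. `extractTagged` and `toSubQueries` are hypothetical names, and the per-line `sanitizeQueryForFts` call from the real code is left out because its implementation is not shown either.

```javascript
// Hypothetical stand-in for extractTaggedContent: returns the trimmed text between
// <TAG> and </TAG>, or the whole trimmed input when the tags are missing.
function extractTagged(rawText, tag) {
  const open = `<${tag}>`;
  const close = `</${tag}>`;
  const start = rawText.indexOf(open);
  const end = start === -1 ? -1 : rawText.indexOf(close, start + open.length);
  if (start === -1 || end === -1) {
    return { content: rawText.trim(), found: false };
  }
  return { content: rawText.slice(start + open.length, end).trim(), found: true };
}

// Splits extracted content into at most `max` non-empty query lines,
// following the split / filter / slice chain used by the fallback path.
function toSubQueries(content, max = 3) {
  return content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean)
    .slice(0, max);
}
```

One consequence of this design is that any chatter the model emits outside the tags is ignored, which tends to be more forgiving than strict JSON parsing with small local models.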

package/dist/pipeline/modules/finalAnswerModule.js CHANGED

@@ -28,6 +28,8 @@ export const finalAnswerModule = {
             (!focus?.relevantFiles || focus.relevantFiles.includes(path)))
             .map(([path, fa]) => ({ path, analysis: fa }))
             .slice(0, MAX_FILES);
+        // Collect analyzed files for output
+        const analyzedFiles = meaningfulFiles.map(f => f.path);
         // --------------------------------------------------
         // 2️⃣ Collect supporting code snippets from working files
         // --------------------------------------------------
@@ -104,6 +106,9 @@ ${query}
 Rationale for focus:
 ${rationale}
+Analyzed files:
+${analyzedFiles.join("\n")}
 ==================== PROPOSED CHANGES ====================
 ${semanticSection}
@@ -130,17 +135,24 @@ ${codeSection}
         // 5️⃣ Generate final answer
         // --------------------------------------------------
         const aiResponse = await generate({ query, content: prompt });
+        // ✅ Prepend analyzed files to finalText so user sees them
         const finalText = typeof aiResponse.data === "string"
-            ? aiResponse.data
-            : JSON.stringify(aiResponse.data, null, 2);
+            ? `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${aiResponse.data}`
+            : `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${JSON.stringify(aiResponse.data, null, 2)}`;
         context.analysis || (context.analysis = {});
         context.analysis.finalAnswer = finalText;
-        logInputOutput("finalAnswerModule", "output", aiResponse.data);
+        logInputOutput("finalAnswerModule", "output", {
+            data: aiResponse.data,
+            analyzedFiles,
+        });
         console.log(chalk.yellow(`\n\n[FINAL ANSWER]\n${finalText}\n`));
         return {
             query,
             content: finalText,
-            data: aiResponse.data,
+            data: {
+                response: aiResponse.data,
+                analyzedFiles,
+            },
             context,
         };
     },
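
Note on the changed return value: `data` was previously the raw model output (`aiResponse.data`) and is now an object of the form `{ response, analyzedFiles }`. Code that reads `data` directly may need to adapt. The sketch below shows one way a caller could accept both shapes. `normalizeFinalAnswerData` is a hypothetical helper, not something the package provides.

```javascript
// Hypothetical consumer-side helper: accepts either the old `data` (raw model output)
// or the new { response, analyzedFiles } wrapper and returns one normalized shape.
function normalizeFinalAnswerData(data) {
  const isWrapped =
    data !== null &&
    typeof data === "object" &&
    !Array.isArray(data) &&
    "response" in data &&
    Array.isArray(data.analyzedFiles);
  return isWrapped
    ? { response: data.response, analyzedFiles: data.analyzedFiles }
    : { response: data, analyzedFiles: [] };
}
```

The list of analyzed files is also prepended to `content` and `context.analysis.finalAnswer` in this version, so callers that only display the text see the file list without any change on their side.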

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "scai",
-  "version": "0.1.164",
+  "version": "0.1.166",
   "type": "module",
   "bin": {
     "scai": "./dist/index.js"