scai 0.1.164 → 0.1.166

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,98 +1,196 @@
1
1
  # ⚙️ SCAI — Source Code AI 🌿
2
2
 
3
- > **AI-powered CLI for local code analysis, commit message suggestions, and natural-language queries.**
4
- > **100% local • No token cost • Private by design • GDPR-friendly** made in Denmark/EU with ❤️.
3
+ > **A local-first AI CLI for understanding, querying, and iterating on large codebases.**
4
+ > **100% local • No token costs • No cloud • No prompt injection • Private by design**
5
5
 
6
6
  🔗 **Website:** [https://scai.dk](https://scai.dk)
7
+ 🇪🇺 Built in Denmark / EU
7
8
 
8
- SCAI is your AI coding companion in the terminal. Stay focused on coding while SCAI helps you understand, analyze, and reason about your codebase using local language models.
9
+ ---
10
+
11
+ ## What is SCAI?
12
+
13
+ **SCAI** is an AI-powered command-line tool that helps developers explore and reason about source code using **local large language models only**.
14
+
15
+ Inspired by tools such as *Claude Code* and *Gemini CLI*, SCAI is designed to feel like a natural extension of the terminal. It enables natural-language interaction with your codebase while deliberately avoiding cloud dependencies and network-connected agents.
16
+
17
+ SCAI runs entirely on your local system:
18
+
19
+ * **No token costs** — no usage-based pricing
20
+ * **No internet access for agents**
21
+ * **No prompt injection from web content**
22
+ * No external AI APIs
23
+ * No telemetry or tracking
24
+ * No API keys
25
+
26
+ Your code never leaves your machine. All analysis and reasoning happens locally.
27
+
28
+ > **Local model tradeoff**
29
+ > SCAI uses local LLMs. Output quality depends on your hardware and selected model. Cloud-hosted systems may perform better on general reasoning tasks, but SCAI prioritizes privacy, predictability, and control.
30
+
31
+ ---
32
+
33
+ ## ⚠️ Alpha Status
34
+
35
+ SCAI is currently in **alpha**.
36
+
37
+ If you have previously installed SCAI, reset the local database before upgrading:
38
+
39
+ ```bash
40
+ scai db reset
41
+ scai index start
42
+ ```
43
+
44
+ Breaking changes and evolving behavior should be expected.
45
+
46
+ ---
47
+
48
+ ## Why SCAI?
49
+
50
+ ### 🔐 Local-Only by Design
51
+
52
+ SCAI agents operate **entirely offline**.
53
+
54
+ They do not:
55
+
56
+ * Browse the web
57
+ * Fetch URLs
58
+ * Ingest external documents
59
+ * Execute remote prompts
9
60
 
10
- **Local Model Note:** SCAI runs entirely on local LLMs. This means **no API keys, no token cost, and full privacy**, but also **more limited capabilities** compared to cloud-hosted AI.
61
+ **Security implications:**
11
62
 
12
- > ⚠️ **Alpha Version Notice**
13
- > If you have previously installed SCAI, please run:
14
- >
15
- > bash
16
- > scai db reset && scai index start
17
- >
18
- >
19
- > before using this version.
63
+ * No prompt injection via web content
64
+ * No data exfiltration
65
+ * No hidden network calls
66
+ * Fully auditable execution
67
+
68
+ This makes SCAI suitable for **private repositories, regulated environments, and GDPR-compliant workflows**.
20
69
 
21
70
  ---
22
71
 
23
- ## 🧠 Why SCAI?
72
+ ### 🧠 Codebase-Aware Analysis
73
+
74
+ SCAI builds and maintains a structured internal representation of your repository using:
75
+
76
+ * Language-aware parsing
77
+ * Symbol and dependency indexing
78
+ * Static and heuristic analysis
79
+ * Cross-file context tracking
80
+
81
+ This enables repository-level questions that go beyond single-file inspection.
24
82
 
25
- SCAI is not just another AI tool — it's a **developer-first**, **privacy-focused**, and **local-first** coding companion. Here's why SCAI stands out:
83
+ ---
26
84
 
27
- ### 🔐 **100% Local & Private by Design**
85
+ ### ✂️ Assisted Code Iteration (Early)
28
86
 
29
- Unlike cloud-based AI tools, SCAI runs entirely on your machine. No data leaves your environment making it ideal for **sensitive codebases** and **GDPR-compliant workflows**.
87
+ SCAI can assist with **lightweight, example-driven code iteration**, primarily focused on understanding and improving existing code rather than large-scale automated refactoring.
30
88
 
31
- ### 🧠 **Deep Code Understanding**
89
+ Current strengths include:
32
90
 
33
- SCAI doesn't just parse code — it **understands** it. With background indexing, static analysis, and language-aware parsing, SCAI helps you explore, refactor, and debug with confidence.
91
+ * Explaining what functions, files, or modules do
92
+ * Identifying patterns and responsibilities across files
93
+ * Generating or improving comments and documentation
94
+ * Highlighting structural or readability issues
95
+ * Suggesting small, localized improvements
34
96
 
35
- ### 📦 **No Token Costs or API Keys**
97
+ Changes are **guided by indexed context and user prompts**, and are intended to support human review and decision-making.
36
98
 
37
- SCAI works offline, with **zero token usage**. You don’t pay for API calls or subscribe to cloud services. Just install and go.
99
+ Large-scale or fully automated repository-wide refactoring should currently be considered **experimental**.
38
100
 
39
- ### 🛠️ **Developer-Focused Toolset**
101
+ ---
40
102
 
41
- From commit message generation to architecture summaries, SCAI integrates directly into your workflow. It's built for developers, by developers.
103
+ ### 🛠 Built for Developer Workflows
42
104
 
43
- ### 🇩🇰 **Built in Denmark/EU**
105
+ SCAI is a **terminal-native tool** designed to integrate cleanly into daily development:
44
106
 
45
- SCAI is developed in the European Union, ensuring compliance with data protection laws and a focus on privacy-first development.
107
+ * Natural-language queries over your codebase
108
+ * Code understanding and exploration
109
+ * Assisted iteration and suggestions
110
+ * Commit message generation
111
+ * Background indexing and analysis
112
+ * Interactive REPL
46
113
 
47
- > SCAI is your **AI coding assistant that respects your privacy**, enhances your productivity, and works **entirely offline**.
114
+ No browser UI. No cloud login. No vendor lock-in.
48
115
 
49
116
  ---
50
117
 
51
- ## 🗣️ Language Support (Important)
118
+ ### 🇪🇺 Privacy & Compliance First
52
119
 
53
- SCAI is currently **tested and validated only on the following languages**:
120
+ * Fully local execution
121
+ * No telemetry
122
+ * No cloud services
123
+ * Developed in Denmark / EU
124
+ * GDPR-friendly by default
125
+
126
+ ---
54
127
 
55
- * **JavaScript (JS)**
56
- * **TypeScript (TS)**
128
+ ## Language Support
129
+
130
+ SCAI is currently **tested and supported** for:
131
+
132
+ * **JavaScript**
133
+ * **TypeScript**
57
134
  * **Java**
58
135
 
59
- Other languages may work partially, but analysis quality, indexing accuracy, and agent behavior are **not guaranteed** outside these languages. Broader language support is planned, but for now SCAI should be considered **JS/TS/Java-first**.
136
+ Other languages may work partially, but indexing quality, analysis accuracy, and agent behavior are **not guaranteed**.
137
+
138
+ SCAI should currently be considered **JS / TS / Java-first**.
60
139
 
61
140
  ---
62
141
 
63
- ## 💻 Getting Started
142
+ ## Getting Started
64
143
 
65
- ### 1️⃣ Install & Initialize
144
+ ### Install & Initialize
66
145
 
67
- bash
146
+ ```bash
68
147
  npm install -g scai
69
148
  scai init
70
149
  scai index start
150
+ ```
71
151
 
152
+ This:
72
153
 
73
- This initializes local models (recommended: `qwen3-coder:30b`) and starts indexing your code repository.
154
+ * Initializes local configuration
155
+ * Starts the background daemon
156
+ * Begins indexing the current repository
74
157
 
75
- > **Note:** Initial indexing and analysis can take **minutes to hours** depending on repository size and enabled analysis tools.
158
+ > **Note**
159
+ > Initial indexing can take **minutes to hours**, depending on repository size and enabled analysis.
76
160
 
77
- ### 2️⃣ Check Available Commands
161
+ ---
162
+
163
+ ### Starting SCAI
164
+
165
+ Running the `scai` command with no arguments starts the interactive shell:
78
166
 
79
167
  ```bash
80
- scai --help
168
+ scai
81
169
  ```
82
170
 
83
- ---
171
+ You can also start it explicitly:
172
+
173
+ ```bash
174
+ scai shell
175
+ ```
84
176
 
85
- ## 🏠 REPL Mode (Local AI Queries)
177
+ ---
86
178
 
87
- Start an interactive REPL to ask natural-language questions about your code:
179
+ ### View Available Commands
88
180
 
89
181
  ```bash
90
- scai shell
182
+ scai --help
91
183
  ```
92
184
 
93
- Once in the REPL, you can:
185
+ ---
186
+
187
+ ## Interactive REPL
188
+
189
+ The REPL is the primary interface for working with SCAI.
190
+
191
+ ### Ask questions about your codebase
94
192
 
95
- ### Ask questions about your codebase. Be specific for better results.
193
+ Be specific for better results.
96
194
 
97
195
  ```text
98
196
  scai> what does withContext function do in index.ts file?
@@ -103,32 +201,28 @@ scai> Where are all the database queries defined?
103
201
  scai> List files involved in authentication
104
202
  ```
105
203
 
106
- ### Run CLI commands from inside the REPL
204
+ ### Run CLI commands inside the REPL
107
205
 
108
206
  ```text
109
- scai> /git commit
110
- scai> /index list
111
- scai> /index set /path/to/repo
112
- scai> /index switch
113
- scai> /index delete
207
+ /index list
208
+ /index switch
209
+ /git commit
114
210
  ```
115
211
 
116
212
  ### Execute shell commands
117
213
 
118
214
  ```text
119
- scai> !ls -la
120
- scai> !git status
215
+ !git status
216
+ !ls -la
121
217
  ```
122
218
 
123
- > REPL queries are free, offline, GDPR-friendly, and **no token cost**.
219
+ All interactions remain **offline and free**, with **no token usage**.
124
220
 
125
221
  ---
126
222
 
127
- ## 📦 Repository Indexing
223
+ ## Repository Indexing
128
224
 
129
- Before SCAI can answer questions, your repository must be indexed.
130
-
131
- ### Common Index Commands
225
+ Repositories must be indexed before querying:
132
226
 
133
227
  ```bash
134
228
  scai index set /path/to/repo
@@ -138,47 +232,43 @@ scai index switch
138
232
  scai index delete
139
233
  ```
140
234
 
141
- Only indexed repositories can be queried.
235
+ Only indexed repositories are accessible to agents.
142
236
 
143
237
  ---
144
238
 
145
- ## 🧠 Background Indexing & Analysis (Daemon)
146
-
147
- SCAI performs **deep repository indexing and static analysis** using background workers. This includes:
239
+ ## Background Analysis (Daemon)
148
240
 
149
- * File structure discovery
150
- * Language-aware parsing (JS / TS / Java)
151
- * Symbol and dependency mapping
152
- * Heuristic analysis for tests, architecture, and patterns
241
+ SCAI performs deep analysis in the background, including:
153
242
 
154
- ⚠️ **Important:** On first install or on large repositories, this process can take **several hours**.
243
+ * File discovery
244
+ * AST parsing
245
+ * Dependency graph construction
246
+ * Symbol resolution
247
+ * Heuristic structure analysis
155
248
 
156
- All background work is handled by the **SCAI daemon**, which can be fully controlled from the CLI.
157
-
158
- ### Daemon Commands
249
+ Daemon control:
159
250
 
160
251
  ```bash
161
252
  scai daemon start
162
253
  scai daemon stop
163
254
  scai daemon restart
164
255
  scai daemon status
165
- scai daemon unlock
166
256
  scai daemon logs
167
257
  ```
168
258
 
169
- You can safely stop the daemon at any time. Indexing and analysis will resume when restarted.
259
+ Indexing progress resumes automatically after restart.
170
260
 
171
261
  ---
172
262
 
173
- ## ⚙️ Configuration
263
+ ## Configuration
174
264
 
175
- Set the local AI model (recommended):
265
+ Set a local model (recommended):
176
266
 
177
267
  ```bash
178
268
  scai config set-model qwen3-coder:30b
179
269
  ```
180
270
 
181
- View current configuration:
271
+ View configuration:
182
272
 
183
273
  ```bash
184
274
  scai config show --raw
@@ -186,22 +276,22 @@ scai config show --raw
186
276
 
187
277
  ---
188
278
 
189
- ## 🔧 Git Commit Assistant
279
+ ## Git Commit Assistant
190
280
 
191
- Generate meaningful commit messages based on staged changes:
281
+ Generate commit messages from staged changes:
192
282
 
193
283
  ```bash
194
284
  git add .
195
285
  scai git commit
196
286
  ```
197
287
 
198
- All analysis is performed locally — **no token usage, no cloud calls**.
288
+ All diff inspection and reasoning is performed locally.
199
289
 
200
290
  ---
201
291
 
202
- ## 🔑 GitHub Authentication
292
+ ## GitHub Authentication
203
293
 
204
- For GitHub-related features, SCAI requires a Personal Access Token.
294
+ Required only for GitHub-related features:
205
295
 
206
296
  ```bash
207
297
  scai auth set
@@ -211,98 +301,78 @@ scai auth reset
211
301
 
212
302
  ---
213
303
 
214
- ## 🧠 Example Queries
215
-
216
- * `Summarize codeTransform.js`
217
- * `Explain utils/helpers.ts architecture`
218
- * `List all functions without tests in services/`
219
- * `Show where database queries are defined`
220
- * `Highlight potential memory leaks`
221
- * `Describe how authentication works`
222
- * `Summarize repo architecture`
223
-
224
- ---
225
-
226
- ## 🔐 Privacy & GDPR
304
+ ## Privacy & Security Summary
227
305
 
228
- * Fully local — no cloud calls
306
+ * 100% local execution
307
+ * No internet access for agents
308
+ * No prompt injection from web content
229
309
  * No API keys
230
- * **No token cost**
231
- * GDPR-friendly, built in Denmark/EU 🇩🇰
310
+ * No token costs
311
+ * GDPR-friendly by default
232
312
 
233
313
  ---
234
314
 
235
- ## 🙌 Feedback & Support
236
-
237
- Feedback, bugs, and ideas are very welcome:
315
+ ## Feedback & Community
238
316
 
239
- * 🌍 Website: [https://scai.dk](https://scai.dk)
240
- * 🧵 Threads: [@scai.dk](https://threads.net/@scai.dk)
241
-
242
- <br>
317
+ * 🌍 [https://scai.dk](https://scai.dk)
318
+ * 🧵 [https://threads.net/@scai.dk](https://threads.net/@scai.dk)
243
319
 
244
320
  ---
245
321
 
246
- <br>
247
- <br>
248
-
249
- ## 🔐 License & Usage Terms
322
+ # License & Usage Terms
250
323
 
251
- Copyright © SCAI
252
- All rights reserved.
324
+ © SCAI — All rights reserved.
253
325
 
254
- SCAI is **free to use for non-commercial purposes only**, subject to the terms below.
326
+ SCAI is **free for non-commercial use only**.
255
327
 
256
328
  ---
257
329
 
258
- ## Permitted Use
330
+ ## Permitted Use
259
331
 
260
- You may use SCAI **without charge** for the following purposes:
332
+ You may use SCAI free of charge for:
261
333
 
262
334
  * Personal projects
263
335
  * Educational use
264
336
  * Research and experimentation
265
- * Non-commercial open-source contributions
266
- * Internal evaluation or proof-of-concept work
337
+ * Non-commercial open-source work
338
+ * Internal evaluation or proof-of-concepts
267
339
 
268
- You may also **fork, modify, and redistribute** the source code, provided that such use remains **strictly non-commercial**.
340
+ You may fork and modify the source code **for non-commercial purposes only**.
269
341
 
270
342
  ---
271
343
 
272
- ## 🚫 Restricted Use
344
+ ## Restricted Use
273
345
 
274
- The following uses are **not permitted without a commercial license**:
346
+ The following require a **commercial license**:
275
347
 
276
- * Commercial use of any kind
277
- * Enterprise or organizational deployment
278
- * Use as part of a paid product, service, or subscription
279
- * Use in consultancy, client work, or billable services
280
- * Bundling or integrating SCAI into commercial software or internal enterprise tooling
281
- * Resale, sublicensing, or redistribution for commercial purposes
348
+ * Any commercial or enterprise use
349
+ * Consultancy or client work
350
+ * Paid products or services
351
+ * Internal enterprise tooling
352
+ * Commercial redistribution or resale
282
353
 
283
354
  ---
284
355
 
285
- ## 🏢 Commercial & Enterprise Licensing
356
+ ## Commercial Licensing
286
357
 
287
- **Commercial and enterprise use requires a paid license and explicit permission from the author.**
358
+ Commercial and enterprise use requires a **paid license** and explicit permission from the author.
288
359
 
289
- Organizations or individuals wishing to use SCAI in a commercial context must obtain a commercial license **prior to such use**.
290
-
291
- Please contact the author to discuss commercial licensing terms.
360
+ Please contact the author to discuss licensing terms.
292
361
 
293
362
  ---
294
363
 
295
- ## ⚖️ Disclaimer
364
+ ## Disclaimer
296
365
 
297
- This software is provided **"as is"**, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement.
366
+ This software is provided **“as is”**, without warranty of any kind.
298
367
 
299
- In no event shall the author be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.
368
+ The author is not liable for any damages arising from its use.
300
369
 
301
370
  ---
302
371
 
303
- ### 📄 Summary (Non-Binding)
372
+ ### Non-Binding Summary
304
373
 
305
374
  * Free for personal and non-commercial use
306
- * Commercial and enterprise use requires a paid license
307
- * Commercial redistribution is prohibited
308
-
375
+ * Fully local, offline AI
376
+ * No token costs
377
+ * No prompt injection surface
378
+ * Commercial use requires a license
@@ -1,3 +1,6 @@
1
+ // File: src/agents/reasonNextTaskStep.ts
2
+ import { generate } from "../lib/generate.js";
3
+ import { cleanupModule } from "../pipeline/modules/cleanupModule.js";
1
4
  import { logInputOutput } from "../utils/promptLogHelper.js";
2
5
  /**
3
6
  * REASON NEXT TASK STEP
@@ -95,6 +98,48 @@ export const reasonNextTaskStep = {
95
98
  confidence = 0.98;
96
99
  }
97
100
  // ---------------------------
101
+ // 6.5️⃣ Optional: Reason over known risks
102
+ // ---------------------------
103
+ const knownRisks = context.analysis.understanding?.risks ?? [];
104
+ if (knownRisks.length > 0) {
105
+ // Optionally call the LLM with constrained instructions
106
+ const riskPrompt = `
107
+ You are given the following KNOWN RISKS (authoritative, do not invent new ones):
108
+ ${knownRisks.map(r => "- " + r).join("\n")}
109
+
110
+ Task:
111
+ - Decide whether it is reasonable to ask the user for clarification before proceeding.
112
+ - Return STRICT JSON: { askUser: true|false, rationale: string }
113
+ `;
114
+ try {
115
+ const aiResponse = await generate({
116
+ query: context.initContext?.userQuery ?? "",
117
+ content: riskPrompt
118
+ });
119
+ const cleaned = await cleanupModule.run({
120
+ query: context.initContext?.userQuery ?? "",
121
+ content: aiResponse.data ?? ""
122
+ });
123
+ const parsed = cleaned.data;
124
+ // type guard
125
+ if (parsed &&
126
+ typeof parsed === "object" &&
127
+ "askUser" in parsed &&
128
+ "rationale" in parsed &&
129
+ typeof parsed.rationale === "string") {
130
+ if (parsed.askUser) {
131
+ nextAction = "request-feedback";
132
+ rationale += `\nUser clarification recommended due to known risks: ${parsed.rationale}`;
133
+ confidence = Math.min(confidence, 0.8); // slightly lower because human needed
134
+ }
135
+ }
136
+ }
137
+ catch (err) {
138
+ console.warn("[reasonNextTaskStep] Risk reasoning failed", err);
139
+ // fallback: ignore, keep deterministic nextAction
140
+ }
141
+ }
142
+ // ---------------------------
98
143
  // 7️⃣ Ensure a TaskStep exists for nextFile
99
144
  // ---------------------------
100
145
  if (nextFile) {
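The risk-reasoning step added in the hunk above accepts the model's reply when `askUser` and `rationale` are both present and `rationale` is a string. It does not check that `askUser` is a boolean, so a reply containing `"askUser": "false"` (a non-empty string, and therefore truthy) would still route to `request-feedback`. Below is a minimal sketch of a stricter guard. It assumes the reply has already been reduced to a JSON string or a plain object. `parseRiskDecision` and `applyRiskDecision` are hypothetical helpers, not SCAI functions: the first stands in for the `cleanupModule` parsing step, and the second mirrors the confidence adjustment shown in the diff.

```javascript
// Hypothetical helper (not part of SCAI): validates the
// { askUser: boolean, rationale: string } shape the risk prompt asks for.
function parseRiskDecision(raw) {
  let parsed;
  try {
    parsed = typeof raw === "string" ? JSON.parse(raw) : raw;
  } catch {
    return null; // invalid JSON: caller keeps the deterministic nextAction
  }
  if (
    parsed !== null &&
    typeof parsed === "object" &&
    typeof parsed.askUser === "boolean" && // rejects "false"/"true" strings
    typeof parsed.rationale === "string"
  ) {
    return { askUser: parsed.askUser, rationale: parsed.rationale };
  }
  return null;
}

// Hypothetical helper mirroring the diff: when a human is needed, switch to
// "request-feedback", append the rationale, and cap confidence at 0.8.
function applyRiskDecision(state, decision) {
  if (!decision || !decision.askUser) return state;
  return {
    nextAction: "request-feedback",
    rationale: `${state.rationale}\nUser clarification recommended due to known risks: ${decision.rationale}`,
    confidence: Math.min(state.confidence, 0.8),
  };
}

const state = { nextAction: "continue", rationale: "Next file selected.", confidence: 0.98 };
const decision = parseRiskDecision('{"askUser": true, "rationale": "touches the DB schema"}');
console.log(applyRiskDecision(state, decision).nextAction); // request-feedback
console.log(parseRiskDecision('{"askUser": "false", "rationale": "x"}')); // null
```

Returning `null` on any shape mismatch keeps the same fallback behavior as the diff's `catch` branch: the deterministic `nextAction` stands.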
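In the search-module hunks that follow, `generatePrimaryFtsQuery` and `generateFallbackFtsQueries` no longer ask the model for JSON. Instead, they read plain text wrapped in `<FILE_CONTENT>` tags through `extractTaggedContent` and pass it through `sanitizeQueryForFts`. The fallback path additionally splits the content on newlines, drops empty lines, and keeps at most three queries. The sketch below approximates that fallback path. `extractTagged`, `toFtsOrQuery`, and `parseFallbackQueries` are hypothetical stand-ins. The real `extractTaggedContent` and `sanitizeQueryForFts` are not shown in this diff and may behave differently, for example when the tags are missing.

```javascript
// Hypothetical stand-in for extractTaggedContent: returns the text between the
// first <TAG> ... </TAG> pair, or the whole trimmed reply if the tags are missing.
function extractTagged(rawText, tag) {
  const open = `<${tag}>`;
  const close = `</${tag}>`;
  const start = rawText.indexOf(open);
  const end = start === -1 ? -1 : rawText.indexOf(close, start + open.length);
  if (start === -1 || end === -1) return { content: rawText.trim() };
  return { content: rawText.slice(start + open.length, end).trim() };
}

// Hypothetical sanitizer: turns punctuation into spaces, drops bare FTS5
// operators so they cannot produce a syntax error, caps the term count,
// and rejoins the remaining terms with OR.
function toFtsOrQuery(line, maxTerms = 10) {
  const operators = new Set(["OR", "AND", "NOT", "NEAR"]);
  return line
    .replace(/[^A-Za-z0-9_\s]/g, " ")
    .split(/\s+/)
    .filter(t => t.length > 0 && !operators.has(t))
    .slice(0, maxTerms)
    .join(" OR ");
}

// One query per line inside <FILE_CONTENT>, empties dropped, at most three kept.
function parseFallbackQueries(rawText, maxQueries = 3) {
  const { content } = extractTagged(rawText, "FILE_CONTENT");
  return content
    .split(/\r?\n/)
    .map(q => toFtsOrQuery(q.trim()))
    .filter(Boolean)
    .slice(0, maxQueries);
}

const sample = "Sure!\n<FILE_CONTENT>\nlogin OR auth\nsession OR token\n\njwt.verify OR middleware\nextra OR line\n</FILE_CONTENT>";
console.log(JSON.stringify(parseFallbackQueries(sample)));
// ["login OR auth","session OR token","jwt OR verify OR middleware"]
```

Plain tagged lines are easier for small local models to produce reliably than strictly valid JSON. The trade-off is that the sanitizer must now enforce the query grammar that JSON parsing previously left to the model.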
@@ -11,12 +11,15 @@ import { IGNORED_FOLDER_GLOBS } from '../fileRules/ignoredPaths.js';
11
11
  import { Config } from '../config.js';
12
12
  import { log } from '../utils/log.js';
13
13
  import { startDaemon } from '../commands/DaemonCmd.js';
14
- import { sanitizeQueryForFts } from '../utils/sanitizeQuery.js';
15
14
  import * as sqlTemplates from '../db/sqlTemplates.js';
16
15
  import { RELATED_FILES_LIMIT } from '../constants.js';
17
16
  import { generate } from '../lib/generate.js';
18
- import { cleanupModule } from '../pipeline/modules/cleanupModule.js';
19
17
  import { logInputOutput } from '../utils/promptLogHelper.js';
18
+ import { sanitizeQueryForFts } from '../utils/sanitizeQuery.js';
19
+ import { extractTaggedContent } from '../utils/parseTaggedContent.js';
20
+ /* -------------------------------------------------- */
21
+ /* DB LOCK */
22
+ /* -------------------------------------------------- */
20
23
  async function lockDb() {
21
24
  try {
22
25
  return await lockfile.lock(getDbPathForRepo());
@@ -26,6 +29,9 @@ async function lockDb() {
26
29
  throw err;
27
30
  }
28
31
  }
32
+ /* -------------------------------------------------- */
33
+ /* INDEX COMMAND */
34
+ /* -------------------------------------------------- */
29
35
  export async function runIndexCommand() {
30
36
  try {
31
37
  initSchema();
@@ -57,9 +63,6 @@ export async function runIndexCommand() {
57
63
  const type = detectFileType(file);
58
64
  const normalizedPath = path.normalize(file).replace(/\\/g, '/');
59
65
  const filename = path.basename(normalizedPath);
60
- // --------------------------------------------------
61
- // Enqueue file for daemon processing
62
- // --------------------------------------------------
63
66
  db.prepare(upsertFileTemplate).run({
64
67
  path: normalizedPath,
65
68
  filename,
@@ -73,7 +76,7 @@ export async function runIndexCommand() {
73
76
  count++;
74
77
  }
75
78
  catch (err) {
76
- log(`⚠️ Skipped in indexCmd ${file}: ${err instanceof Error ? err.message : err}`);
79
+ log(`⚠️ Skipped in indexCmd ${file}: ${String(err)}`);
77
80
  }
78
81
  }
79
82
  }
@@ -82,110 +85,82 @@ export async function runIndexCommand() {
82
85
  }
83
86
  log('📊 Discovered files by extension:', JSON.stringify(countByExt, null, 2));
84
87
  log(`✅ Done. Enqueued ${count} files for indexing.`);
85
- // Kick the daemon — it now owns all processing
86
88
  startDaemon();
87
89
  }
88
- // --------------------------------------------------
89
- // QUERY API (read-only, used by CLI / raw search)
90
- // --------------------------------------------------
90
+ /* -------------------------------------------------- */
91
+ /* QUERY API */
92
+ /* -------------------------------------------------- */
91
93
  export function queryFiles(safeQuery, limit = 10) {
92
94
  const db = getDbForRepo();
93
95
  return db
94
96
  .prepare(sqlTemplates.queryFilesTemplate)
95
97
  .all(safeQuery, limit);
96
98
  }
97
- // --------------------------------------------------
98
- // SEMANTIC SEARCH (AskCmd, answering user directly)
99
- // - Discards noisy FTS
100
- // - Uses LLM aggressively
101
- // - Optimizes for precision
102
- // --------------------------------------------------
103
- export async function semanticSearchFiles(originalQuery, _query, // ignored now – LLM owns query construction
104
- topK = 5) {
99
+ /* -------------------------------------------------- */
100
+ /* SEMANTIC SEARCH */
101
+ /* -------------------------------------------------- */
102
+ export async function semanticSearchFiles(originalQuery, _query, topK = 5) {
105
103
  const db = getDbForRepo();
106
- // --------------------------------------------------
107
- // 1. LLM → primary FTS query (always)
108
- // --------------------------------------------------
109
104
  const primaryFtsQuery = await generatePrimaryFtsQuery(originalQuery);
110
105
  logInputOutput("semanticSearchFiles LLM primary query", "output", {
111
106
  originalQuery,
112
107
  ftsQuery: primaryFtsQuery,
113
108
  });
114
- // --------------------------------------------------
115
- // 2. Run primary FTS once
116
- // --------------------------------------------------
117
109
  const primaryResults = db
118
110
  .prepare(sqlTemplates.searchFilesTemplate)
119
111
  .all(primaryFtsQuery, RELATED_FILES_LIMIT);
120
112
  if (primaryResults.length > 0) {
121
113
  return rankAndMap(new Map(primaryResults.map(r => [r.id, r])), topK);
122
114
  }
123
- // --------------------------------------------------
124
- // 3. Fallback: LLM 2–3 subqueries (ONLY if zero results)
125
- // --------------------------------------------------
126
- const subQueries = await generateFallbackFtsQueries(originalQuery, primaryFtsQuery);
127
- logInputOutput("semanticSearchFiles LLM fallback queries", "output", {
115
+ const fallbackQuery = await generateFallbackFtsQueries(originalQuery, primaryFtsQuery);
116
+ logInputOutput("semanticSearchFiles LLM fallback query", "output", {
128
117
  originalQuery,
129
118
  primaryFtsQuery,
130
- subQueries,
119
+ fallbackQuery,
131
120
  });
132
- // --------------------------------------------------
133
- // 4. Execute fallback queries sequentially
134
- // --------------------------------------------------
135
- for (const subQuery of subQueries) {
136
- const rows = db
137
- .prepare(sqlTemplates.searchFilesTemplate)
138
- .all(subQuery, RELATED_FILES_LIMIT);
139
- if (rows.length > 0) {
140
- return rankAndMap(new Map(rows.map(r => [r.id, r])), topK);
121
+ if (fallbackQuery && fallbackQuery.length > 0) {
122
+ const stmt = db.prepare(sqlTemplates.searchFilesTemplate);
123
+ for (const query of fallbackQuery) {
124
+ const rows = stmt.all(query, RELATED_FILES_LIMIT);
125
+ if (rows.length > 0) {
126
+ return rankAndMap(new Map(rows.map(r => [r.id, r])), topK);
127
+ }
141
128
  }
142
129
  }
143
- // --------------------------------------------------
144
- // 5. Hard stop
145
- // --------------------------------------------------
146
130
  return [];
147
131
  }
132
+ /* -------------------------------------------------- */
133
+ /* LLM → FTS QUERY GENERATION (TAG-BASED) */
134
+ /* -------------------------------------------------- */
148
135
  async function generatePrimaryFtsQuery(userQuery) {
149
136
  const prompt = `
150
- You are generating a SQLite FTS query for searching a source code repository.
137
+ Generate a SQLite FTS query for searching a source code repository.
151
138
 
152
- Input (natural language):
139
+ Input:
153
140
  "${userQuery}"
154
141
 
155
- Task:
156
- - Produce ONE concise FTS query
157
- - Focus on filenames, symbols, module names, domain nouns
158
- - Prefer literal identifiers likely to exist in code
159
- - NO sentences
160
- - NO stopwords
161
- - NO explanations
162
- - NO wildcards unless absolutely necessary
142
+ Rules:
143
+ - Output ONLY the query terms
163
144
  - Use OR between terms
164
- - **MAX 10 terms only** — be selective and concise
145
+ - Max 10 terms
146
+ - No explanations
147
+ - No sentences
165
148
 
166
- Output JSON ONLY:
167
- {
168
- "ftsQuery": "term1 OR term2 OR term3"
169
- }
149
+ Wrap the result in <FILE_CONTENT> tags.
150
+
151
+ <FILE_CONTENT>
152
+ term1 OR term2 OR term3
153
+ </FILE_CONTENT>
170
154
  `.trim();
171
155
  try {
172
156
  const response = await generate({ content: prompt, query: "" });
173
- const cleaned = await cleanupModule.run({
174
- query: userQuery,
175
- content: response.data,
176
- });
177
- if (cleaned.data &&
178
- typeof cleaned.data === "object" &&
179
- "ftsQuery" in cleaned.data &&
180
- typeof cleaned.data.ftsQuery === "string") {
181
- return cleaned.data.ftsQuery;
182
- }
157
+ const rawText = String(response.data ?? "");
158
+ const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
159
+ return sanitizeQueryForFts(content);
183
160
  }
184
161
  catch (err) {
185
- log(`⚠️ [semanticSearchFiles] Failed to generate primary FTS query: ${String(err)}`);
162
+ return sanitizeQueryForFts(userQuery);
186
163
  }
187
- // Absolute safety fallback — never explode
188
- return sanitizeQueryForFts(userQuery);
189
164
  }
190
165
  async function generateFallbackFtsQueries(userQuery, failedQuery) {
191
166
  const prompt = `
@@ -199,57 +174,44 @@ Primary FTS query returned ZERO results:
199
174
 
200
175
  Task:
201
176
  - Generate 2–3 independent FTS queries (MAX 3)
202
- - Each query should be concise: no more than 10 OR-joined search terms
177
+ - Each query must be a single OR-joined expression
178
+ - Max 10 terms per query
203
179
  - Focus on filenames, symbols, module names
204
- - Avoid natural-language sentences
205
- - Avoid recursion or refinement loops
206
- - Use OR between terms
180
+ - Avoid natural language sentences
181
+ - Avoid explanations or commentary
207
182
 
208
- Output JSON ONLY:
209
- {
210
- "subQueries": [
211
- "query1",
212
- "query2",
213
- "query3"
214
- ]
215
- }
183
+ Output format (STRICT):
184
+ <FILE_CONTENT>
185
+ query1
186
+ query2
187
+ query3
188
+ </FILE_CONTENT>
216
189
  `.trim();
217
190
  try {
218
191
  const response = await generate({ content: prompt, query: "" });
219
- const cleaned = await cleanupModule.run({
220
- query: userQuery,
221
- content: response.data,
222
- });
223
- if (cleaned.data &&
224
- typeof cleaned.data === "object" &&
225
- Array.isArray(cleaned.data.subQueries)) {
226
- return cleaned.data.subQueries
227
- .filter((q) => typeof q === "string")
228
- .slice(0, 3) // cap to 3 queries
229
- .map((q) => q
230
- .split(' OR ')
231
- .map(term => sanitizeQueryForFts(term)) // sanitize each term individually
232
- .slice(0, 10) // cap terms per query
233
- .join(' OR '));
192
+ const rawText = String(response.data ?? "");
193
+ const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
194
+ const subQueries = content
195
+ .split(/\r?\n/)
196
+ .map(q => sanitizeQueryForFts(q.trim()))
197
+ .filter(Boolean)
198
+ .slice(0, 3);
199
+ if (!subQueries.length) {
200
+ throw new Error("No fallback subqueries generated");
234
201
  }
202
+ return subQueries;
235
203
  }
236
204
  catch (err) {
237
- log(`⚠️ [semanticSearchFiles] Failed to generate fallback queries: ${String(err)}`);
205
+ log(`⚠️ [semanticSearchFiles] Fallback FTS generation failed: ${String(err)}`);
206
+ return null;
238
207
  }
239
- return [];
240
208
  }
241
- // --------------------------------------------------
242
- // PLANNER SEARCH (fileSearchModule, discovery)
243
- // - Never discards FTS
244
- // - LLM ONLY if FTS is empty
245
- // - Optimizes for recall
246
- // --------------------------------------------------
209
+ /* -------------------------------------------------- */
210
+ /* PLANNER SEARCH */
211
+ /* -------------------------------------------------- */
247
212
  export async function plannerSearchFiles(originalQuery, query, topK = 5) {
248
213
  const db = getDbForRepo();
249
214
  const seen = new Map();
250
- // -----------------------------
251
- // Primary FTS (always trusted)
252
- // -----------------------------
253
215
  const safeQuery = sanitizeQueryForFts(query);
254
216
  const primaryResults = db
255
217
  .prepare(sqlTemplates.searchFilesTemplate)
@@ -259,36 +221,31 @@ export async function plannerSearchFiles(originalQuery, query, topK = 5) {
  safeQuery,
  count: primaryResults.length,
  });
- // -----------------------------
- // Only call LLM if FTS is empty
- // -----------------------------
  if (primaryResults.length === 0) {
- const llmTerms = await expandQueryWithModel(originalQuery);
- logInputOutput("plannerSearchFiles LLM terms (FTS empty)", "output", {
- originalQuery,
- suggestedTerms: llmTerms,
- });
- for (const term of llmTerms) {
- const safeTerm = sanitizeQueryForFts(term);
+ const expanded = await expandQueryWithModel(originalQuery);
+ if (expanded) {
+ const safeTerm = sanitizeQueryForFts(expanded);
  const rows = db
  .prepare(sqlTemplates.searchFilesTemplate)
  .all(safeTerm, RELATED_FILES_LIMIT);
- for (const row of rows) {
- if (!seen.has(row.id))
- seen.set(row.id, row);
- }
+ rows.forEach(r => {
+ if (!seen.has(r.id))
+ seen.set(r.id, r);
+ });
  }
  }
  if (seen.size === 0)
  return [];
  return rankAndMap(seen, topK);
  }
- // --------------------------------------------------
- // Helpers
- // --------------------------------------------------
+ /* -------------------------------------------------- */
+ /* HELPERS */
+ /* -------------------------------------------------- */
  function rankAndMap(seen, topK) {
- const merged = Array.from(seen.values()).sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0));
- return merged.slice(0, topK).map(r => ({
+ return Array.from(seen.values())
+ .sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0))
+ .slice(0, topK)
+ .map(r => ({
  id: r.id,
  path: r.path,
  filename: r.filename,
@@ -300,32 +257,20 @@ function rankAndMap(seen, topK) {
  }
  async function expandQueryWithModel(query) {
  const prompt = `
- You are assisting a code search system.
-
- Given a natural-language question about a codebase, return a JSON array
- of 3–8 concrete search terms that are likely to appear literally in source code.
+ Return concrete search terms likely to appear in source code.
 
- Rules:
- - Return ONLY a JSON array of strings
- - No explanations
- - Prefer filenames, function names, symbols, library names
+ Wrap the result in <FILE_CONTENT> tags.
 
  Question:
  "${query}"
  `.trim();
  try {
  const response = await generate({ content: prompt, query: "" });
- const cleaned = await cleanupModule.run({
- query,
- content: response.data,
- });
- const terms = Array.isArray(cleaned.data)
- ? cleaned.data.filter((t) => typeof t === "string")
- : [];
- return terms;
+ const rawText = String(response.data ?? "");
+ const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
+ return sanitizeQueryForFts(content);
  }
- catch (err) {
- log(`⚠️ [searchFiles] Failed to expand query: ${String(err)}`);
- return [];
+ catch {
+ return null;
  }
  }
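In the reworked search path, `plannerSearchFiles` runs FTS first, makes at most one `expandQueryWithModel` call when FTS returns nothing, reads the model's reply from between `<FILE_CONTENT>` tags, deduplicates rows by `id` in a `Map`, and sorts ascending by `bm25Score`. The sketch below imitates that flow in plain JavaScript, with no SQLite and no model. `extractTagged`, `mergeRows`, and `rankRows` are hypothetical stand-ins, not the package's `extractTaggedContent` or `rankAndMap`, whose full implementations are not part of this diff; the file paths and scores are made up. The ascending sort assumes `bm25Score` comes from SQLite FTS5's `bm25()`, where a lower (more negative) value means a better match.

```javascript
// Hypothetical sketch: imitates the shape of the new plannerSearchFiles flow.
// None of these helpers are scai's real implementations.

// Return the trimmed text between <TAG> and </TAG>, or null if the tags are absent.
function extractTagged(rawText, tag) {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(rawText);
  return match ? match[1].trim() : null;
}

// Add rows to a Map keyed by id, keeping the first row stored for each id.
function mergeRows(seen, rows) {
  for (const row of rows) {
    if (!seen.has(row.id)) seen.set(row.id, row);
  }
  return seen;
}

// Sort ascending by bm25Score (FTS5: lower is better), keep topK, project fields.
function rankRows(seen, topK) {
  return Array.from(seen.values())
    .sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0))
    .slice(0, topK)
    .map(r => ({ id: r.id, path: r.path }));
}

// Example: FTS found nothing, so one model reply is parsed and "searched".
const modelReply =
  "Sure!\n<FILE_CONTENT>sanitizeQueryForFts searchFilesTemplate</FILE_CONTENT>";
const expanded = extractTagged(modelReply, "FILE_CONTENT");

// Made-up rows standing in for what a search on `expanded` might return.
const fakeRows = [
  { id: 1, path: "src/db/sql.js", bm25Score: -2.1 },
  { id: 2, path: "src/search.js", bm25Score: -7.4 },
  { id: 1, path: "src/db/sql.js", bm25Score: -2.1 }, // duplicate id, ignored
];

const seen = mergeRows(new Map(), expanded ? fakeRows : []);
console.log(expanded);
console.log(rankRows(seen, 5));
```

Because the `Map` keeps only the first row stored for each `id`, a file returned by more than one query appears once in the ranked list, and a reply without the expected tags yields `null`, which the diff's `if (expanded)` guard treats as "no expansion".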
@@ -28,6 +28,8 @@ export const finalAnswerModule = {
  (!focus?.relevantFiles || focus.relevantFiles.includes(path)))
  .map(([path, fa]) => ({ path, analysis: fa }))
  .slice(0, MAX_FILES);
+ // Collect analyzed files for output
+ const analyzedFiles = meaningfulFiles.map(f => f.path);
  // --------------------------------------------------
  // 2️⃣ Collect supporting code snippets from working files
  // --------------------------------------------------
@@ -104,6 +106,9 @@ ${query}
  Rationale for focus:
  ${rationale}
 
+ Analyzed files:
+ ${analyzedFiles.join("\n")}
+
  ==================== PROPOSED CHANGES ====================
 
  ${semanticSection}
@@ -130,17 +135,24 @@ ${codeSection}
  // 5️⃣ Generate final answer
  // --------------------------------------------------
  const aiResponse = await generate({ query, content: prompt });
+ // ✅ Prepend analyzed files to finalText so user sees them
  const finalText = typeof aiResponse.data === "string"
- ? aiResponse.data
- : JSON.stringify(aiResponse.data, null, 2);
+ ? `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${aiResponse.data}`
+ : `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${JSON.stringify(aiResponse.data, null, 2)}`;
  context.analysis || (context.analysis = {});
  context.analysis.finalAnswer = finalText;
- logInputOutput("finalAnswerModule", "output", aiResponse.data);
+ logInputOutput("finalAnswerModule", "output", {
+ data: aiResponse.data,
+ analyzedFiles,
+ });
  console.log(chalk.yellow(`\n\n[FINAL ANSWER]\n${finalText}\n`));
  return {
  query,
  content: finalText,
- data: aiResponse.data,
+ data: {
+ response: aiResponse.data,
+ analyzedFiles,
+ },
  context,
  };
  },
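After this change, `finalAnswerModule` prefixes its answer with an "Analyzed files:" list and returns `data` as an object `{ response, analyzedFiles }` rather than the raw model output, so any caller that read `data` directly now has to read `data.response`. The sketch below reproduces that output shape with the shared header factored out of the two ternary branches. `formatFinalAnswer` is a hypothetical name used only for illustration; scai does not export it, and the sample inputs are made up.

```javascript
// Hypothetical sketch of the output shape introduced in this diff.
// formatFinalAnswer is an illustrative name, not part of scai's API.
function formatFinalAnswer(responseData, analyzedFiles) {
  // Same header both branches of the diff's ternary build inline.
  const header = `Analyzed files:\n${analyzedFiles.join("\n")}\n\n`;
  // Strings are used as-is; anything else is pretty-printed as JSON.
  const body = typeof responseData === "string"
    ? responseData
    : JSON.stringify(responseData, null, 2);
  return {
    content: header + body,
    // `data` is now an object wrapping the raw response and the file list.
    data: { response: responseData, analyzedFiles },
  };
}

const result = formatFinalAnswer({ summary: "ok" }, ["src/a.js", "src/b.js"]);
console.log(result.content);
console.log(result.data.analyzedFiles);
```

Factoring the header out keeps the string and non-string branches from drifting apart if the header format changes later.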
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "scai",
- "version": "0.1.164",
+ "version": "0.1.166",
  "type": "module",
  "bin": {
  "scai": "./dist/index.js"