scai 0.1.164 → 0.1.166
This diff compares the published contents of two package versions as they appear in their public registry. It is provided for informational purposes only.
- package/README.md +203 -133
- package/dist/agents/reasonNextTaskStep.js +45 -0
- package/dist/db/fileIndex.js +91 -146
- package/dist/pipeline/modules/finalAnswerModule.js +16 -4
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,98 +1,196 @@
|
|
|
1
1
|
# ⚙️ SCAI — Source Code AI 🌿
|
|
2
2
|
|
|
3
|
-
> **
|
|
4
|
-
> **100% local • No token
|
|
3
|
+
> **A local-first AI CLI for understanding, querying, and iterating on large codebases.**
|
|
4
|
+
> **100% local • No token costs • No cloud • No prompt injection • Private by design**
|
|
5
5
|
|
|
6
6
|
🔗 **Website:** [https://scai.dk](https://scai.dk)
|
|
7
|
+
🇪🇺 Built in Denmark / EU
|
|
7
8
|
|
|
8
|
-
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## What is SCAI?
|
|
12
|
+
|
|
13
|
+
**SCAI** is an AI-powered command-line tool that helps developers explore and reason about source code using **local large language models only**.
|
|
14
|
+
|
|
15
|
+
Inspired by tools such as *Claude Code* and *Gemini CLI*, SCAI is designed to feel like a natural extension of the terminal. It enables natural-language interaction with your codebase while deliberately avoiding cloud dependencies and network-connected agents.
|
|
16
|
+
|
|
17
|
+
SCAI runs entirely on your local system:
|
|
18
|
+
|
|
19
|
+
* **No token costs** — no usage-based pricing
|
|
20
|
+
* **No internet access for agents**
|
|
21
|
+
* **No prompt injection from web content**
|
|
22
|
+
* No external AI APIs
|
|
23
|
+
* No telemetry or tracking
|
|
24
|
+
* No API keys
|
|
25
|
+
|
|
26
|
+
Your code never leaves your machine. All analysis and reasoning happens locally.
|
|
27
|
+
|
|
28
|
+
> **Local model tradeoff**
|
|
29
|
+
> SCAI uses local LLMs. Output quality depends on your hardware and selected model. Cloud-hosted systems may perform better on general reasoning tasks, but SCAI prioritizes privacy, predictability, and control.
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## ⚠️ Alpha Status
|
|
34
|
+
|
|
35
|
+
SCAI is currently in **alpha**.
|
|
36
|
+
|
|
37
|
+
If you have previously installed SCAI, reset the local database before upgrading:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
scai db reset
|
|
41
|
+
scai index start
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Breaking changes and evolving behavior should be expected.
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
## Why SCAI?
|
|
49
|
+
|
|
50
|
+
### 🔐 Local-Only by Design
|
|
51
|
+
|
|
52
|
+
SCAI agents operate **entirely offline**.
|
|
53
|
+
|
|
54
|
+
They do not:
|
|
55
|
+
|
|
56
|
+
* Browse the web
|
|
57
|
+
* Fetch URLs
|
|
58
|
+
* Ingest external documents
|
|
59
|
+
* Execute remote prompts
|
|
9
60
|
|
|
10
|
-
**
|
|
61
|
+
**Security implications:**
|
|
11
62
|
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
>
|
|
19
|
-
> before using this version.
|
|
63
|
+
* No prompt injection via web content
|
|
64
|
+
* No data exfiltration
|
|
65
|
+
* No hidden network calls
|
|
66
|
+
* Fully auditable execution
|
|
67
|
+
|
|
68
|
+
This makes SCAI suitable for **private repositories, regulated environments, and GDPR-compliant workflows**.
|
|
20
69
|
|
|
21
70
|
---
|
|
22
71
|
|
|
23
|
-
|
|
72
|
+
### 🧠 Codebase-Aware Analysis
|
|
73
|
+
|
|
74
|
+
SCAI builds and maintains a structured internal representation of your repository using:
|
|
75
|
+
|
|
76
|
+
* Language-aware parsing
|
|
77
|
+
* Symbol and dependency indexing
|
|
78
|
+
* Static and heuristic analysis
|
|
79
|
+
* Cross-file context tracking
|
|
80
|
+
|
|
81
|
+
This enables repository-level questions that go beyond single-file inspection.
|
|
24
82
|
|
|
25
|
-
|
|
83
|
+
---
|
|
26
84
|
|
|
27
|
-
###
|
|
85
|
+
### ✂️ Assisted Code Iteration (Early)
|
|
28
86
|
|
|
29
|
-
|
|
87
|
+
SCAI can assist with **lightweight, example-driven code iteration**, primarily focused on understanding and improving existing code rather than large-scale automated refactoring.
|
|
30
88
|
|
|
31
|
-
|
|
89
|
+
Current strengths include:
|
|
32
90
|
|
|
33
|
-
|
|
91
|
+
* Explaining what functions, files, or modules do
|
|
92
|
+
* Identifying patterns and responsibilities across files
|
|
93
|
+
* Generating or improving comments and documentation
|
|
94
|
+
* Highlighting structural or readability issues
|
|
95
|
+
* Suggesting small, localized improvements
|
|
34
96
|
|
|
35
|
-
|
|
97
|
+
Changes are **guided by indexed context and user prompts**, and are intended to support human review and decision-making.
|
|
36
98
|
|
|
37
|
-
|
|
99
|
+
Large-scale or fully automated repository-wide refactoring should currently be considered **experimental**.
|
|
38
100
|
|
|
39
|
-
|
|
101
|
+
---
|
|
40
102
|
|
|
41
|
-
|
|
103
|
+
### 🛠 Built for Developer Workflows
|
|
42
104
|
|
|
43
|
-
|
|
105
|
+
SCAI is a **terminal-native tool** designed to integrate cleanly into daily development:
|
|
44
106
|
|
|
45
|
-
|
|
107
|
+
* Natural-language queries over your codebase
|
|
108
|
+
* Code understanding and exploration
|
|
109
|
+
* Assisted iteration and suggestions
|
|
110
|
+
* Commit message generation
|
|
111
|
+
* Background indexing and analysis
|
|
112
|
+
* Interactive REPL
|
|
46
113
|
|
|
47
|
-
|
|
114
|
+
No browser UI. No cloud login. No vendor lock-in.
|
|
48
115
|
|
|
49
116
|
---
|
|
50
117
|
|
|
51
|
-
|
|
118
|
+
### 🇪🇺 Privacy & Compliance First
|
|
52
119
|
|
|
53
|
-
|
|
120
|
+
* Fully local execution
|
|
121
|
+
* No telemetry
|
|
122
|
+
* No cloud services
|
|
123
|
+
* Developed in Denmark / EU
|
|
124
|
+
* GDPR-friendly by default
|
|
125
|
+
|
|
126
|
+
---
|
|
54
127
|
|
|
55
|
-
|
|
56
|
-
|
|
128
|
+
## Language Support
|
|
129
|
+
|
|
130
|
+
SCAI is currently **tested and supported** for:
|
|
131
|
+
|
|
132
|
+
* **JavaScript**
|
|
133
|
+
* **TypeScript**
|
|
57
134
|
* **Java**
|
|
58
135
|
|
|
59
|
-
Other languages may work partially, but
|
|
136
|
+
Other languages may work partially, but indexing quality, analysis accuracy, and agent behavior are **not guaranteed**.
|
|
137
|
+
|
|
138
|
+
SCAI should currently be considered **JS / TS / Java-first**.
|
|
60
139
|
|
|
61
140
|
---
|
|
62
141
|
|
|
63
|
-
##
|
|
142
|
+
## Getting Started
|
|
64
143
|
|
|
65
|
-
###
|
|
144
|
+
### Install & Initialize
|
|
66
145
|
|
|
67
|
-
bash
|
|
146
|
+
```bash
|
|
68
147
|
npm install -g scai
|
|
69
148
|
scai init
|
|
70
149
|
scai index start
|
|
150
|
+
```
|
|
71
151
|
|
|
152
|
+
This:
|
|
72
153
|
|
|
73
|
-
|
|
154
|
+
* Initializes local configuration
|
|
155
|
+
* Starts the background daemon
|
|
156
|
+
* Begins indexing the current repository
|
|
74
157
|
|
|
75
|
-
>
|
|
158
|
+
> **Note**
|
|
159
|
+
> Initial indexing can take **minutes to hours**, depending on repository size and enabled analysis.
|
|
76
160
|
|
|
77
|
-
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
### Starting SCAI
|
|
164
|
+
|
|
165
|
+
Running the `scai` command with no arguments starts the interactive shell:
|
|
78
166
|
|
|
79
167
|
```bash
|
|
80
|
-
scai
|
|
168
|
+
scai
|
|
81
169
|
```
|
|
82
170
|
|
|
83
|
-
|
|
171
|
+
You can also start it explicitly:
|
|
172
|
+
|
|
173
|
+
```bash
|
|
174
|
+
scai shell
|
|
175
|
+
```
|
|
84
176
|
|
|
85
|
-
|
|
177
|
+
---
|
|
86
178
|
|
|
87
|
-
|
|
179
|
+
### View Available Commands
|
|
88
180
|
|
|
89
181
|
```bash
|
|
90
|
-
scai
|
|
182
|
+
scai --help
|
|
91
183
|
```
|
|
92
184
|
|
|
93
|
-
|
|
185
|
+
---
|
|
186
|
+
|
|
187
|
+
## Interactive REPL
|
|
188
|
+
|
|
189
|
+
The REPL is the primary interface for working with SCAI.
|
|
190
|
+
|
|
191
|
+
### Ask questions about your codebase
|
|
94
192
|
|
|
95
|
-
|
|
193
|
+
Be specific for better results.
|
|
96
194
|
|
|
97
195
|
```text
|
|
98
196
|
scai> what does withContext function do in index.ts file?
|
|
@@ -103,32 +201,28 @@ scai> Where are all the database queries defined?
|
|
|
103
201
|
scai> List files involved in authentication
|
|
104
202
|
```
|
|
105
203
|
|
|
106
|
-
### Run CLI commands
|
|
204
|
+
### Run CLI commands inside the REPL
|
|
107
205
|
|
|
108
206
|
```text
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
scai> /index switch
|
|
113
|
-
scai> /index delete
|
|
207
|
+
/index list
|
|
208
|
+
/index switch
|
|
209
|
+
/git commit
|
|
114
210
|
```
|
|
115
211
|
|
|
116
212
|
### Execute shell commands
|
|
117
213
|
|
|
118
214
|
```text
|
|
119
|
-
|
|
120
|
-
|
|
215
|
+
!git status
|
|
216
|
+
!ls -la
|
|
121
217
|
```
|
|
122
218
|
|
|
123
|
-
|
|
219
|
+
All interactions remain **offline and free**, with **no token usage**.
|
|
124
220
|
|
|
125
221
|
---
|
|
126
222
|
|
|
127
|
-
##
|
|
223
|
+
## Repository Indexing
|
|
128
224
|
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
### Common Index Commands
|
|
225
|
+
Repositories must be indexed before querying:
|
|
132
226
|
|
|
133
227
|
```bash
|
|
134
228
|
scai index set /path/to/repo
|
|
@@ -138,47 +232,43 @@ scai index switch
|
|
|
138
232
|
scai index delete
|
|
139
233
|
```
|
|
140
234
|
|
|
141
|
-
Only indexed repositories
|
|
235
|
+
Only indexed repositories are accessible to agents.
|
|
142
236
|
|
|
143
237
|
---
|
|
144
238
|
|
|
145
|
-
##
|
|
146
|
-
|
|
147
|
-
SCAI performs **deep repository indexing and static analysis** using background workers. This includes:
|
|
239
|
+
## Background Analysis (Daemon)
|
|
148
240
|
|
|
149
|
-
|
|
150
|
-
* Language-aware parsing (JS / TS / Java)
|
|
151
|
-
* Symbol and dependency mapping
|
|
152
|
-
* Heuristic analysis for tests, architecture, and patterns
|
|
241
|
+
SCAI performs deep analysis in the background, including:
|
|
153
242
|
|
|
154
|
-
|
|
243
|
+
* File discovery
|
|
244
|
+
* AST parsing
|
|
245
|
+
* Dependency graph construction
|
|
246
|
+
* Symbol resolution
|
|
247
|
+
* Heuristic structure analysis
|
|
155
248
|
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
### Daemon Commands
|
|
249
|
+
Daemon control:
|
|
159
250
|
|
|
160
251
|
```bash
|
|
161
252
|
scai daemon start
|
|
162
253
|
scai daemon stop
|
|
163
254
|
scai daemon restart
|
|
164
255
|
scai daemon status
|
|
165
|
-
scai daemon unlock
|
|
166
256
|
scai daemon logs
|
|
167
257
|
```
|
|
168
258
|
|
|
169
|
-
|
|
259
|
+
Indexing progress resumes automatically after restart.
|
|
170
260
|
|
|
171
261
|
---
|
|
172
262
|
|
|
173
|
-
##
|
|
263
|
+
## Configuration
|
|
174
264
|
|
|
175
|
-
Set
|
|
265
|
+
Set a local model (recommended):
|
|
176
266
|
|
|
177
267
|
```bash
|
|
178
268
|
scai config set-model qwen3-coder:30b
|
|
179
269
|
```
|
|
180
270
|
|
|
181
|
-
View
|
|
271
|
+
View configuration:
|
|
182
272
|
|
|
183
273
|
```bash
|
|
184
274
|
scai config show --raw
|
|
@@ -186,22 +276,22 @@ scai config show --raw
|
|
|
186
276
|
|
|
187
277
|
---
|
|
188
278
|
|
|
189
|
-
##
|
|
279
|
+
## Git Commit Assistant
|
|
190
280
|
|
|
191
|
-
Generate
|
|
281
|
+
Generate commit messages from staged changes:
|
|
192
282
|
|
|
193
283
|
```bash
|
|
194
284
|
git add .
|
|
195
285
|
scai git commit
|
|
196
286
|
```
|
|
197
287
|
|
|
198
|
-
All
|
|
288
|
+
All diff inspection and reasoning is performed locally.
|
|
199
289
|
|
|
200
290
|
---
|
|
201
291
|
|
|
202
|
-
##
|
|
292
|
+
## GitHub Authentication
|
|
203
293
|
|
|
204
|
-
|
|
294
|
+
Required only for GitHub-related features:
|
|
205
295
|
|
|
206
296
|
```bash
|
|
207
297
|
scai auth set
|
|
@@ -211,98 +301,78 @@ scai auth reset
|
|
|
211
301
|
|
|
212
302
|
---
|
|
213
303
|
|
|
214
|
-
##
|
|
215
|
-
|
|
216
|
-
* `Summarize codeTransform.js`
|
|
217
|
-
* `Explain utils/helpers.ts architecture`
|
|
218
|
-
* `List all functions without tests in services/`
|
|
219
|
-
* `Show where database queries are defined`
|
|
220
|
-
* `Highlight potential memory leaks`
|
|
221
|
-
* `Describe how authentication works`
|
|
222
|
-
* `Summarize repo architecture`
|
|
223
|
-
|
|
224
|
-
---
|
|
225
|
-
|
|
226
|
-
## 🔐 Privacy & GDPR
|
|
304
|
+
## Privacy & Security Summary
|
|
227
305
|
|
|
228
|
-
*
|
|
306
|
+
* 100% local execution
|
|
307
|
+
* No internet access for agents
|
|
308
|
+
* No prompt injection from web content
|
|
229
309
|
* No API keys
|
|
230
|
-
*
|
|
231
|
-
* GDPR-friendly
|
|
310
|
+
* No token costs
|
|
311
|
+
* GDPR-friendly by default
|
|
232
312
|
|
|
233
313
|
---
|
|
234
314
|
|
|
235
|
-
##
|
|
236
|
-
|
|
237
|
-
Feedback, bugs, and ideas are very welcome:
|
|
315
|
+
## Feedback & Community
|
|
238
316
|
|
|
239
|
-
* 🌍
|
|
240
|
-
* 🧵
|
|
241
|
-
|
|
242
|
-
<br>
|
|
317
|
+
* 🌍 [https://scai.dk](https://scai.dk)
|
|
318
|
+
* 🧵 [https://threads.net/@scai.dk](https://threads.net/@scai.dk)
|
|
243
319
|
|
|
244
320
|
---
|
|
245
321
|
|
|
246
|
-
|
|
247
|
-
<br>
|
|
248
|
-
|
|
249
|
-
## 🔐 License & Usage Terms
|
|
322
|
+
# License & Usage Terms
|
|
250
323
|
|
|
251
|
-
|
|
252
|
-
All rights reserved.
|
|
324
|
+
© SCAI — All rights reserved.
|
|
253
325
|
|
|
254
|
-
SCAI is **free
|
|
326
|
+
SCAI is **free for non-commercial use only**.
|
|
255
327
|
|
|
256
328
|
---
|
|
257
329
|
|
|
258
|
-
##
|
|
330
|
+
## Permitted Use
|
|
259
331
|
|
|
260
|
-
You may use SCAI
|
|
332
|
+
You may use SCAI free of charge for:
|
|
261
333
|
|
|
262
334
|
* Personal projects
|
|
263
335
|
* Educational use
|
|
264
336
|
* Research and experimentation
|
|
265
|
-
* Non-commercial open-source
|
|
266
|
-
* Internal evaluation or proof-of-
|
|
337
|
+
* Non-commercial open-source work
|
|
338
|
+
* Internal evaluation or proof-of-concepts
|
|
267
339
|
|
|
268
|
-
You may
|
|
340
|
+
You may fork and modify the source code **for non-commercial purposes only**.
|
|
269
341
|
|
|
270
342
|
---
|
|
271
343
|
|
|
272
|
-
##
|
|
344
|
+
## Restricted Use
|
|
273
345
|
|
|
274
|
-
The following
|
|
346
|
+
The following require a **commercial license**:
|
|
275
347
|
|
|
276
|
-
*
|
|
277
|
-
*
|
|
278
|
-
*
|
|
279
|
-
*
|
|
280
|
-
*
|
|
281
|
-
* Resale, sublicensing, or redistribution for commercial purposes
|
|
348
|
+
* Any commercial or enterprise use
|
|
349
|
+
* Consultancy or client work
|
|
350
|
+
* Paid products or services
|
|
351
|
+
* Internal enterprise tooling
|
|
352
|
+
* Commercial redistribution or resale
|
|
282
353
|
|
|
283
354
|
---
|
|
284
355
|
|
|
285
|
-
##
|
|
356
|
+
## Commercial Licensing
|
|
286
357
|
|
|
287
|
-
|
|
358
|
+
Commercial and enterprise use requires a **paid license** and explicit permission from the author.
|
|
288
359
|
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
Please contact the author to discuss commercial licensing terms.
|
|
360
|
+
Please contact the author to discuss licensing terms.
|
|
292
361
|
|
|
293
362
|
---
|
|
294
363
|
|
|
295
|
-
##
|
|
364
|
+
## Disclaimer
|
|
296
365
|
|
|
297
|
-
This software is provided
|
|
366
|
+
This software is provided **“as is”**, without warranty of any kind.
|
|
298
367
|
|
|
299
|
-
|
|
368
|
+
The author is not liable for any damages arising from its use.
|
|
300
369
|
|
|
301
370
|
---
|
|
302
371
|
|
|
303
|
-
###
|
|
372
|
+
### Non-Binding Summary
|
|
304
373
|
|
|
305
374
|
* Free for personal and non-commercial use
|
|
306
|
-
*
|
|
307
|
-
*
|
|
308
|
-
|
|
375
|
+
* Fully local, offline AI
|
|
376
|
+
* No token costs
|
|
377
|
+
* No prompt injection surface
|
|
378
|
+
* Commercial use requires a license
|
|
package/dist/agents/reasonNextTaskStep.js
CHANGED
|
@@ -1,3 +1,6 @@
|
|
|
1
|
+
// File: src/agents/reasonNextTaskStep.ts
|
|
2
|
+
import { generate } from "../lib/generate.js";
|
|
3
|
+
import { cleanupModule } from "../pipeline/modules/cleanupModule.js";
|
|
1
4
|
import { logInputOutput } from "../utils/promptLogHelper.js";
|
|
2
5
|
/**
|
|
3
6
|
* REASON NEXT TASK STEP
|
|
@@ -95,6 +98,48 @@ export const reasonNextTaskStep = {
|
|
|
95
98
|
confidence = 0.98;
|
|
96
99
|
}
|
|
97
100
|
// ---------------------------
|
|
101
|
+
// 6.5️⃣ Optional: Reason over known risks
|
|
102
|
+
// ---------------------------
|
|
103
|
+
const knownRisks = context.analysis.understanding?.risks ?? [];
|
|
104
|
+
if (knownRisks.length > 0) {
|
|
105
|
+
// Optionally call the LLM with constrained instructions
|
|
106
|
+
const riskPrompt = `
|
|
107
|
+
You are given the following KNOWN RISKS (authoritative, do not invent new ones):
|
|
108
|
+
${knownRisks.map(r => "- " + r).join("\n")}
|
|
109
|
+
|
|
110
|
+
Task:
|
|
111
|
+
- Decide whether it is reasonable to ask the user for clarification before proceeding.
|
|
112
|
+
- Return STRICT JSON: { askUser: true|false, rationale: string }
|
|
113
|
+
`;
|
|
114
|
+
try {
|
|
115
|
+
const aiResponse = await generate({
|
|
116
|
+
query: context.initContext?.userQuery ?? "",
|
|
117
|
+
content: riskPrompt
|
|
118
|
+
});
|
|
119
|
+
const cleaned = await cleanupModule.run({
|
|
120
|
+
query: context.initContext?.userQuery ?? "",
|
|
121
|
+
content: aiResponse.data ?? ""
|
|
122
|
+
});
|
|
123
|
+
const parsed = cleaned.data;
|
|
124
|
+
// type guard
|
|
125
|
+
if (parsed &&
|
|
126
|
+
typeof parsed === "object" &&
|
|
127
|
+
"askUser" in parsed &&
|
|
128
|
+
"rationale" in parsed &&
|
|
129
|
+
typeof parsed.rationale === "string") {
|
|
130
|
+
if (parsed.askUser) {
|
|
131
|
+
nextAction = "request-feedback";
|
|
132
|
+
rationale += `\nUser clarification recommended due to known risks: ${parsed.rationale}`;
|
|
133
|
+
confidence = Math.min(confidence, 0.8); // slightly lower because human needed
|
|
134
|
+
}
|
|
135
|
+
}
|
|
136
|
+
}
|
|
137
|
+
catch (err) {
|
|
138
|
+
console.warn("[reasonNextTaskStep] Risk reasoning failed", err);
|
|
139
|
+
// fallback: ignore, keep deterministic nextAction
|
|
140
|
+
}
|
|
141
|
+
}
|
|
142
|
+
// ---------------------------
|
|
98
143
|
// 7️⃣ Ensure a TaskStep exists for nextFile
|
|
99
144
|
// ---------------------------
|
|
100
145
|
if (nextFile) {
|
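The type guard added in this hunk checks that `askUser` is present but not that it is a boolean, so a string such as `"false"` would still count as truthy. The sketch below shows one way the decision could be validated more strictly before it changes `nextAction`. `parseRiskDecision` and `applyRiskDecision` are hypothetical helpers written for illustration. They are not part of the package, and the shape returned by `cleanupModule` is an assumption.

```javascript
// Hypothetical helper (not in the package): accept either a parsed object
// or a JSON string, and return null unless the shape is exactly right.
function parseRiskDecision(raw) {
  let value = raw;
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      return null; // not valid JSON
    }
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return null;
  }
  if (typeof value.askUser !== "boolean") return null; // rejects "false", 1, etc.
  if (typeof value.rationale !== "string") return null;
  return { askUser: value.askUser, rationale: value.rationale };
}

// Mirrors the update in the diff: request feedback and cap confidence at 0.8.
// Returns the state unchanged when there is no valid "ask the user" decision.
function applyRiskDecision(state, decision) {
  if (!decision || !decision.askUser) return state;
  return {
    nextAction: "request-feedback",
    rationale:
      `${state.rationale}\nUser clarification recommended due to known risks: ${decision.rationale}`,
    confidence: Math.min(state.confidence, 0.8),
  };
}
```

Keeping the validation in a pure function makes the fallback path (ignore the model output, keep the deterministic `nextAction`) easy to test without calling a model.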
package/dist/db/fileIndex.js
CHANGED
|
@@ -11,12 +11,15 @@ import { IGNORED_FOLDER_GLOBS } from '../fileRules/ignoredPaths.js';
|
|
|
11
11
|
import { Config } from '../config.js';
|
|
12
12
|
import { log } from '../utils/log.js';
|
|
13
13
|
import { startDaemon } from '../commands/DaemonCmd.js';
|
|
14
|
-
import { sanitizeQueryForFts } from '../utils/sanitizeQuery.js';
|
|
15
14
|
import * as sqlTemplates from '../db/sqlTemplates.js';
|
|
16
15
|
import { RELATED_FILES_LIMIT } from '../constants.js';
|
|
17
16
|
import { generate } from '../lib/generate.js';
|
|
18
|
-
import { cleanupModule } from '../pipeline/modules/cleanupModule.js';
|
|
19
17
|
import { logInputOutput } from '../utils/promptLogHelper.js';
|
|
18
|
+
import { sanitizeQueryForFts } from '../utils/sanitizeQuery.js';
|
|
19
|
+
import { extractTaggedContent } from '../utils/parseTaggedContent.js';
|
|
20
|
+
/* -------------------------------------------------- */
|
|
21
|
+
/* DB LOCK */
|
|
22
|
+
/* -------------------------------------------------- */
|
|
20
23
|
async function lockDb() {
|
|
21
24
|
try {
|
|
22
25
|
return await lockfile.lock(getDbPathForRepo());
|
|
@@ -26,6 +29,9 @@ async function lockDb() {
|
|
|
26
29
|
throw err;
|
|
27
30
|
}
|
|
28
31
|
}
|
|
32
|
+
/* -------------------------------------------------- */
|
|
33
|
+
/* INDEX COMMAND */
|
|
34
|
+
/* -------------------------------------------------- */
|
|
29
35
|
export async function runIndexCommand() {
|
|
30
36
|
try {
|
|
31
37
|
initSchema();
|
|
@@ -57,9 +63,6 @@ export async function runIndexCommand() {
|
|
|
57
63
|
const type = detectFileType(file);
|
|
58
64
|
const normalizedPath = path.normalize(file).replace(/\\/g, '/');
|
|
59
65
|
const filename = path.basename(normalizedPath);
|
|
60
|
-
// --------------------------------------------------
|
|
61
|
-
// Enqueue file for daemon processing
|
|
62
|
-
// --------------------------------------------------
|
|
63
66
|
db.prepare(upsertFileTemplate).run({
|
|
64
67
|
path: normalizedPath,
|
|
65
68
|
filename,
|
|
@@ -73,7 +76,7 @@ export async function runIndexCommand() {
|
|
|
73
76
|
count++;
|
|
74
77
|
}
|
|
75
78
|
catch (err) {
|
|
76
|
-
log(`⚠️ Skipped in indexCmd ${file}: ${err
|
|
79
|
+
log(`⚠️ Skipped in indexCmd ${file}: ${String(err)}`);
|
|
77
80
|
}
|
|
78
81
|
}
|
|
79
82
|
}
|
|
@@ -82,110 +85,82 @@ export async function runIndexCommand() {
|
|
|
82
85
|
}
|
|
83
86
|
log('📊 Discovered files by extension:', JSON.stringify(countByExt, null, 2));
|
|
84
87
|
log(`✅ Done. Enqueued ${count} files for indexing.`);
|
|
85
|
-
// Kick the daemon — it now owns all processing
|
|
86
88
|
startDaemon();
|
|
87
89
|
}
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
90
|
+
/* -------------------------------------------------- */
|
|
91
|
+
/* QUERY API */
|
|
92
|
+
/* -------------------------------------------------- */
|
|
91
93
|
export function queryFiles(safeQuery, limit = 10) {
|
|
92
94
|
const db = getDbForRepo();
|
|
93
95
|
return db
|
|
94
96
|
.prepare(sqlTemplates.queryFilesTemplate)
|
|
95
97
|
.all(safeQuery, limit);
|
|
96
98
|
}
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
// - Optimizes for precision
|
|
102
|
-
// --------------------------------------------------
|
|
103
|
-
export async function semanticSearchFiles(originalQuery, _query, // ignored now – LLM owns query construction
|
|
104
|
-
topK = 5) {
|
|
99
|
+
/* -------------------------------------------------- */
|
|
100
|
+
/* SEMANTIC SEARCH */
|
|
101
|
+
/* -------------------------------------------------- */
|
|
102
|
+
export async function semanticSearchFiles(originalQuery, _query, topK = 5) {
|
|
105
103
|
const db = getDbForRepo();
|
|
106
|
-
// --------------------------------------------------
|
|
107
|
-
// 1. LLM → primary FTS query (always)
|
|
108
|
-
// --------------------------------------------------
|
|
109
104
|
const primaryFtsQuery = await generatePrimaryFtsQuery(originalQuery);
|
|
110
105
|
logInputOutput("semanticSearchFiles LLM primary query", "output", {
|
|
111
106
|
originalQuery,
|
|
112
107
|
ftsQuery: primaryFtsQuery,
|
|
113
108
|
});
|
|
114
|
-
// --------------------------------------------------
|
|
115
|
-
// 2. Run primary FTS once
|
|
116
|
-
// --------------------------------------------------
|
|
117
109
|
const primaryResults = db
|
|
118
110
|
.prepare(sqlTemplates.searchFilesTemplate)
|
|
119
111
|
.all(primaryFtsQuery, RELATED_FILES_LIMIT);
|
|
120
112
|
if (primaryResults.length > 0) {
|
|
121
113
|
return rankAndMap(new Map(primaryResults.map(r => [r.id, r])), topK);
|
|
122
114
|
}
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
// --------------------------------------------------
|
|
126
|
-
const subQueries = await generateFallbackFtsQueries(originalQuery, primaryFtsQuery);
|
|
127
|
-
logInputOutput("semanticSearchFiles LLM fallback queries", "output", {
|
|
115
|
+
const fallbackQuery = await generateFallbackFtsQueries(originalQuery, primaryFtsQuery);
|
|
116
|
+
logInputOutput("semanticSearchFiles LLM fallback query", "output", {
|
|
128
117
|
originalQuery,
|
|
129
118
|
primaryFtsQuery,
|
|
130
|
-
|
|
119
|
+
fallbackQuery,
|
|
131
120
|
});
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
if (rows.length > 0) {
|
|
140
|
-
return rankAndMap(new Map(rows.map(r => [r.id, r])), topK);
|
|
121
|
+
if (fallbackQuery && fallbackQuery.length > 0) {
|
|
122
|
+
const stmt = db.prepare(sqlTemplates.searchFilesTemplate);
|
|
123
|
+
for (const query of fallbackQuery) {
|
|
124
|
+
const rows = stmt.all(query, RELATED_FILES_LIMIT);
|
|
125
|
+
if (rows.length > 0) {
|
|
126
|
+
return rankAndMap(new Map(rows.map(r => [r.id, r])), topK);
|
|
127
|
+
}
|
|
141
128
|
}
|
|
142
129
|
}
|
|
143
|
-
// --------------------------------------------------
|
|
144
|
-
// 5. Hard stop
|
|
145
|
-
// --------------------------------------------------
|
|
146
130
|
return [];
|
|
147
131
|
}
|
|
132
|
+
/* -------------------------------------------------- */
|
|
133
|
+
/* LLM → FTS QUERY GENERATION (TAG-BASED) */
|
|
134
|
+
/* -------------------------------------------------- */
|
|
148
135
|
async function generatePrimaryFtsQuery(userQuery) {
|
|
149
136
|
const prompt = `
|
|
150
|
-
|
|
137
|
+
Generate a SQLite FTS query for searching a source code repository.
|
|
151
138
|
|
|
152
|
-
Input
|
|
139
|
+
Input:
|
|
153
140
|
"${userQuery}"
|
|
154
141
|
|
|
155
|
-
|
|
156
|
-
-
|
|
157
|
-
- Focus on filenames, symbols, module names, domain nouns
|
|
158
|
-
- Prefer literal identifiers likely to exist in code
|
|
159
|
-
- NO sentences
|
|
160
|
-
- NO stopwords
|
|
161
|
-
- NO explanations
|
|
162
|
-
- NO wildcards unless absolutely necessary
|
|
142
|
+
Rules:
|
|
143
|
+
- Output ONLY the query terms
|
|
163
144
|
- Use OR between terms
|
|
164
|
-
-
|
|
145
|
+
- Max 10 terms
|
|
146
|
+
- No explanations
|
|
147
|
+
- No sentences
|
|
165
148
|
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
149
|
+
Wrap the result in <FILE_CONTENT> tags.
|
|
150
|
+
|
|
151
|
+
<FILE_CONTENT>
|
|
152
|
+
term1 OR term2 OR term3
|
|
153
|
+
</FILE_CONTENT>
|
|
170
154
|
`.trim();
|
|
171
155
|
try {
|
|
172
156
|
const response = await generate({ content: prompt, query: "" });
|
|
173
|
-
const
|
|
174
|
-
|
|
175
|
-
|
|
176
|
-
});
|
|
177
|
-
if (cleaned.data &&
|
|
178
|
-
typeof cleaned.data === "object" &&
|
|
179
|
-
"ftsQuery" in cleaned.data &&
|
|
180
|
-
typeof cleaned.data.ftsQuery === "string") {
|
|
181
|
-
return cleaned.data.ftsQuery;
|
|
182
|
-
}
|
|
157
|
+
const rawText = String(response.data ?? "");
|
|
158
|
+
const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
|
|
159
|
+
return sanitizeQueryForFts(content);
|
|
183
160
|
}
|
|
184
161
|
catch (err) {
|
|
185
|
-
|
|
162
|
+
return sanitizeQueryForFts(userQuery);
|
|
186
163
|
}
|
|
187
|
-
// Absolute safety fallback — never explode
|
|
188
|
-
return sanitizeQueryForFts(userQuery);
|
|
189
164
|
}
|
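The new version asks the model to wrap its answer in `<FILE_CONTENT>` tags instead of returning JSON. The diff does not include the source of `extractTaggedContent`, so the following is only a sketch of how tag-based extraction of this kind might work. `extractTaggedSketch` is a hypothetical stand-in, and returning an empty string when the tags are missing is an assumption rather than the package's documented behavior.

```javascript
// Hypothetical stand-in for the package's extractTaggedContent helper.
// Returns the trimmed text between <TAG> and </TAG>, ignoring any chatter
// the model emits before or after the tagged block.
function extractTaggedSketch(text, tag) {
  // Escape regex metacharacters in the tag name before building the pattern.
  const escaped = tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`<${escaped}>([\\s\\S]*?)</${escaped}>`, "i");
  const match = pattern.exec(text);
  return match
    ? { content: match[1].trim(), found: true }
    : { content: "", found: false };
}

const reply = [
  "Sure, here is the query:",
  "<FILE_CONTENT>",
  "auth OR login OR session",
  "</FILE_CONTENT>",
  "Let me know if you need more.",
].join("\n");

const extracted = extractTaggedSketch(reply, "FILE_CONTENT");
```

Compared with strict JSON output, a tagged block tends to survive the extra prose that small local models often add around their answer.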
|
190
165
|
async function generateFallbackFtsQueries(userQuery, failedQuery) {
|
|
191
166
|
const prompt = `
|
|
@@ -199,57 +174,44 @@ Primary FTS query returned ZERO results:
|
|
|
199
174
|
|
|
200
175
|
Task:
|
|
201
176
|
- Generate 2–3 independent FTS queries (MAX 3)
|
|
202
|
-
- Each query
|
|
177
|
+
- Each query must be a single OR-joined expression
|
|
178
|
+
- Max 10 terms per query
|
|
203
179
|
- Focus on filenames, symbols, module names
|
|
204
|
-
- Avoid natural
|
|
205
|
-
- Avoid
|
|
206
|
-
- Use OR between terms
|
|
180
|
+
- Avoid natural language sentences
|
|
181
|
+
- Avoid explanations or commentary
|
|
207
182
|
|
|
208
|
-
Output
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
]
|
|
215
|
-
}
|
|
183
|
+
Output format (STRICT):
|
|
184
|
+
<FILE_CONTENT>
|
|
185
|
+
query1
|
|
186
|
+
query2
|
|
187
|
+
query3
|
|
188
|
+
</FILE_CONTENT>
|
|
216
189
|
`.trim();
|
|
217
190
|
try {
|
|
218
191
|
const response = await generate({ content: prompt, query: "" });
|
|
219
|
-
const
|
|
220
|
-
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
.slice(0, 3) // cap to 3 queries
|
|
229
|
-
.map((q) => q
|
|
230
|
-
.split(' OR ')
|
|
231
|
-
.map(term => sanitizeQueryForFts(term)) // sanitize each term individually
|
|
232
|
-
.slice(0, 10) // cap terms per query
|
|
233
|
-
.join(' OR '));
|
|
192
|
+
const rawText = String(response.data ?? "");
|
|
193
|
+
const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
|
|
194
|
+
const subQueries = content
|
|
195
|
+
.split(/\r?\n/)
|
|
196
|
+
.map(q => sanitizeQueryForFts(q.trim()))
|
|
197
|
+
.filter(Boolean)
|
|
198
|
+
.slice(0, 3);
|
|
199
|
+
if (!subQueries.length) {
|
|
200
|
+
throw new Error("No fallback subqueries generated");
|
|
234
201
|
}
|
|
202
|
+
return subQueries;
|
|
235
203
|
}
|
|
236
204
|
catch (err) {
|
|
237
|
-
log(`⚠️ [semanticSearchFiles]
|
|
205
|
+
log(`⚠️ [semanticSearchFiles] Fallback FTS generation failed: ${String(err)}`);
|
|
206
|
+
return null;
|
|
238
207
|
}
|
|
239
|
-
return [];
|
|
240
208
|
}
|
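The previous version capped each fallback query at 10 terms in code. The new version states that limit only in the prompt. A minimal sketch of parsing the newline-separated block while still enforcing both caps could look like this. `sanitizeTermSketch` is a hypothetical stand-in for `sanitizeQueryForFts`, whose real behavior is not shown in the diff.

```javascript
const MAX_QUERIES = 3; // matches the slice(0, 3) in the diff
const MAX_TERMS = 10;  // matches "Max 10 terms per query" in the prompt

// Hypothetical stand-in for sanitizeQueryForFts: keep word characters only.
function sanitizeTermSketch(term) {
  return term.replace(/\W+/g, "");
}

// Split the tagged block into lines, clean each OR-joined query term by term,
// drop empty results, and cap both the term count and the query count.
function parseFallbackQueries(content) {
  return content
    .split(/\r?\n/)
    .map((line) =>
      line
        .split(/\s+OR\s+/i)
        .map(sanitizeTermSketch)
        .filter(Boolean)
        .slice(0, MAX_TERMS)
        .join(" OR ")
    )
    .filter(Boolean)
    .slice(0, MAX_QUERIES);
}

const fallbackQueries = parseFallbackQueries(
  "auth OR login\n\nsession OR token\nuser\nextra"
);
```

Sanitizing per term rather than per line keeps the `OR` operators intact even if the sanitizer strips characters aggressively.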
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
// - LLM ONLY if FTS is empty
|
|
245
|
-
// - Optimizes for recall
|
|
246
|
-
// --------------------------------------------------
|
|
209
|
+
/* -------------------------------------------------- */
|
|
210
|
+
/* PLANNER SEARCH */
|
|
211
|
+
/* -------------------------------------------------- */
|
|
247
212
|
export async function plannerSearchFiles(originalQuery, query, topK = 5) {
|
|
248
213
|
const db = getDbForRepo();
|
|
249
214
|
const seen = new Map();
|
|
250
|
-
// -----------------------------
|
|
251
|
-
// Primary FTS (always trusted)
|
|
252
|
-
// -----------------------------
|
|
253
215
|
const safeQuery = sanitizeQueryForFts(query);
|
|
254
216
|
const primaryResults = db
|
|
255
217
|
.prepare(sqlTemplates.searchFilesTemplate)
|
|
@@ -259,36 +221,31 @@ export async function plannerSearchFiles(originalQuery, query, topK = 5) {
|
|
|
259
221
|
safeQuery,
|
|
260
222
|
count: primaryResults.length,
|
|
261
223
|
});
|
|
262
|
-
// -----------------------------
|
|
263
|
-
// Only call LLM if FTS is empty
|
|
264
|
-
// -----------------------------
|
|
265
224
|
if (primaryResults.length === 0) {
|
|
266
|
-
const
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
suggestedTerms: llmTerms,
|
|
270
|
-
});
|
|
271
|
-
for (const term of llmTerms) {
|
|
272
|
-
const safeTerm = sanitizeQueryForFts(term);
|
|
225
|
+
const expanded = await expandQueryWithModel(originalQuery);
|
|
226
|
+
if (expanded) {
|
|
227
|
+
const safeTerm = sanitizeQueryForFts(expanded);
|
|
273
228
|
const rows = db
|
|
274
229
|
.prepare(sqlTemplates.searchFilesTemplate)
|
|
275
230
|
.all(safeTerm, RELATED_FILES_LIMIT);
|
|
276
|
-
|
|
277
|
-
if (!seen.has(
|
|
278
|
-
seen.set(
|
|
279
|
-
}
|
|
231
|
+
rows.forEach(r => {
|
|
232
|
+
if (!seen.has(r.id))
|
|
233
|
+
seen.set(r.id, r);
|
|
234
|
+
});
|
|
280
235
|
}
|
|
281
236
|
}
|
|
282
237
|
if (seen.size === 0)
|
|
283
238
|
return [];
|
|
284
239
|
return rankAndMap(seen, topK);
|
|
285
240
|
}
|
|
286
|
-
|
|
287
|
-
|
|
288
|
-
|
|
241
|
+
/* -------------------------------------------------- */
|
|
242
|
+
/* HELPERS */
|
|
243
|
+
/* -------------------------------------------------- */
|
|
289
244
|
function rankAndMap(seen, topK) {
|
|
290
|
-
|
|
291
|
-
|
|
245
|
+
return Array.from(seen.values())
|
|
246
|
+
.sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0))
|
|
247
|
+
.slice(0, topK)
|
|
248
|
+
.map(r => ({
|
|
292
249
|
id: r.id,
|
|
293
250
|
path: r.path,
|
|
294
251
|
filename: r.filename,
|
|
@@ -300,32 +257,20 @@ function rankAndMap(seen, topK) {
|
|
|
300
257
|
}
|
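`rankAndMap` sorts ascending by `bm25Score`. That ordering is consistent with SQLite FTS5, where `bm25()` gives better matches numerically lower (more negative) values. The SQL template is not part of this diff, so the source of `bm25Score` is an assumption here. The sketch below reproduces the de-duplicate-then-rank flow on made-up rows. `mergeById` and `rankByBm25` are illustrative names, not package functions.

```javascript
// Keep the first row seen for each id, as plannerSearchFiles does with `seen`.
function mergeById(seen, rows) {
  for (const row of rows) {
    if (!seen.has(row.id)) seen.set(row.id, row);
  }
  return seen;
}

// Ascending sort: the lowest score comes first; a missing score counts as 0.
function rankByBm25(seen, topK) {
  return Array.from(seen.values())
    .sort((a, b) => (a.bm25Score ?? 0) - (b.bm25Score ?? 0))
    .slice(0, topK);
}

// Made-up rows for illustration only.
const seenRows = new Map();
mergeById(seenRows, [
  { id: 1, path: "src/auth.ts", bm25Score: -2.5 },
  { id: 2, path: "src/db.ts", bm25Score: -0.7 },
]);
mergeById(seenRows, [
  { id: 1, path: "src/auth.ts", bm25Score: -1.0 }, // duplicate id, ignored
  { id: 3, path: "src/login.ts", bm25Score: -4.1 },
]);

const topPaths = rankByBm25(seenRows, 2).map((r) => r.path);
// topPaths is ["src/login.ts", "src/auth.ts"]
```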
|
301
258
|
async function expandQueryWithModel(query) {
|
|
302
259
|
const prompt = `
|
|
303
|
-
|
|
304
|
-
|
|
305
|
-
Given a natural-language question about a codebase, return a JSON array
|
|
306
|
-
of 3–8 concrete search terms that are likely to appear literally in source code.
|
|
260
|
+
Return concrete search terms likely to appear in source code.
|
|
307
261
|
|
|
308
|
-
|
|
309
|
-
- Return ONLY a JSON array of strings
|
|
310
|
-
- No explanations
|
|
311
|
-
- Prefer filenames, function names, symbols, library names
|
|
262
|
+
Wrap the result in <FILE_CONTENT> tags.
|
|
312
263
|
|
|
313
264
|
Question:
|
|
314
265
|
"${query}"
|
|
315
266
|
`.trim();
|
|
316
267
|
try {
|
|
317
268
|
const response = await generate({ content: prompt, query: "" });
|
|
318
|
-
const
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
});
|
|
322
|
-
const terms = Array.isArray(cleaned.data)
|
|
323
|
-
? cleaned.data.filter((t) => typeof t === "string")
|
|
324
|
-
: [];
|
|
325
|
-
return terms;
|
|
269
|
+
const rawText = String(response.data ?? "");
|
|
270
|
+
const { content } = extractTaggedContent(rawText, "FILE_CONTENT");
|
|
271
|
+
return sanitizeQueryForFts(content);
|
|
326
272
|
}
|
|
327
|
-
catch
|
|
328
|
-
|
|
329
|
-
return [];
|
|
273
|
+
catch {
|
|
274
|
+
return null;
|
|
330
275
|
}
|
|
331
276
|
}
|
package/dist/pipeline/modules/finalAnswerModule.js
CHANGED
|
@@ -28,6 +28,8 @@ export const finalAnswerModule = {
|
|
|
28
28
|
(!focus?.relevantFiles || focus.relevantFiles.includes(path)))
|
|
29
29
|
.map(([path, fa]) => ({ path, analysis: fa }))
|
|
30
30
|
.slice(0, MAX_FILES);
|
|
31
|
+
// Collect analyzed files for output
|
|
32
|
+
const analyzedFiles = meaningfulFiles.map(f => f.path);
|
|
31
33
|
// --------------------------------------------------
|
|
32
34
|
// 2️⃣ Collect supporting code snippets from working files
|
|
33
35
|
// --------------------------------------------------
|
|
@@ -104,6 +106,9 @@ ${query}
|
|
|
104
106
|
Rationale for focus:
|
|
105
107
|
${rationale}
|
|
106
108
|
|
|
109
|
+
Analyzed files:
|
|
110
|
+
${analyzedFiles.join("\n")}
|
|
111
|
+
|
|
107
112
|
==================== PROPOSED CHANGES ====================
|
|
108
113
|
|
|
109
114
|
${semanticSection}
|
|
@@ -130,17 +135,24 @@ ${codeSection}
|
|
|
130
135
|
// 5️⃣ Generate final answer
|
|
131
136
|
// --------------------------------------------------
|
|
132
137
|
const aiResponse = await generate({ query, content: prompt });
|
|
138
|
+
// ✅ Prepend analyzed files to finalText so user sees them
|
|
133
139
|
const finalText = typeof aiResponse.data === "string"
|
|
134
|
-
? aiResponse.data
|
|
135
|
-
: JSON.stringify(aiResponse.data, null, 2)
|
|
140
|
+
? `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${aiResponse.data}`
|
|
141
|
+
: `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${JSON.stringify(aiResponse.data, null, 2)}`;
|
|
136
142
|
context.analysis || (context.analysis = {});
|
|
137
143
|
context.analysis.finalAnswer = finalText;
|
|
138
|
-
logInputOutput("finalAnswerModule", "output",
|
|
144
|
+
logInputOutput("finalAnswerModule", "output", {
|
|
145
|
+
data: aiResponse.data,
|
|
146
|
+
analyzedFiles,
|
|
147
|
+
});
|
|
139
148
|
console.log(chalk.yellow(`\n\n[FINAL ANSWER]\n${finalText}\n`));
|
|
140
149
|
return {
|
|
141
150
|
query,
|
|
142
151
|
content: finalText,
|
|
143
|
-
data:
|
|
152
|
+
data: {
|
|
153
|
+
response: aiResponse.data,
|
|
154
|
+
analyzedFiles,
|
|
155
|
+
},
|
|
144
156
|
context,
|
|
145
157
|
};
|
|
146
158
|
},
|
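The change above builds the "Analyzed files" header twice, once for string responses and once for JSON responses. As a sketch only, the same output could be produced by a single helper. `buildFinalText` is a hypothetical name and does not exist in the package.

```javascript
// Hypothetical helper: prepend the analyzed-file list to the model response,
// serializing non-string responses the same way the module does.
function buildFinalText(responseData, analyzedFiles) {
  const body =
    typeof responseData === "string"
      ? responseData
      : JSON.stringify(responseData, null, 2);
  return `Analyzed files:\n${analyzedFiles.join("\n")}\n\n${body}`;
}

const exampleText = buildFinalText("No changes needed.", [
  "src/a.js",
  "src/b.js",
]);
```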