opencode-codebase-index 0.2.5 → 0.3.2
- package/README.md +176 -1
- package/commands/find.md +17 -5
- package/commands/index.md +16 -6
- package/commands/search.md +18 -3
- package/commands/status.md +15 -0
- package/dist/index.cjs +971 -286
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +970 -286
- package/dist/index.js.map +1 -1
- package/native/codebase-index-native.darwin-arm64.node +0 -0
- package/native/codebase-index-native.darwin-x64.node +0 -0
- package/native/codebase-index-native.linux-arm64-gnu.node +0 -0
- package/native/codebase-index-native.linux-x64-gnu.node +0 -0
- package/native/codebase-index-native.win32-x64-msvc.node +0 -0
- package/package.json +3 -1
- package/skill/SKILL.md +116 -1
package/README.md
CHANGED
@@ -117,6 +117,8 @@ graph TD
 ```
 
 1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.
+
+**Supported Languages**: TypeScript, JavaScript, Python, Rust, Go, Java, C#, Ruby, Bash, C, C++, JSON, TOML, YAML
 2. **Chunking**: Large blocks are split with overlapping windows to preserve context across chunk boundaries.
 3. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
 4. **Storage**: Embeddings are stored in SQLite (deduplicated by content hash) and vectors in `usearch` with F16 quantization for 50% memory savings. A branch catalog tracks which chunks exist on each branch.
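The chunking step in this hunk ("split with overlapping windows") can be illustrated with a small sketch. The helper name `chunkWithOverlap` and the line-based window are assumptions made for illustration; the diff does not show how the plugin sizes or measures its windows.

```typescript
// Hypothetical helper (not from the package): split a block's lines into
// windows of `size` lines where consecutive windows share `overlap` lines.
function chunkWithOverlap(lines: string[], size: number, overlap: number): string[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap; // how far each window advances
  const chunks: string[][] = [];
  for (let start = 0; start < lines.length; start += step) {
    chunks.push(lines.slice(start, start + size));
    if (start + size >= lines.length) break; // this window already reached the end
  }
  return chunks;
}

// A 7-line block, 4-line windows, 2 lines shared between neighbours.
const block = ["l1", "l2", "l3", "l4", "l5", "l6", "l7"];
for (const chunk of chunkWithOverlap(block, 4, 2)) {
  console.log(chunk.join(" "));
}
```

Each line near a boundary appears in two neighbouring chunks, which is what lets a search hit keep some of its surrounding context.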
@@ -196,6 +198,14 @@ Checks if the index is ready and healthy.
 ### `index_health_check`
 Maintenance tool to remove stale entries from deleted files and orphaned embeddings/chunks from the database.
 
+### `index_metrics`
+Returns collected metrics about indexing and search performance. Requires `debug.enabled` and `debug.metrics` to be `true`.
+- **Metrics include**: Files indexed, chunks created, cache hit rate, search timing breakdown, GC stats, embedding API call stats.
+
+### `index_logs`
+Returns recent debug logs with optional filtering.
+- **Parameters**: `category` (optional: `search`, `embedding`, `cache`, `gc`, `branch`), `level` (optional: `error`, `warn`, `info`, `debug`), `limit` (default: 50).
+
 ## 🎮 Slash Commands
 
 The plugin automatically registers these slash commands:
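The `index_logs` parameters added in this hunk (`category`, `level`, `limit` with a default of 50) suggest a filter along the following lines. This is a sketch only: the `LogEntry` shape, the function name `filterLogs`, exact-match semantics for `level`, and reading "recent" as "the last N entries" are all assumptions, since the diff shows only the parameter list.

```typescript
type LogCategory = "search" | "embedding" | "cache" | "gc" | "branch";
type LogLevel = "error" | "warn" | "info" | "debug";

// Hypothetical entry shape; the plugin's real log structure is not shown in the diff.
interface LogEntry {
  category: LogCategory;
  level: LogLevel;
  message: string;
}

interface LogQuery {
  category?: LogCategory;
  level?: LogLevel;
  limit?: number;
}

function filterLogs(entries: LogEntry[], query: LogQuery = {}): LogEntry[] {
  const limit = query.limit ?? 50; // default documented in the README
  const matched = entries.filter(
    (e) =>
      (query.category === undefined || e.category === query.category) &&
      (query.level === undefined || e.level === query.level), // assumption: exact level match
  );
  // Assumption: "recent" means the last `limit` entries in insertion order.
  return limit > 0 ? matched.slice(-limit) : [];
}

const sampleLogs: LogEntry[] = [
  { category: "search", level: "info", message: "query took 12ms" },
  { category: "cache", level: "debug", message: "cache hit" },
  { category: "search", level: "error", message: "index not ready" },
  { category: "search", level: "info", message: "query took 9ms" },
];
console.log(filterLogs(sampleLogs, { category: "search", level: "info", limit: 1 }));
```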
@@ -205,6 +215,7 @@ The plugin automatically registers these slash commands:
 | `/search <query>` | **Pure Semantic Search**. Best for "How does X work?" |
 | `/find <query>` | **Hybrid Search**. Combines semantic search + grep. Best for "Find usage of X". |
 | `/index` | **Update Index**. Forces a refresh of the codebase index. |
+| `/status` | **Check Status**. Shows if indexed, chunk count, and provider info. |
 
 ## ⚙️ Configuration
 
@@ -219,13 +230,21 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
     "watchFiles": true,
     "maxFileSize": 1048576,
     "maxChunksPerFile": 100,
-    "semanticOnly": false
+    "semanticOnly": false,
+    "autoGc": true,
+    "gcIntervalDays": 7,
+    "gcOrphanThreshold": 100
   },
   "search": {
     "maxResults": 20,
     "minScore": 0.1,
     "hybridWeight": 0.5,
     "contextLines": 0
+  },
+  "debug": {
+    "enabled": false,
+    "logLevel": "info",
+    "metrics": false
   }
 }
 ```
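The three GC settings added to the config example (`autoGc`, `gcIntervalDays`, `gcOrphanThreshold`) are described in the README's options table as an interval check on initialization and an orphan-count check after indexing. A minimal sketch of those two checks follows; the function names are hypothetical, and treating "never collected" as overdue is an assumption the README does not state.

```typescript
interface GcConfig {
  autoGc: boolean;
  gcIntervalDays: number;
  gcOrphanThreshold: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// On initialization: run GC if the last run is more than gcIntervalDays old.
function shouldGcOnInit(cfg: GcConfig, lastGcAt: Date | null, now: Date): boolean {
  if (!cfg.autoGc) return false;
  if (lastGcAt === null) return true; // assumption: no recorded GC counts as overdue
  return now.getTime() - lastGcAt.getTime() > cfg.gcIntervalDays * DAY_MS;
}

// After indexing: run GC if the orphan count exceeds the threshold.
function shouldGcAfterIndexing(cfg: GcConfig, orphanCount: number): boolean {
  return cfg.autoGc && orphanCount > cfg.gcOrphanThreshold;
}

// Defaults taken from the config example in this hunk.
const gcDefaults: GcConfig = { autoGc: true, gcIntervalDays: 7, gcOrphanThreshold: 100 };
const today = new Date("2025-01-10T00:00:00Z");
console.log(shouldGcOnInit(gcDefaults, new Date("2025-01-01T00:00:00Z"), today)); // 9 days since last GC
console.log(shouldGcAfterIndexing(gcDefaults, 150));
```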
@@ -244,11 +263,23 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
 | `semanticOnly` | `false` | When `true`, only index semantic nodes (functions, classes) and skip generic blocks |
 | `retries` | `3` | Number of retry attempts for failed embedding API calls |
 | `retryDelayMs` | `1000` | Delay between retries in milliseconds |
+| `autoGc` | `true` | Automatically run garbage collection to remove orphaned embeddings/chunks |
+| `gcIntervalDays` | `7` | Run GC on initialization if last GC was more than N days ago |
+| `gcOrphanThreshold` | `100` | Run GC after indexing if orphan count exceeds this threshold |
 | **search** | | |
 | `maxResults` | `20` | Maximum results to return |
 | `minScore` | `0.1` | Minimum similarity score (0-1). Lower = more results |
 | `hybridWeight` | `0.5` | Balance between keyword (1.0) and semantic (0.0) search |
 | `contextLines` | `0` | Extra lines to include before/after each match |
+| **debug** | | |
+| `enabled` | `false` | Enable debug logging and metrics collection |
+| `logLevel` | `"info"` | Log level: `error`, `warn`, `info`, `debug` |
+| `logSearch` | `true` | Log search operations with timing breakdown |
+| `logEmbedding` | `true` | Log embedding API calls (success, error, rate-limit) |
+| `logCache` | `true` | Log cache hits and misses |
+| `logGc` | `true` | Log garbage collection operations |
+| `logBranch` | `true` | Log branch detection and switches |
+| `metrics` | `false` | Enable metrics collection (indexing stats, search timing, cache performance) |
 
 ### Embedding Providers
 The plugin automatically detects available credentials in this order:
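The `hybridWeight`, `minScore`, and `maxResults` rows in the options table read naturally as a weighted blend followed by a cutoff and a cap. The sketch below assumes a simple linear blend (keyword at weight 1.0, semantic at weight 0.0, as the table states); the actual scoring formula is not visible in this diff, and every name here is hypothetical.

```typescript
interface Candidate {
  id: string;
  semantic: number; // similarity in [0, 1]
  keyword: number;  // keyword-match score in [0, 1]
}

interface SearchOptions {
  hybridWeight: number;
  minScore: number;
  maxResults: number;
}

// Assumed linear blend: hybridWeight = 1.0 is pure keyword, 0.0 is pure semantic.
function blendScore(semantic: number, keyword: number, hybridWeight: number): number {
  return hybridWeight * keyword + (1 - hybridWeight) * semantic;
}

function rankCandidates(cands: Candidate[], opts: SearchOptions): { id: string; score: number }[] {
  return cands
    .map((c) => ({ id: c.id, score: blendScore(c.semantic, c.keyword, opts.hybridWeight) }))
    .filter((r) => r.score >= opts.minScore) // drop weak matches
    .sort((a, b) => b.score - a.score)       // best first
    .slice(0, opts.maxResults);
}

const candidates: Candidate[] = [
  { id: "a", semantic: 0.9, keyword: 0.2 },
  { id: "b", semantic: 0.2, keyword: 0.6 },
  { id: "c", semantic: 0.1, keyword: 0.0 },
];
// Defaults from the README: hybridWeight 0.5, minScore 0.1, maxResults 20.
console.log(rankCandidates(candidates, { hybridWeight: 0.5, minScore: 0.1, maxResults: 20 }));
```

Under this reading, raising `hybridWeight` toward 1.0 would favour candidate `b` (strong keyword match) over `a` (strong semantic match).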
@@ -257,6 +288,150 @@ The plugin automatically detects available credentials in this order:
 3. **Google** (Gemini Embeddings)
 4. **Ollama** (Local/Private - requires `nomic-embed-text`)
 
+### Rate Limits by Provider
+
+Each provider has different rate limits. The plugin automatically adjusts concurrency and delays:
+
+| Provider | Concurrency | Delay | Best For |
+|----------|-------------|-------|----------|
+| **GitHub Copilot** | 1 | 4s | Small codebases (<1k files) |
+| **OpenAI** | 3 | 500ms | Medium codebases |
+| **Google** | 5 | 200ms | Medium-large codebases |
+| **Ollama** | 5 | None | Large codebases (10k+ files) |
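To get a feel for what the concurrency and delay columns imply, the table can be encoded as data and used for a rough lower bound on throttling overhead. The values come from the table above; the scheduling model (requests sent in waves of `concurrency`, with one `delay` between waves, request latency ignored) is an assumption made for this estimate and may not match how the plugin actually paces calls.

```typescript
type Provider = "github-copilot" | "openai" | "google" | "ollama";

interface RateLimit {
  concurrency: number;
  delayMs: number;
}

// Values copied from the README's "Rate Limits by Provider" table.
const RATE_LIMITS: Record<Provider, RateLimit> = {
  "github-copilot": { concurrency: 1, delayMs: 4000 },
  openai: { concurrency: 3, delayMs: 500 },
  google: { concurrency: 5, delayMs: 200 },
  ollama: { concurrency: 5, delayMs: 0 },
};

// Assumed model: requests go out in waves of `concurrency`, with `delayMs`
// between waves. Request latency itself is ignored, so this is a lower bound
// on the time spent purely on throttling.
function delayOverheadMs(provider: Provider, requests: number): number {
  const { concurrency, delayMs } = RATE_LIMITS[provider];
  const waves = Math.ceil(requests / concurrency);
  return Math.max(0, waves - 1) * delayMs;
}

const providers: Provider[] = ["github-copilot", "openai", "google", "ollama"];
for (const p of providers) {
  console.log(`${p}: ${delayOverheadMs(p, 100) / 1000}s of delay for 100 embedding requests`);
}
```

Under these assumptions, 100 requests cost about 396 s of pure delay on GitHub Copilot versus 16.5 s on OpenAI, which is consistent with the table's advice to keep Copilot for small codebases.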
+
+**For large codebases**, use Ollama locally to avoid rate limits:
+
+```bash
+# Install the embedding model
+ollama pull nomic-embed-text
+```
+
+```json
+// .opencode/codebase-index.json
+{
+  "embeddingProvider": "ollama"
+}
+```
+
+## 📈 Performance
+
+The plugin is built for speed with a Rust native module. Here are typical performance numbers (Apple M1):
+
+### Parsing (tree-sitter)
+
+| Files | Chunks | Time |
+|-------|--------|------|
+| 100 | 1,200 | ~7ms |
+| 500 | 6,000 | ~32ms |
+
+### Vector Search (usearch)
+
+| Index Size | Search Time | Throughput |
+|------------|-------------|------------|
+| 1,000 vectors | 0.7ms | 1,400 ops/sec |
+| 5,000 vectors | 1.2ms | 850 ops/sec |
+| 10,000 vectors | 1.3ms | 780 ops/sec |
+
+### Database Operations (SQLite with batch)
+
+| Operation | 1,000 items | 10,000 items |
+|-----------|-------------|--------------|
+| Insert chunks | 4ms | 44ms |
+| Add to branch | 2ms | 22ms |
+| Check embedding exists | <0.01ms | <0.01ms |
+
+### Batch vs Sequential Performance
+
+Batch operations provide significant speedups:
+
+| Operation | Sequential | Batch | Speedup |
+|-----------|------------|-------|---------|
+| Insert 1,000 chunks | 38ms | 4ms | **~10x** |
+| Add 1,000 to branch | 29ms | 2ms | **~14x** |
+| Insert 1,000 embeddings | 59ms | 40ms | **~1.5x** |
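The speedup column is simply the sequential time divided by the batch time. The short check below recomputes it from the timings in the table; only the variable and function names are introduced here.

```typescript
interface Timing {
  operation: string;
  sequentialMs: number;
  batchMs: number;
}

// Timings copied from the "Batch vs Sequential Performance" table.
const timings: Timing[] = [
  { operation: "Insert 1,000 chunks", sequentialMs: 38, batchMs: 4 },
  { operation: "Add 1,000 to branch", sequentialMs: 29, batchMs: 2 },
  { operation: "Insert 1,000 embeddings", sequentialMs: 59, batchMs: 40 },
];

function speedup(t: Timing): number {
  return t.sequentialMs / t.batchMs;
}

for (const t of timings) {
  console.log(`${t.operation}: ${speedup(t).toFixed(1)}x`);
}
// Insert 1,000 chunks: 9.5x
// Add 1,000 to branch: 14.5x
// Insert 1,000 embeddings: 1.5x
```

The computed ratios (9.5, 14.5, and about 1.5) line up with the README's rounded "~10x", "~14x", and "~1.5x".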
+
+Run benchmarks yourself: `npx tsx benchmarks/run.ts`
+
+## 🎯 Choosing a Provider
+
+Use this decision tree to pick the right embedding provider:
+
+```
+          ┌─────────────────────────┐
+          │ Do you have Copilot?    │
+          └───────────┬─────────────┘
+                ┌─────┴─────┐
+               YES         NO
+                │           │
+    ┌───────────▼───────┐   │
+    │ Codebase < 1k     │   │
+    │ files?            │   │
+    └─────────┬─────────┘   │
+        ┌─────┴─────┐       │
+       YES         NO       │
+        │           │       │
+        ▼           │       │
+   ┌──────────┐     │       │
+   │ Copilot  │     │       │
+   │ (free)   │     │       │
+   └──────────┘     │       │
+                    ▼       ▼
+          ┌─────────────────────────┐
+          │ Need fastest indexing?  │
+          └───────────┬─────────────┘
+                ┌─────┴─────┐
+               YES         NO
+                │           │
+                ▼           ▼
+          ┌──────────┐ ┌──────────────┐
+          │ Ollama   │ │ OpenAI or    │
+          │ (local)  │ │ Google       │
+          └──────────┘ └──────────────┘
+```
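The decision tree has only two branch points, so it can be written as a small function. The type and function names are hypothetical; the branching itself follows the tree as drawn (Copilot only when you have it and the codebase is under 1k files, otherwise the "fastest indexing" question decides).

```typescript
type Recommendation = "github-copilot" | "ollama" | "openai-or-google";

interface Situation {
  hasCopilot: boolean;
  fileCount: number;
  needFastestIndexing: boolean;
}

function chooseProvider(s: Situation): Recommendation {
  // Left branch of the tree: Copilot subscribers with a small codebase.
  if (s.hasCopilot && s.fileCount < 1000) return "github-copilot";
  // Everyone else lands on the "Need fastest indexing?" question.
  return s.needFastestIndexing ? "ollama" : "openai-or-google";
}

console.log(chooseProvider({ hasCopilot: true, fileCount: 400, needFastestIndexing: false }));
console.log(chooseProvider({ hasCopilot: true, fileCount: 12000, needFastestIndexing: true }));
console.log(chooseProvider({ hasCopilot: false, fileCount: 400, needFastestIndexing: false }));
```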
+
+### Provider Comparison
+
+| Provider | Speed | Cost | Privacy | Best For |
+|----------|-------|------|---------|----------|
+| **Ollama** | Fastest | Free | Full | Large codebases, privacy-sensitive |
+| **GitHub Copilot** | Slow (rate limited) | Free* | Cloud | Small codebases, existing subscribers |
+| **OpenAI** | Medium | ~$0.0001/1K tokens | Cloud | General use |
+| **Google** | Fast | Free tier available | Cloud | Medium-large codebases |
+
+*Requires active Copilot subscription
+
+### Setup by Provider
+
+**Ollama (Recommended for large codebases)**
+```bash
+ollama pull nomic-embed-text
+```
+```json
+{ "embeddingProvider": "ollama" }
+```
+
+**OpenAI**
+```bash
+export OPENAI_API_KEY=sk-...
+```
+```json
+{ "embeddingProvider": "openai" }
+```
+
+**Google**
+```bash
+export GOOGLE_API_KEY=...
+```
+```json
+{ "embeddingProvider": "google" }
+```
+
+**GitHub Copilot**
+No setup needed if you have an active Copilot subscription.
+```json
+{ "embeddingProvider": "github-copilot" }
+```
+
 ## ⚠️ Tradeoffs
 
 Be aware of these characteristics:
package/commands/find.md
CHANGED
@@ -2,12 +2,24 @@
 description: Find code using hybrid approach (semantic + grep)
 ---
 
-Find code
+Find code using both semantic search and grep.
+
+User input: $ARGUMENTS
 
 Strategy:
-1.
-2.
+1. Use `codebase_search` to find semantically related code
+2. Identify specific function/class/variable names from results
 3. Use grep to find all occurrences of those identifiers
-4. Combine
+4. Combine into a comprehensive answer
+
+Parse optional parameters from input:
+- `limit=N` → limit semantic results
+- `type=X` or "functions"/"classes" → filter chunk type
+- `dir=X` → filter directory
+
+Examples:
+- `/find error handling middleware`
+- `/find payment validation type=function`
+- `/find user auth in src/services`
 
-If
+If no index exists, run `index_codebase` first.
package/commands/index.md
CHANGED
@@ -2,10 +2,20 @@
 description: Index the codebase for semantic search
 ---
 
-Run the `index_codebase` tool
+Run the `index_codebase` tool with these settings:
 
-
-
-
--
--
+User input: $ARGUMENTS
+
+Parse the input and set tool arguments:
+- force=true if input contains "force"
+- estimateOnly=true if input contains "estimate"
+- verbose=true (always, for detailed output)
+
+Examples:
+- `/index` → force=false, estimateOnly=false, verbose=true
+- `/index force` → force=true, estimateOnly=false, verbose=true
+- `/index estimate` → force=false, estimateOnly=true, verbose=true
+
+IMPORTANT: You MUST pass the parsed arguments to `index_codebase`. Do not ignore them.
+
+Show final statistics including files processed, chunks indexed, tokens used, and duration.
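The new `/index` command file is a prompt: the agent reading it is expected to do the parsing. The mapping it describes is simple enough to spell out as code, which makes the three examples easy to check. This is an illustrative sketch, not code from the package; `parseIndexArgs` is a hypothetical name and the case-insensitive match is an assumption.

```typescript
interface IndexArgs {
  force: boolean;
  estimateOnly: boolean;
  verbose: boolean;
}

// Mirrors the rules in the command file: substring checks on the user input.
function parseIndexArgs(input: string): IndexArgs {
  const text = input.toLowerCase(); // assumption: matching ignores case
  return {
    force: text.includes("force"),
    estimateOnly: text.includes("estimate"),
    verbose: true, // always set, per the command file
  };
}

console.log(parseIndexArgs(""));         // the bare `/index` case
console.log(parseIndexArgs("force"));    // `/index force`
console.log(parseIndexArgs("estimate")); // `/index estimate`
```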
package/commands/search.md
CHANGED
@@ -2,8 +2,23 @@
 description: Search codebase by meaning using semantic search
 ---
 
-
+Search the codebase using semantic search.
 
-
+User input: $ARGUMENTS
 
-
+The first part is the search query. Look for optional parameters:
+- `limit=N` or "top N" or "first N" → set limit
+- `type=X` or mentions "functions"/"classes"/"methods" → set chunkType
+- `dir=X` or "in folder X" → set directory filter
+- File extensions like ".ts", "typescript", ".py" → set fileType
+
+Call `codebase_search` with the parsed arguments.
+
+Examples:
+- `/search authentication logic` → query="authentication logic"
+- `/search error handling limit=5` → query="error handling", limit=5
+- `/search validation functions` → query="validation", chunkType="function"
+
+If the index doesn't exist, run `index_codebase` first.
+
+Return results with file paths and line numbers.
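As with `/index`, the `/search` command file leaves parsing to the agent. The sketch below covers a subset of the rules it lists: `limit=N`, "top N" / "first N", `type=X`, the plural keywords "functions"/"classes"/"methods", and `dir=X`. The "in folder X" phrasing and file-extension detection are left out. The function name, the `directory` field name, and the token-by-token approach are assumptions; only `limit` and `chunkType` are named in the diff.

```typescript
interface SearchArgs {
  query: string;
  limit?: number;
  chunkType?: string;
  directory?: string; // hypothetical field name for the "directory filter"
}

// Plural keywords mentioned in the command file, mapped to a chunk type.
const TYPE_WORDS = new Map<string, string>([
  ["functions", "function"],
  ["classes", "class"],
  ["methods", "method"],
]);

function parseSearchArgs(input: string): SearchArgs {
  const args: SearchArgs = { query: "" };
  const queryWords: string[] = [];
  const tokens = input.trim().split(/\s+/).filter((t) => t.length > 0);

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i] ?? "";
    const next = tokens[i + 1] ?? "";

    const kv = /^(limit|type|dir)=(.+)$/.exec(token);
    if (kv) {
      const value = kv[2] ?? "";
      if (kv[1] === "limit") {
        const n = Number.parseInt(value, 10);
        if (!Number.isNaN(n)) args.limit = n;
      } else if (kv[1] === "type") {
        args.chunkType = value;
      } else {
        args.directory = value;
      }
      continue;
    }
    if ((token === "top" || token === "first") && /^\d+$/.test(next)) {
      args.limit = Number.parseInt(next, 10);
      i++; // also consume the number
      continue;
    }
    const mapped = TYPE_WORDS.get(token);
    if (mapped !== undefined) {
      args.chunkType = mapped;
      continue;
    }
    queryWords.push(token); // anything else is part of the query
  }

  args.query = queryWords.join(" ");
  return args;
}

console.log(parseSearchArgs("error handling limit=5"));
console.log(parseSearchArgs("validation functions"));
```

A natural-language agent can be more forgiving than this tokenizer (for example with "in folder X"), which is presumably why the package ships the rules as a prompt rather than as code.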
package/commands/status.md
@@ -0,0 +1,15 @@
+---
+description: Check if the codebase is indexed and ready for semantic search
+---
+
+Run the `index_status` tool to check if the codebase index is ready.
+
+This shows:
+- Whether the codebase is indexed
+- Number of indexed chunks
+- Embedding provider and model being used
+- Current git branch
+
+No arguments needed - just run `index_status`.
+
+If not indexed, suggest running `/index` to create the index.