opencode-codebase-index 0.2.4 → 0.3.0
- package/README.md +152 -1
- package/commands/find.md +17 -5
- package/commands/index.md +16 -6
- package/commands/search.md +18 -3
- package/commands/status.md +15 -0
- package/dist/index.cjs +421 -277
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +420 -277
- package/dist/index.js.map +1 -1
- package/native/codebase-index-native.darwin-arm64.node +0 -0
- package/native/codebase-index-native.darwin-x64.node +0 -0
- package/native/codebase-index-native.linux-arm64-gnu.node +0 -0
- package/native/codebase-index-native.linux-x64-gnu.node +0 -0
- package/native/codebase-index-native.win32-x64-msvc.node +0 -0
- package/package.json +1 -1
- package/skill/SKILL.md +101 -1
package/README.md
CHANGED
@@ -205,6 +205,7 @@ The plugin automatically registers these slash commands:
 | `/search <query>` | **Pure Semantic Search**. Best for "How does X work?" |
 | `/find <query>` | **Hybrid Search**. Combines semantic search + grep. Best for "Find usage of X". |
 | `/index` | **Update Index**. Forces a refresh of the codebase index. |
+| `/status` | **Check Status**. Shows if indexed, chunk count, and provider info. |
 
 ## ⚙️ Configuration
 
@@ -219,7 +220,10 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
 "watchFiles": true,
 "maxFileSize": 1048576,
 "maxChunksPerFile": 100,
-"semanticOnly": false
+"semanticOnly": false,
+"autoGc": true,
+"gcIntervalDays": 7,
+"gcOrphanThreshold": 100
 },
 "search": {
 "maxResults": 20,
@@ -244,6 +248,9 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
 | `semanticOnly` | `false` | When `true`, only index semantic nodes (functions, classes) and skip generic blocks |
 | `retries` | `3` | Number of retry attempts for failed embedding API calls |
 | `retryDelayMs` | `1000` | Delay between retries in milliseconds |
+| `autoGc` | `true` | Automatically run garbage collection to remove orphaned embeddings/chunks |
+| `gcIntervalDays` | `7` | Run GC on initialization if last GC was more than N days ago |
+| `gcOrphanThreshold` | `100` | Run GC after indexing if orphan count exceeds this threshold |
 | **search** | | |
 | `maxResults` | `20` | Maximum results to return |
 | `minScore` | `0.1` | Minimum similarity score (0-1). Lower = more results |
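Review note: the three new GC options describe two separate triggers, one at initialization and one after indexing. The sketch below is one way to read the table rows above, not the plugin's actual code. Only the option names and defaults come from the diff; `GcConfig`, `shouldGcOnInit`, `shouldGcAfterIndex`, and the treatment of a missing last-run timestamp are assumptions made for illustration.

```typescript
// Hypothetical shape: only the three option names and defaults are from the README table.
interface GcConfig {
  autoGc: boolean;
  gcIntervalDays: number;
  gcOrphanThreshold: number;
}

const DEFAULT_GC: GcConfig = { autoGc: true, gcIntervalDays: 7, gcOrphanThreshold: 100 };

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Trigger 1: on initialization, GC is due if the last run is more than
// gcIntervalDays old. Treating "never ran" (null) as due is an assumption.
function shouldGcOnInit(cfg: GcConfig, lastGcAt: number | null, now: number): boolean {
  if (!cfg.autoGc) return false;
  if (lastGcAt === null) return true;
  return now - lastGcAt > cfg.gcIntervalDays * MS_PER_DAY;
}

// Trigger 2: after indexing, GC is due if the orphan count exceeds the threshold.
function shouldGcAfterIndex(cfg: GcConfig, orphanCount: number): boolean {
  return cfg.autoGc && orphanCount > cfg.gcOrphanThreshold;
}
```

Under this reading, setting `autoGc` to `false` disables both triggers, and an orphan count exactly equal to `gcOrphanThreshold` does not start a collection because the table says "exceeds".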
@@ -257,6 +264,150 @@ The plugin automatically detects available credentials in this order:
 3. **Google** (Gemini Embeddings)
 4. **Ollama** (Local/Private - requires `nomic-embed-text`)
 
+### Rate Limits by Provider
+
+Each provider has different rate limits. The plugin automatically adjusts concurrency and delays:
+
+| Provider | Concurrency | Delay | Best For |
+|----------|-------------|-------|----------|
+| **GitHub Copilot** | 1 | 4s | Small codebases (<1k files) |
+| **OpenAI** | 3 | 500ms | Medium codebases |
+| **Google** | 5 | 200ms | Medium-large codebases |
+| **Ollama** | 5 | None | Large codebases (10k+ files) |
+
+**For large codebases**, use Ollama locally to avoid rate limits:
+
+```bash
+# Install the embedding model
+ollama pull nomic-embed-text
+```
+
+```json
+// .opencode/codebase-index.json
+{
+"embeddingProvider": "ollama"
+}
+```
+
+## 📈 Performance
+
+The plugin is built for speed with a Rust native module. Here are typical performance numbers (Apple M1):
+
+### Parsing (tree-sitter)
+
+| Files | Chunks | Time |
+|-------|--------|------|
+| 100 | 1,200 | ~7ms |
+| 500 | 6,000 | ~32ms |
+
+### Vector Search (usearch)
+
+| Index Size | Search Time | Throughput |
+|------------|-------------|------------|
+| 1,000 vectors | 0.7ms | 1,400 ops/sec |
+| 5,000 vectors | 1.2ms | 850 ops/sec |
+| 10,000 vectors | 1.3ms | 780 ops/sec |
+
+### Database Operations (SQLite with batch)
+
+| Operation | 1,000 items | 10,000 items |
+|-----------|-------------|--------------|
+| Insert chunks | 4ms | 44ms |
+| Add to branch | 2ms | 22ms |
+| Check embedding exists | <0.01ms | <0.01ms |
+
+### Batch vs Sequential Performance
+
+Batch operations provide significant speedups:
+
+| Operation | Sequential | Batch | Speedup |
+|-----------|------------|-------|---------|
+| Insert 1,000 chunks | 38ms | 4ms | **~10x** |
+| Add 1,000 to branch | 29ms | 2ms | **~14x** |
+| Insert 1,000 embeddings | 59ms | 40ms | **~1.5x** |
+
+Run benchmarks yourself: `npx tsx benchmarks/run.ts`
+
+## 🎯 Choosing a Provider
+
+Use this decision tree to pick the right embedding provider:
+
+```
+                    ┌─────────────────────────┐
+                    │ Do you have Copilot?    │
+                    └───────────┬─────────────┘
+                          ┌─────┴─────┐
+                         YES         NO
+                          │           │
+              ┌───────────▼───────┐   │
+              │ Codebase < 1k     │   │
+              │ files?            │   │
+              └─────────┬─────────┘   │
+                  ┌─────┴─────┐       │
+                 YES         NO       │
+                  │           │       │
+                  ▼           │       │
+             ┌──────────┐     │       │
+             │ Copilot  │     │       │
+             │ (free)   │     │       │
+             └──────────┘     │       │
+                              ▼       ▼
+                    ┌─────────────────────────┐
+                    │ Need fastest indexing?  │
+                    └───────────┬─────────────┘
+                          ┌─────┴─────┐
+                         YES         NO
+                          │           │
+                          ▼           ▼
+                  ┌──────────┐  ┌──────────────┐
+                  │ Ollama   │  │ OpenAI or    │
+                  │ (local)  │  │ Google       │
+                  └──────────┘  └──────────────┘
+```
+
+### Provider Comparison
+
+| Provider | Speed | Cost | Privacy | Best For |
+|----------|-------|------|---------|----------|
+| **Ollama** | Fastest | Free | Full | Large codebases, privacy-sensitive |
+| **GitHub Copilot** | Slow (rate limited) | Free* | Cloud | Small codebases, existing subscribers |
+| **OpenAI** | Medium | ~$0.0001/1K tokens | Cloud | General use |
+| **Google** | Fast | Free tier available | Cloud | Medium-large codebases |
+
+*Requires active Copilot subscription
+
+### Setup by Provider
+
+**Ollama (Recommended for large codebases)**
+```bash
+ollama pull nomic-embed-text
+```
+```json
+{ "embeddingProvider": "ollama" }
+```
+
+**OpenAI**
+```bash
+export OPENAI_API_KEY=sk-...
+```
+```json
+{ "embeddingProvider": "openai" }
+```
+
+**Google**
+```bash
+export GOOGLE_API_KEY=...
+```
+```json
+{ "embeddingProvider": "google" }
+```
+
+**GitHub Copilot**
+No setup needed if you have an active Copilot subscription.
+```json
+{ "embeddingProvider": "github-copilot" }
+```
+
 ## ⚠️ Tradeoffs
 
 Be aware of these characteristics:
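Review note: the new "Rate Limits by Provider" table in this hunk can be turned into a rough estimate of how much pure waiting each provider adds. The sketch below is illustrative only. The concurrency and delay values and the provider identifiers are taken from the diff, but `RATE_LIMITS`, `estimateDelayMs`, and the assumption that the delay is applied once between consecutive waves of concurrent requests are hypothetical; the diff does not say how the plugin schedules its calls.

```typescript
type Provider = "github-copilot" | "openai" | "google" | "ollama";

interface RateLimit {
  concurrency: number; // requests in flight at once
  delayMs: number;     // pause between waves (assumed placement of the delay)
}

// Values copied from the README table added in this diff.
const RATE_LIMITS: Record<Provider, RateLimit> = {
  "github-copilot": { concurrency: 1, delayMs: 4000 },
  openai: { concurrency: 3, delayMs: 500 },
  google: { concurrency: 5, delayMs: 200 },
  ollama: { concurrency: 5, delayMs: 0 },
};

// Lower bound on time spent waiting, ignoring request latency entirely.
function estimateDelayMs(provider: Provider, requests: number): number {
  const { concurrency, delayMs } = RATE_LIMITS[provider];
  const waves = Math.ceil(requests / concurrency);
  // No pause is counted after the final wave.
  return Math.max(0, waves - 1) * delayMs;
}
```

Under these assumptions, 100 embedding requests would wait about 396 s with GitHub Copilot, 16.5 s with OpenAI, 3.8 s with Google, and not at all with Ollama, which is consistent with the table steering large codebases toward Ollama.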
package/commands/find.md
CHANGED
@@ -2,12 +2,24 @@
 description: Find code using hybrid approach (semantic + grep)
 ---
 
-Find code
+Find code using both semantic search and grep.
+
+User input: $ARGUMENTS
 
 Strategy:
-1.
-2.
+1. Use `codebase_search` to find semantically related code
+2. Identify specific function/class/variable names from results
 3. Use grep to find all occurrences of those identifiers
-4. Combine
+4. Combine into a comprehensive answer
+
+Parse optional parameters from input:
+- `limit=N` → limit semantic results
+- `type=X` or "functions"/"classes" → filter chunk type
+- `dir=X` → filter directory
+
+Examples:
+- `/find error handling middleware`
+- `/find payment validation type=function`
+- `/find user auth in src/services`
 
-If
+If no index exists, run `index_codebase` first.
package/commands/index.md
CHANGED
@@ -2,10 +2,20 @@
 description: Index the codebase for semantic search
 ---
 
-Run the `index_codebase` tool
+Run the `index_codebase` tool with these settings:
 
-
-
-
---
---
+User input: $ARGUMENTS
+
+Parse the input and set tool arguments:
+- force=true if input contains "force"
+- estimateOnly=true if input contains "estimate"
+- verbose=true (always, for detailed output)
+
+Examples:
+- `/index` → force=false, estimateOnly=false, verbose=true
+- `/index force` → force=true, estimateOnly=false, verbose=true
+- `/index estimate` → force=false, estimateOnly=true, verbose=true
+
+IMPORTANT: You MUST pass the parsed arguments to `index_codebase`. Do not ignore them.
+
+Show final statistics including files processed, chunks indexed, tokens used, and duration.
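Review note: the new `/index` prompt asks the model to map free-form input onto three tool arguments. A deterministic equivalent of those rules might look like the sketch below. The argument names `force`, `estimateOnly`, and `verbose` come from the diff; the `IndexArgs` type, the `parseIndexInput` helper, and the case-insensitive matching are assumptions, since the prompt itself is interpreted by the model rather than by code.

```typescript
// Hypothetical type; field names mirror the arguments listed in the prompt.
interface IndexArgs {
  force: boolean;
  estimateOnly: boolean;
  verbose: boolean;
}

function parseIndexInput(input: string): IndexArgs {
  const text = input.toLowerCase(); // case-insensitive matching is an assumption
  return {
    force: text.includes("force"),
    estimateOnly: text.includes("estimate"),
    verbose: true, // always on, per the prompt
  };
}

// Mirrors the three examples in the prompt.
console.log(parseIndexInput(""));         // force=false, estimateOnly=false, verbose=true
console.log(parseIndexInput("force"));    // force=true,  estimateOnly=false, verbose=true
console.log(parseIndexInput("estimate")); // force=false, estimateOnly=true,  verbose=true
```

A substring test like this also fires on words such as "enforce", which is one reason the real command leaves the interpretation to the model.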
package/commands/search.md
CHANGED
@@ -2,8 +2,23 @@
 description: Search codebase by meaning using semantic search
 ---
 
-
+Search the codebase using semantic search.
 
-
+User input: $ARGUMENTS
 
-
+The first part is the search query. Look for optional parameters:
+- `limit=N` or "top N" or "first N" → set limit
+- `type=X` or mentions "functions"/"classes"/"methods" → set chunkType
+- `dir=X` or "in folder X" → set directory filter
+- File extensions like ".ts", "typescript", ".py" → set fileType
+
+Call `codebase_search` with the parsed arguments.
+
+Examples:
+- `/search authentication logic` → query="authentication logic"
+- `/search error handling limit=5` → query="error handling", limit=5
+- `/search validation functions` → query="validation", chunkType="function"
+
+If the index doesn't exist, run `index_codebase` first.
+
+Return results with file paths and line numbers.
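Review note: the `/search` prompt describes a small grammar for optional parameters. The sketch below shows one way those rules could be applied mechanically, covering `limit=N`, "top N", "first N", `type=X`, the plural keywords, and `dir=X`. It is a simplified illustration, not the plugin's behavior: `SearchArgs`, `parseSearchInput`, and the `directory` field name are hypothetical, and the "in folder X" and file-extension rules are left out.

```typescript
// Hypothetical result shape; `limit` and `chunkType` are named in the prompt,
// `directory` is an assumed name for the directory filter.
interface SearchArgs {
  query: string;
  limit?: number;
  chunkType?: string;
  directory?: string;
}

// Plural keywords mentioned in the prompt, mapped to a singular chunk type.
const TYPE_WORDS = new Map<string, string>([
  ["functions", "function"],
  ["classes", "class"],
  ["methods", "method"],
]);

function parseSearchInput(input: string): SearchArgs {
  const args: SearchArgs = { query: "" };
  const queryWords: string[] = [];
  const tokens = input.trim().split(/\s+/).filter(Boolean);

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    const lower = token.toLowerCase();
    const kv = /^(limit|type|dir)=(.+)$/i.exec(token);
    const typeWord = TYPE_WORDS.get(lower);

    if (kv) {
      const key = kv[1].toLowerCase();
      const value = kv[2];
      if (key === "limit") {
        // A non-numeric limit is silently dropped in this sketch.
        if (/^\d+$/.test(value)) args.limit = Number(value);
      } else if (key === "type") {
        args.chunkType = value;
      } else {
        args.directory = value;
      }
    } else if ((lower === "top" || lower === "first") && /^\d+$/.test(tokens[i + 1] ?? "")) {
      args.limit = Number(tokens[i + 1]);
      i++; // consume the number that follows "top"/"first"
    } else if (typeWord !== undefined) {
      args.chunkType = typeWord;
    } else {
      queryWords.push(token);
    }
  }

  args.query = queryWords.join(" ");
  return args;
}
```

With this sketch, `parseSearchInput("validation functions")` yields the query `validation` with `chunkType` set to `function`, matching the third example in the prompt.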
package/commands/status.md
@@ -0,0 +1,15 @@
+---
+description: Check if the codebase is indexed and ready for semantic search
+---
+
+Run the `index_status` tool to check if the codebase index is ready.
+
+This shows:
+- Whether the codebase is indexed
+- Number of indexed chunks
+- Embedding provider and model being used
+- Current git branch
+
+No arguments needed - just run `index_status`.
+
+If not indexed, suggest running `/index` to create the index.