opencode-codebase-index 0.2.4 → 0.3.0

package/README.md CHANGED
@@ -205,6 +205,7 @@ The plugin automatically registers these slash commands:
  | `/search <query>` | **Pure Semantic Search**. Best for "How does X work?" |
  | `/find <query>` | **Hybrid Search**. Combines semantic search + grep. Best for "Find usage of X". |
  | `/index` | **Update Index**. Forces a refresh of the codebase index. |
+ | `/status` | **Check Status**. Shows if indexed, chunk count, and provider info. |
 
  ## ⚙️ Configuration
 
@@ -219,7 +220,10 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
  "watchFiles": true,
  "maxFileSize": 1048576,
  "maxChunksPerFile": 100,
- "semanticOnly": false
+ "semanticOnly": false,
+ "autoGc": true,
+ "gcIntervalDays": 7,
+ "gcOrphanThreshold": 100
  },
  "search": {
  "maxResults": 20,
@@ -244,6 +248,9 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
  | `semanticOnly` | `false` | When `true`, only index semantic nodes (functions, classes) and skip generic blocks |
  | `retries` | `3` | Number of retry attempts for failed embedding API calls |
  | `retryDelayMs` | `1000` | Delay between retries in milliseconds |
+ | `autoGc` | `true` | Automatically run garbage collection to remove orphaned embeddings/chunks |
+ | `gcIntervalDays` | `7` | Run GC on initialization if the last GC was more than N days ago |
+ | `gcOrphanThreshold` | `100` | Run GC after indexing if the orphan count exceeds this threshold |
  | **search** | | |
  | `maxResults` | `20` | Maximum results to return |
  | `minScore` | `0.1` | Minimum similarity score (0-1). Lower = more results |
@@ -257,6 +264,150 @@ The plugin automatically detects available credentials in this order:
  3. **Google** (Gemini Embeddings)
  4. **Ollama** (Local/Private - requires `nomic-embed-text`)
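The detection order above amounts to a first-match fallback chain. A minimal sketch — hypothetical code, not the plugin's implementation; the `Checks` shape and field names are assumptions:

```typescript
// First provider whose credentials are present wins, in the documented order.
// `hasCopilot` and `hasOllamaModel` stand in for whatever real checks the
// plugin performs; only the env-var names match the setup examples below.
interface Checks {
  hasCopilot: boolean;     // active GitHub Copilot subscription detected
  env: Record<string, string | undefined>;
  hasOllamaModel: boolean; // `nomic-embed-text` available locally
}

function detectProvider(c: Checks): string {
  if (c.hasCopilot) return "github-copilot";
  if (c.env.OPENAI_API_KEY) return "openai";
  if (c.env.GOOGLE_API_KEY) return "google";
  if (c.hasOllamaModel) return "ollama";
  throw new Error("No embedding provider available");
}
```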
259
266
 
267
+ ### Rate Limits by Provider
268
+
269
+ Each provider has different rate limits. The plugin automatically adjusts concurrency and delays:
270
+
271
+ | Provider | Concurrency | Delay | Best For |
272
+ |----------|-------------|-------|----------|
273
+ | **GitHub Copilot** | 1 | 4s | Small codebases (<1k files) |
274
+ | **OpenAI** | 3 | 500ms | Medium codebases |
275
+ | **Google** | 5 | 200ms | Medium-large codebases |
276
+ | **Ollama** | 5 | None | Large codebases (10k+ files) |
277
+
278
+ **For large codebases**, use Ollama locally to avoid rate limits:
279
+
280
+ ```bash
281
+ # Install the embedding model
282
+ ollama pull nomic-embed-text
283
+ ```
284
+
285
+ ```json
286
+ // .opencode/codebase-index.json
287
+ {
288
+ "embeddingProvider": "ollama"
289
+ }
290
+ ```
291
+
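The concurrency/delay pairs in the table boil down to simple pacing between batches of embedding calls. A minimal sketch — not the plugin's actual scheduler, just an illustration of the idea:

```typescript
// Run `concurrency` embedding calls at a time, then wait `delayMs` before
// the next batch. Per the table above, Copilot would use (1, 4000) and
// Ollama (5, 0). `embed` stands in for any provider's embedding call.
async function embedAll<T, R>(
  items: T[],
  embed: (item: T) => Promise<R>,
  concurrency: number,
  delayMs: number,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);
    results.push(...(await Promise.all(batch.map(embed))));
    const moreLeft = i + concurrency < items.length;
    if (delayMs > 0 && moreLeft) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```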
+ ## 📈 Performance
+
+ The plugin is built for speed with a Rust native module. Here are typical performance numbers (Apple M1):
+
+ ### Parsing (tree-sitter)
+
+ | Files | Chunks | Time |
+ |-------|--------|------|
+ | 100 | 1,200 | ~7ms |
+ | 500 | 6,000 | ~32ms |
+
+ ### Vector Search (usearch)
+
+ | Index Size | Search Time | Throughput |
+ |------------|-------------|------------|
+ | 1,000 vectors | 0.7ms | 1,400 ops/sec |
+ | 5,000 vectors | 1.2ms | 850 ops/sec |
+ | 10,000 vectors | 1.3ms | 780 ops/sec |
+
+ ### Database Operations (SQLite with batch)
+
+ | Operation | 1,000 items | 10,000 items |
+ |-----------|-------------|--------------|
+ | Insert chunks | 4ms | 44ms |
+ | Add to branch | 2ms | 22ms |
+ | Check embedding exists | <0.01ms | <0.01ms |
+
+ ### Batch vs Sequential Performance
+
+ Batch operations provide significant speedups:
+
+ | Operation | Sequential | Batch | Speedup |
+ |-----------|------------|-------|---------|
+ | Insert 1,000 chunks | 38ms | 4ms | **~10x** |
+ | Add 1,000 to branch | 29ms | 2ms | **~14x** |
+ | Insert 1,000 embeddings | 59ms | 40ms | **~1.5x** |
+
+ Run the benchmarks yourself: `npx tsx benchmarks/run.ts`
+
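The speedups come from amortizing per-statement overhead: rows are grouped so each database round-trip (one transaction or one multi-row INSERT) carries many items instead of one. A generic sketch of the grouping step — illustrative only, not the plugin's implementation:

```typescript
// Split `items` into groups of at most `size`, so each DB write handles a
// whole group. 1,000 chunks at size 500 means 2 round-trips instead of 1,000.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```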
331
+ ## 🎯 Choosing a Provider
332
+
333
+ Use this decision tree to pick the right embedding provider:
334
+
335
+ ```
336
+ ┌─────────────────────────┐
337
+ │ Do you have Copilot? │
338
+ └───────────┬─────────────┘
339
+ ┌─────┴─────┐
340
+ YES NO
341
+ │ │
342
+ ┌───────────▼───────┐ │
343
+ │ Codebase < 1k │ │
344
+ │ files? │ │
345
+ └─────────┬─────────┘ │
346
+ ┌─────┴─────┐ │
347
+ YES NO │
348
+ │ │ │
349
+ ▼ │ │
350
+ ┌──────────┐ │ │
351
+ │ Copilot │ │ │
352
+ │ (free) │ │ │
353
+ └──────────┘ │ │
354
+ ▼ ▼
355
+ ┌─────────────────────────┐
356
+ │ Need fastest indexing? │
357
+ └───────────┬─────────────┘
358
+ ┌─────┴─────┐
359
+ YES NO
360
+ │ │
361
+ ▼ ▼
362
+ ┌──────────┐ ┌──────────────┐
363
+ │ Ollama │ │ OpenAI or │
364
+ │ (local) │ │ Google │
365
+ └──────────┘ └──────────────┘
366
+ ```
367
+
+ ### Provider Comparison
+
+ | Provider | Speed | Cost | Privacy | Best For |
+ |----------|-------|------|---------|----------|
+ | **Ollama** | Fastest | Free | Full | Large codebases, privacy-sensitive |
+ | **GitHub Copilot** | Slow (rate limited) | Free* | Cloud | Small codebases, existing subscribers |
+ | **OpenAI** | Medium | ~$0.0001/1K tokens | Cloud | General use |
+ | **Google** | Fast | Free tier available | Cloud | Medium-large codebases |
+
+ *Requires an active Copilot subscription
+
+ ### Setup by Provider
+
+ **Ollama (Recommended for large codebases)**
+ ```bash
+ ollama pull nomic-embed-text
+ ```
+ ```json
+ { "embeddingProvider": "ollama" }
+ ```
+
+ **OpenAI**
+ ```bash
+ export OPENAI_API_KEY=sk-...
+ ```
+ ```json
+ { "embeddingProvider": "openai" }
+ ```
+
+ **Google**
+ ```bash
+ export GOOGLE_API_KEY=...
+ ```
+ ```json
+ { "embeddingProvider": "google" }
+ ```
+
+ **GitHub Copilot**
+ No setup needed if you have an active Copilot subscription.
+ ```json
+ { "embeddingProvider": "github-copilot" }
+ ```
+
 
  ## ⚠️ Tradeoffs
 
  Be aware of these characteristics:
package/commands/find.md CHANGED
@@ -2,12 +2,24 @@
  description: Find code using hybrid approach (semantic + grep)
  ---
 
- Find code related to: $ARGUMENTS
+ Find code using both semantic search and grep.
+
+ User input: $ARGUMENTS
 
  Strategy:
- 1. First use `codebase_search` to find semantically related code
- 2. From the results, identify specific function/class names
+ 1. Use `codebase_search` to find semantically related code
+ 2. Identify specific function/class/variable names from results
  3. Use grep to find all occurrences of those identifiers
- 4. Combine findings into a comprehensive answer
+ 4. Combine into a comprehensive answer
+
+ Parse optional parameters from input:
+ - `limit=N` → limit semantic results
+ - `type=X` or "functions"/"classes" → filter chunk type
+ - `dir=X` → filter directory
+
+ Examples:
+ - `/find error handling middleware`
+ - `/find payment validation type=function`
+ - `/find user auth in src/services`
 
- If the semantic index doesn't exist, run `index_codebase` first.
+ If no index exists, run `index_codebase` first.
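The `limit=N` / `type=X` / `dir=X` convention this command describes could be parsed with something like the following — an illustrative sketch, not the plugin's actual parser:

```typescript
// Pull key=value tokens out of the input; whatever remains is the query.
interface FindParams {
  query: string;
  limit?: number;
  type?: string;
  dir?: string;
}

function parseFindInput(input: string): FindParams {
  const params: FindParams = { query: "" };
  const queryWords: string[] = [];
  for (const token of input.trim().split(/\s+/)) {
    const match = token.match(/^(limit|type|dir)=(.+)$/);
    if (!match) {
      queryWords.push(token);
    } else if (match[1] === "limit") {
      params.limit = Number(match[2]);
    } else if (match[1] === "type") {
      params.type = match[2];
    } else {
      params.dir = match[2];
    }
  }
  params.query = queryWords.join(" ");
  return params;
}
```

For example, `/find payment validation type=function` yields a query of `"payment validation"` with a `function` chunk-type filter.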
package/commands/index.md CHANGED
@@ -2,10 +2,20 @@
  description: Index the codebase for semantic search
  ---
 
- Run the `index_codebase` tool to create or update the semantic search index.
+ Run the `index_codebase` tool with these settings:
 
- Show progress and final statistics including:
- - Number of files processed
- - Number of chunks indexed
- - Tokens used
- - Duration
+ User input: $ARGUMENTS
+
+ Parse the input and set tool arguments:
+ - force=true if input contains "force"
+ - estimateOnly=true if input contains "estimate"
+ - verbose=true (always, for detailed output)
+
+ Examples:
+ - `/index` → force=false, estimateOnly=false, verbose=true
+ - `/index force` → force=true, estimateOnly=false, verbose=true
+ - `/index estimate` → force=false, estimateOnly=true, verbose=true
+
+ IMPORTANT: You MUST pass the parsed arguments to `index_codebase`. Do not ignore them.
+
+ Show final statistics including files processed, chunks indexed, tokens used, and duration.
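The flag mapping spelled out in those examples reduces to a couple of word checks — a sketch, assuming the argument names `force`, `estimateOnly`, and `verbose` shown above:

```typescript
// Map the free-form /index input onto index_codebase arguments.
function parseIndexArgs(input: string): { force: boolean; estimateOnly: boolean; verbose: boolean } {
  const words = input.toLowerCase().split(/\s+/);
  return {
    force: words.includes("force"),
    estimateOnly: words.includes("estimate"),
    verbose: true, // always on, per the command description
  };
}
```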
@@ -2,8 +2,23 @@
  description: Search codebase by meaning using semantic search
  ---
 
- Use the `codebase_search` tool to find code related to: $ARGUMENTS
+ Search the codebase using semantic search.
 
- If the index doesn't exist yet, run `index_codebase` first.
+ User input: $ARGUMENTS
 
- Return the most relevant results with file paths and line numbers.
+ The first part is the search query. Look for optional parameters:
+ - `limit=N` or "top N" or "first N" → set limit
+ - `type=X` or mentions "functions"/"classes"/"methods" → set chunkType
+ - `dir=X` or "in folder X" → set directory filter
+ - File extensions like ".ts", "typescript", ".py" → set fileType
+
+ Call `codebase_search` with the parsed arguments.
+
+ Examples:
+ - `/search authentication logic` → query="authentication logic"
+ - `/search error handling limit=5` → query="error handling", limit=5
+ - `/search validation functions` → query="validation", chunkType="function"
+
+ If the index doesn't exist, run `index_codebase` first.
+
+ Return results with file paths and line numbers.
@@ -0,0 +1,15 @@
+ ---
+ description: Check if the codebase is indexed and ready for semantic search
+ ---
+
+ Run the `index_status` tool to check if the codebase index is ready.
+
+ This shows:
+ - Whether the codebase is indexed
+ - Number of indexed chunks
+ - Embedding provider and model being used
+ - Current git branch
+
+ No arguments needed - just run `index_status`.
+
+ If not indexed, suggest running `/index` to create the index.