npm - opencode-codebase-index - Versions diffs - 0.2.5 → 0.3.2 - Mend

opencode-codebase-index 0.2.5 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/README.md +176 -1
package/commands/find.md +17 -5
package/commands/index.md +16 -6
package/commands/search.md +18 -3
package/commands/status.md +15 -0
package/dist/index.cjs +971 -286
package/dist/index.cjs.map +1 -1
package/dist/index.js +970 -286
package/dist/index.js.map +1 -1
package/native/codebase-index-native.darwin-arm64.node +0 -0
package/native/codebase-index-native.darwin-x64.node +0 -0
package/native/codebase-index-native.linux-arm64-gnu.node +0 -0
package/native/codebase-index-native.linux-x64-gnu.node +0 -0
package/native/codebase-index-native.win32-x64-msvc.node +0 -0
package/package.json +3 -1
package/skill/SKILL.md +116 -1

package/README.md CHANGED

@@ -117,6 +117,8 @@ graph TD
 ```
 1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.
+**Supported Languages**: TypeScript, JavaScript, Python, Rust, Go, Java, C#, Ruby, Bash, C, C++, JSON, TOML, YAML
 2. **Chunking**: Large blocks are split with overlapping windows to preserve context across chunk boundaries.
 3. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
 4. **Storage**: Embeddings are stored in SQLite (deduplicated by content hash) and vectors in `usearch` with F16 quantization for 50% memory savings. A branch catalog tracks which chunks exist on each branch.
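
The "50% memory savings" figure in step 4 follows from component width alone: an F16 component takes 2 bytes where an F32 component takes 4. The sketch below works through that arithmetic. The chunk count and the 768-dimension embedding size are assumed values for illustration, not numbers taken from the package, and the calculation covers raw vector data only, not any index overhead.

```typescript
// Bytes needed to hold `vectors` embeddings of `dims` components each.
function vectorBytes(vectors: number, dims: number, bytesPerComponent: number): number {
  return vectors * dims * bytesPerComponent;
}

// Assumed figures for illustration only.
const chunkCount = 10_000;
const dimensions = 768;

const f32Bytes = vectorBytes(chunkCount, dimensions, 4); // 32-bit floats
const f16Bytes = vectorBytes(chunkCount, dimensions, 2); // 16-bit floats
const savedFraction = 1 - f16Bytes / f32Bytes;

console.log({ f32Bytes, f16Bytes, savedFraction });
```

The saved fraction is independent of the chunk count and dimension chosen, since both cancel out of the ratio.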
@@ -196,6 +198,14 @@ Checks if the index is ready and healthy.
 ### `index_health_check`
 Maintenance tool to remove stale entries from deleted files and orphaned embeddings/chunks from the database.
+### `index_metrics`
+Returns collected metrics about indexing and search performance. Requires `debug.enabled` and `debug.metrics` to be `true`.
+- **Metrics include**: Files indexed, chunks created, cache hit rate, search timing breakdown, GC stats, embedding API call stats.
+### `index_logs`
+Returns recent debug logs with optional filtering.
+- **Parameters**: `category` (optional: `search`, `embedding`, `cache`, `gc`, `branch`), `level` (optional: `error`, `warn`, `info`, `debug`), `limit` (default: 50).
 ## 🎮 Slash Commands
 The plugin automatically registers these slash commands:
@@ -205,6 +215,7 @@ The plugin automatically registers these slash commands:
 | `/search <query>` | **Pure Semantic Search**. Best for "How does X work?" |
 | `/find <query>` | **Hybrid Search**. Combines semantic search + grep. Best for "Find usage of X". |
 | `/index` | **Update Index**. Forces a refresh of the codebase index. |
+| `/status` | **Check Status**. Shows if indexed, chunk count, and provider info. |
 ## ⚙️ Configuration
@@ -219,13 +230,21 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
     "watchFiles": true,
     "maxFileSize": 1048576,
     "maxChunksPerFile": 100,
-    "semanticOnly": false
+    "semanticOnly": false,
+    "autoGc": true,
+    "gcIntervalDays": 7,
+    "gcOrphanThreshold": 100
   },
   "search": {
     "maxResults": 20,
     "minScore": 0.1,
     "hybridWeight": 0.5,
     "contextLines": 0
+  },
+  "debug": {
+    "enabled": false,
+    "logLevel": "info",
+    "metrics": false
   }
 }
 ```
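
The new `debug` section interacts with the `index_metrics` tool, which the README says requires both `debug.enabled` and `debug.metrics` to be `true`. A minimal sketch of that rule, assuming user settings are merged over the defaults shown in the JSON above; `resolveDebug` and `metricsActive` are hypothetical helper names, not functions from the package:

```typescript
interface DebugConfig {
  enabled: boolean;
  logLevel: "error" | "warn" | "info" | "debug";
  metrics: boolean;
}

// Defaults as listed in the example configuration.
const DEFAULT_DEBUG: DebugConfig = { enabled: false, logLevel: "info", metrics: false };

// Hypothetical helper: overlay a partial user `debug` section on the defaults.
function resolveDebug(user: Partial<DebugConfig> = {}): DebugConfig {
  return { ...DEFAULT_DEBUG, ...user };
}

// Metrics are available only when both flags are on.
function metricsActive(debug: DebugConfig): boolean {
  return debug.enabled && debug.metrics;
}

console.log(metricsActive(resolveDebug({ metrics: true })));
console.log(metricsActive(resolveDebug({ enabled: true, metrics: true })));
```

Setting `metrics` alone is therefore not enough under this reading; `enabled` acts as the master switch.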
@@ -244,11 +263,23 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
 | `semanticOnly` | `false` | When `true`, only index semantic nodes (functions, classes) and skip generic blocks |
 | `retries` | `3` | Number of retry attempts for failed embedding API calls |
 | `retryDelayMs` | `1000` | Delay between retries in milliseconds |
+| `autoGc` | `true` | Automatically run garbage collection to remove orphaned embeddings/chunks |
+| `gcIntervalDays` | `7` | Run GC on initialization if last GC was more than N days ago |
+| `gcOrphanThreshold` | `100` | Run GC after indexing if orphan count exceeds this threshold |
 | **search** | | |
 | `maxResults` | `20` | Maximum results to return |
 | `minScore` | `0.1` | Minimum similarity score (0-1). Lower = more results |
 | `hybridWeight` | `0.5` | Balance between keyword (1.0) and semantic (0.0) search |
 | `contextLines` | `0` | Extra lines to include before/after each match |
+| **debug** | | |
+| `enabled` | `false` | Enable debug logging and metrics collection |
+| `logLevel` | `"info"` | Log level: `error`, `warn`, `info`, `debug` |
+| `logSearch` | `true` | Log search operations with timing breakdown |
+| `logEmbedding` | `true` | Log embedding API calls (success, error, rate-limit) |
+| `logCache` | `true` | Log cache hits and misses |
+| `logGc` | `true` | Log garbage collection operations |
+| `logBranch` | `true` | Log branch detection and switches |
+| `metrics` | `false` | Enable metrics collection (indexing stats, search timing, cache performance) |
 ### Embedding Providers
 The plugin automatically detects available credentials in this order:
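
The three garbage-collection options added to the table above (`autoGc`, `gcIntervalDays`, `gcOrphanThreshold`) describe two separate triggers: one at initialization and one after indexing. The sketch below is one plausible reading of those descriptions, not the plugin's actual code. The function names are hypothetical, and two behaviors are assumptions: that `autoGc: false` disables both triggers, and that a missing last-GC timestamp counts as overdue.

```typescript
interface GcSettings {
  autoGc: boolean;
  gcIntervalDays: number;
  gcOrphanThreshold: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// On initialization: run GC if the last run was more than N days ago.
function shouldGcOnInit(s: GcSettings, lastGcAt: Date | null, now: Date): boolean {
  if (!s.autoGc) return false;
  if (lastGcAt === null) return true; // assumption: never-run counts as overdue
  return now.getTime() - lastGcAt.getTime() > s.gcIntervalDays * MS_PER_DAY;
}

// After indexing: run GC if the orphan count exceeds the threshold.
function shouldGcAfterIndexing(s: GcSettings, orphanCount: number): boolean {
  return s.autoGc && orphanCount > s.gcOrphanThreshold;
}

const defaults: GcSettings = { autoGc: true, gcIntervalDays: 7, gcOrphanThreshold: 100 };
const eightDaysLater = new Date(8 * MS_PER_DAY);
console.log(shouldGcOnInit(defaults, new Date(0), eightDaysLater));
console.log(shouldGcAfterIndexing(defaults, 101));
```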
@@ -257,6 +288,150 @@ The plugin automatically detects available credentials in this order:
 3. **Google** (Gemini Embeddings)
 4. **Ollama** (Local/Private - requires `nomic-embed-text`)
+### Rate Limits by Provider
+Each provider has different rate limits. The plugin automatically adjusts concurrency and delays:
+| Provider | Concurrency | Delay | Best For |
+|----------|-------------|-------|----------|
+| **GitHub Copilot** | 1 | 4s | Small codebases (<1k files) |
+| **OpenAI** | 3 | 500ms | Medium codebases |
+| **Google** | 5 | 200ms | Medium-large codebases |
+| **Ollama** | 5 | None | Large codebases (10k+ files) |
+**For large codebases**, use Ollama locally to avoid rate limits:
+```bash
+# Install the embedding model
+ollama pull nomic-embed-text
+```
+```json
+// .opencode/codebase-index.json
+{
+  "embeddingProvider": "ollama"
+}
+```
+## 📈 Performance
+The plugin is built for speed with a Rust native module. Here are typical performance numbers (Apple M1):
+### Parsing (tree-sitter)
+| Files | Chunks | Time |
+|-------|--------|------|
+| 100 | 1,200 | ~7ms |
+| 500 | 6,000 | ~32ms |
+### Vector Search (usearch)
+| Index Size | Search Time | Throughput |
+|------------|-------------|------------|
+| 1,000 vectors | 0.7ms | 1,400 ops/sec |
+| 5,000 vectors | 1.2ms | 850 ops/sec |
+| 10,000 vectors | 1.3ms | 780 ops/sec |
+### Database Operations (SQLite with batch)
+| Operation | 1,000 items | 10,000 items |
+|-----------|-------------|--------------|
+| Insert chunks | 4ms | 44ms |
+| Add to branch | 2ms | 22ms |
+| Check embedding exists | <0.01ms | <0.01ms |
+### Batch vs Sequential Performance
+Batch operations provide significant speedups:
+| Operation | Sequential | Batch | Speedup |
+|-----------|------------|-------|---------|
+| Insert 1,000 chunks | 38ms | 4ms | **~10x** |
+| Add 1,000 to branch | 29ms | 2ms | **~14x** |
+| Insert 1,000 embeddings | 59ms | 40ms | **~1.5x** |
+Run benchmarks yourself: `npx tsx benchmarks/run.ts`
+## 🎯 Choosing a Provider
+Use this decision tree to pick the right embedding provider:
+```
+                    ┌─────────────────────────┐
+                    │ Do you have Copilot?    │
+                    └───────────┬─────────────┘
+                          ┌─────┴─────┐
+                         YES          NO
+                          │            │
+              ┌───────────▼───────┐    │
+              │ Codebase < 1k     │    │
+              │ files?            │    │
+              └─────────┬─────────┘    │
+                  ┌─────┴─────┐        │
+                 YES          NO       │
+                  │            │       │
+                  ▼            │       │
+           ┌──────────┐        │       │
+           │ Copilot  │        │       │
+           │ (free)   │        │       │
+           └──────────┘        │       │
+                               ▼       ▼
+                    ┌─────────────────────────┐
+                    │ Need fastest indexing?  │
+                    └───────────┬─────────────┘
+                          ┌─────┴─────┐
+                         YES          NO
+                          │            │
+                          ▼            ▼
+                   ┌──────────┐ ┌──────────────┐
+                   │ Ollama   │ │ OpenAI or    │
+                   │ (local)  │ │ Google       │
+                   └──────────┘ └──────────────┘
+```
+### Provider Comparison
+| Provider | Speed | Cost | Privacy | Best For |
+|----------|-------|------|---------|----------|
+| **Ollama** | Fastest | Free | Full | Large codebases, privacy-sensitive |
+| **GitHub Copilot** | Slow (rate limited) | Free* | Cloud | Small codebases, existing subscribers |
+| **OpenAI** | Medium | ~$0.0001/1K tokens | Cloud | General use |
+| **Google** | Fast | Free tier available | Cloud | Medium-large codebases |
+*Requires active Copilot subscription
+### Setup by Provider
+**Ollama (Recommended for large codebases)**
+```bash
+ollama pull nomic-embed-text
+```
+```json
+{ "embeddingProvider": "ollama" }
+```
+**OpenAI**
+```bash
+export OPENAI_API_KEY=sk-...
+```
+```json
+{ "embeddingProvider": "openai" }
+```
+**Google**
+```bash
+export GOOGLE_API_KEY=...
+```
+```json
+{ "embeddingProvider": "google" }
+```
+**GitHub Copilot**
+No setup needed if you have an active Copilot subscription.
+```json
+{ "embeddingProvider": "github-copilot" }
+```
 ## ⚠️ Tradeoffs
 Be aware of these characteristics:
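
The rate-limit table and the provider decision tree added in this hunk can be written down as data plus a small function, which makes the branching easier to check. This is a sketch of the README's guidance only; `Situation` and `chooseProviders` are hypothetical names, and the plugin's own `auto` detection may work differently.

```typescript
type Provider = "github-copilot" | "openai" | "google" | "ollama";

// Concurrency and delay per provider, copied from the rate-limit table.
const RATE_LIMITS: Record<Provider, { concurrency: number; delayMs: number }> = {
  "github-copilot": { concurrency: 1, delayMs: 4000 },
  openai: { concurrency: 3, delayMs: 500 },
  google: { concurrency: 5, delayMs: 200 },
  ollama: { concurrency: 5, delayMs: 0 },
};

interface Situation {
  hasCopilot: boolean;
  fileCount: number;
  needFastestIndexing: boolean;
}

// Walks the decision tree and returns every provider allowed at that leaf.
function chooseProviders(s: Situation): Provider[] {
  if (s.hasCopilot && s.fileCount < 1000) return ["github-copilot"];
  return s.needFastestIndexing ? ["ollama"] : ["openai", "google"];
}

const picks = chooseProviders({ hasCopilot: true, fileCount: 12_000, needFastestIndexing: true });
console.log(picks, picks.map((p) => RATE_LIMITS[p]));
```

Note that a Copilot subscriber with a codebase of 1k files or more falls through to the same "Need fastest indexing?" question as someone without Copilot.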

package/commands/find.md CHANGED

@@ -2,12 +2,24 @@
 description: Find code using hybrid approach (semantic + grep)
 ---
-Find code related to: $ARGUMENTS
+Find code using both semantic search and grep.
+User input: $ARGUMENTS
 Strategy:
-1. First use `codebase_search` to find semantically related code
-2. From the results, identify specific function/class names
+1. Use `codebase_search` to find semantically related code
+2. Identify specific function/class/variable names from results
 3. Use grep to find all occurrences of those identifiers
-4. Combine findings into a comprehensive answer
+4. Combine into a comprehensive answer
+Parse optional parameters from input:
+- `limit=N` → limit semantic results
+- `type=X` or "functions"/"classes" → filter chunk type
+- `dir=X` → filter directory
+Examples:
+- `/find error handling middleware`
+- `/find payment validation type=function`
+- `/find user auth in src/services`
-If the semantic index doesn't exist, run `index_codebase` first.
+If no index exists, run `index_codebase` first.

package/commands/index.md CHANGED

@@ -2,10 +2,20 @@
 description: Index the codebase for semantic search
 ---
-Run the `index_codebase` tool to create or update the semantic search index.
+Run the `index_codebase` tool with these settings:
-Show progress and final statistics including:
-- Number of files processed
-- Number of chunks indexed
-- Tokens used
-- Duration
+User input: $ARGUMENTS
+Parse the input and set tool arguments:
+- force=true if input contains "force"
+- estimateOnly=true if input contains "estimate"
+- verbose=true (always, for detailed output)
+Examples:
+- `/index` → force=false, estimateOnly=false, verbose=true
+- `/index force` → force=true, estimateOnly=false, verbose=true
+- `/index estimate` → force=false, estimateOnly=true, verbose=true
+IMPORTANT: You MUST pass the parsed arguments to `index_codebase`. Do not ignore them.
+Show final statistics including files processed, chunks indexed, tokens used, and duration.
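
The new `/index` prompt asks the model to map free-form input onto three tool arguments. The command file is a prompt rather than code, so the sketch below only illustrates the intended mapping; `parseIndexInput` is a hypothetical function, and the case-insensitive matching is an assumption.

```typescript
interface IndexArgs {
  force: boolean;
  estimateOnly: boolean;
  verbose: boolean;
}

// Mirrors the rules in the prompt: substring checks for "force" and "estimate",
// with verbose always on.
function parseIndexInput(input: string): IndexArgs {
  const text = input.toLowerCase();
  return {
    force: text.includes("force"),
    estimateOnly: text.includes("estimate"),
    verbose: true,
  };
}

console.log(parseIndexInput(""));
console.log(parseIndexInput("force"));
console.log(parseIndexInput("estimate"));
```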

package/commands/search.md CHANGED

@@ -2,8 +2,23 @@
 description: Search codebase by meaning using semantic search
 ---
-Use the `codebase_search` tool to find code related to: $ARGUMENTS
+Search the codebase using semantic search.
-If the index doesn't exist yet, run `index_codebase` first.
+User input: $ARGUMENTS
-Return the most relevant results with file paths and line numbers.
+The first part is the search query. Look for optional parameters:
+- `limit=N` or "top N" or "first N" → set limit
+- `type=X` or mentions "functions"/"classes"/"methods" → set chunkType
+- `dir=X` or "in folder X" → set directory filter
+- File extensions like ".ts", "typescript", ".py" → set fileType
+Call `codebase_search` with the parsed arguments.
+Examples:
+- `/search authentication logic` → query="authentication logic"
+- `/search error handling limit=5` → query="error handling", limit=5
+- `/search validation functions` → query="validation", chunkType="function"
+If the index doesn't exist, run `index_codebase` first.
+Return results with file paths and line numbers.
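
The explicit `key=value` parameters in the `/search` prompt are regular enough to parse mechanically. The sketch below handles only those forms (`limit=N`, `type=X`, `dir=X`); the natural-language variants such as "top N" or "functions" are left to the model, as the prompt intends. `parseSearchInput` is hypothetical, and `directory` is an assumed argument name, since the prompt says only "set directory filter".

```typescript
interface SearchArgs {
  query: string;
  limit?: number;
  chunkType?: string;
  directory?: string; // assumed name for the directory filter
}

function parseSearchInput(input: string): SearchArgs {
  const args: SearchArgs = { query: "" };
  const queryWords: string[] = [];

  for (const token of input.trim().split(/\s+/)) {
    const match = /^(limit|type|dir)=(.+)$/.exec(token);
    if (!match) {
      queryWords.push(token); // not a parameter, so part of the query
      continue;
    }
    const [, key, value] = match;
    if (key === "limit") {
      const n = Number.parseInt(value, 10);
      // Values that do not parse to a positive integer are dropped silently.
      if (Number.isInteger(n) && n > 0) args.limit = n;
    } else if (key === "type") {
      args.chunkType = value;
    } else {
      args.directory = value;
    }
  }

  args.query = queryWords.join(" ");
  return args;
}

console.log(parseSearchInput("error handling limit=5"));
```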

package/commands/status.md ADDED

@@ -0,0 +1,15 @@
+---
+description: Check if the codebase is indexed and ready for semantic search
+---
+Run the `index_status` tool to check if the codebase index is ready.
+This shows:
+- Whether the codebase is indexed
+- Number of indexed chunks
+- Embedding provider and model being used
+- Current git branch
+No arguments needed - just run `index_status`.
+If not indexed, suggest running `/index` to create the index.