opencode-codebase-index 0.1.10 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Kenneth Helweg
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md CHANGED
@@ -14,6 +14,7 @@
 
  - 🧠 **Semantic Search**: Finds "user authentication" logic even if the function is named `check_creds`.
  - ⚡ **Blazing Fast Indexing**: Powered by a Rust native module using `tree-sitter` and `usearch`. Incremental updates take milliseconds.
+ - 🌿 **Branch-Aware**: Seamlessly handles git branch switches — reuses embeddings, filters stale results.
  - 🔒 **Privacy Focused**: Your vector index is stored locally in your project.
  - 🔌 **Model Agnostic**: Works out-of-the-box with GitHub Copilot, OpenAI, Gemini, or local Ollama models.
 
@@ -31,11 +32,12 @@
  }
  ```
 
- 3. **Start Searching**
- Load OpenCode and ask:
- > "Find the function that handles credit card validation errors"
+ 3. **Index your codebase**
+ Run `/index` or ask the agent to index your codebase. This only needs to be done once — subsequent updates are incremental.
 
- *The plugin will automatically index your codebase on the first run.*
+ 4. **Start Searching**
+ Ask:
+ > "Find the function that handles credit card validation errors"
 
  ## 🔍 See It In Action
 
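Step 3 says indexing is done once and later updates are incremental. A minimal sketch of hash-based change detection follows, assuming a stored map from file path to content hash like the `file-hashes.json` listed under Storage Structure. `diffFileHashes` is a hypothetical helper, not the plugin's API, and SHA-256 from `node:crypto` stands in for the xxhash the native module uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical store shape: relative path -> content hash (cf. file-hashes.json).
type HashStore = Map<string, string>;

interface ChangeSet {
  added: string[];
  modified: string[];
  removed: string[];
  unchanged: string[];
}

// SHA-256 stands in for xxhash; any stable content hash works for change detection.
function hashContent(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Classify files against the previous run. Only `added` and `modified`
// files need new embeddings; `removed` files need their chunks dropped.
function diffFileHashes(
  previous: HashStore,
  current: Map<string, string>, // path -> file content
): { changes: ChangeSet; next: HashStore } {
  const changes: ChangeSet = { added: [], modified: [], removed: [], unchanged: [] };
  const next: HashStore = new Map();

  for (const [path, content] of current) {
    const hash = hashContent(content);
    next.set(path, hash);
    const old = previous.get(path);
    if (old === undefined) changes.added.push(path);
    else if (old !== hash) changes.modified.push(path);
    else changes.unchanged.push(path);
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) changes.removed.push(path);
  }
  return { changes, next };
}

// Usage: the first run sees every file as added; the second sees only the edits
// (a.ts modified, b.ts removed, c.ts added).
const first = diffFileHashes(new Map(), new Map([["a.ts", "const a = 1;"], ["b.ts", "const b = 1;"]]));
const second = diffFileHashes(first.next, new Map([["a.ts", "const a = 2;"], ["c.ts", "const c = 1;"]]));
console.log(second.changes);
```

Only the `added` and `modified` paths would be sent to the embedding provider, which is what keeps re-runs cheap.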
@@ -68,6 +70,28 @@ src/api/checkout.ts:89 (Route handler for /pay)
 
  **Rule of thumb**: Semantic search for discovery → grep for precision.
 
+ ## 📊 Token Usage
+
+ In our testing across open-source codebases (axios, express), we observed **up to 90% reduction in token usage** for conceptual queries like *"find the error handling middleware"*.
+
+ ### Why It Saves Tokens
+
+ - **Without plugin**: Agent explores files, reads code, backtracks, explores more
+ - **With plugin**: Semantic search returns relevant code immediately → less exploration
+
+ ### Key Takeaways
+
+ 1. **Significant savings possible**: Up to 90% reduction in the best cases
+ 2. **Results vary**: Savings depend on query type, codebase structure, and agent behavior
+ 3. **Best for discovery**: Conceptual queries benefit most; exact identifier lookups should use grep
+ 4. **Complements existing tools**: Provides a faster initial signal, doesn't replace grep/explore
+
+ ### When the Plugin Helps Most
+
+ - **Conceptual queries**: "Where is the authentication logic?" (no keywords to grep for)
+ - **Unfamiliar codebases**: You don't know what to search for yet
+ - **Large codebases**: Semantic search scales better than exhaustive exploration
+
  ## 🛠️ How It Works
 
  ```mermaid
@@ -75,25 +99,72 @@ graph TD
  subgraph Indexing
  A[Source Code] -->|Tree-sitter| B[Semantic Chunks]
  B -->|Embedding Model| C[Vectors]
- C -->|uSearch| D[(Local Vector Store)]
+ C -->|uSearch| D[(Vector Store)]
+ C -->|SQLite| G[(Embeddings DB)]
+ B -->|BM25| E[(Inverted Index)]
+ B -->|Branch Catalog| G
  end
 
  subgraph Searching
  Q[User Query] -->|Embedding Model| V[Query Vector]
  V -->|Cosine Similarity| D
- D --> R[Ranked Results]
+ Q -->|BM25| E
+ G -->|Branch Filter| F
+ D --> F[Hybrid Fusion]
+ E --> F
+ F --> R[Ranked Results]
  end
  ```
 
- 1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces).
- 2. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
- 3. **Storage**: Vectors are stored in a high-performance local index using `usearch`.
- 4. **Search**: Your natural language queries are matched against this index to find the most semantically relevant code.
+ 1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.
+ 2. **Chunking**: Large blocks are split with overlapping windows to preserve context across chunk boundaries.
+ 3. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
+ 4. **Storage**: Embeddings are stored in SQLite (deduplicated by content hash) and vectors in `usearch` with F16 quantization for 50% memory savings. A branch catalog tracks which chunks exist on each branch.
+ 5. **Hybrid Search**: Combines semantic similarity (vectors) with BM25 keyword matching, filtered by current branch.
 
  **Performance characteristics:**
  - **Incremental indexing**: ~50ms check time — only re-embeds changed files
- - **Smart chunking**: Understands code structure to keep functions whole
+ - **Smart chunking**: Understands code structure to keep functions whole, with overlap for context
  - **Native speed**: Core logic written in Rust for maximum performance
+ - **Memory efficient**: F16 vector quantization reduces index size by 50%
+ - **Branch-aware**: Automatically tracks which chunks exist on each git branch
+
+ ## 🌿 Branch-Aware Indexing
+
+ The plugin automatically detects git branches and optimizes indexing across branch switches.
+
+ ### How It Works
+
+ When you switch branches, code changes but embeddings for unchanged content remain the same. The plugin:
+
+ 1. **Stores embeddings by content hash**: Embeddings are deduplicated across branches
+ 2. **Tracks branch membership**: A lightweight catalog tracks which chunks exist on each branch
+ 3. **Filters search results**: Queries only return results relevant to the current branch
+
+ ### Benefits
+
+ | Scenario | Without Branch Awareness | With Branch Awareness |
+ |----------|-------------------------|----------------------|
+ | Switch to feature branch | Re-index everything | Instant — reuse existing embeddings |
+ | Return to main | Re-index everything | Instant — catalog already exists |
+ | Search on branch | May return stale results | Only returns current branch's code |
+
+ ### Automatic Behavior
+
+ - **Branch detection**: Automatically reads from `.git/HEAD`
+ - **Re-indexing on switch**: Triggers when you switch branches (via file watcher)
+ - **Legacy migration**: Automatically migrates old indexes on first run
+ - **Garbage collection**: Health check removes orphaned embeddings and chunks
+
+ ### Storage Structure
+
+ ```
+ .opencode/index/
+ ├── codebase.db # SQLite: embeddings, chunks, branch catalog
+ ├── vectors.usearch # Vector index (uSearch)
+ ├── inverted-index.json # BM25 keyword index
+ └── file-hashes.json # File change detection
+ ```
 
  ## 🧰 Tools Available
 
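The Chunking step splits large blocks with overlapping windows, and the `maxChunksPerFile` setting in the configuration table caps how many chunks one file may yield. A minimal line-based sketch of both ideas follows. It assumes fixed line windows; the diff does not state window or overlap sizes, the plugin actually chunks tree-sitter nodes in Rust, and `chunkWithOverlap` is a hypothetical name.

```typescript
// Hypothetical chunk shape: 1-based, inclusive line range plus its text.
interface Chunk {
  startLine: number;
  endLine: number;
  text: string;
}

// Split a block into fixed-size line windows that overlap by `overlap` lines,
// stopping once `maxChunksPerFile` chunks exist (mirrors the config option's intent).
function chunkWithOverlap(source: string, windowLines = 40, overlap = 10, maxChunksPerFile = 100): Chunk[] {
  if (windowLines <= 0 || overlap < 0 || overlap >= windowLines) {
    throw new Error("need windowLines > overlap >= 0");
  }
  const lines = source.split("\n");
  const step = windowLines - overlap; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length && chunks.length < maxChunksPerFile; start += step) {
    const end = Math.min(start + windowLines, lines.length);
    chunks.push({ startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
    if (end === lines.length) break; // this window already reaches the end
  }
  return chunks;
}

// Usage: 100 lines in 40-line windows overlapping by 10 start at lines 1, 31 and 61.
const block = Array.from({ length: 100 }, (_, i) => `line ${i + 1}`).join("\n");
console.log(chunkWithOverlap(block).map((c) => `${c.startLine}-${c.endLine}`)); // [ '1-40', '31-70', '61-100' ]
console.log(chunkWithOverlap(block, 40, 10, 2).length); // 2: capped by maxChunksPerFile
```

Each window repeats the last 10 lines of the previous one, so code near a boundary appears with surrounding context in both chunks, at the cost of embedding those lines twice.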
@@ -117,22 +188,17 @@ The plugin exposes these tools to the OpenCode agent:
  ### `index_codebase`
  Manually trigger indexing.
  - **Use for**: Forcing a re-index or checking stats.
- - **Parameters**: `force` (rebuild all), `estimateOnly` (check costs).
+ - **Parameters**: `force` (rebuild all), `estimateOnly` (check costs), `verbose` (show skipped files and parse failures).
 
  ### `index_status`
  Checks if the index is ready and healthy.
 
  ### `index_health_check`
- Maintenance tool to remove stale entries from deleted files.
+ Maintenance tool to remove stale entries from deleted files and orphaned embeddings/chunks from the database.
 
  ## 🎮 Slash Commands
 
- For easier access, you can add slash commands to your project.
-
- Copy the commands:
- ```bash
- cp -r node_modules/opencode-codebase-index/commands/* .opencode/command/
- ```
+ The plugin automatically registers these slash commands:
 
  | Command | Description |
  | ------- | ----------- |
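The updated `index_health_check` description, together with the Branch-Aware Indexing section, implies three pieces of state: embeddings keyed by content hash, chunks that point at them, and a per-branch catalog of chunk ids. A hypothetical in-memory model of that state and of the orphan cleanup is sketched below. The plugin keeps this data in SQLite (`codebase.db`) with a schema this diff does not show, so `IndexStore`, `addChunk` and `collectGarbage` are illustrative names, and SHA-256 stands in for its hashing.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory stand-in for the SQLite tables:
// embeddings keyed by content hash, chunks pointing at a hash,
// and a branch catalog listing the chunk ids present on each branch.
class IndexStore {
  readonly embeddings = new Map<string, number[]>(); // contentHash -> vector
  readonly chunks = new Map<string, string>(); // chunkId -> contentHash
  readonly catalog = new Map<string, Set<string>>(); // branch -> chunkIds

  // Record a chunk on a branch. The chunk id combines path and content hash,
  // so identical content on two branches maps to the same chunk and embedding.
  // Returns true only when a new embedding had to be computed.
  addChunk(branch: string, path: string, text: string, embed: (t: string) => number[]): boolean {
    const hash = createHash("sha256").update(text).digest("hex");
    const chunkId = `${path}@${hash.slice(0, 12)}`;
    this.chunks.set(chunkId, hash);
    const onBranch = this.catalog.get(branch) ?? new Set<string>();
    onBranch.add(chunkId);
    this.catalog.set(branch, onBranch);
    if (this.embeddings.has(hash)) return false; // reuse: no API call
    this.embeddings.set(hash, embed(text));
    return true;
  }

  // Orphan cleanup: drop chunks no branch references, then embeddings no chunk uses.
  collectGarbage(): { chunks: number; embeddings: number } {
    const live = new Set<string>();
    for (const ids of this.catalog.values()) for (const id of ids) live.add(id);

    let chunks = 0;
    for (const id of [...this.chunks.keys()]) {
      if (!live.has(id)) {
        this.chunks.delete(id);
        chunks++;
      }
    }
    const used = new Set(this.chunks.values());
    let embeddings = 0;
    for (const hash of [...this.embeddings.keys()]) {
      if (!used.has(hash)) {
        this.embeddings.delete(hash);
        embeddings++;
      }
    }
    return { chunks, embeddings };
  }
}

// Usage with a fake embedder that counts calls instead of hitting an API.
let calls = 0;
const fakeEmbed = (t: string): number[] => {
  calls++;
  return [t.length];
};
const store = new IndexStore();
store.addChunk("main", "auth.ts", "function login() {}", fakeEmbed);
store.addChunk("feature", "auth.ts", "function login() {}", fakeEmbed); // same content: reused
store.addChunk("feature", "auth.ts", "function logout() {}", fakeEmbed);
console.log(calls); // 2
store.catalog.delete("feature"); // e.g. the feature branch was deleted
console.log(store.collectGarbage()); // { chunks: 1, embeddings: 1 }
```

Keying embeddings by content rather than by file or branch is what lets a branch switch reuse work: only content never seen on any branch costs an embedding call.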
@@ -151,7 +217,9 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
  "indexing": {
  "autoIndex": false,
  "watchFiles": true,
- "maxFileSize": 1048576
+ "maxFileSize": 1048576,
+ "maxChunksPerFile": 100,
+ "semanticOnly": false
  },
  "search": {
  "maxResults": 20,
@@ -172,6 +240,10 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
  | `autoIndex` | `false` | Automatically index on plugin load |
  | `watchFiles` | `true` | Re-index when files change |
  | `maxFileSize` | `1048576` | Skip files larger than this (bytes). Default: 1MB |
+ | `maxChunksPerFile` | `100` | Maximum chunks to index per file (controls token costs for large files) |
+ | `semanticOnly` | `false` | When `true`, only index semantic nodes (functions, classes) and skip generic blocks |
+ | `retries` | `3` | Number of retry attempts for failed embedding API calls |
+ | `retryDelayMs` | `1000` | Delay between retries in milliseconds |
  | **search** | | |
  | `maxResults` | `20` | Maximum results to return |
  | `minScore` | `0.1` | Minimum similarity score (0-1). Lower = more results |
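The new `retries` and `retryDelayMs` rows describe retrying failed embedding calls after a delay. A minimal sketch of such a wrapper follows, assuming `retries` counts extra attempts after the first call and that the delay is fixed (the table gives a single delay value and does not say whether it grows). `withRetry` is a hypothetical helper, not the plugin's code.

```typescript
// Hypothetical options mirroring the documented `retries` / `retryDelayMs` settings.
interface RetryOptions {
  retries: number; // assumed: extra attempts after the first call
  retryDelayMs: number; // assumed: fixed wait between attempts
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `fn`; on failure wait `retryDelayMs` and try again, up to `retries` more times.
// If every attempt fails, the last error is rethrown to the caller.
async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = { retries: 3, retryDelayMs: 1000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.retries) await sleep(opts.retryDelayMs);
    }
  }
  throw lastError;
}

// Usage: a stand-in embedding call that fails twice (e.g. rate limited), then succeeds.
let attempts = 0;
const flakyEmbed = async (): Promise<number[]> => {
  attempts++;
  if (attempts < 3) throw new Error("HTTP 429");
  return [0.1, 0.2];
};
withRetry(flakyEmbed, { retries: 3, retryDelayMs: 10 }).then((vec) => console.log(attempts, vec)); // 3 [ 0.1, 0.2 ]
```

A production client would often add exponential backoff and retry only transient failures such as HTTP 429 or 5xx; the fixed delay here simply mirrors the single `retryDelayMs` value.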
@@ -204,19 +276,16 @@ Be aware of these characteristics:
  npm run build
  ```
 
- 2. **Deploy to OpenCode Cache**:
- ```bash
- # Deploy script
- rm -rf ~/.cache/opencode/node_modules/opencode-codebase-index
- mkdir -p ~/.cache/opencode/node_modules/opencode-codebase-index
- cp -R dist native commands skill package.json ~/.cache/opencode/node_modules/opencode-codebase-index/
+ 2. **Register in Test Project** (use `file://` URL in `opencode.json`):
+ ```json
+ {
+ "plugin": [
+ "file:///path/to/opencode-codebase-index"
+ ]
+ }
  ```
-
- 3. **Register in Test Project**:
- ```bash
- mkdir -p .opencode/plugin
- echo 'export { default } from "$HOME/.cache/opencode/node_modules/opencode-codebase-index/dist/index.js"' > .opencode/plugin/codebase-index.ts
- ```
+
+ This loads directly from your source directory, so changes take effect after rebuilding.
 
  ## 🤝 Contributing
 
@@ -237,12 +306,13 @@ CI will automatically run tests and type checking on your PR.
  │ ├── config/ # Configuration schema
  │ ├── embeddings/ # Provider detection and API calls
  │ ├── indexer/ # Core indexing logic + inverted index
+ │ ├── git/ # Git utilities (branch detection)
  │ ├── tools/ # OpenCode tool definitions
  │ ├── utils/ # File collection, cost estimation
  │ ├── native/ # Rust native module wrapper
- │ └── watcher/ # File change watcher
+ │ └── watcher/ # File/git change watcher
  ├── native/
- │ └── src/ # Rust: tree-sitter, usearch, xxhash
+ │ └── src/ # Rust: tree-sitter, usearch, xxhash, SQLite
  ├── tests/ # Unit tests (vitest)
  ├── commands/ # Slash command definitions
  ├── skill/ # Agent skill guidance
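The new `git/` module handles branch detection, and the Automatic Behavior list says the branch is read from `.git/HEAD`. A minimal sketch of that parsing follows, assuming an ordinary repository where `.git` is a directory (worktrees and submodules, where `.git` is a file, are not handled). `parseHead` and `detectBranch` are hypothetical names, not the plugin's exports.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Result of reading HEAD: a branch name, or a commit id when HEAD is detached.
type HeadState = { kind: "branch"; name: string } | { kind: "detached"; sha: string };

// Parse the contents of .git/HEAD. On a branch it reads "ref: refs/heads/<name>";
// when detached it holds a 40-char (SHA-1) or 64-char (SHA-256) commit id.
function parseHead(contents: string): HeadState | null {
  const text = contents.trim();
  const ref = /^ref:\s*refs\/heads\/(.+)$/.exec(text);
  if (ref) return { kind: "branch", name: ref[1] };
  if (/^[0-9a-f]{40}([0-9a-f]{24})?$/.test(text)) return { kind: "detached", sha: text };
  return null;
}

// Read HEAD from a repository root; returns null if it cannot be read
// (no repository, or .git is a file as in worktrees and submodules).
function detectBranch(repoRoot: string): HeadState | null {
  try {
    return parseHead(readFileSync(join(repoRoot, ".git", "HEAD"), "utf8"));
  } catch {
    return null;
  }
}

console.log(parseHead("ref: refs/heads/feature/login\n")); // { kind: 'branch', name: 'feature/login' }
console.log(detectBranch(process.cwd()));
```

Returning a tagged union rather than a bare string keeps detached-HEAD checkouts from being mistaken for a branch whose name happens to be a commit id.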
@@ -252,8 +322,10 @@ CI will automatically run tests and type checking on your PR.
  ### Native Module
 
  The Rust native module handles performance-critical operations:
- - **tree-sitter**: Language-aware code parsing
- - **usearch**: High-performance vector similarity search
+ - **tree-sitter**: Language-aware code parsing with JSDoc/docstring extraction
+ - **usearch**: High-performance vector similarity search with F16 quantization
+ - **SQLite**: Persistent storage for embeddings, chunks, and branch catalog
+ - **BM25 inverted index**: Fast keyword search for hybrid retrieval
  - **xxhash**: Fast content hashing for change detection
 
  Rebuild with: `npm run build:native` (requires Rust toolchain)
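The native module now pairs `usearch` vector search with a BM25 inverted index for hybrid retrieval, and the search diagram feeds both into a fusion step filtered by the branch catalog. The diff does not say how the two result lists are combined, so this sketch assumes Reciprocal Rank Fusion (RRF), which merges ranks rather than raw scores and therefore needs no normalization between cosine and BM25 values. It is TypeScript rather than the Rust implementation; `Bm25Index`, `fuse`, the `k1`/`b` defaults (1.2, 0.75) and the RRF constant `k = 60` are conventional choices, not the plugin's documented values.

```typescript
interface Hit {
  chunkId: string;
  score: number;
}

// Minimal Okapi BM25 over an in-memory inverted index.
class Bm25Index {
  private readonly postings = new Map<string, Map<string, number>>(); // term -> (chunkId -> tf)
  private readonly docLengths = new Map<string, number>(); // chunkId -> token count
  private readonly k1: number;
  private readonly b: number;

  constructor(k1 = 1.2, b = 0.75) {
    this.k1 = k1;
    this.b = b;
  }

  // Naive tokenizer: lowercase and split on non-alphanumerics (camelCase is not split).
  static tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  }

  // Assumes each chunkId is added once.
  add(chunkId: string, text: string): void {
    const tokens = Bm25Index.tokenize(text);
    this.docLengths.set(chunkId, tokens.length);
    for (const t of tokens) {
      const posting = this.postings.get(t) ?? new Map<string, number>();
      posting.set(chunkId, (posting.get(chunkId) ?? 0) + 1);
      this.postings.set(t, posting);
    }
  }

  search(query: string): Hit[] {
    const n = this.docLengths.size;
    let total = 0;
    for (const len of this.docLengths.values()) total += len;
    const avgLen = n > 0 ? total / n : 1;
    const scores = new Map<string, number>();
    for (const term of new Set(Bm25Index.tokenize(query))) {
      const posting = this.postings.get(term);
      if (!posting) continue;
      // IDF with +1 inside the log so it stays positive for common terms.
      const idf = Math.log(1 + (n - posting.size + 0.5) / (posting.size + 0.5));
      for (const [chunkId, tf] of posting) {
        const lenNorm = 1 - this.b + this.b * (this.docLengths.get(chunkId)! / avgLen);
        const termScore = (idf * tf * (this.k1 + 1)) / (tf + this.k1 * lenNorm);
        scores.set(chunkId, (scores.get(chunkId) ?? 0) + termScore);
      }
    }
    return [...scores].map(([chunkId, score]) => ({ chunkId, score })).sort((x, y) => y.score - x.score);
  }
}

// Reciprocal Rank Fusion: each best-first list adds 1 / (k + rank) per chunk (rank is 1-based),
// then chunks missing from the current branch are dropped before truncating.
function fuse(lists: Hit[][], branchChunks: Set<string>, maxResults = 20, k = 60): Hit[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, i) => fused.set(hit.chunkId, (fused.get(hit.chunkId) ?? 0) + 1 / (k + i + 1)));
  }
  return [...fused]
    .filter(([id]) => branchChunks.has(id))
    .map(([chunkId, score]) => ({ chunkId, score }))
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}

// Usage: keyword hits from BM25 plus hypothetical best-first vector hits.
const bm25 = new Bm25Index();
bm25.add("pay.ts#charge", "function chargeCard(card) { if (!card.valid) throw new CardError() }");
bm25.add("auth.ts#login", "function login(user, password) { return checkPassword(password) }");
bm25.add("old.ts#charge", "function legacyCharge(card) { return card }");
const vectorHits: Hit[] = [
  { chunkId: "old.ts#charge", score: 0.81 },
  { chunkId: "pay.ts#charge", score: 0.79 },
  { chunkId: "auth.ts#login", score: 0.42 },
];
const keywordHits = bm25.search("card valid");
const onBranch = new Set(["pay.ts#charge", "auth.ts#login"]); // old.ts is not on this branch
console.log(fuse([vectorHits, keywordHits], onBranch).map((h) => h.chunkId)); // [ 'pay.ts#charge', 'auth.ts#login' ]
```

Filtering by branch before `slice` keeps stale chunks from occupying slots in the top `maxResults`.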