opencode-codebase-index 0.1.10 → 0.2.0
- package/LICENSE +21 -0
- package/README.md +108 -36
- package/dist/index.cjs +525 -136
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +527 -138
- package/dist/index.js.map +1 -1
- package/native/codebase-index-native.darwin-arm64.node +0 -0
- package/native/codebase-index-native.darwin-x64.node +0 -0
- package/native/codebase-index-native.linux-arm64-gnu.node +0 -0
- package/native/codebase-index-native.linux-x64-gnu.node +0 -0
- package/native/codebase-index-native.win32-x64-msvc.node +0 -0
- package/package.json +1 -1
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Kenneth Helweg
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
CHANGED
|
@@ -14,6 +14,7 @@
|
|
|
14
14
|
|
|
15
15
|
- 🧠 **Semantic Search**: Finds "user authentication" logic even if the function is named `check_creds`.
|
|
16
16
|
- ⚡ **Blazing Fast Indexing**: Powered by a Rust native module using `tree-sitter` and `usearch`. Incremental updates take milliseconds.
|
|
17
|
+
- 🌿 **Branch-Aware**: Seamlessly handles git branch switches — reuses embeddings, filters stale results.
|
|
17
18
|
- 🔒 **Privacy Focused**: Your vector index is stored locally in your project.
|
|
18
19
|
- 🔌 **Model Agnostic**: Works out-of-the-box with GitHub Copilot, OpenAI, Gemini, or local Ollama models.
|
|
19
20
|
|
|
@@ -31,11 +32,12 @@
|
|
|
31
32
|
}
|
|
32
33
|
```
|
|
33
34
|
|
|
34
|
-
3. **
|
|
35
|
-
|
|
36
|
-
> "Find the function that handles credit card validation errors"
|
|
35
|
+
3. **Index your codebase**
|
|
36
|
+
Run `/index` or ask the agent to index your codebase. This only needs to be done once — subsequent updates are incremental.
|
|
37
37
|
|
|
38
|
-
|
|
38
|
+
4. **Start Searching**
|
|
39
|
+
Ask:
|
|
40
|
+
> "Find the function that handles credit card validation errors"
|
|
39
41
|
|
|
40
42
|
## 🔍 See It In Action
|
|
41
43
|
|
|
@@ -68,6 +70,28 @@ src/api/checkout.ts:89 (Route handler for /pay)
|
|
|
68
70
|
|
|
69
71
|
**Rule of thumb**: Semantic search for discovery → grep for precision.
|
|
70
72
|
|
|
73
|
+
## 📊 Token Usage
|
|
74
|
+
|
|
75
|
+
In our testing across open-source codebases (axios, express), we observed **up to 90% reduction in token usage** for conceptual queries like *"find the error handling middleware"*.
|
|
76
|
+
|
|
77
|
+
### Why It Saves Tokens
|
|
78
|
+
|
|
79
|
+
- **Without plugin**: Agent explores files, reads code, backtracks, explores more
|
|
80
|
+
- **With plugin**: Semantic search returns relevant code immediately → less exploration
|
|
81
|
+
|
|
82
|
+
### Key Takeaways
|
|
83
|
+
|
|
84
|
+
1. **Significant savings possible**: Up to 90% reduction in the best cases
|
|
85
|
+
2. **Results vary**: Savings depend on query type, codebase structure, and agent behavior
|
|
86
|
+
3. **Best for discovery**: Conceptual queries benefit most; exact identifier lookups should use grep
|
|
87
|
+
4. **Complements existing tools**: Provides a faster initial signal, doesn't replace grep/explore
|
|
88
|
+
|
|
89
|
+
### When the Plugin Helps Most
|
|
90
|
+
|
|
91
|
+
- **Conceptual queries**: "Where is the authentication logic?" (no keywords to grep for)
|
|
92
|
+
- **Unfamiliar codebases**: You don't know what to search for yet
|
|
93
|
+
- **Large codebases**: Semantic search scales better than exhaustive exploration
|
|
94
|
+
|
|
71
95
|
## 🛠️ How It Works
|
|
72
96
|
|
|
73
97
|
```mermaid
|
|
@@ -75,25 +99,72 @@ graph TD
|
|
|
75
99
|
subgraph Indexing
|
|
76
100
|
A[Source Code] -->|Tree-sitter| B[Semantic Chunks]
|
|
77
101
|
B -->|Embedding Model| C[Vectors]
|
|
78
|
-
C -->|uSearch| D[(
|
|
102
|
+
C -->|uSearch| D[(Vector Store)]
|
|
103
|
+
C -->|SQLite| G[(Embeddings DB)]
|
|
104
|
+
B -->|BM25| E[(Inverted Index)]
|
|
105
|
+
B -->|Branch Catalog| G
|
|
79
106
|
end
|
|
80
107
|
|
|
81
108
|
subgraph Searching
|
|
82
109
|
Q[User Query] -->|Embedding Model| V[Query Vector]
|
|
83
110
|
V -->|Cosine Similarity| D
|
|
84
|
-
|
|
111
|
+
Q -->|BM25| E
|
|
112
|
+
G -->|Branch Filter| F
|
|
113
|
+
D --> F[Hybrid Fusion]
|
|
114
|
+
E --> F
|
|
115
|
+
F --> R[Ranked Results]
|
|
85
116
|
end
|
|
86
117
|
```
|
|
87
118
|
|
|
88
|
-
1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces).
|
|
89
|
-
2. **
|
|
90
|
-
3. **
|
|
91
|
-
4. **
|
|
119
|
+
1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.
|
|
120
|
+
2. **Chunking**: Large blocks are split with overlapping windows to preserve context across chunk boundaries.
|
|
121
|
+
3. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
|
|
122
|
+
4. **Storage**: Embeddings are stored in SQLite (deduplicated by content hash) and vectors in `usearch` with F16 quantization for 50% memory savings. A branch catalog tracks which chunks exist on each branch.
|
|
123
|
+
5. **Hybrid Search**: Combines semantic similarity (vectors) with BM25 keyword matching, filtered by current branch.
|
|
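Note on step 5 (Hybrid Search): the README names a "Hybrid Fusion" stage but does not say how the vector and BM25 rankings are merged. The sketch below uses reciprocal rank fusion (RRF) purely as an illustrative stand-in; `fuseRankings`, `isOnCurrentBranch`, and the constant `k = 60` are assumptions, not the plugin's API.

```typescript
// Hypothetical sketch: merge a semantic ranking and a BM25 ranking with
// reciprocal rank fusion (RRF). The fusion method is an assumption; the
// README only says the two signals are combined and branch-filtered.
type ChunkId = string;

function fuseRankings(
  semanticRanking: ChunkId[], // best match first (e.g. by cosine similarity)
  keywordRanking: ChunkId[],  // best match first (e.g. by BM25 score)
  isOnCurrentBranch: (id: ChunkId) => boolean = () => true,
  k = 60,                     // damping constant commonly used with RRF
): ChunkId[] {
  const scores = new Map<ChunkId, number>();
  for (const ranking of [semanticRanking, keywordRanking]) {
    ranking.forEach((id, index) => {
      // Drop chunks that the branch catalog says are not on this branch.
      if (!isOnCurrentBranch(id)) return;
      // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const fused = fuseRankings(
  ["auth.ts:12", "login.ts:40", "session.ts:7"],
  ["login.ts:40", "token.ts:3"],
);
console.log(fused);
```

A chunk that appears in both lists (`login.ts:40` here) accumulates two contributions, so it outranks chunks found by only one signal.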
92
124
|
|
|
93
125
|
**Performance characteristics:**
|
|
94
126
|
- **Incremental indexing**: ~50ms check time — only re-embeds changed files
|
|
95
|
-
- **Smart chunking**: Understands code structure to keep functions whole
|
|
127
|
+
- **Smart chunking**: Understands code structure to keep functions whole, with overlap for context
|
|
96
128
|
- **Native speed**: Core logic written in Rust for maximum performance
|
|
129
|
+
- **Memory efficient**: F16 vector quantization reduces index size by 50%
|
|
130
|
+
- **Branch-aware**: Automatically tracks which chunks exist on each git branch
|
|
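Note on "Smart chunking ... with overlap for context": the README does not give window or overlap sizes. As a minimal sketch, assuming line-based windows, overlapping chunking could look like this; `splitWithOverlap` and its parameters are hypothetical names.

```typescript
// Hypothetical sketch: split a large block into fixed-size line windows where
// each window repeats the last `overlap` lines of the previous one.
function splitWithOverlap(
  lines: string[],
  windowSize: number,
  overlap: number,
): string[][] {
  if (windowSize <= 0) {
    throw new RangeError("windowSize must be positive");
  }
  if (overlap < 0 || overlap >= windowSize) {
    throw new RangeError("overlap must be in [0, windowSize)");
  }
  const step = windowSize - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < lines.length; start += step) {
    chunks.push(lines.slice(start, start + windowSize));
    // Stop once a window has reached the end of the block.
    if (start + windowSize >= lines.length) break;
  }
  return chunks;
}

const sampleLines = Array.from({ length: 10 }, (_, i) => `line ${i + 1}`);
const sampleChunks = splitWithOverlap(sampleLines, 4, 1);
console.log(sampleChunks.map((c) => `${c[0]}..${c[c.length - 1]}`));
```

With a window of 4 and an overlap of 1, the boundary line of each chunk is repeated at the start of the next, so context is preserved across the split.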
131
|
+
|
|
132
|
+
## 🌿 Branch-Aware Indexing
|
|
133
|
+
|
|
134
|
+
The plugin automatically detects git branches and optimizes indexing across branch switches.
|
|
135
|
+
|
|
136
|
+
### How It Works
|
|
137
|
+
|
|
138
|
+
When you switch branches, code changes but embeddings for unchanged content remain the same. The plugin:
|
|
139
|
+
|
|
140
|
+
1. **Stores embeddings by content hash**: Embeddings are deduplicated across branches
|
|
141
|
+
2. **Tracks branch membership**: A lightweight catalog tracks which chunks exist on each branch
|
|
142
|
+
3. **Filters search results**: Queries only return results relevant to the current branch
|
|
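Note on the three points above: the idea of keying embeddings by content hash and tracking branch membership separately can be sketched in memory as follows. This is an illustration only: the plugin stores this data in SQLite and hashes with xxhash, whereas the sketch uses `Map`/`Set` and FNV-1a as dependency-free stand-ins, and `BranchAwareStore` is a hypothetical name.

```typescript
// FNV-1a (32-bit), used here only as a stand-in for xxhash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical in-memory model of "embeddings by content hash" plus a
// per-branch catalog of which hashes exist on each branch.
class BranchAwareStore {
  private embeddings = new Map<string, number[]>(); // content hash -> vector
  private catalog = new Map<string, Set<string>>(); // branch -> content hashes
  private embed: (text: string) => number[];
  embedCalls = 0;

  constructor(embed: (text: string) => number[]) {
    this.embed = embed;
  }

  indexChunk(branch: string, content: string): string {
    const hash = fnv1a(content);
    if (!this.embeddings.has(hash)) {
      // Only unseen content costs an embedding call.
      this.embeddings.set(hash, this.embed(content));
      this.embedCalls++;
    }
    let members = this.catalog.get(branch);
    if (!members) {
      members = new Set<string>();
      this.catalog.set(branch, members);
    }
    members.add(hash);
    return hash;
  }

  isOnBranch(branch: string, hash: string): boolean {
    return this.catalog.get(branch)?.has(hash) ?? false;
  }
}

const store = new BranchAwareStore((text) => [text.length]); // fake embedder
const sharedHash = store.indexChunk("main", "function login() {}");
store.indexChunk("feature/x", "function login() {}"); // reuses the embedding
const featureHash = store.indexChunk("feature/x", "function logout() {}");
console.log(store.embedCalls, store.isOnBranch("main", featureHash));
```

Switching branches then only adds catalog entries for content that was already embedded, which is why the README can describe the switch as reusing existing embeddings.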
143
|
+
|
|
144
|
+
### Benefits
|
|
145
|
+
|
|
146
|
+
| Scenario | Without Branch Awareness | With Branch Awareness |
|
|
147
|
+
|----------|-------------------------|----------------------|
|
|
148
|
+
| Switch to feature branch | Re-index everything | Instant — reuse existing embeddings |
|
|
149
|
+
| Return to main | Re-index everything | Instant — catalog already exists |
|
|
150
|
+
| Search on branch | May return stale results | Only returns current branch's code |
|
|
151
|
+
|
|
152
|
+
### Automatic Behavior
|
|
153
|
+
|
|
154
|
+
- **Branch detection**: Automatically reads from `.git/HEAD`
|
|
155
|
+
- **Re-indexing on switch**: Triggers when you switch branches (via file watcher)
|
|
156
|
+
- **Legacy migration**: Automatically migrates old indexes on first run
|
|
157
|
+
- **Garbage collection**: Health check removes orphaned embeddings and chunks
|
|
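Note on branch detection: `.git/HEAD` normally contains either a symbolic ref such as `ref: refs/heads/main` or, in detached-HEAD state, a bare commit hash. A minimal sketch of parsing that content is shown below; `parseGitHead` and `HeadState` are hypothetical names, and reading the file from disk is left out.

```typescript
// Hypothetical sketch: interpret the text content of .git/HEAD.
type HeadState =
  | { kind: "branch"; branch: string }
  | { kind: "detached"; commit: string };

function parseGitHead(headFileContent: string): HeadState {
  const text = headFileContent.trim();
  const prefix = "ref: refs/heads/";
  if (text.startsWith(prefix)) {
    // Branch names may contain slashes, e.g. "feature/login".
    return { kind: "branch", branch: text.slice(prefix.length) };
  }
  // Anything else is treated as a detached HEAD pointing at a commit.
  return { kind: "detached", commit: text };
}

const onBranch = parseGitHead("ref: refs/heads/feature/login\n");
const detached = parseGitHead("3f2c1a9e8b7d6c5f4e3d2c1b0a9f8e7d6c5b4a39\n");
console.log(onBranch, detached);
```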
158
|
+
|
|
159
|
+
### Storage Structure
|
|
160
|
+
|
|
161
|
+
```
|
|
162
|
+
.opencode/index/
|
|
163
|
+
├── codebase.db # SQLite: embeddings, chunks, branch catalog
|
|
164
|
+
├── vectors.usearch # Vector index (uSearch)
|
|
165
|
+
├── inverted-index.json # BM25 keyword index
|
|
166
|
+
└── file-hashes.json # File change detection
|
|
167
|
+
```
|
|
97
168
|
|
|
98
169
|
## 🧰 Tools Available
|
|
99
170
|
|
|
@@ -117,22 +188,17 @@ The plugin exposes these tools to the OpenCode agent:
|
|
|
117
188
|
### `index_codebase`
|
|
118
189
|
Manually trigger indexing.
|
|
119
190
|
- **Use for**: Forcing a re-index or checking stats.
|
|
120
|
-
- **Parameters**: `force` (rebuild all), `estimateOnly` (check costs).
|
|
191
|
+
- **Parameters**: `force` (rebuild all), `estimateOnly` (check costs), `verbose` (show skipped files and parse failures).
|
|
121
192
|
|
|
122
193
|
### `index_status`
|
|
123
194
|
Checks if the index is ready and healthy.
|
|
124
195
|
|
|
125
196
|
### `index_health_check`
|
|
126
|
-
Maintenance tool to remove stale entries from deleted files.
|
|
197
|
+
Maintenance tool to remove stale entries from deleted files and orphaned embeddings/chunks from the database.
|
|
127
198
|
|
|
128
199
|
## 🎮 Slash Commands
|
|
129
200
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
Copy the commands:
|
|
133
|
-
```bash
|
|
134
|
-
cp -r node_modules/opencode-codebase-index/commands/* .opencode/command/
|
|
135
|
-
```
|
|
201
|
+
The plugin automatically registers these slash commands:
|
|
136
202
|
|
|
137
203
|
| Command | Description |
|
|
138
204
|
| ------- | ----------- |
|
|
@@ -151,7 +217,9 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
|
|
|
151
217
|
"indexing": {
|
|
152
218
|
"autoIndex": false,
|
|
153
219
|
"watchFiles": true,
|
|
154
|
-
"maxFileSize": 1048576
|
|
220
|
+
"maxFileSize": 1048576,
|
|
221
|
+
"maxChunksPerFile": 100,
|
|
222
|
+
"semanticOnly": false
|
|
155
223
|
},
|
|
156
224
|
"search": {
|
|
157
225
|
"maxResults": 20,
|
|
@@ -172,6 +240,10 @@ Zero-config by default (uses `auto` mode). Customize in `.opencode/codebase-inde
|
|
|
172
240
|
| `autoIndex` | `false` | Automatically index on plugin load |
|
|
173
241
|
| `watchFiles` | `true` | Re-index when files change |
|
|
174
242
|
| `maxFileSize` | `1048576` | Skip files larger than this (bytes). Default: 1MB |
|
|
243
|
+
| `maxChunksPerFile` | `100` | Maximum chunks to index per file (controls token costs for large files) |
|
|
244
|
+
| `semanticOnly` | `false` | When `true`, only index semantic nodes (functions, classes) and skip generic blocks |
|
|
245
|
+
| `retries` | `3` | Number of retry attempts for failed embedding API calls |
|
|
246
|
+
| `retryDelayMs` | `1000` | Delay between retries in milliseconds |
|
|
175
247
|
| **search** | | |
|
|
176
248
|
| `maxResults` | `20` | Maximum results to return |
|
|
177
249
|
| `minScore` | `0.1` | Minimum similarity score (0-1). Lower = more results |
|
|
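Note on the options table: the defaults listed above can be modelled as a plain object merged with user overrides. The sketch below is an assumption about shape only. It treats `retries` and `retryDelayMs` as indexing options because of where they appear in the table, and `resolveIndexingOptions` and `capChunks` are hypothetical helpers, not plugin exports.

```typescript
// Hypothetical sketch: defaults taken from the README table in this diff.
interface IndexingOptions {
  autoIndex: boolean;
  watchFiles: boolean;
  maxFileSize: number;      // bytes
  maxChunksPerFile: number;
  semanticOnly: boolean;
  retries: number;
  retryDelayMs: number;
}

const INDEXING_DEFAULTS: IndexingOptions = {
  autoIndex: false,
  watchFiles: true,
  maxFileSize: 1048576, // 1 MB
  maxChunksPerFile: 100,
  semanticOnly: false,
  retries: 3,
  retryDelayMs: 1000,
};

// User-supplied values win; anything omitted falls back to the default.
function resolveIndexingOptions(
  user: Partial<IndexingOptions> = {},
): IndexingOptions {
  return { ...INDEXING_DEFAULTS, ...user };
}

// Keep at most maxChunksPerFile chunks for a single file.
function capChunks<T>(fileChunks: T[], options: IndexingOptions): T[] {
  return fileChunks.slice(0, options.maxChunksPerFile);
}

const resolved = resolveIndexingOptions({ maxChunksPerFile: 2, semanticOnly: true });
const kept = capChunks(["a", "b", "c"], resolved);
console.log(resolved.retries, kept);
```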
@@ -204,19 +276,16 @@ Be aware of these characteristics:
|
|
|
204
276
|
npm run build
|
|
205
277
|
```
|
|
206
278
|
|
|
207
|
-
2. **
|
|
208
|
-
```
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
279
|
+
2. **Register in Test Project** (use `file://` URL in `opencode.json`):
|
|
280
|
+
```json
|
|
281
|
+
{
|
|
282
|
+
"plugin": [
|
|
283
|
+
"file:///path/to/opencode-codebase-index"
|
|
284
|
+
]
|
|
285
|
+
}
|
|
213
286
|
```
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
```bash
|
|
217
|
-
mkdir -p .opencode/plugin
|
|
218
|
-
echo 'export { default } from "$HOME/.cache/opencode/node_modules/opencode-codebase-index/dist/index.js"' > .opencode/plugin/codebase-index.ts
|
|
219
|
-
```
|
|
287
|
+
|
|
288
|
+
This loads directly from your source directory, so changes take effect after rebuilding.
|
|
220
289
|
|
|
221
290
|
## 🤝 Contributing
|
|
222
291
|
|
|
@@ -237,12 +306,13 @@ CI will automatically run tests and type checking on your PR.
|
|
|
237
306
|
│ ├── config/ # Configuration schema
|
|
238
307
|
│ ├── embeddings/ # Provider detection and API calls
|
|
239
308
|
│ ├── indexer/ # Core indexing logic + inverted index
|
|
309
|
+
│ ├── git/ # Git utilities (branch detection)
|
|
240
310
|
│ ├── tools/ # OpenCode tool definitions
|
|
241
311
|
│ ├── utils/ # File collection, cost estimation
|
|
242
312
|
│ ├── native/ # Rust native module wrapper
|
|
243
|
-
│ └── watcher/ # File change watcher
|
|
313
|
+
│ └── watcher/ # File/git change watcher
|
|
244
314
|
├── native/
|
|
245
|
-
│ └── src/ # Rust: tree-sitter, usearch, xxhash
|
|
315
|
+
│ └── src/ # Rust: tree-sitter, usearch, xxhash, SQLite
|
|
246
316
|
├── tests/ # Unit tests (vitest)
|
|
247
317
|
├── commands/ # Slash command definitions
|
|
248
318
|
├── skill/ # Agent skill guidance
|
|
@@ -252,8 +322,10 @@ CI will automatically run tests and type checking on your PR.
|
|
|
252
322
|
### Native Module
|
|
253
323
|
|
|
254
324
|
The Rust native module handles performance-critical operations:
|
|
255
|
-
- **tree-sitter**: Language-aware code parsing
|
|
256
|
-
- **usearch**: High-performance vector similarity search
|
|
325
|
+
- **tree-sitter**: Language-aware code parsing with JSDoc/docstring extraction
|
|
326
|
+
- **usearch**: High-performance vector similarity search with F16 quantization
|
|
327
|
+
- **SQLite**: Persistent storage for embeddings, chunks, and branch catalog
|
|
328
|
+
- **BM25 inverted index**: Fast keyword search for hybrid retrieval
|
|
257
329
|
- **xxhash**: Fast content hashing for change detection
|
|
258
330
|
|
|
259
331
|
Rebuild with: `npm run build:native` (requires Rust toolchain)
|