opencode-codebase-index 0.1.11 → 0.2.1
- package/README.md +54 -9
- package/dist/index.cjs +575 -222
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +569 -216
- package/dist/index.js.map +1 -1
- package/native/codebase-index-native.darwin-arm64.node +0 -0
- package/native/codebase-index-native.darwin-x64.node +0 -0
- package/native/codebase-index-native.linux-arm64-gnu.node +0 -0
- package/native/codebase-index-native.linux-x64-gnu.node +0 -0
- package/native/codebase-index-native.win32-x64-msvc.node +0 -0
- package/package.json +1 -1
package/README.md CHANGED
@@ -14,6 +14,7 @@
 
 - 🧠 **Semantic Search**: Finds "user authentication" logic even if the function is named `check_creds`.
 - ⚡ **Blazing Fast Indexing**: Powered by a Rust native module using `tree-sitter` and `usearch`. Incremental updates take milliseconds.
+- 🌿 **Branch-Aware**: Seamlessly handles git branch switches — reuses embeddings, filters stale results.
 - 🔒 **Privacy Focused**: Your vector index is stored locally in your project.
 - 🔌 **Model Agnostic**: Works out-of-the-box with GitHub Copilot, OpenAI, Gemini, or local Ollama models.
 
@@ -31,11 +32,12 @@
 }
 ```
 
-3. **
-
-> "Find the function that handles credit card validation errors"
+3. **Index your codebase**
+Run `/index` or ask the agent to index your codebase. This only needs to be done once — subsequent updates are incremental.
 
-
+4. **Start Searching**
+Ask:
+> "Find the function that handles credit card validation errors"
 
 ## 🔍 See It In Action
 
@@ -98,13 +100,16 @@ graph TD
 A[Source Code] -->|Tree-sitter| B[Semantic Chunks]
 B -->|Embedding Model| C[Vectors]
 C -->|uSearch| D[(Vector Store)]
+C -->|SQLite| G[(Embeddings DB)]
 B -->|BM25| E[(Inverted Index)]
+B -->|Branch Catalog| G
 end
 
 subgraph Searching
 Q[User Query] -->|Embedding Model| V[Query Vector]
 V -->|Cosine Similarity| D
 Q -->|BM25| E
+G -->|Branch Filter| F
 D --> F[Hybrid Fusion]
 E --> F
 F --> R[Ranked Results]
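The updated diagram routes vector hits, BM25 hits, and a branch filter into a single "Hybrid Fusion" step. The README does not say which fusion method the plugin uses, so the sketch below swaps in reciprocal rank fusion purely as an illustration. The names `Ranked`, `fuseForBranch`, `branchChunks`, and the constant `k = 60` are all assumptions, not part of the package.

```typescript
// Hypothetical result shape; the plugin's real types are not shown in this diff.
interface Ranked {
  chunkId: string;
}

// Reciprocal rank fusion (an assumed stand-in for the plugin's fusion step):
// score(chunk) = sum over result lists of 1 / (k + rank in that list).
function fuseForBranch(
  vectorHits: Ranked[],
  keywordHits: Ranked[],
  branchChunks: Set<string>, // chunk ids the catalog lists for the current branch
  k: number = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((hit, index) => {
      // Branch filter: ignore chunks that do not exist on the current branch.
      if (!branchChunks.has(hit.chunkId)) return;
      const rank = index + 1; // rank within the unfiltered list
      scores.set(hit.chunkId, (scores.get(hit.chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([chunkId]) => chunkId);
}
```

A chunk that ranks well in both lists rises to the top, while a chunk that exists only on another branch never reaches the results.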
@@ -114,14 +119,52 @@ graph TD
 1. **Parsing**: We use `tree-sitter` to intelligently parse your code into meaningful blocks (functions, classes, interfaces). JSDoc comments and docstrings are automatically included with their associated code.
 2. **Chunking**: Large blocks are split with overlapping windows to preserve context across chunk boundaries.
 3. **Embedding**: These blocks are converted into vector representations using your configured AI provider.
-4. **Storage**:
-5. **Hybrid Search**: Combines semantic similarity (vectors) with BM25 keyword matching
+4. **Storage**: Embeddings are stored in SQLite (deduplicated by content hash) and vectors in `usearch` with F16 quantization for 50% memory savings. A branch catalog tracks which chunks exist on each branch.
+5. **Hybrid Search**: Combines semantic similarity (vectors) with BM25 keyword matching, filtered by current branch.
 
 **Performance characteristics:**
 - **Incremental indexing**: ~50ms check time — only re-embeds changed files
 - **Smart chunking**: Understands code structure to keep functions whole, with overlap for context
 - **Native speed**: Core logic written in Rust for maximum performance
 - **Memory efficient**: F16 vector quantization reduces index size by 50%
+- **Branch-aware**: Automatically tracks which chunks exist on each git branch
+
+## 🌿 Branch-Aware Indexing
+
+The plugin automatically detects git branches and optimizes indexing across branch switches.
+
+### How It Works
+
+When you switch branches, code changes but embeddings for unchanged content remain the same. The plugin:
+
+1. **Stores embeddings by content hash**: Embeddings are deduplicated across branches
+2. **Tracks branch membership**: A lightweight catalog tracks which chunks exist on each branch
+3. **Filters search results**: Queries only return results relevant to the current branch
+
+### Benefits
+
+| Scenario | Without Branch Awareness | With Branch Awareness |
+|----------|-------------------------|----------------------|
+| Switch to feature branch | Re-index everything | Instant — reuse existing embeddings |
+| Return to main | Re-index everything | Instant — catalog already exists |
+| Search on branch | May return stale results | Only returns current branch's code |
+
+### Automatic Behavior
+
+- **Branch detection**: Automatically reads from `.git/HEAD`
+- **Re-indexing on switch**: Triggers when you switch branches (via file watcher)
+- **Legacy migration**: Automatically migrates old indexes on first run
+- **Garbage collection**: Health check removes orphaned embeddings and chunks
+
+### Storage Structure
+
+```
+.opencode/index/
+├── codebase.db # SQLite: embeddings, chunks, branch catalog
+├── vectors.usearch # Vector index (uSearch)
+├── inverted-index.json # BM25 keyword index
+└── file-hashes.json # File change detection
+```
 
 ## 🧰 Tools Available
 
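The "How It Works" list added in the hunk above (embeddings keyed by content hash, plus a per-branch catalog) can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the package stores this data in SQLite and hashes with xxhash, whereas the sketch uses plain `Map`s and a 32-bit FNV-1a stand-in. `BranchAwareStore`, `contentHash`, and `Embedder` are hypothetical names.

```typescript
// Stand-in for the package's xxhash: 32-bit FNV-1a, used only for illustration.
function contentHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Hypothetical embedder; a real one would call the configured provider.
type Embedder = (text: string) => number[];

class BranchAwareStore {
  // content hash -> embedding, shared by every branch
  private embeddings = new Map<string, number[]>();
  // branch name -> content hashes of the chunks present on that branch
  private catalog = new Map<string, Set<string>>();
  private embed: Embedder;
  embedCalls = 0;

  constructor(embed: Embedder) {
    this.embed = embed;
  }

  // Re-index a branch: only chunks with an unseen hash are embedded.
  indexBranch(branch: string, chunks: string[]): void {
    const present = new Set<string>();
    for (const chunk of chunks) {
      const key = contentHash(chunk);
      if (!this.embeddings.has(key)) {
        this.embeddings.set(key, this.embed(chunk));
        this.embedCalls++;
      }
      present.add(key);
    }
    this.catalog.set(branch, present);
  }

  // Branch filter used at search time.
  isOnBranch(branch: string, chunk: string): boolean {
    return this.catalog.get(branch)?.has(contentHash(chunk)) ?? false;
  }
}
```

Indexing `main` with two chunks and then a feature branch that shares one of them costs three embedding calls rather than four, which is the reuse the Benefits table describes.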
@@ -151,7 +194,7 @@ Manually trigger indexing.
 Checks if the index is ready and healthy.
 
 ### `index_health_check`
-Maintenance tool to remove stale entries from deleted files.
+Maintenance tool to remove stale entries from deleted files and orphaned embeddings/chunks from the database.
 
 ## 🎮 Slash Commands
 
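The updated `index_health_check` description mentions removing orphaned embeddings. The README does not define "orphaned", so the sketch below assumes it means an embedding whose content hash is no longer referenced by any branch in the catalog. `findOrphans` and its parameters are hypothetical names.

```typescript
// Assumption: an embedding is orphaned when no branch's catalog entry
// references its content hash any more.
function findOrphans(
  storedHashes: Iterable<string>,
  catalog: Map<string, Set<string>>, // branch name -> content hashes on that branch
): string[] {
  const referenced = new Set<string>();
  catalog.forEach((hashes) => hashes.forEach((h) => referenced.add(h)));
  // Anything stored but unreferenced is a candidate for deletion.
  return Array.from(storedHashes).filter((h) => !referenced.has(h));
}
```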
@@ -263,12 +306,13 @@ CI will automatically run tests and type checking on your PR.
 │ ├── config/ # Configuration schema
 │ ├── embeddings/ # Provider detection and API calls
 │ ├── indexer/ # Core indexing logic + inverted index
+│ ├── git/ # Git utilities (branch detection)
 │ ├── tools/ # OpenCode tool definitions
 │ ├── utils/ # File collection, cost estimation
 │ ├── native/ # Rust native module wrapper
-│ └── watcher/ # File change watcher
+│ └── watcher/ # File/git change watcher
 ├── native/
-│ └── src/ # Rust: tree-sitter, usearch, xxhash
+│ └── src/ # Rust: tree-sitter, usearch, xxhash, SQLite
 ├── tests/ # Unit tests (vitest)
 ├── commands/ # Slash command definitions
 ├── skill/ # Agent skill guidance
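The new `git/` directory is described as branch detection, and the README says the branch is read from `.git/HEAD`. As a rough sketch of what that parsing involves: git writes `ref: refs/heads/<name>` to that file when a branch is checked out, and a bare commit hash when HEAD is detached. The function name `parseHead` and its return shape are assumptions, not the package's API.

```typescript
// Parse the text content of .git/HEAD (reading the file is left to the caller).
function parseHead(content: string): { branch: string | null; detachedAt: string | null } {
  const trimmed = content.trim();
  const prefix = "ref: refs/heads/";
  if (trimmed.startsWith(prefix)) {
    // Symbolic ref: everything after the prefix is the branch name,
    // which may itself contain slashes (e.g. "feature/login").
    return { branch: trimmed.slice(prefix.length), detachedAt: null };
  }
  // Otherwise treat the content as a detached-HEAD commit hash.
  return { branch: null, detachedAt: trimmed || null };
}
```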
@@ -280,6 +324,7 @@ CI will automatically run tests and type checking on your PR.
 The Rust native module handles performance-critical operations:
 - **tree-sitter**: Language-aware code parsing with JSDoc/docstring extraction
 - **usearch**: High-performance vector similarity search with F16 quantization
+- **SQLite**: Persistent storage for embeddings, chunks, and branch catalog
 - **BM25 inverted index**: Fast keyword search for hybrid retrieval
 - **xxhash**: Fast content hashing for change detection
 
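The 50% figure quoted for F16 quantization follows from storage width: a 32-bit float takes 4 bytes and a 16-bit float takes 2. The arithmetic below covers raw vector payload only and ignores any graph or metadata overhead in the index. The vector count and dimension are made-up illustrative values, and `vectorBytes` is a hypothetical helper.

```typescript
// Raw payload size of a set of vectors, excluding index overhead.
function vectorBytes(vectorCount: number, dimensions: number, bytesPerValue: number): number {
  return vectorCount * dimensions * bytesPerValue;
}

// Illustrative numbers only: 10,000 chunks embedded at 768 dimensions.
const f32Bytes = vectorBytes(10000, 768, 4); // 32-bit floats
const f16Bytes = vectorBytes(10000, 768, 2); // 16-bit floats
const savedFraction = 1 - f16Bytes / f32Bytes; // half of the payload is saved
```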