@comfanion/usethis_search 3.0.0-dev.9 → 3.0.1
- package/README.md +246 -302
- package/cli.ts +263 -0
- package/file-indexer.ts +1 -1
- package/index.ts +0 -2
- package/package.json +12 -5
- package/tools/codeindex.ts +2 -2
- package/tools/search.ts +254 -66
- package/vectorizer/analyzers/lsp-analyzer.ts +7 -7
- package/vectorizer/analyzers/regex-analyzer.ts +358 -61
- package/vectorizer/chunk-store.ts +207 -0
- package/vectorizer/chunkers/code-chunker.ts +74 -24
- package/vectorizer/chunkers/markdown-chunker.ts +69 -7
- package/vectorizer/graph-builder.ts +207 -15
- package/vectorizer/graph-db.ts +161 -164
- package/vectorizer/hybrid-search.ts +1 -1
- package/vectorizer/{index.js → index.ts} +796 -160
- package/vectorizer.yaml +20 -2
package/README.md
CHANGED
|
@@ -1,28 +1,27 @@
|
|
|
1
|
-
#
|
|
1
|
+
# @comfanion/usethis_search
|
|
2
2
|
|
|
3
|
-
**Semantic code search with
|
|
3
|
+
**Semantic code search with graph-based context for OpenCode**
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Search code by meaning, not by text. Get related context automatically via code graph.
|
|
6
6
|
|
|
7
7
|
---
|
|
8
8
|
|
|
9
|
-
##
|
|
9
|
+
## What is this?
|
|
10
10
|
|
|
11
11
|
An OpenCode plugin that adds **smart search** to your project:
|
|
12
12
|
|
|
13
|
-
-
|
|
14
|
-
-
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
-
|
|
18
|
-
-
|
|
19
|
-
-
|
|
20
|
-
-
|
|
21
|
-
- 🌍 **Multilingual** — supports Ukrainian, Russian, and English
|
|
13
|
+
- **Semantic search** — finds code by meaning, even when words don't match
|
|
14
|
+
- **Hybrid search** — combines vector similarity + BM25 keyword matching
|
|
15
|
+
- **Graph-based context** — automatically attaches related code (imports, calls, type references) to search results
|
|
16
|
+
- **Two-phase indexing** — BM25 + graph search available immediately (Phase 1), vector search after embedding (Phase 2)
|
|
17
|
+
- **Simplified API** — 5 parameters, smart filter parsing, config-driven defaults
|
|
18
|
+
- **Automatic indexing** — files are indexed on change, zero effort
|
|
19
|
+
- **Local vectorization** — works offline, no API keys needed
|
|
20
|
+
- **Three indexes** — separate for code, docs, and configs
|
|
22
21
|
|
|
23
22
|
---
|
|
24
23
|
|
|
25
|
-
##
|
|
24
|
+
## Quick Start
|
|
26
25
|
|
|
27
26
|
### Installation
|
|
28
27
|
|
|
@@ -44,76 +43,104 @@ Add to `opencode.json`:
|
|
|
44
43
|
|
|
45
44
|
On OpenCode startup, the plugin automatically:
|
|
46
45
|
1. Creates indexes for code and documentation
|
|
47
|
-
2.
|
|
48
|
-
3.
|
|
46
|
+
2. Phase 1: chunks files, builds code graph (fast, parallel) — **BM25 search available immediately**
|
|
47
|
+
3. Phase 2: embeds chunks into vectors — **hybrid search available after completion**
|
|
49
48
|
|
|
50
|
-
**
|
|
51
|
-
- <
|
|
52
|
-
- <
|
|
53
|
-
-
|
|
54
|
-
- 500+ files — ~10min. Go touch grass 🌿 or take a nap 😴
|
|
49
|
+
**Indexing time estimates:**
|
|
50
|
+
- < 100 files — ~1 min
|
|
51
|
+
- < 500 files — ~3 min
|
|
52
|
+
- 500+ files — ~10 min
|
|
55
53
|
|
|
56
54
|
---
|
|
57
55
|
|
|
58
|
-
##
|
|
56
|
+
## Search API
|
|
59
57
|
|
|
60
|
-
|
|
58
|
+
The search tool has 5 parameters:
|
|
59
|
+
|
|
60
|
+
| Parameter | Type | Default | Description |
|
|
61
|
+
|-----------|------|---------|-------------|
|
|
62
|
+
| `query` | string | required | What you're looking for (semantic) |
|
|
63
|
+
| `index` | string | `"code"` | Which index: `code`, `docs`, `config` |
|
|
64
|
+
| `limit` | number | 10 | Number of results |
|
|
65
|
+
| `searchAll` | boolean | false | Search across all indexes |
|
|
66
|
+
| `filter` | string | — | Filter by path or language |
|
|
67
|
+
|
|
68
|
+
### Search examples
|
|
61
69
|
|
|
62
70
|
```javascript
|
|
63
|
-
//
|
|
64
|
-
search({
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
})
|
|
68
|
-
|
|
69
|
-
// Search
|
|
70
|
-
search({
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
})
|
|
74
|
-
|
|
75
|
-
//
|
|
76
|
-
search({
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
71
|
+
// Basic semantic search
|
|
72
|
+
search({ query: "authentication logic" })
|
|
73
|
+
|
|
74
|
+
// Search documentation
|
|
75
|
+
search({ query: "how to deploy", index: "docs" })
|
|
76
|
+
|
|
77
|
+
// Search all indexes
|
|
78
|
+
search({ query: "database connection", searchAll: true })
|
|
79
|
+
|
|
80
|
+
// Filter by directory
|
|
81
|
+
search({ query: "tenant management", filter: "internal/domain/" })
|
|
82
|
+
|
|
83
|
+
// Filter by language
|
|
84
|
+
search({ query: "event handling", filter: "*.go" })
|
|
85
|
+
search({ query: "middleware", filter: "go" })
|
|
86
|
+
|
|
87
|
+
// Combined: directory + language
|
|
88
|
+
search({ query: "API routes", filter: "internal/**/*.go" })
|
|
89
|
+
|
|
90
|
+
// Substring match on file path
|
|
91
|
+
search({ query: "metrics", filter: "service" })
|
|
92
|
+
|
|
93
|
+
// More results
|
|
94
|
+
search({ query: "error handling", limit: 20 })
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
### Filter syntax
|
|
98
|
+
|
|
99
|
+
The `filter` parameter is smart — it auto-detects what you mean:
|
|
100
|
+
|
|
101
|
+
| Input | Parsed as |
|
|
102
|
+
|-------|-----------|
|
|
103
|
+
| `"internal/domain/"` | Path prefix |
|
|
104
|
+
| `"*.go"` or `".go"` | Language filter (go) |
|
|
105
|
+
| `"go"` or `"python"` | Language filter |
|
|
106
|
+
| `"internal/**/*.go"` | Path prefix + language |
|
|
107
|
+
| `"service"` | Substring match on file path |
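The detection rules in the table above can be sketched as a small parser. This is a minimal illustration only: `ParsedFilter`, `parseFilter`, and the lookup tables are hypothetical names, and the real logic in `tools/search.ts` may order or resolve the rules differently.

```typescript
// Hypothetical result shape; the real parser in tools/search.ts may differ.
interface ParsedFilter {
  pathPrefix?: string;   // e.g. "internal/domain/"
  language?: string;     // e.g. "go"
  pathContains?: string; // plain substring match on the file path
}

// Assumed lookup tables, trimmed to a few entries for illustration.
const EXT_TO_LANGUAGE: Record<string, string> = {
  go: "go",
  py: "python",
  ts: "typescript",
  js: "javascript",
  rs: "rust",
};
const LANGUAGE_NAMES = ["go", "python", "typescript", "javascript", "rust"];

function languageForExtension(ext: string): string {
  const key = ext.toLowerCase();
  return EXT_TO_LANGUAGE[key] ?? key;
}

function parseFilter(raw: string): ParsedFilter {
  const input = raw.trim();
  if (input === "") return {};

  // "*.go" or ".go": extension only, treated as a language filter.
  const extOnly = /^\*?\.(\w+)$/.exec(input);
  if (extOnly) return { language: languageForExtension(extOnly[1] ?? "") };

  // "go" or "python": bare language name.
  if (LANGUAGE_NAMES.indexOf(input.toLowerCase()) !== -1) {
    return { language: input.toLowerCase() };
  }

  // "internal/**/*.go": everything before the first "*" is the path prefix,
  // and a trailing ".ext" selects the language.
  const star = input.indexOf("*");
  if (star !== -1) {
    const parsed: ParsedFilter = {};
    const prefix = input.slice(0, star);
    if (prefix !== "") parsed.pathPrefix = prefix;
    const ext = /\.(\w+)$/.exec(input);
    if (ext) parsed.language = languageForExtension(ext[1] ?? "");
    return parsed;
  }

  // Anything containing "/" is a path prefix; the rest is a substring match.
  return input.indexOf("/") !== -1 ? { pathPrefix: input } : { pathContains: input };
}
```

The extension check runs before the glob check so that `"*.go"` is read as a language filter rather than a glob with an empty prefix.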
|
|
108
|
+
|
|
109
|
+
### Search output
|
|
110
|
+
|
|
111
|
+
Each result includes:
|
|
112
|
+
- **Score breakdown**: `Score: 0.619 (vec: 0.47, bm25: +0.04, kw: +0.11 | matched: "event", "correlation")`
|
|
113
|
+
- **Rich metadata**: language, function name, class name, heading context
|
|
114
|
+
- **File grouping**: best chunk per file + "N matching sections" count
|
|
115
|
+
- **Related context**: graph-expanded neighbors (imports, calls, type references)
|
|
116
|
+
- **Confidence signal**: warning when top score < 0.45
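The sample breakdown above is consistent with a total that is the vector score plus the BM25 and keyword bonuses (0.47 + 0.04 + 0.11 is roughly 0.62). The sketch below assumes that additive composition, which is inferred from the example line and not confirmed by the source. `ScoreParts`, `formatScore`, and `isLowConfidence` are hypothetical names; only the 0.45 threshold comes from the README.

```typescript
// Hypothetical shape for the per-result score components.
interface ScoreParts {
  vec: number;       // vector similarity contribution
  bm25: number;      // BM25 bonus
  kw: number;        // keyword rerank bonus
  matched: string[]; // query terms found in the chunk
}

// Threshold quoted in the README for the confidence warning.
const LOW_CONFIDENCE_THRESHOLD = 0.45;

// Assumption: the final score is the plain sum of the three parts.
function totalScore(parts: ScoreParts): number {
  return parts.vec + parts.bm25 + parts.kw;
}

function formatScore(parts: ScoreParts): string {
  const matched = parts.matched.map((term) => `"${term}"`).join(", ");
  return (
    `Score: ${totalScore(parts).toFixed(3)} ` +
    `(vec: ${parts.vec.toFixed(2)}, bm25: +${parts.bm25.toFixed(2)}, ` +
    `kw: +${parts.kw.toFixed(2)} | matched: ${matched})`
  );
}

function isLowConfidence(topScore: number): boolean {
  return topScore < LOW_CONFIDENCE_THRESHOLD;
}
```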
|
|
117
|
+
|
|
118
|
+
When vectors are not yet available (Phase 2 in progress), search automatically falls back to **BM25-only mode** with a banner notification.
|
|
119
|
+
|
|
120
|
+
---
|
|
121
|
+
|
|
122
|
+
## Index Management
|
|
123
|
+
|
|
124
|
+
### CLI
|
|
125
|
+
|
|
126
|
+
```bash
|
|
127
|
+
# Reindex everything
|
|
128
|
+
bunx usethis_search reindex
|
|
129
|
+
|
|
130
|
+
# Check status
|
|
131
|
+
bunx usethis_search status
|
|
132
|
+
|
|
133
|
+
# List indexes
|
|
134
|
+
bunx usethis_search list
|
|
135
|
+
|
|
136
|
+
# Clear index
|
|
137
|
+
bunx usethis_search clear
|
|
111
138
|
```
|
|
112
139
|
|
|
113
|
-
###
|
|
140
|
+
### Tool API
|
|
114
141
|
|
|
115
142
|
```javascript
|
|
116
|
-
// List all indexes
|
|
143
|
+
// List all indexes with stats
|
|
117
144
|
codeindex({ action: "list" })
|
|
118
145
|
|
|
119
146
|
// Check specific index status
|
|
@@ -121,68 +148,120 @@ codeindex({ action: "status", index: "code" })
|
|
|
121
148
|
|
|
122
149
|
// Reindex
|
|
123
150
|
codeindex({ action: "reindex", index: "code" })
|
|
151
|
+
```
|
|
124
152
|
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
})
|
|
153
|
+
---
|
|
154
|
+
|
|
155
|
+
## Architecture
|
|
156
|
+
|
|
157
|
+
### Two-Phase Indexing Pipeline
|
|
131
158
|
|
|
132
|
-
// v2: Run quality tests against gold dataset
|
|
133
|
-
codeindex({ action: "test", index: "code" })
|
|
134
159
|
```
|
|
160
|
+
Phase 1 (fast, parallel, 5 workers):
|
|
161
|
+
file -> read -> chunk -> regex analyze -> graph edges -> ChunkStore (SQLite)
|
|
162
|
+
Result: BM25 + graph search available immediately
|
|
135
163
|
|
|
136
|
-
|
|
164
|
+
Phase 2 (batch, sequential):
|
|
165
|
+
ChunkStore chunks -> batch embed (32/batch) -> LanceDB
|
|
166
|
+
Result: vector/hybrid search becomes available
|
|
167
|
+
```
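Phase 2 of the pipeline above can be sketched as a sequential loop over fixed-size batches. This is a minimal sketch, not the package's code: `toBatches`, `embedAll`, and the `Embedder` callback type are hypothetical, and only the batch size of 32 comes from the diagram.

```typescript
// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical embedding callback: one vector per input text.
type Embedder = (texts: string[]) => Promise<number[][]>;

// Embed chunks one batch at a time (sequential, as in Phase 2).
async function embedAll(
  chunks: string[],
  embed: Embedder,
  batchSize: number = 32,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(chunks, batchSize)) {
    const batchVectors = await embed(batch); // wait before starting the next batch
    for (const vector of batchVectors) vectors.push(vector);
  }
  return vectors;
}
```

Running the batches sequentially keeps memory use bounded by one batch at a time, which is one plausible reason for the design; the source does not state the motivation.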
|
|
137
168
|
|
|
138
|
-
|
|
169
|
+
### Search Strategy (auto-detect)
|
|
139
170
|
|
|
140
|
-
|
|
171
|
+
```
|
|
172
|
+
Has vectors? -> hybrid search (vector + BM25 + graph + keyword rerank)
|
|
173
|
+
No vectors? -> BM25-only search (from ChunkStore + graph + keyword rerank)
|
|
174
|
+
```
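The auto-detect rule above amounts to a single branch on whether any vectors exist yet. The sketch below assumes that "has vectors" means a non-zero vector count; how the package treats a partially embedded index is not stated in the source. `IndexSnapshot`, `SearchPlan`, `planSearch`, and the banner text are hypothetical.

```typescript
type SearchMode = "hybrid" | "bm25-only";

// Hypothetical view of the index state at query time.
interface IndexSnapshot {
  chunkCount: number;  // chunks written by Phase 1
  vectorCount: number; // vectors written by Phase 2
}

interface SearchPlan {
  mode: SearchMode;
  banner?: string; // shown to the user when falling back
}

function planSearch(snapshot: IndexSnapshot): SearchPlan {
  if (snapshot.vectorCount > 0) {
    return { mode: "hybrid" };
  }
  // Assumed wording; the real banner may differ.
  return {
    mode: "bm25-only",
    banner: "Vectors are not ready yet; showing BM25-only results.",
  };
}
```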
|
|
141
175
|
|
|
142
|
-
|
|
143
|
-
1. **Cleans** content (removes TOC, noise, auto-generated markers)
|
|
144
|
-
2. **Chunks** intelligently (Markdown by headings, code by functions/classes)
|
|
145
|
-
3. Converts chunks into **vectors** (numerical representations of meaning)
|
|
146
|
-
4. Compares vectors of your query with vectors of code
|
|
147
|
-
5. Optionally combines with **BM25 keyword search** (hybrid mode)
|
|
148
|
-
6. Returns the most **semantically similar** fragments with rich metadata
|
|
176
|
+
### Storage Layout
|
|
149
177
|
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
178
|
+
```
|
|
179
|
+
.opencode/
|
|
180
|
+
vectors/
|
|
181
|
+
code/
|
|
182
|
+
lancedb/ # Vector embeddings (LanceDB)
|
|
183
|
+
chunks.db # Chunk content + metadata (SQLite, ChunkStore)
|
|
184
|
+
hashes.json # File hashes for change detection
|
|
185
|
+
docs/
|
|
186
|
+
lancedb/
|
|
187
|
+
chunks.db
|
|
188
|
+
hashes.json
|
|
189
|
+
graph/
|
|
190
|
+
code_graph.db # Code relationships (SQLite, GraphDB)
|
|
191
|
+
doc_graph.db # Doc relationships (SQLite, GraphDB)
|
|
192
|
+
vectorizer.yaml # Configuration
|
|
193
|
+
indexer.log # Indexing log
|
|
158
194
|
```
|
|
159
195
|
|
|
160
|
-
###
|
|
196
|
+
### Module Overview
|
|
161
197
|
|
|
162
|
-
|
|
198
|
+
| Module | Purpose |
|
|
199
|
+
|--------|---------|
|
|
200
|
+
| **Core** | |
|
|
201
|
+
| `vectorizer/index.ts` | CodebaseIndexer, two-phase pipeline, search, singleton pool |
|
|
202
|
+
| `vectorizer/chunk-store.ts` | SQLite chunk storage (BM25 without vectors) |
|
|
203
|
+
| `vectorizer/graph-db.ts` | SQLite triple store for code relationships |
|
|
204
|
+
| `vectorizer/graph-builder.ts` | Builds graph edges from code analysis |
|
|
205
|
+
| `vectorizer/bm25-index.ts` | Inverted index for keyword search |
|
|
206
|
+
| **Chunking** | |
|
|
207
|
+
| `vectorizer/chunkers/code-chunker.ts` | Function/class-aware splitting |
|
|
208
|
+
| `vectorizer/chunkers/markdown-chunker.ts` | Heading-aware splitting with hierarchy |
|
|
209
|
+
| `vectorizer/chunkers/chunker-factory.ts` | Routes to correct chunker by file type |
|
|
210
|
+
| **Analysis** | |
|
|
211
|
+
| `vectorizer/analyzers/regex-analyzer.ts` | Regex-based code analysis (imports, calls, types) |
|
|
212
|
+
| `vectorizer/analyzers/lsp-analyzer.ts` | LSP-based code analysis (definitions, references) |
|
|
213
|
+
| `vectorizer/analyzers/lsp-client.ts` | Language Server Protocol client |
|
|
214
|
+
| **Search** | |
|
|
215
|
+
| `vectorizer/hybrid-search.ts` | Merge vector + BM25 scores |
|
|
216
|
+
| `vectorizer/query-cache.ts` | LRU cache for query embeddings |
|
|
217
|
+
| `vectorizer/content-cleaner.ts` | Remove noise (TOC, breadcrumbs, markers) |
|
|
218
|
+
| `vectorizer/metadata-extractor.ts` | Extract file_type, language, tags, dates |
|
|
219
|
+
| **Tracking** | |
|
|
220
|
+
| `vectorizer/search-metrics.ts` | Search quality metrics |
|
|
221
|
+
| `vectorizer/usage-tracker.ts` | Usage provenance tracking |
|
|
222
|
+
| **Tools** | |
|
|
223
|
+
| `tools/search.ts` | Search tool (5 params, smart filter, score breakdown) |
|
|
224
|
+
| `tools/codeindex.ts` | Index management tool |
|
|
225
|
+
|
|
226
|
+
### Graph-Based Context
|
|
227
|
+
|
|
228
|
+
The code graph tracks relationships between chunks:
|
|
229
|
+
|
|
230
|
+
- **imports** — file A imports module B
|
|
231
|
+
- **calls** — function A calls function B
|
|
232
|
+
- **references** — code references a type/interface
|
|
233
|
+
- **implements** — class implements an interface
|
|
234
|
+
- **extends** — class extends another class
|
|
235
|
+
- **belongs_to** — chunk belongs to file (structural)
|
|
236
|
+
|
|
237
|
+
When you search, results are automatically expanded with 1-hop graph neighbors. Related context is scored by `edge_weight * cosine_similarity` (or `edge_weight * 0.7` in BM25-only mode) and filtered by `min_relevance`.
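The scoring rule for related context can be sketched as follows. The formula (`edge_weight * cosine_similarity`, or `edge_weight * 0.7` without vectors) comes from the text above; the `Neighbor` shape, the function names, the inclusive comparison against `min_relevance`, and the use of `max_related` as a final cap are assumptions.

```typescript
// Hypothetical 1-hop neighbor of a search result.
interface Neighbor {
  chunkId: string;
  edgeWeight: number;
  cosineSimilarity?: number; // absent in BM25-only mode
}

interface RelatedConfig {
  maxRelated: number;   // mirrors graph.max_related
  minRelevance: number; // mirrors graph.min_relevance
}

// Factor used in place of cosine similarity when no vectors exist.
const BM25_ONLY_FACTOR = 0.7;

function neighborRelevance(n: Neighbor): number {
  const factor = n.cosineSimilarity !== undefined ? n.cosineSimilarity : BM25_ONLY_FACTOR;
  return n.edgeWeight * factor;
}

function selectRelated(
  neighbors: Neighbor[],
  config: RelatedConfig,
): { chunkId: string; relevance: number }[] {
  return neighbors
    .map((n) => ({ chunkId: n.chunkId, relevance: neighborRelevance(n) }))
    .filter((r) => r.relevance >= config.minRelevance) // assumed inclusive cutoff
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, config.maxRelated);
}
```

With the defaults from the config example (`max_related: 4`, `min_relevance: 0.5`), an edge of weight 0.5 would need a cosine similarity of 1.0 to be kept, so under these assumptions low-weight edges are dropped in practice.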
|
|
238
|
+
|
|
239
|
+
### Singleton Indexer Pool
|
|
240
|
+
|
|
241
|
+
Multiple parallel searches share one `CodebaseIndexer` instance per (project, index) pair. No SQLite lock conflicts. Managed via `getIndexer()` / `releaseIndexer()` / `destroyIndexer()`.
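One way such a pool could work is a map keyed by (project, index) with a reference count per entry. The function names below match the ones mentioned above, but their signatures, the reference counting, and the `SketchIndexer` stand-in class are assumptions for illustration, not the package's implementation.

```typescript
// Stand-in for the real CodebaseIndexer; only what the sketch needs.
class SketchIndexer {
  closed = false;
  projectRoot: string;
  indexName: string;

  constructor(projectRoot: string, indexName: string) {
    this.projectRoot = projectRoot;
    this.indexName = indexName;
  }

  close(): void {
    this.closed = true;
  }
}

interface PoolEntry {
  indexer: SketchIndexer;
  refs: number; // number of callers currently holding the instance
}

const indexerPool = new Map<string, PoolEntry>();

function poolKey(projectRoot: string, indexName: string): string {
  return `${projectRoot}::${indexName}`;
}

// Return the shared instance for this (project, index) pair, creating it once.
function getIndexer(projectRoot: string, indexName: string): SketchIndexer {
  const key = poolKey(projectRoot, indexName);
  let entry = indexerPool.get(key);
  if (!entry) {
    entry = { indexer: new SketchIndexer(projectRoot, indexName), refs: 0 };
    indexerPool.set(key, entry);
  }
  entry.refs += 1;
  return entry.indexer;
}

// Drop one reference; the instance stays cached for the next search.
function releaseIndexer(projectRoot: string, indexName: string): void {
  const entry = indexerPool.get(poolKey(projectRoot, indexName));
  if (entry && entry.refs > 0) entry.refs -= 1;
}

// Close the instance and remove it from the pool.
function destroyIndexer(projectRoot: string, indexName: string): void {
  const key = poolKey(projectRoot, indexName);
  const entry = indexerPool.get(key);
  if (!entry) return;
  entry.indexer.close();
  indexerPool.delete(key);
}
```

Because every caller for the same pair receives the same object, only one set of SQLite handles is open per index, which is consistent with the "no lock conflicts" claim.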
|
|
242
|
+
|
|
243
|
+
---
|
|
163
244
|
|
|
164
|
-
|
|
165
|
-
2. **On file edit** — queues file for reindexing
|
|
166
|
-
3. **After 1 second** (debounce) — indexes changed files
|
|
245
|
+
## Configuration
|
|
167
246
|
|
|
168
|
-
|
|
247
|
+
### Full config example
|
|
169
248
|
|
|
170
249
|
```yaml
|
|
250
|
+
# .opencode/vectorizer.yaml
|
|
171
251
|
vectorizer:
|
|
172
|
-
enabled: true
|
|
173
|
-
auto_index: true
|
|
174
|
-
|
|
175
|
-
|
|
176
|
-
|
|
252
|
+
enabled: true
|
|
253
|
+
auto_index: true
|
|
254
|
+
model: "Xenova/all-MiniLM-L6-v2"
|
|
255
|
+
debounce_ms: 1000
|
|
256
|
+
|
|
177
257
|
cleaning:
|
|
178
258
|
remove_toc: true
|
|
179
259
|
remove_frontmatter_metadata: false
|
|
180
260
|
remove_imports: false
|
|
181
261
|
remove_comments: false
|
|
182
|
-
|
|
183
|
-
# v2: Semantic chunking
|
|
262
|
+
|
|
184
263
|
chunking:
|
|
185
|
-
strategy: "semantic"
|
|
264
|
+
strategy: "semantic" # fixed | semantic
|
|
186
265
|
markdown:
|
|
187
266
|
split_by_headings: true
|
|
188
267
|
min_chunk_size: 200
|
|
@@ -193,133 +272,73 @@ vectorizer:
|
|
|
193
272
|
include_function_signature: true
|
|
194
273
|
min_chunk_size: 300
|
|
195
274
|
max_chunk_size: 1500
|
|
196
|
-
|
|
197
|
-
|
|
275
|
+
fixed:
|
|
276
|
+
max_chars: 1500
|
|
277
|
+
|
|
198
278
|
search:
|
|
199
|
-
hybrid:
|
|
279
|
+
hybrid: true
|
|
200
280
|
bm25_weight: 0.3
|
|
201
|
-
|
|
202
|
-
|
|
281
|
+
freshen: false # Don't re-index on every search
|
|
282
|
+
min_score: 0.35 # Minimum relevance cutoff
|
|
283
|
+
include_archived: false
|
|
284
|
+
default_limit: 10
|
|
285
|
+
|
|
286
|
+
graph:
|
|
287
|
+
enabled: true
|
|
288
|
+
max_related: 4 # Max related chunks per result
|
|
289
|
+
min_relevance: 0.5 # Min score for related context
|
|
290
|
+
semantic_edges: false # O(n^2) — enable only for small repos
|
|
291
|
+
semantic_edges_max_chunks: 500
|
|
292
|
+
lsp:
|
|
293
|
+
enabled: true
|
|
294
|
+
timeout_ms: 5000
|
|
295
|
+
read_intercept: true
|
|
296
|
+
|
|
203
297
|
quality:
|
|
204
298
|
enable_metrics: false
|
|
205
299
|
enable_cache: true
|
|
206
|
-
|
|
300
|
+
|
|
207
301
|
indexes:
|
|
208
302
|
code:
|
|
209
303
|
enabled: true
|
|
304
|
+
pattern: "**/*.{js,ts,jsx,tsx,mjs,cjs,py,go,rs,java,kt,swift,c,cpp,h,hpp,cs,rb,php,scala,clj}"
|
|
305
|
+
ignore:
|
|
306
|
+
- "**/node_modules/**"
|
|
307
|
+
- "**/.git/**"
|
|
308
|
+
- "**/dist/**"
|
|
309
|
+
- "**/build/**"
|
|
310
|
+
- "**/.opencode/**"
|
|
311
|
+
- "**/vendor/**"
|
|
312
|
+
hybrid: true
|
|
313
|
+
bm25_weight: 0.3
|
|
210
314
|
docs:
|
|
211
315
|
enabled: true
|
|
316
|
+
pattern: "docs/**/*.{md,mdx,txt,rst,adoc}"
|
|
317
|
+
hybrid: false
|
|
318
|
+
bm25_weight: 0.2
|
|
212
319
|
config:
|
|
213
320
|
enabled: false
|
|
214
|
-
|
|
321
|
+
pattern: "**/*.{yaml,yml,json,toml,ini,env,xml}"
|
|
322
|
+
hybrid: false
|
|
323
|
+
bm25_weight: 0.3
|
|
324
|
+
|
|
215
325
|
exclude:
|
|
216
326
|
- node_modules
|
|
217
327
|
- vendor
|
|
218
328
|
- dist
|
|
219
329
|
- build
|
|
330
|
+
- out
|
|
220
331
|
- __pycache__
|
|
221
332
|
```
|
|
222
333
|
|
|
223
|
-
---
|
|
224
|
-
|
|
225
|
-
## 📦 Data Structure
|
|
226
|
-
|
|
227
|
-
Indexes are stored locally in your project:
|
|
228
|
-
|
|
229
|
-
```
|
|
230
|
-
.opencode/
|
|
231
|
-
vectors/
|
|
232
|
-
code/ # Code index
|
|
233
|
-
data/ # LanceDB tables
|
|
234
|
-
hashes.json # File hashes (for change detection)
|
|
235
|
-
docs/ # Documentation index
|
|
236
|
-
data/
|
|
237
|
-
hashes.json
|
|
238
|
-
vectorizer.yaml # Configuration
|
|
239
|
-
indexer.log # Indexing log (if DEBUG=*)
|
|
240
|
-
```
|
|
241
|
-
|
|
242
|
-
---
|
|
243
|
-
|
|
244
|
-
## 🎨 Usage Examples
|
|
245
|
-
|
|
246
|
-
### 1. Find all API endpoints
|
|
247
|
-
|
|
248
|
-
```javascript
|
|
249
|
-
search({
|
|
250
|
-
query: "REST API endpoints routes",
|
|
251
|
-
index: "code"
|
|
252
|
-
})
|
|
253
|
-
```
|
|
254
|
-
|
|
255
|
-
### 2. Find testing documentation
|
|
256
|
-
|
|
257
|
-
```javascript
|
|
258
|
-
search({
|
|
259
|
-
query: "how to write tests",
|
|
260
|
-
index: "docs"
|
|
261
|
-
})
|
|
262
|
-
```
|
|
263
|
-
|
|
264
|
-
### 3. Find database configuration
|
|
265
|
-
|
|
266
|
-
```javascript
|
|
267
|
-
search({
|
|
268
|
-
query: "database connection settings",
|
|
269
|
-
index: "config"
|
|
270
|
-
})
|
|
271
|
-
```
|
|
272
|
-
|
|
273
|
-
### 4. Find error handling
|
|
274
|
-
|
|
275
|
-
```javascript
|
|
276
|
-
search({
|
|
277
|
-
query: "error handling try catch",
|
|
278
|
-
index: "code",
|
|
279
|
-
limit: 20 // More results
|
|
280
|
-
})
|
|
281
|
-
```
|
|
282
|
-
|
|
283
|
-
### 5. Search across entire project
|
|
284
|
-
|
|
285
|
-
```javascript
|
|
286
|
-
search({
|
|
287
|
-
query: "authentication",
|
|
288
|
-
searchAll: true // Searches in code, docs, config
|
|
289
|
-
})
|
|
290
|
-
```
|
|
291
|
-
|
|
292
|
-
---
|
|
293
|
-
|
|
294
|
-
## 🛠️ Configuration
|
|
295
|
-
|
|
296
334
|
### Disable automatic indexing
|
|
297
335
|
|
|
298
|
-
```yaml
|
|
299
|
-
# .opencode/vectorizer.yaml
|
|
300
|
-
vectorizer:
|
|
301
|
-
enabled: true
|
|
302
|
-
auto_index: false # Manual indexing only
|
|
303
|
-
```
|
|
304
|
-
|
|
305
|
-
### Add custom index
|
|
306
|
-
|
|
307
|
-
```yaml
|
|
308
|
-
vectorizer:
|
|
309
|
-
indexes:
|
|
310
|
-
tests:
|
|
311
|
-
enabled: true
|
|
312
|
-
extensions: [.test.js, .spec.ts]
|
|
313
|
-
```
|
|
314
|
-
|
|
315
|
-
### Change indexing delay
|
|
316
|
-
|
|
317
336
|
```yaml
|
|
318
337
|
vectorizer:
|
|
319
|
-
|
|
338
|
+
auto_index: false
|
|
320
339
|
```
|
|
321
340
|
|
|
322
|
-
###
|
|
341
|
+
### Skip auto-index via env
|
|
323
342
|
|
|
324
343
|
```bash
|
|
325
344
|
export OPENCODE_SKIP_AUTO_INDEX=1
|
|
@@ -327,112 +346,37 @@ export OPENCODE_SKIP_AUTO_INDEX=1
|
|
|
327
346
|
|
|
328
347
|
---
|
|
329
348
|
|
|
330
|
-
##
|
|
349
|
+
## Debugging
|
|
331
350
|
|
|
332
351
|
### Enable logs
|
|
333
352
|
|
|
334
353
|
```bash
|
|
335
|
-
export DEBUG=
|
|
336
|
-
# or
|
|
354
|
+
export DEBUG=vectorizer
|
|
355
|
+
# or all logs
|
|
337
356
|
export DEBUG=*
|
|
338
357
|
```
|
|
339
358
|
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
### Reindex everything
|
|
343
|
-
|
|
344
|
-
```javascript
|
|
345
|
-
codeindex({ action: "reindex", index: "code" })
|
|
346
|
-
codeindex({ action: "reindex", index: "docs" })
|
|
347
|
-
```
|
|
348
|
-
|
|
349
|
-
### Check index status
|
|
350
|
-
|
|
351
|
-
```javascript
|
|
352
|
-
codeindex({ action: "list" })
|
|
353
|
-
```
|
|
354
|
-
|
|
355
|
-
---
|
|
356
|
-
|
|
357
|
-
## 🌟 Advantages
|
|
358
|
-
|
|
359
|
-
### Compared to `grep`/`find`
|
|
360
|
-
|
|
361
|
-
| Feature | grep/find | usethis_search |
|
|
362
|
-
|---------|-----------|----------------|
|
|
363
|
-
| Text search | ✅ | ✅ |
|
|
364
|
-
| Semantic search | ❌ | ✅ |
|
|
365
|
-
| Finds synonyms | ❌ | ✅ |
|
|
366
|
-
| Understands context | ❌ | ✅ |
|
|
367
|
-
| Works offline | ✅ | ✅ |
|
|
368
|
-
| Auto-updates | ❌ | ✅ |
|
|
369
|
-
|
|
370
|
-
### Compared to online search (GitHub Copilot, ChatGPT)
|
|
371
|
-
|
|
372
|
-
| Feature | Online | usethis_search |
|
|
373
|
-
|---------|--------|----------------|
|
|
374
|
-
| Works offline | ❌ | ✅ |
|
|
375
|
-
| Privacy | ❌ | ✅ |
|
|
376
|
-
| Free | ❌ | ✅ |
|
|
377
|
-
| Speed | 🐌 | ⚡ |
|
|
378
|
-
| Knows your code | ❌ | ✅ |
|
|
359
|
+
Indexing activity is logged to `.opencode/indexer.log`.
|
|
379
360
|
|
|
380
361
|
---
|
|
381
362
|
|
|
382
|
-
##
|
|
363
|
+
## Technical Details
|
|
383
364
|
|
|
384
365
|
- **Vectorization:** [@xenova/transformers](https://github.com/xenova/transformers.js) (ONNX Runtime)
|
|
385
366
|
- **Vector DB:** [LanceDB](https://lancedb.com/) (local, serverless)
|
|
386
|
-
- **
|
|
387
|
-
- **
|
|
388
|
-
- **
|
|
389
|
-
|
|
390
|
-
|
|
391
|
-
|
|
392
|
-
```
|
|
393
|
-
File → Content Cleaner → Chunker Factory → Embedder → LanceDB
|
|
394
|
-
├── Markdown Chunker (heading-aware)
|
|
395
|
-
├── Code Chunker (function/class-aware)
|
|
396
|
-
└── Fixed Chunker (fallback)
|
|
397
|
-
|
|
398
|
-
Query → Query Cache → Embedder → Vector Search ─┐
|
|
399
|
-
└──────────→ BM25 Search ────┤→ Hybrid Merge → Filter → Results
|
|
400
|
-
│
|
|
401
|
-
Metadata Filter (type, lang, date, tags)
|
|
402
|
-
```
|
|
403
|
-
|
|
404
|
-
### New Modules (v2)
|
|
405
|
-
|
|
406
|
-
| Module | Purpose |
|
|
407
|
-
|--------|---------|
|
|
408
|
-
| `content-cleaner.ts` | Remove noise (TOC, breadcrumbs, markers) |
|
|
409
|
-
| `metadata-extractor.ts` | Extract file_type, language, tags, dates |
|
|
410
|
-
| `markdown-chunker.ts` | Heading-aware splitting with hierarchy |
|
|
411
|
-
| `code-chunker.ts` | Function/class-aware splitting |
|
|
412
|
-
| `chunker-factory.ts` | Route to correct chunker by file type |
|
|
413
|
-
| `bm25-index.ts` | Inverted index for keyword search |
|
|
414
|
-
| `hybrid-search.ts` | Merge vector + BM25 scores |
|
|
415
|
-
| `query-cache.ts` | LRU cache for query embeddings |
|
|
416
|
-
| `search-metrics.ts` | Track search quality metrics |
|
|
367
|
+
- **Chunk Store:** bun:sqlite (WAL mode, concurrent reads)
|
|
368
|
+
- **Graph DB:** bun:sqlite (WAL mode, triple store)
|
|
369
|
+
- **Model:** `Xenova/all-MiniLM-L6-v2` (multilingual, 384 dimensions, ~23 MB)
|
|
370
|
+
- **Embedding speed:** ~0.5 sec/file
|
|
371
|
+
- **Phase 1 speed:** ~0.05 sec/file (no embedding)
|
|
372
|
+
- **Supported languages:** JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, Swift, C/C++, C#, Ruby, PHP, Scala, Clojure
|
|
417
373
|
|
|
418
374
|
---
|
|
419
375
|
|
|
420
|
-
##
|
|
421
|
-
|
|
422
|
-
Found a bug? Have an idea? Open an issue or PR!
|
|
423
|
-
|
|
424
|
-
---
|
|
425
|
-
|
|
426
|
-
## 📄 License
|
|
376
|
+
## License
|
|
427
377
|
|
|
428
378
|
MIT
|
|
429
379
|
|
|
430
380
|
---
|
|
431
381
|
|
|
432
|
-
|
|
433
|
-
|
|
434
|
-
Made with ❤️ by the **Comfanion** team
|
|
435
|
-
|
|
436
|
-
---
|
|
437
|
-
|
|
438
|
-
**Search smart, not hard!** 🚀
|
|
382
|
+
Made by the **Comfanion** team
|