obedding 1.0.0 → 1.0.1
- package/README.md +109 -14
- package/dist/cli.js +6 -2
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +1 -0
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +57 -22
- package/dist/index.js.map +1 -1
- package/dist/preprocessor.d.ts +34 -0
- package/dist/preprocessor.d.ts.map +1 -0
- package/dist/preprocessor.js +146 -0
- package/dist/preprocessor.js.map +1 -0
- package/dist/search.d.ts +1 -0
- package/dist/search.d.ts.map +1 -1
- package/dist/search.js +19 -7
- package/dist/search.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -10,23 +10,90 @@ Obsidian Memory indexes your Obsidian notes and generates embeddings using the l
|
|
|
10
10
|
|
|
11
11
|
- **Semantic Search**: Find notes by meaning, not just keywords
|
|
12
12
|
- **Fast & Local**: All processing happens on your machine using MLX
|
|
13
|
-
- **Incremental Indexing**: Only re-index changed files
|
|
13
|
+
- **Incremental Indexing**: Only re-index changed files using content hashing
|
|
14
14
|
- **YAML Frontmatter Support**: Extracts metadata from note properties
|
|
15
15
|
- **Privacy First**: No external API calls, everything runs locally
|
|
16
|
+
- **npx Ready**: Use without installation via `npx obedding`
|
|
17
|
+
|
|
18
|
+
## Architecture
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
|
|
22
|
+
│ Obsidian Vault │────▶│ Indexing CLI │────▶│ JSON Storage │
|
|
23
|
+
│ ~/.obsidian/ │ │ (obedding) │ │ embeddings.json │
|
|
24
|
+
└─────────────────┘ └──────────────────┘ └─────────────────┘
|
|
25
|
+
│
|
|
26
|
+
▼
|
|
27
|
+
┌──────────────────┐
|
|
28
|
+
│ MLX Embedding │
|
|
29
|
+
│ Server │
|
|
30
|
+
│ localhost:28100 │
|
|
31
|
+
└──────────────────┘
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
### Components
|
|
35
|
+
|
|
36
|
+
1. **Vault Scanner** (`src/scanner.ts`)
|
|
37
|
+
- Recursively finds markdown files
|
|
38
|
+
- Parses YAML frontmatter
|
|
39
|
+
- Extracts metadata from path structure
|
|
40
|
+
- Generates content hashes for incremental updates
|
|
41
|
+
|
|
42
|
+
2. **MLX Client** (`src/mlx.ts`)
|
|
43
|
+
- OpenAI-compatible `/v1/embeddings` API
|
|
44
|
+
- Batch processing for efficiency
|
|
45
|
+
- Auto-truncation to 2048 dimensions (MLX workaround)
|
|
46
|
+
- Connection health checks
|
|
47
|
+
|
|
48
|
+
3. **Storage Manager** (`src/storage.ts`)
|
|
49
|
+
- JSON file storage (`~/.claude/data/obsidian-embeddings.json`)
|
|
50
|
+
- Content-based change detection
|
|
51
|
+
- Upsert operations for note updates
|
|
52
|
+
- Statistics and management
|
|
53
|
+
|
|
54
|
+
4. **Search Engine** (`src/search.ts`)
|
|
55
|
+
- Cosine similarity scoring
|
|
56
|
+
- Top-K result retrieval
|
|
57
|
+
- Minimum score filtering
|
|
58
|
+
- Formatted output with excerpts
|
|
59
|
+
|
|
60
|
+
5. **CLI Interface** (`src/cli.ts`)
|
|
61
|
+
- Commander.js-based CLI
|
|
62
|
+
- Commands: index, search, stats, clear
|
|
63
|
+
- Configurable options
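
The Search Engine bullets above (cosine similarity scoring, top-K retrieval, minimum score filtering) can be sketched roughly as follows. This is an illustrative sketch only: `cosineSimilarity`, `searchNotes`, and the `{ path, embedding }` record shape are assumed names, not the package's actual API or storage format.

```javascript
// Hypothetical sketch: function names and record shape are assumptions.

/** Cosine similarity of two equal-length numeric vectors. */
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Score every stored note, drop scores below minScore, keep the best topK. */
function searchNotes(queryEmbedding, notes, topK = 10, minScore = 0) {
  return notes
    .map((note) => ({
      path: note.path,
      score: cosineSimilarity(queryEmbedding, note.embedding),
    }))
    .filter((result) => result.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy three-dimensional vectors stand in for real embeddings.
const sampleNotes = [
  { path: "caching.md", embedding: [1, 0, 0] },
  { path: "databases.md", embedding: [0.6, 0.8, 0] },
  { path: "gardening.md", embedding: [0, 0, 1] },
];
const results = searchNotes([1, 0, 0], sampleNotes, 2, 0.1);
console.log(results.map((r) => r.path));
```

A linear scan like this is consistent with the README's choice of a single JSON file: every query touches every stored vector, which is likely fine for the note counts the README targets.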
|
|
16
64
|
|
|
17
65
|
## Installation
|
|
18
66
|
|
|
19
|
-
###
|
|
67
|
+
### npx (No installation required!)
|
|
68
|
+
|
|
69
|
+
Use directly without installing:
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
npx obedding index --vault ~/.obsidian/Projects
|
|
73
|
+
npx obedding search "your query"
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
First run may take a few seconds to download the package.
|
|
77
|
+
|
|
78
|
+
### Optional: Global Install
|
|
79
|
+
|
|
80
|
+
For frequent use, install globally:
|
|
20
81
|
|
|
21
82
|
```bash
|
|
22
83
|
npm install -g obedding
|
|
23
84
|
```
|
|
24
85
|
|
|
86
|
+
Then use directly:
|
|
87
|
+
```bash
|
|
88
|
+
obedding index --vault ~/.obsidian/Projects
|
|
89
|
+
obedding search "your query"
|
|
90
|
+
```
|
|
91
|
+
|
|
25
92
|
### From Source
|
|
26
93
|
|
|
27
94
|
```bash
|
|
28
95
|
git clone https://github.com/tuannvm/obedding.git
|
|
29
|
-
cd
|
|
96
|
+
cd obedding
|
|
30
97
|
npm install
|
|
31
98
|
npm run build
|
|
32
99
|
npm link
|
|
@@ -45,36 +112,36 @@ npm link
|
|
|
45
112
|
|
|
46
113
|
## Usage
|
|
47
114
|
|
|
48
|
-
### Using
|
|
115
|
+
### Using npx (Recommended)
|
|
49
116
|
|
|
50
117
|
Index all notes in your Obsidian vault:
|
|
51
118
|
|
|
52
119
|
```bash
|
|
53
|
-
|
|
120
|
+
npx obedding index --vault ~/.obsidian/Projects
|
|
54
121
|
```
|
|
55
122
|
|
|
56
123
|
Incremental indexing (only new/modified files):
|
|
57
124
|
|
|
58
125
|
```bash
|
|
59
|
-
obedding index --vault ~/.obsidian/Projects --incremental
|
|
126
|
+
npx obedding index --vault ~/.obsidian/Projects --incremental
|
|
60
127
|
```
|
|
61
128
|
|
|
62
129
|
Semantic search:
|
|
63
130
|
|
|
64
131
|
```bash
|
|
65
|
-
obedding search "caching strategies"
|
|
132
|
+
npx obedding search "caching strategies"
|
|
66
133
|
```
|
|
67
134
|
|
|
68
135
|
Get top 5 results:
|
|
69
136
|
|
|
70
137
|
```bash
|
|
71
|
-
obedding search "database optimization" --top-k 5
|
|
138
|
+
npx obedding search "database optimization" --top-k 5
|
|
72
139
|
```
|
|
73
140
|
|
|
74
141
|
Output as JSON:
|
|
75
142
|
|
|
76
143
|
```bash
|
|
77
|
-
obedding search "API design" --json
|
|
144
|
+
npx obedding search "API design" --json
|
|
78
145
|
```
|
|
79
146
|
|
|
80
147
|
### Other Commands
|
|
@@ -82,13 +149,13 @@ obedding search "API design" --json
|
|
|
82
149
|
Show statistics:
|
|
83
150
|
|
|
84
151
|
```bash
|
|
85
|
-
obedding stats
|
|
152
|
+
npx obedding stats
|
|
86
153
|
```
|
|
87
154
|
|
|
88
155
|
Clear all embeddings:
|
|
89
156
|
|
|
90
157
|
```bash
|
|
91
|
-
obedding clear --force
|
|
158
|
+
npx obedding clear --force
|
|
92
159
|
```
|
|
93
160
|
|
|
94
161
|
## Options
|
|
@@ -149,6 +216,25 @@ Embeddings are stored in `~/.claude/data/obsidian-embeddings.json`:
|
|
|
149
216
|
- **Index speed**: ~1-2 notes/sec (depends on hardware)
|
|
150
217
|
- **Search latency**: ~50-100ms for 100 notes
|
|
151
218
|
|
|
219
|
+
## Design Decisions
|
|
220
|
+
|
|
221
|
+
### Why JSON Storage?
|
|
222
|
+
|
|
223
|
+
- **Simplicity**: No database dependencies
|
|
224
|
+
- **Portability**: Single file can be easily backed up or moved
|
|
225
|
+
- **Debuggability**: Human-readable for inspection
|
|
226
|
+
- **Sufficient scale**: Handles 1000s of notes efficiently
|
|
227
|
+
|
|
228
|
+
SQLite would be better for 10K+ notes, but JSON is optimal for the target use case.
|
|
229
|
+
|
|
230
|
+
### Why Truncate Embeddings?
|
|
231
|
+
|
|
232
|
+
The MLX server returns variable-length embeddings based on input content length. This is non-standard for embedding models. The client truncates to 2048 dimensions (the batch API response size) to ensure consistency.
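
The truncation described above might look something like the sketch below. Only the 2048 figure comes from the README; `truncateEmbedding` and `normalizeBatch` are hypothetical names, and how the real client treats vectors shorter than 2048 is not stated, so this sketch simply leaves them unchanged.

```javascript
// Hypothetical sketch: only the 2048 limit is taken from the README.
const TARGET_DIMENSIONS = 2048;

/** Keep at most `dimensions` leading values; shorter vectors pass through as copies. */
function truncateEmbedding(embedding, dimensions = TARGET_DIMENSIONS) {
  return embedding.slice(0, dimensions);
}

/** Apply the same cut to every vector returned for a batch. */
function normalizeBatch(embeddings, dimensions = TARGET_DIMENSIONS) {
  return embeddings.map((embedding) => truncateEmbedding(embedding, dimensions));
}

// A 3000-value vector and a 2048-value vector end up the same length.
const batch = normalizeBatch([new Array(3000).fill(0.5), new Array(2048).fill(0.1)]);
console.log(batch.map((embedding) => embedding.length));
```

Equal lengths matter because cosine similarity is only defined for vectors of the same dimension.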
|
|
233
|
+
|
|
234
|
+
### Why Content Hashing?
|
|
235
|
+
|
|
236
|
+
Incremental indexing uses SHA-256 hashes of note content (excluding frontmatter) to detect changes. This is more reliable than modification timestamps and prevents unnecessary re-embedding.
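
A minimal sketch of this kind of change detection, using Node's built-in `crypto` module, is shown below. The helper names (`stripFrontmatter`, `hashNoteBody`, `needsReindex`) and the frontmatter regular expression are assumptions made for illustration; the package's own `getNoteHash` may work differently.

```javascript
// Hypothetical sketch: helper names and the frontmatter regex are assumptions.
const { createHash } = require("node:crypto");

/** Remove a leading YAML frontmatter block delimited by '---' lines, if present. */
function stripFrontmatter(content) {
  return content.replace(/^---\r?\n(?:[\s\S]*?\r?\n)?---\r?\n?/, "");
}

/** SHA-256 hex digest of the note body, ignoring frontmatter. */
function hashNoteBody(content) {
  return createHash("sha256").update(stripFrontmatter(content), "utf8").digest("hex");
}

/** A note needs re-embedding when its body hash differs from the stored one. */
function needsReindex(content, storedHash) {
  return hashNoteBody(content) !== storedHash;
}

const original = "---\ntags: [cache]\n---\nUse an LRU cache for hot keys.\n";
const retagged = "---\ntags: [cache, perf]\n---\nUse an LRU cache for hot keys.\n";
const edited = "---\ntags: [cache]\n---\nUse an LFU cache for hot keys.\n";

const storedHash = hashNoteBody(original);
console.log(needsReindex(retagged, storedHash)); // metadata-only change
console.log(needsReindex(edited, storedHash)); // body change
```

Under these assumptions, editing only the frontmatter leaves the hash untouched, so a retagged note would not be re-embedded, while any change to the body would be.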
|
|
237
|
+
|
|
152
238
|
## Claude Code Integration
|
|
153
239
|
|
|
154
240
|
The `/obsidian-search` skill automatically triggers when you ask to search your notes:
|
|
@@ -187,26 +273,31 @@ curl -X POST http://localhost:28100/v1/embeddings \
|
|
|
187
273
|
-d '{"input": ["test"], "model": "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ"}'
|
|
188
274
|
```
|
|
189
275
|
|
|
276
|
+
### npx takes time on first run
|
|
277
|
+
|
|
278
|
+
The first `npx obedding` command downloads the package from npm. Subsequent runs use the cached version.
|
|
279
|
+
|
|
190
280
|
### Vault path not found
|
|
191
281
|
|
|
192
282
|
```bash
|
|
193
283
|
# Specify custom vault path
|
|
194
|
-
|
|
284
|
+
npx obedding index --vault /path/to/vault
|
|
195
285
|
```
|
|
196
286
|
|
|
197
287
|
### Search returns no results
|
|
198
288
|
|
|
199
289
|
```bash
|
|
200
290
|
# Check if embeddings exist
|
|
201
|
-
|
|
291
|
+
npx obedding stats
|
|
202
292
|
|
|
203
293
|
# Re-index if needed
|
|
204
|
-
|
|
294
|
+
npx obedding index --vault ~/.obsidian/Projects
|
|
205
295
|
```
|
|
206
296
|
|
|
207
297
|
## Known Issues
|
|
208
298
|
|
|
209
299
|
- **Variable-length embeddings**: MLX server returns variable-length embeddings based on input content. The client truncates to 2048 dimensions as a workaround.
|
|
300
|
+
- **No PostCompact hook support**: Auto-indexing after note creation is not available (this hook type is not supported by Claude Code).
|
|
210
301
|
|
|
211
302
|
## Development
|
|
212
303
|
|
|
@@ -216,6 +307,10 @@ npm run dev
|
|
|
216
307
|
|
|
217
308
|
# Build
|
|
218
309
|
npm run build
|
|
310
|
+
|
|
311
|
+
# Test locally
|
|
312
|
+
npm link
|
|
313
|
+
obedding search "test"
|
|
219
314
|
```
|
|
220
315
|
|
|
221
316
|
## License
|
package/dist/cli.js
CHANGED
|
@@ -16,7 +16,8 @@ program
|
|
|
16
16
|
.option('-v, --vault <path>', `Path to Obsidian vault`, DEFAULT_VAULT_PATH)
|
|
17
17
|
.option('-s, --storage <path>', `Path to storage file`, DEFAULT_STORAGE_PATH)
|
|
18
18
|
.option('-i, --incremental', 'Only index new or modified files')
|
|
19
|
-
.option('-
|
|
19
|
+
.option('-b, --backend <type>', 'Embedding backend: ollama (recommended) or mlx', 'ollama')
|
|
20
|
+
.option('-m, --model <name>', 'Model to use (default: qwen3-embedding:0.6b for ollama, Qwen3-Embedding-0.6B-4bit-DWQ for mlx)')
|
|
20
21
|
.action(async (options) => {
|
|
21
22
|
try {
|
|
22
23
|
const vaultPath = path.resolve(options.vault);
|
|
@@ -26,6 +27,7 @@ program
|
|
|
26
27
|
vaultPath,
|
|
27
28
|
storagePath,
|
|
28
29
|
incremental: options.incremental,
|
|
30
|
+
backend: options.backend,
|
|
29
31
|
model: options.model
|
|
30
32
|
});
|
|
31
33
|
console.log('\n✓ Indexing complete!');
|
|
@@ -43,7 +45,8 @@ program
|
|
|
43
45
|
.option('-v, --vault <path>', `Path to Obsidian vault (for reference)`, DEFAULT_VAULT_PATH)
|
|
44
46
|
.option('-s, --storage <path>', `Path to storage file`, DEFAULT_STORAGE_PATH)
|
|
45
47
|
.option('-k, --top-k <number>', 'Number of results to return', '10')
|
|
46
|
-
.option('-
|
|
48
|
+
.option('-b, --backend <type>', 'Embedding backend: ollama (recommended) or mlx', 'ollama')
|
|
49
|
+
.option('-m, --model <name>', 'Model to use')
|
|
47
50
|
.option('--min-score <number>', 'Minimum similarity score (0-1)', '0')
|
|
48
51
|
.option('--json', 'Output results as JSON')
|
|
49
52
|
.action(async (query, options) => {
|
|
@@ -65,6 +68,7 @@ program
|
|
|
65
68
|
query,
|
|
66
69
|
storagePath,
|
|
67
70
|
topK,
|
|
71
|
+
backend: options.backend,
|
|
68
72
|
model: options.model,
|
|
69
73
|
minScore
|
|
70
74
|
});
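
The new `--backend` and `--model` options are passed through to the library, which fills in a default model and server URL per backend. A table-driven sketch of that resolution is given below. The default model names and URLs are the ones that appear in this release's `dist/index.js`; `BACKEND_DEFAULTS` and `resolveBackend` are hypothetical names, and the release itself uses an inline `if`/`else` rather than a lookup table.

```javascript
// Hypothetical sketch: BACKEND_DEFAULTS and resolveBackend are illustrative names.
const BACKEND_DEFAULTS = {
  ollama: {
    label: "Ollama",
    url: "http://localhost:11434",
    model: "qwen3-embedding:0.6b",
  },
  mlx: {
    label: "MLX",
    url: "http://localhost:28100",
    model: "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ",
  },
};

/** Pick the backend settings, letting an explicit model override the default. */
function resolveBackend(backend = "ollama", model) {
  // Mirrors the release's if/else: anything other than "ollama" is treated as MLX.
  const defaults = backend === "ollama" ? BACKEND_DEFAULTS.ollama : BACKEND_DEFAULTS.mlx;
  return { label: defaults.label, url: defaults.url, model: model || defaults.model };
}

console.log(resolveBackend());
console.log(resolveBackend("mlx", "custom-model"));
```

One consequence of mirroring the `if`/`else` is that a mistyped backend value silently selects MLX; validating the value in the CLI would be one way to surface such mistakes.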
|
package/dist/cli.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"cli.js","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AAEA,OAAO,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAEpC,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AAC7B,OAAO,EAAE,UAAU,EAAE,MAAM,YAAY,CAAC;AACxC,OAAO,EAAE,WAAW,EAAE,mBAAmB,EAAE,MAAM,aAAa,CAAC;AAE/D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CACrC,OAAO,CAAC,GAAG,CAAC,IAAI,IAAI,EAAE,EACtB,gEAAgE,CACjE,CAAC;AACF,MAAM,oBAAoB,GAAG,IAAI,CAAC,OAAO,CACvC,OAAO,CAAC,GAAG,CAAC,IAAI,IAAI,EAAE,EACtB,uCAAuC,CACxC,CAAC;AAEF,MAAM,OAAO,GAAG,IAAI,OAAO,EAAE,CAAC;AAE9B,OAAO;KACJ,IAAI,CAAC,iBAAiB,CAAC;KACvB,WAAW,CAAC,yDAAyD,CAAC;KACtE,OAAO,CAAC,OAAO,CAAC,CAAC;AAEpB,OAAO;KACJ,OAAO,CAAC,OAAO,CAAC;KAChB,WAAW,CAAC,8CAA8C,CAAC;KAC3D,MAAM,CAAC,oBAAoB,EAAE,wBAAwB,EAAE,kBAAkB,CAAC;KAC1E,MAAM,CAAC,sBAAsB,EAAE,sBAAsB,EAAE,oBAAoB,CAAC;KAC5E,MAAM,CAAC,mBAAmB,EAAE,kCAAkC,CAAC;KAC/D,MAAM,CAAC,
|
|
1
|
+
{"version":3,"file":"cli.js","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AAEA,OAAO,EAAE,OAAO,EAAE,MAAM,WAAW,CAAC;AAEpC,OAAO,KAAK,IAAI,MAAM,MAAM,CAAC;AAC7B,OAAO,EAAE,UAAU,EAAE,MAAM,YAAY,CAAC;AACxC,OAAO,EAAE,WAAW,EAAE,mBAAmB,EAAE,MAAM,aAAa,CAAC;AAE/D,MAAM,kBAAkB,GAAG,IAAI,CAAC,OAAO,CACrC,OAAO,CAAC,GAAG,CAAC,IAAI,IAAI,EAAE,EACtB,gEAAgE,CACjE,CAAC;AACF,MAAM,oBAAoB,GAAG,IAAI,CAAC,OAAO,CACvC,OAAO,CAAC,GAAG,CAAC,IAAI,IAAI,EAAE,EACtB,uCAAuC,CACxC,CAAC;AAEF,MAAM,OAAO,GAAG,IAAI,OAAO,EAAE,CAAC;AAE9B,OAAO;KACJ,IAAI,CAAC,iBAAiB,CAAC;KACvB,WAAW,CAAC,yDAAyD,CAAC;KACtE,OAAO,CAAC,OAAO,CAAC,CAAC;AAEpB,OAAO;KACJ,OAAO,CAAC,OAAO,CAAC;KAChB,WAAW,CAAC,8CAA8C,CAAC;KAC3D,MAAM,CAAC,oBAAoB,EAAE,wBAAwB,EAAE,kBAAkB,CAAC;KAC1E,MAAM,CAAC,sBAAsB,EAAE,sBAAsB,EAAE,oBAAoB,CAAC;KAC5E,MAAM,CAAC,mBAAmB,EAAE,kCAAkC,CAAC;KAC/D,MAAM,CAAC,sBAAsB,EAAE,gDAAgD,EAAE,QAAQ,CAAC;KAC1F,MAAM,CAAC,oBAAoB,EAAE,gGAAgG,CAAC;KAC9H,MAAM,CAAC,KAAK,EAAE,OAAO,EAAE,EAAE;IACxB,IAAI,CAAC;QACH,MAAM,SAAS,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,CAAC,KAAK,CAAC,CAAC;QAC9C,MAAM,WAAW,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;QAElD,OAAO,CAAC,GAAG,CAAC,oCAAoC,CAAC,CAAC;QAElD,MAAM,MAAM,GAAG,MAAM,UAAU,CAAC;YAC9B,SAAS;YACT,WAAW;YACX,WAAW,EAAE,OAAO,CAAC,WAAW;YAChC,OAAO,EAAE,OAAO,CAAC,OAAO;YACxB,KAAK,EAAE,OAAO,CAAC,KAAK;SACrB,CAAC,CAAC;QAEH,OAAO,CAAC,GAAG,CAAC,wBAAwB,CAAC,CAAC;QACtC,OAAO,CAAC,GAAG,CAAC,cAAc,WAAW,EAAE,CAAC,CAAC;IAC3C,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QACf,OAAO,CAAC,KAAK,CAAC,cAAe,KAAe,CAAC,OAAO,EAAE,CAAC,CAAC;QACxD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IAClB,CAAC;AACH,CAAC,CAAC,CAAC;AAEL,OAAO;KACJ,OAAO,CAAC,QAAQ,CAAC;KACjB,WAAW,CAAC,wCAAwC,CAAC;KACrD,QAAQ,CAAC,SAAS,EAAE,cAAc,CAAC;KACnC,MAAM,CAAC,oBAAoB,EAAE,wCAAwC,EAAE,kBAAkB,CAAC;KAC1F,MAAM,CAAC,sBAAsB,EAAE,sBAAsB,EAAE,oBAAoB,CAAC;KAC5E,MAAM,CAAC,sBAAsB,EAAE,6BAA6B,EAAE,IAAI,CAAC;KACnE,MAAM,CAAC,sBAAsB,EAAE,gDAAgD,EAAE,QAAQ,CAAC;KAC1F,MAAM,CAAC,oBAAoB,EAAE,cAAc,CAAC;KAC5C,MAAM,CAAC,sBAAsB,EAAE,gCAAgC,EAAE,GAAG,CAAC;KACrE,MAAM,CAAC,QAAQ,EAAE,wBAAwB,CAAC;KAC1C,M
AAM,CAAC,KAAK,EAAE,KAAK,EAAE,OAAO,EAAE,EAAE;IAC/B,IAAI,CAAC;QACH,MAAM,WAAW,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;QAClD,MAAM,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,IAAI,EAAE,EAAE,CAAC,CAAC;QACxC,MAAM,QAAQ,GAAG,UAAU,CAAC,OAAO,CAAC,QAAQ,CAAC,CAAC;QAE9C,2BAA2B;QAC3B,IAAI,KAAK,CAAC,IAAI,CAAC,IAAI,IAAI,GAAG,CAAC,EAAE,CAAC;YAC5B,OAAO,CAAC,KAAK,CAAC,6CAA6C,CAAC,CAAC;YAC7D,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;QAClB,CAAC;QAED,IAAI,KAAK,CAAC,QAAQ,CAAC,IAAI,QAAQ,GAAG,CAAC,IAAI,QAAQ,GAAG,CAAC,EAAE,CAAC;YACpD,OAAO,CAAC,KAAK,CAAC,uDAAuD,CAAC,CAAC;YACvE,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;QAClB,CAAC;QAED,OAAO,CAAC,GAAG,CAAC,+BAA+B,CAAC,CAAC;QAE7C,MAAM,QAAQ,GAAG,MAAM,WAAW,CAAC;YACjC,KAAK;YACL,WAAW;YACX,IAAI;YACJ,OAAO,EAAE,OAAO,CAAC,OAAO;YACxB,KAAK,EAAE,OAAO,CAAC,KAAK;YACpB,QAAQ;SACT,CAAC,CAAC;QAEH,IAAI,OAAO,CAAC,IAAI,EAAE,CAAC;YACjB,OAAO,CAAC,GAAG,CAAC,IAAI,CAAC,SAAS,CAAC,QAAQ,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC;QACjD,CAAC;aAAM,CAAC;YACN,OAAO,CAAC,GAAG,CAAC,mBAAmB,CAAC,QAAQ,CAAC,CAAC,CAAC;QAC7C,CAAC;IACH,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QACf,OAAO,CAAC,KAAK,CAAC,cAAe,KAAe,CAAC,OAAO,EAAE,CAAC,CAAC;QACxD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IAClB,CAAC;AACH,CAAC,CAAC,CAAC;AAEL,OAAO;KACJ,OAAO,CAAC,OAAO,CAAC;KAChB,WAAW,CAAC,yBAAyB,CAAC;KACtC,MAAM,CAAC,sBAAsB,EAAE,sBAAsB,EAAE,oBAAoB,CAAC;KAC5E,MAAM,CAAC,KAAK,EAAE,OAAO,EAAE,EAAE;IACxB,IAAI,CAAC;QACH,MAAM,EAAE,cAAc,EAAE,GAAG,MAAM,MAAM,CAAC,cAAc,CAAC,CAAC;QACxD,MAAM,EAAE,cAAc,EAAE,GAAG,MAAM,MAAM,CAAC,YAAY,CAAC,CAAC;QAEtD,MAAM,WAAW,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;QAClD,MAAM,OAAO,GAAG,IAAI,cAAc,CAAC,WAAW,CAAC,CAAC;QAChD,MAAM,OAAO,CAAC,UAAU,EAAE,CAAC;QAE3B,MAAM,KAAK,GAAG,OAAO,CAAC,QAAQ,EAAE,CAAC;QAEjC,OAAO,CAAC,GAAG,CAAC,mCAAmC,CAAC,CAAC;QACjD,OAAO,CAAC,GAAG,CAAC,cAAc,WAAW,EAAE,CAAC,CAAC;QACzC,OAAO,CAAC,GAAG,CAAC,YAAY,KAAK,CAAC,SAAS,EAAE,CAAC,CAAC;QAC3C,OAAO,CAAC,GAAG,CAAC,YAAY,KAAK,CAAC,KAAK,EAAE,CAAC,CAAC;QACvC,OAAO,CAAC,GAAG,CAAC,iBAAiB,KAAK,CAAC,UAAU,EAAE,CAAC,CAAC;QACjD,OAAO,CAAC,GAAG,CAAC,mBAAmB,cAAc,CAAC,KAAK,CAAC,SAAS,CAAC,EA
AE,CAAC,CAAC;QAClE,OAAO,CAAC,GAAG,CAAC,mBAAmB,KAAK,CAAC,WAAW,EAAE,CAAC,CAAC;IACtD,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QACf,OAAO,CAAC,KAAK,CAAC,cAAe,KAAe,CAAC,OAAO,EAAE,CAAC,CAAC;QACxD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IAClB,CAAC;AACH,CAAC,CAAC,CAAC;AAEL,OAAO;KACJ,OAAO,CAAC,OAAO,CAAC;KAChB,WAAW,CAAC,6BAA6B,CAAC;KAC1C,MAAM,CAAC,sBAAsB,EAAE,sBAAsB,EAAE,oBAAoB,CAAC;KAC5E,MAAM,CAAC,aAAa,EAAE,mBAAmB,CAAC;KAC1C,MAAM,CAAC,KAAK,EAAE,OAAO,EAAE,EAAE;IACxB,IAAI,CAAC;QACH,MAAM,EAAE,cAAc,EAAE,GAAG,MAAM,MAAM,CAAC,cAAc,CAAC,CAAC;QAExD,MAAM,WAAW,GAAG,IAAI,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;QAElD,IAAI,CAAC,OAAO,CAAC,KAAK,EAAE,CAAC;YACnB,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,UAAU,CAAC,CAAC;YAC1C,MAAM,EAAE,GAAG,QAAQ,CAAC,eAAe,CAAC;gBAClC,KAAK,EAAE,OAAO,CAAC,KAAK;gBACpB,MAAM,EAAE,OAAO,CAAC,MAAM;aACvB,CAAC,CAAC;YAEH,MAAM,MAAM,GAAG,MAAM,IAAI,OAAO,CAAS,CAAC,OAAO,EAAE,EAAE;gBACnD,EAAE,CAAC,QAAQ,CACT,oDAAoD,WAAW,cAAc,EAC7E,OAAO,CACR,CAAC;YACJ,CAAC,CAAC,CAAC;YAEH,EAAE,CAAC,KAAK,EAAE,CAAC;YAEX,IAAI,MAAM,CAAC,WAAW,EAAE,KAAK,KAAK,EAAE,CAAC;gBACnC,OAAO,CAAC,GAAG,CAAC,YAAY,CAAC,CAAC;gBAC1B,OAAO;YACT,CAAC;QACH,CAAC;QAED,MAAM,OAAO,GAAG,IAAI,cAAc,CAAC,WAAW,CAAC,CAAC;QAChD,MAAM,OAAO,CAAC,UAAU,EAAE,CAAC;QAC3B,MAAM,OAAO,CAAC,KAAK,EAAE,CAAC;QAEtB,OAAO,CAAC,GAAG,CAAC,0BAA0B,CAAC,CAAC;IAC1C,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QACf,OAAO,CAAC,KAAK,CAAC,cAAe,KAAe,CAAC,OAAO,EAAE,CAAC,CAAC;QACxD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IAClB,CAAC;AACH,CAAC,CAAC,CAAC;AAEL,kBAAkB;AAClB,OAAO,CAAC,UAAU,CAAC,OAAO,CAAC,IAAI,CAAC,CAAC,KAAK,CAAC,CAAC,KAAK,EAAE,EAAE;IAC/C,OAAO,CAAC,KAAK,CAAC,UAAU,KAAK,CAAC,OAAO,EAAE,CAAC,CAAC;IACzC,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;AAClB,CAAC,CAAC,CAAC"}
|
package/dist/index.d.ts
CHANGED
package/dist/index.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAQA,MAAM,WAAW,YAAY;IAC3B,SAAS,EAAE,MAAM,CAAC;IAClB,WAAW,EAAE,MAAM,CAAC;IACpB,WAAW,CAAC,EAAE,OAAO,CAAC;IACtB,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,OAAO,CAAC,EAAE,QAAQ,GAAG,KAAK,CAAC;IAC3B,UAAU,CAAC,EAAE,CAAC,OAAO,EAAE,MAAM,EAAE,KAAK,EAAE,MAAM,EAAE,IAAI,EAAE,MAAM,KAAK,IAAI,CAAC;CACrE;AAED,MAAM,WAAW,WAAW;IAC1B,UAAU,EAAE,MAAM,CAAC;IACnB,YAAY,EAAE,MAAM,CAAC;IACrB,YAAY,EAAE,MAAM,CAAC;IACrB,WAAW,EAAE,MAAM,CAAC;IACpB,QAAQ,EAAE,MAAM,CAAC;IACjB,KAAK,EAAE,MAAM,CAAC;IACd,WAAW,EAAE,MAAM,CAAC;CACrB;AAED;;GAEG;AACH,wBAAsB,UAAU,CAAC,OAAO,EAAE,YAAY,GAAG,OAAO,CAAC,WAAW,CAAC,CA8L5E"}
|
package/dist/index.js
CHANGED
|
@@ -1,27 +1,43 @@
|
|
|
1
1
|
import { scanObsidianVault, extractMetadata, getNoteHash } from './scanner.js';
|
|
2
|
-
import { generateEmbeddings,
|
|
2
|
+
import { generateEmbeddings as generateOllamaEmbeddings, checkOllamaConnection, getModelInfo as getOllamaModelInfo } from './ollama.js';
|
|
3
|
+
import { generateEmbeddings as generateMLXEmbeddings, checkMLXConnection, getModelInfo as getMLXModelInfo } from './mlx.js';
|
|
3
4
|
import { StorageManager } from './storage.js';
|
|
4
5
|
import { truncateToTokens, generateExcerpt, formatDuration, formatFileSize } from './utils.js';
|
|
6
|
+
import { preprocessNote, checkEmbeddingVariance } from './preprocessor.js';
|
|
5
7
|
/**
|
|
6
8
|
* Index Obsidian notes and generate embeddings
|
|
7
9
|
*/
|
|
8
10
|
export async function indexNotes(options) {
|
|
9
11
|
const startTime = Date.now();
|
|
10
|
-
const { vaultPath, storagePath, incremental = false,
|
|
11
|
-
//
|
|
12
|
-
|
|
13
|
-
|
|
12
|
+
const { vaultPath, storagePath, incremental = false, backend = 'ollama', model } = options;
|
|
13
|
+
// Determine model and connection check based on backend
|
|
14
|
+
let effectiveModel = model;
|
|
15
|
+
let isConnected = false;
|
|
16
|
+
let connectionType = '';
|
|
17
|
+
let modelInfo = null;
|
|
18
|
+
if (backend === 'ollama') {
|
|
19
|
+
console.log('Checking Ollama connection...');
|
|
20
|
+
isConnected = await checkOllamaConnection();
|
|
21
|
+
connectionType = 'Ollama';
|
|
22
|
+
effectiveModel = effectiveModel || 'qwen3-embedding:0.6b';
|
|
23
|
+
modelInfo = await getOllamaModelInfo(effectiveModel);
|
|
24
|
+
}
|
|
25
|
+
else {
|
|
26
|
+
console.log('Checking MLX embedding server connection...');
|
|
27
|
+
isConnected = await checkMLXConnection();
|
|
28
|
+
connectionType = 'MLX';
|
|
29
|
+
effectiveModel = effectiveModel || 'mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ';
|
|
30
|
+
modelInfo = await getMLXModelInfo(effectiveModel);
|
|
31
|
+
}
|
|
14
32
|
if (!isConnected) {
|
|
15
|
-
throw new Error(
|
|
33
|
+
throw new Error(`Cannot connect to ${connectionType} embedding server. Make sure it is running at ${backend === 'ollama' ? 'http://localhost:11434' : 'http://localhost:28100'}`);
|
|
16
34
|
}
|
|
17
|
-
console.log(
|
|
18
|
-
// Get model info
|
|
19
|
-
const modelInfo = await getModelInfo(model);
|
|
35
|
+
console.log(`✓ ${connectionType} embedding server is running`);
|
|
20
36
|
if (modelInfo) {
|
|
21
|
-
console.log(`✓ Using model: ${modelInfo.name} (${modelInfo.dimensions} dimensions)`);
|
|
37
|
+
console.log(`✓ Using model: ${modelInfo.name}${modelInfo.dimensions ? ` (${modelInfo.dimensions} dimensions)` : ''}`);
|
|
22
38
|
}
|
|
23
39
|
else {
|
|
24
|
-
console.log(`✓ Using model: ${
|
|
40
|
+
console.log(`✓ Using model: ${effectiveModel}`);
|
|
25
41
|
}
|
|
26
42
|
// Initialize storage
|
|
27
43
|
const storage = new StorageManager(storagePath);
|
|
@@ -48,8 +64,10 @@ export async function indexNotes(options) {
|
|
|
48
64
|
const textsToEmbed = [];
|
|
49
65
|
const noteMapping = [];
|
|
50
66
|
for (const note of notesToIndex) {
|
|
51
|
-
//
|
|
52
|
-
|
|
67
|
+
// Preprocess content to remove structural patterns
|
|
68
|
+
// This helps the embedding model focus on semantic content rather than markdown structure
|
|
69
|
+
const preprocessed = preprocessNote(note);
|
|
70
|
+
const contentForEmbedding = preprocessed.combined;
|
|
53
71
|
// Truncate to avoid token limits
|
|
54
72
|
const truncated = truncateToTokens(contentForEmbedding, 8000);
|
|
55
73
|
textsToEmbed.push(truncated);
|
|
@@ -65,18 +83,35 @@ export async function indexNotes(options) {
|
|
|
65
83
|
skippedNotes: notes.length,
|
|
66
84
|
failedNotes: 0,
|
|
67
85
|
duration: (Date.now() - startTime) / 1000,
|
|
68
|
-
model,
|
|
86
|
+
model: effectiveModel,
|
|
69
87
|
storageSize: stats.totalSize
|
|
70
88
|
};
|
|
71
89
|
}
|
|
72
|
-
// Generate embeddings
|
|
90
|
+
// Generate embeddings using the selected backend
|
|
73
91
|
console.log(`\nGenerating embeddings for ${textsToEmbed.length} notes...`);
|
|
74
|
-
const embeddings =
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
92
|
+
const embeddings = backend === 'ollama'
|
|
93
|
+
? await generateOllamaEmbeddings(textsToEmbed, effectiveModel, (current, total) => {
|
|
94
|
+
const percent = ((current / total) * 100).toFixed(1);
|
|
95
|
+
const note = noteMapping[current - 1]?.note.relativePath || 'unknown';
|
|
96
|
+
console.log(` [${percent}%] ${note}`);
|
|
97
|
+
options.onProgress?.(current, total, note);
|
|
98
|
+
})
|
|
99
|
+
: await generateMLXEmbeddings(textsToEmbed, effectiveModel, (current, total) => {
|
|
100
|
+
const percent = ((current / total) * 100).toFixed(1);
|
|
101
|
+
const note = noteMapping[current - 1]?.note.relativePath || 'unknown';
|
|
102
|
+
console.log(` [${percent}%] ${note}`);
|
|
103
|
+
options.onProgress?.(current, total, note);
|
|
104
|
+
});
|
|
105
|
+
// Check embedding variance to detect potential issues
|
|
106
|
+
const varianceCheck = checkEmbeddingVariance(embeddings);
|
|
107
|
+
if (varianceCheck.isTooLow) {
|
|
108
|
+
console.warn(`\n⚠️ Warning: Embeddings have low variance (${(varianceCheck.variance * 100).toFixed(1)}%)`);
|
|
109
|
+
console.warn(' This may indicate the embedding model is struggling with the content structure.');
|
|
110
|
+
console.warn(' Consider: (1) Using a different model, or (2) Reviewing note content patterns');
|
|
111
|
+
}
|
|
112
|
+
else {
|
|
113
|
+
console.log(`\n✓ Embedding variance: ${(varianceCheck.variance * 100).toFixed(1)}% (good)`);
|
|
114
|
+
}
|
|
80
115
|
// Store embeddings
|
|
81
116
|
console.log('\nStoring embeddings...');
|
|
82
117
|
let indexedCount = 0;
|
|
@@ -129,7 +164,7 @@ export async function indexNotes(options) {
|
|
|
129
164
|
skippedNotes: notes.length - indexedCount,
|
|
130
165
|
failedNotes: failedCount,
|
|
131
166
|
duration,
|
|
132
|
-
model,
|
|
167
|
+
model: effectiveModel,
|
|
133
168
|
storageSize: stats.totalSize
|
|
134
169
|
};
|
|
135
170
|
}
|
package/dist/index.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,iBAAiB,EAAE,eAAe,EAAE,WAAW,EAAY,MAAM,cAAc,CAAC;AACzF,OAAO,EAAE,kBAAkB,EAAE,kBAAkB,EAAE,YAAY,EAAE,MAAM,UAAU,CAAC;
|
|
1
|
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,iBAAiB,EAAE,eAAe,EAAE,WAAW,EAAY,MAAM,cAAc,CAAC;AACzF,OAAO,EAAE,kBAAkB,IAAI,wBAAwB,EAAE,qBAAqB,EAAE,YAAY,IAAI,kBAAkB,EAAE,MAAM,aAAa,CAAC;AACxI,OAAO,EAAE,kBAAkB,IAAI,qBAAqB,EAAE,kBAAkB,EAAE,YAAY,IAAI,eAAe,EAAE,MAAM,UAAU,CAAC;AAC5H,OAAO,EAAE,cAAc,EAAE,MAAM,cAAc,CAAC;AAC9C,OAAO,EAAE,gBAAgB,EAAE,eAAe,EAAE,cAAc,EAAE,cAAc,EAAE,MAAM,YAAY,CAAC;AAC/F,OAAO,EAAE,cAAc,EAAE,sBAAsB,EAAE,MAAM,mBAAmB,CAAC;AAsB3E;;GAEG;AACH,MAAM,CAAC,KAAK,UAAU,UAAU,CAAC,OAAqB;IACpD,MAAM,SAAS,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IAC7B,MAAM,EAAE,SAAS,EAAE,WAAW,EAAE,WAAW,GAAG,KAAK,EAAE,OAAO,GAAG,QAAQ,EAAE,KAAK,EAAE,GAAG,OAAO,CAAC;IAE3F,wDAAwD;IACxD,IAAI,cAAc,GAAG,KAAK,CAAC;IAC3B,IAAI,WAAW,GAAG,KAAK,CAAC;IACxB,IAAI,cAAc,GAAG,EAAE,CAAC;IACxB,IAAI,SAAS,GAAiD,IAAI,CAAC;IAEnE,IAAI,OAAO,KAAK,QAAQ,EAAE,CAAC;QACzB,OAAO,CAAC,GAAG,CAAC,+BAA+B,CAAC,CAAC;QAC7C,WAAW,GAAG,MAAM,qBAAqB,EAAE,CAAC;QAC5C,cAAc,GAAG,QAAQ,CAAC;QAC1B,cAAc,GAAG,cAAc,IAAI,sBAAsB,CAAC;QAC1D,SAAS,GAAG,MAAM,kBAAkB,CAAC,cAAc,CAAC,CAAC;IACvD,CAAC;SAAM,CAAC;QACN,OAAO,CAAC,GAAG,CAAC,6CAA6C,CAAC,CAAC;QAC3D,WAAW,GAAG,MAAM,kBAAkB,EAAE,CAAC;QACzC,cAAc,GAAG,KAAK,CAAC;QACvB,cAAc,GAAG,cAAc,IAAI,6CAA6C,CAAC;QACjF,SAAS,GAAG,MAAM,eAAe,CAAC,cAAc,CAAC,CAAC;IACpD,CAAC;IAED,IAAI,CAAC,WAAW,EAAE,CAAC;QACjB,MAAM,IAAI,KAAK,CAAC,qBAAqB,cAAc,iDAAiD,OAAO,KAAK,QAAQ,CAAC,CAAC,CAAC,wBAAwB,CAAC,CAAC,CAAC,wBAAwB,EAAE,CAAC,CAAC;IACpL,CAAC;IAED,OAAO,CAAC,GAAG,CAAC,KAAK,cAAc,8BAA8B,CAAC,CAAC;IAE/D,IAAI,SAAS,EAAE,CAAC;QACd,OAAO,CAAC,GAAG,CAAC,kBAAkB,SAAS,CAAC,IAAI,GAAG,SAAS,CAAC,UAAU,CAAC,CAAC,CAAC,KAAK,SAAS,CAAC,UAAU,cAAc,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC;IACxH,CAAC;SAAM,CAAC;QACN,OAAO,CAAC,GAAG,CAAC,kBAAkB,cAAc,EAAE,CAAC,CAAC;IAClD,CAAC;IAED,qBAAqB;IACrB,MAAM,OAAO,GAAG,IAAI,cAAc,CAAC,WAAW,CAAC,CAAC;IAChD,MAAM,OAAO,CAAC,UAAU,EAAE,CAAC;IAE3B,uBAAuB;IACvB,OAAO,CAAC,GAAG,CAAC,qBAAqB,SAAS,EAAE,CAAC,CAAC;IAC9C,MAAM,KAAK,GAAG,MAAM,iBAAiB,CAAC,SAAS,CAAC,CAAC;IACjD,OAAO,CAAC,GAAG,
CAAC,SAAS,KAAK,CAAC,MAAM,iBAAiB,CAAC,CAAC;IAEpD,wCAAwC;IACxC,IAAI,YAAY,GAAG,KAAK,CAAC;IAEzB,IAAI,WAAW,EAAE,CAAC;QAChB,OAAO,CAAC,GAAG,CAAC,oDAAoD,CAAC,CAAC;QAElE,YAAY,GAAG,KAAK,CAAC,MAAM,CAAC,IAAI,CAAC,EAAE;YACjC,MAAM,WAAW,GAAG,WAAW,CAAC,IAAI,CAAC,CAAC;YACtC,MAAM,WAAW,GAAG,OAAO,CAAC,WAAW,CAAC,IAAI,CAAC,YAAY,EAAE,WAAW,CAAC,CAAC;YAExE,IAAI,CAAC,WAAW,EAAE,CAAC;gBACjB,OAAO,CAAC,UAAU,EAAE,CAAC,CAAC,EAAE,KAAK,CAAC,MAAM,EAAE,IAAI,CAAC,YAAY,CAAC,CAAC;YAC3D,CAAC;YAED,OAAO,WAAW,CAAC;QACrB,CAAC,CAAC,CAAC;QAEH,OAAO,CAAC,GAAG,CAAC,cAAc,YAAY,CAAC,MAAM,qBAAqB,CAAC,CAAC;IACtE,CAAC;IAED,8BAA8B;IAC9B,MAAM,YAAY,GAAa,EAAE,CAAC;IAClC,MAAM,WAAW,GAA4C,EAAE,CAAC;IAEhE,KAAK,MAAM,IAAI,IAAI,YAAY,EAAE,CAAC;QAChC,mDAAmD;QACnD,0FAA0F;QAC1F,MAAM,YAAY,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;QAC1C,MAAM,mBAAmB,GAAG,YAAY,CAAC,QAAQ,CAAC;QAElD,iCAAiC;QACjC,MAAM,SAAS,GAAG,gBAAgB,CAAC,mBAAmB,EAAE,IAAI,CAAC,CAAC;QAC9D,YAAY,CAAC,IAAI,CAAC,SAAS,CAAC,CAAC;QAE7B,MAAM,IAAI,GAAG,WAAW,CAAC,IAAI,CAAC,CAAC;QAC/B,WAAW,CAAC,IAAI,CAAC,EAAE,IAAI,EAAE,IAAI,EAAE,CAAC,CAAC;IACnC,CAAC;IAED,IAAI,YAAY,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QAC9B,OAAO,CAAC,GAAG,CAAC,sCAAsC,CAAC,CAAC;QAEpD,MAAM,KAAK,GAAG,OAAO,CAAC,QAAQ,EAAE,CAAC;QACjC,OAAO;YACL,UAAU,EAAE,KAAK,CAAC,MAAM;YACxB,YAAY,EAAE,CAAC;YACf,YAAY,EAAE,KAAK,CAAC,MAAM;YAC1B,WAAW,EAAE,CAAC;YACd,QAAQ,EAAE,CAAC,IAAI,CAAC,GAAG,EAAE,GAAG,SAAS,CAAC,GAAG,IAAI;YACzC,KAAK,EAAE,cAAc;YACrB,WAAW,EAAE,KAAK,CAAC,SAAS;SAC7B,CAAC;IACJ,CAAC;IAED,iDAAiD;IACjD,OAAO,CAAC,GAAG,CAAC,+BAA+B,YAAY,CAAC,MAAM,WAAW,CAAC,CAAC;IAE3E,MAAM,UAAU,GAAG,OAAO,KAAK,QAAQ;QACrC,CAAC,CAAC,MAAM,wBAAwB,CAAC,YAAY,EAAE,cAAc,EAAE,CAAC,OAAO,EAAE,KAAK,EAAE,EAAE;YAC9E,MAAM,OAAO,GAAG,CAAC,CAAC,OAAO,GAAG,KAAK,CAAC,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC;YACrD,MAAM,IAAI,GAAG,WAAW,CAAC,OAAO,GAAG,CAAC,CAAC,EAAE,IAAI,CAAC,YAAY,IAAI,SAAS,CAAC;YACtE,OAAO,CAAC,GAAG,CAAC,MAAM,OAAO,MAAM,IAAI,EAAE,CAAC,CAAC;YACvC,OAAO,CAAC,UAAU,EAAE,CAAC,OAAO,EAAE,KAAK,EAAE,IAAI,CAAC,CAAC;QAC7C,CAAC,CAAC;QACJ,CAAC,CAAC,MAAM,qBAAqB,CAAC,YAAY,EAAE,cAAc,EAAE,CAAC,OAAO,
EAAE,KAAK,EAAE,EAAE;YAC3E,MAAM,OAAO,GAAG,CAAC,CAAC,OAAO,GAAG,KAAK,CAAC,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC;YACrD,MAAM,IAAI,GAAG,WAAW,CAAC,OAAO,GAAG,CAAC,CAAC,EAAE,IAAI,CAAC,YAAY,IAAI,SAAS,CAAC;YACtE,OAAO,CAAC,GAAG,CAAC,MAAM,OAAO,MAAM,IAAI,EAAE,CAAC,CAAC;YACvC,OAAO,CAAC,UAAU,EAAE,CAAC,OAAO,EAAE,KAAK,EAAE,IAAI,CAAC,CAAC;QAC7C,CAAC,CAAC,CAAC;IAEP,sDAAsD;IACtD,MAAM,aAAa,GAAG,sBAAsB,CAAC,UAAU,CAAC,CAAC;IACzD,IAAI,aAAa,CAAC,QAAQ,EAAE,CAAC;QAC3B,OAAO,CAAC,IAAI,CAAC,gDAAgD,CAAC,aAAa,CAAC,QAAQ,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;QAC5G,OAAO,CAAC,IAAI,CAAC,oFAAoF,CAAC,CAAC;QACnG,OAAO,CAAC,IAAI,CAAC,kFAAkF,CAAC,CAAC;IACnG,CAAC;SAAM,CAAC;QACN,OAAO,CAAC,GAAG,CAAC,2BAA2B,CAAC,aAAa,CAAC,QAAQ,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,UAAU,CAAC,CAAC;IAC9F,CAAC;IAED,mBAAmB;IACnB,OAAO,CAAC,GAAG,CAAC,yBAAyB,CAAC,CAAC;IAEvC,IAAI,YAAY,GAAG,CAAC,CAAC;IACrB,IAAI,WAAW,GAAG,CAAC,CAAC;IAEpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,UAAU,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;QAC3C,IAAI,CAAC;YACH,MAAM,EAAE,IAAI,EAAE,IAAI,EAAE,GAAG,WAAW,CAAC,CAAC,CAAC,CAAC;YACtC,MAAM,SAAS,GAAG,UAAU,CAAC,CAAC,CAAC,CAAC;YAChC,MAAM,QAAQ,GAAG,eAAe,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;YAClD,MAAM,OAAO,GAAG,eAAe,CAAC,IAAI,CAAC,OAAO,CAAC,CAAC;YAE9C,qBAAqB;YACrB,IAAI,CAAC,SAAS,IAAI,SAAS,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;gBACzC,OAAO,CAAC,IAAI,CAAC,kCAAkC,IAAI,CAAC,YAAY,YAAY,CAAC,CAAC;gBAC9E,WAAW,EAAE,CAAC;gBACd,SAAS;YACX,CAAC;YAED,MAAM,OAAO,CAAC,UAAU,CAAC;gBACvB,IAAI,EAAE,IAAI,CAAC,YAAY;gBACvB,UAAU,EAAE,IAAI,CAAC,QAAQ;gBACzB,SAAS;gBACT,QAAQ;gBACR,OAAO;gBACP,UAAU,EAAE,IAAI,IAAI,EAAE,CAAC,WAAW,EAAE;gBACpC,IAAI;aACL,CAAC,CAAC;YAEH,YAAY,EAAE,CAAC;QACjB,CAAC;QAAC,OAAO,KAAK,EAAE,CAAC;YACf,OAAO,CAAC,KAAK,CAAC,0BAA0B,CAAC,GAAG,CAAC,KAAM,KAAe,CAAC,OAAO,EAAE,CAAC,CAAC;YAC9E,WAAW,EAAE,CAAC;QAChB,CAAC;IACH,CAAC;IAED,eAAe;IACf,MAAM,OAAO,CAAC,IAAI,EAAE,CAAC;IAErB,eAAe;IACf,MAAM,OAAO,CAAC,IAAI,EAAE,CAAC;IAErB,eAAe;IACf,MAAM,OAAO,CAAC,IAAI,EAAE,CAAC;IAErB,MAAM,QAAQ,GAAG,CAAC,IAAI,CAAC,GAAG,EAAE,GAAG,SAAS,CAAC,GAAG,IAAI,CAAC;IACj
D,MAAM,KAAK,GAAG,OAAO,CAAC,QAAQ,EAAE,CAAC;IAEjC,OAAO,CAAC,GAAG,CAAC,eAAe,YAAY,aAAa,cAAc,CAAC,QAAQ,CAAC,EAAE,CAAC,CAAC;IAChF,OAAO,CAAC,GAAG,CAAC,cAAc,KAAK,CAAC,MAAM,GAAG,YAAY,QAAQ,CAAC,CAAC;IAC/D,IAAI,WAAW,GAAG,CAAC,EAAE,CAAC;QACpB,OAAO,CAAC,GAAG,CAAC,aAAa,WAAW,QAAQ,CAAC,CAAC;IAChD,CAAC;IACD,OAAO,CAAC,GAAG,CAAC,mBAAmB,cAAc,CAAC,KAAK,CAAC,SAAS,CAAC,EAAE,CAAC,CAAC;IAElE,OAAO;QACL,UAAU,EAAE,KAAK,CAAC,MAAM;QACxB,YAAY,EAAE,YAAY;QAC1B,YAAY,EAAE,KAAK,CAAC,MAAM,GAAG,YAAY;QACzC,WAAW,EAAE,WAAW;QACxB,QAAQ;QACR,KAAK,EAAE,cAAc;QACrB,WAAW,EAAE,KAAK,CAAC,SAAS;KAC7B,CAAC;AACJ,CAAC"}
|
package/dist/preprocessor.d.ts
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Content preprocessing for embedding generation
|
|
3
|
+
*
|
|
4
|
+
* The Qwen3-Embedding-0.6B-4bit-DWQ model is overly sensitive to text STRUCTURE patterns.
|
|
5
|
+
* Obsidian notes with similar markdown structure get identical embeddings.
|
|
6
|
+
*
|
|
7
|
+
* This module preprocesses note content to:
|
|
8
|
+
* 1. Extract semantic metadata (title, tags, repo, context)
|
|
9
|
+
* 2. Remove markdown structural elements
|
|
10
|
+
* 3. Normalize text for consistent embedding
|
|
11
|
+
*/
|
|
12
|
+
import { NoteFile } from './scanner.js';
|
|
13
|
+
export interface PreprocessedContent {
|
|
14
|
+
/** Original content without frontmatter */
|
|
15
|
+
text: string;
|
|
16
|
+
/** Metadata as keyword string */
|
|
17
|
+
metadata: string;
|
|
18
|
+
/** Combined metadata + cleaned text for embedding */
|
|
19
|
+
combined: string;
|
|
20
|
+
}
|
|
21
|
+
/**
|
|
22
|
+
* Preprocess note content for embedding generation
|
|
23
|
+
*/
|
|
24
|
+
export declare function preprocessNote(note: NoteFile): PreprocessedContent;
|
|
25
|
+
/**
|
|
26
|
+
* Check if embeddings have sufficient variance
|
|
27
|
+
* The returned `isTooLow` flag is true if embeddings are too similar (potential problem)
|
|
28
|
+
*/
|
|
29
|
+
export declare function checkEmbeddingVariance(embeddings: number[][], threshold?: number): {
|
|
30
|
+
variance: number;
|
|
31
|
+
isTooLow: boolean;
|
|
32
|
+
avgSimilarity: number;
|
|
33
|
+
};
|
|
34
|
+
//# sourceMappingURL=preprocessor.d.ts.map
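
As a minimal sketch of how the `metadata` and `combined` fields of `PreprocessedContent` relate, the snippet below assembles a `key:value` prefix from frontmatter and prepends it to already-cleaned text. The `FrontmatterData` shape and the `buildMetadataString` helper are illustrative assumptions, not part of the package; the real `NoteFile` type comes from `./scanner.js` and is not shown in this diff.

```typescript
// Hypothetical frontmatter shape (assumption): only the fields that
// preprocessNote reads are modelled here.
interface FrontmatterData {
  title?: string;
  tags?: string | string[];
  repo?: string;
  context?: string;
  type?: string;
}

// Hypothetical helper: builds the space-separated "key:value" keyword string.
function buildMetadataString(data: FrontmatterData): string {
  const parts: string[] = [];
  if (data.title) parts.push(`title:${data.title}`);
  if (data.tags) {
    // Tags may be a YAML list or a single string.
    const tags = Array.isArray(data.tags) ? data.tags.join(',') : data.tags;
    parts.push(`tags:${tags}`);
  }
  if (data.repo) parts.push(`repo:${data.repo}`);
  if (data.context) parts.push(`context:${data.context}`);
  if (data.type) parts.push(`type:${data.type}`);
  return parts.join(' ');
}

const metadata = buildMetadataString({
  title: 'Release notes',
  tags: ['npm', 'diff'],
  type: 'log',
});
const cleaned = 'Added a preprocessor module.';

// Metadata goes first so the model sees the context before the body text.
const combined = metadata ? `${metadata} ${cleaned}` : cleaned;
console.log(combined);
```

A note with no recognised frontmatter fields yields an empty `metadata` string, in which case `combined` is just the cleaned text.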
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"preprocessor.d.ts","sourceRoot":"","sources":["../src/preprocessor.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;GAUG;AAEH,OAAO,EAAE,QAAQ,EAAE,MAAM,cAAc,CAAC;AAExC,MAAM,WAAW,mBAAmB;IAClC,2CAA2C;IAC3C,IAAI,EAAE,MAAM,CAAC;IACb,iCAAiC;IACjC,QAAQ,EAAE,MAAM,CAAC;IACjB,qDAAqD;IACrD,QAAQ,EAAE,MAAM,CAAC;CAClB;AAED;;GAEG;AACH,wBAAgB,cAAc,CAAC,IAAI,EAAE,QAAQ,GAAG,mBAAmB,CAwClE;AA4ED;;;GAGG;AACH,wBAAgB,sBAAsB,CAAC,UAAU,EAAE,MAAM,EAAE,EAAE,EAAE,SAAS,GAAE,MAAa,GAAG;IACxF,QAAQ,EAAE,MAAM,CAAC;IACjB,QAAQ,EAAE,OAAO,CAAC;IAClB,aAAa,EAAE,MAAM,CAAC;CACvB,CA4BA"}
|
|
@@ -0,0 +1,146 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Content preprocessing for embedding generation
|
|
3
|
+
*
|
|
4
|
+
* The Qwen3-Embedding-0.6B-4bit-DWQ model is overly sensitive to text STRUCTURE patterns.
|
|
5
|
+
* Obsidian notes with similar markdown structure get identical embeddings.
|
|
6
|
+
*
|
|
7
|
+
* This module preprocesses note content to:
|
|
8
|
+
* 1. Extract semantic metadata (title, tags, repo, context)
|
|
9
|
+
* 2. Remove markdown structural elements
|
|
10
|
+
* 3. Normalize text for consistent embedding
|
|
11
|
+
*/
|
|
12
|
+
/**
|
|
13
|
+
* Preprocess note content for embedding generation
|
|
14
|
+
*/
|
|
15
|
+
export function preprocessNote(note) {
|
|
16
|
+
const { data, content } = note.frontmatter;
|
|
17
|
+
// Extract metadata as keywords for semantic context
|
|
18
|
+
const metadataParts = [];
|
|
19
|
+
if (data.title) {
|
|
20
|
+
metadataParts.push(`title:${data.title}`);
|
|
21
|
+
}
|
|
22
|
+
if (data.tags) {
|
|
23
|
+
const tags = Array.isArray(data.tags) ? data.tags.join(',') : data.tags;
|
|
24
|
+
metadataParts.push(`tags:${tags}`);
|
|
25
|
+
}
|
|
26
|
+
if (data.repo) {
|
|
27
|
+
metadataParts.push(`repo:${data.repo}`);
|
|
28
|
+
}
|
|
29
|
+
if (data.context) {
|
|
30
|
+
metadataParts.push(`context:${data.context}`);
|
|
31
|
+
}
|
|
32
|
+
if (data.type) {
|
|
33
|
+
metadataParts.push(`type:${data.type}`);
|
|
34
|
+
}
|
|
35
|
+
// Clean markdown structure while preserving semantic content
|
|
36
|
+
const cleaned = cleanMarkdownStructure(content);
|
|
37
|
+
// Combine metadata and cleaned content
|
|
38
|
+
// Metadata comes first to provide context for the embedding model
|
|
39
|
+
const metadataStr = metadataParts.join(' ');
|
|
40
|
+
const combined = metadataStr ? `${metadataStr} ${cleaned}` : cleaned;
|
|
41
|
+
return {
|
|
42
|
+
text: cleaned,
|
|
43
|
+
metadata: metadataStr,
|
|
44
|
+
combined
|
|
45
|
+
};
|
|
46
|
+
}
|
|
47
|
+
/**
|
|
48
|
+
* Remove markdown structural elements while preserving content
|
|
49
|
+
*/
|
|
50
|
+
function cleanMarkdownStructure(content) {
|
|
51
|
+
return content
|
|
52
|
+
// Remove YAML frontmatter (already removed by gray-matter, but safety check)
|
|
53
|
+
.replace(/^---\n.*?\n---\n/gs, '')
|
|
54
|
+
// Remove ATX-style headings (# ## ###) but keep the heading text
|
|
55
|
+
.replace(/^#{1,6}\s+/gm, '')
|
|
56
|
+
// Remove Setext-style underlines for headings
|
|
57
|
+
.replace(/^[-=]{3,}\s*$/gm, '')
|
|
58
|
+
// Remove unordered list markers (-, *, +)
|
|
59
|
+
.replace(/^[\s]*[-*+]\s+/gm, '')
|
|
60
|
+
// Remove ordered list markers (1., 2., etc.)
|
|
61
|
+
.replace(/^[\s]*\d+\.\s+/gm, '')
|
|
62
|
+
// Remove checkbox markers [ ] and [x]
|
|
63
|
+
.replace(/\[[ xX]\]\s*/g, '')
|
|
64
|
+
// Remove horizontal rules (---, ***, ___)
|
|
65
|
+
.replace(/^[\s]*[-_*]{3,}\s*$/gm, '')
|
|
66
|
+
// Remove blockquote markers (>)
|
|
67
|
+
.replace(/^>\s*/gm, '')
|
|
68
|
+
// Remove code block markers (``` and ~~~) but keep code content
|
|
69
|
+
.replace(/^```[\w]*\n?/gm, '')
|
|
70
|
+
.replace(/^~~~[\w]*\n?/gm, '')
|
|
71
|
+
// Remove inline code backticks but keep content
|
|
72
|
+
.replace(/`([^`]+)`/g, '$1')
|
|
73
|
+
// Remove bold/italic markers
|
|
74
|
+
.replace(/\*\*\*([^*]+)\*\*\*/g, '$1')
|
|
75
|
+
.replace(/\*\*([^*]+)\*\*/g, '$1')
|
|
76
|
+
.replace(/\*([^*]+)\*/g, '$1')
|
|
77
|
+
.replace(/___([^_]+)___/g, '$1')
|
|
78
|
+
.replace(/__([^_]+)__/g, '$1')
|
|
79
|
+
.replace(/_([^_]+)_/g, '$1')
|
|
80
|
+
// Remove strikethrough
|
|
81
|
+
.replace(/~~([^~]+)~~/g, '$1')
|
|
82
|
+
// Remove links but keep link text
|
|
83
|
+
.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
|
|
84
|
+
.replace(/\[([^\]]+)\]\[[^\]]+\]/g, '$1')
|
|
85
|
+
// Remove image links
|
|
86
|
+
.replace(/!\[([^\]]*)\]\([^)]+\)/g, '$1')
|
|
87
|
+
// Remove table formatting but keep content
|
|
88
|
+
.replace(/^\|[-:\s|]+\|\s*$/gm, '')
|
|
89
|
+
.replace(/\|/g, ' ')
|
|
90
|
+
// Remove footnote references
|
|
91
|
+
.replace(/\[\^([^\]]+)\]/g, '')
|
|
92
|
+
// Remove HTML tags
|
|
93
|
+
.replace(/<[^>]+>/g, '')
|
|
94
|
+
// Normalize excessive whitespace (3+ newlines to 2)
|
|
95
|
+
.replace(/\n{3,}/g, '\n\n')
|
|
96
|
+
// Normalize multiple spaces to single space
|
|
97
|
+
.replace(/[ \t]+/g, ' ')
|
|
98
|
+
// Trim leading/trailing whitespace
|
|
99
|
+
.trim();
|
|
100
|
+
}
|
|
101
|
+
/**
|
|
102
|
+
* Check if embeddings have sufficient variance
|
|
103
|
+
* The returned `isTooLow` flag is true if embeddings are too similar (potential problem)
|
|
104
|
+
*/
|
|
105
|
+
export function checkEmbeddingVariance(embeddings, threshold = 0.95) {
|
|
106
|
+
if (embeddings.length < 2) {
|
|
107
|
+
return { variance: 1, isTooLow: false, avgSimilarity: 0 };
|
|
108
|
+
}
|
|
109
|
+
// Calculate pairwise cosine similarities
|
|
110
|
+
let totalSimilarity = 0;
|
|
111
|
+
let comparisons = 0;
|
|
112
|
+
for (let i = 0; i < Math.min(embeddings.length, 10); i++) {
|
|
113
|
+
for (let j = i + 1; j < Math.min(embeddings.length, 10); j++) {
|
|
114
|
+
const similarity = cosineSimilarity(embeddings[i].slice(0, 100), // Only check first 100 dims for speed
|
|
115
|
+
embeddings[j].slice(0, 100));
|
|
116
|
+
totalSimilarity += similarity;
|
|
117
|
+
comparisons++;
|
|
118
|
+
}
|
|
119
|
+
}
|
|
120
|
+
const avgSimilarity = comparisons > 0 ? totalSimilarity / comparisons : 0;
|
|
121
|
+
const variance = 1 - avgSimilarity;
|
|
122
|
+
return {
|
|
123
|
+
variance,
|
|
124
|
+
isTooLow: avgSimilarity > threshold,
|
|
125
|
+
avgSimilarity
|
|
126
|
+
};
|
|
127
|
+
}
|
|
128
|
+
/**
|
|
129
|
+
* Calculate cosine similarity between two vectors
|
|
130
|
+
*/
|
|
131
|
+
function cosineSimilarity(a, b) {
|
|
132
|
+
if (a.length !== b.length) {
|
|
133
|
+
// Truncate to shorter length
|
|
134
|
+
const minLen = Math.min(a.length, b.length);
|
|
135
|
+
a = a.slice(0, minLen);
|
|
136
|
+
b = b.slice(0, minLen);
|
|
137
|
+
}
|
|
138
|
+
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
|
|
139
|
+
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
|
|
140
|
+
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
|
|
141
|
+
if (magnitudeA === 0 || magnitudeB === 0) {
|
|
142
|
+
return 0;
|
|
143
|
+
}
|
|
144
|
+
return dotProduct / (magnitudeA * magnitudeB);
|
|
145
|
+
}
|
|
146
|
+
//# sourceMappingURL=preprocessor.js.map
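
The variance check can be illustrated on toy vectors. The sketch below is a standalone approximation of the idea, not the package's own code: the names `cosine` and `summarizeSimilarity` are hypothetical, while the sampling limits (first 10 embeddings, first 100 dimensions) and the 0.95 threshold are taken from the diff.

```typescript
// Cosine similarity over the shared prefix of two vectors.
function cosine(a: number[], b: number[]): number {
  const n = Math.min(a.length, b.length);
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  if (magA === 0 || magB === 0) return 0;
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Hypothetical helper: mean pairwise similarity over a small sample.
function summarizeSimilarity(
  embeddings: number[][],
  threshold = 0.95,
  sampleSize = 10,
  dims = 100,
): { variance: number; isTooLow: boolean; avgSimilarity: number } {
  const n = Math.min(embeddings.length, sampleSize);
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      total += cosine(embeddings[i].slice(0, dims), embeddings[j].slice(0, dims));
      pairs++;
    }
  }
  const avgSimilarity = pairs > 0 ? total / pairs : 0;
  // "variance" is 1 minus the mean similarity, not a statistical variance.
  return { variance: 1 - avgSimilarity, isTooLow: avgSimilarity > threshold, avgSimilarity };
}

// Identical vectors: the situation the check is meant to flag.
const collapsed = summarizeSimilarity([[1, 2, 3], [1, 2, 3], [1, 2, 3]]);
// Mutually orthogonal vectors: maximally spread out.
const spread = summarizeSimilarity([[1, 0, 0], [0, 1, 0], [0, 0, 1]]);

console.log(collapsed.isTooLow, spread.isTooLow);
```

Because only a small sample is compared, a result like this is best read as a cheap smoke test for collapsed embeddings rather than a measure of index quality.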
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"preprocessor.js","sourceRoot":"","sources":["../src/preprocessor.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;GAUG;AAaH;;GAEG;AACH,MAAM,UAAU,cAAc,CAAC,IAAc;IAC3C,MAAM,EAAE,IAAI,EAAE,OAAO,EAAE,GAAG,IAAI,CAAC,WAAW,CAAC;IAE3C,oDAAoD;IACpD,MAAM,aAAa,GAAa,EAAE,CAAC;IAEnC,IAAI,IAAI,CAAC,KAAK,EAAE,CAAC;QACf,aAAa,CAAC,IAAI,CAAC,SAAS,IAAI,CAAC,KAAK,EAAE,CAAC,CAAC;IAC5C,CAAC;IAED,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC;QACd,MAAM,IAAI,GAAG,KAAK,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC;QACxE,aAAa,CAAC,IAAI,CAAC,QAAQ,IAAI,EAAE,CAAC,CAAC;IACrC,CAAC;IAED,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC;QACd,aAAa,CAAC,IAAI,CAAC,QAAQ,IAAI,CAAC,IAAI,EAAE,CAAC,CAAC;IAC1C,CAAC;IAED,IAAI,IAAI,CAAC,OAAO,EAAE,CAAC;QACjB,aAAa,CAAC,IAAI,CAAC,WAAW,IAAI,CAAC,OAAO,EAAE,CAAC,CAAC;IAChD,CAAC;IAED,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC;QACd,aAAa,CAAC,IAAI,CAAC,QAAQ,IAAI,CAAC,IAAI,EAAE,CAAC,CAAC;IAC1C,CAAC;IAED,6DAA6D;IAC7D,MAAM,OAAO,GAAG,sBAAsB,CAAC,OAAO,CAAC,CAAC;IAEhD,uCAAuC;IACvC,kEAAkE;IAClE,MAAM,WAAW,GAAG,aAAa,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;IAC5C,MAAM,QAAQ,GAAG,WAAW,CAAC,CAAC,CAAC,GAAG,WAAW,IAAI,OAAO,EAAE,CAAC,CAAC,CAAC,OAAO,CAAC;IAErE,OAAO;QACL,IAAI,EAAE,OAAO;QACb,QAAQ,EAAE,WAAW;QACrB,QAAQ;KACT,CAAC;AACJ,CAAC;AAED;;GAEG;AACH,SAAS,sBAAsB,CAAC,OAAe;IAC7C,OAAO,OAAO;QACZ,6EAA6E;SAC5E,OAAO,CAAC,oBAAoB,EAAE,EAAE,CAAC;QAElC,iEAAiE;SAChE,OAAO,CAAC,cAAc,EAAE,EAAE,CAAC;QAE5B,8CAA8C;SAC7C,OAAO,CAAC,iBAAiB,EAAE,EAAE,CAAC;QAE/B,0CAA0C;SACzC,OAAO,CAAC,kBAAkB,EAAE,EAAE,CAAC;QAEhC,6CAA6C;SAC5C,OAAO,CAAC,kBAAkB,EAAE,EAAE,CAAC;QAEhC,sCAAsC;SACrC,OAAO,CAAC,eAAe,EAAE,EAAE,CAAC;QAE7B,0CAA0C;SACzC,OAAO,CAAC,uBAAuB,EAAE,EAAE,CAAC;QAErC,gCAAgC;SAC/B,OAAO,CAAC,SAAS,EAAE,EAAE,CAAC;QAEvB,gEAAgE;SAC/D,OAAO,CAAC,gBAAgB,EAAE,EAAE,CAAC;SAC7B,OAAO,CAAC,gBAAgB,EAAE,EAAE,CAAC;QAE9B,gDAAgD;SAC/C,OAAO,CAAC,YAAY,EAAE,IAAI,CAAC;QAE5B,6BAA6B;SAC5B,OAAO,CAAC,sBAAsB,EAAE,IAAI,CAAC;SACrC,OAAO,CAAC,kBAAkB,EAAE,IAAI,CAAC;SACjC,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC;SAC7B,OAAO,CAAC
,gBAAgB,EAAE,IAAI,CAAC;SAC/B,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC;SAC7B,OAAO,CAAC,YAAY,EAAE,IAAI,CAAC;QAE5B,uBAAuB;SACtB,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC;QAE9B,kCAAkC;SACjC,OAAO,CAAC,wBAAwB,EAAE,IAAI,CAAC;SACvC,OAAO,CAAC,yBAAyB,EAAE,IAAI,CAAC;QAEzC,qBAAqB;SACpB,OAAO,CAAC,yBAAyB,EAAE,IAAI,CAAC;QAEzC,2CAA2C;SAC1C,OAAO,CAAC,qBAAqB,EAAE,EAAE,CAAC;SAClC,OAAO,CAAC,KAAK,EAAE,GAAG,CAAC;QAEpB,6BAA6B;SAC5B,OAAO,CAAC,iBAAiB,EAAE,EAAE,CAAC;QAE/B,mBAAmB;SAClB,OAAO,CAAC,UAAU,EAAE,EAAE,CAAC;QAExB,oDAAoD;SACnD,OAAO,CAAC,SAAS,EAAE,MAAM,CAAC;QAE3B,4CAA4C;SAC3C,OAAO,CAAC,SAAS,EAAE,GAAG,CAAC;QAExB,mCAAmC;SAClC,IAAI,EAAE,CAAC;AACZ,CAAC;AAED;;;GAGG;AACH,MAAM,UAAU,sBAAsB,CAAC,UAAsB,EAAE,YAAoB,IAAI;IAKrF,IAAI,UAAU,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QAC1B,OAAO,EAAE,QAAQ,EAAE,CAAC,EAAE,QAAQ,EAAE,KAAK,EAAE,aAAa,EAAE,CAAC,EAAE,CAAC;IAC5D,CAAC;IAED,yCAAyC;IACzC,IAAI,eAAe,GAAG,CAAC,CAAC;IACxB,IAAI,WAAW,GAAG,CAAC,CAAC;IAEpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,GAAG,CAAC,UAAU,CAAC,MAAM,EAAE,EAAE,CAAC,EAAE,CAAC,EAAE,EAAE,CAAC;QACzD,KAAK,IAAI,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,GAAG,CAAC,UAAU,CAAC,MAAM,EAAE,EAAE,CAAC,EAAE,CAAC,EAAE,EAAE,CAAC;YAC7D,MAAM,UAAU,GAAG,gBAAgB,CACjC,UAAU,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,GAAG,CAAC,EAAE,sCAAsC;YACnE,UAAU,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,GAAG,CAAC,CAC5B,CAAC;YACF,eAAe,IAAI,UAAU,CAAC;YAC9B,WAAW,EAAE,CAAC;QAChB,CAAC;IACH,CAAC;IAED,MAAM,aAAa,GAAG,WAAW,GAAG,CAAC,CAAC,CAAC,CAAC,eAAe,GAAG,WAAW,CAAC,CAAC,CAAC,CAAC,CAAC;IAC1E,MAAM,QAAQ,GAAG,CAAC,GAAG,aAAa,CAAC;IAEnC,OAAO;QACL,QAAQ;QACR,QAAQ,EAAE,aAAa,GAAG,SAAS;QACnC,aAAa;KACd,CAAC;AACJ,CAAC;AAED;;GAEG;AACH,SAAS,gBAAgB,CAAC,CAAW,EAAE,CAAW;IAChD,IAAI,CAAC,CAAC,MAAM,KAAK,CAAC,CAAC,MAAM,EAAE,CAAC;QAC1B,6BAA6B;QAC7B,MAAM,MAAM,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,MAAM,EAAE,CAAC,CAAC,MAAM,CAAC,CAAC;QAC5C,CAAC,GAAG,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,MAAM,CAAC,CAAC;QACvB,CAAC,GAAG,CAAC,CAAC,KAAK,CAAC,CAAC,EAAE,MAAM,CAAC,CAAC;IACzB,CAAC;IAED,MAAM,UAAU,GAAG,CAAC,CAAC,MAAM,CAAC,CAAC,GAAG,EAAE,GAAG,EAAE,CAAC,EAA
E,EAAE,CAAC,GAAG,GAAG,GAAG,GAAG,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC;IAClE,MAAM,UAAU,GAAG,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,GAAG,EAAE,GAAG,EAAE,EAAE,CAAC,GAAG,GAAG,GAAG,GAAG,GAAG,EAAE,CAAC,CAAC,CAAC,CAAC;IACzE,MAAM,UAAU,GAAG,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,GAAG,EAAE,GAAG,EAAE,EAAE,CAAC,GAAG,GAAG,GAAG,GAAG,GAAG,EAAE,CAAC,CAAC,CAAC,CAAC;IAEzE,IAAI,UAAU,KAAK,CAAC,IAAI,UAAU,KAAK,CAAC,EAAE,CAAC;QACzC,OAAO,CAAC,CAAC;IACX,CAAC;IAED,OAAO,UAAU,GAAG,CAAC,UAAU,GAAG,UAAU,CAAC,CAAC;AAChD,CAAC"}
|
package/dist/search.d.ts
CHANGED
package/dist/search.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"search.d.ts","sourceRoot":"","sources":["../src/search.ts"],"names":[],"mappings":"
|
|
1
|
+
{"version":3,"file":"search.d.ts","sourceRoot":"","sources":["../src/search.ts"],"names":[],"mappings":"AAMA,MAAM,WAAW,YAAY;IAC3B,IAAI,EAAE,MAAM,CAAC;IACb,UAAU,EAAE,MAAM,CAAC;IACnB,KAAK,EAAE,MAAM,CAAC;IACd,QAAQ,EAAE;QACR,IAAI,CAAC,EAAE,MAAM,CAAC;QACd,IAAI,CAAC,EAAE,MAAM,CAAC;QACd,OAAO,CAAC,EAAE,MAAM,CAAC;QACjB,QAAQ,CAAC,EAAE,MAAM,CAAC;QAClB,IAAI,CAAC,EAAE,MAAM,EAAE,CAAC;QAChB,KAAK,CAAC,EAAE,MAAM,CAAC;KAChB,CAAC;IACF,OAAO,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,aAAa;IAC5B,KAAK,EAAE,MAAM,CAAC;IACd,WAAW,EAAE,MAAM,CAAC;IACpB,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,OAAO,CAAC,EAAE,QAAQ,GAAG,KAAK,CAAC;IAC3B,QAAQ,CAAC,EAAE,MAAM,CAAC;CACnB;AAED,MAAM,WAAW,cAAc;IAC7B,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,EAAE,YAAY,EAAE,CAAC;IACxB,UAAU,EAAE,MAAM,CAAC;IACnB,QAAQ,EAAE,MAAM,CAAC;CAClB;AAED;;GAEG;AACH,wBAAsB,WAAW,CAAC,OAAO,EAAE,aAAa,GAAG,OAAO,CAAC,cAAc,CAAC,CAsFjF;AAED;;GAEG;AACH,wBAAgB,mBAAmB,CAAC,QAAQ,EAAE,cAAc,GAAG,MAAM,CAoCpE"}
|
package/dist/search.js
CHANGED
|
@@ -1,4 +1,5 @@
|
|
|
1
|
-
import { generateEmbedding,
|
|
1
|
+
import { generateEmbedding as generateOllamaEmbedding, checkOllamaConnection } from './ollama.js';
|
|
2
|
+
import { generateEmbedding as generateMLXEmbedding, checkMLXConnection } from './mlx.js';
|
|
2
3
|
import { StorageManager } from './storage.js';
|
|
3
4
|
import { cosineSimilarity, truncateToTokens } from './utils.js';
|
|
4
5
|
/**
|
|
@@ -6,11 +7,20 @@ import { cosineSimilarity, truncateToTokens } from './utils.js';
|
|
|
6
7
|
*/
|
|
7
8
|
export async function searchNotes(options) {
|
|
8
9
|
const startTime = Date.now();
|
|
9
|
-
const { query, storagePath, topK = 10,
|
|
10
|
-
//
|
|
11
|
-
|
|
10
|
+
const { query, storagePath, topK = 10, backend = 'ollama', model, minScore = 0.0 } = options;
|
|
11
|
+
// Determine model and connection check based on backend
|
|
12
|
+
let effectiveModel = model;
|
|
13
|
+
let isConnected = false;
|
|
14
|
+
if (backend === 'ollama') {
|
|
15
|
+
isConnected = await checkOllamaConnection();
|
|
16
|
+
effectiveModel = effectiveModel || 'qwen3-embedding:0.6b';
|
|
17
|
+
}
|
|
18
|
+
else {
|
|
19
|
+
isConnected = await checkMLXConnection();
|
|
20
|
+
effectiveModel = effectiveModel || 'mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ';
|
|
21
|
+
}
|
|
12
22
|
if (!isConnected) {
|
|
13
|
-
throw new Error(
|
|
23
|
+
throw new Error(`Cannot connect to embedding server. Make sure ${backend === 'ollama' ? 'Ollama' : 'MLX'} is running.`);
|
|
14
24
|
}
|
|
15
25
|
// Load storage
|
|
16
26
|
const storage = new StorageManager(storagePath);
|
|
@@ -24,9 +34,11 @@ export async function searchNotes(options) {
|
|
|
24
34
|
duration: (Date.now() - startTime) / 1000
|
|
25
35
|
};
|
|
26
36
|
}
|
|
27
|
-
// Generate query embedding
|
|
37
|
+
// Generate query embedding using the selected backend
|
|
28
38
|
const truncatedQuery = truncateToTokens(query, 8000);
|
|
29
|
-
const queryEmbedding =
|
|
39
|
+
const queryEmbedding = backend === 'ollama'
|
|
40
|
+
? await generateOllamaEmbedding(truncatedQuery, effectiveModel)
|
|
41
|
+
: await generateMLXEmbedding(truncatedQuery, effectiveModel);
|
|
30
42
|
// Calculate similarities
|
|
31
43
|
const results = [];
|
|
32
44
|
for (const note of notes) {
|
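
The backend selection in this hunk can be sketched in isolation. The example below is an assumption-laden illustration, not the package's implementation: `EmbeddingClient`, `makeStubClient`, `resolveModel`, and `embedQuery` are hypothetical names, and the stub clients stand in for the real `./ollama.js` and `./mlx.js` modules. Only the default model names and the error message are copied from the diff.

```typescript
type Backend = 'ollama' | 'mlx';

// Default model names as they appear in the diff.
const DEFAULT_MODELS: Record<Backend, string> = {
  ollama: 'qwen3-embedding:0.6b',
  mlx: 'mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ',
};

// Hypothetical interface standing in for the two backend modules.
interface EmbeddingClient {
  checkConnection(): Promise<boolean>;
  generateEmbedding(text: string, model: string): Promise<number[]>;
}

// Stub client (assumption): always reachable, returns a deterministic toy vector.
function makeStubClient(dims: number): EmbeddingClient {
  return {
    checkConnection: async () => true,
    generateEmbedding: async (text, _model) =>
      Array.from({ length: dims }, (_, i) => (text.length + i) % 7),
  };
}

const clients: Record<Backend, EmbeddingClient> = {
  ollama: makeStubClient(4),
  mlx: makeStubClient(4),
};

// An explicit model wins; otherwise fall back to the backend's default.
function resolveModel(backend: Backend, model?: string): string {
  return model || DEFAULT_MODELS[backend];
}

async function embedQuery(
  query: string,
  backend: Backend = 'ollama',
  model?: string,
): Promise<number[]> {
  const client = clients[backend];
  if (!(await client.checkConnection())) {
    const name = backend === 'ollama' ? 'Ollama' : 'MLX';
    throw new Error(`Cannot connect to embedding server. Make sure ${name} is running.`);
  }
  return client.generateEmbedding(query, resolveModel(backend, model));
}

embedQuery('semantic search', 'mlx').then((vector) => console.log(vector.length));
```

Whichever backend is used for the query should match the one used at indexing time: the surrounding code skips notes whose stored embedding length differs from the query embedding length.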
package/dist/search.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"search.js","sourceRoot":"","sources":["../src/search.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,iBAAiB,EAAE,kBAAkB,EAAE,MAAM,UAAU,CAAC;
|
|
1
|
+
{"version":3,"file":"search.js","sourceRoot":"","sources":["../src/search.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,iBAAiB,IAAI,uBAAuB,EAAE,qBAAqB,EAAE,MAAM,aAAa,CAAC;AAClG,OAAO,EAAE,iBAAiB,IAAI,oBAAoB,EAAE,kBAAkB,EAAE,MAAM,UAAU,CAAC;AACzF,OAAO,EAAE,cAAc,EAAiB,MAAM,cAAc,CAAC;AAC7D,OAAO,EAAE,gBAAgB,EAAE,gBAAgB,EAAE,MAAM,YAAY,CAAC;AAkChE;;GAEG;AACH,MAAM,CAAC,KAAK,UAAU,WAAW,CAAC,OAAsB;IACtD,MAAM,SAAS,GAAG,IAAI,CAAC,GAAG,EAAE,CAAC;IAC7B,MAAM,EAAE,KAAK,EAAE,WAAW,EAAE,IAAI,GAAG,EAAE,EAAE,OAAO,GAAG,QAAQ,EAAE,KAAK,EAAE,QAAQ,GAAG,GAAG,EAAE,GAAG,OAAO,CAAC;IAE7F,wDAAwD;IACxD,IAAI,cAAc,GAAG,KAAK,CAAC;IAC3B,IAAI,WAAW,GAAG,KAAK,CAAC;IAExB,IAAI,OAAO,KAAK,QAAQ,EAAE,CAAC;QACzB,WAAW,GAAG,MAAM,qBAAqB,EAAE,CAAC;QAC5C,cAAc,GAAG,cAAc,IAAI,sBAAsB,CAAC;IAC5D,CAAC;SAAM,CAAC;QACN,WAAW,GAAG,MAAM,kBAAkB,EAAE,CAAC;QACzC,cAAc,GAAG,cAAc,IAAI,6CAA6C,CAAC;IACnF,CAAC;IAED,IAAI,CAAC,WAAW,EAAE,CAAC;QACjB,MAAM,IAAI,KAAK,CAAC,iDAAiD,OAAO,KAAK,QAAQ,CAAC,CAAC,CAAC,QAAQ,CAAC,CAAC,CAAC,KAAK,cAAc,CAAC,CAAC;IAC1H,CAAC;IAED,eAAe;IACf,MAAM,OAAO,GAAG,IAAI,cAAc,CAAC,WAAW,CAAC,CAAC;IAChD,MAAM,OAAO,CAAC,UAAU,EAAE,CAAC;IAE3B,MAAM,KAAK,GAAG,OAAO,CAAC,QAAQ,EAAE,CAAC;IAEjC,IAAI,KAAK,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACvB,OAAO;YACL,KAAK;YACL,OAAO,EAAE,EAAE;YACX,UAAU,EAAE,CAAC;YACb,QAAQ,EAAE,CAAC,IAAI,CAAC,GAAG,EAAE,GAAG,SAAS,CAAC,GAAG,IAAI;SAC1C,CAAC;IACJ,CAAC;IAED,sDAAsD;IACtD,MAAM,cAAc,GAAG,gBAAgB,CAAC,KAAK,EAAE,IAAI,CAAC,CAAC;IACrD,MAAM,cAAc,GAAG,OAAO,KAAK,QAAQ;QACzC,CAAC,CAAC,MAAM,uBAAuB,CAAC,cAAc,EAAE,cAAc,CAAC;QAC/D,CAAC,CAAC,MAAM,oBAAoB,CAAC,cAAc,EAAE,cAAc,CAAC,CAAC;IAE/D,yBAAyB;IACzB,MAAM,OAAO,GAAkD,EAAE,CAAC;IAElE,KAAK,MAAM,IAAI,IAAI,KAAK,EAAE,CAAC;QACzB,qCAAqC;QACrC,IAAI,CAAC,IAAI,CAAC,SAAS,IAAI,IAAI,CAAC,SAAS,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YACnD,SAAS;QACX,CAAC;QAED,oDAAoD;QACpD,IAAI,IAAI,CAAC,SAAS,CAAC,MAAM,KAAK,cAAc,CAAC,MAAM,EAAE,CAAC;YACpD,OAAO,CAAC,IAAI,CAAC,qBAAqB,IAAI,CAAC,IAAI,2CAA2C,IAAI,CAAC,SAAS,CAAC,MAAM,OAAO,cAAc,CAAC,MAAM,GAAG,CAAC,CAAC;YAC5I,SAAS;QACX,CAAC;QAED,MAAM,KAAK,GAAG,gBAAgB,CAAC,cAAc,EAAE,
IAAI,CAAC,SAAS,CAAC,CAAC;QAE/D,IAAI,KAAK,IAAI,QAAQ,EAAE,CAAC;YACtB,OAAO,CAAC,IAAI,CAAC,EAAE,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC;QAChC,CAAC;IACH,CAAC;IAED,6BAA6B;IAC7B,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,KAAK,GAAG,CAAC,CAAC,KAAK,CAAC,CAAC;IAE1C,YAAY;IACZ,MAAM,UAAU,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,EAAE,IAAI,CAAC,CAAC;IAE1C,iBAAiB;IACjB,MAAM,gBAAgB,GAAmB,UAAU,CAAC,GAAG,CAAC,CAAC,EAAE,IAAI,EAAE,KAAK,EAAE,EAAE,EAAE,CAAC,CAAC;QAC5E,IAAI,EAAE,IAAI,CAAC,IAAI;QACf,UAAU,EAAE,IAAI,CAAC,UAAU;QAC3B,KAAK,EAAE,MAAM,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC;QAC/B,QAAQ,EAAE,IAAI,CAAC,QAAQ;QACvB,OAAO,EAAE,IAAI,CAAC,OAAO;KACtB,CAAC,CAAC,CAAC;IAEJ,MAAM,QAAQ,GAAG,CAAC,IAAI,CAAC,GAAG,EAAE,GAAG,SAAS,CAAC,GAAG,IAAI,CAAC;IAEjD,OAAO;QACL,KAAK;QACL,OAAO,EAAE,gBAAgB;QACzB,UAAU,EAAE,OAAO,CAAC,MAAM;QAC1B,QAAQ;KACT,CAAC;AACJ,CAAC;AAED;;GAEG;AACH,MAAM,UAAU,mBAAmB,CAAC,QAAwB;IAC1D,MAAM,KAAK,GAAa,EAAE,CAAC;IAE3B,KAAK,CAAC,IAAI,CAAC,aAAa,QAAQ,CAAC,KAAK,GAAG,CAAC,CAAC;IAC3C,KAAK,CAAC,IAAI,CAAC,SAAS,QAAQ,CAAC,UAAU,eAAe,QAAQ,CAAC,QAAQ,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC;IACvF,KAAK,CAAC,IAAI,CAAC,eAAe,QAAQ,CAAC,OAAO,CAAC,MAAM,aAAa,CAAC,CAAC;IAEhE,IAAI,QAAQ,CAAC,OAAO,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QAClC,KAAK,CAAC,IAAI,CAAC,qBAAqB,CAAC,CAAC;IACpC,CAAC;IAED,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,QAAQ,CAAC,OAAO,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;QACjD,MAAM,MAAM,GAAG,QAAQ,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC;QACnC,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,GAAG,CAAC,KAAK,WAAW,CAAC,MAAM,CAAC,KAAK,CAAC,IAAI,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;QAEvE,IAAI,MAAM,CAAC,QAAQ,CAAC,KAAK,EAAE,CAAC;YAC1B,KAAK,CAAC,IAAI,CAAC,gBAAgB,MAAM,CAAC,QAAQ,CAAC,KAAK,EAAE,CAAC,CAAC;QACtD,CAAC;QAED,IAAI,MAAM,CAAC,QAAQ,CAAC,IAAI,IAAI,MAAM,CAAC,QAAQ,CAAC,OAAO,EAAE,CAAC;YACpD,MAAM,OAAO,GAAG,CAAC,MAAM,CAAC,QAAQ,CAAC,IAAI,EAAE,MAAM,CAAC,QAAQ,CAAC,OAAO,CAAC,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;YAC5F,KAAK,CAAC,IAAI,CAAC,kBAAkB,OAAO,EAAE,CAAC,CAAC;QAC1C,CAAC;QAED,IAAI,MAAM,CAAC,QAAQ,CAAC,IAAI,IAAI,MAAM,CAAC,QAAQ,CAAC,IAAI,CAAC,M
AAM,GAAG,CAAC,EAAE,CAAC;YAC5D,KAAK,CAAC,IAAI,CAAC,eAAe,MAAM,CAAC,QAAQ,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QAC/D,CAAC;QAED,IAAI,MAAM,CAAC,OAAO,EAAE,CAAC;YACnB,KAAK,CAAC,IAAI,CAAC,SAAS,MAAM,CAAC,OAAO,EAAE,CAAC,CAAC;QACxC,CAAC;QAED,KAAK,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;IACjB,CAAC;IAED,OAAO,KAAK,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;AAC1B,CAAC;AAED;;GAEG;AACH,SAAS,WAAW,CAAC,KAAa;IAChC,IAAI,KAAK,IAAI,GAAG,EAAE,CAAC;QACjB,OAAO,OAAO,CAAC,KAAK,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC;IAC5C,CAAC;SAAM,IAAI,KAAK,IAAI,GAAG,EAAE,CAAC;QACxB,OAAO,OAAO,CAAC,KAAK,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC;IAC5C,CAAC;SAAM,IAAI,KAAK,IAAI,GAAG,EAAE,CAAC;QACxB,OAAO,OAAO,CAAC,KAAK,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC;IAC5C,CAAC;SAAM,CAAC;QACN,OAAO,OAAO,CAAC,KAAK,GAAG,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,GAAG,CAAC;IAC5C,CAAC;AACH,CAAC"}
|