lynkr 4.1.0 → 4.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LYNKR-TUI-PLAN.md +984 -0
- package/README.md +0 -117
- package/bin/cli.js +6 -0
- package/docs/index.md +78 -790
- package/package.json +1 -1
- package/src/api/openai-router.js +187 -0
- package/src/api/router.js +172 -22
- package/src/clients/databricks.js +82 -7
- package/src/clients/openai-format.js +11 -9
- package/src/clients/openrouter-utils.js +15 -5
- package/src/clients/responses-format.js +214 -0
- package/src/clients/standard-tools.js +4 -4
- package/src/orchestrator/index.js +32 -0
- package/README.md.backup +0 -2996
- package/docs/LOCAL_EMBEDDINGS_PLAN.md +0 -1024
- package/lynkr-0.1.1.tgz +0 -0
@@ -1,1024 +0,0 @@

# Local Embeddings Support Plan (Ollama + llama.cpp)

**Goal:** Add local embeddings support for Cursor IDE's @Codebase semantic search using Ollama and llama.cpp

**Current State:**
- Embeddings only work via OpenRouter or OpenAI (cloud-based)
- Code location: `src/api/openai-router.js` lines 361-412
- Hardcoded to use OpenRouter/OpenAI endpoints only

**Target State:**
- Support Ollama for embeddings (local, privacy-first)
- Support llama.cpp for embeddings (local, GGUF models)
- Allow users to run 100% local: Ollama chat + Ollama embeddings
- Fall back to OpenRouter/OpenAI if local embeddings are not configured

---

## Architecture Overview

### Current Flow
```
Cursor @Codebase request
  ↓
Lynkr /v1/embeddings endpoint
  ↓
Check: OpenRouter? → Yes → OpenRouter API
  ↓ No
Check: OpenAI? → Yes → OpenAI API
  ↓ No
Return 501 (Not Configured)
```

### New Flow
```
Cursor @Codebase request
  ↓
Lynkr /v1/embeddings endpoint
  ↓
determineEmbeddingProvider()
  ↓
  ├─→ Ollama → /api/embeddings (custom format)
  ├─→ llama.cpp → /embeddings (OpenAI-compatible)
  ├─→ OpenRouter → /api/v1/embeddings (existing)
  └─→ OpenAI → /v1/embeddings (existing)
```

---

## API Format Differences

### 1. Ollama Embeddings API

**Endpoint:** `http://localhost:11434/api/embeddings`

**Request Format:**
```json
{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox"
}
```

**Response Format:**
```json
{
  "embedding": [0.123, 0.456, ...],
  "model": "nomic-embed-text"
}
```

**Key Differences:**
- ❌ Does NOT support batch inputs (only a single prompt per request)
- ❌ No usage statistics returned
- ❌ Different response structure
- ✅ Need to convert OpenAI format → Ollama format

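The conversion the last bullet calls for is mostly response-side: one Ollama call per input, then the per-prompt results are wrapped back into the OpenAI list shape. A minimal sketch of that wrapping (the helper name `toOpenAIEmbeddingList` is illustrative, not code from the repo):

```javascript
// Hypothetical helper: wrap per-prompt Ollama responses ({ embedding: [...] })
// into the OpenAI embeddings list shape that Cursor expects.
function toOpenAIEmbeddingList(ollamaResults, model) {
  return {
    object: "list",
    data: ollamaResults.map((r, i) => ({
      object: "embedding",
      embedding: r.embedding,
      index: i,
    })),
    model,
    // Ollama's /api/embeddings returns no token counts, so report zeros.
    usage: { prompt_tokens: 0, total_tokens: 0 },
  };
}
```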
### 2. llama.cpp Embeddings API

**Endpoint:** `http://localhost:8080/embeddings`

**Request Format (OpenAI-compatible):**
```json
{
  "input": "The quick brown fox",
  "encoding_format": "float"
}
```

**Response Format (OpenAI-compatible):**
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.123, 0.456, ...],
      "index": 0
    }
  ],
  "model": "loaded-model",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```

**Key Differences:**
- ✅ OpenAI-compatible format
- ✅ Supports batch inputs (array of strings)
- ✅ Returns usage statistics
- ✅ No conversion needed

### 3. OpenRouter/OpenAI (Existing)

Already implemented; no changes needed.

---

## Implementation Plan

### Phase 1: Configuration (30 minutes)

#### 1.1 Add Environment Variables

**File:** `src/config/index.js`

**Add after line 85 (OpenRouter config):**
```javascript
// Ollama embeddings configuration
const ollamaEmbeddingsEndpoint = process.env.OLLAMA_EMBEDDINGS_ENDPOINT ??
  `${ollamaEndpoint}/api/embeddings`;
const ollamaEmbeddingsModel = process.env.OLLAMA_EMBEDDINGS_MODEL ?? "nomic-embed-text";

// llama.cpp embeddings configuration
const llamacppEmbeddingsEndpoint = process.env.LLAMACPP_EMBEDDINGS_ENDPOINT ??
  `${llamacppEndpoint}/embeddings`;
```

**Update config export (lines 434-436):**
```javascript
ollama: {
  endpoint: ollamaEndpoint,
  model: ollamaModel,
  timeout: Number.isNaN(ollamaTimeout) ? 120000 : ollamaTimeout,
  // NEW: Embeddings config
  embeddingsEndpoint: ollamaEmbeddingsEndpoint,
  embeddingsModel: ollamaEmbeddingsModel,
},
```

**Update llamacpp config (lines 454-459):**
```javascript
llamacpp: {
  endpoint: llamacppEndpoint,
  model: llamacppModel,
  timeout: Number.isNaN(llamacppTimeout) ? 120000 : llamacppTimeout,
  apiKey: llamacppApiKey,
  // NEW: Embeddings config
  embeddingsEndpoint: llamacppEmbeddingsEndpoint,
},
```

#### 1.2 Document in .env.example

**File:** `.env.example`

**Add after the Ollama section (after line 28):**
```bash
# Ollama embeddings configuration
# Embedding models for @Codebase semantic search (local, privacy-first)
# Popular models:
#   - nomic-embed-text (768 dim, 137M params, best all-around)
#   - mxbai-embed-large (1024 dim, 335M params, higher quality)
#   - all-minilm (384 dim, 23M params, fastest/smallest)
#
# Pull model: ollama pull nomic-embed-text
# OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
# OLLAMA_EMBEDDINGS_ENDPOINT=http://localhost:11434/api/embeddings
```

**Add after the llama.cpp section (after line 162):**
```bash
# llama.cpp embeddings configuration
# Requires an embedding model loaded in the llama.cpp server
# Start with: ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
# LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
```

---

### Phase 2: Provider Detection (20 minutes)

#### 2.1 Create Embedding Provider Router

**File:** `src/api/openai-router.js`

**Replace lines 361-375 with:**
```javascript
// Determine which provider to use for embeddings
// Priority: explicit request model > configured embeddings > same as chat provider > fallback
const embeddingConfig = determineEmbeddingProvider(model);

if (!embeddingConfig) {
  logger.warn("No embedding provider configured");
  return res.status(501).json({
    error: {
      message: "Embeddings not configured. Set up one of: OPENROUTER_API_KEY, OPENAI_API_KEY, OLLAMA_EMBEDDINGS_MODEL, or LLAMACPP_EMBEDDINGS_ENDPOINT",
      type: "not_implemented",
      code: "embeddings_not_configured"
    }
  });
}

// Route to the appropriate provider
try {
  let embeddingResponse;

  switch (embeddingConfig.provider) {
    case "ollama":
      embeddingResponse = await generateOllamaEmbeddings(inputs, embeddingConfig);
      break;

    case "llamacpp":
      embeddingResponse = await generateLlamaCppEmbeddings(inputs, embeddingConfig);
      break;

    case "openrouter":
      embeddingResponse = await generateOpenRouterEmbeddings(inputs, embeddingConfig);
      break;

    case "openai":
      embeddingResponse = await generateOpenAIEmbeddings(inputs, embeddingConfig);
      break;

    default:
      throw new Error(`Unsupported embedding provider: ${embeddingConfig.provider}`);
  }

  logger.info({
    provider: embeddingConfig.provider,
    model: embeddingConfig.model,
    duration: Date.now() - startTime,
    embeddingCount: embeddingResponse.data?.length || 0,
  }, "=== EMBEDDINGS RESPONSE ===");

  res.json(embeddingResponse);

} catch (error) {
  logger.error({
    error: error.message,
    provider: embeddingConfig.provider,
  }, "Embeddings generation failed");

  res.status(500).json({
    error: {
      message: error.message || "Embeddings generation failed",
      type: "server_error",
      code: "embeddings_error"
    }
  });
}
```

#### 2.2 Add Provider Detection Function

**File:** `src/api/openai-router.js`

**Add before the POST /embeddings handler:**
```javascript
/**
 * Determine which provider to use for embeddings
 * Priority:
 * 1. Explicit EMBEDDINGS_PROVIDER env var
 * 2. Same provider as MODEL_PROVIDER (if it supports embeddings)
 * 3. First available: OpenRouter > OpenAI > Ollama > llama.cpp
 */
function determineEmbeddingProvider(requestedModel = null) {
  const explicitProvider = process.env.EMBEDDINGS_PROVIDER?.trim();

  // Priority 1: Explicit configuration
  if (explicitProvider) {
    switch (explicitProvider) {
      case "ollama":
        if (!config.ollama?.embeddingsModel) {
          logger.warn("EMBEDDINGS_PROVIDER=ollama but OLLAMA_EMBEDDINGS_MODEL not set");
          return null;
        }
        return {
          provider: "ollama",
          model: requestedModel || config.ollama.embeddingsModel,
          endpoint: config.ollama.embeddingsEndpoint
        };

      case "llamacpp":
        if (!config.llamacpp?.embeddingsEndpoint) {
          logger.warn("EMBEDDINGS_PROVIDER=llamacpp but LLAMACPP_EMBEDDINGS_ENDPOINT not set");
          return null;
        }
        return {
          provider: "llamacpp",
          model: requestedModel || "default",
          endpoint: config.llamacpp.embeddingsEndpoint
        };

      case "openrouter":
        if (!config.openrouter?.apiKey) {
          logger.warn("EMBEDDINGS_PROVIDER=openrouter but OPENROUTER_API_KEY not set");
          return null;
        }
        return {
          provider: "openrouter",
          model: requestedModel || config.openrouter.embeddingsModel,
          apiKey: config.openrouter.apiKey,
          endpoint: "https://openrouter.ai/api/v1/embeddings"
        };

      case "openai":
        if (!config.openai?.apiKey) {
          logger.warn("EMBEDDINGS_PROVIDER=openai but OPENAI_API_KEY not set");
          return null;
        }
        return {
          provider: "openai",
          model: requestedModel || "text-embedding-ada-002",
          apiKey: config.openai.apiKey,
          endpoint: "https://api.openai.com/v1/embeddings"
        };
    }
  }

  // Priority 2: Same as chat provider (if supported)
  const chatProvider = config.modelProvider?.type;

  if (chatProvider === "openrouter" && config.openrouter?.apiKey) {
    return {
      provider: "openrouter",
      model: requestedModel || config.openrouter.embeddingsModel,
      apiKey: config.openrouter.apiKey,
      endpoint: "https://openrouter.ai/api/v1/embeddings"
    };
  }

  if (chatProvider === "ollama" && config.ollama?.embeddingsModel) {
    return {
      provider: "ollama",
      model: requestedModel || config.ollama.embeddingsModel,
      endpoint: config.ollama.embeddingsEndpoint
    };
  }

  if (chatProvider === "llamacpp" && config.llamacpp?.embeddingsEndpoint) {
    return {
      provider: "llamacpp",
      model: requestedModel || "default",
      endpoint: config.llamacpp.embeddingsEndpoint
    };
  }

  // Priority 3: First available provider
  if (config.openrouter?.apiKey) {
    return {
      provider: "openrouter",
      model: requestedModel || config.openrouter.embeddingsModel,
      apiKey: config.openrouter.apiKey,
      endpoint: "https://openrouter.ai/api/v1/embeddings"
    };
  }

  if (config.openai?.apiKey) {
    return {
      provider: "openai",
      model: requestedModel || "text-embedding-ada-002",
      apiKey: config.openai.apiKey,
      endpoint: "https://api.openai.com/v1/embeddings"
    };
  }

  if (config.ollama?.embeddingsModel) {
    return {
      provider: "ollama",
      model: requestedModel || config.ollama.embeddingsModel,
      endpoint: config.ollama.embeddingsEndpoint
    };
  }

  if (config.llamacpp?.embeddingsEndpoint) {
    return {
      provider: "llamacpp",
      model: requestedModel || "default",
      endpoint: config.llamacpp.embeddingsEndpoint
    };
  }

  return null; // No provider available
}
```

---

### Phase 3: Ollama Implementation (45 minutes)

#### 3.1 Add Ollama Embeddings Function

**File:** `src/api/openai-router.js`

**Add after determineEmbeddingProvider:**
```javascript
/**
 * Generate embeddings using Ollama
 * Note: Ollama only supports a single prompt per request, not batches
 */
async function generateOllamaEmbeddings(inputs, config) {
  const { model, endpoint } = config;

  logger.info({
    model,
    endpoint,
    inputCount: inputs.length
  }, "Generating embeddings with Ollama");

  // Ollama doesn't support batching, so process inputs one by one
  const embeddings = [];

  for (let i = 0; i < inputs.length; i++) {
    const input = inputs[i];

    try {
      const response = await fetch(endpoint, {
        method: "POST",
        headers: {
          "Content-Type": "application/json"
        },
        body: JSON.stringify({
          model: model,
          prompt: input
        })
      });

      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`Ollama embeddings error (${response.status}): ${errorText}`);
      }

      const data = await response.json();

      embeddings.push({
        object: "embedding",
        embedding: data.embedding,
        index: i
      });

    } catch (error) {
      logger.error({
        error: error.message,
        input: input.substring(0, 100),
        index: i
      }, "Failed to generate Ollama embedding");
      throw error;
    }
  }

  // Convert to OpenAI format
  return {
    object: "list",
    data: embeddings,
    model: model,
    usage: {
      prompt_tokens: 0, // Ollama doesn't provide token counts
      total_tokens: 0
    }
  };
}
```

---

### Phase 4: llama.cpp Implementation (20 minutes)

#### 4.1 Add llama.cpp Embeddings Function

**File:** `src/api/openai-router.js`

**Add after generateOllamaEmbeddings:**
```javascript
/**
 * Generate embeddings using llama.cpp
 * llama.cpp uses an OpenAI-compatible format, so minimal conversion is needed
 */
async function generateLlamaCppEmbeddings(inputs, config) {
  const { model, endpoint } = config;

  logger.info({
    model,
    endpoint,
    inputCount: inputs.length
  }, "Generating embeddings with llama.cpp");

  try {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: {
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        input: inputs, // llama.cpp supports batch inputs
        encoding_format: "float"
      })
    });

    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`llama.cpp embeddings error (${response.status}): ${errorText}`);
    }

    const data = await response.json();

    // llama.cpp returns an OpenAI-compatible format, but normalize for consistency
    return {
      object: "list",
      data: data.data || [],
      model: model || data.model || "default",
      usage: data.usage || {
        prompt_tokens: 0,
        total_tokens: 0
      }
    };

  } catch (error) {
    logger.error({
      error: error.message,
      endpoint
    }, "Failed to generate llama.cpp embeddings");
    throw error;
  }
}
```

---

### Phase 5: Refactor Existing Providers (15 minutes)

#### 5.1 Extract OpenRouter Function

**File:** `src/api/openai-router.js`

**Add after generateLlamaCppEmbeddings:**
```javascript
/**
 * Generate embeddings using OpenRouter
 */
async function generateOpenRouterEmbeddings(inputs, config) {
  const { model, apiKey, endpoint } = config;

  logger.info({
    model,
    inputCount: inputs.length
  }, "Generating embeddings with OpenRouter");

  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
      "HTTP-Referer": "https://github.com/vishalveerareddy123/Lynkr",
      "X-Title": "Lynkr"
    },
    body: JSON.stringify({
      model: model,
      input: inputs,
      encoding_format: "float"
    })
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenRouter embeddings error (${response.status}): ${errorText}`);
  }

  return await response.json();
}

/**
 * Generate embeddings using OpenAI
 */
async function generateOpenAIEmbeddings(inputs, config) {
  const { model, apiKey, endpoint } = config;

  logger.info({
    model,
    inputCount: inputs.length
  }, "Generating embeddings with OpenAI");

  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model: model,
      input: inputs,
      encoding_format: "float"
    })
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenAI embeddings error (${response.status}): ${errorText}`);
  }

  return await response.json();
}
```

---

### Phase 6: Update Documentation (20 minutes)

#### 6.1 Update .env.example

**Already covered in Phase 1.2.**

#### 6.2 Update README.md Cursor Section

**File:** `README.md`

**Update the embeddings section (around line 1870):**
```markdown
### Enabling @Codebase Semantic Search (Optional)

For Cursor's @Codebase semantic search, you need embeddings support.

**⚡ Already using OpenRouter? You're all set!**

If you configured `MODEL_PROVIDER=openrouter`, embeddings **work automatically** with the same `OPENROUTER_API_KEY` - no additional setup needed. OpenRouter handles both chat completions AND embeddings with one key.

**🔧 Using a different provider? Choose your embeddings source:**

You have 4 options, listed from most private to least:

**Option A: Ollama (100% Local, FREE)**
```bash
# Install Ollama and pull an embedding model
ollama pull nomic-embed-text

# Add to .env
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# That's it! Works with any chat provider (Databricks, Bedrock, etc.)
# Cost: FREE, Privacy: 100% local, Quality: Good
```

**Option B: llama.cpp (100% Local, FREE)**
```bash
# Download an embedding model (GGUF format)
# e.g., nomic-embed-text-v1.5.Q4_K_M.gguf

# Start llama.cpp with embedding support
./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding

# Add to .env
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings

# Cost: FREE, Privacy: 100% local, Quality: Good
```

**Option C: OpenRouter (Cloud, Cheapest)**
```bash
# Add to .env (if not already there)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Optional: Specify an embedding model
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small

# Cost: $0.02 per 1M tokens (~$0.01-0.10/month), Privacy: Cloud, Quality: Excellent
```

**Option D: OpenAI Direct (Cloud)**
```bash
# Add to .env
OPENAI_API_KEY=sk-your-key-here

# Cost: $0.10 per 1M tokens, Privacy: Cloud, Quality: Excellent
```

**Restart Lynkr**, and @Codebase will work!
```

#### 6.3 Add Embedding Models Guide

**File:** `README.md`

**Add new section after Cursor setup:**
```markdown
### Embedding Models Guide

Different embedding models have different characteristics:

| Model | Provider | Size | Dimensions | Quality | Speed | Privacy |
|-------|----------|------|------------|---------|-------|---------|
| **nomic-embed-text** | Ollama/llama.cpp | 137M | 768 | ⭐⭐⭐⭐ | Fast | 100% local |
| **mxbai-embed-large** | Ollama | 335M | 1024 | ⭐⭐⭐⭐⭐ | Medium | 100% local |
| **all-minilm** | Ollama | 23M | 384 | ⭐⭐⭐ | Fastest | 100% local |
| **text-embedding-3-small** | OpenRouter/OpenAI | - | 1536 | ⭐⭐⭐⭐⭐ | Fast | Cloud |
| **text-embedding-3-large** | OpenRouter/OpenAI | - | 3072 | ⭐⭐⭐⭐⭐ | Medium | Cloud |
| **text-embedding-ada-002** | OpenRouter/OpenAI | - | 1536 | ⭐⭐⭐⭐ | Fast | Cloud |
| **voyage-code-2** | OpenRouter | - | 1536 | ⭐⭐⭐⭐⭐ | Medium | Cloud |

**Recommendations:**
- **Best for privacy**: `nomic-embed-text` (Ollama, 100% local, free)
- **Best for quality**: `text-embedding-3-large` (OpenRouter, $0.13/1M tokens)
- **Best for speed**: `all-minilm` (Ollama, 100% local, free, fastest)
- **Best balance**: `text-embedding-3-small` (OpenRouter, $0.02/1M tokens)
- **Best for code**: `voyage-code-2` (OpenRouter, $0.12/1M tokens)

**Setup examples:**

```bash
# Privacy-first: 100% local
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:latest
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# Quality-first: Best models
MODEL_PROVIDER=openrouter
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_EMBEDDINGS_MODEL=text-embedding-3-large

# Cost-optimized: Cheapest cloud
MODEL_PROVIDER=openrouter
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small

# Hybrid: Local chat, cloud embeddings
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:latest
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
```
```

---

### Phase 7: Testing (30 minutes)

#### 7.1 Update Test File

**File:** `test/cursor-integration.test.js`

**Add new test suite at the end (before the closing `}`):**
```javascript
describe("Local Embeddings Support", () => {
  it("should detect Ollama as embedding provider", () => {
    process.env.MODEL_PROVIDER = "ollama";
    process.env.OLLAMA_EMBEDDINGS_MODEL = "nomic-embed-text";
    delete require.cache[require.resolve("../src/config")];
    const config = require("../src/config");

    assert.strictEqual(config.ollama.embeddingsModel, "nomic-embed-text");
  });

  it("should detect llama.cpp as embedding provider", () => {
    process.env.MODEL_PROVIDER = "llamacpp";
    process.env.LLAMACPP_EMBEDDINGS_ENDPOINT = "http://localhost:8080/embeddings";
    delete require.cache[require.resolve("../src/config")];
    const config = require("../src/config");

    assert.strictEqual(config.llamacpp.embeddingsEndpoint, "http://localhost:8080/embeddings");
  });

  it("should prioritize explicit EMBEDDINGS_PROVIDER", () => {
    process.env.MODEL_PROVIDER = "databricks";
    process.env.EMBEDDINGS_PROVIDER = "ollama";
    process.env.OLLAMA_EMBEDDINGS_MODEL = "nomic-embed-text";

    delete require.cache[require.resolve("../src/config")];
    delete require.cache[require.resolve("../src/api/openai-router")];

    const config = require("../src/config");
    assert.strictEqual(config.modelProvider.type, "databricks");
    assert.strictEqual(config.ollama.embeddingsModel, "nomic-embed-text");
  });

  it("should default to the same provider if it supports embeddings", () => {
    process.env.MODEL_PROVIDER = "openrouter";
    process.env.OPENROUTER_API_KEY = "sk-test";
    process.env.OPENROUTER_EMBEDDINGS_MODEL = "text-embedding-3-small";
    delete process.env.EMBEDDINGS_PROVIDER;

    delete require.cache[require.resolve("../src/config")];
    const config = require("../src/config");

    assert.strictEqual(config.modelProvider.type, "openrouter");
    assert.strictEqual(config.openrouter.embeddingsModel, "text-embedding-3-small");
  });
});
```

#### 7.2 Manual Testing Checklist

**Test Ollama embeddings:**
```bash
# 1. Pull embedding model
ollama pull nomic-embed-text

# 2. Configure .env
MODEL_PROVIDER=ollama
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# 3. Start Lynkr
lynkr start

# 4. Test embeddings endpoint
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "test embedding"}'

# Expected: Returns embedding vector
```

**Test llama.cpp embeddings:**
```bash
# 1. Download and start llama.cpp
# (use a port other than 8080 so it doesn't collide with Lynkr)
./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8081 --embedding

# 2. Configure .env
MODEL_PROVIDER=databricks
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8081/embeddings

# 3. Test embeddings through Lynkr
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "test embedding"}'
```

**Test in Cursor:**
```bash
# 1. Configure Cursor to point to Lynkr
# Settings → Models → OpenAI API
# Base URL: http://localhost:8080/v1

# 2. Try @Codebase search
# In Cursor chat: "@Codebase where is authentication handled?"

# Expected: Semantic search results
```

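All three manual checks should come back in the same OpenAI-shaped body, whichever backend served them. A small shape validator makes that contract explicit (hypothetical helper, not part of `test/cursor-integration.test.js`):

```javascript
// Hypothetical check: does a /v1/embeddings response body satisfy the
// OpenAI embeddings list contract, regardless of provider?
function isValidEmbeddingResponse(body) {
  return Boolean(
    body &&
    body.object === "list" &&
    Array.isArray(body.data) &&
    body.data.length > 0 &&
    body.data.every(
      (d, i) =>
        d.object === "embedding" &&
        Array.isArray(d.embedding) &&
        d.embedding.every((x) => typeof x === "number") &&
        d.index === i
    ) &&
    typeof body.model === "string"
  );
}
```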
---
|
|
850
|
-
|
|
851
|
-
## File Changes Summary
|
|
852
|
-
|
|
853
|
-
| File | Changes | Lines | Complexity |
|
|
854
|
-
|------|---------|-------|------------|
|
|
855
|
-
| `src/config/index.js` | Add embeddings config | +10 | Easy |
|
|
856
|
-
| `src/api/openai-router.js` | Add provider detection + implementations | +400 | Medium |
|
|
857
|
-
| `.env.example` | Document embedding models | +30 | Easy |
|
|
858
|
-
| `README.md` | Update Cursor section | +100 | Easy |
|
|
859
|
-
| `test/cursor-integration.test.js` | Add local embeddings tests | +50 | Easy |
|
|
860
|
-
|
|
861
|
-
**Total:** ~590 lines of code
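The bulk of the `openai-router.js` change is provider detection. One way to sketch the routing priority — env var names mirror this plan, but the exact precedence order is a design assumption to settle during review:

```javascript
// Pick an embeddings provider from configuration, preferring an explicit
// override, then local providers, then cloud fallbacks.
function detectEmbeddingsProvider(env) {
  if (env.EMBEDDINGS_PROVIDER) return env.EMBEDDINGS_PROVIDER; // explicit override
  if (env.OLLAMA_EMBEDDINGS_MODEL) return "ollama";
  if (env.LLAMACPP_EMBEDDINGS_ENDPOINT) return "llamacpp";
  if (env.OPENROUTER_API_KEY) return "openrouter";
  if (env.OPENAI_API_KEY) return "openai";
  return null; // caller raises a helpful configuration error
}
```

Returning `null` instead of throwing lets the endpoint respond with a setup hint ("pull an embedding model or set an API key") rather than a bare 500.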

---

## Complexity & Effort Estimates

| Phase | Description | Complexity | Effort | Value |
|-------|-------------|------------|--------|-------|
| 1 | Configuration | 2/10 | 30 min | Required |
| 2 | Provider Detection | 4/10 | 20 min | Core |
| 3 | Ollama Implementation | 6/10 | 45 min | High |
| 4 | llama.cpp Implementation | 3/10 | 20 min | High |
| 5 | Refactor Existing | 2/10 | 15 min | Cleanup |
| 6 | Documentation | 2/10 | 20 min | UX |
| 7 | Testing | 3/10 | 30 min | Quality |

**Total Effort:** ~3 hours
**Overall Complexity:** 4/10 (medium)
**Value:** HIGH (enables 100% local Cursor setup)

---

## Priority Ranking

### Must Have (Core Functionality)
1. ✅ **Ollama support** - Most requested for privacy
2. ✅ **llama.cpp support** - GGUF model compatibility
3. ✅ **Provider detection** - Smart routing
4. ✅ **Configuration** - User control

### Should Have (Good UX)
5. ✅ **Documentation** - User guidance
6. ✅ **Error handling** - Helpful messages
7. ✅ **Testing** - Quality assurance

### Nice to Have (Future)
8. ⚠️ **Caching** - Cache embeddings for repeated queries
9. ⚠️ **Batch optimization** - Parallel Ollama requests
10. ⚠️ **Auto-detection** - Detect installed embedding models

---

## Benefits

### For Privacy-Conscious Users
- ✅ **100% local setup**: Ollama chat + Ollama embeddings
- ✅ **No cloud dependencies**: Works offline
- ✅ **No API costs**: FREE forever
- ✅ **No data leakage**: Code never leaves your machine

### For Cost-Conscious Users
- ✅ **Local embeddings FREE**: Ollama/llama.cpp
- ✅ **Cheap cloud fallback**: OpenRouter at $0.02/1M tokens
- ✅ **Hybrid setup**: Local chat, cheap cloud embeddings

### For All Users
- ✅ **More choice**: 4 embedding providers
- ✅ **Better control**: Explicit provider selection
- ✅ **Consistent UX**: Works with any chat provider

---

## Success Criteria

### Functional Requirements
- [ ] Ollama embeddings work with single and multiple inputs
- [ ] llama.cpp embeddings work with batch inputs
- [ ] Provider detection prioritizes correctly
- [ ] Cursor @Codebase search works with local embeddings
- [ ] Error messages are helpful

### Quality Requirements
- [ ] All tests pass (18 existing + 4 new = 22 total)
- [ ] No breaking changes to existing OpenRouter/OpenAI support
- [ ] Documentation is clear and complete
- [ ] Performance is acceptable (< 500 ms per embedding)

### User Experience
- [ ] Setup takes < 5 minutes
- [ ] Works out of the box with Ollama
- [ ] Clear error messages when misconfigured
- [ ] README examples are copy-pasteable

---

## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| **Ollama doesn't support batch** | Medium | Loop through inputs sequentially |
| **llama.cpp format changes** | Low | Use OpenAI-compatible mode |
| **Embedding models not installed** | High | Clear setup instructions + error messages |
| **Performance is slow** | Medium | Document model sizes, recommend lightweight models |
| **Breaking existing OpenRouter** | High | Thorough testing, backward compatible |
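The first mitigation (loop sequentially when the backend only embeds one input per request) can be sketched like this; `embedOne` stands in for whatever single-input call the provider actually exposes:

```javascript
// Embed an array of inputs one request at a time, for backends whose
// embeddings endpoint accepts a single prompt per call.
async function embedSequentially(inputs, embedOne) {
  const vectors = [];
  for (const input of inputs) {
    vectors.push(await embedOne(input)); // one request per input, in order
  }
  return vectors;
}
```

Sequential order also preserves the input-to-index mapping that the OpenAI response format requires; parallelizing later (the "batch optimization" item above) just needs `Promise.all` with the indices kept stable.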

---

## Next Steps

1. **Review this plan** - Get approval
2. **Start with Phase 1** - Configuration (30 min)
3. **Implement Phase 2** - Provider detection (20 min)
4. **Implement Phase 3** - Ollama (45 min)
5. **Implement Phase 4** - llama.cpp (20 min)
6. **Complete remaining phases** - Refactor, docs, tests (65 min)
7. **Test end-to-end** - Cursor @Codebase with local embeddings
8. **Update changelog** - Document the new feature

**Total time:** ~3 hours of focused work

---

## Questions to Answer Before Implementation

1. **Should EMBEDDINGS_PROVIDER be required or auto-detected?**
   - Recommendation: Auto-detect, allow override
2. **Should we cache embeddings to avoid repeated API calls?**
   - Recommendation: Not in MVP, add later if needed
3. **Should we support multiple embedding models simultaneously?**
   - Recommendation: No, one at a time
4. **Should we validate that the embedding model is actually installed/running?**
   - Recommendation: Yes, and surface it in error messages
5. **Should we add embedding model benchmarks to the docs?**
   - Recommendation: Yes, a quality/speed/size comparison
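If caching (question 2) is picked up later, even a small in-memory map keyed on model + input would remove repeated backend calls during re-indexing. An illustrative sketch, explicitly not part of the MVP:

```javascript
// Memoize embeddings by (model, input) so re-indexing unchanged files
// doesn't re-hit the backend. Unbounded; a real cache would cap size
// (LRU) and probably hash the input instead of storing it verbatim.
function makeCachedEmbedder(embed) {
  const cache = new Map();
  return async (model, input) => {
    const key = `${model}\u0000${input}`; // NUL separator avoids key collisions
    if (!cache.has(key)) {
      cache.set(key, await embed(model, input));
    }
    return cache.get(key);
  };
}
```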

---

## Alternative Approaches Considered

### Approach A: Unified Embeddings Service
**Pros:** Single endpoint handles all providers
**Cons:** More complex, harder to debug
**Decision:** ❌ Rejected - Too complex for MVP

### Approach B: Separate /v1/embeddings/* Routes
**Pros:** Clear separation, easy to extend
**Cons:** Non-standard, breaks OpenAI compatibility
**Decision:** ❌ Rejected - Breaks Cursor compatibility

### Approach C: Provider-Specific Routing (Chosen)
**Pros:** Clean, extensible, backward compatible
**Cons:** Slightly more code
**Decision:** ✅ Chosen - Best balance

---

## Post-Implementation TODO

- [ ] Add embeddings performance metrics to monitoring
- [ ] Add embedding model benchmarks to docs
- [ ] Consider adding an embeddings caching layer
- [ ] Add support for custom embedding endpoints
- [ ] Add embedding model auto-download for Ollama
- [ ] Add embedding dimension validation
- [ ] Add support for a normalized-embeddings option
- [ ] Consider batch optimization for Ollama (parallel requests)

---

**Ready to implement? Let me know and I'll start with Phase 1!**