lynkr 4.1.0 → 4.2.1

# Local Embeddings Support Plan (Ollama + llama.cpp)

**Goal:** Add local embeddings support for Cursor IDE's @Codebase semantic search using Ollama and llama.cpp.

**Current State:**
- Embeddings only work via OpenRouter or OpenAI (cloud-based)
- Code location: `src/api/openai-router.js` lines 361-412
- Hardcoded to use OpenRouter/OpenAI endpoints only

**Target State:**
- Support Ollama for embeddings (local, privacy-first)
- Support llama.cpp for embeddings (local, GGUF models)
- Allow users to run 100% local: Ollama chat + Ollama embeddings
- Fall back to OpenRouter/OpenAI if local embeddings are not configured

---

## Architecture Overview

### Current Flow
```
Cursor @Codebase request
        ↓
Lynkr /v1/embeddings endpoint
        ↓
Check: OpenRouter? → Yes → OpenRouter API
        ↓ No
Check: OpenAI? → Yes → OpenAI API
        ↓ No
Return 501 (Not Configured)
```

### New Flow
```
Cursor @Codebase request
        ↓
Lynkr /v1/embeddings endpoint
        ↓
determineEmbeddingProvider()
        ↓
  ├─→ Ollama → /api/embeddings (custom format)
  ├─→ llama.cpp → /embeddings (OpenAI-compatible)
  ├─→ OpenRouter → /api/v1/embeddings (existing)
  └─→ OpenAI → /v1/embeddings (existing)
```

---

## API Format Differences

### 1. Ollama Embeddings API

**Endpoint:** `http://localhost:11434/api/embeddings`

**Request Format:**
```json
{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox"
}
```

**Response Format:**
```json
{
  "embedding": [0.123, 0.456, ...],
  "model": "nomic-embed-text"
}
```

**Key Differences:**
- ❌ Does NOT support batch inputs (only a single prompt per request)
- ❌ No usage statistics returned
- ❌ Different response structure
- ✅ Need to convert OpenAI format → Ollama format

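The conversion called out above is mechanical; here is a sketch of the response side (function and variable names are illustrative, not from the Lynkr codebase):

```javascript
// Convert N single-prompt Ollama responses into one OpenAI-style
// embeddings list. Each Ollama response looks like:
//   { embedding: [...], model: "nomic-embed-text" }
function toOpenAIEmbeddingList(model, ollamaResponses) {
  return {
    object: "list",
    data: ollamaResponses.map((resp, index) => ({
      object: "embedding",
      embedding: resp.embedding,
      index,
    })),
    model,
    usage: { prompt_tokens: 0, total_tokens: 0 }, // Ollama reports no usage
  };
}

const out = toOpenAIEmbeddingList("nomic-embed-text", [
  { embedding: [0.1, 0.2] },
  { embedding: [0.3, 0.4] },
]);
```

The request side is the inverse: fan one OpenAI `input` array out into N `{ model, prompt }` requests, as Phase 3 does.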
### 2. llama.cpp Embeddings API

**Endpoint:** `http://localhost:8080/embeddings`

**Request Format (OpenAI-compatible):**
```json
{
  "input": "The quick brown fox",
  "encoding_format": "float"
}
```

**Response Format (OpenAI-compatible):**
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.123, 0.456, ...],
      "index": 0
    }
  ],
  "model": "loaded-model",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```

**Key Differences:**
- ✅ OpenAI-compatible format
- ✅ Supports batch inputs (array of strings)
- ✅ Returns usage statistics
- ✅ No conversion needed

### 3. OpenRouter/OpenAI (Existing)

Already implemented; no changes needed.

---

## Implementation Plan

### Phase 1: Configuration (30 minutes)

#### 1.1 Add Environment Variables

**File:** `src/config/index.js`

**Add after line 85 (OpenRouter config):**
```javascript
// Ollama embeddings configuration
const ollamaEmbeddingsEndpoint = process.env.OLLAMA_EMBEDDINGS_ENDPOINT ??
  `${ollamaEndpoint}/api/embeddings`;
const ollamaEmbeddingsModel = process.env.OLLAMA_EMBEDDINGS_MODEL ?? "nomic-embed-text";

// llama.cpp embeddings configuration
const llamacppEmbeddingsEndpoint = process.env.LLAMACPP_EMBEDDINGS_ENDPOINT ??
  `${llamacppEndpoint}/embeddings`;
```

**Update the config export (lines 434-436):**
```javascript
ollama: {
  endpoint: ollamaEndpoint,
  model: ollamaModel,
  timeout: Number.isNaN(ollamaTimeout) ? 120000 : ollamaTimeout,
  // NEW: Embeddings config
  embeddingsEndpoint: ollamaEmbeddingsEndpoint,
  embeddingsModel: ollamaEmbeddingsModel,
},
```

**Update the llamacpp config (lines 454-459):**
```javascript
llamacpp: {
  endpoint: llamacppEndpoint,
  model: llamacppModel,
  timeout: Number.isNaN(llamacppTimeout) ? 120000 : llamacppTimeout,
  apiKey: llamacppApiKey,
  // NEW: Embeddings config
  embeddingsEndpoint: llamacppEmbeddingsEndpoint,
},
```

#### 1.2 Document in .env.example

**File:** `.env.example`

**Add after the Ollama section (after line 28):**
```bash
# Ollama embeddings configuration
# Embedding models for @Codebase semantic search (local, privacy-first)
# Popular models:
#   - nomic-embed-text (768 dim, 137M params, best all-around)
#   - mxbai-embed-large (1024 dim, 335M params, higher quality)
#   - all-minilm (384 dim, 23M params, fastest/smallest)
#
# Pull a model first: ollama pull nomic-embed-text
# OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
# OLLAMA_EMBEDDINGS_ENDPOINT=http://localhost:11434/api/embeddings
```

**Add after the llama.cpp section (after line 162):**
```bash
# llama.cpp embeddings configuration
# Requires an embedding model loaded in the llama.cpp server
# Start with: ./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding
# LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
```

---

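The `??` chains above resolve each endpoint in two steps: an explicit env var wins, otherwise the value is derived from the provider's base endpoint. A standalone sketch of that resolution (helper name is illustrative), including one subtlety worth knowing:

```javascript
// An explicit env var wins; otherwise derive from the provider's base URL.
// Caveat: `??` only falls through on null/undefined, so an env var set to
// an empty string "" will (perhaps surprisingly) override the default.
function resolveEmbeddingsEndpoint(envValue, baseEndpoint, path) {
  return envValue ?? `${baseEndpoint}${path}`;
}

const derived = resolveEmbeddingsEndpoint(
  undefined, "http://localhost:11434", "/api/embeddings");
const explicit = resolveEmbeddingsEndpoint(
  "http://gpu-box:11434/api/embeddings", "http://localhost:11434", "/api/embeddings");
```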
### Phase 2: Provider Detection (20 minutes)

#### 2.1 Create Embedding Provider Router

**File:** `src/api/openai-router.js`

**Replace lines 361-375 with:**
```javascript
// Determine which provider to use for embeddings
// Priority: explicit EMBEDDINGS_PROVIDER > same as chat provider > first available
const embeddingConfig = determineEmbeddingProvider(model);

if (!embeddingConfig) {
  logger.warn("No embedding provider configured");
  return res.status(501).json({
    error: {
      message: "Embeddings not configured. Set up one of: OPENROUTER_API_KEY, OPENAI_API_KEY, OLLAMA_EMBEDDINGS_MODEL, or LLAMACPP_EMBEDDINGS_ENDPOINT",
      type: "not_implemented",
      code: "embeddings_not_configured"
    }
  });
}

// Route to the appropriate provider
try {
  let embeddingResponse;

  switch (embeddingConfig.provider) {
    case "ollama":
      embeddingResponse = await generateOllamaEmbeddings(inputs, embeddingConfig);
      break;

    case "llamacpp":
      embeddingResponse = await generateLlamaCppEmbeddings(inputs, embeddingConfig);
      break;

    case "openrouter":
      embeddingResponse = await generateOpenRouterEmbeddings(inputs, embeddingConfig);
      break;

    case "openai":
      embeddingResponse = await generateOpenAIEmbeddings(inputs, embeddingConfig);
      break;

    default:
      throw new Error(`Unsupported embedding provider: ${embeddingConfig.provider}`);
  }

  logger.info({
    provider: embeddingConfig.provider,
    model: embeddingConfig.model,
    duration: Date.now() - startTime,
    embeddingCount: embeddingResponse.data?.length || 0,
  }, "=== EMBEDDINGS RESPONSE ===");

  res.json(embeddingResponse);

} catch (error) {
  logger.error({
    error: error.message,
    provider: embeddingConfig.provider,
  }, "Embeddings generation failed");

  res.status(500).json({
    error: {
      message: error.message || "Embeddings generation failed",
      type: "server_error",
      code: "embeddings_error"
    }
  });
}
```

#### 2.2 Add Provider Detection Function

**File:** `src/api/openai-router.js`

**Add before the POST /embeddings handler:**
```javascript
/**
 * Determine which provider to use for embeddings.
 * Priority:
 *   1. Explicit EMBEDDINGS_PROVIDER env var
 *   2. Same provider as MODEL_PROVIDER (if it supports embeddings)
 *   3. First available: OpenRouter > OpenAI > Ollama > llama.cpp
 */
function determineEmbeddingProvider(requestedModel = null) {
  const explicitProvider = process.env.EMBEDDINGS_PROVIDER?.trim();

  // Priority 1: Explicit configuration
  if (explicitProvider) {
    switch (explicitProvider) {
      case "ollama":
        if (!config.ollama?.embeddingsModel) {
          logger.warn("EMBEDDINGS_PROVIDER=ollama but OLLAMA_EMBEDDINGS_MODEL not set");
          return null;
        }
        return {
          provider: "ollama",
          model: requestedModel || config.ollama.embeddingsModel,
          endpoint: config.ollama.embeddingsEndpoint
        };

      case "llamacpp":
        if (!config.llamacpp?.embeddingsEndpoint) {
          logger.warn("EMBEDDINGS_PROVIDER=llamacpp but LLAMACPP_EMBEDDINGS_ENDPOINT not set");
          return null;
        }
        return {
          provider: "llamacpp",
          model: requestedModel || "default",
          endpoint: config.llamacpp.embeddingsEndpoint
        };

      case "openrouter":
        if (!config.openrouter?.apiKey) {
          logger.warn("EMBEDDINGS_PROVIDER=openrouter but OPENROUTER_API_KEY not set");
          return null;
        }
        return {
          provider: "openrouter",
          model: requestedModel || config.openrouter.embeddingsModel,
          apiKey: config.openrouter.apiKey,
          endpoint: "https://openrouter.ai/api/v1/embeddings"
        };

      case "openai":
        if (!config.openai?.apiKey) {
          logger.warn("EMBEDDINGS_PROVIDER=openai but OPENAI_API_KEY not set");
          return null;
        }
        return {
          provider: "openai",
          model: requestedModel || "text-embedding-ada-002",
          apiKey: config.openai.apiKey,
          endpoint: "https://api.openai.com/v1/embeddings"
        };
    }
  }

  // Priority 2: Same as chat provider (if supported)
  const chatProvider = config.modelProvider?.type;

  if (chatProvider === "openrouter" && config.openrouter?.apiKey) {
    return {
      provider: "openrouter",
      model: requestedModel || config.openrouter.embeddingsModel,
      apiKey: config.openrouter.apiKey,
      endpoint: "https://openrouter.ai/api/v1/embeddings"
    };
  }

  if (chatProvider === "ollama" && config.ollama?.embeddingsModel) {
    return {
      provider: "ollama",
      model: requestedModel || config.ollama.embeddingsModel,
      endpoint: config.ollama.embeddingsEndpoint
    };
  }

  if (chatProvider === "llamacpp" && config.llamacpp?.embeddingsEndpoint) {
    return {
      provider: "llamacpp",
      model: requestedModel || "default",
      endpoint: config.llamacpp.embeddingsEndpoint
    };
  }

  // Priority 3: First available provider
  if (config.openrouter?.apiKey) {
    return {
      provider: "openrouter",
      model: requestedModel || config.openrouter.embeddingsModel,
      apiKey: config.openrouter.apiKey,
      endpoint: "https://openrouter.ai/api/v1/embeddings"
    };
  }

  if (config.openai?.apiKey) {
    return {
      provider: "openai",
      model: requestedModel || "text-embedding-ada-002",
      apiKey: config.openai.apiKey,
      endpoint: "https://api.openai.com/v1/embeddings"
    };
  }

  if (config.ollama?.embeddingsModel) {
    return {
      provider: "ollama",
      model: requestedModel || config.ollama.embeddingsModel,
      endpoint: config.ollama.embeddingsEndpoint
    };
  }

  if (config.llamacpp?.embeddingsEndpoint) {
    return {
      provider: "llamacpp",
      model: requestedModel || "default",
      endpoint: config.llamacpp.embeddingsEndpoint
    };
  }

  return null; // No provider available
}
```

---

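Stripped of provider details, the selection logic is "explicit choice, else the chat provider if usable, else the first usable provider in a fixed order". A reduced sketch (names illustrative) that can be reasoned about in isolation:

```javascript
// available: Set of providers that are actually configured.
// Mirrors the priority order in determineEmbeddingProvider():
// an explicit choice that isn't configured yields null (fail loudly),
// it does not silently fall through to another provider.
function pickEmbeddingProvider(explicit, chatProvider, available) {
  const order = ["openrouter", "openai", "ollama", "llamacpp"];
  if (explicit) return available.has(explicit) ? explicit : null;
  if (available.has(chatProvider)) return chatProvider;
  return order.find((p) => available.has(p)) ?? null;
}

const avail = new Set(["ollama", "openai"]);
```

The deliberate design choice here: an explicit but misconfigured `EMBEDDINGS_PROVIDER` returns `null` rather than falling back, so the user gets the 501 with a setup hint instead of silently using a different provider.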
### Phase 3: Ollama Implementation (45 minutes)

#### 3.1 Add Ollama Embeddings Function

**File:** `src/api/openai-router.js`

**Add after determineEmbeddingProvider:**
```javascript
/**
 * Generate embeddings using Ollama.
 * Note: Ollama's /api/embeddings only accepts a single prompt, not a batch.
 */
async function generateOllamaEmbeddings(inputs, providerConfig) {
  // Named providerConfig to avoid shadowing the module-level config import
  const { model, endpoint } = providerConfig;

  logger.info({
    model,
    endpoint,
    inputCount: inputs.length
  }, "Generating embeddings with Ollama");

  // Ollama doesn't support batch, so process inputs one by one
  const embeddings = [];

  for (let i = 0; i < inputs.length; i++) {
    const input = inputs[i];

    try {
      const response = await fetch(endpoint, {
        method: "POST",
        headers: {
          "Content-Type": "application/json"
        },
        body: JSON.stringify({
          model: model,
          prompt: input
        })
      });

      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`Ollama embeddings error (${response.status}): ${errorText}`);
      }

      const data = await response.json();

      embeddings.push({
        object: "embedding",
        embedding: data.embedding,
        index: i
      });

    } catch (error) {
      logger.error({
        error: error.message,
        input: input.substring(0, 100),
        index: i
      }, "Failed to generate Ollama embedding");
      throw error;
    }
  }

  // Convert to OpenAI format
  return {
    object: "list",
    data: embeddings,
    model: model,
    usage: {
      prompt_tokens: 0, // Ollama doesn't provide usage statistics
      total_tokens: 0
    }
  };
}
```

---

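The sequential loop keeps the MVP simple. The "batch optimization" listed later under Nice to Have could bound parallelism with a small worker-pool helper along these lines (a sketch, not part of the plan's code):

```javascript
// Map over items with at most `limit` async calls in flight.
// Preserves input order in the results array.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // safe: single-threaded JS, no await between read and increment
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

`generateOllamaEmbeddings` could then replace its `for` loop with `mapWithConcurrency(inputs, 4, ...)`; a limit of 2-4 is a reasonable starting point for a local Ollama server so it isn't flooded with requests.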
### Phase 4: llama.cpp Implementation (20 minutes)

#### 4.1 Add llama.cpp Embeddings Function

**File:** `src/api/openai-router.js`

**Add after generateOllamaEmbeddings:**
```javascript
/**
 * Generate embeddings using llama.cpp.
 * llama.cpp uses an OpenAI-compatible format, so minimal conversion is needed.
 */
async function generateLlamaCppEmbeddings(inputs, providerConfig) {
  const { model, endpoint } = providerConfig;

  logger.info({
    model,
    endpoint,
    inputCount: inputs.length
  }, "Generating embeddings with llama.cpp");

  try {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: {
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        input: inputs, // llama.cpp supports batch input
        encoding_format: "float"
      })
    });

    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`llama.cpp embeddings error (${response.status}): ${errorText}`);
    }

    const data = await response.json();

    // llama.cpp returns an OpenAI-compatible format, but ensure consistency
    return {
      object: "list",
      data: data.data || [],
      model: model || data.model || "default",
      usage: data.usage || {
        prompt_tokens: 0,
        total_tokens: 0
      }
    };

  } catch (error) {
    logger.error({
      error: error.message,
      endpoint
    }, "Failed to generate llama.cpp embeddings");
    throw error;
  }
}
```

---

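One caveat worth noting: llama.cpp server builds have varied over time, and its non-OpenAI `/embedding` route returns a bare `{"embedding": [...]}` object rather than a list. If a user points `LLAMACPP_EMBEDDINGS_ENDPOINT` at that route, a defensive normalizer (an assumption on my part, not in the plan) would keep the handler's output uniform:

```javascript
// Accept either an OpenAI-style list response or a bare
// { embedding: [...] } object, and return the OpenAI list shape.
function normalizeEmbeddingResponse(data, model) {
  if (Array.isArray(data?.data)) return data; // already OpenAI-shaped
  const vectors = Array.isArray(data?.embedding) ? [data.embedding] : [];
  return {
    object: "list",
    data: vectors.map((embedding, index) => ({ object: "embedding", embedding, index })),
    model,
    usage: { prompt_tokens: 0, total_tokens: 0 },
  };
}

const bare = normalizeEmbeddingResponse({ embedding: [0.1, 0.2] }, "nomic");
const passthrough = normalizeEmbeddingResponse({ object: "list", data: [{ index: 0 }] }, "nomic");
```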
### Phase 5: Refactor Existing Providers (15 minutes)

#### 5.1 Extract OpenRouter and OpenAI Functions

**File:** `src/api/openai-router.js`

**Add after generateLlamaCppEmbeddings:**
```javascript
/**
 * Generate embeddings using OpenRouter.
 */
async function generateOpenRouterEmbeddings(inputs, providerConfig) {
  const { model, apiKey, endpoint } = providerConfig;

  logger.info({
    model,
    inputCount: inputs.length
  }, "Generating embeddings with OpenRouter");

  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
      "HTTP-Referer": "https://github.com/vishalveerareddy123/Lynkr",
      "X-Title": "Lynkr"
    },
    body: JSON.stringify({
      model: model,
      input: inputs,
      encoding_format: "float"
    })
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenRouter embeddings error (${response.status}): ${errorText}`);
  }

  return await response.json();
}

/**
 * Generate embeddings using OpenAI.
 */
async function generateOpenAIEmbeddings(inputs, providerConfig) {
  const { model, apiKey, endpoint } = providerConfig;

  logger.info({
    model,
    inputCount: inputs.length
  }, "Generating embeddings with OpenAI");

  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model: model,
      input: inputs,
      encoding_format: "float"
    })
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenAI embeddings error (${response.status}): ${errorText}`);
  }

  return await response.json();
}
```

---

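The two cloud functions differ only in their headers; if the duplication grates, they could both delegate to one shared helper (a refactor sketch with illustrative names, not required by the plan):

```javascript
// Build headers for an OpenAI-compatible embeddings POST; provider-specific
// extras (e.g. OpenRouter's HTTP-Referer and X-Title) are merged in.
function embeddingsHeaders(apiKey, extra = {}) {
  return {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    ...extra,
  };
}

// Shared request helper both cloud providers could delegate to.
async function postEmbeddings(endpoint, headers, model, inputs) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers,
    body: JSON.stringify({ model, input: inputs, encoding_format: "float" }),
  });
  if (!response.ok) {
    throw new Error(`Embeddings error (${response.status}): ${await response.text()}`);
  }
  return response.json();
}

const orHeaders = embeddingsHeaders("sk-test", { "X-Title": "Lynkr" });
```

Keeping two thin named wrappers around the helper preserves the provider-specific error messages and log lines, so little debuggability is lost.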
### Phase 6: Update Documentation (20 minutes)

#### 6.1 Update .env.example

**Already covered in Phase 1.2.**

#### 6.2 Update README.md Cursor Section

**File:** `README.md`

**Update the embeddings section (around line 1870):**
````markdown
### Enabling @Codebase Semantic Search (Optional)

For Cursor's @Codebase semantic search, you need embeddings support.

**⚡ Already using OpenRouter? You're all set!**

If you configured `MODEL_PROVIDER=openrouter`, embeddings **work automatically** with the same `OPENROUTER_API_KEY`; no additional setup is needed. OpenRouter handles both chat completions AND embeddings with one key.

**🔧 Using a different provider? Choose your embeddings source:**

You have 4 options, listed from most private to least:

**Option A: Ollama (100% Local, FREE)**
```bash
# Install Ollama and pull an embedding model
ollama pull nomic-embed-text

# Add to .env
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# That's it! Works with any chat provider (Databricks, Bedrock, etc.)
# Cost: FREE, Privacy: 100% local, Quality: Good
```

**Option B: llama.cpp (100% Local, FREE)**
```bash
# Download an embedding model (GGUF format)
# e.g., nomic-embed-text-v1.5.Q4_K_M.gguf

# Start llama.cpp with embedding support
./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8080 --embedding

# Add to .env
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings

# Cost: FREE, Privacy: 100% local, Quality: Good
```

**Option C: OpenRouter (Cloud, Cheapest)**
```bash
# Add to .env (if not already there)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# Optional: Specify an embedding model
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small

# Cost: $0.02 per 1M tokens (~$0.01-0.10/month), Privacy: Cloud, Quality: Excellent
```

**Option D: OpenAI Direct (Cloud)**
```bash
# Add to .env
OPENAI_API_KEY=sk-your-key-here

# Cost: $0.10 per 1M tokens, Privacy: Cloud, Quality: Excellent
```

**Restart Lynkr**, and @Codebase will work!
````

#### 6.3 Add Embedding Models Guide

**File:** `README.md`

**Add a new section after the Cursor setup:**
````markdown
### Embedding Models Guide

Different embedding models have different characteristics:

| Model | Provider | Size | Dimensions | Quality | Speed | Privacy |
|-------|----------|------|------------|---------|-------|---------|
| **nomic-embed-text** | Ollama/llama.cpp | 137M | 768 | ⭐⭐⭐⭐ | Fast | 100% local |
| **mxbai-embed-large** | Ollama | 335M | 1024 | ⭐⭐⭐⭐⭐ | Medium | 100% local |
| **all-minilm** | Ollama | 23M | 384 | ⭐⭐⭐ | Fastest | 100% local |
| **text-embedding-3-small** | OpenRouter/OpenAI | - | 1536 | ⭐⭐⭐⭐⭐ | Fast | Cloud |
| **text-embedding-3-large** | OpenRouter/OpenAI | - | 3072 | ⭐⭐⭐⭐⭐ | Medium | Cloud |
| **text-embedding-ada-002** | OpenRouter/OpenAI | - | 1536 | ⭐⭐⭐⭐ | Fast | Cloud |
| **voyage-code-2** | OpenRouter | - | 1536 | ⭐⭐⭐⭐⭐ | Medium | Cloud |

**Recommendations:**
- **Best for privacy**: `nomic-embed-text` (Ollama, 100% local, free)
- **Best for quality**: `text-embedding-3-large` (OpenRouter, $0.13/1M tokens)
- **Best for speed**: `all-minilm` (Ollama, 100% local, free, fastest)
- **Best balance**: `text-embedding-3-small` (OpenRouter, $0.02/1M tokens)
- **Best for code**: `voyage-code-2` (OpenRouter, $0.12/1M tokens)

**Setup examples:**

```bash
# Privacy-first: 100% local
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:latest
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# Quality-first: Best models
MODEL_PROVIDER=openrouter
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-large

# Cost-optimized: Cheapest cloud
MODEL_PROVIDER=openrouter
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small

# Hybrid: Local chat, cloud embeddings
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:latest
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_EMBEDDINGS_MODEL=openai/text-embedding-3-small
```
````

---

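Whichever model a user picks, a quick sanity check is to embed two related strings and one unrelated string and compare cosine similarities; related pairs should score noticeably higher. The comparison itself is a few lines (the vectors below are toy values, not real embeddings):

```javascript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for real-valued embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const identical = cosineSimilarity([1, 2, 3], [1, 2, 3]);  // ≈ 1
const orthogonal = cosineSimilarity([1, 0], [0, 1]);       // 0
```

Note that vectors from different models (or different dimensions) are never comparable with each other, which is why switching the embedding model requires re-indexing the codebase.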
### Phase 7: Testing (30 minutes)

#### 7.1 Update Test File

**File:** `test/cursor-integration.test.js`

**Add a new test suite at the end (before the closing `})`):**
```javascript
describe("Local Embeddings Support", () => {
  it("should detect Ollama as embedding provider", () => {
    process.env.MODEL_PROVIDER = "ollama";
    process.env.OLLAMA_EMBEDDINGS_MODEL = "nomic-embed-text";
    delete require.cache[require.resolve("../src/config")];
    const config = require("../src/config");

    assert.strictEqual(config.ollama.embeddingsModel, "nomic-embed-text");
  });

  it("should detect llama.cpp as embedding provider", () => {
    process.env.MODEL_PROVIDER = "llamacpp";
    process.env.LLAMACPP_EMBEDDINGS_ENDPOINT = "http://localhost:8080/embeddings";
    delete require.cache[require.resolve("../src/config")];
    const config = require("../src/config");

    assert.strictEqual(config.llamacpp.embeddingsEndpoint, "http://localhost:8080/embeddings");
  });

  it("should prioritize explicit EMBEDDINGS_PROVIDER", () => {
    process.env.MODEL_PROVIDER = "databricks";
    process.env.EMBEDDINGS_PROVIDER = "ollama";
    process.env.OLLAMA_EMBEDDINGS_MODEL = "nomic-embed-text";

    delete require.cache[require.resolve("../src/config")];
    delete require.cache[require.resolve("../src/api/openai-router")];

    const config = require("../src/config");
    assert.strictEqual(config.modelProvider.type, "databricks");
    assert.strictEqual(config.ollama.embeddingsModel, "nomic-embed-text");
  });

  it("should default to the same provider if it supports embeddings", () => {
    process.env.MODEL_PROVIDER = "openrouter";
    process.env.OPENROUTER_API_KEY = "sk-test";
    process.env.OPENROUTER_EMBEDDINGS_MODEL = "text-embedding-3-small";
    delete process.env.EMBEDDINGS_PROVIDER;

    delete require.cache[require.resolve("../src/config")];
    const config = require("../src/config");

    assert.strictEqual(config.modelProvider.type, "openrouter");
    assert.strictEqual(config.openrouter.embeddingsModel, "text-embedding-3-small");
  });
});
```

#### 7.2 Manual Testing Checklist

**Test Ollama embeddings:**
```bash
# 1. Pull the embedding model
ollama pull nomic-embed-text

# 2. Configure .env
MODEL_PROVIDER=ollama
OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text

# 3. Start Lynkr
lynkr start

# 4. Test the embeddings endpoint
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "test embedding"}'

# Expected: Returns an embedding vector
```

**Test llama.cpp embeddings:**
```bash
# 1. Download and start llama.cpp on a port that doesn't clash with Lynkr's 8080
./llama-server -m nomic-embed-text-v1.5.Q4_K_M.gguf --port 8081 --embedding

# 2. Configure .env
MODEL_PROVIDER=databricks
LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8081/embeddings

# 3. Test embeddings (against Lynkr)
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "test embedding"}'
```

**Test in Cursor:**
```bash
# 1. Configure Cursor to point to Lynkr
#    Settings → Models → OpenAI API
#    Base URL: http://localhost:8080/v1

# 2. Try @Codebase search
#    In Cursor chat: "@Codebase where is authentication handled?"

# Expected: Semantic search results
```

---

## File Changes Summary

| File | Changes | Lines | Complexity |
|------|---------|-------|------------|
| `src/config/index.js` | Add embeddings config | +10 | Easy |
| `src/api/openai-router.js` | Add provider detection + implementations | +400 | Medium |
| `.env.example` | Document embedding models | +30 | Easy |
| `README.md` | Update Cursor section | +100 | Easy |
| `test/cursor-integration.test.js` | Add local embeddings tests | +50 | Easy |

**Total:** ~590 lines of code

---

## Complexity & Effort Estimates

| Phase | Description | Complexity | Effort | Value |
|-------|-------------|------------|--------|-------|
| 1 | Configuration | 2/10 | 30 min | Required |
| 2 | Provider Detection | 4/10 | 20 min | Core |
| 3 | Ollama Implementation | 6/10 | 45 min | High |
| 4 | llama.cpp Implementation | 3/10 | 20 min | High |
| 5 | Refactor Existing | 2/10 | 15 min | Cleanup |
| 6 | Documentation | 2/10 | 20 min | UX |
| 7 | Testing | 3/10 | 30 min | Quality |

**Total Effort:** ~3 hours
**Overall Complexity:** 4/10 (medium)
**Value:** HIGH (enables a 100% local Cursor setup)

---

## Priority Ranking

### Must Have (Core Functionality)
1. ✅ **Ollama support** - Most requested for privacy
2. ✅ **llama.cpp support** - GGUF model compatibility
3. ✅ **Provider detection** - Smart routing
4. ✅ **Configuration** - User control

### Should Have (Good UX)
5. ✅ **Documentation** - User guidance
6. ✅ **Error handling** - Helpful messages
7. ✅ **Testing** - Quality assurance

### Nice to Have (Future)
8. ⚠️ **Caching** - Cache embeddings for repeated queries
9. ⚠️ **Batch optimization** - Parallel Ollama requests
10. ⚠️ **Auto-detection** - Detect installed embedding models

---

## Benefits

### For Privacy-Conscious Users
- ✅ **100% local setup**: Ollama chat + Ollama embeddings
- ✅ **No cloud dependencies**: Works offline
- ✅ **No API costs**: Free forever
- ✅ **No data leakage**: Code never leaves the machine

### For Cost-Conscious Users
- ✅ **Local embeddings are free**: Ollama/llama.cpp
- ✅ **Cloud fallback is cheap**: OpenRouter $0.02/1M tokens
- ✅ **Hybrid setup**: Local chat, cheap cloud embeddings

### For All Users
- ✅ **More choice**: 4 embedding providers
- ✅ **Better control**: Explicit provider selection
- ✅ **Consistent UX**: Works with any chat provider

---

## Success Criteria

### Functional Requirements
- [ ] Ollama embeddings work with single and multiple inputs
- [ ] llama.cpp embeddings work with batch inputs
- [ ] Provider detection prioritizes correctly
- [ ] Cursor @Codebase search works with local embeddings
- [ ] Error messages are helpful

### Quality Requirements
- [ ] All tests pass (18 existing + 4 new = 22 total)
- [ ] No breaking changes to existing OpenRouter/OpenAI support
- [ ] Documentation is clear and complete
- [ ] Performance is acceptable (< 500 ms per embedding)

### User Experience
- [ ] Setup takes < 5 minutes
- [ ] Works out of the box with Ollama
- [ ] Clear error messages when misconfigured
- [ ] README examples are copy-pasteable

---

## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| **Ollama doesn't support batch** | Medium | Loop through inputs sequentially |
| **llama.cpp format changes** | Low | Use OpenAI-compatible mode |
| **Embedding models not installed** | High | Clear setup instructions + error messages |
| **Performance is slow** | Medium | Document model sizes, recommend lightweight models |
| **Breaking existing OpenRouter** | High | Thorough testing, keep backward compatible |

---

## Next Steps

1. **Review this plan** - Get approval
2. **Start with Phase 1** - Configuration (30 min)
3. **Implement Phase 2** - Provider detection (20 min)
4. **Implement Phase 3** - Ollama (45 min)
5. **Implement Phase 4** - llama.cpp (20 min)
6. **Complete the remaining phases** - Refactor, docs, tests (65 min)
7. **Test end-to-end** - Cursor @Codebase with local embeddings
8. **Update the changelog** - Document the new feature

**Total time:** ~3 hours of focused work

---

## Questions to Answer Before Implementation

1. **Should EMBEDDINGS_PROVIDER be required or auto-detected?**
   - Recommendation: Auto-detect, allow override

2. **Should we cache embeddings to avoid repeated API calls?**
   - Recommendation: Not in the MVP; add later if needed

3. **Should we support multiple embedding models simultaneously?**
   - Recommendation: No, one at a time

4. **Should we validate that the embedding model is actually installed and running?**
   - Recommendation: Yes, in error messages

5. **Should we add embedding model benchmarks to the docs?**
   - Recommendation: Yes, a quality/speed/size comparison

---

## Alternative Approaches Considered

### Approach A: Unified Embeddings Service
**Pros:** Single endpoint handles all providers
**Cons:** More complex, harder to debug
**Decision:** ❌ Rejected - Too complex for the MVP

### Approach B: Separate /v1/embeddings/* Routes
**Pros:** Clear separation, easy to extend
**Cons:** Non-standard, breaks OpenAI compatibility
**Decision:** ❌ Rejected - Breaks Cursor compatibility

### Approach C: Provider-Specific Routing (Chosen)
**Pros:** Clean, extensible, backward compatible
**Cons:** Slightly more code
**Decision:** ✅ Chosen - Best balance

---

## Post-Implementation TODO

- [ ] Add embeddings performance metrics to monitoring
- [ ] Add embedding model benchmarks to the docs
- [ ] Consider adding an embeddings caching layer
- [ ] Add support for custom embedding endpoints
- [ ] Add embedding model auto-download for Ollama
- [ ] Add embedding dimension validation
- [ ] Add support for a normalized-embeddings option
- [ ] Consider batch optimization for Ollama (parallel requests)

---

**Ready to implement? Let me know and I'll start with Phase 1!**