codebaxing 0.2.44 → 0.2.48

package/README.md CHANGED
@@ -88,19 +88,40 @@ After installing, AI agents can use these tools:
 
  ## Configuration
 
+ ### Cloud Embedding (Fastest)
+
+ Use OpenAI or Voyage for ~25x faster indexing:
+
+ ```bash
+ # OpenAI (text-embedding-3-small, 384 dims)
+ CODEBAXING_EMBEDDING_PROVIDER=openai OPENAI_API_KEY=sk-... npx codebaxing@latest index /path
+
+ # Voyage (voyage-code-3, 1024 dims, code-optimized)
+ CODEBAXING_EMBEDDING_PROVIDER=voyage VOYAGE_API_KEY=va-... npx codebaxing@latest index /path
+ ```
+
+ > **Note:** Switching between local and cloud providers requires full re-index due to dimension differences.
+
  ### Environment Variables
 
  | Variable | Description | Default |
  |----------|-------------|---------|
  | `CHROMADB_URL` | ChromaDB server URL | `http://localhost:8000` |
- | `CODEBAXING_DEVICE` | Compute: `cpu`, `cuda` | `cpu` |
- | `CODEBAXING_WORKERS` | Worker threads for parallel embedding (0=off, 1-8) | `2` |
+ | `CODEBAXING_EMBEDDING_PROVIDER` | Embedding backend: `local`, `openai`, `voyage` | `local` |
+ | `CODEBAXING_DEVICE` | Compute device (local only): `cpu`, `cuda` | `cpu` |
+ | `CODEBAXING_DTYPE` | Model quantization (local only): `fp32`, `fp16`, `q8`, `q4` | `q8` |
+ | `CODEBAXING_WORKERS` | Worker threads for parallel embedding (local only, 0=off) | `2` |
  | `CODEBAXING_MAX_FILE_SIZE` | Max file size in MB | `1` |
- | `CODEBAXING_MAX_CHUNKS` | Max chunks to index | `100000` |
+ | `CODEBAXING_MAX_CHUNKS` | Max chunks to index | `500000` |
  | `CODEBAXING_FILES_PER_BATCH` | Files per batch (lower = less RAM) | `100` |
  | `CODEBAXING_PARALLEL_BATCHES` | Concurrent batches | `3` |
  | `CODEBAXING_METADATA_SAVE_INTERVAL` | Save progress every N batches | `10` |
- | `CODEBAXING_MODEL_CACHE` | Directory for embedding model cache | `~/.cache/codebaxing/models` |
+ | `CODEBAXING_MODEL_CACHE` | Model cache directory (local only) | `~/.cache/codebaxing/models` |
+ | `CODEBAXING_OPENAI_API_KEY` | OpenAI API key (or use `OPENAI_API_KEY`) | - |
+ | `CODEBAXING_VOYAGE_API_KEY` | Voyage API key (or use `VOYAGE_API_KEY`) | - |
+ | `CODEBAXING_EMBEDDING_MODEL` | Override embedding model name | per-provider default |
+ | `CODEBAXING_EMBEDDING_DIMENSIONS` | Override embedding dimensions | per-provider default |
+ | `CODEBAXING_EMBEDDING_BASE_URL` | Custom API endpoint for cloud providers | provider default |
 
  ### Manual Editor Config
 
@@ -163,13 +184,16 @@ Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, C#, Ruby, PHP, Kotlin, Sw
 
  | Component | Technology |
  |-----------|------------|
- | Embedding Model | `all-MiniLM-L6-v2` (384 dimensions, ONNX) |
- | Model Cache | `~/.cache/codebaxing/models/` (~90MB, downloaded once) |
+ | Local Embedding | `all-MiniLM-L6-v2` (384 dims, ONNX, q8 quantized) |
+ | Cloud Embedding | OpenAI `text-embedding-3-small` or Voyage `voyage-code-3` |
+ | Model Cache | `~/.cache/codebaxing/models/` (local only, downloaded once) |
  | Vector Database | ChromaDB |
- | Code Parser | Tree-sitter |
+ | Code Parser | Tree-sitter (28 languages) |
  | MCP SDK | `@modelcontextprotocol/sdk` |
 
- The embedding model is downloaded from HuggingFace on first run and cached locally at `~/.cache/codebaxing/models/`. Subsequent runs reuse the cached model without network access. To use a custom cache location, set `CODEBAXING_MODEL_CACHE`.
+ **Local mode**: The embedding model is downloaded from HuggingFace on first run and cached at `~/.cache/codebaxing/models/`. Uses q8 quantization (~3x faster than fp32). No network access after initial download.
+
+ **Cloud mode**: Sends code chunks to OpenAI/Voyage API for embedding. ~25x faster than local CPU. Requires API key.
 
  ## License
 
package/README.vi.md CHANGED
@@ -88,18 +88,38 @@ After installing, AI agents can use these tools:
 
  ## Configuration
 
+ ### Cloud Embedding (Fastest)
+
+ Use OpenAI or Voyage to index ~25x faster:
+
+ ```bash
+ # OpenAI (text-embedding-3-small, 384 dims)
+ CODEBAXING_EMBEDDING_PROVIDER=openai OPENAI_API_KEY=sk-... npx codebaxing@latest index /path
+
+ # Voyage (voyage-code-3, 1024 dims, optimized for code)
+ CODEBAXING_EMBEDDING_PROVIDER=voyage VOYAGE_API_KEY=va-... npx codebaxing@latest index /path
+ ```
+
+ > **Note:** Switching between local and cloud requires a re-index because the dimensions differ.
+
  ### Environment Variables
 
  | Variable | Description | Default |
  |------|-------|----------|
  | `CHROMADB_URL` | ChromaDB server URL | `http://localhost:8000` |
- | `CODEBAXING_DEVICE` | Compute: `cpu`, `cuda` | `cpu` |
- | `CODEBAXING_WORKERS` | Worker threads for parallel embedding (0=off, 1-8) | `2` |
+ | `CODEBAXING_EMBEDDING_PROVIDER` | Embedding backend: `local`, `openai`, `voyage` | `local` |
+ | `CODEBAXING_DEVICE` | Compute device (local only): `cpu`, `cuda` | `cpu` |
+ | `CODEBAXING_DTYPE` | Quantization (local only): `fp32`, `fp16`, `q8`, `q4` | `q8` |
+ | `CODEBAXING_WORKERS` | Worker threads for embedding (local only, 0=off) | `2` |
  | `CODEBAXING_MAX_FILE_SIZE` | Max file size (MB) | `1` |
- | `CODEBAXING_MAX_CHUNKS` | Max number of chunks | `100000` |
+ | `CODEBAXING_MAX_CHUNKS` | Max number of chunks | `500000` |
  | `CODEBAXING_FILES_PER_BATCH` | Files per batch (lower = less RAM) | `100` |
  | `CODEBAXING_PARALLEL_BATCHES` | Number of batches run in parallel | `3` |
  | `CODEBAXING_METADATA_SAVE_INTERVAL` | Save progress every N batches | `10` |
+ | `CODEBAXING_OPENAI_API_KEY` | OpenAI API key (or use `OPENAI_API_KEY`) | - |
+ | `CODEBAXING_VOYAGE_API_KEY` | Voyage API key (or use `VOYAGE_API_KEY`) | - |
+ | `CODEBAXING_EMBEDDING_MODEL` | Override embedding model name | per-provider default |
+ | `CODEBAXING_EMBEDDING_DIMENSIONS` | Override number of dimensions | per-provider default |
 
  ### Manual Editor Config
 
@@ -162,11 +182,16 @@ Python, JavaScript, TypeScript, Go, Rust, Java, C/C++, C#, Ruby, PHP, Kotlin, Sw
 
  | Component | Technology |
  |-----------|-----------|
- | Embedding Model | `all-MiniLM-L6-v2` (384 dimensions) |
+ | Local Embedding | `all-MiniLM-L6-v2` (384 dims, ONNX, q8 quantized) |
+ | Cloud Embedding | OpenAI `text-embedding-3-small` or Voyage `voyage-code-3` |
  | Vector Database | ChromaDB |
- | Code Parser | Tree-sitter |
+ | Code Parser | Tree-sitter (28 languages) |
  | MCP SDK | `@modelcontextprotocol/sdk` |
 
+ **Local mode**: The model is downloaded from HuggingFace on first run and cached at `~/.cache/codebaxing/models/`. Uses q8 quantization (~3x faster than fp32). No network needed after the first run.
+
+ **Cloud mode**: Sends code chunks to the OpenAI/Voyage API. ~25x faster than local CPU. Requires an API key.
+
  ## License
 
  MIT
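
The README note ties the re-index requirement to dimension differences, yet the defaults listed in this release give OpenAI `text-embedding-3-small` 384 dims, the same as the local `all-MiniLM-L6-v2`. Vectors produced by different models are generally not comparable even when their lengths match, so a guard would probably want to compare the model as well. The sketch below is one way to express that check; `EmbeddingFingerprint` and `reindexReasons` are hypothetical names and are not part of the package.

```typescript
// Hypothetical record of how an existing index was embedded (not a codebaxing type).
interface EmbeddingFingerprint {
  provider: "local" | "openai" | "voyage";
  model: string;
  dimensions: number;
}

// Returns the reasons an existing index cannot be reused, or [] if it can.
function reindexReasons(
  stored: EmbeddingFingerprint,
  current: EmbeddingFingerprint,
): string[] {
  const reasons: string[] = [];
  if (stored.dimensions !== current.dimensions) {
    reasons.push(`dimensions changed: ${stored.dimensions} -> ${current.dimensions}`);
  }
  if (stored.provider !== current.provider || stored.model !== current.model) {
    reasons.push(`model changed: ${stored.provider}/${stored.model} -> ${current.provider}/${current.model}`);
  }
  return reasons;
}

// Default models and dimensions as listed in the README diff.
const local: EmbeddingFingerprint = { provider: "local", model: "all-MiniLM-L6-v2", dimensions: 384 };
const openai: EmbeddingFingerprint = { provider: "openai", model: "text-embedding-3-small", dimensions: 384 };
const voyage: EmbeddingFingerprint = { provider: "voyage", model: "voyage-code-3", dimensions: 1024 };

console.log(reindexReasons(local, openai)); // same length, different model
console.log(reindexReasons(local, voyage)); // different length and model
```

With these defaults, a dimension-only check would let a local index be queried with OpenAI vectors, which is why the model comparison is included.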
@@ -10,4 +10,12 @@ export interface IParser {
      /** Return list of file extensions this parser supports. */
      getSupportedExtensions(): string[];
  }
+ export interface IEmbeddingService {
+     readonly dimensions: number;
+     embed(text: string, isQuery?: boolean): Promise<number[]>;
+     embedBatch(texts: string[], isQuery?: boolean): Promise<number[][]>;
+     getStats(): Record<string, unknown>;
+     clearCache(): void;
+     unload(): Promise<void>;
+ }
  //# sourceMappingURL=interfaces.d.ts.map
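
Because `IEmbeddingService` is now an exported interface, code that depends on it could be exercised with a stand-in that needs neither a model download nor an API key. The following is a minimal sketch of such a test double: the interface is copied from the declaration in this diff, while `hashEmbed` and `FakeEmbeddingService` are hypothetical names and the hashing scheme is only a deterministic placeholder, not a meaningful embedding.

```typescript
// Interface as declared in the diff.
interface IEmbeddingService {
  readonly dimensions: number;
  embed(text: string, isQuery?: boolean): Promise<number[]>;
  embedBatch(texts: string[], isQuery?: boolean): Promise<number[][]>;
  getStats(): Record<string, unknown>;
  clearCache(): void;
  unload(): Promise<void>;
}

// Deterministic placeholder: bucket character codes, then L2-normalize.
function hashEmbed(text: string, dimensions: number): number[] {
  const vec = new Array<number>(dimensions).fill(0);
  for (let i = 0; i < text.length; i++) {
    const idx = (text.charCodeAt(i) + i) % dimensions;
    vec[idx] = (vec[idx] ?? 0) + 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map(v => v / norm);
}

// Hypothetical in-memory implementation for tests.
class FakeEmbeddingService implements IEmbeddingService {
  private total = 0;

  constructor(readonly dimensions: number = 8) {}

  async embed(text: string, _isQuery = false): Promise<number[]> {
    const [first] = await this.embedBatch([text]);
    return first ?? [];
  }

  async embedBatch(texts: string[], _isQuery = false): Promise<number[][]> {
    this.total += texts.length;
    return texts.map(t => hashEmbed(t, this.dimensions));
  }

  getStats(): Record<string, unknown> {
    return { totalEmbeddings: this.total };
  }

  clearCache(): void {
    // Nothing is cached in this stand-in.
  }

  unload(): Promise<void> {
    return Promise.resolve();
  }
}
```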
@@ -1 +1 @@
- {"version":3,"file":"interfaces.d.ts","sourceRoot":"","sources":["../../src/core/interfaces.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,KAAK,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAE9C,MAAM,WAAW,OAAO;IACtB,0DAA0D;IAC1D,QAAQ,CAAC,QAAQ,EAAE,MAAM,GAAG,OAAO,CAAC;IAEpC,mDAAmD;IACnD,SAAS,CAAC,QAAQ,EAAE,MAAM,GAAG,UAAU,CAAC;IAExC,2DAA2D;IAC3D,sBAAsB,IAAI,MAAM,EAAE,CAAC;CACpC"}
+ {"version":3,"file":"interfaces.d.ts","sourceRoot":"","sources":["../../src/core/interfaces.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,KAAK,EAAE,UAAU,EAAE,MAAM,aAAa,CAAC;AAE9C,MAAM,WAAW,OAAO;IACtB,0DAA0D;IAC1D,QAAQ,CAAC,QAAQ,EAAE,MAAM,GAAG,OAAO,CAAC;IAEpC,mDAAmD;IACnD,SAAS,CAAC,QAAQ,EAAE,MAAM,GAAG,UAAU,CAAC;IAExC,2DAA2D;IAC3D,sBAAsB,IAAI,MAAM,EAAE,CAAC;CACpC;AAED,MAAM,WAAW,iBAAiB;IAChC,QAAQ,CAAC,UAAU,EAAE,MAAM,CAAC;IAC5B,KAAK,CAAC,IAAI,EAAE,MAAM,EAAE,OAAO,CAAC,EAAE,OAAO,GAAG,OAAO,CAAC,MAAM,EAAE,CAAC,CAAC;IAC1D,UAAU,CAAC,KAAK,EAAE,MAAM,EAAE,EAAE,OAAO,CAAC,EAAE,OAAO,GAAG,OAAO,CAAC,MAAM,EAAE,EAAE,CAAC,CAAC;IACpE,QAAQ,IAAI,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAC;IACpC,UAAU,IAAI,IAAI,CAAC;IACnB,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC,CAAC;CACzB"}
package/dist/index.d.ts CHANGED
@@ -7,7 +7,7 @@
  export declare const VERSION = "0.1.0";
  export { Symbol, SymbolType, ParsedFile, CodebaseIndex } from './core/models.js';
  export { Memory, MemoryType } from './core/models.js';
- export type { IParser } from './core/interfaces.js';
+ export type { IParser, IEmbeddingService } from './core/interfaces.js';
  export { CodebaxingError, ParseError, IndexingError, EmbeddingError, SearchError, ConfigurationError, } from './core/exceptions.js';
  export { TreeSitterParser } from './parsers/treesitter-parser.js';
  export { getLanguageForFile, getSupportedExtensions, EXTENSION_MAP } from './parsers/language-configs.js';
@@ -15,4 +15,6 @@ export { SourceRetriever, discoverFiles, loadIgnoreConfig, ensureIgnoreConfig }
  export type { IgnoreConfig } from './indexing/source-retriever.js';
  export { MemoryRetriever } from './indexing/memory-retriever.js';
  export { EmbeddingService, getEmbeddingService } from './indexing/embedding-service.js';
+ export { CloudEmbeddingService } from './indexing/cloud-embedding-service.js';
+ export { createEmbeddingService, getConfiguredProvider } from './indexing/embedding-factory.js';
  //# sourceMappingURL=index.d.ts.map
@@ -1 +1 @@
- {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,eAAO,MAAM,OAAO,UAAU,CAAC;AAG/B,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,UAAU,EAAE,aAAa,EAAE,MAAM,kBAAkB,CAAC;AACjF,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,MAAM,kBAAkB,CAAC;AACtD,YAAY,EAAE,OAAO,EAAE,MAAM,sBAAsB,CAAC;AAGpD,OAAO,EACL,eAAe,EACf,UAAU,EACV,aAAa,EACb,cAAc,EACd,WAAW,EACX,kBAAkB,GACnB,MAAM,sBAAsB,CAAC;AAG9B,OAAO,EAAE,gBAAgB,EAAE,MAAM,gCAAgC,CAAC;AAClE,OAAO,EAAE,kBAAkB,EAAE,sBAAsB,EAAE,aAAa,EAAE,MAAM,+BAA+B,CAAC;AAG1G,OAAO,EAAE,eAAe,EAAE,aAAa,EAAE,gBAAgB,EAAE,kBAAkB,EAAE,MAAM,gCAAgC,CAAC;AACtH,YAAY,EAAE,YAAY,EAAE,MAAM,gCAAgC,CAAC;AACnE,OAAO,EAAE,eAAe,EAAE,MAAM,gCAAgC,CAAC;AACjE,OAAO,EAAE,gBAAgB,EAAE,mBAAmB,EAAE,MAAM,iCAAiC,CAAC"}
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,eAAO,MAAM,OAAO,UAAU,CAAC;AAG/B,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,UAAU,EAAE,aAAa,EAAE,MAAM,kBAAkB,CAAC;AACjF,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,MAAM,kBAAkB,CAAC;AACtD,YAAY,EAAE,OAAO,EAAE,iBAAiB,EAAE,MAAM,sBAAsB,CAAC;AAGvE,OAAO,EACL,eAAe,EACf,UAAU,EACV,aAAa,EACb,cAAc,EACd,WAAW,EACX,kBAAkB,GACnB,MAAM,sBAAsB,CAAC;AAG9B,OAAO,EAAE,gBAAgB,EAAE,MAAM,gCAAgC,CAAC;AAClE,OAAO,EAAE,kBAAkB,EAAE,sBAAsB,EAAE,aAAa,EAAE,MAAM,+BAA+B,CAAC;AAG1G,OAAO,EAAE,eAAe,EAAE,aAAa,EAAE,gBAAgB,EAAE,kBAAkB,EAAE,MAAM,gCAAgC,CAAC;AACtH,YAAY,EAAE,YAAY,EAAE,MAAM,gCAAgC,CAAC;AACnE,OAAO,EAAE,eAAe,EAAE,MAAM,gCAAgC,CAAC;AACjE,OAAO,EAAE,gBAAgB,EAAE,mBAAmB,EAAE,MAAM,iCAAiC,CAAC;AACxF,OAAO,EAAE,qBAAqB,EAAE,MAAM,uCAAuC,CAAC;AAC9E,OAAO,EAAE,sBAAsB,EAAE,qBAAqB,EAAE,MAAM,iCAAiC,CAAC"}
package/dist/index.js CHANGED
@@ -17,4 +17,6 @@ export { getLanguageForFile, getSupportedExtensions, EXTENSION_MAP } from './par
  export { SourceRetriever, discoverFiles, loadIgnoreConfig, ensureIgnoreConfig } from './indexing/source-retriever.js';
  export { MemoryRetriever } from './indexing/memory-retriever.js';
  export { EmbeddingService, getEmbeddingService } from './indexing/embedding-service.js';
+ export { CloudEmbeddingService } from './indexing/cloud-embedding-service.js';
+ export { createEmbeddingService, getConfiguredProvider } from './indexing/embedding-factory.js';
  //# sourceMappingURL=index.js.map
package/dist/index.js.map CHANGED
@@ -1 +1 @@
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,MAAM,CAAC,MAAM,OAAO,GAAG,OAAO,CAAC;AAE/B,cAAc;AACd,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,UAAU,EAAE,aAAa,EAAE,MAAM,kBAAkB,CAAC;AACjF,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,MAAM,kBAAkB,CAAC;AAGtD,aAAa;AACb,OAAO,EACL,eAAe,EACf,UAAU,EACV,aAAa,EACb,cAAc,EACd,WAAW,EACX,kBAAkB,GACnB,MAAM,sBAAsB,CAAC;AAE9B,UAAU;AACV,OAAO,EAAE,gBAAgB,EAAE,MAAM,gCAAgC,CAAC;AAClE,OAAO,EAAE,kBAAkB,EAAE,sBAAsB,EAAE,aAAa,EAAE,MAAM,+BAA+B,CAAC;AAE1G,WAAW;AACX,OAAO,EAAE,eAAe,EAAE,aAAa,EAAE,gBAAgB,EAAE,kBAAkB,EAAE,MAAM,gCAAgC,CAAC;AAEtH,OAAO,EAAE,eAAe,EAAE,MAAM,gCAAgC,CAAC;AACjE,OAAO,EAAE,gBAAgB,EAAE,mBAAmB,EAAE,MAAM,iCAAiC,CAAC"}
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../src/index.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,MAAM,CAAC,MAAM,OAAO,GAAG,OAAO,CAAC;AAE/B,cAAc;AACd,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,UAAU,EAAE,aAAa,EAAE,MAAM,kBAAkB,CAAC;AACjF,OAAO,EAAE,MAAM,EAAE,UAAU,EAAE,MAAM,kBAAkB,CAAC;AAGtD,aAAa;AACb,OAAO,EACL,eAAe,EACf,UAAU,EACV,aAAa,EACb,cAAc,EACd,WAAW,EACX,kBAAkB,GACnB,MAAM,sBAAsB,CAAC;AAE9B,UAAU;AACV,OAAO,EAAE,gBAAgB,EAAE,MAAM,gCAAgC,CAAC;AAClE,OAAO,EAAE,kBAAkB,EAAE,sBAAsB,EAAE,aAAa,EAAE,MAAM,+BAA+B,CAAC;AAE1G,WAAW;AACX,OAAO,EAAE,eAAe,EAAE,aAAa,EAAE,gBAAgB,EAAE,kBAAkB,EAAE,MAAM,gCAAgC,CAAC;AAEtH,OAAO,EAAE,eAAe,EAAE,MAAM,gCAAgC,CAAC;AACjE,OAAO,EAAE,gBAAgB,EAAE,mBAAmB,EAAE,MAAM,iCAAiC,CAAC;AACxF,OAAO,EAAE,qBAAqB,EAAE,MAAM,uCAAuC,CAAC;AAC9E,OAAO,EAAE,sBAAsB,EAAE,qBAAqB,EAAE,MAAM,iCAAiC,CAAC"}
@@ -0,0 +1,38 @@
+ /**
+  * Cloud embedding service using OpenAI or Voyage APIs.
+  *
+  * Sends texts to a cloud embedding API instead of running ONNX locally.
+  * Significantly faster for large codebases (~10,000+ texts/sec vs ~400 on CPU).
+  */
+ import type { IEmbeddingService } from '../core/interfaces.js';
+ export type CloudProvider = 'openai' | 'voyage';
+ export interface CloudEmbeddingConfig {
+     provider: CloudProvider;
+     apiKey: string;
+     model?: string;
+     dimensions?: number;
+     baseUrl?: string;
+     batchSize?: number;
+     concurrency?: number;
+ }
+ export declare class CloudEmbeddingService implements IEmbeddingService {
+     private provider;
+     private apiKey;
+     private model;
+     private _dimensions;
+     private baseUrl;
+     private batchSize;
+     private concurrency;
+     private cache;
+     private cacheMaxSize;
+     private stats;
+     constructor(config: CloudEmbeddingConfig);
+     get dimensions(): number;
+     embed(text: string, isQuery?: boolean): Promise<number[]>;
+     embedBatch(texts: string[], _isQuery?: boolean): Promise<number[][]>;
+     private callApi;
+     getStats(): Record<string, unknown>;
+     clearCache(): void;
+     unload(): Promise<void>;
+ }
+ //# sourceMappingURL=cloud-embedding-service.d.ts.map
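
The `batchSize` and `concurrency` fields of `CloudEmbeddingConfig` determine how many API requests an indexing run makes and how many of them are in flight at once. As a rough worked example, the sketch below splits inputs into batches and counts the sequential rounds of requests, using the defaults that appear in this release's `cloud-embedding-service.js` (2048 per batch for OpenAI, 128 for Voyage, concurrency 5). `chunk` and `waveCount` are hypothetical helpers, not package exports.

```typescript
// Split items into consecutive slices of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Number of sequential rounds when `concurrency` batches run at a time.
function waveCount(itemCount: number, batchSize: number, concurrency: number): number {
  const batches = Math.ceil(itemCount / batchSize);
  return Math.ceil(batches / concurrency);
}

// 25,000 chunks: 13 OpenAI-sized batches in 3 rounds, 196 Voyage-sized batches in 40 rounds.
console.log(waveCount(25000, 2048, 5), waveCount(25000, 128, 5));
```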
@@ -0,0 +1 @@
+ {"version":3,"file":"cloud-embedding-service.d.ts","sourceRoot":"","sources":["../../src/indexing/cloud-embedding-service.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,KAAK,EAAE,iBAAiB,EAAE,MAAM,uBAAuB,CAAC;AAK/D,MAAM,MAAM,aAAa,GAAG,QAAQ,GAAG,QAAQ,CAAC;AAEhD,MAAM,WAAW,oBAAoB;IACnC,QAAQ,EAAE,aAAa,CAAC;IACxB,MAAM,EAAE,MAAM,CAAC;IACf,KAAK,CAAC,EAAE,MAAM,CAAC;IACf,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,WAAW,CAAC,EAAE,MAAM,CAAC;CACtB;AA0BD,qBAAa,qBAAsB,YAAW,iBAAiB;IAC7D,OAAO,CAAC,QAAQ,CAAgB;IAChC,OAAO,CAAC,MAAM,CAAS;IACvB,OAAO,CAAC,KAAK,CAAS;IACtB,OAAO,CAAC,WAAW,CAAS;IAC5B,OAAO,CAAC,OAAO,CAAS;IACxB,OAAO,CAAC,SAAS,CAAS;IAC1B,OAAO,CAAC,WAAW,CAAS;IAG5B,OAAO,CAAC,KAAK,CAAoC;IACjD,OAAO,CAAC,YAAY,CAAQ;IAG5B,OAAO,CAAC,KAAK,CAQX;gBAEU,MAAM,EAAE,oBAAoB;IAgBxC,IAAI,UAAU,IAAI,MAAM,CAEvB;IAEK,KAAK,CAAC,IAAI,EAAE,MAAM,EAAE,OAAO,GAAE,OAAe,GAAG,OAAO,CAAC,MAAM,EAAE,CAAC;IAYhE,UAAU,CAAC,KAAK,EAAE,MAAM,EAAE,EAAE,QAAQ,GAAE,OAAe,GAAG,OAAO,CAAC,MAAM,EAAE,EAAE,CAAC;YA8CnE,OAAO;IAoErB,QAAQ,IAAI,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC;IAWnC,UAAU,IAAI,IAAI;IAMZ,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC;CAG9B"}
@@ -0,0 +1,181 @@
+ /**
+  * Cloud embedding service using OpenAI or Voyage APIs.
+  *
+  * Sends texts to a cloud embedding API instead of running ONNX locally.
+  * Significantly faster for large codebases (~10,000+ texts/sec vs ~400 on CPU).
+  */
+ import { EmbeddingError } from '../core/exceptions.js';
+ const PROVIDER_DEFAULTS = {
+     openai: {
+         model: 'text-embedding-3-small',
+         dimensions: 384,
+         batchSize: 2048,
+         baseUrl: 'https://api.openai.com/v1/embeddings',
+     },
+     voyage: {
+         model: 'voyage-code-3',
+         dimensions: 1024,
+         batchSize: 128,
+         baseUrl: 'https://api.voyageai.com/v1/embeddings',
+     },
+ };
+ // ─── CloudEmbeddingService ───────────────────────────────────────────────────
+ export class CloudEmbeddingService {
+     provider;
+     apiKey;
+     model;
+     _dimensions;
+     baseUrl;
+     batchSize;
+     concurrency;
+     // LRU cache
+     cache = new Map();
+     cacheMaxSize = 5000;
+     // Stats
+     stats = {
+         totalEmbeddings: 0,
+         totalBatches: 0,
+         totalTime: 0,
+         cacheHits: 0,
+         cacheMisses: 0,
+         apiCalls: 0,
+         totalTokens: 0,
+     };
+     constructor(config) {
+         this.provider = config.provider;
+         this.apiKey = config.apiKey;
+         const defaults = PROVIDER_DEFAULTS[config.provider];
+         this.model = config.model ?? defaults.model;
+         this._dimensions = config.dimensions ?? defaults.dimensions;
+         this.baseUrl = config.baseUrl ?? defaults.baseUrl;
+         this.batchSize = config.batchSize ?? defaults.batchSize;
+         this.concurrency = config.concurrency ?? 5;
+         console.error(`[codebaxing] Using cloud embeddings: ${this.provider} (model: ${this.model}, dims: ${this._dimensions})`);
+     }
+     get dimensions() {
+         return this._dimensions;
+     }
+     async embed(text, isQuery = false) {
+         const cacheKey = `${isQuery ? 'q:' : 'd:'}${text}`;
+         const cached = this.cache.get(cacheKey);
+         if (cached) {
+             this.stats.cacheHits++;
+             return cached;
+         }
+         const results = await this.embedBatch([text], isQuery);
+         return results[0];
+     }
+     async embedBatch(texts, _isQuery = false) {
+         if (texts.length === 0)
+             return [];
+         const startTime = performance.now();
+         // Split into API-sized batches
+         const batches = [];
+         for (let i = 0; i < texts.length; i += this.batchSize) {
+             batches.push(texts.slice(i, i + this.batchSize));
+         }
+         // Process with concurrency limit
+         const allEmbeddings = new Array(texts.length);
+         let offset = 0;
+         for (let i = 0; i < batches.length; i += this.concurrency) {
+             const concurrent = batches.slice(i, i + this.concurrency);
+             const promises = concurrent.map(batch => this.callApi(batch));
+             const results = await Promise.all(promises);
+             for (const embeddings of results) {
+                 for (const embedding of embeddings) {
+                     allEmbeddings[offset++] = embedding;
+                 }
+             }
+         }
+         // Update cache
+         for (let i = 0; i < texts.length; i++) {
+             const cacheKey = `d:${texts[i]}`;
+             this.stats.cacheMisses++;
+             if (this.cache.size >= this.cacheMaxSize) {
+                 const firstKey = this.cache.keys().next().value;
+                 this.cache.delete(firstKey);
+             }
+             this.cache.set(cacheKey, allEmbeddings[i]);
+         }
+         const elapsed = (performance.now() - startTime) / 1000;
+         this.stats.totalEmbeddings += texts.length;
+         this.stats.totalBatches++;
+         this.stats.totalTime += elapsed;
+         return allEmbeddings;
+     }
+     async callApi(texts, retries = 3) {
+         const body = {
+             model: this.model,
+             input: texts,
+         };
+         // OpenAI supports dimension reduction
+         if (this.provider === 'openai' && this._dimensions) {
+             body.dimensions = this._dimensions;
+         }
+         // Voyage uses input_type
+         if (this.provider === 'voyage') {
+             body.input_type = 'document';
+         }
+         for (let attempt = 0; attempt < retries; attempt++) {
+             try {
+                 const response = await fetch(this.baseUrl, {
+                     method: 'POST',
+                     headers: {
+                         'Content-Type': 'application/json',
+                         'Authorization': `Bearer ${this.apiKey}`,
+                     },
+                     body: JSON.stringify(body),
+                 });
+                 if (response.status === 429 || response.status >= 500) {
+                     const retryAfter = response.headers.get('retry-after');
+                     const waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, attempt) * 1000;
+                     console.error(`[codebaxing] API rate limited (${response.status}), retrying in ${waitMs}ms...`);
+                     await new Promise(r => setTimeout(r, waitMs));
+                     continue;
+                 }
+                 if (!response.ok) {
+                     const errorText = await response.text();
+                     throw new EmbeddingError(`${this.provider} API error (${response.status}): ${errorText.slice(0, 200)}`);
+                 }
+                 const json = await response.json();
+                 this.stats.apiCalls++;
+                 if (json.usage) {
+                     this.stats.totalTokens += json.usage.total_tokens ?? json.usage.prompt_tokens ?? 0;
+                 }
+                 // Sort by index to maintain order
+                 const sorted = json.data.sort((a, b) => a.index - b.index);
+                 return sorted.map(d => d.embedding);
+             }
+             catch (e) {
+                 if (e instanceof EmbeddingError)
+                     throw e;
+                 if (attempt === retries - 1) {
+                     throw new EmbeddingError(`${this.provider} API call failed: ${e.message}`);
+                 }
+                 const waitMs = Math.pow(2, attempt) * 1000;
+                 console.error(`[codebaxing] API error, retrying in ${waitMs}ms: ${e.message}`);
+                 await new Promise(r => setTimeout(r, waitMs));
+             }
+         }
+         throw new EmbeddingError(`${this.provider} API failed after ${retries} retries`);
+     }
+     getStats() {
+         return {
+             ...this.stats,
+             provider: this.provider,
+             model: this.model,
+             embeddingsPerSecond: this.stats.totalTime > 0
+                 ? this.stats.totalEmbeddings / this.stats.totalTime
+                 : 0,
+         };
+     }
+     clearCache() {
+         this.cache.clear();
+         this.stats.cacheHits = 0;
+         this.stats.cacheMisses = 0;
+     }
+     async unload() {
+         this.cache.clear();
+     }
+ }
+ //# sourceMappingURL=cloud-embedding-service.js.map
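
As the code reads in this diff, the cache labelled "LRU" evicts the first-inserted key and does not refresh an entry when it is read, which makes it behave like a FIFO. In addition, `embedBatch` stores results under `d:` keys even when `embed` looked them up under `q:`, so query lookups would not hit. The sketch below shows one way both points could be handled with a `Map`, which iterates in insertion order. `LruCache` and `cacheKey` are hypothetical names, and this is an illustration rather than the package's implementation.

```typescript
// Small LRU cache built on Map insertion order.
class LruCache<V> {
  private map = new Map<string, V>();

  constructor(private readonly maxSize: number) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(key, value);
  }

  has(key: string): boolean {
    return this.map.has(key);
  }

  get size(): number {
    return this.map.size;
  }
}

// One key scheme shared by reads and writes, so query and document entries stay distinct.
function cacheKey(text: string, isQuery: boolean): string {
  return `${isQuery ? "q:" : "d:"}${text}`;
}
```

Using the same `cacheKey` helper on both the read and the write path is the design point here: a lookup can only hit if the entry was stored under the identical key.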
@@ -0,0 +1 @@
+ {"version":3,"file":"cloud-embedding-service.js","sourceRoot":"","sources":["../../src/indexing/cloud-embedding-service.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAGH,OAAO,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AAuBvD,MAAM,iBAAiB,GAA4C;IACjE,MAAM,EAAE;QACN,KAAK,EAAE,wBAAwB;QAC/B,UAAU,EAAE,GAAG;QACf,SAAS,EAAE,IAAI;QACf,OAAO,EAAE,sCAAsC;KAChD;IACD,MAAM,EAAE;QACN,KAAK,EAAE,eAAe;QACtB,UAAU,EAAE,IAAI;QAChB,SAAS,EAAE,GAAG;QACd,OAAO,EAAE,wCAAwC;KAClD;CACF,CAAC;AAEF,gFAAgF;AAEhF,MAAM,OAAO,qBAAqB;IACxB,QAAQ,CAAgB;IACxB,MAAM,CAAS;IACf,KAAK,CAAS;IACd,WAAW,CAAS;IACpB,OAAO,CAAS;IAChB,SAAS,CAAS;IAClB,WAAW,CAAS;IAE5B,YAAY;IACJ,KAAK,GAA0B,IAAI,GAAG,EAAE,CAAC;IACzC,YAAY,GAAG,IAAI,CAAC;IAE5B,QAAQ;IACA,KAAK,GAAG;QACd,eAAe,EAAE,CAAC;QAClB,YAAY,EAAE,CAAC;QACf,SAAS,EAAE,CAAC;QACZ,SAAS,EAAE,CAAC;QACZ,WAAW,EAAE,CAAC;QACd,QAAQ,EAAE,CAAC;QACX,WAAW,EAAE,CAAC;KACf,CAAC;IAEF,YAAY,MAA4B;QACtC,IAAI,CAAC,QAAQ,GAAG,MAAM,CAAC,QAAQ,CAAC;QAChC,IAAI,CAAC,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC;QAE5B,MAAM,QAAQ,GAAG,iBAAiB,CAAC,MAAM,CAAC,QAAQ,CAAC,CAAC;QACpD,IAAI,CAAC,KAAK,GAAG,MAAM,CAAC,KAAK,IAAI,QAAQ,CAAC,KAAK,CAAC;QAC5C,IAAI,CAAC,WAAW,GAAG,MAAM,CAAC,UAAU,IAAI,QAAQ,CAAC,UAAU,CAAC;QAC5D,IAAI,CAAC,OAAO,GAAG,MAAM,CAAC,OAAO,IAAI,QAAQ,CAAC,OAAO,CAAC;QAClD,IAAI,CAAC,SAAS,GAAG,MAAM,CAAC,SAAS,IAAI,QAAQ,CAAC,SAAS,CAAC;QACxD,IAAI,CAAC,WAAW,GAAG,MAAM,CAAC,WAAW,IAAI,CAAC,CAAC;QAE3C,OAAO,CAAC,KAAK,CACX,wCAAwC,IAAI,CAAC,QAAQ,YAAY,IAAI,CAAC,KAAK,WAAW,IAAI,CAAC,WAAW,GAAG,CAC1G,CAAC;IACJ,CAAC;IAED,IAAI,UAAU;QACZ,OAAO,IAAI,CAAC,WAAW,CAAC;IAC1B,CAAC;IAED,KAAK,CAAC,KAAK,CAAC,IAAY,EAAE,UAAmB,KAAK;QAChD,MAAM,QAAQ,GAAG,GAAG,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,IAAI,GAAG,IAAI,EAAE,CAAC;QACnD,MAAM,MAAM,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,QAAQ,CAAC,CAAC;QACxC,IAAI,MAAM,EAAE,CAAC;YACX,IAAI,CAAC,KAAK,CAAC,SAAS,EAAE,CAAC;YACvB,OAAO,MAAM,CAAC;QAChB,CAAC;QAED,MAAM,OAAO,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,CAAC,IAAI,CAAC,EAAE,OAAO,CAAC,CAAC;QACvD,OAAO,OAAO,CAAC,CAAC,CAAC,CAAC;IACpB,CAAC;IAED,KAAK,CAAC,UAAU,CAAC,KAAe,EAAE,WAAoB,KAAK;QACzD,IAAI,KAAK,CAAC,
MAAM,KAAK,CAAC;YAAE,OAAO,EAAE,CAAC;QAElC,MAAM,SAAS,GAAG,WAAW,CAAC,GAAG,EAAE,CAAC;QAEpC,+BAA+B;QAC/B,MAAM,OAAO,GAAe,EAAE,CAAC;QAC/B,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,IAAI,IAAI,CAAC,SAAS,EAAE,CAAC;YACtD,OAAO,CAAC,IAAI,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,SAAS,CAAC,CAAC,CAAC;QACnD,CAAC;QAED,iCAAiC;QACjC,MAAM,aAAa,GAAe,IAAI,KAAK,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC;QAC1D,IAAI,MAAM,GAAG,CAAC,CAAC;QAEf,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,CAAC,MAAM,EAAE,CAAC,IAAI,IAAI,CAAC,WAAW,EAAE,CAAC;YAC1D,MAAM,UAAU,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,WAAW,CAAC,CAAC;YAC1D,MAAM,QAAQ,GAAG,UAAU,CAAC,GAAG,CAAC,KAAK,CAAC,EAAE,CAAC,IAAI,CAAC,OAAO,CAAC,KAAK,CAAC,CAAC,CAAC;YAC9D,MAAM,OAAO,GAAG,MAAM,OAAO,CAAC,GAAG,CAAC,QAAQ,CAAC,CAAC;YAE5C,KAAK,MAAM,UAAU,IAAI,OAAO,EAAE,CAAC;gBACjC,KAAK,MAAM,SAAS,IAAI,UAAU,EAAE,CAAC;oBACnC,aAAa,CAAC,MAAM,EAAE,CAAC,GAAG,SAAS,CAAC;gBACtC,CAAC;YACH,CAAC;QACH,CAAC;QAED,eAAe;QACf,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;YACtC,MAAM,QAAQ,GAAG,KAAK,KAAK,CAAC,CAAC,CAAC,EAAE,CAAC;YACjC,IAAI,CAAC,KAAK,CAAC,WAAW,EAAE,CAAC;YACzB,IAAI,IAAI,CAAC,KAAK,CAAC,IAAI,IAAI,IAAI,CAAC,YAAY,EAAE,CAAC;gBACzC,MAAM,QAAQ,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,EAAE,CAAC,IAAI,EAAE,CAAC,KAAM,CAAC;gBACjD,IAAI,CAAC,KAAK,CAAC,MAAM,CAAC,QAAQ,CAAC,CAAC;YAC9B,CAAC;YACD,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,QAAQ,EAAE,aAAa,CAAC,CAAC,CAAC,CAAC,CAAC;QAC7C,CAAC;QAED,MAAM,OAAO,GAAG,CAAC,WAAW,CAAC,GAAG,EAAE,GAAG,SAAS,CAAC,GAAG,IAAI,CAAC;QACvD,IAAI,CAAC,KAAK,CAAC,eAAe,IAAI,KAAK,CAAC,MAAM,CAAC;QAC3C,IAAI,CAAC,KAAK,CAAC,YAAY,EAAE,CAAC;QAC1B,IAAI,CAAC,KAAK,CAAC,SAAS,IAAI,OAAO,CAAC;QAEhC,OAAO,aAAa,CAAC;IACvB,CAAC;IAEO,KAAK,CAAC,OAAO,CAAC,KAAe,EAAE,OAAO,GAAG,CAAC;QAChD,MAAM,IAAI,GAA4B;YACpC,KAAK,EAAE,IAAI,CAAC,KAAK;YACjB,KAAK,EAAE,KAAK;SACb,CAAC;QAEF,sCAAsC;QACtC,IAAI,IAAI,CAAC,QAAQ,KAAK,QAAQ,IAAI,IAAI,CAAC,WAAW,EAAE,CAAC;YACnD,IAAI,CAAC,UAAU,GAAG,IAAI,CAAC,WAAW,CAAC;QACrC,CAAC;QACD,yBAAyB;QACzB,IAAI,IAAI,CAAC,QAAQ,KAAK,QA
AQ,EAAE,CAAC;YAC/B,IAAI,CAAC,UAAU,GAAG,UAAU,CAAC;QAC/B,CAAC;QAED,KAAK,IAAI,OAAO,GAAG,CAAC,EAAE,OAAO,GAAG,OAAO,EAAE,OAAO,EAAE,EAAE,CAAC;YACnD,IAAI,CAAC;gBACH,MAAM,QAAQ,GAAG,MAAM,KAAK,CAAC,IAAI,CAAC,OAAO,EAAE;oBACzC,MAAM,EAAE,MAAM;oBACd,OAAO,EAAE;wBACP,cAAc,EAAE,kBAAkB;wBAClC,eAAe,EAAE,UAAU,IAAI,CAAC,MAAM,EAAE;qBACzC;oBACD,IAAI,EAAE,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC;iBAC3B,CAAC,CAAC;gBAEH,IAAI,QAAQ,CAAC,MAAM,KAAK,GAAG,IAAI,QAAQ,CAAC,MAAM,IAAI,GAAG,EAAE,CAAC;oBACtD,MAAM,UAAU,GAAG,QAAQ,CAAC,OAAO,CAAC,GAAG,CAAC,aAAa,CAAC,CAAC;oBACvD,MAAM,MAAM,GAAG,UAAU,CAAC,CAAC,CAAC,QAAQ,CAAC,UAAU,EAAE,EAAE,CAAC,GAAG,IAAI,CAAC,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC,EAAE,OAAO,CAAC,GAAG,IAAI,CAAC;oBAC1F,OAAO,CAAC,KAAK,CAAC,kCAAkC,QAAQ,CAAC,MAAM,kBAAkB,MAAM,OAAO,CAAC,CAAC;oBAChG,MAAM,IAAI,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC,UAAU,CAAC,CAAC,EAAE,MAAM,CAAC,CAAC,CAAC;oBAC9C,SAAS;gBACX,CAAC;gBAED,IAAI,CAAC,QAAQ,CAAC,EAAE,EAAE,CAAC;oBACjB,MAAM,SAAS,GAAG,MAAM,QAAQ,CAAC,IAAI,EAAE,CAAC;oBACxC,MAAM,IAAI,cAAc,CACtB,GAAG,IAAI,CAAC,QAAQ,eAAe,QAAQ,CAAC,MAAM,MAAM,SAAS,CAAC,KAAK,CAAC,CAAC,EAAE,GAAG,CAAC,EAAE,CAC9E,CAAC;gBACJ,CAAC;gBAED,MAAM,IAAI,GAAG,MAAM,QAAQ,CAAC,IAAI,EAG/B,CAAC;gBAEF,IAAI,CAAC,KAAK,CAAC,QAAQ,EAAE,CAAC;gBACtB,IAAI,IAAI,CAAC,KAAK,EAAE,CAAC;oBACf,IAAI,CAAC,KAAK,CAAC,WAAW,IAAI,IAAI,CAAC,KAAK,CAAC,YAAY,IAAI,IAAI,CAAC,KAAK,CAAC,aAAa,IAAI,CAAC,CAAC;gBACrF,CAAC;gBAED,kCAAkC;gBAClC,MAAM,MAAM,GAAG,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,EAAE,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,KAAK,GAAG,CAAC,CAAC,KAAK,CAAC,CAAC;gBAC3D,OAAO,MAAM,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,SAAS,CAAC,CAAC;YACtC,CAAC;YAAC,OAAO,CAAC,EAAE,CAAC;gBACX,IAAI,CAAC,YAAY,cAAc;oBAAE,MAAM,CAAC,CAAC;gBACzC,IAAI,OAAO,KAAK,OAAO,GAAG,CAAC,EAAE,CAAC;oBAC5B,MAAM,IAAI,cAAc,CAAC,GAAG,IAAI,CAAC,QAAQ,qBAAsB,CAAW,CAAC,OAAO,EAAE,CAAC,CAAC;gBACxF,CAAC;gBACD,MAAM,MAAM,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,EAAE,OAAO,CAAC,GAAG,IAAI,CAAC;gBAC3C,OAAO,CAAC,KAAK,CAAC,uCAAuC,MAAM,OAAQ,CAAW,CAAC,OAAO,EAAE,CAAC,CAAC;gBAC1F,MAAM,IAAI,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC,UAAU,CAAC,CAA
C,EAAE,MAAM,CAAC,CAAC,CAAC;YAChD,CAAC;QACH,CAAC;QAED,MAAM,IAAI,cAAc,CAAC,GAAG,IAAI,CAAC,QAAQ,qBAAqB,OAAO,UAAU,CAAC,CAAC;IACnF,CAAC;IAED,QAAQ;QACN,OAAO;YACL,GAAG,IAAI,CAAC,KAAK;YACb,QAAQ,EAAE,IAAI,CAAC,QAAQ;YACvB,KAAK,EAAE,IAAI,CAAC,KAAK;YACjB,mBAAmB,EAAE,IAAI,CAAC,KAAK,CAAC,SAAS,GAAG,CAAC;gBAC3C,CAAC,CAAC,IAAI,CAAC,KAAK,CAAC,eAAe,GAAG,IAAI,CAAC,KAAK,CAAC,SAAS;gBACnD,CAAC,CAAC,CAAC;SACN,CAAC;IACJ,CAAC;IAED,UAAU;QACR,IAAI,CAAC,KAAK,CAAC,KAAK,EAAE,CAAC;QACnB,IAAI,CAAC,KAAK,CAAC,SAAS,GAAG,CAAC,CAAC;QACzB,IAAI,CAAC,KAAK,CAAC,WAAW,GAAG,CAAC,CAAC;IAC7B,CAAC;IAED,KAAK,CAAC,MAAM;QACV,IAAI,CAAC,KAAK,CAAC,KAAK,EAAE,CAAC;IACrB,CAAC;CACF"}
@@ -0,0 +1,19 @@
+ /**
+  * Factory for creating embedding services based on configuration.
+  *
+  * Supports local ONNX inference or cloud APIs (OpenAI, Voyage).
+  * Configured via CODEBAXING_EMBEDDING_PROVIDER environment variable.
+  */
+ import type { IEmbeddingService } from '../core/interfaces.js';
+ import { type DeviceType, type DType } from './embedding-service.js';
+ export type EmbeddingProvider = 'local' | 'openai' | 'voyage';
+ export declare function getConfiguredProvider(): EmbeddingProvider;
+ export interface CreateEmbeddingServiceOptions {
+     modelName?: string;
+     showProgress?: boolean;
+     device?: DeviceType;
+     dtype?: DType;
+     disableCachePurge?: boolean;
+ }
+ export declare function createEmbeddingService(options?: CreateEmbeddingServiceOptions): IEmbeddingService;
+ //# sourceMappingURL=embedding-factory.d.ts.map
@@ -0,0 +1 @@
+ {"version":3,"file":"embedding-factory.d.ts","sourceRoot":"","sources":["../../src/indexing/embedding-factory.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAEH,OAAO,KAAK,EAAE,iBAAiB,EAAE,MAAM,uBAAuB,CAAC;AAE/D,OAAO,EAAmC,KAAK,UAAU,EAAE,KAAK,KAAK,EAAE,MAAM,wBAAwB,CAAC;AAKtG,MAAM,MAAM,iBAAiB,GAAG,OAAO,GAAG,QAAQ,GAAG,QAAQ,CAAC;AAM9D,wBAAgB,qBAAqB,IAAI,iBAAiB,CAMzD;AAID,MAAM,WAAW,6BAA6B;IAC5C,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,YAAY,CAAC,EAAE,OAAO,CAAC;IACvB,MAAM,CAAC,EAAE,UAAU,CAAC;IACpB,KAAK,CAAC,EAAE,KAAK,CAAC;IACd,iBAAiB,CAAC,EAAE,OAAO,CAAC;CAC7B;AAED,wBAAgB,sBAAsB,CACpC,OAAO,GAAE,6BAAkC,GAC1C,iBAAiB,CA0BnB"}
@@ -0,0 +1,62 @@
+ /**
+  * Factory for creating embedding services based on configuration.
+  *
+  * Supports local ONNX inference or cloud APIs (OpenAI, Voyage).
+  * Configured via CODEBAXING_EMBEDDING_PROVIDER environment variable.
+  */
+ import { EmbeddingError } from '../core/exceptions.js';
+ import { EmbeddingService, DEFAULT_MODEL } from './embedding-service.js';
+ import { CloudEmbeddingService } from './cloud-embedding-service.js';
+ const VALID_PROVIDERS = ['local', 'openai', 'voyage'];
+ // ─── Configuration ───────────────────────────────────────────────────────────
+ export function getConfiguredProvider() {
+     const envProvider = process.env.CODEBAXING_EMBEDDING_PROVIDER?.toLowerCase();
+     if (envProvider && VALID_PROVIDERS.includes(envProvider)) {
+         return envProvider;
+     }
+     return 'local';
+ }
+ export function createEmbeddingService(options = {}) {
+     const provider = getConfiguredProvider();
+     if (provider === 'local') {
+         return new EmbeddingService(options.modelName ?? DEFAULT_MODEL, {
+             showProgress: options.showProgress,
+             device: options.device,
+             dtype: options.dtype,
+             disableCachePurge: options.disableCachePurge,
+         });
+     }
+     // Cloud providers
+     const apiKey = getApiKey(provider);
+     const model = process.env.CODEBAXING_EMBEDDING_MODEL;
+     const dimensionsEnv = process.env.CODEBAXING_EMBEDDING_DIMENSIONS;
+     const dimensions = dimensionsEnv ? parseInt(dimensionsEnv, 10) : undefined;
+     const baseUrl = process.env.CODEBAXING_EMBEDDING_BASE_URL;
+     return new CloudEmbeddingService({
+         provider: provider,
+         apiKey,
+         model,
+         dimensions,
+         baseUrl,
+     });
+ }
+ function getApiKey(provider) {
+     let apiKey;
+     if (provider === 'openai') {
+         apiKey = process.env.CODEBAXING_OPENAI_API_KEY ?? process.env.OPENAI_API_KEY;
+         if (!apiKey) {
+             throw new EmbeddingError('OpenAI API key required. Set CODEBAXING_OPENAI_API_KEY or OPENAI_API_KEY environment variable.');
+         }
+     }
+     else if (provider === 'voyage') {
+         apiKey = process.env.CODEBAXING_VOYAGE_API_KEY ?? process.env.VOYAGE_API_KEY;
+         if (!apiKey) {
+             throw new EmbeddingError('Voyage API key required. Set CODEBAXING_VOYAGE_API_KEY or VOYAGE_API_KEY environment variable.');
+         }
+     }
+     else {
+         throw new EmbeddingError(`Unknown cloud provider: ${provider}`);
+     }
+     return apiKey;
+ }
+ //# sourceMappingURL=embedding-factory.js.map
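Two details of the factory above are easy to miss: the `CODEBAXING_`-prefixed key takes precedence over the provider's conventional variable, and the result of `parseInt` is passed through unchanged, so a non-numeric `CODEBAXING_EMBEDDING_DIMENSIONS` becomes `NaN`. The sketch below mirrors the key lookup and shows one possible way to validate the dimension value. `lookupApiKey`, `parseDimensions`, and the `Env` type are hypothetical names for illustration only. Rejecting bad input is an assumption about desirable behavior, not something the package does.

```typescript
// Hypothetical helpers, not part of codebaxing.
type Env = Record<string, string | undefined>;

// Mirrors the lookup order in getApiKey: the CODEBAXING_-prefixed variable wins,
// and the provider's conventional variable is the fallback.
function lookupApiKey(provider: 'openai' | 'voyage', env: Env): string {
  const prefix = provider === 'openai' ? 'OPENAI' : 'VOYAGE';
  // `??` only falls back on undefined/null, so an empty prefixed variable
  // does NOT fall through to the unprefixed one; it fails the check below.
  const key = env[`CODEBAXING_${prefix}_API_KEY`] ?? env[`${prefix}_API_KEY`];
  if (!key) {
    throw new Error(`${provider} API key required`);
  }
  return key;
}

// Assumed hardening: reject values that parseInt cannot turn into a positive integer.
function parseDimensions(raw: string | undefined): number | undefined {
  if (!raw) return undefined; // unset or empty: let the service pick its default
  const n = parseInt(raw, 10);
  if (isNaN(n) || n <= 0) {
    throw new Error(`Invalid CODEBAXING_EMBEDDING_DIMENSIONS: ${raw}`);
  }
  return n;
}
```

How `CloudEmbeddingService` reacts to a `NaN` dimension is not visible in this hunk, so the validation here is only one option among several.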
@@ -0,0 +1 @@
1
+ {"version":3,"file":"embedding-factory.js","sourceRoot":"","sources":["../../src/indexing/embedding-factory.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAGH,OAAO,EAAE,cAAc,EAAE,MAAM,uBAAuB,CAAC;AACvD,OAAO,EAAE,gBAAgB,EAAE,aAAa,EAA+B,MAAM,wBAAwB,CAAC;AACtG,OAAO,EAAE,qBAAqB,EAAsB,MAAM,8BAA8B,CAAC;AAMzF,MAAM,eAAe,GAAwB,CAAC,OAAO,EAAE,QAAQ,EAAE,QAAQ,CAAC,CAAC;AAE3E,gFAAgF;AAEhF,MAAM,UAAU,qBAAqB;IACnC,MAAM,WAAW,GAAG,OAAO,CAAC,GAAG,CAAC,6BAA6B,EAAE,WAAW,EAAE,CAAC;IAC7E,IAAI,WAAW,IAAI,eAAe,CAAC,QAAQ,CAAC,WAAgC,CAAC,EAAE,CAAC;QAC9E,OAAO,WAAgC,CAAC;IAC1C,CAAC;IACD,OAAO,OAAO,CAAC;AACjB,CAAC;AAYD,MAAM,UAAU,sBAAsB,CACpC,UAAyC,EAAE;IAE3C,MAAM,QAAQ,GAAG,qBAAqB,EAAE,CAAC;IAEzC,IAAI,QAAQ,KAAK,OAAO,EAAE,CAAC;QACzB,OAAO,IAAI,gBAAgB,CAAC,OAAO,CAAC,SAAS,IAAI,aAAa,EAAE;YAC9D,YAAY,EAAE,OAAO,CAAC,YAAY;YAClC,MAAM,EAAE,OAAO,CAAC,MAAM;YACtB,KAAK,EAAE,OAAO,CAAC,KAAK;YACpB,iBAAiB,EAAE,OAAO,CAAC,iBAAiB;SAC7C,CAAC,CAAC;IACL,CAAC;IAED,kBAAkB;IAClB,MAAM,MAAM,GAAG,SAAS,CAAC,QAAQ,CAAC,CAAC;IACnC,MAAM,KAAK,GAAG,OAAO,CAAC,GAAG,CAAC,0BAA0B,CAAC;IACrD,MAAM,aAAa,GAAG,OAAO,CAAC,GAAG,CAAC,+BAA+B,CAAC;IAClE,MAAM,UAAU,GAAG,aAAa,CAAC,CAAC,CAAC,QAAQ,CAAC,aAAa,EAAE,EAAE,CAAC,CAAC,CAAC,CAAC,SAAS,CAAC;IAC3E,MAAM,OAAO,GAAG,OAAO,CAAC,GAAG,CAAC,6BAA6B,CAAC;IAE1D,OAAO,IAAI,qBAAqB,CAAC;QAC/B,QAAQ,EAAE,QAAyB;QACnC,MAAM;QACN,KAAK;QACL,UAAU;QACV,OAAO;KACR,CAAC,CAAC;AACL,CAAC;AAED,SAAS,SAAS,CAAC,QAA2B;IAC5C,IAAI,MAA0B,CAAC;IAE/B,IAAI,QAAQ,KAAK,QAAQ,EAAE,CAAC;QAC1B,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC,yBAAyB,IAAI,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC;QAC7E,IAAI,CAAC,MAAM,EAAE,CAAC;YACZ,MAAM,IAAI,cAAc,CACtB,gGAAgG,CACjG,CAAC;QACJ,CAAC;IACH,CAAC;SAAM,IAAI,QAAQ,KAAK,QAAQ,EAAE,CAAC;QACjC,MAAM,GAAG,OAAO,CAAC,GAAG,CAAC,yBAAyB,IAAI,OAAO,CAAC,GAAG,CAAC,cAAc,CAAC;QAC7E,IAAI,CAAC,MAAM,EAAE,CAAC;YACZ,MAAM,IAAI,cAAc,CACtB,gGAAgG,CACjG,CAAC;QACJ,CAAC;IACH,CAAC;SAAM,CAAC;QACN,MAAM,IAAI,cAAc,CAAC,2BAA2B,QAAQ,EAAE,CAAC,CAAC;IAClE,CAAC;IAED,OAAO,MAAM,CAAC;AAChB,CAAC"}
@@ -13,12 +13,19 @@
   *
   * Note: macOS does not support CUDA. Use 'webgpu' for GPU acceleration on Mac.
   */
+ import type { IEmbeddingService } from '../core/interfaces.js';
  export type DeviceType = 'cpu' | 'cuda' | 'webgpu' | 'auto';
+ export type DType = 'fp32' | 'fp16' | 'q8' | 'q4';
  /**
   * Get the configured device from environment variable.
   * Defaults to 'cpu' for maximum compatibility.
   */
  export declare function getConfiguredDevice(): DeviceType;
+ /**
+  * Get the configured dtype from environment variable.
+  * Defaults to 'q8' for best speed/quality tradeoff (~3x faster than fp32).
+  */
+ export declare function getConfiguredDtype(): DType;
  export interface EmbeddingModelConfig {
      modelId: string;
      dimensions: number;
@@ -28,10 +35,11 @@ export interface EmbeddingModelConfig {
  }
  export declare const EMBEDDING_MODELS: Record<string, EmbeddingModelConfig>;
  export declare const DEFAULT_MODEL = "all-MiniLM-L6-v2";
- export declare class EmbeddingService {
+ export declare class EmbeddingService implements IEmbeddingService {
      private modelName;
      private config;
      private device;
+     private dtype;
      private extractor;
      private loading;
      private showProgress;
@@ -48,6 +56,7 @@ export declare class EmbeddingService {
      constructor(modelName?: string, options?: {
          showProgress?: boolean;
          device?: DeviceType;
+         dtype?: DType;
          disableCachePurge?: boolean;
      });
      /** Get the device being used for inference */
1
- {"version":3,"file":"embedding-service.d.ts","sourceRoot":"","sources":["../../src/indexing/embedding-service.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AASH,MAAM,MAAM,UAAU,GAAG,KAAK,GAAG,MAAM,GAAG,QAAQ,GAAG,MAAM,CAAC;AAI5D;;;GAGG;AACH,wBAAgB,mBAAmB,IAAI,UAAU,CAMhD;AAyCD,MAAM,WAAW,oBAAoB;IACnC,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,YAAY,EAAE,MAAM,CAAC;IACrB,WAAW,EAAE,MAAM,CAAC;IACpB,cAAc,EAAE,MAAM,CAAC;CACxB;AAED,eAAO,MAAM,gBAAgB,EAAE,MAAM,CAAC,MAAM,EAAE,oBAAoB,CAejE,CAAC;AAEF,eAAO,MAAM,aAAa,qBAAqB,CAAC;AAIhD,qBAAa,gBAAgB;IAC3B,OAAO,CAAC,SAAS,CAAS;IAC1B,OAAO,CAAC,MAAM,CAAuB;IACrC,OAAO,CAAC,MAAM,CAAa;IAE3B,OAAO,CAAC,SAAS,CAAa;IAC9B,OAAO,CAAC,OAAO,CAA8B;IAC7C,OAAO,CAAC,YAAY,CAAU;IAG9B,OAAO,CAAC,KAAK,CAAoC;IACjD,OAAO,CAAC,YAAY,CAAQ;IAG5B,KAAK;;;;;;MAMH;IAGF,OAAO,CAAC,iBAAiB,CAAU;gBAGjC,SAAS,GAAE,MAAsB,EACjC,OAAO,GAAE;QAAE,YAAY,CAAC,EAAE,OAAO,CAAC;QAAC,MAAM,CAAC,EAAE,UAAU,CAAC;QAAC,iBAAiB,CAAC,EAAE,OAAO,CAAA;KAAO;IAqB5F,8CAA8C;IAC9C,SAAS,IAAI,UAAU;IAIvB,IAAI,UAAU,IAAI,MAAM,CAEvB;YAEa,SAAS;IA+FjB,KAAK,CAAC,IAAI,EAAE,MAAM,EAAE,OAAO,GAAE,OAAe,GAAG,OAAO,CAAC,MAAM,EAAE,CAAC;IAkChE,UAAU,CAAC,KAAK,EAAE,MAAM,EAAE,EAAE,OAAO,GAAE,OAAe,GAAG,OAAO,CAAC,MAAM,EAAE,EAAE,CAAC;IAsEhF,QAAQ,IAAI,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC;IAUnC,UAAU,IAAI,IAAI;IAMlB;;;OAGG;IACH,MAAM,CAAC,eAAe,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO;IAmB1C,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC;CAM9B;AAMD,wBAAgB,mBAAmB,CACjC,SAAS,GAAE,MAAsB,EACjC,OAAO,GAAE;IAAE,YAAY,CAAC,EAAE,OAAO,CAAC;IAAC,MAAM,CAAC,EAAE,UAAU,CAAA;CAAO,GAC5D,gBAAgB,CASlB;AAED,wBAAsB,qBAAqB,CAAC,SAAS,CAAC,EAAE,MAAM,GAAG,OAAO,CAAC,IAAI,CAAC,CAa7E"}
1
+ {"version":3,"file":"embedding-service.d.ts","sourceRoot":"","sources":["../../src/indexing/embedding-service.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAMH,OAAO,KAAK,EAAE,iBAAiB,EAAE,MAAM,uBAAuB,CAAC;AAI/D,MAAM,MAAM,UAAU,GAAG,KAAK,GAAG,MAAM,GAAG,QAAQ,GAAG,MAAM,CAAC;AAC5D,MAAM,MAAM,KAAK,GAAG,MAAM,GAAG,MAAM,GAAG,IAAI,GAAG,IAAI,CAAC;AAKlD;;;GAGG;AACH,wBAAgB,mBAAmB,IAAI,UAAU,CAMhD;AAED;;;GAGG;AACH,wBAAgB,kBAAkB,IAAI,KAAK,CAM1C;AAyCD,MAAM,WAAW,oBAAoB;IACnC,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,YAAY,EAAE,MAAM,CAAC;IACrB,WAAW,EAAE,MAAM,CAAC;IACpB,cAAc,EAAE,MAAM,CAAC;CACxB;AAED,eAAO,MAAM,gBAAgB,EAAE,MAAM,CAAC,MAAM,EAAE,oBAAoB,CAejE,CAAC;AAEF,eAAO,MAAM,aAAa,qBAAqB,CAAC;AAIhD,qBAAa,gBAAiB,YAAW,iBAAiB;IACxD,OAAO,CAAC,SAAS,CAAS;IAC1B,OAAO,CAAC,MAAM,CAAuB;IACrC,OAAO,CAAC,MAAM,CAAa;IAC3B,OAAO,CAAC,KAAK,CAAQ;IAErB,OAAO,CAAC,SAAS,CAAa;IAC9B,OAAO,CAAC,OAAO,CAA8B;IAC7C,OAAO,CAAC,YAAY,CAAU;IAG9B,OAAO,CAAC,KAAK,CAAoC;IACjD,OAAO,CAAC,YAAY,CAAQ;IAG5B,KAAK;;;;;;MAMH;IAGF,OAAO,CAAC,iBAAiB,CAAU;gBAGjC,SAAS,GAAE,MAAsB,EACjC,OAAO,GAAE;QAAE,YAAY,CAAC,EAAE,OAAO,CAAC;QAAC,MAAM,CAAC,EAAE,UAAU,CAAC;QAAC,KAAK,CAAC,EAAE,KAAK,CAAC;QAAC,iBAAiB,CAAC,EAAE,OAAO,CAAA;KAAO;IAsB3G,8CAA8C;IAC9C,SAAS,IAAI,UAAU;IAIvB,IAAI,UAAU,IAAI,MAAM,CAEvB;YAEa,SAAS;IAgGjB,KAAK,CAAC,IAAI,EAAE,MAAM,EAAE,OAAO,GAAE,OAAe,GAAG,OAAO,CAAC,MAAM,EAAE,CAAC;IAkChE,UAAU,CAAC,KAAK,EAAE,MAAM,EAAE,EAAE,OAAO,GAAE,OAAe,GAAG,OAAO,CAAC,MAAM,EAAE,EAAE,CAAC;IAsEhF,QAAQ,IAAI,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC;IAUnC,UAAU,IAAI,IAAI;IAMlB;;;OAGG;IACH,MAAM,CAAC,eAAe,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO;IAmB1C,MAAM,IAAI,OAAO,CAAC,IAAI,CAAC;CAM9B;AAMD,wBAAgB,mBAAmB,CACjC,SAAS,GAAE,MAAsB,EACjC,OAAO,GAAE;IAAE,YAAY,CAAC,EAAE,OAAO,CAAC;IAAC,MAAM,CAAC,EAAE,UAAU,CAAA;CAAO,GAC5D,gBAAgB,CASlB;AAED,wBAAsB,qBAAqB,CAAC,SAAS,CAAC,EAAE,MAAM,GAAG,OAAO,CAAC,IAAI,CAAC,CAa7E"}