fastembed 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -0
- data/.yardopts +6 -0
- data/BENCHMARKS.md +124 -1
- data/CHANGELOG.md +14 -0
- data/README.md +395 -74
- data/benchmark/compare_all.rb +167 -0
- data/benchmark/compare_python.py +60 -0
- data/benchmark/memory_profile.rb +70 -0
- data/benchmark/profile.rb +198 -0
- data/benchmark/reranker_benchmark.rb +158 -0
- data/exe/fastembed +6 -0
- data/fastembed.gemspec +3 -0
- data/lib/fastembed/async.rb +193 -0
- data/lib/fastembed/base_model.rb +247 -0
- data/lib/fastembed/base_model_info.rb +61 -0
- data/lib/fastembed/cli.rb +745 -0
- data/lib/fastembed/custom_model_registry.rb +255 -0
- data/lib/fastembed/image_embedding.rb +313 -0
- data/lib/fastembed/late_interaction_embedding.rb +260 -0
- data/lib/fastembed/late_interaction_model_info.rb +91 -0
- data/lib/fastembed/model_info.rb +59 -19
- data/lib/fastembed/model_management.rb +82 -23
- data/lib/fastembed/onnx_embedding_model.rb +25 -4
- data/lib/fastembed/pooling.rb +39 -3
- data/lib/fastembed/progress.rb +52 -0
- data/lib/fastembed/quantization.rb +75 -0
- data/lib/fastembed/reranker_model_info.rb +91 -0
- data/lib/fastembed/sparse_embedding.rb +261 -0
- data/lib/fastembed/sparse_model_info.rb +80 -0
- data/lib/fastembed/text_cross_encoder.rb +217 -0
- data/lib/fastembed/text_embedding.rb +161 -28
- data/lib/fastembed/validators.rb +59 -0
- data/lib/fastembed/version.rb +1 -1
- data/lib/fastembed.rb +42 -1
- data/plan.md +257 -0
- data/scripts/verify_models.rb +229 -0
- metadata +70 -3
data/README.md
CHANGED
@@ -3,156 +3,477 @@

[Gem Version](https://rubygems.org/gems/fastembed)
[CI](https://github.com/khasinski/fastembed-rb/actions/workflows/ci.yml)

Fast, lightweight text embeddings in Ruby. A port of [FastEmbed](https://github.com/qdrant/fastembed) by Qdrant.

```ruby
embedding = Fastembed::TextEmbedding.new
vectors = embedding.embed(["The quick brown fox", "jumps over the lazy dog"]).to_a
```

Supports dense embeddings, sparse embeddings (SPLADE), late interaction (ColBERT), reranking, and image embeddings - all running locally with ONNX Runtime.

## Table of Contents

- [Installation](#installation)
- [Getting Started](#getting-started)
- [Text Embeddings](#text-embeddings)
- [Reranking](#reranking)
- [Sparse Embeddings](#sparse-embeddings)
- [Late Interaction (ColBERT)](#late-interaction-colbert)
- [Image Embeddings](#image-embeddings)
- [Async Processing](#async-processing)
- [Progress Tracking](#progress-tracking)
- [CLI](#cli)
- [Custom Models](#custom-models)
- [Configuration](#configuration)
- [Performance](#performance)

## Installation

Add to your Gemfile:

```ruby
gem "fastembed"
```

For image embeddings, also add:

```ruby
gem "mini_magick"
```

## Getting Started

```ruby
require "fastembed"

# Create an embedding model (downloads ~67MB on first use)
embedding = Fastembed::TextEmbedding.new

# Embed some text
documents = [
  "Ruby is a dynamic programming language",
  "Python is great for data science",
  "JavaScript runs in the browser"
]
vectors = embedding.embed(documents).to_a

# Each vector is 384 floats (for the default model)
vectors.first.length # => 384
```

### Semantic Search

Find documents by meaning, not just keywords:

```ruby
embedding = Fastembed::TextEmbedding.new

# Your document corpus
documents = [
  "The cat sat on the mat",
  "Machine learning powers modern AI",
  "Ruby on Rails is a web framework",
  "Deep learning uses neural networks"
]
doc_vectors = embedding.embed(documents).to_a

# Search for a concept
query = "artificial intelligence and neural nets"
query_vector = embedding.embed([query]).first

# Find the most similar document (a dot product, which equals
# cosine similarity because these vectors are normalized)
scores = doc_vectors.map { |v| query_vector.zip(v).sum { |a, b| a * b } }
best_idx = scores.each_with_index.max.last

puts documents[best_idx] # => "Deep learning uses neural networks"
```
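
The dot product above stands in for cosine similarity only because the default model returns normalized vectors. For a model that does not normalize, a full cosine helper is a few lines; this is a minimal sketch over plain Ruby arrays, not part of the gem's API:

```ruby
# Cosine similarity between two plain arrays of floats.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

scores = doc_vectors.map { |v| cosine_similarity(query_vector, v) }
```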

### Integration with Vector Databases

```ruby
# With Qdrant
require "qdrant"

embedding = Fastembed::TextEmbedding.new
client = Qdrant::Client.new(url: "http://localhost:6333")

# Index documents
documents.each_with_index do |doc, i|
  vector = embedding.embed([doc]).first
  client.points.upsert(
    collection_name: "docs",
    points: [{ id: i, vector: vector, payload: { text: doc } }]
  )
end

# Search
query_vector = embedding.embed(["your search query"]).first
results = client.points.search(collection_name: "docs", vector: query_vector, limit: 5)
```

## Text Embeddings

### Choose a Model

```ruby
# Default: fast and accurate (384 dimensions, 67MB)
embedding = Fastembed::TextEmbedding.new

# Higher accuracy (768 dimensions, 210MB)
embedding = Fastembed::TextEmbedding.new(model_name: "BAAI/bge-base-en-v1.5")

# Multilingual - 100+ languages (384 dimensions)
embedding = Fastembed::TextEmbedding.new(model_name: "intfloat/multilingual-e5-small")

# Long documents - 8192 token context (768 dimensions)
embedding = Fastembed::TextEmbedding.new(model_name: "nomic-ai/nomic-embed-text-v1.5")
```

### Supported Models

| Model | Dimensions | Size | Notes |
|-------|------------|------|-------|
| `BAAI/bge-small-en-v1.5` | 384 | 67MB | Default, fast |
| `BAAI/bge-base-en-v1.5` | 768 | 210MB | Better accuracy |
| `BAAI/bge-large-en-v1.5` | 1024 | 1.2GB | Best accuracy |
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | 90MB | General purpose |
| `sentence-transformers/all-mpnet-base-v2` | 768 | 440MB | High quality |
| `intfloat/multilingual-e5-small` | 384 | 450MB | 100+ languages |
| `intfloat/multilingual-e5-base` | 768 | 1.1GB | Multilingual, better accuracy |
| `nomic-ai/nomic-embed-text-v1.5` | 768 | 520MB | 8192 token context |
| `jinaai/jina-embeddings-v2-base-en` | 768 | 520MB | 8192 token context |

### Query vs Passage Embeddings

For asymmetric search (short queries, long documents), use the specialized methods:

```ruby
# For search queries
query_vectors = embedding.query_embed(["What is Ruby?"]).to_a

# For documents/passages
doc_vectors = embedding.passage_embed(documents).to_a
```

### Lazy Evaluation

Embeddings are generated lazily, which keeps memory use low on large datasets:

```ruby
# Process millions of documents without loading all vectors into memory
File.foreach("documents.txt").lazy.each_slice(1000) do |batch|
  embedding.embed(batch).each do |vector|
    store_in_database(vector)
  end
end
```

## Reranking

Rerankers score query-document pairs for more accurate relevance ranking. Use them after initial retrieval:

```ruby
reranker = Fastembed::TextCrossEncoder.new

query = "What is machine learning?"
documents = [
  "Machine learning is a branch of AI",
  "The weather is nice today",
  "Deep learning uses neural networks"
]

# Get raw scores (higher = more relevant)
scores = reranker.rerank(query: query, documents: documents)
# => [8.5, -10.2, 5.3]

# Get sorted results with metadata
results = reranker.rerank_with_scores(query: query, documents: documents, top_k: 2)
# => [
#   { document: "Machine learning is...", score: 8.5, index: 0 },
#   { document: "Deep learning uses...", score: 5.3, index: 2 }
# ]
```
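
A common pattern is two-stage retrieval: pull candidates with the fast dense embeddings, then rerank only those candidates with the cross-encoder. A minimal sketch built from the calls shown above; `corpus`, `query`, and the candidate count are placeholders:

```ruby
embedding = Fastembed::TextEmbedding.new
reranker = Fastembed::TextCrossEncoder.new

# Stage 1: dense retrieval - keep the 20 best candidates by dot product
doc_vectors = embedding.embed(corpus).to_a
query_vector = embedding.embed([query]).first
candidates = corpus.zip(doc_vectors)
                   .max_by(20) { |_doc, v| query_vector.zip(v).sum { |a, b| a * b } }
                   .map(&:first)

# Stage 2: let the slower, more accurate reranker order the finalists
top = reranker.rerank_with_scores(query: query, documents: candidates, top_k: 5)
```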

### Reranker Models

| Model | Size | Notes |
|-------|------|-------|
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | 80MB | Default, fast |
| `cross-encoder/ms-marco-MiniLM-L-12-v2` | 120MB | Better accuracy |
| `BAAI/bge-reranker-base` | 1.1GB | High accuracy |
| `BAAI/bge-reranker-large` | 2.2GB | Best accuracy |

## Sparse Embeddings

SPLADE models produce sparse vectors where each dimension corresponds to a vocabulary term. Great for hybrid search:

```ruby
sparse = Fastembed::TextSparseEmbedding.new

result = sparse.embed(["Ruby programming language"]).first
# => #<SparseEmbedding indices=[1234, 5678, ...] values=[0.8, 1.2, ...]>

result.indices # vocabulary token IDs with non-zero weights
result.values  # corresponding weights
result.nnz     # number of non-zero elements
```
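
Scoring two sparse vectors is a dot product over the indices they share. A sketch using only the `indices`/`values` accessors above; the helper itself is hypothetical, not part of the gem:

```ruby
# Dot product of two sparse embeddings: multiply weights at shared indices.
def sparse_dot(a, b)
  weights = a.indices.zip(a.values).to_h
  b.indices.zip(b.values).sum { |idx, val| (weights[idx] || 0.0) * val }
end

query_sparse = sparse.embed(["programming in Ruby"]).first
score = sparse_dot(query_sparse, result)
```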

### Hybrid Search

Combine dense and sparse embeddings for better results:

```ruby
dense = Fastembed::TextEmbedding.new
sparse = Fastembed::TextSparseEmbedding.new

documents = ["your documents here"]

# Generate both types of embeddings
dense_vectors = dense.embed(documents).to_a
sparse_vectors = sparse.embed(documents).to_a

# Store both in your vector database and combine scores at query time
```
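
One way to combine the two result lists at query time is reciprocal rank fusion, which sidesteps the fact that dense and sparse scores live on different scales. A sketch, assuming `dense_scores` and `sparse_scores` are per-document score arrays computed as in the sections above:

```ruby
# Reciprocal rank fusion: rank documents under each scorer and sum
# 1 / (k + rank); k = 60 is the conventional smoothing constant.
def rrf(score_lists, k: 60)
  fused = Hash.new(0.0)
  score_lists.each do |scores|
    ranking = scores.each_with_index.sort_by { |score, _i| -score }
    ranking.each_with_index { |(_score, doc_idx), rank| fused[doc_idx] += 1.0 / (k + rank + 1) }
  end
  fused.sort_by { |_idx, fused_score| -fused_score }
end

best_doc_idx, _ = rrf([dense_scores, sparse_scores]).first
```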

## Late Interaction (ColBERT)

ColBERT produces token-level embeddings for fine-grained matching:

```ruby
colbert = Fastembed::LateInteractionTextEmbedding.new

query = colbert.query_embed(["What is Ruby?"]).first
doc = colbert.embed(["Ruby is a programming language"]).first

# MaxSim scoring - sum of max similarities per query token
score = query.max_sim(doc)
```
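
For intuition: MaxSim takes each query token vector, finds its best-matching document token vector, and sums those maxima. The same computation over plain nested arrays (one row per token) looks like this; the standalone `max_sim` below is a sketch, not the gem's method:

```ruby
# MaxSim over nested arrays: for each query token, take the best
# dot product against any document token, then sum across the query.
def max_sim(query_tokens, doc_tokens)
  query_tokens.sum do |q|
    doc_tokens.map { |d| q.zip(d).sum { |a, b| a * b } }.max
  end
end
```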

### Late Interaction Models

| Model | Dimensions | Notes |
|-------|------------|-------|
| `colbert-ir/colbertv2.0` | 128 | Default |
| `jinaai/jina-colbert-v1-en` | 768 | 8192 token context |

## Image Embeddings

Convert images to vectors for visual search:

```ruby
# Requires the mini_magick gem
image_embed = Fastembed::ImageEmbedding.new

# From file paths
vectors = image_embed.embed(["photo1.jpg", "photo2.png"]).to_a

# From URLs
vectors = image_embed.embed(["https://example.com/image.jpg"]).to_a
```

### Image Models

| Model | Dimensions | Notes |
|-------|------------|-------|
| `Qdrant/clip-ViT-B-32-vision` | 512 | Default, CLIP |
| `Qdrant/resnet50-onnx` | 2048 | ResNet50 |
| `jinaai/jina-clip-v1` | 768 | Jina CLIP |

## Async Processing

Run embeddings in background threads:

```ruby
embedding = Fastembed::TextEmbedding.new

# Start async embedding
future = embedding.embed_async(large_document_list)

# Do other work...

# Get results when ready (blocks until complete)
vectors = future.value
```

### Parallel Processing

```ruby
# Process multiple batches concurrently
futures = documents.each_slice(1000).map do |batch|
  embedding.embed_async(batch)
end

# Wait for all and combine results
all_vectors = futures.flat_map(&:value)
```

### Future Methods

```ruby
future.complete?        # check if done
future.pending?         # check if still running
future.success?         # completed without error?
future.failure?         # completed with error?
future.error            # get the error if failed
future.wait(timeout: 5) # wait up to 5 seconds

# Chaining
future.then { |vectors| vectors.map(&:first) }
      .rescue { |e| puts "Error: #{e}" }
```

### Async Utilities

```ruby
# Wait for all futures
results = Fastembed::Async.all(futures)

# Get first completed result
result = Fastembed::Async.race(futures, timeout: 10)
```
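
Putting the pieces together: fan batches out, wait for everything, then inspect each future rather than letting one failure sink the whole job. A sketch built only from the methods listed above, assuming `Fastembed::Async.all` simply waits for completion:

```ruby
futures = documents.each_slice(1000).map { |batch| embedding.embed_async(batch) }
Fastembed::Async.all(futures)

vectors = futures.flat_map do |future|
  if future.success?
    future.value
  else
    warn "batch failed: #{future.error}"
    [] # skip failed batches (or re-enqueue them)
  end
end
```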

## Progress Tracking

Track progress for large embedding jobs:

```ruby
embedding = Fastembed::TextEmbedding.new

documents = Array.new(10_000) { "document text" }

embedding.embed(documents, batch_size: 256) do |progress|
  puts "Batch #{progress.current}/#{progress.total}"
  puts "#{(progress.percentage * 100).round}% complete"
  puts "~#{progress.documents_processed} documents processed"
end.to_a
```
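
The same callback can drive a single-line progress bar instead of printing three lines per batch. A plain-Ruby sketch, assuming `percentage` is a 0..1 fraction as in the example above:

```ruby
embedding.embed(documents, batch_size: 256) do |progress|
  filled = (progress.percentage * 30).round
  print "\r[#{'#' * filled}#{'-' * (30 - filled)}] #{(progress.percentage * 100).round}%"
end.to_a
puts
```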

## CLI

FastEmbed includes a command-line tool:

```bash
# List available models
fastembed list            # embedding models
fastembed list-reranker   # reranker models
fastembed list-sparse     # sparse models
fastembed list-image      # image models

# Get model info
fastembed info "BAAI/bge-small-en-v1.5"

# Pre-download a model
fastembed download "BAAI/bge-base-en-v1.5"

# Embed text (outputs JSON)
fastembed embed "Hello world" "Another text"

# Different output formats
fastembed embed -f ndjson "Hello world"
fastembed embed -f csv "Hello world"

# Read from file
fastembed embed -i documents.txt

# Use a different model
fastembed embed -m "BAAI/bge-base-en-v1.5" "Hello"

# Rerank documents
fastembed rerank "query" "doc1" "doc2" "doc3"

# Benchmark a model
fastembed benchmark -m "BAAI/bge-small-en-v1.5" -n 100
```

## Custom Models

Register custom models from HuggingFace:

```ruby
Fastembed.register_model(
  model_name: "my-org/my-model",
  dim: 768,
  description: "My custom model",
  sources: { hf: "my-org/my-model" },
  model_file: "onnx/model.onnx"
)

# Now use it like any other model
embedding = Fastembed::TextEmbedding.new(model_name: "my-org/my-model")
```

### Load from Local Directory

```ruby
embedding = Fastembed::TextEmbedding.new(
  local_model_dir: "/path/to/model",
  model_file: "model.onnx",
  tokenizer_file: "tokenizer.json"
)
```

## Configuration

### Initialization Options

```ruby
Fastembed::TextEmbedding.new(
  model_name: "BAAI/bge-small-en-v1.5",  # model to use
  cache_dir: "~/.cache/fastembed",       # where to store models
  threads: 4,                            # ONNX Runtime threads
  providers: ["CUDAExecutionProvider"],  # GPU acceleration
  show_progress: true,                   # show download progress
  quantization: :q4                      # use quantized model
)
```

### Quantization

Use smaller, faster models with quantization:

```ruby
# Available: :fp32 (default), :fp16, :int8, :uint8, :q4
embedding = Fastembed::TextEmbedding.new(quantization: :int8)
```

### Environment Variables

| Variable | Description |
|----------|-------------|
| `FASTEMBED_CACHE_PATH` | Custom model cache directory |
| `HF_TOKEN` | HuggingFace token for private models |

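Both are ordinary environment variables, so they can also be set from Ruby before the first model is created; a hypothetical setup (path and token are placeholders):

```ruby
ENV["FASTEMBED_CACHE_PATH"] = "/data/models"
ENV["HF_TOKEN"] = "hf_..." # only needed for private models

embedding = Fastembed::TextEmbedding.new
```
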
### GPU Acceleration

```ruby
# CUDA (Linux/Windows with NVIDIA GPU)
embedding = Fastembed::TextEmbedding.new(
  providers: ["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# CoreML (macOS)
embedding = Fastembed::TextEmbedding.new(
  providers: ["CoreMLExecutionProvider", "CPUExecutionProvider"]
)
```

## Performance

On Apple M1 Max with the default model (BAAI/bge-small-en-v1.5):

| Batch Size | Documents/sec | Latency |
|------------|---------------|---------|
| 1 | ~150 | ~6.5ms |
| 32 | ~500 | ~64ms |
| 256 | ~550 | ~465ms |

Larger models are slower but more accurate. See [BENCHMARKS.md](BENCHMARKS.md) for detailed comparisons.

## Requirements

- Ruby >= 3.3
- ~70MB-2GB disk space (varies by model)

## Acknowledgments