fastembed 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -0
- data/.yardopts +6 -0
- data/BENCHMARKS.md +124 -1
- data/CHANGELOG.md +14 -0
- data/README.md +395 -74
- data/benchmark/compare_all.rb +167 -0
- data/benchmark/compare_python.py +60 -0
- data/benchmark/memory_profile.rb +70 -0
- data/benchmark/profile.rb +198 -0
- data/benchmark/reranker_benchmark.rb +158 -0
- data/exe/fastembed +6 -0
- data/fastembed.gemspec +3 -0
- data/lib/fastembed/async.rb +193 -0
- data/lib/fastembed/base_model.rb +247 -0
- data/lib/fastembed/base_model_info.rb +61 -0
- data/lib/fastembed/cli.rb +745 -0
- data/lib/fastembed/custom_model_registry.rb +255 -0
- data/lib/fastembed/image_embedding.rb +313 -0
- data/lib/fastembed/late_interaction_embedding.rb +260 -0
- data/lib/fastembed/late_interaction_model_info.rb +91 -0
- data/lib/fastembed/model_info.rb +59 -19
- data/lib/fastembed/model_management.rb +82 -23
- data/lib/fastembed/onnx_embedding_model.rb +25 -4
- data/lib/fastembed/pooling.rb +39 -3
- data/lib/fastembed/progress.rb +52 -0
- data/lib/fastembed/quantization.rb +75 -0
- data/lib/fastembed/reranker_model_info.rb +91 -0
- data/lib/fastembed/sparse_embedding.rb +261 -0
- data/lib/fastembed/sparse_model_info.rb +80 -0
- data/lib/fastembed/text_cross_encoder.rb +217 -0
- data/lib/fastembed/text_embedding.rb +161 -28
- data/lib/fastembed/validators.rb +59 -0
- data/lib/fastembed/version.rb +1 -1
- data/lib/fastembed.rb +42 -1
- data/plan.md +257 -0
- data/scripts/verify_models.rb +229 -0
- metadata +70 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 954cdd87ba985d20a8a5bf0e5676178ebc891032f275fb4cb8cf547c0378a476
|
|
4
|
+
data.tar.gz: 4c11fbd4894906ab46a99746a2efa16f8a56e494a4fcd64012d642b992914918
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 948c3d705493fe26f3fd4fcb8b14965c0223b7ddd119307502b9af8f8503552b7693846a2a004ea97aca3d3cc5cd39f77017acd82ee2b48be56b879e6eaa1c7b
|
|
7
|
+
data.tar.gz: 1ceb5a0de19bc743b215332706d20facc49cda03f7e82211ff4306de1fb443d3f508f2b4b7a59dbdc43f3bbb720fe7accb80f342f45e7cc7cd0bf1c65fbb105e
|
data/.rubocop.yml
CHANGED
data/.yardopts
ADDED
data/BENCHMARKS.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Benchmarks
|
|
2
2
|
|
|
3
|
-
Performance benchmarks on Apple M1 Max, Ruby 3.3.
|
|
3
|
+
Performance benchmarks on Apple M1 Max, Ruby 3.3, Python 3.13 (January 2026).
|
|
4
4
|
|
|
5
5
|
## Single Document Latency
|
|
6
6
|
|
|
@@ -69,6 +69,98 @@ We tested CoreML execution provider to see if GPU/Neural Engine acceleration hel
|
|
|
69
69
|
|
|
70
70
|
**Recommendation:** Stick with the default CPU provider.
|
|
71
71
|
|
|
72
|
+
## Ruby vs Python FastEmbed
|
|
73
|
+
|
|
74
|
+
Comprehensive comparison of fastembed-rb against Python FastEmbed (v0.7.4) on Apple M1 Max.
|
|
75
|
+
|
|
76
|
+
### Text Embeddings (100 documents)
|
|
77
|
+
|
|
78
|
+
| Model | Ruby (docs/sec) | Python (docs/sec) | Ratio |
|
|
79
|
+
|-------|-----------------|-------------------|-------|
|
|
80
|
+
| BAAI/bge-small-en-v1.5 | 566 | 629 | 0.90x |
|
|
81
|
+
| BAAI/bge-base-en-v1.5 | 176 | 169 | **1.04x** |
|
|
82
|
+
| all-MiniLM-L6-v2 | 922 | 1309 | 0.70x |
|
|
83
|
+
|
|
84
|
+
Ruby is within 10-30% of Python for text embeddings. Both use the same ONNX Runtime backend.
|
|
85
|
+
|
|
86
|
+
### Rerankers (100 query-document pairs)
|
|
87
|
+
|
|
88
|
+
| Model | Ruby (pairs/sec) | Python (pairs/sec) | Ratio |
|
|
89
|
+
|-------|------------------|-------------------|-------|
|
|
90
|
+
| ms-marco-MiniLM-L-6-v2 | 986 | 982 | **1.00x** |
|
|
91
|
+
| ms-marco-MiniLM-L-12-v2 | 398 | 512 | 0.78x |
|
|
92
|
+
| BAAI/bge-reranker-base | 132 | 124 | **1.06x** |
|
|
93
|
+
|
|
94
|
+
Ruby matches or beats Python on rerankers.
|
|
95
|
+
|
|
96
|
+
### Sparse Embeddings - SPLADE (100 documents)
|
|
97
|
+
|
|
98
|
+
| Model | Ruby (docs/sec) | Python (docs/sec) | Ratio |
|
|
99
|
+
|-------|-----------------|-------------------|-------|
|
|
100
|
+
| Splade_PP_en_v1 | 23 | 108 | 0.21x |
|
|
101
|
+
|
|
102
|
+
Ruby's SPLADE implementation is slower due to post-processing overhead. Python uses optimized numpy operations for the log1p transformation.
|
|
103
|
+
|
|
104
|
+
### Late Interaction - ColBERT (100 documents)
|
|
105
|
+
|
|
106
|
+
| Model | Ruby (docs/sec) | Python (docs/sec) | Ratio |
|
|
107
|
+
|-------|-----------------|-------------------|-------|
|
|
108
|
+
| colbert-ir/colbertv2.0 | 191 | 184 | **1.04x** |
|
|
109
|
+
|
|
110
|
+
Ruby slightly outperforms Python for ColBERT embeddings.
|
|
111
|
+
|
|
112
|
+
### Image Embeddings (100 images)
|
|
113
|
+
|
|
114
|
+
| Model | Ruby (imgs/sec) | Python (imgs/sec) | Ratio |
|
|
115
|
+
|-------|-----------------|-------------------|-------|
|
|
116
|
+
| clip-ViT-B-32-vision | 9 | 42 | 0.22x |
|
|
117
|
+
|
|
118
|
+
Ruby's image embedding is slower due to MiniMagick subprocess overhead for image preprocessing. Python uses Pillow which is more efficient for batch processing.
|
|
119
|
+
|
|
120
|
+
### Summary
|
|
121
|
+
|
|
122
|
+
| Category | Ruby vs Python |
|
|
123
|
+
|----------|---------------|
|
|
124
|
+
| Text Embeddings | ~90% of Python speed |
|
|
125
|
+
| Rerankers | **Equal or faster** |
|
|
126
|
+
| ColBERT | **Equal or faster** |
|
|
127
|
+
| Sparse (SPLADE) | ~21% of Python speed |
|
|
128
|
+
| Image | ~22% of Python speed |
|
|
129
|
+
|
|
130
|
+
**Recommendation:** Ruby is excellent for text embeddings, reranking, and ColBERT. For heavy sparse or image embedding workloads, consider Python.
|
|
131
|
+
|
|
132
|
+
### Why the Differences?
|
|
133
|
+
|
|
134
|
+
Both implementations use the same ONNX Runtime for model inference. The differences come from:
|
|
135
|
+
|
|
136
|
+
1. **Text/Reranker/ColBERT** - Hot path is tokenization (Rust) + inference (C++). Minimal language overhead. Ruby matches Python.
|
|
137
|
+
|
|
138
|
+
2. **Sparse (SPLADE)** - Requires post-processing with log1p transformation. Python's numpy vectorization is faster than Ruby loops.
|
|
139
|
+
|
|
140
|
+
3. **Image** - Requires image preprocessing (resize, normalize). Python's Pillow is faster than Ruby's MiniMagick (subprocess-based).
|
|
141
|
+
|
|
142
|
+
### Memory Usage
|
|
143
|
+
|
|
144
|
+
| State | Ruby |
|
|
145
|
+
|-------|------|
|
|
146
|
+
| Initial | 33 MB |
|
|
147
|
+
| Model loaded | 277 MB |
|
|
148
|
+
| +1000 embeddings | 359 MB |
|
|
149
|
+
| After GC | 355 MB |
|
|
150
|
+
|
|
151
|
+
Memory is stable across multiple embedding rounds - no leaks detected.
|
|
152
|
+
|
|
153
|
+
### Embedding Quality
|
|
154
|
+
|
|
155
|
+
Both implementations produce identical embeddings (same ONNX models), verified by cosine similarity tests:
|
|
156
|
+
|
|
157
|
+
```
|
|
158
|
+
'dog' vs 'puppy' = 0.855 (high - PASS)
|
|
159
|
+
'dog' vs 'cat' = 0.688 (medium - PASS)
|
|
160
|
+
'machine learning' vs 'artificial intelligence' = 0.718 (high - PASS)
|
|
161
|
+
'machine learning' vs 'cooking recipes' = 0.426 (low - PASS)
|
|
162
|
+
```
|
|
163
|
+
|
|
72
164
|
## Running Your Own Benchmarks
|
|
73
165
|
|
|
74
166
|
```ruby
|
|
@@ -81,3 +173,34 @@ texts = Array.new(1000) { "Sample text for benchmarking" }
|
|
|
81
173
|
result = Benchmark.measure { embedding.embed(texts).to_a }
|
|
82
174
|
puts "#{1000 / result.real} docs/sec"
|
|
83
175
|
```
|
|
176
|
+
|
|
177
|
+
## Reranker Performance
|
|
178
|
+
|
|
179
|
+
TextCrossEncoder (cross-encoder) performance:
|
|
180
|
+
|
|
181
|
+
| Model | 100 pairs | Throughput |
|
|
182
|
+
|-------|-----------|------------|
|
|
183
|
+
| ms-marco-MiniLM-L-6-v2 | 102ms | **986 pairs/sec** |
|
|
184
|
+
| ms-marco-MiniLM-L-12-v2 | 252ms | **398 pairs/sec** |
|
|
185
|
+
| bge-reranker-base | 758ms | **132 pairs/sec** |
|
|
186
|
+
|
|
187
|
+
Cross-encoders are slower than embedding models because they process query-document pairs together rather than encoding them independently.
|
|
188
|
+
|
|
189
|
+
### Profiling Scripts
|
|
190
|
+
|
|
191
|
+
The `benchmark/` directory contains:
|
|
192
|
+
|
|
193
|
+
- `profile.rb` - Comprehensive embedding performance profiling
|
|
194
|
+
- `reranker_benchmark.rb` - Reranker/cross-encoder performance
|
|
195
|
+
- `memory_profile.rb` - Memory usage analysis
|
|
196
|
+
- `compare_python.py` - Python FastEmbed comparison
|
|
197
|
+
- `compare_all.rb` - Unified Ruby vs Python comparison
|
|
198
|
+
|
|
199
|
+
Run with:
|
|
200
|
+
```bash
|
|
201
|
+
ruby benchmark/profile.rb
|
|
202
|
+
ruby benchmark/reranker_benchmark.rb
|
|
203
|
+
ruby benchmark/memory_profile.rb
|
|
204
|
+
ruby benchmark/compare_all.rb
|
|
205
|
+
python3 benchmark/compare_python.py
|
|
206
|
+
```
|
data/CHANGELOG.md
CHANGED
|
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
|
|
7
7
|
|
|
8
8
|
## [Unreleased]
|
|
9
9
|
|
|
10
|
+
### Added
|
|
11
|
+
|
|
12
|
+
- `TextCrossEncoder` class for reranking query-document pairs
|
|
13
|
+
- Support for cross-encoder/reranker models:
|
|
14
|
+
- cross-encoder/ms-marco-MiniLM-L-6-v2 (default)
|
|
15
|
+
- cross-encoder/ms-marco-MiniLM-L-12-v2
|
|
16
|
+
- BAAI/bge-reranker-base
|
|
17
|
+
- BAAI/bge-reranker-large
|
|
18
|
+
- jinaai/jina-reranker-v1-turbo-en
|
|
19
|
+
- `rerank` method for scoring query-document pairs
|
|
20
|
+
- `rerank_with_scores` method for sorted results with top_k support
|
|
21
|
+
- CLI tool (`fastembed`) with `embed` and `list-models` commands
|
|
22
|
+
- Comprehensive benchmark suite comparing Ruby vs Python performance
|
|
23
|
+
|
|
10
24
|
## [1.0.0] - 2025-01-08
|
|
11
25
|
|
|
12
26
|
### Added
|