mlx-raclate 0.1.0b1__py3-none-any.whl
- mlx_raclate/__init__.py +1 -0
- mlx_raclate/models/__init__.py +0 -0
- mlx_raclate/models/base.py +225 -0
- mlx_raclate/models/gemma3_text.py +913 -0
- mlx_raclate/models/lfm2.py +671 -0
- mlx_raclate/models/modernbert.py +900 -0
- mlx_raclate/models/qwen3.py +582 -0
- mlx_raclate/models/t5gemma_encoder.py +857 -0
- mlx_raclate/py.typed +0 -0
- mlx_raclate/tuner/TUNER.md +305 -0
- mlx_raclate/tuner/__init__.py +0 -0
- mlx_raclate/tuner/collators.py +291 -0
- mlx_raclate/tuner/datasets.py +247 -0
- mlx_raclate/tuner/model_card_utils.py +206 -0
- mlx_raclate/tuner/trainer.py +648 -0
- mlx_raclate/tuner/utils.py +292 -0
- mlx_raclate/utils/__init__.py +0 -0
- mlx_raclate/utils/server.py +390 -0
- mlx_raclate/utils/tokenizer_utils.py +353 -0
- mlx_raclate/utils/train.py +249 -0
- mlx_raclate/utils/utils.py +625 -0
- mlx_raclate-0.1.0b1.dist-info/METADATA +216 -0
- mlx_raclate-0.1.0b1.dist-info/RECORD +25 -0
- mlx_raclate-0.1.0b1.dist-info/WHEEL +4 -0
- mlx_raclate-0.1.0b1.dist-info/licenses/LICENSE +19 -0
@@ -0,0 +1,216 @@
Metadata-Version: 2.4
Name: mlx-raclate
Version: 0.1.0b1
Summary: Raclate is a python library to run and train models for Retrieval and Classification, built on top of MLX.
Author-email: pappitti <pap@pitti.io>
License-File: LICENSE
Requires-Python: >=3.13
Requires-Dist: datasets>=4.4.1
Requires-Dist: fastapi>=0.121.3
Requires-Dist: huggingface-hub>=0.36.0
Requires-Dist: jinja2>=3.1.6
Requires-Dist: mlx>=0.29.0
Requires-Dist: transformers>=4.57.1
Requires-Dist: uvicorn>=0.38.0
Description-Content-Type: text/markdown

# RACLATE (MLX)

**R**etrieval **A**nd **C**lassification including **LATE** interaction on Apple Silicon.

`mlx-raclate` is a versatile library built on Apple's [MLX](https://github.com/ml-explore/mlx) framework. It provides a unified interface to **train** and **run** classifiers and embedding models - including ModernBERT and Late Interaction (ColBERT-style) models - natively on macOS.

> **Note:** This project evolved from `modernbert-mlx` to support a wider range of architectures and tasks. It is currently feature-complete but in an early release stage; bugs may occur.

## Key Features

* **Apple Silicon Native:** Fully optimized for M-series chips using MLX.
* **Unified Pipeline:** A single interface to load and run Masked LM, Text Classification, and Sentence Similarity tasks.
* **Late Interaction Support:** First-class support for **MaxSim** (ColBERT-style) retrieval, particularly with LFM2 and ModernBERT architectures.
* **Full Fine-Tuning:** Specialized trainer for fine-tuning small-to-mid-sized models (ModernBERT, Qwen2.5/3, LFM2, Gemma) on local hardware.

## Installation

Install via `uv` or `pip`:

```bash
uv add --prerelease=allow mlx-raclate
# or
pip install --pre mlx-raclate
```

From source:

```bash
git clone https://github.com/pappitti/mlx-raclate.git
cd mlx-raclate
uv sync
```

## Supported Architectures

`mlx-raclate` supports architectures specifically useful for efficient local retrieval and classification:

* **ModernBERT**: (e.g., `answerdotai/ModernBERT-base`)
* **LFM2**: Liquid Foundation Models (e.g., `LiquidAI/LFM2-350M`, `LiquidAI/LFM2-ColBERT-350M`)
* **Qwen3 Embedding**: (e.g., `Qwen/Qwen3-Embedding-0.6B`)
* **Gemma3 Embedding**: (e.g., `google/embeddinggemma-300m`)
* **T5Gemma Encoder**: the encoder of T5Gemma models, with the decoder stripped out (e.g., `google/t5gemma-2b-2b-ul2`)

## Inference: Quick Start

The library uses a `pipeline` concept similar to Hugging Face Transformers. You can specify a pipeline manually, or let the loader infer it from the model configuration.
If no pipeline is found, the Model class is loaded, which returns normalized embeddings.

### 1. Text Classification
Supports multi-class, multi-label, and regression tasks.

```python
from mlx_raclate.utils.utils import load
import mlx.core as mx

# Load model (pipeline inferred automatically if architecture matches)
model, tokenizer = load("NousResearch/Minos-v1", pipeline="text-classification")

texts = ["How do I build a bomb?", "What is the capital of France?"]

# Batch tokenize
inputs = tokenizer._tokenizer(texts, return_tensors="mlx", padding=True, truncation=True)

# Run Inference
outputs = model(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask']
)

# Get probabilities
probs = outputs["probabilities"]
# ... process argmax/topk
```
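
The snippet above stops at `# ... process argmax/topk`. As a minimal sketch of that last step, the following ranks labels by probability for each input. It assumes the probabilities have already been converted to nested Python lists (for instance with `probs.tolist()` on an MLX array); the label names and values are hypothetical placeholders, not the actual labels or outputs of any model.

```python
def top_k(probabilities, labels, k=2):
    """Return the k most likely (label, probability) pairs for one example."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical label names and probabilities for two inputs;
# real values depend on the model's config and its outputs.
labels = ["label_a", "label_b"]
batch_probs = [[0.91, 0.09], [0.03, 0.97]]

# Argmax is simply top-k with k=1.
predictions = [top_k(row, labels, k=1)[0] for row in batch_probs]
for label, prob in predictions:
    print(f"{label}: {prob:.2f}")
```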

### 2. Sentence Similarity (Dense Retrieval)
2.1. Standard Bi-Encoder approach using Cosine Similarity.

```python
from mlx_raclate.utils.utils import load

model, tokenizer = load("nomic-ai/modernbert-embed-base", pipeline="sentence-similarity")

queries = ["What is MLX?"]
docs = ["MLX is an array framework for Apple Silicon."]

# Encode
q_input = tokenizer._tokenizer(queries, return_tensors="mlx", padding=True)
d_input = tokenizer._tokenizer(docs, return_tensors="mlx", padding=True)

# Forward pass calculates similarity matrix automatically
outputs = model(
    input_ids=q_input['input_ids'],
    reference_input_ids=d_input['input_ids'],
    attention_mask=q_input['attention_mask'],
    reference_attention_mask=d_input['attention_mask']
)

print(outputs['similarities']) # Cosine similarity matrix
```
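
To make the shape of that similarity matrix concrete, here is a dependency-free sketch of cosine similarity between pooled embeddings: one row per query, one column per document. The helper names and the toy 2-dimensional vectors are illustrative assumptions; the library computes this on MLX arrays and its internals may differ.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cosine_similarity_matrix(query_embeddings, doc_embeddings):
    """Return a matrix where entry [i][j] is cos(query_i, doc_j)."""
    queries = [l2_normalize(q) for q in query_embeddings]
    docs = [l2_normalize(d) for d in doc_embeddings]
    return [[sum(a * b for a, b in zip(q, d)) for d in docs] for q in queries]


# Toy pooled embeddings (hypothetical values, 2 queries x 2 documents).
query_embeddings = [[1.0, 0.0], [1.0, 1.0]]
doc_embeddings = [[2.0, 0.0], [0.0, 3.0]]

similarities = cosine_similarity_matrix(query_embeddings, doc_embeddings)
print(similarities)
```

Because both sides are normalized first, vector length does not affect the score; only direction does.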

2.2. Late Interaction (ColBERT / MaxSim)

By enabling `use_late_interaction`, the model computes **MaxSim** scores (interaction between all token embeddings) instead of standard Cosine similarity of pooled embeddings.

This is ideal for models like **LFM2-ColBERT**, but it works with any model.

```python
from mlx_raclate.utils.utils import load

# Load a ColBERT-style model
model, tokenizer = load(
    "LiquidAI/LFM2-ColBERT-350M",
    pipeline="sentence-similarity",
    model_config={"use_late_interaction": True} # <--- Enables MaxSim
)

queries = ["Who creates liquid neural networks?"]
docs = ["Liquid AI is a company founded by researchers from MIT..."]

# Tokenize
q_input = tokenizer._tokenizer(queries, return_tensors="mlx", padding=True)
d_input = tokenizer._tokenizer(docs, return_tensors="mlx", padding=True)

# The model keeps embeddings unpooled and computes MaxSim
outputs = model(
    input_ids=q_input['input_ids'],
    reference_input_ids=d_input['input_ids'],
    attention_mask=q_input['attention_mask'],
    reference_attention_mask=d_input['attention_mask']
)

print("MaxSim Scores:", outputs['similarities'])
```
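
For readers unfamiliar with MaxSim, the scoring rule can be sketched in a few lines: each query token takes its best match among the document's tokens, and those maxima are summed. This is the generic ColBERT-style formulation written in plain Python with hypothetical toy vectors; the library's own implementation runs on MLX arrays, must also ignore padded positions, and may normalize or scale scores differently.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy unit-length token embeddings (hypothetical values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]   # has a good match for both query tokens
doc_b = [[0.0, -1.0]]              # matches neither query token

score_a = maxsim(query_tokens, doc_a)
score_b = maxsim(query_tokens, doc_b)
print(score_a, score_b)
```

Unlike pooled cosine similarity, the score depends on token-level matches, so a document can rank highly when different parts of it answer different parts of the query.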

## Pipelines Reference

When using `load()`, the `pipeline` argument determines the class and return types. If not provided, `mlx-raclate` attempts to infer it from the `config.json`.

| Pipeline | Class | Output | Use Case |
| :--- | :--- | :--- | :--- |
| `embeddings` | `Model` | Raw Embeddings | Feature extraction |
| `text-classification` | `ModelForSequenceClassification` | Logits/Probs | Sentiment, Intent, Regression |
| `sentence-similarity` | `ModelForSentenceSimilarity` | Embeddings & Similarity | Semantic Search, RAG |
| `sentence-transformers` | `ModelForSentenceTransformers` | Embeddings & Similarity | Same as `sentence-similarity` but different sanitization strategy for Sentence Transformers weights |
| `masked-lm` | `ModelForMaskedLM` | Token Logits | Domain adaptation, MLM training |
| `token-classification` | `ModelForTokenClassification` | Token Logits | NER tasks |
| `zero-shot-classification` | `ModelForMaskedLM` | Token Logits | Implementation of [this AnswerAI paper](https://arxiv.org/html/2502.03793v2) |
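
The table can be read as a simple lookup from pipeline name to class name. The sketch below only illustrates that mapping: `PIPELINE_CLASSES` and `resolve_class_name` are hypothetical names, not part of the `mlx-raclate` API, and raising on an unknown name is an assumption of this sketch rather than documented library behavior.

```python
# Pipeline name -> class name, copied from the table above.
PIPELINE_CLASSES = {
    "embeddings": "Model",
    "text-classification": "ModelForSequenceClassification",
    "sentence-similarity": "ModelForSentenceSimilarity",
    "sentence-transformers": "ModelForSentenceTransformers",
    "masked-lm": "ModelForMaskedLM",
    "token-classification": "ModelForTokenClassification",
    "zero-shot-classification": "ModelForMaskedLM",
}


def resolve_class_name(pipeline=None):
    """Return the class name for a pipeline; no pipeline falls back to `Model`."""
    if pipeline is None:
        return "Model"
    try:
        return PIPELINE_CLASSES[pipeline]
    except KeyError:
        # Assumption of this sketch: unknown names are rejected explicitly.
        raise ValueError(f"Unknown pipeline: {pipeline!r}") from None


print(resolve_class_name("sentence-similarity"))
print(resolve_class_name())
```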

Detailed code for each pipeline is available in the `tests/inference_examples` directory of this repository.

## Server

`mlx-raclate` includes a FastAPI server for classifier inference. See `mlx_raclate.utils.server`.

## Training (Tuner)

`mlx-raclate` includes a robust training engine specifically designed for fine-tuning these architectures on Apple Silicon.

It supports:
* **Full Fine-tuning** (LoRA is not currently supported/needed for these model sizes).
* **Tasks:** Text Classification, Sentence Similarity (Bi-Encoder & Late Interaction), and Masked LM.
* **Efficiency:** Gradient Accumulation, Gradient Checkpointing, and Smart Collation.
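
Of the efficiency features listed, gradient accumulation is the easiest to illustrate in isolation: gradients from several small micro-batches are averaged and applied in a single update, which emulates a larger batch without holding it in memory at once. The sketch below uses a one-parameter toy model in plain Python; the function names and data are hypothetical, and it does not reflect how the `mlx-raclate` trainer is actually implemented.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, micro_batches, learning_rate):
    """Average gradients over several micro-batches, then update once."""
    accumulated = 0.0
    for batch in micro_batches:
        accumulated += mse_gradient(w, batch) / len(micro_batches)
    return w - learning_rate * accumulated


# Toy data following y = 2x (hypothetical values).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.0

# One step on the full batch of 4 ...
full_batch_w = w - 0.01 * mse_gradient(w, data)
# ... versus one step accumulated over 2 micro-batches of 2.
accumulated_w = accumulated_step(w, [data[:2], data[2:]], learning_rate=0.01)
print(full_batch_w, accumulated_w)
```

With equally sized micro-batches the two updates coincide, which is why accumulation is a drop-in way to raise the effective batch size.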

For detailed training documentation, supported datasets, and CLI usage, please see [TUNER.md](src/mlx_raclate/tuner/TUNER.md).

### Quick Training Snippet

```python
from mlx_raclate.tuner.trainer import Trainer, TrainingArgs
from mlx_raclate.utils.utils import load

# Load model
model, tokenizer = load("Qwen/Qwen3-Embedding-0.6B", pipeline="text-classification", train=True)

# Define Args
args = TrainingArgs(
    output_dir="outputs/my_classifier",
    learning_rate=1e-5,
    num_train_epochs=3,
    batch_size=4
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    task_type="text-classification",
    training_args=args,
    train_dataset=train_dataset, # See TUNER.md for dataset formatting
    eval_dataset=eval_dataset
)

trainer.train()
```

## Acknowledgements

* [MLX](https://github.com/ml-explore/mlx) team for the framework.
* [Transformers](https://github.com/huggingface/transformers) for the configuration standards.
* [MLX-Embeddings](https://github.com/Blaizzy/mlx-embeddings) for inspiration on broader embeddings architecture. MLX-Raclate focuses on longer-context models but you should definitely look there for BERT, XLM_RoBERTa and image embeddings.
* [PyLate](https://github.com/lightonai/pylate) for inspiration on Late Interaction mechanics.
@@ -0,0 +1,25 @@
mlx_raclate/__init__.py,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
mlx_raclate/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
mlx_raclate/models/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
mlx_raclate/models/base.py,sha256=0V8CfjWIoeZVwd7oWV5M3lif4k9P2R-DjVYaL5yqaSE,8794
mlx_raclate/models/gemma3_text.py,sha256=daSVYlKs4OITUuuI5KJPVSSkjfske3YLlYHjQ9DNMHU,32519
mlx_raclate/models/lfm2.py,sha256=o_TmjnZR3o35YG-3n3u2no4cVUfUvTOfQElAnC4S5yA,22885
mlx_raclate/models/modernbert.py,sha256=esMexSVnjvqz8-bcdamXSw0-pAP_lL1wws7Gk9OvLqM,33486
mlx_raclate/models/qwen3.py,sha256=4xiJyXs9vvIF_NBini-eYscesYgADZa_KEzZPerRaDY,20339
mlx_raclate/models/t5gemma_encoder.py,sha256=Q24jMAYtSB6DYwmF1XJ5lxQsO0FACH836jwTHIF_Co0,30377
mlx_raclate/tuner/TUNER.md,sha256=w_FgVIrV2U3ob_bwJwmGKDBWzw_7bS14wozAosas3x0,14983
mlx_raclate/tuner/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
mlx_raclate/tuner/collators.py,sha256=eELZWrwYTDUXaJTj4Go2pwnKhlS6yzdS8RNLvFsOhbU,11462
mlx_raclate/tuner/datasets.py,sha256=xnpTCTqpX8tIrb71E47E055wzxUSBDS3-uCGk3WAoYE,11212
mlx_raclate/tuner/model_card_utils.py,sha256=XeYia6OMCdZmt1k7NpLTfnpnlODTk3ztCz-2PjCYRMU,6466
mlx_raclate/tuner/trainer.py,sha256=WU9RkYQxhCtjhaLKIzOAhDOQQ6XHGgqFnuIJfUvZHPc,26734
mlx_raclate/tuner/utils.py,sha256=Vo95eX0NWTuRXv4pshRp-VTcnaYLtHprRXiR0snndQY,9879
mlx_raclate/utils/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
mlx_raclate/utils/server.py,sha256=2BnwGyG0YnJADqMrsvpN5DPyTEANozeLaaOACpbICfU,15541
mlx_raclate/utils/tokenizer_utils.py,sha256=vrk5tVS0GfeP8OKgGGyI2eS4J2wOcxaBy5jqPRSXs3c,11415
mlx_raclate/utils/train.py,sha256=p7jTCTQXN7I5yc_LkdzOY84YaYaVTlhfpjGP3zq8k_U,12929
mlx_raclate/utils/utils.py,sha256=2SszegS5ppmREfIdgr7xKweY1R5u00PoyYnaNI1BG2w,23526
mlx_raclate-0.1.0b1.dist-info/METADATA,sha256=SvOVI0DjIQ7B4FTDbES61lqUQF_pSBEb2rlO20HODNc,8344
mlx_raclate-0.1.0b1.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
mlx_raclate-0.1.0b1.dist-info/licenses/LICENSE,sha256=GEfg4GmBQu1DR8FEGp-oHI-93USx2LvNXjZH-ZF1nX8,1035
mlx_raclate-0.1.0b1.dist-info/RECORD,,
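
Each RECORD entry above has the form `path,algorithm=digest,size`, where the digest is URL-safe Base64 with the trailing `=` padding removed. As a minimal sketch of how such an entry could be checked, the following parses one line and recomputes the hash; the helper names are hypothetical. The four empty files (size `0`) all share the digest of empty input, which makes `py.typed` a convenient test case.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """SHA-256 digest encoded as URL-safe Base64 without '=' padding."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record_line(line):
    """Split a RECORD line into (path, algorithm, digest, size)."""
    path, hash_field, size = line.rsplit(",", 2)
    algorithm, _, value = hash_field.partition("=")
    return path, algorithm, value, int(size) if size else None


entry = "mlx_raclate/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
path, algorithm, expected, size = parse_record_line(entry)

# py.typed is empty (size 0), so its digest is the hash of zero bytes.
matches = record_hash(b"") == expected
print(path, algorithm, size, matches)
```

The RECORD file's own entry ends in `,,` because it cannot contain a hash of itself; the parser above returns empty strings and `None` for that line.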
@@ -0,0 +1,19 @@
MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.