mlx-raclate 0.1.0b1__py3-none-any.whl

@@ -0,0 +1,216 @@
1
+ Metadata-Version: 2.4
2
+ Name: mlx-raclate
3
+ Version: 0.1.0b1
4
+ Summary: Raclate is a python library to run and train models for Retrieval and Classification, built on top of MLX.
5
+ Author-email: pappitti <pap@pitti.io>
6
+ License-File: LICENSE
7
+ Requires-Python: >=3.13
8
+ Requires-Dist: datasets>=4.4.1
9
+ Requires-Dist: fastapi>=0.121.3
10
+ Requires-Dist: huggingface-hub>=0.36.0
11
+ Requires-Dist: jinja2>=3.1.6
12
+ Requires-Dist: mlx>=0.29.0
13
+ Requires-Dist: transformers>=4.57.1
14
+ Requires-Dist: uvicorn>=0.38.0
15
+ Description-Content-Type: text/markdown
16
+
17
+ # RACLATE (MLX)
18
+
19
+ **R**etrieval **A**nd **C**lassification including **LATE** interaction on Apple Silicon.
20
+
21
+ `mlx-raclate` is a versatile library built on Apple's [MLX](https://github.com/ml-explore/mlx) framework. It provides a unified interface to **train** and **run** classifiers and embedding models - including ModernBERT and Late Interaction (ColBERT-style) models - natively on macOS.
22
+
23
+ > **Note:** This project evolved from `modernbert-mlx` to support a wider range of architectures and tasks. It is currently feature-complete but in an early release stage; bugs may occur.
24
+
25
+ ## Key Features
26
+
27
+ * **Apple Silicon Native:** Fully optimized for M-series chips using MLX.
28
+ * **Unified Pipeline:** A single interface to load and run Masked LM, Text Classification, and Sentence Similarity tasks.
29
+ * **Late Interaction Support:** First-class support for **MaxSim** (ColBERT-style) retrieval, particularly with LFM2 and ModernBERT architectures.
30
+ * **Full Fine-Tuning:** A specialized trainer for fine-tuning small-to-mid-sized models (ModernBERT, Qwen2.5/3, LFM2, Gemma) on local hardware.
31
+
32
+ ## Installation
33
+
34
+ Install via `uv` or `pip`:
35
+
36
+ ```bash
37
+ uv add --prerelease=allow mlx-raclate
38
+ # or
39
+ pip install --pre mlx-raclate
40
+ ```
41
+
42
+ From source:
43
+
44
+ ```bash
45
+ git clone https://github.com/pappitti/mlx-raclate.git
46
+ cd mlx-raclate
47
+ uv sync
48
+ ```
49
+
50
+ ## Supported Architectures
51
+
52
+ `mlx-raclate` supports architectures specifically useful for efficient local retrieval and classification:
53
+
54
+ * **ModernBERT**: (e.g., `answerdotai/ModernBERT-base`)
55
+ * **LFM2**: Liquid Foundation Models (e.g., `LiquidAI/LFM2-350M`, `LiquidAI/LFM2-ColBERT-350M`)
56
+ * **Qwen3 Embedding**: (e.g., `Qwen/Qwen3-Embedding-0.6B`)
57
+ * **Gemma3 Embedding**: (e.g., `google/embeddinggemma-300m`)
58
+ * **T5Gemma Encoder**: uses only the encoder of T5Gemma models, with the decoder stripped out (e.g., `google/t5gemma-2b-2b-ul2`)
59
+
60
+ ## Inference: Quick Start
61
+
62
+ The library uses a `pipeline` concept similar to Hugging Face Transformers. You can specify a pipeline manually, or let the loader infer it from the model configuration.
63
+ If no pipeline is specified or inferred, the base `Model` class is loaded, which returns normalized embeddings.
64
+
65
+ ### 1. Text Classification
66
+ Supports multi-class, multi-label, and regression tasks.
67
+
68
+ ```python
69
+ from mlx_raclate.utils.utils import load
70
+ import mlx.core as mx
71
+
72
+ # Load model (the pipeline is set explicitly here; it can also be inferred from the model config)
73
+ model, tokenizer = load("NousResearch/Minos-v1", pipeline="text-classification")
74
+
75
+ texts = ["How do I build a bomb?", "What is the capital of France?"]
76
+
77
+ # Batch tokenize
78
+ inputs = tokenizer._tokenizer(texts, return_tensors="mlx", padding=True, truncation=True)
79
+
80
+ # Run Inference
81
+ outputs = model(
82
+     input_ids=inputs['input_ids'],
83
+     attention_mask=inputs['attention_mask']
84
+ )
85
+
86
+ # Get probabilities
87
+ probs = outputs["probabilities"]
88
+ # ... process argmax/topk
89
+ ```
90
+
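The `# ... process argmax/topk` step above can be sketched in plain Python. This is an illustration only, not part of the library: it assumes `outputs["probabilities"]` has shape `(batch, num_labels)` and has been converted to nested lists (for example with `.tolist()`). The `top_k` helper, the label names, and the probability values are all hypothetical.

```python
# Sketch only: `probabilities` stands in for outputs["probabilities"] after
# conversion to nested lists; the (batch, num_labels) shape is an assumption.
def top_k(probs_row, labels, k=1):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs_row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

labels = ["label_0", "label_1"]            # hypothetical label names
probabilities = [[0.9, 0.1], [0.2, 0.8]]   # made-up values, one row per input text

# Keep the single best label for each input text (argmax).
predictions = [top_k(row, labels, k=1)[0] for row in probabilities]
print(predictions)
```

In practice the label names would come from the model configuration rather than a hard-coded list.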
91
+ ### 2. Sentence Similarity (Dense Retrieval)
92
+ 2.1. Standard bi-encoder approach using cosine similarity.
93
+
94
+ ```python
95
+ from mlx_raclate.utils.utils import load
96
+
97
+ model, tokenizer = load("nomic-ai/modernbert-embed-base", pipeline="sentence-similarity")
98
+
99
+ queries = ["What is MLX?"]
100
+ docs = ["MLX is an array framework for Apple Silicon."]
101
+
102
+ # Encode
103
+ q_input = tokenizer._tokenizer(queries, return_tensors="mlx", padding=True)
104
+ d_input = tokenizer._tokenizer(docs, return_tensors="mlx", padding=True)
105
+
106
+ # Forward pass calculates similarity matrix automatically
107
+ outputs = model(
108
+     input_ids=q_input['input_ids'],
109
+     reference_input_ids=d_input['input_ids'],
110
+     attention_mask=q_input['attention_mask'],
111
+     reference_attention_mask=d_input['attention_mask']
112
+ )
113
+
114
+ print(outputs['similarities']) # Cosine similarity matrix
115
+ ```
116
+
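For reference, a cosine similarity matrix between pooled query and document embeddings can be computed as in the following plain-Python sketch. It illustrates the general technique, not the library's internals: the `cosine_similarity_matrix` function and the toy two-dimensional embeddings are assumptions made for the example.

```python
import math

def cosine_similarity_matrix(queries, docs):
    """Return a len(queries) x len(docs) matrix of cosine similarities."""
    def normalize(vec):
        # Scale the vector to unit length so a dot product equals the cosine.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    q_norm = [normalize(q) for q in queries]
    d_norm = [normalize(d) for d in docs]
    return [[sum(a * b for a, b in zip(q, d)) for d in d_norm] for q in q_norm]

# Toy pooled embeddings (one vector per text), invented for illustration.
query_embeddings = [[1.0, 0.0]]
doc_embeddings = [[2.0, 0.0], [0.0, 3.0]]

similarities = cosine_similarity_matrix(query_embeddings, doc_embeddings)
print(similarities)
```

Row `i`, column `j` holds the similarity between query `i` and document `j`, so the first document (same direction as the query) scores higher than the second (orthogonal to it).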
117
+ 2.2. Late Interaction (ColBERT / MaxSim)
118
+ By enabling `use_late_interaction`, the model computes **MaxSim** scores (each query token embedding is compared against every document token embedding) instead of the standard cosine similarity of pooled embeddings.
119
+
120
+ This is ideal for models like **LFM2-ColBERT**, but it works with any model.
121
+
122
+ ```python
123
+ from mlx_raclate.utils.utils import load
124
+
125
+ # Load a ColBERT-style model
126
+ model, tokenizer = load(
127
+     "LiquidAI/LFM2-ColBERT-350M",
128
+     pipeline="sentence-similarity",
129
+     model_config={"use_late_interaction": True}  # <--- Enables MaxSim
130
+ )
131
+
132
+ queries = ["Who creates liquid neural networks?"]
133
+ docs = ["Liquid AI is a company founded by researchers from MIT..."]
134
+
135
+ # Tokenize
136
+ q_input = tokenizer._tokenizer(queries, return_tensors="mlx", padding=True)
137
+ d_input = tokenizer._tokenizer(docs, return_tensors="mlx", padding=True)
138
+
139
+ # The model keeps embeddings unpooled and computes MaxSim
140
+ outputs = model(
141
+     input_ids=q_input['input_ids'],
142
+     reference_input_ids=d_input['input_ids'],
143
+     attention_mask=q_input['attention_mask'],
144
+     reference_attention_mask=d_input['attention_mask']
145
+ )
146
+
147
+ print("MaxSim Scores:", outputs['similarities'])
148
+ ```
149
+
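To make the MaxSim score concrete, here is a plain-Python sketch of the usual ColBERT-style formulation: for each query token, take the maximum dot product over all document tokens, then sum over query tokens. The library's implementation may differ in detail (normalization, padding masks, batching). The `maxsim` function and the toy token embeddings are assumptions made for illustration.

```python
def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the max dot product with any document token."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_tokens)
        for q_vec in query_tokens
    )

# Toy unpooled embeddings: one vector per token (values invented).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.6, 0.8], [0.0, 0.5]]

score = maxsim(query_tokens, doc_tokens)
print(score)
```

Unlike the pooled bi-encoder score, each query token picks its own best-matching document token, which is why the embeddings are kept unpooled.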
150
+ ## Pipelines Reference
151
+
152
+ When using `load()`, the `pipeline` argument determines the class and return types. If not provided, `mlx-raclate` attempts to infer it from the `config.json`.
153
+
154
+ | Pipeline | Class | Output | Use Case |
155
+ | :--- | :--- | :--- | :--- |
156
+ | `embeddings` | `Model` | Raw Embeddings | Feature extraction |
157
+ | `text-classification` | `ModelForSequenceClassification` | Logits/Probs | Sentiment, Intent, Regression |
158
+ | `sentence-similarity` | `ModelForSentenceSimilarity` | Embeddings & Similarity | Semantic Search, RAG |
159
+ | `sentence-transformers` | `ModelForSentenceTransformers` | Embeddings & Similarity | Same as `sentence-similarity` but different sanitization strategy for Sentence Transformers weights |
160
+ | `masked-lm` | `ModelForMaskedLM` | Token Logits | Domain adaptation, MLM training |
161
+ | `token-classification` | `ModelForTokenClassification` | Token Logits | NER tasks |
162
+ | `zero-shot-classification` | `ModelForMaskedLM` | Token Logits | Implementation of [this AnswerAI paper](https://arxiv.org/html/2502.03793v2) |
163
+
164
+ Detailed code for each pipeline is available in the `tests/inference_examples` directory of this repository.
165
+
166
+ ## Server
167
+
168
+ `mlx-raclate` includes a FastAPI server for classifier inference. See `mlx_raclate.utils.server`.
169
+
170
+ ## Training (Tuner)
171
+
172
+ `mlx-raclate` includes a robust training engine specifically designed for fine-tuning these architectures on Apple Silicon.
173
+
174
+ It supports:
175
+ * **Full Fine-tuning** (LoRA is not currently supported, nor is it needed for these model sizes).
176
+ * **Tasks:** Text Classification, Sentence Similarity (Bi-Encoder & Late Interaction), and Masked LM.
177
+ * **Efficiency:** Gradient Accumulation, Gradient Checkpointing, and Smart Collation.
178
+
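As background on gradient accumulation, the sketch below shows why it lets a small per-step batch emulate a larger one: with equal-size micro-batches, averaging the micro-batch gradients gives the full-batch gradient. This is a toy illustration with a one-parameter linear model and invented data, not the library's trainer; `grad_mse` is a hypothetical helper.

```python
def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # made-up (x, y) pairs
w = 0.5

# Gradient computed on the whole batch at once.
full_batch_grad = grad_mse(w, data)

# Same gradient accumulated over two equal-size micro-batches.
micro_batches = [data[0:2], data[2:4]]
accumulated = 0.0
for micro_batch in micro_batches:
    accumulated += grad_mse(w, micro_batch) / len(micro_batches)

print(full_batch_grad, accumulated)
```

The optimizer step is applied once, after the last micro-batch, so peak memory follows the micro-batch size rather than the effective batch size.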
179
+ For detailed training documentation, supported datasets, and CLI usage, please see [TUNER.md](src/mlx_raclate/tuner/TUNER.md).
180
+
181
+ ### Quick Training Snippet
182
+
183
+ ```python
184
+ from mlx_raclate.tuner.trainer import Trainer, TrainingArgs
185
+ from mlx_raclate.utils.utils import load
186
+
187
+ # Load model
188
+ model, tokenizer = load("Qwen/Qwen3-Embedding-0.6B", pipeline="text-classification", train=True)
189
+
190
+ # Define Args
191
+ args = TrainingArgs(
192
+     output_dir="outputs/my_classifier",
193
+     learning_rate=1e-5,
194
+     num_train_epochs=3,
195
+     batch_size=4
196
+ )
197
+
198
+ # Initialize Trainer
199
+ trainer = Trainer(
200
+     model=model,
201
+     tokenizer=tokenizer,
202
+     task_type="text-classification",
203
+     training_args=args,
204
+     train_dataset=train_dataset,  # Prepared beforehand; see TUNER.md for dataset formatting
205
+     eval_dataset=eval_dataset  # Prepared beforehand, same format as train_dataset
206
+ )
207
+
208
+ trainer.train()
209
+ ```
210
+
211
+ ## Acknowledgements
212
+
213
+ * [MLX](https://github.com/ml-explore/mlx) team for the framework.
214
+ * [Transformers](https://github.com/huggingface/transformers) for the configuration standards.
215
+ * [MLX-Embeddings](https://github.com/Blaizzy/mlx-embeddings) for inspiration on broader embeddings architecture. MLX-Raclate focuses on longer-context models, but you should definitely look there for BERT, XLM-RoBERTa, and image embeddings.
216
+ * [PyLate](https://github.com/lightonai/pylate) for inspiration on Late Interaction mechanics.
@@ -0,0 +1,25 @@
1
+ mlx_raclate/__init__.py,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
2
+ mlx_raclate/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
3
+ mlx_raclate/models/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
4
+ mlx_raclate/models/base.py,sha256=0V8CfjWIoeZVwd7oWV5M3lif4k9P2R-DjVYaL5yqaSE,8794
5
+ mlx_raclate/models/gemma3_text.py,sha256=daSVYlKs4OITUuuI5KJPVSSkjfske3YLlYHjQ9DNMHU,32519
6
+ mlx_raclate/models/lfm2.py,sha256=o_TmjnZR3o35YG-3n3u2no4cVUfUvTOfQElAnC4S5yA,22885
7
+ mlx_raclate/models/modernbert.py,sha256=esMexSVnjvqz8-bcdamXSw0-pAP_lL1wws7Gk9OvLqM,33486
8
+ mlx_raclate/models/qwen3.py,sha256=4xiJyXs9vvIF_NBini-eYscesYgADZa_KEzZPerRaDY,20339
9
+ mlx_raclate/models/t5gemma_encoder.py,sha256=Q24jMAYtSB6DYwmF1XJ5lxQsO0FACH836jwTHIF_Co0,30377
10
+ mlx_raclate/tuner/TUNER.md,sha256=w_FgVIrV2U3ob_bwJwmGKDBWzw_7bS14wozAosas3x0,14983
11
+ mlx_raclate/tuner/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
12
+ mlx_raclate/tuner/collators.py,sha256=eELZWrwYTDUXaJTj4Go2pwnKhlS6yzdS8RNLvFsOhbU,11462
13
+ mlx_raclate/tuner/datasets.py,sha256=xnpTCTqpX8tIrb71E47E055wzxUSBDS3-uCGk3WAoYE,11212
14
+ mlx_raclate/tuner/model_card_utils.py,sha256=XeYia6OMCdZmt1k7NpLTfnpnlODTk3ztCz-2PjCYRMU,6466
15
+ mlx_raclate/tuner/trainer.py,sha256=WU9RkYQxhCtjhaLKIzOAhDOQQ6XHGgqFnuIJfUvZHPc,26734
16
+ mlx_raclate/tuner/utils.py,sha256=Vo95eX0NWTuRXv4pshRp-VTcnaYLtHprRXiR0snndQY,9879
17
+ mlx_raclate/utils/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
18
+ mlx_raclate/utils/server.py,sha256=2BnwGyG0YnJADqMrsvpN5DPyTEANozeLaaOACpbICfU,15541
19
+ mlx_raclate/utils/tokenizer_utils.py,sha256=vrk5tVS0GfeP8OKgGGyI2eS4J2wOcxaBy5jqPRSXs3c,11415
20
+ mlx_raclate/utils/train.py,sha256=p7jTCTQXN7I5yc_LkdzOY84YaYaVTlhfpjGP3zq8k_U,12929
21
+ mlx_raclate/utils/utils.py,sha256=2SszegS5ppmREfIdgr7xKweY1R5u00PoyYnaNI1BG2w,23526
22
+ mlx_raclate-0.1.0b1.dist-info/METADATA,sha256=SvOVI0DjIQ7B4FTDbES61lqUQF_pSBEb2rlO20HODNc,8344
23
+ mlx_raclate-0.1.0b1.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
24
+ mlx_raclate-0.1.0b1.dist-info/licenses/LICENSE,sha256=GEfg4GmBQu1DR8FEGp-oHI-93USx2LvNXjZH-ZF1nX8,1035
25
+ mlx_raclate-0.1.0b1.dist-info/RECORD,,
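Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed. The sketch below shows how such a value can be recomputed for checking; `record_hash` is a hypothetical helper name, not part of any packaging tool.

```python
import base64
import hashlib

def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash field for the given file contents."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64, with the trailing '=' padding stripped.
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# Zero-byte files (the entries with size 0 above) all share this value.
empty_file_entry = record_hash(b"")
print(empty_file_entry)
```

This is why every size-0 file in the listing, such as `mlx_raclate/py.typed`, carries the same hash.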
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.28.0
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,19 @@
1
+ MIT License
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19
+ SOFTWARE.