floudsonnx 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- floudsonnx-1.0.0/LICENSE +17 -0
- floudsonnx-1.0.0/PKG-INFO +531 -0
- floudsonnx-1.0.0/README.md +484 -0
- floudsonnx-1.0.0/pyproject.toml +123 -0
- floudsonnx-1.0.0/setup.cfg +4 -0
- floudsonnx-1.0.0/src/floudsonnx/__init__.py +90 -0
- floudsonnx-1.0.0/src/floudsonnx/api/__init__.py +7 -0
- floudsonnx-1.0.0/src/floudsonnx/api/client.py +115 -0
- floudsonnx-1.0.0/src/floudsonnx/api/routers/__init__.py +4 -0
- floudsonnx-1.0.0/src/floudsonnx/api/routers/health.py +19 -0
- floudsonnx-1.0.0/src/floudsonnx/api/routers/models.py +93 -0
- floudsonnx-1.0.0/src/floudsonnx/api/server.py +30 -0
- floudsonnx-1.0.0/src/floudsonnx/cli/__init__.py +4 -0
- floudsonnx-1.0.0/src/floudsonnx/cli/main.py +190 -0
- floudsonnx-1.0.0/src/floudsonnx/config/__init__.py +8 -0
- floudsonnx-1.0.0/src/floudsonnx/config/model_config.py +108 -0
- floudsonnx-1.0.0/src/floudsonnx/config/settings.py +46 -0
- floudsonnx-1.0.0/src/floudsonnx/exceptions.py +84 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/__init__.py +8 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/loader.py +212 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/ort_seq2seq.py +57 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/ort_session.py +79 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/session_pool.py +116 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/strategy.py +54 -0
- floudsonnx-1.0.0/src/floudsonnx/runtime/tokenizer_cache.py +108 -0
- floudsonnx-1.0.0/src/floudsonnx/store/__init__.py +8 -0
- floudsonnx-1.0.0/src/floudsonnx/store/exporter_bridge.py +123 -0
- floudsonnx-1.0.0/src/floudsonnx/store/manifest.py +80 -0
- floudsonnx-1.0.0/src/floudsonnx/store/registry.py +202 -0
- floudsonnx-1.0.0/src/floudsonnx/utils/__init__.py +8 -0
- floudsonnx-1.0.0/src/floudsonnx/utils/concurrent_dict.py +89 -0
- floudsonnx-1.0.0/src/floudsonnx/utils/path_guard.py +41 -0
- floudsonnx-1.0.0/src/floudsonnx.egg-info/PKG-INFO +531 -0
- floudsonnx-1.0.0/src/floudsonnx.egg-info/SOURCES.txt +36 -0
- floudsonnx-1.0.0/src/floudsonnx.egg-info/dependency_links.txt +1 -0
- floudsonnx-1.0.0/src/floudsonnx.egg-info/entry_points.txt +2 -0
- floudsonnx-1.0.0/src/floudsonnx.egg-info/requires.txt +31 -0
- floudsonnx-1.0.0/src/floudsonnx.egg-info/top_level.txt +1 -0
floudsonnx-1.0.0/LICENSE
ADDED
@@ -0,0 +1,17 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

Copyright (c) 2026 Goutam Malakar

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
floudsonnx-1.0.0/PKG-INFO
ADDED
@@ -0,0 +1,531 @@
Metadata-Version: 2.4
Name: floudsonnx
Version: 1.0.0
Summary: Ollama-style ONNX model store and runtime - pull, cache, and serve ORT sessions
Author: Goutam Malakar
License-Expression: Apache-2.0
Project-URL: Repository, https://github.com/gmalakar/floudsonnx
Project-URL: Issues, https://github.com/gmalakar/floudsonnx/issues
Keywords: onnx,onnxruntime,model-serving,huggingface,inference,flouds
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: <3.13,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: onnxruntime>=1.20.1
Requires-Dist: transformers<4.58.0,>=4.44.0
Requires-Dist: pydantic>=2.0
Requires-Dist: numpy<1.27,>=1.26.4
Provides-Extra: export
Requires-Dist: flouds-model-exporter>=1.0.1; extra == "export"
Provides-Extra: seq2seq
Requires-Dist: optimum[onnxruntime]>=1.22.0; extra == "seq2seq"
Provides-Extra: server
Requires-Dist: fastapi>=0.110.0; extra == "server"
Requires-Dist: uvicorn[standard]>=0.29.0; extra == "server"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
Requires-Dist: pre-commit==4.5.1; extra == "dev"
Requires-Dist: black==24.1.1; extra == "dev"
Requires-Dist: isort==5.13.2; extra == "dev"
Requires-Dist: flake8==7.0.0; extra == "dev"
Requires-Dist: flake8-bugbear; extra == "dev"
Requires-Dist: flake8-comprehensions; extra == "dev"
Requires-Dist: mypy==1.8.0; extra == "dev"
Requires-Dist: pyright==1.1.410; extra == "dev"
Requires-Dist: bandit==1.7.5; extra == "dev"
Requires-Dist: pbr; extra == "dev"
Provides-Extra: all
Requires-Dist: floudsonnx[export,seq2seq,server]; extra == "all"
Dynamic: license-file

# floudsonnx

`floudsonnx` is a small ONNX model store and runtime for Python. It can pull
Hugging Face models into a local ONNX store, cache runtime sessions, and load
models as either `onnxruntime.InferenceSession` or Optimum
`ORTModelForSeq2SeqLM` objects.

The package is designed for applications that want an Ollama-like local model
store, but for ONNX artifacts.

## Important: Hugging Face Models Must Be Converted To ONNX

`floudsonnx` runs ONNX models. Hugging Face models must be exported to ONNX
before they can be loaded as runtime sessions.

Install the `export` extra when you want `floudsonnx` to convert Hugging Face
models automatically:

```bash
pip install "floudsonnx[export]"
```

That extra installs `flouds-model-exporter`, which is required for converting
Hugging Face models to ONNX. Without it, `floudsonnx` can still load models
that already exist in the local ONNX store, but it cannot pull/export new
Hugging Face models.

For private or gated Hugging Face models, provide a token with the standard
Hugging Face Hub variable:

```bash
export HUGGINGFACE_HUB_TOKEN="hf_xxx_your_token"
```

`HUGGINGFACE_HUB_TOKEN` is read by the Hugging Face/exporter stack during
pull/export.

You can also pass a token per call:

```python
from floudsonnx import pull

pull("org/private-model", model_for="fe", hf_token="hf_xxx_your_token")
```

Never commit real Hugging Face tokens to source control.

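The text above does not say which source wins when both a per-call token and the environment variable are present. The sketch below assumes the explicit argument should take precedence; `resolve_hf_token` is a hypothetical helper written for illustration and is not part of `floudsonnx`.

```python
import os


def resolve_hf_token(explicit_token=None):
    """Return the token to use, or None when no token is configured.

    Hypothetical helper: an explicit argument wins, otherwise fall back to
    the HUGGINGFACE_HUB_TOKEN environment variable.
    """
    if explicit_token:
        return explicit_token
    # Treat an empty variable the same as an unset one.
    return os.environ.get("HUGGINGFACE_HUB_TOKEN") or None
```

The returned value could then be passed as `hf_token=` to `pull()`, which keeps the secret out of source files.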
## Features

- Local model store under `~/.flouds/models` by default.
- Auto-export from Hugging Face through the optional `flouds-model-exporter`
  integration.
- Runtime loading for feature extraction, sequence classification, ranker,
  seq2seq, and LLM-style model categories.
- Thread-safe session caching for ONNX Runtime and seq2seq models.
- Tokenizer loading and caching.
- Python API, CLI, and optional FastAPI server.
- Manifest files for locally stored models.

## Installation

`floudsonnx` supports Python `3.11` and `3.12`.

```bash
pip install floudsonnx
```

The core install loads models that are already exported to ONNX. Install extras
for export, seq2seq, or server support:

```bash
pip install "floudsonnx[export]"    # auto-export via flouds-model-exporter
pip install "floudsonnx[seq2seq]"   # Optimum ORTModelForSeq2SeqLM support
pip install "floudsonnx[server]"    # FastAPI + Uvicorn HTTP server
pip install "floudsonnx[all]"       # export + seq2seq + server
```

Core dependencies:

- `onnxruntime`
- `transformers`
- `pydantic`
- `numpy`

Optional extras:

- `export`: `flouds-model-exporter>=1.0.1`
- `seq2seq`: `optimum[onnxruntime]`
- `server`: `fastapi`, `uvicorn[standard]`

For feature-extraction, classification, and ranker exports, prefer
`library="transformers"` to avoid `sentence-transformers` auto-detection in the
exporter.

### Store Location And `ONNX_PATH`

`floudsonnx` stores models under `~/.flouds/models` by default. Override the
store root with `FloudsOnnxSettings`:

```python
from floudsonnx import FloudsOnnxClient, FloudsOnnxSettings

client = FloudsOnnxClient(
    FloudsOnnxSettings(onnx_path="/path/to/onnx/models")
)
```

When `floudsonnx` calls `flouds-model-exporter`, it sets the raw `ONNX_PATH`
environment variable internally so exported files land in the selected
`floudsonnx` store.

If you run `flouds-model-exporter` directly, use its native `ONNX_PATH`
variable:

```bash
export ONNX_PATH="/path/to/onnx/models"
flouds-export export --model-name t5-small --model-for s2s --task seq2seq-lm
```

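How `floudsonnx` sets `ONNX_PATH` internally is not shown in this README. One common way to scope an environment variable to a single call is a context manager that restores the previous value afterwards. The sketch below illustrates that general pattern only; the name `onnx_path_env` is hypothetical and this is not the package's actual code.

```python
import os
from contextlib import contextmanager


@contextmanager
def onnx_path_env(store_root):
    """Temporarily point ONNX_PATH at store_root, then restore the old value."""
    previous = os.environ.get("ONNX_PATH")
    os.environ["ONNX_PATH"] = str(store_root)
    try:
        yield
    finally:
        # Restore the caller's environment even if the export raised.
        if previous is None:
            os.environ.pop("ONNX_PATH", None)
        else:
            os.environ["ONNX_PATH"] = previous
```

Wrapping an export call in `with onnx_path_env("/path/to/onnx/models"):` would keep the override from leaking into later code in the same process. Note that `os.environ` is process-wide, so this pattern is not safe for concurrent exports to different roots.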
## Quickstart

This example pulls a feature-extraction model, loads it, tokenizes input text,
builds the ONNX Runtime input feed from the session's actual inputs, and runs
inference.

```python
import numpy as np

from floudsonnx import create_model

model = create_model(
    "sentence-transformers/all-MiniLM-L6-v2",
    model_for="fe",
    task="feature-extraction",
    library="transformers",
    normalize_embeddings=True,
)

encoded = model.tokenizer(
    ["Hello world"],
    return_tensors="np",
    padding=True,
    truncation=True,
    max_length=64,
)

session_inputs = {item.name for item in model.session.get_inputs()}
feed = {name: encoded[name].astype(np.int64) for name in session_inputs if name in encoded}

# Some exported encoder models require token_type_ids even when the tokenizer
# does not return them.
for missing_name in session_inputs - set(feed):
    feed[missing_name] = np.zeros_like(next(iter(feed.values())))

outputs = model.run(None, feed)
print(outputs[0].shape)
```

The `create_model()` helper pulls/exports the model if needed, then loads it.
Use `load_model()` when the model is already present on disk and you do not
want auto-export.

## Seq2Seq Example

Seq2seq loading requires the `seq2seq` extra. Pulling from Hugging Face also
requires the `export` extra.

```python
from floudsonnx import create_model

model = create_model("t5-small", model_for="s2s", task="seq2seq-lm")

encoded = model.tokenizer(
    ["summarize: The quick brown fox jumps over the lazy dog."],
    return_tensors="pt",
    truncation=True,
    max_length=64,
)

tokens = model.seq2seq_model.generate(
    input_ids=encoded["input_ids"],
    max_new_tokens=32,
)

print(model.tokenizer.decode(tokens[0], skip_special_tokens=True))
```

## Model Types

| `model_for` | Default export task | Runtime strategy |
|---|---|---|
| `fe` | `feature-extraction` | `onnxruntime.InferenceSession` |
| `sc` | `text-classification` | `onnxruntime.InferenceSession` |
| `ranker` | `text-classification` | `onnxruntime.InferenceSession` |
| `s2s` | `seq2seq-lm` | `ORTModelForSeq2SeqLM` |
| `llm` | `text-generation-with-past` | `onnxruntime.InferenceSession` by default, or `ORTModelForSeq2SeqLM` when configured with `use_seq2seqlm=True` |

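The default-task column of the table can be read as a simple lookup. The sketch below encodes it that way so an unknown `model_for` value fails early; `DEFAULT_TASKS` and `default_task` are hypothetical names for illustration, not the package's internals.

```python
# Default export task per model_for value, copied from the Model Types table.
DEFAULT_TASKS = {
    "fe": "feature-extraction",
    "sc": "text-classification",
    "ranker": "text-classification",
    "s2s": "seq2seq-lm",
    "llm": "text-generation-with-past",
}


def default_task(model_for):
    """Return the default export task, rejecting unknown categories."""
    try:
        return DEFAULT_TASKS[model_for]
    except KeyError:
        valid = ", ".join(sorted(DEFAULT_TASKS))
        raise ValueError(
            f"unknown model_for {model_for!r}; expected one of: {valid}"
        ) from None
```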
## Python API

Top-level convenience functions:

```python
from floudsonnx import (
    create_model,
    list_models,
    load_model,
    pull,
    remove_model,
)
```

Available top-level functions:

| Function | Description |
|---|---|
| `create_model(model_name, model_for="fe", **kwargs)` | Pull/export if needed, then load the model. |
| `load_model(model_name, model_for="fe")` | Load an existing local model without auto-export. |
| `pull(model_name, model_for="fe", **kwargs)` | Export/store the model and return its manifest. |
| `list_models()` | Return local `ModelManifest` objects. |
| `remove_model(model_name, model_for="fe")` | Delete the local model and evict cached sessions. |

For explicit settings and cache control, use `FloudsOnnxClient`:

```python
from floudsonnx import FloudsOnnxClient, FloudsOnnxSettings

settings = FloudsOnnxSettings(
    home_dir="/data/flouds",
    session_provider="CPUExecutionProvider",
)
client = FloudsOnnxClient(settings)

manifest = client.pull(
    "BAAI/bge-base-en-v1.5",
    model_for="fe",
    task="feature-extraction",
    library="transformers",
    normalize_embeddings=True,
)

model = client.load_model("BAAI/bge-base-en-v1.5", model_for="fe")
print(model.session_strategy)

client.unload("BAAI/bge-base-en-v1.5", model_for="fe")
client.remove("BAAI/bge-base-en-v1.5", model_for="fe")
```

`FloudsOnnxClient` methods:

- `pull(...)`
- `list()`
- `remove(model_name, model_for="fe")`
- `create_model(...)`
- `load_model(model_name, model_for="fe")`
- `reload(model_name, model_for="fe")`
- `unload(model_name, model_for="fe")`
- `is_loaded(model_name, model_for="fe")`
- `cache_stats()`

## LoadedModel

`create_model()` and `load_model()` return a `LoadedModel` with:

- `model_name`
- `model_for`
- `model_dir`
- `config`
- `tokenizer`
- `session_strategy`
- `session` for `onnxruntime.InferenceSession` models
- `seq2seq_model` for `ORTModelForSeq2SeqLM` models
- `is_seq2seq`
- `run(output_names, input_feed, run_options=None)`

For encoder/classification/ranker models, call `model.run(...)`.

For seq2seq models, call `model.seq2seq_model.generate(...)`.

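Code that handles both kinds of model can branch on `is_seq2seq`. The sketch below shows one way to do that using only the attributes listed above; `run_model` is a hypothetical wrapper, and it is written against any object with those attributes rather than against the real `LoadedModel` class.

```python
def run_model(model, *, input_feed=None, generate_kwargs=None):
    """Dispatch to the right call for a LoadedModel-like object."""
    if model.is_seq2seq:
        # Seq2seq models expose generation through the wrapped Optimum model.
        return model.seq2seq_model.generate(**(generate_kwargs or {}))
    # Encoder, classification, and ranker models run a plain ORT session;
    # passing None as output_names requests all outputs.
    return model.run(None, input_feed or {})
```

Keeping the branch in one place means callers do not need to know which runtime strategy a given model was loaded with.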
## CLI

```bash
floudsonnx --help
```

Commands:

```bash
# Pull/export a model to the local store
floudsonnx pull sentence-transformers/all-MiniLM-L6-v2 --for fe --task feature-extraction

# Disable ONNX optimization during pull
floudsonnx pull t5-small --for s2s --task seq2seq-lm --no-optimize

# List locally stored models
floudsonnx list

# Show manifest JSON
floudsonnx info sentence-transformers/all-MiniLM-L6-v2 --for fe

# Remove from local store
floudsonnx remove sentence-transformers/all-MiniLM-L6-v2 --for fe

# Evict and reload from disk
floudsonnx reload sentence-transformers/all-MiniLM-L6-v2 --for fe

# Show session cache stats
floudsonnx stats

# Start the optional HTTP server
floudsonnx serve --host 127.0.0.1 --port 19720
```

Current `pull` CLI options:

- `--for`
- `--task`
- `--optimize`
- `--no-optimize`
- `--optimization-level`
- `--opset-version`
- `--device`
- `--framework`
- `--library`
- `--normalize-embeddings`
- `--force`
- `--trust-remote-code`
- `--use-external-data-format`
- `--use-subprocess`
- `--use-fallback-if-failed`
- `--merge`
- `--skip-validator`
- `--hf-token`

Example:

```bash
floudsonnx pull sentence-transformers/all-MiniLM-L6-v2 \
  --for fe \
  --task feature-extraction \
  --library transformers \
  --normalize-embeddings
```

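When the CLI is driven from a script, building the argument list in code avoids shell-quoting mistakes. The sketch below covers only a handful of the `pull` flags listed above; `build_pull_argv` is a hypothetical helper, and the flags are assumed to behave as the examples in this section suggest.

```python
def build_pull_argv(model_name, model_for="fe", task=None, library=None,
                    normalize_embeddings=False, optimize=True):
    """Build an argv list for `floudsonnx pull` without invoking it."""
    argv = ["floudsonnx", "pull", model_name, "--for", model_for]
    if task:
        argv += ["--task", task]
    if library:
        argv += ["--library", library]
    if normalize_embeddings:
        argv.append("--normalize-embeddings")
    if not optimize:
        argv.append("--no-optimize")
    return argv


argv = build_pull_argv(
    "sentence-transformers/all-MiniLM-L6-v2",
    task="feature-extraction",
    library="transformers",
    normalize_embeddings=True,
)
# With floudsonnx installed, run it with: subprocess.run(argv, check=True)
```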
## Optional HTTP Server

Install the server extra:

```bash
pip install "floudsonnx[server]"
```

Start the server:

```bash
floudsonnx serve --host 127.0.0.1 --port 19720
```

Routes:

| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Health check. |
| `GET` | `/api/v1/models` | List local model manifests. |
| `GET` | `/api/v1/models/{name}?model_for=fe` | Get one manifest. |
| `POST` | `/api/v1/models/pull` | Pull/export a model. |
| `POST` | `/api/v1/models/load` | Load a local model into cache. |
| `POST` | `/api/v1/models/reload` | Evict and reload a model. |
| `POST` | `/api/v1/models/unload` | Evict a model from memory. |
| `DELETE` | `/api/v1/models/{name}?model_for=fe` | Remove a model from disk. |
| `GET` | `/api/v1/stats` | Return cache statistics. |

Pull request body:

```json
{
  "model_name": "sentence-transformers/all-MiniLM-L6-v2",
  "model_for": "fe",
  "task": "feature-extraction",
  "force": false,
  "optimize": false,
  "trust_remote_code": false,
  "use_external_data_format": false,
  "use_fallback_if_failed": false,
  "hf_token": null
}
```

Load/reload/unload request body:

```json
{
  "model_name": "sentence-transformers/all-MiniLM-L6-v2",
  "model_for": "fe"
}
```

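A client can reach the pull route with the standard library alone. The sketch below builds the request from the documented body and stops short of sending it, so it runs without a server; `build_pull_request` is a hypothetical helper, and the response format is not covered here because this README does not describe it.

```python
import json
import urllib.request


def build_pull_request(base_url, model_name, model_for="fe",
                       task="feature-extraction", hf_token=None):
    """Build (but do not send) a POST request for /api/v1/models/pull."""
    # Field names and default values mirror the documented pull request body.
    payload = {
        "model_name": model_name,
        "model_for": model_for,
        "task": task,
        "force": False,
        "optimize": False,
        "trust_remote_code": False,
        "use_external_data_format": False,
        "use_fallback_if_failed": False,
        "hf_token": hf_token,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/models/pull",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pull_request(
    "http://127.0.0.1:19720",
    "sentence-transformers/all-MiniLM-L6-v2",
)
# With the server running, send it with: urllib.request.urlopen(request)
```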
## Configuration

Configure `floudsonnx` with `FloudsOnnxSettings` when creating a client:

```python
from floudsonnx import FloudsOnnxClient, FloudsOnnxSettings

client = FloudsOnnxClient(
    FloudsOnnxSettings(
        home_dir="/data/flouds",
        onnx_path="/data/flouds/models",
        session_provider="CPUExecutionProvider",
        encoder_cache_max=5,
        decoder_cache_max=5,
        seq2seq_cache_max=3,
    )
)
```

Additional variables:

| Environment variable | Used by | Description |
|---|---|---|
| `HUGGINGFACE_HUB_TOKEN` | Hugging Face Hub / `flouds-model-exporter` | Standard token variable for private or gated Hugging Face models. |
| `ONNX_PATH` | `flouds-model-exporter` | Native exporter output root. `floudsonnx` sets this internally during export; configure the `floudsonnx` store with `FloudsOnnxSettings`. |

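The `*_cache_max` settings suggest that each session cache is bounded, but the eviction policy is not documented here. The sketch below assumes least-recently-used eviction purely for illustration; `BoundedSessionCache` is a hypothetical class, not the package's `session_pool` implementation.

```python
import threading
from collections import OrderedDict


class BoundedSessionCache:
    """Thread-safe cache that evicts the least recently used entry when full."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items = OrderedDict()
        self._lock = threading.Lock()

    def get_or_create(self, key, factory):
        """Return the cached value for key, building it with factory() if absent."""
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as most recently used
                return self._items[key]
            value = factory()
            self._items[key] = value
            if len(self._items) > self._max_size:
                self._items.popitem(last=False)  # drop the oldest entry
            return value

    def evict(self, key):
        """Remove key if present; return True when something was removed."""
        with self._lock:
            if key in self._items:
                del self._items[key]
                return True
            return False

    def __len__(self):
        with self._lock:
            return len(self._items)
```

Calling `factory()` while holding the lock keeps the sketch short and guarantees a model is built only once, at the cost of blocking other threads during a slow load. A per-key lock would relax that.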
## Local Store Layout

Default layout:

```text
~/.flouds/
`-- models/
    |-- fe/
    |   `-- all-MiniLM-L6-v2/
    |       |-- model.onnx
    |       |-- model_optimized.onnx
    |       |-- tokenizer.json
    |       `-- manifest.json
    |-- s2s/
    |-- sc/
    |-- ranker/
    `-- llm/
```

Each model directory contains a `manifest.json` with the model name, model
type, export options, selected runtime strategy, discovered ONNX files, and
model config.

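Because every model directory carries a `manifest.json`, the store can be inspected without loading any sessions. The sketch below walks the layout shown above, assuming the `<root>/<model_for>/<model>/manifest.json` shape holds for every entry; `find_manifests` is a hypothetical helper, and it returns the parsed JSON as-is because the manifest's key names are not listed in this README.

```python
import json
from pathlib import Path


def find_manifests(store_root):
    """Yield (model_for, model_dir_name, manifest_dict) for each stored model."""
    root = Path(store_root).expanduser()
    # Assumed layout: <root>/<model_for>/<model>/manifest.json
    for manifest_path in sorted(root.glob("*/*/manifest.json")):
        with manifest_path.open(encoding="utf-8") as handle:
            manifest = json.load(handle)
        model_dir = manifest_path.parent
        yield model_dir.parent.name, model_dir.name, manifest
```

For the default store this could be used as `for model_for, name, manifest in find_manifests("~/.flouds/models"): print(model_for, name)`.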
## Development

```bash
pip install -r requirements-dev.txt
pip install -e ".[export,seq2seq,server]"
pre-commit install
pytest tests/unit -v
```

Useful checks:

```bash
python tools/check_dependency_sync.py
black --check src tests tools
isort --check-only src tests tools
flake8 src tests tools
mypy src/floudsonnx
pyright src/floudsonnx
python -m build
twine check dist/*
```

Integration tests download and export real models:

```bash
pytest tests/integration -m integration -v
```

## Release

Releases are tag-driven through GitHub Actions. See
[`docs/RELEASE_PROCESS.md`](docs/RELEASE_PROCESS.md).

## License

Apache-2.0. See [`LICENSE`](LICENSE).