shadowlm 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43) hide show
  1. shadowlm-0.1.0/LICENSE +21 -0
  2. shadowlm-0.1.0/PKG-INFO +491 -0
  3. shadowlm-0.1.0/README.md +447 -0
  4. shadowlm-0.1.0/pyproject.toml +80 -0
  5. shadowlm-0.1.0/setup.cfg +4 -0
  6. shadowlm-0.1.0/shadowlm/__init__.py +43 -0
  7. shadowlm-0.1.0/shadowlm/_quiet.py +23 -0
  8. shadowlm-0.1.0/shadowlm/accel.py +70 -0
  9. shadowlm-0.1.0/shadowlm/ascii.py +79 -0
  10. shadowlm-0.1.0/shadowlm/backends/__init__.py +64 -0
  11. shadowlm-0.1.0/shadowlm/backends/base.py +99 -0
  12. shadowlm-0.1.0/shadowlm/backends/mlx.py +884 -0
  13. shadowlm-0.1.0/shadowlm/backends/torch.py +863 -0
  14. shadowlm-0.1.0/shadowlm/bottleneck.py +128 -0
  15. shadowlm-0.1.0/shadowlm/capture.py +239 -0
  16. shadowlm-0.1.0/shadowlm/charts.py +112 -0
  17. shadowlm-0.1.0/shadowlm/cli.py +289 -0
  18. shadowlm-0.1.0/shadowlm/data.py +273 -0
  19. shadowlm-0.1.0/shadowlm/methods/__init__.py +54 -0
  20. shadowlm-0.1.0/shadowlm/methods/adapter.py +15 -0
  21. shadowlm-0.1.0/shadowlm/methods/base.py +82 -0
  22. shadowlm-0.1.0/shadowlm/methods/bitfit.py +17 -0
  23. shadowlm-0.1.0/shadowlm/methods/cpt.py +16 -0
  24. shadowlm-0.1.0/shadowlm/methods/dora.py +15 -0
  25. shadowlm-0.1.0/shadowlm/methods/dpo.py +17 -0
  26. shadowlm-0.1.0/shadowlm/methods/full.py +16 -0
  27. shadowlm-0.1.0/shadowlm/methods/grpo.py +26 -0
  28. shadowlm-0.1.0/shadowlm/methods/lora.py +14 -0
  29. shadowlm-0.1.0/shadowlm/methods/more.py +23 -0
  30. shadowlm-0.1.0/shadowlm/methods/ptuning.py +14 -0
  31. shadowlm-0.1.0/shadowlm/methods/qlora.py +15 -0
  32. shadowlm-0.1.0/shadowlm/methods/soft_prompt.py +15 -0
  33. shadowlm-0.1.0/shadowlm/models.py +329 -0
  34. shadowlm-0.1.0/shadowlm/more.py +288 -0
  35. shadowlm-0.1.0/shadowlm/rl.py +220 -0
  36. shadowlm-0.1.0/shadowlm/runs.py +77 -0
  37. shadowlm-0.1.0/shadowlm/training.py +332 -0
  38. shadowlm-0.1.0/shadowlm.egg-info/PKG-INFO +491 -0
  39. shadowlm-0.1.0/shadowlm.egg-info/SOURCES.txt +41 -0
  40. shadowlm-0.1.0/shadowlm.egg-info/dependency_links.txt +1 -0
  41. shadowlm-0.1.0/shadowlm.egg-info/entry_points.txt +2 -0
  42. shadowlm-0.1.0/shadowlm.egg-info/requires.txt +23 -0
  43. shadowlm-0.1.0/shadowlm.egg-info/top_level.txt +1 -0
shadowlm-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Lyzr Research Labs
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,491 @@
1
+ Metadata-Version: 2.4
2
+ Name: shadowlm
3
+ Version: 0.1.0
4
+ Summary: ShadowLM Trainer — fine-tune any open model, from any harness, with any method.
5
+ Author: Lyzr Research Labs
6
+ Maintainer-email: Khush Patel <khush@lyzr.ai>
7
+ License-Expression: MIT
8
+ Project-URL: Homepage, https://github.com/open-gitagent/shadowLM
9
+ Project-URL: Repository, https://github.com/open-gitagent/shadowLM
10
+ Project-URL: Issues, https://github.com/open-gitagent/shadowLM/issues
11
+ Keywords: fine-tuning,llm,lora,qlora,dpo,grpo,rlhf,mlx,pytorch,peft,agents,training
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Programming Language :: Python :: 3.13
21
+ Classifier: Programming Language :: Python :: 3.14
22
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
23
+ Requires-Python: >=3.10
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Provides-Extra: mlx
27
+ Requires-Dist: mlx-lm>=0.20; extra == "mlx"
28
+ Provides-Extra: preference
29
+ Requires-Dist: mlx-lm-lora>=2.0; extra == "preference"
30
+ Provides-Extra: retrieval
31
+ Requires-Dist: sentence-transformers>=3.0; extra == "retrieval"
32
+ Provides-Extra: torch
33
+ Requires-Dist: datasets>=2.20; extra == "torch"
34
+ Requires-Dist: transformers>=4.43; extra == "torch"
35
+ Requires-Dist: trl>=0.9; extra == "torch"
36
+ Requires-Dist: peft>=0.12; extra == "torch"
37
+ Requires-Dist: accelerate>=0.33; extra == "torch"
38
+ Requires-Dist: torch>=2.3; extra == "torch"
39
+ Provides-Extra: all
40
+ Requires-Dist: shadowlm[retrieval,torch]; extra == "all"
41
+ Provides-Extra: mlx-all
42
+ Requires-Dist: shadowlm[mlx,preference,retrieval]; extra == "mlx-all"
43
+ Dynamic: license-file
44
+
45
+ <p align="center">
46
+ <img src="https://raw.githubusercontent.com/open-gitagent/shadowLM/main/assets/banner.png" alt="ShadowLM Trainer — any open model, any harness, any method">
47
+ </p>
48
+
49
+ <p align="center">
50
+ <img alt="License: MIT" src="https://img.shields.io/badge/license-MIT-E5484D">
51
+ <img alt="Python 3.10+" src="https://img.shields.io/badge/python-3.10%2B-16120E">
52
+ <img alt="Methods" src="https://img.shields.io/badge/training_methods-12-E5484D">
53
+ <img alt="Core dependencies" src="https://img.shields.io/badge/core_dependencies-0-16120E">
54
+ </p>
55
+
56
+ <details>
57
+ <summary>Table of contents</summary>
58
+
59
+ - [Why ShadowLM Trainer](#why-shadowlm-trainer)
60
+ - [Backends](#backends)
61
+ - [Training methods](#training-methods)
62
+ - [Install & run](#install--run)
63
+ - [The shadow accelerator](#the-shadow-accelerator)
64
+ - [Training parameters](#training-parameters)
65
+ - [API surface](#api-surface)
66
+ - [Layout](#layout)
67
+ - [The road ahead](#the-road-ahead)
68
+ - [License](#license)
69
+
70
+ </details>
71
+
72
+ # ShadowLM Trainer
73
+
74
+ **A fine-tuning SDK. Any open model. Any harness. Any method.**
75
+
76
+ ```bash
77
+ pip install 'shadowlm[all]' # the full package — every dependency included
78
+ pip install shadowlm # core SDK only (zero dependencies)
79
+ ```
80
+
81
+ ```python
82
+ import shadowlm as slm
83
+
84
+ ds = slm.Dataset.from_jsonl("data.jsonl").as_chat() # datasets
85
+ model = slm.load("mlx-community/Qwen2.5-0.5B-Instruct-4bit", # load
86
+ accelerator="shadow")
87
+ run = model.finetune(ds, method="lora", max_steps=60) # finetune
88
+ print(run.loss, run.sparkline()) # live metrics
89
+ print(model.generate("What is the capital of France?")) # inference
90
+ model.save("out/", fmt="adapter") # ship it
91
+ ```
92
+
93
+ Change `method="lora"` to `qlora`, `dora`, `full`, `dpo`, `grpo`, `bitfit`,
94
+ `prompt`, `adapter`, `more`… and nothing else changes. That's the whole idea.
95
+
96
+ **Why "shadow"?** Because the model you train here is meant to *shadow* the
97
+ frontier model behind your agent: `slm.capture()` records the traffic the big
98
+ rented model handles, you fine-tune a small open model on it, run it in the
99
+ big one's shadow until it performs identically — then switch, and own the
100
+ weights. The SDK is that engine; [ShadowLM Studio](#shadowlm-studio) will run
101
+ the full loop.
102
+
103
+ ## Why ShadowLM Trainer
104
+
105
+ - **Twelve training methods, one argument.** LoRA to full fine-tuning to DPO to
106
+ RL-from-rewards to soft prompts — every technique is a declarative spec the
107
+ backends read. Adding your own is one file.
108
+ - **Mixture of Retrieval Experts (`more`)** — ShadowLM's signature method: facts
109
+ fused into attention so the model looks them up instead of hallucinating them
110
+ ([details below](#mixture-of-retrieval-experts--teach-facts-not-vibes)).
111
+ - **Agent RL, built in.** Collect multi-step rollouts, score whole episodes with
112
+ an LLM judge, train with DPO or trajectory-level GRPO. No reward math required.
113
+ `slm.capture(model)` turns any OpenAI-compatible harness into trajectories —
114
+ the harness runs unchanged.
115
+ - **The shadow accelerator.** One knob (`accelerator="shadow"`) that turns on the
116
+ optimizations that are *safe for your model and hardware* — and logs exactly
117
+ what it enabled. No silent magic.
118
+ - **Runs are records.** Every finetune persists status, config, and metrics.
119
+ Terminal loss charts, sparklines, resumable checkpoints, run history that
120
+ survives the process.
121
+ - **Honest engineering.** No mock backends, no silently-ignored arguments (the
122
+ mlx backend *tells you* when a torch-only knob doesn't apply), base-model
123
+ requirements enforced with errors that say what to do instead.
124
+ - **Pure-stdlib core.** `pip install shadowlm` has zero dependencies; training
125
+ backends are opt-in extras for your hardware.
126
+
127
+ ## Backends
128
+
129
+ **`torch` (CUDA) is the production backend** — PyTorch + `transformers` + `trl`
130
+ + `peft`, the stack serious training runs on. `mlx` exists so the *same code*
131
+ develops fast on an Apple laptop before it ships to a GPU box.
132
+
133
+ | backend | hardware | engine |
134
+ |---------|----------|--------|
135
+ | `torch` | **CUDA GPU** (production), or CPU (`device="cpu"`) | `transformers` + `trl` + `peft` — SFT / DPO / GRPO |
136
+ | `mlx` | Apple Silicon | `mlx-lm` — the local dev loop |
137
+
138
+ `auto` resolves CUDA → `torch`, else Apple Silicon → `mlx`, else `torch` on CPU.
139
+ One device knob, no mock fallback. The whole torch path — SFT, DPO, GRPO, eval,
140
+ generation — is exercised in CI-style on CPU, so the code a CUDA box runs is
141
+ tested code, not blind code.
142
+
143
+ The pipeline is the standard HuggingFace flow — `datasets` formats and chat
144
+ templates, LoRA/QLoRA adapters, chat-template inference.
145
+
146
+ ## Training methods
147
+
148
+ Each technique lives in its own module under `shadowlm/methods/` as a declarative
149
+ spec — backends read the spec (adapter kind, base requirements, data rendering),
150
+ never the method name.
151
+
152
+ | method | what it does | base model | default LR |
153
+ |--------|--------------|------------|------------|
154
+ | `lora` | LoRA adapters | either | 2e-4 |
155
+ | `qlora` | LoRA adapters, lowest memory | **4-bit required** | 2e-4 |
156
+ | `dora` | weight-decomposed LoRA, often better at low rank | either | 2e-4 |
157
+ | `full` | update every transformer weight | **unquantized required** | 2e-5 |
158
+ | `cpt` | continued pretraining on raw domain text (no chat template) | either | 5e-5 |
159
+ | `dpo` | preference optimization on `{prompt, chosen, rejected}` pairs vs a frozen reference (`beta=0.1`) | either | 5e-6 |
160
+ | `grpo` | RL from reward functions (`reward_fns=[...]`) or collected `TrajectoryGroup`s | either | 5e-6 |
161
+ | `more` | **mixture of retrieval experts** — facts embedded into a frozen index fused into attention; near-zero-hallucination recall (`retrieval_k`, `retrieval_layers`) | either | 1e-4 |
162
+ | `bitfit` | train only the bias terms (~0.1% of params) | **unquantized required** | 5e-4 |
163
+ | `prompt` | soft prompts — `num_virtual_tokens` learned vectors, model frozen (torch) | either | 5e-3 |
164
+ | `ptuning` | p-tuning — prompt embeddings via a small encoder (torch) | either | 5e-3 |
165
+ | `adapter` | bottleneck adapter modules after each layer (width = `lora_r`) | either | 1e-4 |
166
+
167
+ SFT methods train on chat/instruction/text data; `dpo` trains on preference
168
+ pairs (the `preference` format, auto-detected from `chosen`/`rejected` columns);
169
+ `grpo` trains on `{prompt[, answer]}` rows with your reward functions:
170
+
171
+ ```python
172
+ def prefers_blue(prompts, completions, answer, types=None):
173
+ return [1.0 if "blue" in c.lower() else 0.0 for c in completions]
174
+
175
+ run = model.finetune(rows, method="grpo", reward_fns=[prefers_blue],
176
+ grpo_group_size=4)
177
+ ```
178
+
179
+ On CUDA, dpo/grpo ride on trl (`DPOTrainer` / `GRPOTrainer`); on Apple Silicon
180
+ they need `pip install shadowlm[preference]`. ORPO / PPO-style RLHF exist in
181
+ the substrates and follow the same `trainer=` slot.
182
+
183
+ ### Mixture of Retrieval Experts — teach facts, not vibes
184
+
185
+ `more` is for *facts*: each training fact is embedded into a frozen FAISS
186
+ index; wrapped attention layers retrieve each token's nearest memories and
187
+ attend over them through small trainable projections (plus LoRA for capacity).
188
+ The model learns to look facts up instead of hallucinating them, and the index
189
+ travels inside the adapter dir — `load(adapter=...)` rebuilds everything
190
+ (verified on both backends: exact recall of held-in facts, before and after
191
+ reload). Needs `pip install shadowlm[retrieval]`.
192
+
193
+ ### Train any harness without opening the box
194
+
195
+ Every agent must call a model, so the model API is the one boundary that
196
+ always exists. `slm.capture(model)` serves an OpenAI-compatible endpoint
197
+ (SSE streaming included; parallel calls serialized safely), records every
198
+ call your harness makes, and reconstructs multi-turn episodes (prefix-merged,
199
+ branch-safe) into trajectories:
200
+
201
+ ```python
202
+ with slm.capture(model) as proxy: # http://127.0.0.1:8327/v1
203
+ run_my_agent(base_url=proxy.base_url) # any OpenAI-client harness, unchanged
204
+ trajectories = proxy.trajectories()
205
+ group = slm.judge_group(slm.TrajectoryGroup(trajectories), judge=judge)
206
+ run = model.finetune([group], method="grpo")
207
+ ```
208
+
209
+ The async rollout-service tier (gateways, prewarming, fleet-scale trainers)
210
+ belongs to the studio.
211
+
212
+ ### Agent RL: trajectories + judge rewards
213
+
214
+ For multi-step agents, score whole episodes instead of writing reward math:
215
+
216
+ ```python
217
+ group = slm.TrajectoryGroup( # several attempts at one task
218
+ slm.Trajectory(messages=rollout_messages, reward=0.0) for _ in range(6))
219
+ group = slm.judge_group(group, judge=judge_model) # LLM-as-judge scores 0–1
220
+ run = model.finetune(group.to_preference_rows(), method="dpo")
221
+ ```
222
+
223
+ `judge_group` asks a judge model to score attempts against a rubric (with a
224
+ best/worst ranking fallback that keeps small local judges reliable). Train on
225
+ the scored groups two ways: `group.to_preference_rows()` → DPO, or directly —
226
+ `model.finetune(groups, method="grpo")` runs advantage-weighted policy
227
+ gradient over the trajectories (rewards normalized within each group, loss on
228
+ assistant tokens only). Collect on-policy rollouts, score, train, repeat.
229
+
230
+ ### Bring your own method
231
+
232
+ Base requirements are enforced with clear errors (e.g. `qlora` on a 16-bit model
233
+ tells you to load a 4-bit one). Adding a technique is one file:
234
+
235
+ ```python
236
+ # shadowlm/methods/my_method.py (or methods.register(...) at runtime)
237
+ from .base import TrainingMethod, register
238
+
239
+ register(TrainingMethod(
240
+ name="my-method",
241
+ description="LoRA variant with my defaults",
242
+ default_learning_rate=1e-4,
243
+ ))
244
+ ```
245
+
246
+ ## Install & run
247
+
248
+ `pip install 'shadowlm[all]'` gives you everything for a CUDA / CPU box.
249
+ Prefer picking parts? Each extra is independent:
250
+
251
+ | extra | what it adds |
252
+ |-------|--------------|
253
+ | `[torch]` | training on CUDA / CPU — `transformers` + `trl` + `peft` + `torch` |
254
+ | `[mlx]` | the local-dev backend (`mlx-lm`) |
255
+ | `[preference]` | dpo / grpo on the mlx backend (`mlx-lm-lora`) |
256
+ | `[retrieval]` | the `more` method — fact index (`sentence-transformers`) |
257
+ | `[mlx-all]` | everything for the local dev loop |
258
+
259
+ To run the examples, grab the repo:
260
+
261
+ ```bash
262
+ git clone https://github.com/open-gitagent/shadowLM && cd shadowLM
263
+ python3 -m venv .venv && source .venv/bin/activate && pip install -e '.[mlx]'
264
+ python examples/quickstart.py # datasets → finetune → inference, end to end
265
+ ```
266
+
267
+ No hardware handy? `examples/colab_quickstart.ipynb` runs the same flow on a
268
+ free Colab GPU.
269
+
270
+ Output (mlx backend, a 0.5B model — 3.5 seconds of training):
271
+
272
+ ```
273
+ Dataset('sample_dataset', format='chat', rows=8)
274
+ before: The capital of France is Paris.
275
+ [shadow] enabled: gradient checkpointing
276
+ [mlx:gpu] finetuning Qwen2.5-0.5B-Instruct-4bit · lora · 8 examples · 40 iters · lora r=16 on 24 layers · lr 0.0002 (linear, warmup 5)
277
+ [████████████████████████] step 40/40 loss 0.0718 lr 5.00e-05 11.7 st/s 1,048 tok/s
278
+ [mlx] done · final loss 0.0718 · adapter ~/.shadowlm/runs/Qwen2.5-0.5B-Instruct-4bit-…
279
+
280
+ loss ▇▆█▇▆▇▇█▅▅▄▅▃▂▃▃▁▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ 4.2120 → 0.0718
281
+ ♥ succeeded · 40 steps · 3.5s
282
+
283
+ after: The capital of France is Paris.
284
+ ```
285
+
286
+ ### The CLI
287
+
288
+ Everything above, without opening Python — installed with the package:
289
+
290
+ ```bash
291
+ shadowlm finetune data.jsonl --model Qwen/Qwen2.5-0.5B-Instruct --method lora
292
+ shadowlm runs # run history: status, steps, losses, duration
293
+ shadowlm plot <run-id> # terminal loss charts for any recorded run
294
+ shadowlm chat out/adapter/ # talk to what you trained (base model auto-resolved)
295
+ shadowlm methods # the registered methods, defaults included
296
+ ```
297
+
298
+ Every `TrainConfig` hyperparameter is a flag (`--max-steps`, `--lora-r`,
299
+ `--num-train-epochs`, …) — generated from the dataclass, so the CLI can't
300
+ drift from the SDK. `shadowlm finetune --help` lists them all.
301
+
302
+ ### CUDA box
303
+
304
+ ```python
305
+ model = slm.load("Qwen/Qwen2.5-0.5B-Instruct", backend="torch",
306
+ accelerator="shadow", load_in_4bit=True)
307
+ run = model.finetune(ds, method="qlora", max_steps=60)
308
+ model.save("out/", fmt="merged")
309
+ ```
310
+
311
+ ## The shadow accelerator
312
+
313
+ `accelerator="shadow"` is ShadowLM's in-house optimization layer. It sits on top of
314
+ whichever backend is active and turns on the speed/memory optimizations that are
315
+ *safe for the current model and hardware*:
316
+
317
+ - gradient checkpointing (trade compute for VRAM on bigger models)
318
+ - flash-attention-2 (on CUDA, when available)
319
+ - a fused optimizer
320
+
321
+ Modes: `"auto"` (default — enable what helps at the current size), `"shadow"`
322
+ (force all on), `"none"` (off). It is honest — it logs exactly what it enabled and
323
+ no-ops when an optimization wouldn't help.
324
+
325
+ ## Training parameters
326
+
327
+ `finetune(**hyperparams)` accepts the full `TrainConfig` surface:
328
+
329
+ - **adapters** — `lora_r`, `lora_alpha`, `lora_dropout`, `target_modules`
330
+ (`"all"` / `"attention"` / `"mlp"` presets, or explicit names), `use_rslora`*
331
+ - **optimization** — `learning_rate` (default per method), `per_device_train_batch_size`,
332
+ `gradient_accumulation_steps`, `warmup_steps` / `warmup_ratio`, `max_steps` /
333
+ `num_train_epochs`, `weight_decay`, `max_grad_norm`*, `lr_scheduler_type`
334
+ (linear / cosine / constant — real schedules on both backends), `optim`*, `seed`
335
+ - **data** — `max_seq_length`, `packing`*, `train_on_completions` (mask the prompt,
336
+ learn only on responses — mlx; torch masks via prompt/completion data automatically)
337
+ - **logging / checkpoints** — `logging_steps`, `eval_steps` (int, or a 0–1 fraction
338
+ of total steps), `save_steps` (mid-run checkpoints), `resume_from_checkpoint`,
339
+ `report_to`*
340
+
341
+ \* torch-backend only; the mlx backend logs a note instead of silently ignoring.
342
+
343
+ ## API surface
344
+
345
+ | Call | What it does |
346
+ |------|--------------|
347
+ | `slm.Dataset.load(path)` | any supported file by extension (.jsonl/.json/.csv/.parquet) |
348
+ | `slm.Dataset.from_jsonl / from_csv / from_json / from_parquet / from_list` | format auto-detected: ChatML (`messages`), ShareGPT (`conversations`), alpaca instruction, raw text — or force with `format=` |
349
+ | `slm.Dataset.from_hf(repo, subset=, split=, token=)` | HuggingFace Hub datasets |
350
+ | `ds.as_chat()` / `ds.as_text()` | force chat or raw-text format |
351
+ | `ds.split(test_size=0.1, seed=0)` | held-out train/eval split → `(train, eval)` |
352
+ | `ds[0:100]`, `ds.head()`, `ds.columns`, `len(ds)` | row slicing & inspection |
353
+ | `slm.load(name, backend=, accelerator=, device=, load_in_4bit=, adapter=)` | load a model (or attach a trained adapter) |
354
+ | `model.finetune(ds, method=<any of 12 — see Training methods>, eval_dataset=ds\|"auto", reward_fns=, on_step=, on_eval=, **hyperparams)` | train; returns a `TrainingRun` (`eval_dataset="auto"` holds out 10%) |
355
+ | `model.generate(prompt, ...)` | single-prompt inference |
356
+ | `model.chat(messages, tools=...)` → `Reply` | multi-turn chat via the model's chat template; OpenAI-style tool schemas in, parsed `reply.tool_calls` out |
357
+ | `model.save(path, fmt="adapter"\|"merged")` | export |
358
+ | `run.loss`, `run.eval_loss`, `run.step`, `run.progress`, `run.sparkline()`, `run.checkpoint` | live + final run state |
359
+ | `slm.runs.list() / latest() / load(id) / delete(id)` | run history — every finetune persists a `run.json` (status, config, metrics) |
360
+ | `run.plot("loss"\|"eval_loss"\|"lr"\|"grad_norm", smooth=, window=, log=, clip=)` | terminal charts — raw dots + EMA overlay, view window, log scale, p95/p99 clip |
361
+ | `run.series(name)`, `run.smoothed(weight)` | raw (steps, values) series + EMA — the data feed for any UI chart |
362
+
363
+ Every run records itself — `succeeded`, `failed` (with the error), or `stopped`
364
+ (Ctrl-C) — so history survives the process. Resume any recorded run with
365
+ `model.finetune(ds, resume_from_checkpoint=run.checkpoint)`; pass `save_steps=N`
366
+ to keep mid-run checkpoints so even interrupted runs are resumable.
367
+
368
+ Pass `on_step` / `on_eval` to `finetune` to stream `Metric(step, loss, lr, ...)`
369
+ as training happens — that's the hook ShadowLM Studio's live charts will use.
370
+
371
+ ### Train / eval split
372
+
373
+ Hold out a validation set so you can see overfitting, not just training loss:
374
+
375
+ ```python
376
+ train, val = slm.Dataset.from_jsonl("data.jsonl").split(test_size=0.2)
377
+ run = model.finetune(train, eval_dataset=val, eval_steps=10, max_steps=40)
378
+
379
+ print(run.loss) # final train loss
380
+ print(run.eval_loss) # final held-out eval loss
381
+ print([(m.step, m.loss) for m in run.eval_metrics])
382
+ # e.g. (0, 4.02) (10, 1.62) (20, 0.83) (30, 0.92) (40, 1.09)
383
+ # ^ eval bottoms out, then rises = overfitting
384
+ ```
385
+
386
+ Eval runs on both backends (mlx `val_dataset`; torch `eval_strategy="steps"`).
387
+
388
+ ### Tool calling
389
+
390
+ Both ends of function calling work. **Training:** chat rows may carry
391
+ `tool_calls` messages and a per-row `tools` list of schemas — they're rendered
392
+ through the model's chat template (ShareGPT rows keep their `tools` through
393
+ conversion). **Inference:**
394
+
395
+ ```python
396
+ reply = model.chat(messages, tools=[{"type": "function", "function": {...}}])
397
+ reply.tool_calls # [{"name": "get_weather", "arguments": {...}}]
398
+ messages.append(reply.to_message())
399
+ messages.append({"role": "tool", "content": json.dumps(result)})
400
+ final = model.chat(messages, tools=tools) # uses the tool result
401
+ ```
402
+
403
+ ## Layout
404
+
405
+ ```
406
+ shadowlm/
407
+ __init__.py public surface: load, Dataset, TrainingRun, Metric, TrainConfig
408
+ data.py Dataset — load + format detection + chat normalization
409
+ training.py TrainConfig, Metric, TrainingRun (sparkline, progress)
410
+ models.py Model (finetune / generate / save) and load()
411
+ runs.py run history — list / load / resume / delete past runs
412
+ accel.py the shadow accelerator — optimization planning
413
+ more.py mixture of retrieval experts (index + attention fusion)
414
+ bottleneck.py Houlsby-style bottleneck adapters
415
+ rl.py Trajectory, TrajectoryGroup, judge rewards
416
+ capture.py OpenAI-compatible capture proxy — record any harness
417
+ cli.py the `shadowlm` command — finetune/runs/plot/chat/methods
418
+ methods/ training techniques — one module per method
419
+ base.py TrainingMethod spec + registry
420
+ lora qlora dora full cpt dpo grpo more bitfit soft_prompt ptuning adapter
421
+ backends/
422
+ base.py Backend interface + Callbacks bridge
423
+ mlx.py MLXBackend — Apple Silicon (Metal GPU)
424
+ torch.py TorchBackend — PyTorch (CUDA / CPU)
425
+ examples/
426
+ quickstart.py datasets → finetune → inference, end to end
427
+ train_eval_split.py held-out validation + overfitting signal
428
+ infer_adapter.py train → save → reload adapter in a fresh model → infer
429
+ dpo_preferences.py preference pairs → style transfer on unseen prompts
430
+ grpo_rewards.py RL from programmable reward functions
431
+ judge_rewards.py LLM-as-judge rewards → preference pairs → DPO
432
+ tool_calling.py tool schemas in, parsed calls out, tool loop, training
433
+ runs_and_charts.py run history + terminal loss/LR/eval charts
434
+ harness_capture.py record a black-box agent through the proxy, then train
435
+ colab_quickstart.ipynb the full tour on a Colab GPU
436
+ colab_gpu_tests.ipynb CUDA verification suite (method × precision matrix)
437
+ retrieval_experts.py mixture of retrieval experts — exact fact recall
438
+ sample_dataset.jsonl
439
+ tests/
440
+ gpu/test_cuda.py CUDA verification — every method × every legal precision,
441
+ each cell: train → reload → generate → continue training
442
+ ```
443
+
444
+ ## The road ahead
445
+
446
+ The SDK is the core, and it ships first. Everything that follows wraps this
447
+ exact API — nothing gets reimplemented.
448
+
449
+ ### ShadowLM Studio
450
+
451
+ The multi-user destination: a web service and remote-GPU workers wrapping this
452
+ SDK. Studio runs the enterprise migration loop end to end — baseline on the
453
+ rented frontier model → collect & fine-tune → **shadow mode** (your model runs
454
+ behind the same agent until it's proven) → gradual switch.
455
+
456
+ - **Job queue → CUDA workers** — submit from the browser or the SDK, train on
457
+ the GPU pool; the torch backend is already the production path.
458
+ - **Live training charts** — streamed over the `on_step` / `on_eval` hooks that
459
+ exist today; `run.series()` is the data feed.
460
+ - **Team run history** — the `run.json` records every finetune already writes,
461
+ made shared and searchable.
462
+ - **Dataset + adapter registry** — upload, version, and one-click attach what
463
+ the SDK's `Dataset` and `load(adapter=)` already understand.
464
+ - **Eval gates** — advance traffic only when quality holds and the savings beat
465
+ the cost: task-level evals and cost-per-task, built on the SDK's run records.
466
+
467
+ Current status:
468
+
469
+ - [x] SDK: datasets → finetune → inference on mlx / torch
470
+ - [x] 12 training methods incl. MoRE, trajectory GRPO, judge rewards
471
+ - [x] Train/eval split with held-out validation loss
472
+ - [x] Shadow accelerator (gradient checkpointing, flash-attn, fused optim)
473
+ - [x] Harness capture proxy — OpenAI-compatible, SSE streaming, trajectory
474
+ reconstruction
475
+ - [x] ShadowLM CLI — finetune / runs / plot / chat / methods from the shell
476
+ - [ ] ShadowLM Studio
477
+
478
+ ## Contributing
479
+
480
+ Adding a training method is one file (see [Bring your own method](#bring-your-own-method));
481
+ bug reports with a failing snippet are gold. Fork → branch → PR. Give the repo a
482
+ ⭐ if it trains something for you — it genuinely helps others find it.
483
+
484
+ ## Star history
485
+
486
+ [![Star History Chart](https://api.star-history.com/svg?repos=open-gitagent/shadowLM&type=Date)](https://star-history.com/#open-gitagent/shadowLM&Date)
487
+
488
+ ## License
489
+
490
+ [MIT](./LICENSE) — built by [Lyzr Research Labs](https://lyzr.ai) · maintained by
491
+ [Khush Patel](mailto:khush@lyzr.ai) · `slm♥`