@mcptoolshop/backpropagate 1.1.1 → 1.3.0
- package/README.es.md +217 -211
- package/README.fr.md +216 -210
- package/README.hi.md +216 -210
- package/README.it.md +221 -215
- package/README.ja.md +221 -215
- package/README.md +216 -210
- package/README.pt-BR.md +215 -209
- package/README.zh.md +215 -209
- package/bin/backpropagate.js +29 -196
- package/package.json +2 -5
package/README.md
CHANGED
|
@@ -10,144 +10,152 @@
|
|
|
10
10
|
<a href="https://github.com/mcp-tool-shop-org/backpropagate/actions/workflows/ci.yml"><img src="https://github.com/mcp-tool-shop-org/backpropagate/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
|
|
11
11
|
<a href="https://pypi.org/project/backpropagate/"><img src="https://img.shields.io/pypi/v/backpropagate" alt="PyPI"></a>
|
|
12
12
|
<a href="https://codecov.io/gh/mcp-tool-shop-org/backpropagate"><img src="https://img.shields.io/codecov/c/github/mcp-tool-shop-org/backpropagate" alt="Coverage"></a>
|
|
13
|
+
<a href="https://scorecard.dev/viewer/?uri=github.com/mcp-tool-shop-org/backpropagate"><img src="https://api.scorecard.dev/projects/github.com/mcp-tool-shop-org/backpropagate/badge" alt="OpenSSF Scorecard"></a>
|
|
13
14
|
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License"></a>
|
|
14
15
|
<a href="https://mcp-tool-shop-org.github.io/backpropagate/"><img src="https://img.shields.io/badge/Landing_Page-live-blue" alt="Landing Page"></a>
|
|
15
16
|
</p>
|
|
16
17
|
|
|
17
|
-
|
|
18
|
+
# Train an adapter. Ship it to Ollama. Move on.
|
|
18
19
|
|
|
19
|
-
|
|
20
|
+
Backpropagate is a Python library for fine-tuning large language models on a single GPU. Three lines of code train a 7B model on a 16GB card. One more command exports it to Ollama so you can `ollama run` your finetune. Works first-class on Windows.
|
|
20
21
|
|
|
21
|
-
|
|
22
|
+
```python
|
|
23
|
+
from backpropagate import Trainer
|
|
22
24
|
|
|
23
|
-
|
|
25
|
+
trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
|
|
26
|
+
trainer.train("my_data.jsonl", steps=100)
|
|
27
|
+
trainer.export("gguf", quantization="q4_k_m")
|
|
28
|
+
```
|
|
24
29
|
|
|
25
30
|
```bash
|
|
26
|
-
|
|
31
|
+
backprop ollama register ./output/lora --name my-model
|
|
32
|
+
ollama run my-model
|
|
27
33
|
```
|
|
28
34
|
|
|
29
|
-
|
|
30
|
-
from backpropagate import Trainer
|
|
35
|
+
That's it. There's no YAML config file. There's no `accelerate launch` ceremony. There's no separate "now convert it to GGUF" tutorial. If you have a CUDA GPU and a JSONL file with your training data, you're three lines away from a working finetune.
|
|
31
36
|
|
|
32
|
-
|
|
33
|
-
trainer.train("examples/quickstart.jsonl", steps=10)
|
|
34
|
-
trainer.export("gguf", quantization="q4_k_m") # Ready for Ollama
|
|
35
|
-
```
|
|
37
|
+
## Install
|
|
36
38
|
|
|
37
|
-
|
|
39
|
+
```bash
|
|
40
|
+
# Recommended: isolated Python install (no conflicts with system Python or other projects)
|
|
41
|
+
pipx install backpropagate
|
|
42
|
+
|
|
43
|
+
# Or via uv (faster install, same isolation)
|
|
44
|
+
uv tool install backpropagate
|
|
38
45
|
|
|
39
|
-
|
|
46
|
+
# Standard pip (if you manage your own virtualenv)
|
|
47
|
+
pip install backpropagate
|
|
48
|
+
```
|
|
40
49
|
|
|
41
|
-
|
|
50
|
+
If you want the optional features, swap the install for one of these:
|
|
42
51
|
|
|
43
52
|
```bash
|
|
44
|
-
|
|
45
|
-
|
|
53
|
+
pipx install "backpropagate[standard]" # adds Unsloth (2x faster training) + the web UI
|
|
54
|
+
pipx install "backpropagate[full]" # adds everything: unsloth, ui, monitoring, export, etc.
|
|
46
55
|
```
|
|
47
56
|
|
|
48
|
-
|
|
57
|
+
Prefer Docker? `docker pull ghcr.io/mcp-tool-shop-org/backpropagate:latest` works too. Images ship for both `linux/amd64` and `linux/arm64`, so Apple Silicon and ARM Linux operators get a native image. A canonical `compose.yaml` for "UI in a container" lives at the repo root — `docker compose up` brings the web UI up on `http://localhost:7860` with a persistent `~/.backpropagate` volume mount.
|
|
49
58
|
|
|
50
|
-
##
|
|
59
|
+
## Where Backpropagate sits in the space
|
|
51
60
|
|
|
52
|
-
|
|
61
|
+
There are several good libraries for fine-tuning LLMs. They're each great at different things:
|
|
53
62
|
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
63
|
+
- **[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)** — if you like YAML configs and want a community of recipes to copy from
|
|
64
|
+
- **[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)** — if you want a web GUI and built-in support for DPO/PPO/RLHF
|
|
65
|
+
- **[Unsloth](https://github.com/unslothai/unsloth)** — if you need the fastest possible training and you're on a supported model family
|
|
66
|
+
- **[torchtune](https://github.com/pytorch/torchtune)** — if you want Meta's first-party PyTorch-native recipes you can edit
|
|
58
67
|
|
|
59
|
-
|
|
68
|
+
Backpropagate is the missing option: **a 3-line Python API for solo operators on a single consumer GPU who want to train an adapter and ship it.** No YAML, no GUI, no DPO/PPO, no multi-node. Just the training loop everyone actually needs, plus the export step that usually gets in the way.
|
|
60
69
|
|
|
61
|
-
|
|
70
|
+
If you tried one of the libraries above and bounced off the config-file ceremony, or hit a model-family gap, or wanted Windows-first defaults — Backpropagate is for you.
|
|
62
71
|
|
|
63
|
-
|
|
64
|
-
|---------|----------|
|
|
65
|
-
| Fine-tuning is complex | 3 lines: load, train, save |
|
|
66
|
-
| Windows is a nightmare | First-class Windows support |
|
|
67
|
-
| VRAM management is hard | Auto batch sizing, GPU monitoring |
|
|
68
|
-
| Model export is confusing | One-click GGUF + Ollama registration |
|
|
69
|
-
| Long runs cause forgetting | Multi-run SLAO training |
|
|
72
|
+
## What you can fine-tune on a 16GB consumer GPU
|
|
70
73
|
|
|
71
|
-
|
|
74
|
+
Here's the practical envelope on a 16GB card (RTX 4080 / 5080 / 4070 Ti Super):
|
|
72
75
|
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
-
|
|
76
|
-
-
|
|
77
|
-
-
|
|
78
|
-
-
|
|
76
|
+
| Model | Method | Status |
|
|
77
|
+
|---|---|---|
|
|
78
|
+
| Qwen-3.5-4B / Phi-4-mini-3.8B / SmolLM3-3B | LoRA / QLoRA / DoRA | Comfortable. Full sequence length, room to spare. |
|
|
79
|
+
| Qwen-2.5-7B / Llama-3.1-8B / Mistral-7B | QLoRA | Standard. ~7-8 GB. Backpropagate's default presets. |
|
|
80
|
+
| Llama-3 13B | QLoRA + sample packing | Tight but works. Use shorter sequences. |
|
|
81
|
+
| Mixtral 8x7B (47B total parameters) | AQLM 2-bit + LoRA | Experimental in v1.4. The largest model you can touch on a 16GB card. |
|
|
79
82
|
|
|
80
|
-
|
|
83
|
+
For models 3B and smaller, full fine-tuning (not just LoRA) is feasible on 16GB and is planned as a `mode="full"` option for v1.4. For 7B+ models, full fine-tuning needs a 24GB+ GPU — consider an A100 cloud rental, or stick with LoRA, which recent research shows matches full fine-tuning quality on most post-training tasks anyway (see [the anti-pitch section](#what-backpropagate-is-not-for) for citations).
|
|
81
84
|
|
|
82
|
-
|
|
83
|
-
pip install backpropagate # Core only (minimal)
|
|
84
|
-
pip install backpropagate[unsloth] # + Unsloth 2x faster training
|
|
85
|
-
pip install backpropagate[ui] # + Reflex (Radix UI) web interface
|
|
86
|
-
pip install backpropagate[standard] # unsloth + ui (recommended)
|
|
87
|
-
pip install backpropagate[full] # Everything
|
|
88
|
-
```
|
|
85
|
+
## What Backpropagate is NOT for
|
|
89
86
|
|
|
90
|
-
|
|
91
|
-
|-------|-------------|--------------|
|
|
92
|
-
| `unsloth` | 2x faster training, 50% less VRAM | unsloth |
|
|
93
|
-
| `ui` | Reflex (Radix UI) web interface | reflex>=0.9.2, fastapi>=0.115 |
|
|
94
|
-
| `validation` | Pydantic config validation | pydantic, pydantic-settings |
|
|
95
|
-
| `export` | GGUF export for Ollama | llama-cpp-python |
|
|
96
|
-
| `monitoring` | WandB + system monitoring (auto-wired into trainer in v1.1.0) | wandb, psutil |
|
|
97
|
-
| `observability` | OpenTelemetry tracing | opentelemetry-api, opentelemetry-sdk |
|
|
98
|
-
| `logging` | Structured logging | structlog |
|
|
99
|
-
| `security` | JWT auth + token generation | PyJWT, cryptography |
|
|
100
|
-
| `production` | unsloth + ui + validation + logging + security | (bundle) |
|
|
87
|
+
Honest scope helps everyone. Backpropagate doesn't do these things, and trying to make it do them would be a worse experience than reaching for the right tool:
|
|
101
88
|
|
|
102
|
-
**
|
|
89
|
+
- **Full-parameter fine-tuning of 7B+ models** — Backpropagate uses LoRA / QLoRA, which trains a small adapter rather than updating every weight. For models 7B and larger, full fine-tuning needs 24GB+ of GPU memory and doesn't fit on a 16GB consumer card. For models 3B and smaller, full fine-tuning is feasible on 16GB; a `mode="full"` option is planned for v1.4. The bigger picture: recent research ([Biderman 2024](https://arxiv.org/abs/2405.09673), [Thinking Machines 2025](https://thinkingmachines.ai/blog/lora/)) shows that a correctly configured LoRA matches full fine-tuning quality on most post-training tasks (instruction-following, domain adaptation, persona/style) at 67% of the compute. For the work most operators actually want, you don't lose anything by sticking with LoRA. If you genuinely need full fine-tuning of a 7B+ model, use HuggingFace `transformers.Trainer` directly on a 24GB+ card.
|
|
90
|
+
- **DPO / PPO / GRPO / preference tuning** — Backpropagate does single-stage supervised fine-tuning only. For preference learning, use TRL directly or LLaMA-Factory.
|
|
91
|
+
- **Multi-node training** — Backpropagate officially supports a single GPU on one machine. Multi-GPU on one machine works (via `accelerate launch`) but isn't officially supported, and multi-node is out of scope.
|
|
92
|
+
- **macOS training** — Apple Silicon doesn't have CUDA, so training has to run on a Linux or Windows box with an NVIDIA GPU. You can still run the trained model on a Mac via Ollama.
|
|
93
|
+
- **Anything outside the tested model families** — the tested families are Qwen 2.5 / 3.5 (7B / 4B), Phi-4-mini-3.8B, SmolLM3-3B, Llama 3.2 (3B / 1B), and Mistral 7B. Other models often work but aren't pinned in CI.
|
|
103
94
|
|
|
104
|
-
|
|
95
|
+
If you need any of those things, reach for one of the libraries listed above. They're better at them.
|
|
105
96
|
|
|
106
|
-
|
|
97
|
+
## What Backpropagate gives you
|
|
107
98
|
|
|
108
|
-
|
|
109
|
-
- **Windows.** Visual Studio Build Tools (C++) and CMake are required for the `[export]` extra (`llama-cpp-python` builds from source). `bitsandbytes` wheel is published for Windows natively now (>= 0.43); older guides mentioning `bitsandbytes-windows` are stale.
|
|
110
|
-
- **macOS.** GPU training is **not supported** — no CUDA. You can install Backpropagate to run *inference* on an exported GGUF via Ollama, but `trainer.train()` raises `DEP_GPU_NOT_AVAILABLE`. Use a CUDA machine for training.
|
|
111
|
-
- **Linux.** Most distros work out of the box. If you're using the PyPI binary release, note that the Linux build uses CPU-only torch (to stay under GitHub's 2 GB release-asset cap); install with the matching CUDA wheel from pytorch.org first.
|
|
99
|
+
Four things, in one install:
|
|
112
100
|
|
|
113
|
-
|
|
101
|
+
**1. A real 3-line API that runs without a config file.**
|
|
102
|
+
The snippet at the top of this README runs end-to-end. No `accelerate config`, no YAML, no Hydra overrides. Just `Trainer(model).train(data)` and you have a finetune.
|
|
114
103
|
|
|
115
|
-
|
|
104
|
+
**2. Windows that actually works.**
|
|
105
|
+
Most ML libraries treat Windows like an afterthought. Backpropagate is tested first-class on Windows + RTX 5080. The library handles the runtime quirks for you — it knows how to pre-tokenize your data so Windows multiprocessing doesn't crash, it automatically disables xformers on RTX 40/50 cards where it would break, and it picks dataloader settings that don't blow up. You don't have to know any of this. It just runs.
|
|
116
106
|
|
|
117
|
-
|
|
107
|
+
**3. Built for unattended runs.**
|
|
108
|
+
Training takes hours. You don't want to babysit it. Backpropagate is designed to be left running:
|
|
118
109
|
|
|
119
|
-
|
|
110
|
+
- If you run out of GPU memory, it automatically halves the batch size and retries — up to three times. No hand-tuning.
|
|
111
|
+
- If your GPU gets too hot, it pauses until things cool down and then continues.
|
|
112
|
+
- Every checkpoint is written atomically — if your laptop crashes mid-save, the previous good checkpoint is still intact.
|
|
113
|
+
- Every training run gets a unique ID that's stamped onto every log line, every checkpoint, and every Weights & Biases entry. If something goes wrong, one ID lets a maintainer correlate everything.
|
|
114
|
+
- Errors come with stable codes (`RUNTIME_GPU_OOM`, `DEP_OLLAMA_REGISTRATION_FAILED`, etc.) so you can grep your logs and the [troubleshooting guide](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting/) for the fix. CUDA-specific failures have a dedicated [CUDA troubleshooting page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting-cuda/).
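The halve-and-retry behavior in the first bullet can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Backpropagate's implementation: `OutOfMemory`, `run_with_oom_backoff`, and `fake_step` are hypothetical names, and a real trainer would catch the CUDA out-of-memory error raised by PyTorch instead of this stand-in exception.

```python
from typing import Callable


class OutOfMemory(RuntimeError):
    """Stand-in for a GPU out-of-memory error (hypothetical, for illustration)."""


def run_with_oom_backoff(
    train_step: Callable[[int], float],
    batch_size: int,
    max_retries: int = 3,
) -> tuple[float, int]:
    """Call train_step(batch_size); on OOM, halve the batch size and retry.

    Returns (result, batch_size_that_worked). Re-raises once max_retries
    halvings have been used, or when the batch size cannot shrink further.
    """
    retries = 0
    while True:
        try:
            return train_step(batch_size), batch_size
        except OutOfMemory:
            if retries >= max_retries or batch_size <= 1:
                raise
            batch_size = max(1, batch_size // 2)
            retries += 1


def fake_step(batch_size: int) -> float:
    # Pretend anything above 2 samples per batch does not fit in memory.
    if batch_size > 2:
        raise OutOfMemory(f"batch_size={batch_size} does not fit")
    return 0.42


loss, used_batch_size = run_with_oom_backoff(fake_step, batch_size=8)
print(loss, used_batch_size)
```

Starting from a batch size of 8, the sketch fails at 8 and 4 and succeeds at 2. Capping the number of halvings keeps a run that can never fit from looping forever.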
|
|
120
115
|
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
| `BACKPROPAGATE_LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
|
|
124
|
-
| `BACKPROPAGATE_LOG_JSON` | auto | Force JSON (`true`) or console (`false`) logs |
|
|
125
|
-
| `BACKPROPAGATE_LOG_FILE` | unset | Path to mirror logs into |
|
|
126
|
-
| `BACKPROPAGATE_DEFER_FEATURE_DETECTION` | unset | Skip optional-dep detection at startup for the fastest CLI cold start |
|
|
127
|
-
| `BACKPROPAGATE_SECURITY__REQUIRE_AUTH_FOR_SHARE` | `true` | When `true`, refuses `backprop ui --share` without `--auth` |
|
|
128
|
-
| `BACKPROPAGATE_UI__OUTPUT_DIR` | `~/.backpropagate/ui-outputs` | Sandbox base for all UI filesystem writes; denylist-validated |
|
|
129
|
-
| `BACKPROPAGATE_MODEL__NAME` | `Qwen/Qwen2.5-7B-Instruct` | Default model |
|
|
130
|
-
| `BACKPROPAGATE_TRAINING__LEARNING_RATE` | `2e-4` | Learning rate |
|
|
131
|
-
| `BACKPROPAGATE_LORA__R` | `16` | LoRA rank |
|
|
116
|
+
**4. One command from trained adapter to `ollama run`.**
|
|
117
|
+
Lots of libraries train a model. Few of them get out of your way when you want to actually use it. Backpropagate exports to GGUF (the format Ollama uses) and registers an Ollama model in one command. You go from "training done" to "I can chat with my finetune" in about 30 seconds.
|
|
132
118
|
|
|
133
|
-
|
|
119
|
+
## Quick Start
|
|
134
120
|
|
|
135
|
-
|
|
121
|
+
The repo ships a tiny example dataset so the snippet from the top of this README runs on a clean install:
|
|
136
122
|
|
|
137
|
-
|
|
123
|
+
```bash
|
|
124
|
+
pipx install "backpropagate[standard]"
|
|
138
125
|
|
|
139
|
-
|
|
126
|
+
python -c "
|
|
140
127
|
from backpropagate import Trainer
|
|
128
|
+
trainer = Trainer('Qwen/Qwen2.5-7B-Instruct')
|
|
129
|
+
trainer.train('examples/quickstart.jsonl', steps=10)
|
|
130
|
+
trainer.export('gguf', quantization='q4_k_m')
|
|
131
|
+
"
|
|
132
|
+
```
|
|
141
133
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
134
|
+
This trains a Qwen 2.5 7B adapter on 5 short ShareGPT-format conversations, then exports the result to GGUF. For your own data, format your JSONL one example per line:
|
|
135
|
+
|
|
136
|
+
```jsonl
|
|
137
|
+
{"conversations": [{"from": "human", "value": "What is Python?"}, {"from": "gpt", "value": "A programming language."}]}
|
|
138
|
+
{"conversations": [{"from": "human", "value": "Explain recursion."}, {"from": "gpt", "value": "A function that calls itself."}]}
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
Alpaca (`instruction` / `output`), OpenAI chat (`messages`), and raw text formats also work — Backpropagate auto-detects the format.
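If your data starts out as prompt/reply pairs in Python, a small helper can produce the ShareGPT-style file shown above. This is a sketch using only the standard library; `write_sharegpt_jsonl` is a hypothetical helper name, not part of Backpropagate.

```python
import json
import tempfile
from pathlib import Path


def write_sharegpt_jsonl(pairs, path):
    """Write (prompt, reply) pairs as one ShareGPT-style JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt, reply in pairs:
            record = {
                "conversations": [
                    {"from": "human", "value": prompt},
                    {"from": "gpt", "value": reply},
                ]
            }
            # ensure_ascii=False keeps non-English text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


pairs = [
    ("What is Python?", "A programming language."),
    ("Explain recursion.", "A function that calls itself."),
]
# Written to the temp directory here; point it at your own path in practice.
out = write_sharegpt_jsonl(pairs, Path(tempfile.gettempdir()) / "my_data.jsonl")
print(out)
```

Using `json.dumps` per record, rather than building strings by hand, guarantees that quotes and newlines inside a prompt are escaped and each example stays on one line.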
|
|
142
|
+
|
|
143
|
+
For more end-to-end workflows (fine-tune-and-push-to-HF-Hub, resume after OOM, multi-run SLAO across a long campaign, etc.) see the [handbook recipes page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/recipes/).
|
|
144
|
+
|
|
145
|
+
### Web UI (optional)
|
|
146
|
+
|
|
147
|
+
If you'd rather click than type Python, install the UI extra and launch:
|
|
148
|
+
|
|
149
|
+
```bash
|
|
150
|
+
pipx install "backpropagate[ui]"
|
|
151
|
+
backprop ui --port 7862
|
|
146
152
|
```
|
|
147
153
|
|
|
148
|
-
|
|
154
|
+
A local web interface opens at `http://localhost:7862` where you can point at a dataset, pick a model, train, and export. The UI is local-only by default. To expose it to other devices, see [Web UI](#web-ui) below for the `--share` + `--auth` security contract.
|
|
155
|
+
|
|
156
|
+
## Multi-run training
|
|
149
157
|
|
|
150
|
-
|
|
158
|
+
If you want to fine-tune incrementally across multiple datasets — say you get new training data every week and want to add it without forgetting what you learned before — Backpropagate's `multi_run` mode is for you:
|
|
151
159
|
|
|
152
160
|
```python
|
|
153
161
|
from backpropagate import Trainer
|
|
@@ -159,198 +167,196 @@ result = trainer.multi_run(
|
|
|
159
167
|
num_runs=5,
|
|
160
168
|
steps_per_run=100,
|
|
161
169
|
samples_per_run=1000,
|
|
162
|
-
merge_mode="slao", # Single LoRA Continual Learning via Asymmetric Merging
|
|
163
170
|
)
|
|
164
171
|
```
|
|
165
172
|
|
|
166
|
-
|
|
173
|
+
This runs five training passes, merging the adapter between runs in a way that preserves earlier knowledge while incorporating new examples. The technique is based on recent continual-learning research — see [References](#references) at the bottom of this README.
|
|
167
174
|
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
```python
|
|
171
|
-
# Export to GGUF
|
|
172
|
-
result = trainer.export("gguf", quantization="q4_k_m")
|
|
173
|
-
|
|
174
|
-
# Register with Ollama separately
|
|
175
|
-
from backpropagate import register_with_ollama
|
|
176
|
-
register_with_ollama(result.path, "my-finetuned-model")
|
|
177
|
-
# ollama run my-finetuned-model
|
|
178
|
-
```
|
|
179
|
-
|
|
180
|
-
### CLI
|
|
175
|
+
The CLI version:
|
|
181
176
|
|
|
182
177
|
```bash
|
|
183
|
-
backprop
|
|
184
|
-
backprop multi-run --data my_data.jsonl --runs 5 --steps 100
|
|
185
|
-
backprop export ./output/lora --format gguf --quantization q4_k_m --ollama --ollama-name my-model
|
|
186
|
-
backprop ui --port 7862
|
|
187
|
-
backprop info
|
|
188
|
-
backprop list-runs # v1.1.0: query past training runs
|
|
189
|
-
backprop show-run <run-id> # v1.1.0: detail view
|
|
190
|
-
backprop resume <run-id> # v1.1.0: resume a crashed multi-run
|
|
191
|
-
backprop push ./output/lora --repo me/my-model # v1.1.0: push adapter to HF Hub
|
|
178
|
+
backprop multi-run --data my_data.jsonl --runs 5 --steps 100 --samples 1000
|
|
192
179
|
```
|
|
193
180
|
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
### Resume from checkpoint (v1.1.0)
|
|
181
|
+
## Resume from checkpoint
|
|
197
182
|
|
|
198
|
-
A 5-run
|
|
183
|
+
A 5-run training session that crashes at run 4 is recoverable. Every multi-run session writes its run ID into the on-disk history and checkpoint manifest, so picking up where you left off is one command:
|
|
199
184
|
|
|
200
185
|
```bash
|
|
201
|
-
backprop resume <run-id>
|
|
202
|
-
backprop multi-run --data ... --resume <run-id>
|
|
203
|
-
backprop train --data ... --resume <run-id>
|
|
186
|
+
backprop resume <run-id>
|
|
187
|
+
backprop multi-run --data ... --resume <run-id>
|
|
188
|
+
backprop train --data ... --resume <run-id> # single-run resume
|
|
204
189
|
```
|
|
205
190
|
|
|
206
|
-
The default behavior of `backprop multi-run` (no `--resume`) auto-detects an in-progress entry
|
|
207
|
-
|
|
208
|
-
When a multi-run resumes, the latest checkpoint for that run_id is loaded into the model, the SLAO merger state is restored from `slao/` next to the checkpoint, and the run loop continues from `last_completed_run + 1`. The history entry's `status` flips back to `running` so `backprop list-runs --status running` shows the live session.
|
|
191
|
+
The default behavior of `backprop multi-run` (no `--resume`) auto-detects an in-progress entry in the same output directory and continues it. To force a clean start, point at a fresh output directory.
|
|
209
192
|
|
|
210
|
-
|
|
193
|
+
## Training history
|
|
211
194
|
|
|
212
|
-
`
|
|
195
|
+
Every `backprop train` and `backprop multi-run` invocation records a row in `<output>/run_history.json` — model used, dataset, hyperparameters, status, final loss, loss history. You can list and inspect past runs:
|
|
213
196
|
|
|
214
197
|
```bash
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
backprop
|
|
198
|
+
backprop list-runs # last 20 runs
|
|
199
|
+
backprop list-runs --status failed # filter by status
|
|
200
|
+
backprop list-runs --json --limit 100 # machine-readable
|
|
201
|
+
backprop show-run abcd1234 # detail view (partial ID is fine)
|
|
218
202
|
```
|
|
219
203
|
|
|
220
|
-
|
|
221
|
-
|
|
222
|
-
### Training history
|
|
204
|
+
## Experiment tracking
|
|
223
205
|
|
|
224
|
-
|
|
206
|
+
Backpropagate auto-detects installed experiment trackers (Weights & Biases, TensorBoard, MLflow) and wires them in. If `wandb` is installed and you're logged in, every run automatically logs to W&B with a run name that matches the on-disk run ID — so you can grep across W&B, your logs, and `run_history.json` using one identifier.
|
|
225
207
|
|
|
226
208
|
```bash
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
backprop
|
|
230
|
-
backprop show-run abcd1234 # detail view (partial run_id ok)
|
|
209
|
+
pip install "backpropagate[monitoring]"   # installs wandb + psutil
|
|
210
|
+
wandb login # one-time setup
|
|
211
|
+
backprop train --data my_data.jsonl
|
|
231
212
|
```
|
|
232
213
|
|
|
233
|
-
|
|
214
|
+
Override with `Trainer(report_to=["wandb"])`, `Trainer(report_to=["tensorboard"])`, or `Trainer(report_to="none")` to opt out.
|
|
234
215
|
|
|
235
|
-
|
|
216
|
+
## Web UI
|
|
236
217
|
|
|
237
|
-
|
|
218
|
+
The Reflex web interface is opt-in — install with `pipx install "backpropagate[ui]"` and launch:
|
|
238
219
|
|
|
239
220
|
```bash
|
|
240
221
|
backprop ui --port 7862
|
|
241
222
|
```
|
|
242
223
|
|
|
243
|
-
To expose a public
|
|
224
|
+
The UI runs locally on `http://localhost:7862`. To expose it to other devices (other people on your network, a public URL, etc.) you must pair `--share` (or `--host`) with `--auth`:
|
|
244
225
|
|
|
245
226
|
```bash
|
|
246
227
|
backprop ui --share --auth alice:hunter2
|
|
247
228
|
```
|
|
248
229
|
|
|
249
|
-
`backprop ui --share` without `--auth` exits with
|
|
230
|
+
`backprop ui --share` without `--auth` exits with an error. The reason: `--share` publishes a URL anyone on the internet can reach, and without authentication that means anyone can drive your training pipeline and read your HuggingFace token. There is no opt-out for this — if you don't want to set credentials, use SSH port-forwarding instead:
|
|
250
231
|
|
|
251
|
-
|
|
232
|
+
```bash
|
|
233
|
+
# On the client:
|
|
234
|
+
ssh -L 7860:localhost:7860 <your-training-host>
|
|
235
|
+
# On the server:
|
|
236
|
+
backprop ui # no --share
|
|
237
|
+
# Then open http://localhost:7860 in your local browser
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
See [handbook/security.md](https://mcp-tool-shop-org.github.io/backpropagate/handbook/security/) for the full threat model.
|
|
252
241
|
|
|
253
242
|
Filesystem writes from the UI are sandboxed to a single directory:
|
|
254
243
|
|
|
255
244
|
- Default: `~/.backpropagate/ui-outputs`
|
|
256
|
-
- Override: `BACKPROPAGATE_UI__OUTPUT_DIR=/path/you/own`
|
|
257
|
-
- The override is
|
|
245
|
+
- Override: set `BACKPROPAGATE_UI__OUTPUT_DIR=/path/you/own`
|
|
246
|
+
- The override is denylist-validated — system or credential paths (`/etc`, `~/.ssh`, `~/.aws`, `C:\Windows\System32`, etc.) are refused
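A denylist check of the kind described in the last bullet could look roughly like the sketch below. The denylist contents and the function name `is_allowed_output_dir` are illustrative assumptions; the list Backpropagate actually enforces is not shown in this README.

```python
from pathlib import Path

# Illustrative denylist only; the library's real list is not shown in this README.
DENYLIST = [Path("/etc"), Path.home() / ".ssh", Path.home() / ".aws"]


def is_allowed_output_dir(candidate: str) -> bool:
    """Return False when candidate is, or lives under, a denylisted root."""
    # Resolve first so that '..' segments and symlinks cannot sneak past the check.
    resolved = Path(candidate).expanduser().resolve()
    for root in DENYLIST:
        root = root.resolve()
        if resolved == root or root in resolved.parents:
            return False
    return True


print(is_allowed_output_dir("~/.backpropagate/ui-outputs"))
print(is_allowed_output_dir("/etc/backprop"))
```

Resolving both the candidate and the denylisted roots before comparing is the important design choice: comparing raw strings would let a path such as `~/outputs/../.ssh` through.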
|
|
258
247
|
|
|
259
|
-
##
|
|
248
|
+
## Platform notes
|
|
260
249
|
|
|
261
|
-
|
|
250
|
+
**Requirements:** Python 3.10+ · CUDA GPU (8GB+ VRAM) · PyTorch 2.0+
|
|
262
251
|
|
|
263
|
-
|
|
264
|
-
- Automatic xformers disable for RTX 40/50 series
|
|
265
|
-
- Safe dataloader settings
|
|
266
|
-
- Tested on RTX 5080 (16GB VRAM)
|
|
252
|
+
Python 3.10 reaches upstream end-of-life in October 2026, and Backpropagate plans to drop 3.10 in v1.4. For new installs, prefer Python 3.11 or 3.12 — 3.11 is the most-tested floor.
|
|
267
253
|
|
|
268
|
-
|
|
254
|
+
Backpropagate handles the runtime quirks of training on different platforms, but it can't fix install-time problems. The two most common are:
|
|
269
255
|
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
| Qwen 2.5 7B | ~12GB | Medium | Best |
|
|
273
|
-
| Qwen 2.5 3B | ~8GB | Fast | Good |
|
|
274
|
-
| Llama 3.2 3B | ~8GB | Fast | Good |
|
|
275
|
-
| Llama 3.2 1B | ~6GB | Fastest | Basic |
|
|
276
|
-
| Mistral 7B | ~12GB | Medium | Good |
|
|
256
|
+
- **Wrong CUDA wheel.** PyTorch publishes a separate binary for each CUDA version. If you pick the wrong one, you silently get CPU-only PyTorch and training is impractically slow. Use the wheel picker at <https://pytorch.org/get-started/locally/> to choose the build that matches your driver. Run `nvidia-smi` to see your driver / CUDA version.
|
|
257
|
+
- **Windows + GGUF export.** The `[export]` extra builds `llama-cpp-python` from source, which needs Visual Studio Build Tools (C++ component) and CMake.
|
|
277
258
|
|
|
278
|
-
|
|
259
|
+
**macOS:** GPU training is not supported (no CUDA). You can run the trained adapter on a Mac via Ollama, but `trainer.train()` raises `DEP_GPU_NOT_AVAILABLE`. Use a CUDA Linux or Windows machine for the training itself.
|
|
279
260
|
|
|
261
|
+
See the [troubleshooting handbook page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting/) for the long-form install fix-it guide, and the dedicated [CUDA troubleshooting page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting-cuda/) for driver / VRAM / xformers / bf16-vs-fp16 issues.
|
|
262
|
+
|
|
263
|
+
## CLI
|
|
264
|
+
|
|
265
|
+
Every Python API has a CLI mirror:
|
|
266
|
+
|
|
267
|
+
```bash
|
|
268
|
+
backprop train --data my_data.jsonl --model Qwen/Qwen2.5-7B-Instruct --steps 100
|
|
269
|
+
backprop multi-run --data my_data.jsonl --runs 5 --steps 100
|
|
270
|
+
backprop export ./output/lora --format gguf --quantization q4_k_m --ollama --ollama-name my-model
|
|
271
|
+
backprop ui --port 7862
|
|
272
|
+
backprop info # environment + version snapshot
|
|
273
|
+
backprop list-runs # past training runs
|
|
274
|
+
backprop show-run <run-id> # detail view
|
|
275
|
+
backprop resume <run-id> # resume a crashed run
|
|
276
|
+
backprop push ./output/lora --repo me/my-model # push adapter to HuggingFace Hub
|
|
277
|
+
backprop diff-runs <run-a> <run-b> # diff two runs side by side
|
|
278
|
+
backprop replay <run-id> # re-run with same config / dataset
|
|
279
|
+
backprop export-runs --format jsonl # bulk export run history
|
|
280
280
|
```
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
285
|
-
|
|
286
|
-
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
|
|
292
|
-
|
|
293
|
-
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
|
|
299
|
-
|
|
300
|
-
|
|
301
|
-
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
|
|
305
|
-
|
|
281
|
+
|
|
282
|
+
Full reference at [the CLI handbook page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/cli-reference/), or `backprop <subcommand> --help`.
|
|
283
|
+
|
|
284
|
+
## Configuration
|
|
285
|
+
|
|
286
|
+
Every setting can be overridden with an environment variable using the `BACKPROPAGATE_` prefix:
|
|
287
|
+
|
|
288
|
+
| Variable | Default | Notes |
|
|
289
|
+
|---|---|---|
|
|
290
|
+
| `BACKPROPAGATE_LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
|
|
291
|
+
| `BACKPROPAGATE_LOG_JSON` | auto | Force JSON or console logs |
|
|
292
|
+
| `BACKPROPAGATE_MODEL__NAME` | `Qwen/Qwen2.5-7B-Instruct` | Default model |
|
|
293
|
+
| `BACKPROPAGATE_TRAINING__LEARNING_RATE` | `2e-4` | Learning rate |
|
|
294
|
+
| `BACKPROPAGATE_LORA__R` | `256` | LoRA rank (v1.3 default; pass `--lora-preset=fast` for the v1.2.x default of 16) |
|
|
295
|
+
| `BACKPROPAGATE_UI__OUTPUT_DIR` | `~/.backpropagate/ui-outputs` | UI filesystem sandbox |
|
|
296
|
+
|
|
297
|
+
Nested keys use double underscore (`MODEL__NAME`, not `MODEL_NAME`). The full reference is at [the env-vars handbook page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/env-vars/).
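The double-underscore rule maps flat environment variable names onto nested settings. The sketch below shows the mapping with the standard library only. It is an illustration of the naming convention (the same idea pydantic-settings exposes as `env_nested_delimiter`), and whether Backpropagate parses its settings this way is an assumption; `env_to_nested` is a hypothetical helper.

```python
PREFIX = "BACKPROPAGATE_"


def env_to_nested(environ: dict[str, str]) -> dict:
    """Turn PREFIX-ed variables into a nested dict, splitting keys on '__'."""
    settings: dict = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # ignore unrelated variables such as PATH
        parts = name[len(PREFIX):].lower().split("__")
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return settings


example = {
    "BACKPROPAGATE_LOG_LEVEL": "DEBUG",
    "BACKPROPAGATE_MODEL__NAME": "Qwen/Qwen2.5-7B-Instruct",
    "BACKPROPAGATE_LORA__R": "16",
    "PATH": "/usr/bin",
}
print(env_to_nested(example))
```

Note that `LOG_LEVEL` stays a single top-level key because it contains only single underscores, while `MODEL__NAME` becomes `name` inside `model`. That is why `MODEL_NAME` would not be read as the model setting.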
|
|
298
|
+
|
|
299
|
+
## Model presets
|
|
300
|
+
|
|
301
|
+
| Preset | VRAM | License | Notes |
|
|
302
|
+
|---|---|---|---|
|
|
303
|
+
| Qwen-3.5-4B | ~8GB | Apache 2.0 | Recommended default for sub-5B. Best quality at this size. |
|
|
304
|
+
| Phi-4-mini-3.8B | ~8GB | MIT | Strong on reasoning / math / code. Strict license-clean. |
|
|
305
|
+
| SmolLM3-3B | ~6GB | Apache 2.0 | Fully open recipe. Native 64K context. |
|
|
306
|
+
| Qwen 2.5 7B | ~12GB | Apache 2.0 | Existing default. Best quality of the legacy 7B presets. |
|
|
307
|
+
| Qwen 2.5 3B | ~8GB | Qwen-Research | ⚠ research license — see Qwen license terms before commercial use. |
|
|
308
|
+
| Llama 3.2 3B | ~8GB | Llama Community | Solid alternative to Qwen 3B with permissive caveats. |
|
|
309
|
+
| Llama 3.2 1B | ~6GB | Llama Community | For quick experiments on small cards. |
|
|
310
|
+
| Mistral 7B | ~12GB | Apache 2.0 | Comparable to Qwen 7B, different chat template. |
|
|
311
|
+
|
|
312
|
+
Other models often work, but only these eight are pinned in CI. Pass `--lora-preset=quality` (default) for rank-256 / all-linear targets per Biderman 2024 + Thinking Machines 2025, or `--lora-preset=fast` for the legacy rank-16 / q+v target if you need the v1.2.x footprint.
|
|
306
313
|
|
|
307
314
|
## Troubleshooting
|
|
308
315
|
|
|
309
|
-
A short index of the most common first-run failures. The full reverse index
|
|
316
|
+
A short index of the most common first-run failures. The full reverse index is at [the troubleshooting handbook page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting/). For a deep dive into driver, VRAM, and mixed-precision issues, see the [CUDA troubleshooting page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting-cuda/).
|
|
310
317
|
|
|
311
|
-
| Symptom |
|
|
312
|
-
|
|
313
|
-
| GPU runs out of memory mid-training | `RUNTIME_GPU_OOM` |
|
|
314
|
-
|
|
|
315
|
-
| Bad model name typo | `INPUT_VALIDATION_FAILED` or `DEP_MODEL_LOAD_FAILED` | Verify the `org/name` identifier at <https://huggingface.co/models>. |
|
|
318
|
+
| Symptom | Error code | Fix |
|
|
319
|
+
|---|---|---|
|
|
320
|
+
| GPU runs out of memory mid-training | `RUNTIME_GPU_OOM` | Automatic — Backpropagate halves the batch size and retries up to 3 times. To opt out: `Trainer(oom_recovery=False)`. To force smaller: `--batch-size 1`. |
|
|
321
|
+
| HuggingFace returns 401 / "model not found" | `DEP_MODEL_LOAD_FAILED` | `huggingface-cli login` and retry. For typos, copy the exact ID from <https://huggingface.co/models>. |
|
|
316
322
|
| `register_with_ollama` connection refused | `DEP_OLLAMA_REGISTRATION_FAILED` | Start the daemon: `ollama serve`. Install from <https://ollama.com>. Retryable. |
|
|
317
|
-
| Disk full during checkpoint save | `STATE_CHECKPOINT_INVALID` | Atomic writes leave a `.partial` directory on crash — safe to delete.
|
|
318
|
-
| Training paused
|
|
319
|
-
| `backprop ui --share` rejected | `INPUT_AUTH_REQUIRED` | Pass `--auth user:password`, or
|
|
320
|
-
| Multi-run "validation overlap" | `CONFIG_INVALID` (Stage A backend B-001) | Lower `--samples` below the training-pool size, increase dataset, or disable validation. |
|
|
323
|
+
| Disk full during checkpoint save | `STATE_CHECKPOINT_INVALID` | Atomic writes leave a `.partial` directory on crash — safe to delete. The previous good checkpoint is intact. |
|
|
324
|
+
| Training paused on GPU overheat | `RUNTIME_GPU_TEMPERATURE_CRITICAL` | Automatic — Backpropagate pauses on the temperature threshold and resumes as the GPU cools. Improve airflow if it keeps happening. |
|
|
325
|
+
| `backprop ui --share` rejected | `INPUT_AUTH_REQUIRED` | Pass `--auth user:password`, or use SSH port-forwarding instead (see [Web UI](#web-ui)). |
|
|
321
326
|
| GGUF export failed on first try | `RUNTIME_GGUF_EXPORT_FAILED` | `pip install backpropagate[export]`; on Windows you also need Visual C++ Build Tools + CMake. |
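
The automatic recovery described in the `RUNTIME_GPU_OOM` row (halve the batch size, retry up to 3 times) can be pictured with a small sketch. This is not Backpropagate's actual implementation: `OutOfMemory`, `train_with_oom_backoff`, and `fake_train` are hypothetical names used only for illustration, and a real trainer would catch the framework's own out-of-memory exception.

```python
from typing import Callable, Tuple


class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for a GPU out-of-memory error."""


def train_with_oom_backoff(
    train_fn: Callable[[int], str],
    batch_size: int,
    max_retries: int = 3,
) -> Tuple[str, int]:
    """Call train_fn(batch_size); on OOM, halve the batch size and retry.

    Gives up (re-raises) after max_retries retries or once batch_size is 1.
    Returns the training result and the batch size that finally worked.
    """
    retries_left = max_retries
    while True:
        try:
            return train_fn(batch_size), batch_size
        except OutOfMemory:
            if retries_left == 0 or batch_size == 1:
                raise
            retries_left -= 1
            batch_size = max(1, batch_size // 2)


def fake_train(batch_size: int) -> str:
    """Toy training function that only 'fits' when batch_size <= 2."""
    if batch_size > 2:
        raise OutOfMemory(f"batch_size={batch_size} does not fit")
    return "ok"


if __name__ == "__main__":
    result, final_batch_size = train_with_oom_backoff(fake_train, batch_size=8)
    print(result, final_batch_size)
```

Starting from a batch size of 8, the toy run fails at 8 and 4, then succeeds at 2. Starting from 32 it would exhaust all three retries and re-raise, which is the point at which passing a smaller batch size by hand becomes the remaining option.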
|
|
322
327
|
|
|
323
328
|
## Reporting bugs
|
|
324
329
|
|
|
325
|
-
When something fails, Backpropagate prints a `run_started run_id=<uuid>`
|
|
330
|
+
Backpropagate prints a line at startup like `run_started run_id=<uuid>` and binds the same ID to every log line, every checkpoint, and every Weights & Biases entry. When something fails, **include the `run_id` in any bug report**: it lets a maintainer correlate everything for that exact run.
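
To show what binding one ID to every log line can look like, here is a minimal sketch using only the Python standard library. It is an assumption about the general pattern, not Backpropagate's code: `make_run_logger` and the log format string are hypothetical.

```python
import io
import logging
import uuid
from typing import Tuple


def make_run_logger(stream) -> Tuple[logging.LoggerAdapter, str]:
    """Create a logger whose every line carries the same freshly generated run_id."""
    run_id = str(uuid.uuid4())
    handler = logging.StreamHandler(stream)
    # %(run_id)s is filled from the adapter's `extra` dict on every record.
    handler.setFormatter(
        logging.Formatter("%(levelname)s run_id=%(run_id)s %(message)s")
    )
    base = logging.getLogger(f"run.{run_id}")
    base.setLevel(logging.INFO)
    base.addHandler(handler)
    base.propagate = False  # keep the sketch's output out of the root logger
    logger = logging.LoggerAdapter(base, {"run_id": run_id})
    logger.info("run_started")
    return logger, run_id


if __name__ == "__main__":
    buffer = io.StringIO()
    log, rid = make_run_logger(buffer)
    log.warning("checkpoint save failed")
    print(buffer.getvalue(), end="")
```

Because the ID lives in the adapter rather than in each call site, later log calls cannot forget to include it, so searching the logs for one `run_id` returns the whole run.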
|
|
326
331
|
|
|
327
332
|
A good bug report includes:
|
|
328
333
|
|
|
329
|
-
1.
|
|
330
|
-
2. **The error code** — the `[CODE_NAME]: message` line in stderr
|
|
331
|
-
3. **The redacted command line.** Stderr
|
|
334
|
+
1. **The `run_id`** — the UUID printed at startup.
|
|
335
|
+
2. **The error code** — the `[CODE_NAME]: message` line in stderr. See [error codes](https://mcp-tool-shop-org.github.io/backpropagate/handbook/error-codes/) for the catalog.
|
|
336
|
+
3. **The redacted command line.** Stderr is automatically redacted (Bearer tokens, `sk-*`, `hf_*`, AWS keys, `password=` / `token=` pairs are scrubbed) — safe to paste. For the full unredacted traceback, re-run with `--verbose`, but review before posting.
|
|
332
337
|
4. **Python / PyTorch versions, GPU model, OS.** `backprop info` prints all of this in one go.
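
The redaction mentioned in item 3 can be approximated with a few regular expressions. The sketch below is illustrative only: the patterns and the `redact` helper are assumptions, not the rules Backpropagate ships, and real secret formats vary more than these patterns cover, so review output before posting.

```python
import re

# Hypothetical patterns, one per secret family named in the list above.
_PATTERNS = [
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "sk-[REDACTED]"),
    (re.compile(r"\bhf_[A-Za-z0-9]{8,}"), "hf_[REDACTED]"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"(?i)\b(password|token)=\S+"), r"\1=[REDACTED]"),
]


def redact(text: str) -> str:
    """Return text with anything matching the patterns above scrubbed."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("backprop train --lr 1e-4 token=abc123"))
    print(redact("Authorization: Bearer abc.def-123"))
```

Applying the patterns in a fixed order and replacing each match with a labeled placeholder keeps the shape of the original message readable while removing the secret itself.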
|
|
333
338
|
|
|
339
|
+
Questions, ideas, or "is this expected" threads belong in [GitHub Discussions](https://github.com/mcp-tool-shop-org/backpropagate/discussions). Security issues should be reported privately via the [GitHub Security Advisory](https://github.com/mcp-tool-shop-org/backpropagate/security/advisories/new) form — see [SECURITY.md](SECURITY.md) for the policy.
|
|
340
|
+
|
|
334
341
|
## Privacy
|
|
335
342
|
|
|
336
343
|
All training happens locally on your GPU. Backpropagate makes no network requests except to download models from HuggingFace (which you initiate). No telemetry, no cloud dependency.
|
|
337
344
|
|
|
338
|
-
##
|
|
345
|
+
## References
|
|
339
346
|
|
|
340
|
-
|
|
341
|
-
|----------|-------|-------|
|
|
342
|
-
| A. Security | 6/8 | SECURITY.md, trust model, no secrets/telemetry, safe_path(). MCP items skipped |
|
|
343
|
-
| B. Error Handling | 5/7 | Structured exception shape (`code`/`message`/`hint`/`cause`/`retryable`) via ERROR_CODES registry; CLI exit codes 0/1/2/3; no raw stack traces without `--verbose`; `run_id` correlation; redacted stderr; `--share`+`--auth` gating. MCP/desktop/vscode skipped. |
|
|
344
|
-
| C. Operator Docs | 4/7 | README, CHANGELOG, LICENSE, --help. Logging/MCP/complex skipped |
|
|
345
|
-
| D. Shipping Hygiene | 6/9 | verify.sh, version=tag, 5 scanners in CI, dependabot, python_requires, clean build |
|
|
346
|
-
| E. Identity | 4/4 | Logo, translations, landing page, metadata |
|
|
347
|
-
| **Total** | **25/31** | 14 items skipped with justification · `shipcheck audit` passes 100% · Audit date: 2026-05-21 (B-row re-graded after Stage B + Stage A CLI exit-code work) |
|
|
347
|
+
Backpropagate's defaults and multi-run training mode are built on recent research. If you're interested in the underlying techniques:
|
|
348
348
|
|
|
349
|
-
|
|
349
|
+
- **Hu et al. 2021.** *LoRA: Low-Rank Adaptation of Large Language Models.* [arXiv:2106.09685](https://arxiv.org/abs/2106.09685) — the foundational paper introducing LoRA, which is how Backpropagate trains adapters efficiently.
|
|
350
|
+
- **Biderman et al. 2024.** *LoRA Learns Less and Forgets Less.* [arXiv:2405.09673](https://arxiv.org/abs/2405.09673) — empirical evidence that LoRA at rank 256 with all-linear targets matches full fine-tuning quality on most post-training tasks at 67% of the compute. Drives Backpropagate's v1.3 default LoRA configuration.
|
|
351
|
+
- **Thinking Machines 2025.** *LoRA Without Regret.* [thinkingmachines.ai/blog/lora](https://thinkingmachines.ai/blog/lora/) — the practical follow-up identifying the 10× learning-rate-vs-full-FT correction needed at high LoRA rank.
|
|
352
|
+
- **Kirkpatrick et al. 2017.** *Overcoming catastrophic forgetting in neural networks.* [arXiv:1612.00796](https://arxiv.org/abs/1612.00796) — the original characterization of why neural networks "forget" earlier training when you fine-tune on new data (EWC — Elastic Weight Consolidation).
|
|
353
|
+
- **Wang et al. 2023.** *Orthogonal Subspace Learning for Language Model Continual Learning.* [arXiv:2310.14152](https://arxiv.org/abs/2310.14152) — O-LoRA, an earlier approach to using LoRA for continual learning by constraining new adapters to orthogonal subspaces.
|
|
354
|
+
- **Yadav et al. 2023.** *TIES-Merging: Resolving Interference When Merging Models.* [arXiv:2306.01708](https://arxiv.org/abs/2306.01708) — a foundational technique for merging multiple fine-tuned models without interference.
|
|
355
|
+
- **Qiao & Mahdavi 2025.** *Merge before Forget: A Single LoRA Continual Learning via Continual Merging.* [arXiv:2512.23017](https://arxiv.org/abs/2512.23017) — the specific algorithm Backpropagate's multi-run merger implements. A December 2025 preprint; Backpropagate is the paper's first known downstream adopter.
|
|
350
356
|
|
|
351
357
|
## License
|
|
352
358
|
|
|
353
|
-
MIT — see [LICENSE](LICENSE)
|
|
359
|
+
MIT — see [LICENSE](LICENSE).
|
|
354
360
|
|
|
355
361
|
---
|
|
356
362
|
|