@mcptoolshop/backpropagate 1.0.5 → 1.1.1

package/README.md CHANGED
@@ -1,109 +1,359 @@
1
+ <p align="center">
2
+ <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
3
+ </p>
4
+
1
5
  <p align="center">
2
6
  <img src="https://raw.githubusercontent.com/mcp-tool-shop-org/brand/main/logos/backpropagate/readme.png" alt="Backpropagate" width="400">
3
7
  </p>
4
8
 
5
9
  <p align="center">
6
- <a href="https://www.npmjs.com/package/@mcptoolshop/backpropagate"><img src="https://img.shields.io/npm/v/@mcptoolshop/backpropagate" alt="npm version"></a>
7
- <a href="https://github.com/mcp-tool-shop-org/backpropagate/actions/workflows/release-binaries.yml"><img src="https://github.com/mcp-tool-shop-org/backpropagate/actions/workflows/release-binaries.yml/badge.svg" alt="Release Binaries"></a>
8
- <a href="https://github.com/mcp-tool-shop-org/backpropagate/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License"></a>
10
+ <a href="https://github.com/mcp-tool-shop-org/backpropagate/actions/workflows/ci.yml"><img src="https://github.com/mcp-tool-shop-org/backpropagate/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
11
+ <a href="https://pypi.org/project/backpropagate/"><img src="https://img.shields.io/pypi/v/backpropagate" alt="PyPI"></a>
12
+ <a href="https://codecov.io/gh/mcp-tool-shop-org/backpropagate"><img src="https://img.shields.io/codecov/c/github/mcp-tool-shop-org/backpropagate" alt="Coverage"></a>
13
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License"></a>
9
14
  <a href="https://mcp-tool-shop-org.github.io/backpropagate/"><img src="https://img.shields.io/badge/Landing_Page-live-blue" alt="Landing Page"></a>
10
15
  </p>
11
16
 
12
- Headless LLM fine-tuning CLI with smart defaults. Train, export, and serve models from a single command. Zero-dependency npm distribution powered by [@mcptoolshop/npm-launcher](https://www.npmjs.com/package/@mcptoolshop/npm-launcher).
17
+ **Headless LLM fine-tuning in 3 lines. Smart defaults, VRAM-aware batch sizing, multi-run SLAO, and one-click GGUF export for Ollama.**
13
18
 
14
- ## Install
19
+ *SLAO is Single LoRA Continual Learning via Asymmetric Merging — the merge-between-runs technique that prevents catastrophic forgetting in extended fine-tuning campaigns ([paper](https://arxiv.org/abs/2512.23017)).*
15
20
 
16
- **macOS / Windows** zero-dependency binary via npm:
21
+ *Train LLMs in 3 lines of code. Export to Ollama in one more.*
22
+
23
+ ## Quick Start
17
24
 
18
25
  ```bash
19
- npx @mcptoolshop/backpropagate info
26
+ pip install backpropagate[standard]
20
27
  ```
21
28
 
22
- Or install globally:
29
+ ```python
30
+ from backpropagate import Trainer
31
+
32
+ trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
33
+ trainer.train("examples/quickstart.jsonl", steps=10)
34
+ trainer.export("gguf", quantization="q4_k_m") # Ready for Ollama
35
+ ```
36
+
37
+ The repo ships a small `examples/quickstart.jsonl` (5 ShareGPT-format examples) so the snippet above runs end-to-end on a clean install. For your own training, see [Dataset Format](#dataset-format) below.
38
+
39
+ ### No-code path: Web UI
40
+
41
+ Prefer a UI to a Python REPL? Install the same extra and run:
23
42
 
24
43
  ```bash
25
- npm install -g @mcptoolshop/backpropagate
44
+ pip install backpropagate[standard]
45
+ backprop ui --port 7862
26
46
  ```
27
47
 
28
- **Linux** install via pip (PyTorch is too large for a single-file binary):
48
+ The Reflex (Radix UI) interface lets you point at a JSONL file, pick a model, train, and export, with no Python required. The UI is local-first; for public-internet exposure see [Web UI](#web-ui) below for the `--share` + `--auth` security contract and supported tunnel options (Cloudflare Tunnel, ngrok).
49
+
50
+ ## Dataset Format
51
+
52
+ Your JSONL training file should have one example per line. The simplest format is ShareGPT chat:
53
+
54
+ ```jsonl
55
+ {"conversations": [{"from": "human", "value": "What is Python?"}, {"from": "gpt", "value": "A programming language."}]}
56
+ {"conversations": [{"from": "human", "value": "Explain recursion."}, {"from": "gpt", "value": "A function that calls itself."}]}
57
+ ```
58
+
59
+ Alpaca (`instruction`/`output`), OpenAI chat (`messages`), and raw text formats are also supported. See `examples/quickstart.jsonl` for a copyable starting point.
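Data that is already in Alpaca form can also be reshaped into the ShareGPT layout shown above. The following is a minimal sketch, not part of Backpropagate: the helper names `alpaca_to_sharegpt` and `convert_lines` are hypothetical, and the optional `input` field is assumed to follow the common Alpaca convention.

```python
import json


def alpaca_to_sharegpt(record: dict) -> dict:
    """Convert one Alpaca-style record into a ShareGPT-style conversation.

    Hypothetical helper for illustration; not a Backpropagate API.
    """
    prompt = record["instruction"]
    # Assumption: an optional "input" field carries extra context, as in the
    # common Alpaca convention. It is appended to the prompt when present.
    extra = record.get("input", "")
    if extra:
        prompt = f"{prompt}\n\n{extra}"
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": record["output"]},
        ]
    }


def convert_lines(lines):
    """Yield one ShareGPT JSON line per non-empty Alpaca JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            converted = alpaca_to_sharegpt(json.loads(line))
            yield json.dumps(converted, ensure_ascii=False)
```

Since Alpaca is listed as a supported format, a conversion like this is optional; it is mainly useful when mixing sources into a single file.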
60
+
61
+ ## Why Backpropagate?
62
+
63
+ | Problem | Solution |
64
+ |---------|----------|
65
+ | Fine-tuning is complex | 3 lines: load, train, save |
66
+ | Windows is a nightmare | First-class Windows support |
67
+ | VRAM management is hard | Auto batch sizing, GPU monitoring |
68
+ | Model export is confusing | One-click GGUF + Ollama registration |
69
+ | Long runs cause forgetting | Multi-run SLAO training |
70
+
71
+ ## Key Features
72
+
73
+ - **Headless by Design**: Built for CI/CD pipelines, automated workflows, and programmatic execution.
74
+ - **Smart Defaults**: Automatically configures optimal hyperparameters based on your hardware and dataset.
75
+ - **Multi-Run SLAO Training**: Advanced training strategies to prevent catastrophic forgetting during long runs.
76
+ - **First-Class Windows Support**: Tested and optimized for Windows environments, avoiding common PyTorch/CUDA pitfalls.
77
+ - **Seamless Export**: One-click export to GGUF format and automatic registration with Ollama.
78
+ - **Modular Architecture**: Install only the dependencies you need (e.g., `[unsloth]`, `[ui]`, `[export]`).
79
+
80
+ ## Installation
29
81
 
30
82
  ```bash
31
- pipx install backpropagate
32
- # or: pip install backpropagate
83
+ pip install backpropagate # Core only (minimal)
84
+ pip install backpropagate[unsloth] # + Unsloth 2x faster training
85
+ pip install backpropagate[ui] # + Reflex (Radix UI) web interface
86
+ pip install backpropagate[standard] # unsloth + ui (recommended)
87
+ pip install backpropagate[full] # Everything
33
88
  ```
34
89
 
35
- The npm package downloads a platform-specific binary from GitHub Releases, verifies its SHA256 checksum, and caches it locally. Subsequent runs start instantly.
90
+ | Extra | Description | Dependencies |
91
+ |-------|-------------|--------------|
92
+ | `unsloth` | 2x faster training, 50% less VRAM | unsloth |
93
+ | `ui` | Reflex (Radix UI) web interface | reflex>=0.9.2, fastapi>=0.115 |
94
+ | `validation` | Pydantic config validation | pydantic, pydantic-settings |
95
+ | `export` | GGUF export for Ollama | llama-cpp-python |
96
+ | `monitoring` | WandB + system monitoring (auto-wired into trainer in v1.1.0) | wandb, psutil |
97
+ | `observability` | OpenTelemetry tracing | opentelemetry-api, opentelemetry-sdk |
98
+ | `logging` | Structured logging | structlog |
99
+ | `security` | JWT auth + token generation | PyJWT, cryptography |
100
+ | `production` | unsloth + ui + validation + logging + security | (bundle) |
101
+
102
+ **Requirements:** Python 3.10+ · CUDA GPU (8GB+ VRAM) · PyTorch 2.0+
103
+
104
+ ### Platform prerequisites
105
+
106
+ Backpropagate handles the runtime quirks (multiprocessing, xformers on RTX 40/50, dataloader workers on Windows). It does **not** handle the install-time platform pain — fix those first:
107
+
108
+ - **CUDA toolkit version.** PyTorch is published per-CUDA — picking the wrong wheel silently installs CPU-only torch. Use the picker at <https://pytorch.org/get-started/locally/> for the exact `pip install torch ...` command for your driver. Run `nvidia-smi` to see your driver / CUDA version.
109
+ - **Windows.** Visual Studio Build Tools (C++) and CMake are required for the `[export]` extra (`llama-cpp-python` builds from source). A native Windows `bitsandbytes` wheel is now published (>= 0.43); older guides mentioning `bitsandbytes-windows` are stale.
110
+ - **macOS.** GPU training is **not supported** — no CUDA. You can install Backpropagate to run *inference* on an exported GGUF via Ollama, but `trainer.train()` raises `DEP_GPU_NOT_AVAILABLE`. Use a CUDA machine for training.
111
+ - **Linux.** Most distros work out of the box. If you're using the PyPI binary release, note that the Linux build uses CPU-only torch (to stay under GitHub's 2 GB release-asset cap); install with the matching CUDA wheel from pytorch.org first.
112
+
113
+ For the long-form install troubleshooting, see [the troubleshooting handbook page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting/).
114
+
115
+ ## Configuration
116
+
117
+ All settings can be overridden with environment variables using the `BACKPROPAGATE_` prefix (e.g., `BACKPROPAGATE_LOG_LEVEL=debug`). A `.env` file in the project root is loaded automatically when the `[validation]` extra is installed.
118
+
119
+ Common knobs (see [the full env-vars reference](https://mcp-tool-shop-org.github.io/backpropagate/handbook/env-vars/) for everything):
120
+
121
+ | Variable | Default | Notes |
122
+ |----------|---------|-------|
123
+ | `BACKPROPAGATE_LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
124
+ | `BACKPROPAGATE_LOG_JSON` | auto | Force JSON (`true`) or console (`false`) logs |
125
+ | `BACKPROPAGATE_LOG_FILE` | unset | Path to mirror logs into |
126
+ | `BACKPROPAGATE_DEFER_FEATURE_DETECTION` | unset | Skip optional-dep detection at startup for the fastest CLI cold start |
127
+ | `BACKPROPAGATE_SECURITY__REQUIRE_AUTH_FOR_SHARE` | `true` | When `true`, refuses `backprop ui --share` without `--auth` |
128
+ | `BACKPROPAGATE_UI__OUTPUT_DIR` | `~/.backpropagate/ui-outputs` | Sandbox base for all UI filesystem writes; denylist-validated |
129
+ | `BACKPROPAGATE_MODEL__NAME` | `Qwen/Qwen2.5-7B-Instruct` | Default model |
130
+ | `BACKPROPAGATE_TRAINING__LEARNING_RATE` | `2e-4` | Learning rate |
131
+ | `BACKPROPAGATE_LORA__R` | `16` | LoRA rank |
132
+
133
+ Nested keys use double underscore as the delimiter (Pydantic `env_nested_delimiter` convention).
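To make the double-underscore convention concrete, here is a small sketch of how prefixed variables map onto nested settings. It is illustrative only: the function `nested_settings` is hypothetical, and the real parsing is presumably handled by Pydantic settings rather than by code like this.

```python
PREFIX = "BACKPROPAGATE_"
DELIMITER = "__"


def nested_settings(environ: dict) -> dict:
    """Group prefixed environment variables into a nested dict.

    Hypothetical illustration of the naming scheme; values stay strings.
    """
    settings: dict = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # ignore unrelated variables such as PATH
        # "BACKPROPAGATE_LORA__R" -> ["lora", "r"]
        keys = name[len(PREFIX):].lower().split(DELIMITER)
        node = settings
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return settings
```

For example, `BACKPROPAGATE_LORA__R=16` ends up under `settings["lora"]["r"]`, while `BACKPROPAGATE_LOG_LEVEL=debug` stays at the top level as `settings["log_level"]`.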
36
134
 
37
135
  ## Usage
38
136
 
39
- ```bash
40
- # Show system info (GPU, Python, PyTorch, CUDA)
41
- backpropagate info
137
+ ### Basic Training
42
138
 
43
- # Fine-tune a model
44
- backpropagate train --model unsloth/Qwen2.5-7B-Instruct-bnb-4bit --data my_data.jsonl --steps 100
139
+ ```python
140
+ from backpropagate import Trainer
45
141
 
46
- # Multi-run training with SLAO merging
47
- backpropagate multi-run --model meta-llama/Llama-3-8B --data train.jsonl --runs 5
142
+ trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
143
+ trainer.train("my_data.jsonl", steps=100)
144
+ trainer.save("./my-model")
145
+ trainer.export("gguf", quantization="q4_k_m")
146
+ ```
147
+
148
+ `Qwen/Qwen2.5-7B-Instruct` is the canonical default — the value `Trainer()` resolves when called with no model argument (see [`config.py`](backpropagate/config.py) `ModelConfig.name`). Older examples pinned the pre-quantized `unsloth/Qwen2.5-7B-Instruct-bnb-4bit`; we switched the default to the official Qwen weights for better reliability ([CHANGELOG v0.1.3](CHANGELOG.md)). Either model works.
149
+
150
+ ### Multi-Run SLAO Training
48
151
 
49
- # Export to GGUF for Ollama
50
- backpropagate export ./my-model --format gguf --quantization q4_k_m --ollama
152
+ ```python
153
+ from backpropagate import Trainer
51
154
 
52
- # Launch web UI
53
- backpropagate ui
155
+ trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
54
156
 
55
- # View/modify configuration
56
- backpropagate config --show
157
+ result = trainer.multi_run(
158
+ dataset="HuggingFaceH4/ultrachat_200k",
159
+ num_runs=5,
160
+ steps_per_run=100,
161
+ samples_per_run=1000,
162
+ merge_mode="slao", # Single LoRA Continual Learning via Asymmetric Merging
163
+ )
57
164
  ```
58
165
 
59
- ## Features
166
+ SLAO (Single LoRA Continual Learning via Asymmetric Merging) implements the [Merge before Forget](https://arxiv.org/abs/2512.23017) paper: orthogonal A-matrix init via QR decomposition, asymmetric A/B handling, and time-aware `λ(i) = 1/√i` scaling. The CLI flag is `--samples` (the underlying field is `samples_per_run`).
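The time-aware scaling can be sketched in a few lines. This is a rough illustration under stated assumptions, not the contents of `slao.py`: the README only gives `λ(i) = 1/√i`, so the interpolation rule in `merge_b` and both function names are assumptions, and the orthogonal A-matrix initialization is left out.

```python
import math


def slao_lambda(i: int) -> float:
    """Time-aware scaling factor lambda(i) = 1 / sqrt(i) for run i (1-based)."""
    if i < 1:
        raise ValueError("run index must be >= 1")
    return 1.0 / math.sqrt(i)


def merge_b(b_merged, b_new, i):
    """Blend a new run's B matrix into the running merged B matrix.

    Assumption (not confirmed by the README): the merge is the interpolation
    B_merged + lambda(i) * (B_new - B_merged). Matrices are lists of rows.
    """
    lam = slao_lambda(i)
    return [
        [old + lam * (new - old) for old, new in zip(row_old, row_new)]
        for row_old, row_new in zip(b_merged, b_new)
    ]
```

Under this reading, the first run (`i = 1`) replaces the merged matrix outright, and each later run moves it by a shrinking fraction, which is what limits drift away from earlier runs.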
60
167
 
61
- - **Smart Defaults** — Auto-configures hyperparameters based on your hardware and dataset
62
- - **VRAM-Aware** — Automatic batch sizing and GPU memory management
63
- - **Multi-Run SLAO** — Prevents catastrophic forgetting during long training runs
64
- - **One-Click Export** — GGUF export with automatic Ollama registration
65
- - **Windows-First** — Tested and optimized for Windows, Linux, and macOS
66
- - **Headless** — Built for CI/CD pipelines and automated workflows
168
+ ### Export to Ollama
67
169
 
68
- ## GPU Training
170
+ ```python
171
+ # Export to GGUF
172
+ result = trainer.export("gguf", quantization="q4_k_m")
173
+
174
+ # Register with Ollama separately
175
+ from backpropagate import register_with_ollama
176
+ register_with_ollama(result.path, "my-finetuned-model")
177
+ # ollama run my-finetuned-model
178
+ ```
69
179
 
70
- The npm-distributed binary uses CPU-only PyTorch to keep download sizes manageable. For GPU-accelerated training, install via pip instead:
180
+ ### CLI
71
181
 
72
182
  ```bash
73
- pip install backpropagate[standard]
183
+ backprop train --data my_data.jsonl --model Qwen/Qwen2.5-7B-Instruct --steps 100
184
+ backprop multi-run --data my_data.jsonl --runs 5 --steps 100
185
+ backprop export ./output/lora --format gguf --quantization q4_k_m --ollama --ollama-name my-model
186
+ backprop ui --port 7862
187
+ backprop info
188
+ backprop list-runs # v1.1.0: query past training runs
189
+ backprop show-run <run-id> # v1.1.0: detail view
190
+ backprop resume <run-id> # v1.1.0: resume a crashed multi-run
191
+ backprop push ./output/lora --repo me/my-model # v1.1.0: push adapter to HF Hub
192
+ ```
193
+
194
+ See the [CLI reference](https://mcp-tool-shop-org.github.io/backpropagate/handbook/cli-reference/) for every subcommand and flag, or run `backprop <subcommand> --help`.
195
+
196
+ ### Resume from checkpoint (v1.1.0)
197
+
198
+ A 5-run multi-run that crashes at run 4 is now recoverable. Every multi-run session writes its run_id into both `run_history.json` and the on-disk checkpoint manifest, so picking up where you left off is one command:
199
+
200
+ ```bash
201
+ backprop resume <run-id> # picks up the in-progress session
202
+ backprop multi-run --data ... --resume <run-id> # explicit form
203
+ backprop train --data ... --resume <run-id> # single-run resume (continues run_id)
74
204
  ```
75
205
 
76
- The npm binary is ideal for `info`, `config`, `export`, `ui`, and CPU-based inference. For production GPU training workloads, use the pip distribution.
206
+ The default behavior of `backprop multi-run` (no `--resume`) auto-detects an in-progress entry for the same output directory and continues it. Pass `resume_from="off"` (Python API) or omit `--resume` and start in a fresh output dir to force a clean session.
207
+
208
+ When a multi-run resumes, the latest checkpoint for that run_id is loaded into the model, the SLAO merger state is restored from `slao/` next to the checkpoint, and the run loop continues from `last_completed_run + 1`. The history entry's `status` flips back to `running` so `backprop list-runs --status running` shows the live session.
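The resume point described above amounts to a small calculation. The sketch below assumes a history row shaped like a dict with `num_runs` and `last_completed_run` keys; only `last_completed_run` appears in the README, and the function name and the rest of the schema are hypothetical.

```python
def next_run_index(entry: dict) -> int:
    """Return the 1-based run to execute next for a resumable multi-run entry.

    `entry` mimics one history row; the field names are assumptions.
    """
    last_done = entry.get("last_completed_run", 0)
    num_runs = entry["num_runs"]
    if last_done >= num_runs:
        raise ValueError("session already completed; nothing to resume")
    return last_done + 1
```

For the 5-run session that crashed during run 4, `last_completed_run` would be 3, so the loop restarts at run 4.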
77
209
 
78
- ## How It Works
210
+ ### Experiment tracking (v1.1.0)
79
211
 
80
- This package is a thin wrapper around [@mcptoolshop/npm-launcher](https://www.npmjs.com/package/@mcptoolshop/npm-launcher). On first run it:
212
+ `Trainer` auto-detects installed experiment trackers (`wandb`, `tensorboard`, `mlflow`) and wires them into the underlying `transformers.TrainingArguments`. The default `report_to="auto"` picks up whatever's importable:
213
+
214
+ ```bash
215
+ pip install backpropagate[monitoring] # installs wandb + psutil
216
+ wandb login # one-time
217
+ backprop train --data my_data.jsonl # W&B run gets the same run_id prefix as the on-disk history
218
+ ```
81
219
 
82
- 1. Detects your platform (darwin-arm64, win-x64)
83
- 2. Downloads the matching binary from [GitHub Releases](https://github.com/mcp-tool-shop-org/backpropagate/releases)
84
- 3. Verifies the SHA256 checksum
85
- 4. Caches the binary locally (~/.cache/mcptoolshop/ or %LOCALAPPDATA%\mcptoolshop\)
86
- 5. Runs the binary with full argument passthrough
220
+ Override with `Trainer(report_to=["wandb"])`, `Trainer(report_to=["tensorboard"])`, or `Trainer(report_to="none")` to opt out explicitly. For MLflow add `pip install mlflow`; for TensorBoard add `pip install tensorboard`. The W&B run name is `backprop-<run_id_prefix>` so an operator can grep across W&B, our logs, and `run_history.json` by the same identifier.
87
221
 
88
- On Linux, the package exits with a message directing you to `pipx install backpropagate` instead. PyTorch's native libraries (~1.5GB on x86_64) make a single-file binary impractical for GitHub release distribution.
222
+ ### Training history
89
223
 
90
- ## Cache Management
224
+ Every `backprop train` and `backprop multi-run` invocation records a row in `<output>/run_history.json` with the run_id, model, dataset, hyperparameters, status, final loss, loss history, and (for multi-run) the SLAO merge timeline. List recent runs:
91
225
 
92
226
  ```bash
93
- # Show where binaries are cached
94
- backpropagate --print-cache-path
227
+ backprop list-runs # most recent 20 runs, all statuses
228
+ backprop list-runs --status failed # filter
229
+ backprop list-runs --json --limit 100 # machine-readable
230
+ backprop show-run abcd1234 # detail view (partial run_id ok)
231
+ ```
232
+
233
+ Run history survives across processes — the `Runs` tab in the web UI is a separate, in-memory view; the on-disk history is the source of truth for `list-runs` / `show-run` / `resume`.
234
+
235
+ ### Web UI
236
+
237
+ Launch the Reflex interface locally:
238
+
239
+ ```bash
240
+ backprop ui --port 7862
241
+ ```
242
+
243
+ To expose a public-internet URL, you must pair `--share` with `--auth`:
244
+
245
+ ```bash
246
+ backprop ui --share --auth alice:hunter2
247
+ ```
248
+
249
+ `backprop ui --share` without `--auth` exits with code `1` and the structured error `[INPUT_AUTH_REQUIRED]`. The rationale: `--share` publishes a `*.gradio.live` URL that anyone on the internet can hit, and without auth that means anyone can drive your training pipeline.
250
+
251
+ To explicitly opt out (e.g. an internal dev environment), set the env var `BACKPROPAGATE_SECURITY__REQUIRE_AUTH_FOR_SHARE=false`. A loud warning will print on every launch — and there's a 5-second grace period before the unauth'd UI binds, so you can `Ctrl-C` if it looks wrong.
252
+
253
+ Filesystem writes from the UI are sandboxed to a single directory:
254
+
255
+ - Default: `~/.backpropagate/ui-outputs`
256
+ - Override: `BACKPROPAGATE_UI__OUTPUT_DIR=/path/you/own`
257
+ - The override is **denylist-validated** — system / credential paths (`/etc`, `/var`, `~/.ssh`, `~/.aws`, `C:\Windows\System32`, etc.) are refused with `[UI_OUTPUT_DIR_FORBIDDEN]`.
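A denylist check of this kind can be sketched as follows. This is not the library's validator: `is_forbidden`, the shortened `DENYLIST`, and the fixed home directory are all assumptions made for illustration, and a real check would also need to resolve symlinks and `..` segments before comparing.

```python
from pathlib import PurePosixPath

# Assumed subset of the denylist; POSIX entries only, for brevity.
DENYLIST = ("/etc", "/var", "~/.ssh", "~/.aws")


def is_forbidden(candidate: str, home: str = "/home/alice") -> bool:
    """Return True if `candidate` is a denylisted directory or lies inside one."""

    def expand(p: str) -> PurePosixPath:
        # Replace a leading "~" with the (hypothetical) home directory.
        return PurePosixPath(home + p[1:]) if p.startswith("~") else PurePosixPath(p)

    path = expand(candidate)
    return any(
        path == denied or denied in path.parents
        for denied in map(expand, DENYLIST)
    )
```

Comparing whole path components rather than string prefixes matters here: `/etcetera` is allowed, while `/etc/backprop` is refused.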
258
+
259
+ ## Windows Support
260
+
261
+ Backpropagate is designed to work on Windows out of the box:
262
+
263
+ - Pre-tokenization to avoid multiprocessing crashes
264
+ - Automatic xformers disable for RTX 40/50 series
265
+ - Safe dataloader settings
266
+ - Tested on RTX 5080 (16GB VRAM)
267
+
268
+ ## Model Presets
269
+
270
+ | Preset | VRAM | Speed | Quality |
271
+ |--------|------|-------|---------|
272
+ | Qwen 2.5 7B | ~12GB | Medium | Best |
273
+ | Qwen 2.5 3B | ~8GB | Fast | Good |
274
+ | Llama 3.2 3B | ~8GB | Fast | Good |
275
+ | Llama 3.2 1B | ~6GB | Fastest | Basic |
276
+ | Mistral 7B | ~12GB | Medium | Good |
277
+
278
+ ## Architecture
95
279
 
96
- # Clear cached binaries
97
- backpropagate --clear-cache
98
280
  ```
281
+ backpropagate/
282
+ ├── trainer.py # Core Trainer class
283
+ ├── multi_run.py # Multi-run SLAO training
284
+ ├── slao.py # SLAO LoRA merging algorithm
285
+ ├── datasets.py # Dataset loading, filtering & curriculum
286
+ ├── export.py # GGUF/Ollama export
287
+ ├── config.py # Pydantic settings + training presets
288
+ ├── gpu_safety.py # GPU monitoring & safety
289
+ ├── cli.py # CLI entry point (backprop command)
290
+ ├── checkpoints.py # Checkpoint management
291
+ ├── exceptions.py # Structured error hierarchy
292
+ ├── feature_flags.py # Optional feature detection
293
+ ├── security.py # Path traversal & torch security
294
+ ├── logging_config.py # Structured logging setup
295
+ ├── ui_theme.py # Radix theme tokens + CSS (Reflex era)
296
+ ├── ui_state.py # rx.State subclasses
297
+ ├── ui_app/ # Reflex web interface (Radix UI)
298
+ │ ├── app.py # rx.App entry point
299
+ │ ├── chrome.py # Header / LeftNav / SideRail / Footer
300
+ │ ├── pages/ # Train / Multi-Run / Export / Dataset
301
+ │ └── components/ # Bp* primitives (status pill, sparkline, event log…)
302
+ ├── ui_security.py # Rate limiting, CSRF, file validation (framework-agnostic)
303
+ ├── ui_gradio_legacy.py # DEPRECATED — preserved as v1.0 reference; removed in v1.2
304
+ └── theme_gradio_legacy.py # DEPRECATED — same
305
+ ```
306
+
307
+ ## Troubleshooting
308
+
309
+ A short index of the most common first-run failures. The full reverse index lives at [the troubleshooting handbook page](https://mcp-tool-shop-org.github.io/backpropagate/handbook/troubleshooting/); every code below is documented at [error codes](https://mcp-tool-shop-org.github.io/backpropagate/handbook/error-codes/).
310
+
311
+ | Symptom | Code | Fix |
312
+ |---------|------|-----|
313
+ | GPU runs out of memory mid-training | `RUNTIME_GPU_OOM` | OOM auto-recovery (B-002) halves batch size up to 3 times automatically. To opt out: `Trainer(oom_recovery=False)`. To force smaller: `--batch-size 1`. |
314
+ | HF Hub returns 401 / "model not found" | `DEP_MODEL_LOAD_FAILED` | `huggingface-cli login` and re-try. For typos, copy the exact id from <https://huggingface.co/models>. |
315
+ | Bad model name typo | `INPUT_VALIDATION_FAILED` or `DEP_MODEL_LOAD_FAILED` | Verify the `org/name` identifier at <https://huggingface.co/models>. |
316
+ | `register_with_ollama` connection refused | `DEP_OLLAMA_REGISTRATION_FAILED` | Start the daemon: `ollama serve`. Install from <https://ollama.com>. Retryable. |
317
+ | Disk full during checkpoint save | `STATE_CHECKPOINT_INVALID` | Atomic writes leave a `.partial` directory on crash — safe to delete. Previous good checkpoint is intact. |
318
+ | Training paused / aborted on GPU overheat | `RUNTIME_GPU_TEMPERATURE_CRITICAL` | B-003 monitor pauses on NVML temp threshold; resumes automatically as the GPU cools. Improve airflow or lower sustained load. |
319
+ | `backprop ui --share` rejected | `INPUT_AUTH_REQUIRED` | Pass `--auth user:password`, or set `BACKPROPAGATE_SECURITY__REQUIRE_AUTH_FOR_SHARE=false` to opt out (loud warning). |
320
+ | Multi-run "validation overlap" | `CONFIG_INVALID` (Stage A backend B-001) | Lower `--samples` below the training-pool size, increase dataset, or disable validation. |
321
+ | GGUF export failed on first try | `RUNTIME_GGUF_EXPORT_FAILED` | `pip install backpropagate[export]`; on Windows you also need Visual C++ Build Tools + CMake. |
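The OOM auto-recovery row describes a halve-and-retry loop. A minimal sketch of that idea, assuming a hypothetical `train_step` callable and using `MemoryError` as a stand-in for a CUDA out-of-memory error (the library's actual implementation is not shown in this README):

```python
def train_with_oom_recovery(train_step, batch_size: int, max_halvings: int = 3):
    """Call `train_step(batch_size)`, halving the batch size after each OOM.

    Returns a (result, batch_size_used) tuple. `train_step` is hypothetical.
    """
    for attempt in range(max_halvings + 1):
        try:
            return train_step(batch_size), batch_size
        except MemoryError:
            if attempt == max_halvings or batch_size == 1:
                raise  # out of retries, or the batch cannot shrink further
            batch_size = max(1, batch_size // 2)
```

Starting from a batch size of 8, this tries 8, 4, 2, and 1 at most, then re-raises the error so the caller still sees the failure.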
322
+
323
+ ## Reporting bugs
99
324
 
100
- ## Links
325
+ When something fails, Backpropagate prints a `run_started run_id=<uuid>` line at startup and binds the same id to checkpoint manifests, SLAO merge history, and structured log lines. Include the `run_id` in any bug report — it lets a maintainer correlate every log line, every checkpoint, and every merge for that exact run.
101
326
 
102
- - [Source Code](https://github.com/mcp-tool-shop-org/backpropagate)
103
- - [PyPI Package](https://pypi.org/project/backpropagate/)
104
- - [Landing Page](https://mcp-tool-shop-org.github.io/backpropagate/)
105
- - [Changelog](https://github.com/mcp-tool-shop-org/backpropagate/blob/main/CHANGELOG.md)
327
+ A good bug report includes:
328
+
329
+ 1. **`run_id`** — the uuid printed at startup (also available as `TrainingRun.run_id` and `RunResult.run_id`).
330
+ 2. **The error code** — the `[CODE_NAME]: message` line in stderr is what to grep for; see [error codes](https://mcp-tool-shop-org.github.io/backpropagate/handbook/error-codes/) for the catalog.
331
+ 3. **The redacted command line.** Stderr in non-verbose mode is automatically redacted (Bearer tokens, `sk-*`, `hf_*`, AWS keys, `password=`/`token=`/`api_key=` pairs are scrubbed) — safe to paste. For the full unredacted traceback, re-run with `--verbose`, but review before posting.
332
+ 4. **Python / PyTorch versions, GPU model, OS.** `backprop info` prints all of this in one go.
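If you want to double-check a log excerpt before pasting it, a scrubber along the lines of item 3 can be approximated with a few regular expressions. The patterns below are illustrative guesses, not the rules Backpropagate applies, and the function name `redact` is hypothetical.

```python
import re

# Illustrative patterns only; the real scrubber's rules are not shown here.
_PATTERNS = [
    (re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer [REDACTED]"),
    (re.compile(r"\b(?:sk-|hf_)[A-Za-z0-9_-]+"), "[REDACTED]"),
    (re.compile(r"\b(password|token|api_key)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace token-like substrings with a placeholder."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

A pass like this is a convenience, not a guarantee, so reviewing the output by eye is still worthwhile, especially for `--verbose` tracebacks.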
333
+
334
+ ## Privacy
335
+
336
+ All training happens locally on your GPU. Backpropagate makes no network requests except to download models from HuggingFace (which you initiate). No telemetry, no cloud dependency.
337
+
338
+ ## Scorecard
339
+
340
+ | Category | Score | Notes |
341
+ |----------|-------|-------|
342
+ | A. Security | 6/8 | SECURITY.md, trust model, no secrets/telemetry, safe_path(). MCP items skipped |
343
+ | B. Error Handling | 5/7 | Structured exception shape (`code`/`message`/`hint`/`cause`/`retryable`) via ERROR_CODES registry; CLI exit codes 0/1/2/3; no raw stack traces without `--verbose`; `run_id` correlation; redacted stderr; `--share`+`--auth` gating. MCP/desktop/vscode skipped. |
344
+ | C. Operator Docs | 4/7 | README, CHANGELOG, LICENSE, --help. Logging/MCP/complex skipped |
345
+ | D. Shipping Hygiene | 6/9 | verify.sh, version=tag, 5 scanners in CI, dependabot, python_requires, clean build |
346
+ | E. Identity | 4/4 | Logo, translations, landing page, metadata |
347
+ | **Total** | **25/31** | 14 items skipped with justification · `shipcheck audit` passes 100% · Audit date: 2026-05-21 (B-row re-graded after Stage B + Stage A CLI exit-code work) |
348
+
349
+ Design history and what each line item maps to: see [ROADMAP.md](ROADMAP.md) — all Week 1–4 items are shipped in v1.1.0.
106
350
 
107
351
  ## License
108
352
 
109
- MIT
353
+ MIT — see [LICENSE](LICENSE) for details.
354
+
355
+ ---
356
+
357
+ <p align="center">
358
+ Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>
359
+ </p>