PyPI - soup-cli - Versions diffs - 0.53.0__tar.gz → 0.53.2__tar.gz - Mend

soup-cli 0.53.0tar.gz → 0.53.2tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (525) hide show

{soup_cli-0.53.0 → soup_cli-0.53.2}/CONTRIBUTING.md RENAMED Viewed

@@ -107,11 +107,11 @@ soup_cli/
   cans/               - Shareable .can artifact format + run/publish orchestrator (v0.26.0 + v0.33.0)
   data/traces/        - Trace-to-Preference harvester (v0.26.0)
   data/collators.py   - CrossDocCollator for sample packing (v0.33.0)
-  utils/              - GPU, errors, MoE, GaLore, QAT, Unsloth, vLLM, SGLang, Liger, FlashAttn, FSDP, Ring Attention, long-context, quality, curriculum, freeze, dataset-registry, mlx, peft_builder, paths, topology, launcher, mii, pipeline, cut_ce, fp8, gradient_ckpt, kernel_picker, cross_doc_attn, activation_offload, hf, spec_pairing, structured_output, metrics, tracing, auto_quant, lr_finder, grad_accum, mixed_precision, warmup, spike_recovery, convergence, v028_features, multipack_sampler, multipack, neat_packing, jinja_analyzer, quant_menu, relora, peft_patches, peft_wiring, dpo_variants, optimizer_zoo, lr_groups, loftq_init, block_expansion, tts, classifier, distill, bitnet, ebft_gdpo, moe_quant, reasoning_effort, gguf_quant, kv_cache, advanced_precision, save_formats
+  utils/              - GPU, errors, MoE, GaLore, QAT, Unsloth, vLLM, SGLang, Liger, FlashAttn, FSDP, Ring Attention, long-context, quality, curriculum, freeze, dataset-registry, mlx, peft_builder, paths, topology, launcher, mii, pipeline, cut_ce, fp8, gradient_ckpt, kernel_picker, cross_doc_attn, activation_offload, hf, spec_pairing, structured_output, metrics, tracing, auto_quant, lr_finder, grad_accum, mixed_precision, warmup, spike_recovery, convergence, v028_features, multipack_sampler, multipack, neat_packing, jinja_analyzer, quant_menu, relora, peft_patches, peft_wiring, dpo_variants, optimizer_zoo, lr_groups, loftq_init, block_expansion, tts, classifier, distill, bitnet, ebft_gdpo, moe_quant, reasoning_effort, gguf_quant, kv_cache, advanced_precision, save_formats, deploy_measure
   templates/          - 17 built-in soup.yaml templates (YAML + manifest.json) with load_template loader (v0.39.0, +bco v0.40.0)
   ui/                 - Web UI (FastAPI + HTML/JS SPA)
-tests/                - Test suite (180 files, 7610 tests)
+tests/                - Test suite (185 files, 7842 tests)
 examples/             - Real-world config examples and datasets
 ```
@@ -263,6 +263,10 @@ pytest tests/ --cov=soup_cli --cov-report=html
 | test_v0500_part_d.py | v0.50.0 Part D — 7 stability/efficiency knobs (`ref_model_ema_alpha` / `replay_buffer_size` / `async_grpo_prefetch` / `tis_threshold` / `mask_truncated_completions` / `defer_rerolling` / `skip_zero_advantage` / `off_policy_mask_threshold`); explicit bool-rejection field_validator across all numeric fields (tdd-guide HIGH fix); `mask_truncated_completions` requires `tis_threshold` cross-validator; SoupConfig task-gate naming every offending field; `grpo_fp16` task-gate (code-review HIGH fix) (v0.50.0 Part D) |
 | test_v0500_part_e.py | v0.50.0 Part E — `task='prm'` (Process Reward Model) + `vision_grpo` flag; `validate_prm_compat` (data.format / modality / mlx gates); `validate_vision_grpo_compat` (task ∈ {grpo, ppo} / modality='vision' / non-mlx); `build_prm_trainer` deferred stub; SoupConfig integration with all rejection paths exercised (v0.50.0 Part E) |
 | test_v0520.py | v0.52.0 Modality II — TTS / classifier / distill / BitNet / EBFT-GDPO / MoE quant / reasoning_effort: TTS family allowlist + per-family emotion allowlists (Orpheus + Oute) + validate_tts_compat; classifier / reranker / cross_encoder tasks + num_labels (with field_validator bool guard) + label_names dedup + classifier-only field gates; distill divergence (kl alias canonicalised, Literal excludes alias) + teacher_model + distill_temperature bounds; BitNet 1.58 quant + bitnet/tq1_0 export-format stubs + Falcon-E recipe + is_bitnet_model org-prefix detect; EBFT (structured/strided) + GDPO (standard/length_normalized/margin) variant allowlists + task gates; MoE expert quant (nf4/int8_rowwise) + train_router_only requiring moe_lora=true; reasoning_effort + train_on_eot with SFT-family task gate; 6 new recipes (5 TTS + Falcon-E BitNet); review-fix coverage (num_labels bool guard, Oute emotion allowlist, lazy-import in classifier validator, task gates, oversize / NaN / Inf matrices). Test count: 272 (v0.52.0) |
+| test_v0531_82.py | v0.53.1 #82 autopilot pre-quantized detection: `detect_prequantized_format` + `decide_quantization(prequantized=...)` + `detect_prequantized_format_from_path` with cwd-containment + config.json symlink rejection + name/config aliases + word-boundary regex (v0.53.1) |
+| test_v0531_142.py | v0.53.1 #142 merge_4bit + export_torchao live wiring: BNB-4bit single-stage merge + TorchAO PTQ with per-scheme kwarg allowlist + CLI `--save-format` + `--quant-config` + `load_quant_config` (yaml.safe_load + 256 KB cap + extension allowlist) + path TOCTOU (v0.53.1) |
+| test_v0531_139.py | v0.53.1 #139 export_advanced_gguf live: 3-stage llama.cpp pipeline (convert → imatrix → quantize) + UD-prefix strip + subprocess argv shape + `_prepare_calibration_text` JSONL alias fallback + null-byte strip + 50 MB cap + POSIX O_NOFOLLOW + `_safe_stderr` Rich escape (v0.53.1) |
+| test_v0531_109.py | v0.53.1 #109 deploy autopilot --measure: `compute_cache_key` + `sha_of_file` + `measure_candidate` OK/MINOR/MAJOR bands + `pick_best` soft-fallback (max-by-delta) + cache round-trip with symlink rejection on load AND save + CLI integration + `_MAX_CANDIDATES=32` cap + `render_measure_table` markup escape regression (v0.53.1) |
 | test_v0530.py | v0.53.0 Quant Menu II — UD GGUFs + KV cache + NVFP4 + LF parity + save formats: Parts A+B GGUF (UD ladder 14 entries + IQ 12 + Apple/ARM 10 frozensets + non-overlap invariant + `validate_*` case-insensitive + rejection matrix + `is_advanced_gguf_format` union + `_LOWER_INDEX` O(1) lookup + MappingProxyType immutability + `validate_calibration_data_path` shape rejection + 4096-boundary + `export_advanced_gguf` v0.53.1 deferred stub); Part C KV cache (`KV_CACHE_TYPES` frozenset + `validate_kv_cache_type` case + bool/null/oversize/non-string rejection + `requires_hopper` delegates to spec + `get_kv_cache_spec` frozen + schema fp8-on-mlx rejected with specific message + q8_0-on-mlx allowed); Part D advanced precision (`fp8_attention` requires `quantization_aware='fp8'` BEFORE mlx-gate ordering + bool guards on every string param + schema rejects-without-fp8-qat; `nvfp4` mlx + vision rejection + bool guards; `unsloth_bnb_4bit` backend='unsloth' + quantization='4bit' rejection matrix; `apply_*` deferred); Part E LF parity (`bnb_4bit_use_double_quant` rejects none/8bit/gptq parametrize; `llm_int8` rejects default-none + 4bit; `quantize_ref_model` happy on dpo/grpo/kto + rejects sft/pretrain; `quantize_reward_model` happy on ppo/reward_model + rejects dpo; explicit `TypeError("v0.53.0 flag must be bool")` from `_validate_v053_bool_fields`; explicit-null surfaces as `valid boolean` ValidationError); Part F save formats (`MERGE_SAVE_FORMATS` lowercase normalisation + rejection matrix; `TORCHAO_PTQ_SCHEMES` CASE-SENSITIVE — `int4weightonly` rejected; `validate_quant_config_path` 4096-boundary; `MergeSaveSpec` + `TorchAOPTQSpec` frozen + MappingProxyType immutability; `merge_4bit` + `export_torchao` deferred); Cross-cutting (full 5-field YAML round-trip + cardinality invariant + tautological-assert replaced with allowlist + idempotent re-validate + `get_gguf_spec` unknown raises + bool guards on backend/modality/quantization across every Part D validator). Test count: 154 (v0.53.0) |
 | test_v0510.py | v0.51.0 Model Catalog Expansion + Alternative Model Hubs: Part E hubs.py (`SUPPORTED_HUBS` + `validate_hub_name` + `validate_hub_endpoint` SSRF parity / CRLF rejection / IPv6 mapped private rejected / IPv6 loopback ok / control chars; `resolve_endpoint` env-var override; `default_endpoint` + `endpoint_env_var` + `required_hub_package` + `is_hf` with bool guards; MappingProxyType immutability); TrainingConfig `hub` field (default + Literal accept + None reject + case-insensitive normalisation + YAML round-trip) + SoupConfig `_validate_hub_supported` (mlx + non-hf rejected; mlx + hf accepted; modelers + transformers accepted); Part D MULTIPACK_ARCHITECTURES extension (20 new arches parametrize + legacy preserved + exact count=38 + frozenset immutability); Parts A/B/C 26 new recipes (parametrize over every name × {get_recipe / RecipeMeta / SoupConfig load / yaml.safe_load / model id no null/whitespace/empty parts / max_length bounds / GRPO required fields}); baichuan-sft uses `hub: modelscope`; total recipe count >= 105 (v0.51.0) |

{soup_cli-0.53.0 → soup_cli-0.53.2}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: soup-cli
-Version: 0.53.0
+Version: 0.53.2
 Summary: Fine-tune LLMs in one command. No SSH, no config hell.
 Project-URL: Homepage, https://github.com/MakazhanAlpamys/Soup
 Project-URL: Repository, https://github.com/MakazhanAlpamys/Soup
@@ -134,15 +134,14 @@ soup train
 Latest highlights only. Full history: [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases).
-**v0.53.0 — Quant Menu II (UD GGUFs + KV cache + NVFP4 + LF parity + save formats)**: Unsloth Dynamic 2.0 GGUF ladder (UD-Q8_K_XL … UD-IQ1_M), IQ + Apple/ARM GGUF flavours, `kv_cache_type` (q8_0 / bf16 / f16 / fp8), FP8 attention, NVFP4 (Blackwell), explicit `unsloth_bnb_4bit`, BNB double-quant, ref/reward model quantization, merge-4bit save, TorchAO PTQ export schema — schema-only release; live llama.cpp imatrix + serve / merge / export wiring lands in v0.53.1.
+**v0.53.2 — Modality II live trainers**: Four v0.52.0 deferred stubs lifted into real, end-to-end-trainable wrappers — knowledge distillation, sequence classification, EBFT / GDPO loss kernels, and gpt-oss-style `reasoning_effort` system-prompt injection.
-- **Unsloth Dynamic 2.0 GGUF ladder.** 14-entry closed allowlist (UD-Q{8..2}_K_XL + UD-IQ{4_XS, 3_M, 3_XXS, 2_M, 2_XS, 2_XXS, 1_M, 1_S}) with `validate_ud_gguf_format` case-insensitive canonical normalisation. `--calibration-data <jsonl>` flag shape-validates now; cwd-containment + TOCTOU symlink rejection land at CLI dispatch in v0.53.1.
-- **IQ + Apple/ARM GGUF.** 12-entry IQ family (IQ1/2/3/4 — including IQ4_NL non-linear) + 10-entry Apple/ARM-friendly set (Q4_0_4_4 / Q4_NL / Q5_K_M / etc.) wrapped in `MappingProxyType` metadata.
-- **KV cache types.** New `training.kv_cache_type: q8_0 | bf16 | f16 | fp8`. FP8 is Hopper-only — cross-validator rejects `fp8` on the MLX backend; the SM-capability check fires at serve construction.
-- **FP8 attention + NVFP4 + native `unsloth_bnb_4bit`.** Three new bool flags. `fp8_attention=true` requires `quantization_aware='fp8'` and a non-MLX backend; `nvfp4=true` is gated to CUDA + text modality (Blackwell SM ≥ 12 check is runtime-only); `unsloth_bnb_4bit=true` requires `backend='unsloth'` + `quantization='4bit'`.
-- **LF / Axolotl parity.** `bnb_4bit_use_double_quant` (requires `quantization='4bit'`), `llm_int8` (asserts `quantization='8bit'` — distinct from v0.41.0 `load_in_8bit` aliasing), `quantize_ref_model` (extends v0.40.5 Quant Menu to the ref model on DPO/IPO/SimPO/ORPO/BCO/KTO/GRPO/PPO/preference), `quantize_reward_model` (PPO + reward_model tasks).
-- **Advanced save formats.** `soup merge --save-format 4bit | 4bit_forced` (single BNB-4bit merged checkpoint without dequant/merge/requant cycle) and `soup export --format torchao --quant-config <yaml>` (closed allowlist Int4WeightOnly / Int8DynActInt4 / Float8DynActFloat8 / NVFP4) — schema lands now; live writers land in v0.53.1.
-- **+157 net new tests** (7453 → 7610) across 154 tests in `test_v0530.py`. Five review agents (python / code / security / tdd / verification) ran in parallel; every CRITICAL / HIGH / MEDIUM / LOW finding fixed or documented: O(1) `_LOWER_INDEX` for GGUF lookup, ref-task set extended with GRPO + KTO, `_validate_v053_bool_fields` no longer silently coerces `None`, `requires_hopper` reads from spec metadata, `fp8_attention` validator order swapped so the `quantization_aware='fp8'` error fires first, `validate_calibration_data_path` / `validate_quant_config_path` docstrings name the exact controls v0.53.1 CLI dispatch must add.
+- **`soup train` with `task: distill`.** New `DistillTrainerWrapper`: student + frozen teacher both load via `AutoModelForCausalLM` (separate `trust_remote_code` resolution for each), KL / forward_KL / reverse_KL / JS divergence kernels scaled by `temperature**2` per the Hinton paper. Device-bridge: teacher inputs auto-move to the teacher's device, teacher logits move back onto the student's device before the KL kernel — survives HF Trainer's auto-CUDA promotion on a CPU-tagged run. `DataCollatorForSeq2Seq(label_pad_token_id=-100)` handles variable-length pre-tokenised loss-masked rows correctly.
+- **`soup train` with `task: classifier | reranker | cross_encoder`.** New `ClassifierTrainerWrapper`: `AutoModelForSequenceClassification` with `num_labels` and `label_names`, auto-routes `single_label_classification` / `multi_label_classification` from `tcfg.classifier_kind`. Multi-label string labels resolved via the `label_names` map with a 1024-entry cap + dedup. Training Setup Panel renders `Head: num_labels=N, kind=...` instead of LoRA r/alpha for the classifier family.
+- **EBFT structured / strided + GDPO standard / length_normalized / margin loss kernels.** `apply_ebft_loss` and `apply_gdpo_loss` exit the v0.52.0 `NotImplementedError` stubs with finite-only-input guards and bool-rejected numeric params. `attach_ebft_compute_loss(trainer, tcfg)` (SFT) and `attach_gdpo_compute_loss(trainer, tcfg)` (DPO) wrap `Trainer.compute_loss` idempotently — re-attach is a no-op via a marker attribute on the wrapped method. Auto-attached when the corresponding `*_variant` field is set on `TrainingConfig`.
+- **gpt-oss `reasoning_effort` + `train_on_eot`.** `apply_reasoning_effort_prefix(messages, level)` injects `<|reasoning_effort|>{low,medium,high}<|/reasoning_effort|>` into the system turn (creates one if absent), returning a new list (caller's messages immutable). `build_assistant_only_labels(train_on_eot=True)` keeps the EOT/EOS token unmasked at the assistant-turn boundary so the model learns when to stop. Both gated to the SFT-family at config-load.
+- **+120 net new tests** (7722 → 7842) across `test_v0532.py`. Four review agents (python / code / security / tdd) ran; every CRITICAL / HIGH / MEDIUM / LOW finding fixed — separate `trust_remote_code` resolution for student vs teacher, idempotent attach hooks with regression tests, 1024-entry multi-label cap, `dpo_margin` defaults to `None` (not `0.0`) so missing values raise rather than silently zero, source-grep regression guards on the trainer-routing call sites use the full instantiation expression (no comment-only false-positives), Panel renders the classifier head instead of LoRA r/alpha.
+- **Local end-to-end CPU smoke** confirms both new wrappers train 2 steps with finite loss on `hf-internal-testing/tiny-random-gpt2`. Two real bugs surfaced and were fixed during the smoke (collator label padding + teacher / student device mismatch) — both have source-level regression guards in the test suite. ONNX export QA: pipeline integrity proven on tiny-gpt2; TinyLlama-1.1B full export is host-RAM-bound (documented in `tests/qa/v053_qa.md`).
 ## Why Soup?
@@ -367,6 +366,97 @@ soup init --template pretrain
 soup train
 ```
+## Knowledge Distillation
+Train a small student model to match a larger teacher's output distribution.
+```yaml
+base: HuggingFaceTB/SmolLM2-135M
+task: distill
+modality: text
+backend: transformers
+data:
+  train: ./data/chat.jsonl
+  max_length: 2048
+  chat_template: chatml
+training:
+  teacher_model: meta-llama/Llama-3.1-8B
+  distill_divergence: forward_kl   # kl | forward_kl | reverse_kl | js
+  distill_temperature: 2.0
+  epochs: 3
+  lr: 5e-5
+  quantization: 4bit               # quantizes student only
+```
+Loss = student CE + (T**2) × KL(teacher_logits / T  ||  student_logits / T).
+Teacher is loaded once, frozen via `requires_grad_(False)` + `.eval()`, and its
+inputs / logits are auto-bridged across CPU / CUDA devices.
+## Sequence Classification
+Train a classifier head on top of any base model — supports single-label,
+multi-label, and cross-encoder reranking.
+```yaml
+base: BAAI/bge-base-en-v1.5
+task: classifier              # or `reranker`, `cross_encoder`
+modality: text
+backend: transformers
+data:
+  train: ./data/labelled.jsonl   # rows: {"text": "...", "label": "spam"} or {"text": "...", "label": [0, 1, 0]}
+  max_length: 256
+training:
+  num_labels: 3
+  classifier_kind: single_label   # or `multi_label`
+  label_names: [ham, spam, promo] # required when labels are strings
+  epochs: 5
+  lr: 2e-5
+  batch_size: 32
+```
+Routes `classifier` / `reranker` / `cross_encoder` through
+`AutoModelForSequenceClassification`. Multi-label heads cap at 1024 entries per
+row, dedup via set conversion, and reject null bytes in label strings.
+## Reasoning Effort + EOT Control
+gpt-oss-style reasoning-effort control for instruction tuning.
+```yaml
+training:
+  reasoning_effort: high      # low | medium | high
+  train_on_eot: true          # do NOT mask the EOT/EOS token in the loss
+```
+`reasoning_effort` injects `<|reasoning_effort|>high<|/reasoning_effort|>` into
+the system turn (creating one if absent). `train_on_eot=True` makes the model
+learn when to stop generating by training on the trailing EOS token instead of
+masking it out. Both are gated to the SFT-family of tasks.
+## EBFT / GDPO Loss Variants
+Entropy-regularised SFT (`ebft_variant: structured | strided`) and generalised
+DPO (`gdpo_variant: standard | length_normalized | margin`) — both attach
+idempotently via `compute_loss` wrappers and auto-fire when the corresponding
+variant field is set on `TrainingConfig`.
+```yaml
+# SFT with EBFT structured
+training:
+  ebft_variant: structured
+  ebft_temperature: 1.0
+# DPO with GDPO length_normalized
+task: dpo
+training:
+  gdpo_variant: length_normalized
+  dpo_beta: 0.1
+```
 ## MoE Model Support
 Fine-tune Mixture of Experts models (Mixtral, Qwen3-30B-A3B, DeepSeek V3) with ScatterMoE LoRA — applies LoRA to both attention layers and expert FFN layers:
@@ -3732,6 +3822,40 @@ Cross-validator ordering picks the most actionable error: `quantization_aware='f
 `soup export --format torchao --quant-config <yaml>` is the planned PTQ export surface for `torchao.quantize_` + `save_pretrained`. Four schemes are allowlisted: `Int4WeightOnly`, `Int8DynActInt4`, `Float8DynActFloat8`, `NVFP4`. CASE-SENSITIVE — these are PyTorch class names and `torchao.quantize_` looks them up by exact name. Diverges from `--save-format` (lowercase-normalised) on purpose; documented at both validators.
+## Quant Menu II + Export Pipeline (v0.53.1)
+v0.53.1 lifts the v0.53.0 schema-only stubs to live wiring:
+```bash
+# Single-stage BNB-4bit merged checkpoint (no dequant/merge/requant)
+soup merge -a ./adapter -o ./merged_4bit --save-format 4bit
+# TorchAO PTQ export — closed per-scheme kwarg allowlist
+cat > q.yaml <<EOF
+scheme: Int4WeightOnly
+group_size: 32
+EOF
+soup export --model ./merged --format torchao --quant-config ./q.yaml --output ./out
+# Unsloth Dynamic 2.0 / IQ / Apple-ARM GGUF via llama.cpp imatrix
+soup export --model ./merged --format gguf-ud \
+    --gguf-flavour UD-Q4_K_XL \
+    --calibration-data ./calib.jsonl \
+    --output ./out/model.UD-Q4_K_XL.gguf
+# Deploy autopilot with live Quant-Lobotomy measurement
+soup deploy autopilot --target rtx-4090-24gb \
+    --base meta-llama/Llama-3.2-1B \
+    --measure --tasks ./eval_tasks.jsonl \
+    --measure-candidates 4bit,gptq,awq
+```
+Autopilot also detects pre-quantized bases automatically — `TheBloke/Llama-2-7B-Chat-GPTQ` is recommended `gptq` instead of stacking 4-bit on top. Detection runs against the base-model name regex AND any local `config.json`'s `quantization_config.quant_method`. Out-of-cwd model paths are silently skipped (soft-probe semantics).
+The advanced GGUF pipeline uses POSIX `O_NOFOLLOW` to defeat the TOCTOU race between the dispatch-time symlink check and the actual open of the calibration data — a crafted environment cannot race-swap the calibration file between validate and read.
+`soup deploy autopilot --measure` caches results at `~/.soup/deploy_autopilot_cache.json` keyed on `(base, profile, eval-tasks)`. Repeat invocations short-circuit; pass `SOUP_DEPLOY_AUTOPILOT_CACHE=<path>` to redirect (constrained to home / cwd / tempdir). The recommended candidate uses soft-fallback: first `OK` by insertion order, else the candidate with the smallest delta (least drop relative to its own baseline).
 ## Changelog
 See [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases) for version history.

{soup_cli-0.53.0 → soup_cli-0.53.2}/README.md RENAMED Viewed

@@ -43,15 +43,14 @@ soup train
 Latest highlights only. Full history: [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases).
-**v0.53.0 — Quant Menu II (UD GGUFs + KV cache + NVFP4 + LF parity + save formats)**: Unsloth Dynamic 2.0 GGUF ladder (UD-Q8_K_XL … UD-IQ1_M), IQ + Apple/ARM GGUF flavours, `kv_cache_type` (q8_0 / bf16 / f16 / fp8), FP8 attention, NVFP4 (Blackwell), explicit `unsloth_bnb_4bit`, BNB double-quant, ref/reward model quantization, merge-4bit save, TorchAO PTQ export schema — schema-only release; live llama.cpp imatrix + serve / merge / export wiring lands in v0.53.1.
+**v0.53.2 — Modality II live trainers**: Four v0.52.0 deferred stubs lifted into real, end-to-end-trainable wrappers — knowledge distillation, sequence classification, EBFT / GDPO loss kernels, and gpt-oss-style `reasoning_effort` system-prompt injection.
-- **Unsloth Dynamic 2.0 GGUF ladder.** 14-entry closed allowlist (UD-Q{8..2}_K_XL + UD-IQ{4_XS, 3_M, 3_XXS, 2_M, 2_XS, 2_XXS, 1_M, 1_S}) with `validate_ud_gguf_format` case-insensitive canonical normalisation. `--calibration-data <jsonl>` flag shape-validates now; cwd-containment + TOCTOU symlink rejection land at CLI dispatch in v0.53.1.
-- **IQ + Apple/ARM GGUF.** 12-entry IQ family (IQ1/2/3/4 — including IQ4_NL non-linear) + 10-entry Apple/ARM-friendly set (Q4_0_4_4 / Q4_NL / Q5_K_M / etc.) wrapped in `MappingProxyType` metadata.
-- **KV cache types.** New `training.kv_cache_type: q8_0 | bf16 | f16 | fp8`. FP8 is Hopper-only — cross-validator rejects `fp8` on the MLX backend; the SM-capability check fires at serve construction.
-- **FP8 attention + NVFP4 + native `unsloth_bnb_4bit`.** Three new bool flags. `fp8_attention=true` requires `quantization_aware='fp8'` and a non-MLX backend; `nvfp4=true` is gated to CUDA + text modality (Blackwell SM ≥ 12 check is runtime-only); `unsloth_bnb_4bit=true` requires `backend='unsloth'` + `quantization='4bit'`.
-- **LF / Axolotl parity.** `bnb_4bit_use_double_quant` (requires `quantization='4bit'`), `llm_int8` (asserts `quantization='8bit'` — distinct from v0.41.0 `load_in_8bit` aliasing), `quantize_ref_model` (extends v0.40.5 Quant Menu to the ref model on DPO/IPO/SimPO/ORPO/BCO/KTO/GRPO/PPO/preference), `quantize_reward_model` (PPO + reward_model tasks).
-- **Advanced save formats.** `soup merge --save-format 4bit | 4bit_forced` (single BNB-4bit merged checkpoint without dequant/merge/requant cycle) and `soup export --format torchao --quant-config <yaml>` (closed allowlist Int4WeightOnly / Int8DynActInt4 / Float8DynActFloat8 / NVFP4) — schema lands now; live writers land in v0.53.1.
-- **+157 net new tests** (7453 → 7610) across 154 tests in `test_v0530.py`. Five review agents (python / code / security / tdd / verification) ran in parallel; every CRITICAL / HIGH / MEDIUM / LOW finding fixed or documented: O(1) `_LOWER_INDEX` for GGUF lookup, ref-task set extended with GRPO + KTO, `_validate_v053_bool_fields` no longer silently coerces `None`, `requires_hopper` reads from spec metadata, `fp8_attention` validator order swapped so the `quantization_aware='fp8'` error fires first, `validate_calibration_data_path` / `validate_quant_config_path` docstrings name the exact controls v0.53.1 CLI dispatch must add.
+- **`soup train` with `task: distill`.** New `DistillTrainerWrapper`: student + frozen teacher both load via `AutoModelForCausalLM` (separate `trust_remote_code` resolution for each), KL / forward_KL / reverse_KL / JS divergence kernels scaled by `temperature**2` per the Hinton paper. Device-bridge: teacher inputs auto-move to the teacher's device, teacher logits move back onto the student's device before the KL kernel — survives HF Trainer's auto-CUDA promotion on a CPU-tagged run. `DataCollatorForSeq2Seq(label_pad_token_id=-100)` handles variable-length pre-tokenised loss-masked rows correctly.
+- **`soup train` with `task: classifier | reranker | cross_encoder`.** New `ClassifierTrainerWrapper`: `AutoModelForSequenceClassification` with `num_labels` and `label_names`, auto-routes `single_label_classification` / `multi_label_classification` from `tcfg.classifier_kind`. Multi-label string labels resolved via the `label_names` map with a 1024-entry cap + dedup. Training Setup Panel renders `Head: num_labels=N, kind=...` instead of LoRA r/alpha for the classifier family.
+- **EBFT structured / strided + GDPO standard / length_normalized / margin loss kernels.** `apply_ebft_loss` and `apply_gdpo_loss` exit the v0.52.0 `NotImplementedError` stubs with finite-only-input guards and bool-rejected numeric params. `attach_ebft_compute_loss(trainer, tcfg)` (SFT) and `attach_gdpo_compute_loss(trainer, tcfg)` (DPO) wrap `Trainer.compute_loss` idempotently — re-attach is a no-op via a marker attribute on the wrapped method. Auto-attached when the corresponding `*_variant` field is set on `TrainingConfig`.
+- **gpt-oss `reasoning_effort` + `train_on_eot`.** `apply_reasoning_effort_prefix(messages, level)` injects `<|reasoning_effort|>{low,medium,high}<|/reasoning_effort|>` into the system turn (creates one if absent), returning a new list (caller's messages immutable). `build_assistant_only_labels(train_on_eot=True)` keeps the EOT/EOS token unmasked at the assistant-turn boundary so the model learns when to stop. Both gated to the SFT-family at config-load.
+- **+120 net new tests** (7722 → 7842) across `test_v0532.py`. Four review agents (python / code / security / tdd) ran; every CRITICAL / HIGH / MEDIUM / LOW finding fixed — separate `trust_remote_code` resolution for student vs teacher, idempotent attach hooks with regression tests, 1024-entry multi-label cap, `dpo_margin` defaults to `None` (not `0.0`) so missing values raise rather than silently zero, source-grep regression guards on the trainer-routing call sites use the full instantiation expression (no comment-only false-positives), Panel renders the classifier head instead of LoRA r/alpha.
+- **Local end-to-end CPU smoke** confirms both new wrappers train 2 steps with finite loss on `hf-internal-testing/tiny-random-gpt2`. Two real bugs surfaced and were fixed during the smoke (collator label padding + teacher / student device mismatch) — both have source-level regression guards in the test suite. ONNX export QA: pipeline integrity proven on tiny-gpt2; TinyLlama-1.1B full export is host-RAM-bound (documented in `tests/qa/v053_qa.md`).
 ## Why Soup?
@@ -276,6 +275,97 @@ soup init --template pretrain
 soup train
 ```
+## Knowledge Distillation
+Train a small student model to match a larger teacher's output distribution.
+```yaml
+base: HuggingFaceTB/SmolLM2-135M
+task: distill
+modality: text
+backend: transformers
+data:
+  train: ./data/chat.jsonl
+  max_length: 2048
+  chat_template: chatml
+training:
+  teacher_model: meta-llama/Llama-3.1-8B
+  distill_divergence: forward_kl   # kl | forward_kl | reverse_kl | js
+  distill_temperature: 2.0
+  epochs: 3
+  lr: 5e-5
+  quantization: 4bit               # quantizes student only
+```
+Loss = student CE + (T**2) × KL(teacher_logits / T  ||  student_logits / T).
+Teacher is loaded once, frozen via `requires_grad_(False)` + `.eval()`, and its
+inputs / logits are auto-bridged across CPU / CUDA devices.
+## Sequence Classification
+Train a classifier head on top of any base model — supports single-label,
+multi-label, and cross-encoder reranking.
+```yaml
+base: BAAI/bge-base-en-v1.5
+task: classifier              # or `reranker`, `cross_encoder`
+modality: text
+backend: transformers
+data:
+  train: ./data/labelled.jsonl   # rows: {"text": "...", "label": "spam"} or {"text": "...", "label": [0, 1, 0]}
+  max_length: 256
+training:
+  num_labels: 3
+  classifier_kind: single_label   # or `multi_label`
+  label_names: [ham, spam, promo] # required when labels are strings
+  epochs: 5
+  lr: 2e-5
+  batch_size: 32
+```
+Routes `classifier` / `reranker` / `cross_encoder` through
+`AutoModelForSequenceClassification`. Multi-label heads cap at 1024 entries per
+row, dedup via set conversion, and reject null bytes in label strings.
+## Reasoning Effort + EOT Control
+gpt-oss-style reasoning-effort control for instruction tuning.
+```yaml
+training:
+  reasoning_effort: high      # low | medium | high
+  train_on_eot: true          # do NOT mask the EOT/EOS token in the loss
+```
+`reasoning_effort` injects `<|reasoning_effort|>high<|/reasoning_effort|>` into
+the system turn (creating one if absent). `train_on_eot=True` makes the model
+learn when to stop generating by training on the trailing EOS token instead of
+masking it out. Both are gated to the SFT-family of tasks.
+## EBFT / GDPO Loss Variants
+Entropy-regularised SFT (`ebft_variant: structured | strided`) and generalised
+DPO (`gdpo_variant: standard | length_normalized | margin`) — both attach
+idempotently via `compute_loss` wrappers and auto-fire when the corresponding
+variant field is set on `TrainingConfig`.
+```yaml
+# SFT with EBFT structured
+training:
+  ebft_variant: structured
+  ebft_temperature: 1.0
+# DPO with GDPO length_normalized
+task: dpo
+training:
+  gdpo_variant: length_normalized
+  dpo_beta: 0.1
+```
 ## MoE Model Support
 Fine-tune Mixture of Experts models (Mixtral, Qwen3-30B-A3B, DeepSeek V3) with ScatterMoE LoRA — applies LoRA to both attention layers and expert FFN layers:
@@ -3641,6 +3731,40 @@ Cross-validator ordering picks the most actionable error: `quantization_aware='f
 `soup export --format torchao --quant-config <yaml>` is the planned PTQ export surface for `torchao.quantize_` + `save_pretrained`. Four schemes are allowlisted: `Int4WeightOnly`, `Int8DynActInt4`, `Float8DynActFloat8`, `NVFP4`. CASE-SENSITIVE — these are PyTorch class names and `torchao.quantize_` looks them up by exact name. Diverges from `--save-format` (lowercase-normalised) on purpose; documented at both validators.
+## Quant Menu II + Export Pipeline (v0.53.1)
+v0.53.1 lifts the v0.53.0 schema-only stubs to live wiring:
+```bash
+# Single-stage BNB-4bit merged checkpoint (no dequant/merge/requant)
+soup merge -a ./adapter -o ./merged_4bit --save-format 4bit
+# TorchAO PTQ export — closed per-scheme kwarg allowlist
+cat > q.yaml <<EOF
+scheme: Int4WeightOnly
+group_size: 32
+EOF
+soup export --model ./merged --format torchao --quant-config ./q.yaml --output ./out
+# Unsloth Dynamic 2.0 / IQ / Apple-ARM GGUF via llama.cpp imatrix
+soup export --model ./merged --format gguf-ud \
+    --gguf-flavour UD-Q4_K_XL \
+    --calibration-data ./calib.jsonl \
+    --output ./out/model.UD-Q4_K_XL.gguf
+# Deploy autopilot with live Quant-Lobotomy measurement
+soup deploy autopilot --target rtx-4090-24gb \
+    --base meta-llama/Llama-3.2-1B \
+    --measure --tasks ./eval_tasks.jsonl \
+    --measure-candidates 4bit,gptq,awq
+```
+Autopilot also detects pre-quantized bases automatically — `TheBloke/Llama-2-7B-Chat-GPTQ` is recommended `gptq` instead of stacking 4-bit on top. Detection runs against the base-model name regex AND any local `config.json`'s `quantization_config.quant_method`. Out-of-cwd model paths are silently skipped (soft-probe semantics).
+The advanced GGUF pipeline uses POSIX `O_NOFOLLOW` to defeat the TOCTOU race between the dispatch-time symlink check and the actual open of the calibration data — a crafted environment cannot race-swap the calibration file between validate and read.
+`soup deploy autopilot --measure` caches results at `~/.soup/deploy_autopilot_cache.json` keyed on `(base, profile, eval-tasks)`. Repeat invocations short-circuit; pass `SOUP_DEPLOY_AUTOPILOT_CACHE=<path>` to redirect (constrained to home / cwd / tempdir). The recommended candidate uses soft-fallback: first `OK` by insertion order, else the candidate with the smallest delta (least drop relative to its own baseline).
 ## Changelog
 See [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases) for version history.

{soup_cli-0.53.0 → soup_cli-0.53.2}/SECURITY.md RENAMED Viewed

@@ -9,14 +9,15 @@ We provide security updates for the following versions:
 - **Versions older than 3 minor versions:** No support
 Example:
-- v0.53.0 -- Full support (latest)
+- v0.53.2 -- Full support (latest)
+- v0.53.1 -- Full support
+- v0.53.0 -- Full support
 - v0.52.0 -- Full support
 - v0.51.0 -- Full support
 - v0.50.0 -- Full support
-- v0.49.0 -- Full support
+- v0.49.0 -- Bug-fix support only
 - v0.48.0 -- Bug-fix support only
-- v0.47.0-v0.47.x -- Bug-fix support only
-- v0.46.x and below -- No support
+- v0.47.x and below -- No support
 ## Reporting a Vulnerability
@@ -147,6 +148,9 @@ No known critical vulnerabilities in current releases.
 - **v0.32.0 — Training Stability & Auto-Tuning**: `--find-lr-output` containment via shared `utils/paths.is_under_cwd` (prevents writes outside cwd); `save_lr_finder_report` rejects NaN / Infinity floats in `lrs` / `losses` and serialises with `allow_nan=False` (keeps the report parser-safe); `compute_lr_schedule` rejects non-positive `start_lr`, inverted ranges, and `num_steps` outside `[2, 10_000]`; `pick_mixed_precision` rejects empty / null-byte / >200-char model names and resolves multi-version quirks (`qwen2.5` vs `qwen2`, `phi-3.5` vs `phi-3`) by longest-substring-first iteration so an added family can never accidentally make a more-specific entry dead code; `compute_warmup_steps` clamps to `[10, 1000]` with a `ratio==0.0` short-circuit matching HF Trainer's "no warmup" convention; `SpikeRecoveryStrategy` is `@dataclass(frozen=True)` (post-construction mutation cannot bypass validation), `max_attempts ∈ [1, 10]`, `lr_decay ∈ (0, 1)`, `min_lr > 0`; cross-validator `_validate_spike_recovery_requires_watchdog` rejects `loss_spike_recovery=true, loss_watchdog=false` at config-load (fails fast instead of never triggering); `convergence_window ∈ [5, 10_000]`, `convergence_rel_tol ∈ (0, 1]`, `recommend_action` reuses `detect_plateau` so plateau heuristic stays single-source-of-truth; `GradAccumMonitor.recommend()` caps doubled `accum` at `MAX_ACCUM=1024` so a runaway advisory loop cannot blow up DataLoader prefetch; `generate_config` validates BOTH the YAML output path AND the embedded `decisions["output"]` field via `is_under_cwd` (closes the gap where a crafted `decisions["output"]="../../etc"` would have silently propagated into the rendered YAML)
 - **v0.34.0 — Observability & Dev UX**: `.crash` bundle generator (`utils/crash.py`) recursively redacts `hf_*` / `sk-*` / `Bearer …` token-shaped strings in any captured `config` and metric tail before serialisation, so a `.crash` file shared on a public GitHub issue cannot leak credentials; `output_dir` is reduced to `os.path.basename` so `$HOME` doesn't leak; `write_crash_bundle` uses `os.path.realpath + commonpath` for cwd containment (Windows-safe; raises `ValueError` not `PermissionError` so callers cannot silently swallow with `except OSError`); filename appends `secrets.token_hex(4)` so two crashes in the same UTC second don't collide; bundle truncated to `MAX_BUNDLE_BYTES=1_000_000`. `train.py` crash-write surfaces failures to the user (no silent missing-bundle). `profiling.py` `resolve_trace_path` rejects empty / `.` / `..` / `/` / `\\` / null-byte `run_id` (closes the `output_dir/profiles/../trace.json` escape) and uses `os.path.realpath + is_under_cwd`; profiles dir is created only on successful torch import (no stale empty dirs on torch-less CI). `tracker.get_run` LIKE-prefix match escapes `%` / `_` / `\\` and uses `ESCAPE '\\'` so a crafted `run_id` cannot widen the match (mirrors v0.26.0 registry policy). Lazy schema migration (`_ensure_schema`) tolerates the "duplicate column" race when two CLI processes start simultaneously on a fresh DB (fork-based multi-GPU training, TUI auto-refresh). `runs.py show/replay/clean` switched user `run_id` rendering to `markup_escape` and switched `clean` containment from broken `Path.resolve() + relative_to()` to project-standard `os.path.realpath + is_under_cwd`. `tui_app.py` lazy-imports `ExperimentTracker` and `markup_escape`s every DB-sourced string before passing into Textual widgets so a crafted base_model / experiment_name cannot inject `[bold red]…[/]` markup. `run_cost.estimate_run_cost_usd` rejects `bool` in `num_gpus` (bool is a subclass of int — same defence as v0.30.0 `Candidate.__post_init__`); duration clamped to `[0, 1 year]`; unknown GPU returns `None` so callers render `—` instead of fabricating `$0.00`. `log_level.parse_log_level` rejects non-string + null-byte input.
 - **v0.33.0 — Live Wire**: RLVR `code_exec_reward` adds OS-level isolation (Linux best-effort `os.unshare(CLONE_NEWUSER|CLONE_NEWNET|CLONE_NEWPID)`, macOS `sandbox-exec` with default-deny `MACOS_SANDBOX_PROFILE` narrowed to a 3-name `mach-lookup` allowlist to prevent DNS / NSURLSession bypass of `(deny network*)`); `prune_checkpoints` switches to TOCTOU-safe `os.lstat + S_ISLNK` + `shutil.rmtree(onerror=_abort_on_symlink)` so a symlink encountered mid-walk aborts rather than escapes; `run_gate` wraps each task scorer in a typed `try/except` so backend failures produce `score=None, error=str(exc)` (never silent `score=1.0`); `_parse_judge_url` removes the bare `http://` catch-all (defence-in-depth after the Pydantic GateTask validator); `soup can run` requires `--yes` or explicit consent callback and raises `ValueError` (not `PermissionError`, which is an `OSError` subclass that broad `except` blocks would swallow); GGUF `rglob` result for ollama deploy is `realpath+commonpath` checked against extract_dir (prevents symlink escape from a crafted can); `DeployTarget.path` validator normalises mixed `\\`/`/` separators before splitting (closes a Windows `..` bypass); `CAN_FORMAT_VERSION` 1→2 (additive — v1 still loads); `soup can publish` validates `repo_id` via `utils/hf.validate_repo_id`, resolves token via `resolve_token`, sanitises commit messages (first-line, 200-char cap), uses HTTPS-only HfApi; `_write_spike_recovery_hint` adds `is_under_cwd` containment check on `args.output_dir` from raw HF `TrainingArguments`; `lookup_entry_by_output_dir` emits `ResourceWarning` when 1000-row scan limit is hit (no silent miss); `CrossDocCollator` no longer mutates input feature dicts (HF Dataset rows are cached and reused — mutation broke subsequent batches); `Candidate` rejects `bool` in `score`/`latency_ms` (was sneaking past `int` isinstance check); `evaluate_candidate` latency mean now divides by *completed* prompts (excludes crashed) so a broken candidate isn't artificially fast; `auto_quant.run_auto_quant_picker` soft-falls-back to highest-scored candidate when no candidate clears `min_score` (server still binds); `build_logits_processors` returns `[]` when neither `outlines` nor `lm-format-enforcer` is installed (server degrades to free-form rather than 500); MII server uses loopback-only CORS, max_tokens cap [1, 16384], stream rejection, generic 500 with no stack-trace leak; `os.execvp` auto-reexec uses list args (no shell), all forwarded flags pre-validated; `cleanup_extract_dir` uses `os.path.commonpath` (Windows-safe) instead of `startswith`; `_run_subprocess` catches `TimeoutExpired` and returns rc=124 (coreutils convention) instead of an unhandled traceback; new `eval_results` and `tensorrt` artifact kinds in `RegistryStore._VALID_KINDS`
+- **v0.53.2 — Modality II live trainers**: lifts four v0.52.0 deferred stubs (#137, #135, #133, #132) into real trainer wrappers while keeping the project's hardening invariants. (#137 reasoning_effort + train_on_eot) `apply_reasoning_effort_prefix` follows v0.41.0 / v0.51.0 validator policy (bool-first, null-byte / empty / oversize / case-insensitive normalisation); messages list is treated as immutable (returns a new list — matches v0.33.0 #47 `CrossDocCollator` policy). `build_assistant_only_labels(train_on_eot=True)` reuses the existing v0.36.0 mask infrastructure — same null-byte / max_length / bool guards. (#135 EBFT / GDPO) `apply_ebft_loss` and `apply_gdpo_loss` enforce **finite-only inputs** (`torch.isfinite` guard on tensor inputs + `math.isfinite` on scalar params) — NaN / Inf would silently corrupt training otherwise. `dpo_margin` defaults to `None` (not `0.0`) per security-review M3 fix: silent zeroing in the `margin` variant when the operator forgot to set the margin would have looked like training success but produced a meaningless gradient. Both attach hooks (`attach_ebft_compute_loss`, `attach_gdpo_compute_loss`) are **idempotent** via a marker attribute on the wrapped method — re-attach is a no-op and a dedicated test class verifies the invariant (code-review M2 fix). (#133 DistillTrainerWrapper) **Separate trust_remote_code resolution for student and teacher** (security-review L2 fix): `model_requires_trust_remote_code(teacher)` runs independently of the student probe, otherwise a malicious teacher could piggy-back on the student's opt-in. Teacher is loaded with `device_map="cpu" if device == "cpu" else "auto"`, frozen via `requires_grad_(False)` + `.eval()` immediately after load — never participates in gradient computation. `_DistillTrainer.compute_loss` device-bridge: `teacher_device = next(teacher_ref.parameters()).device`, `teacher_inputs.to(teacher_device)` before teacher forward, `teacher_logits.to(student_logits.device)` before KL kernel — defends against HF Trainer's auto-CUDA promotion silently producing cross-device `index_select` crashes. **DataCollator correctness fix** (surfaced during Wave 3 CPU smoke): `DataCollatorForLanguageModeling` does NOT pad pre-tokenised `labels` — switched to `DataCollatorForSeq2Seq(label_pad_token_id=-100, padding=True)` so variable-length loss-masked rows batch correctly without runtime crash. (#132 ClassifierTrainerWrapper) `_normalise_label` caps multi-label entries at **1024 per row** (matches v0.52.0 schema cap; security-review HIGH fix — unbounded would allow OOM via crafted JSONL), dedups via set conversion, validates `label_names` map entries reject null bytes + empty strings. `problem_type` is set explicitly from `tcfg.classifier_kind` (not silently inferred from labels) so a multi-label-shaped row in a single-label config raises rather than mis-trains. Training Setup Panel renders `Head: num_labels=N, kind=...` for classifier-family tasks instead of meaningless LoRA r/alpha lines (code-review L3 cosmetic fix — Panel no longer mis-represents what the wrapper is doing). (Cross-cutting) `commands/train.py` task routing branches added for `distill` and `classifier` / `reranker` / `cross_encoder` — source-grep regression guards in the test suite use the **full instantiation expression** `DistillTrainerWrapper(cfg, **trainer_kwargs)` so comment-only mentions of the class name cannot satisfy the regression check (TDD-review hardening). Both new factories (`build_distill_trainer`, `build_classifier_trainer`) reject unknown kwargs via Python signature contract — dedicated `pytest.raises(TypeError)` tests cover the path (TDD-review L1 fix). Test surface: 1 new test file (`test_v0532.py`) carrying 120 new tests across 14 classes. Known limitations: (1) `#71` TinyLlama-1.1B-LoRA full ONNX export is host-RAM-bound (≥16 GB free RAM needed for the `onnx.load(load_external_data=True)` post-process step); tiny-gpt2 smoke proves pipeline integrity — recorded in `tests/qa/v053_qa.md`. (2) Distillation supports same-tokenizer pairs only — cross-tokenizer (Llama → Qwen) needs a projection or sequence-level loss, out of scope. (3) Classifier wrapper has no LoRA path — full head + base training; LoRA classifier finetuning is a follow-up. (4) EBFT / GDPO auto-attach only fires when the corresponding `*_variant` field is set; manual `attach_*` invocation from custom training loops is supported and idempotent. (5) `reasoning_effort` injection happens at data-prep time inside `build_format_row`; changing the level between runs requires re-rendering the dataset. (v0.53.2)
+- **v0.53.1 — Quant Menu II + Export pipeline live**: lifts six v0.53.0 deferred stubs to live wiring while keeping the project's hardening invariants. New shared helper `soup_cli/utils/paths.enforce_under_cwd_and_no_symlink` consolidates the v0.33.0 #22 TOCTOU pattern (cwd containment via `os.path.realpath + os.path.commonpath` + `os.lstat + S_ISLNK` rejection) — used by `commands/merge.py`, `commands/export.py`, `utils/save_formats.py`, and `utils/gguf_quant.py` so the same boundary check fires at every CLI dispatch point. `merge_4bit` and `export_torchao` (`utils/save_formats.py`): cwd containment + symlink rejection on `merged_dir` / `model_dir` / `output_dir`; `load_quant_config` enforces `yaml.safe_load` only + 256 KB cap + extension allowlist (`.yaml`/`.yml`); **per-scheme closed kwarg allowlist** rejects dunder keys + unknown params before the splat into `torchao.<scheme>Config(**kwargs)` (security-review HIGH fix — `Int4WeightOnly` accepts `{group_size, inner_k_tiles}`, `NVFP4` accepts nothing extra). Corrected BNB-4bit skip-modules kwarg name from `llm_int8_skip_modules` to `bnb_4bit_skip_modules`. `export_advanced_gguf` (`utils/gguf_quant.py`): all three subprocess invocations (`convert_hf_to_gguf.py`, `llama-imatrix`, `llama-quantize`) use argv-list form with no shell, 30-min timeout, `sys.executable` for the convert script; `_run_convert_to_f16` realpath-verifies that `convert_hf_to_gguf.py` stays inside the `llama_cpp_dir` after resolution (security-review HIGH M5 fix — defends against a symlinked script escape). `_prepare_calibration_text` strips null bytes, collapses newlines to spaces, caps per-line at 8 KB + total at 50 MB (security-review M1), uses POSIX `O_NOFOLLOW` to refuse symlinks at the kernel level (security-review M3 — closes the TOCTOU window between the dispatch-time check and the actual `open()`); requires ≥ 1 usable row before invoking imatrix. `_safe_stderr` Rich-markup-escapes subprocess stderr before embedding in `RuntimeError` (security-review L4) so a crafted llama.cpp error cannot inject `[red]...[/]` into the operator-facing panel. UD-prefix stripped from flavour arg before passing to llama-quantize (`UD-Q4_K_XL` → `Q4_K_XL`). Calibration data path containment + symlink rejection fires at CLI dispatch in `commands/export.py::_export_gguf_advanced`. `detect_prequantized_format_from_path` (`autopilot/decisions.py`): cwd containment + `os.lstat + S_ISLNK` on `<model_dir>/config.json` (security-review HIGH H2 — out-of-cwd model paths silently return `None` to preserve soft-probe semantics so HF Hub repo IDs aren't rejected); null-byte rejection on `model_dir`. `commands/merge.py`: early `is_under_cwd(output)` check at CLI boundary (security-review M4) — consistent with the v0.20.0 / v0.40.2 containment-at-the-boundary policy. `deploy_measure.py`: cache file written atomically via `tempfile.mkstemp` + `os.replace` with `os.lstat + S_ISLNK` rejection on BOTH `load_cache` and `save_cache` (security-review M2 — was missing on the load side); env override `SOUP_DEPLOY_AUTOPILOT_CACHE` rejects null bytes + control chars before any path resolution and confines the override to home / cwd / tempdir; cache file gets best-effort 0o600 perms on POSIX (matches v0.26.0 registry.db policy); 1 MB cache-file cap. `_DEPLOY_MEASURE_BEFORE_GEN` / `_AFTER_FACTORY` module-level callables are documented as a non-public escape hatch (deferred until v0.46.1 live model-loader). Test surface: 4 new test files (`test_v0531_82.py` / `test_v0531_109.py` / `test_v0531_139.py` / `test_v0531_142.py`) carrying 112 new tests covering happy paths + failure modes + every security guard (POSIX symlink rejection, per-scheme kwarg allowlist, TOCTOU defences, `_MAX_CANDIDATES` cap, MINOR-verdict band, mxfp4 word boundary, BNB-alias detection, render-table markup escape). Known limitations: (1) `_DEPLOY_MEASURE_BEFORE_GEN` / `_AFTER_FACTORY` are a stop-gap until v0.46.1 ships first-party transformers / vLLM generator factories. (2) `#70` GGUF and `#72` AWQ/GPTQ manual QA smokes remain pending — require CUDA + llama.cpp build; recipes scripted in `tests/qa/v053_qa.md`. (3) BNB-4bit merge + TorchAO PTQ live happy-path is mock-covered only — CPU-only CI cannot execute the real BNB / torchao kernels. (4) `_prepare_calibration_text` accepts JSONL with `text` / `prompt` / `content` aliases + raw text fallback; other formats (parquet / markdown) are out of scope. (5) Cache key truncates `base_sha` to 16 hex chars at the call site (collision probability ≈ 1-in-2³² across ~4 billion entries). (6) Pre-quantized detection is heuristic — name regex + local `config.json` probe; HF Hub repo IDs without local download fall back to name-only matching. (7) `enforce_under_cwd_and_no_symlink` checks only the leaf path; deeper traversal relies on the per-file leaf check at each site. (v0.53.1)
 - **v0.53.0 — Quant Menu II (UD GGUFs + KV cache + NVFP4 + LF parity + save formats)**: 6 schema-only Parts; live wiring deferred to v0.53.1. Every new validator follows the project's established hardening policy: closed allowlists (`UD_GGUF_FORMATS`, `IQ_GGUF_FORMATS`, `APPLE_ARM_GGUF_FORMATS`, `KV_CACHE_TYPES`, `MERGE_SAVE_FORMATS`, `TORCHAO_PTQ_SCHEMES`) as `frozenset` so registries cannot be mutated; `_GGUF_METADATA` / `_KV_CACHE_METADATA` / `_MERGE_METADATA` / `_TORCHAO_METADATA` wrapped in `MappingProxyType`; `_LOWER_INDEX` for GGUF lookup is also `MappingProxyType`-wrapped (replaces O(N) walk with O(1) lookup — code-review MEDIUM fix). All string validators reject non-string / bool / empty / null-byte / oversize with case-insensitive normalisation (matches v0.41.0 `validate_optimizer_name` / v0.51.0 `validate_hub_name` policy); `validate_torchao_scheme` is INTENTIONALLY case-sensitive (PyTorch class names — `torchao.quantize_` looks them up by exact name) with the asymmetry documented at both validators (security-review LOW fix). `validate_calibration_data_path` + `validate_quant_config_path` are shape-only at this release; their docstrings name the exact controls a v0.53.1 CLI dispatch contributor MUST add (`os.path.realpath` + `os.path.commonpath` cwd containment, `os.lstat` + `stat.S_ISLNK` symlink rejection before `open()`, existence check, `yaml.safe_load`-only for quant configs) — closes the security-review MEDIUM "documentation gap at trust boundary" finding. SoupConfig cross-validators: `_validate_fp8_attention_compat` (requires `quantization_aware='fp8'` BEFORE the MLX gate so the more actionable error fires first — code-review MEDIUM fix); `_validate_nvfp4_compat` (non-MLX + `modality='text'`; Blackwell SM ≥ 12.0 runtime check fires at trainer construction); `_validate_unsloth_bnb_4bit_compat` (requires `backend='unsloth'` + `quantization='4bit'`); `_validate_bnb_4bit_double_quant` (requires `quantization='4bit'` — rejects `none`/`8bit`/Quant-Menu); `_validate_llm_int8_alias` (asserts `quantization='8bit'`, deliberately disjoint from v0.41.0 `load_in_8bit` aliasing); `_validate_quantize_ref_reward` (extended ref-task allowlist `{dpo, ipo, simpo, orpo, bco, kto, preference, grpo, ppo}` per code-review HIGH fix — first-cut omitted grpo + kto + ppo which all have reference policies); `_validate_kv_cache_type_supported` (only `fp8` gated to non-MLX in v0.53.0; q8_0/bf16/f16 pass-through documented at validator site so v0.53.1 contributor sees the gate immediately). `requires_hopper` reads from `_KV_CACHE_METADATA` spec — single source of truth so adding a Hopper-only type means flipping the spec field only (code-review MEDIUM fix). All 7 new bool fields share `_validate_v053_bool_fields` `field_validator(mode='before')` that rejects bool-as-int with explicit `TypeError("v0.53.0 flag must be bool")` and passes `None` through to Pydantic's `default=False` rather than silently coercing it (python-review MEDIUM fix — `fp8_attention: null` in YAML now surfaces as a "valid boolean" ValidationError instead of masquerading as `False`). Known limitations: (1) Every live wiring is deferred to v0.53.1 — `export_advanced_gguf`, `apply_kv_cache_type`, `apply_fp8_attention`, `apply_nvfp4`, `merge_4bit`, `export_torchao` all raise `NotImplementedError` with explicit `v0.53.1` markers. (2) `validate_calibration_data_path` + `validate_quant_config_path` are shape-only this release; CLI dispatch in v0.53.1 MUST add cwd-containment + TOCTOU symlink rejection. (3) `kv_cache_type` MLX permissive policy: only `fp8` is rejected, the other three pass-through; v0.53.1 may narrow further. (4) Hopper SM-capability check is runtime-only — schema accepts `kv_cache_type='fp8'` + `fp8_attention=true` without GPU probe. (5) NVFP4 + Blackwell (SM ≥ 12.0) check is runtime-only. (6) `bnb_4bit_use_double_quant` only gated against `quantization`, not against `quantization_aware` — the latter combination is already rejected by v0.28.0 Quant-Menu + QAT cross-validator. (7) `llm_int8` is an assertion not an aliaser — diverges from v0.41.0 `load_in_8bit` design on purpose. (v0.53.0)
 - **v0.52.0 — Modality II (TTS + Distillation + BitNet + EBFT-GDPO + MoE quant + reasoning_effort)**: 7 schema-only Parts; live trainer / loss / export wiring deferred to v0.52.1. Every new validator follows the project's established hardening policy: closed allowlist (`SUPPORTED_TTS_FAMILIES`, `CLASSIFIER_TASKS`, `DIVERGENCES`, `BITNET_QUANT_FORMATS`, `BITNET_EXPORT_FORMATS`, `EBFT_VARIANTS`, `GDPO_VARIANTS`, `MOE_EXPERT_QUANT_FORMATS`, `REASONING_EFFORT_LEVELS`, per-family `_FAMILY_EMOTIONS`) wrapped in `frozenset` / `MappingProxyType` so registries cannot be mutated at runtime; `validate_*` helpers reject non-string / bool / empty / null-byte / oversize / unknown inputs with case-insensitive normalisation (matches v0.41.0 `validate_optimizer_name` / v0.50.0 `grpo_variant` / v0.51.0 `hub` policy); float validators (`validate_distill_temperature`, `validate_ebft_temperature`) gate on `math.isfinite` to reject NaN AND `±inf` (matches v0.32.0 `save_lr_finder_report` policy). `field_validator(mode="before")` on `num_labels` (security-review HIGH fix) rejects `bool` before Pydantic's `ge=1` coercion silently treats `True` as `1`. Field validator on `reasoning_effort` routes through the shared `validate_reasoning_effort` helper so the schema and runtime validator agree on what's accepted (security-review MEDIUM fix). SoupConfig cross-validators: `_validate_tts_compat` (requires `task='tts'` + `modality='audio_out'` + non-MLX backend; per-family emotion allowlist via `_FAMILY_EMOTIONS`), `_validate_classifier_compat` (with lazy-import early-return — code-review HIGH fix — so SFT hot path doesn't pay import cost; requires `num_labels` on classifier tasks; rejects classifier-only fields outside the task family with named offenders), `_validate_distill_compat` (requires `teacher_model` when `task='distill'`; rejects distill-only fields outside the task), `_validate_bitnet_compat` (gates to non-MLX + text-modality + task ∈ {sft, pretrain, dpo}), `_validate_ebft_compat` + `_validate_gdpo_compat` (task-family gates), `_validate_moe_expert_quant_compat` (requires `moe_lora=true` to prevent silent no-op), `_validate_reasoning_effort_task_gate` (code-review HIGH fix — rejects `reasoning_effort` + `train_on_eot` outside the SFT-family task set with named offenders; mirrors v0.50.0 GRPO stability task-gate policy). Public `DIVERGENCES` frozenset is derived from `_DIVERGENCE_ALIASES` so adding a new alias updates both the accepted-input set and the error message in lockstep (review fix LOW). `validate_bitnet_export` enforces a closed-allowlist canonical form for `soup export --format <bitnet|tq1_0>`, both of which are CLI-registered with a yellow advisory panel + `Exit(0)` stub (no artifact written until v0.52.1 — the format flag is accepted so existing scripts pinned to v0.52.0 will not break). 6 new YAML recipes appended (5 TTS + Falcon-E BitNet) — every entry is exercised by `tests/test_v0520.py` for `load_config_from_string` round-trip + `_no_null_or_whitespace` model-id check (mirrors v0.51.0 review-fix LOW). Known limitations: (1) Every live trainer / loss / export path is deferred to v0.52.1 — `build_tts_trainer`, `build_classifier_trainer`, `build_distill_trainer`, `build_bitnet_trainer`, `export_bitnet_gguf`, `apply_ebft_loss`, `apply_gdpo_loss`, `apply_moe_expert_quant` all raise `NotImplementedError` with explicit `v0.52.1` markers; schema accepts every new task / quant / variant + the CLI stub for `soup export --format bitnet/tq1_0` prints a deferred-advisory panel and exits 0. (2) `modality='audio_out'` accepted on non-TTS tasks — design choice this release so future audio-output tasks (ASR / V2A) can reuse it; today's runtime trainer dispatch must check `task == 'tts'` to avoid silent routing into the deferred TTS path. (3) Oute emotion allowlist is a tight 6-entry subset (neutral / happy / sad / angry / calm / excited); operators wanting custom emotions will need a v0.52.1 patch to extend `OUTE_EMOTIONS`. (4) `is_bitnet_model` is best-effort heuristic over name prefixes (`bitnet`, `falcon-e`, `1bitllm`, `onebit`); a BitNet checkpoint published under an org without any of those prefixes returns False. This is detection, not gating — the trainer wrapper (v0.52.1) loads the model regardless of the heuristic. (5) `quantization='bitnet_1.58'` gated to task ∈ {sft, pretrain, dpo} — extending to GRPO / PPO / RewardModel requires upstream onebitllms RL kernels not yet shipped. (v0.52.0)
 - **v0.51.0 — Model Catalog Expansion + Alternative Model Hubs**: 5 release Parts. New `soup_cli/utils/hubs.py` ships closed allowlist `SUPPORTED_HUBS = frozenset({hf, modelscope, modelers})` + three `MappingProxyType`-wrapped registries (`_HUB_DEFAULT_ENDPOINTS` / `_HUB_ENDPOINT_ENV` / `_HUB_PACKAGE`) so the registry cannot be mutated at runtime (matches v0.36.0 `_REGISTRY` policy). `validate_hub_name` rejects non-string / bool / empty / null-byte / >32-char / unknown with case-insensitive normalisation (matches v0.41.0 `validate_optimizer_name` policy). `validate_hub_endpoint` is the SSRF kernel — full parity with v0.29.0 `utils/hf.resolve_endpoint`: scheme allowlist (`http`/`https` only), null-byte rejection, **control-character / CRLF rejection** added in v0.51.0 as a defence-in-depth review fix (defends against URL-as-HTTP-header injection if the URL ever flows into a raw HTTP client), `0.0.0.0` explicitly rejected, plain HTTP only for loopback `{localhost, 127.0.0.1, ::1}`, RFC1918 / link-local / cloud-metadata IPs (169.254.x) rejected via `ipaddress.ip_address` for plain HTTP. `resolve_endpoint(hub, *, env=None)` looks up the per-hub env var (`HF_ENDPOINT` / `MODELSCOPE_ENDPOINT` / `MODELERS_ENDPOINT`) and runs the override through `validate_hub_endpoint`; default endpoints are baked-in HTTPS URLs. `is_hf` rejects `bool` explicitly (review fix HIGH — bool is a subclass of int and would have silently fallen through `hub.lower() == "hf"` → `False`, which happens to be correct by accident but violates the contract; matches v0.30.0 `Candidate` / v0.34.0 `estimate_run_cost_usd` policy). `TrainingConfig.hub: Literal["hf","modelscope","modelers"]` field gets a `field_validator(mode="before")` `_normalize_hub` that delegates to `validate_hub_name` so `hub: HF` in YAML normalises to `"hf"` (review fix HIGH — first-cut had Pydantic Literal exact-match while `validate_hub_name` was case-insensitive, breaking the v0.41.0 `validate_optimizer_name` / v0.50.0 `grpo_variant` / `rollout_backend` policy of agreement between schema and shared validator). SoupConfig `_validate_hub_supported` cross-validator rejects `hub != 'hf'` on `backend == 'mlx'` with a distinct error message (review fix HIGH — `mlx-lm` only downloads from HF Hub; without this gate a `backend: mlx` + `hub: modelscope` config would silently pass schema load and fail at runtime with a confusing `mlx-lm` error). 26 new YAML recipes appended to `soup_cli/recipes/catalog.py` — every entry is exercised by `tests/test_v0510.py` via `load_config_from_string` round-trip + `yaml.safe_load` (no Python tags / no template injection / no credential leak in the YAML strings) + a `_no_null_or_whitespace` model-id check that rejects empty path components (review fix LOW — first-cut allowed `"/name"` leading-slash IDs to pass). Two non-`<N>B` `size` strings (`"image"` / `"ocr"` / `"moe"` / `"medium"`) were normalised to `"N/A"` (review fix MEDIUM — `search_recipes(size=…)` would silently miss those entries, and the autopilot VRAM estimator could not parse them). Known limitations: (1) Live downloader / uploader / push integration deferred to v0.51.1 — `TrainingConfig.hub` schema lock-in ships now (Literal accept + MLX cross-validator + case-normalisation), but `soup data download --hub modelscope` and `soup push --hub modelers` still route through the existing HF Hub code path; the actual `modelscope-sdk` / `openmind-hub` adapters are the v0.51.1 deliverable. Same stub-then-live pattern as v0.27.0 MII / v0.37.0 multipack / v0.50.0 GRPO Plus. (2) Speculative / aspirational `base` model IDs in some Part A/C recipes — the catalog ships entries for `openai/gpt-oss-{20,120}b`, `THUDM/glm-5`, `Qwen/Qwen-Image`, `deepseek-ai/DeepSeek-OCR`, `PaddlePaddle/PaddleOCR-VL`, `google/embeddinggemma-300m` so users have ready-made recipes the moment those repos go live (matches the plan's "match Unsloth's day-zero coverage" directive). Recipes for not-yet-published repos will surface a clear HF Hub 404 when the user runs `soup train --recipe <name>`. (3) DNS-resolved private hostnames not blocked — `validate_hub_endpoint` only rejects literal RFC1918 / link-local IP addresses; a hostname like `corp-proxy.internal` that DNS-resolves to a private IP is accepted at validation time (mirrors the v0.29.0 `HF_ENDPOINT` policy — DNS resolution is intentionally not performed in this local-tool threat model). (v0.51.0)

{soup_cli-0.53.0 → soup_cli-0.53.2}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "soup-cli"
-version = "0.53.0"
+version = "0.53.2"
 description = "Fine-tune LLMs in one command. No SSH, no config hell."
 readme = "README.md"
 license = "Apache-2.0"

{soup_cli-0.53.0 → soup_cli-0.53.2}/soup_cli/__init__.py RENAMED Viewed

@@ -1,3 +1,3 @@
 """Soup CLI — Fine-tune LLMs in one command."""
-__version__ = "0.53.0"
+__version__ = "0.53.2"

{soup_cli-0.53.0 → soup_cli-0.53.2}/soup_cli/autopilot/__init__.py RENAMED Viewed

@@ -17,6 +17,8 @@ from soup_cli.autopilot.decisions import (
     decide_performance_flags,
     decide_quantization,
     decide_task,
+    detect_prequantized_format,
+    detect_prequantized_format_from_path,
     parse_gpu_budget,
 )
 from soup_cli.autopilot.generate_config import build_soup_config, write_yaml
@@ -37,6 +39,8 @@ __all__ = [
     "decide_performance_flags",
     "decide_quantization",
     "decide_task",
+    "detect_prequantized_format",
+    "detect_prequantized_format_from_path",
     "parse_gpu_budget",
     "write_yaml",
 ]

soup-cli 0.53.0__tar.gz → 0.53.2__tar.gz

soup-cli 0.53.0tar.gz → 0.53.2tar.gz