PyPI - soup-cli - Versions diffs - 0.53.2__tar.gz → 0.53.3__tar.gz - Mend

soup-cli 0.53.2tar.gz → 0.53.3tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (522)

{soup_cli-0.53.2 → soup_cli-0.53.3}/CONTRIBUTING.md RENAMED

@@ -111,7 +111,7 @@ soup_cli/
   templates/          - 17 built-in soup.yaml templates (YAML + manifest.json) with load_template loader (v0.39.0, +bco v0.40.0)
   ui/                 - Web UI (FastAPI + HTML/JS SPA)
-tests/                - Test suite (185 files, 7842 tests)
+tests/                - Test suite (186 files, 7879 tests)
 examples/             - Real-world config examples and datasets
 ```

{soup_cli-0.53.2 → soup_cli-0.53.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: soup-cli
-Version: 0.53.2
+Version: 0.53.3
 Summary: Fine-tune LLMs in one command. No SSH, no config hell.
 Project-URL: Homepage, https://github.com/MakazhanAlpamys/Soup
 Project-URL: Repository, https://github.com/MakazhanAlpamys/Soup
@@ -134,14 +134,13 @@ soup train
 Latest highlights only. Full history: [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases).
-**v0.53.2 — Modality II live trainers**: Four v0.52.0 deferred stubs lifted into real, end-to-end-trainable wrappers — knowledge distillation, sequence classification, EBFT / GDPO loss kernels, and gpt-oss-style `reasoning_effort` system-prompt injection.
+**v0.53.3 — GRPO Plus partial wiring**: Two surgical GRPO Plus fixes from the v0.50.0 deferred-stub family land cleanly. The four larger items (stability callback, variant loss kernels, PRM trainer, multi-objective preference live combine) are scope-deferred to v0.53.4 — each warrants a focused release because they each require deep TRL trainer subclassing.
-- **`soup train` with `task: distill`.** New `DistillTrainerWrapper`: student + frozen teacher both load via `AutoModelForCausalLM` (separate `trust_remote_code` resolution for each), KL / forward_KL / reverse_KL / JS divergence kernels scaled by `temperature**2` per the Hinton paper. Device-bridge: teacher inputs auto-move to the teacher's device, teacher logits move back onto the student's device before the KL kernel — survives HF Trainer's auto-CUDA promotion on a CPU-tagged run. `DataCollatorForSeq2Seq(label_pad_token_id=-100)` handles variable-length pre-tokenised loss-masked rows correctly.
-- **`soup train` with `task: classifier | reranker | cross_encoder`.** New `ClassifierTrainerWrapper`: `AutoModelForSequenceClassification` with `num_labels` and `label_names`, auto-routes `single_label_classification` / `multi_label_classification` from `tcfg.classifier_kind`. Multi-label string labels resolved via the `label_names` map with a 1024-entry cap + dedup. Training Setup Panel renders `Head: num_labels=N, kind=...` instead of LoRA r/alpha for the classifier family.
-- **EBFT structured / strided + GDPO standard / length_normalized / margin loss kernels.** `apply_ebft_loss` and `apply_gdpo_loss` exit the v0.52.0 `NotImplementedError` stubs with finite-only-input guards and bool-rejected numeric params. `attach_ebft_compute_loss(trainer, tcfg)` (SFT) and `attach_gdpo_compute_loss(trainer, tcfg)` (DPO) wrap `Trainer.compute_loss` idempotently — re-attach is a no-op via a marker attribute on the wrapped method. Auto-attached when the corresponding `*_variant` field is set on `TrainingConfig`.
-- **gpt-oss `reasoning_effort` + `train_on_eot`.** `apply_reasoning_effort_prefix(messages, level)` injects `<|reasoning_effort|>{low,medium,high}<|/reasoning_effort|>` into the system turn (creates one if absent), returning a new list (caller's messages immutable). `build_assistant_only_labels(train_on_eot=True)` keeps the EOT/EOS token unmasked at the assistant-turn boundary so the model learns when to stop. Both gated to the SFT-family at config-load.
-- **+120 net new tests** (7722 → 7842) across `test_v0532.py`. Four review agents (python / code / security / tdd) ran; every CRITICAL / HIGH / MEDIUM / LOW finding fixed — separate `trust_remote_code` resolution for student vs teacher, idempotent attach hooks with regression tests, 1024-entry multi-label cap, `dpo_margin` defaults to `None` (not `0.0`) so missing values raise rather than silently zero, source-grep regression guards on the trainer-routing call sites use the full instantiation expression (no comment-only false-positives), Panel renders the classifier head instead of LoRA r/alpha.
-- **Local end-to-end CPU smoke** confirms both new wrappers train 2 steps with finite loss on `hf-internal-testing/tiny-random-gpt2`. Two real bugs surfaced and were fixed during the smoke (collator label padding + teacher / student device mismatch) — both have source-level regression guards in the test suite. ONNX export QA: pipeline integrity proven on tiny-gpt2; TinyLlama-1.1B full export is host-RAM-bound (documented in `tests/qa/v053_qa.md`).
+- **`grpo_fp16: true` is now wired** through `GRPOTrainerWrapper._build_precision_kwargs` — non-CUDA devices (CPU / MPS / XPU) skip HF Trainer's CUDA-specific `fp16`/`bf16` kwargs, CUDA + `grpo_fp16=True` forces FP16 for unsloth parity, default CUDA stays on bf16. The schema now rejects the silent-mutex combo `grpo_fp16=True + auto_mixed_precision=true` at config load with an actionable message naming both flags.
+- **Vision GRPO base-model probe.** `KNOWN_VLM_REGEX` matches 10 VLM families (Qwen2-VL / Qwen2.5-VL / QVQ / Pixtral / InternVL / Llama-3.2-Vision / LLaVA / MiniCPM-V / Idefics / ShareGPT4V / Fuyu) with word-boundary anchors that reject substring noise like `"my-pixtralish"`. A YAML pairing `vision_grpo: true` with a non-VLM checkpoint is now rejected at schema load with a friendly message naming the expected families — instead of surfacing as a cryptic `"module has no attribute 'vision_tower'"` at trainer load time. Error message truncates the echoed base to 64 chars per the v0.34.0 redaction policy.
+- **Validator ordering matters.** `_validate_grpo_fp16_amp_exclusive` short-circuits when `task != 'grpo'` so the v0.50.0 stability task-gate diagnosis ("`grpo_fp16` requires `task=grpo`") fires first regardless of validator execution order — keeps the most actionable error in front of the user.
+- **+37 net new tests** (7842 → 7879) across `test_v0533.py`. Four review agents (python / code / security / tdd) ran; every HIGH / MEDIUM / LOW finding fixed — task-gate priority short-circuit, MPS device branch explicitly documented, 64-char error-message truncation, `Optional[object]` → `object` parameter cleanup, QVQ regex coverage, 512-char exact-boundary regression test, explicit `grpo_fp16: false` produces bf16 path.
+- **Local YAML smoke** confirms both fixes round-trip via `load_config_from_string`: known VLM bases pass, non-VLM bases plus `vision_grpo: true` are rejected, and the `grpo_fp16` + `auto_mixed_precision` combo is rejected with both flag names in the message.
 ## Why Soup?
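
The `(device, grpo_fp16)` matrix described in the `grpo_fp16` highlight above can be sketched as a small pure function. This is a minimal illustration and not the project's `GRPOTrainerWrapper._build_precision_kwargs`: the standalone name `build_precision_kwargs`, the string-typed `device` argument, and the `startswith("cuda")` check are assumptions made for the example.

```python
from __future__ import annotations


def build_precision_kwargs(device: str, grpo_fp16: bool) -> dict[str, bool]:
    """Return the fp16/bf16 flags for a given device and grpo_fp16 setting.

    Hypothetical stand-in for the wrapper method named in the release notes.
    """
    # Non-CUDA devices (CPU / MPS / XPU): leave both flags off.
    if not device.startswith("cuda"):
        return {"fp16": False, "bf16": False}
    # CUDA with grpo_fp16 enabled: force FP16.
    if grpo_fp16:
        return {"fp16": True, "bf16": False}
    # Default CUDA path: stay on bf16.
    return {"fp16": False, "bf16": True}


for dev, flag in [("cpu", True), ("mps", False), ("cuda", True), ("cuda", False)]:
    print(dev, flag, build_precision_kwargs(dev, flag))
```

Keeping the decision in one function makes each matrix cell directly testable without loading a trainer.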

{soup_cli-0.53.2 → soup_cli-0.53.3}/README.md RENAMED

@@ -43,14 +43,13 @@ soup train
 Latest highlights only. Full history: [GitHub Releases](https://github.com/MakazhanAlpamys/Soup/releases).
-**v0.53.2 — Modality II live trainers**: Four v0.52.0 deferred stubs lifted into real, end-to-end-trainable wrappers — knowledge distillation, sequence classification, EBFT / GDPO loss kernels, and gpt-oss-style `reasoning_effort` system-prompt injection.
-- **`soup train` with `task: distill`.** New `DistillTrainerWrapper`: student + frozen teacher both load via `AutoModelForCausalLM` (separate `trust_remote_code` resolution for each), KL / forward_KL / reverse_KL / JS divergence kernels scaled by `temperature**2` per the Hinton paper. Device-bridge: teacher inputs auto-move to the teacher's device, teacher logits move back onto the student's device before the KL kernel — survives HF Trainer's auto-CUDA promotion on a CPU-tagged run. `DataCollatorForSeq2Seq(label_pad_token_id=-100)` handles variable-length pre-tokenised loss-masked rows correctly.
-- **`soup train` with `task: classifier | reranker | cross_encoder`.** New `ClassifierTrainerWrapper`: `AutoModelForSequenceClassification` with `num_labels` and `label_names`, auto-routes `single_label_classification` / `multi_label_classification` from `tcfg.classifier_kind`. Multi-label string labels resolved via the `label_names` map with a 1024-entry cap + dedup. Training Setup Panel renders `Head: num_labels=N, kind=...` instead of LoRA r/alpha for the classifier family.
-- **EBFT structured / strided + GDPO standard / length_normalized / margin loss kernels.** `apply_ebft_loss` and `apply_gdpo_loss` exit the v0.52.0 `NotImplementedError` stubs with finite-only-input guards and bool-rejected numeric params. `attach_ebft_compute_loss(trainer, tcfg)` (SFT) and `attach_gdpo_compute_loss(trainer, tcfg)` (DPO) wrap `Trainer.compute_loss` idempotently — re-attach is a no-op via a marker attribute on the wrapped method. Auto-attached when the corresponding `*_variant` field is set on `TrainingConfig`.
-- **gpt-oss `reasoning_effort` + `train_on_eot`.** `apply_reasoning_effort_prefix(messages, level)` injects `<|reasoning_effort|>{low,medium,high}<|/reasoning_effort|>` into the system turn (creates one if absent), returning a new list (caller's messages immutable). `build_assistant_only_labels(train_on_eot=True)` keeps the EOT/EOS token unmasked at the assistant-turn boundary so the model learns when to stop. Both gated to the SFT-family at config-load.
-- **+120 net new tests** (7722 → 7842) across `test_v0532.py`. Four review agents (python / code / security / tdd) ran; every CRITICAL / HIGH / MEDIUM / LOW finding fixed — separate `trust_remote_code` resolution for student vs teacher, idempotent attach hooks with regression tests, 1024-entry multi-label cap, `dpo_margin` defaults to `None` (not `0.0`) so missing values raise rather than silently zero, source-grep regression guards on the trainer-routing call sites use the full instantiation expression (no comment-only false-positives), Panel renders the classifier head instead of LoRA r/alpha.
-- **Local end-to-end CPU smoke** confirms both new wrappers train 2 steps with finite loss on `hf-internal-testing/tiny-random-gpt2`. Two real bugs surfaced and were fixed during the smoke (collator label padding + teacher / student device mismatch) — both have source-level regression guards in the test suite. ONNX export QA: pipeline integrity proven on tiny-gpt2; TinyLlama-1.1B full export is host-RAM-bound (documented in `tests/qa/v053_qa.md`).
+**v0.53.3 — GRPO Plus partial wiring**: Two surgical GRPO Plus fixes from the v0.50.0 deferred-stub family land cleanly. The four larger items (stability callback, variant loss kernels, PRM trainer, multi-objective preference live combine) are scope-deferred to v0.53.4 — each warrants a focused release because they each require deep TRL trainer subclassing.
+- **`grpo_fp16: true` is now wired** through `GRPOTrainerWrapper._build_precision_kwargs` — non-CUDA devices (CPU / MPS / XPU) skip HF Trainer's CUDA-specific `fp16`/`bf16` kwargs, CUDA + `grpo_fp16=True` forces FP16 for unsloth parity, default CUDA stays on bf16. The schema now rejects the silent-mutex combo `grpo_fp16=True + auto_mixed_precision=true` at config load with an actionable message naming both flags.
+- **Vision GRPO base-model probe.** `KNOWN_VLM_REGEX` matches 10 VLM families (Qwen2-VL / Qwen2.5-VL / QVQ / Pixtral / InternVL / Llama-3.2-Vision / LLaVA / MiniCPM-V / Idefics / ShareGPT4V / Fuyu) with word-boundary anchors that reject substring noise like `"my-pixtralish"`. A YAML pairing `vision_grpo: true` with a non-VLM checkpoint is now rejected at schema load with a friendly message naming the expected families — instead of surfacing as a cryptic `"module has no attribute 'vision_tower'"` at trainer load time. Error message truncates the echoed base to 64 chars per the v0.34.0 redaction policy.
+- **Validator ordering matters.** `_validate_grpo_fp16_amp_exclusive` short-circuits when `task != 'grpo'` so the v0.50.0 stability task-gate diagnosis ("`grpo_fp16` requires `task=grpo`") fires first regardless of validator execution order — keeps the most actionable error in front of the user.
+- **+37 net new tests** (7842 → 7879) across `test_v0533.py`. Four review agents (python / code / security / tdd) ran; every HIGH / MEDIUM / LOW finding fixed — task-gate priority short-circuit, MPS device branch explicitly documented, 64-char error-message truncation, `Optional[object]` → `object` parameter cleanup, QVQ regex coverage, 512-char exact-boundary regression test, explicit `grpo_fp16: false` produces bf16 path.
+- **Local YAML smoke** confirms both fixes round-trip via `load_config_from_string`: known VLM bases pass, non-VLM bases plus `vision_grpo: true` are rejected, and the `grpo_fp16` + `auto_mixed_precision` combo is rejected with both flag names in the message.
 ## Why Soup?
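
The name-based VLM probe described in the highlights above can be approximated as follows. This is a sketch under stated assumptions: the token list is an illustrative three-family subset (the real `KNOWN_VLM_REGEX` alternatives are not shown in this diff), `check_vision_base` is a hypothetical simplified stand-in for `validate_vision_grpo_compat`, and the error wording is invented for the example. Only the word-boundary idiom, the 512-character cap, and the 64-character echo truncation come from the release notes.

```python
import re
from typing import Optional

_MAX_BASE_NAME_LEN = 512  # length cap named in the release notes
_MAX_ECHO_LEN = 64        # echoed-base truncation named in the release notes

# Illustrative subset only; not the project's full family list.
_FAMILY_TOKENS = ("qwen2-vl", "pixtral", "llava")

# Word-boundary idiom: the token must be delimited by a non-alphanumeric
# character (or the string edge) on both sides.
_VLM_RE = re.compile(
    r"(?:^|[^a-z0-9])(?:"
    + "|".join(re.escape(tok) for tok in _FAMILY_TOKENS)
    + r")(?:[^a-z0-9]|$)"
)


def is_known_vlm_base(name: object) -> bool:
    """Defensive probe: returns False instead of raising on bad input."""
    if not isinstance(name, str):  # also rejects bool, int, None
        return False
    if not name or "\x00" in name or len(name) > _MAX_BASE_NAME_LEN:
        return False
    return _VLM_RE.search(name.lower()) is not None


def check_vision_base(base: Optional[str]) -> None:
    """Raise ValueError for a non-empty base that does not look like a VLM."""
    if not base:  # None / empty string skips the probe
        return
    if not is_known_vlm_base(base):
        echoed = str(base)[:_MAX_ECHO_LEN]  # never echo more than 64 chars
        raise ValueError(
            "vision_grpo: true expects a known VLM base "
            f"(e.g. Qwen2-VL, Pixtral, LLaVA); got {echoed!r}"
        )


print(is_known_vlm_base("mistralai/Pixtral-12B-2409"))
print(is_known_vlm_base("my-pixtralish"))
```

In this sketch `"my-pixtralish"` fails the probe because the character after `pixtral` is alphanumeric, so the trailing boundary group cannot match.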

{soup_cli-0.53.2 → soup_cli-0.53.3}/SECURITY.md RENAMED

@@ -9,7 +9,8 @@ We provide security updates for the following versions:
 - **Versions older than 3 minor versions:** No support
 Example:
-- v0.53.2 -- Full support (latest)
+- v0.53.3 -- Full support (latest)
+- v0.53.2 -- Full support
 - v0.53.1 -- Full support
 - v0.53.0 -- Full support
 - v0.52.0 -- Full support
@@ -148,6 +149,8 @@ No known critical vulnerabilities in current releases.
 - **v0.32.0 — Training Stability & Auto-Tuning**: `--find-lr-output` containment via shared `utils/paths.is_under_cwd` (prevents writes outside cwd); `save_lr_finder_report` rejects NaN / Infinity floats in `lrs` / `losses` and serialises with `allow_nan=False` (keeps the report parser-safe); `compute_lr_schedule` rejects non-positive `start_lr`, inverted ranges, and `num_steps` outside `[2, 10_000]`; `pick_mixed_precision` rejects empty / null-byte / >200-char model names and resolves multi-version quirks (`qwen2.5` vs `qwen2`, `phi-3.5` vs `phi-3`) by longest-substring-first iteration so an added family can never accidentally make a more-specific entry dead code; `compute_warmup_steps` clamps to `[10, 1000]` with a `ratio==0.0` short-circuit matching HF Trainer's "no warmup" convention; `SpikeRecoveryStrategy` is `@dataclass(frozen=True)` (post-construction mutation cannot bypass validation), `max_attempts ∈ [1, 10]`, `lr_decay ∈ (0, 1)`, `min_lr > 0`; cross-validator `_validate_spike_recovery_requires_watchdog` rejects `loss_spike_recovery=true, loss_watchdog=false` at config-load (fails fast instead of never triggering); `convergence_window ∈ [5, 10_000]`, `convergence_rel_tol ∈ (0, 1]`, `recommend_action` reuses `detect_plateau` so plateau heuristic stays single-source-of-truth; `GradAccumMonitor.recommend()` caps doubled `accum` at `MAX_ACCUM=1024` so a runaway advisory loop cannot blow up DataLoader prefetch; `generate_config` validates BOTH the YAML output path AND the embedded `decisions["output"]` field via `is_under_cwd` (closes the gap where a crafted `decisions["output"]="../../etc"` would have silently propagated into the rendered YAML)
 - **v0.34.0 — Observability & Dev UX**: `.crash` bundle generator (`utils/crash.py`) recursively redacts `hf_*` / `sk-*` / `Bearer …` token-shaped strings in any captured `config` and metric tail before serialisation, so a `.crash` file shared on a public GitHub issue cannot leak credentials; `output_dir` is reduced to `os.path.basename` so `$HOME` doesn't leak; `write_crash_bundle` uses `os.path.realpath + commonpath` for cwd containment (Windows-safe; raises `ValueError` not `PermissionError` so callers cannot silently swallow with `except OSError`); filename appends `secrets.token_hex(4)` so two crashes in the same UTC second don't collide; bundle truncated to `MAX_BUNDLE_BYTES=1_000_000`. `train.py` crash-write surfaces failures to the user (no silent missing-bundle). `profiling.py` `resolve_trace_path` rejects empty / `.` / `..` / `/` / `\\` / null-byte `run_id` (closes the `output_dir/profiles/../trace.json` escape) and uses `os.path.realpath + is_under_cwd`; profiles dir is created only on successful torch import (no stale empty dirs on torch-less CI). `tracker.get_run` LIKE-prefix match escapes `%` / `_` / `\\` and uses `ESCAPE '\\'` so a crafted `run_id` cannot widen the match (mirrors v0.26.0 registry policy). Lazy schema migration (`_ensure_schema`) tolerates the "duplicate column" race when two CLI processes start simultaneously on a fresh DB (fork-based multi-GPU training, TUI auto-refresh). `runs.py show/replay/clean` switched user `run_id` rendering to `markup_escape` and switched `clean` containment from broken `Path.resolve() + relative_to()` to project-standard `os.path.realpath + is_under_cwd`. `tui_app.py` lazy-imports `ExperimentTracker` and `markup_escape`s every DB-sourced string before passing into Textual widgets so a crafted base_model / experiment_name cannot inject `[bold red]…[/]` markup. 
`run_cost.estimate_run_cost_usd` rejects `bool` in `num_gpus` (bool is a subclass of int — same defence as v0.30.0 `Candidate.__post_init__`); duration clamped to `[0, 1 year]`; unknown GPU returns `None` so callers render `—` instead of fabricating `$0.00`. `log_level.parse_log_level` rejects non-string + null-byte input.
 - **v0.33.0 — Live Wire**: RLVR `code_exec_reward` adds OS-level isolation (Linux best-effort `os.unshare(CLONE_NEWUSER|CLONE_NEWNET|CLONE_NEWPID)`, macOS `sandbox-exec` with default-deny `MACOS_SANDBOX_PROFILE` narrowed to a 3-name `mach-lookup` allowlist to prevent DNS / NSURLSession bypass of `(deny network*)`); `prune_checkpoints` switches to TOCTOU-safe `os.lstat + S_ISLNK` + `shutil.rmtree(onerror=_abort_on_symlink)` so a symlink encountered mid-walk aborts rather than escapes; `run_gate` wraps each task scorer in a typed `try/except` so backend failures produce `score=None, error=str(exc)` (never silent `score=1.0`); `_parse_judge_url` removes the bare `http://` catch-all (defence-in-depth after the Pydantic GateTask validator); `soup can run` requires `--yes` or explicit consent callback and raises `ValueError` (not `PermissionError`, which is an `OSError` subclass that broad `except` blocks would swallow); GGUF `rglob` result for ollama deploy is `realpath+commonpath` checked against extract_dir (prevents symlink escape from a crafted can); `DeployTarget.path` validator normalises mixed `\\`/`/` separators before splitting (closes a Windows `..` bypass); `CAN_FORMAT_VERSION` 1→2 (additive — v1 still loads); `soup can publish` validates `repo_id` via `utils/hf.validate_repo_id`, resolves token via `resolve_token`, sanitises commit messages (first-line, 200-char cap), uses HTTPS-only HfApi; `_write_spike_recovery_hint` adds `is_under_cwd` containment check on `args.output_dir` from raw HF `TrainingArguments`; `lookup_entry_by_output_dir` emits `ResourceWarning` when 1000-row scan limit is hit (no silent miss); `CrossDocCollator` no longer mutates input feature dicts (HF Dataset rows are cached and reused — mutation broke subsequent batches); `Candidate` rejects `bool` in `score`/`latency_ms` (was sneaking past `int` isinstance check); `evaluate_candidate` latency mean now divides by *completed* prompts (excludes crashed) so a broken candidate isn't 
artificially fast; `auto_quant.run_auto_quant_picker` soft-falls-back to highest-scored candidate when no candidate clears `min_score` (server still binds); `build_logits_processors` returns `[]` when neither `outlines` nor `lm-format-enforcer` is installed (server degrades to free-form rather than 500); MII server uses loopback-only CORS, max_tokens cap [1, 16384], stream rejection, generic 500 with no stack-trace leak; `os.execvp` auto-reexec uses list args (no shell), all forwarded flags pre-validated; `cleanup_extract_dir` uses `os.path.commonpath` (Windows-safe) instead of `startswith`; `_run_subprocess` catches `TimeoutExpired` and returns rc=124 (coreutils convention) instead of an unhandled traceback; new `eval_results` and `tensorrt` artifact kinds in `RegistryStore._VALID_KINDS`
+- **v0.53.3 — GRPO Plus partial wiring (#128 grpo_fp16, #129 vision-VLM probe)**: lifts two surgical v0.50.0 GRPO Plus deferred stubs while keeping the project's hardening invariants; the four larger items (#127 stability callback, #123 6 GRPO variant loss kernels, #126 PRMTrainerWrapper, #68 multi-objective preference live combine) are scope-deferred to v0.53.4. (#128 grpo_fp16 routing) New `_validate_grpo_fp16_amp_exclusive` SoupConfig cross-validator rejects the silent-mutex combo `grpo_fp16=True + auto_mixed_precision=True` at config load — both flags pick the mixed-precision dtype via different codepaths; combining them is a footgun where downstream behaviour depends on validator execution order. Cross-validator short-circuits when `task != 'grpo'` so the v0.50.0 stability task-gate diagnosis fires first (keeps the most actionable error at the front; code-review HIGH fix). New `GRPOTrainerWrapper._build_precision_kwargs(self) -> dict[str, bool]` returns the `{fp16, bf16}` HF kwargs per `(device, grpo_fp16)` matrix: non-CUDA (CPU / MPS / XPU) → both False (HF Trainer's fp16/bf16 kwargs are CUDA-specific, MPS / XPU use their own mixed-precision paths), CUDA + `grpo_fp16=True` → `fp16=True, bf16=False` (unsloth parity), default CUDA → `fp16=False, bf16=True` (legacy v0.50.0 path). Direct attribute access on `self.config.training.grpo_fp16` (no `getattr` fallback — Pydantic-guaranteed field). (#129 vision-GRPO base probe) New `soup_cli/utils/prm.KNOWN_VLM_REGEX` compiled regex with 10 word-boundary alternatives covering Qwen2-VL / Qwen2.5-VL / QVQ / Pixtral / InternVL / InternVL2_5 / InternVL3 / Llama-3.2-Vision (any size via `[a-z0-9._-]*vision` glob) / LLaVA / MiniCPM-V / Idefics / ShareGPT4V / Fuyu. Word-boundary idiom `(?:^|[^a-z0-9])…(?:[^a-z0-9]|$)` mirrors v0.39.0 `is_gemma4_model` / v0.44.0 `is_llama4_model` / v0.49.0 `is_llama_model` policy — rejects substring noise like `"my-pixtralish"`. 
New `is_known_vlm_base(name: object) -> bool` is defensive — returns False (never raises) on non-string / bool / empty / null-byte / `>_MAX_BASE_NAME_LEN=512`. Extended `validate_vision_grpo_compat` with optional `base: str | None = None` kwarg — `None` / empty-string skips the probe (back-compat for legacy v0.50.0 Part E callers); non-empty-non-VLM raises `ValueError` with friendly message naming the expected families (Qwen2-VL / Pixtral / InternVL / Llama-3.2-Vision / LLaVA / MiniCPM-V). Error message **truncates the echoed `base` to 64 chars** before serialisation (security-review MEDIUM fix mirroring v0.34.0 `crash.py` `output_dir` basename policy — defends against adversarial / long bases bloating error logs and from leaking unredacted user input into operator-facing tracebacks). `_validate_vision_grpo` in SoupConfig threads `base=self.base` so a YAML pairing `vision_grpo: true` with a non-VLM checkpoint is rejected at schema-load instead of surfacing as a cryptic `"module has no attribute 'vision_tower'"` runtime error. Test surface: 1 new test file (`test_v0533.py`) carrying 37 new tests covering: every `_build_precision_kwargs` matrix cell (CUDA + grpo_fp16 / default CUDA / CPU / MPS), every cross-validator branch (mutex rejection / task-gate priority / both-off pass), every regex alternative (Qwen2-VL / Pixtral / QVQ / Llama-3.2-Vision variants / negative matches), every defensive guard (bool / non-string / null-byte / 512-byte boundary), error-message truncation (security-review M regression), and end-to-end YAML load (happy + reject). Known limitations: (1) Scope-deferred — 4 larger v0.53.3 items moved to v0.53.4 because each requires deep TRL subclassing and warrants its own focused release; the v0.40.x stub-then-live cadence shipped 5 patch releases over 6 weeks, mirroring that here. (2) VLM allowlist is static name-regex only; a legitimate VLM published under an org whose checkpoint name lacks any of those tokens (e.g. 
a custom internal fork) is rejected at schema-load and operators must omit `vision_grpo: true` until a future release adds a runtime `model.config.vision_config` probe. (3) `_build_precision_kwargs` is GRPO-only — other RL trainers (PPO / RewardModel) follow their existing mixed-precision conventions. (v0.53.3)
 - **v0.53.2 — Modality II live trainers**: lifts four v0.52.0 deferred stubs (#137, #135, #133, #132) into real trainer wrappers while keeping the project's hardening invariants. (#137 reasoning_effort + train_on_eot) `apply_reasoning_effort_prefix` follows v0.41.0 / v0.51.0 validator policy (bool-first, null-byte / empty / oversize / case-insensitive normalisation); messages list is treated as immutable (returns a new list — matches v0.33.0 #47 `CrossDocCollator` policy). `build_assistant_only_labels(train_on_eot=True)` reuses the existing v0.36.0 mask infrastructure — same null-byte / max_length / bool guards. (#135 EBFT / GDPO) `apply_ebft_loss` and `apply_gdpo_loss` enforce **finite-only inputs** (`torch.isfinite` guard on tensor inputs + `math.isfinite` on scalar params) — NaN / Inf would silently corrupt training otherwise. `dpo_margin` defaults to `None` (not `0.0`) per security-review M3 fix: silent zeroing in the `margin` variant when the operator forgot to set the margin would have looked like training success but produced a meaningless gradient. Both attach hooks (`attach_ebft_compute_loss`, `attach_gdpo_compute_loss`) are **idempotent** via a marker attribute on the wrapped method — re-attach is a no-op and a dedicated test class verifies the invariant (code-review M2 fix). (#133 DistillTrainerWrapper) **Separate trust_remote_code resolution for student and teacher** (security-review L2 fix): `model_requires_trust_remote_code(teacher)` runs independently of the student probe, otherwise a malicious teacher could piggy-back on the student's opt-in. Teacher is loaded with `device_map="cpu" if device == "cpu" else "auto"`, frozen via `requires_grad_(False)` + `.eval()` immediately after load — never participates in gradient computation. 
`_DistillTrainer.compute_loss` device-bridge: `teacher_device = next(teacher_ref.parameters()).device`, `teacher_inputs.to(teacher_device)` before teacher forward, `teacher_logits.to(student_logits.device)` before KL kernel — defends against HF Trainer's auto-CUDA promotion silently producing cross-device `index_select` crashes. **DataCollator correctness fix** (surfaced during Wave 3 CPU smoke): `DataCollatorForLanguageModeling` does NOT pad pre-tokenised `labels` — switched to `DataCollatorForSeq2Seq(label_pad_token_id=-100, padding=True)` so variable-length loss-masked rows batch correctly without runtime crash. (#132 ClassifierTrainerWrapper) `_normalise_label` caps multi-label entries at **1024 per row** (matches v0.52.0 schema cap; security-review HIGH fix — unbounded would allow OOM via crafted JSONL), dedups via set conversion, validates `label_names` map entries reject null bytes + empty strings. `problem_type` is set explicitly from `tcfg.classifier_kind` (not silently inferred from labels) so a multi-label-shaped row in a single-label config raises rather than mis-trains. Training Setup Panel renders `Head: num_labels=N, kind=...` for classifier-family tasks instead of meaningless LoRA r/alpha lines (code-review L3 cosmetic fix — Panel no longer mis-represents what the wrapper is doing). (Cross-cutting) `commands/train.py` task routing branches added for `distill` and `classifier` / `reranker` / `cross_encoder` — source-grep regression guards in the test suite use the **full instantiation expression** `DistillTrainerWrapper(cfg, **trainer_kwargs)` so comment-only mentions of the class name cannot satisfy the regression check (TDD-review hardening). Both new factories (`build_distill_trainer`, `build_classifier_trainer`) reject unknown kwargs via Python signature contract — dedicated `pytest.raises(TypeError)` tests cover the path (TDD-review L1 fix). Test surface: 1 new test file (`test_v0532.py`) carrying 120 new tests across 14 classes. 
Known limitations: (1) `#71` TinyLlama-1.1B-LoRA full ONNX export is host-RAM-bound (≥16 GB free RAM needed for the `onnx.load(load_external_data=True)` post-process step); tiny-gpt2 smoke proves pipeline integrity — recorded in `tests/qa/v053_qa.md`. (2) Distillation supports same-tokenizer pairs only — cross-tokenizer (Llama → Qwen) needs a projection or sequence-level loss, out of scope. (3) Classifier wrapper has no LoRA path — full head + base training; LoRA classifier finetuning is a follow-up. (4) EBFT / GDPO auto-attach only fires when the corresponding `*_variant` field is set; manual `attach_*` invocation from custom training loops is supported and idempotent. (5) `reasoning_effort` injection happens at data-prep time inside `build_format_row`; changing the level between runs requires re-rendering the dataset. (v0.53.2)
 - **v0.53.1 — Quant Menu II + Export pipeline live**: lifts six v0.53.0 deferred stubs to live wiring while keeping the project's hardening invariants. New shared helper `soup_cli/utils/paths.enforce_under_cwd_and_no_symlink` consolidates the v0.33.0 #22 TOCTOU pattern (cwd containment via `os.path.realpath + os.path.commonpath` + `os.lstat + S_ISLNK` rejection) — used by `commands/merge.py`, `commands/export.py`, `utils/save_formats.py`, and `utils/gguf_quant.py` so the same boundary check fires at every CLI dispatch point. `merge_4bit` and `export_torchao` (`utils/save_formats.py`): cwd containment + symlink rejection on `merged_dir` / `model_dir` / `output_dir`; `load_quant_config` enforces `yaml.safe_load` only + 256 KB cap + extension allowlist (`.yaml`/`.yml`); **per-scheme closed kwarg allowlist** rejects dunder keys + unknown params before the splat into `torchao.<scheme>Config(**kwargs)` (security-review HIGH fix — `Int4WeightOnly` accepts `{group_size, inner_k_tiles}`, `NVFP4` accepts nothing extra). Corrected BNB-4bit skip-modules kwarg name from `llm_int8_skip_modules` to `bnb_4bit_skip_modules`. `export_advanced_gguf` (`utils/gguf_quant.py`): all three subprocess invocations (`convert_hf_to_gguf.py`, `llama-imatrix`, `llama-quantize`) use argv-list form with no shell, 30-min timeout, `sys.executable` for the convert script; `_run_convert_to_f16` realpath-verifies that `convert_hf_to_gguf.py` stays inside the `llama_cpp_dir` after resolution (security-review HIGH M5 fix — defends against a symlinked script escape). `_prepare_calibration_text` strips null bytes, collapses newlines to spaces, caps per-line at 8 KB + total at 50 MB (security-review M1), uses POSIX `O_NOFOLLOW` to refuse symlinks at the kernel level (security-review M3 — closes the TOCTOU window between the dispatch-time check and the actual `open()`); requires ≥ 1 usable row before invoking imatrix. 
`_safe_stderr` Rich-markup-escapes subprocess stderr before embedding in `RuntimeError` (security-review L4) so a crafted llama.cpp error cannot inject `[red]...[/]` into the operator-facing panel. UD-prefix stripped from flavour arg before passing to llama-quantize (`UD-Q4_K_XL` → `Q4_K_XL`). Calibration data path containment + symlink rejection fires at CLI dispatch in `commands/export.py::_export_gguf_advanced`. `detect_prequantized_format_from_path` (`autopilot/decisions.py`): cwd containment + `os.lstat + S_ISLNK` on `<model_dir>/config.json` (security-review HIGH H2 — out-of-cwd model paths silently return `None` to preserve soft-probe semantics so HF Hub repo IDs aren't rejected); null-byte rejection on `model_dir`. `commands/merge.py`: early `is_under_cwd(output)` check at CLI boundary (security-review M4) — consistent with the v0.20.0 / v0.40.2 containment-at-the-boundary policy. `deploy_measure.py`: cache file written atomically via `tempfile.mkstemp` + `os.replace` with `os.lstat + S_ISLNK` rejection on BOTH `load_cache` and `save_cache` (security-review M2 — was missing on the load side); env override `SOUP_DEPLOY_AUTOPILOT_CACHE` rejects null bytes + control chars before any path resolution and confines the override to home / cwd / tempdir; cache file gets best-effort 0o600 perms on POSIX (matches v0.26.0 registry.db policy); 1 MB cache-file cap. `_DEPLOY_MEASURE_BEFORE_GEN` / `_AFTER_FACTORY` module-level callables are documented as a non-public escape hatch (deferred until v0.46.1 live model-loader). Test surface: 4 new test files (`test_v0531_82.py` / `test_v0531_109.py` / `test_v0531_139.py` / `test_v0531_142.py`) carrying 112 new tests covering happy paths + failure modes + every security guard (POSIX symlink rejection, per-scheme kwarg allowlist, TOCTOU defences, `_MAX_CANDIDATES` cap, MINOR-verdict band, mxfp4 word boundary, BNB-alias detection, render-table markup escape). 
Known limitations: (1) `_DEPLOY_MEASURE_BEFORE_GEN` / `_AFTER_FACTORY` are a stop-gap until v0.46.1 ships first-party transformers / vLLM generator factories. (2) `#70` GGUF and `#72` AWQ/GPTQ manual QA smokes remain pending — require CUDA + llama.cpp build; recipes scripted in `tests/qa/v053_qa.md`. (3) BNB-4bit merge + TorchAO PTQ live happy-path is mock-covered only — CPU-only CI cannot execute the real BNB / torchao kernels. (4) `_prepare_calibration_text` accepts JSONL with `text` / `prompt` / `content` aliases + raw text fallback; other formats (parquet / markdown) are out of scope. (5) Cache key truncates `base_sha` to 16 hex chars at the call site (collision probability ≈ 1-in-2³² across ~4 billion entries). (6) Pre-quantized detection is heuristic — name regex + local `config.json` probe; HF Hub repo IDs without local download fall back to name-only matching. (7) `enforce_under_cwd_and_no_symlink` checks only the leaf path; deeper traversal relies on the per-file leaf check at each site. (v0.53.1)
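
The v0.53.1 entry above describes the shared boundary check as cwd containment via `os.path.realpath` + `os.path.commonpath`, plus `os.lstat` + `S_ISLNK` rejection on the leaf. A minimal sketch of that pattern follows. The function name, the `base` parameter, and the error messages are hypothetical and are not taken from `soup_cli/utils/paths`.

```python
import os
import stat
from typing import Optional


def check_under_base_no_symlink(path: str, base: Optional[str] = None) -> str:
    """Hypothetical helper: reject leaf symlinks and paths resolving outside ``base``."""
    base_real = os.path.realpath(base if base is not None else os.getcwd())

    # lstat inspects the leaf itself instead of following it, so a symlink
    # is detected even when its target sits inside the allowed directory.
    try:
        leaf_mode = os.lstat(path).st_mode
    except FileNotFoundError:
        leaf_mode = None  # a not-yet-created output path is acceptable
    if leaf_mode is not None and stat.S_ISLNK(leaf_mode):
        raise ValueError(f"refusing symlink: {path!r}")

    # realpath collapses '..' segments and parent-directory symlinks;
    # commonpath then confirms the result still lives under the base.
    resolved = os.path.realpath(path)
    if os.path.commonpath([base_real, resolved]) != base_real:
        raise ValueError(f"path escapes {base_real!r}: {path!r}")
    return resolved
```

As limitation (7) notes, a leaf-only check of this kind does not close the window between the check and a later `open()`; the changelog pairs it with `O_NOFOLLOW` for that reason.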

{soup_cli-0.53.2 → soup_cli-0.53.3}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "soup-cli"
-version = "0.53.2"
+version = "0.53.3"
 description = "Fine-tune LLMs in one command. No SSH, no config hell."
 readme = "README.md"
 license = "Apache-2.0"

{soup_cli-0.53.2 → soup_cli-0.53.3}/soup_cli/__init__.py RENAMED Viewed

@@ -1,3 +1,3 @@
 """Soup CLI — Fine-tune LLMs in one command."""
-__version__ = "0.53.2"
+__version__ = "0.53.3"

{soup_cli-0.53.2 → soup_cli-0.53.3}/soup_cli/config/schema.py RENAMED Viewed

@@ -2512,6 +2512,7 @@ class SoupConfig(BaseModel):
                 task=self.task,
                 modality=self.modality,
                 backend=self.backend,
+                base=self.base,  # v0.53.3 #129 — name-regex VLM probe
             )
         except ValueError as exc:
             raise ValueError(str(exc)) from exc
@@ -2557,6 +2558,35 @@ class SoupConfig(BaseModel):
             )
         return self

+    @model_validator(mode="after")
+    def _validate_grpo_fp16_amp_exclusive(self) -> "SoupConfig":
+        """v0.53.3 #128 — ``grpo_fp16`` and ``auto_mixed_precision`` are
+        mutually exclusive.
+        Both flags pick the mixed-precision dtype but go through different
+        codepaths (``grpo_fp16`` forces ``fp16=True, bf16=False`` on
+        GRPOConfig directly; ``auto_mixed_precision`` runs the v0.32.0
+        per-model + per-GPU picker). Combining them is a footgun where the
+        downstream behaviour depends on order-of-evaluation — fail fast at
+        config-load with a friendly message naming both flags so the user
+        picks one.
+        """
+        # Short-circuit when task is not 'grpo' so the v0.50.0 stability
+        # task-gate error fires first (code-review HIGH fix — keeps a
+        # consistent "wrong-task" diagnosis ahead of the mutual-exclusion
+        # one, regardless of validator execution order).
+        if self.task != "grpo":
+            return self
+        if self.training.grpo_fp16 and self.training.auto_mixed_precision:
+            raise ValueError(
+                "grpo_fp16=True and auto_mixed_precision=True are mutually "
+                "exclusive — both pick the mixed-precision dtype but go "
+                "through different codepaths. Pick one: grpo_fp16 forces "
+                "FP16 (unsloth parity), auto_mixed_precision uses the "
+                "v0.32.0 per-GPU picker."
+            )
+        return self
+
     @model_validator(mode="after")
     def _validate_hub_supported(self) -> "SoupConfig":
         """v0.51.0 Part E — ``hub`` other than ``hf`` requires a non-mlx

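The `_validate_grpo_fp16_amp_exclusive` validator above short-circuits on non-GRPO tasks and only then rejects the flag combination. The same ordering can be sketched with a stdlib dataclass standing in for the Pydantic model. `TrainingSketch` and `ConfigSketch` are hypothetical names used only for illustration, and the error text is abbreviated.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingSketch:
    grpo_fp16: bool = False
    auto_mixed_precision: bool = False


@dataclass
class ConfigSketch:
    task: str = "sft"
    training: TrainingSketch = field(default_factory=TrainingSketch)

    def __post_init__(self) -> None:
        # Skip the check for non-GRPO tasks so a separate "wrong task"
        # diagnosis can fire first, mirroring the short-circuit in the diff.
        if self.task != "grpo":
            return
        if self.training.grpo_fp16 and self.training.auto_mixed_precision:
            raise ValueError(
                "grpo_fp16=True and auto_mixed_precision=True are mutually exclusive"
            )


# Both flags on a GRPO config fail fast at construction time.
try:
    ConfigSketch(task="grpo", training=TrainingSketch(True, True))
except ValueError as exc:
    print(exc)
```

With a single flag set, or with any non-GRPO task, construction succeeds and the conflict is left to other validators.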
{soup_cli-0.53.2 → soup_cli-0.53.3}/soup_cli/trainer/grpo.py RENAMED Viewed

@@ -54,6 +54,32 @@ class GRPOTrainerWrapper:
         self.tokenizer = None
         self.trainer = None

+    def _build_precision_kwargs(self) -> dict[str, bool]:
+        """Resolve fp16/bf16 kwargs for GRPOConfig (v0.53.3 #128).
+        Priority:
+        - Non-CUDA device (CPU / MPS / XPU) → no mixed precision (both
+          False). HF Trainer's fp16/bf16 kwargs are CUDA-specific; non-CUDA
+          backends must use their own mixed-precision path (MPS Metal,
+          XPU IPEX). Documented explicitly so future MPS work doesn't
+          regress this branch silently.
+        - ``grpo_fp16=True`` (CUDA) → ``fp16=True, bf16=False`` (unsloth
+          parity).
+        - Default CUDA → ``fp16=False, bf16=True`` (legacy v0.50.0 path).
+        ``auto_mixed_precision`` is mutually exclusive with ``grpo_fp16``
+        (rejected at schema load via ``_validate_grpo_fp16_amp_exclusive``);
+        when only ``auto_mixed_precision`` is set, the v0.32.0 picker runs
+        elsewhere in the training loop and overrides this default.
+        """
+        if self.device != "cuda":
+            return {"fp16": False, "bf16": False}
+        # grpo_fp16 is a Pydantic field with default=False; direct attribute
+        # access (no getattr fallback) so a typo would fail loudly.
+        if self.config.training.grpo_fp16:
+            return {"fp16": True, "bf16": False}
+        return {"fp16": False, "bf16": True}
+
     def setup(self, dataset: dict):
         """Load model, tokenizer, apply LoRA, create GRPO trainer."""
         from datasets import Dataset
@@ -166,7 +192,7 @@ class GRPOTrainerWrapper:
             "logging_steps": tcfg.logging_steps,
             "save_steps": tcfg.save_steps,
             "save_total_limit": 3,
-            "bf16": self.device == "cuda",
+            **self._build_precision_kwargs(),
             "report_to": self.report_to,
             "remove_unused_columns": False,
             "deepspeed": self.deepspeed_config,
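
The priority order in `_build_precision_kwargs` can be exercised in isolation. The sketch below restates the three branches from the diff as a free function. `resolve_precision_kwargs` and the surrounding `trainer_kwargs` dict are hypothetical stand-ins, not part of `soup_cli`.

```python
def resolve_precision_kwargs(device: str, grpo_fp16: bool) -> dict[str, bool]:
    """Hypothetical free-function version of the branch order shown in the diff."""
    if device != "cuda":
        # Non-CUDA devices get no mixed precision from these two flags.
        return {"fp16": False, "bf16": False}
    if grpo_fp16:
        return {"fp16": True, "bf16": False}
    # Default CUDA path keeps the earlier bf16 behaviour.
    return {"fp16": False, "bf16": True}


# The result is splatted into a larger kwargs dict, as on the changed line.
trainer_kwargs = {
    "save_total_limit": 3,
    **resolve_precision_kwargs("cuda", grpo_fp16=True),
}
print(trainer_kwargs)
```

One visible difference from the replaced line `"bf16": self.device == "cuda"` is that both `fp16` and `bf16` keys are now always present, so the two flags can never both be true.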

soup_cli-0.53.3/soup_cli/utils/prm.py ADDED Viewed

@@ -0,0 +1,175 @@
+"""PRM (Process Reward Model) — v0.50.0 Part E + v0.53.3 #129.
+Schema helpers for the new ``task='prm'`` stepwise-supervised trainer.
+The PRM data format (``data.format='prm'``) was schema-locked in v0.42.0
+Part A; v0.50.0 promotes it to a first-class task with cross-validators.
+v0.53.3 #129 extends :func:`validate_vision_grpo_compat` with an optional
+``base`` model name probe (``KNOWN_VLM_REGEX``) so a config that pairs
+``vision_grpo: true`` with a non-VLM checkpoint is rejected at schema-load
+with an actionable message naming a known VLM family.
+The actual PRM trainer wrapper (``soup_cli/trainer/prm.py``) is deferred
+to v0.50.1 — mirrors v0.27.0 MII / v0.37.0 multipack / v0.41.0 LLaMA Pro /
+v0.45.0 plugins / v0.49.0 LongLoRA stub-then-live pattern.
+Security:
+- Pure schema-time validation; no filesystem touch.
+- All validators raise ``ValueError`` with actionable messages.
+- Name-regex probe rejects null-byte / non-string / oversize inputs by
+  returning ``False`` (no exception — mirrors v0.39.0 ``is_gemma4_model``
+  / v0.44.0 ``is_llama4_model`` / v0.49.0 ``is_llama_model`` policy).
+"""
+from __future__ import annotations
+import re
+# v0.53.3 #129 — case-insensitive name allowlist for VLM bases.
+# Each alternative uses word-style boundaries so substring noise like
+# ``"my-pixtralish"`` does not match. The list is deliberately small and
+# additive — extending it does not break callers because callers always
+# pass through :func:`is_known_vlm_base`.
+_VLM_PATTERNS = (
+    r"(?:^|[^a-z0-9])qwen[\d.]*-vl(?:[^a-z0-9]|$)",   # Qwen2-VL / Qwen2.5-VL
+    r"(?:^|[^a-z0-9])qvq(?:[^a-z0-9]|$)",              # QVQ-72B
+    r"(?:^|[^a-z0-9])pixtral(?:[^a-z0-9]|$)",          # Pixtral
+    r"(?:^|[^a-z0-9])internvl[\d._]*(?:[^a-z0-9]|$)",  # InternVL/InternVL2_5/InternVL3
+    # Llama-3.2-Vision (any size in between, e.g. Llama-3.2-11B-Vision)
+    r"(?:^|[^a-z0-9])llama-?3\.?2[a-z0-9._-]*vision(?:[^a-z0-9]|$)",
+    r"(?:^|[^a-z0-9])llava(?:[^a-z0-9]|$)",            # LLaVA
+    r"(?:^|[^a-z0-9])minicpm-?v(?:[^a-z0-9]|$)",       # MiniCPM-V
+    r"(?:^|[^a-z0-9])idefics[\d]*(?:[^a-z0-9]|$)",     # Idefics
+    r"(?:^|[^a-z0-9])sharegpt4v(?:[^a-z0-9]|$)",       # ShareGPT4V
+    r"(?:^|[^a-z0-9])fuyu(?:[^a-z0-9]|$)",             # Fuyu
+)
+KNOWN_VLM_REGEX = re.compile("|".join(_VLM_PATTERNS), re.IGNORECASE)
+_MAX_BASE_NAME_LEN = 512
+
+
+def is_known_vlm_base(name: object) -> bool:
+    """Best-effort check whether ``name`` matches a known VLM family.
+    Returns ``False`` (never raises) on any of: non-string, empty, null
+    byte, length > 512. Match is case-insensitive with word boundaries so
+    substring noise (``"my-pixtralish"``) does not false-positive — mirrors
+    v0.39.0 / v0.44.0 / v0.49.0 model-detection policy.
+    """
+    if isinstance(name, bool):
+        return False
+    if not isinstance(name, str):
+        return False
+    if not name:
+        return False
+    if "\x00" in name:
+        return False
+    if len(name) > _MAX_BASE_NAME_LEN:
+        return False
+    return KNOWN_VLM_REGEX.search(name) is not None
+
+
+def validate_prm_compat(
+    *,
+    task: str,
+    data_format: str,
+    backend: str,
+    modality: str,
+) -> None:
+    """Schema-time gate for ``task='prm'``.
+    Rejects:
+    - non-PRM task (the function is intended to be called only when
+      ``task == 'prm'``; defence-in-depth).
+    - ``data.format`` not in ``{'prm', 'auto'}`` — PRM requires the
+      stepwise-supervised data shape from v0.42.0 Part A.
+    - ``backend='mlx'`` — PRM trainer is HF Trainer-specific.
+    - ``modality != 'text'`` — vision/audio PRM not modelled.
+    """
+    if not isinstance(task, str) or not task:
+        raise ValueError("task must be a non-empty string")
+    if task != "prm":
+        raise ValueError(
+            f"validate_prm_compat called with task={task!r} (expected 'prm')"
+        )
+    if not isinstance(data_format, str) or not data_format:
+        raise ValueError("data.format must be a non-empty string")
+    if data_format not in ("prm", "auto"):
+        raise ValueError(
+            f"task='prm' requires data.format in ('prm', 'auto'); "
+            f"got data.format={data_format!r}"
+        )
+    if backend == "mlx":
+        raise ValueError(
+            "task='prm' is not supported on backend=mlx in v0.50.0"
+        )
+    if modality != "text":
+        raise ValueError(
+            f"task='prm' requires modality='text'; got modality={modality!r}"
+        )
+
+
+def validate_vision_grpo_compat(
+    *,
+    task: str,
+    modality: str,
+    backend: str,
+    base: str | None = None,
+) -> None:
+    """Schema-time gate for ``vision_grpo=True``.
+    Rejects on:
+    - task not in {'grpo', 'ppo'} (vision RL is only meaningful for RL);
+    - modality != 'vision' (the whole point of the flag);
+    - backend == 'mlx' (no VLM-RL on MLX);
+    - v0.53.3 #129: ``base`` (when supplied, non-empty) does not match a
+      known VLM family — the runtime trainer error would be cryptic
+      ("module has no attribute 'vision_tower'") so we surface a friendly
+      schema-load rejection naming the expected families instead.
+    ``base=None`` or empty-string skips the probe (backwards-compatible —
+    legacy callers from v0.50.0 Part E pass no ``base`` kwarg).
+    """
+    if not isinstance(task, str) or not task:
+        raise ValueError("task must be a non-empty string")
+    if task not in ("grpo", "ppo"):
+        raise ValueError(
+            f"vision_grpo requires task in ('grpo', 'ppo'); got task={task!r}"
+        )
+    if modality != "vision":
+        raise ValueError(
+            f"vision_grpo requires modality='vision'; got modality={modality!r}"
+        )
+    if backend == "mlx":
+        raise ValueError(
+            "vision_grpo is not supported on backend=mlx in v0.50.0"
+        )
+    # v0.53.3 #129 — name-regex probe (deliberately permissive: empty /
+    # None / non-string skips the probe).
+    if isinstance(base, str) and base and not is_known_vlm_base(base):
+        # Truncate the echoed value to keep adversarial / long bases from
+        # bloating error logs (security review fix; mirrors v0.34.0 crash
+        # redaction policy).
+        safe_base = base if len(base) <= 64 else base[:61] + "..."
+        raise ValueError(
+            f"vision_grpo=True requires a known VLM base; got base={safe_base!r}. "
+            "Expected one of the Qwen2-VL / Pixtral / InternVL / "
+            "Llama-3.2-Vision / LLaVA / MiniCPM-V families. If your base "
+            "is a legitimate VLM not in the allowlist, omit vision_grpo "
+            "until a future release adds a runtime config-probe path."
+        )
+
+
+def build_prm_trainer() -> None:
+    """Live PRM trainer factory — deferred to v0.50.1.
+    Planned v0.50.1 signature:
+    ``build_prm_trainer(*, config, model, tokenizer, train_dataset, eval_dataset)``.
+    Raises ``NotImplementedError`` so callers cannot silently train an
+    SFT model when they asked for PRM.
+    """
+    raise NotImplementedError(
+        "PRM trainer (task='prm') live wiring deferred to v0.50.1. "
+        "Schema accepts the value but no trainer wrapper is registered yet."
+    )
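
The boundary behaviour of the name probe is easiest to see on concrete inputs. The sketch below copies two of the patterns from `_VLM_PATTERNS` verbatim and wraps them in hypothetical helpers (`looks_like_vlm`, `truncate_for_error`) that follow the guards and the 64-character echo limit shown in the diff. It is an illustration, not the module itself.

```python
import re

# Two alternatives copied from the diff; the full list there is longer.
_PATTERNS = (
    r"(?:^|[^a-z0-9])qwen[\d.]*-vl(?:[^a-z0-9]|$)",
    r"(?:^|[^a-z0-9])pixtral(?:[^a-z0-9]|$)",
)
VLM_RE = re.compile("|".join(_PATTERNS), re.IGNORECASE)
MAX_NAME_LEN = 512


def looks_like_vlm(name: object) -> bool:
    """Hypothetical probe: return False instead of raising on bad input."""
    if not isinstance(name, str) or not name:
        return False
    if "\x00" in name or len(name) > MAX_NAME_LEN:
        return False
    return VLM_RE.search(name) is not None


def truncate_for_error(base: str, limit: int = 64) -> str:
    """Cap the echoed value so a long base name cannot bloat the error."""
    return base if len(base) <= limit else base[: limit - 3] + "..."


for candidate in (
    "Qwen/Qwen2-VL-7B-Instruct",   # matches: '/' before, '-' after
    "mistralai/Pixtral-12B-2409",  # matches case-insensitively
    "my-pixtralish",               # no match: 'i' follows 'pixtral'
    "test-llama",                  # no match: not a listed family
):
    print(candidate, looks_like_vlm(candidate))
```

The non-alphanumeric guards on both sides of each family name are what keep `my-pixtralish` from matching while still accepting the `org/model` repo-ID form.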

{soup_cli-0.53.2 → soup_cli-0.53.3}/tests/test_v0500_part_e.py RENAMED Viewed

@@ -92,9 +92,12 @@ def test_prm_compat_none_format():
 def test_vision_grpo_soupconfig_ppo_happy():
-    """tdd-guide MEDIUM fix: confirm ppo path is wired at SoupConfig level."""
+    """tdd-guide MEDIUM fix: confirm ppo path is wired at SoupConfig level.
+    v0.53.3 #129 — uses a known VLM base so the new name-regex probe passes.
+    """
     yaml = """
-base: test-llama
+base: Qwen/Qwen2-VL-7B-Instruct
 task: ppo
 modality: vision
 data:
@@ -221,8 +224,9 @@ training:
 def test_vision_grpo_soupconfig_happy():
+    # v0.53.3 #129 — uses a known VLM base.
     yaml = """
-base: test-llama
+base: Qwen/Qwen2-VL-7B-Instruct
 task: grpo
 modality: vision
 data:
