mlxsmith 0.1.1__tar.gz → 0.1.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (97)
  1. {mlxsmith-0.1.1/src/mlxsmith.egg-info → mlxsmith-0.1.3}/PKG-INFO +29 -13
  2. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/README.md +25 -10
  3. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/pyproject.toml +3 -2
  4. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/accel/__init__.py +0 -3
  5. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/bench.py +12 -2
  6. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/cli.py +188 -3
  7. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/config_models.py +16 -2
  8. mlxsmith-0.1.3/src/mlxsmith/integrations/__init__.py +19 -0
  9. mlxsmith-0.1.3/src/mlxsmith/integrations/mlx_lm_lora.py +117 -0
  10. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/llm/backend.py +8 -1
  11. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/llm/mlx_lm_backend.py +59 -2
  12. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/llm/mock_backend.py +8 -1
  13. mlxsmith-0.1.3/src/mlxsmith/optim/__init__.py +3 -0
  14. mlxsmith-0.1.3/src/mlxsmith/optim/muon.py +93 -0
  15. mlxsmith-0.1.3/src/mlxsmith/orchestrator/daemon.py +116 -0
  16. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/orchestrator/trainer_worker.py +4 -0
  17. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/loop.py +53 -92
  18. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/sdk/__init__.py +18 -2
  19. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/sdk/losses.py +102 -1
  20. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/sdk/training_client.py +24 -5
  21. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/train/distill.py +6 -1
  22. mlxsmith-0.1.3/src/mlxsmith/train/online_dpo.py +249 -0
  23. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/train/pref.py +31 -29
  24. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/train/rft.py +123 -38
  25. mlxsmith-0.1.3/src/mlxsmith/train/self_verify.py +199 -0
  26. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/train/sft.py +13 -2
  27. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/util.py +0 -6
  28. mlxsmith-0.1.3/src/mlxsmith/verifiers/llm_judge.py +278 -0
  29. mlxsmith-0.1.3/src/mlxsmith/verifiers/prime.py +127 -0
  30. {mlxsmith-0.1.1 → mlxsmith-0.1.3/src/mlxsmith.egg-info}/PKG-INFO +29 -13
  31. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith.egg-info/SOURCES.txt +11 -1
  32. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith.egg-info/requires.txt +4 -3
  33. mlxsmith-0.1.3/tests/test_lora_integration.py +47 -0
  34. mlxsmith-0.1.3/tests/test_online_dpo_self_verify.py +61 -0
  35. mlxsmith-0.1.3/tests/test_pref_variants.py +31 -0
  36. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_sdk.py +17 -0
  37. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_training_smoke.py +6 -0
  38. mlxsmith-0.1.3/tests/test_verifiers.py +73 -0
  39. mlxsmith-0.1.1/src/mlxsmith/accel/zmlx_backend.py +0 -42
  40. mlxsmith-0.1.1/src/mlxsmith/orchestrator/daemon.py +0 -449
  41. mlxsmith-0.1.1/tests/test_verifiers.py +0 -25
  42. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/LICENSE +0 -0
  43. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/setup.cfg +0 -0
  44. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/__init__.py +0 -0
  45. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/accel/base.py +0 -0
  46. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/accel/none.py +0 -0
  47. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/adapters.py +0 -0
  48. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/api/__init__.py +0 -0
  49. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/api/handlers.py +0 -0
  50. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/api/schemas.py +0 -0
  51. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/auth.py +0 -0
  52. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/config.py +0 -0
  53. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/data.py +0 -0
  54. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/envs/__init__.py +0 -0
  55. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/envs/system.py +0 -0
  56. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/envs/token_env.py +0 -0
  57. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/eval.py +0 -0
  58. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/infer.py +0 -0
  59. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/llm/__init__.py +0 -0
  60. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/llm/interface.py +0 -0
  61. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/llm/registry.py +0 -0
  62. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/models.py +0 -0
  63. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/orchestrator/__init__.py +0 -0
  64. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/orchestrator/inference_worker.py +0 -0
  65. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/orchestrator/queue.py +0 -0
  66. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/__init__.py +0 -0
  67. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/corpus.py +0 -0
  68. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/gating.py +0 -0
  69. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/generate.py +0 -0
  70. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/history.py +0 -0
  71. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/inference.py +0 -0
  72. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/mutate.py +0 -0
  73. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/trainer.py +0 -0
  74. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/rlm/weights.py +0 -0
  75. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/runs.py +0 -0
  76. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/sdk/future.py +0 -0
  77. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/sdk/sampling_client.py +0 -0
  78. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/server.py +0 -0
  79. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/train/__init__.py +0 -0
  80. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/train/lora.py +0 -0
  81. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/__init__.py +0 -0
  82. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/compose.py +0 -0
  83. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/docker_verifier.py +0 -0
  84. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/jsonschema.py +0 -0
  85. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/pytest_verifier.py +0 -0
  86. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/regex.py +0 -0
  87. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/verifiers/types.py +0 -0
  88. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith.egg-info/dependency_links.txt +0 -0
  89. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith.egg-info/entry_points.txt +0 -0
  90. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith.egg-info/top_level.txt +0 -0
  91. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_api.py +0 -0
  92. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_auth.py +0 -0
  93. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_config.py +0 -0
  94. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_data.py +0 -0
  95. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_rlm.py +0 -0
  96. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_rlm_mutation.py +0 -0
  97. {mlxsmith-0.1.1 → mlxsmith-0.1.3}/tests/test_runs.py +0 -0
{mlxsmith-0.1.1/src/mlxsmith.egg-info → mlxsmith-0.1.3}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mlxsmith
-Version: 0.1.1
+Version: 0.1.3
 Summary: Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.
 Author-email: Shannon Labs <hmbown@gmail.com>
 License: MIT
@@ -36,18 +36,19 @@ Provides-Extra: llm
 Requires-Dist: mlx-lm>=0.30.5; extra == "llm"
 Requires-Dist: transformers>=5.0.0; extra == "llm"
 Requires-Dist: datasets>=3.0.0; extra == "llm"
+Provides-Extra: lora
+Requires-Dist: mlx-lm-lora>=1.0.0; extra == "lora"
 Provides-Extra: serve
 Requires-Dist: fastapi>=0.128.0; extra == "serve"
 Requires-Dist: uvicorn>=0.40.0; extra == "serve"
 Requires-Dist: httpx>=0.28.0; extra == "serve"
-Provides-Extra: zmlx
-Requires-Dist: zmlx; extra == "zmlx"
 Provides-Extra: dev
 Requires-Dist: pytest>=9.0.0; extra == "dev"
 Requires-Dist: ruff>=0.14.0; extra == "dev"
 Provides-Extra: all
 Requires-Dist: mlx>=0.30.4; extra == "all"
 Requires-Dist: mlx-lm>=0.30.5; extra == "all"
+Requires-Dist: mlx-lm-lora>=1.0.0; extra == "all"
 Requires-Dist: transformers>=5.0.0; extra == "all"
 Requires-Dist: datasets>=3.0.0; extra == "all"
 Requires-Dist: fastapi>=0.128.0; extra == "all"
@@ -59,7 +60,7 @@ Dynamic: license-file
 
 Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.
 
-**Status:** alpha (v0.1.0). Full training pipeline validated on Qwen3-4B.
+**Status:** alpha (v0.1.2). Full training pipeline validated on Qwen3-4B.
 
 ## Install
 
@@ -76,6 +77,9 @@ pip install mlxsmith
 # Apple Silicon training + serving
 pip install "mlxsmith[mlx,llm,serve]"
 
+# mlx-lm-lora passthrough (advanced training methods)
+pip install "mlxsmith[lora]"
+
 # Everything
 pip install "mlxsmith[all]"
 ```
@@ -85,7 +89,7 @@ pip install "mlxsmith[all]"
 ```bash
 mlxsmith init myproj
 cd myproj
-mlxsmith doctor  # check Python, MLX, Metal, ZMLX
+mlxsmith doctor  # check Python, MLX, Metal
 ```
 
 ## Training
@@ -133,6 +137,22 @@ mlxsmith distill --teacher large-model --student small-model --mode opd
 mlxsmith pipeline
 ```
 
+### mlx-lm-lora parity (all methods)
+
+Use the passthrough to access mlx-lm-lora features (DPO variants, GRPO variants,
+PPO, synthetic datasets, judge training, etc.):
+
+```bash
+# Train with mlx-lm-lora directly
+mlxsmith lora train --model Qwen/Qwen3-4B-Instruct-2507 --data data/prefs --train-mode dpo -- --beta 0.1
+
+# Generate synthetic datasets
+mlxsmith lora synthetic prompts -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --num-samples 1000
+
+# Train judge model
+mlxsmith lora judge -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --data data/prefs
+```
+
 ## Serving
 
 OpenAI-compatible `/v1/chat/completions` endpoint.
@@ -204,6 +224,7 @@ Built-in verifiers for eval, RFT, and preference tuning:
 - **pytest** — sandboxed test execution
 - **docker** — containerized verification
 - **compose** — multi-verifier composition (AND/OR/weighted)
+- **llm_judge** — LLM-based self-verification / ThinkPRM-style verifier
 
 See `docs/VERIFIERS.md` for the verifier API.
 
@@ -232,6 +253,9 @@ mlxsmith config env # show environment variable mapping
 
 Config sources (in priority order): CLI flags > environment variables (`MLXSMITH__SECTION__KEY`) > config file > defaults.
 
+Training optimizers are configurable via `train.optimizer` and `train.optimizer_kwargs`
+(for example `adamw`, `adam`, `qhadam`, `muon` when available in MLX).
+
 ## SDK (programmatic API)
 
 For building custom training loops:
@@ -269,14 +293,6 @@ mlxsmith rlm history # view history
 
 Includes task generation, mutation for data diversity, corpus management, EMA-based gating, and weight pointer IPC for multi-process coordination. See `docs/orchestrator.md`.
 
-### ZMLX acceleration
-
-Optional zero-copy MLX acceleration backend.
-
-```bash
-mlxsmith accel status
-```
-
 ## Docs
 
 - `docs/PROJECT_FORMAT.md` — project layout and artifacts
{mlxsmith-0.1.1 → mlxsmith-0.1.3}/README.md
@@ -2,7 +2,7 @@
 
 Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving.
 
-**Status:** alpha (v0.1.0). Full training pipeline validated on Qwen3-4B.
+**Status:** alpha (v0.1.2). Full training pipeline validated on Qwen3-4B.
 
 ## Install
 
@@ -19,6 +19,9 @@ pip install mlxsmith
 # Apple Silicon training + serving
 pip install "mlxsmith[mlx,llm,serve]"
 
+# mlx-lm-lora passthrough (advanced training methods)
+pip install "mlxsmith[lora]"
+
 # Everything
 pip install "mlxsmith[all]"
 ```
@@ -28,7 +31,7 @@ pip install "mlxsmith[all]"
 ```bash
 mlxsmith init myproj
 cd myproj
-mlxsmith doctor  # check Python, MLX, Metal, ZMLX
+mlxsmith doctor  # check Python, MLX, Metal
 ```
 
 ## Training
@@ -76,6 +79,22 @@ mlxsmith distill --teacher large-model --student small-model --mode opd
 mlxsmith pipeline
 ```
 
+### mlx-lm-lora parity (all methods)
+
+Use the passthrough to access mlx-lm-lora features (DPO variants, GRPO variants,
+PPO, synthetic datasets, judge training, etc.):
+
+```bash
+# Train with mlx-lm-lora directly
+mlxsmith lora train --model Qwen/Qwen3-4B-Instruct-2507 --data data/prefs --train-mode dpo -- --beta 0.1
+
+# Generate synthetic datasets
+mlxsmith lora synthetic prompts -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --num-samples 1000
+
+# Train judge model
+mlxsmith lora judge -- --model mlx-community/Qwen3-4B-Instruct-2507-4bit --data data/prefs
+```
+
 ## Serving
 
 OpenAI-compatible `/v1/chat/completions` endpoint.
@@ -147,6 +166,7 @@ Built-in verifiers for eval, RFT, and preference tuning:
 - **pytest** — sandboxed test execution
 - **docker** — containerized verification
 - **compose** — multi-verifier composition (AND/OR/weighted)
+- **llm_judge** — LLM-based self-verification / ThinkPRM-style verifier
 
 See `docs/VERIFIERS.md` for the verifier API.
 
@@ -175,6 +195,9 @@ mlxsmith config env # show environment variable mapping
 
 Config sources (in priority order): CLI flags > environment variables (`MLXSMITH__SECTION__KEY`) > config file > defaults.
 
+Training optimizers are configurable via `train.optimizer` and `train.optimizer_kwargs`
+(for example `adamw`, `adam`, `qhadam`, `muon` when available in MLX).
+
 ## SDK (programmatic API)
 
 For building custom training loops:
@@ -212,14 +235,6 @@ mlxsmith rlm history # view history
 
 Includes task generation, mutation for data diversity, corpus management, EMA-based gating, and weight pointer IPC for multi-process coordination. See `docs/orchestrator.md`.
 
-### ZMLX acceleration
-
-Optional zero-copy MLX acceleration backend.
-
-```bash
-mlxsmith accel status
-```
-
 ## Docs
 
 - `docs/PROJECT_FORMAT.md` — project layout and artifacts
{mlxsmith-0.1.1 → mlxsmith-0.1.3}/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "mlxsmith"
-version = "0.1.1"
+version = "0.1.3"
 description = "Apple Silicon MLX fine-tuning toolkit — SFT, DPO/ORPO, GRPO, distillation, and OpenAI-compatible serving."
 readme = {file = "README.md", content-type = "text/markdown"}
 requires-python = ">=3.10"
@@ -47,16 +47,17 @@ llm = [
     "transformers>=5.0.0",
     "datasets>=3.0.0",
 ]
+lora = ["mlx-lm-lora>=1.0.0"]
 serve = [
     "fastapi>=0.128.0",
     "uvicorn>=0.40.0",
     "httpx>=0.28.0",
 ]
-zmlx = ["zmlx"]
 dev = ["pytest>=9.0.0", "ruff>=0.14.0"]
 all = [
     "mlx>=0.30.4",
     "mlx-lm>=0.30.5",
+    "mlx-lm-lora>=1.0.0",
     "transformers>=5.0.0",
     "datasets>=3.0.0",
     "fastapi>=0.128.0",
{mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/accel/__init__.py
@@ -1,10 +1,7 @@
 from __future__ import annotations
 from .none import NoneBackend
-from .zmlx_backend import ZMLXBackend
 
 def get_backend(name: str):
     if name == "none":
         return NoneBackend()
-    if name == "zmlx":
-        return ZMLXBackend()
     raise ValueError(f"Unknown accel backend: {name}")
{mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/bench.py
@@ -44,7 +44,12 @@ def run_bench(
     mode = (mode or "inference").lower()
 
     if mode == "trainer":
-        opt, _params = llm.optimizer_and_params(lr=cfg.train.lr, weight_decay=cfg.train.weight_decay)
+        opt, _params = llm.optimizer_and_params(
+            lr=cfg.train.lr,
+            weight_decay=cfg.train.weight_decay,
+            optimizer=cfg.train.optimizer,
+            optimizer_kwargs=cfg.train.optimizer_kwargs,
+        )
         prompt_ids = llm.encode(prompt)
         ids = llm.encode(prompt + " " + "x" * max_tokens)
         for i in range(max(1, reps)):
@@ -59,7 +64,12 @@
             elapsed = max(time.time() - t0, 1e-6)
             results.append({"rep": i, "steps": steps, "time_s": elapsed, "steps_per_s": steps / elapsed})
     elif mode == "end_to_end":
-        opt, _params = llm.optimizer_and_params(lr=cfg.train.lr, weight_decay=cfg.train.weight_decay)
+        opt, _params = llm.optimizer_and_params(
+            lr=cfg.train.lr,
+            weight_decay=cfg.train.weight_decay,
+            optimizer=cfg.train.optimizer,
+            optimizer_kwargs=cfg.train.optimizer_kwargs,
+        )
         for i in range(max(1, reps)):
             t0 = time.time()
             gen = llm.generate(prompt, max_new_tokens=max_tokens, temperature=0.0)
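`optimizer_and_params` now receives the optimizer name and kwargs from the train config. A minimal sketch of how a backend could resolve those names to MLX optimizers; this is an assumption for illustration, not the actual code in `mlx_lm_backend.py` (its changes are +59 −2 above), and the `Muon` import is a plausible export from the new `src/mlxsmith/optim/muon.py`:

```python
import mlx.optimizers as optim

def resolve_optimizer(name: str, lr: float, weight_decay: float, kwargs: dict):
    # Hypothetical dispatcher; the README notes names like "adamw", "adam",
    # "qhadam", "muon" are accepted "when available in MLX".
    name = name.strip().lower()
    if name == "adamw":
        return optim.AdamW(learning_rate=lr, weight_decay=weight_decay, **kwargs)
    if name == "adam":
        return optim.Adam(learning_rate=lr, **kwargs)
    if name == "muon":
        from mlxsmith.optim import Muon  # assumed export from the new optim package
        return Muon(learning_rate=lr, **kwargs)
    raise ValueError(f"Unknown optimizer: {name}")
```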
{mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/cli.py
@@ -24,6 +24,8 @@ from .train.sft import run_sft
 from .train.pref import run_pref
 from .train.rft import run_rft
 from .train.distill import run_distill
+from .train.online_dpo import run_online_dpo
+from .train.self_verify import run_self_verify
 from .eval import run_eval
 from .bench import run_bench
 from .rlm import run_rlm, run_rlm_orchestrated
@@ -40,6 +42,13 @@ from .envs import (
     resolve_env_path as resolve_env_path_plugin,
     load_manifest as load_env_manifest,
 )
+from .integrations.mlx_lm_lora import (
+    build_train_command as build_mlx_lm_lora_train_command,
+    build_synthetic_command as build_mlx_lm_lora_synth_command,
+    build_judge_command as build_mlx_lm_lora_judge_command,
+    build_reward_functions_command as build_mlx_lm_lora_reward_functions_command,
+    run_command as run_mlx_lm_lora_command,
+)
 
 app = typer.Typer(
     add_completion=False,
@@ -65,6 +74,9 @@ def init(path: str = typer.Argument(..., help="Project directory to create")):
     (p / "verifiers" / "regex.py").write_text(_sample_verifier_regex(), encoding="utf-8")
     (p / "verifiers" / "pytest.py").write_text(_sample_verifier_pytest(), encoding="utf-8")
     (p / "verifiers" / "jsonschema.py").write_text(_sample_verifier_jsonschema(), encoding="utf-8")
+    (p / "verifiers" / "llm_judge.py").write_text(_sample_verifier_llm_judge(), encoding="utf-8")
+    (p / "verifiers" / "rubrics").mkdir(parents=True, exist_ok=True)
+    (p / "verifiers" / "rubrics" / "coding.txt").write_text(_sample_judge_rubric(), encoding="utf-8")
     (p / "eval" / "suites" / "coding.yaml").write_text(_sample_eval_suite(), encoding="utf-8")
     console.print(f"[green]Initialized[/green] {p.resolve()}")
 
@@ -83,7 +95,6 @@ def doctor():
     table.add_row("cpu_count", str(info.cpu_count))
     table.add_row("metal", str(info.has_metal))
     table.add_row("mlx", f"{info.has_mlx} {info.mlx_version or ''}".strip())
-    table.add_row("zmlx", str(info.has_zmlx))
     console.print(table)
 
 
@@ -342,14 +353,19 @@ def pref(
     data: str = typer.Option("data/prefs", "--data"),
     model: str = typer.Option(..., "--model", help="Base adapter or model path (e.g., runs/sft_0001/adapter)"),
     accel: Optional[str] = typer.Option(None, "--accel", help="Override accel.backend"),
-    algo: Optional[str] = typer.Option(None, "--algo", help="Override pref.algo (dpo|orpo|grpo)"),
+    algo: Optional[str] = typer.Option(None, "--algo", help="Override pref.algo (legacy)"),
+    loss_type: Optional[str] = typer.Option(None, "--loss-type", help="dpo|cpo|orpo|ipo|hinge"),
 ):
     root = project_root_from_cwd()
+    overrides = {}
+    if loss_type is not None:
+        overrides["pref.loss_type"] = loss_type
     cfg = get_config(
         config_path=config,
         root=root,
         accel_backend=accel,
         algo=algo,
+        **overrides,
     )
     data_dir = root / data
     run = run_pref(root, cfg, data_dir, Path(model), cfg.accel.backend)
@@ -364,13 +380,27 @@ def rft(
     model: str = typer.Option(..., "--model"),
     accel: Optional[str] = typer.Option(None, "--accel", help="Override accel.backend"),
     rollouts: Optional[int] = typer.Option(None, "--rollouts", help="Override rft.rollouts"),
+    loss_type: Optional[str] = typer.Option(None, "--loss-type", help="grpo|dr_grpo|dapo"),
+    epsilon_low: Optional[float] = typer.Option(None, "--epsilon-low"),
+    epsilon_high: Optional[float] = typer.Option(None, "--epsilon-high"),
+    token_level_loss: Optional[bool] = typer.Option(None, "--token-level-loss/--sequence-level-loss"),
 ):
     root = project_root_from_cwd()
+    overrides = {}
+    if loss_type is not None:
+        overrides["rft.loss_type"] = loss_type
+    if epsilon_low is not None:
+        overrides["rft.epsilon_low"] = epsilon_low
+    if epsilon_high is not None:
+        overrides["rft.epsilon_high"] = epsilon_high
+    if token_level_loss is not None:
+        overrides["rft.token_level_loss"] = token_level_loss
     cfg = get_config(
         config_path=config,
         root=root,
         accel_backend=accel,
         rollouts=rollouts,
+        **overrides,
    )
     run = run_rft(root, cfg, root / env, root / verifier, Path(model), cfg.accel.backend)
     console.print(f"[bold]Run:[/bold] {run.run_dir}")
@@ -438,6 +468,142 @@ def distill(
     console.print(f"[bold]Run:[/bold] {run.run_dir}")
 
 
+@app.command("online-dpo")
+def online_dpo(
+    data: str = typer.Option(..., "--data", help="JSONL with prompts"),
+    model: str = typer.Option(..., "--model"),
+    judge_model: Optional[str] = typer.Option(None, "--judge-model"),
+    judge_backend: str = typer.Option("mlx-lm", "--judge-backend"),
+    rubric: Optional[str] = typer.Option(None, "--rubric"),
+    group_size: Optional[int] = typer.Option(None, "--group-size"),
+    max_new_tokens: Optional[int] = typer.Option(None, "--max-new-tokens"),
+    temperature: Optional[float] = typer.Option(None, "--temperature"),
+    config: str = typer.Option("mlxsmith.yaml", "-c", "--config", help="Config file path"),
+    accel: Optional[str] = typer.Option(None, "--accel", help="Override accel.backend"),
+):
+    root = project_root_from_cwd()
+    cfg = get_config(config_path=config, root=root, accel_backend=accel)
+    run = run_online_dpo(
+        root,
+        cfg,
+        Path(data),
+        model,
+        cfg.accel.backend,
+        judge_model=judge_model,
+        judge_backend=judge_backend,
+        rubric=rubric,
+        group_size=group_size,
+        max_new_tokens=max_new_tokens,
+        temperature=temperature,
+    )
+    console.print(f"[bold]Run:[/bold] {run.run_dir}")
+
+
+@app.command("self-verify")
+def self_verify(
+    data: str = typer.Option(..., "--data", help="JSONL with prompts"),
+    model: str = typer.Option(..., "--model"),
+    verifier_model: Optional[str] = typer.Option(None, "--verifier-model"),
+    verifier_backend: str = typer.Option("mlx-lm", "--verifier-backend"),
+    rubric: Optional[str] = typer.Option(None, "--rubric"),
+    max_new_tokens: Optional[int] = typer.Option(None, "--max-new-tokens"),
+    temperature: Optional[float] = typer.Option(None, "--temperature"),
+    config: str = typer.Option("mlxsmith.yaml", "-c", "--config", help="Config file path"),
+    accel: Optional[str] = typer.Option(None, "--accel", help="Override accel.backend"),
+):
+    root = project_root_from_cwd()
+    cfg = get_config(config_path=config, root=root, accel_backend=accel)
+    run = run_self_verify(
+        root,
+        cfg,
+        Path(data),
+        model,
+        cfg.accel.backend,
+        verifier_model=verifier_model,
+        verifier_backend=verifier_backend,
+        rubric=rubric,
+        max_new_tokens=max_new_tokens,
+        temperature=temperature,
+    )
+    console.print(f"[bold]Run:[/bold] {run.run_dir}")
+
+
+lora_app = typer.Typer(help="mlx-lm-lora passthrough commands")
+app.add_typer(lora_app, name="lora")
+
+
+@lora_app.command(
+    "train",
+    context_settings={"allow_extra_args": True, "ignore_unknown_options": True},
+)
+def lora_train(
+    ctx: typer.Context,
+    config: Optional[str] = typer.Option(None, "--config", help="mlx-lm-lora config path"),
+    model: Optional[str] = typer.Option(None, "--model", help="Model id or path"),
+    data: Optional[str] = typer.Option(None, "--data", help="Dataset path or HF dataset"),
+    train_mode: Optional[str] = typer.Option(None, "--train-mode", help="sft|dpo|orpo|grpo|ppo|..."),
+    train_type: Optional[str] = typer.Option(None, "--train-type", help="lora|dora|full"),
+    dry_run: bool = typer.Option(False, "--dry-run"),
+):
+    """Run mlx-lm-lora training with passthrough args.
+
+    Use `--` to pass through any additional mlx-lm-lora flags.
+    """
+    root = project_root_from_cwd()
+    cmd = build_mlx_lm_lora_train_command(
+        config=config,
+        model=model,
+        data=data,
+        train_mode=train_mode,
+        train_type=train_type,
+        extra_args=list(ctx.args),
+    )
+    run_mlx_lm_lora_command(cmd, dry_run=dry_run, cwd=root)
+
+
+@lora_app.command(
+    "synthetic",
+    context_settings={"allow_extra_args": True, "ignore_unknown_options": True},
+)
+def lora_synthetic(
+    ctx: typer.Context,
+    kind: str = typer.Argument(..., help="prompts|sft|dpo"),
+    dry_run: bool = typer.Option(False, "--dry-run"),
+):
+    """Run mlx-lm-lora synthetic dataset generation."""
+    root = project_root_from_cwd()
+    cmd = build_mlx_lm_lora_synth_command(kind, extra_args=list(ctx.args))
+    run_mlx_lm_lora_command(cmd, dry_run=dry_run, cwd=root)
+
+
+@lora_app.command(
+    "judge",
+    context_settings={"allow_extra_args": True, "ignore_unknown_options": True},
+)
+def lora_judge(
+    ctx: typer.Context,
+    dry_run: bool = typer.Option(False, "--dry-run"),
+):
+    """Run mlx-lm-lora judge model training."""
+    root = project_root_from_cwd()
+    cmd = build_mlx_lm_lora_judge_command(extra_args=list(ctx.args))
+    run_mlx_lm_lora_command(cmd, dry_run=dry_run, cwd=root)
+
+
+@lora_app.command(
+    "reward-functions",
+    context_settings={"allow_extra_args": True, "ignore_unknown_options": True},
+)
+def lora_reward_functions(
+    ctx: typer.Context,
+    dry_run: bool = typer.Option(False, "--dry-run"),
+):
+    """List mlx-lm-lora reward functions."""
+    root = project_root_from_cwd()
+    cmd = build_mlx_lm_lora_reward_functions_command(extra_args=list(ctx.args))
+    run_mlx_lm_lora_command(cmd, dry_run=dry_run, cwd=root)
+
+
 @app.command()
 def eval(
     suite: str = typer.Option("eval/suites/coding.yaml", "--suite"),
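For orientation, a plausible shape of the loop behind the `online-dpo` command above, inferred only from its options (`group_size` samples per prompt, an LLM judge scoring them); the real implementation is `src/mlxsmith/train/online_dpo.py` (+249 lines) and will differ in detail:

```python
def online_dpo_pairs(llm, judge, prompt: str, group_size: int = 4, temperature: float = 0.8):
    # Sample a group of completions, score each with the judge model, and keep
    # the best/worst as a synthetic (chosen, rejected) preference pair.
    completions = [llm.generate(prompt, temperature=temperature) for _ in range(group_size)]
    scores = [judge.score(prompt, c) for c in completions]  # judge.score is hypothetical
    chosen = completions[scores.index(max(scores))]
    rejected = completions[scores.index(min(scores))]
    return chosen, rejected  # fed to a standard DPO update
```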
@@ -729,7 +895,7 @@ def rlm_history(limit: int = typer.Option(10, "--limit")):
 
 @accel_app.command("status")
 def accel_status():
-    backends = ["none", "zmlx"]
+    backends = ["none"]
     table = Table(title="mlxsmith accel status")
     table.add_column("backend")
     table.add_column("available")
@@ -934,6 +1100,25 @@ def verify(prompt: str, completion: str, workdir: str, **kwargs):
     """
 
 
+def _sample_verifier_llm_judge() -> str:
+    return """from mlxsmith.verifiers.llm_judge import verify as _verify
+
+def verify(prompt: str, completion: str, workdir: str, **kwargs):
+    # Pass model=... or set MLXSMITH_JUDGE_MODEL for the judge model id.
+    return _verify(prompt, completion, workdir, **kwargs)
+"""
+
+
+def _sample_judge_rubric() -> str:
+    return """Score from 0.0 to 1.0.
+- 1.0: Correct, complete, and safe.
+- 0.7: Mostly correct with small issues.
+- 0.4: Partial correctness or unclear reasoning.
+- 0.0: Incorrect or unsafe.
+Return JSON only.
+"""
+
+
 def _sample_eval_suite() -> str:
     return """name: coding-eval-sample
 notes: |
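The generated `verifiers/llm_judge.py` keeps the standard `verify(prompt, completion, workdir, **kwargs)` signature. A usage sketch from inside a project created by `mlxsmith init`; the `model` kwarg and env var come from the sample's own comment, while the return shape is whatever `mlxsmith.verifiers.llm_judge.verify` produces:

```python
from verifiers.llm_judge import verify  # the file written by `mlxsmith init`

result = verify(
    prompt="Reverse a string in Python.",
    completion="def rev(s):\n    return s[::-1]",
    workdir=".",
    model="mlx-community/Qwen3-4B-Instruct-2507-4bit",  # or set MLXSMITH_JUDGE_MODEL
)
```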
{mlxsmith-0.1.1 → mlxsmith-0.1.3}/src/mlxsmith/config_models.py
@@ -6,7 +6,7 @@ from typing import Dict, List, Literal, Optional, Any
 
 from pydantic import BaseModel, Field, field_validator
 
-AccelBackendName = Literal["none", "zmlx"]
+AccelBackendName = Literal["none"]
 
 
 class ModelConfig(BaseModel):
@@ -47,6 +47,8 @@ class TrainConfig(BaseModel):
     grad_accum: int = 8
     lr: float = 2e-4
     weight_decay: float = 0.0
+    optimizer: str = "adamw"
+    optimizer_kwargs: Dict[str, Any] = Field(default_factory=dict)
     iters: int = 1000
     save_every: int = 100
     eval_every: int = 100
@@ -61,6 +63,11 @@ class TrainConfig(BaseModel):
             raise ValueError("value must be non-negative")
         return v
 
+    @field_validator("optimizer")
+    @classmethod
+    def normalize_optimizer(cls, v: str) -> str:
+        return v.strip().lower()
+
 
 class LoraConfig(BaseModel):
     """LoRA/DoRA adapter configuration."""
@@ -89,11 +96,13 @@ class LoraConfig(BaseModel):
 
 
 class PrefConfig(BaseModel):
-    """Preference tuning configuration (DPO, ORPO, GRPO)."""
+    """Preference tuning configuration (DPO variants)."""
 
     algo: Literal["dpo", "orpo", "grpo"] = "dpo"
+    loss_type: Literal["dpo", "cpo", "orpo", "ipo", "hinge"] = "dpo"
     beta: float = 0.1
     kl_coeff: float = 0.0
+    delta: float = 0.0
     reference_model: Optional[str] = None
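The new `loss_type` values name standard preference objectives. A sketch of the textbook forms for three of them, written against MLX; these are the conventional formulations (DPO's logistic loss, IPO's squared loss, a hinge margin where `delta` is assumed to act as the margin target), not code lifted from `train/pref.py`:

```python
import mlx.core as mx

def pref_loss(margin, loss_type="dpo", beta=0.1, delta=0.0):
    # margin = (logp_chosen - logp_rejected) minus the same quantity under
    # the reference model.
    if loss_type == "dpo":
        return -mx.log(mx.sigmoid(beta * margin))
    if loss_type == "ipo":
        return (margin - 1.0 / (2.0 * beta)) ** 2
    if loss_type == "hinge":
        return mx.maximum(0.0, delta - beta * margin)  # delta as margin target (assumption)
    raise ValueError(f"unhandled loss_type: {loss_type}")
```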
 
 
@@ -101,12 +110,16 @@ class RftConfig(BaseModel):
     """Reinforcement fine-tuning configuration."""
 
     algo: Literal["grpo"] = "grpo"
+    loss_type: Literal["grpo", "dr_grpo", "dapo"] = "grpo"
     rollouts: int = 8
     kl_coeff: float = 0.02
     max_steps_per_task: int = 1
     temperature: float = 0.8
     max_new_tokens: int = 256
     normalize_advantage: bool = True
+    epsilon_low: float = 0.2
+    epsilon_high: float = 0.2
+    token_level_loss: bool = False
     reference_model: Optional[str] = None
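The separate `epsilon_low`/`epsilon_high` fields suggest DAPO-style asymmetric ratio clipping (plain GRPO uses a symmetric ε, and `token_level_loss` toggles per-token versus per-sequence averaging). A sketch of that objective under the usual formulation, as an illustration rather than the body of `train/rft.py`:

```python
import mlx.core as mx

def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.2):
    # PPO-style clip with independent lower/upper bounds, as in DAPO's
    # "clip-higher"; eps_low == eps_high recovers the symmetric GRPO clip.
    clipped = mx.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return mx.minimum(ratio * advantage, clipped * advantage)
```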
 
 
@@ -164,6 +177,7 @@ CLI_ALIASES: dict[str, tuple[str, ...]] = {
     "lr": ("train", "lr"),
     "batch_size": ("train", "batch_size"),
     "iters": ("train", "iters"),
+    "optimizer": ("train", "optimizer"),
     "model_id": ("model", "id"),
     "accel_backend": ("accel", "backend"),
     "host": ("serve", "host"),
mlxsmith-0.1.3/src/mlxsmith/integrations/__init__.py
@@ -0,0 +1,19 @@
+"""External integrations for mlxsmith."""
+
+from .mlx_lm_lora import (
+    build_train_command as build_mlx_lm_lora_train_command,
+    build_synthetic_command as build_mlx_lm_lora_synth_command,
+    build_judge_command as build_mlx_lm_lora_judge_command,
+    build_reward_functions_command as build_mlx_lm_lora_reward_functions_command,
+    run_command as run_mlx_lm_lora_command,
+    ensure_available as ensure_mlx_lm_lora_available,
+)
+
+__all__ = [
+    "build_mlx_lm_lora_train_command",
+    "build_mlx_lm_lora_synth_command",
+    "build_mlx_lm_lora_judge_command",
+    "build_mlx_lm_lora_reward_functions_command",
+    "run_mlx_lm_lora_command",
+    "ensure_mlx_lm_lora_available",
+]
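Finally, a dry-run sketch of the passthrough builders re-exported here. The argument names match the `lora_train` command above; omitting `config`/`train_type` assumes they default to `None`, and the exact argv that `build_train_command` produces lives in `mlx_lm_lora.py` (+117 lines):

```python
from mlxsmith.integrations import (
    build_mlx_lm_lora_train_command,
    run_mlx_lm_lora_command,
)

cmd = build_mlx_lm_lora_train_command(
    model="Qwen/Qwen3-4B-Instruct-2507",
    data="data/prefs",
    train_mode="dpo",
    extra_args=["--beta", "0.1"],
)
run_mlx_lm_lora_command(cmd, dry_run=True)  # show the command without executing it
```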