PyPI - driftless - Versions diffs - 0.2.4__tar.gz → 0.2.6__tar.gz - Mend

driftless 0.2.4__tar.gz → 0.2.6__tar.gz

Files changed (89)

{driftless-0.2.4 → driftless-0.2.6}/CHANGELOG.md RENAMED

@@ -17,6 +17,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ---
+## [0.2.6] - 2026-07-01
+### Added
+- **P0.3 multi-seed tuning selection** — optional `migration.split_seed_count`
+  (1–5) averages tuning-split metrics across shuffle seeds when scoring repair
+  candidates; holdout validation still uses the primary `--seed` only.
+---
+## [0.2.5] - 2026-07-01
+### Added
+- **`init-ci` label-audit workflow** — scaffold `driftless-label-audit.yml` (or
+  `-all` matrix) with `audit-labels --fail` on eval dataset path changes.
+- **`init-ci` judge-check workflow** — scaffold `driftless-judge-check.yml` when
+  `eval.judge.calibration_path` is set; uses `--enforce` when gate thresholds
+  are configured.
+---
 ## [0.2.4] - 2026-07-01
 ### Fixed
@@ -120,8 +142,10 @@ First public release on [PyPI](https://pypi.org/project/driftless/0.1.0/).
 - **Docs** — project overview, repair algorithm spec, 2×2 migration methodology,
   Poetry + Dependabot product framing.
-[Unreleased]: https://github.com/driftless-dev/driftless/compare/v0.2.4...HEAD
-[0.2.4]: https://github.com/driftless-dev/driftless/releases/tag/v0.2.4
+[Unreleased]: https://github.com/driftless-dev/driftless/compare/v0.2.6...HEAD
+[0.2.6]: https://github.com/driftless-dev/driftless/releases/tag/v0.2.6
+[0.2.5]: https://github.com/driftless-dev/driftless/compare/v0.2.5...v0.2.6
+[0.2.4]: https://github.com/driftless-dev/driftless/compare/v0.2.4...v0.2.5
 [0.2.3]: https://github.com/driftless-dev/driftless/compare/v0.2.3...v0.2.4
 [0.2.2]: https://github.com/driftless-dev/driftless/compare/v0.2.2...v0.2.3
 [0.2.1]: https://github.com/driftless-dev/driftless/releases/tag/v0.2.1

{driftless-0.2.4 → driftless-0.2.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: driftless
-Version: 0.2.4
+Version: 0.2.6
 Summary: Keep prompts in sync when model or eval data changes — Poetry-style lock regeneration, Dependabot-style PRs.
 Project-URL: Homepage, https://github.com/driftless-dev/driftless
 Project-URL: Repository, https://github.com/driftless-dev/driftless
@@ -96,7 +96,7 @@ optimizes against it, with your team owning the definition of "good":
 |---|---|
 | `init` | Scaffold a `driftless.yml`. |
 | `init-policy` | Scaffold a `.driftless/policy.yml` (when to migrate). |
-| `init-ci` | Scaffold `.github/workflows/` for scan, migrate, refine, and poll. |
+| `init-ci` | Scaffold `.github/workflows/` for scan, migrate, refine, poll, plan, label audit, and judge check. |
 | `scan` | Find probable LLM usage and at-risk models. |
 | `plan` | Discover at-risk workflows and apply the migration policy (CI triage). |
 | `plan --act` | Migrate + open a PR/issue for every actionable trigger (close the loop). |
@@ -129,11 +129,11 @@ propose it.
 ## GitHub-native usage
 A composite GitHub Action (`action.yml`) wraps the CLI so scans and migrations
-can run in CI. See `.github/workflows/` for a scheduled deprecation scan and a
-manually-triggered migration that opens a PR (or an issue when blocked).
+can run in CI. See `.github/workflows/` for a scheduled deprecation scan, weekly
+`plan --act` triage, and manually-triggered migration workflows.
 ```yaml
-- uses: driftless-dev/driftless@v0.2.4
+- uses: driftless-dev/driftless@v0.2.6
   with:
     command: scan
 ```

{driftless-0.2.4 → driftless-0.2.6}/README.md RENAMED

@@ -57,7 +57,7 @@ optimizes against it, with your team owning the definition of "good":
 |---|---|
 | `init` | Scaffold a `driftless.yml`. |
 | `init-policy` | Scaffold a `.driftless/policy.yml` (when to migrate). |
-| `init-ci` | Scaffold `.github/workflows/` for scan, migrate, refine, and poll. |
+| `init-ci` | Scaffold `.github/workflows/` for scan, migrate, refine, poll, plan, label audit, and judge check. |
 | `scan` | Find probable LLM usage and at-risk models. |
 | `plan` | Discover at-risk workflows and apply the migration policy (CI triage). |
 | `plan --act` | Migrate + open a PR/issue for every actionable trigger (close the loop). |
@@ -90,11 +90,11 @@ propose it.
 ## GitHub-native usage
 A composite GitHub Action (`action.yml`) wraps the CLI so scans and migrations
-can run in CI. See `.github/workflows/` for a scheduled deprecation scan and a
-manually-triggered migration that opens a PR (or an issue when blocked).
+can run in CI. See `.github/workflows/` for a scheduled deprecation scan, weekly
+`plan --act` triage, and manually-triggered migration workflows.
 ```yaml
-- uses: driftless-dev/driftless@v0.2.4
+- uses: driftless-dev/driftless@v0.2.6
   with:
     command: scan
 ```

{driftless-0.2.4 → driftless-0.2.6}/docs/RELEASE.md RENAMED

@@ -153,7 +153,7 @@ After a release, users can pin the composite Action by release tag
 (`action.yml` lives at the repo root — no `/action` path segment):
 ```yaml
-- uses: driftless-dev/driftless@v0.2.4
+- uses: driftless-dev/driftless@v0.2.6
   with:
     command: scan
 ```
@@ -161,9 +161,9 @@ After a release, users can pin the composite Action by release tag
 Or pin the PyPI package in the Action input:
 ```yaml
-- uses: driftless-dev/driftless@v0.2.4
+- uses: driftless-dev/driftless@v0.2.6
   with:
-    version: "==0.2.4"
+    version: "==0.2.6"
     command: migrate
 ```
@@ -171,7 +171,7 @@ Optionally maintain a floating **`v1`** tag on the latest stable minor release
 (point it at the current release tag after each publish):
 ```bash
-git tag -f v1 v0.2.4 && git push origin v1 --force
+git tag -f v1 v0.2.6 && git push origin v1 --force
 ```
 Update [`action.yml`](../action.yml) default `version` input when cutting releases.
@@ -213,7 +213,9 @@ In **Settings → Secrets and variables → Actions**, add:
 | `ANTHROPIC_API_KEY` | Live eval matrix job (`provider: anthropic`) |
 If a secret is missing, that provider job exits cleanly with a warning (CI stays
-green). When both are set, nightly runs append to
+green). On scheduled or manual runs, the **secrets-preflight** job writes a
+summary table to the workflow run so you can see which keys are configured.
+When both are set, nightly runs append to
 `.driftless/regression-metrics.jsonl` and check against
 `tests/fixtures/live_eval_baseline.json` with `--require-all`.

{driftless-0.2.4 → driftless-0.2.6}/site/docs.html RENAMED

@@ -428,7 +428,7 @@ driftless view -w support_classifier</code></pre>
     <span class="tok-k">runs-on</span>: ubuntu-latest
     <span class="tok-k">steps</span>:
       - <span class="tok-k">uses</span>: actions/checkout@v4
-      - <span class="tok-k">uses</span>: driftless-dev/driftless@v0.2.4
+      - <span class="tok-k">uses</span>: driftless-dev/driftless@v0.2.6
         <span class="tok-k">with</span>:
           <span class="tok-k">command</span>: <span class="tok-s">plan</span></code></pre>
         <p>A scheduled <code class="inline">plan</code> gates CI when a deprecated model needs attention; a manually-triggered <code class="inline">migrate</code> opens a PR (or an issue when blocked) with the evidence attached.</p>

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """driftless: Dependabot for LLM models."""
-__version__ = "0.2.4"
+__version__ = "0.2.6"

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/cli.py RENAMED

@@ -136,6 +136,16 @@ def init_ci(
     plan: bool = typer.Option(
         False, "--plan/--no-plan", help="Scaffold scheduled plan --act workflow."
     ),
+    audit_labels: bool | None = typer.Option(
+        None,
+        "--audit-labels/--no-audit-labels",
+        help="Scaffold label-audit CI workflow (default: on if labels_path is set).",
+    ),
+    judge_check: bool | None = typer.Option(
+        None,
+        "--judge-check/--no-judge-check",
+        help="Scaffold judge-calibration CI workflow (default: on if calibration_path is set).",
+    ),
 ) -> None:
     """Scaffold GitHub Actions workflows wired to the driftless composite Action."""
     from .init_ci import CHECKLIST, scaffold_ci_from_path
@@ -151,6 +161,8 @@ def init_ci(
             include_refine=refine,
             include_poll=poll,
             include_plan=plan,
+            include_audit_labels=audit_labels,
+            include_judge_check=judge_check,
         )
     except DriftlessError as exc:
         _fail(exc)

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/contract.py RENAMED

@@ -324,6 +324,16 @@ class MigrationSpec(StrictModel):
     allow_business_logic_edits: bool = False
     max_iterations: int = 8
     holdout_required: bool = True
+    # When >1, average tuning-split metrics across this many shuffle seeds
+    # (seed, seed+1, …) when scoring repair candidates. Holdout uses ``seed`` only.
+    split_seed_count: int = 1
+    @field_validator("split_seed_count")
+    @classmethod
+    def _split_seed_count_range(cls, v: int) -> int:
+        if v < 1 or v > 5:
+            raise ValueError("migration.split_seed_count must be between 1 and 5")
+        return v
 class RepairSpec(StrictModel):
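
The new field's bounds, and the way `engine.py` expands it into consecutive seeds, can be illustrated with a minimal standard-library sketch. `MigrationSettings` and `tuning_seeds` are hypothetical names introduced here for illustration; the real `MigrationSpec` is a `StrictModel` with a `field_validator`, which this sketch only approximates with a dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationSettings:
    """Hypothetical stand-in for MigrationSpec; only the new field is modelled."""

    split_seed_count: int = 1

    def __post_init__(self) -> None:
        # Same bounds as the validator in the diff: 1 through 5 inclusive.
        if self.split_seed_count < 1 or self.split_seed_count > 5:
            raise ValueError("migration.split_seed_count must be between 1 and 5")


def tuning_seeds(seed: int, settings: MigrationSettings) -> list[int]:
    """Consecutive seeds starting at the primary seed (seed, seed+1, ...)."""
    return list(range(seed, seed + settings.split_seed_count))


if __name__ == "__main__":
    print(tuning_seeds(42, MigrationSettings()))                    # [42]
    print(tuning_seeds(42, MigrationSettings(split_seed_count=3)))  # [42, 43, 44]
```

With the default of 1 the list contains only the primary seed, which matches the comment on `split_seeds_used` in `engine.py`.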

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/engine.py RENAMED

@@ -30,10 +30,10 @@ from .calibrate import suggest_thresholds
 from .compare import ThresholdCheck, check_thresholds
 from .contract import ThresholdsSpec, Workflow
 from .errors import DriftlessError
-from .evaluation import Metrics, RecordRow, RunAnalysis, analyze
+from .evaluation import Metrics, RecordRow, RunAnalysis, analyze, average_metrics
 from .harness import run_workflow
 from .progress import log as progress_log
-from .splits import make_splits, materialize_inputs
+from .splits import Split, make_splits, materialize_inputs
 # --------------------------------------------------------------------------- #
@@ -336,6 +336,8 @@ class MigrationResult:
     message: str = ""
     # Frozen editable files at loop start — baseline for per-candidate diffs in reports/UI.
     original_editable_files: dict[str, str] = field(default_factory=dict)
+    # Shuffle seeds used for tuning (primary ``seed`` only when split_seed_count==1).
+    split_seeds_used: list[int] = field(default_factory=list)
     @property
     def succeeded(self) -> bool:
@@ -516,11 +518,19 @@ def run_migration(
         )
     split = make_splits(workflow, cwd=cwd, seed=seed)
+    split_seeds_used = list(range(seed, seed + mig.split_seed_count))
     size_warnings = assess_split_sizes(
         len(split.input_lines),
         len(split.holdout_idx),
         holdout_required=mig.holdout_required,
     )
+    if mig.split_seed_count > 1:
+        size_warnings.append(
+            f"Multi-seed tuning: candidate selection averages metrics across "
+            f"{mig.split_seed_count} shuffle seeds ({split_seeds_used[0]}.."
+            f"{split_seeds_used[-1]}); each candidate scoring multiplies tuning "
+            "workflow runs."
+        )
     use_ids = bool(workflow.eval.id_field) and split.gold is not None
@@ -532,18 +542,23 @@ def run_migration(
         return judge_evidence_samples(rows)
     def evaluate_on(
-        model: str, idx: list[int], files: dict[str, str] | None = None
+        model: str,
+        idx: list[int],
+        files: dict[str, str] | None = None,
+        *,
+        split_ref: Split | None = None,
     ) -> RunAnalysis:
+        sp = split_ref or split
         file_ctx = apply_files(files, cwd=cwd) if files else nullcontext()
-        idx_lines = split.lines_for(idx)
+        idx_lines = sp.lines_for(idx)
         with materialize_inputs(workflow, idx_lines, cwd=cwd):
             with file_ctx:
                 run = run_workflow(workflow, model, cwd=cwd)
-                if use_ids:
+                if use_ids and sp.gold_ids is not None:
                     return analyze(
                         workflow,
                         run,
-                        gold_by_id=split.gold_by_id_for(idx),
+                        gold_by_id=sp.gold_by_id_for(idx),
                         inputs=idx_lines,
                         judge=judge,
                         cwd=cwd,
@@ -551,12 +566,27 @@ def run_migration(
                 return analyze(
                     workflow,
                     run,
-                    gold_labels=split.gold_for(idx),
+                    gold_labels=sp.gold_for(idx),
                     inputs=idx_lines,
                     judge=judge,
                     cwd=cwd,
                 )
+    def evaluate_tuning(
+        model: str, files: dict[str, str] | None = None
+    ) -> RunAnalysis:
+        """Score on the tuning split; average across seeds when configured."""
+        if mig.split_seed_count <= 1:
+            return evaluate_on(model, split.tuning_idx, files)
+        tuning_splits = [make_splits(workflow, cwd=cwd, seed=s) for s in split_seeds_used]
+        analyses = [
+            evaluate_on(model, sp.tuning_idx, files, split_ref=sp) for sp in tuning_splits
+        ]
+        return RunAnalysis(
+            metrics=average_metrics([a.metrics for a in analyses]),
+            rows=analyses[0].rows,
+        )
     progress_log(
         f"migration: phase 1/3 — initial eval "
         f"({len(split.tuning_idx)} tuning examples, model={current})"
@@ -597,6 +627,7 @@ def run_migration(
                 holdout_checks=holdout_checks,
                 tuning_checks=naive_checks,
                 warnings=size_warnings,
+                split_seeds_used=split_seeds_used,
                 judge_agreement=judge_agreement_info,
                 judge_evidence=_judge_evidence(naive_analysis.rows),
                 message="naive model swap passes thresholds; only the model ID changes",
@@ -691,7 +722,7 @@ def run_migration(
             cand_size = _patch_diff_size(patch.files, original_editable)
             try:
                 validate_patch_scope(patch, workflow, cwd)
-                analysis = evaluate_on(target_model, split.tuning_idx, files=patch.files)
+                analysis = evaluate_tuning(target_model, files=patch.files)
             except DriftlessError as exc:
                 experiment_log.append(
                     AttemptRecord(
@@ -786,6 +817,7 @@ def run_migration(
                     experiment_log=experiment_log,
                     cluster_history=cluster_history,
                     warnings=size_warnings,
+                split_seeds_used=split_seeds_used,
                     judge_agreement=judge_agreement_info,
                     judge_evidence=_judge_evidence(best_analysis.rows),
                     original_editable_files=original_editable,
@@ -855,6 +887,7 @@ def run_migration(
             experiment_log=experiment_log,
             cluster_history=cluster_history,
             warnings=size_warnings,
+            split_seeds_used=split_seeds_used,
             suggested_thresholds=suggested,
             judge_agreement=judge_agreement_info,
             judge_evidence=_judge_evidence(best_analysis.rows),
@@ -887,6 +920,7 @@ def run_migration(
         experiment_log=experiment_log,
         cluster_history=cluster_history,
         warnings=size_warnings,
+        split_seeds_used=split_seeds_used,
         judge_agreement=judge_agreement_info,
         judge_evidence=_judge_evidence(best_analysis.rows),
         original_editable_files=original_editable,
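
The `evaluate_tuning` helper added above can be pictured with a toy sketch: build one shuffled tuning split per seed, score the candidate on each, and average. Everything here is an assumption made for illustration (`make_tuning_idx`, `score_candidate`, the 50% tuning fraction, and the scalar score); the real code calls `make_splits` and averages full `Metrics` objects.

```python
import random
from statistics import mean
from typing import Callable


def make_tuning_idx(n_examples: int, seed: int, tuning_fraction: float = 0.5) -> list[int]:
    """Hypothetical split: shuffle indices with `seed`, keep the first part for tuning."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return sorted(idx[: int(n_examples * tuning_fraction)])


def score_candidate(
    evaluate: Callable[[list[int]], float],
    n_examples: int,
    seed: int,
    split_seed_count: int = 1,
) -> tuple[float, list[int]]:
    """Average a candidate's tuning score over consecutive shuffle seeds."""
    seeds = list(range(seed, seed + split_seed_count))
    # One full evaluation per seed, so cost grows linearly with split_seed_count.
    scores = [evaluate(make_tuning_idx(n_examples, s)) for s in seeds]
    return mean(scores), seeds


if __name__ == "__main__":
    def toy_accuracy(idx: list[int]) -> float:
        # Pretend the candidate is correct only on even-numbered examples.
        return sum(1 for i in idx if i % 2 == 0) / len(idx)

    score, seeds = score_candidate(toy_accuracy, n_examples=20, seed=7, split_seed_count=3)
    print(seeds, round(score, 3))
```

In the diff itself, the averaged `RunAnalysis` keeps `rows` from the first seed's analysis only, so per-row evidence reflects the primary split while the headline metrics are averaged across seeds.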

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/evaluation.py RENAMED

@@ -96,6 +96,39 @@ class Metrics:
     scored: int = 0
+def average_metrics(items: list[Metrics]) -> Metrics:
+    """Mean of headline metrics across multiple eval runs (multi-seed tuning)."""
+    if not items:
+        raise ValueError("average_metrics requires at least one Metrics")
+    if len(items) == 1:
+        return items[0]
+    def _mean(vals: list[float | None]) -> float | None:
+        nums = [v for v in vals if v is not None]
+        return sum(nums) / len(nums) if nums else None
+    def _mean_int(vals: list[int]) -> int:
+        return int(round(sum(vals) / len(vals)))
+    costs = [m.total_cost for m in items if m.total_cost is not None]
+    return Metrics(
+        n=items[0].n,
+        schema_error_rate=_mean([m.schema_error_rate for m in items]),
+        refusal_rate=_mean([m.refusal_rate for m in items]) or 0.0,
+        accuracy=_mean([m.accuracy for m in items]),
+        precision=_mean([m.precision for m in items]),
+        recall=_mean([m.recall for m in items]),
+        f1=_mean([m.f1 for m in items]),
+        avg_latency_ms=_mean([m.avg_latency_ms for m in items]),
+        total_cost=sum(costs) if costs else None,
+        score=_mean([m.score for m in items]),
+        schema_errors=_mean_int([m.schema_errors for m in items]),
+        refusals=_mean_int([m.refusals for m in items]),
+        labeled=items[0].labeled,
+        scored=items[0].scored,
+    )
 def load_jsonl(path: Path) -> list[OutputRecord]:
     records: list[OutputRecord] = []
     with path.open(encoding="utf-8") as fh:
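
A few details of `average_metrics` are easy to miss when reading the hunk: `None` values are skipped rather than treated as zero, `refusal_rate` falls back to `0.0`, `total_cost` is summed rather than averaged, and integer counts go through Python's `round`, which rounds halves to the nearest even number. The sketch below reproduces those rules on a trimmed-down, hypothetical `MiniMetrics` type; it is not the library's `Metrics` class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniMetrics:
    """Hypothetical, trimmed-down stand-in for Metrics."""

    accuracy: Optional[float]
    refusal_rate: Optional[float]
    total_cost: Optional[float]
    schema_errors: int


def mean_skip_none(vals: list) -> Optional[float]:
    """Mean of the non-None values, or None when nothing was measured."""
    nums = [v for v in vals if v is not None]
    return sum(nums) / len(nums) if nums else None


def average(items: list) -> MiniMetrics:
    costs = [m.total_cost for m in items if m.total_cost is not None]
    return MiniMetrics(
        accuracy=mean_skip_none([m.accuracy for m in items]),
        # `or 0.0` turns a missing refusal rate into 0.0.
        refusal_rate=mean_skip_none([m.refusal_rate for m in items]) or 0.0,
        # Cost is the total spent across runs, not a per-run mean.
        total_cost=sum(costs) if costs else None,
        schema_errors=int(round(sum(m.schema_errors for m in items) / len(items))),
    )


if __name__ == "__main__":
    runs = [
        MiniMetrics(accuracy=0.75, refusal_rate=None, total_cost=0.25, schema_errors=2),
        MiniMetrics(accuracy=None, refusal_rate=None, total_cost=0.5, schema_errors=3),
    ]
    print(average(runs))
    # MiniMetrics(accuracy=0.75, refusal_rate=0.0, total_cost=0.75, schema_errors=2)
```

The last field shows the rounding rule: the mean of 2 and 3 is 2.5, and `round(2.5)` is 2 in Python 3.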

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/init_ci.py RENAMED

@@ -2,6 +2,7 @@
 from __future__ import annotations
+from dataclasses import dataclass
 from pathlib import Path
 from . import __version__
@@ -203,6 +204,204 @@ jobs:
 """
+def label_audit_workflows(contract: Contract) -> list[str]:
+    """Workflow names eligible for gold-label auditing (classification + labels_path)."""
+    names: list[str] = []
+    for name, wf in contract.workflows.items():
+        if wf.eval.grading != "label":
+            continue
+        if not wf.eval.labels_path:
+            continue
+        names.append(name)
+    return names
+def label_audit_paths(contract: Contract) -> list[str]:
+    """Union of dataset paths for workflows included in label audit."""
+    paths: list[str] = []
+    for name in label_audit_workflows(contract):
+        for path in dataset_paths(contract.workflows[name]):
+            if path not in paths:
+                paths.append(path)
+    return paths
+def render_audit_labels_workflow(
+    action_ref: str,
+    workflow_names: list[str],
+    paths: list[str],
+) -> str:
+    if not workflow_names:
+        raise ValueError("workflow_names must not be empty")
+    title = (
+        f"driftless label audit ({workflow_names[0]})"
+        if len(workflow_names) == 1
+        else "driftless label audit"
+    )
+    if len(workflow_names) == 1:
+        matrix_block = ""
+        workflow_arg = workflow_names[0]
+        workflow_step = f"""\
+      - name: Audit gold labels ({workflow_names[0]})
+        uses: {action_ref}
+        with:
+          command: audit-labels
+          workflow: {workflow_arg}
+          args: "--fail"
+"""
+    else:
+        matrix_yaml = "\n".join(f"          - {name!r}" for name in workflow_names)
+        matrix_block = f"""\
+    strategy:
+      fail-fast: false
+      matrix:
+        workflow:
+{matrix_yaml}
+"""
+        workflow_step = f"""\
+      - name: Audit gold labels (${{{{ matrix.workflow }}}})
+        uses: {action_ref}
+        with:
+          command: audit-labels
+          workflow: ${{{{ matrix.workflow }}}}
+          args: "--fail"
+"""
+    return f"""\
+name: {title}
+# Fail CI when duplicate/near-duplicate inputs carry disagreeing gold labels.
+on:
+  pull_request:
+    paths:
+{_path_filter_block(paths)}\
+  push:
+    branches: [main]
+    paths:
+{_path_filter_block(paths)}\
+  workflow_dispatch:
+jobs:
+  audit:
+    runs-on: ubuntu-latest
+{matrix_block}\
+    steps:
+      - uses: actions/checkout@v4
+{workflow_step}\
+"""
+@dataclass(frozen=True)
+class JudgeCheckTarget:
+    name: str
+    calibration_path: str
+    enforce: bool
+def judge_check_targets(contract: Contract) -> list[JudgeCheckTarget]:
+    """Judge-graded workflows with a human calibration set configured."""
+    targets: list[JudgeCheckTarget] = []
+    for name, wf in contract.workflows.items():
+        if wf.eval.grading != "judge" or wf.eval.judge is None:
+            continue
+        spec = wf.eval.judge
+        if not spec.calibration_path:
+            continue
+        enforce = spec.max_mae is not None or spec.min_correlation is not None
+        targets.append(
+            JudgeCheckTarget(
+                name=name,
+                calibration_path=spec.calibration_path,
+                enforce=enforce,
+            )
+        )
+    return targets
+def judge_check_paths(contract: Contract) -> list[str]:
+    paths: list[str] = []
+    for target in judge_check_targets(contract):
+        if target.calibration_path not in paths:
+            paths.append(target.calibration_path)
+    return paths
+def render_judge_check_workflow(
+    action_ref: str,
+    targets: list[JudgeCheckTarget],
+    paths: list[str],
+) -> str:
+    if not targets:
+        raise ValueError("targets must not be empty")
+    title = (
+        f"driftless judge check ({targets[0].name})"
+        if len(targets) == 1
+        else "driftless judge check"
+    )
+    if len(targets) == 1:
+        target = targets[0]
+        matrix_block = ""
+        args = '"--enforce"' if target.enforce else '""'
+        workflow_step = f"""\
+      - name: Judge calibration check ({target.name})
+        uses: {action_ref}
+        with:
+          command: judge-check
+          workflow: {target.name}
+          args: {args}
+        env:
+{_provider_env_block()}\
+"""
+    else:
+        include_lines: list[str] = []
+        for target in targets:
+            args = '"--enforce"' if target.enforce else '""'
+            include_lines.append(
+                f"          - workflow: {target.name!r}\n"
+                f"            args: {args}"
+            )
+        matrix_block = (
+            "    strategy:\n"
+            "      fail-fast: false\n"
+            "      matrix:\n"
+            "        include:\n"
+            + "\n".join(include_lines)
+            + "\n\n"
+        )
+        workflow_step = f"""\
+      - name: Judge calibration check (${{{{ matrix.workflow }}}})
+        uses: {action_ref}
+        with:
+          command: judge-check
+          workflow: ${{{{ matrix.workflow }}}}
+          args: ${{{{ matrix.args }}}}
+        env:
+{_provider_env_block()}\
+"""
+    return f"""\
+name: {title}
+# Measure LLM-judge agreement against human-scored calibration records.
+on:
+  pull_request:
+    paths:
+{_path_filter_block(paths)}\
+  push:
+    branches: [main]
+    paths:
+{_path_filter_block(paths)}\
+  workflow_dispatch:
+jobs:
+  judge-check:
+    runs-on: ubuntu-latest
+{matrix_block}\
+    steps:
+      - uses: actions/checkout@v4
+{workflow_step}\
+"""
 def render_plan_workflow(action_ref: str) -> str:
     return f"""\
 name: driftless plan (deprecation triage)
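
The quadruple braces in the templates above are f-string escaping: `{{{{` renders as `{{`, so `${{{{ matrix.workflow }}}}` becomes the GitHub Actions expression `${{ matrix.workflow }}` in the generated YAML. A minimal sketch of the two rendering paths follows; `render_step` and `matrix_lines` are hypothetical, simplified helpers written for this note, not functions from `init_ci.py`.

```python
def matrix_lines(workflow_names: list[str]) -> str:
    """Render matrix entries; !r wraps each name in quotes via repr()."""
    return "\n".join(f"          - {name!r}" for name in workflow_names)


def render_step(action_ref: str, workflow_names: list[str]) -> str:
    """Hypothetical, trimmed-down version of the audit step rendering."""
    if len(workflow_names) == 1:
        # Single workflow: the name is written straight into the YAML.
        workflow_value = workflow_names[0]
    else:
        # Four braces in an f-string produce two literal braces in the output.
        workflow_value = f"${{{{ matrix.workflow }}}}"
    return (
        f"      - uses: {action_ref}\n"
        f"        with:\n"
        f"          command: audit-labels\n"
        f"          workflow: {workflow_value}\n"
        f'          args: "--fail"\n'
    )


if __name__ == "__main__":
    ref = "driftless-dev/driftless@v0.2.6"
    print(matrix_lines(["support_classifier", "router"]))
    print(render_step(ref, ["support_classifier", "router"]))
```

For plain names, `repr()` yields single-quoted strings, which YAML reads as quoted scalars. Names that contain quote characters would need separate handling, which this sketch does not attempt.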
@@ -251,6 +450,8 @@ def scaffold_ci(
     include_refine: bool = True,
     include_poll: bool | None = None,
     include_plan: bool = False,
+    include_audit_labels: bool | None = None,
+    include_judge_check: bool | None = None,
 ) -> list[Path]:
     """Write GitHub workflow YAML files under ``out_dir``."""
     action_ref = action_ref or default_action_ref()
@@ -293,10 +494,52 @@ def scaffold_ci(
     if include_plan:
         write(out_dir / "driftless-plan-act.yml", render_plan_workflow(action_ref))
+    audit_names = label_audit_workflows(contract)
+    audit_needed = include_audit_labels
+    if audit_needed is None:
+        audit_needed = bool(audit_names)
+    if audit_needed:
+        if not audit_names:
+            raise DriftlessError(
+                "label audit workflow requires a classification workflow with eval.labels_path",
+                hint="add labels_path to a workflow or pass --no-audit-labels",
+            )
+        audit_paths = label_audit_paths(contract)
+        fname = (
+            "driftless-label-audit.yml"
+            if len(audit_names) == 1
+            else "driftless-label-audit-all.yml"
+        )
+        write(
+            out_dir / fname,
+            render_audit_labels_workflow(action_ref, audit_names, audit_paths),
+        )
+    judge_targets = judge_check_targets(contract)
+    judge_needed = include_judge_check
+    if judge_needed is None:
+        judge_needed = bool(judge_targets)
+    if judge_needed:
+        if not judge_targets:
+            raise DriftlessError(
+                "judge-check workflow requires eval.judge.calibration_path",
+                hint="add a human-scored calibration set or pass --no-judge-check",
+            )
+        judge_paths = judge_check_paths(contract)
+        fname = (
+            "driftless-judge-check.yml"
+            if len(judge_targets) == 1
+            else "driftless-judge-check-all.yml"
+        )
+        write(
+            out_dir / fname,
+            render_judge_check_workflow(action_ref, judge_targets, judge_paths),
+        )
     if not written:
         raise DriftlessError(
             "nothing to scaffold",
-            hint="enable at least one of scan, migrate, refine, poll, or plan",
+            hint="enable at least one of scan, migrate, refine, poll, plan, audit-labels, or judge-check",
         )
     return written
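
Both new blocks in `scaffold_ci` follow the same tri-state pattern: `None` means "auto-detect from the contract", `True` means "required, fail if nothing is eligible", and `False` means "skip". The sketch below isolates that logic along with the file-name choice. `resolve_toggle`, `workflow_filename`, and `ScaffoldError` are hypothetical names used only for illustration.

```python
from typing import Optional


class ScaffoldError(Exception):
    """Hypothetical stand-in for DriftlessError."""


def resolve_toggle(requested: Optional[bool], eligible: list[str], what: str) -> bool:
    """None = auto-detect from config, True = require, False = skip."""
    needed = bool(eligible) if requested is None else requested
    if needed and not eligible:
        # An explicit request with nothing to act on is an error, not a no-op.
        raise ScaffoldError(f"{what} requested but no eligible workflows are configured")
    return needed


def workflow_filename(stem: str, count: int) -> str:
    """Single-workflow file name, or the '-all' matrix variant for several."""
    return f"{stem}.yml" if count == 1 else f"{stem}-all.yml"


if __name__ == "__main__":
    print(resolve_toggle(None, [], "label audit"))                    # False
    print(resolve_toggle(None, ["support_classifier"], "label audit"))  # True
    print(workflow_filename("driftless-label-audit", 1))  # driftless-label-audit.yml
    print(workflow_filename("driftless-label-audit", 3))  # driftless-label-audit-all.yml
```

Keeping `None` distinct from `False` is what lets the CLI options default to "on if the relevant path is set" while still allowing `--no-audit-labels` and `--no-judge-check` to opt out.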
@@ -321,5 +564,7 @@ Next steps:
   2. For poll workflows: DRIFTLESS_DATASOURCE_TOKEN if eval.data_source URLs need auth.
   3. Confirm workflow path filters match your eval dataset paths in driftless.yml.
   4. Run driftless validate -w <workflow> locally before enabling scheduled jobs.
-  5. Pin the Action ref when upgrading: uses: driftless-dev/driftless@vX.Y.Z
+  5. Run driftless audit-labels -w <workflow> locally; CI uses --fail on label conflicts.
+  6. For judge-graded workflows: driftless judge-check -w <workflow> --enforce when gates are set.
+  7. Pin the Action ref when upgrading: uses: driftless-dev/driftless@vX.Y.Z
 """

{driftless-0.2.4 → driftless-0.2.6}/src/driftless/report.py RENAMED

@@ -568,6 +568,7 @@ def result_to_dict(result: MigrationResult) -> dict:
         "experiment_log": [asdict(a) for a in result.experiment_log],
         "cluster_trajectory": cluster_trajectories(result.cluster_history),
         "warnings": result.warnings,
+        "split_seeds_used": result.split_seeds_used,
         "judge_agreement": asdict(result.judge_agreement) if result.judge_agreement else None,
         "judge_evidence": result.judge_evidence,
         "suggested_thresholds": result.suggested_thresholds,
