PyPI - metaensemble - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

metaensemble 0.2.0__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (100)

{metaensemble-0.2.0 → metaensemble-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: metaensemble
-Version: 0.2.0
+Version: 0.3.0
 Summary: A typed runtime for ensembles of cognitive agents
 License-Expression: MIT
 Requires-Python: >=3.10
@@ -31,7 +31,7 @@ Dynamic: license-file
 MetaEnsemble gives every agent a persistent ID, every handoff a schema-validated contract, and every run an entry in an append-only ledger. Multiple agents instantiated from one Role specification execute in parallel. Identities survive across sessions. Token-efficient by construction.
-**v0.2.0 status:** feedback-first release. The software records and gates local agent work, but measured quality-per-token improvements remain a product hypothesis until the live evaluation set is larger and fully baseline-comparable. See [SYSTEM-CARD.md](./docs/SYSTEM-CARD.md).
+**v0.3.0 status:** feedback-first release, now with a measured calibration. The first full-tier cycle (320 live runs, 8 cells × 8 software tasks × 5 seeds) found acceptance-quality parity with strong single-agent baselines at a 1.55× token premium on tasks that fit one context, that every protocol primitive is load-bearing (ablations degrade it — the Manifest most), and that the full protocol more than doubles the runtime's default-subagent baseline. No quality-per-token superiority is claimed for this task class. See [SYSTEM-CARD.md](./docs/SYSTEM-CARD.md) and `evals/reports/20260704T140844Z-full.md`.
 ---
@@ -181,7 +181,7 @@ See [DEPLOYMENT.md](./docs/DEPLOYMENT.md) for the per-action behaviour and the f
 ## Status
-v0.2.0. All core phases complete and tested:
+v0.3.0. All core phases complete and tested:
 - Typed substrate (Manifest YAML, Brief JSON, Ledger SQLite + JSONL).
 - Lifecycle hooks for SessionStart, PreToolUse, PostToolUse, Write/deliverable-sync, file-tool provenance, SubagentStop (background-dispatch finalization), and Stop, with command-injection invariants enforced by an audit test.
@@ -191,9 +191,9 @@ v0.2.0. All core phases complete and tested:
 - Five-axis deliverable check on successful Runs: pytest, bandit, ruff, radon, and coverage for `.py` deliverables, plus project-configured per-axis commands (`axis_commands` in `quality.yaml`) so non-Python deliverables are checked across the same correctness/security/maintainability/complexity/coverage axes; quality runners ship in the `[test]` extras so CI runs the real tools.
 - Failed-run accounting via the `interrupted` and `budget_exceeded` outcomes (schema migration 002) plus the two-layer reconcile module.
 - Ledger field completeness — every documented Ledger field (Role version, model, tool use, files touched, output, gate state, review findings) is a column with an assertion test.
-- Evaluation harness under `evals/` with replay/smoke/full tiers, Wilson confidence intervals, and `pass@budget` / `quality_per_1k_tokens` / `orchestration_overhead_ratio` metrics. The shipped replay pack is a non-empirical bootstrap fixture. Live smoke/full runs are wired for side-effect-free classification-smoke checks; calibration and baseline-superiority claims still require larger labeled/fixture sets.
+- Evaluation harness under `evals/` with replay/smoke/full tiers, Wilson confidence intervals, and `pass@budget` / `quality_per_1k_tokens` / `orchestration_overhead_ratio` metrics. The full tier runs Suite-A software tasks live in sandboxed per-run workspaces with graded acceptance and honest cell isolation; the first calibration cycle is checked in at `evals/reports/20260704T140844Z-full.md`. Domain-classification calibration still requires an independently labeled set.
-v0.2.0 is feedback-first. Issues are welcome; see [CONTRIBUTING.md](./CONTRIBUTING.md) to get started.
+v0.3.0 is feedback-first. Issues are welcome; see [CONTRIBUTING.md](./CONTRIBUTING.md) to get started.
 See [PERFORMANCE.md](./docs/PERFORMANCE.md) for the engineering contract and benchmark numbers, [SYSTEM-CARD.md](./docs/SYSTEM-CARD.md) for known limitations and intended-use boundaries, and [SECURITY.md](./SECURITY.md) for the trust model.
 Release publication is gated by [RELEASE-CHECKLIST.md](./docs/RELEASE-CHECKLIST.md).
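
The status note above describes the first full-tier cycle as 320 live runs over 8 cells × 8 software tasks × 5 seeds. A minimal sketch of how such a run matrix could be enumerated follows. The three ablation cell names and the short task labels are placeholders, since the diff names only B1–B4 and `MM_full`, and `run_matrix` is a hypothetical helper rather than part of the package.

```python
from itertools import product

# Cell names: B1-B4 and MM_full come from the diff; the three ablation
# names are placeholders because the source does not name them.
CELLS = [
    "B1", "B2", "B3", "B4", "MM_full",
    "ablation_1", "ablation_2", "ablation_3",
]
TASKS = [f"a{i}" for i in range(1, 9)]  # eight Suite-A tasks (short labels)
SEEDS = range(5)                        # five seeds per cell x task


def run_matrix(cells, tasks, seeds):
    """Return every (cell, task, seed) triple, one per live run."""
    return list(product(cells, tasks, seeds))


runs = run_matrix(CELLS, TASKS, SEEDS)
print(len(runs))  # 320
```

Each cell therefore contributes 8 × 5 = 40 runs to the cycle.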

{metaensemble-0.2.0 → metaensemble-0.3.0}/README.md RENAMED

@@ -4,7 +4,7 @@
 MetaEnsemble gives every agent a persistent ID, every handoff a schema-validated contract, and every run an entry in an append-only ledger. Multiple agents instantiated from one Role specification execute in parallel. Identities survive across sessions. Token-efficient by construction.
-**v0.2.0 status:** feedback-first release. The software records and gates local agent work, but measured quality-per-token improvements remain a product hypothesis until the live evaluation set is larger and fully baseline-comparable. See [SYSTEM-CARD.md](./docs/SYSTEM-CARD.md).
+**v0.3.0 status:** feedback-first release, now with a measured calibration. The first full-tier cycle (320 live runs, 8 cells × 8 software tasks × 5 seeds) found acceptance-quality parity with strong single-agent baselines at a 1.55× token premium on tasks that fit one context, that every protocol primitive is load-bearing (ablations degrade it — the Manifest most), and that the full protocol more than doubles the runtime's default-subagent baseline. No quality-per-token superiority is claimed for this task class. See [SYSTEM-CARD.md](./docs/SYSTEM-CARD.md) and `evals/reports/20260704T140844Z-full.md`.
 ---
@@ -154,7 +154,7 @@ See [DEPLOYMENT.md](./docs/DEPLOYMENT.md) for the per-action behaviour and the f
 ## Status
-v0.2.0. All core phases complete and tested:
+v0.3.0. All core phases complete and tested:
 - Typed substrate (Manifest YAML, Brief JSON, Ledger SQLite + JSONL).
 - Lifecycle hooks for SessionStart, PreToolUse, PostToolUse, Write/deliverable-sync, file-tool provenance, SubagentStop (background-dispatch finalization), and Stop, with command-injection invariants enforced by an audit test.
@@ -164,9 +164,9 @@ v0.2.0. All core phases complete and tested:
 - Five-axis deliverable check on successful Runs: pytest, bandit, ruff, radon, and coverage for `.py` deliverables, plus project-configured per-axis commands (`axis_commands` in `quality.yaml`) so non-Python deliverables are checked across the same correctness/security/maintainability/complexity/coverage axes; quality runners ship in the `[test]` extras so CI runs the real tools.
 - Failed-run accounting via the `interrupted` and `budget_exceeded` outcomes (schema migration 002) plus the two-layer reconcile module.
 - Ledger field completeness — every documented Ledger field (Role version, model, tool use, files touched, output, gate state, review findings) is a column with an assertion test.
-- Evaluation harness under `evals/` with replay/smoke/full tiers, Wilson confidence intervals, and `pass@budget` / `quality_per_1k_tokens` / `orchestration_overhead_ratio` metrics. The shipped replay pack is a non-empirical bootstrap fixture. Live smoke/full runs are wired for side-effect-free classification-smoke checks; calibration and baseline-superiority claims still require larger labeled/fixture sets.
+- Evaluation harness under `evals/` with replay/smoke/full tiers, Wilson confidence intervals, and `pass@budget` / `quality_per_1k_tokens` / `orchestration_overhead_ratio` metrics. The full tier runs Suite-A software tasks live in sandboxed per-run workspaces with graded acceptance and honest cell isolation; the first calibration cycle is checked in at `evals/reports/20260704T140844Z-full.md`. Domain-classification calibration still requires an independently labeled set.
-v0.2.0 is feedback-first. Issues are welcome; see [CONTRIBUTING.md](./CONTRIBUTING.md) to get started.
+v0.3.0 is feedback-first. Issues are welcome; see [CONTRIBUTING.md](./CONTRIBUTING.md) to get started.
 See [PERFORMANCE.md](./docs/PERFORMANCE.md) for the engineering contract and benchmark numbers, [SYSTEM-CARD.md](./docs/SYSTEM-CARD.md) for known limitations and intended-use boundaries, and [SECURITY.md](./SECURITY.md) for the trust model.
 Release publication is gated by [RELEASE-CHECKLIST.md](./docs/RELEASE-CHECKLIST.md).
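
The Status list above says the evaluation harness reports Wilson confidence intervals. The sketch below shows the standard Wilson score interval for a pass rate. It is not the code in `evals/runners/metrics.py`; the function name `wilson_interval` and the default `z = 1.96` (roughly a 95% interval) are assumptions made for illustration.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Hypothetical helper: `z` is the normal quantile (1.96 ~ 95%).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: a cell that passes 30 of its 40 runs (8 tasks x 5 seeds).
low, high = wilson_interval(30, 40)
print(f"pass rate 0.75, interval [{low:.3f}, {high:.3f}]")
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and remains informative when a cell passes all or none of its runs, which matters at 40 runs per cell.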

{metaensemble-0.2.0 → metaensemble-0.3.0}/evals/README.md RENAMED

@@ -23,20 +23,26 @@ evals/
 │   └── suite_b/               # domain-specific classification smoke set
 │       ├── README.md
 │       └── items.yaml
-├── baselines/                 # B1 / B2 / B3 baseline definitions
-│   ├── b1_single_agent.yaml
-│   ├── b2_single_agent_prompted.yaml
-│   ├── b3_subagent_default.yaml
-│   └── b4_best_prompt.yaml    # best-single-agent baseline
+├── fixtures/                  # deterministic Suite-A fixture repos
+│   ├── build.py               # single-commit builder; SHAs identical on every
+│   │                          # machine (pinned author/committer/date)
+│   ├── paginator/             # oss-fixture-paginator source tree
+│   └── legacy/                # oss-fixture-legacy source tree
 ├── cassettes/                 # replay fixtures; bootstrap pack is non-empirical
 ├── runners/                   # cell × seed executors
 │   ├── __init__.py
-│   ├── api.py                 # tiered runner: replay / live / smoke
-│   ├── metrics.py             # Wilson CI, pass@budget, quality_per_1k_tokens
-│   └── replay.py              # cassette-based PR runner
+│   ├── api.py                 # tiered dispatch: replay / live smoke (suite B)
+│   ├── suite_a.py             # live Suite-A: sandboxed workspaces, per-cell
+│   │                          # prompts, hook isolation via --setting-sources
+│   ├── acceptance.py          # graded acceptance checkers (build, tests, lint,
+│   │                          # API surface, links, perf, CI matrix)
+│   └── metrics.py             # Wilson CI, pass@budget, quality_per_1k_tokens
 └── reports/                   # generated reports per cycle (gitignored)
 ```
+The cell matrix (B1–B4 baselines, `MM_full`, three ablations) is defined
+in `configs/default.yaml`, not in separate baseline files.
 ## Tiered evaluation
 | Tier | When it runs | Live API calls | Budget |
@@ -54,7 +60,7 @@ The release ships a compact `evals/cassettes/bootstrap.jsonl` pack so the
 replay tier works in a clean checkout. That pack is deliberately marked
 non-empirical; it verifies the harness mechanics, not MetaEnsemble's
 quality claim. Live smoke/full reports are empirical for the cells and
-datasets actually run; the report notes any skipped deferred fixtures.
+datasets actually run.
 ## Headline metrics
@@ -88,10 +94,18 @@ open-source repos. Each task has:
 See `evals/datasets/suite_a/tasks.yaml` for the current set.
-The current Suite-A rows still contain deferred fixture SHAs. The live
-full tier names those skipped tasks in the report rather than treating
-them as passed or failed. Release certification across software tasks
-requires replacing the deferred SHAs with real fixture repositories.
+Every Suite-A row pins a resolved starting SHA: tasks a1/a2 pin the
+deterministic fixture commits from `evals/fixtures/build.py` (the builder
+produces byte-identical trees and therefore identical SHAs on every
+machine), and tasks a3–a8 pin the v0.2.0 release commit of this
+repository, with each description verified true at that SHA. Live runs
+materialize a fresh sandbox workspace per cell × task × seed (local
+clones only — no network), grade the result with
+`evals/runners/acceptance.py`, and keep every workspace on disk beside a
+`run-manifest.jsonl` for post-hoc inspection. Baseline cells run with
+`--setting-sources project,local` so the user-level MetaEnsemble hooks
+are excluded; MM cells run with all setting sources — the cell
+difference is the real orchestration layer, not only the prompt.
 ## Suite B — domain-specific classification (12 items, *smoke only*)
@@ -121,7 +135,8 @@ metaensemble eval --tier full --allow-live --cells all --seeds 5 --budget-usd 0.
 The output report lands in the current working directory at
 `evals/reports/<UTC-date>-<tier>.md` and is linked from
-`PERFORMANCE.md §4` once a cycle ships.
+`PERFORMANCE.md §4`. The first full-tier cycle shipped 2026-07-04 and is
+checked in at `evals/reports/20260704T140844Z-full.md`.
 Supported flags:

metaensemble-0.3.0/evals/datasets/suite_a/tasks.yaml ADDED

@@ -0,0 +1,157 @@
+# Suite A — eight software-engineering tasks.
+#
+# Each task gates the cell's pass-rate against measurable acceptance
+# criteria. The starting state is a frozen Git SHA so re-runs are
+# reproducible; the acceptance criteria are executed by `runners/api.py`
+# after the cell's deliverable lands.
+#
+# Starting states:
+# - `oss-fixture-*` repos are materialized by `evals/fixtures/build.py`;
+#   their SHAs are the deterministic single-commit values in
+#   `evals.fixtures.build.FIXTURE_SHAS`.
+# - `metaensemble` tasks pin the v0.2.0 release commit
+#   (`git rev-parse v0.2.0^{commit}`). Every description below was
+#   reconciled against that tree.
+tasks:
+  - id: a1_bugfix_off_by_one
+    title: Fix an off-by-one in a paginator
+    description: |
+      The paginator at `pagination.py:42` returns one fewer item than
+      expected on every page boundary. Locate and fix the bug; add a
+      regression test.
+    starting_repo: oss-fixture-paginator
+    starting_sha: cbb6c2178af85ab778dd215379bf0928b6e52268
+    acceptance:
+      - kind: build_passes
+      - kind: test_count_at_least
+        value: 5
+      - kind: lint_clean
+      - kind: file_modified
+        path: pagination.py
+      - kind: file_modified
+        path: test_pagination.py
+  - id: a2_refactor_module
+    title: Split a 680-line module into three cohesive files
+    description: |
+      The module `legacy/big_module.py` (680 lines) has three
+      responsibilities: parsing, validation, and rendering of the
+      recfile record format. Split it into three files preserving the
+      public API listed in `api_manifest.json`. No behavior change.
+    starting_repo: oss-fixture-legacy
+    starting_sha: c04afa1fb995fc47c53a7336dcb5873c4a4bdeb4
+    acceptance:
+      - kind: build_passes
+      - kind: test_count_at_least
+        value: 12
+      - kind: api_surface_preserved
+  - id: a3_doc_update
+    title: Document rollback verification in the USER-GUIDE
+    description: |
+      The "When all else fails — recovery" section of `docs/USER-GUIDE.md`
+      lists the split rollback commands (`metaensemble unadopt
+      --purge-state`, `metaensemble user-teardown --purge-state`) but
+      never tells the reader how to confirm a completed rollback.
+      `docs/DEPLOYMENT.md` covers this under "Verifying the rollback".
+      Add a short paragraph to the recovery section of
+      `docs/USER-GUIDE.md` explaining how to verify both rollback
+      scopes, with a link to the DEPLOYMENT.md section.
+    starting_repo: metaensemble
+    starting_sha: 27ac404d80312028eff49a5dca3a04338ff8f8ed
+    acceptance:
+      - kind: markdown_links_resolve
+      - kind: file_modified
+        path: docs/USER-GUIDE.md
+  - id: a4_test_addition
+    title: Add a reconcile provenance test
+    description: |
+      `metaensemble/tests/test_reconcile.py` has 20 tests covering
+      session and stale reconciliation. Reconciliation copies a pending
+      sidecar's `brief_in_path` into the recorded Run row, but no test
+      asserts it. Add a test that asserts a sidecar's `brief_in_path`
+      survives reconciliation into the Run row.
+    starting_repo: metaensemble
+    starting_sha: 27ac404d80312028eff49a5dca3a04338ff8f8ed
+    acceptance:
+      - kind: build_passes
+      - kind: test_count_delta_at_least
+        value: 1
+      - kind: file_modified
+        path: metaensemble/tests/test_reconcile.py
+  - id: a5_design_review
+    title: Review the uninstall and rollback design
+    description: |
+      Read the "Recovery and rollback" section of `docs/DEPLOYMENT.md`,
+      including its "Verifying the rollback" and "Full local rollback
+      after live testing" subsections, and produce a one-page review at
+      `reports/<date>-uninstall-review.md` naming at least three risks
+      the documented design does not address. Create the workspace-local
+      `reports/` directory if it does not exist.
+    starting_repo: metaensemble
+    starting_sha: 27ac404d80312028eff49a5dca3a04338ff8f8ed
+    acceptance:
+      - kind: file_exists
+        glob: reports/*-uninstall-review.md
+      - kind: word_count_at_least
+        value: 300
+  - id: a6_security_review
+    title: Security-review the transcript walker
+    description: |
+      `metaensemble/lib/transcript.py` reads the runtime's session
+      transcript JSONL from the hook-supplied `transcript_path`. Write
+      `reports/<date>-transcript-security.md` listing every defensive
+      assumption the walker makes and one concrete attack it survives.
+      Create the workspace-local `reports/` directory if it does not
+      exist.
+    starting_repo: metaensemble
+    starting_sha: 27ac404d80312028eff49a5dca3a04338ff8f8ed
+    acceptance:
+      - kind: file_exists
+        glob: reports/*-transcript-security.md
+  - id: a7_perf_tune
+    title: Tighten the get_window_burn p95 budget
+    description: |
+      `metaensemble/tests/test_perf_ledger.py` benchmarks
+      `get_window_burn` against the module-wide 10ms p95 budget on a
+      10k-row Ledger, and `idx_runs_window` already keeps the query
+      indexed. Measure the actual headroom, then tighten
+      `test_get_window_burn_meets_p95` to assert a dedicated 5ms p95
+      budget for `get_window_burn` without loosening any other
+      benchmark's budget.
+    starting_repo: metaensemble
+    starting_sha: 27ac404d80312028eff49a5dca3a04338ff8f8ed
+    acceptance:
+      - kind: build_passes
+      - kind: perf_benchmark_passes
+        benchmark: test_get_window_burn_meets_p95
+      - kind: file_modified
+        path: metaensemble/tests/test_perf_ledger.py
+  - id: a8_infra_change
+    title: Add the no-quality CI matrix axis
+    description: |
+      `.github/workflows/ci.yml` runs one test job over a
+      python-version matrix, and every leg installs the quality
+      runners (ruff, bandit, radon) via the `[test]` extra. Add a
+      `no-quality` matrix axis that runs pytest a second time with the
+      quality runners absent, and add `@pytest.mark.requires_radon` /
+      `requires_bandit` markers to the tests that need those tools so
+      the new leg can deselect them.
+    starting_repo: metaensemble
+    starting_sha: 27ac404d80312028eff49a5dca3a04338ff8f8ed
+    acceptance:
+      - kind: ci_yaml_has_matrix_axis
+        axis: no-quality
+# Every starting_sha above is a resolved, frozen commit: a1/a2 pin the
+# deterministic fixture commits built by `evals/fixtures/build.py`
+# (FIXTURE_SHAS), and a3-a8 pin the v0.2.0 release commit. The loader
+# needs no network round-trip, and
+# `metaensemble/tests/test_eval_fixtures.py` gates this file against
+# placeholder or drifted SHAs.
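
The closing comment of `tasks.yaml` says a test gates the file against placeholder or drifted SHAs. The contents of `metaensemble/tests/test_eval_fixtures.py` are not shown in this diff, so the following is only a sketch of what such a gate might check. `check_pins` is a hypothetical helper, and the task rows are inlined as dicts (copied from the diff above) instead of being loaded from YAML to keep the example dependency-free.

```python
import re

_FULL_SHA = re.compile(r"[0-9a-f]{40}")

# Values copied from the diff above.
FIXTURE_SHAS = {
    "oss-fixture-paginator": "cbb6c2178af85ab778dd215379bf0928b6e52268",
    "oss-fixture-legacy": "c04afa1fb995fc47c53a7336dcb5873c4a4bdeb4",
}
RELEASE_SHA = "27ac404d80312028eff49a5dca3a04338ff8f8ed"  # v0.2.0 pin for a3-a8


def check_pins(tasks, fixture_shas, release_sha):
    """Return a list of problems; an empty list means every pin is valid."""
    problems = []
    for task in tasks:
        task_id = task.get("id", "<no id>")
        repo = task.get("starting_repo", "")
        sha = task.get("starting_sha", "")
        if not _FULL_SHA.fullmatch(sha):
            problems.append(f"{task_id}: placeholder or malformed SHA {sha!r}")
        elif repo in fixture_shas:
            if sha != fixture_shas[repo]:
                problems.append(f"{task_id}: drifted from FIXTURE_SHAS[{repo!r}]")
        elif sha != release_sha:
            problems.append(f"{task_id}: does not pin the release commit")
    return problems


sample_tasks = [
    {"id": "a1_bugfix_off_by_one", "starting_repo": "oss-fixture-paginator",
     "starting_sha": "cbb6c2178af85ab778dd215379bf0928b6e52268"},
    {"id": "a3_doc_update", "starting_repo": "metaensemble",
     "starting_sha": "27ac404d80312028eff49a5dca3a04338ff8f8ed"},
]
print(check_pins(sample_tasks, FIXTURE_SHAS, RELEASE_SHA))  # []
```

Returning a list of problems instead of raising on the first one lets a single test failure report every bad pin at once.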

metaensemble-0.3.0/evals/fixtures/__init__.py ADDED

@@ -0,0 +1,7 @@
+"""Suite-A fixture source trees and the deterministic workspace builder.
+`build.build_fixture` materializes a fixture source tree into a
+single-commit git repository whose SHA is identical on every machine;
+`build.FIXTURE_SHAS` records the expected SHAs that
+`evals/datasets/suite_a/tasks.yaml` pins as `starting_sha`.
+"""

metaensemble-0.3.0/evals/fixtures/build.py ADDED

@@ -0,0 +1,140 @@
+"""Materialize Suite-A fixture workspaces as deterministic git repos.
+Each fixture source tree under ``evals/fixtures/`` is plain files with
+no ``.git``. :func:`build_fixture` copies a tree into a destination
+directory, normalizes file modes, and creates exactly one commit with a
+fixed author/committer identity and date, so the resulting commit SHA
+is identical on every machine. ``FIXTURE_SHAS`` records the expected
+SHAs; ``evals/datasets/suite_a/tasks.yaml`` pins the same values as
+``starting_sha`` for the ``oss-fixture-*`` tasks.
+Recompute the SHAs after editing a fixture source tree with::
+    python -m evals.fixtures.build --print-shas
+"""
+from __future__ import annotations
+import argparse
+import os
+import shutil
+import subprocess
+import tempfile
+from pathlib import Path
+_FIXTURES_ROOT = Path(__file__).resolve().parent
+_SOURCE_DIRS: dict[str, Path] = {
+    "oss-fixture-paginator": _FIXTURES_ROOT / "paginator",
+    "oss-fixture-legacy": _FIXTURES_ROOT / "legacy",
+}
+# Expected deterministic single-commit SHAs, produced by running this
+# builder. `metaensemble/tests/test_eval_fixtures.py` fails when a
+# fixture source tree drifts from these values without re-pinning.
+FIXTURE_SHAS: dict[str, str] = {
+    "oss-fixture-paginator": "cbb6c2178af85ab778dd215379bf0928b6e52268",
+    "oss-fixture-legacy": "c04afa1fb995fc47c53a7336dcb5873c4a4bdeb4",
+}
+# Fixed commit identity: with author, committer, and both dates pinned,
+# the commit SHA depends only on the tree contents and the message.
+_COMMIT_ENV = {
+    "GIT_AUTHOR_NAME": "MetaEnsemble Fixtures",
+    "GIT_AUTHOR_EMAIL": "fixtures@metaensemble.invalid",
+    "GIT_AUTHOR_DATE": "2026-01-01T00:00:00 +0000",
+    "GIT_COMMITTER_NAME": "MetaEnsemble Fixtures",
+    "GIT_COMMITTER_EMAIL": "fixtures@metaensemble.invalid",
+    "GIT_COMMITTER_DATE": "2026-01-01T00:00:00 +0000",
+}
+_IGNORED_NAMES = ("__pycache__", "*.pyc", ".pytest_cache", ".DS_Store", ".git")
+def _git(args: list[str], cwd: Path) -> str:
+    """Run git with the pinned identity and no user/system config."""
+    env = dict(os.environ)
+    env.update(_COMMIT_ENV)
+    # Isolate from user- and machine-level git config (gpg signing,
+    # autocrlf, templates) so the commit is byte-identical everywhere.
+    env["GIT_CONFIG_GLOBAL"] = os.devnull
+    env["GIT_CONFIG_SYSTEM"] = os.devnull
+    proc = subprocess.run(
+        ["git", *args],
+        cwd=str(cwd),
+        env=env,
+        capture_output=True,
+        text=True,
+    )
+    if proc.returncode != 0:
+        raise RuntimeError(
+            f"git {' '.join(args)} failed in {cwd}: {proc.stderr.strip()}"
+        )
+    return proc.stdout.strip()
+def build_fixture(name: str, dest: Path) -> str:
+    """Materialize fixture ``name`` into ``dest`` as a one-commit git repo.
+    ``name`` is one of ``FIXTURE_SHAS``'s keys. ``dest`` is created if
+    needed and must be empty. Returns the full 40-character commit SHA,
+    which is deterministic across machines.
+    """
+    source = _SOURCE_DIRS.get(name)
+    if source is None:
+        known = ", ".join(sorted(_SOURCE_DIRS))
+        raise ValueError(f"unknown fixture {name!r}; expected one of: {known}")
+    dest = Path(dest)
+    dest.mkdir(parents=True, exist_ok=True)
+    if any(dest.iterdir()):
+        raise ValueError(f"fixture destination {dest} is not empty")
+    shutil.copytree(
+        source,
+        dest,
+        dirs_exist_ok=True,
+        ignore=shutil.ignore_patterns(*_IGNORED_NAMES),
+    )
+    # Normalize modes so umask and checkout quirks cannot change the
+    # tree hash: directories 755, files 644.
+    for path in sorted(dest.rglob("*")):
+        if path.is_dir():
+            path.chmod(0o755)
+        elif path.is_file():
+            path.chmod(0o644)
+    _git(["init", "-q"], dest)
+    _git(["add", "-A"], dest)
+    _git(
+        ["commit", "-q", "--no-gpg-sign", "-m", f"fixture: {name} frozen starting state"],
+        dest,
+    )
+    sha = _git(["rev-parse", "HEAD"], dest)
+    if len(sha) != 40:
+        raise RuntimeError(f"unexpected rev-parse output: {sha!r}")
+    return sha
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(
+        prog="python -m evals.fixtures.build",
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "--print-shas",
+        action="store_true",
+        help="build every fixture into a temp dir and print `name sha` lines",
+    )
+    args = parser.parse_args(argv)
+    if not args.print_shas:
+        parser.print_help()
+        return 2
+    for name in sorted(_SOURCE_DIRS):
+        with tempfile.TemporaryDirectory(prefix="me-fixture-") as tmp:
+            sha = build_fixture(name, Path(tmp) / "repo")
+        expected = FIXTURE_SHAS.get(name)
+        marker = "" if sha == expected else "  (differs from FIXTURE_SHAS)"
+        print(f"{name} {sha}{marker}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())

metaensemble-0.3.0/evals/fixtures/legacy/legacy/__init__.py ADDED

@@ -0,0 +1 @@
+"""The ``legacy`` package. Public API lives in ``legacy.big_module``."""
