PyPI - leanlab - Versions diffs - 0.2.1__tar.gz → 0.2.2__tar.gz - Mend

leanlab 0.2.1.tar.gz → 0.2.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (127)

{leanlab-0.2.1 → leanlab-0.2.2}/.github/RELEASING.md RENAMED

@@ -22,19 +22,25 @@ Do this once, before the first release.
 ## Cut a release
-1. Bump the version in **three** places (keep them in sync):
-   - `pyproject.toml` → `version`
-   - `frontend/package.json` → `version`
-   - `leanlab/cli.py` → `_version()` fallback
-2. Move the `## [Unreleased]` notes into a new dated section in `CHANGELOG.md`.
-3. Commit, then tag and push:
-   ```bash
-   git commit -am "release: v0.2.2"
-   git tag v0.2.2
-   git push origin main --tags
-   ```
-4. The `Publish to PyPI` workflow runs: build UI → build wheel → publish → release.
-   Watch it under the repo's **Actions** tab.
+One command does everything — bump (all 3 version spots), roll the CHANGELOG,
+run the tests, commit, tag, and push:
+```bash
+uv run python scripts/release.py patch     # 0.2.1 -> 0.2.2  (or: minor | major | X.Y.Z)
+```
+It prints the change, asks before pushing, and on confirm pushes `main` + the
+tag — which triggers `publish.yml` (build UI → build wheel → publish to PyPI →
+GitHub Release). Watch it under the repo's **Actions** tab.
+Flags: `--dry-run` (show changes, write nothing) · `--skip-tests` · `--yes`
+(push without the prompt). Before running, write your release notes under
+`## [Unreleased]` in `CHANGELOG.md` — the script moves them into the new version
+section for you.
+Doing it by hand instead: bump `version` in `pyproject.toml`,
+`frontend/package.json`, and `leanlab/cli.py` (`_version()` fallback); move the
+CHANGELOG notes; then `git tag vX.Y.Z && git push origin main vX.Y.Z`.
 ## Verify
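The version arithmetic behind `release.py patch | minor | major | X.Y.Z` can be sketched as follows. This is a minimal illustration, not the script's actual code (the diff does not include `scripts/release.py`); `bump_version` is a hypothetical name, and the sketch assumes plain `X.Y.Z` versions with no pre-release suffix.

```python
import re

# Plain X.Y.Z only; pre-release and build suffixes are out of scope for this sketch.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(current: str, part: str) -> str:
    """Return the next version for part ("patch", "minor", "major") or an explicit "X.Y.Z"."""
    match = _SEMVER.match(current)
    if match is None:
        raise ValueError(f"not a plain X.Y.Z version: {current!r}")
    major, minor, patch = (int(g) for g in match.groups())

    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "major":
        return f"{major + 1}.0.0"

    # Otherwise treat `part` as an explicit target version (hypothetical rule: it must be newer).
    target = _SEMVER.match(part)
    if target is None:
        raise ValueError(f"expected patch | minor | major | X.Y.Z, got {part!r}")
    if tuple(int(g) for g in target.groups()) <= (major, minor, patch):
        raise ValueError(f"{part} is not newer than {current}")
    return part


print(bump_version("0.2.1", "patch"))  # 0.2.2
```

Rejecting an explicit version that is not newer than the current one is a safety choice made in this sketch; the diff does not say whether the real script checks this.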

{leanlab-0.2.1 → leanlab-0.2.2}/.github/workflows/ci.yml RENAMED

@@ -15,9 +15,9 @@ jobs:
     name: Lint (ruff)
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v7
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
       - name: Ruff
         run: uvx ruff check leanlab tests
@@ -29,10 +29,10 @@ jobs:
       matrix:
         python: ["3.11", "3.12", "3.13"]
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v7
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
         with:
           python-version: ${{ matrix.python }}
           enable-cache: true
@@ -47,9 +47,9 @@ jobs:
     name: Build (wheel + UI)
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v7
-      - uses: actions/setup-node@v4
+      - uses: actions/setup-node@v6
         with:
           node-version: "20"
           cache: npm
@@ -62,7 +62,7 @@ jobs:
           npm run build
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
       - name: Build sdist + wheel
         run: uv build
@@ -73,7 +73,7 @@ jobs:
             && echo "✓ board_dist bundled in the wheel" \
             || (echo "✗ board_dist missing from the wheel" && exit 1)
-      - uses: actions/upload-artifact@v4
+      - uses: actions/upload-artifact@v7
         with:
           name: dist
           path: dist/*
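The wheel check in the `Build (wheel + UI)` job is only partly visible in this hunk. A rough Python equivalent could look like the sketch below; it assumes the compiled UI sits under `leanlab/core/coding/board_dist/`, as the structure in `docs/OVERVIEW.md` describes, and `wheel_has_board_dist` and `report` are hypothetical names.

```python
import zipfile

# Assumed location of the compiled React board inside the wheel (from the project structure).
BOARD_PREFIX = "leanlab/core/coding/board_dist/"


def wheel_has_board_dist(wheel_path: str) -> bool:
    """True if the wheel (a zip archive) contains at least one file under board_dist/."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return any(
            name.startswith(BOARD_PREFIX) and not name.endswith("/")
            for name in wheel.namelist()
        )


def report(wheel_path: str) -> int:
    """Print the same pass/fail messages as the CI step and return a process exit code."""
    if wheel_has_board_dist(wheel_path):
        print("✓ board_dist bundled in the wheel")
        return 0
    print("✗ board_dist missing from the wheel")
    return 1
```

Calling `sys.exit(report("dist/leanlab-0.2.2-py3-none-any.whl"))` would mirror the step's pass/fail behaviour; that wheel file name is illustrative.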

{leanlab-0.2.1 → leanlab-0.2.2}/.github/workflows/publish.yml RENAMED

@@ -18,10 +18,10 @@ jobs:
       id-token: write          # OIDC: PyPI Trusted Publishing mints a short-lived token
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v7
       # The wheel must ship the compiled React board (board_dist/), so build it first.
-      - uses: actions/setup-node@v4
+      - uses: actions/setup-node@v6
         with:
           node-version: "20"
           cache: npm
@@ -33,7 +33,7 @@ jobs:
           npm run build
       - name: Install uv
-        uses: astral-sh/setup-uv@v5
+        uses: astral-sh/setup-uv@v7
       - name: Build sdist + wheel
         run: uv build
@@ -66,7 +66,7 @@ jobs:
           echo "path=/tmp/release-notes.md" >> "$GITHUB_OUTPUT"
       - name: Create GitHub Release
-        uses: softprops/action-gh-release@v2
+        uses: softprops/action-gh-release@v3
         with:
           tag_name: ${{ github.ref_name }}
           name: ${{ github.ref_name }}
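The step that writes `/tmp/release-notes.md` is elided here. One plausible way to produce that file is to cut the matching `## [X.Y.Z]` section out of `CHANGELOG.md`; the sketch below assumes that approach, and `release_notes` is a hypothetical helper rather than the workflow's actual script.

```python
import re

# Link reference definitions at the bottom of the file, e.g. "[0.2.1]: https://...".
_LINK_DEF = re.compile(r"^\[[^\]]+\]: ")


def release_notes(changelog: str, tag: str) -> str:
    """Return the notes under `## [X.Y.Z]` for a tag such as "v0.2.2" ("" if absent)."""
    heading = f"## [{tag.removeprefix('v')}]"
    notes: list[str] = []
    inside = False
    for line in changelog.splitlines():
        if line.startswith("## ["):
            if inside:
                break  # the next version's section begins
            inside = line.startswith(heading)
            continue
        if inside and _LINK_DEF.match(line):
            break  # reached the link definitions after the last section
        if inside:
            notes.append(line)
    return "\n".join(notes).strip()
```

Including the closing bracket in `heading` keeps `v0.2.1` from matching a `## [0.2.10]` section. In a workflow step, the tag would come from `github.ref_name`.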

{leanlab-0.2.1 → leanlab-0.2.2}/CHANGELOG.md RENAMED

@@ -6,6 +6,16 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+## [0.2.2] - 2026-06-26
+### Added
+- One-command release script (`scripts/release.py`) and a `ruff check` lint job in CI.
+### Changed
+- README is now user-facing (PyPI install + quick start for both lab types). The
+  project concept, structure, two-lab mapping, and coding-lab flow moved to
+  `docs/OVERVIEW.md`.
 ## [0.2.1] - 2026-06-26
 ### Fixed
@@ -35,7 +45,8 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   evaluator) and coding labs (spec-writer → engineer → gate → reviewer →
   tech-lead), a live dashboard, and a Claude Code skill (`leanlab init --for-agent`).
-[Unreleased]: https://github.com/bacharSalleh/leanlab/compare/v0.2.1...HEAD
+[Unreleased]: https://github.com/bacharSalleh/leanlab/compare/v0.2.2...HEAD
+[0.2.2]: https://github.com/bacharSalleh/leanlab/compare/v0.2.1...v0.2.2
 [0.2.1]: https://github.com/bacharSalleh/leanlab/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/bacharSalleh/leanlab/compare/v0.1.0...v0.2.0
 [0.1.0]: https://github.com/bacharSalleh/leanlab/releases/tag/v0.1.0
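The CHANGELOG hunk above shows what rolling the changelog means. A dated heading is opened directly under `## [Unreleased]`, so the notes written there now belong to the new version, and the compare links shift forward by one release. A minimal sketch of that transformation follows; `roll_changelog` is hypothetical, not the code in `scripts/release.py`, and it assumes the Keep a Changelog layout this file uses.

```python
def roll_changelog(text: str, new: str, prev: str, date: str, repo: str) -> str:
    """Open a `## [new] - date` section under Unreleased and update the compare links."""
    out: list[str] = []
    for line in text.splitlines():
        if line.strip() == "## [Unreleased]":
            # Notes written under Unreleased now fall under the new version heading.
            out += [line, "", f"## [{new}] - {date}"]
        elif line.startswith("[Unreleased]: "):
            # Unreleased now compares from the new tag; the new version gets its own link.
            out.append(f"[Unreleased]: {repo}/compare/v{new}...HEAD")
            out.append(f"[{new}]: {repo}/compare/v{prev}...v{new}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

For this release the call would be `roll_changelog(text, "0.2.2", "0.2.1", "2026-06-26", "https://github.com/bacharSalleh/leanlab")`, which yields the `+` lines shown in the hunk.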

leanlab-0.2.2/PKG-INFO ADDED

@@ -0,0 +1,110 @@
+Metadata-Version: 2.4
+Name: leanlab
+Version: 0.2.2
+Summary: A self-improving lab for AI agents — evolve ML experiments against a frozen metric, or ship coding tasks through a spec → gate → review → merge loop with locked acceptance tests.
+Project-URL: Homepage, https://github.com/bacharSalleh/leanlab
+Project-URL: Repository, https://github.com/bacharSalleh/leanlab
+Project-URL: Issues, https://github.com/bacharSalleh/leanlab/issues
+Project-URL: Changelog, https://github.com/bacharSalleh/leanlab/blob/main/CHANGELOG.md
+Author-email: Bashar <welcomebachar@gmail.com>
+License: MIT License
+        Copyright (c) 2026 Bashar
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+License-File: LICENSE
+Keywords: agents,claude,cli,coding-agent,evaluation,experiment,lab,llm,self-improving
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Software Development :: Testing
+Requires-Python: >=3.11
+Requires-Dist: questionary>=2
+Requires-Dist: rich>=13
+Description-Content-Type: text/markdown
+# leanlab
+[![PyPI](https://img.shields.io/pypi/v/leanlab.svg)](https://pypi.org/project/leanlab/)
+[![CI](https://github.com/bacharSalleh/leanlab/actions/workflows/ci.yml/badge.svg)](https://github.com/bacharSalleh/leanlab/actions/workflows/ci.yml)
+[![Python](https://img.shields.io/pypi/pyversions/leanlab.svg)](https://pypi.org/project/leanlab/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+**Self-improving labs for AI agents.** Point leanlab at a task and a team of
+Claude agents iterates toward a goal — evolving ML / optimization experiments
+against a frozen metric, or shipping coding tasks through a
+spec → gate → review → merge loop with locked acceptance tests.
+## Install
+```bash
+pipx install leanlab        # or:  pip install leanlab   ·   uvx leanlab
+```
+Requires **Python 3.11+** and the **`claude` CLI** (the agents run on Claude Code).
+## Quick start
+leanlab runs **inside your own project** — each lab lives in a `.leanlab/<name>/`
+folder; the engine stays in the installed tool.
+**Metric lab** — evolve a number (ML, optimization, anything that prints a score):
+```bash
+cd ~/my-project
+leanlab init iris          # describe the task; Claude drafts the lab + scorer
+leanlab check iris         # verify it's wired correctly (free)
+leanlab lock iris          # freeze the scorer
+leanlab run iris --n 5     # the agents evolve experiments (uses Claude)
+leanlab serve iris         # watch the live dashboard
+```
+**Coding lab** — ship a coding task with locked acceptance tests:
+```bash
+cd ~/my-repo                              # a git repository
+leanlab spec "add a /health endpoint"    # spec-writer drafts + locks the tests
+leanlab build add-health                 # engineer → gate → reviewer → merge
+leanlab board                            # live board: tasks, timeline, playbook
+```
+## Let Claude Code drive it
+```bash
+cd ~/my-project && leanlab init --for-agent   # installs a Claude Code skill
+```
+Then just ask Claude Code — *"use leanlab to add a /health endpoint"* — and it
+specs, builds, and merges through the honest test gate for you.
+## Docs
+- **[docs/USAGE.md](docs/USAGE.md)** — every command, in order, with examples.
+- **[docs/OVERVIEW.md](docs/OVERVIEW.md)** — how it works: the loop, the two lab
+  types, the coding-lab flow, and the project structure.
+- **[CONTRIBUTING.md](CONTRIBUTING.md)** — local development (uv, tests, the React board).
+MIT licensed — see [LICENSE](LICENSE).
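The metadata declares `Requires-Python: >=3.11`, and the description adds a second prerequisite, the `claude` CLI. Both can be checked up front. The function below is a hypothetical stand-in for that kind of preflight (leanlab's own checks live in `core/doctor.py`, which this diff does not show). It only tests the Python version and whether `claude` is on `PATH`, and it takes both as parameters so it can be exercised without the real environment.

```python
from __future__ import annotations

import shutil
import sys
from typing import Callable, Optional


def preflight(
    version_info: tuple[int, ...] = tuple(sys.version_info[:3]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means both prerequisites are met."""
    problems: list[str] = []
    if tuple(version_info[:2]) < (3, 11):
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"Python 3.11+ is required, found {found}")
    if which("claude") is None:
        problems.append("the `claude` CLI is not on PATH (the agents run on Claude Code)")
    return problems
```

With the defaults, `preflight()` inspects the running interpreter and the real `PATH`.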

leanlab-0.2.2/README.md ADDED

@@ -0,0 +1,62 @@
+# leanlab
+[![PyPI](https://img.shields.io/pypi/v/leanlab.svg)](https://pypi.org/project/leanlab/)
+[![CI](https://github.com/bacharSalleh/leanlab/actions/workflows/ci.yml/badge.svg)](https://github.com/bacharSalleh/leanlab/actions/workflows/ci.yml)
+[![Python](https://img.shields.io/pypi/pyversions/leanlab.svg)](https://pypi.org/project/leanlab/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+**Self-improving labs for AI agents.** Point leanlab at a task and a team of
+Claude agents iterates toward a goal — evolving ML / optimization experiments
+against a frozen metric, or shipping coding tasks through a
+spec → gate → review → merge loop with locked acceptance tests.
+## Install
+```bash
+pipx install leanlab        # or:  pip install leanlab   ·   uvx leanlab
+```
+Requires **Python 3.11+** and the **`claude` CLI** (the agents run on Claude Code).
+## Quick start
+leanlab runs **inside your own project** — each lab lives in a `.leanlab/<name>/`
+folder; the engine stays in the installed tool.
+**Metric lab** — evolve a number (ML, optimization, anything that prints a score):
+```bash
+cd ~/my-project
+leanlab init iris          # describe the task; Claude drafts the lab + scorer
+leanlab check iris         # verify it's wired correctly (free)
+leanlab lock iris          # freeze the scorer
+leanlab run iris --n 5     # the agents evolve experiments (uses Claude)
+leanlab serve iris         # watch the live dashboard
+```
+**Coding lab** — ship a coding task with locked acceptance tests:
+```bash
+cd ~/my-repo                              # a git repository
+leanlab spec "add a /health endpoint"    # spec-writer drafts + locks the tests
+leanlab build add-health                 # engineer → gate → reviewer → merge
+leanlab board                            # live board: tasks, timeline, playbook
+```
+## Let Claude Code drive it
+```bash
+cd ~/my-project && leanlab init --for-agent   # installs a Claude Code skill
+```
+Then just ask Claude Code — *"use leanlab to add a /health endpoint"* — and it
+specs, builds, and merges through the honest test gate for you.
+## Docs
+- **[docs/USAGE.md](docs/USAGE.md)** — every command, in order, with examples.
+- **[docs/OVERVIEW.md](docs/OVERVIEW.md)** — how it works: the loop, the two lab
+  types, the coding-lab flow, and the project structure.
+- **[CONTRIBUTING.md](CONTRIBUTING.md)** — local development (uv, tests, the React board).
+MIT licensed — see [LICENSE](LICENSE).
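The `build` step in the quick start (engineer → gate → reviewer → merge) is a retry loop. Per `docs/OVERVIEW.md`, any failed gate sends the work back to the engineer, up to `--max-attempts`, and nothing merges until every gate passes. The sketch below models only that control flow. `run_task`, `Outcome`, and the gate callables are hypothetical, and the real roles are Claude agent sessions rather than Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A gate returns None to pass, or a short reason string to send the work back.
Gate = Callable[[], Optional[str]]


@dataclass
class Outcome:
    merged: bool
    attempts: int
    feedback: list[str] = field(default_factory=list)


def run_task(
    engineer: Callable[[list[str]], None],
    gates: list[tuple[str, Gate]],
    max_attempts: int = 3,  # arbitrary default for the sketch; leanlab's own default is not shown
) -> Outcome:
    """Run the engineer, then every gate in order; merge only when all gates pass."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        engineer(feedback)  # the engineer sees every earlier rejection
        for name, gate in gates:
            reason = gate()
            if reason is not None:
                feedback.append(f"attempt {attempt}: {name} rejected: {reason}")
                break  # a failed gate hands the work back to the engineer
        else:
            # No gate objected, so this is where the branch would merge into main.
            return Outcome(merged=True, attempts=attempt, feedback=feedback)
    return Outcome(merged=False, attempts=max_attempts, feedback=feedback)
```

The `for ... else` clause runs only when the inner loop finishes without `break`, which is exactly the all-gates-passed case.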

leanlab-0.2.1/README.md → leanlab-0.2.2/docs/OVERVIEW.md RENAMED

@@ -1,91 +1,28 @@
-# leanlab
+# How leanlab works
-[![PyPI](https://img.shields.io/pypi/v/leanlab.svg)](https://pypi.org/project/leanlab/)
-[![CI](https://github.com/bacharSalleh/leanlab/actions/workflows/ci.yml/badge.svg)](https://github.com/bacharSalleh/leanlab/actions/workflows/ci.yml)
-[![Python](https://img.shields.io/pypi/pyversions/leanlab.svg)](https://pypi.org/project/leanlab/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+This is the deeper tour — the idea, the two lab types, the coding-lab flow, and
+the project structure. For installation and day-to-day commands, see the
+[README](../README.md) and [USAGE.md](USAGE.md).
-```bash
-pipx install leanlab     # or: pip install leanlab  ·  uvx leanlab
-```
+## The idea
-A small **tool for self-improving experiment labs**. A team of agents —
-**Workers** (experimenters), a **Director**, and **HyperCritics** — evolve
-solutions against a **frozen evaluator**, one experiment at a time. The same loop
-drives any task: you just describe the *lab* and Claude builds the scorer.
+leanlab runs a **self-improving loop**: make an attempt → judge it against a
+frozen criterion → keep the best → learn for next time. A team of Claude agents
+drives the loop; you only describe the *lab*.
-It is the trading "selflearn" idea, generalized: **strategy → Experiment**,
+It generalizes the trading "selflearn" idea: **strategy → Experiment**,
 **Manager → Director**, `results.csv → results.jsonl`, and the objective (what to
 maximize or minimize) is configuration, not code.
 leanlab is used **inside your own project** (like archik): each lab lives in a
-`.leanlab/<name>/` folder; the engine stays in the installed tool.
-## Quick start
-```bash
-uv tool install --force --editable /path/to/leanlab   # install the `leanlab` tool
-cd ~/my-project && uv init                            # your project (a uv project)
-leanlab init iris        # describe the task; Claude drafts the lab
-leanlab check iris       # verify it's wired correctly (free)
-leanlab lock iris        # freeze the scorer
-leanlab run iris --n 5   # the agents evolve experiments (costs Claude)
-leanlab serve iris       # watch the live dashboard
-```
-**Full command guide:** [docs/USAGE.md](docs/USAGE.md) — the flow and what each
-command does exactly.
-## Anatomy
-```
-leanlab/                     # the installable tool (engine — never copied into your project)
-├── cli.py                   # commands: init · check · fix · run · serve · list · lock · unlock
-├── core/
-│   ├── loop.py              # run N experiments, score, log, wake Director/Critic
-│   ├── monitor.py           # live dashboard: stat chips + progress chart + table + stream
-│   ├── init.py              # interactive `init` — Claude drafts task + evaluator
-│   ├── doctor.py            # preflight checks + Claude-powered `fix`
-│   └── agents/              # ports & adapters — the backend-agnostic agent layer
-└── templates/agents/        # CLAUDE.md (Worker) · director.md · critic.md  (injected, not copied)
+`.leanlab/<name>/` folder; the engine stays in the installed tool and is never
+copied into your project.
-<your project>/.leanlab/<name>/   # a lab — only YOUR files
-├── task.md          goal + experiment contract
-├── lab.json         objective {metric, direction}, commands, cadences
-├── evaluation.py    the FROZEN evaluator → prints ONE line of JSON metrics
-├── validate.py      structural check the Worker runs (no score)
-├── experiments/     where the Worker writes one file per loop
-└── results.jsonl    the book: one JSON record per experiment
-```
-**How a lab plugs in:** the engine never imports a lab. It runs the lab's
-`validate_cmd` / `eval_cmd` (from `lab.json`) as subprocesses, reads the **JSON
-metrics** the evaluator prints, and ranks by the configured **objective**. So a lab
-can be ML, trading, graphics, optimization — anything that can print a metric.
-## Make your own lab
-`leanlab init <name>` is interactive: you describe the task in plain words, Claude
-drafts `task.md` and picks the objective, then proposes an `evaluation.py` you
-approve (or give feedback to revise). It installs the scorer's libraries and
-self-checks the wiring before finishing. Then `leanlab lock <name>` and
-`leanlab run <name>`.
+## Two lab types
-If a lab is mis-wired, `leanlab check` tells you what's wrong and `leanlab fix`
-has Claude repair it.
-## The example lab: house-prices
-This repo dogfoods itself — `.leanlab/house-prices` predicts California median
-house value (**minimize RMSE**). Each experiment defines `build_estimator()` (any
-scikit-learn-style model); the evaluator fits it on a fixed split and reports
-`rmse / mae / r2 / overfit_gap / train_secs` on held-out data.
-## Two lab types — naming map
-leanlab runs the same loop two ways. A **metric lab** (ML/optimization — evolve a number)
-and a **coding lab** (do coding tasks on a repo — pass tests). Same engine, different words:
+The same loop runs two ways. A **metric lab** (ML / optimization — evolve a
+number) and a **coding lab** (do coding tasks on a repo — pass tests). Same
+engine, different words:
 **The team (agents)**
@@ -119,23 +56,16 @@ and a **coding lab** (do coding tasks on a repo — pass tests). Same engine, di
 | `serve` (dashboard) | `board` (dashboard) |
 | `lock` / `unlock` | (lock is automatic in `spec`) |
-**archik nodes**
-| Metric lab | Coding lab |
-|------------|-----------|
-| `loop` | `engineer` |
-| `evaluator` | `gate-runner` |
-| `results-store` | `playbook` + `coding-results` |
-| `dashboard` | `coding-board` |
-Same idea both ways: **make an attempt → judge it → keep the best → learn for next time** —
-just "experiment + metric + memory" swapped for "code change + tests + playbook."
+Same idea both ways: **make an attempt → judge it → keep the best → learn for
+next time** — just "experiment + metric + memory" swapped for "code change +
+tests + playbook."
 ## The coding lab flow
-A coding lab is an **assembly line with quality gates**. Each step hands off to the next, and
-any failed gate sends the work back to the engineer — up to `--max-attempts`. Nothing reaches
-`main` until the tests pass, the work is proven honest, and every reviewer approves.
+A coding lab is an **assembly line with quality gates**. Each step hands off to
+the next, and any failed gate sends the work back to the engineer — up to
+`--max-attempts`. Nothing reaches `main` until the tests pass, the work is proven
+honest, and every reviewer approves.
 ```
         Developer
@@ -182,44 +112,63 @@ any failed gate sends the work back to the engineer — up to `--max-attempts`.
 | Merge | *automated* | The branch merges into `main` — the change ships. |
 | Playbook | **Tech-lead** | Rewrites `PLAYBOOK.md` so the next task starts with the project's conventions and pitfalls. |
-Watch all of it live with `leanlab board`: the four roles, a per-task timeline, the agent chat
-(every session, with token cost), and the growing playbook.
+Watch it live with `leanlab board`: the four roles, a per-task round-by-round
+timeline, the agent chat (every session, with token cost), and the playbook.
-**Why it compounds:** every merged task adds its locked tests to `main` (a ratchet that never
-loosens), and the playbook accumulates — so the lab keeps getting better at *your* project.
+**Why it compounds:** every merged task adds its locked tests to `main` (a
+ratchet that never loosens), and the playbook accumulates — so the lab keeps
+getting better at *your* project.
-## Develop / test
+## Structure
-```bash
-uv sync
-uv run pytest                         # the test suite
-uv run leanlab list                   # run the tool from the checkout, no install
 ```
+leanlab/                     # the installable tool (engine — never copied into your project)
+├── cli.py                   # commands: init · check · fix · run · serve · spec · build · board · list · lock · unlock
+├── core/
+│   ├── loop.py              # run N experiments, score, log, wake Director/Critic
+│   ├── monitor.py           # metric-lab live dashboard
+│   ├── init.py              # interactive `init` — Claude drafts task + evaluator
+│   ├── doctor.py            # preflight checks + Claude-powered `fix`
+│   ├── coding/              # the coding lab: spec · engineer · gate · reviewer · tech-lead · board
+│   └── agents/              # ports & adapters — the backend-agnostic agent layer
+├── templates/agents/        # the agent personas (injected into prompts, not copied)
+└── core/coding/board_dist/  # the React board UI, compiled (built from frontend/)
-### Board UI (React + Tailwind)
-The `leanlab board` dashboard is a React + Tailwind app in [`frontend/`](frontend/), built
-into `leanlab/core/coding/board_dist/` and served by the Python board server. The Python side
-exposes the data as `/api/state`, `/api/task`, and `/api/stream` (SSE); React renders it.
-```bash
-cd frontend && npm install && npm run build   # compile the UI (re-run after editing src/)
+<your project>/.leanlab/<name>/   # a metric lab — only YOUR files
+├── task.md          goal + experiment contract
+├── lab.json         objective {metric, direction}, commands, cadences
+├── evaluation.py    the FROZEN evaluator → prints ONE line of JSON metrics
+├── validate.py      structural check the Worker runs (no score)
+├── experiments/     where the Worker writes one file per loop
+└── results.jsonl    the book: one JSON record per experiment
 ```
-For live UI work, run `leanlab board --no-open` (API on `:8766`) and `npm run dev` in `frontend/`
-(Vite on `:5173`, proxying `/api`). The compiled `board_dist/` ships inside the wheel.
+**How a lab plugs in:** the engine never imports a lab. It runs the lab's
+`validate_cmd` / `eval_cmd` (from `lab.json`) as subprocesses, reads the **JSON
+metrics** the evaluator prints, and ranks by the configured **objective**. So a
+lab can be ML, trading, graphics, optimization — anything that can print a
+metric.
-## Let Claude Code drive it
+## Making a metric lab
-```bash
-cd ~/my-project && leanlab init --for-agent   # installs .claude/skills/leanlab/SKILL.md
-```
-Then talk to Claude Code — *"use leanlab to add a /health endpoint"* — and it specs, builds, and
-merges through the honest test gate (`spec --yes` / `build` run headless). See `docs/USAGE.md`.
+`leanlab init <name>` is interactive: you describe the task in plain words, Claude
+drafts `task.md` and picks the objective, then proposes an `evaluation.py` you
+approve (or give feedback to revise). It installs the scorer's libraries and
+self-checks the wiring before finishing. Then `leanlab lock <name>` and
+`leanlab run <name>`. If a lab is mis-wired, `leanlab check` says what's wrong and
+`leanlab fix` has Claude repair it.
+**Example — house-prices:** this repo dogfoods itself. `.leanlab/house-prices`
+predicts California median house value (**minimize RMSE**). Each experiment
+defines `build_estimator()` (any scikit-learn-style model); the evaluator fits it
+on a fixed split and reports `rmse / mae / r2 / overfit_gap / train_secs` on
+held-out data.
-## Notes
+## Honesty model
 - Agents get full tools and are told to be proactive researchers (web, ML, `uv add`).
-- The Worker never runs the evaluator, so scores stay honest; `lock` freezes it.
-- The evaluator (and agent specs) live in the package and are injected into prompts —
-  nothing framework-level is copied into your project.
+- The Worker never runs the evaluator, so metric scores stay honest; `lock` freezes it.
+- In coding labs, acceptance tests are locked (sha256, out of the worktree),
+  restored before every gate, and re-run in isolation to catch fixture-gaming.
+- The evaluator and agent personas live in the package and are injected into
+  prompts — nothing framework-level is copied into your project.
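The plug-in contract in `docs/OVERVIEW.md` is small enough to sketch. The engine runs `eval_cmd` as a subprocess, parses the JSON line the evaluator prints, keeps one record per experiment in `results.jsonl`, and ranks by the objective. The helpers below are hypothetical, not leanlab's engine code. They assume `direction` is either `"minimize"` or `"maximize"`, and, to tolerate stray log output, they read only the last non-empty line of stdout.

```python
from __future__ import annotations

import json
import shlex
import subprocess


def run_eval(eval_cmd: str | list[str], cwd: str = ".") -> dict:
    """Run the lab's evaluator and parse the JSON metrics line it prints."""
    # Assumption: a string command is split shell-style rather than run through a shell.
    args = shlex.split(eval_cmd) if isinstance(eval_cmd, str) else eval_cmd
    # check=True raises CalledProcessError if the evaluator exits non-zero.
    proc = subprocess.run(args, cwd=cwd, capture_output=True, text=True, check=True)
    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("evaluator printed no metrics")
    return json.loads(lines[-1])  # the contract: one line of JSON metrics


def load_results(path: str) -> list[dict]:
    """Read results.jsonl: one JSON record per experiment, blank lines ignored."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def best(records: list[dict], metric: str, direction: str) -> dict:
    """Return the best record for the objective; records lacking the metric are skipped."""
    scored = [r for r in records if metric in r]
    if not scored:
        raise ValueError(f"no record has metric {metric!r}")
    if direction == "minimize":
        return min(scored, key=lambda r: r[metric])
    if direction == "maximize":
        return max(scored, key=lambda r: r[metric])
    raise ValueError(f"unknown direction: {direction!r}")
```

For the house-prices lab, assuming each record stores its metrics at the top level, the leader would be `best(load_results(".leanlab/house-prices/results.jsonl"), "rmse", "minimize")`. Because the engine only reads printed JSON, the evaluator can be written in anything.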

{leanlab-0.2.1 → leanlab-0.2.2}/frontend/package.json RENAMED

@@ -1,7 +1,7 @@
 {
   "name": "leanlab-board",
   "private": true,
-  "version": "0.2.1",
+  "version": "0.2.2",
   "type": "module",
   "description": "React + Tailwind UI for the leanlab coding board (built into the Python wheel).",
   "scripts": {

{leanlab-0.2.1 → leanlab-0.2.2}/leanlab/cli.py RENAMED

@@ -29,7 +29,7 @@ def _version() -> str:
         from importlib.metadata import PackageNotFoundError, version
         return version("leanlab")
     except (ImportError, PackageNotFoundError):
-        return "0.2.1"
+        return "0.2.2"
 def labs_dir() -> Path:

{leanlab-0.2.1 → leanlab-0.2.2}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "leanlab"
-version = "0.2.1"
+version = "0.2.2"
 description = "A self-improving lab for AI agents — evolve ML experiments against a frozen metric, or ship coding tasks through a spec → gate → review → merge loop with locked acceptance tests."
 readme = "README.md"
 requires-python = ">=3.11"
