npm - atris - Versions diffs - 2.5.2 → 2.5.4 - Mend

atris 2.5.2 → 2.5.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50)

package/README.md +10 -0
package/atris/experiments/README.md +118 -0
package/atris/experiments/_examples/smoke-keep-revert/README.md +45 -0
package/atris/experiments/_examples/smoke-keep-revert/candidate.py +8 -0
package/atris/experiments/_examples/smoke-keep-revert/loop.py +129 -0
package/atris/experiments/_examples/smoke-keep-revert/measure.py +47 -0
package/atris/experiments/_examples/smoke-keep-revert/program.md +3 -0
package/atris/experiments/_examples/smoke-keep-revert/proposals/bad_patch.py +19 -0
package/atris/experiments/_examples/smoke-keep-revert/proposals/fix_patch.py +22 -0
package/atris/experiments/_examples/smoke-keep-revert/reset.py +21 -0
package/atris/experiments/_examples/smoke-keep-revert/results.tsv +5 -0
package/atris/experiments/_examples/smoke-keep-revert/visual.svg +52 -0
package/atris/experiments/_fixtures/invalid/BadName/loop.py +1 -0
package/atris/experiments/_fixtures/invalid/BadName/program.md +3 -0
package/atris/experiments/_fixtures/invalid/BadName/results.tsv +1 -0
package/atris/experiments/_fixtures/invalid/bloated-context/loop.py +1 -0
package/atris/experiments/_fixtures/invalid/bloated-context/measure.py +1 -0
package/atris/experiments/_fixtures/invalid/bloated-context/program.md +6 -0
package/atris/experiments/_fixtures/invalid/bloated-context/results.tsv +1 -0
package/atris/experiments/_fixtures/valid/good-experiment/loop.py +1 -0
package/atris/experiments/_fixtures/valid/good-experiment/measure.py +1 -0
package/atris/experiments/_fixtures/valid/good-experiment/program.md +3 -0
package/atris/experiments/_fixtures/valid/good-experiment/results.tsv +1 -0
package/atris/experiments/_template/pack/loop.py +3 -0
package/atris/experiments/_template/pack/measure.py +13 -0
package/atris/experiments/_template/pack/program.md +3 -0
package/atris/experiments/_template/pack/reset.py +3 -0
package/atris/experiments/_template/pack/results.tsv +1 -0
package/atris/experiments/benchmark_runtime.py +81 -0
package/atris/experiments/benchmark_validate.py +70 -0
package/atris/experiments/validate.py +92 -0
package/atris/policies/atris-design.md +66 -0
package/atris/skills/README.md +1 -0
package/atris/skills/apps/SKILL.md +243 -0
package/atris/skills/autoresearch/SKILL.md +63 -0
package/atris/skills/create-app/SKILL.md +6 -0
package/atris/skills/design/SKILL.md +15 -1
package/atris/skills/drive/SKILL.md +335 -20
package/atris/skills/ramp/SKILL.md +295 -0
package/bin/atris.js +76 -5
package/commands/business.js +132 -0
package/commands/clean.js +113 -70
package/commands/console.js +397 -0
package/commands/experiments.js +216 -0
package/commands/init.js +4 -0
package/commands/pull.js +311 -0
package/commands/push.js +170 -0
package/commands/run.js +366 -0
package/commands/status.js +21 -1
package/package.json +2 -1

package/README.md CHANGED

@@ -41,6 +41,16 @@ Commands: `brainstorm` (optional) → `plan` → `do` → `review`
 Works with: Claude Code, Cursor, Windsurf, GitHub Copilot, any agent.
+## Experiments
+Atris also supports Karpathy-style keep/revert loops inside `atris/experiments/`.
+```bash
+atris experiments init self-heal
+atris experiments validate
+atris experiments benchmark
+```
 ## Update
 ```bash

package/atris/experiments/README.md ADDED

@@ -0,0 +1,118 @@
+# experiments
+Karpathy-style experiment framework for Atris workspaces.
+This folder defines the schema, validation rules, and benchmark harness for self-improvement loops.
+Live experiment packs belong directly inside `atris/experiments/`.
+## What This Is
+An experiment is not "the agent rewrote its prompt and said it improved."
+An experiment is:
+1. one bounded target
+2. one external metric
+3. one keep/revert loop
+4. one append-only log
+If the metric goes up, keep the change.
+If it does not, revert it.
+## Schema
+```text
+atris/experiments/
+├── README.md
+├── validate.py
+├── benchmark_validate.py
+├── benchmark_runtime.py
+├── _template/           # packaged scaffolds
+├── _examples/           # packaged smoke examples
+├── _fixtures/           # validator benchmark cases
+└── <experiment-slug>/
+    ├── program.md
+    ├── measure.py
+    ├── loop.py
+    ├── results.tsv
+    ├── reset.py            # preferred
+    ├── proposals/          # optional
+    └── <bounded-target>    # candidate.py, system_prompt.txt, etc.
+```
+## Rules
+1. One bounded mutation target per experiment.
+2. `measure.py` must use an external metric the agent cannot fake.
+3. `loop.py` must keep only improvements and revert regressions.
+4. `program.md` stays short and task-specific.
+5. `results.tsv` stays append-only.
+## Repo Contents
+- `_template/pack/` - starter files for a new experiment
+- `validate.py` - structural and bloat checks
+- `benchmark_validate.py` - validator benchmark on fixed good/bad fixtures
+- `benchmark_runtime.py` - runtime benchmark on packaged example packs
+- `_examples/` - tiny reference implementation
+## Example
+Start with the smallest honest pack:
+```text
+_examples/smoke-keep-revert/
+├── candidate.py
+├── measure.py
+├── loop.py
+├── reset.py
+├── results.tsv
+└── proposals/
+    ├── bad_patch.py
+    └── fix_patch.py
+```
+What it does:
+- `candidate.py` starts broken on purpose
+- `measure.py` scores it on a fixed word-count test
+- `bad_patch.py` makes it worse
+- `fix_patch.py` actually fixes it
+- `loop.py` keeps only the fix
+Run it:
+```bash
+python _examples/smoke-keep-revert/reset.py
+python _examples/smoke-keep-revert/loop.py \
+  --proposal _examples/smoke-keep-revert/proposals/bad_patch.py \
+  --proposal _examples/smoke-keep-revert/proposals/fix_patch.py
+```
+Visual:
+```text
+broken target
+   ↓
+score = 0.2
+   ↓
+bad patch
+   ↓
+score = 0.0
+   ↓
+REVERT
+   ↓
+good patch
+   ↓
+score = 1.0
+   ↓
+KEEP
+```
+## Commands
+```bash
+python validate.py .
+python benchmark_validate.py
+python benchmark_runtime.py
+```
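
The schema and rules in this README can be approximated by a small structural check. The sketch below is not the packaged `validate.py`, whose contents are not shown at this point in the diff. The required-file list follows the schema tree; the lowercase-hyphen slug pattern and the `is_valid_slug` / `missing_files` helper names are assumptions made for illustration.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Files every pack carries in the schema tree; reset.py and proposals/ are
# marked "preferred" and "optional" there, so they are not required here.
REQUIRED_FILES = ("program.md", "measure.py", "loop.py", "results.tsv")

# Assumed slug convention (hypothetical): lowercase words joined by hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def is_valid_slug(name: str) -> bool:
    """Return True when the folder name matches the assumed slug pattern."""
    return SLUG_PATTERN.fullmatch(name) is not None


def missing_files(pack_dir: Path) -> list[str]:
    """List required files that are absent from the pack directory."""
    return [name for name in REQUIRED_FILES if not (pack_dir / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pack = Path(tmp) / "good-experiment"
        pack.mkdir()
        # Deliberately leave out measure.py to show a reported gap.
        for name in ("program.md", "loop.py", "results.tsv"):
            (pack / name).write_text("", encoding="utf-8")
        print(is_valid_slug(pack.name), missing_files(pack))
        print(is_valid_slug("BadName"))
```

A check like this only covers structure; it says nothing about whether `measure.py` uses a metric the agent cannot fake, which remains a review question.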

package/atris/experiments/_examples/smoke-keep-revert/README.md ADDED

@@ -0,0 +1,45 @@
+# smoke-keep-revert
+Smallest honest example of the framework.
+![Smoke Keep/Revert Flow](./visual.svg)
+## Files
+```text
+candidate.py   -> bounded target
+measure.py     -> hard score
+loop.py        -> keep/revert engine
+reset.py       -> restore baseline
+results.tsv    -> trial log
+proposals/     -> bad patch + good patch
+```
+## Flow
+```text
+candidate.py is wrong
+   ↓
+measure.py scores baseline
+   ↓
+loop.py applies bad_patch.py
+   ↓
+score does not improve
+   ↓
+loop.py reverts the change
+   ↓
+loop.py applies fix_patch.py
+   ↓
+score improves
+   ↓
+loop.py keeps the change
+```
+## Run
+```bash
+python reset.py
+python loop.py \
+  --proposal proposals/bad_patch.py \
+  --proposal proposals/fix_patch.py
+```

package/atris/experiments/_examples/smoke-keep-revert/candidate.py ADDED

@@ -0,0 +1,8 @@
+"""Bounded mutation target for the smoke experiment."""
+def count_words(text: str) -> int:
+    cleaned = text.strip()
+    if not cleaned:
+        return 0
+    return len(cleaned.split())

package/atris/experiments/_examples/smoke-keep-revert/loop.py ADDED

@@ -0,0 +1,129 @@
+"""Shared keep/revert loop for a bounded local experiment."""
+from __future__ import annotations
+import argparse
+import csv
+import json
+import os
+from pathlib import Path
+import shutil
+import subprocess
+import sys
+from datetime import datetime, timezone
+EXPERIMENT_DIR = Path(__file__).resolve().parent
+DEFAULT_TARGET = EXPERIMENT_DIR / "candidate.py"
+DEFAULT_MEASURE = EXPERIMENT_DIR / "measure.py"
+DEFAULT_RESULTS = EXPERIMENT_DIR / "results.tsv"
+def run_measure(measure_path: Path) -> dict:
+    proc = subprocess.run(
+        [sys.executable, str(measure_path)],
+        cwd=str(EXPERIMENT_DIR),
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    return json.loads(proc.stdout.strip())
+def append_result(results_path: Path, row: dict) -> None:
+    write_header = not results_path.exists() or results_path.stat().st_size == 0
+    with results_path.open("a", newline="", encoding="utf-8") as handle:
+        writer = csv.DictWriter(
+            handle,
+            fieldnames=[
+                "timestamp",
+                "trial",
+                "status",
+                "old_score",
+                "new_score",
+                "proposal",
+                "description",
+            ],
+            delimiter="\t",
+        )
+        if write_header:
+            writer.writeheader()
+        writer.writerow(row)
+def restore_backup(backup_path: Path, target_path: Path) -> None:
+    shutil.copy2(backup_path, target_path)
+    backup_path.unlink(missing_ok=True)
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Run a bounded keep/revert experiment.")
+    parser.add_argument("--proposal", action="append", default=[])
+    args = parser.parse_args()
+    target_path = DEFAULT_TARGET.resolve()
+    measure_path = DEFAULT_MEASURE.resolve()
+    results_path = DEFAULT_RESULTS.resolve()
+    baseline = run_measure(measure_path)
+    current_score = float(baseline["score"])
+    print(f"BASELINE {current_score:.4f}")
+    for trial_index, proposal in enumerate(args.proposal, start=1):
+        proposal_path = Path(proposal).resolve()
+        backup_path = target_path.with_suffix(target_path.suffix + f".trial{trial_index}.bak")
+        shutil.copy2(target_path, backup_path)
+        status = "error"
+        old_score = current_score
+        new_score = current_score
+        description = ""
+        try:
+            proc = subprocess.run(
+                [sys.executable, str(proposal_path)],
+                cwd=str(EXPERIMENT_DIR),
+                capture_output=True,
+                text=True,
+                check=True,
+                env={**os.environ, "EXPERIMENT_TARGET": str(target_path)},
+            )
+            if proc.stdout.strip():
+                description = proc.stdout.strip().splitlines()[-1][:200]
+            measured = run_measure(measure_path)
+            new_score = float(measured["score"])
+            if new_score > current_score:
+                status = "kept"
+                current_score = new_score
+                backup_path.unlink(missing_ok=True)
+            else:
+                status = "reverted"
+                restore_backup(backup_path, target_path)
+        except subprocess.CalledProcessError as exc:
+            restore_backup(backup_path, target_path)
+            stderr = (exc.stderr or exc.stdout or "").strip()
+            description = (stderr.splitlines()[-1] if stderr else "proposal failed")[:200]
+            status = "error"
+        append_result(
+            results_path,
+            {
+                "timestamp": datetime.now(timezone.utc).isoformat(),
+                "trial": trial_index,
+                "status": status,
+                "old_score": f"{old_score:.4f}",
+                "new_score": f"{new_score:.4f}",
+                "proposal": proposal_path.name,
+                "description": description,
+            },
+        )
+        print(f"TRIAL {trial_index} {status.upper()} score={new_score:.4f} proposal={proposal_path.name}")
+    final_measure = run_measure(measure_path)
+    print(f"FINAL {final_measure['score']:.4f}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
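
The core decision in `loop.py` is the strict comparison `new_score > current_score`, so a tie is reverted just like a regression. A minimal in-memory sketch of that rule follows; the `decide` and `run_trials` helpers are hypothetical and take already-measured scores instead of running proposal scripts.

```python
from __future__ import annotations


def decide(current_score: float, new_score: float) -> str:
    """Mirror the loop's rule: keep only strict improvements."""
    return "kept" if new_score > current_score else "reverted"


def run_trials(baseline: float, measured_scores: list[float]) -> tuple[float, list[str]]:
    """Apply the keep/revert rule to a sequence of measured scores."""
    current = baseline
    statuses = []
    for new_score in measured_scores:
        status = decide(current, new_score)
        if status == "kept":
            # Only a kept trial moves the bar for later proposals.
            current = new_score
        statuses.append(status)
    return current, statuses


if __name__ == "__main__":
    final, statuses = run_trials(0.2, [0.2, 1.0])
    print(final, statuses)
```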

package/atris/experiments/_examples/smoke-keep-revert/measure.py ADDED

@@ -0,0 +1,47 @@
+"""Objective metric for the smoke keep/revert example."""
+from __future__ import annotations
+import json
+from pathlib import Path
+import sys
+EXPERIMENT_DIR = Path(__file__).resolve().parent
+if str(EXPERIMENT_DIR) not in sys.path:
+    sys.path.insert(0, str(EXPERIMENT_DIR))
+from candidate import count_words
+CASES = [
+    ("", 0),
+    ("one", 1),
+    ("two words", 2),
+    ("  three   spaced   words ", 3),
+    ("punctuation, still counts", 3),
+]
+def main() -> int:
+    passed = 0
+    for text, expected in CASES:
+        actual = count_words(text)
+        if actual == expected:
+            passed += 1
+    total = len(CASES)
+    score = passed / total if total else 0.0
+    payload = {
+        "score": round(score, 4),
+        "passed": passed,
+        "total": total,
+        "status": "pass" if passed == total else "fail",
+    }
+    print(json.dumps(payload))
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
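
To see how this metric separates the three versions of `count_words` that appear in the example (the `reset.py` baseline, the `bad_patch.py` body, and the `fix_patch.py` body), the sketch below scores each one in memory against the same `CASES`. The `score` helper and the variant function names are illustrative and not part of the package. An always-zero function still passes the empty-string case, so under these cases it ties the baseline instead of scoring below it.

```python
CASES = [
    ("", 0),
    ("one", 1),
    ("two words", 2),
    ("  three   spaced   words ", 3),
    ("punctuation, still counts", 3),
]


def baseline_count(text: str) -> int:
    # Body written by reset.py: counts characters, not words.
    cleaned = text.strip()
    if not cleaned:
        return 0
    return len(cleaned)


def always_zero(text: str) -> int:
    # Body written by bad_patch.py.
    return 0


def fixed_count(text: str) -> int:
    # Body written by fix_patch.py.
    cleaned = text.strip()
    if not cleaned:
        return 0
    return len(cleaned.split())


def score(fn) -> float:
    """Fraction of CASES the function gets right, rounded like measure.py."""
    passed = sum(1 for text, expected in CASES if fn(text) == expected)
    return round(passed / len(CASES), 4)


if __name__ == "__main__":
    for fn in (baseline_count, always_zero, fixed_count):
        print(fn.__name__, score(fn))
```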

package/atris/experiments/_examples/smoke-keep-revert/program.md ADDED

@@ -0,0 +1,3 @@
+# Program
+Prove the keep/revert loop in the smallest honest way.

package/atris/experiments/_examples/smoke-keep-revert/proposals/bad_patch.py ADDED

@@ -0,0 +1,19 @@
+"""A deliberately bad mutation that should be reverted."""
+from pathlib import Path
+import os
+TARGET = Path(os.environ["EXPERIMENT_TARGET"])
+TARGET.write_text(
+    '''"""Bounded mutation target for the smoke experiment."""
+def count_words(text: str) -> int:
+    return 0
+''',
+    encoding="utf-8",
+)
+print("applied bad proposal")

package/atris/experiments/_examples/smoke-keep-revert/proposals/fix_patch.py ADDED

@@ -0,0 +1,22 @@
+"""A good mutation that should be kept."""
+from pathlib import Path
+import os
+TARGET = Path(os.environ["EXPERIMENT_TARGET"])
+TARGET.write_text(
+    '''"""Bounded mutation target for the smoke experiment."""
+def count_words(text: str) -> int:
+    cleaned = text.strip()
+    if not cleaned:
+        return 0
+    return len(cleaned.split())
+''',
+    encoding="utf-8",
+)
+print("applied good proposal")

package/atris/experiments/_examples/smoke-keep-revert/reset.py ADDED

@@ -0,0 +1,21 @@
+"""Restore the smoke example to its baseline."""
+from pathlib import Path
+TARGET = Path(__file__).resolve().parent / "candidate.py"
+TARGET.write_text(
+    '''"""Bounded mutation target for the smoke experiment."""
+def count_words(text: str) -> int:
+    cleaned = text.strip()
+    if not cleaned:
+        return 0
+    return len(cleaned)
+''',
+    encoding="utf-8",
+)
+print("reset smoke-keep-revert to baseline")

package/atris/experiments/_examples/smoke-keep-revert/results.tsv ADDED

@@ -0,0 +1,5 @@
+timestamp	trial	status	old_score	new_score	proposal	description
+2026-03-11T11:05:17.887045+00:00	1	reverted	0.2000	0.2000	bad_patch.py	applied bad proposal
+2026-03-11T11:05:17.920737+00:00	2	kept	0.2000	1.0000	fix_patch.py	applied good proposal
+2026-03-11T11:05:40.063680+00:00	1	reverted	0.2000	0.2000	bad_patch.py	applied bad proposal
+2026-03-11T11:05:40.097842+00:00	2	kept	0.2000	1.0000	fix_patch.py	applied good proposal
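
Because `results.tsv` is a plain tab-separated log, it can be summarized with the standard `csv` module. The sketch below inlines the rows shown above instead of reading the file from disk; the `summarize` helper is a hypothetical name.

```python
import csv
import io
from collections import Counter

# The header and four trial rows from the packaged results.tsv.
LOG = (
    "timestamp\ttrial\tstatus\told_score\tnew_score\tproposal\tdescription\n"
    "2026-03-11T11:05:17.887045+00:00\t1\treverted\t0.2000\t0.2000\tbad_patch.py\tapplied bad proposal\n"
    "2026-03-11T11:05:17.920737+00:00\t2\tkept\t0.2000\t1.0000\tfix_patch.py\tapplied good proposal\n"
    "2026-03-11T11:05:40.063680+00:00\t1\treverted\t0.2000\t0.2000\tbad_patch.py\tapplied bad proposal\n"
    "2026-03-11T11:05:40.097842+00:00\t2\tkept\t0.2000\t1.0000\tfix_patch.py\tapplied good proposal\n"
)


def summarize(text: str):
    """Count trial statuses and report the best new_score in the log."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    statuses = Counter(row["status"] for row in rows)
    best = max(float(row["new_score"]) for row in rows)
    return statuses, best


if __name__ == "__main__":
    statuses, best = summarize(LOG)
    print(dict(statuses), best)
```

The repeated trial numbers reflect two separate runs appended to the same file, which is what an append-only log is expected to look like.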

package/atris/experiments/_examples/smoke-keep-revert/visual.svg ADDED

@@ -0,0 +1,52 @@
+<svg width="980" height="260" viewBox="0 0 980 260" fill="none" xmlns="http://www.w3.org/2000/svg">
+  <rect width="980" height="260" fill="#F7F5EF"/>
+  <text x="40" y="42" font-family="Helvetica, Arial, sans-serif" font-size="28" font-weight="700" fill="#111111">Smoke Keep/Revert</text>
+  <text x="40" y="68" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#4B5563">One bounded target. One hard metric. Reject the loser. Keep the winner.</text>
+  <rect x="40" y="110" width="150" height="88" rx="16" fill="#FFF7ED" stroke="#C2410C" stroke-width="2"/>
+  <text x="115" y="140" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#9A3412">Broken Target</text>
+  <text x="115" y="166" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#7C2D12">candidate.py</text>
+  <text x="115" y="186" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#7C2D12">buggy on purpose</text>
+  <rect x="220" y="110" width="150" height="88" rx="16" fill="#EFF6FF" stroke="#1D4ED8" stroke-width="2"/>
+  <text x="295" y="140" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#1E3A8A">Measure</text>
+  <text x="295" y="166" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#1D4ED8">score = 0.2</text>
+  <text x="295" y="186" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#1D4ED8">baseline truth</text>
+  <rect x="400" y="38" width="160" height="72" rx="16" fill="#FEF2F2" stroke="#DC2626" stroke-width="2"/>
+  <text x="480" y="66" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#991B1B">Bad Patch</text>
+  <text x="480" y="92" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#B91C1C">score falls to 0.0</text>
+  <rect x="400" y="150" width="160" height="72" rx="16" fill="#ECFDF5" stroke="#059669" stroke-width="2"/>
+  <text x="480" y="178" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#065F46">Good Patch</text>
+  <text x="480" y="204" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#047857">score rises to 1.0</text>
+  <rect x="610" y="38" width="150" height="72" rx="16" fill="#FEE2E2" stroke="#DC2626" stroke-width="2"/>
+  <text x="685" y="66" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#991B1B">REVERT</text>
+  <text x="685" y="92" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#B91C1C">reject loser</text>
+  <rect x="610" y="150" width="150" height="72" rx="16" fill="#DCFCE7" stroke="#16A34A" stroke-width="2"/>
+  <text x="685" y="178" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#166534">KEEP</text>
+  <text x="685" y="204" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#15803D">accept winner</text>
+  <rect x="800" y="110" width="140" height="88" rx="16" fill="#F0FDF4" stroke="#16A34A" stroke-width="2"/>
+  <text x="870" y="140" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="18" font-weight="700" fill="#166534">Final State</text>
+  <text x="870" y="166" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#15803D">fixed target</text>
+  <text x="870" y="186" text-anchor="middle" font-family="Helvetica, Arial, sans-serif" font-size="14" fill="#15803D">score = 1.0</text>
+  <path d="M190 154H220" stroke="#6B7280" stroke-width="3"/>
+  <path d="M365 154H385" stroke="#6B7280" stroke-width="3"/>
+  <path d="M560 74H600" stroke="#DC2626" stroke-width="3"/>
+  <path d="M560 186H600" stroke="#16A34A" stroke-width="3"/>
+  <path d="M760 154H790" stroke="#6B7280" stroke-width="3"/>
+  <path d="M480 110V138" stroke="#6B7280" stroke-width="3" stroke-dasharray="8 8"/>
+  <path d="M295 154C340 154 350 74 400 74" stroke="#DC2626" stroke-width="3" fill="none"/>
+  <path d="M295 154C340 154 350 186 400 186" stroke="#16A34A" stroke-width="3" fill="none"/>
+  <polygon points="220,154 210,148 210,160" fill="#6B7280"/>
+  <polygon points="385,154 375,148 375,160" fill="#6B7280"/>
+  <polygon points="600,74 590,68 590,80" fill="#DC2626"/>
+  <polygon points="600,186 590,180 590,192" fill="#16A34A"/>
+  <polygon points="790,154 780,148 780,160" fill="#6B7280"/>
+</svg>

package/atris/experiments/_fixtures/invalid/BadName/loop.py ADDED

@@ -0,0 +1 @@
+print("ok")

package/atris/experiments/_fixtures/invalid/BadName/program.md ADDED

@@ -0,0 +1,3 @@
+# Program
+This pack is invalid because the folder name is bad and files are missing.

package/atris/experiments/_fixtures/invalid/BadName/results.tsv ADDED

@@ -0,0 +1 @@
+timestamp	trial	status	old_score	new_score	proposal	description

package/atris/experiments/_fixtures/invalid/bloated-context/loop.py ADDED

@@ -0,0 +1 @@
+print("ok")

package/atris/experiments/_fixtures/invalid/bloated-context/measure.py ADDED

@@ -0,0 +1 @@
+print("ok")

package/atris/experiments/_fixtures/invalid/bloated-context/program.md ADDED

@@ -0,0 +1,6 @@
+# Program
+This program is intentionally too long.
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

package/atris/experiments/_fixtures/invalid/bloated-context/results.tsv ADDED

@@ -0,0 +1 @@
+timestamp	trial	status	old_score	new_score	proposal	description

package/atris/experiments/_fixtures/valid/good-experiment/loop.py ADDED

@@ -0,0 +1 @@
+print("ok")

package/atris/experiments/_fixtures/valid/good-experiment/measure.py ADDED

@@ -0,0 +1 @@
+print("ok")

package/atris/experiments/_fixtures/valid/good-experiment/program.md ADDED

@@ -0,0 +1,3 @@
+# Program
+Keep this experiment small and measurable.

package/atris/experiments/_fixtures/valid/good-experiment/results.tsv ADDED

@@ -0,0 +1 @@
+timestamp	trial	status	old_score	new_score	proposal	description

package/atris/experiments/_template/pack/loop.py ADDED

@@ -0,0 +1,3 @@
+"""Minimal keep/revert loop template."""
+print("Replace this template with a bounded keep/revert loop.")

package/atris/experiments/_template/pack/measure.py ADDED

@@ -0,0 +1,13 @@
+"""Return a machine-readable score for the experiment."""
+import json
+payload = {
+    "score": 0.0,
+    "passed": 0,
+    "total": 0,
+    "status": "fail",
+}
+print(json.dumps(payload))

package/atris/experiments/_template/pack/program.md ADDED

@@ -0,0 +1,3 @@
+# Program
+State the outcome you want to improve in one short paragraph.

package/atris/experiments/_template/pack/reset.py ADDED

@@ -0,0 +1,3 @@
+"""Restore the experiment pack to baseline."""
+print("Implement baseline reset here.")

package/atris/experiments/_template/pack/results.tsv ADDED

@@ -0,0 +1 @@
+timestamp	trial	status	old_score	new_score	proposal	description

package/atris/experiments/benchmark_runtime.py ADDED

@@ -0,0 +1,81 @@
+"""Runtime benchmark for example experiment packs."""
+from __future__ import annotations
+import json
+from pathlib import Path
+import subprocess
+import sys
+ROOT = Path(__file__).resolve().parent
+EXAMPLES_DIR = ROOT / "_examples"
+CASES = [
+    {
+        "name": "smoke-keep-revert",
+        "baseline_below": 1.0,
+        "expected_final": 1.0,
+        "proposals": ["proposals/bad_patch.py", "proposals/fix_patch.py"],
+    },
+]
+def run_python(script: Path, *args: str) -> subprocess.CompletedProcess[str]:
+    return subprocess.run(
+        [sys.executable, str(script), *args],
+        cwd=str(script.parent),
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+def run_measure(exp_dir: Path) -> dict:
+    proc = run_python(exp_dir / "measure.py")
+    return json.loads(proc.stdout.strip())
+def main() -> int:
+    passed = 0
+    failures = []
+    for case in CASES:
+        exp_dir = EXAMPLES_DIR / case["name"]
+        run_python(exp_dir / "reset.py")
+        baseline = run_measure(exp_dir)
+        if float(baseline["score"]) >= case["baseline_below"]:
+            failures.append(f"{case['name']}: baseline too high ({baseline['score']})")
+            continue
+        proposal_args: list[str] = []
+        for proposal in case["proposals"]:
+            proposal_args.extend(["--proposal", str(exp_dir / proposal)])
+        run_python(exp_dir / "loop.py", *proposal_args)
+        final = run_measure(exp_dir)
+        if float(final["score"]) != case["expected_final"]:
+            failures.append(
+                f"{case['name']}: final score {final['score']} != {case['expected_final']}"
+            )
+            continue
+        passed += 1
+    total = len(CASES)
+    score = passed / total if total else 0.0
+    print(f"SCORE {score:.4f} ({passed}/{total})")
+    if failures:
+        for failure in failures:
+            print(f"FAIL {failure}")
+        return 1
+    print("PASS benchmark_runtime")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
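
Each entry in `CASES` encodes two assertions: the reset baseline must score strictly below `baseline_below`, and the score after the loop must equal `expected_final`. A sketch of that check as a pure function over already-measured scores follows; the `check_case` name is hypothetical and no subprocesses are run.

```python
from __future__ import annotations

CASE = {
    "name": "smoke-keep-revert",
    "baseline_below": 1.0,
    "expected_final": 1.0,
}


def check_case(case: dict, baseline_score: float, final_score: float) -> str | None:
    """Return a failure message, or None when the case passes."""
    # A baseline at or above the bound means there was nothing left to improve.
    if baseline_score >= case["baseline_below"]:
        return f"{case['name']}: baseline too high ({baseline_score})"
    # The loop must land exactly on the expected final score.
    if final_score != case["expected_final"]:
        return f"{case['name']}: final score {final_score} != {case['expected_final']}"
    return None


if __name__ == "__main__":
    print(check_case(CASE, 0.2, 1.0))
    print(check_case(CASE, 1.0, 1.0))
    print(check_case(CASE, 0.2, 0.8))
```

Separating the comparison from the subprocess calls like this makes the pass/fail logic testable without touching the example pack on disk.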