simplicio-cli (PyPI): version diff, 0.4.1 tar.gz → 0.4.3 tar.gz


Files changed (33)

{simplicio_cli-0.4.1/simplicio_cli.egg-info → simplicio_cli-0.4.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: simplicio-cli
-Version: 0.4.1
+Version: 0.4.3
 Summary: Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens.
 Author-email: Wesley Simplicio <wesleybob4@gmail.com>
 License: MIT
@@ -31,8 +31,8 @@ Requires-Dist: sentence-transformers>=2.2
 Requires-Dist: numpy>=1.23
 Requires-Dist: anthropic>=0.30
 Requires-Dist: openai>=1.30
-Requires-Dist: simplicio-mapper>=0.6.0
-Requires-Dist: simplicio-prompt>=1.9.0
+Requires-Dist: simplicio-mapper>=0.6.1
+Requires-Dist: simplicio-prompt>=1.12.0
 Requires-Dist: httpx>=0.27
 Requires-Dist: orjson>=3.10
 Requires-Dist: diskcache>=5.6
@@ -134,12 +134,25 @@ M1 MacBook (8 GB), five sub-4B tiny models, six frontier 2026 models, and three
 mid-tier 7B–12B open models. Every one gained at least **+14 points** when
 wrapped in simplicio's 6-layer contract.
-#### Hugging Face — Qwen2.5-Coder, re-run on 2026-05-27 (latest mapper, 10 cases/side, 156 checks)
+#### Hugging Face — recommended Qwen3-Coder defaults (HF router)
-First batch of the smaller→larger re-benchmark against the latest
-`simplicio-mapper` artifacts. The 1.5B runs on CPU via `transformers`
-(Hugging Face Inference Providers does not serve it); the 3B and 7B run
-through the HF router (`https://router.huggingface.co/v1`).
+The served Qwen Coder recommendation now uses the Qwen3-Coder MoE family.
+`Qwen/Qwen2.5-Coder-3B-Instruct` and
+`Qwen/Qwen2.5-Coder-7B-Instruct` remain available as legacy fallback models for
+historical comparisons and hardware that cannot host the MoE successors.
+| Slot | Recommended model | Route | Notes |
+|---|---|---|---|
+| Efficient coder | `Qwen/Qwen3-Coder-30B-A3B-Instruct` | HF router | 30B total / ~3B active MoE successor to the 3B slot |
+| High-ceiling coder | `Qwen/Qwen3-Coder-Next` | HF router | 80B total / ~3B active MoE successor to the 7B slot |
+> Reproduce the new default set:
+> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
+> BENCH_MODELS="Qwen/Qwen3-Coder-30B-A3B-Instruct,Qwen/Qwen3-Coder-Next"
+> python3 bench/run_offline.py`.
+Legacy Qwen2.5-Coder baseline, re-run on 2026-05-27 against the latest
+`simplicio-mapper` artifacts (10 cases/side, 156 checks):
 | Model | Without simplicio | With simplicio | Gain |
 |---|---|---|---|
@@ -148,10 +161,9 @@ through the HF router (`https://router.huggingface.co/v1`).
 | **Qwen 2.5 Coder 1.5B** (`Qwen/Qwen2.5-Coder-1.5B-Instruct`, local CPU) | 30% | **92%** | **+62 pts** |
 | **HF avg (3 models · 10 cases · 156 checks)** | **34%** | **94%** | **+60 pts (+172%)** |
-> Monotonic from smaller to larger: pass-rate with simplicio climbs **92% →
-> 94% → 96%** as the model grows, while the raw-prompt baseline stays at
-> **30–38%**. The 1.5B model gains the most (**+62 pts**) — the contract does
-> the heaviest lifting where the model is weakest. Reproduce:
+> Monotonic from smaller to larger in the legacy baseline: pass-rate with
+> simplicio climbs **92% → 94% → 96%** as the model grows, while the raw-prompt
+> baseline stays at **30–38%**. Reproduce the legacy set:
 > `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
 > BENCH_MODELS="local:Qwen/Qwen2.5-Coder-1.5B-Instruct,Qwen/Qwen2.5-Coder-3B-Instruct,Qwen/Qwen2.5-Coder-7B-Instruct"
 > python3 bench/run_offline.py`.
@@ -167,7 +179,18 @@ Pro) show `n/a` for the new column: their OpenRouter calls hit account-level
 HTTP 402 / provider failures on >50% of requests this round, so the sample is
 too small to publish; their old numbers still stand.
-#### Local offline — qwen2.5-coder on Ollama, M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks)
+#### Local offline — Qwen3-Coder GGUF recommendation, Qwen2.5 legacy baseline
+For local OpenAI-compatible servers, prefer the Qwen3-Coder GGUF builds when
+the machine can host MoE weights:
+| Slot | Recommended local weights | Notes |
+|---|---|---|
+| Efficient coder | `unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF` | Primary local successor for the 3B-active slot |
+| High-ceiling coder | `unsloth/Qwen3-Coder-Next-GGUF` | 24 GB GPU-class successor for long-context work |
+The last fully offline fallback baseline remains qwen2.5-coder on Ollama,
+M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks):
 | Model | Without simplicio | With simplicio | Gain |
 |---|---|---|---|
@@ -180,7 +203,7 @@ too small to publish; their old numbers still stand.
 > `http://localhost:11434/v1` (Ollama's OpenAI-compatible endpoint). A
 > 1.5B-param model running on a 4-year-old laptop reaches **88%** pass-rate
 > with simplicio's contract — same hardware, same model, raw prompt = 32%.
-> Reproduce: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
+> Reproduce the legacy fallback: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
 > BENCH_MODELS="qwen2.5-coder:7b" python3 bench/run_offline.py`.
 #### Tiny models — sub-4B, run on 2026-05-26 (50 runs/side, 260 checks)
@@ -382,6 +405,29 @@ simplicio task "..." --stack angular --target ...
 How it works: simplicio shells out to `claude -p "<prompt>"` (or `codex exec "<prompt>"`) as a subprocess, captures stdout, runs the test loop. The inner CLI authenticates via your existing OAuth session in `~/.claude/` or `~/.codex/`. simplicio sets `SIMPLICIO_HOOK_GUARD=1` in the subprocess env so the inner Claude Code session does **not** re-fire simplicio's own UserPromptSubmit hook (no infinite recursion).
+For orchestrators such as SendSprint, `simplicio task` also has a structured
+contract:
+```bash
+simplicio task "hide Delete button for non-admins" \
+  --stack angular \
+  --target src/app/screen/screen.component.html \
+  --dry-run-task \
+  --json
+simplicio task "front-only task" \
+  --stack angular \
+  --target src/app/screen/screen.component.html \
+  --bound-paths "src/app/**" \
+  --json
+```
+`--dry-run-task` generates the would-be diff/test output without applying or
+testing it. `--json` returns `{task_id, applied, files_changed, tokens_used,
+cost_usd, diff_summary, warnings}`. Repeat `--bound-paths <glob>` to reject
+diffs outside the allowed edit surface; violations are reported in `warnings`
+and the command exits non-zero.
 ### Path 3 example — standalone with API key
 ```bash

{simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/README.md RENAMED

@@ -92,12 +92,25 @@ M1 MacBook (8 GB), five sub-4B tiny models, six frontier 2026 models, and three
 mid-tier 7B–12B open models. Every one gained at least **+14 points** when
 wrapped in simplicio's 6-layer contract.
-#### Hugging Face — Qwen2.5-Coder, re-run on 2026-05-27 (latest mapper, 10 cases/side, 156 checks)
+#### Hugging Face — recommended Qwen3-Coder defaults (HF router)
-First batch of the smaller→larger re-benchmark against the latest
-`simplicio-mapper` artifacts. The 1.5B runs on CPU via `transformers`
-(Hugging Face Inference Providers does not serve it); the 3B and 7B run
-through the HF router (`https://router.huggingface.co/v1`).
+The served Qwen Coder recommendation now uses the Qwen3-Coder MoE family.
+`Qwen/Qwen2.5-Coder-3B-Instruct` and
+`Qwen/Qwen2.5-Coder-7B-Instruct` remain available as legacy fallback models for
+historical comparisons and hardware that cannot host the MoE successors.
+| Slot | Recommended model | Route | Notes |
+|---|---|---|---|
+| Efficient coder | `Qwen/Qwen3-Coder-30B-A3B-Instruct` | HF router | 30B total / ~3B active MoE successor to the 3B slot |
+| High-ceiling coder | `Qwen/Qwen3-Coder-Next` | HF router | 80B total / ~3B active MoE successor to the 7B slot |
+> Reproduce the new default set:
+> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
+> BENCH_MODELS="Qwen/Qwen3-Coder-30B-A3B-Instruct,Qwen/Qwen3-Coder-Next"
+> python3 bench/run_offline.py`.
+Legacy Qwen2.5-Coder baseline, re-run on 2026-05-27 against the latest
+`simplicio-mapper` artifacts (10 cases/side, 156 checks):
 | Model | Without simplicio | With simplicio | Gain |
 |---|---|---|---|
@@ -106,10 +119,9 @@ through the HF router (`https://router.huggingface.co/v1`).
 | **Qwen 2.5 Coder 1.5B** (`Qwen/Qwen2.5-Coder-1.5B-Instruct`, local CPU) | 30% | **92%** | **+62 pts** |
 | **HF avg (3 models · 10 cases · 156 checks)** | **34%** | **94%** | **+60 pts (+172%)** |
-> Monotonic from smaller to larger: pass-rate with simplicio climbs **92% →
-> 94% → 96%** as the model grows, while the raw-prompt baseline stays at
-> **30–38%**. The 1.5B model gains the most (**+62 pts**) — the contract does
-> the heaviest lifting where the model is weakest. Reproduce:
+> Monotonic from smaller to larger in the legacy baseline: pass-rate with
+> simplicio climbs **92% → 94% → 96%** as the model grows, while the raw-prompt
+> baseline stays at **30–38%**. Reproduce the legacy set:
 > `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
 > BENCH_MODELS="local:Qwen/Qwen2.5-Coder-1.5B-Instruct,Qwen/Qwen2.5-Coder-3B-Instruct,Qwen/Qwen2.5-Coder-7B-Instruct"
 > python3 bench/run_offline.py`.
@@ -125,7 +137,18 @@ Pro) show `n/a` for the new column: their OpenRouter calls hit account-level
 HTTP 402 / provider failures on >50% of requests this round, so the sample is
 too small to publish; their old numbers still stand.
-#### Local offline — qwen2.5-coder on Ollama, M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks)
+#### Local offline — Qwen3-Coder GGUF recommendation, Qwen2.5 legacy baseline
+For local OpenAI-compatible servers, prefer the Qwen3-Coder GGUF builds when
+the machine can host MoE weights:
+| Slot | Recommended local weights | Notes |
+|---|---|---|
+| Efficient coder | `unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF` | Primary local successor for the 3B-active slot |
+| High-ceiling coder | `unsloth/Qwen3-Coder-Next-GGUF` | 24 GB GPU-class successor for long-context work |
+The last fully offline fallback baseline remains qwen2.5-coder on Ollama,
+M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks):
 | Model | Without simplicio | With simplicio | Gain |
 |---|---|---|---|
@@ -138,7 +161,7 @@ too small to publish; their old numbers still stand.
 > `http://localhost:11434/v1` (Ollama's OpenAI-compatible endpoint). A
 > 1.5B-param model running on a 4-year-old laptop reaches **88%** pass-rate
 > with simplicio's contract — same hardware, same model, raw prompt = 32%.
-> Reproduce: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
+> Reproduce the legacy fallback: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
 > BENCH_MODELS="qwen2.5-coder:7b" python3 bench/run_offline.py`.
 #### Tiny models — sub-4B, run on 2026-05-26 (50 runs/side, 260 checks)
@@ -340,6 +363,29 @@ simplicio task "..." --stack angular --target ...
 How it works: simplicio shells out to `claude -p "<prompt>"` (or `codex exec "<prompt>"`) as a subprocess, captures stdout, runs the test loop. The inner CLI authenticates via your existing OAuth session in `~/.claude/` or `~/.codex/`. simplicio sets `SIMPLICIO_HOOK_GUARD=1` in the subprocess env so the inner Claude Code session does **not** re-fire simplicio's own UserPromptSubmit hook (no infinite recursion).
+For orchestrators such as SendSprint, `simplicio task` also has a structured
+contract:
+```bash
+simplicio task "hide Delete button for non-admins" \
+  --stack angular \
+  --target src/app/screen/screen.component.html \
+  --dry-run-task \
+  --json
+simplicio task "front-only task" \
+  --stack angular \
+  --target src/app/screen/screen.component.html \
+  --bound-paths "src/app/**" \
+  --json
+```
+`--dry-run-task` generates the would-be diff/test output without applying or
+testing it. `--json` returns `{task_id, applied, files_changed, tokens_used,
+cost_usd, diff_summary, warnings}`. Repeat `--bound-paths <glob>` to reject
+diffs outside the allowed edit surface; violations are reported in `warnings`
+and the command exits non-zero.
 ### Path 3 example — standalone with API key
 ```bash

{simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "simplicio-cli"
-version = "0.4.1"
+version = "0.4.3"
 description = "Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens."
 readme = "README.md"
 license = { text = "MIT" }
@@ -45,8 +45,8 @@ dependencies = [
   "numpy>=1.23",
   "anthropic>=0.30",
   "openai>=1.30",
-  "simplicio-mapper>=0.6.0",
-  "simplicio-prompt>=1.9.0",
+  "simplicio-mapper>=0.6.1",
+  "simplicio-prompt>=1.12.0",
   "httpx>=0.27",
   "orjson>=3.10",
   "diskcache>=5.6",

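The dependency floors above move from `simplicio-mapper>=0.6.0` to `>=0.6.1` and from `simplicio-prompt>=1.9.0` to `>=1.12.0`. A minimal sketch of why such floors need numeric rather than string comparison follows. The helpers `parse_version` and `meets_minimum` are hypothetical names introduced only for illustration, and the sketch assumes plain dotted-integer versions. Real tooling would more likely rely on `packaging.version.Version`.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Hypothetical helper: turn '1.12.0' into (1, 12, 0).

    Assumes plain dotted integers (no pre-release or local suffixes).
    """
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


# Minimums copied from the 0.4.3 side of the diff.
NEW_MINIMUMS = {"simplicio-mapper": "0.6.1", "simplicio-prompt": "1.12.0"}

# The old simplicio-prompt floor no longer satisfies the new one.
old_floor_ok = meets_minimum("1.9.0", NEW_MINIMUMS["simplicio-prompt"])
print(old_floor_ok)  # False

# A naive string comparison gets this wrong, because "9" sorts after "1".
naive_ok = "1.9.0" >= "1.12.0"
print(naive_ok)  # True
```

An environment pinned at the old `simplicio-prompt` floor would therefore need an upgrade before 0.4.3 resolves.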
simplicio_cli-0.4.3/simplicio/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.4.3"

{simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/cli.py RENAMED

@@ -12,6 +12,7 @@ first CLI use instead — the closest equivalent that works on every machine.
 from __future__ import annotations
 import argparse
+import json
 import os
 import sys
 from pathlib import Path
@@ -27,7 +28,8 @@ def maybe_autoinstall(cmd: str | None) -> bool:
         return False
     if cmd in ("init", "detect"):
         return False
-    claude_home = Path.home() / ".claude"
+    home = Path(os.environ["HOME"]) if os.environ.get("HOME") else Path.home()
+    claude_home = home / ".claude"
     if not claude_home.is_dir():
         return False
     hook_path = claude_home / "hooks" / "simplicio-userpromptsubmit.sh"
@@ -50,7 +52,7 @@ def maybe_autoinstall(cmd: str | None) -> bool:
     return False
-def main():
+def main(argv=None):
     ap = argparse.ArgumentParser(prog="simplicio")
     sub = ap.add_subparsers(dest="cmd", required=True)
@@ -63,6 +65,12 @@ def main():
     pt.add_argument("--target", required=True)
     pt.add_argument("--criteria", default="- true state\n- false state")
     pt.add_argument("--constraints", default="- build passes")
+    pt.add_argument("--dry-run-task", action="store_true",
+                    help="generate the would-be task output without applying/testing")
+    pt.add_argument("--json", action="store_true",
+                    help="emit stable structured task output")
+    pt.add_argument("--bound-paths", action="append", default=[],
+                    help="glob limiting which paths the task may change; repeatable")
     pb = sub.add_parser("bench", help="compare with vs without (real numbers)")
@@ -81,7 +89,7 @@ def main():
     p_det.add_argument("--quiet", action="store_true")
     p_det.add_argument("--json", action="store_true")
-    a = ap.parse_args()
+    a = ap.parse_args(argv)
     maybe_autoinstall(a.cmd)
     if a.cmd == "index":
         from .precedent import index_repo
@@ -113,8 +121,30 @@ def main():
             argv += ["--json"]
         return detect_main(argv)
     else:
-        from .pipeline import run
-        run(a.root, a.stack, a.goal, a.target, a.criteria, a.constraints)
+        from .pipeline import run, run_task
+        if a.json or a.dry_run_task:
+            result = run_task(
+                a.root,
+                a.stack,
+                a.goal,
+                a.target,
+                a.criteria,
+                a.constraints,
+                dry_run_task=a.dry_run_task,
+                bound_paths=a.bound_paths,
+                quiet=a.json,
+            )
+            if a.json:
+                print(json.dumps(result, sort_keys=True))
+            else:
+                status = "DRY-RUN" if a.dry_run_task else "DONE"
+                print(f"{status}: {result['diff_summary']}")
+                for warning in result["warnings"]:
+                    print(f"warning: {warning}", file=sys.stderr)
+            return 0 if (a.dry_run_task or result["applied"]) else 1
+        run(a.root, a.stack, a.goal, a.target, a.criteria, a.constraints,
+            bound_paths=a.bound_paths)
+        return 0
 if __name__ == "__main__":
     main()
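
The three new `task` flags and the return-code rule in this hunk can be exercised in isolation. The sketch below is a standalone approximation, not the package's parser. `build_task_parser` and `exit_code` are hypothetical names, only `--target` and the three added flags are reproduced, and `exit_code` mirrors the expression `0 if (a.dry_run_task or result["applied"]) else 1` from the hunk.

```python
import argparse


def build_task_parser() -> argparse.ArgumentParser:
    # Hypothetical reduced parser: --target plus the three flags added in 0.4.3.
    ap = argparse.ArgumentParser(prog="simplicio-sketch")
    ap.add_argument("--target", required=True)
    ap.add_argument("--dry-run-task", action="store_true")
    ap.add_argument("--json", action="store_true")
    # action="append" makes the flag repeatable; each use adds one glob.
    ap.add_argument("--bound-paths", action="append", default=[])
    return ap


def exit_code(dry_run_task: bool, applied: bool) -> int:
    # Mirrors the structured branch: a dry run always maps to 0,
    # otherwise the code is 0 only when the change was applied.
    return 0 if (dry_run_task or applied) else 1


args = build_task_parser().parse_args([
    "--target", "src/app/screen/screen.component.html",
    "--bound-paths", "src/app/**",
    "--bound-paths", "docs/**",
    "--json",
])
print(args.bound_paths)
print(exit_code(args.dry_run_task, applied=False))
```

Read this way, the hunk suggests that a dry run returns 0 even when `warnings` lists a bound-path violation, and that the branch without `--json` or `--dry-run-task` returns 0 regardless of the result. Whether the process exit status reflects the value returned by `main()` depends on how the entry point wraps it, which this diff does not show.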

{simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/pipeline.py RENAMED

@@ -1,5 +1,6 @@
 """pipeline.py — build -> generate -> validate -> test -> fix (loop)."""
 from dataclasses import dataclass
+import fnmatch
 import os, re, subprocess
 from .observability import estimate_tokens, log_run
 from .prompt import build_prompt
@@ -18,7 +19,40 @@ class FailureClassification:
     kind: str
     guidance: str
-def validate_generated_output(output):
+def extract_changed_files(output):
+    text = output or ""
+    files = []
+    for match in re.finditer(r"^diff --git a/(.+?) b/(.+?)$", text, flags=re.M):
+        files.append(match.group(2).strip())
+    for match in re.finditer(r"^\+\+\+ b/(.+?)$", text, flags=re.M):
+        files.append(match.group(1).strip())
+    return list(dict.fromkeys(f for f in files if f and f != "/dev/null"))
+def _matches_bound(path, patterns):
+    normalized = path.replace(os.sep, "/").lstrip("./")
+    for raw in patterns or []:
+        pattern = str(raw).replace(os.sep, "/").lstrip("./")
+        if fnmatch.fnmatch(normalized, pattern):
+            return True
+        if pattern.endswith("/**"):
+            prefix = pattern[:-3].rstrip("/")
+            if normalized == prefix or normalized.startswith(f"{prefix}/"):
+                return True
+    return False
+def _bound_path_warnings(files, bound_paths):
+    if not bound_paths:
+        return []
+    outside = [path for path in files if not _matches_bound(path, bound_paths)]
+    if not outside:
+        return []
+    return [
+        "diff touches path outside bound paths: "
+        + ", ".join(outside)
+        + f" (allowed: {', '.join(bound_paths)})"
+    ]
+def validate_generated_output(output, bound_paths=None):
     text = output or ""
     hints = []
     has_diff = bool(re.search(r"^diff --git |^--- .+\n\+\+\+ ", text, flags=re.M))
@@ -29,6 +63,7 @@ def validate_generated_output(output):
         hints.append("include a TEST block or concrete test code")
     if re.search(r"(?i)\b(pseudocode|placeholder|todo: implement)\b", text):
         hints.append("replace placeholders with executable code")
+    hints.extend(_bound_path_warnings(extract_changed_files(output), bound_paths))
     return ValidationResult(
         ok=not hints,
         reason="ok" if not hints else "; ".join(hints),
@@ -64,10 +99,10 @@ def build_retry_feedback(attempt, validation=None, test_log=""):
     lines.append("Return the full corrected DIFF + TEST block only.")
     return "\n".join(lines)
-def _apply_and_test(output, root):
+def _apply_and_test(output, root, bound_paths=None):
     os.makedirs(os.path.join(root, ".simplicio"), exist_ok=True)
     open(os.path.join(root, ".simplicio/last_output.txt"), "w").write(output or "")
-    validation = validate_generated_output(output)
+    validation = validate_generated_output(output, bound_paths)
     if not validation.ok:
         return False, f"pre-apply validation failed: {validation.reason}"
     # PLUG: extract diff -> git apply; extract test. Here we run the test command.
@@ -75,13 +110,47 @@ def _apply_and_test(output, root):
     p = subprocess.run(cmd, shell=True, cwd=root, capture_output=True, text=True)
     return p.returncode == 0, (p.stdout + p.stderr)[-2000:]
-def run(root, stack, goal, target, criteria, constraints):
+def _diff_summary(files_changed):
+    if not files_changed:
+        return "no changed files reported"
+    return "changed " + ", ".join(files_changed)
+def _task_result(task_id, prompt, output, *, applied, warnings=None):
+    files_changed = extract_changed_files(output)
+    return {
+        "task_id": task_id,
+        "applied": bool(applied),
+        "files_changed": files_changed,
+        "tokens_used": {
+            "prompt": estimate_tokens(prompt),
+            "completion": estimate_tokens(output or ""),
+        },
+        "cost_usd": 0.0,
+        "diff_summary": _diff_summary(files_changed),
+        "warnings": warnings or [],
+    }
+def run_task(root, stack, goal, target, criteria, constraints, *,
+             dry_run_task=False, bound_paths=None, quiet=False):
     prompt = build_prompt(root, stack, goal, target, criteria, constraints)
+    if dry_run_task:
+        output = generate(prompt)
+        validation = validate_generated_output(output, bound_paths)
+        warnings = [] if validation.ok else [validation.reason]
+        return _task_result(target, prompt, output, applied=False, warnings=warnings)
     feedback = None
+    last_output = ""
+    last_validation = None
+    last_log = ""
     for t in range(1, MAX_ATTEMPTS + 1):
-        print(f"--- attempt {t} (provider={os.environ.get('SIMPLICIO_PROVIDER','claude')}) ---")
+        if not quiet:
+            print(f"--- attempt {t} (provider={os.environ.get('SIMPLICIO_PROVIDER','claude')}) ---")
         output = generate(prompt, feedback)
-        ok, log = _apply_and_test(output, root)
+        last_output = output or ""
+        last_validation = validate_generated_output(output, bound_paths)
+        ok, log = _apply_and_test(output, root, bound_paths)
+        last_log = log
         log_run(root, {
             "mode": "pipeline",
             "attempt": t,
@@ -92,9 +161,24 @@ def run(root, stack, goal, target, criteria, constraints):
             "stack": stack,
         })
         if ok:
-            print("PASSED the contract. DONE.")
-            return output
-        print("failed:", log[:300])
-        feedback = build_retry_feedback(t + 1, validate_generated_output(output), log)
-    print("attempts exhausted — manual review needed.")
+            if not quiet:
+                print("PASSED the contract. DONE.")
+            return _task_result(target, prompt, output, applied=True)
+        if not quiet:
+            print("failed:", log[:300])
+        feedback = build_retry_feedback(t + 1, last_validation, log)
+    if not quiet:
+        print("attempts exhausted — manual review needed.")
+    warnings = []
+    if last_validation and not last_validation.ok:
+        warnings.append(last_validation.reason)
+    elif last_log:
+        warnings.append(last_log[:500])
+    return _task_result(target, prompt, last_output, applied=False, warnings=warnings)
+def run(root, stack, goal, target, criteria, constraints, bound_paths=None):
+    result = run_task(root, stack, goal, target, criteria, constraints,
+                      bound_paths=bound_paths)
+    if result["applied"]:
+        return result
     return None
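
The bound-path check added in this file can be tried outside the pipeline. The sketch below copies the bodies of `extract_changed_files` and `_matches_bound` from the hunks above (the latter renamed `matches_bound` here) and runs them on a made-up two-file diff. The sample diff and its paths are illustrative assumptions, not output from the tool.

```python
import fnmatch
import os
import re


def extract_changed_files(output):
    # Copied from the hunk: collect "b/" paths from diff headers,
    # de-duplicated while preserving first-seen order.
    text = output or ""
    files = []
    for match in re.finditer(r"^diff --git a/(.+?) b/(.+?)$", text, flags=re.M):
        files.append(match.group(2).strip())
    for match in re.finditer(r"^\+\+\+ b/(.+?)$", text, flags=re.M):
        files.append(match.group(1).strip())
    return list(dict.fromkeys(f for f in files if f and f != "/dev/null"))


def matches_bound(path, patterns):
    # Copied from _matches_bound in the hunk.
    normalized = path.replace(os.sep, "/").lstrip("./")
    for raw in patterns or []:
        pattern = str(raw).replace(os.sep, "/").lstrip("./")
        if fnmatch.fnmatch(normalized, pattern):
            return True
        if pattern.endswith("/**"):
            prefix = pattern[:-3].rstrip("/")
            if normalized == prefix or normalized.startswith(f"{prefix}/"):
                return True
    return False


# Illustrative model output touching one allowed and one disallowed path.
SAMPLE = """\
diff --git a/src/app/screen/screen.component.html b/src/app/screen/screen.component.html
--- a/src/app/screen/screen.component.html
+++ b/src/app/screen/screen.component.html
@@ -1 +1 @@
-<button>Delete</button>
+<button *ngIf="isAdmin">Delete</button>
diff --git a/server/routes.py b/server/routes.py
--- a/server/routes.py
+++ b/server/routes.py
@@ -1 +1 @@
-old
+new
"""

files = extract_changed_files(SAMPLE)
outside = [f for f in files if not matches_bound(f, ["src/app/**"])]
print(files)
print(outside)
```

One detail that may be worth checking against the intended behavior: `str.lstrip("./")` removes any run of leading `.` and `/` characters, not just a `./` prefix. With the code as copied, `.github/workflows/ci.yml` is normalized to `github/workflows/ci.yml`, so a bound of `github/**` would accept it.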

{simplicio_cli-0.4.1 → simplicio_cli-0.4.3/simplicio_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: simplicio-cli
-Version: 0.4.1
+Version: 0.4.3
 Summary: Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens.
 Author-email: Wesley Simplicio <wesleybob4@gmail.com>
 License: MIT
@@ -31,8 +31,8 @@ Requires-Dist: sentence-transformers>=2.2
 Requires-Dist: numpy>=1.23
 Requires-Dist: anthropic>=0.30
 Requires-Dist: openai>=1.30
-Requires-Dist: simplicio-mapper>=0.6.0
-Requires-Dist: simplicio-prompt>=1.9.0
+Requires-Dist: simplicio-mapper>=0.6.1
+Requires-Dist: simplicio-prompt>=1.12.0
 Requires-Dist: httpx>=0.27
 Requires-Dist: orjson>=3.10
 Requires-Dist: diskcache>=5.6
@@ -134,12 +134,25 @@ M1 MacBook (8 GB), five sub-4B tiny models, six frontier 2026 models, and three
 mid-tier 7B–12B open models. Every one gained at least **+14 points** when
 wrapped in simplicio's 6-layer contract.
-#### Hugging Face — Qwen2.5-Coder, re-run on 2026-05-27 (latest mapper, 10 cases/side, 156 checks)
+#### Hugging Face — recommended Qwen3-Coder defaults (HF router)
-First batch of the smaller→larger re-benchmark against the latest
-`simplicio-mapper` artifacts. The 1.5B runs on CPU via `transformers`
-(Hugging Face Inference Providers does not serve it); the 3B and 7B run
-through the HF router (`https://router.huggingface.co/v1`).
+The served Qwen Coder recommendation now uses the Qwen3-Coder MoE family.
+`Qwen/Qwen2.5-Coder-3B-Instruct` and
+`Qwen/Qwen2.5-Coder-7B-Instruct` remain available as legacy fallback models for
+historical comparisons and hardware that cannot host the MoE successors.
+| Slot | Recommended model | Route | Notes |
+|---|---|---|---|
+| Efficient coder | `Qwen/Qwen3-Coder-30B-A3B-Instruct` | HF router | 30B total / ~3B active MoE successor to the 3B slot |
+| High-ceiling coder | `Qwen/Qwen3-Coder-Next` | HF router | 80B total / ~3B active MoE successor to the 7B slot |
+> Reproduce the new default set:
+> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
+> BENCH_MODELS="Qwen/Qwen3-Coder-30B-A3B-Instruct,Qwen/Qwen3-Coder-Next"
+> python3 bench/run_offline.py`.
+Legacy Qwen2.5-Coder baseline, re-run on 2026-05-27 against the latest
+`simplicio-mapper` artifacts (10 cases/side, 156 checks):
 | Model | Without simplicio | With simplicio | Gain |
 |---|---|---|---|
@@ -148,10 +161,9 @@ through the HF router (`https://router.huggingface.co/v1`).
 | **Qwen 2.5 Coder 1.5B** (`Qwen/Qwen2.5-Coder-1.5B-Instruct`, local CPU) | 30% | **92%** | **+62 pts** |
 | **HF avg (3 models · 10 cases · 156 checks)** | **34%** | **94%** | **+60 pts (+172%)** |
-> Monotonic from smaller to larger: pass-rate with simplicio climbs **92% →
-> 94% → 96%** as the model grows, while the raw-prompt baseline stays at
-> **30–38%**. The 1.5B model gains the most (**+62 pts**) — the contract does
-> the heaviest lifting where the model is weakest. Reproduce:
+> Monotonic from smaller to larger in the legacy baseline: pass-rate with
+> simplicio climbs **92% → 94% → 96%** as the model grows, while the raw-prompt
+> baseline stays at **30–38%**. Reproduce the legacy set:
 > `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
 > BENCH_MODELS="local:Qwen/Qwen2.5-Coder-1.5B-Instruct,Qwen/Qwen2.5-Coder-3B-Instruct,Qwen/Qwen2.5-Coder-7B-Instruct"
 > python3 bench/run_offline.py`.
@@ -167,7 +179,18 @@ Pro) show `n/a` for the new column: their OpenRouter calls hit account-level
 HTTP 402 / provider failures on >50% of requests this round, so the sample is
 too small to publish; their old numbers still stand.
-#### Local offline — qwen2.5-coder on Ollama, M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks)
+#### Local offline — Qwen3-Coder GGUF recommendation, Qwen2.5 legacy baseline
+For local OpenAI-compatible servers, prefer the Qwen3-Coder GGUF builds when
+the machine can host MoE weights:
+| Slot | Recommended local weights | Notes |
+|---|---|---|
+| Efficient coder | `unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF` | Primary local successor for the 3B-active slot |
+| High-ceiling coder | `unsloth/Qwen3-Coder-Next-GGUF` | 24 GB GPU-class successor for long-context work |
+The last fully offline fallback baseline remains qwen2.5-coder on Ollama,
+M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks):
 | Model | Without simplicio | With simplicio | Gain |
 |---|---|---|---|
@@ -180,7 +203,7 @@ too small to publish; their old numbers still stand.
 > `http://localhost:11434/v1` (Ollama's OpenAI-compatible endpoint). A
 > 1.5B-param model running on a 4-year-old laptop reaches **88%** pass-rate
 > with simplicio's contract — same hardware, same model, raw prompt = 32%.
-> Reproduce: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
+> Reproduce the legacy fallback: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
 > BENCH_MODELS="qwen2.5-coder:7b" python3 bench/run_offline.py`.
 #### Tiny models — sub-4B, run on 2026-05-26 (50 runs/side, 260 checks)
@@ -382,6 +405,29 @@ simplicio task "..." --stack angular --target ...
 How it works: simplicio shells out to `claude -p "<prompt>"` (or `codex exec "<prompt>"`) as a subprocess, captures stdout, runs the test loop. The inner CLI authenticates via your existing OAuth session in `~/.claude/` or `~/.codex/`. simplicio sets `SIMPLICIO_HOOK_GUARD=1` in the subprocess env so the inner Claude Code session does **not** re-fire simplicio's own UserPromptSubmit hook (no infinite recursion).
+For orchestrators such as SendSprint, `simplicio task` also has a structured
+contract:
+```bash
+simplicio task "hide Delete button for non-admins" \
+  --stack angular \
+  --target src/app/screen/screen.component.html \
+  --dry-run-task \
+  --json
+simplicio task "front-only task" \
+  --stack angular \
+  --target src/app/screen/screen.component.html \
+  --bound-paths "src/app/**" \
+  --json
+```
+`--dry-run-task` generates the would-be diff/test output without applying or
+testing it. `--json` returns `{task_id, applied, files_changed, tokens_used,
+cost_usd, diff_summary, warnings}`. Repeat `--bound-paths <glob>` to reject
+diffs outside the allowed edit surface; violations are reported in `warnings`
+and the command exits non-zero.
 ### Path 3 example — standalone with API key
 ```bash

{simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio_cli.egg-info/requires.txt RENAMED

@@ -2,8 +2,8 @@ sentence-transformers>=2.2
 numpy>=1.23
 anthropic>=0.30
 openai>=1.30
-simplicio-mapper>=0.6.0
-simplicio-prompt>=1.9.0
+simplicio-mapper>=0.6.1
+simplicio-prompt>=1.12.0
 httpx>=0.27
 orjson>=3.10
 diskcache>=5.6