simplicio-cli 0.4.1__tar.gz → 0.4.3__tar.gz
- {simplicio_cli-0.4.1/simplicio_cli.egg-info → simplicio_cli-0.4.3}/PKG-INFO +60 -14
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/README.md +57 -11
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/pyproject.toml +3 -3
- simplicio_cli-0.4.3/simplicio/__init__.py +1 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/cli.py +35 -5
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/pipeline.py +95 -11
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3/simplicio_cli.egg-info}/PKG-INFO +60 -14
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio_cli.egg-info/requires.txt +2 -2
- simplicio_cli-0.4.1/simplicio/__init__.py +0 -1
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/LICENSE +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/setup.cfg +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/adaptive.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/bench.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/cache.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/detect.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/init.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/mapper.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/observability.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/precedent.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/prompt.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/providers.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/skill_router.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/templates/SKILL.md +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/templates/simplicio_prompt.md +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/templates/userpromptsubmit-hook.sh +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/utils/__init__.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/utils/cache.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/utils/http_client.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio/utils/serialization.py +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio_cli.egg-info/SOURCES.txt +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio_cli.egg-info/dependency_links.txt +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio_cli.egg-info/entry_points.txt +0 -0
- {simplicio_cli-0.4.1 → simplicio_cli-0.4.3}/simplicio_cli.egg-info/top_level.txt +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: simplicio-cli
|
|
3
|
-
Version: 0.4.
|
|
3
|
+
Version: 0.4.3
|
|
4
4
|
Summary: Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens.
|
|
5
5
|
Author-email: Wesley Simplicio <wesleybob4@gmail.com>
|
|
6
6
|
License: MIT
|
|
@@ -31,8 +31,8 @@ Requires-Dist: sentence-transformers>=2.2
|
|
|
31
31
|
Requires-Dist: numpy>=1.23
|
|
32
32
|
Requires-Dist: anthropic>=0.30
|
|
33
33
|
Requires-Dist: openai>=1.30
|
|
34
|
-
Requires-Dist: simplicio-mapper>=0.6.
|
|
35
|
-
Requires-Dist: simplicio-prompt>=1.
|
|
34
|
+
Requires-Dist: simplicio-mapper>=0.6.1
|
|
35
|
+
Requires-Dist: simplicio-prompt>=1.12.0
|
|
36
36
|
Requires-Dist: httpx>=0.27
|
|
37
37
|
Requires-Dist: orjson>=3.10
|
|
38
38
|
Requires-Dist: diskcache>=5.6
|
|
@@ -134,12 +134,25 @@ M1 MacBook (8 GB), five sub-4B tiny models, six frontier 2026 models, and three
|
|
|
134
134
|
mid-tier 7B–12B open models. Every one gained at least **+14 points** when
|
|
135
135
|
wrapped in simplicio's 6-layer contract.
|
|
136
136
|
|
|
137
|
-
#### Hugging Face —
|
|
137
|
+
#### Hugging Face — recommended Qwen3-Coder defaults (HF router)
|
|
138
138
|
|
|
139
|
-
|
|
140
|
-
`
|
|
141
|
-
|
|
142
|
-
|
|
139
|
+
The served Qwen Coder recommendation now uses the Qwen3-Coder MoE family.
|
|
140
|
+
`Qwen/Qwen2.5-Coder-3B-Instruct` and
|
|
141
|
+
`Qwen/Qwen2.5-Coder-7B-Instruct` remain available as legacy fallback models for
|
|
142
|
+
historical comparisons and hardware that cannot host the MoE successors.
|
|
143
|
+
|
|
144
|
+
| Slot | Recommended model | Route | Notes |
|
|
145
|
+
|---|---|---|---|
|
|
146
|
+
| Efficient coder | `Qwen/Qwen3-Coder-30B-A3B-Instruct` | HF router | 30B total / ~3B active MoE successor to the 3B slot |
|
|
147
|
+
| High-ceiling coder | `Qwen/Qwen3-Coder-Next` | HF router | 80B total / ~3B active MoE successor to the 7B slot |
|
|
148
|
+
|
|
149
|
+
> Reproduce the new default set:
|
|
150
|
+
> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
|
|
151
|
+
> BENCH_MODELS="Qwen/Qwen3-Coder-30B-A3B-Instruct,Qwen/Qwen3-Coder-Next"
|
|
152
|
+
> python3 bench/run_offline.py`.
|
|
153
|
+
|
|
154
|
+
Legacy Qwen2.5-Coder baseline, re-run on 2026-05-27 against the latest
|
|
155
|
+
`simplicio-mapper` artifacts (10 cases/side, 156 checks):
|
|
143
156
|
|
|
144
157
|
| Model | Without simplicio | With simplicio | Gain |
|
|
145
158
|
|---|---|---|---|
|
|
@@ -148,10 +161,9 @@ through the HF router (`https://router.huggingface.co/v1`).
|
|
|
148
161
|
| **Qwen 2.5 Coder 1.5B** (`Qwen/Qwen2.5-Coder-1.5B-Instruct`, local CPU) | 30% | **92%** | **+62 pts** |
|
|
149
162
|
| **HF avg (3 models · 10 cases · 156 checks)** | **34%** | **94%** | **+60 pts (+172%)** |
|
|
150
163
|
|
|
151
|
-
> Monotonic from smaller to larger: pass-rate with
|
|
152
|
-
> 94% → 96%** as the model grows, while the raw-prompt
|
|
153
|
-
> **30–38%**.
|
|
154
|
-
> the heaviest lifting where the model is weakest. Reproduce:
|
|
164
|
+
> Monotonic from smaller to larger in the legacy baseline: pass-rate with
|
|
165
|
+
> simplicio climbs **92% → 94% → 96%** as the model grows, while the raw-prompt
|
|
166
|
+
> baseline stays at **30–38%**. Reproduce the legacy set:
|
|
155
167
|
> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
|
|
156
168
|
> BENCH_MODELS="local:Qwen/Qwen2.5-Coder-1.5B-Instruct,Qwen/Qwen2.5-Coder-3B-Instruct,Qwen/Qwen2.5-Coder-7B-Instruct"
|
|
157
169
|
> python3 bench/run_offline.py`.
|
|
@@ -167,7 +179,18 @@ Pro) show `n/a` for the new column: their OpenRouter calls hit account-level
|
|
|
167
179
|
HTTP 402 / provider failures on >50% of requests this round, so the sample is
|
|
168
180
|
too small to publish; their old numbers still stand.
|
|
169
181
|
|
|
170
|
-
#### Local offline —
|
|
182
|
+
#### Local offline — Qwen3-Coder GGUF recommendation, Qwen2.5 legacy baseline
|
|
183
|
+
|
|
184
|
+
For local OpenAI-compatible servers, prefer the Qwen3-Coder GGUF builds when
|
|
185
|
+
the machine can host MoE weights:
|
|
186
|
+
|
|
187
|
+
| Slot | Recommended local weights | Notes |
|
|
188
|
+
|---|---|---|
|
|
189
|
+
| Efficient coder | `unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF` | Primary local successor for the 3B-active slot |
|
|
190
|
+
| High-ceiling coder | `unsloth/Qwen3-Coder-Next-GGUF` | 24 GB GPU-class successor for long-context work |
|
|
191
|
+
|
|
192
|
+
The last fully offline fallback baseline remains qwen2.5-coder on Ollama,
|
|
193
|
+
M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks):
|
|
171
194
|
|
|
172
195
|
| Model | Without simplicio | With simplicio | Gain |
|
|
173
196
|
|---|---|---|---|
|
|
@@ -180,7 +203,7 @@ too small to publish; their old numbers still stand.
|
|
|
180
203
|
> `http://localhost:11434/v1` (Ollama's OpenAI-compatible endpoint). A
|
|
181
204
|
> 1.5B-param model running on a 4-year-old laptop reaches **88%** pass-rate
|
|
182
205
|
> with simplicio's contract — same hardware, same model, raw prompt = 32%.
|
|
183
|
-
> Reproduce: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
|
|
206
|
+
> Reproduce the legacy fallback: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
|
|
184
207
|
> BENCH_MODELS="qwen2.5-coder:7b" python3 bench/run_offline.py`.
|
|
185
208
|
|
|
186
209
|
#### Tiny models — sub-4B, run on 2026-05-26 (50 runs/side, 260 checks)
|
|
@@ -382,6 +405,29 @@ simplicio task "..." --stack angular --target ...
|
|
|
382
405
|
|
|
383
406
|
How it works: simplicio shells out to `claude -p "<prompt>"` (or `codex exec "<prompt>"`) as a subprocess, captures stdout, runs the test loop. The inner CLI authenticates via your existing OAuth session in `~/.claude/` or `~/.codex/`. simplicio sets `SIMPLICIO_HOOK_GUARD=1` in the subprocess env so the inner Claude Code session does **not** re-fire simplicio's own UserPromptSubmit hook (no infinite recursion).
|
|
384
407
|
|
|
408
|
+
For orchestrators such as SendSprint, `simplicio task` also has a structured
|
|
409
|
+
contract:
|
|
410
|
+
|
|
411
|
+
```bash
|
|
412
|
+
simplicio task "hide Delete button for non-admins" \
|
|
413
|
+
--stack angular \
|
|
414
|
+
--target src/app/screen/screen.component.html \
|
|
415
|
+
--dry-run-task \
|
|
416
|
+
--json
|
|
417
|
+
|
|
418
|
+
simplicio task "front-only task" \
|
|
419
|
+
--stack angular \
|
|
420
|
+
--target src/app/screen/screen.component.html \
|
|
421
|
+
--bound-paths "src/app/**" \
|
|
422
|
+
--json
|
|
423
|
+
```
|
|
424
|
+
|
|
425
|
+
`--dry-run-task` generates the would-be diff/test output without applying or
|
|
426
|
+
testing it. `--json` returns `{task_id, applied, files_changed, tokens_used,
|
|
427
|
+
cost_usd, diff_summary, warnings}`. Repeat `--bound-paths <glob>` to reject
|
|
428
|
+
diffs outside the allowed edit surface; violations are reported in `warnings`
|
|
429
|
+
and the command exits non-zero.
|
|
430
|
+
|
|
385
431
|
### Path 3 example — standalone with API key
|
|
386
432
|
|
|
387
433
|
```bash
|
|
@@ -92,12 +92,25 @@ M1 MacBook (8 GB), five sub-4B tiny models, six frontier 2026 models, and three
|
|
|
92
92
|
mid-tier 7B–12B open models. Every one gained at least **+14 points** when
|
|
93
93
|
wrapped in simplicio's 6-layer contract.
|
|
94
94
|
|
|
95
|
-
#### Hugging Face —
|
|
95
|
+
#### Hugging Face — recommended Qwen3-Coder defaults (HF router)
|
|
96
96
|
|
|
97
|
-
|
|
98
|
-
`
|
|
99
|
-
|
|
100
|
-
|
|
97
|
+
The served Qwen Coder recommendation now uses the Qwen3-Coder MoE family.
|
|
98
|
+
`Qwen/Qwen2.5-Coder-3B-Instruct` and
|
|
99
|
+
`Qwen/Qwen2.5-Coder-7B-Instruct` remain available as legacy fallback models for
|
|
100
|
+
historical comparisons and hardware that cannot host the MoE successors.
|
|
101
|
+
|
|
102
|
+
| Slot | Recommended model | Route | Notes |
|
|
103
|
+
|---|---|---|---|
|
|
104
|
+
| Efficient coder | `Qwen/Qwen3-Coder-30B-A3B-Instruct` | HF router | 30B total / ~3B active MoE successor to the 3B slot |
|
|
105
|
+
| High-ceiling coder | `Qwen/Qwen3-Coder-Next` | HF router | 80B total / ~3B active MoE successor to the 7B slot |
|
|
106
|
+
|
|
107
|
+
> Reproduce the new default set:
|
|
108
|
+
> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
|
|
109
|
+
> BENCH_MODELS="Qwen/Qwen3-Coder-30B-A3B-Instruct,Qwen/Qwen3-Coder-Next"
|
|
110
|
+
> python3 bench/run_offline.py`.
|
|
111
|
+
|
|
112
|
+
Legacy Qwen2.5-Coder baseline, re-run on 2026-05-27 against the latest
|
|
113
|
+
`simplicio-mapper` artifacts (10 cases/side, 156 checks):
|
|
101
114
|
|
|
102
115
|
| Model | Without simplicio | With simplicio | Gain |
|
|
103
116
|
|---|---|---|---|
|
|
@@ -106,10 +119,9 @@ through the HF router (`https://router.huggingface.co/v1`).
|
|
|
106
119
|
| **Qwen 2.5 Coder 1.5B** (`Qwen/Qwen2.5-Coder-1.5B-Instruct`, local CPU) | 30% | **92%** | **+62 pts** |
|
|
107
120
|
| **HF avg (3 models · 10 cases · 156 checks)** | **34%** | **94%** | **+60 pts (+172%)** |
|
|
108
121
|
|
|
109
|
-
> Monotonic from smaller to larger: pass-rate with
|
|
110
|
-
> 94% → 96%** as the model grows, while the raw-prompt
|
|
111
|
-
> **30–38%**.
|
|
112
|
-
> the heaviest lifting where the model is weakest. Reproduce:
|
|
122
|
+
> Monotonic from smaller to larger in the legacy baseline: pass-rate with
|
|
123
|
+
> simplicio climbs **92% → 94% → 96%** as the model grows, while the raw-prompt
|
|
124
|
+
> baseline stays at **30–38%**. Reproduce the legacy set:
|
|
113
125
|
> `BENCH_BASE_URL=https://router.huggingface.co/v1 BENCH_API_KEY=<hf-token>
|
|
114
126
|
> BENCH_MODELS="local:Qwen/Qwen2.5-Coder-1.5B-Instruct,Qwen/Qwen2.5-Coder-3B-Instruct,Qwen/Qwen2.5-Coder-7B-Instruct"
|
|
115
127
|
> python3 bench/run_offline.py`.
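The Gain column reports percentage points, and the average row adds a relative change in parentheses. As a worked sketch of how the two quantities differ, the snippet below uses only the 1.5B row shown above (30% without, 92% with). The function name `gains` is an illustrative assumption, not part of the benchmark scripts.

```python
def gains(without_pct: float, with_pct: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent)."""
    points = with_pct - without_pct
    # Relative gain is measured against the raw-prompt pass rate.
    relative = points / without_pct * 100.0
    return points, relative


points, relative = gains(30.0, 92.0)
print(f"+{points:.0f} pts ({relative:+.0f}%)")
```

Percentage points stay comparable across models, while the relative figure grows quickly when the raw-prompt baseline is low.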
|
|
@@ -125,7 +137,18 @@ Pro) show `n/a` for the new column: their OpenRouter calls hit account-level
|
|
|
125
137
|
HTTP 402 / provider failures on >50% of requests this round, so the sample is
|
|
126
138
|
too small to publish; their old numbers still stand.
|
|
127
139
|
|
|
128
|
-
#### Local offline —
|
|
140
|
+
#### Local offline — Qwen3-Coder GGUF recommendation, Qwen2.5 legacy baseline
|
|
141
|
+
|
|
142
|
+
For local OpenAI-compatible servers, prefer the Qwen3-Coder GGUF builds when
|
|
143
|
+
the machine can host MoE weights:
|
|
144
|
+
|
|
145
|
+
| Slot | Recommended local weights | Notes |
|
|
146
|
+
|---|---|---|
|
|
147
|
+
| Efficient coder | `unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF` | Primary local successor for the 3B-active slot |
|
|
148
|
+
| High-ceiling coder | `unsloth/Qwen3-Coder-Next-GGUF` | 24 GB GPU-class successor for long-context work |
|
|
149
|
+
|
|
150
|
+
The last fully offline fallback baseline remains qwen2.5-coder on Ollama,
|
|
151
|
+
M1 8 GB, run on 2026-05-27 (30 runs/side, 156 checks):
|
|
129
152
|
|
|
130
153
|
| Model | Without simplicio | With simplicio | Gain |
|
|
131
154
|
|---|---|---|---|
|
|
@@ -138,7 +161,7 @@ too small to publish; their old numbers still stand.
|
|
|
138
161
|
> `http://localhost:11434/v1` (Ollama's OpenAI-compatible endpoint). A
|
|
139
162
|
> 1.5B-param model running on a 4-year-old laptop reaches **88%** pass-rate
|
|
140
163
|
> with simplicio's contract — same hardware, same model, raw prompt = 32%.
|
|
141
|
-
> Reproduce: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
|
|
164
|
+
> Reproduce the legacy fallback: `BENCH_BASE_URL=http://localhost:11434/v1 BENCH_API_KEY=ollama
|
|
142
165
|
> BENCH_MODELS="qwen2.5-coder:7b" python3 bench/run_offline.py`.
|
|
143
166
|
|
|
144
167
|
#### Tiny models — sub-4B, run on 2026-05-26 (50 runs/side, 260 checks)
|
|
@@ -340,6 +363,29 @@ simplicio task "..." --stack angular --target ...
|
|
|
340
363
|
|
|
341
364
|
How it works: simplicio shells out to `claude -p "<prompt>"` (or `codex exec "<prompt>"`) as a subprocess, captures stdout, runs the test loop. The inner CLI authenticates via your existing OAuth session in `~/.claude/` or `~/.codex/`. simplicio sets `SIMPLICIO_HOOK_GUARD=1` in the subprocess env so the inner Claude Code session does **not** re-fire simplicio's own UserPromptSubmit hook (no infinite recursion).
|
|
342
365
|
|
|
366
|
+
For orchestrators such as SendSprint, `simplicio task` also has a structured
|
|
367
|
+
contract:
|
|
368
|
+
|
|
369
|
+
```bash
|
|
370
|
+
simplicio task "hide Delete button for non-admins" \
|
|
371
|
+
--stack angular \
|
|
372
|
+
--target src/app/screen/screen.component.html \
|
|
373
|
+
--dry-run-task \
|
|
374
|
+
--json
|
|
375
|
+
|
|
376
|
+
simplicio task "front-only task" \
|
|
377
|
+
--stack angular \
|
|
378
|
+
--target src/app/screen/screen.component.html \
|
|
379
|
+
--bound-paths "src/app/**" \
|
|
380
|
+
--json
|
|
381
|
+
```
|
|
382
|
+
|
|
383
|
+
`--dry-run-task` generates the would-be diff/test output without applying or
|
|
384
|
+
testing it. `--json` returns `{task_id, applied, files_changed, tokens_used,
|
|
385
|
+
cost_usd, diff_summary, warnings}`. Repeat `--bound-paths <glob>` to reject
|
|
386
|
+
diffs outside the allowed edit surface; violations are reported in `warnings`
|
|
387
|
+
and the command exits non-zero.
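A minimal sketch of how an orchestrator might consume the `--json` contract described above. The helper name `parse_task_result` and the sample payload are illustrative assumptions, not part of simplicio. The key names come from the README text, and `tokens_used` is shown as a `{prompt, completion}` mapping because that is how `pipeline.py` in this release builds it.

```python
import json

# Keys the README lists for `simplicio task --json` output.
EXPECTED_KEYS = {
    "task_id", "applied", "files_changed", "tokens_used",
    "cost_usd", "diff_summary", "warnings",
}


def parse_task_result(stdout: str) -> dict:
    """Parse one JSON document from stdout and check the documented keys."""
    result = json.loads(stdout)
    missing = EXPECTED_KEYS - set(result)
    if missing:
        raise ValueError(f"missing keys in task result: {sorted(missing)}")
    return result


# Hypothetical payload shaped like the contract above (values are made up).
sample = json.dumps({
    "task_id": "src/app/screen/screen.component.html",
    "applied": False,
    "files_changed": ["src/app/screen/screen.component.html"],
    "tokens_used": {"prompt": 1200, "completion": 300},
    "cost_usd": 0.0,
    "diff_summary": "changed src/app/screen/screen.component.html",
    "warnings": [],
})

result = parse_task_result(sample)
# A dry run never applies, so an orchestrator would still route it to review.
needs_review = (not result["applied"]) or bool(result["warnings"])
```

Checking `warnings` as well as the exit status is a cautious choice here, since a caller may not want to rely on the exit code alone.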
|
|
388
|
+
|
|
343
389
|
### Path 3 example — standalone with API key
|
|
344
390
|
|
|
345
391
|
```bash
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
[project]
|
|
2
2
|
name = "simplicio-cli"
|
|
3
|
-
version = "0.4.
|
|
3
|
+
version = "0.4.3"
|
|
4
4
|
description = "Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens."
|
|
5
5
|
readme = "README.md"
|
|
6
6
|
license = { text = "MIT" }
|
|
@@ -45,8 +45,8 @@ dependencies = [
|
|
|
45
45
|
"numpy>=1.23",
|
|
46
46
|
"anthropic>=0.30",
|
|
47
47
|
"openai>=1.30",
|
|
48
|
-
"simplicio-mapper>=0.6.
|
|
49
|
-
"simplicio-prompt>=1.
|
|
48
|
+
"simplicio-mapper>=0.6.1",
|
|
49
|
+
"simplicio-prompt>=1.12.0",
|
|
50
50
|
"httpx>=0.27",
|
|
51
51
|
"orjson>=3.10",
|
|
52
52
|
"diskcache>=5.6",
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
__version__ = "0.4.3"
|
|
@@ -12,6 +12,7 @@ first CLI use instead — the closest equivalent that works on every machine.
|
|
|
12
12
|
from __future__ import annotations
|
|
13
13
|
|
|
14
14
|
import argparse
|
|
15
|
+
import json
|
|
15
16
|
import os
|
|
16
17
|
import sys
|
|
17
18
|
from pathlib import Path
|
|
@@ -27,7 +28,8 @@ def maybe_autoinstall(cmd: str | None) -> bool:
|
|
|
27
28
|
return False
|
|
28
29
|
if cmd in ("init", "detect"):
|
|
29
30
|
return False
|
|
30
|
-
|
|
31
|
+
home = Path(os.environ["HOME"]) if os.environ.get("HOME") else Path.home()
|
|
32
|
+
claude_home = home / ".claude"
|
|
31
33
|
if not claude_home.is_dir():
|
|
32
34
|
return False
|
|
33
35
|
hook_path = claude_home / "hooks" / "simplicio-userpromptsubmit.sh"
|
|
@@ -50,7 +52,7 @@ def maybe_autoinstall(cmd: str | None) -> bool:
|
|
|
50
52
|
return False
|
|
51
53
|
|
|
52
54
|
|
|
53
|
-
def main():
|
|
55
|
+
def main(argv=None):
|
|
54
56
|
ap = argparse.ArgumentParser(prog="simplicio")
|
|
55
57
|
sub = ap.add_subparsers(dest="cmd", required=True)
|
|
56
58
|
|
|
@@ -63,6 +65,12 @@ def main():
|
|
|
63
65
|
pt.add_argument("--target", required=True)
|
|
64
66
|
pt.add_argument("--criteria", default="- true state\n- false state")
|
|
65
67
|
pt.add_argument("--constraints", default="- build passes")
|
|
68
|
+
pt.add_argument("--dry-run-task", action="store_true",
|
|
69
|
+
help="generate the would-be task output without applying/testing")
|
|
70
|
+
pt.add_argument("--json", action="store_true",
|
|
71
|
+
help="emit stable structured task output")
|
|
72
|
+
pt.add_argument("--bound-paths", action="append", default=[],
|
|
73
|
+
help="glob limiting which paths the task may change; repeatable")
|
|
66
74
|
|
|
67
75
|
|
|
68
76
|
pb = sub.add_parser("bench", help="compare with vs without (real numbers)")
|
|
@@ -81,7 +89,7 @@ def main():
|
|
|
81
89
|
p_det.add_argument("--quiet", action="store_true")
|
|
82
90
|
p_det.add_argument("--json", action="store_true")
|
|
83
91
|
|
|
84
|
-
a = ap.parse_args()
|
|
92
|
+
a = ap.parse_args(argv)
|
|
85
93
|
maybe_autoinstall(a.cmd)
|
|
86
94
|
if a.cmd == "index":
|
|
87
95
|
from .precedent import index_repo
|
|
@@ -113,8 +121,30 @@ def main():
|
|
|
113
121
|
argv += ["--json"]
|
|
114
122
|
return detect_main(argv)
|
|
115
123
|
else:
|
|
116
|
-
from .pipeline import run
|
|
117
|
-
|
|
124
|
+
from .pipeline import run, run_task
|
|
125
|
+
if a.json or a.dry_run_task:
|
|
126
|
+
result = run_task(
|
|
127
|
+
a.root,
|
|
128
|
+
a.stack,
|
|
129
|
+
a.goal,
|
|
130
|
+
a.target,
|
|
131
|
+
a.criteria,
|
|
132
|
+
a.constraints,
|
|
133
|
+
dry_run_task=a.dry_run_task,
|
|
134
|
+
bound_paths=a.bound_paths,
|
|
135
|
+
quiet=a.json,
|
|
136
|
+
)
|
|
137
|
+
if a.json:
|
|
138
|
+
print(json.dumps(result, sort_keys=True))
|
|
139
|
+
else:
|
|
140
|
+
status = "DRY-RUN" if a.dry_run_task else "DONE"
|
|
141
|
+
print(f"{status}: {result['diff_summary']}")
|
|
142
|
+
for warning in result["warnings"]:
|
|
143
|
+
print(f"warning: {warning}", file=sys.stderr)
|
|
144
|
+
return 0 if (a.dry_run_task or result["applied"]) else 1
|
|
145
|
+
run(a.root, a.stack, a.goal, a.target, a.criteria, a.constraints,
|
|
146
|
+
bound_paths=a.bound_paths)
|
|
147
|
+
return 0
|
|
118
148
|
|
|
119
149
|
if __name__ == "__main__":
|
|
120
150
|
main()
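Accepting `argv` in `main` makes the parser callable from tests and from other Python code. The sketch below is a stripped-down stand-in, not the real `simplicio.cli`: it declares only the three new flags, and the positional `goal` argument is an assumption because its declaration is outside the hunks shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    ap = argparse.ArgumentParser(prog="simplicio-sketch")
    sub = ap.add_subparsers(dest="cmd", required=True)
    pt = sub.add_parser("task")
    pt.add_argument("goal")  # assumed positional; not shown in the diff
    pt.add_argument("--dry-run-task", action="store_true")
    pt.add_argument("--json", action="store_true")
    # action="append" collects every occurrence of the flag into one list.
    pt.add_argument("--bound-paths", action="append", default=[])
    return ap


def main(argv=None) -> int:
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return 0 if args.cmd == "task" else 1


args = build_parser().parse_args([
    "task", "front-only task",
    "--bound-paths", "src/app/**",
    "--bound-paths", "tests/**",
    "--json",
])
# When run as a script, `raise SystemExit(main())` would turn the integer
# return value into the process exit code.
```

Because `main` returns an integer, whatever invokes it has to pass that value to `sys.exit` for the exit status to reach the shell. Console-script entry points generated by packaging tools normally do this.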
|
|
@@ -1,5 +1,6 @@
|
|
|
1
1
|
"""pipeline.py — build -> generate -> validate -> test -> fix (loop)."""
|
|
2
2
|
from dataclasses import dataclass
|
|
3
|
+
import fnmatch
|
|
3
4
|
import os, re, subprocess
|
|
4
5
|
from .observability import estimate_tokens, log_run
|
|
5
6
|
from .prompt import build_prompt
|
|
@@ -18,7 +19,40 @@ class FailureClassification:
|
|
|
18
19
|
kind: str
|
|
19
20
|
guidance: str
|
|
20
21
|
|
|
21
|
-
def
|
|
22
|
+
def extract_changed_files(output):
|
|
23
|
+
text = output or ""
|
|
24
|
+
files = []
|
|
25
|
+
for match in re.finditer(r"^diff --git a/(.+?) b/(.+?)$", text, flags=re.M):
|
|
26
|
+
files.append(match.group(2).strip())
|
|
27
|
+
for match in re.finditer(r"^\+\+\+ b/(.+?)$", text, flags=re.M):
|
|
28
|
+
files.append(match.group(1).strip())
|
|
29
|
+
return list(dict.fromkeys(f for f in files if f and f != "/dev/null"))
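To make the extraction above concrete, here is a self-contained sketch that applies the same two regular expressions to a small, made-up unified diff. The sample diff text and the name `changed_files` are illustrative assumptions.

```python
import re

# Made-up diff: one modified file with a git header, one new file without it.
SAMPLE_DIFF = "\n".join([
    "diff --git a/src/a.py b/src/a.py",
    "--- a/src/a.py",
    "+++ b/src/a.py",
    "@@ -1 +1 @@",
    "-x = 1",
    "+x = 2",
    "--- /dev/null",
    "+++ b/src/new.py",
    "@@ -0,0 +1 @@",
    "+y = 3",
])


def changed_files(text: str) -> list:
    """Collect target paths from 'diff --git' and '+++ b/' lines, in order."""
    files = []
    for match in re.finditer(r"^diff --git a/(.+?) b/(.+?)$", text, flags=re.M):
        files.append(match.group(2).strip())
    for match in re.finditer(r"^\+\+\+ b/(.+?)$", text, flags=re.M):
        files.append(match.group(1).strip())
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(f for f in files if f))


paths = changed_files(SAMPLE_DIFF)
```

The second pattern is what picks up `src/new.py`, which has no `diff --git` header in the sample.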
|
|
30
|
+
|
|
31
|
+
def _matches_bound(path, patterns):
|
|
32
|
+
normalized = path.replace(os.sep, "/").lstrip("./")
|
|
33
|
+
for raw in patterns or []:
|
|
34
|
+
pattern = str(raw).replace(os.sep, "/").lstrip("./")
|
|
35
|
+
if fnmatch.fnmatch(normalized, pattern):
|
|
36
|
+
return True
|
|
37
|
+
if pattern.endswith("/**"):
|
|
38
|
+
prefix = pattern[:-3].rstrip("/")
|
|
39
|
+
if normalized == prefix or normalized.startswith(f"{prefix}/"):
|
|
40
|
+
return True
|
|
41
|
+
return False
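One detail of the matcher above is worth a sketch: `str.lstrip("./")` strips any run of leading `.` and `/` characters rather than the literal `./` prefix, so a path such as `.github/ci.yml` would be normalised to `github/ci.yml`. The variant below is an assumption about the intended behaviour, not the released code, and `strip_dot_slash` and `matches_bound` are hypothetical names.

```python
import fnmatch
import os


def strip_dot_slash(path: str) -> str:
    """Drop literal leading './' segments only, keeping dotfiles intact."""
    path = path.replace(os.sep, "/")
    while path.startswith("./"):
        path = path[2:]
    return path


def matches_bound(path: str, patterns) -> bool:
    """Return True if path falls under any allowed glob."""
    normalized = strip_dot_slash(path)
    for raw in patterns or []:
        pattern = strip_dot_slash(str(raw))
        # In fnmatch, '*' also matches '/', so 'src/app/**' covers nested files.
        if fnmatch.fnmatchcase(normalized, pattern):
            return True
        # Treat 'dir/**' as also matching 'dir' itself.
        if pattern.endswith("/**") and normalized == pattern[:-3].rstrip("/"):
            return True
    return False
```

`fnmatchcase` is used in the sketch so that matching does not depend on the platform's case rules. Whether that is desirable for a given repository is a design choice.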
|
|
42
|
+
|
|
43
|
+
def _bound_path_warnings(files, bound_paths):
|
|
44
|
+
if not bound_paths:
|
|
45
|
+
return []
|
|
46
|
+
outside = [path for path in files if not _matches_bound(path, bound_paths)]
|
|
47
|
+
if not outside:
|
|
48
|
+
return []
|
|
49
|
+
return [
|
|
50
|
+
"diff touches path outside bound paths: "
|
|
51
|
+
+ ", ".join(outside)
|
|
52
|
+
+ f" (allowed: {', '.join(bound_paths)})"
|
|
53
|
+
]
|
|
54
|
+
|
|
55
|
+
def validate_generated_output(output, bound_paths=None):
|
|
22
56
|
text = output or ""
|
|
23
57
|
hints = []
|
|
24
58
|
has_diff = bool(re.search(r"^diff --git |^--- .+\n\+\+\+ ", text, flags=re.M))
|
|
@@ -29,6 +63,7 @@ def validate_generated_output(output):
|
|
|
29
63
|
hints.append("include a TEST block or concrete test code")
|
|
30
64
|
if re.search(r"(?i)\b(pseudocode|placeholder|todo: implement)\b", text):
|
|
31
65
|
hints.append("replace placeholders with executable code")
|
|
66
|
+
hints.extend(_bound_path_warnings(extract_changed_files(output), bound_paths))
|
|
32
67
|
return ValidationResult(
|
|
33
68
|
ok=not hints,
|
|
34
69
|
reason="ok" if not hints else "; ".join(hints),
|
|
@@ -64,10 +99,10 @@ def build_retry_feedback(attempt, validation=None, test_log=""):
|
|
|
64
99
|
lines.append("Return the full corrected DIFF + TEST block only.")
|
|
65
100
|
return "\n".join(lines)
|
|
66
101
|
|
|
67
|
-
def _apply_and_test(output, root):
|
|
102
|
+
def _apply_and_test(output, root, bound_paths=None):
|
|
68
103
|
os.makedirs(os.path.join(root, ".simplicio"), exist_ok=True)
|
|
69
104
|
open(os.path.join(root, ".simplicio/last_output.txt"), "w").write(output or "")
|
|
70
|
-
validation = validate_generated_output(output)
|
|
105
|
+
validation = validate_generated_output(output, bound_paths)
|
|
71
106
|
if not validation.ok:
|
|
72
107
|
return False, f"pre-apply validation failed: {validation.reason}"
|
|
73
108
|
# PLUG: extract diff -> git apply; extract test. Here we run the test command.
|
|
@@ -75,13 +110,47 @@ def _apply_and_test(output, root):
|
|
|
75
110
|
p = subprocess.run(cmd, shell=True, cwd=root, capture_output=True, text=True)
|
|
76
111
|
return p.returncode == 0, (p.stdout + p.stderr)[-2000:]
|
|
77
112
|
|
|
78
|
-
def
|
|
113
|
+
def _diff_summary(files_changed):
|
|
114
|
+
if not files_changed:
|
|
115
|
+
return "no changed files reported"
|
|
116
|
+
return "changed " + ", ".join(files_changed)
|
|
117
|
+
|
|
118
|
+
def _task_result(task_id, prompt, output, *, applied, warnings=None):
|
|
119
|
+
files_changed = extract_changed_files(output)
|
|
120
|
+
return {
|
|
121
|
+
"task_id": task_id,
|
|
122
|
+
"applied": bool(applied),
|
|
123
|
+
"files_changed": files_changed,
|
|
124
|
+
"tokens_used": {
|
|
125
|
+
"prompt": estimate_tokens(prompt),
|
|
126
|
+
"completion": estimate_tokens(output or ""),
|
|
127
|
+
},
|
|
128
|
+
"cost_usd": 0.0,
|
|
129
|
+
"diff_summary": _diff_summary(files_changed),
|
|
130
|
+
"warnings": warnings or [],
|
|
131
|
+
}
|
|
132
|
+
|
|
133
|
+
def run_task(root, stack, goal, target, criteria, constraints, *,
|
|
134
|
+
dry_run_task=False, bound_paths=None, quiet=False):
|
|
79
135
|
prompt = build_prompt(root, stack, goal, target, criteria, constraints)
|
|
136
|
+
if dry_run_task:
|
|
137
|
+
output = generate(prompt)
|
|
138
|
+
validation = validate_generated_output(output, bound_paths)
|
|
139
|
+
warnings = [] if validation.ok else [validation.reason]
|
|
140
|
+
return _task_result(target, prompt, output, applied=False, warnings=warnings)
|
|
141
|
+
|
|
80
142
|
feedback = None
|
|
143
|
+
last_output = ""
|
|
144
|
+
last_validation = None
|
|
145
|
+
last_log = ""
|
|
81
146
|
for t in range(1, MAX_ATTEMPTS + 1):
|
|
82
|
-
|
|
147
|
+
if not quiet:
|
|
148
|
+
print(f"--- attempt {t} (provider={os.environ.get('SIMPLICIO_PROVIDER','claude')}) ---")
|
|
83
149
|
output = generate(prompt, feedback)
|
|
84
|
-
|
|
150
|
+
last_output = output or ""
|
|
151
|
+
last_validation = validate_generated_output(output, bound_paths)
|
|
152
|
+
ok, log = _apply_and_test(output, root, bound_paths)
|
|
153
|
+
last_log = log
|
|
85
154
|
log_run(root, {
|
|
86
155
|
"mode": "pipeline",
|
|
87
156
|
"attempt": t,
|
|
@@ -92,9 +161,24 @@ def run(root, stack, goal, target, criteria, constraints):
|
|
|
92
161
|
"stack": stack,
|
|
93
162
|
})
|
|
94
163
|
if ok:
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
164
|
+
if not quiet:
|
|
165
|
+
print("PASSED the contract. DONE.")
|
|
166
|
+
return _task_result(target, prompt, output, applied=True)
|
|
167
|
+
if not quiet:
|
|
168
|
+
print("failed:", log[:300])
|
|
169
|
+
feedback = build_retry_feedback(t + 1, last_validation, log)
|
|
170
|
+
if not quiet:
|
|
171
|
+
print("attempts exhausted — manual review needed.")
|
|
172
|
+
warnings = []
|
|
173
|
+
if last_validation and not last_validation.ok:
|
|
174
|
+
warnings.append(last_validation.reason)
|
|
175
|
+
elif last_log:
|
|
176
|
+
warnings.append(last_log[:500])
|
|
177
|
+
return _task_result(target, prompt, last_output, applied=False, warnings=warnings)
|
|
178
|
+
|
|
179
|
+
def run(root, stack, goal, target, criteria, constraints, bound_paths=None):
|
|
180
|
+
result = run_task(root, stack, goal, target, criteria, constraints,
|
|
181
|
+
bound_paths=bound_paths)
|
|
182
|
+
if result["applied"]:
|
|
183
|
+
return result
|
|
100
184
|
return None
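The control flow of `run_task` is a bounded generate-check-retry loop in which each failure is fed back into the next generation. The sketch below isolates that pattern with stand-in callables. `run_with_retries`, `fake_generate`, `fake_check` and the attempt limit of 3 are assumptions for illustration (the real `MAX_ATTEMPTS` value is not visible in this diff).

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,  # assumed limit for the sketch
) -> dict:
    """Call generate until check passes or attempts run out."""
    feedback = None
    last_log = ""
    for attempt in range(1, max_attempts + 1):
        output = generate(feedback)
        ok, last_log = check(output)
        if ok:
            return {"applied": True, "attempts": attempt, "warnings": []}
        # Feed the failure log into the next attempt.
        feedback = f"attempt {attempt} failed: {last_log}"
    return {"applied": False, "attempts": max_attempts, "warnings": [last_log]}


def fake_generate(feedback: Optional[str]) -> str:
    # Stand-in model: only produces a passing answer once it sees feedback.
    return "good" if feedback else "bad"


def fake_check(output: str) -> Tuple[bool, str]:
    passed = output == "good"
    return passed, "" if passed else "tests failed"


result = run_with_retries(fake_generate, fake_check)
```

Returning a result dictionary on both success and exhaustion mirrors the shape of the change above, where the failure path now carries `warnings` instead of returning nothing.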
|
|
Second PKG-INFO copy (see file list): hunks identical to the first PKG-INFO diff above.
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
__version__ = "0.1.0"
|
|
File without changes
|
|