polymath-agent 0.3.1 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +52 -1
- package/dist/cli.js +876 -190
- package/package.json +3 -2
package/README.md
CHANGED
@@ -42,7 +42,25 @@ beats a pricey generalist at edits. Polymath assigns the cheapest model that gen
 
 ```bash
 npm install -g polymath-agent
-poly
+poly setup # guided: optionally install a local LLM (Ollama) + connect models
+```
+
+`poly setup` asks whether to install a local LLM, or skip the prompt with a flag:
+
+```bash
+poly setup --local # install Ollama + pull a model (RAM-aware default)
+poly setup --local -m qwen2.5-coder:7b # choose the model
+poly setup --no-local # cloud only — just connect an OpenRouter key
+poly setup --local -y # non-interactive (accept defaults / auto-install)
+```
+
+Keep everything current with `poly update` (the CLI via npm, the Ollama runtime, and
+your local models) — add `--check` to only report what's available:
+
+```bash
+poly update # update CLI + Ollama + re-pull local models
+poly update --check # report-only
+poly update --self # just the CLI (also --ollama, --models)
 ```
 
 **From source** (no npm publish needed):
@@ -88,6 +106,8 @@ poly usage # cost by date + model
 
 | Command | What it does |
 |---|---|
+| `poly setup` | First-run: optionally install a local LLM (Ollama) + connect models. `--local` / `--no-local` / `-m <model>` / `-y`. |
+| `poly update` | Update the CLI (npm), the Ollama runtime, and local models. `--check`, `--self`, `--ollama`, `--models`. |
 | `poly login` | Connect/replace your OpenRouter API key (Claude-Code-style onboarding). |
 | `poly run [goal]` | Launch the interactive agent. Shows the recommended routing, then executes. |
 | `poly recommend <goal>` | Pre-run recommendation: cheapest / best-value / best-quality model combos + savings. |
@@ -100,6 +120,37 @@ poly usage # cost by date + model
 After each `poly run`, rate the result 0–9 (one keypress) — your goal-achievement
 rating joins the auto score (completed/planned steps) to power `poly analyze`.
 
+### Outcome-driven loop (verify → escalate → repeat)
+
+`poly run` doesn't stop at "code written" — it measures the result and keeps going
+until the goal is actually met:
+
+```
+command → plan + acceptance criteria → code (cheapest model)
+→ VERIFY result against criteria (inspects files, runs tests)
+→ if unmet: ESCALATE (higher tier, more tokens, cost cap lifted) → fix → re-verify
+→ repeat until all criteria pass (or --max-attempts)
+```
+
+The cheapest model gets first crack; only the criteria it *fails* trigger a pricier
+model — so you pay for frontier capability exactly when (and only when) it's needed.
+
+```bash
+poly run -w -x "add an add(a,b) to calc.js and make the tests pass"
+poly run --no-verify "..." # single pass, no verify/escalate
+poly run --max-attempts 5 "..." # try harder before giving up
+```
+
+After each run you'll see `✓ goal met · 2 attempts` (or `⚠ goal not fully met`).
+
+### Statistical model optimization (learned starting tier)
+
+Every attempt is recorded with its goal type, starting tier, tokens, and pass/fail.
+`poly analyze` then shows, per goal type, **which starting model reaches the goal
+with the fewest total tokens** — and once there's enough evidence (≥3 verified
+sessions), `poly run` **auto-starts at that tier**, skipping cheap attempts for goal
+types that historically need a stronger model from the start.
+
 ### The efficiency playbook (learned routing)
 
 Everything is captured locally (SQLite). `poly analyze` distills it into a **playbook**
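Reviewer note: the verify → escalate → repeat loop and the learned starting tier described in the README hunk above can be sketched in Python. This is an illustrative sketch only, not the code shipped in `package/dist/cli.js`. The tier names, the `Session` and `RunResult` types, `learned_start_tier`, `run_until_met`, the `attempt` callable, the default of three attempts, and the use of mean tokens over passing sessions are all assumptions made for the example. Only the threshold of three verified sessions and the "fewest total tokens" criterion come from the README text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical tier names, ordered cheapest first (not taken from the package).
TIERS = ["cheap", "mid", "frontier"]
# The README states that at least 3 verified sessions are needed as evidence.
MIN_SESSIONS = 3


@dataclass
class Session:
    goal_type: str
    start_tier: str
    total_tokens: int
    passed: bool


@dataclass
class RunResult:
    goal_met: bool
    attempts: int
    final_tier: str


def learned_start_tier(history: List[Session], goal_type: str) -> str:
    """Return the starting tier with the lowest mean token count among
    passing sessions of this goal type, or the cheapest tier when there
    is not yet enough evidence."""
    relevant = [s for s in history if s.goal_type == goal_type]
    if len(relevant) < MIN_SESSIONS:
        return TIERS[0]
    tokens_by_tier: Dict[str, List[int]] = {}
    for s in relevant:
        if s.passed:
            tokens_by_tier.setdefault(s.start_tier, []).append(s.total_tokens)
    if not tokens_by_tier:
        return TIERS[0]
    return min(
        tokens_by_tier,
        key=lambda t: sum(tokens_by_tier[t]) / len(tokens_by_tier[t]),
    )


def run_until_met(
    criteria: Set[str],
    attempt: Callable[[str, Set[str]], Set[str]],
    start_tier: str = TIERS[0],
    max_attempts: int = 3,  # assumed default; the README only names the flag
) -> RunResult:
    """Code and verify at the current tier, then escalate one tier for
    whatever criteria are still unmet, up to max_attempts.

    `attempt(tier, unmet)` is a hypothetical callable that does the work
    at `tier` and returns the subset of `unmet` that now passes."""
    tier_idx = TIERS.index(start_tier)
    tier = start_tier
    unmet = set(criteria)
    for n in range(1, max_attempts + 1):
        tier = TIERS[tier_idx]
        unmet -= attempt(tier, unmet)
        if not unmet:
            return RunResult(True, n, tier)
        # Escalate, staying at the top tier once it is reached.
        tier_idx = min(tier_idx + 1, len(TIERS) - 1)
    return RunResult(False, max_attempts, tier)
```

A caller would first pick the tier with `learned_start_tier(history, goal_type)` and pass it to `run_until_met` as `start_tier`, so goal types that historically needed a stronger model skip the cheap attempts.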