polymath-agent 0.3.1 → 0.5.0

Files changed (3)
  1. package/README.md +52 -1
  2. package/dist/cli.js +876 -190
  3. package/package.json +3 -2
package/README.md CHANGED
@@ -42,7 +42,25 @@ beats a pricey generalist at edits. Polymath assigns the cheapest model that gen
 
  ```bash
  npm install -g polymath-agent
- poly login # guided OpenRouter key setup
+ poly setup # guided: optionally install a local LLM (Ollama) + connect models
+ ```
+
+ `poly setup` asks whether to install a local LLM, or skip the prompt with a flag:
+
+ ```bash
+ poly setup --local # install Ollama + pull a model (RAM-aware default)
+ poly setup --local -m qwen2.5-coder:7b # choose the model
+ poly setup --no-local # cloud only — just connect an OpenRouter key
+ poly setup --local -y # non-interactive (accept defaults / auto-install)
+ ```
+
+ Keep everything current with `poly update` (the CLI via npm, the Ollama runtime, and
+ your local models) — add `--check` to only report what's available:
+
+ ```bash
+ poly update # update CLI + Ollama + re-pull local models
+ poly update --check # report-only
+ poly update --self # just the CLI (also --ollama, --models)
  ```
 
  **From source** (no npm publish needed):
@@ -88,6 +106,8 @@ poly usage # cost by date + model
 
  | Command | What it does |
  |---|---|
+ | `poly setup` | First-run: optionally install a local LLM (Ollama) + connect models. `--local` / `--no-local` / `-m <model>` / `-y`. |
+ | `poly update` | Update the CLI (npm), the Ollama runtime, and local models. `--check`, `--self`, `--ollama`, `--models`. |
  | `poly login` | Connect/replace your OpenRouter API key (Claude-Code-style onboarding). |
  | `poly run [goal]` | Launch the interactive agent. Shows the recommended routing, then executes. |
  | `poly recommend <goal>` | Pre-run recommendation: cheapest / best-value / best-quality model combos + savings. |
@@ -100,6 +120,37 @@ poly usage # cost by date + model
  After each `poly run`, rate the result 0–9 (one keypress) — your goal-achievement
  rating joins the auto score (completed/planned steps) to power `poly analyze`.
 
+ ### Outcome-driven loop (verify → escalate → repeat)
+
+ `poly run` doesn't stop at "code written" — it measures the result and keeps going
+ until the goal is actually met:
+
+ ```
+ command → plan + acceptance criteria → code (cheapest model)
+ → VERIFY result against criteria (inspects files, runs tests)
+ → if unmet: ESCALATE (higher tier, more tokens, cost cap lifted) → fix → re-verify
+ → repeat until all criteria pass (or --max-attempts)
+ ```
+
+ The cheapest model gets first crack; only the criteria it *fails* trigger a pricier
+ model — so you pay for frontier capability exactly when (and only when) it's needed.
+
+ ```bash
+ poly run -w -x "add an add(a,b) to calc.js and make the tests pass"
+ poly run --no-verify "..." # single pass, no verify/escalate
+ poly run --max-attempts 5 "..." # try harder before giving up
+ ```
+
+ After each run you'll see `✓ goal met · 2 attempts` (or `⚠ goal not fully met`).
+
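The verify → escalate → repeat loop described in the added README lines above can be sketched roughly as follows. This is an illustrative sketch only, not code taken from `package/dist/cli.js`: the tier names, the `attemptFix` and `verify` callbacks, and the default of 3 attempts are all assumptions made for the example.

```javascript
// Hypothetical tier ladder, cheapest first (names are assumptions).
const TIERS = ["cheap", "mid", "frontier"];

// Run attempts until every criterion verifies or maxAttempts is reached.
// attemptFix(unmet, tier) and verify(criterion) are hypothetical callbacks.
function outcomeLoop(criteria, { attemptFix, verify, maxAttempts = 3, startTier = 0 }) {
  let tierIndex = startTier;
  let usedTier = TIERS[tierIndex];
  let unmet = [...criteria];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    usedTier = TIERS[tierIndex];
    // Only the criteria that are still failing are handed to the model.
    attemptFix(unmet, usedTier);
    // Re-verify everything, since a fix can break a criterion that passed before.
    unmet = criteria.filter((c) => !verify(c));
    if (unmet.length === 0) {
      return { met: true, attempts: attempt, tier: usedTier, unmet };
    }
    // Escalate one tier for the next attempt, capped at the top of the ladder.
    tierIndex = Math.min(tierIndex + 1, TIERS.length - 1);
  }
  return { met: false, attempts: maxAttempts, tier: usedTier, unmet };
}
```

In this sketch the cheapest tier always goes first unless `startTier` says otherwise, and escalation happens only when at least one criterion is still failing after verification.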
+ ### Statistical model optimization (learned starting tier)
+
+ Every attempt is recorded with its goal type, starting tier, tokens, and pass/fail.
+ `poly analyze` then shows, per goal type, **which starting model reaches the goal
+ with the fewest total tokens** — and once there's enough evidence (≥3 verified
+ sessions), `poly run` **auto-starts at that tier**, skipping cheap attempts for goal
+ types that historically need a stronger model from the start.
+
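One way to read the learned-starting-tier rule in the added README lines above is the sketch below. It is not the package's actual implementation: the record fields (`goalType`, `startTier`, `totalTokens`, `passed`), the use of a mean over passing sessions, and the reading of "verified session" as a session whose final verification passed are all assumptions. Only the threshold of 3 sessions comes from the README text.

```javascript
// Threshold stated in the README: at least 3 verified sessions.
const MIN_SESSIONS = 3;

// sessions: array of hypothetical records
//   { goalType, startTier, totalTokens, passed }
// Returns the starting tier with the lowest mean total tokens among
// passing sessions of this goal type, or null if evidence is insufficient.
function pickStartTier(sessions, goalType) {
  const relevant = sessions.filter((s) => s.goalType === goalType && s.passed);
  if (relevant.length < MIN_SESSIONS) return null; // fall back to cheapest-first

  // Aggregate token totals per starting tier.
  const byTier = new Map();
  for (const s of relevant) {
    const agg = byTier.get(s.startTier) ?? { tokens: 0, count: 0 };
    agg.tokens += s.totalTokens;
    agg.count += 1;
    byTier.set(s.startTier, agg);
  }

  // Pick the tier with the smallest mean token spend.
  let best = null;
  for (const [tier, { tokens, count }] of byTier) {
    const mean = tokens / count;
    if (best === null || mean < best.mean) best = { tier, mean };
  }
  return best.tier;
}
```

A `null` result here stands for "not enough evidence yet", in which case a caller would keep the default cheapest-first behaviour.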
  ### The efficiency playbook (learned routing)
 
  Everything is captured locally (SQLite). `poly analyze` distills it into a **playbook**