promptpilot 0.1.8 → 0.2.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +147 -285
- package/dist/cli.d.ts +1 -0
- package/dist/cli.js +109 -33
- package/dist/cli.js.map +1 -1
- package/dist/index.js +61 -14
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,135 +1,106 @@
 # promptpilot
 
-
+A local prompt optimizer and model router for Claude CLI and agentic LLM workflows.
 
-
+Before your prompt reaches a remote model, PromptPilot rewrites it locally using a small Ollama model — cutting noise, compressing context, and routing to the right downstream model. No prompt rewrite costs you remote tokens.
 
-
+---
 
-
-- It uses a small local model before you send anything to a stronger remote model.
-- It avoids paying remote-token costs for every prompt rewrite.
-- It works well on laptops with limited memory by preferring small Ollama models.
-- It uses a local Qwen router when multiple small local models are available.
+## Install
 
-
+```bash
+npm install -g promptpilot
+```
+
+Requires [Ollama](https://ollama.com) running locally and Node.js >= 20.10.0.
+
+Pull at least one small local model:
+
+```bash
+ollama pull qwen2.5:3b
+```
 
-
-- `phi3:mini`
-- `llama3.2:3b`
+---
 
 ## What it does
 
--
--
--
--
--
-- Routes to a caller-supplied downstream model allowlist.
-- Returns a selected target plus a ranked top 3 when routing is enabled.
-- Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.
+- Rewrites your prompt locally before sending it anywhere
+- Keeps session memory across turns so context carries forward
+- Compresses old context when it gets too long
+- Routes to the best model from a list you provide
+- Outputs plain text for shell pipelines or JSON for tooling
 
-
+---
 
-
+## Quick start
 
 ```bash
-
-npm run build
-npm test
-ollama pull qwen2.5:3b
+# Optimize a prompt and print the result
 promptpilot optimize "explain binary search simply" --plain
-promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
-```
 
-
+# Pipe directly into Claude
+promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
 
-
-
+# Read from a file
+cat notes.txt | promptpilot optimize --task summarization --plain | claude
 ```
 
-
-
-
-PromptPilot v0.1.x
-┌──────────────────────────────────────────────────────────────────────────────┐
-│ Welcome back │
-│ │
-│ .-''''-. Launchpad │
-│ .' .--. '. Run promptpilot optimize "..." │
-│ / / oo \ \ Pipe directly into Claude with | claude│
-│ | \_==_/ | │
-│ \ \_/ \_/ / Custom local model │
-│ '._/|__|\_.' Use --model promptpilot-compressor │
-│ │
-│ /Users/you/project Commands │
-│ optimize optimize and route prompts │
-│ --help show the full CLI reference │
-└──────────────────────────────────────────────────────────────────────────────┘
-```
+---
+
+## Session memory
 
-
+Pass `--session <name>` to persist context across calls. PromptPilot stores sessions as JSON under `~/.promptpilot/sessions` by default.
 
 ```bash
-
-
+# Save context after each turn
+promptpilot optimize "start a refactor plan" --session repo-refactor --save-context --plain
+
+# Pick up where you left off
+promptpilot optimize "continue the refactor" --session repo-refactor --save-context --plain | claude
+
+# Clear a session when you're done
+promptpilot optimize --session repo-refactor --clear-session
 ```
 
-
+---
 
-
+## Custom compressor model
 
-
+PromptPilot ships a `Modelfile` that builds `promptpilot-compressor` — a stripped-down Ollama model tuned to output only the rewritten prompt with no extra commentary.
 
 ```bash
 ollama pull qwen2.5:3b
 ollama create promptpilot-compressor -f ./Modelfile
-ollama run promptpilot-compressor "explain recursion simply"
 ```
 
-Use it
+Use it:
 
 ```bash
-# Plain output — pipe directly into Claude
 promptpilot optimize "help me refactor this auth middleware" \
   --model promptpilot-compressor \
   --preset code \
   --plain
-
-# JSON output with debug info
-promptpilot optimize "help me refactor this auth middleware" \
-  --model promptpilot-compressor \
-  --preset code \
-  --json --debug
-
-# With session memory, piped into Claude
-promptpilot optimize "continue the refactor" \
-  --model promptpilot-compressor \
-  --session repo-refactor \
-  --save-context \
-  --plain | claude
 ```
 
-
+---
 
-##
+## Downstream model routing
 
-PromptPilot
+Tell PromptPilot which models you're allowed to use and it picks the best one for the job.
 
-
-
-
-
--
--
-
-
+```bash
+promptpilot optimize "rewrite this for a coding refactor" \
+  --task code \
+  --preset code \
+  --target anthropic:claude-sonnet \
+  --target openai:gpt-4.1-mini \
+  --target openai:gpt-5-codex \
+  --target-hint coding \
+  --target-hint refactor \
+  --json --debug
+```
 
-
-- If one target is supplied, PromptPilot selects it directly.
-- If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
-- Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
-- If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
+---
 
 ## Library usage
 
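Editor's note on the routing hunk above: the `--target`/`--target-hint` flow amounts to ranking a caller-supplied allowlist and picking the top entry. The sketch below is illustrative only — `rankTargets` and the capability lists are hypothetical stand-ins, not PromptPilot's actual Qwen-based router:

```typescript
// Hypothetical sketch of allowlist routing (NOT PromptPilot's real router):
// score each caller-supplied target by how many hint values it matches,
// then sort descending so element 0 plays the role of selectedTarget.
type Target = { id: string; capabilities: string[] };

function rankTargets(targets: Target[], hints: string[]): Target[] {
  const score = (t: Target) =>
    hints.filter((h) => t.capabilities.includes(h)).length;
  // Copy before sorting so the caller's allowlist is left untouched.
  return [...targets].sort((a, b) => score(b) - score(a));
}

const ranked = rankTargets(
  [
    { id: "openai:gpt-4.1-mini", capabilities: ["chat"] },
    { id: "anthropic:claude-sonnet", capabilities: ["coding", "refactor"] },
  ],
  ["coding", "refactor"]
);
console.log(ranked[0].id); // anthropic:claude-sonnet
```

With a single target the sort is a no-op, which matches the old README's note that a lone target is selected directly.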
@@ -156,17 +127,9 @@ console.log(result.finalPrompt);
 console.log(result.model);
 ```
 
-###
+### Routing across multiple models
 
 ```ts
-import { createOptimizer } from "promptpilot";
-
-const optimizer = createOptimizer({
-  provider: "ollama",
-  host: "http://localhost:11434",
-  contextStore: "local"
-});
-
 const result = await optimizer.optimize({
   prompt: "rewrite this prompt for a coding refactor task",
   task: "code",
@@ -205,7 +168,7 @@ console.log(result.rankedTargets);
 console.log(result.routingReason);
 ```
 
-###
+### Non-coding tasks work too
 
 ```ts
 const result = await optimizer.optimize({
@@ -233,53 +196,7 @@ const result = await optimizer.optimize({
 console.log(result.selectedTarget);
 ```
 
-
-
-Plain shell output:
-
-```bash
-promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
-```
-
-Pipe directly into Claude CLI:
-
-```bash
-promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
-```
-
-Route against an allowlist of downstream targets:
-
-```bash
-promptpilot optimize "rewrite this prompt for a coding refactor task" \
-  --task code \
-  --preset code \
-  --target anthropic:claude-sonnet \
-  --target openai:gpt-4.1-mini \
-  --target openai:gpt-5-codex \
-  --target-hint coding \
-  --target-hint refactor \
-  --json --debug
-```
-
-Use stdin in a pipeline:
-
-```bash
-cat notes.txt | promptpilot optimize --task summarization --plain | claude
-```
-
-Save context between calls:
-
-```bash
-promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
-```
-
-Clear a session:
-
-```bash
-promptpilot optimize --session ci-fix --clear-session
-```
-
-Node `child_process` example:
+### Node child_process pipeline
 
 ```ts
 import { spawn } from "node:child_process";
@@ -287,8 +204,7 @@ import { spawn } from "node:child_process";
 const promptpilot = spawn("promptpilot", [
   "optimize",
   "continue working on this repo refactor",
-  "--session",
-  "repo-refactor",
+  "--session", "repo-refactor",
   "--save-context",
   "--plain"
 ]);
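A side note on the `spawn` call in the hunk above: the argv array is easy to get wrong when flags are hand-assembled at each call site. A small builder helper keeps the flag handling in one place; `buildOptimizeArgs` is a hypothetical convenience written for illustration, not part of the promptpilot API:

```typescript
// Hypothetical helper (not part of promptpilot): assemble the argv array
// for `promptpilot optimize` from a plain options object, so call sites
// pass it straight to spawn("promptpilot", args).
type OptimizeOpts = {
  prompt: string;
  session?: string;
  saveContext?: boolean;
  plain?: boolean;
};

function buildOptimizeArgs(opts: OptimizeOpts): string[] {
  const args = ["optimize", opts.prompt];
  if (opts.session) args.push("--session", opts.session);
  if (opts.saveContext) args.push("--save-context");
  if (opts.plain) args.push("--plain");
  return args;
}

const args = buildOptimizeArgs({
  prompt: "continue working on this repo refactor",
  session: "repo-refactor",
  saveContext: true,
  plain: true,
});
console.log(args.join(" "));
// optimize continue working on this repo refactor --session repo-refactor --save-context --plain
```

Because `spawn` takes argv as an array, no shell quoting is needed even when the prompt contains spaces.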
@@ -297,147 +213,93 @@ const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
 promptpilot.stdout.pipe(claude.stdin);
 ```
 
-
+---
+
+## CLI flags
+
+| Flag | What it does |
+|---|---|
+| `--session <id>` | Name the session for persistent memory |
+| `--save-context` | Write this turn back into the session |
+| `--clear-session` | Wipe a session and start fresh |
+| `--no-context` | Ignore session history for this call |
+| `--model <name>` | Use a specific local Ollama model |
+| `--preset <preset>` | Prompt style: `code`, `email`, `essay`, `support`, `summarization`, `chat` |
+| `--mode <mode>` | Rewrite mode: `clarity`, `concise`, `detailed`, `structured`, `persuasive`, `compress`, `claude_cli` |
+| `--task <task>` | Task hint passed to the optimizer |
+| `--tone <tone>` | Tone hint passed to the optimizer |
+| `--target <provider:model>` | Add a downstream model to the routing pool (repeatable) |
+| `--target-hint <value>` | Capability hint for routing (repeatable) |
+| `--routing-priority <value>` | `cheapest_adequate`, `best_quality`, or `fastest_adequate` |
+| `--routing-top-k <n>` | How many ranked targets to return |
+| `--workload-bias <value>` | `code_first` to bias routing toward coding models |
+| `--no-routing` | Skip downstream routing entirely |
+| `--plain` | Output the final prompt as plain text |
+| `--json` | Output full result as JSON |
+| `--debug` | Include routing and optimization details in output |
+| `--host <url>` | Ollama host (default: `http://localhost:11434`) |
+| `--store <local\|sqlite>` | Session storage backend |
+| `--storage-dir <path>` | Custom path for session files |
+| `--sqlite-path <path>` | Path to SQLite database file |
+| `--max-total-tokens <n>` | Token budget for the full composed prompt |
+| `--max-context-tokens <n>` | Token budget for retrieved session context |
+| `--max-input-tokens <n>` | Token budget for the incoming prompt |
+| `--timeout <ms>` | Ollama request timeout in milliseconds |
+| `--bypass-optimization` | Skip Ollama and pass the prompt through as-is |
+| `--pin-constraint <text>` | Add a pinned constraint (repeatable) |
+| `--tag <value>` | Tag this session entry (repeatable) |
+| `--output-format <text>` | Output format hint |
+| `--max-length <n>` | Max length hint for the rewritten prompt |
+| `--target-model <name>` | Alternate flag for downstream model name |
+
+If no prompt text is given, `promptpilot optimize` reads from stdin.
+
+---
+
+## How local model selection works
+
+PromptPilot prefers small Ollama models (≤ 4B params). If only one suitable model is installed, it uses it directly. If multiple are installed, a local Qwen router picks the best one for the task. Explicit `--model` always overrides this.
+
+Default preference order:
+
+1. `qwen2.5:3b`
+2. `phi3:mini`
+3. `llama3.2:3b`
+
+If Ollama is unavailable or times out, PromptPilot falls back to deterministic prompt shaping (whitespace cleanup, mode-specific wrappers) instead of failing outright.
+
+---
+
+## Exports
 
-
-
-
-
-
-
-
-
-
-
-- optional tags
-
-Context retrieval prefers:
-
-- pinned constraints
-- task goals
-- recent relevant turns
-- named entities and recurring references
-- stored summaries when budgets are tight
-
-## Token reduction
-
-PromptPilot estimates token usage for:
-
-- the new prompt
-- retrieved context
-- the final composed prompt
-
-Budgets:
+```ts
+import {
+  createOptimizer,
+  optimizePrompt,
+  PromptOptimizer,
+  OllamaClient,
+  FileSessionStore,
+  SQLiteSessionStore
+} from "promptpilot";
+```
 
-
-- `maxContextTokens`
-- `maxTotalTokens`
+Key fields on the result object:
 
-
+| Field | Description |
+|---|---|
+| `optimizedPrompt` | The rewritten prompt from the local model |
+| `finalPrompt` | The composed prompt including context |
+| `selectedTarget` | The downstream model chosen by the router |
+| `rankedTargets` | All targets ranked by the router |
+| `routingReason` | Why the top target was selected |
+| `routingWarnings` | Any issues the router flagged |
+| `provider` | Which provider ran the optimization (`ollama` or `heuristic`) |
+| `model` | Which local model was used |
+| `estimatedTokensBefore` | Token estimate before optimization |
+| `estimatedTokensAfter` | Token estimate after optimization |
 
-
+---
 
-
-promptpilot optimize "rewrite this prompt for a coding refactor task"
-```
+## License
 
-
-
-- `--session <id>`
-- `--model <name>`
-- `--mode <mode>`
-- `--task <task>`
-- `--tone <tone>`
-- `--preset <preset>`
-- `--target-model <name>`
-- `--output-format <text>`
-- `--max-length <n>`
-- `--tag <value>` repeatable
-- `--pin-constraint <text>` repeatable
-- `--target <provider:model>` repeatable
-- `--target-hint <value>` repeatable
-- `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
-- `--routing-top-k <n>`
-- `--workload-bias <code_first>`
-- `--no-routing`
-- `--host <url>`
-- `--store <local|sqlite>`
-- `--storage-dir <path>`
-- `--sqlite-path <path>`
-- `--plain`
-- `--json`
-- `--debug`
-- `--save-context`
-- `--no-context`
-- `--clear-session`
-- `--max-total-tokens <n>`
-- `--max-context-tokens <n>`
-- `--max-input-tokens <n>`
-- `--timeout <ms>`
-- `--bypass-optimization`
-
-If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.
-
-## Public API
-
-Main exports:
-
-- `createOptimizer`
-- `optimizePrompt`
-- `PromptOptimizer`
-- `OllamaClient`
-- `FileSessionStore`
-- `SQLiteSessionStore`
-
-Useful result fields:
-
-- `optimizedPrompt`
-- `finalPrompt`
-- `selectedTarget`
-- `rankedTargets`
-- `routingReason`
-- `routingWarnings`
-- `provider`
-- `model`
-- `estimatedTokensBefore`
-- `estimatedTokensAfter`
-
-Supported modes:
-
-- `clarity`
-- `concise`
-- `detailed`
-- `structured`
-- `persuasive`
-- `compress`
-- `claude_cli`
-
-Supported presets:
-
-- `code`
-- `email`
-- `essay`
-- `support`
-- `summarization`
-- `chat`
-
-## Why the default model was chosen
-
-`qwen2.5:3b` is the default local preference because it offers a practical balance of:
-
-- good instruction following
-- strong enough reasoning for prompt optimization
-- acceptable memory use on laptops
-- good performance for code-first workflows
-
-`phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.
-
-## Future improvements
-
-- semantic retrieval for context
-- better token counting by target model
-- prompt scoring
-- local embeddings for relevance search
-- response-aware context updates
-- cache layer
-- benchmark suite
+MIT
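For readers of the new `estimatedTokensBefore`/`estimatedTokensAfter` fields and the `--max-*-tokens` budget flags above: the kind of arithmetic involved can be sketched with the common chars/4 heuristic. This is an assumption for illustration only; the README does not specify PromptPilot's actual estimator or trimming policy:

```typescript
// Rough chars/4 token estimate. A common heuristic used here purely to
// illustrate the result fields and budget flags; PromptPilot's real
// estimator may differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.trim().length / 4);
}

// Trim retrieved session context to a token budget, mirroring what a
// --max-context-tokens style limit has to do. Walks newest-first so the
// most recent turns survive when the budget is tight.
function fitContext(entries: string[], maxContextTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const entry of [...entries].reverse()) {
    const cost = estimateTokens(entry);
    if (used + cost > maxContextTokens) break;
    kept.unshift(entry); // keep original (oldest-first) ordering
    used += cost;
  }
  return kept;
}

console.log(estimateTokens("explain binary search simply")); // 7
```

Any real budget logic would also need to account for the new prompt and pinned constraints before spending the remainder on retrieved context.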
package/dist/cli.d.ts
CHANGED