promptpilot 0.1.2 → 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +209 -107
- package/dist/cli.js +290 -5
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +35 -1
- package/dist/index.js +238 -4
- package/dist/index.js.map +1 -1
- package/package.json +4 -2
package/README.md
CHANGED
# promptpilot

`promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.

It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.

## Why local Ollama

- It keeps optimization and routing close to your machine.
- It uses a small local model before you send anything to a stronger remote model.
- It avoids paying remote-token costs for every prompt rewrite.
- It works well on laptops with limited memory by preferring small Ollama models.
- It uses a local Qwen router when multiple small local models are available.

Default local preference is:

- `qwen2.5:3b`
- `phi3:mini`
- `llama3.2:3b`
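The preference order above amounts to a first-match scan over whatever models are installed. The sketch below is illustrative only, not PromptPilot's internal code; the `installed` array stands in for whatever the local Ollama instance reports.

```ts
// Ordered preference list from this README.
const PREFERRED_MODELS = ["qwen2.5:3b", "phi3:mini", "llama3.2:3b"];

// Return the first preferred model that is actually installed,
// or undefined when none of the small models are available.
function pickLocalModel(installed: string[]): string | undefined {
  return PREFERRED_MODELS.find((m) => installed.includes(m));
}

// Example: only phi3:mini and a large model are installed.
const chosen = pickLocalModel(["phi3:mini", "llama3:70b"]);
// chosen === "phi3:mini"
```

When none of the preferred models are present, the function returns `undefined`, which matches the fallback-to-deterministic-shaping behavior described under Core behavior.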
## What it does

- Accepts a raw prompt plus optional task metadata.
- Persists session context across turns.
- Retrieves and compresses relevant prior context.
- Preserves pinned constraints and user intent.
- Estimates token usage before and after optimization.
- Routes to a caller-supplied downstream model allowlist.
- Returns a selected target plus a ranked top 3 when routing is enabled.
- Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.
## Quick start

Local repo workflow:

```bash
npm install
npm run build
promptpilot optimize "explain binary search simply" --plain
promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
```

Install from npm:

```bash
npm install -g promptpilot
```

Install one or two small Ollama models so the local router has options:

```bash
ollama pull qwen2.5:3b
ollama pull phi3:mini
```
## Core behavior

PromptPilot has two distinct routing layers.

1. Local optimizer routing

- Explicit `ollamaModel` or `--model` always wins.
- If exactly one suitable small local model exists, it uses that model directly.
- If multiple suitable small local models exist, a local Qwen router chooses between them.
- If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.

2. Downstream target routing

- The caller provides the allowed downstream targets.
- If one target is supplied, PromptPilot selects it directly.
- If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
- Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
- If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
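The local optimizer routing rules reduce to a small decision function. The sketch below is a hedged illustration of those rules, with hypothetical type and function names; it is not PromptPilot's source.

```ts
type LocalRoute =
  | { kind: "explicit"; model: string }            // caller passed ollamaModel / --model
  | { kind: "single"; model: string }              // exactly one suitable small model
  | { kind: "qwen-router"; candidates: string[] }  // router picks among several
  | { kind: "deterministic-fallback" };            // no usable small candidates

function routeLocalOptimizer(
  explicitModel: string | undefined,
  suitableSmallModels: string[]
): LocalRoute {
  if (explicitModel) return { kind: "explicit", model: explicitModel };
  if (suitableSmallModels.length === 1)
    return { kind: "single", model: suitableSmallModels[0] };
  if (suitableSmallModels.length > 1)
    return { kind: "qwen-router", candidates: suitableSmallModels };
  return { kind: "deterministic-fallback" };
}

const route = routeLocalOptimizer(undefined, ["qwen2.5:3b", "phi3:mini"]);
// route.kind === "qwen-router"
```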
## Library usage

### Basic optimization

```ts
import { createOptimizer } from "promptpilot";

const optimizer = createOptimizer({
  provider: "ollama",
});

const result = await optimizer.optimize({
  prompt: "help me debug this failing CI job",
  task: "code",
  preset: "code",
  sessionId: "ci-fix",
  saveContext: true
});

console.log(result.finalPrompt);
console.log(result.model);
```

### Code-first downstream routing

```ts
import { createOptimizer } from "promptpilot";

const optimizer = createOptimizer({
  provider: "ollama",
  host: "http://localhost:11434",
  contextStore: "local"
});

const result = await optimizer.optimize({
  prompt: "rewrite this prompt for a coding refactor task",
  task: "code",
  preset: "code",
  availableTargets: [
    {
      provider: "anthropic",
      model: "claude-sonnet",
      label: "anthropic:claude-sonnet",
      capabilities: ["coding", "writing"],
      costRank: 2
    },
    {
      provider: "openai",
      model: "gpt-4.1-mini",
      label: "openai:gpt-4.1-mini",
      capabilities: ["writing", "chat"],
      costRank: 1
    },
    {
      provider: "openai",
      model: "gpt-5-codex",
      label: "openai:gpt-5-codex",
      capabilities: ["coding", "agentic", "tool_use", "debugging"],
      costRank: 3
    }
  ],
  routingPriority: "cheapest_adequate",
  targetHints: ["coding", "agentic", "refactor"],
  workloadBias: "code_first",
  debug: true
});

console.log(result.selectedTarget);
console.log(result.rankedTargets);
console.log(result.routingReason);
```

### Lightweight writing still works

```ts
const result = await optimizer.optimize({
  prompt: "write a short internship follow-up email",
  task: "email",
  preset: "email",
  availableTargets: [
    {
      provider: "anthropic",
      model: "claude-sonnet",
      label: "anthropic:claude-sonnet",
      capabilities: ["coding", "writing"],
      costRank: 2
    },
    {
      provider: "openai",
      model: "gpt-4.1-mini",
      label: "openai:gpt-4.1-mini",
      capabilities: ["writing", "email", "chat"],
      costRank: 1
    }
  ]
});

console.log(result.selectedTarget);
```
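A `routingPriority` of `"cheapest_adequate"` can be pictured as filter-then-sort over the allowlist: keep targets whose capabilities cover the required hints, then prefer the lowest `costRank`. This is only a sketch of the idea under that assumption; PromptPilot's actual router is a local Qwen model and treats hints more loosely. The target shape mirrors the `availableTargets` entries above.

```ts
interface Target {
  provider: string;
  model: string;
  label: string;
  capabilities: string[];
  costRank: number; // lower = cheaper
}

// Illustrative "cheapest adequate" selection: adequate = covers every
// required capability hint; among adequate targets, pick the cheapest.
function cheapestAdequate(targets: Target[], hints: string[]): Target | undefined {
  const adequate = targets.filter((t) =>
    hints.every((h) => t.capabilities.includes(h))
  );
  return adequate.sort((a, b) => a.costRank - b.costRank)[0];
}

const targets: Target[] = [
  { provider: "anthropic", model: "claude-sonnet", label: "anthropic:claude-sonnet", capabilities: ["coding", "writing"], costRank: 2 },
  { provider: "openai", model: "gpt-4.1-mini", label: "openai:gpt-4.1-mini", capabilities: ["writing", "chat"], costRank: 1 },
  { provider: "openai", model: "gpt-5-codex", label: "openai:gpt-5-codex", capabilities: ["coding", "agentic", "tool_use", "debugging"], costRank: 3 }
];

// "coding" rules out gpt-4.1-mini; claude-sonnet is the cheaper of the rest.
const picked = cheapestAdequate(targets, ["coding"]);
// picked?.model === "claude-sonnet"
```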
## Claude CLI usage

Plain shell output:

```bash
promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
```

Pipe directly into Claude CLI:

```bash
promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
```

Route against an allowlist of downstream targets:

```bash
promptpilot optimize "rewrite this prompt for a coding refactor task" \
  --task code \
  --preset code \
  --target anthropic:claude-sonnet \
  --target openai:gpt-4.1-mini \
  --target openai:gpt-5-codex \
  --target-hint coding \
  --target-hint refactor \
  --json --debug
```

Use stdin in a pipeline:

```bash
cat notes.txt | promptpilot optimize --task summarization --plain | claude
```

Save context between calls:

```bash
promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
```

Clear a session:

```bash
promptpilot optimize --session ci-fix --clear-session
```
Node `child_process` example:

```ts
import { spawn } from "node:child_process";

const promptpilot = spawn("promptpilot", [
  "optimize",
  "continue working on this repo refactor",
  "--session",
  "repo-refactor",
  "--save-context",
  "--plain"
]);

const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
promptpilot.stdout.pipe(claude.stdin);
```
## Session context

If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.

Each session stores:

- user prompts
- optimized prompts
- final prompts
- extracted constraints
- summaries
- timestamps
- optional tags

Context retrieval prefers:

- pinned constraints
- task goals
- recent relevant turns
- named entities and recurring references
- stored summaries when budgets are tight
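One stored entry covering the field list above can be modeled as a small JSON-serializable record. The interface below is an illustrative assumption based on this README's field names, not PromptPilot's exported types.

```ts
// Hypothetical shape of one stored session entry (field list from this README).
interface SessionEntry {
  userPrompt: string;
  optimizedPrompt: string;
  finalPrompt: string;
  constraints: string[];   // extracted constraints
  summary?: string;
  timestamp: number;       // ms since epoch
  tags?: string[];
}

// JSON round-trip, mirroring what a file-based store must preserve.
const entry: SessionEntry = {
  userPrompt: "continue my debugger plan",
  optimizedPrompt: "Continue the CI debugging plan with the pinned constraints.",
  finalPrompt: "Continue the CI debugging plan with the pinned constraints.",
  constraints: ["keep changes minimal"],
  timestamp: Date.now(),
  tags: ["ci-fix"]
};

const restored: SessionEntry = JSON.parse(JSON.stringify(entry));
```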
## Token reduction

PromptPilot estimates token usage for:

- the new prompt
- retrieved context
- the final composed prompt

Budgets:

- `maxInputTokens`
- `maxContextTokens`
- `maxTotalTokens`

When the composed prompt would exceed a budget, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
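The budget behavior can be sketched as dropping the oldest context chunks until the estimate fits `maxContextTokens`. Both the roughly-4-characters-per-token estimate and the function names here are assumptions for illustration; they are not PromptPilot's actual estimator.

```ts
// Rough token estimate: ~4 characters per token (assumed heuristic).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop oldest chunks (front of array) until retrieved context fits the budget.
function trimContext(chunks: string[], maxContextTokens: number): string[] {
  const kept = [...chunks];
  while (
    kept.length > 0 &&
    kept.reduce((n, c) => n + estimateTokens(c), 0) > maxContextTokens
  ) {
    kept.shift(); // oldest first
  }
  return kept;
}

const chunks = ["a".repeat(400), "b".repeat(400), "c".repeat(400)]; // ~100 tokens each
const trimmed = trimContext(chunks, 220);
// trimmed keeps the two most recent chunks (~200 tokens)
```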
## CLI

```bash
promptpilot optimize "rewrite this prompt for a coding refactor task"
```

Supported flags:

- `--session <id>`
- `--model <name>`
- `--mode <mode>`
- `--task <task>`
- `--tone <tone>`
- `--max-length <n>`
- `--tag <value>` repeatable
- `--pin-constraint <text>` repeatable
- `--target <provider:model>` repeatable
- `--target-hint <value>` repeatable
- `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
- `--routing-top-k <n>`
- `--workload-bias <code_first>`
- `--no-routing`
- `--host <url>`
- `--store <local|sqlite>`
- `--storage-dir <path>`
- `--sqlite-path <path>`
- `--debug`
- `--save-context`
- `--no-context`
- `--clear-session`
- `--max-total-tokens <n>`
- `--max-context-tokens <n>`
- `--max-input-tokens <n>`
- `--timeout <ms>`
- `--bypass-optimization`

If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.
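The repeatable `--target <provider:model>` flag uses a colon-separated label. A parser for that format might look like the sketch below; the function name is an illustrative assumption, not part of the CLI's source.

```ts
interface TargetRef {
  provider: string;
  model: string;
}

// Split "provider:model" on the FIRST colon only, so model names that
// themselves contain colons (e.g. Ollama-style tags) survive intact.
function parseTarget(arg: string): TargetRef {
  const i = arg.indexOf(":");
  if (i <= 0 || i === arg.length - 1) {
    throw new Error(`expected provider:model, got "${arg}"`);
  }
  return { provider: arg.slice(0, i), model: arg.slice(i + 1) };
}

const ref = parseTarget("openai:gpt-5-codex");
// ref.provider === "openai", ref.model === "gpt-5-codex"
```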
## Public API

Main exports:

- `FileSessionStore`
- `SQLiteSessionStore`

Useful result fields:

- `optimizedPrompt`
- `finalPrompt`
- `selectedTarget`
- `rankedTargets`
- `routingReason`
- `routingWarnings`
- `provider`
- `model`
- `estimatedTokensBefore`
- `estimatedTokensAfter`

Supported modes:

- `clarity`

Supported presets:

- `summarization`
- `chat`

## Why the default model was chosen

`qwen2.5:3b` is the default local preference because it offers a practical balance of:

- good instruction following
- strong enough reasoning for prompt optimization
- acceptable memory use on laptops
- good performance for code-first workflows

`phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.

## Future improvements

- semantic retrieval for context
- better token counting by target model
- prompt scoring
- local embeddings for relevance search
- response-aware context updates
- cache layer
- benchmark suite