promptpilot 0.1.1 → 0.1.3
- package/README.md +209 -106
- package/dist/cli.js +485 -113
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +40 -1
- package/dist/index.js +433 -112
- package/dist/index.js.map +1 -1
- package/package.json +4 -2
package/README.md
CHANGED
# promptpilot

`promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.

It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.

## Why local Ollama

- It keeps optimization and routing close to your machine.
- It uses a small local model before you send anything to a stronger remote model.
- It avoids paying remote-token costs for every prompt rewrite.
- It works well on laptops with limited memory by preferring small Ollama models.
- It uses a local Qwen router when multiple small local models are available.

The default local preference order is:

- `qwen2.5:3b`
- `phi3:mini`
- `llama3.2:3b`
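
The preference order above amounts to a first-match lookup against whatever is installed. The helper below is an illustration only (`pickLocalModel` is a hypothetical name, and PromptPilot's real behavior hands the choice to a local Qwen router when several suitable models are present):

```typescript
// Illustrative only: pick the first model from the documented preference
// order that is actually installed locally.
const PREFERENCE_ORDER = ["qwen2.5:3b", "phi3:mini", "llama3.2:3b"];

function pickLocalModel(installed: string[]): string | undefined {
  return PREFERENCE_ORDER.find((model) => installed.includes(model));
}

console.log(pickLocalModel(["llama3.2:3b", "phi3:mini"])); // → "phi3:mini"
console.log(pickLocalModel(["mistral:7b"])); // → undefined
```

Returning `undefined` when nothing in the list is installed mirrors the documented fallback: PromptPilot warns and degrades rather than silently picking an arbitrary model.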

## What it does

- Accepts a raw prompt plus optional task metadata.
- Persists session context across turns.
- Retrieves and compresses relevant prior context.
- Preserves pinned constraints and user intent.
- Estimates token usage before and after optimization.
- Routes among the models in a caller-supplied downstream allowlist.
- Returns a selected target plus a ranked top 3 when routing is enabled.
- Outputs plain prompt text for shell pipelines or JSON for tooling and debugging.

## Quick start

Local repo workflow:

```bash
npm install
npm run build
promptpilot optimize "explain binary search simply" --plain
promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
```

Install from npm:

```bash
npm install -g promptpilot
```

Install one or two small Ollama models so the local router has options:

```bash
ollama pull qwen2.5:3b
ollama pull phi3:mini
```

## Core behavior

PromptPilot has two distinct routing layers.

1. Local optimizer routing

   - Explicit `ollamaModel` or `--model` always wins.
   - If exactly one suitable small local model exists, it is used directly.
   - If multiple suitable small local models exist, a local Qwen router chooses between them.
   - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.

2. Downstream target routing

   - The caller provides the allowed downstream targets.
   - If one target is supplied, PromptPilot selects it directly.
   - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
   - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
   - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.

## Library usage

### Basic optimization

```ts
import { createOptimizer } from "promptpilot";

const optimizer = createOptimizer({
  provider: "ollama",
  host: "http://localhost:11434",
  contextStore: "local"
});

const result = await optimizer.optimize({
  prompt: "help me debug this failing CI job",
  task: "code",
  preset: "code",
  sessionId: "ci-fix",
  saveContext: true
});

console.log(result.finalPrompt);
console.log(result.model);
```
### Code-first downstream routing

```ts
import { createOptimizer } from "promptpilot";

const optimizer = createOptimizer({
  provider: "ollama",
  host: "http://localhost:11434",
  contextStore: "local"
});

const result = await optimizer.optimize({
  prompt: "rewrite this prompt for a coding refactor task",
  task: "code",
  preset: "code",
  availableTargets: [
    {
      provider: "anthropic",
      model: "claude-sonnet",
      label: "anthropic:claude-sonnet",
      capabilities: ["coding", "writing"],
      costRank: 2
    },
    {
      provider: "openai",
      model: "gpt-4.1-mini",
      label: "openai:gpt-4.1-mini",
      capabilities: ["writing", "chat"],
      costRank: 1
    },
    {
      provider: "openai",
      model: "gpt-5-codex",
      label: "openai:gpt-5-codex",
      capabilities: ["coding", "agentic", "tool_use", "debugging"],
      costRank: 3
    }
  ],
  routingPriority: "cheapest_adequate",
  targetHints: ["coding", "agentic", "refactor"],
  workloadBias: "code_first",
  debug: true
});

console.log(result.selectedTarget);
console.log(result.rankedTargets);
console.log(result.routingReason);
```
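
For intuition, `cheapest_adequate` can be pictured as "filter to targets that cover the required capability, then rank from cheapest up." The sketch below is a deterministic stand-in for that policy, not PromptPilot's actual router (which is a local Qwen model); `cheapestAdequate` and its signature are illustrative assumptions:

```typescript
// Illustration of the "cheapest_adequate" idea: adequacy first, cost second.
interface DownstreamTarget {
  label: string;
  capabilities: string[];
  costRank: number; // lower = cheaper
}

function cheapestAdequate(
  targets: DownstreamTarget[],
  requiredCapability: string
): DownstreamTarget[] {
  // Keep only targets that cover the required capability ("adequate"),
  // then rank the survivors from cheapest to most expensive.
  return targets
    .filter((t) => t.capabilities.includes(requiredCapability))
    .sort((a, b) => a.costRank - b.costRank);
}

const ranked = cheapestAdequate(
  [
    { label: "anthropic:claude-sonnet", capabilities: ["coding", "writing"], costRank: 2 },
    { label: "openai:gpt-4.1-mini", capabilities: ["writing", "chat"], costRank: 1 },
    { label: "openai:gpt-5-codex", capabilities: ["coding", "agentic"], costRank: 3 }
  ],
  "coding"
);

console.log(ranked[0].label); // → "anthropic:claude-sonnet"
```

Note how `gpt-4.1-mini` is excluded despite being cheapest: it is not coding-capable, so it never reaches the cost comparison.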

### Lightweight writing still works

```ts
const result = await optimizer.optimize({
  prompt: "write a short internship follow-up email",
  task: "email",
  preset: "email",
  availableTargets: [
    {
      provider: "anthropic",
      model: "claude-sonnet",
      label: "anthropic:claude-sonnet",
      capabilities: ["coding", "writing"],
      costRank: 2
    },
    {
      provider: "openai",
      model: "gpt-4.1-mini",
      label: "openai:gpt-4.1-mini",
      capabilities: ["writing", "email", "chat"],
      costRank: 1
    }
  ]
});

console.log(result.selectedTarget);
```
## Claude CLI usage

Plain shell output:

```bash
promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
```

Pipe directly into Claude CLI:

```bash
promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
```

Route against an allowlist of downstream targets:

```bash
promptpilot optimize "rewrite this prompt for a coding refactor task" \
  --task code \
  --preset code \
  --target anthropic:claude-sonnet \
  --target openai:gpt-4.1-mini \
  --target openai:gpt-5-codex \
  --target-hint coding \
  --target-hint refactor \
  --json --debug
```

Use stdin in a pipeline:

```bash
cat notes.txt | promptpilot optimize --task summarization --plain | claude
```

Save context between calls:

```bash
promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
```

Clear a session:

```bash
promptpilot optimize --session ci-fix --clear-session
```
Node `child_process` example:

```ts
import { spawn } from "node:child_process";

const promptpilot = spawn("promptpilot", [
  "optimize",
  "continue working on this repo refactor",
  "--session",
  "repo-refactor",
  "--save-context",
  "--plain"
]);

const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
promptpilot.stdout.pipe(claude.stdin);
```

## Session context

If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.

Each session stores:

- user prompts
- optimized prompts
- final prompts
- extracted constraints
- summaries
- timestamps
- optional tags

Context retrieval prefers:

- pinned constraints
- task goals
- recent relevant turns
- named entities and recurring references
- stored summaries when budgets are tight
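
The stored fields can be pictured as one record per turn. The `SessionEntry` shape below is an assumption inferred from the list above, not the package's actual schema; field names are illustrative:

```typescript
// Hypothetical shape of one stored session entry, inferred from the
// documented fields. The real on-disk schema may differ.
interface SessionEntry {
  userPrompt: string;
  optimizedPrompt: string;
  finalPrompt: string;
  constraints: string[]; // extracted constraints
  summary?: string;
  timestamp: string; // ISO-8601
  tags?: string[]; // optional tags
}

const entry: SessionEntry = {
  userPrompt: "continue my debugger plan",
  optimizedPrompt: "Continue the CI debugging plan from the last turn.",
  finalPrompt: "Continue the CI debugging plan from the last turn.",
  constraints: ["keep answers under 200 words"],
  timestamp: new Date(0).toISOString(),
  tags: ["ci-fix"]
};

console.log(entry.timestamp); // → "1970-01-01T00:00:00.000Z"
```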

## Token reduction

PromptPilot estimates token usage for:

- the new prompt
- retrieved context
- the final composed prompt

Budgets:

- `maxInputTokens`
- `maxContextTokens`
- `maxTotalTokens`

When the composed prompt would exceed these budgets, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
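
The budget behavior can be sketched with a rough chars-per-token estimate and a drop-oldest policy. Both are assumptions made for the sketch; PromptPilot's real estimator and compressor are more involved, and `fitContext` is a hypothetical helper:

```typescript
// Rough illustration of budget-driven context trimming.
// Assumes ~4 characters per token, a common ballpark heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitContext(entries: string[], maxContextTokens: number): string[] {
  // Keep the most recent entries that fit the budget, dropping oldest first.
  const kept: string[] = [];
  let used = 0;
  for (const entry of [...entries].reverse()) {
    const cost = estimateTokens(entry);
    if (used + cost > maxContextTokens) break;
    kept.unshift(entry);
    used += cost;
  }
  return kept;
}

const context = ["a".repeat(40), "b".repeat(40), "c".repeat(40)]; // 10 tokens each
console.log(fitContext(context, 25)); // keeps only the two most recent entries
```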

## CLI

```bash
promptpilot optimize "rewrite this prompt for a coding refactor task"
```

Supported flags:

- `--session <id>`
- `--model <name>`
- `--mode <mode>`
- `--task <task>`
- `--tone <tone>`
- `--max-length <n>`
- `--tag <value>` repeatable
- `--pin-constraint <text>` repeatable
- `--target <provider:model>` repeatable
- `--target-hint <value>` repeatable
- `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
- `--routing-top-k <n>`
- `--workload-bias <code_first>`
- `--no-routing`
- `--host <url>`
- `--store <local|sqlite>`
- `--storage-dir <path>`
- `--sqlite-path <path>`
- `--debug`
- `--save-context`
- `--no-context`
- `--clear-session`
- `--max-total-tokens <n>`
- `--max-context-tokens <n>`
- `--max-input-tokens <n>`
- `--timeout <ms>`
- `--bypass-optimization`

If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.

## Public API

Main exports:

- `createOptimizer`
- `FileSessionStore`
- `SQLiteSessionStore`

Useful result fields:

- `optimizedPrompt`
- `finalPrompt`
- `selectedTarget`
- `rankedTargets`
- `routingReason`
- `routingWarnings`
- `provider`
- `model`
- `estimatedTokensBefore`
- `estimatedTokensAfter`
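
Because downstream routing can fail without inventing a target, `selectedTarget` should be treated as optional. The guard below uses an `OptimizeResult` shape assumed from the documented fields, not the package's exported type, and `describeRouting` is a hypothetical helper:

```typescript
// Minimal handling sketch for an optimize() result.
interface OptimizeResult {
  finalPrompt: string;
  selectedTarget?: { provider: string; model: string; label: string };
  routingWarnings?: string[];
}

function describeRouting(result: OptimizeResult): string {
  // Per the behavior above, a failed downstream routing still yields an
  // optimized prompt but no selected target, so guard before dereferencing.
  if (!result.selectedTarget) {
    return "no target selected; using optimized prompt only";
  }
  return `routed to ${result.selectedTarget.label}`;
}

console.log(describeRouting({ finalPrompt: "..." }));
// → "no target selected; using optimized prompt only"
```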

Supported modes:

- `clarity`

Supported presets:

- `summarization`
- `chat`

## Why the default model was chosen

`qwen2.5:3b` is the default local preference because it offers a practical balance of:

- good instruction following
- strong enough reasoning for prompt optimization
- acceptable memory use on laptops
- good performance for code-first workflows

`phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.

## Future improvements

- semantic retrieval for context
- better token counting by target model
- prompt scoring
- local embeddings for relevance search
- response-aware context updates
- cache layer
- benchmark suite