promptpilot 0.1.8 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,135 +1,106 @@
  # promptpilot

- `promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.
+ A local prompt optimizer and model router for Claude CLI and agentic LLM workflows.

- It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.
+ Before your prompt reaches a remote model, PromptPilot rewrites it locally with a small Ollama model, cutting noise, compressing context, and routing to the right downstream model. Prompt rewrites never spend remote tokens.

- ## Why local Ollama
+ ---
 
- - It keeps optimization and routing close to your machine.
- - It uses a small local model before you send anything to a stronger remote model.
- - It avoids paying remote-token costs for every prompt rewrite.
- - It works well on laptops with limited memory by preferring small Ollama models.
- - It uses a local Qwen router when multiple small local models are available.
+ ## Install

- Default local preference is:
+ ```bash
+ npm install -g promptpilot
+ ```
+
+ Requires [Ollama](https://ollama.com) running locally and Node.js >= 20.10.0.
+
+ Pull at least one small local model:
+
+ ```bash
+ ollama pull qwen2.5:3b
+ ```

- - `qwen2.5:3b`
- - `phi3:mini`
- - `llama3.2:3b`
+ ---

  ## What it does

- - Accepts a raw prompt plus optional task metadata.
- - Persists session context across turns.
- - Retrieves and compresses relevant prior context.
- - Preserves pinned constraints and user intent.
- - Estimates token usage before and after optimization.
- - Routes to a caller-supplied downstream model allowlist.
- - Returns a selected target plus a ranked top 3 when routing is enabled.
- - Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.
+ - Rewrites your prompt locally before sending it anywhere
+ - Keeps session memory across turns so context carries forward
+ - Compresses old context when it gets too long
+ - Routes to the best model from a list you provide
+ - Outputs plain text for shell pipelines or JSON for tooling

- ## Quick start
+ ---

- Local repo workflow:
+ ## Quick start

  ```bash
- npm install
- npm run build
- npm test
- ollama pull qwen2.5:3b
+ # Optimize a prompt and print the result
  promptpilot optimize "explain binary search simply" --plain
- promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
- ```

- Install from npm:
+ # Pipe directly into Claude
+ promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude

- ```bash
- npm install -g promptpilot
+ # Read from a file
+ cat notes.txt | promptpilot optimize --task summarization --plain | claude
  ```

- Run `promptpilot` with no arguments in an interactive terminal to open the CLI welcome screen:
-
- ```text
- PromptPilot v0.1.x
- ┌──────────────────────────────────────────────────────────────────────────────┐
- │ Welcome back │
- │ │
- │ .-''''-. Launchpad │
- │ .' .--. '. Run promptpilot optimize "..." │
- │ / / oo \ \ Pipe directly into Claude with | claude│
- │ | \_==_/ | │
- │ \ \_/ \_/ / Custom local model │
- │ '._/|__|\_.' Use --model promptpilot-compressor │
- │ │
- │ /Users/you/project Commands │
- │ optimize optimize and route prompts │
- │ --help show the full CLI reference │
- └──────────────────────────────────────────────────────────────────────────────┘
- ```
+ ---
+
+ ## Session memory

- Install one or two small Ollama models so the local router has options:
+ Pass `--session <name>` to persist context across calls. PromptPilot stores sessions as JSON under `~/.promptpilot/sessions` by default.

  ```bash
- ollama pull qwen2.5:3b
- ollama pull phi3:mini
+ # Save context after each turn
+ promptpilot optimize "start a refactor plan" --session repo-refactor --save-context --plain
+
+ # Pick up where you left off
+ promptpilot optimize "continue the refactor" --session repo-refactor --save-context --plain | claude
+
+ # Clear a session when you're done
+ promptpilot optimize --session repo-refactor --clear-session
  ```

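The on-disk session format is internal to PromptPilot, but the 0.1.8 README lists what each turn records: user, optimized, and final prompts, extracted constraints, summaries, timestamps, and optional tags. A minimal TypeScript sketch of one plausible entry shape; every field name here is an illustrative assumption, not the actual schema:

```typescript
// Illustrative only: field names are assumptions, not PromptPilot's real schema.
// Grounded in what the docs say a session stores: user/optimized/final prompts,
// extracted constraints, summaries, timestamps, and optional tags.
interface SessionEntry {
  userPrompt: string;
  optimizedPrompt: string;
  finalPrompt: string;
  pinnedConstraints: string[]; // added via --pin-constraint
  summary?: string;            // compressed form used when budgets are tight
  timestamp: string;           // ISO 8601
  tags?: string[];             // added via --tag
}

const entry: SessionEntry = {
  userPrompt: "start a refactor plan",
  optimizedPrompt: "Draft a step-by-step refactor plan for the auth module.",
  finalPrompt: "Draft a step-by-step refactor plan for the auth module.",
  pinnedConstraints: ["keep the public API stable"],
  timestamp: new Date().toISOString(),
  tags: ["repo-refactor"],
};

console.log(entry.pinnedConstraints[0]);
```

Treat the real files under `~/.promptpilot/sessions` as opaque; this sketch only shows what kind of information a session carries between turns.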
- ## Custom local compressor model
+ ---

- PromptPilot ships a `Modelfile` that defines `promptpilot-compressor`, a text-only compression model built on top of `qwen2.5:3b`. It is tuned to output only the compressed prompt with no reasoning, analysis, or commentary.
+ ## Custom compressor model

- Build and verify it:
+ PromptPilot ships a `Modelfile` that builds `promptpilot-compressor`, a stripped-down Ollama model tuned to output only the rewritten prompt with no extra commentary.

  ```bash
  ollama pull qwen2.5:3b
  ollama create promptpilot-compressor -f ./Modelfile
- ollama run promptpilot-compressor "explain recursion simply"
  ```

- Use it via the CLI after installing from npm:
+ Use it:

  ```bash
- # Plain output — pipe directly into Claude
  promptpilot optimize "help me refactor this auth middleware" \
  --model promptpilot-compressor \
  --preset code \
  --plain
-
- # JSON output with debug info
- promptpilot optimize "help me refactor this auth middleware" \
- --model promptpilot-compressor \
- --preset code \
- --json --debug
-
- # With session memory, piped into Claude
- promptpilot optimize "continue the refactor" \
- --model promptpilot-compressor \
- --session repo-refactor \
- --save-context \
- --plain | claude
  ```

- `promptpilot-compressor` outputs plain text rather than JSON. PromptPilot detects this automatically and falls back to text-only mode, stripping any reasoning leakage before using the output. Explicit `--model` always takes priority over automatic local model selection.
+ ---

- ## Core behavior
+ ## Downstream model routing

- PromptPilot has two distinct routing layers.
+ Tell PromptPilot which models you're allowed to use, and it picks the best one for the job.

- 1. Local optimizer routing
-
- - Explicit `ollamaModel` or `--model` always wins.
- - If exactly one suitable small local model exists, it uses that model directly.
- - If multiple suitable small local models exist, a local Qwen router chooses between them.
- - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.
-
- 2. Downstream target routing
+ ```bash
+ promptpilot optimize "rewrite this for a coding refactor" \
+ --task code \
+ --preset code \
+ --target anthropic:claude-sonnet \
+ --target openai:gpt-4.1-mini \
+ --target openai:gpt-5-codex \
+ --target-hint coding \
+ --target-hint refactor \
+ --json --debug
+ ```

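In a script, the `--json` output can be parsed to recover the router's choice. A minimal sketch, assuming the JSON mirrors the documented result fields (`selectedTarget`, `rankedTargets`); verify the exact layout against your installed version before relying on it:

```typescript
// Sketch: pick a downstream target from `promptpilot optimize --json` output.
// Field names follow the documented result object; the exact JSON layout is
// an assumption until checked against your installed version.
interface RoutedResult {
  finalPrompt: string;
  selectedTarget?: string;
  rankedTargets?: string[];
}

function pickTarget(result: RoutedResult, fallback: string): string {
  // Prefer the router's pick, then its top-ranked candidate, then the fallback.
  return result.selectedTarget ?? result.rankedTargets?.[0] ?? fallback;
}

const sample: RoutedResult = {
  finalPrompt: "Refactor the auth middleware in small, reviewable steps.",
  rankedTargets: ["anthropic:claude-sonnet", "openai:gpt-4.1-mini"],
};

console.log(pickTarget(sample, "anthropic:claude-sonnet"));
// logs: anthropic:claude-sonnet
```

This mirrors PromptPilot's own behavior of never inventing a target: if routing produced nothing, your script supplies the fallback explicitly.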
- - The caller provides the allowed downstream targets.
- - If one target is supplied, PromptPilot selects it directly.
- - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
- - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
- - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
+ ---

  ## Library usage

@@ -156,17 +127,9 @@ console.log(result.finalPrompt);
  console.log(result.model);
  ```

- ### Code-first downstream routing
+ ### Routing across multiple models

  ```ts
- import { createOptimizer } from "promptpilot";
-
- const optimizer = createOptimizer({
- provider: "ollama",
- host: "http://localhost:11434",
- contextStore: "local"
- });
-
  const result = await optimizer.optimize({
  prompt: "rewrite this prompt for a coding refactor task",
  task: "code",
@@ -205,7 +168,7 @@ console.log(result.rankedTargets);
  console.log(result.routingReason);
  ```

- ### Lightweight writing still works
+ ### Non-coding tasks work too

  ```ts
  const result = await optimizer.optimize({
@@ -233,53 +196,7 @@ const result = await optimizer.optimize({
  console.log(result.selectedTarget);
  ```

- ## Claude CLI usage
-
- Plain shell output:
-
- ```bash
- promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
- ```
-
- Pipe directly into Claude CLI:
-
- ```bash
- promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
- ```
-
- Route against an allowlist of downstream targets:
-
- ```bash
- promptpilot optimize "rewrite this prompt for a coding refactor task" \
- --task code \
- --preset code \
- --target anthropic:claude-sonnet \
- --target openai:gpt-4.1-mini \
- --target openai:gpt-5-codex \
- --target-hint coding \
- --target-hint refactor \
- --json --debug
- ```
-
- Use stdin in a pipeline:
-
- ```bash
- cat notes.txt | promptpilot optimize --task summarization --plain | claude
- ```
-
- Save context between calls:
-
- ```bash
- promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
- ```
-
- Clear a session:
-
- ```bash
- promptpilot optimize --session ci-fix --clear-session
- ```
-
- Node `child_process` example:
+ ### Node child_process pipeline

  ```ts
  import { spawn } from "node:child_process";
@@ -287,8 +204,7 @@ import { spawn } from "node:child_process";
  const promptpilot = spawn("promptpilot", [
  "optimize",
  "continue working on this repo refactor",
- "--session",
- "repo-refactor",
+ "--session", "repo-refactor",
  "--save-context",
  "--plain"
  ]);
@@ -297,147 +213,93 @@ const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
  promptpilot.stdout.pipe(claude.stdin);
  ```

- ## Session context
+ ---
+
+ ## CLI flags
+
+ | Flag | What it does |
+ |---|---|
+ | `--session <id>` | Name the session for persistent memory |
+ | `--save-context` | Write this turn back into the session |
+ | `--clear-session` | Wipe a session and start fresh |
+ | `--no-context` | Ignore session history for this call |
+ | `--model <name>` | Use a specific local Ollama model |
+ | `--preset <preset>` | Prompt style: `code`, `email`, `essay`, `support`, `summarization`, `chat` |
+ | `--mode <mode>` | Rewrite mode: `clarity`, `concise`, `detailed`, `structured`, `persuasive`, `compress`, `claude_cli` |
+ | `--task <task>` | Task hint passed to the optimizer |
+ | `--tone <tone>` | Tone hint passed to the optimizer |
+ | `--target <provider:model>` | Add a downstream model to the routing pool (repeatable) |
+ | `--target-hint <value>` | Capability hint for routing (repeatable) |
+ | `--routing-priority <value>` | `cheapest_adequate`, `best_quality`, or `fastest_adequate` |
+ | `--routing-top-k <n>` | How many ranked targets to return |
+ | `--workload-bias <value>` | `code_first` to bias routing toward coding models |
+ | `--no-routing` | Skip downstream routing entirely |
+ | `--plain` | Output the final prompt as plain text |
+ | `--json` | Output full result as JSON |
+ | `--debug` | Include routing and optimization details in output |
+ | `--host <url>` | Ollama host (default: `http://localhost:11434`) |
+ | `--store <local\|sqlite>` | Session storage backend |
+ | `--storage-dir <path>` | Custom path for session files |
+ | `--sqlite-path <path>` | Path to SQLite database file |
+ | `--max-total-tokens <n>` | Token budget for the full composed prompt |
+ | `--max-context-tokens <n>` | Token budget for retrieved session context |
+ | `--max-input-tokens <n>` | Token budget for the incoming prompt |
+ | `--timeout <ms>` | Ollama request timeout in milliseconds |
+ | `--bypass-optimization` | Skip Ollama and pass the prompt through as-is |
+ | `--pin-constraint <text>` | Add a pinned constraint (repeatable) |
+ | `--tag <value>` | Tag this session entry (repeatable) |
+ | `--output-format <text>` | Output format hint |
+ | `--max-length <n>` | Max length hint for the rewritten prompt |
+ | `--target-model <name>` | Alternate flag for downstream model name |
+
+ If no prompt text is given, `promptpilot optimize` reads from stdin.
+
+ ---
+
+ ## How local model selection works
+
+ PromptPilot prefers small Ollama models (≤ 4B params). If only one suitable model is installed, it uses it directly. If multiple are installed, a local Qwen router picks the best one for the task. Explicit `--model` always overrides this.
+
+ Default preference order:
+
+ 1. `qwen2.5:3b`
+ 2. `phi3:mini`
+ 3. `llama3.2:3b`
+
+ If Ollama is unavailable or times out, PromptPilot falls back to deterministic prompt shaping (whitespace cleanup, mode-specific wrappers) instead of failing outright.
+
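The fallback is only described as whitespace cleanup plus mode-specific wrappers, so here is a minimal sketch under that assumption. The function name and the `concise` wrapper text are invented for illustration; this is not PromptPilot's actual implementation:

```typescript
// A rough sketch of a deterministic fallback, assuming it amounts to
// whitespace cleanup plus a mode-specific wrapper. Illustrative only.
function shapeFallback(prompt: string, mode: "clarity" | "concise" = "clarity"): string {
  const cleaned = prompt.replace(/\s+/g, " ").trim(); // collapse runs of whitespace
  return mode === "concise" ? `Answer concisely: ${cleaned}` : cleaned;
}

console.log(shapeFallback("  explain   binary\n search  "));
// logs: explain binary search
```

The point of a deterministic fallback is predictability: with no model in the loop, the same input always yields the same shaped prompt.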
271
+ ---
+
+ ## Exports
 
- If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.
-
- Each session stores:
-
- - user prompts
- - optimized prompts
- - final prompts
- - extracted constraints
- - summaries
- - timestamps
- - optional tags
-
- Context retrieval prefers:
-
- - pinned constraints
- - task goals
- - recent relevant turns
- - named entities and recurring references
- - stored summaries when budgets are tight
-
- ## Token reduction
-
- PromptPilot estimates token usage for:
-
- - the new prompt
- - retrieved context
- - the final composed prompt
-
- Budgets:
+ ```ts
+ import {
+ createOptimizer,
+ optimizePrompt,
+ PromptOptimizer,
+ OllamaClient,
+ FileSessionStore,
+ SQLiteSessionStore
+ } from "promptpilot";
+ ```

- - `maxInputTokens`
- - `maxContextTokens`
- - `maxTotalTokens`
+ Key fields on the result object:

- When the budget is exceeded, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
+ | Field | Description |
+ |---|---|
+ | `optimizedPrompt` | The rewritten prompt from the local model |
+ | `finalPrompt` | The composed prompt including context |
+ | `selectedTarget` | The downstream model chosen by the router |
+ | `rankedTargets` | All targets ranked by the router |
+ | `routingReason` | Why the top target was selected |
+ | `routingWarnings` | Any issues the router flagged |
+ | `provider` | Which provider ran the optimization (`ollama` or `heuristic`) |
+ | `model` | Which local model was used |
+ | `estimatedTokensBefore` | Token estimate before optimization |
+ | `estimatedTokensAfter` | Token estimate after optimization |

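The before/after estimates make it easy to report savings. A small sketch using a mocked result object with the field names above; the values are fabricated for illustration, and real ones come from `optimizer.optimize`:

```typescript
// Sketch: compute percentage token savings from the result's estimates.
// The object below is mocked; field names match the documented result table.
interface OptimizeResult {
  estimatedTokensBefore: number;
  estimatedTokensAfter: number;
  provider: "ollama" | "heuristic";
  model: string;
}

function savingsPercent(r: OptimizeResult): number {
  if (r.estimatedTokensBefore === 0) return 0; // avoid division by zero
  return Math.round(100 * (1 - r.estimatedTokensAfter / r.estimatedTokensBefore));
}

const mock: OptimizeResult = {
  estimatedTokensBefore: 400,
  estimatedTokensAfter: 300,
  provider: "ollama",
  model: "qwen2.5:3b",
};

console.log(savingsPercent(mock)); // logs: 25
```

Note these are estimates, not exact tokenizer counts for any particular downstream model.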
- ## CLI
+ ---

- ```bash
- promptpilot optimize "rewrite this prompt for a coding refactor task"
- ```
+ ## License

- Supported flags:
-
- - `--session <id>`
- - `--model <name>`
- - `--mode <mode>`
- - `--task <task>`
- - `--tone <tone>`
- - `--preset <preset>`
- - `--target-model <name>`
- - `--output-format <text>`
- - `--max-length <n>`
- - `--tag <value>` repeatable
- - `--pin-constraint <text>` repeatable
- - `--target <provider:model>` repeatable
- - `--target-hint <value>` repeatable
- - `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
- - `--routing-top-k <n>`
- - `--workload-bias <code_first>`
- - `--no-routing`
- - `--host <url>`
- - `--store <local|sqlite>`
- - `--storage-dir <path>`
- - `--sqlite-path <path>`
- - `--plain`
- - `--json`
- - `--debug`
- - `--save-context`
- - `--no-context`
- - `--clear-session`
- - `--max-total-tokens <n>`
- - `--max-context-tokens <n>`
- - `--max-input-tokens <n>`
- - `--timeout <ms>`
- - `--bypass-optimization`
-
- If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.
-
- ## Public API
-
- Main exports:
-
- - `createOptimizer`
- - `optimizePrompt`
- - `PromptOptimizer`
- - `OllamaClient`
- - `FileSessionStore`
- - `SQLiteSessionStore`
-
- Useful result fields:
-
- - `optimizedPrompt`
- - `finalPrompt`
- - `selectedTarget`
- - `rankedTargets`
- - `routingReason`
- - `routingWarnings`
- - `provider`
- - `model`
- - `estimatedTokensBefore`
- - `estimatedTokensAfter`
-
- Supported modes:
-
- - `clarity`
- - `concise`
- - `detailed`
- - `structured`
- - `persuasive`
- - `compress`
- - `claude_cli`
-
- Supported presets:
-
- - `code`
- - `email`
- - `essay`
- - `support`
- - `summarization`
- - `chat`
-
- ## Why the default model was chosen
-
- `qwen2.5:3b` is the default local preference because it offers a practical balance of:
-
- - good instruction following
- - strong enough reasoning for prompt optimization
- - acceptable memory use on laptops
- - good performance for code-first workflows
-
- `phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.
-
- ## Future improvements
-
- - semantic retrieval for context
- - better token counting by target model
- - prompt scoring
- - local embeddings for relevance search
- - response-aware context updates
- - cache layer
- - benchmark suite
+ MIT
package/dist/cli.d.ts CHANGED
@@ -21,6 +21,7 @@ interface CliDependencies {
  columns?: number;
  user?: string;
  };
+ spawnClaude?: (prompt: string) => Promise<number>;
  }
  declare function runCli(argv: string[], io?: CliIO, dependencies?: CliDependencies): Promise<number>;