promptpilot 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,30 +1,38 @@
  # promptpilot

- `promptpilot` is a lightweight TypeScript npm package that sits between your app or CLI workflow and a target LLM. It rewrites prompts locally through Ollama when available, stores reusable session context, compresses older turns, and emits a Claude-friendly final prompt for shell pipelines or application code.
+ `promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.

- It is designed for local-first workflows on machines like an 18 GB MacBook. By default, `promptpilot` inspects your local Ollama installation, uses a small local Qwen model as a router when available, and lets that router choose the best installed small optimization model for each prompt. It still lets you override the model manually when needed.
+ It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.

  ## Why local Ollama

- - It keeps prompt optimization close to your workflow.
- - It reduces external API calls for prompt rewriting.
- - It lets you use a small, fast model for compression before sending the final prompt to a stronger remote model like Claude.
- - It automatically picks an installed local model that fits a low-memory workflow.
- - It uses Qwen to route prompt optimization to the best available small local model when possible.
+ - It keeps optimization and routing close to your machine.
+ - It uses a small local model before you send anything to a stronger remote model.
+ - It avoids paying remote-token costs for every prompt rewrite.
+ - It works well on laptops with limited memory by preferring small Ollama models.
+ - It uses a local Qwen router when multiple small local models are available.
+
+ The default local model preference order is:
+
+ - `qwen2.5:3b`
+ - `phi3:mini`
+ - `llama3.2:3b`

  ## What it does

- - Accepts a raw prompt plus optional metadata.
+ - Accepts a raw prompt plus optional task metadata.
  - Persists session context across turns.
- - Retrieves relevant prior context for the next prompt.
- - Summarizes older context when budgets get tight.
- - Preserves critical instructions and constraints.
+ - Retrieves and compresses relevant prior context.
+ - Preserves pinned constraints and user intent.
  - Estimates token usage before and after optimization.
- - Outputs plain prompt text or structured JSON.
- - Works cleanly with Claude CLI shell pipelines.
+ - Routes to a caller-supplied downstream model allowlist.
+ - Returns a selected target plus a ranked top 3 when routing is enabled.
+ - Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.

  ## Quick start

+ Local repo workflow:
+
  ```bash
  npm install
  npm run build
@@ -34,30 +42,64 @@ promptpilot optimize "explain binary search simply" --plain
  promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
  ```

- After publishing, install from npm with:
+ Install from npm:

  ```bash
  npm install -g promptpilot
  ```

- ## Install and build
+ Run `promptpilot` with no arguments in an interactive terminal to open the CLI welcome screen:

- ```bash
- npm install
- npm run build
+ ```text
+ PromptPilot v0.1.x
+ ┌──────────────────────────────────────────────────────────────────────────────┐
+ │ Welcome back                                                                 │
+ │                                                                              │
+ │   .-''''-.            Launchpad                                              │
+ │  .' .--. '.           Run promptpilot optimize "..."                         │
+ │ / /  oo  \ \          Pipe directly into Claude with | claude                │
+ │ |  \_==_/  |                                                                 │
+ │ \  \_/ \_/ /          Custom local model                                     │
+ │  '._/|__|\_.'         Use --model promptpilot-compressor                     │
+ │                                                                              │
+ │ /Users/you/project    Commands                                               │
+ │                       optimize   optimize and route prompts                  │
+ │                       --help     show the full CLI reference                 │
+ └──────────────────────────────────────────────────────────────────────────────┘
  ```

- Install directly from a local tarball:
+ Install one or two small Ollama models so the local router has options:

  ```bash
- npm pack
- npm install -g ./promptpilot-0.1.2.tgz
+ ollama pull qwen2.5:3b
+ ollama pull phi3:mini
  ```

+ ## Core behavior
+
+ PromptPilot has two distinct routing layers:
+
+ 1. Local optimizer routing
+
+    - Explicit `ollamaModel` or `--model` always wins.
+    - If exactly one suitable small local model exists, it uses that model directly.
+    - If multiple suitable small local models exist, a local Qwen router chooses between them.
+    - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.
+
+ 2. Downstream target routing
+
+    - The caller provides the allowed downstream targets.
+    - If one target is supplied, PromptPilot selects it directly.
+    - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
+    - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
+    - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
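
The two routing layers above can be sketched in TypeScript. This is an illustrative sketch, not the package's internals: the helper names `pickLocalOptimizer` and `pickDownstreamTarget` are hypothetical, and the Qwen router call is replaced by a deterministic stand-in.

```typescript
interface Target {
  provider: string;
  model: string;
  capabilities: string[];
}

// Layer 1: choose the local optimizer model.
function pickLocalOptimizer(
  explicitModel: string | undefined,
  installedSmallModels: string[],
  routerAvailable: boolean
): string | null {
  if (explicitModel) return explicitModel; // --model / ollamaModel always wins
  if (installedSmallModels.length === 1) return installedSmallModels[0];
  if (installedSmallModels.length > 1 && routerAvailable) {
    return installedSmallModels[0]; // stand-in for the Qwen router's pick
  }
  return null; // fall back to deterministic prompt shaping
}

// Layer 2: rank the caller-supplied allowlist; never invent a target.
function pickDownstreamTarget(targets: Target[], hints: string[]): Target | null {
  if (targets.length === 0) return null;
  if (targets.length === 1) return targets[0];
  // Code-first bias boils down to scoring capability overlap with the hints.
  const scored = targets
    .map((t) => ({ t, score: t.capabilities.filter((c) => hints.includes(c)).length }))
    .sort((a, b) => b.score - a.score);
  return scored[0].t;
}
```

The key invariant in both layers is the last branch: when routing cannot decide, the sketch returns `null` rather than guessing, matching the fallback behavior described above.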
+

  ## Library usage

+ ### Basic optimization
+
  ```ts
- import { createOptimizer, optimizePrompt } from "promptpilot";
+ import { createOptimizer } from "promptpilot";

  const optimizer = createOptimizer({
    provider: "ollama",
@@ -66,22 +108,92 @@ const optimizer = createOptimizer({
  });

  const result = await optimizer.optimize({
-   prompt: "help me write a better follow up email for a startup internship",
-   task: "email",
-   tone: "professional but human",
-   targetModel: "claude",
-   sessionId: "internship-search"
+   prompt: "help me debug this failing CI job",
+   task: "code",
+   preset: "code",
+   sessionId: "ci-fix",
+   saveContext: true
+ });
+
+ console.log(result.finalPrompt);
+ console.log(result.model);
+ ```
+
+ ### Code-first downstream routing
+
+ ```ts
+ import { createOptimizer } from "promptpilot";
+
+ const optimizer = createOptimizer({
+   provider: "ollama",
+   host: "http://localhost:11434",
+   contextStore: "local"
  });

- console.log(result.optimizedPrompt);
+ const result = await optimizer.optimize({
+   prompt: "rewrite this prompt for a coding refactor task",
+   task: "code",
+   preset: "code",
+   availableTargets: [
+     {
+       provider: "anthropic",
+       model: "claude-sonnet",
+       label: "anthropic:claude-sonnet",
+       capabilities: ["coding", "writing"],
+       costRank: 2
+     },
+     {
+       provider: "openai",
+       model: "gpt-4.1-mini",
+       label: "openai:gpt-4.1-mini",
+       capabilities: ["writing", "chat"],
+       costRank: 1
+     },
+     {
+       provider: "openai",
+       model: "gpt-5-codex",
+       label: "openai:gpt-5-codex",
+       capabilities: ["coding", "agentic", "tool_use", "debugging"],
+       costRank: 3
+     }
+   ],
+   routingPriority: "cheapest_adequate",
+   targetHints: ["coding", "agentic", "refactor"],
+   workloadBias: "code_first",
+   debug: true
+ });
+
+ console.log(result.selectedTarget);
+ console.log(result.rankedTargets);
+ console.log(result.routingReason);
+ ```

- const oneOff = await optimizePrompt({
-   prompt: "continue working on my essay intro",
-   task: "essay",
-   sessionId: "essay1"
+ ### Lightweight writing still works
+
+ ```ts
+ const result = await optimizer.optimize({
+   prompt: "write a short internship follow-up email",
+   task: "email",
+   preset: "email",
+   availableTargets: [
+     {
+       provider: "anthropic",
+       model: "claude-sonnet",
+       label: "anthropic:claude-sonnet",
+       capabilities: ["coding", "writing"],
+       costRank: 2
+     },
+     {
+       provider: "openai",
+       model: "gpt-4.1-mini",
+       label: "openai:gpt-4.1-mini",
+       capabilities: ["writing", "email", "chat"],
+       costRank: 1
+     }
+   ]
  });

- console.log(oneOff.finalPrompt);
+ console.log(result.selectedTarget);
  ```

  ## Claude CLI usage
@@ -89,37 +201,45 @@ console.log(oneOff.finalPrompt);
  Plain shell output:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain
+ promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
  ```

- Piping into Claude CLI:
+ Pipe directly into Claude CLI:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain | claude
+ promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
  ```

- Using stdin in a shell pipeline:
+ Route against an allowlist of downstream targets:

  ```bash
- cat notes.txt | promptpilot optimize --task summarization --plain | claude
+ promptpilot optimize "rewrite this prompt for a coding refactor task" \
+   --task code \
+   --preset code \
+   --target anthropic:claude-sonnet \
+   --target openai:gpt-4.1-mini \
+   --target openai:gpt-5-codex \
+   --target-hint coding \
+   --target-hint refactor \
+   --json --debug
  ```
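
Each repeatable `--target` value is shaped like `provider:model`. A sketch of how such values could be split into structured descriptors (the real CLI's parsing may differ; `parseTargetFlag` is an illustrative helper):

```typescript
interface TargetRef {
  provider: string;
  model: string;
  label: string;
}

function parseTargetFlag(value: string): TargetRef {
  // Split on the first colon only, so model names that themselves contain
  // colons (e.g. Ollama tags like "qwen2.5:3b") survive intact.
  const sep = value.indexOf(":");
  if (sep <= 0 || sep === value.length - 1) {
    throw new Error(`--target expects "provider:model", got "${value}"`);
  }
  return {
    provider: value.slice(0, sep),
    model: value.slice(sep + 1),
    label: value
  };
}
```

Splitting on the first colon rather than all colons is the important design choice here, since model tags routinely contain colons of their own.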

- Saving context between calls:
+ Use stdin in a pipeline:

  ```bash
- promptpilot optimize "continue working on my essay intro" --session essay1 --task essay --save-context --plain
+ cat notes.txt | promptpilot optimize --task summarization --plain | claude
  ```

- Debugging token usage:
+ Save context between calls:

  ```bash
- promptpilot optimize "summarize these lecture notes" --session notes1 --json --debug
+ promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
  ```

- Clearing a session:
+ Clear a session:

  ```bash
- promptpilot optimize --session essay1 --clear-session
+ promptpilot optimize --session ci-fix --clear-session
  ```

  Node `child_process` example:
@@ -127,68 +247,67 @@ Node `child_process` example:
  ```ts
  import { spawn } from "node:child_process";

- const prompt = spawn("promptpilot", [
+ const promptpilot = spawn("promptpilot", [
    "optimize",
-   "continue my study guide",
+   "continue working on this repo refactor",
    "--session",
-   "dsa",
+   "repo-refactor",
+   "--save-context",
    "--plain"
  ]);

  const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
- prompt.stdout.pipe(claude.stdin);
+ promptpilot.stdout.pipe(claude.stdin);
  ```

  ## Session context

- By default, if you pass a `sessionId`, `promptpilot` stores optimized turns in a local session store. The default store is JSON files under `~/.promptpilot/sessions`. A SQLite store is also available when `node:sqlite` or `better-sqlite3` is present.
-
- If you do not pass `ollamaModel` or `--model`, `promptpilot` asks Ollama which models are installed and lets a small local Qwen router choose the best small optimizer model for the current prompt. It does not statically rank multiple candidate models anymore. If a suitable Qwen router model is not available when multiple small candidates exist, it falls back to deterministic heuristic prompt optimization instead of making a static model-choice guess. If only oversized local models are available, it also falls back to deterministic heuristic optimization instead of silently using a heavy model.
+ If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.

  Each session stores:

- - User prompts
- - Optimized prompts
- - Final prompts
- - Extracted constraints
- - Context summaries
- - Timestamps
- - Optional tags
+ - user prompts
+ - optimized prompts
+ - final prompts
+ - extracted constraints
+ - summaries
+ - timestamps
+ - optional tags
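
A hypothetical shape for one stored session entry, mirroring the field list above. The store's real schema is not documented here, so treat every property name as illustrative:

```typescript
interface SessionEntry {
  userPrompt: string;
  optimizedPrompt: string;
  finalPrompt: string;
  constraints: string[];   // extracted constraints
  summary?: string;        // present once older turns get compressed
  timestamp: string;       // ISO 8601
  tags?: string[];
}

const entry: SessionEntry = {
  userPrompt: "continue my debugger plan",
  optimizedPrompt: "Continue the CI debugging plan from the previous turn.",
  finalPrompt: "Continue the CI debugging plan from the previous turn.",
  constraints: ["keep answers under 200 words"],
  timestamp: new Date().toISOString(),
  tags: ["ci-fix"]
};
```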

  Context retrieval prefers:

- - Pinned constraints
- - Task-aligned prior turns
- - Recent prompts
- - Named entities and recurring references
- - Stored summaries when budgets are tight
+ - pinned constraints
+ - task goals
+ - recent relevant turns
+ - named entities and recurring references
+ - stored summaries when budgets are tight

  ## Token reduction

- `promptpilot` estimates token usage heuristically for:
+ PromptPilot heuristically estimates token usage for:

- - The new prompt
- - Retrieved session context
- - The final composed prompt
+ - the new prompt
+ - retrieved context
+ - the final composed prompt

- You can control the budgets with:
+ Budgets are controlled with:

  - `maxInputTokens`
  - `maxContextTokens`
  - `maxTotalTokens`

- When context is too large, it ranks prior turns, preserves high-value constraints, summarizes older context, and drops lower-signal items.
+ When context exceeds the budget, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
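
A rough sketch of what heuristic budgeting like this can look like, assuming the common four-characters-per-token approximation; the package's actual estimator and trimming order may differ:

```typescript
const CHARS_PER_TOKEN = 4; // coarse heuristic, not a real tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Drop the oldest context entries first until the estimate fits the budget.
function trimContext(entries: string[], maxContextTokens: number): string[] {
  const kept = [...entries];
  const total = () => kept.reduce((sum, e) => sum + estimateTokens(e), 0);
  while (kept.length > 0 && total() > maxContextTokens) {
    kept.shift();
  }
  return kept;
}
```

In the real pipeline, high-signal items (pinned constraints, task goals) would be exempted from trimming and older turns would be summarized rather than dropped outright.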

  ## CLI

  ```bash
- promptpilot optimize "write me a better prompt for asking claude to summarize lecture notes"
+ promptpilot optimize "rewrite this prompt for a coding refactor task"
  ```

  Supported flags:

  - `--session <id>`
- - `--model <name>` to override auto-selection
+ - `--model <name>`
  - `--mode <mode>`
  - `--task <task>`
  - `--tone <tone>`
@@ -198,6 +317,13 @@ Supported flags:
  - `--max-length <n>`
  - `--tag <value>` repeatable
  - `--pin-constraint <text>` repeatable
+ - `--target <provider:model>` repeatable
+ - `--target-hint <value>` repeatable
+ - `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
+ - `--routing-top-k <n>`
+ - `--workload-bias <code_first>`
+ - `--no-routing`
+ - `--host <url>`
  - `--store <local|sqlite>`
  - `--storage-dir <path>`
  - `--sqlite-path <path>`
@@ -206,13 +332,14 @@ Supported flags:
  - `--debug`
  - `--save-context`
  - `--no-context`
+ - `--clear-session`
  - `--max-total-tokens <n>`
  - `--max-context-tokens <n>`
  - `--max-input-tokens <n>`
- - `--clear-session`
+ - `--timeout <ms>`
  - `--bypass-optimization`

- If no prompt argument is provided, `promptpilot optimize` will read the raw prompt from stdin.
+ If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.
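
The stdin fallback can be sketched as draining the input stream into a single string. `readAll` is a hypothetical helper, not the CLI's actual implementation; it accepts any async-iterable stream, `process.stdin` included:

```typescript
async function readAll(stream: AsyncIterable<Buffer | string>): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  // Trim the trailing newline most shells append to piped input.
  return Buffer.concat(chunks).toString("utf8").trim();
}

// In a real pipeline: const rawPrompt = await readAll(process.stdin);
```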

  ## Public API

@@ -225,6 +352,19 @@ Main exports:
  - `FileSessionStore`
  - `SQLiteSessionStore`

+ Useful result fields:
+
+ - `optimizedPrompt`
+ - `finalPrompt`
+ - `selectedTarget`
+ - `rankedTargets`
+ - `routingReason`
+ - `routingWarnings`
+ - `provider`
+ - `model`
+ - `estimatedTokensBefore`
+ - `estimatedTokensAfter`
+
  Supported modes:

  - `clarity`
@@ -244,41 +384,23 @@ Supported presets:
  - `summarization`
  - `chat`

- ## File structure
+ ## Why the default model was chosen

- ```text
- src/
-   index.ts
-   types.ts
-   errors.ts
-   cli.ts
-   core/
-     optimizer.ts
-     ollamaClient.ts
-     systemPrompt.ts
-     contextManager.ts
-     tokenEstimator.ts
-     contextCompressor.ts
-   storage/
-     fileSessionStore.ts
-     sqliteSessionStore.ts
-   utils/
-     validation.ts
-     logger.ts
-     json.ts
- test/
- ```
+ `qwen2.5:3b` is the default local preference because it offers a practical balance of:

- ## Safety and fallback behavior
+ - good instruction following
+ - strong enough reasoning for prompt optimization
+ - acceptable memory use on laptops
+ - good performance for code-first workflows

- If Ollama is unavailable, `promptpilot` falls back to a deterministic local formatter that still preserves constraints and emits a Claude-compatible final prompt. Empty prompts are rejected, timeouts are supported, and hard token budget failures throw explicit errors.
+ `phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.

  ## Future improvements

- - Semantic retrieval for context
- - Better token counting by model
- - Prompt scoring
- - Local embeddings for relevance search
- - Response-aware context updates
- - Cache layer
- - Benchmark suite
+ - semantic retrieval for context
+ - better token counting by target model
+ - prompt scoring
+ - local embeddings for relevance search
+ - response-aware context updates
+ - cache layer
+ - benchmark suite
package/dist/cli.d.ts CHANGED
@@ -3,6 +3,8 @@ import { createOptimizer } from './index.js';

  type CliWriter = {
    write(message: string): void;
+   isTTY?: boolean;
+   columns?: number;
  };
  interface CliIO {
    stdout: CliWriter;
@@ -12,6 +14,13 @@ interface CliIO {
  interface CliDependencies {
    createOptimizer: typeof createOptimizer;
    readStdin: (stdin?: NodeJS.ReadStream) => Promise<string>;
+   getCliInfo?: (stdout: CliWriter) => {
+     cwd: string;
+     version: string;
+     color: boolean;
+     columns?: number;
+     user?: string;
+   };
  }
  declare function runCli(argv: string[], io?: CliIO, dependencies?: CliDependencies): Promise<number>;