promptpilot 0.1.2 → 0.1.3

package/README.md CHANGED
@@ -1,30 +1,38 @@
  # promptpilot

- `promptpilot` is a lightweight TypeScript npm package that sits between your app or CLI workflow and a target LLM. It rewrites prompts locally through Ollama when available, stores reusable session context, compresses older turns, and emits a Claude-friendly final prompt for shell pipelines or application code.
+ `promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.

- It is designed for local-first workflows on machines like an 18 GB MacBook. By default, `promptpilot` inspects your local Ollama installation, uses a small local Qwen model as a router when available, and lets that router choose the best installed small optimization model for each prompt. It still lets you override the model manually when needed.
+ It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.

  ## Why local Ollama

- - It keeps prompt optimization close to your workflow.
- - It reduces external API calls for prompt rewriting.
- - It lets you use a small, fast model for compression before sending the final prompt to a stronger remote model like Claude.
- - It automatically picks an installed local model that fits a low-memory workflow.
- - It uses Qwen to route prompt optimization to the best available small local model when possible.
+ - It keeps optimization and routing close to your machine.
+ - It uses a small local model before you send anything to a stronger remote model.
+ - It avoids paying remote-token costs for every prompt rewrite.
+ - It works well on laptops with limited memory by preferring small Ollama models.
+ - It uses a local Qwen router when multiple small local models are available.
+
+ Default local preference is:
+
+ - `qwen2.5:3b`
+ - `phi3:mini`
+ - `llama3.2:3b`

  ## What it does

- - Accepts a raw prompt plus optional metadata.
+ - Accepts a raw prompt plus optional task metadata.
  - Persists session context across turns.
- - Retrieves relevant prior context for the next prompt.
- - Summarizes older context when budgets get tight.
- - Preserves critical instructions and constraints.
+ - Retrieves and compresses relevant prior context.
+ - Preserves pinned constraints and user intent.
  - Estimates token usage before and after optimization.
- - Outputs plain prompt text or structured JSON.
- - Works cleanly with Claude CLI shell pipelines.
+ - Routes to a caller-supplied downstream model allowlist.
+ - Returns a selected target plus a ranked top 3 when routing is enabled.
+ - Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.

  ## Quick start

+ Local repo workflow:
+
  ```bash
  npm install
  npm run build
@@ -34,30 +42,44 @@ promptpilot optimize "explain binary search simply" --plain
  promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
  ```

- After publishing, install from npm with:
+ Install from npm:

  ```bash
  npm install -g promptpilot
  ```

- ## Install and build
+ Install one or two small Ollama models so the local router has options:

  ```bash
- npm install
- npm run build
+ ollama pull qwen2.5:3b
+ ollama pull phi3:mini
  ```

- Install directly from a local tarball:
+ ## Core behavior

- ```bash
- npm pack
- npm install -g ./promptpilot-0.1.2.tgz
- ```
+ PromptPilot has two distinct routing layers.
+
+ 1. Local optimizer routing
+
+ - Explicit `ollamaModel` or `--model` always wins.
+ - If exactly one suitable small local model exists, it uses that model directly.
+ - If multiple suitable small local models exist, a local Qwen router chooses between them.
+ - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.
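
The local selection order above can be sketched roughly as follows. This is an illustrative sketch only: `pickLocalOptimizer` and `routeWithQwen` are assumed names for this example, not promptpilot's real internals.

```typescript
// Sketch of the local optimizer selection order (illustrative, not the
// package's actual implementation).
type LocalChoice =
  | { kind: "model"; model: string }
  | { kind: "deterministic" }; // deterministic prompt shaping fallback

function pickLocalOptimizer(
  installedSmallModels: string[],
  explicitModel?: string,
  routeWithQwen?: (candidates: string[]) => string | null
): LocalChoice {
  // 1. An explicit ollamaModel / --model always wins.
  if (explicitModel) return { kind: "model", model: explicitModel };
  // 2. Exactly one suitable small model: use it directly.
  if (installedSmallModels.length === 1) {
    return { kind: "model", model: installedSmallModels[0] };
  }
  // 3. Several candidates: ask the local Qwen router to choose.
  if (installedSmallModels.length > 1 && routeWithQwen) {
    const routed = routeWithQwen(installedSmallModels);
    if (routed) return { kind: "model", model: routed };
  }
  // 4. Routing unavailable or failed: fall back to deterministic
  //    shaping rather than making a static model-choice guess.
  return { kind: "deterministic" };
}
```

The point of step 4 is that a failed routing call degrades to deterministic behavior instead of silently picking a model.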
+
+ 2. Downstream target routing
+
+ - The caller provides the allowed downstream targets.
+ - If one target is supplied, PromptPilot selects it directly.
+ - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
+ - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
+ - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
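
A `cheapest_adequate` ranking over the allowlist can be sketched as below. The scoring heuristic here is an assumption for illustration; promptpilot's real router is a local Qwen model, not this filter-and-sort.

```typescript
// Illustrative sketch of "cheapest_adequate" downstream routing over a
// caller-supplied allowlist (not promptpilot's actual router logic).
interface Target {
  label: string;
  capabilities: string[];
  costRank: number; // lower = cheaper
}

function rankTargets(targets: Target[], hints: string[]): Target[] {
  // "Adequate" = advertises at least one hinted capability.
  const adequate = targets.filter((t) =>
    hints.some((h) => t.capabilities.includes(h))
  );
  // Among adequate targets, prefer the cheapest. If nothing matches the
  // hints, fall back to the full list ordered by cost.
  const pool = adequate.length > 0 ? adequate : targets;
  return [...pool].sort((a, b) => a.costRank - b.costRank);
}

const ranked = rankTargets(
  [
    { label: "anthropic:claude-sonnet", capabilities: ["coding", "writing"], costRank: 2 },
    { label: "openai:gpt-4.1-mini", capabilities: ["writing", "chat"], costRank: 1 },
    { label: "openai:gpt-5-codex", capabilities: ["coding", "agentic"], costRank: 3 },
  ],
  ["coding"]
);
console.log(ranked[0].label); // the cheapest coding-capable target
```

Note how the cheapest overall target (`gpt-4.1-mini`) loses when it lacks the hinted capability: adequacy is filtered before cost is compared.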

  ## Library usage

+ ### Basic optimization
+
  ```ts
- import { createOptimizer, optimizePrompt } from "promptpilot";
+ import { createOptimizer } from "promptpilot";

  const optimizer = createOptimizer({
  provider: "ollama",
@@ -66,22 +88,92 @@ const optimizer = createOptimizer({
  });

  const result = await optimizer.optimize({
- prompt: "help me write a better follow up email for a startup internship",
- task: "email",
- tone: "professional but human",
- targetModel: "claude",
- sessionId: "internship-search"
+ prompt: "help me debug this failing CI job",
+ task: "code",
+ preset: "code",
+ sessionId: "ci-fix",
+ saveContext: true
+ });
+
+ console.log(result.finalPrompt);
+ console.log(result.model);
+ ```
+
+ ### Code-first downstream routing
+
+ ```ts
+ import { createOptimizer } from "promptpilot";
+
+ const optimizer = createOptimizer({
+ provider: "ollama",
+ host: "http://localhost:11434",
+ contextStore: "local"
  });

- console.log(result.optimizedPrompt);
+ const result = await optimizer.optimize({
+ prompt: "rewrite this prompt for a coding refactor task",
+ task: "code",
+ preset: "code",
+ availableTargets: [
+ {
+ provider: "anthropic",
+ model: "claude-sonnet",
+ label: "anthropic:claude-sonnet",
+ capabilities: ["coding", "writing"],
+ costRank: 2
+ },
+ {
+ provider: "openai",
+ model: "gpt-4.1-mini",
+ label: "openai:gpt-4.1-mini",
+ capabilities: ["writing", "chat"],
+ costRank: 1
+ },
+ {
+ provider: "openai",
+ model: "gpt-5-codex",
+ label: "openai:gpt-5-codex",
+ capabilities: ["coding", "agentic", "tool_use", "debugging"],
+ costRank: 3
+ }
+ ],
+ routingPriority: "cheapest_adequate",
+ targetHints: ["coding", "agentic", "refactor"],
+ workloadBias: "code_first",
+ debug: true
+ });

- const oneOff = await optimizePrompt({
- prompt: "continue working on my essay intro",
- task: "essay",
- sessionId: "essay1"
+ console.log(result.selectedTarget);
+ console.log(result.rankedTargets);
+ console.log(result.routingReason);
+ ```
+
+ ### Lightweight writing still works
+
+ ```ts
+ const result = await optimizer.optimize({
+ prompt: "write a short internship follow-up email",
+ task: "email",
+ preset: "email",
+ availableTargets: [
+ {
+ provider: "anthropic",
+ model: "claude-sonnet",
+ label: "anthropic:claude-sonnet",
+ capabilities: ["coding", "writing"],
+ costRank: 2
+ },
+ {
+ provider: "openai",
+ model: "gpt-4.1-mini",
+ label: "openai:gpt-4.1-mini",
+ capabilities: ["writing", "email", "chat"],
+ costRank: 1
+ }
+ ]
  });

- console.log(oneOff.finalPrompt);
+ console.log(result.selectedTarget);
  ```

  ## Claude CLI usage
@@ -89,37 +181,45 @@ console.log(oneOff.finalPrompt);
  Plain shell output:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain
+ promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
  ```

- Piping into Claude CLI:
+ Pipe directly into Claude CLI:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain | claude
+ promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
  ```

- Using stdin in a shell pipeline:
+ Route against an allowlist of downstream targets:

  ```bash
- cat notes.txt | promptpilot optimize --task summarization --plain | claude
+ promptpilot optimize "rewrite this prompt for a coding refactor task" \
+ --task code \
+ --preset code \
+ --target anthropic:claude-sonnet \
+ --target openai:gpt-4.1-mini \
+ --target openai:gpt-5-codex \
+ --target-hint coding \
+ --target-hint refactor \
+ --json --debug
  ```

- Saving context between calls:
+ Use stdin in a pipeline:

  ```bash
- promptpilot optimize "continue working on my essay intro" --session essay1 --task essay --save-context --plain
+ cat notes.txt | promptpilot optimize --task summarization --plain | claude
  ```

- Debugging token usage:
+ Save context between calls:

  ```bash
- promptpilot optimize "summarize these lecture notes" --session notes1 --json --debug
+ promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
  ```

- Clearing a session:
+ Clear a session:

  ```bash
- promptpilot optimize --session essay1 --clear-session
+ promptpilot optimize --session ci-fix --clear-session
  ```

  Node `child_process` example:
@@ -127,68 +227,67 @@ Node `child_process` example:
  ```ts
  import { spawn } from "node:child_process";

- const prompt = spawn("promptpilot", [
+ const promptpilot = spawn("promptpilot", [
  "optimize",
- "continue my study guide",
+ "continue working on this repo refactor",
  "--session",
- "dsa",
+ "repo-refactor",
+ "--save-context",
  "--plain"
  ]);

  const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
- prompt.stdout.pipe(claude.stdin);
+ promptpilot.stdout.pipe(claude.stdin);
  ```

  ## Session context

- By default, if you pass a `sessionId`, `promptpilot` stores optimized turns in a local session store. The default store is JSON files under `~/.promptpilot/sessions`. A SQLite store is also available when `node:sqlite` or `better-sqlite3` is present.
-
- If you do not pass `ollamaModel` or `--model`, `promptpilot` asks Ollama which models are installed and lets a small local Qwen router choose the best small optimizer model for the current prompt. It does not statically rank multiple candidate models anymore. If a suitable Qwen router model is not available when multiple small candidates exist, it falls back to deterministic heuristic prompt optimization instead of making a static model-choice guess. If only oversized local models are available, it also falls back to deterministic heuristic optimization instead of silently using a heavy model.
+ If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.

  Each session stores:

- - User prompts
- - Optimized prompts
- - Final prompts
- - Extracted constraints
- - Context summaries
- - Timestamps
- - Optional tags
+ - user prompts
+ - optimized prompts
+ - final prompts
+ - extracted constraints
+ - summaries
+ - timestamps
+ - optional tags

  Context retrieval prefers:

- - Pinned constraints
- - Task-aligned prior turns
- - Recent prompts
- - Named entities and recurring references
- - Stored summaries when budgets are tight
+ - pinned constraints
+ - task goals
+ - recent relevant turns
+ - named entities and recurring references
+ - stored summaries when budgets are tight
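
The preference order above can be expressed as a simple score. The entry shape and weights below are illustrative assumptions for this sketch, not promptpilot's stored format.

```typescript
// Sketch: rank session entries by the stated retrieval preferences
// (pinned constraints > task-aligned turns > recency). Weights are
// arbitrary illustration values.
interface SessionEntry {
  text: string;
  pinned: boolean;      // pinned constraint
  matchesTask: boolean; // aligned with the current task goal
  turnsAgo: number;     // 0 = most recent turn
}

function scoreEntry(e: SessionEntry): number {
  let score = 0;
  if (e.pinned) score += 100;           // pinned constraints dominate
  if (e.matchesTask) score += 10;       // then task-aligned turns
  score += Math.max(0, 5 - e.turnsAgo); // then recency
  return score;
}

const picked = [
  { text: "old chat turn", pinned: false, matchesTask: false, turnsAgo: 9 },
  { text: "use TypeScript strict mode", pinned: true, matchesTask: true, turnsAgo: 7 },
  { text: "latest turn", pinned: false, matchesTask: true, turnsAgo: 0 },
].sort((a, b) => scoreEntry(b) - scoreEntry(a));

console.log(picked[0].text); // the pinned constraint wins despite its age
```

The key property is that a pinned constraint outranks any amount of recency, matching the list above.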

  ## Token reduction

- `promptpilot` estimates token usage heuristically for:
+ PromptPilot estimates token usage for:

- - The new prompt
- - Retrieved session context
- - The final composed prompt
+ - the new prompt
+ - retrieved context
+ - the final composed prompt

- You can control the budgets with:
+ Budgets:

  - `maxInputTokens`
  - `maxContextTokens`
  - `maxTotalTokens`

- When context is too large, it ranks prior turns, preserves high-value constraints, summarizes older context, and drops lower-signal items.
+ When context exceeds the budget, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
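
A minimal sketch of that flow, assuming a common chars/4 token heuristic; neither the estimator nor the trimming order is promptpilot's exact implementation.

```typescript
// Sketch: heuristic token estimation plus budget-driven trimming.
// The chars/4 rule is a rough, widely used approximation (assumed here).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function fitContext(
  pieces: { text: string; pinned: boolean }[],
  maxContextTokens: number
): string[] {
  const kept: string[] = [];
  let used = 0;
  // Consider pinned, high-signal pieces first; drop whatever no longer
  // fits once the context budget is exhausted.
  const ordered = [...pieces].sort(
    (a, b) => Number(b.pinned) - Number(a.pinned)
  );
  for (const p of ordered) {
    const cost = estimateTokens(p.text);
    if (used + cost <= maxContextTokens) {
      kept.push(p.text);
      used += cost;
    }
  }
  return kept;
}
```

Real implementations summarize oversized pieces instead of dropping them outright, but the budget check is the same shape.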

  ## CLI

  ```bash
- promptpilot optimize "write me a better prompt for asking claude to summarize lecture notes"
+ promptpilot optimize "rewrite this prompt for a coding refactor task"
  ```

  Supported flags:

  - `--session <id>`
- - `--model <name>` to override auto-selection
+ - `--model <name>`
  - `--mode <mode>`
  - `--task <task>`
  - `--tone <tone>`
@@ -198,6 +297,13 @@ Supported flags:
  - `--max-length <n>`
  - `--tag <value>` repeatable
  - `--pin-constraint <text>` repeatable
+ - `--target <provider:model>` repeatable
+ - `--target-hint <value>` repeatable
+ - `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
+ - `--routing-top-k <n>`
+ - `--workload-bias <code_first>`
+ - `--no-routing`
+ - `--host <url>`
  - `--store <local|sqlite>`
  - `--storage-dir <path>`
  - `--sqlite-path <path>`
@@ -206,13 +312,14 @@ Supported flags:
  - `--debug`
  - `--save-context`
  - `--no-context`
+ - `--clear-session`
  - `--max-total-tokens <n>`
  - `--max-context-tokens <n>`
  - `--max-input-tokens <n>`
- - `--clear-session`
+ - `--timeout <ms>`
  - `--bypass-optimization`

- If no prompt argument is provided, `promptpilot optimize` will read the raw prompt from stdin.
+ If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.

  ## Public API

@@ -225,6 +332,19 @@ Main exports:
  - `FileSessionStore`
  - `SQLiteSessionStore`

+ Useful result fields:
+
+ - `optimizedPrompt`
+ - `finalPrompt`
+ - `selectedTarget`
+ - `rankedTargets`
+ - `routingReason`
+ - `routingWarnings`
+ - `provider`
+ - `model`
+ - `estimatedTokensBefore`
+ - `estimatedTokensAfter`
+
  Supported modes:

  - `clarity`
@@ -244,41 +364,23 @@ Supported presets:
  - `summarization`
  - `chat`

- ## File structure
-
- ```text
- src/
-   index.ts
-   types.ts
-   errors.ts
-   cli.ts
-   core/
-     optimizer.ts
-     ollamaClient.ts
-     systemPrompt.ts
-     contextManager.ts
-     tokenEstimator.ts
-     contextCompressor.ts
-   storage/
-     fileSessionStore.ts
-     sqliteSessionStore.ts
-   utils/
-     validation.ts
-     logger.ts
-     json.ts
- test/
- ```
+ ## Why the default model was chosen
+
+ `qwen2.5:3b` is the default local preference because it offers a practical balance of:

- ## Safety and fallback behavior
+ - good instruction following
+ - strong enough reasoning for prompt optimization
+ - acceptable memory use on laptops
+ - good performance for code-first workflows

- If Ollama is unavailable, `promptpilot` falls back to a deterministic local formatter that still preserves constraints and emits a Claude-compatible final prompt. Empty prompts are rejected, timeouts are supported, and hard token budget failures throw explicit errors.
+ `phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.

  ## Future improvements

- - Semantic retrieval for context
- - Better token counting by model
- - Prompt scoring
- - Local embeddings for relevance search
- - Response-aware context updates
- - Cache layer
- - Benchmark suite
+ - semantic retrieval for context
+ - better token counting by target model
+ - prompt scoring
+ - local embeddings for relevance search
+ - response-aware context updates
+ - cache layer
+ - benchmark suite