promptpilot 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,29 +1,38 @@
  # promptpilot

- `promptpilot` is a lightweight TypeScript npm package that sits between your app or CLI workflow and a target LLM. It rewrites prompts locally through Ollama when available, stores reusable session context, compresses older turns, and emits a Claude-friendly final prompt for shell pipelines or application code.
+ `promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.

- It is designed for local-first workflows on machines like an 18 GB MacBook. By default, `promptpilot` inspects your local Ollama installation and auto-selects a small optimization model, preferring `qwen2.5:3b`, `phi3:mini`, and `llama3.2:3b` in that order. The package still lets you override the model manually when needed.
+ It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.

  ## Why local Ollama

- - It keeps prompt optimization close to your workflow.
- - It reduces external API calls for prompt rewriting.
- - It lets you use a small, fast model for compression before sending the final prompt to a stronger remote model like Claude.
- - It automatically picks an installed local model that fits a low-memory workflow.
+ - It keeps optimization and routing close to your machine.
+ - It uses a small local model before you send anything to a stronger remote model.
+ - It avoids paying remote-token costs for every prompt rewrite.
+ - It works well on laptops with limited memory by preferring small Ollama models.
+ - It uses a local Qwen router when multiple small local models are available.
+
+ The default local preference order is:
+
+ - `qwen2.5:3b`
+ - `phi3:mini`
+ - `llama3.2:3b`

  ## What it does

- - Accepts a raw prompt plus optional metadata.
+ - Accepts a raw prompt plus optional task metadata.
  - Persists session context across turns.
- - Retrieves relevant prior context for the next prompt.
- - Summarizes older context when budgets get tight.
- - Preserves critical instructions and constraints.
+ - Retrieves and compresses relevant prior context.
+ - Preserves pinned constraints and user intent.
  - Estimates token usage before and after optimization.
- - Outputs plain prompt text or structured JSON.
- - Works cleanly with Claude CLI shell pipelines.
+ - Routes to a caller-supplied downstream model allowlist.
+ - Returns a selected target plus the top three ranked targets when routing is enabled.
+ - Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.

  ## Quick start

+ Local repo workflow:
+
  ```bash
  npm install
  npm run build
@@ -33,30 +42,44 @@ promptpilot optimize "explain binary search simply" --plain
  promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
  ```

- After publishing, install from npm with:
+ Install from npm:

  ```bash
  npm install -g promptpilot
  ```

- ## Install and build
+ Install one or two small Ollama models so the local router has options:

  ```bash
- npm install
- npm run build
+ ollama pull qwen2.5:3b
+ ollama pull phi3:mini
  ```

- Install directly from a local tarball:
+ ## Core behavior

- ```bash
- npm pack
- npm install -g ./promptpilot-0.1.1.tgz
- ```
+ PromptPilot has two distinct routing layers.
+
+ 1. Local optimizer routing
+
+ - Explicit `ollamaModel` or `--model` always wins.
+ - If exactly one suitable small local model exists, it uses that model directly.
+ - If multiple suitable small local models exist, a local Qwen router chooses between them.
+ - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.
+
+ 2. Downstream target routing
+
+ - The caller provides the allowed downstream targets.
+ - If one target is supplied, PromptPilot selects it directly.
+ - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
+ - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
+ - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.

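The local optimizer routing rules above can be sketched in plain TypeScript. This is an illustrative sketch only: `pickOptimizerModel` and its return shape are invented for this example, and the real package delegates the multi-model case to a local Qwen router rather than a fixed rule.

```ts
// Illustrative sketch of local optimizer routing; not promptpilot's actual internals.
const PREFERRED = ["qwen2.5:3b", "phi3:mini", "llama3.2:3b"];

interface LocalChoice {
  kind: "explicit" | "single" | "router" | "deterministic-fallback";
  model?: string;
}

function pickOptimizerModel(installed: string[], explicit?: string): LocalChoice {
  // An explicit ollamaModel / --model value always wins.
  if (explicit) return { kind: "explicit", model: explicit };

  // Keep only suitable small models, in preference order.
  const candidates = PREFERRED.filter((m) => installed.includes(m));

  // Exactly one suitable model: use it directly.
  if (candidates.length === 1) return { kind: "single", model: candidates[0] };

  // Several suitable models: a local Qwen router would choose; this sketch
  // stands in for it by taking the most-preferred candidate.
  if (candidates.length > 1) return { kind: "router", model: candidates[0] };

  // No suitable small model: fall back to deterministic prompt shaping.
  return { kind: "deterministic-fallback" };
}
```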
  ## Library usage

+ ### Basic optimization
+
  ```ts
- import { createOptimizer, optimizePrompt } from "promptpilot";
+ import { createOptimizer } from "promptpilot";

  const optimizer = createOptimizer({
    provider: "ollama",
@@ -65,22 +88,92 @@ const optimizer = createOptimizer({
  });

  const result = await optimizer.optimize({
-   prompt: "help me write a better follow up email for a startup internship",
-   task: "email",
-   tone: "professional but human",
-   targetModel: "claude",
-   sessionId: "internship-search"
+   prompt: "help me debug this failing CI job",
+   task: "code",
+   preset: "code",
+   sessionId: "ci-fix",
+   saveContext: true
+ });
+
+ console.log(result.finalPrompt);
+ console.log(result.model);
+ ```
+
+ ### Code-first downstream routing
+
+ ```ts
+ import { createOptimizer } from "promptpilot";
+
+ const optimizer = createOptimizer({
+   provider: "ollama",
+   host: "http://localhost:11434",
+   contextStore: "local"
  });

- console.log(result.optimizedPrompt);
+ const result = await optimizer.optimize({
+   prompt: "rewrite this prompt for a coding refactor task",
+   task: "code",
+   preset: "code",
+   availableTargets: [
+     {
+       provider: "anthropic",
+       model: "claude-sonnet",
+       label: "anthropic:claude-sonnet",
+       capabilities: ["coding", "writing"],
+       costRank: 2
+     },
+     {
+       provider: "openai",
+       model: "gpt-4.1-mini",
+       label: "openai:gpt-4.1-mini",
+       capabilities: ["writing", "chat"],
+       costRank: 1
+     },
+     {
+       provider: "openai",
+       model: "gpt-5-codex",
+       label: "openai:gpt-5-codex",
+       capabilities: ["coding", "agentic", "tool_use", "debugging"],
+       costRank: 3
+     }
+   ],
+   routingPriority: "cheapest_adequate",
+   targetHints: ["coding", "agentic", "refactor"],
+   workloadBias: "code_first",
+   debug: true
+ });

- const oneOff = await optimizePrompt({
-   prompt: "continue working on my essay intro",
-   task: "essay",
-   sessionId: "essay1"
+ console.log(result.selectedTarget);
+ console.log(result.rankedTargets);
+ console.log(result.routingReason);
+ ```
+
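For intuition, the `cheapest_adequate` priority in the routing example can be approximated with a fixed scoring rule. This is a hypothetical stand-in: the actual ranking comes from the local Qwen router, not a formula, and `rankTargets` is not part of the package's API.

```ts
// Hypothetical approximation of cheapest_adequate downstream ranking.
interface Target {
  provider: string;
  model: string;
  label: string;
  capabilities: string[];
  costRank: number; // lower = cheaper
}

function rankTargets(targets: Target[], hints: string[], topK = 3): Target[] {
  // "Adequate" here means the target covers at least one hinted capability.
  const adequate = targets.filter((t) =>
    hints.some((h) => t.capabilities.includes(h))
  );
  // Among adequate targets, prefer the lowest cost rank; if none are
  // adequate, fall back to ranking the full allowlist.
  const pool = adequate.length > 0 ? adequate : targets;
  return [...pool].sort((a, b) => a.costRank - b.costRank).slice(0, topK);
}
```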
+ ### Lightweight writing still works
+
+ ```ts
+ const result = await optimizer.optimize({
+   prompt: "write a short internship follow-up email",
+   task: "email",
+   preset: "email",
+   availableTargets: [
+     {
+       provider: "anthropic",
+       model: "claude-sonnet",
+       label: "anthropic:claude-sonnet",
+       capabilities: ["coding", "writing"],
+       costRank: 2
+     },
+     {
+       provider: "openai",
+       model: "gpt-4.1-mini",
+       label: "openai:gpt-4.1-mini",
+       capabilities: ["writing", "email", "chat"],
+       costRank: 1
+     }
+   ]
  });

- console.log(oneOff.finalPrompt);
+ console.log(result.selectedTarget);
  ```

  ## Claude CLI usage
@@ -88,37 +181,45 @@ console.log(oneOff.finalPrompt);
  Plain shell output:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain
+ promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
  ```

- Piping into Claude CLI:
+ Pipe directly into Claude CLI:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain | claude
+ promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
  ```

- Using stdin in a shell pipeline:
+ Route against an allowlist of downstream targets:

  ```bash
- cat notes.txt | promptpilot optimize --task summarization --plain | claude
+ promptpilot optimize "rewrite this prompt for a coding refactor task" \
+   --task code \
+   --preset code \
+   --target anthropic:claude-sonnet \
+   --target openai:gpt-4.1-mini \
+   --target openai:gpt-5-codex \
+   --target-hint coding \
+   --target-hint refactor \
+   --json --debug
  ```

- Saving context between calls:
+ Use stdin in a pipeline:

  ```bash
- promptpilot optimize "continue working on my essay intro" --session essay1 --task essay --save-context --plain
+ cat notes.txt | promptpilot optimize --task summarization --plain | claude
  ```

- Debugging token usage:
+ Save context between calls:

  ```bash
- promptpilot optimize "summarize these lecture notes" --session notes1 --json --debug
+ promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
  ```

- Clearing a session:
+ Clear a session:

  ```bash
- promptpilot optimize --session essay1 --clear-session
+ promptpilot optimize --session ci-fix --clear-session
  ```

  Node `child_process` example:
@@ -126,68 +227,67 @@ Node `child_process` example:
  ```ts
  import { spawn } from "node:child_process";

- const prompt = spawn("promptpilot", [
+ const promptpilot = spawn("promptpilot", [
    "optimize",
-   "continue my study guide",
+   "continue working on this repo refactor",
    "--session",
-   "dsa",
+   "repo-refactor",
+   "--save-context",
    "--plain"
  ]);

  const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
- prompt.stdout.pipe(claude.stdin);
+ promptpilot.stdout.pipe(claude.stdin);
  ```

  ## Session context

- By default, if you pass a `sessionId`, `promptpilot` stores optimized turns in a local session store. The default store is JSON files under `~/.promptpilot/sessions`. A SQLite store is also available when `node:sqlite` or `better-sqlite3` is present.
-
- If you do not pass `ollamaModel` or `--model`, `promptpilot` asks Ollama which models are installed and picks the best small model for the job. For most workflows it prefers `qwen2.5:3b`, then `phi3:mini`, then `llama3.2:3b`. For code-heavy prompts it will prefer `qwen2.5-coder:3b` when that model is installed. If only oversized local models are available, it warns and falls back to deterministic heuristic optimization instead of silently using a heavy model.
+ If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.

  Each session stores:

- - User prompts
- - Optimized prompts
- - Final prompts
- - Extracted constraints
- - Context summaries
- - Timestamps
- - Optional tags
+ - user prompts
+ - optimized prompts
+ - final prompts
+ - extracted constraints
+ - summaries
+ - timestamps
+ - optional tags

  Context retrieval prefers:

- - Pinned constraints
- - Task-aligned prior turns
- - Recent prompts
- - Named entities and recurring references
- - Stored summaries when budgets are tight
+ - pinned constraints
+ - task goals
+ - recent relevant turns
+ - named entities and recurring references
+ - stored summaries when budgets are tight

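One way to picture that preference order is as a simple score per stored entry. This is a hypothetical sketch: `retrievalScore` and the `StoredEntry` shape are invented here, and the package's real heuristics are richer.

```ts
// Hypothetical scoring sketch for context-retrieval preference.
interface StoredEntry {
  text: string;
  pinned?: boolean; // pinned constraints always survive
  task?: string;    // task tag recorded with the turn
  turnsAgo: number; // 0 = most recent turn
}

function retrievalScore(entry: StoredEntry, currentTask?: string): number {
  let score = 0;
  if (entry.pinned) score += 100;                              // pinned constraints first
  if (currentTask && entry.task === currentTask) score += 10;  // task-aligned turns next
  score += Math.max(0, 5 - entry.turnsAgo);                    // small recency bonus
  return score;
}
```

Entries would then be taken in descending score order until the context budget is spent, with stored summaries standing in for turns that no longer fit.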
  ## Token reduction

- `promptpilot` estimates token usage heuristically for:
+ PromptPilot estimates token usage for:

- - The new prompt
- - Retrieved session context
- - The final composed prompt
+ - the new prompt
+ - retrieved context
+ - the final composed prompt

- You can control the budgets with:
+ Budgets:

  - `maxInputTokens`
  - `maxContextTokens`
  - `maxTotalTokens`

- When context is too large, it ranks prior turns, preserves high-value constraints, summarizes older context, and drops lower-signal items.
+ When context exceeds its budget, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.

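A minimal sketch of how such a budget might be enforced. Assumptions are labeled: the `ContextItem` shape and the drop-lowest-value strategy are invented for this example, and promptpilot also summarizes rather than only dropping.

```ts
// Invented sketch: keep the highest-value context that fits the budget.
interface ContextItem {
  text: string;
  tokens: number; // estimated token cost of this item
  value: number;  // relevance score; higher = more worth keeping
}

function trimToBudget(items: ContextItem[], maxContextTokens: number): ContextItem[] {
  // Consider items from most to least valuable.
  const byValue = [...items].sort((a, b) => b.value - a.value);
  const kept: ContextItem[] = [];
  let used = 0;
  for (const item of byValue) {
    if (used + item.tokens <= maxContextTokens) {
      kept.push(item);
      used += item.tokens;
    }
  }
  return kept;
}
```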
  ## CLI

  ```bash
- promptpilot optimize "write me a better prompt for asking claude to summarize lecture notes"
+ promptpilot optimize "rewrite this prompt for a coding refactor task"
  ```

  Supported flags:

  - `--session <id>`
- - `--model <name>` to override auto-selection
+ - `--model <name>`
  - `--mode <mode>`
  - `--task <task>`
  - `--tone <tone>`
@@ -197,6 +297,13 @@ Supported flags:
  - `--max-length <n>`
  - `--tag <value>` repeatable
  - `--pin-constraint <text>` repeatable
+ - `--target <provider:model>` repeatable
+ - `--target-hint <value>` repeatable
+ - `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
+ - `--routing-top-k <n>`
+ - `--workload-bias <code_first>`
+ - `--no-routing`
+ - `--host <url>`
  - `--store <local|sqlite>`
  - `--storage-dir <path>`
  - `--sqlite-path <path>`
@@ -205,13 +312,14 @@ Supported flags:
  - `--debug`
  - `--save-context`
  - `--no-context`
+ - `--clear-session`
  - `--max-total-tokens <n>`
  - `--max-context-tokens <n>`
  - `--max-input-tokens <n>`
- - `--clear-session`
+ - `--timeout <ms>`
  - `--bypass-optimization`

- If no prompt argument is provided, `promptpilot optimize` will read the raw prompt from stdin.
+ If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.

  ## Public API

@@ -224,6 +332,19 @@ Main exports:
  - `FileSessionStore`
  - `SQLiteSessionStore`

+ Useful result fields:
+
+ - `optimizedPrompt`
+ - `finalPrompt`
+ - `selectedTarget`
+ - `rankedTargets`
+ - `routingReason`
+ - `routingWarnings`
+ - `provider`
+ - `model`
+ - `estimatedTokensBefore`
+ - `estimatedTokensAfter`
+
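Read together, those fields suggest a result shape roughly like the following. This is an assumed sketch inferred from the list above, not the package's published type declarations; the optionality of the routing fields is a guess based on the documented fallback behavior.

```ts
// Assumed shape of an optimize() result, inferred from the listed fields.
interface RoutedTarget {
  provider: string;
  model: string;
  label: string;
}

interface OptimizeResult {
  optimizedPrompt: string;
  finalPrompt: string;
  selectedTarget?: RoutedTarget;  // absent when downstream routing fails
  rankedTargets?: RoutedTarget[]; // top candidates when routing is enabled
  routingReason?: string;
  routingWarnings?: string[];
  provider: string;               // local optimizer provider, e.g. "ollama"
  model: string;                  // local optimizer model actually used
  estimatedTokensBefore: number;
  estimatedTokensAfter: number;
}
```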
  Supported modes:

  - `clarity`
@@ -243,41 +364,23 @@ Supported presets:
  - `summarization`
  - `chat`

- ## File structure
-
- ```text
- src/
-   index.ts
-   types.ts
-   errors.ts
-   cli.ts
-   core/
-     optimizer.ts
-     ollamaClient.ts
-     systemPrompt.ts
-     contextManager.ts
-     tokenEstimator.ts
-     contextCompressor.ts
-   storage/
-     fileSessionStore.ts
-     sqliteSessionStore.ts
-   utils/
-     validation.ts
-     logger.ts
-     json.ts
- test/
- ```
+ ## Why the default model was chosen
+
+ `qwen2.5:3b` is the default local preference because it offers a practical balance of:

- ## Safety and fallback behavior
+ - good instruction following
+ - strong enough reasoning for prompt optimization
+ - acceptable memory use on laptops
+ - good performance for code-first workflows

- If Ollama is unavailable, `promptpilot` falls back to a deterministic local formatter that still preserves constraints and emits a Claude-compatible final prompt. Empty prompts are rejected, timeouts are supported, and hard token budget failures throw explicit errors.
+ `phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.

  ## Future improvements

- - Semantic retrieval for context
- - Better token counting by model
- - Prompt scoring
- - Local embeddings for relevance search
- - Response-aware context updates
- - Cache layer
- - Benchmark suite
+ - semantic retrieval for context
+ - better token counting by target model
+ - prompt scoring
+ - local embeddings for relevance search
+ - response-aware context updates
+ - cache layer
+ - benchmark suite