promptpilot 0.1.8 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,135 +1,106 @@
  # promptpilot

- `promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.
+ A local prompt optimizer and model router for Claude CLI and agentic LLM workflows.

- It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.
+ Before your prompt reaches a remote model, PromptPilot rewrites it locally with a small Ollama model, cutting noise, compressing context, and routing to the right downstream model. Prompt rewrites never spend remote tokens.

- ## Why local Ollama
+ ---
 
- - It keeps optimization and routing close to your machine.
- - It uses a small local model before you send anything to a stronger remote model.
- - It avoids paying remote-token costs for every prompt rewrite.
- - It works well on laptops with limited memory by preferring small Ollama models.
- - It uses a local Qwen router when multiple small local models are available.
+ ## Install

- Default local preference is:
+ ```bash
+ npm install -g promptpilot
+ ```
+
+ Requires [Ollama](https://ollama.com) running locally and Node.js >= 20.10.0.
+
+ Pull at least one small local model:
+
+ ```bash
+ ollama pull qwen2.5:3b
+ ```

- - `qwen2.5:3b`
- - `phi3:mini`
- - `llama3.2:3b`
+ ---

  ## What it does

- - Accepts a raw prompt plus optional task metadata.
- - Persists session context across turns.
- - Retrieves and compresses relevant prior context.
- - Preserves pinned constraints and user intent.
- - Estimates token usage before and after optimization.
- - Routes to a caller-supplied downstream model allowlist.
- - Returns a selected target plus a ranked top 3 when routing is enabled.
- - Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.
+ - Rewrites your prompt locally before sending it anywhere
+ - Keeps session memory across turns so context carries forward
+ - Compresses old context when it gets too long
+ - Routes to the best model from a list you provide
+ - Outputs plain text for shell pipelines or JSON for tooling

- ## Quick start
+ ---

- Local repo workflow:
+ ## Quick start

  ```bash
- npm install
- npm run build
- npm test
- ollama pull qwen2.5:3b
+ # Optimize a prompt and print the result
  promptpilot optimize "explain binary search simply" --plain
- promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
- ```

- Install from npm:
+ # Pipe directly into Claude
+ promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude

- ```bash
- npm install -g promptpilot
+ # Read from a file
+ cat notes.txt | promptpilot optimize --task summarization --plain | claude
  ```

- Run `promptpilot` with no arguments in an interactive terminal to open the CLI welcome screen:
-
- ```text
- PromptPilot v0.1.x
- ┌──────────────────────────────────────────────────────────────────────────────┐
- │ Welcome back │
- │ │
- │ .-''''-. Launchpad │
- │ .' .--. '. Run promptpilot optimize "..." │
- │ / / oo \ \ Pipe directly into Claude with | claude│
- │ | \_==_/ | │
- │ \ \_/ \_/ / Custom local model │
- │ '._/|__|\_.' Use --model promptpilot-compressor │
- │ │
- │ /Users/you/project Commands │
- │ optimize optimize and route prompts │
- │ --help show the full CLI reference │
- └──────────────────────────────────────────────────────────────────────────────┘
- ```
+ ---
+
+ ## Session memory

- Install one or two small Ollama models so the local router has options:
+ Pass `--session <name>` to persist context across calls. PromptPilot stores sessions as JSON under `~/.promptpilot/sessions` by default.

  ```bash
- ollama pull qwen2.5:3b
- ollama pull phi3:mini
+ # Save context after each turn
+ promptpilot optimize "start a refactor plan" --session repo-refactor --save-context --plain
+
+ # Pick up where you left off
+ promptpilot optimize "continue the refactor" --session repo-refactor --save-context --plain | claude
+
+ # Clear a session when you're done
+ promptpilot optimize --session repo-refactor --clear-session
  ```

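The on-disk session format is internal to PromptPilot, but the 0.1.8 README lists what each turn records: user, optimized, and final prompts, extracted constraints, summaries, timestamps, and optional tags. A minimal TypeScript sketch of one plausible entry shape; every field name here is an illustrative assumption, not the actual schema:

```typescript
// Illustrative only: field names are assumptions, not PromptPilot's real schema.
// Grounded in what the docs say a session stores: user/optimized/final prompts,
// extracted constraints, summaries, timestamps, and optional tags.
interface SessionEntry {
  userPrompt: string;
  optimizedPrompt: string;
  finalPrompt: string;
  pinnedConstraints: string[]; // added via --pin-constraint
  summary?: string;            // compressed form used when budgets are tight
  timestamp: string;           // ISO 8601
  tags?: string[];             // added via --tag
}

const entry: SessionEntry = {
  userPrompt: "start a refactor plan",
  optimizedPrompt: "Draft a step-by-step refactor plan for the auth module.",
  finalPrompt: "Draft a step-by-step refactor plan for the auth module.",
  pinnedConstraints: ["keep the public API stable"],
  timestamp: new Date().toISOString(),
  tags: ["repo-refactor"],
};

console.log(entry.pinnedConstraints[0]);
```

Treat the real files under `~/.promptpilot/sessions` as opaque; this sketch only shows what kind of information a session carries between turns.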
- ## Custom local compressor model
+ ---

- PromptPilot ships a `Modelfile` that defines `promptpilot-compressor`, a text-only compression model built on top of `qwen2.5:3b`. It is tuned to output only the compressed prompt with no reasoning, analysis, or commentary.
+ ## Custom compressor model

- Build and verify it:
+ PromptPilot ships a `Modelfile` that builds `promptpilot-compressor`, a stripped-down Ollama model tuned to output only the rewritten prompt with no extra commentary.

  ```bash
  ollama pull qwen2.5:3b
  ollama create promptpilot-compressor -f ./Modelfile
- ollama run promptpilot-compressor "explain recursion simply"
  ```

- Use it via the CLI after installing from npm:
+ Use it:

  ```bash
- # Plain output — pipe directly into Claude
  promptpilot optimize "help me refactor this auth middleware" \
  --model promptpilot-compressor \
  --preset code \
  --plain
-
- # JSON output with debug info
- promptpilot optimize "help me refactor this auth middleware" \
- --model promptpilot-compressor \
- --preset code \
- --json --debug
-
- # With session memory, piped into Claude
- promptpilot optimize "continue the refactor" \
- --model promptpilot-compressor \
- --session repo-refactor \
- --save-context \
- --plain | claude
  ```

- `promptpilot-compressor` outputs plain text rather than JSON. PromptPilot detects this automatically and falls back to text-only mode, stripping any reasoning leakage before using the output. Explicit `--model` always takes priority over automatic local model selection.
+ ---

- ## Core behavior
+ ## Downstream model routing

- PromptPilot has two distinct routing layers.
+ Tell PromptPilot which models you're allowed to use, and it picks the best one for the job.

- 1. Local optimizer routing
-
- - Explicit `ollamaModel` or `--model` always wins.
- - If exactly one suitable small local model exists, it uses that model directly.
- - If multiple suitable small local models exist, a local Qwen router chooses between them.
- - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.
-
- 2. Downstream target routing
+ ```bash
+ promptpilot optimize "rewrite this for a coding refactor" \
+ --task code \
+ --preset code \
+ --target anthropic:claude-sonnet \
+ --target openai:gpt-4.1-mini \
+ --target openai:gpt-5-codex \
+ --target-hint coding \
+ --target-hint refactor \
+ --json --debug
+ ```

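In a script, the `--json` output can be parsed to recover the router's choice. A minimal sketch, assuming the JSON mirrors the documented result fields (`selectedTarget`, `rankedTargets`); verify the exact layout against your installed version before relying on it:

```typescript
// Sketch: pick a downstream target from `promptpilot optimize --json` output.
// Field names follow the documented result object; the exact JSON layout is
// an assumption until checked against your installed version.
interface RoutedResult {
  finalPrompt: string;
  selectedTarget?: string;
  rankedTargets?: string[];
}

function pickTarget(result: RoutedResult, fallback: string): string {
  // Prefer the router's pick, then its top-ranked candidate, then the fallback.
  return result.selectedTarget ?? result.rankedTargets?.[0] ?? fallback;
}

const sample: RoutedResult = {
  finalPrompt: "Refactor the auth middleware in small, reviewable steps.",
  rankedTargets: ["anthropic:claude-sonnet", "openai:gpt-4.1-mini"],
};

console.log(pickTarget(sample, "anthropic:claude-sonnet"));
// logs: anthropic:claude-sonnet
```

This mirrors PromptPilot's own behavior of never inventing a target: if routing produced nothing, your script supplies the fallback explicitly.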
- - The caller provides the allowed downstream targets.
- - If one target is supplied, PromptPilot selects it directly.
- - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
- - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
- - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
+ ---

  ## Library usage

@@ -156,17 +127,9 @@ console.log(result.finalPrompt);
  console.log(result.model);
  ```

- ### Code-first downstream routing
+ ### Routing across multiple models

  ```ts
- import { createOptimizer } from "promptpilot";
-
- const optimizer = createOptimizer({
- provider: "ollama",
- host: "http://localhost:11434",
- contextStore: "local"
- });
-
  const result = await optimizer.optimize({
  prompt: "rewrite this prompt for a coding refactor task",
  task: "code",
@@ -205,7 +168,7 @@ console.log(result.rankedTargets);
  console.log(result.routingReason);
  ```

- ### Lightweight writing still works
+ ### Non-coding tasks work too

  ```ts
  const result = await optimizer.optimize({
@@ -233,53 +196,7 @@ const result = await optimizer.optimize({
  console.log(result.selectedTarget);
  ```

- ## Claude CLI usage
-
- Plain shell output:
-
- ```bash
- promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
- ```
-
- Pipe directly into Claude CLI:
-
- ```bash
- promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
- ```
-
- Route against an allowlist of downstream targets:
-
- ```bash
- promptpilot optimize "rewrite this prompt for a coding refactor task" \
- --task code \
- --preset code \
- --target anthropic:claude-sonnet \
- --target openai:gpt-4.1-mini \
- --target openai:gpt-5-codex \
- --target-hint coding \
- --target-hint refactor \
- --json --debug
- ```
-
- Use stdin in a pipeline:
-
- ```bash
- cat notes.txt | promptpilot optimize --task summarization --plain | claude
- ```
-
- Save context between calls:
-
- ```bash
- promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
- ```
-
- Clear a session:
-
- ```bash
- promptpilot optimize --session ci-fix --clear-session
- ```
-
- Node `child_process` example:
+ ### Node child_process pipeline

  ```ts
  import { spawn } from "node:child_process";
@@ -287,8 +204,7 @@ import { spawn } from "node:child_process";
  const promptpilot = spawn("promptpilot", [
  "optimize",
  "continue working on this repo refactor",
- "--session",
- "repo-refactor",
+ "--session", "repo-refactor",
  "--save-context",
  "--plain"
  ]);
@@ -297,147 +213,93 @@ const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
  promptpilot.stdout.pipe(claude.stdin);
  ```

- ## Session context
+ ---
+
+ ## CLI flags
+
+ | Flag | What it does |
+ |---|---|
+ | `--session <id>` | Name the session for persistent memory |
+ | `--save-context` | Write this turn back into the session |
+ | `--clear-session` | Wipe a session and start fresh |
+ | `--no-context` | Ignore session history for this call |
+ | `--model <name>` | Use a specific local Ollama model |
+ | `--preset <preset>` | Prompt style: `code`, `email`, `essay`, `support`, `summarization`, `chat` |
+ | `--mode <mode>` | Rewrite mode: `clarity`, `concise`, `detailed`, `structured`, `persuasive`, `compress`, `claude_cli` |
+ | `--task <task>` | Task hint passed to the optimizer |
+ | `--tone <tone>` | Tone hint passed to the optimizer |
+ | `--target <provider:model>` | Add a downstream model to the routing pool (repeatable) |
+ | `--target-hint <value>` | Capability hint for routing (repeatable) |
+ | `--routing-priority <value>` | `cheapest_adequate`, `best_quality`, or `fastest_adequate` |
+ | `--routing-top-k <n>` | How many ranked targets to return |
+ | `--workload-bias <value>` | `code_first` to bias routing toward coding models |
+ | `--no-routing` | Skip downstream routing entirely |
+ | `--plain` | Output the final prompt as plain text |
+ | `--json` | Output full result as JSON |
+ | `--debug` | Include routing and optimization details in output |
+ | `--host <url>` | Ollama host (default: `http://localhost:11434`) |
+ | `--store <local\|sqlite>` | Session storage backend |
+ | `--storage-dir <path>` | Custom path for session files |
+ | `--sqlite-path <path>` | Path to SQLite database file |
+ | `--max-total-tokens <n>` | Token budget for the full composed prompt |
+ | `--max-context-tokens <n>` | Token budget for retrieved session context |
+ | `--max-input-tokens <n>` | Token budget for the incoming prompt |
+ | `--timeout <ms>` | Ollama request timeout in milliseconds |
+ | `--bypass-optimization` | Skip Ollama and pass the prompt through as-is |
+ | `--pin-constraint <text>` | Add a pinned constraint (repeatable) |
+ | `--tag <value>` | Tag this session entry (repeatable) |
+ | `--output-format <text>` | Output format hint |
+ | `--max-length <n>` | Max length hint for the rewritten prompt |
+ | `--target-model <name>` | Alternate flag for downstream model name |
+
+ If no prompt text is given, `promptpilot optimize` reads from stdin.
+
+ ---
+
+ ## How local model selection works
+
+ PromptPilot prefers small Ollama models (≤ 4B params). If only one suitable model is installed, it uses it directly. If multiple are installed, a local Qwen router picks the best one for the task. Explicit `--model` always overrides this.
+
+ Default preference order:
+
+ 1. `qwen2.5:3b`
+ 2. `phi3:mini`
+ 3. `llama3.2:3b`
+
+ If Ollama is unavailable or times out, PromptPilot falls back to deterministic prompt shaping (whitespace cleanup, mode-specific wrappers) instead of failing outright.
+
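The fallback is only described as whitespace cleanup plus mode-specific wrappers, so here is a minimal sketch under that assumption. The function name and the `concise` wrapper text are invented for illustration; this is not PromptPilot's actual implementation:

```typescript
// A rough sketch of a deterministic fallback, assuming it amounts to
// whitespace cleanup plus a mode-specific wrapper. Illustrative only.
function shapeFallback(prompt: string, mode: "clarity" | "concise" = "clarity"): string {
  const cleaned = prompt.replace(/\s+/g, " ").trim(); // collapse runs of whitespace
  return mode === "concise" ? `Answer concisely: ${cleaned}` : cleaned;
}

console.log(shapeFallback("  explain   binary\n search  "));
// logs: explain binary search
```

The point of a deterministic fallback is predictability: with no model in the loop, the same input always yields the same shaped prompt.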
271
+ ---
+
+ ## Exports
 
- If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.
-
- Each session stores:
-
- - user prompts
- - optimized prompts
- - final prompts
- - extracted constraints
- - summaries
- - timestamps
- - optional tags
-
- Context retrieval prefers:
-
- - pinned constraints
- - task goals
- - recent relevant turns
- - named entities and recurring references
- - stored summaries when budgets are tight
-
- ## Token reduction
-
- PromptPilot estimates token usage for:
-
- - the new prompt
- - retrieved context
- - the final composed prompt
-
- Budgets:
+ ```ts
+ import {
+ createOptimizer,
+ optimizePrompt,
+ PromptOptimizer,
+ OllamaClient,
+ FileSessionStore,
+ SQLiteSessionStore
+ } from "promptpilot";
+ ```

- - `maxInputTokens`
- - `maxContextTokens`
- - `maxTotalTokens`
+ Key fields on the result object:

- When the budget is exceeded, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
+ | Field | Description |
+ |---|---|
+ | `optimizedPrompt` | The rewritten prompt from the local model |
+ | `finalPrompt` | The composed prompt including context |
+ | `selectedTarget` | The downstream model chosen by the router |
+ | `rankedTargets` | All targets ranked by the router |
+ | `routingReason` | Why the top target was selected |
+ | `routingWarnings` | Any issues the router flagged |
+ | `provider` | Which provider ran the optimization (`ollama` or `heuristic`) |
+ | `model` | Which local model was used |
+ | `estimatedTokensBefore` | Token estimate before optimization |
+ | `estimatedTokensAfter` | Token estimate after optimization |

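The before/after estimates make it easy to report savings. A small sketch using a mocked result object with the field names above; the values are fabricated for illustration, and real ones come from `optimizer.optimize`:

```typescript
// Sketch: compute percentage token savings from the result's estimates.
// The object below is mocked; field names match the documented result table.
interface OptimizeResult {
  estimatedTokensBefore: number;
  estimatedTokensAfter: number;
  provider: "ollama" | "heuristic";
  model: string;
}

function savingsPercent(r: OptimizeResult): number {
  if (r.estimatedTokensBefore === 0) return 0; // avoid division by zero
  return Math.round(100 * (1 - r.estimatedTokensAfter / r.estimatedTokensBefore));
}

const mock: OptimizeResult = {
  estimatedTokensBefore: 400,
  estimatedTokensAfter: 300,
  provider: "ollama",
  model: "qwen2.5:3b",
};

console.log(savingsPercent(mock)); // logs: 25
```

Note these are estimates, not exact tokenizer counts for any particular downstream model.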
- ## CLI
+ ---

- ```bash
- promptpilot optimize "rewrite this prompt for a coding refactor task"
- ```
+ ## License

- Supported flags:
-
- - `--session <id>`
- - `--model <name>`
- - `--mode <mode>`
- - `--task <task>`
- - `--tone <tone>`
- - `--preset <preset>`
- - `--target-model <name>`
- - `--output-format <text>`
- - `--max-length <n>`
- - `--tag <value>` repeatable
- - `--pin-constraint <text>` repeatable
- - `--target <provider:model>` repeatable
- - `--target-hint <value>` repeatable
- - `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
- - `--routing-top-k <n>`
- - `--workload-bias <code_first>`
- - `--no-routing`
- - `--host <url>`
- - `--store <local|sqlite>`
- - `--storage-dir <path>`
- - `--sqlite-path <path>`
- - `--plain`
- - `--json`
- - `--debug`
- - `--save-context`
- - `--no-context`
- - `--clear-session`
- - `--max-total-tokens <n>`
- - `--max-context-tokens <n>`
- - `--max-input-tokens <n>`
- - `--timeout <ms>`
- - `--bypass-optimization`
-
- If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.
-
- ## Public API
-
- Main exports:
-
- - `createOptimizer`
- - `optimizePrompt`
- - `PromptOptimizer`
- - `OllamaClient`
- - `FileSessionStore`
- - `SQLiteSessionStore`
-
- Useful result fields:
-
- - `optimizedPrompt`
- - `finalPrompt`
- - `selectedTarget`
- - `rankedTargets`
- - `routingReason`
- - `routingWarnings`
- - `provider`
- - `model`
- - `estimatedTokensBefore`
- - `estimatedTokensAfter`
-
- Supported modes:
-
- - `clarity`
- - `concise`
- - `detailed`
- - `structured`
- - `persuasive`
- - `compress`
- - `claude_cli`
-
- Supported presets:
-
- - `code`
- - `email`
- - `essay`
- - `support`
- - `summarization`
- - `chat`
-
- ## Why the default model was chosen
-
- `qwen2.5:3b` is the default local preference because it offers a practical balance of:
-
- - good instruction following
- - strong enough reasoning for prompt optimization
- - acceptable memory use on laptops
- - good performance for code-first workflows
-
- `phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.
-
- ## Future improvements
-
- - semantic retrieval for context
- - better token counting by target model
- - prompt scoring
- - local embeddings for relevance search
- - response-aware context updates
- - cache layer
- - benchmark suite
+ MIT
package/dist/cli.d.ts CHANGED
@@ -21,6 +21,7 @@ interface CliDependencies {
  columns?: number;
  user?: string;
  };
+ spawnClaude?: (prompt: string) => Promise<number>;
  }
  declare function runCli(argv: string[], io?: CliIO, dependencies?: CliDependencies): Promise<number>;