promptpilot 0.1.2 → 0.1.3

package/README.md CHANGED
@@ -1,30 +1,38 @@
  # promptpilot

- `promptpilot` is a lightweight TypeScript npm package that sits between your app or CLI workflow and a target LLM. It rewrites prompts locally through Ollama when available, stores reusable session context, compresses older turns, and emits a Claude-friendly final prompt for shell pipelines or application code.
+ `promptpilot` is a code-first TypeScript package that sits between your app or CLI workflow and a downstream LLM. It optimizes prompts locally through Ollama, keeps lightweight session memory, compresses stale context, and can route each request to the best allowed downstream model for the job.

- It is designed for local-first workflows on machines like an 18 GB MacBook. By default, `promptpilot` inspects your local Ollama installation, uses a small local Qwen model as a router when available, and lets that router choose the best installed small optimization model for each prompt. It still lets you override the model manually when needed.
+ It is designed for agentic coding workflows first. If a prompt is ambiguous, PromptPilot biases toward coding-capable and tool-capable models. Non-coding tasks like email, support, summarization, and chat are still supported when the prompt makes that intent clear.

  ## Why local Ollama

- - It keeps prompt optimization close to your workflow.
- - It reduces external API calls for prompt rewriting.
- - It lets you use a small, fast model for compression before sending the final prompt to a stronger remote model like Claude.
- - It automatically picks an installed local model that fits a low-memory workflow.
- - It uses Qwen to route prompt optimization to the best available small local model when possible.
+ - It keeps optimization and routing close to your machine.
+ - It uses a small local model before you send anything to a stronger remote model.
+ - It avoids paying remote-token costs for every prompt rewrite.
+ - It works well on laptops with limited memory by preferring small Ollama models.
+ - It uses a local Qwen router when multiple small local models are available.
+
+ Default local preference is:
+
+ - `qwen2.5:3b`
+ - `phi3:mini`
+ - `llama3.2:3b`

  ## What it does

- - Accepts a raw prompt plus optional metadata.
+ - Accepts a raw prompt plus optional task metadata.
  - Persists session context across turns.
- - Retrieves relevant prior context for the next prompt.
- - Summarizes older context when budgets get tight.
- - Preserves critical instructions and constraints.
+ - Retrieves and compresses relevant prior context.
+ - Preserves pinned constraints and user intent.
  - Estimates token usage before and after optimization.
- - Outputs plain prompt text or structured JSON.
- - Works cleanly with Claude CLI shell pipelines.
+ - Routes to a caller-supplied downstream model allowlist.
+ - Returns a selected target plus a ranked top 3 when routing is enabled.
+ - Outputs plain prompt text for shell pipelines or JSON for tooling/debugging.

  ## Quick start

+ Local repo workflow:
+
  ```bash
  npm install
  npm run build
@@ -34,30 +42,44 @@ promptpilot optimize "explain binary search simply" --plain
  promptpilot optimize "continue my study guide" --session dsa --save-context --plain | claude
  ```

- After publishing, install from npm with:
+ Install from npm:

  ```bash
  npm install -g promptpilot
  ```

- ## Install and build
+ Install one or two small Ollama models so the local router has options:

  ```bash
- npm install
- npm run build
+ ollama pull qwen2.5:3b
+ ollama pull phi3:mini
  ```

- Install directly from a local tarball:
+ ## Core behavior

- ```bash
- npm pack
- npm install -g ./promptpilot-0.1.2.tgz
- ```
+ PromptPilot has two distinct routing layers.
+
+ 1. Local optimizer routing
+
+ - Explicit `ollamaModel` or `--model` always wins.
+ - If exactly one suitable small local model exists, it uses that model directly.
+ - If multiple suitable small local models exist, a local Qwen router chooses between them.
+ - If routing cannot complete, PromptPilot falls back to deterministic prompt shaping instead of making a static guess.
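
The local selection order above can be sketched roughly as follows. This is an illustrative sketch only: `pickLocalOptimizer` and `routeWithQwen` are assumed names for this example, not promptpilot's real internals.

```typescript
// Sketch of the local optimizer selection order (illustrative, not the
// package's actual implementation).
type LocalChoice =
  | { kind: "model"; model: string }
  | { kind: "deterministic" }; // deterministic prompt shaping fallback

function pickLocalOptimizer(
  installedSmallModels: string[],
  explicitModel?: string,
  routeWithQwen?: (candidates: string[]) => string | null
): LocalChoice {
  // 1. An explicit ollamaModel / --model always wins.
  if (explicitModel) return { kind: "model", model: explicitModel };
  // 2. Exactly one suitable small model: use it directly.
  if (installedSmallModels.length === 1) {
    return { kind: "model", model: installedSmallModels[0] };
  }
  // 3. Several candidates: ask the local Qwen router to choose.
  if (installedSmallModels.length > 1 && routeWithQwen) {
    const routed = routeWithQwen(installedSmallModels);
    if (routed) return { kind: "model", model: routed };
  }
  // 4. Routing unavailable or failed: fall back to deterministic
  //    shaping rather than making a static model-choice guess.
  return { kind: "deterministic" };
}
```

The point of step 4 is that a failed routing call degrades to deterministic behavior instead of silently picking a model.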
+
+ 2. Downstream target routing
+
+ - The caller provides the allowed downstream targets.
+ - If one target is supplied, PromptPilot selects it directly.
+ - If multiple targets are supplied, a local Qwen router ranks them and selects the top target.
+ - Routing is code-first by default: ambiguous prompts bias toward coding-capable and agentic targets.
+ - If downstream routing fails, PromptPilot still returns an optimized prompt but does not invent a target.
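
A `cheapest_adequate` ranking over the allowlist can be sketched as below. The scoring heuristic here is an assumption for illustration; promptpilot's real router is a local Qwen model, not this filter-and-sort.

```typescript
// Illustrative sketch of "cheapest_adequate" downstream routing over a
// caller-supplied allowlist (not promptpilot's actual router logic).
interface Target {
  label: string;
  capabilities: string[];
  costRank: number; // lower = cheaper
}

function rankTargets(targets: Target[], hints: string[]): Target[] {
  // "Adequate" = advertises at least one hinted capability.
  const adequate = targets.filter((t) =>
    hints.some((h) => t.capabilities.includes(h))
  );
  // Among adequate targets, prefer the cheapest. If nothing matches the
  // hints, fall back to the full list ordered by cost.
  const pool = adequate.length > 0 ? adequate : targets;
  return [...pool].sort((a, b) => a.costRank - b.costRank);
}

const ranked = rankTargets(
  [
    { label: "anthropic:claude-sonnet", capabilities: ["coding", "writing"], costRank: 2 },
    { label: "openai:gpt-4.1-mini", capabilities: ["writing", "chat"], costRank: 1 },
    { label: "openai:gpt-5-codex", capabilities: ["coding", "agentic"], costRank: 3 },
  ],
  ["coding"]
);
console.log(ranked[0].label); // the cheapest coding-capable target
```

Note how the cheapest overall target (`gpt-4.1-mini`) loses when it lacks the hinted capability: adequacy is filtered before cost is compared.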

  ## Library usage

+ ### Basic optimization
+
  ```ts
- import { createOptimizer, optimizePrompt } from "promptpilot";
+ import { createOptimizer } from "promptpilot";

  const optimizer = createOptimizer({
  provider: "ollama",
@@ -66,22 +88,92 @@ const optimizer = createOptimizer({
  });

  const result = await optimizer.optimize({
- prompt: "help me write a better follow up email for a startup internship",
- task: "email",
- tone: "professional but human",
- targetModel: "claude",
- sessionId: "internship-search"
+ prompt: "help me debug this failing CI job",
+ task: "code",
+ preset: "code",
+ sessionId: "ci-fix",
+ saveContext: true
+ });
+
+ console.log(result.finalPrompt);
+ console.log(result.model);
+ ```
+
+ ### Code-first downstream routing
+
+ ```ts
+ import { createOptimizer } from "promptpilot";
+
+ const optimizer = createOptimizer({
+ provider: "ollama",
+ host: "http://localhost:11434",
+ contextStore: "local"
  });

- console.log(result.optimizedPrompt);
+ const result = await optimizer.optimize({
+ prompt: "rewrite this prompt for a coding refactor task",
+ task: "code",
+ preset: "code",
+ availableTargets: [
+ {
+ provider: "anthropic",
+ model: "claude-sonnet",
+ label: "anthropic:claude-sonnet",
+ capabilities: ["coding", "writing"],
+ costRank: 2
+ },
+ {
+ provider: "openai",
+ model: "gpt-4.1-mini",
+ label: "openai:gpt-4.1-mini",
+ capabilities: ["writing", "chat"],
+ costRank: 1
+ },
+ {
+ provider: "openai",
+ model: "gpt-5-codex",
+ label: "openai:gpt-5-codex",
+ capabilities: ["coding", "agentic", "tool_use", "debugging"],
+ costRank: 3
+ }
+ ],
+ routingPriority: "cheapest_adequate",
+ targetHints: ["coding", "agentic", "refactor"],
+ workloadBias: "code_first",
+ debug: true
+ });

- const oneOff = await optimizePrompt({
- prompt: "continue working on my essay intro",
- task: "essay",
- sessionId: "essay1"
+ console.log(result.selectedTarget);
+ console.log(result.rankedTargets);
+ console.log(result.routingReason);
+ ```
+
+ ### Lightweight writing still works
+
+ ```ts
+ const result = await optimizer.optimize({
+ prompt: "write a short internship follow-up email",
+ task: "email",
+ preset: "email",
+ availableTargets: [
+ {
+ provider: "anthropic",
+ model: "claude-sonnet",
+ label: "anthropic:claude-sonnet",
+ capabilities: ["coding", "writing"],
+ costRank: 2
+ },
+ {
+ provider: "openai",
+ model: "gpt-4.1-mini",
+ label: "openai:gpt-4.1-mini",
+ capabilities: ["writing", "email", "chat"],
+ costRank: 1
+ }
+ ]
  });

- console.log(oneOff.finalPrompt);
+ console.log(result.selectedTarget);
  ```

  ## Claude CLI usage
@@ -89,37 +181,45 @@ console.log(oneOff.finalPrompt);
  Plain shell output:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain
+ promptpilot optimize "help me debug this failing CI job" --task code --preset code --plain
  ```

- Piping into Claude CLI:
+ Pipe directly into Claude CLI:

  ```bash
- promptpilot optimize "help me explain binary search simply" --session study --plain | claude
+ promptpilot optimize "continue working on this refactor" --session repo-refactor --save-context --plain | claude
  ```

- Using stdin in a shell pipeline:
+ Route against an allowlist of downstream targets:

  ```bash
- cat notes.txt | promptpilot optimize --task summarization --plain | claude
+ promptpilot optimize "rewrite this prompt for a coding refactor task" \
+ --task code \
+ --preset code \
+ --target anthropic:claude-sonnet \
+ --target openai:gpt-4.1-mini \
+ --target openai:gpt-5-codex \
+ --target-hint coding \
+ --target-hint refactor \
+ --json --debug
  ```

- Saving context between calls:
+ Use stdin in a pipeline:

  ```bash
- promptpilot optimize "continue working on my essay intro" --session essay1 --task essay --save-context --plain
+ cat notes.txt | promptpilot optimize --task summarization --plain | claude
  ```

- Debugging token usage:
+ Save context between calls:

  ```bash
- promptpilot optimize "summarize these lecture notes" --session notes1 --json --debug
+ promptpilot optimize "continue my debugger plan" --session ci-fix --save-context --plain
  ```

- Clearing a session:
+ Clear a session:

  ```bash
- promptpilot optimize --session essay1 --clear-session
+ promptpilot optimize --session ci-fix --clear-session
  ```

  Node `child_process` example:
@@ -127,68 +227,67 @@ Node `child_process` example:
  ```ts
  import { spawn } from "node:child_process";

- const prompt = spawn("promptpilot", [
+ const promptpilot = spawn("promptpilot", [
  "optimize",
- "continue my study guide",
+ "continue working on this repo refactor",
  "--session",
- "dsa",
+ "repo-refactor",
+ "--save-context",
  "--plain"
  ]);

  const claude = spawn("claude", [], { stdio: ["pipe", "inherit", "inherit"] });
- prompt.stdout.pipe(claude.stdin);
+ promptpilot.stdout.pipe(claude.stdin);
  ```

  ## Session context

- By default, if you pass a `sessionId`, `promptpilot` stores optimized turns in a local session store. The default store is JSON files under `~/.promptpilot/sessions`. A SQLite store is also available when `node:sqlite` or `better-sqlite3` is present.
-
- If you do not pass `ollamaModel` or `--model`, `promptpilot` asks Ollama which models are installed and lets a small local Qwen router choose the best small optimizer model for the current prompt. It does not statically rank multiple candidate models anymore. If a suitable Qwen router model is not available when multiple small candidates exist, it falls back to deterministic heuristic prompt optimization instead of making a static model-choice guess. If only oversized local models are available, it also falls back to deterministic heuristic optimization instead of silently using a heavy model.
+ If you pass a `sessionId`, PromptPilot stores session entries in a local store. The default store is JSON under `~/.promptpilot/sessions`. SQLite is also supported when `node:sqlite` or `better-sqlite3` is available.

  Each session stores:

- - User prompts
- - Optimized prompts
- - Final prompts
- - Extracted constraints
- - Context summaries
- - Timestamps
- - Optional tags
+ - user prompts
+ - optimized prompts
+ - final prompts
+ - extracted constraints
+ - summaries
+ - timestamps
+ - optional tags

  Context retrieval prefers:

- - Pinned constraints
- - Task-aligned prior turns
- - Recent prompts
- - Named entities and recurring references
- - Stored summaries when budgets are tight
+ - pinned constraints
+ - task goals
+ - recent relevant turns
+ - named entities and recurring references
+ - stored summaries when budgets are tight
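
The preference order above can be expressed as a simple score. The entry shape and weights below are illustrative assumptions for this sketch, not promptpilot's stored format.

```typescript
// Sketch: rank session entries by the stated retrieval preferences
// (pinned constraints > task-aligned turns > recency). Weights are
// arbitrary illustration values.
interface SessionEntry {
  text: string;
  pinned: boolean;      // pinned constraint
  matchesTask: boolean; // aligned with the current task goal
  turnsAgo: number;     // 0 = most recent turn
}

function scoreEntry(e: SessionEntry): number {
  let score = 0;
  if (e.pinned) score += 100;           // pinned constraints dominate
  if (e.matchesTask) score += 10;       // then task-aligned turns
  score += Math.max(0, 5 - e.turnsAgo); // then recency
  return score;
}

const picked = [
  { text: "old chat turn", pinned: false, matchesTask: false, turnsAgo: 9 },
  { text: "use TypeScript strict mode", pinned: true, matchesTask: true, turnsAgo: 7 },
  { text: "latest turn", pinned: false, matchesTask: true, turnsAgo: 0 },
].sort((a, b) => scoreEntry(b) - scoreEntry(a));

console.log(picked[0].text); // the pinned constraint wins despite its age
```

The key property is that a pinned constraint outranks any amount of recency, matching the list above.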

  ## Token reduction

- `promptpilot` estimates token usage heuristically for:
+ PromptPilot estimates token usage for:

- - The new prompt
- - Retrieved session context
- - The final composed prompt
+ - the new prompt
+ - retrieved context
+ - the final composed prompt

- You can control the budgets with:
+ Budgets:

  - `maxInputTokens`
  - `maxContextTokens`
  - `maxTotalTokens`

- When context is too large, it ranks prior turns, preserves high-value constraints, summarizes older context, and drops lower-signal items.
+ When context exceeds the budget, PromptPilot compresses or summarizes old context, preserves high-signal instructions, and drops low-value context before composing the final prompt.
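
A minimal sketch of that flow, assuming a common chars/4 token heuristic; neither the estimator nor the trimming order is promptpilot's exact implementation.

```typescript
// Sketch: heuristic token estimation plus budget-driven trimming.
// The chars/4 rule is a rough, widely used approximation (assumed here).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function fitContext(
  pieces: { text: string; pinned: boolean }[],
  maxContextTokens: number
): string[] {
  const kept: string[] = [];
  let used = 0;
  // Consider pinned, high-signal pieces first; drop whatever no longer
  // fits once the context budget is exhausted.
  const ordered = [...pieces].sort(
    (a, b) => Number(b.pinned) - Number(a.pinned)
  );
  for (const p of ordered) {
    const cost = estimateTokens(p.text);
    if (used + cost <= maxContextTokens) {
      kept.push(p.text);
      used += cost;
    }
  }
  return kept;
}
```

Real implementations summarize oversized pieces instead of dropping them outright, but the budget check is the same shape.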

  ## CLI

  ```bash
- promptpilot optimize "write me a better prompt for asking claude to summarize lecture notes"
+ promptpilot optimize "rewrite this prompt for a coding refactor task"
  ```

  Supported flags:

  - `--session <id>`
- - `--model <name>` to override auto-selection
+ - `--model <name>`
  - `--mode <mode>`
  - `--task <task>`
  - `--tone <tone>`
@@ -198,6 +297,13 @@ Supported flags:
  - `--max-length <n>`
  - `--tag <value>` repeatable
  - `--pin-constraint <text>` repeatable
+ - `--target <provider:model>` repeatable
+ - `--target-hint <value>` repeatable
+ - `--routing-priority <cheapest_adequate|best_quality|fastest_adequate>`
+ - `--routing-top-k <n>`
+ - `--workload-bias <code_first>`
+ - `--no-routing`
+ - `--host <url>`
  - `--store <local|sqlite>`
  - `--storage-dir <path>`
  - `--sqlite-path <path>`
@@ -206,13 +312,14 @@ Supported flags:
  - `--debug`
  - `--save-context`
  - `--no-context`
+ - `--clear-session`
  - `--max-total-tokens <n>`
  - `--max-context-tokens <n>`
  - `--max-input-tokens <n>`
- - `--clear-session`
+ - `--timeout <ms>`
  - `--bypass-optimization`

- If no prompt argument is provided, `promptpilot optimize` will read the raw prompt from stdin.
+ If no positional prompt is provided, `promptpilot optimize` reads the raw prompt from stdin.

  ## Public API

@@ -225,6 +332,19 @@ Main exports:
  - `FileSessionStore`
  - `SQLiteSessionStore`

+ Useful result fields:
+
+ - `optimizedPrompt`
+ - `finalPrompt`
+ - `selectedTarget`
+ - `rankedTargets`
+ - `routingReason`
+ - `routingWarnings`
+ - `provider`
+ - `model`
+ - `estimatedTokensBefore`
+ - `estimatedTokensAfter`
+
  Supported modes:

  - `clarity`
@@ -244,41 +364,23 @@ Supported presets:
  - `summarization`
  - `chat`

- ## File structure
-
- ```text
- src/
-   index.ts
-   types.ts
-   errors.ts
-   cli.ts
-   core/
-     optimizer.ts
-     ollamaClient.ts
-     systemPrompt.ts
-     contextManager.ts
-     tokenEstimator.ts
-     contextCompressor.ts
-   storage/
-     fileSessionStore.ts
-     sqliteSessionStore.ts
-   utils/
-     validation.ts
-     logger.ts
-     json.ts
- test/
- ```
+ ## Why the default model was chosen
+
+ `qwen2.5:3b` is the default local preference because it offers a practical balance of:

- ## Safety and fallback behavior
+ - good instruction following
+ - strong enough reasoning for prompt optimization
+ - acceptable memory use on laptops
+ - good performance for code-first workflows

- If Ollama is unavailable, `promptpilot` falls back to a deterministic local formatter that still preserves constraints and emits a Claude-compatible final prompt. Empty prompts are rejected, timeouts are supported, and hard token budget failures throw explicit errors.
+ `phi3:mini` remains a useful lightweight option for shorter non-coding rewrites when it is installed locally and the Qwen router selects it.

  ## Future improvements

- - Semantic retrieval for context
- - Better token counting by model
- - Prompt scoring
- - Local embeddings for relevance search
- - Response-aware context updates
- - Cache layer
- - Benchmark suite
+ - semantic retrieval for context
+ - better token counting by target model
+ - prompt scoring
+ - local embeddings for relevance search
+ - response-aware context updates
+ - cache layer
+ - benchmark suite