@a-canary/pi-upskill 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +143 -0
- package/extension/index.ts +308 -0
- package/package.json +21 -0
- package/skills/analyze/SKILL.md +187 -0
- package/skills/backfill/SKILL.md +142 -0
package/README.md
ADDED
@@ -0,0 +1,143 @@
# pi-upskill

Learn from failures. Reduce token waste. Improve automatically.

## Overview

pi-upskill tracks corrections (failures → fixes) and generates skills/rules to prevent future mistakes.

**Core flow:**

1. Log corrections during conversation (`upskill-log` tool)
2. Backfill from past sessions (`/upskill-backfill`)
3. At threshold, analyze and generate ONE high-impact edit (`/upskill-analyze`)

## Installation

```bash
# From git (recommended)
pi install git:github.com/a-canary/pi-upskill

# Or from npm after publishing
pi install npm:pi-upskill

# Try without installing
pi -e git:github.com/a-canary/pi-upskill
```

After install, reload: `/reload`

## Usage

### During conversation: log corrections

The agent uses the `upskill-log` tool:

```
Agent: [uses upskill-log tool]
failure: "Committed without running tests"
correction: "Always run tests before commit"
strength: "strong"
tokens_wasted: 3000

Result: Logged correction #5 to .pi/corrections.jsonl
Progress: 5/20 corrections
```

**Strength levels:**

- `strong` — User said "always/never/remember" → single occurrence = skill
- `pattern` — Self-correction or repeated issue → needs 3x occurrences

### One-time: scan past sessions

```
/upskill-backfill
```

Scans session files from pi, claude, opencode, codex. Extracts corrections for review.

### At threshold: analyze and improve

```
/upskill-analyze
```

When 20+ corrections are logged, triggers background analysis:

1. LLM reviews all corrections
2. Selects ONE edit for maximum token impact
3. Applies surgical edit (skill/AGENTS.md/MEMORY.md)
4. Removes addressed corrections

### Check progress

```
/upskill-status
```

Shows: count, threshold, strong vs pattern, total tokens wasted.

## Data Format

`.pi/corrections.jsonl` — one JSON object per line:

```json
{"timestamp":"2025-03-13T01:30:00Z","failure":"Committed without tests","correction":"Always run tests first","context":"User reminder after broken CI","tokens_wasted":3000,"source":"user","strength":"strong"}
```

**Required fields (max 30 words each):**

- `timestamp` — ISO 8601
- `failure` — What went wrong
- `correction` — How to fix / what to do instead
- `source` — "user" or "self"
- `strength` — "strong" or "pattern"

**Optional:**

- `context` — Relevant context
- `tokens_wasted` — Estimated tokens
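
As a sketch, the required-field and word-limit checks above could be validated like this (hypothetical helper, not part of the package):

```typescript
// Hypothetical validator for one corrections.jsonl line (illustration only).
interface Correction {
  timestamp: string;
  failure: string;
  correction: string;
  context?: string;
  tokens_wasted?: number;
  source: "user" | "self";
  strength: "strong" | "pattern";
}

function parseCorrection(line: string): Correction {
  const raw = JSON.parse(line) as Record<string, unknown>;
  // Required fields must be present...
  for (const field of ["timestamp", "failure", "correction", "source", "strength"]) {
    if (raw[field] === undefined) throw new Error(`missing required field: ${field}`);
  }
  // ...and free-text fields must stay within the 30-word limit.
  const words = (s: string) => s.trim().split(/\s+/).filter(Boolean).length;
  for (const field of ["failure", "correction", "context"]) {
    const v = raw[field];
    if (typeof v === "string" && words(v) > 30) throw new Error(`${field} exceeds 30 words`);
  }
  return raw as unknown as Correction;
}

const entry = parseCorrection(
  '{"timestamp":"2025-03-13T01:30:00Z","failure":"Committed without tests","correction":"Always run tests first","source":"user","strength":"strong"}',
);
```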

## Configuration

`.pi/settings.json`:

```json
{
  "upskill": {
    "threshold": 20,
    "autoAnalyze": false,
    "lookbackDays": 7
  }
}
```

## Architecture

```
~/pi-upskill/
├── CHOICES.md            # Decision record
├── PLAN.md               # Implementation phases
├── README.md             # This file
├── extension/
│   └── index.ts          # upskill-log tool, commands
└── skills/
    ├── analyze/SKILL.md  # Pattern analysis workflow
    └── backfill/SKILL.md # Historical scan workflow
```

**Hybrid interface:**

- Extension provides the `upskill-log` tool (inline during conversation)
- Skills provide the `/upskill-backfill` and `/upskill-analyze` commands

## Key Decisions

See [CHOICES.md](CHOICES.md) for the full decision record.

| ID | Decision |
|----|----------|
| UX-0001 | User corrections with "always/never/remember" → immediate skill |
| UX-0002 | Self-corrections → need 3x pattern before skill |
| F-0003 | At 20 corrections → background analysis, ONE edit for max impact |
| D-0003 | Processed corrections removed after edit applied |

## Inspiration

- [upskill.md](https://github.com/claude-admin/cc-plugins) — Pattern extraction from memory
- [pi-reflect](https://github.com/jo-inc/pi-reflect) — Iterative self-improvement
package/extension/index.ts
ADDED
@@ -0,0 +1,308 @@
/**
 * pi-upskill — Learn from failures, reduce token waste
 *
 * Tools:
 *   upskill-log — Log a correction during conversation
 *
 * Commands:
 *   /upskill-status — Show corrections count and threshold progress
 *   /upskill-analyze — Trigger pattern analysis (runs in background)
 *
 * Configuration (.pi/settings.json):
 *   {
 *     "upskill": {
 *       "threshold": 20,
 *       "autoAnalyze": false
 *     }
 *   }
 */

import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { Type } from "@sinclair/typebox";
import * as fs from "node:fs";
import * as path from "node:path";
import { spawn } from "node:child_process";

// ── Types ────────────────────────────────────────

interface Correction {
  timestamp: string;
  failure: string;
  correction: string;
  context?: string;
  tokens_wasted?: number;
  source: "user" | "self";
  strength: "strong" | "pattern";
}

interface UpskillSettings {
  threshold: number;
  autoAnalyze: boolean;
}

const DEFAULT_SETTINGS: UpskillSettings = {
  threshold: 20,
  autoAnalyze: false,
};

// ── Helpers ──────────────────────────────────────

function getCorrectionsPath(cwd: string): string {
  return path.join(cwd, ".pi", "corrections.jsonl");
}

function loadCorrections(filepath: string): Correction[] {
  if (!fs.existsSync(filepath)) return [];
  const content = fs.readFileSync(filepath, "utf-8");
  return content
    .trim()
    .split("\n")
    .filter((line) => line.trim())
    .map((line) => JSON.parse(line));
}

function appendCorrection(filepath: string, correction: Correction): void {
  const dir = path.dirname(filepath);
  if (!fs.existsSync(dir)) {
    fs.mkdirSync(dir, { recursive: true });
  }
  fs.appendFileSync(filepath, JSON.stringify(correction) + "\n", "utf-8");
}

function getSettings(ctx: any): UpskillSettings {
  const projectSettings = ctx.projectSettings?.upskill || {};
  return { ...DEFAULT_SETTINGS, ...projectSettings };
}

function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// ── Extension ────────────────────────────────────

export default function (pi: ExtensionAPI) {
  // ── upskill-log Tool ───────────────────────────

  pi.registerTool({
    name: "upskill-log",
    label: "Log Correction",
    description: `Log a failure → correction to .pi/corrections.jsonl. Use when:
1. User corrects you with "always", "never", or "remember" (strength: strong)
2. You self-correct after multiple failed attempts (strength: pattern, needs 3x)

After logging, check if threshold reached (default 20 entries). If so, suggest running /upskill-analyze.

Each field must be 30 words or less. Estimate tokens_wasted based on conversation length from mistake to correction.`,

    parameters: Type.Object({
      failure: Type.String({
        description: "What went wrong (max 30 words)",
        maxLength: 300,
      }),
      correction: Type.String({
        description: "How it was fixed / what to do instead (max 30 words)",
        maxLength: 300,
      }),
      context: Type.Optional(
        Type.String({
          description: "Relevant context (max 30 words)",
          maxLength: 300,
        }),
      ),
      tokens_wasted: Type.Optional(
        Type.Number({
          description: "Estimated tokens wasted (in + out) from mistake to correction",
        }),
      ),
      strength: Type.Optional(
        Type.String({
          description: "strong = always/never/remember (single occurrence sufficient), pattern = needs 3x",
          enum: ["strong", "pattern"],
        }),
      ),
    }),

    async execute(toolCallId, params, _signal, _onUpdate, ctx) {
      const { failure, correction, context, tokens_wasted, strength = "pattern" } = params;

      // Validate word counts
      const failureWords = countWords(failure);
      const correctionWords = countWords(correction);
      const contextWords = context ? countWords(context) : 0;

      if (failureWords > 30 || correctionWords > 30 || contextWords > 30) {
        return {
          content: [
            {
              type: "text",
              text: `Error: Fields must be 30 words or less. Got: failure=${failureWords}, correction=${correctionWords}, context=${contextWords}`,
            },
          ],
          isError: true,
        };
      }

      const entry: Correction = {
        timestamp: new Date().toISOString(),
        failure,
        correction,
        context,
        tokens_wasted,
        source: "user", // Could be inferred from context
        strength: strength as "strong" | "pattern",
      };

      const correctionsPath = getCorrectionsPath(ctx.cwd);
      appendCorrection(correctionsPath, entry);

      const corrections = loadCorrections(correctionsPath);
      const settings = getSettings(ctx);
      const count = corrections.length;

      let message = `Logged correction #${count} to .pi/corrections.jsonl`;

      if (count >= settings.threshold) {
        message += `\n\n**Threshold reached!** (${count}/${settings.threshold})\nRun /upskill-analyze to generate skills from patterns.`;
      } else {
        message += `\n\nProgress: ${count}/${settings.threshold} corrections`;
      }

      return {
        content: [{ type: "text", text: message }],
        details: { count, threshold: settings.threshold },
      };
    },

    renderCall(args, theme) {
      const strength = args.strength || "pattern";
      const strengthColor = strength === "strong" ? "warning" : "muted";
      return theme.fg("toolTitle", "upskill-log ") + theme.fg(strengthColor, `[${strength}]`);
    },

    renderResult(result, _options, theme) {
      const details = result.details as { count: number; threshold: number } | undefined;
      if (!details) {
        const text = result.content[0];
        return theme.fg("success", text?.type === "text" ? text.text : "Logged");
      }
      const pct = Math.round((details.count / details.threshold) * 100);
      const bar = "█".repeat(Math.min(10, Math.floor(pct / 10))) + "░".repeat(10 - Math.min(10, Math.floor(pct / 10)));
      return theme.fg("success", `✓ Logged #${details.count} `) + theme.fg("dim", `[${bar}] ${details.count}/${details.threshold}`);
    },
  });

  // ── /upskill-status Command ───────────────────

  pi.registerCommand("upskill-status", {
    description: "Show corrections count and threshold progress",
    handler: async (_args, ctx) => {
      const correctionsPath = getCorrectionsPath(ctx.cwd);
      const corrections = loadCorrections(correctionsPath);
      const settings = getSettings(ctx);
      const count = corrections.length;

      const strong = corrections.filter((c) => c.strength === "strong").length;
      const pattern = corrections.filter((c) => c.strength === "pattern").length;
      const totalTokens = corrections.reduce((sum, c) => sum + (c.tokens_wasted || 0), 0);

      const status =
        count >= settings.threshold
          ? "🟢 Ready to analyze"
          : `🟡 ${settings.threshold - count} more needed`;

      ctx.ui.notify(
        `**Upskill Status**\n\n` +
          `Corrections: ${count}/${settings.threshold}\n` +
          `Strong: ${strong} | Pattern: ${pattern}\n` +
          `Tokens wasted (est): ${totalTokens.toLocaleString()}\n` +
          `Status: ${status}\n\n` +
          `File: ${correctionsPath}`,
        "info",
      );
    },
  });

  // ── /upskill-analyze Command ──────────────────

  pi.registerCommand("upskill-analyze", {
    description: "Analyze corrections and generate skills (runs in background)",
    handler: async (_args, ctx) => {
      const correctionsPath = getCorrectionsPath(ctx.cwd);
      const corrections = loadCorrections(correctionsPath);

      if (corrections.length === 0) {
        ctx.ui.notify("No corrections logged. Use upskill-log tool or /upskill-backfill first.", "warning");
        return;
      }

      const settings = getSettings(ctx);

      if (corrections.length < settings.threshold) {
        const proceed = await ctx.ui.confirm(
          "Below threshold",
          `Only ${corrections.length} corrections (threshold: ${settings.threshold}). Analyze anyway?`,
        );
        if (!proceed) return;
      }

      // Read the analyze skill prompt
      const skillPath = path.join(path.dirname(new URL(import.meta.url).pathname), "..", "skills", "analyze", "SKILL.md");

      let analyzePrompt = "";
      try {
        const skillContent = fs.readFileSync(skillPath, "utf-8");
        // Extract content after frontmatter
        const match = skillContent.match(/^---\n[\s\S]*?\n---\n([\s\S]*)$/);
        if (match) analyzePrompt = match[1].trim();
      } catch {
        analyzePrompt = "Analyze the corrections and propose one high-impact edit.";
      }

      // Build the prompt with corrections
      const correctionsBlock = corrections.map((c, i) => {
        const strength = c.strength === "strong" ? "⬤" : "○";
        const tokens = c.tokens_wasted ? ` [${c.tokens_wasted} tokens]` : "";
        return `${strength} [${i + 1}] ${c.failure} → ${c.correction}${tokens}\n   Context: ${c.context || "none"}`;
      });

      const fullPrompt =
        `${analyzePrompt}\n\n` +
        `## Corrections to Analyze (${corrections.length} entries)\n\n` +
        `Legend: ⬤ = strong (single occurrence), ○ = pattern (needs 3x)\n\n` +
        correctionsBlock.join("\n") +
        `\n\n## Instructions\n\n` +
        `1. Identify patterns in the corrections\n` +
        `2. Select ONE edit with the largest impact on token usage\n` +
        `3. Apply the surgical edit to the appropriate file\n` +
        `4. Report what was changed and which corrections it addresses\n` +
        `5. DO NOT remove the corrections file — user will review`;

      // Spawn background process
      const logPath = path.join(ctx.cwd, ".pi", "upskill-analysis.log");
      const args = [
        "-p",
        "--no-session",
        "--model",
        ctx.model ? `${ctx.model.provider}/${ctx.model.id}` : "anthropic/claude-sonnet-4-5",
        fullPrompt,
      ];

      ctx.ui.notify(`Spawning background analysis...\nLog: ${logPath}`, "info");

      const proc = spawn("pi", args, {
        detached: true,
        stdio: ["ignore", fs.openSync(logPath, "w"), fs.openSync(logPath, "a")],
      });

      proc.unref();

      ctx.ui.notify(
        `Background analysis started.\n` +
          `- Corrections: ${corrections.length}\n` +
          `- Log: ${logPath}\n\n` +
          `Check log for results. Use /upskill-status to see progress.`,
        "info",
      );
    },
  });
}
package/package.json
ADDED
@@ -0,0 +1,21 @@
{
  "name": "@a-canary/pi-upskill",
  "version": "1.0.0",
  "description": "Learn from failures, reduce token waste, improve automatically",
  "keywords": ["pi-package"],
  "author": "a-canary",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/a-canary/pi-upskill.git"
  },
  "peerDependencies": {
    "@mariozechner/pi-coding-agent": "*",
    "@mariozechner/pi-tui": "*",
    "@sinclair/typebox": "*"
  },
  "pi": {
    "extensions": ["./extension"],
    "skills": ["./skills"]
  }
}
package/skills/analyze/SKILL.md
ADDED
@@ -0,0 +1,187 @@
---
name: analyze
description: Analyze logged corrections and generate one high-impact skill or rule edit. Use when threshold reached or user requests improvement.
---

# /upskill-analyze — Pattern Analysis & Skill Generation

Review corrections, find patterns, apply ONE surgical edit for maximum token savings.

## When to Use

- Threshold reached (default: 20 corrections)
- User explicitly requests analysis
- After backfill, to process historical corrections

## Process

### Step 1: Read Corrections

Load all entries from `.pi/corrections.jsonl`:

```bash
cat .pi/corrections.jsonl
```

### Step 2: Identify Patterns

Group corrections by:

**Type:**
- Same failure, same correction → strong pattern
- Same failure, different correction → needs synthesis
- Related failures → cluster by keyword/theme

**Priority:**
1. `strength: "strong"` entries — single occurrence sufficient
2. High `tokens_wasted` — prioritize for impact
3. Frequency — 3+ pattern entries = solid pattern
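
The grouping and qualification rules above can be sketched as (hypothetical helper, not shipped in the package):

```typescript
// Cluster corrections by failure text and rank clusters.
// A strong entry qualifies alone; pattern entries need 3+ occurrences.
interface Correction {
  failure: string;
  correction: string;
  tokens_wasted?: number;
  strength: "strong" | "pattern";
}

function groupCorrections(corrections: Correction[]) {
  const groups = new Map<string, Correction[]>();
  for (const c of corrections) {
    const key = c.failure.toLowerCase();
    const list = groups.get(key) ?? [];
    list.push(c);
    groups.set(key, list);
  }
  return [...groups.entries()].map(([failure, items]) => ({
    failure,
    count: items.length,
    // Total waste across the cluster, for impact ranking.
    tokens: items.reduce((sum, c) => sum + (c.tokens_wasted ?? 0), 0),
    qualifies: items.some((c) => c.strength === "strong") || items.length >= 3,
  }));
}

const groups = groupCorrections([
  { failure: "Used deprecated API", correction: "Check docs", strength: "pattern", tokens_wasted: 1500 },
  { failure: "Used deprecated API", correction: "Check docs", strength: "pattern", tokens_wasted: 2000 },
  { failure: "Committed without tests", correction: "Run tests first", strength: "strong", tokens_wasted: 3000 },
]);
```

Here the two pattern entries do not yet qualify (only 2x), while the single strong entry does.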

### Step 3: Select ONE Edit

Ask: "Which single edit would have the largest impact on token usage?"

Consider:
- How many corrections would this prevent?
- What's the total tokens wasted across those corrections?
- Is this a new skill or an edit to an existing file?

**Edit targets (in order):**
1. `.pi/skills/` — Add new skill for workflow/pattern
2. `AGENTS.md` — Strengthen or add rule
3. `MEMORY.md` — Add durable fact
4. Existing skill — Strengthen wording
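
One way to sketch the selection (hedged: in practice this is a judgment call by the analyzing LLM, not a fixed formula):

```typescript
// Pick the single qualifying cluster with the largest total token waste:
// that is the ONE edit expected to save the most tokens going forward.
interface Cluster {
  failure: string;
  count: number;
  tokens: number;      // total tokens wasted across the cluster
  qualifies: boolean;  // strong entry present, or 3+ pattern occurrences
}

function selectEdit(clusters: Cluster[]): Cluster | undefined {
  return clusters
    .filter((c) => c.qualifies)
    .sort((a, b) => b.tokens - a.tokens)[0];
}

const pick = selectEdit([
  { failure: "Used deprecated API", count: 2, tokens: 3500, qualifies: false },
  { failure: "Committed without tests", count: 3, tokens: 9000, qualifies: true },
  { failure: "Wrong import path", count: 3, tokens: 4000, qualifies: true },
]);
```

The choice of edit target (new skill vs AGENTS.md vs MEMORY.md) then follows the priority list above.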

### Step 4: Generate Edit

For new skills, create `.pi/skills/{name}/SKILL.md`:

```markdown
---
name: {name}
description: {when to use}
---

# /{name} — {purpose}

## Pattern Source

{why this skill exists}

## Process

1. {step}
2. {step}

## Verification

{testable gate}
```

For existing files, propose a surgical edit:
- Add a bullet point to an existing section
- Strengthen wording with "ALWAYS" / "NEVER"
- Add a concrete example

### Step 5: Apply Edit

Use the `edit` tool to apply the change. Record:
- File edited
- Lines changed
- Corrections addressed

### Step 6: Clean Up Corrections

Remove the corrections that contributed to this edit:

```bash
# Keep only unprocessed corrections
# (those not matching the pattern we just addressed), for example:
grep -v '"failure":"Committed without tests"' .pi/corrections.jsonl > /tmp/corrections.jsonl \
  && mv /tmp/corrections.jsonl .pi/corrections.jsonl
```

Leave a marker comment for traceability:

```json
{"timestamp":"2025-03-13T02:00:00Z","event":"upskill","action":"added_skill","name":"run-tests-first","corrections_addressed":3,"tokens_saved":9000}
```

### Step 7: Report Results

```
Analysis complete.

**Edit Applied:**
- File: .pi/skills/run-tests-first/SKILL.md (created)
- Action: Added new skill

**Impact:**
- Corrections addressed: 3
- Tokens to save: ~9,000 per occurrence

**Removed from corrections.jsonl:**
- #4 "Committed without tests"
- #7 "Forgot to run tests"
- #12 "Skipped test step"

Remaining: 17 corrections
```

## Example Analysis Prompt

```
You are analyzing logged corrections to improve agent behavior.

## Corrections (12 entries)

⬤ [1] Committed without running tests → Always run tests first
   Context: User reminder after broken CI
   [3000 tokens]

○ [2] Used deprecated API → Check docs for current method
   Context: Self-corrected after error
   [1500 tokens]

○ [5] Used deprecated API → Check docs for current method
   Context: Same pattern as #2
   [2000 tokens]

...

Legend: ⬤ = strong (single = skill), ○ = pattern (needs 3x)

## Task

1. Find patterns in these corrections
2. Select ONE edit with largest token impact
3. Apply surgical edit to appropriate file
4. Report what was changed
5. List which corrections this addresses

Constraints:
- Only ONE edit
- Maximum impact on future token usage
- Prefer adding skills for workflows
- Prefer strengthening AGENTS.md for behavioral rules
```

## Verification

After edit:

```bash
# Skill exists and loads
pi --skill .pi/skills/{name} -p "help"

# Corrections reduced
cat .pi/corrections.jsonl | wc -l
```

## Configuration

```json
{
  "upskill": {
    "maxEditsPerAnalysis": 1,
    "removeAddressedCorrections": true
  }
}
```
package/skills/backfill/SKILL.md
ADDED
@@ -0,0 +1,142 @@
---
name: backfill
description: Scan past sessions and extract corrections to .pi/corrections.jsonl. Use for one-time historical analysis of failures.
---

# /upskill-backfill — Historical Correction Extraction

Scan past conversation sessions to identify and log failures → corrections.

## When to Use

- First-time setup: extract patterns from past sessions
- After realizing a pattern of mistakes
- User requests historical analysis

## Process

### Step 1: Locate Session Files

Search for session files in common locations:

```bash
# Pi sessions
ls ~/.pi/agent/sessions/**/*.jsonl

# Claude sessions (if available)
ls ~/.claude/sessions/**/*.jsonl 2>/dev/null

# OpenCode sessions
ls ~/.opencode/sessions/**/* 2>/dev/null

# Codex sessions
ls ~/.codex/sessions/**/* 2>/dev/null
```

### Step 2: Filter by Date

Default lookback: 7 days. Ask the user to confirm or adjust.

```bash
# Find files from the last N days
find ~/.pi/agent/sessions -name "*.jsonl" -mtime -7
```

### Step 3: Extract Corrections

For each session file, look for:

**Explicit user corrections:**
- "no", "not that", "wrong", "actually", "I meant"
- "stop", "don't", "never", "always"
- "remember", "make sure to", "from now on"

**Strong signals (always/never/remember):**
- Mark as `strength: "strong"` — single occurrence sufficient for skill

**Self-corrections:**
- Multiple tool calls attempting the same thing
- Agent says "let me try again" or "that didn't work"
- Mark as `strength: "pattern"` — needs 3x for skill
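
A minimal keyword-based sketch of the signal classification above (hypothetical helper; a real scan would also track agent self-corrections across turns):

```typescript
// Classify a user message by the signal lists above:
// strong signals (always/never/remember) beat generic correction signals.
const STRONG = /\b(always|never|remember)\b/i;
const CORRECTION = /\b(no|not that|wrong|actually|i meant|stop|don't|make sure to|from now on)\b/i;

function classify(text: string): "strong" | "pattern" | null {
  if (STRONG.test(text)) return "strong";
  if (CORRECTION.test(text)) return "pattern";
  return null;
}

const a = classify("Always run tests before committing");
const b = classify("no, that's wrong");
const c = classify("looks good, thanks");
```

Keyword matching over-triggers on casual uses of "no" or "actually", so extracted candidates still go through the review step below.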

### Step 4: Estimate Token Waste

For each correction:
1. Find where the mistake started
2. Find where the correction was applied
3. Count the messages/turns between them
4. Rough estimate: ~500-2000 tokens per turn (varies by model)
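
As a worked example (hedged: the per-turn figure here is just the midpoint of the rough range above):

```typescript
// Rough token-waste estimate: turns between mistake and correction,
// times an assumed per-turn average (midpoint of the 500-2000 range).
function estimateTokensWasted(turns: number, tokensPerTurn = 1250): number {
  return turns * tokensPerTurn;
}

const estimate = estimateTokensWasted(4); // 4 turns from mistake to correction
```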

### Step 5: Present Candidates

Show extracted corrections for review:

```
Found 12 potential corrections:

[1] STRONG: "Always run tests before committing"
    Context: User reminded after broken commit
    Tokens: ~3000

[2] PATTERN: "Don't use deprecated API"
    Context: Self-corrected after error
    Tokens: ~1500

...

Review? [y/n/select]
```

### Step 6: Log Approved Corrections

For each approved correction, use the `upskill-log` tool or write directly:

```json
{"timestamp":"2025-03-13T00:00:00Z","failure":"Committed without tests","correction":"Always run tests first","context":"User reminder after broken CI","tokens_wasted":3000,"source":"user","strength":"strong"}
```

### Step 7: Report Summary

```
Backfill complete:
- Sessions scanned: 47
- Corrections found: 12
- Logged: 8 (4 skipped)
- Total tokens wasted: ~15,000

Run /upskill-status to see progress.
```

## Output

Appends to `.pi/corrections.jsonl`.

## Verification

```bash
cat .pi/corrections.jsonl | wc -l
```

## Session Parsing Details

### Pi JSONL Format

Each line is a JSON object. Look for:

```json
{"type":"message","message":{"role":"user","content":[{"type":"text","text":"no, that's wrong"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"Let me try again..."}]}}
```

User messages with corrections → extract the failure/correction pair.

### Other Formats

Claude/OpenCode/Codex: parse similarly, looking for user correction signals and agent self-corrections.

## Options

The user can specify:
- `--lookback N` — Days to look back (default: 7)
- `--source pi|claude|all` — Which session sources to scan
- `--auto` — Skip review, log all found corrections