npm - pi-dev - Versions diffs - 0.2.4 → 0.2.6 - Mend

pi-dev 0.2.4 → 0.2.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/README.md +9 -7
package/dist/install.js +34 -2
package/dist/paths.js +9 -1
package/extensions/pi-flow/README.md +40 -0
package/extensions/pi-flow/index.ts +65 -0
package/extensions/pi-flow/package.json +9 -0
package/package.json +5 -4
package/skills/do/SKILL.md +4 -31
package/skills/improve-skill-flow/SKILL.md +91 -28
package/skills/to-issues/SKILL.md +1 -18
package/skills/triage/SKILL.md +0 -6

package/README.md CHANGED Viewed

@@ -35,21 +35,22 @@ You ask. It classifies. It executes. It reports.
 - **A migration gate that won't let you cheat.** `/do` refuses to touch an un-migrated repo. `/migrate` audits `AGENTS.md`, `CLAUDE.md`, scoped agent dirs, handoff systems, and ADR layouts; archives the noise; stamps a marker; and on every re-entry runs drift probes so banned conventions can't sneak back in.
 - **No handoff files. Ever.** State of work lives in three places only — code (git), the issue tracker, and merged preferences. No `docs/handoff/`, no `.scratch/flow/`, no SESSION_*.md littering your repo.
 - **Local-live verification the agent owns.** A one-time playbook captures how to boot your stack. From then on the agent boots it, drives it, and quotes the evidence in the summary. You are not the test runner.
-- **Vertical-slice issues with AI disclaimers baked in.** `/to-issues` produces independently-grabbable tickets — each with acceptance criteria, allowed-touch-set, and out-of-scope — and the disclaimer required by `/triage` lands on every one.
+- **Vertical-slice issues that read like a spec.** `/to-issues` produces independently-grabbable tickets — each with acceptance criteria, allowed-touch-set, and out-of-scope — written as plain spec so the next worker agent picks them up as input, not as a meta-tagged AI artefact.
+- **One minimal plugin where prose is provably not enough.** Ships `pi-flow`: a single pi extension whose only job is to catch `/do` ending a turn with `follow-up: ... run /do ...` mid-chain (the hand-back pattern that turns one prompt into ten "진행해" replies) and steer back into the next phase. Toggle in `~/.pi/agent/settings.json` → `piFlow.enabled`. We add a new guard only when an audit shows a rule the model demonstrably can't see itself drift past.
 ## Install
 Requires Node ≥ 20 and the [pi runtime](https://github.com/badlogic/pi).
 ```bash
-# Interactive: pick global vs project-local, then install + seed preferences
+# Interactive: pick global vs project-local, then install skills + extensions + seed preferences
 npx pi-dev@latest install
 # Or be explicit:
-npx pi-dev@latest install --global    # ~/.pi/agent/skills/   (every repo)
-npx pi-dev@latest install --local     # ./.pi/skills/         (this repo only)
+npx pi-dev@latest install --global    # ~/.pi/agent/{skills,extensions}/   (every repo)
+npx pi-dev@latest install --local     # ./.pi/{skills,extensions}/         (this repo only)
-# Refresh skills (auto-detects scope from disk; preferences are kept)
+# Refresh skills + extensions (auto-detects scope from disk; preferences are kept)
 npx pi-dev@latest update
 # See what's installed under each scope
@@ -64,10 +65,11 @@ npx pi-dev doctor
 | | global | local |
 | --- | --- | --- |
 | Skills | `~/.pi/agent/skills/` | `<repo>/.pi/skills/` |
+| Extensions | `~/.pi/agent/extensions/` | `<repo>/.pi/extensions/` |
 | Preferences | `~/.pi/agent/preferences.md` | `<repo>/.pi/preferences.md` |
 | Pi sessions see it from | every cwd | only this repo |
-| Goes in the repo's git? | no | yes (commit `.pi/skills/`, gitignore `.pi/sessions/`) |
-| Use when | the skills are part of *your* engineering taste | the skills are part of *the project's* contract |
+| Goes in the repo's git? | no | yes (commit `.pi/skills/` + `.pi/extensions/`, gitignore `.pi/sessions/`) |
+| Use when | the framework is part of *your* engineering taste | the framework is part of *the project's* contract |
 Non-interactive runs (CI, piped input) default to `--global` silently. Pass `-y` to skip the prompt and accept the default.

package/dist/install.js CHANGED Viewed

@@ -3,7 +3,19 @@ import { execSync } from "node:child_process";
 import { join } from "node:path";
 import { createInterface } from "node:readline";
 import { SKILLS, CONSUMER_SKILLS } from "./manifest.js";
-import { PKG_SKILLS_DIR, PKG_GLOBAL_PREFS_PRESET, destFor, } from "./paths.js";
+import { PKG_SKILLS_DIR, PKG_EXTENSIONS_DIR, PKG_GLOBAL_PREFS_PRESET, destFor, } from "./paths.js";
+/**
+ * Extensions shipped with pi-dev. Each is a subdirectory under `extensions/`
+ * that pi auto-discovers from `~/.pi/agent/extensions/<name>/` (global) or
+ * `.pi/extensions/<name>/` (local). The directory name is the install name
+ * verbatim — no prefix, no transform.
+ *
+ * Keep this list short. A new entry is justified only by an audit signal
+ * showing prose alone has ≥60% miss rate on a mechanically checkable rule.
+ */
+const EXTENSIONS = [
+    { name: "pi-flow", summary: "Steer mid-chain hand-backs back into the next phase." },
+];
 function ask(question) {
     const rl = createInterface({ input: process.stdin, output: process.stdout });
     return new Promise((res) => {
@@ -39,8 +51,9 @@ async function resolveScope(opts) {
 }
 export async function install(opts = {}) {
     const scope = await resolveScope(opts);
-    const { agentDir, skillsDir, prefsFile } = destFor(scope);
+    const { agentDir, skillsDir, extensionsDir, prefsFile } = destFor(scope);
     mkdirSync(skillsDir, { recursive: true });
+    mkdirSync(extensionsDir, { recursive: true });
     const skillsToInstall = opts.includeMaintainer ? SKILLS : CONSUMER_SKILLS;
     let copied = 0;
     for (const skill of skillsToInstall) {
@@ -59,6 +72,25 @@ export async function install(opts = {}) {
         copied++;
     }
     console.log(`Installed ${copied} skill(s) into ${skillsDir} [${scope}]`);
+    let extsCopied = 0;
+    for (const ext of EXTENSIONS) {
+        const src = join(PKG_EXTENSIONS_DIR, ext.name);
+        const dst = join(extensionsDir, ext.name);
+        if (!existsSync(src)) {
+            console.warn(`  skip extension ${ext.name} (source not found in package)`);
+            continue;
+        }
+        if (existsSync(dst) && !opts.force) {
+            cpSync(src, dst, { recursive: true, force: true });
+        }
+        else {
+            cpSync(src, dst, { recursive: true });
+        }
+        extsCopied++;
+    }
+    if (extsCopied > 0) {
+        console.log(`Installed ${extsCopied} extension(s) into ${extensionsDir} [${scope}]`);
+    }
     if (opts.skipPrefs) {
         console.log("Skipped preferences (pass --include-prefs on update to merge in new keys).");
         return;

package/dist/paths.js CHANGED Viewed

@@ -4,23 +4,31 @@ import { fileURLToPath } from "node:url";
 export const HOME = homedir();
 export const PI_AGENT_DIR = join(HOME, ".pi", "agent");
 export const PI_SKILLS_DIR = join(PI_AGENT_DIR, "skills");
+export const PI_EXTENSIONS_DIR = join(PI_AGENT_DIR, "extensions");
 export const PI_GLOBAL_PREFS = join(PI_AGENT_DIR, "preferences.md");
 const __filename = fileURLToPath(import.meta.url);
 const __dirname = dirname(__filename);
 /** Resolve the package root regardless of whether we run from src/ or dist/. */
 export const PKG_ROOT = resolve(__dirname, "..");
 export const PKG_SKILLS_DIR = join(PKG_ROOT, "skills");
+export const PKG_EXTENSIONS_DIR = join(PKG_ROOT, "extensions");
 export const PKG_PRESETS_DIR = join(PKG_ROOT, "presets");
 export const PKG_GLOBAL_PREFS_PRESET = join(PKG_PRESETS_DIR, "preferences.md");
 /** Compute install destinations for a given scope. `local` resolves against `cwd`. */
 export function destFor(scope, cwd = process.cwd()) {
     if (scope === "global") {
-        return { agentDir: PI_AGENT_DIR, skillsDir: PI_SKILLS_DIR, prefsFile: PI_GLOBAL_PREFS };
+        return {
+            agentDir: PI_AGENT_DIR,
+            skillsDir: PI_SKILLS_DIR,
+            extensionsDir: PI_EXTENSIONS_DIR,
+            prefsFile: PI_GLOBAL_PREFS,
+        };
     }
     const agentDir = join(cwd, ".pi");
     return {
         agentDir,
         skillsDir: join(agentDir, "skills"),
+        extensionsDir: join(agentDir, "extensions"),
         prefsFile: join(agentDir, "preferences.md"),
     };
 }

package/extensions/pi-flow/README.md ADDED Viewed

@@ -0,0 +1,40 @@
+# pi-flow
+The default plugin that ships with `pi-dev`. One job: stop `/do` from
+quietly handing the chain back to you between phases.
+## What it does
+Watches the end of every assistant turn. If the last visible text contains
+a `follow-up: ... run /do ...` line **without** a preceding
+`chain complete — no further action.` line, it queues a follow-up steer
+that prompts the next phase. Your next turn is the next `[flow N/M]`
+status line, not "진행해".
+## Why one guard, not five
+`pi-flow` is intentionally minimal. Each new guard taxes every tool call,
+slows iteration, and tempts everyone to disable the whole plugin. So:
+- We add a guard only when a real audit shows ≥60% violation on a rule
+  the model can't see itself drift past.
+- We remove a guard when the underlying root cause is gone (e.g. an
+  `.gitignore` lockout makes a path-write guard redundant).
+- Everything else stays in `SKILL.md` prose where the cost is zero.
+The hugn 2026-05 audit produced exactly one such finding (mid-chain
+hand-back, 8/13 sessions). That is what this guard exists for.
+## Toggle
+`~/.pi/agent/settings.json`:
+```json
+{
+  "piFlow": {
+    "enabled": false
+  }
+}
+```
+Default: on.

package/extensions/pi-flow/index.ts ADDED Viewed

@@ -0,0 +1,65 @@
+/**
+ * pi-flow
+ *
+ * Minimal runtime plugin that keeps /do on-chain.
+ *
+ * One guard, one event: when an assistant turn ends with a
+ * `follow-up: ... run /do ...` line that is *not* preceded by the canonical
+ * `chain complete — no further action.` terminator, the chain is being
+ * handed back mid-flight. The plugin queues a steer reminder so the next
+ * turn re-enters the chain at the next phase instead of waiting for the user
+ * to type "진행해" / "go" / "다음".
+ *
+ * Rationale: the hugn 2026-05 audit showed 8/13 /do sessions ending with
+ * chain-depth 0 and 82% of user messages being short re-entry nudges. Every
+ * other candidate guard had near-zero ROI — the taboo-path writes were already
+ * stopped by a .gitignore lockout, and other rules were taken out of policy.
+ * One predicate, one steer.
+ *
+ * Toggle:
+ *   ~/.pi/agent/settings.json → "piFlow": { "enabled": false }
+ */
+import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+import { existsSync, readFileSync } from "node:fs";
+import { homedir } from "node:os";
+import { join } from "node:path";
+const HANDBACK = /^follow-up:\s.*\brun\s+`?\/?do\b/m;
+const TERMINATOR = /chain complete\s+\u2014\s+no further action/;
+function isEnabled(): boolean {
+  const settingsPath = join(homedir(), ".pi", "agent", "settings.json");
+  if (!existsSync(settingsPath)) return true;
+  try {
+    const raw = JSON.parse(readFileSync(settingsPath, "utf-8")) as Record<string, unknown>;
+    const cfg = (raw["piFlow"] ?? {}) as { enabled?: boolean };
+    return cfg.enabled !== false;
+  } catch {
+    return true;
+  }
+}
+export default function piFlow(pi: ExtensionAPI) {
+  if (!isEnabled()) return;
+  pi.on("message_end", async (event) => {
+    if (event.message.role !== "assistant") return;
+    const text = event.message.content
+      .filter((b): b is { type: "text"; text: string } => b.type === "text")
+      .map((b) => b.text)
+      .join("\n");
+    if (!text || !HANDBACK.test(text) || TERMINATOR.test(text)) return;
+    pi.sendMessage(
+      {
+        customType: "pi-flow",
+        content:
+          "[pi-flow] mid-chain hand-back detected. `follow-up: … run /do …` is the Step-7 terminator, " +
+          "not a phase boundary. Print the next `[flow N/M]` status line and continue the chain.",
+        display: true,
+      },
+      { triggerTurn: true, deliverAs: "followUp" },
+    );
+  });
+}

package/extensions/pi-flow/package.json ADDED Viewed

@@ -0,0 +1,9 @@
+{
+  "name": "pi-flow",
+  "version": "0.0.0",
+  "private": true,
+  "description": "Default pi-dev plugin: keeps /do on-chain by steering mid-chain hand-backs.",
+  "pi": {
+    "extensions": ["./index.ts"]
+  }
+}

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "pi-dev",
-  "version": "0.2.4",
+  "version": "0.2.6",
   "description": "An autonomous engineering skill framework for the pi runtime — built on Matt Pocock's skills.",
   "type": "module",
   "bin": {
@@ -9,6 +9,7 @@
   "files": [
     "dist",
     "skills",
+    "extensions",
     "presets",
     "README.md",
     "LICENSE"
@@ -39,10 +40,10 @@
   },
   "repository": {
     "type": "git",
-    "url": "git+https://github.com/jason2077/pi-dev.git"
+    "url": "git+https://github.com/jason2077/pi-flow.git"
   },
   "bugs": {
-    "url": "https://github.com/jason2077/pi-dev/issues"
+    "url": "https://github.com/jason2077/pi-flow/issues"
   },
-  "homepage": "https://github.com/jason2077/pi-dev#readme"
+  "homepage": "https://github.com/jason2077/pi-flow#readme"
 }

package/skills/do/SKILL.md CHANGED Viewed

@@ -26,38 +26,9 @@ The point: **one request → one finished outcome**, with as few user interrupti
    - a phase's terminal predicate fails twice and the failure cannot be characterised, OR
    - the Ambiguity protocol fires (preferences truly silent on a one-shot decision), OR
    - user interrupts.
-5. **No handoff files.** State of work lives in three places only: **code (git), issue tracker, merged preferences**. Do not create `.scratch/flow/`, `docs/handoff/`, or any session-log file. Phase outputs are remembered in-context; persistent decisions are committed to code or filed as issues.
+5. **No handoff files.** State of work lives in three places only: **code (git), issue tracker, merged preferences**. Do not create `.scratch/flow/`, `docs/handoff/`, `.handoff/`, or any session-log file (`SESSION_*.md`, `HANDOFF_*.md`). Phase outputs are remembered in-context; persistent decisions are committed to code or filed as issues.
 6. **Side-effect gates respect prefs literally.** `auto-create-issues`, `auto-apply-labels`, `auto-commit-per-slice`, `auto-pr` follow merged prefs without reinterpretation.
 7. **Status line per phase.** `[flow N/M] <phase-name> — <one-sentence what>`.
-8. **Issue-write rule (no exceptions).** Every issue body written from a `/do` flow — whether you invoke `/to-issues`, `/triage`, or call `gh issue create` / `gh issue edit` directly from a bash tool — MUST start with this disclaimer as the first non-blank line:
-   ```
-   > *This was generated by AI during triage.*
-   ```
-   This rule is global and binding regardless of which skill you think you are in. Before publishing any issue body:
-   - Build the body with `--body-file <path>` or a heredoc, never with inline `--body "..."` (heredocs make the disclaimer visible in the diff and the file is auditable).
-   - The first non-blank line of the body file must be the disclaimer literal above.
-   - Immediately after **every** `gh issue create` / `gh issue edit` returns, the **very next tool call** must be this self-check on the resulting body. Not "sometime later", not "once at the end of the batch":
-     ```bash
-     gh issue view <num> --json body --jq '.body' | awk 'NF{print; exit}' | grep -q 'generated by AI' \
-       || { echo "AI disclaimer missing on issue #<num>"; exit 1; }
-     ```
-   - **If the self-check fails, the immediately following tool call MUST be a corrective `gh issue edit <num> --body-file <fixed>` that prepends the disclaimer literal, followed by a re-run of the self-check.** Do not create the next slice, do not call any other tool, and do not summarise progress until a self-check on that issue returns success. A failed self-check that is not followed by an edit-then-recheck pair is itself a Hard-rule #8 violation — equivalent to never having run the check.
-   - The `awk 'NF{print; exit}'` form is mandatory: it selects the first **non-blank** line, which is what the rule actually constrains. `head -1` is not equivalent and will pass false positives on bodies with a leading blank line.
-   **Anti-patterns** (delete and redo if you catch any in your own draft):
-   - `gh issue create --title "…" --body "..."` (inline body, no disclaimer check)
-   - First line of issue body being `## Goal`, `## Scope`, `## Problem Statement`, `## Context`, `**Parent epic:**`, or any heading other than the disclaimer.
-   - Skipping the post-create `gh issue view ... | grep generated by AI` self-check because "I included the disclaimer in the heredoc".
-   - Running the self-check, seeing it fail, and *continuing to the next slice anyway* — "I'll fix the disclaimers in a batch at the end" is the failure mode that this rule is named after. Fix the one issue you just created, then move on.
-   - Batching N `gh issue create` calls in a row and then a single self-check at the end. The self-check is per-issue, and it gates the next `create`.
-   Skills `/to-issues` and `/triage` repeat this rule for the case where you do enter them. This Hard rule is the binding statement for the case where you do not.
 ## Process
@@ -160,6 +131,8 @@ Then for each phase:
 - Ending the turn with `follow-up: <next planned phase> — run /do …` when N < M. The `follow-up:` line is reserved for *post-chain* deferred work (push, manual ops live, prefs refresh). Using it to describe the *next planned phase of this chain* is a disguised hand-back — start the phase instead.
 - Treating a short user nudge ("진행해", "다음", "계속", "ㄱㄱ", "go", "이어서") as a new request. If the planned chain has unfinished phases, those nudges are noise — re-enter the chain at the next phase, do not re-classify and re-inject `/do`.
+The `pi-flow` plugin (installed by default) watches the end of each assistant turn: a `follow-up: ... run /do ...` line *not* preceded by `chain complete — no further action.` is treated as a mid-chain hand-back and steered back into the next phase automatically. The prose above is still the canonical contract; the plugin is the safety net.
 The only place a wrap-up belongs is **Step 7 — Final summary**, after the last phase has met its terminal predicate.
 ### Step 5 — Live verification
@@ -224,7 +197,7 @@ If you emitted anything that looks like a wrap-up ("All set!", "Done.", "Let me
 | `grill-with-docs` / `grill-lite`   | every open question answered, deferred with rationale, or escalated; relevant CONTEXT/ADR updates committed |
 | `to-prd`                           | PRD published to issue tracker with `needs-triage` (per `auto-create-issues`)           |
 | `to-issues`                        | all slices created on tracker; each is independently grabbable; labels applied per `auto-apply-labels` |
-| `triage`                           | issue carries exactly one state label; AI disclaimer present (verified via the Hard rule #8 self-check, not just inserted into the heredoc) |
+| `triage`                           | issue carries exactly one state label                                                                                                       |
 | `diagnose`                         | reproducible pass/fail loop exists AND root cause identified AND regression test exists |
 | `tdd`                              | new test red→green; project's check command clean; commit per `auto-commit-per-slice`    |
 | `improve-codebase-architecture`    | proposed deepening either applied or recorded as a follow-up issue                      |

package/skills/improve-skill-flow/SKILL.md CHANGED Viewed

@@ -7,9 +7,9 @@ description: MAINTAINER-ONLY. Analyse real pi-runtime session telemetry from any
 **Audience: pi-dev maintainers only.** Consumers do not get this skill installed. Consumers improve their own setup by editing `docs/agents/preferences.md` (per project) or `~/.pi/agent/preferences.md` (per machine), not by editing SKILL.md bodies. The framework is fixed for them; only the maintainer changes it.
-The pi-dev skills are pure markdown. They get better when the maintainer reads what real sessions did across consumer repos, compares that to what the SKILL.md said *should* happen, and edits the gap closed. This skill is the canonical loop for that.
+The pi-dev workflow is not just markdown. It is a layered control system: SKILL.md prose, repo preferences, pi-runtime lifecycle events, tools, commands, TUI surfaces, session persistence, compaction hooks, and extensions. This skill reads what real sessions did across consumer repos, compares that to what the workflow contract said *should* happen, then chooses the lightest effective intervention.
-The point: **never edit a SKILL.md from gut feeling.** Edit because session N showed phase P violated predicate Q on M occasions, and here is the line that would have prevented it.
+The point: **never improve the framework from gut feeling, and never collapse every fix into a "guard".** Edit because session N showed phase P violated predicate Q on M occasions, then decide whether the right response is clearer prose, repo-local preferences, a runtime steer, a custom tool, a command, a TUI affordance, state persistence, compaction/context shaping, or — only when the evidence calls for it — a blocking gate.
 ## Pre-flight (hard gate)
@@ -50,22 +50,49 @@ Always from inside the pi-dev checkout (see Pre-flight).
 - **Consumer repo path** (or its sessions directory) to audit. The maintainer names a repo on this machine; you resolve its sessions dir.
 - Optional: a specific skill name to focus the audit on (`do`, `migrate`, `triage`, …).
 - Optional: a date range.
-- **Fix scope** per finding — `framework` or `consumer-prefs`. Defaults set in Step 5.5; maintainer can flip individual rows before applying.
+- **Fix scope** per finding — `framework`, `extension`, or `consumer-prefs`. Defaults set in Step 5.5; maintainer can flip individual rows before applying.
 ## Fix scopes
-pi-runtime today loads skill bodies from a single location (`~/.pi/agent/skills/<name>/SKILL.md` for global installs, `<repo>/.pi/skills/<name>/SKILL.md` for local installs). The framework's 3-layer override is on **preferences**, not on SKILL bodies. So a finding lands in one of two places:
+pi-runtime loads three artefact kinds the framework can ship:
+- **Skill bodies** — `~/.pi/agent/skills/<name>/SKILL.md` (global) or `<repo>/.pi/skills/<name>/SKILL.md` (local). Pure prose.
+- **Runtime interventions** — extensions under `~/.pi/agent/extensions/<name>/` (global) or `<repo>/.pi/extensions/<name>/` (local). TypeScript modules auto-loaded via jiti. They can subscribe to lifecycle/session/agent/model/tool events, inject context in `before_agent_start`, reshape model context, observe `message_end`, intercept or modify tool calls/results, register custom tools/commands/shortcuts/flags, prompt via `ctx.ui`, render custom TUI widgets, persist state with `pi.appendEntry()`, and customize compaction/session behavior. pi-dev ships **one** extension by default — `pi-flow` — and the bar to add a second package remains high.
+- **Preferences** — `docs/agents/preferences.md` (per repo) and `~/.pi/agent/preferences.md` (per machine). 3-layer override on prose-level decisions.
+A finding lands in exactly one of:
 | scope | lands in | reaches | propagation | when to pick |
 | --- | --- | --- | --- | --- |
-| **framework** | this repo's `skills/<name>/SKILL.md` | every consumer after the next `npx pi-dev update` | release-please → npm publish | the SKILL.md wording itself is wrong; gap shows up generically |
-| **consumer-prefs** | the audited consumer repo's `docs/agents/preferences.md` (Project taboos / Diagnosis posture / Local-live playbook / Free notes — whichever section fits) | only that repo, on every `/do` bootstrap | regular consumer-repo commit | gap is the consumer repo's domain / paths / conventions, not the SKILL.md |
+| **framework** | this repo's `skills/<name>/SKILL.md` | every consumer after the next `npx pi-dev update` | release-please → npm publish | the SKILL.md wording is wrong; gap shows up generically; prose alone is plausibly enough |
+| **extension** | this repo's `extensions/pi-flow/index.ts` (extend) or a new `extensions/<name>/` (rare) | every consumer after `npx pi-dev update` | release-please → npm publish | a real audit shows prose / prefs cannot reliably preserve the workflow, AND pi-runtime has an event/tool/UI/state hook that can make the desired path easier, more observable, or safer |
+| **consumer-prefs** | the audited consumer repo's `docs/agents/preferences.md` | only that repo, on every `/do` bootstrap | regular consumer-repo commit | gap is the consumer repo's domain / paths / conventions, not the SKILL.md |
 Notes:
 - A `framework` apply is **always** mirrored into `~/.pi/agent/skills/<name>/` on this machine so the next session picks it up immediately, without waiting for npm.
+- An `extension` apply is **always** mirrored into `~/.pi/agent/extensions/<name>/` on this machine for the same reason. The package directory name is the install name verbatim (no `pi-dev-` prefix).
 - A `consumer-prefs` apply touches no pi-dev files. It is committed to the consumer repo only.
-- A single audit may produce a mix of framework and consumer-prefs findings. Decide scope per finding, not per audit.
+- A single audit may produce a mix of all three. Decide scope per finding, not per audit.
+**Extension scope is the most expensive option, but it is not synonymous with "guards".** Runtime interventions can be passive observability, progress/status UI, context injection, command shortcuts, structured tools, state checkpoints, compaction shaping, soft steers, confirmations, or hard blocks. Default to `framework` / `consumer-prefs` when prose or repo-local convention plausibly fixes the gap; reach for `extension` when the audit shows the workflow needs runtime support.
+Intervention ladder, from lightest to strongest:
+1. **Observe** — collect counters / emit low-noise status so the next audit can see what happened.
+2. **Make the right path easier** — custom command, custom tool, context injection, remembered state, or TUI affordance.
+3. **Steer** — inject a follow-up message or system/context nudge when the model is drifting but no irreversible action happened.
+4. **Confirm** — ask the user only for genuinely risky or preference-silent operations.
+5. **Block** — refuse a tool call only for destructive, costly, or repeatedly proven workflow violations.
+The bar for extension work:
+- The failure is repeated or high-cost, and the expected workflow was available in-context via SKILL.md / prefs / repo docs.
+- The runtime intervention can be described as a small event-driven mechanism over pi's lifecycle/session/agent/tool/UI/state surfaces; no hidden LLM call unless the finding is explicitly about a summarizer/evaluator extension.
+- The success metric is measurable in the next audit: fewer user nudges, higher `/do` phase completion, fewer corrections, better issue shape, shorter stalled intervals, or clearer live evidence.
+- A simpler remedy (docs wording, prefs, `.gitignore` lockout, issue-template fix) does **not** plausibly close the gap. If it does, take the simpler remedy.
+External research posture (DECAY): before proposing a new class of runtime intervention, do a quick current-source pass. 2025–2026 agent-reliability literature emphasizes trajectory monitoring, workflow-level observability, recovery orchestration, and textual feedback from traces — not just pre-execution blocking. Use that as a check against over-fitting on "guards"; cite the sources in the audit notes when they shape the proposal.
 ## Session-data location & format
@@ -159,12 +186,16 @@ Run a single pass over every targeted `.jsonl` and tally:
 - top Edit / Write targets (hotspot files)
 - bash commands matching domain-specific danger / smell patterns
-**Skill-flow specific:**
-- After each `<skill name="do">` injection, did another skill get injected within the same session? **This measures `/do` chain depth.** Single-step chains are a red flag.
+**Workflow-compliance specific:**
+- After each `<skill name="do">` injection, did another skill get injected within the same session? **This measures `/do` chain depth.** Single-step chains are a red flag, but not the whole story.
+- Did `/do` emit `[flow plan]`? Count planned phases and observed `[flow N/M]` status lines. Flag: plan missing, N/M skipped, final summary before all phases, or terminal predicate not evidenced.
 - Count user messages shorter than ~80 chars — these are usually nudges ("진행해", "다음은?", "끝났어?"). High proportion = `/do` is handing the flow back too often.
-- Count user messages containing correction markers (`아니`, `wait`, `stop`, `취소`, `다시`, `그만`, `undo`, `revert`). These mark interventions.
+- Count user messages containing correction markers (`아니`, `wait`, `stop`, `취소`, `다시`, `그만`, `undo`, `revert`, `왜`, `??`, `제대로`). These mark interventions.
+- Detect stalled intervals: long gaps between assistant text / tool calls, repeated identical commands, or repeated failed live probes beyond the repo's wait budget.
+- Count local-live / ops-live evidence when runtime behavior changed: command run, observed output, log/screenshot/evidence embedded in summary.
+- Count side-effect gates: commits, pushes, PRs, issue creates/edits/closes vs. merged prefs (`auto-*`) and tracker docs.
 - Count `<!-- migrated: ... -->` marker date vs. any post-marker `docs/handoff/` / `.scratch/flow/` / `SESSION_*.md` writes. Drift = handoff lockout failed.
-- Count tracker writes (`gh issue create`, `gh pr create`) and check whether `generated by AI` (the `/triage` disclaimer) appears in the surrounding text. Missing disclaimer = `/to-issues` / `/triage` predicate violated.
+- Count tracker writes (`gh issue create`, `gh pr create`) and inspect the bodies; the bodies are plain spec consumed by future worker agents, so look for issues with shape problems (missing parent, no acceptance criteria, wrong state labels, wrong parent topology, etc.) rather than meta tags.
 Use a deterministic Python or shell script you write once and check into `/tmp` for the duration of the run. Do not eyeball big JSONLs by hand.
@@ -191,7 +222,7 @@ Produce a small table per audited skill. Each row:
 | signal                         | observed | expected (predicate / rule)        | severity |
 | ------------------------------ | -------- | ---------------------------------- | -------- |
 | /do → next-skill chain depth   | 1/5      | ≥1 per chain step (M phases)       | 🔴       |
-| AI disclaimer on slice issues  | 0/8      | every issue starts with disclaimer | 🔴       |
+| /do hand-back via follow-up    | 8/13     | 0 mid-chain hand-backs             | 🔴       |
 | post-marker handoff writes     | 9        | 0                                  | 🟡       |
 | ...                            |          |                                    |          |
 ```
@@ -207,7 +238,7 @@ For every 🔴 / 🟡 row, quote the smallest piece of evidence that makes the g
 - a user message timestamp + first 80 chars,
 - a tool-call command that violates a taboo,
-- a missing disclaimer in a created issue body.
+- the first 80 chars of an issue body that violates the slice template.
 If a finding cannot be backed by an excerpt, it is not actionable yet — demote to a TODO and keep digging.
@@ -215,9 +246,19 @@ If a finding cannot be backed by an excerpt, it is not actionable yet — demote
 For each 🔴 / 🟡 finding, pick a default scope using the heuristic below, then show the table once and let the maintainer flip individual rows before applying.
+Before assigning scope, write one sentence answering: **what would have made the correct workflow easier to follow at the moment of failure?** Do not jump straight to a blocking guard.
+Default to `extension` if the finding matches **any** of:
+- The expected behavior was already in SKILL.md / prefs / repo docs at the time of the violating session **and** the session still drifted repeatedly or expensively.
+- The fix needs runtime affordance rather than prose: observe counters, inject context, add a command/tool, show TUI status, persist/checkpoint state, shape compaction/context, steer after a message, confirm a risky action, or block an unsafe tool call.
+- The success condition can be measured in the next session from telemetry, not merely hoped for from stronger wording.
+When defaulting to `extension`, classify the intervention type: `observe`, `affordance`, `context`, `state`, `render`, `steer`, `confirm`, or `block`.
 Default to `framework` if the finding matches **any** of:
-- Cites SKILL.md wording / phase / predicate / rule numbers.
+- Cites SKILL.md wording / phase / predicate / rule numbers, and the rule was not yet in place.
 - The proposed fix is a generic anti-pattern string, a terminator literal, a runway line, or a lockout that any repo would benefit from.
 - The same gap would plausibly show up in two or more consumer repos.
@@ -230,12 +271,12 @@ Default to `consumer-prefs` if the finding matches **any** of:
 Present the scope-decision table:
 ```
-| # | finding (short)                          | default scope    | target file                          | flip? |
-| - | ---------------------------------------- | ---------------- | ------------------------------------ | ----- |
-| 1 | /do hands flow back between phases       | framework        | pi-dev:skills/do/SKILL.md            |       |
-| 2 | docs/handoff/ resurrected after marker   | framework        | pi-dev:skills/migrate/SKILL.md       |       |
-| 3 | retro-action-item label still alive      | consumer-prefs   | hugn:docs/agents/preferences.md      |       |
-| 4 | smoke command name changed in S058       | consumer-prefs   | hugn:docs/agents/preferences.md      |       |
+| # | finding (short)                          | default scope    | target file                                          | flip? |
+| - | ---------------------------------------- | ---------------- | ---------------------------------------------------- | ----- |
+| 1 | /do hands flow back between phases       | extension:steer  | pi-dev:extensions/pi-flow/index.ts (message_end)     |       |
+| 2 | long live-smoke silence causes user nudges | extension:render/observe | pi-dev:extensions/pi-flow/index.ts or project extension |       |
+| 3 | post-marker handoff writes               | framework        | pi-dev:skills/migrate/SKILL.md (gitignore lockout)   |       |
+| 4 | smoke command name changed in S058       | consumer-prefs   | hugn:docs/agents/preferences.md                      |       |
 ```
 Ask once: "OK to proceed with these scopes? Reply with row numbers to flip, or `go`." Apply the flips and move on.
@@ -251,6 +292,16 @@ For each 🔴 / 🟡 finding, draft the smallest possible edit that, **if it had
 - Prefer **terminal markers** ("the summary's last line must be one of these two literals: …") over qualitative descriptions of "good wrap-up".
 - Update **at most three skills per run.** More than that means findings aren't anchored well enough.
+**Extension findings (target: pi-dev `extensions/pi-flow/index.ts`, or rarely a new `extensions/<name>/`):**
+- Start from pi's actual extension surface, not from the word "guard": lifecycle/session/agent/model/tool events, `before_agent_start` context injection, `context` shaping, `tool_call` / `tool_result` interception, registered tools, registered commands, `ctx.ui` notifications/widgets/custom UI, custom renderers, `pi.appendEntry()` state, and compaction/session hooks.
+- Prefer extending `pi-flow` when the concern is generic workflow compliance. Create a new extension only when the concern is large enough to be independently named, toggled, installed, and audited.
+- Pick the weakest intervention that would have changed the session outcome: observe → affordance → context/state/render → steer → confirm → block.
+- Determinism is mandatory for `confirm` and `block`; it is desirable but not sufficient for softer interventions. A progress widget, command shortcut, or state checkpoint can be valuable even when it does not block anything.
+- Runtime behavior must be toggleable via `~/.pi/agent/settings.json` when it can alter turns or tool execution. Default on only if the audit shows broad benefit.
+- Keep the corresponding SKILL.md prose as the human-readable contract: one-line *why*, the expected behavior, and a pointer to the runtime support. Do not delete the why — the model still needs to know the intent when the extension is off.
+- Update **at most one runtime mechanism per run** unless the second is pure observability. Two behavior changes at once destroys the next audit's ability to attribute movement.
 **Consumer-prefs findings (target: that repo's `docs/agents/preferences.md`):**
 - Pick the *narrowest* existing section that fits before adding a new one. Mapping:
@@ -265,10 +316,10 @@ For each 🔴 / 🟡 finding, draft the smallest possible edit that, **if it had
   | glossary / context term clarification | `## Glossary alignment` |
   | rationale that doesn't fit elsewhere | `## Free notes` (one paragraph max, dated) |
-- One bullet per finding. Reference the evidence ticket ("S058 smoke name", "#103 missing disclaimer") so the line stays auditable.
+- One bullet per finding. Reference the evidence ticket ("S058 smoke name", "#103 missing parent ref") so the line stays auditable.
 - Do **not** invent new top-level sections unless three findings legitimately share one.
-Show all drafts as one unified diff per target file before applying. Group by target file: pi-dev's `skills/<name>/SKILL.md` first (framework), then the consumer's `docs/agents/preferences.md` (consumer-prefs).
+Show all drafts as one unified diff per target file before applying. Group by target file: pi-dev's `extensions/pi-flow/index.ts` first (extension), then `skills/<name>/SKILL.md` (framework), then the consumer's `docs/agents/preferences.md` (consumer-prefs).
 ### 7 — Apply, release, verify (branches on scope)
@@ -281,6 +332,16 @@ Run both branches if the audit produced mixed-scope findings. Each branch has it
 3. `git push origin main`; release-please opens the version-bump PR; merge it; npm publish runs automatically.
 4. Confirm `npm view pi-dev@latest version` matches the bumped tag.
+**7a′. Extension branch** — only if any finding was approved as `extension`:
+1. Edit `extensions/pi-flow/index.ts` (or, only if justified by Step 5.5 bar, add a new `extensions/<name>/` directory with `index.ts`, `package.json`, `README.md`).
+2. If a new extension was added, register it in `src/install.ts` `EXTENSIONS` array so `pi-dev install` and `pi-dev update` propagate it.
+3. `npm run build` to ensure `install.ts` still compiles.
+4. Smoke-test with `node dist/cli.js install --local --skip-prefs -y` in `/tmp/<fresh-dir>`. Verify the extension landed under `.pi/extensions/<name>/`.
+5. `git add extensions/<name>/ src/install.ts src/paths.ts && git commit -m "feat(pi-flow): <one-liner anchoring the evidence>"`. Commit body must cite the signal.
+6. Mirror to live: `cp -r extensions/<name> ~/.pi/agent/extensions/<name>` so the **next** session picks up the change without waiting for npm.
+7. `git push origin main`; release-please → npm publish as usual.
 **7b. Consumer-prefs branch** — only if any finding was approved as `consumer-prefs`:
 1. In the audited consumer repo: edit `docs/agents/preferences.md` per the drafts from Step 6. Keep the migration marker at the very end of the file undisturbed.
@@ -291,7 +352,7 @@ Run both branches if the audit produced mixed-scope findings. Each branch has it
 **Verification (both branches).** After the next pi session in the affected repo:
 1. Re-run **this skill** scoped to the last 24 h.
-2. Confirm each previously-🔴 signal has moved (chain depth up, intervention rate down, taboo writes gone, disclaimer coverage at 100%, etc.).
+2. Confirm each previously-🔴 signal has moved (chain depth up, intervention rate down, taboo writes gone, etc.).
 3. If a signal did not move, the fix wording was too weak — file a follow-up audit, do not re-write from scratch.
 ## Terminal predicate
@@ -299,9 +360,9 @@ Run both branches if the audit produced mixed-scope findings. Each branch has it
 This skill is done when **all four** are true:
 1. A signal table with severities and evidence excerpts has been presented.
-2. Each finding has an approved scope (`framework` / `consumer-prefs` / `defer`) on record, defaulted by Step 5.5 and confirmed by the maintainer.
+2. Each finding has an approved scope (`framework` / `extension` / `consumer-prefs` / `defer`) on record, defaulted by Step 5.5 and confirmed by the maintainer.
 3. Either (a) zero 🔴 findings — flow is healthy, recorded as "no change this cycle", OR (b) each 🔴 finding has landed in its scope's target file.
-4. For any landed change: if `framework`, the npm version has bumped (`npm view pi-dev@latest version`); if `consumer-prefs`, the consumer repo has the commit on its push-stream. Either way, the next-session re-audit plan is stated.
+4. For any landed change: if `framework` or `extension`, the npm version has bumped (`npm view pi-dev@latest version`) and the live mirror is in place; if `consumer-prefs`, the consumer repo has the commit on its push-stream. Either way, the next-session re-audit plan is stated.
 The summary's **last line** must be one of:
@@ -323,10 +384,12 @@ audit complete — framework v<X.Y.Z> released, consumer-prefs commit <sha>, nex
 ## Heuristics
 - **One screenful of signals beats a dashboard.** Most of the time three or four numbers tell the whole story.
-- **Watch the gap between `/do` injection and next-skill injection.** That is the single highest-information ratio about whether the chain orchestrator is actually orchestrating.
-- **Short user messages are the cheapest interruption proxy.** Count them.
+- **Judge `/do` by workflow completion, not skill injection alone.** Chain depth is useful, but the stronger metric is: plan emitted → phases accounted for → terminal predicates evidenced → side effects match prefs → user did not need to re-enter the chain.
+- **Short user messages are the cheapest interruption proxy.** Count them, then inspect the preceding assistant turn to classify why the user had to nudge.
 - **A taboo without a `.gitignore` lockout will resurrect.** If the same taboo file shows up across two audits, the fix belongs in `/migrate`, not `/do`.
-- **Disclaimer / terminator / lockout / runway** are the four "tripwire" patterns that catch silent contract drift. Reach for them before inventing new ones.
+- **Do not overfit on guards.** The intervention may be an observability counter, status widget, custom command, context injection, state checkpoint, prompt/compaction shaping, soft steer, confirmation, or block. Pick by evidence and measured failure cost.
+- **Trajectory beats scalar success.** A task that eventually shipped after five corrections is not healthy; use the session trace to find where workflow support was missing.
+- **Refresh external reliability patterns when designing new runtime support.** Current agent-reliability work emphasizes trajectory monitoring, recovery orchestration, and trace-derived textual feedback; use that to challenge pi-flow designs before coding.
 ## Why this skill exists

package/skills/to-issues/SKILL.md CHANGED Viewed

@@ -55,11 +55,7 @@ For each approved slice, publish a new issue to the issue tracker. Use the issue
 Publish issues in dependency order (blockers first) so you can reference real issue identifiers in the "Blocked by" field.
-**Every issue body MUST start with the AI disclaimer** (same wording as `/triage`). This is a hard requirement — no exceptions, including auto-generated slices. After writing the body, grep your own draft for the disclaimer string; if missing, do not publish. This duplicates `/do` Hard rule #8 — the per-issue self-check below is part of the contract, not an optional verification step.
 <issue-template>
-> *This was generated by AI during triage.*
 ## Parent
 A reference to the parent issue on the issue tracker (if the source was an existing issue, otherwise omit this section).
@@ -82,15 +78,13 @@ Or "None - can start immediately" if no blockers.
 </issue-template>
-**Publishing pattern (GitHub).** Use `--body-file -` with a heredoc, never inline `--body "..."`, so the disclaimer and markdown structure survive shell quoting verbatim:
+**Publishing pattern (GitHub).** Use `--body-file -` with a heredoc, never inline `--body "..."`, so markdown structure survives shell quoting verbatim:
 ```bash
 gh issue create \
   --title "[Slice N] <title>" \
   --label needs-triage \
   --body-file - <<'EOF'
-> *This was generated by AI during triage.*
 ## Parent
 #<parent>
@@ -99,15 +93,4 @@ gh issue create \
 EOF
 ```
-After **each** `gh issue create` returns, the **very next tool call** must verify the disclaimer landed on that specific issue, before creating the next slice:
-```bash
-gh issue view <n> --json body --jq '.body' | awk 'NF{print; exit}' | grep -q 'generated by AI' \
-  || { echo "FAIL: disclaimer missing on #<n>"; exit 1; }
-```
-Use `awk 'NF{print; exit}'` (first non-blank line), not `head -1` — a leading blank line will pass `head -1` and hide a missing disclaimer.
-**If the self-check fails**, the next tool call MUST be `gh issue edit <n> --body-file <fixed>` to prepend the disclaimer, followed by a re-run of the self-check. Do not proceed to the next slice until the self-check on the current issue passes. Batching all the creates first and then running checks at the end is a Hard-rule #8 violation — the check gates the *next* create.
 Do NOT close or modify any parent issue.

package/skills/triage/SKILL.md CHANGED Viewed

@@ -7,12 +7,6 @@ description: Triage issues through a state machine driven by triage roles. Use w
 Move issues on the project issue tracker through a small state machine of triage roles.
-Every comment or issue posted to the issue tracker during triage **must** start with this disclaimer:
-```
-> *This was generated by AI during triage.*
-```
 ## Reference docs
 Before acting on the issue tracker, read the repo-local setup docs if they exist: