npm - pi-dev - Versions diffs - 0.2.6 → 0.2.7 - Mend

pi-dev 0.2.6 → 0.2.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2) hide show

package/package.json +1 -1
package/skills/improve-skill-flow/SKILL.md +8 -1

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "pi-dev",
-  "version": "0.2.6",
+  "version": "0.2.7",
   "description": "An autonomous engineering skill framework for the pi runtime — built on Matt Pocock's skills.",
   "type": "module",
   "bin": {

package/skills/improve-skill-flow/SKILL.md CHANGED Viewed

@@ -11,6 +11,12 @@ The pi-dev workflow is not just markdown. It is a layered control system: SKILL.
 The point: **never improve the framework from gut feeling, and never collapse every fix into a "guard".** Edit because session N showed phase P violated predicate Q on M occasions, then decide whether the right response is clearer prose, repo-local preferences, a runtime steer, a custom tool, a command, a TUI affordance, state persistence, compaction/context shaping, or — only when the evidence calls for it — a blocking gate.
+## North star
+`/do` is the autonomous engineering lifecycle orchestrator. Its job is not to perform one isolated action; its job is to carry a request through the agreed lifecycle in order — classify, scope, plan, run the right skills, satisfy each phase predicate, verify locally/live as required, apply side effects according to prefs, and finish with durable state in code / issues / preferences. The framework's product goal is to make that A→Z loop long-running, durable, low-interruption, and trustworthy across real repos.
+Therefore `/improve-skill-flow` audits **lifecycle adherence**, not just individual mistakes. A session that eventually shipped after repeated user nudges, skipped verification, wrong side effects, or unordered phases is not healthy. The improvement question is: what framework / preference / runtime support would have let `/do` complete the full lifecycle autonomously and correctly the first time?
 ## Pre-flight (hard gate)
 Refuse to run unless cwd is the pi-dev repo. Releases happen from here; nothing else makes sense.
@@ -187,6 +193,7 @@ Run a single pass over every targeted `.jsonl` and tally:
 - bash commands matching domain-specific danger / smell patterns
 **Workflow-compliance specific:**
+- Reconstruct `/do`'s lifecycle trajectory: classify → scope → plan → phase execution → terminal predicates → verification → side effects → durable final state. Score the ordered lifecycle, not just the final outcome.
 - After each `<skill name="do">` injection, did another skill get injected within the same session? **This measures `/do` chain depth.** Single-step chains are a red flag, but not the whole story.
 - Did `/do` emit `[flow plan]`? Count planned phases and observed `[flow N/M]` status lines. Flag: plan missing, N/M skipped, final summary before all phases, or terminal predicate not evidenced.
 - Count user messages shorter than ~80 chars — these are usually nudges ("진행해", "다음은?", "끝났어?"). High proportion = `/do` is handing the flow back too often.
@@ -384,7 +391,7 @@ audit complete — framework v<X.Y.Z> released, consumer-prefs commit <sha>, nex
 ## Heuristics
 - **One screenful of signals beats a dashboard.** Most of the time three or four numbers tell the whole story.
-- **Judge `/do` by workflow completion, not skill injection alone.** Chain depth is useful, but the stronger metric is: plan emitted → phases accounted for → terminal predicates evidenced → side effects match prefs → user did not need to re-enter the chain.
+- **Judge `/do` by lifecycle completion, not skill injection alone.** Chain depth is useful, but the stronger metric is: classify/scope correct → plan emitted → phases run in order → terminal predicates evidenced → verification completed → side effects match prefs → durable state landed → user did not need to re-enter the chain.
 - **Short user messages are the cheapest interruption proxy.** Count them, then inspect the preceding assistant turn to classify why the user had to nudge.
 - **A taboo without a `.gitignore` lockout will resurrect.** If the same taboo file shows up across two audits, the fix belongs in `/migrate`, not `/do`.
 - **Do not overfit on guards.** The intervention may be an observability counter, status widget, custom command, context injection, state checkpoint, prompt/compaction shaping, soft steer, confirmation, or block. Pick by evidence and measured failure cost.