pi-dev 0.2.6 → 0.2.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-dev",
3
- "version": "0.2.6",
3
+ "version": "0.2.7",
4
4
  "description": "An autonomous engineering skill framework for the pi runtime — built on Matt Pocock's skills.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -11,6 +11,12 @@ The pi-dev workflow is not just markdown. It is a layered control system: SKILL.
11
11
 
12
12
  The point: **never improve the framework from gut feeling, and never collapse every fix into a "guard".** Edit because session N showed phase P violated predicate Q on M occasions, then decide whether the right response is clearer prose, repo-local preferences, a runtime steer, a custom tool, a command, a TUI affordance, state persistence, compaction/context shaping, or — only when the evidence calls for it — a blocking gate.
13
13
 
14
+ ## North star
15
+
16
+ `/do` is the autonomous engineering lifecycle orchestrator. Its job is not to perform one isolated action; its job is to carry a request through the agreed lifecycle in order — classify, scope, plan, run the right skills, satisfy each phase predicate, verify locally/live as required, apply side effects according to prefs, and finish with durable state in code / issues / preferences. The framework's product goal is to make that A→Z loop long-running, durable, low-interruption, and trustworthy across real repos.
17
+
18
+ Therefore `/improve-skill-flow` audits **lifecycle adherence**, not just individual mistakes. A session that eventually shipped after repeated user nudges, skipped verification, wrong side effects, or unordered phases is not healthy. The improvement question is: what framework / preference / runtime support would have let `/do` complete the full lifecycle autonomously and correctly the first time?
19
+
14
20
  ## Pre-flight (hard gate)
15
21
 
16
22
  Refuse to run unless cwd is the pi-dev repo. Releases happen from here; nothing else makes sense.
@@ -187,6 +193,7 @@ Run a single pass over every targeted `.jsonl` and tally:
187
193
  - bash commands matching domain-specific danger / smell patterns
188
194
 
189
195
  **Workflow-compliance specific:**
196
+ - Reconstruct `/do`'s lifecycle trajectory: classify → scope → plan → phase execution → terminal predicates → verification → side effects → durable final state. Score the ordered lifecycle, not just the final outcome.
190
197
  - After each `<skill name="do">` injection, did another skill get injected within the same session? **This measures `/do` chain depth.** Single-step chains are a red flag, but not the whole story.
191
198
  - Did `/do` emit `[flow plan]`? Count planned phases and observed `[flow N/M]` status lines. Flag: plan missing, N/M skipped, final summary before all phases, or terminal predicate not evidenced.
192
199
  - Count user messages shorter than ~80 chars — these are usually nudges ("진행해", "다음은?", "끝났어?"). High proportion = `/do` is handing the flow back too often.
@@ -384,7 +391,7 @@ audit complete — framework v<X.Y.Z> released, consumer-prefs commit <sha>, nex
384
391
  ## Heuristics
385
392
 
386
393
  - **One screenful of signals beats a dashboard.** Most of the time three or four numbers tell the whole story.
387
- - **Judge `/do` by workflow completion, not skill injection alone.** Chain depth is useful, but the stronger metric is: plan emitted → phases accounted for → terminal predicates evidenced → side effects match prefs → user did not need to re-enter the chain.
394
+ - **Judge `/do` by lifecycle completion, not skill injection alone.** Chain depth is useful, but the stronger metric is: classify/scope correct → plan emitted → phases run in order → terminal predicates evidenced → verification completed → side effects match prefs → durable state landed → user did not need to re-enter the chain.
388
395
  - **Short user messages are the cheapest interruption proxy.** Count them, then inspect the preceding assistant turn to classify why the user had to nudge.
389
396
  - **A taboo without a `.gitignore` lockout will resurrect.** If the same taboo file shows up across two audits, the fix belongs in `/migrate`, not `/do`.
390
397
  - **Do not overfit on guards.** The intervention may be an observability counter, status widget, custom command, context injection, state checkpoint, prompt/compaction shaping, soft steer, confirmation, or block. Pick by evidence and measured failure cost.