pi-dev 0.2.2 → 0.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-dev",
3
- "version": "0.2.2",
3
+ "version": "0.2.4",
4
4
  "description": "An autonomous engineering skill framework for the pi runtime — built on Matt Pocock's skills.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -17,6 +17,7 @@ The point: **one request → one finished outcome**, with as few user interrupti
17
17
  4. **Keep going. Never hand the flow back between phases.** When something would normally cause a halt (ambiguous classification, scope creep beyond `change-budget`, missing follow-up), **expand and surface in the final summary** instead of stopping mid-flow. Specifically:
18
18
  - Do **not** ask the user "shall I proceed to the next phase?", "want me to continue?", "should I do X next?", or any equivalent. The chain is decided in Step 3 and runs to completion.
19
19
  - Do **not** end your turn between phases. A phase's terminal predicate passing is the cue to immediately print the next phase's status line and start it, in the same turn.
20
+ - Do **not** end the turn on a `follow-up: <next slice> — run /do <intent>` line while phases in the planned chain are unfinished. A `follow-up:` line is the **Step 7 final-summary terminator**, not a between-phase handoff. If the chain has more phases, the next thing in the same turn is the next phase's status line, not a `follow-up:` line. Issuing `follow-up: ... run /do ...` mid-chain is a hand-back; delete it and start the next phase instead.
20
21
  - Confirmation-shaped questions are only legal via the **Ambiguity protocol** (one decision the prefs genuinely cannot answer) or the **Failure protocol** (terminal predicate failed twice).
21
22
 
22
23
  Halt only when:
@@ -38,18 +39,23 @@ The point: **one request → one finished outcome**, with as few user interrupti
38
39
 
39
40
  - Build the body with `--body-file <path>` or a heredoc, never with inline `--body "..."` (heredocs make the disclaimer visible in the diff and the file is auditable).
40
41
  - The first non-blank line of the body file must be the disclaimer literal above.
41
- - Immediately after `gh issue create` / `gh issue edit` returns, run one self-check on the resulting body and halt the flow if it fails:
42
+ - Immediately after **every** `gh issue create` / `gh issue edit` returns, the **very next tool call** must be this self-check on the resulting body. Not "sometime later", not "once at the end of the batch":
42
43
 
43
44
  ```bash
44
45
  gh issue view <num> --json body --jq '.body' | awk 'NF{print; exit}' | grep -q 'generated by AI' \
45
46
  || { echo "AI disclaimer missing on issue #<num>"; exit 1; }
46
47
  ```
47
48
 
49
+ - **If the self-check fails, the immediately following tool call MUST be a corrective `gh issue edit <num> --body-file <fixed>` that prepends the disclaimer literal, followed by a re-run of the self-check.** Do not create the next slice, do not call any other tool, and do not summarise progress until a self-check on that issue returns success. A failed self-check that is not followed by an edit-then-recheck pair is itself a Hard-rule #8 violation — equivalent to never having run the check.
50
+ - The `awk 'NF{print; exit}'` form is mandatory: it selects the first **non-blank** line, which is what the rule actually constrains. `head -1` is not equivalent and will pass false positives on bodies with a leading blank line.
51
+
48
52
  **Anti-patterns** (delete and redo if you catch any in your own draft):
49
53
 
50
54
  - `gh issue create --title "…" --body "..."` (inline body, no disclaimer check)
51
- - First line of issue body being `## Goal`, `## Scope`, `## Problem Statement`, `**Parent epic:**`, or any heading other than the disclaimer.
55
+ - First line of issue body being `## Goal`, `## Scope`, `## Problem Statement`, `## Context`, `**Parent epic:**`, or any heading other than the disclaimer.
52
56
  - Skipping the post-create `gh issue view ... | grep generated by AI` self-check because "I included the disclaimer in the heredoc".
57
+ - Running the self-check, seeing it fail, and *continuing to the next slice anyway* — "I'll fix the disclaimers in a batch at the end" is the failure mode that this rule is named after. Fix the one issue you just created, then move on.
58
+ - Batching N `gh issue create` calls in a row and then a single self-check at the end. The self-check is per-issue, and it gates the next `create`.
53
59
 
54
60
  Skills `/to-issues` and `/triage` repeat this rule for the case where you do enter them. This Hard rule is the binding statement for the case where you do not.
55
61
 
@@ -151,6 +157,8 @@ Then for each phase:
151
157
  - "Ready to move on to <next phase>?"
152
158
  - "Let me know if you want me to <next phase>."
153
159
  - Ending the turn after a phase summary when N < M.
160
+ - Ending the turn with `follow-up: <next planned phase> — run /do …` when N < M. The `follow-up:` line is reserved for *post-chain* deferred work (push, manual ops live, prefs refresh). Using it to describe the *next planned phase of this chain* is a disguised hand-back — start the phase instead.
161
+ - Treating a short user nudge ("진행해", "다음", "계속", "ㄱㄱ", "go", "이어서") as a new request. If the planned chain has unfinished phases, those nudges are noise — re-enter the chain at the next phase, do not re-classify and re-inject `/do`.
154
162
 
155
163
  The only place a wrap-up belongs is **Step 7 — Final summary**, after the last phase has met its terminal predicate.
156
164
 
@@ -84,7 +84,27 @@ Each line is one of these record types:
84
84
  | `session` | session header — has `cwd`, `timestamp`, `version`, `id` |
85
85
  | `model_change` | provider / modelId switch |
86
86
  | `thinking_level_change` | reasoning effort knob |
87
- | `message` | a user or assistant turn |
87
+ | `message` | a user / assistant / toolResult turn |
88
+
89
+ **Record shape for `message` (do not skim this — it has bitten parsers before):**
90
+
91
+ ```jsonc
92
+ {
93
+ "type": "message",
94
+ "id": "...",
95
+ "parentId": "...",
96
+ "timestamp": "2026-05-11T16:46:49.795Z",
97
+ "message": { // ← NESTED. role/content are HERE, not at top level.
98
+ "role": "user" | "assistant" | "toolResult",
99
+ "content": [ ...blocks ],
100
+ "timestamp": "...",
101
+ // assistant-only extras: api, provider, model, usage, stopReason, responseId
102
+ // toolResult-only extras: toolCallId, toolName, isError
103
+ }
104
+ }
105
+ ```
106
+
107
+ A correct read path is `rec["message"]["role"]` and `rec["message"]["content"]`. Reading `rec["role"]` / `rec["content"]` returns `None` for every record and silently produces a zero-row signal table — if your first pass shows all counters at 0, this is almost certainly why.
88
108
 
89
109
  `message.content` is a **list of blocks**, each block has a `type`:
90
110
 
@@ -95,8 +115,19 @@ Each line is one of these record types:
95
115
  | `toolCall` | `{id, name, arguments}` | **pi uses this** — NOT Anthropic SDK's `tool_use` |
96
116
  | `toolResult` | `{id, output}` | **pi uses this** — NOT `tool_result` |
97
117
 
118
+ Roles in practice: `user`, `assistant`, and **`toolResult`** (yes, role and block type share the name; a `toolResult`-role message contains one or more `text` blocks holding the tool output). Treat `toolResult`-role messages as siblings of the originating `toolCall` — do not double-count them as user/assistant turns.
119
+
98
120
  Tool names are **lower-case** (`bash`, `read`, `edit`, `write`, `glob`, `grep`, etc.). Build any parser around `toolCall` / `toolResult` first, then fall back to Anthropic-shaped blocks for robustness.
99
121
 
122
+ **Parser pre-flight (mandatory before Step 2 aggregates).** After you write the one-shot Python parser, run it on the newest 1–2 `.jsonl` files and assert the following are non-zero for any session that obviously had work done:
123
+
124
+ ```python
125
+ assert tool_count, "toolCall extraction returned 0 — check rec['message']['content'], not rec['content']"
126
+ assert user_msg_total or skill_inject_count, "no user messages parsed — same nested-message bug"
127
+ ```
128
+
129
+ If either assertion would fail, fix the parser before producing the signal table. A zero-row table is never a finding — it is a parser bug.
130
+
100
131
  **Critical detail:** the pi runtime injects each skill's `SKILL.md` content into the conversation as a `<skill name="..." location="...">…</skill>` block embedded inside a **user-role** message (system-side injection, but the role is user). This is how you tell the classifier loaded a skill. Count these to see what `/do` actually picked.
101
132
 
102
133
  Timestamps on `message` records are ISO strings; some other record types use int millis. Handle both.
@@ -221,6 +221,14 @@ In order:
221
221
  >> docs/agents/preferences.md
222
222
  ```
223
223
  The marker literal must read `<!-- migrated: YYYY-MM-DD by \`/migrate\` -->`. Do not paste a future or rounded date — `/do`'s 90-day staleness check relies on this being accurate.
224
+
225
+ **Self-check (mandatory, halts the migration if it fails).** Immediately after the `printf` above, verify the marker date equals today's UTC date:
226
+ ```bash
227
+ marker=$(grep -oE 'migrated: [0-9]{4}-[0-9]{2}-[0-9]{2}' docs/agents/preferences.md | tail -1 | awk '{print $2}')
228
+ today=$(date -u +%Y-%m-%d)
229
+ [ "$marker" = "$today" ] || { echo "marker $marker != today $today — re-run the printf with date -u, do not hand-type"; exit 1; }
230
+ ```
231
+ Skipping this check because "I already used `$(date -u …)`" is the anti-pattern — the hugn audit on 2026-05-11 found a marker of `2026-05-08` against a real preferences.md creation commit of `2026-05-11` despite the rule above.
224
232
  7. **Print summary**: counts of preserved / translated / archived / written, plus any FYI items needing manual review.
225
233
 
226
234
  ### 6. Idempotency
@@ -55,7 +55,7 @@ For each approved slice, publish a new issue to the issue tracker. Use the issue
55
55
 
56
56
  Publish issues in dependency order (blockers first) so you can reference real issue identifiers in the "Blocked by" field.
57
57
 
58
- **Every issue body MUST start with the AI disclaimer** (same wording as `/triage`). This is a hard requirement — no exceptions, including auto-generated slices. After writing the body, grep your own draft for the disclaimer string; if missing, do not publish.
58
+ **Every issue body MUST start with the AI disclaimer** (same wording as `/triage`). This is a hard requirement — no exceptions, including auto-generated slices. After writing the body, grep your own draft for the disclaimer string; if missing, do not publish. This duplicates `/do` Hard rule #8 — the per-issue self-check below is part of the contract, not an optional verification step.
59
59
 
60
60
  <issue-template>
61
61
  > *This was generated by AI during triage.*
@@ -99,11 +99,15 @@ gh issue create \
99
99
  EOF
100
100
  ```
101
101
 
102
- After creation, immediately verify the disclaimer landed:
102
+ After **each** `gh issue create` returns, the **very next tool call** must verify the disclaimer landed on that specific issue, before creating the next slice:
103
103
 
104
104
  ```bash
105
- gh issue view <n> --json body --jq '.body' | head -1 | grep -q 'generated by AI' \
106
- || echo "FAIL: disclaimer missing on #<n>"
105
+ gh issue view <n> --json body --jq '.body' | awk 'NF{print; exit}' | grep -q 'generated by AI' \
106
+ || { echo "FAIL: disclaimer missing on #<n>"; exit 1; }
107
107
  ```
108
108
 
109
+ Use `awk 'NF{print; exit}'` (first non-blank line), not `head -1` — a leading blank line will pass `head -1` and hide a missing disclaimer.
110
+
111
+ **If the self-check fails**, the next tool call MUST be `gh issue edit <n> --body-file <fixed>` to prepend the disclaimer, followed by a re-run of the self-check. Do not proceed to the next slice until the self-check on the current issue passes. Batching all the creates first and then running checks at the end is a Hard-rule #8 violation — the check gates the *next* create.
112
+
109
113
  Do NOT close or modify any parent issue.