@gotgenes/pi-autoformat 0.1.0 → 4.0.3

Files changed (75)
  1. package/.github/workflows/ci.yml +1 -3
  2. package/.github/workflows/release-please.yml +29 -0
  3. package/.markdownlint-cli2.yaml +14 -2
  4. package/.pi/extensions/pi-autoformat/config.json +3 -6
  5. package/.pi/prompts/README.md +59 -0
  6. package/.pi/prompts/plan-issue.md +64 -0
  7. package/.pi/prompts/retro.md +144 -0
  8. package/.pi/prompts/ship-issue.md +77 -0
  9. package/.pi/prompts/tdd-plan.md +67 -0
  10. package/.pi/skills/pi-extension-lifecycle/SKILL.md +256 -0
  11. package/.release-please-manifest.json +1 -1
  12. package/AGENTS.md +39 -0
  13. package/CHANGELOG.md +365 -0
  14. package/README.md +42 -109
  15. package/biome.json +1 -1
  16. package/docs/assets/logo.png +0 -0
  17. package/docs/assets/logo.svg +533 -0
  18. package/docs/configuration.md +358 -38
  19. package/docs/plans/0001-initial-implementation-plan.md +17 -9
  20. package/docs/plans/0002-richer-tui-formatter-summaries.md +220 -0
  21. package/docs/plans/0003-additional-pi-mutation-tools.md +273 -0
  22. package/docs/plans/0004-shell-driven-mutation-coverage.md +296 -0
  23. package/docs/plans/0010-acceptance-test-coverage.md +240 -0
  24. package/docs/plans/0012-remove-unused-formatter-extensions-field.md +152 -0
  25. package/docs/plans/0013-fallback-chain-step-type.md +280 -0
  26. package/docs/plans/0014-batch-by-default-formatter-dispatch.md +195 -0
  27. package/docs/plans/0015-builtin-treefmt-and-treefmt-nix-support.md +290 -0
  28. package/docs/plans/0016-detailed-formatter-output-on-failure.md +245 -0
  29. package/docs/plans/0022-pi-coding-agent-types.md +201 -0
  30. package/docs/plans/0027-format-before-agent-exit-follow-up-turn.md +355 -0
  31. package/docs/plans/0031-turn-end-flush-with-change-detection.md +365 -0
  32. package/docs/retro/0002-richer-tui-formatter-summaries.md +47 -0
  33. package/docs/retro/0013-fallback-chain-step-type.md +67 -0
  34. package/docs/retro/0015-builtin-treefmt-and-treefmt-nix-support.md +56 -0
  35. package/docs/retro/0016-detailed-formatter-output-on-failure.md +60 -0
  36. package/docs/retro/0022-pi-coding-agent-types.md +62 -0
  37. package/docs/testing.md +95 -0
  38. package/package.json +30 -11
  39. package/prek.toml +2 -2
  40. package/schemas/pi-autoformat.schema.json +145 -21
  41. package/src/builtin-formatters.ts +205 -0
  42. package/src/command-probe.ts +66 -0
  43. package/src/config-loader.ts +829 -90
  44. package/src/custom-mutation-tools.ts +125 -0
  45. package/src/extension.ts +469 -82
  46. package/src/format-scope.ts +118 -0
  47. package/src/formatter-config.ts +73 -36
  48. package/src/formatter-executor.ts +230 -34
  49. package/src/formatter-output-report.ts +149 -0
  50. package/src/formatter-registry.ts +139 -30
  51. package/src/index.ts +26 -5
  52. package/src/prompt-autoformatter.ts +148 -23
  53. package/src/shell-mutation-detector.ts +572 -0
  54. package/src/touched-files-queue.ts +72 -11
  55. package/test/acceptance-event-bus.test.ts +138 -0
  56. package/test/acceptance.test.ts +69 -0
  57. package/test/builtin-formatters.test.ts +382 -0
  58. package/test/command-probe.test.ts +79 -0
  59. package/test/config-loader.test.ts +640 -21
  60. package/test/custom-mutation-tools.test.ts +190 -0
  61. package/test/extension.test.ts +1535 -158
  62. package/test/fallback-acceptance.test.ts +98 -0
  63. package/test/fixtures/event-bus-emitter.ts +26 -0
  64. package/test/fixtures/formatter-recorder.mjs +25 -0
  65. package/test/format-scope.test.ts +139 -0
  66. package/test/formatter-config.test.ts +56 -5
  67. package/test/formatter-executor.test.ts +555 -35
  68. package/test/formatter-output-report.test.ts +178 -0
  69. package/test/formatter-registry.test.ts +330 -37
  70. package/test/helpers/rpc.ts +146 -0
  71. package/test/prompt-autoformatter.test.ts +315 -22
  72. package/test/schema.test.ts +149 -0
  73. package/test/shell-mutation-detector.test.ts +221 -0
  74. package/test/touched-files-queue.test.ts +40 -1
  75. package/test/types/theme-stub.test-d.ts +42 -0
@@ -0,0 +1,296 @@
1
+ ---
2
+ issue: 4
3
+ issue_title: "Post-v1: investigate shell-driven mutation coverage"
4
+ ---
5
+
6
+ # Plan: Shell-Driven Mutation Coverage (Issue #4)
7
+
8
+ ## Problem Statement
9
+
10
+ The v1 extension only tracks files mutated through Pi's built-in `write` and `edit` tools.
11
+ Files modified by `bash` invocations — including codegen, codemods, `sed -i`, `mv`, downloaded scaffolds, and project-specific scripts — are invisible to the touched-files queue and therefore are not formatted.
12
+
13
+ The original Issue #4 describes this as "one of the biggest remaining coverage gaps" for users who rely on shell commands that modify files.
14
+
15
+ ## Goals
16
+
17
+ - Detect a useful subset of file mutations performed by shell commands.
18
+ - Funnel any detected files through the existing prompt-end batching pipeline, reusing the formatter resolution and reporting paths.
19
+ - Keep behavior explicit, opt-in, and predictable — no repository-wide scans.
20
+ - Preserve the v1 safety properties: prompt-end timing remains the default, formatter failures stay non-blocking.
21
+
22
+ ## Non-Goals
23
+
24
+ - Tracking arbitrary side effects of complex shell pipelines.
25
+ - Filesystem watchers or whole-repo rescans after each `bash` call.
26
+ - Running formatters inside the shell tool's lifetime (still prompt-end).
27
+ - Strict-mode failure semantics (covered by Issue #6).
28
+
29
+ ## Background
30
+
31
+ Relevant existing pieces:
32
+
33
+ - `src/touched-files-queue.ts` — set-based dedupe of touched paths, currently hard-coded to the `write`/`edit` mutation tools.
34
+ - `src/extension.ts` — registers the `tool_result` handler that feeds the queue and triggers the prompt-end flush via `agent_end`.
35
+ - `src/prompt-autoformatter.ts` and `src/formatter-executor.ts` — already agnostic to where touched files came from.
36
+
37
+ This means the extension architecture is already largely compatible with new mutation sources; the remaining work is in *detection* and *configuration*.
38
+
39
+ ## Design Overview
40
+
41
+ We will add an opt-in shell mutation detector that participates in the same `tool_result` event flow used today.
42
+
43
+ The detector is structured as a chain of strategies, each emitting candidate paths from a shell tool result.
44
+ Strategies are intentionally narrow and explicit so users can reason about which commands are covered.
45
+
46
+ ### Strategy 1: Argument parsing for known mutating commands (default on)
47
+
48
+ For a small whitelist of commands with well-known mutation flags, parse the shell input string and extract target file arguments.
49
+ Initial whitelist:
50
+
51
+ - `sed -i …`
52
+ - `mv … <dest>` (single-target form only)
53
+ - `cp … <dest>` (single-target form only)
54
+ - `touch <files…>`
55
+ - `> <file>` and `>> <file>` redirections at the top level of a simple command
56
+ - `tee <file>` / `tee -a <file>`
57
+
58
+ Rules:
59
+
60
+ - Only act when the parser recognizes the *whole* command shape; bail on pipelines, command substitutions, or unknown flags.
61
+ - Resolve paths relative to `ctx.cwd`.
62
+ - Drop paths that do not exist post-execution (for example, the source of an `mv`, which no longer exists once the move completes).
63
+ - Out-of-scope filtering is handled centrally by the queue (see *Format Scope* below).
64
+
65
+ This covers the common deterministic cases without a generic shell parser.
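The whole-shape rule above can be sketched roughly as follows. This is a minimal illustration, not the package's actual `parseKnownCommand`: the tokenizer, the set of rejected characters, and the handling of only `sed -i`, `mv`, and `touch` are assumptions made for the example. Redirections, `cp`, and `tee` are left out, and anything the sketch does not fully recognize yields an empty list.

```typescript
// Hypothetical sketch of Strategy 1. Names and rules here are illustrative assumptions.

// Characters that indicate pipelines, substitutions, chaining, redirection, or globbing.
const UNSUPPORTED_CHARS = new Set<string>([
  "|", "&", ";", "<", ">", "`", "$", "(", ")", '"', "\\", "*", "?",
]);

/** Split on whitespace, honoring single quotes; return null for shapes we refuse to model. */
function tokenizeSimpleCommand(input: string): string[] | null {
  const tokens: string[] = [];
  let current = "";
  let inQuote = false;
  let hasToken = false;
  for (const ch of input) {
    if (inQuote) {
      if (ch === "'") inQuote = false;
      else current += ch; // everything inside single quotes is literal
      continue;
    }
    if (ch === "'") {
      inQuote = true;
      hasToken = true;
    } else if (/\s/.test(ch)) {
      if (hasToken) tokens.push(current);
      current = "";
      hasToken = false;
    } else if (UNSUPPORTED_CHARS.has(ch)) {
      return null; // bail on anything beyond a simple command
    } else {
      current += ch;
      hasToken = true;
    }
  }
  if (inQuote) return null; // unterminated quote
  if (hasToken) tokens.push(current);
  return tokens;
}

/** Return the files a recognized command is expected to have written, else []. */
export function parseKnownCommand(input: string): string[] {
  const tokens = tokenizeSimpleCommand(input);
  if (tokens === null || tokens.length === 0) return [];
  const [command, ...args] = tokens;
  const hasFlag = (values: string[]) => values.some((v) => v.startsWith("-"));
  switch (command) {
    case "sed": {
      // GNU shape only: sed -i[SUFFIX] <script> <files...>
      const [flag, script, ...files] = args;
      if (flag === undefined || !flag.startsWith("-i")) return [];
      if (!script || hasFlag(files)) return []; // an empty script covers BSD `-i ''`
      return files;
    }
    case "mv": {
      // Single-target form only: mv <src> <dest>
      const [, dest] = args;
      if (args.length !== 2 || hasFlag(args) || dest === undefined) return [];
      return [dest];
    }
    case "touch":
      return hasFlag(args) ? [] : args;
    default:
      return [];
  }
}
```

Returning relative paths keeps the parser pure; resolving against `ctx.cwd` and the existence check would happen in the caller.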
66
+
67
+ ### Strategy 2: Pre/post directory snapshot for explicit globs (opt-in)
68
+
69
+ Per-project config can declare *snapshot scopes* — globs whose mtimes are sampled before and after each shell tool call.
70
+ Files whose mtime advanced are treated as touched.
71
+
72
+ ```jsonc
73
+ {
74
+ "shellMutationDetection": {
75
+ "enabled": true,
76
+ "snapshotGlobs": ["src/**/*.ts", "docs/**/*.md"]
77
+ }
78
+ }
79
+ ```
80
+
81
+ Rules:
82
+
83
+ - Snapshot only matched paths, not the whole repo.
84
+ - Cap the number of snapshotted entries (e.g., 5,000) and warn on overflow.
85
+ - Skip the strategy entirely when no globs are configured, even if `enabled` is true.
86
+ - Use `node:fs` `stat` only; never read file contents during snapshotting.
87
+ - Skip `node_modules` and directories matched by `.gitignore` by default.
88
+
89
+ This is the explicit, low-noise alternative to whole-repo heuristics called out in the issue's constraints.
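A rough sketch of the before/after mtime comparison follows. The dependency shape (`listFiles`, `mtimeMs`), the `overflowed` flag, and the decision to report files that are new since the baseline are assumptions made for illustration; glob expansion and `.gitignore` handling are deliberately left to the injected `listFiles`.

```typescript
import { statSync } from "node:fs";

// Hypothetical dependency shape so tests can inject fakes instead of touching disk.
type SnapshotDeps = {
  /** Expands the configured snapshot globs to concrete paths. */
  listFiles: () => string[];
  /** Returns the mtime in milliseconds, or undefined when the path is missing. */
  mtimeMs: (file: string) => number | undefined;
};

/** stat-only mtime lookup; never reads file contents. */
export const fsMtimeMs = (file: string): number | undefined =>
  statSync(file, { throwIfNoEntry: false })?.mtimeMs;

export class SnapshotTracker {
  private readonly deps: SnapshotDeps;
  private readonly maxEntries: number;
  private baseline = new Map<string, number>();
  /** Set when the entry cap was hit, so the caller can emit a warning. */
  overflowed = false;

  constructor(deps: SnapshotDeps, maxEntries = 5000) {
    this.deps = deps;
    this.maxEntries = maxEntries;
  }

  private sample(): Map<string, number> {
    const result = new Map<string, number>();
    for (const file of this.deps.listFiles()) {
      if (result.size >= this.maxEntries) {
        this.overflowed = true; // partial snapshot; stop sampling
        break;
      }
      const mtime = this.deps.mtimeMs(file);
      if (mtime !== undefined) result.set(file, mtime);
    }
    return result;
  }

  /** Record mtimes just before the shell tool runs. */
  before(): void {
    this.overflowed = false;
    this.baseline = this.sample();
  }

  /** Return files whose mtime advanced, plus files absent from the baseline (an assumption). */
  after(): string[] {
    const touched: string[] = [];
    for (const [file, mtime] of this.sample()) {
      const previous = this.baseline.get(file);
      if (previous === undefined || mtime > previous) touched.push(file);
    }
    return touched;
  }
}
```

Injecting `listFiles` and `mtimeMs` keeps the class free of I/O at construction time, which matches the plan's goal of testing the tracker with fakes.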
90
+
91
+ ### Strategy 3: User-declared shell wrappers (opt-in)
92
+
93
+ Allow users to configure shell command prefixes that are known to print the files they touched on stdout, one per line:
94
+
95
+ ```jsonc
96
+ {
97
+ "shellMutationDetection": {
98
+ "wrappers": [
99
+ { "prefix": "pnpm codegen", "outputFormat": "lines" }
100
+ ]
101
+ }
102
+ }
103
+ ```
104
+
105
+ When a `bash` tool result matches a configured prefix, parse stdout for paths and enqueue them.
106
+ This gives users a precise escape hatch without us having to model every codegen tool.
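Wrapper matching could look roughly like the sketch below. The word-boundary prefix rule and the trimming of blank lines are assumptions for illustration; only the `prefix` and `outputFormat: "lines"` fields come from the config example above.

```typescript
// Hypothetical sketch of Strategy 3; the matching rule is an assumption.
type ShellWrapper = { prefix: string; outputFormat: "lines" };

/**
 * If the command starts with a configured prefix (on a word boundary),
 * treat each non-empty stdout line as a touched path.
 */
export function matchWrapper(
  input: string,
  output: string,
  wrappers: ShellWrapper[],
): string[] {
  const command = input.trim();
  const matched = wrappers.find(
    (w) => command === w.prefix || command.startsWith(w.prefix + " "),
  );
  if (matched === undefined) return [];
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

Matching on a word boundary avoids treating `pnpm codegen-legacy` as a hit for the `pnpm codegen` prefix.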
107
+
108
+ ## Format Scope (Out-of-CWD Handling)
109
+
110
+ The v1 `TouchedFilesQueue` normalizes paths but does **not** filter out-of-cwd targets.
111
+ Tightening this is necessary for the shell strategies (arg parsing and wrappers can produce arbitrary paths) and is also a latent v1 gap worth closing uniformly.
112
+
113
+ ### Default boundary: repo root, fall back to cwd
114
+
115
+ At session start, resolve the format scope once:
116
+
117
+ 1. Run `git rev-parse --show-toplevel` from `ctx.cwd`.
118
+ 2. If it succeeds, use that path as the scope root.
119
+ 3. If it fails (not a Git repo, Git missing), fall back to `ctx.cwd`.
120
+
121
+ This solves the monorepo case where Pi is launched inside a subpackage but the agent legitimately edits sibling packages, while staying conservative in non-Git contexts.
122
+ Git is used only as a *boundary discovery* mechanism here — a much weaker coupling than the deferred `git status` detection strategy.
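The three resolution steps above can be sketched as a single function. The name `resolveScopeRoot` and the injectable `runGit` parameter are hypothetical, introduced so the fallback can be exercised without a real repository.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical seam for tests: runs git with the given args and returns stdout.
type RunGit = (args: string[], cwd: string) => string;

const defaultRunGit: RunGit = (args, cwd) =>
  execFileSync("git", args, {
    cwd,
    encoding: "utf8",
    stdio: ["ignore", "pipe", "ignore"], // discard git's stderr
  });

/** Resolve the format scope root once: repo top level, else the cwd. */
export function resolveScopeRoot(cwd: string, runGit: RunGit = defaultRunGit): string {
  try {
    const topLevel = runGit(["rev-parse", "--show-toplevel"], cwd).trim();
    return topLevel.length > 0 ? topLevel : cwd;
  } catch {
    // Not a Git repo, or git is not installed: stay conservative.
    return cwd;
  }
}
```

`execFileSync` throws both on a non-zero exit and when the binary is missing, so a single `catch` covers both fallback cases.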
123
+
124
+ ### Configuration
125
+
126
+ ```jsonc
127
+ {
128
+ "formatScope": "repoRoot" // "repoRoot" | "cwd" | string[]
129
+ }
130
+ ```
131
+
132
+ - `"repoRoot"` (default): repo-root with cwd fallback, as above.
133
+ - `"cwd"`: strict cwd subtree.
134
+ - `string[]`: explicit allowlist of roots, each resolved relative to `ctx.cwd` at load time.
135
+
136
+ ### Identification rules
137
+
138
+ For every candidate path, regardless of mutation source:
139
+
140
+ 1. Resolve to absolute via `path.resolve(cwd, candidate)`.
141
+ 2. `fs.realpath` both the candidate and each scope root, when the candidate exists.
142
+ Skip realpath if the candidate does not exist (e.g., a deleted target after `mv`); fall back to the normalized absolute form.
143
+ 3. Compute `path.relative(scopeRoot, resolvedCandidate)`.
144
+ In-scope iff the result is non-empty, is not `..` and does not start with `..` followed by a path separator, and is not absolute.
145
+ 4. Use case-insensitive comparison on `darwin` and `win32`; case-sensitive elsewhere.
146
+ 5. If multiple scope roots are configured (the `string[]` form), the candidate is in-scope if it falls under any of them.
147
+
148
+ Realpath on both sides is what makes this correct in the presence of symlinks:
149
+
150
+ - A `pnpm` workspace dep symlinked into `node_modules` resolves *out* of the scope root and is correctly filtered.
151
+ - A `vendor/lib` symlink pointing to an absolute path that realpaths *into* a configured scope root is correctly included.
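The identification rules can be sketched as one predicate. This is an illustration under assumptions: the function name `isInScope`, the `platform` parameter (added so the case rule can be tested on any host), and lower-casing as the case-folding method are not taken from the package.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** realpath when the path exists; otherwise the normalized absolute form. */
function realpathIfExists(target: string): string {
  try {
    return fs.realpathSync(target);
  } catch {
    return path.normalize(target);
  }
}

/** True when the candidate falls strictly under at least one scope root. */
export function isInScope(
  candidate: string,
  cwd: string,
  scopeRoots: string[],
  platform: string = process.platform,
): boolean {
  const caseInsensitive = platform === "darwin" || platform === "win32";
  const fold = (p: string) => (caseInsensitive ? p.toLowerCase() : p);

  // Steps 1 and 2: absolute path, then realpath when possible.
  const resolved = fold(realpathIfExists(path.resolve(cwd, candidate)));

  // Steps 3 to 5: relative to each root; any match wins.
  return scopeRoots.some((root) => {
    const rel = path.relative(fold(realpathIfExists(root)), resolved);
    return (
      rel !== "" &&
      rel !== ".." &&
      !rel.startsWith(".." + path.sep) &&
      !path.isAbsolute(rel)
    );
  });
}
```

Checking for `..` followed by a separator, rather than a bare `..` prefix, keeps an in-scope file named something like `..notes.md` from being rejected.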
152
+
153
+ ### Out-of-scope handling
154
+
155
+ Drop silently from the queue.
156
+ Out-of-scope paths are common and benign (`mv` to `/tmp/`, scratch edits in `~/`), so user-visible warnings would be noise.
157
+ Optionally emit a debug-level log entry; do not surface in the prompt-end summary.
158
+
159
+ ### Applied uniformly
160
+
161
+ The scope check runs in `TouchedFilesQueue` itself, after path normalization.
162
+ All mutation sources — `write`, `edit`, shell argument parsing, wrappers, snapshot tracker — funnel through the same filter.
163
+ Each strategy's rules can stop restating "drop paths outside cwd" since the queue enforces it centrally.
164
+
165
+ ### Migration note
166
+
167
+ This tightens behavior for the existing `write`/`edit` paths: previously they would format any path the agent supplied.
168
+ The new default of `repoRoot` (with cwd fallback) is almost certainly the behavior users already expect, but it is technically a change.
169
+ Call it out in the changelog.
170
+ Users who relied on the old behavior can configure `formatScope` to a broader allowlist; we deliberately do not provide a "no scope check" escape hatch.
171
+
172
+ ## Shell Detection Configuration
173
+
174
+ New top-level config block under the existing extension-owned config files:
175
+
176
+ ```jsonc
177
+ {
178
+ "shellMutationDetection": {
179
+ "enabled": false,
180
+ "argumentParsing": true,
181
+ "snapshotGlobs": [],
182
+ "wrappers": []
183
+ }
184
+ }
185
+ ```
186
+
187
+ Precedence: project overrides global (existing behavior).
188
+
189
+ Defaults are intentionally conservative: the feature is fully opt-in.
190
+ Once a user enables it, `argumentParsing` defaults to true because it has a tight, auditable surface.
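A minimal sketch of how the defaults and the project-over-global precedence might combine is shown below. The type and function names are hypothetical, and the replace-not-merge treatment of arrays is the behavior this plan still marks as "confirm and document", so treat it as an assumption.

```typescript
// Hypothetical shape mirroring the config block above.
type ShellMutationDetection = {
  enabled: boolean;
  argumentParsing: boolean;
  snapshotGlobs: string[];
  wrappers: { prefix: string; outputFormat: "lines" }[];
};

const SHELL_DETECTION_DEFAULTS: ShellMutationDetection = {
  enabled: false, // fully opt-in
  argumentParsing: true, // on by default once the feature is enabled
  snapshotGlobs: [],
  wrappers: [],
};

/** Later layers win key by key; arrays are replaced wholesale, never concatenated. */
export function resolveShellDetection(
  globalConfig: Partial<ShellMutationDetection> = {},
  projectConfig: Partial<ShellMutationDetection> = {},
): ShellMutationDetection {
  return { ...SHELL_DETECTION_DEFAULTS, ...globalConfig, ...projectConfig };
}
```

With this shape, a project that sets `snapshotGlobs` to `["docs/**/*.md"]` fully replaces a global `["src/**/*.ts"]` rather than extending it.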
191
+
192
+ Aligned updates required (per AGENTS.md):
193
+
194
+ - `schemas/pi-autoformat.schema.json`
195
+ - `docs/configuration.md`
196
+ - `README.md`
197
+ - TypeScript config loader (`src/config-loader.ts`, `src/formatter-config.ts`)
198
+
199
+ ## Code Changes
200
+
201
+ 1. **Config**
202
+ - Extend `AutoformatConfig` with `shellMutationDetection`.
203
+ - Validate types and unknown keys in `config-loader.ts`.
204
+
205
+ 2. **New module: `src/shell-mutation-detector.ts`**
206
+ - Pure functions: `parseKnownCommand(input)`, `matchWrapper(input, output, wrappers)`.
207
+ - Class `SnapshotTracker` for strategy 2, with `before()` / `after()`.
208
+ - No I/O at module load; injectable `fs`/`glob` for tests.
209
+
210
+ 3. **Touched-files queue**
211
+ - Generalize `MUTATION_TOOLS` into a registry of mutation-source handlers.
212
+ - Add a `bash` handler that delegates to the detector.
213
+ - Keep dedupe and `cwd`-relative normalization centralized.
214
+
215
+ 4. **Extension wiring**
216
+ - `tool_result` handler: if the tool is `bash` and detection is enabled, run argument parsing and wrapper matching against the result payload.
217
+ - Wrap the shell tool with `before/after` snapshot calls when `snapshotGlobs` is non-empty.
218
+ This requires a `tool_start` hook (or equivalent) — confirm the Pi extension API exposes one; if not, defer strategy 2 to a follow-up.
219
+
220
+ 5. **Reporting**
221
+ - No new reporting surface.
222
+ Files surface through the existing prompt-end summary path.
223
+
224
+ ## Testing
225
+
226
+ Per AGENTS.md, add focused tests:
227
+
228
+ - `shell-mutation-detector` argument parsing
229
+ - `sed -i 's/a/b/' foo.txt` → `["foo.txt"]`
230
+ - `sed -i.bak …` → `["foo.txt"]` (and ignores the `.bak`)
231
+ - `mv a.txt b.txt` → `["b.txt"]`
232
+ - pipelines / command substitutions → `[]`
233
+ - paths outside the format scope → `[]` (covered by the queue's scope check)
234
+ - snapshot tracker
235
+ - mtime advances → reported
236
+ - mtime unchanged → ignored
237
+ - cap exceeded → warning emitted, partial result returned
238
+ - wrapper matching
239
+ - prefix match with line output → list of files
240
+ - prefix mismatch → empty
241
+ - queue integration
242
+ - `bash` tool result feeds into prompt-end flush alongside `write`/`edit`
243
+ - dedupe across sources works
244
+ - format scope
245
+ - `repoRoot`: in-repo paths kept, out-of-repo paths dropped
246
+ - `repoRoot` with no Git: falls back to cwd subtree
247
+ - `cwd`: sibling-package paths dropped
248
+ - `string[]`: candidate matches any configured root
249
+ - symlinked workspace dep (realpaths outside scope) is dropped
250
+ - symlink whose realpath lands inside scope is kept
251
+ - case-insensitive match honored on darwin/win32
252
+ - config loader
253
+ - defaults are off
254
+ - project override of `snapshotGlobs` replaces, does not merge with global (consistent with current array-override behavior — confirm and document)
255
+
256
+ ## Rollout
257
+
258
+ 1. Land `formatScope` config + uniform scope filtering in the queue (independent of detection — closes the v1 gap on its own).
259
+ 2. Land config plumbing + schema/docs updates for shell detection with detection disabled.
260
+ 3. Land argument-parsing strategy behind the new config flag.
261
+ 4. Land wrapper strategy.
262
+ 5. Land snapshot strategy if a `tool_start` (or pre-tool) hook exists; else open a follow-up issue.
263
+ 6. Update `docs/plans/0001-initial-implementation-plan.md` and the README to describe the new opt-in coverage, the format scope behavior, and the explicit constraints.
264
+
265
+ ## Open Questions
266
+
267
+ - Does the Pi extension API expose a pre-tool hook usable for snapshotting? If not, strategy 2 is deferred.
268
+ - Should wrappers support a JSON output format in addition to line format? Likely yes, but not in the first cut.
269
+
270
+ ## Explicitly Deferred: `git status --porcelain` Detection
271
+
272
+ Using `git status --porcelain` (with a pre-call snapshot diff) was considered as a fourth strategy.
273
+ It offers near-complete coverage with no per-command modeling, honors `.gitignore` for free, and is fast.
274
+
275
+ It is **not** included at this stage because:
276
+
277
+ - It is implicit repo-wide behavior, which Issue #4 explicitly warns against.
278
+ - It produces false positives from concurrent activity (IDE saves, watchers, dev servers writing into tracked paths).
279
+ - It interacts awkwardly with pre-existing dirty working trees and requires careful snapshot/diff logic to subtract unrelated in-progress edits.
280
+ - Untracked files are ambiguous — sweeping them in catches scratch files, logs, and downloaded artifacts.
281
+ - It silently does nothing in non-Git directories, which Pi runs in more often than expected.
282
+ - It loses per-command attribution, hurting debuggability and foreclosing future per-command policy.
283
+
284
+ The explicit strategies above match the issue's stated philosophy that "explicit, low-noise designs are preferable to implicit heuristics." We can revisit `git status` later if real-world usage shows meaningful gaps the three explicit strategies cannot close, ideally as a scoped, opt-in fourth strategy rather than a default.
285
+
286
+ ## Checkpoints / Commits
287
+
288
+ Following Conventional Commits:
289
+
290
+ - `feat: add formatScope config with repo-root default and cwd fallback`
291
+ - `feat(config): add shellMutationDetection schema and loader support`
292
+ - `feat: parse known mutating shell commands into touched files`
293
+ - `feat: support user-declared shell wrappers for touched-file output`
294
+ - `feat: snapshot configured globs around bash tool calls` *(conditional)*
295
+ - `docs: document shell mutation coverage and its constraints`
296
+ - `test: cover shell mutation detector strategies and queue integration`
@@ -0,0 +1,240 @@
1
+ ---
2
+ issue: 10
3
+ issue_title: "Expand acceptance test coverage with end-to-end pi CLI scenarios"
4
+ ---
5
+
6
+ # Plan: Expand acceptance test coverage (Issue #10)
7
+
8
+ ## Problem Statement
9
+
10
+ `test/acceptance.test.ts` spawns the real `pi` CLI in `--mode rpc`, loads the extension, and verifies a single `get_state` round-trip.
11
+ That smoke test catches load-time regressions (entrypoint shape, module resolution, `session_start` failures) without burning LLM credits, but it never drives a `tool_result` event end to end.
12
+ Issue #10 asks us to expand acceptance coverage so we also catch payload-shape drift, custom-tool dispatch regressions, EventBus channel regressions, and prompt-end batching regressions against the real Pi runtime — while keeping `pnpm test` deterministic, offline, and skippable for contributors without `pi` installed.
13
+
14
+ ## Goals
15
+
16
+ - Add deterministic, non-LLM acceptance tests that exercise the formatter pipeline through the real `pi` CLI (RPC mode), covering at minimum:
17
+ - the `bash` mutation path (snapshot tracker + shell mutation detector) driven by `{"type": "bash", ...}` RPC commands;
18
+ - the `pi.events`-based `autoformat:touched` channel via a small companion extension loaded alongside ours;
19
+ - the `customMutationTools` declarative path via a companion extension that registers a synthetic mutation tool and triggers it through an extension command (`/cmd`).
20
+ - Reuse the existing skip-when-`pi`-is-absent pattern so `pnpm test` stays green for contributors without Pi installed.
21
+ - Keep each acceptance test isolated to its own `cwd` and config, with no shared state between tests.
22
+ - Document an opt-in, env-gated path (`PI_AUTOFORMAT_LLM_TESTS=1`) for occasional LLM-backed scenarios — design only, no LLM calls in default CI.
23
+ - Resolve the `pi` binary from the locally-installed `@mariozechner/pi-coding-agent` devDependency rather than the global `PATH`, so the acceptance suite runs in CI under the existing `pnpm install --frozen-lockfile` step with no workflow changes.
24
+
25
+ ## Non-Goals
26
+
27
+ - Running LLM-backed scenarios on every `pnpm test` invocation.
28
+ These remain opt-in behind `PI_AUTOFORMAT_LLM_TESTS=1`; default CI must not require API keys.
29
+ - Adding new product features.
30
+ This is a test-coverage change; the only production code touched would be small fixes to anything the new tests expose.
31
+ - Replacing the existing unit / integration tests.
32
+ Acceptance tests are additive; they pin behavior at the Pi-runtime boundary, not inside our modules.
33
+ - Building a generic Pi-extension test harness.
34
+ The fixtures live under `test/fixtures/` and exist solely to drive the autoformatter pipeline.
35
+ - Hardening Pi's RPC protocol or filing upstream requests beyond what this plan needs.
36
+
37
+ ## Background
38
+
39
+ Relevant existing surface:
40
+
41
+ - `test/acceptance.test.ts` — current smoke test.
42
+ Spawns `pi --mode rpc --no-tools --no-extensions --no-session -e <EXTENSION_PATH>`, writes JSON commands to stdin, parses JSON-per-line responses from stdout.
43
+ Today it relies on `pi` being on `PATH` and skips when `spawnSync("pi", ["--help"]).status !== 0`; the new harness will resolve `pi` from `node_modules/.bin/pi` so the test runs whenever `pnpm install` has been done.
44
+ - `src/extension.ts` — `createAutoformatExtension` wires three real touched-file sources: built-in `write`/`edit` `tool_result` events, `customMutationTools` declared in config, and `pi.events.on(channel, …)` for the `autoformat:touched` (configurable) channel.
45
+ - `src/shell-mutation-detector.ts` — drives `bash` snapshot tracking (`SnapshotTracker`) plus argument parsing and wrapper matching for known mutation commands. The `bash` RPC command sends a real `tool_call` + `tool_result` pair through Pi.
46
+ - `src/custom-mutation-tools.ts` — `parseTouchedPayload` and `createCustomToolHandlers` accept either a `{ touched: string[] }` payload (event-bus path) or extract paths from configured custom tools.
47
+ - `node_modules/@mariozechner/pi-coding-agent/docs/rpc.md` — confirms RPC supports `prompt`, `bash`, `get_state`, and that `prompt` accepts extension commands (`/mycommand`) which execute immediately even without an LLM provider configured.
48
+ - `node_modules/@mariozechner/pi-coding-agent/docs/extensions.md` — confirms `pi.registerCommand(name, { handler })` lets a companion extension expose a slash command we can trigger over RPC.
49
+
50
+ Implications:
51
+
52
+ - Pi's RPC mode does not document a way to inject synthetic `tool_result` events directly.
53
+ We therefore drive real events via two channels Pi already provides: the `bash` RPC command (real `bash` `tool_result`) and `prompt` with a slash command (lets a companion extension synthesize side effects).
54
+ - A companion extension is the cleanest way to exercise both `customMutationTools` and the `autoformat:touched` event-bus channel without an LLM.
55
+ - Multiple `-e` flags are supported (per `extensions.md`), so we can load `src/extension.ts` and a fixture extension in the same RPC session.
56
+
57
+ ## Design Overview
58
+
59
+ ### Test fixtures
60
+
61
+ Place small companion extensions under `test/fixtures/` so the production package never imports them:
62
+
63
+ ```text
64
+ test/fixtures/
65
+ event-bus-emitter.ts # registers /emit-touched, calls pi.events.emit("autoformat:touched", { touched })
66
+ custom-tool-emitter.ts # registers a synthetic mutation tool + /trigger-custom-tool slash command
67
+ formatter-recorder.sh # POSIX shell script used as the formatter "command" so we can assert invocation
68
+ ```
69
+
70
+ `formatter-recorder.sh` writes its argv plus `cwd` to a file the test can read after the flush.
71
+ Using a real on-disk recorder keeps the test independent of mocking — it asserts what Pi actually invoked.
72
+
73
+ ### Shared RPC harness
74
+
75
+ Extract the `runRpcSession(...)` helper currently inline in `test/acceptance.test.ts` into `test/helpers/rpc.ts` so all acceptance tests share one implementation.
76
+ The helper accepts an optional `extraExtensions: string[]` so each test can load its companion fixture alongside `src/extension.ts`.
77
+
78
+ ```typescript
79
+ // test/helpers/rpc.ts
80
+ export type RpcResponse = {
81
+ id?: string;
82
+ type: string;
83
+ command?: string;
84
+ success?: boolean;
85
+ data?: unknown;
86
+ };
87
+
88
+ export type RpcEvent = { type: string; [key: string]: unknown };
89
+
90
+ export async function runRpcSession(options: {
91
+ cwd: string;
92
+ commands: object[];
93
+ extraExtensions?: string[];
94
+ timeoutMs?: number;
95
+ env?: NodeJS.ProcessEnv;
96
+ }): Promise<{
97
+ responses: RpcResponse[];
98
+ events: RpcEvent[];
99
+ stderr: string;
100
+ exitCode: number | null;
101
+ }>;
102
+ ```
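The event/response separation that the harness result implies could be sketched as a pure function over captured stdout. The helper name `splitRpcOutput` is hypothetical, and the rule that command replies carry `type: "response"` is an assumption about Pi's RPC output that should be checked against `rpc.md`; every other JSON line is treated as an event here.

```typescript
// Types repeated from the harness declaration so the sketch is self-contained.
type RpcResponse = {
  id?: string;
  type: string;
  command?: string;
  success?: boolean;
  data?: unknown;
};
type RpcEvent = { type: string; [key: string]: unknown };

/** Split JSON-per-line stdout into command responses and everything else. */
export function splitRpcOutput(stdout: string): {
  responses: RpcResponse[];
  events: RpcEvent[];
} {
  const responses: RpcResponse[] = [];
  const events: RpcEvent[] = [];
  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // ignore non-JSON noise such as stray log lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as { type?: unknown };
    if (typeof record.type !== "string") continue;
    // Assumption: replies to commands are tagged "response".
    if (record.type === "response") responses.push(parsed as RpcResponse);
    else events.push(parsed as RpcEvent);
  }
  return { responses, events };
}
```

Keeping the parsing separate from process spawning lets the classification be unit-tested without a `pi` binary.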
103
+
104
+ ### Per-test scaffolding
105
+
106
+ Each acceptance test creates its own temp `cwd`, writes:
107
+
108
+ - `.pi/extensions/pi-autoformat/config.json` configuring formatters whose `command` points at `formatter-recorder.sh` (or a small Node runner script copied next to the temp dir);
109
+ - one or more files matching the chain extensions, so the formatter actually has something to run on.
110
+
111
+ After the RPC session closes, the test reads the recorder log and asserts:
112
+
113
+ - which formatter command ran;
114
+ - which absolute paths were passed in argv;
115
+ - per-test invariants (e.g. the snapshot tracker only emitted files that actually changed during the bash command).
116
+
117
+ ### Scenarios
118
+
119
+ 1. `bash` shell-mutation acceptance.
120
+ - `commands: [{type: "bash", command: "node -e 'fs.writeFileSync(\"out.ts\", ...)'"}]` followed by an explicit prompt-end flush trigger. We already flush on `agent_end`; for the bash-only scenario, trigger the flush either by sending a `prompt` with a no-op slash command provided by a tiny "flush trigger" fixture, or by closing stdin and asserting the `session_shutdown` flush. Pick whichever is more deterministic during TDD.
121
+ - Asserts the recorder ran on `out.ts` exactly once.
122
+
123
+ 2. `customMutationTools` acceptance.
124
+ - Loads `custom-tool-emitter.ts` via `-e`. The companion registers a tool whose name is also listed in `customMutationTools` config; the slash command `/trigger-custom-tool` calls `pi.sendMessage()`-style hooks that produce a real `tool_result` for that tool with a `{touched: [...]}` payload.
125
+ - If we cannot trigger a real registered tool without an LLM, fall back to a slash command that emits the synthetic `tool_result` via a documented hook; if no such hook exists, drop this scenario into Open Questions and rely on the EventBus path for v1.
126
+ - Asserts the recorder ran on the declared file.
127
+
128
+ 3. `autoformat:touched` EventBus acceptance.
129
+ - Loads `event-bus-emitter.ts` via `-e`. Slash command `/emit-touched <path>` calls `pi.events.emit("autoformat:touched", { touched: [absolutePath] })`.
130
+ - Test sends `{type: "prompt", message: "/emit-touched out.ts"}`, then triggers a flush as in (1).
131
+ - Asserts the recorder ran on `out.ts`.
132
+
133
+ 4. LLM-gated scenario (design only; not implemented in this plan unless trivial).
134
+ - Skipped unless `process.env.PI_AUTOFORMAT_LLM_TESTS === "1"` and a provider env var is present.
135
+ - Sends a real `prompt` and asserts the recorder ran on the file the agent edited.
136
+ - Documented in `docs/testing.md` (new), not enabled in CI.
137
+
138
+ ### Resolving the `pi` binary
139
+
140
+ `@mariozechner/pi-coding-agent` is already a devDependency and ships a `pi` bin, so `pnpm install` produces a working `node_modules/.bin/pi`.
141
+ The new harness resolves that path explicitly instead of relying on the global `PATH`:
142
+
143
+ ```typescript
144
+ // test/helpers/rpc.ts
145
+ import { existsSync } from "node:fs";
146
+ import { resolve } from "node:path";
147
+
148
+ export const PI_BIN = resolve("node_modules/.bin/pi");
149
+ export const piAvailable = existsSync(PI_BIN);
150
+ ```
151
+
152
+ Benefits:
153
+
154
+ - CI runs the acceptance suite with no workflow changes — `pnpm install --frozen-lockfile` already provides `pi`.
155
+ - The Pi version is pinned by `pnpm-lock.yaml`, so CI and local runs use the same binary.
156
+ - Contributors who ran `pnpm install` immediately get the acceptance suite; no separate global install.
157
+
158
+ ### Skip semantics
159
+
160
+ `describeIfPi` becomes a safety net rather than the default contributor experience: it skips only when `node_modules/.bin/pi` is missing (e.g. someone forgot `pnpm install`).
161
+ LLM-gated scenarios add a second guard on `PI_AUTOFORMAT_LLM_TESTS` and the relevant API key.
162
+
163
+ ## Module-Level Changes
164
+
165
+ - `test/helpers/rpc.ts` — new. Extracted RPC harness. Adds `extraExtensions`, `env`, and event/response separation.
166
+ - `test/acceptance.test.ts` — updated to import from `test/helpers/rpc.ts`. Behavior unchanged.
167
+ - `test/acceptance-bash-mutation.test.ts` — new. Scenario (1).
168
+ - `test/acceptance-event-bus.test.ts` — new. Scenario (3).
169
+ - `test/acceptance-custom-tool.test.ts` — new. Scenario (2). May be deferred if the no-LLM trigger turns out infeasible (see Open Questions).
170
+ - `test/fixtures/event-bus-emitter.ts` — new companion extension.
171
+ - `test/fixtures/custom-tool-emitter.ts` — new companion extension.
172
+ - `test/fixtures/formatter-recorder.sh` — new helper script (executable, POSIX `sh`).
173
+ - `docs/testing.md` — new. Documents acceptance-test layout, the `node_modules/.bin/pi` resolution behavior, skip semantics, and the env-gated `PI_AUTOFORMAT_LLM_TESTS` design.
174
+ - `README.md` — small "Testing" pointer to `docs/testing.md`.
175
+
176
+ No `.github/workflows/*.yml` change is needed: the existing `pnpm install --frozen-lockfile` step already provides `pi` via the devDependency.
177
+
178
+ No production source under `src/` is expected to change.
179
+ If the new tests expose a real bug, that bug is fixed in its own commit on the same branch.
180
+
181
+ ## TDD Order
182
+
183
+ 1. **Refactor the RPC harness (red → green).**
184
+ Move `runRpcSession` into `test/helpers/rpc.ts`; add `extraExtensions` and an `events` array in the result.
185
+ Resolve `PI_BIN` from `node_modules/.bin/pi` and key `piAvailable` off `existsSync(PI_BIN)` instead of `spawnSync("pi", ["--help"])`.
186
+ Update `test/acceptance.test.ts` to use the new harness; the existing test must still pass.
187
+ Commit: `test: extract shared rpc harness and resolve pi from node_modules`.
188
+
189
+ 2. **Bash mutation acceptance (red → green).**
190
+ Add `test/fixtures/formatter-recorder.sh`. Add `test/acceptance-bash-mutation.test.ts` that writes a project config pointing at the recorder, sends a `bash` RPC command that creates `out.ts`, triggers a flush, and asserts the recorder log.
191
+ Commit: `test: add acceptance coverage for bash-driven mutation flush`.
192
+
193
+ 3. **EventBus channel acceptance (red → green).**
194
+ Add `test/fixtures/event-bus-emitter.ts` and `test/acceptance-event-bus.test.ts`. Drive `/emit-touched` via `prompt`, trigger a flush, assert.
195
+ Commit: `test: add acceptance coverage for autoformat:touched event bus`.
196
+
197
+ 4. **Custom-tool acceptance (red → green) — conditional.**
198
+ Add `test/fixtures/custom-tool-emitter.ts` and `test/acceptance-custom-tool.test.ts`. If the no-LLM trigger path proves infeasible during TDD, capture the finding in Open Questions and skip this cycle.
199
+ Commit: `test: add acceptance coverage for customMutationTools dispatch`.
200
+
201
+ 5. **Documentation (green → docs).**
202
+ Add `docs/testing.md` describing the `node_modules/.bin/pi` resolution, skip semantics, and the env-gated LLM scenario design.
203
+ Update `README.md` to point at it.
204
+ Commit: `docs: document acceptance-test layout and pi binary resolution`.
205
+
206
+ ## Risks and Mitigations
207
+
208
+ - **RPC has no synthetic-tool-result injection.**
209
+ Mitigation: drive real events via `bash` RPC and slash-command-emitted EventBus events.
210
+ The custom-tool scenario degrades gracefully: if no non-LLM trigger exists, mark it deferred and rely on the EventBus path until an LLM-gated test fills the gap.
211
+
212
+ - **Spawning `pi` is slow and platform-sensitive.**
213
+ Mitigation: keep the per-test timeout generous (10 s default, configurable per test), parallelize sparingly (Vitest runs files in parallel, so these tests get their own files for `cwd` isolation), and continue skipping when `pi` is absent.
214
+
215
+ - **Formatter recorder script is not portable to Windows.**
216
+ Mitigation: use a tiny Node script (`formatter-recorder.mjs`) instead of `.sh` if Windows support matters.
217
+ The default plan picks the script flavor that matches the maintainer's CI; the fallback is documented in `docs/testing.md`.
218
+
219
+ - **Companion extensions drift from Pi's API.**
220
+ Mitigation: keep them tiny (≤ 30 lines each), import from the same `@mariozechner/pi-coding-agent` types the production extension uses, and exercise them in CI so any drift fails fast.
221
+
222
+ - **Snapshot tracker false negatives.**
223
+ The `bash` scenario depends on the snapshot tracker's globs matching the test file.
224
+ Mitigation: explicitly configure `shellMutationDetection.snapshotGlobs` in the project config the test writes, so the test pins the configured contract rather than the default.
225
+
226
+ ## Open Questions
227
+
228
+ - Should `formatter-recorder` be a POSIX shell script or a Node script?
229
+ Resolved during execution: chose `formatter-recorder.mjs` (Node) for portability.
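
For orientation, a Node-based recorder in the spirit of `formatter-recorder.mjs` might be sketched as follows, written in TypeScript here. The log format (one JSON object per line holding `cwd` and `args`) and every function name are assumptions for this example; the actual fixture may record different fields.

```typescript
import { appendFileSync } from "node:fs";

export interface RecordedInvocation {
  cwd: string;
  args: string[];
}

/** Serializes one invocation as a single JSON line (hypothetical log format). */
export function formatEntry(args: string[], cwd: string): string {
  const entry: RecordedInvocation = { cwd, args };
  return `${JSON.stringify(entry)}\n`;
}

/** Appends one invocation to the log so a test can inspect it afterwards. */
export function recordInvocation(
  logPath: string,
  args: string[],
  cwd: string = process.cwd(),
): void {
  appendFileSync(logPath, formatEntry(args, cwd));
}

/** Parses the log back into entries, skipping empty lines. */
export function parseRecorderLog(contents: string): RecordedInvocation[] {
  return contents
    .split("\n")
    .filter((line) => line !== "")
    .map((line) => JSON.parse(line) as RecordedInvocation);
}
```

A script entry point would call `recordInvocation(logPath, process.argv.slice(2))`, with `logPath` supplied by whatever mechanism the test chooses (for example an environment variable, which is likewise an assumption here). Using Node rather than `sh` avoids depending on a POSIX shell on Windows.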
230
+
231
+ ## Execution Notes
232
+
233
+ - Step 2 (bash mutation acceptance) and step 4 (`customMutationTools` acceptance) were **deferred** during execution.
234
+ Empirical probing (and Pi's `docs/rpc.md`) confirmed that the RPC `bash` command does not emit `tool_call` / `tool_result` events; it only stores a `BashExecutionMessage` for the next prompt's LLM context.
235
+ Likewise, slash commands run extension code directly without going through tool dispatch, so a fixture extension cannot synthesize a `tool_result` event for a registered custom tool.
236
+ Both scenarios therefore require a real LLM-driven tool invocation and have been moved into the future LLM-gated suite documented in `docs/testing.md`.
237
+ - The plan's example payload `{ touched: string[] }` for the EventBus channel was incorrect.
238
+ The real contract handled by `parseTouchedPayload` is `{ path: string }` or `{ paths: string[] }`; the fixture and test now use `{ paths }`.
239
+ - macOS resolves `/var` to `/private/var` via realpath.
240
+ The acceptance test calls `realpathSync` on its temp `cwd` so assertions on the recorder's `process.cwd()` match what Pi spawns the formatter with.
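
The last two notes can be illustrated with a short TypeScript sketch. `normalizeTouchedPayload` is a hypothetical stand-in that accepts the two payload shapes named above; it is not the package's `parseTouchedPayload`, and returning an empty array for unrecognized input is an assumption. `makeCanonicalTempDir` shows the `realpathSync` step applied to a fresh temp directory.

```typescript
import { mkdtempSync, realpathSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

/**
 * Accepts `{ path: string }` or `{ paths: string[] }` and returns a list
 * of paths. Anything else (including the plan's original
 * `{ touched: string[] }` shape) yields an empty list in this sketch.
 */
export function normalizeTouchedPayload(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) return [];
  const { path, paths } = payload as { path?: unknown; paths?: unknown };
  if (typeof path === "string") return [path];
  if (Array.isArray(paths)) {
    return paths.filter((p): p is string => typeof p === "string");
  }
  return [];
}

/**
 * Creates a temp directory and returns its canonical path, so that on
 * macOS a `/var/...` path is reported as `/private/var/...`, matching
 * what a child process sees from `process.cwd()`.
 */
export function makeCanonicalTempDir(prefix: string): string {
  return realpathSync(mkdtempSync(join(tmpdir(), prefix)));
}
```

Comparing the recorder's reported working directory against the canonical path, rather than the raw `mkdtempSync` result, keeps the assertion stable across platforms.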