npm - pan-wizard - Versions diffs - 3.5.2 → 3.7.10 - Mend

pan-wizard 3.5.2 → 3.7.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (93)

package/README.md +8 -8
package/agents/pan-executor.md +18 -0
package/agents/pan-experiment-runner.md +126 -0
package/agents/pan-phase-researcher.md +16 -0
package/agents/pan-plan-checker.md +80 -0
package/agents/pan-planner.md +19 -0
package/agents/pan-reviewer.md +2 -0
package/agents/pan-verifier.md +41 -0
package/bin/install-lib.cjs +55 -0
package/bin/install.js +71 -22
package/commands/pan/debug.md +1 -1
package/commands/pan/experiment.md +219 -0
package/commands/pan/health.md +1 -1
package/commands/pan/learn.md +15 -1
package/commands/pan/optimize.md +13 -0
package/commands/pan/patches.md +10 -1
package/commands/pan/phase-tests.md +1 -4
package/commands/pan/todo-add.md +1 -1
package/commands/pan/todo-check.md +1 -1
package/hooks/dist/pan-cost-logger.js +54 -4
package/hooks/dist/pan-trace-logger.js +72 -3
package/package.json +67 -66
package/pan-wizard-core/bin/lib/commands.cjs +8 -0
package/pan-wizard-core/bin/lib/config.cjs +13 -2
package/pan-wizard-core/bin/lib/context-budget.cjs +73 -0
package/pan-wizard-core/bin/lib/core.cjs +13 -0
package/pan-wizard-core/bin/lib/doc-lint/frontmatter.js +270 -0
package/pan-wizard-core/bin/lib/doc-lint/reporter.js +45 -0
package/pan-wizard-core/bin/lib/doc-lint/schema.js +202 -0
package/pan-wizard-core/bin/lib/doc-lint/validate.js +190 -0
package/pan-wizard-core/bin/lib/doc-lint/walk.js +135 -0
package/pan-wizard-core/bin/lib/doc-lint.cjs +287 -0
package/pan-wizard-core/bin/lib/experiment.cjs +501 -0
package/pan-wizard-core/bin/lib/learn-index.cjs +235 -0
package/pan-wizard-core/bin/lib/learn-lint.cjs +292 -0
package/pan-wizard-core/bin/lib/optimize.cjs +474 -1
package/pan-wizard-core/bin/lib/runner.cjs +472 -0
package/pan-wizard-core/bin/pan-tools.cjs +222 -2
package/pan-wizard-core/learnings/README.md +70 -0
package/pan-wizard-core/learnings/index.json +540 -0
package/pan-wizard-core/learnings/internal/.gitkeep +2 -0
package/pan-wizard-core/learnings/internal/experiment-runner.md +81 -0
package/pan-wizard-core/learnings/internal/external-research.md +93 -0
package/pan-wizard-core/learnings/internal/loop-design.md +33 -0
package/pan-wizard-core/learnings/internal/pan-dev-bugs.md +181 -0
package/pan-wizard-core/learnings/universal/.gitkeep +2 -0
package/pan-wizard-core/learnings/universal/atomic-state.md +21 -0
package/pan-wizard-core/learnings/universal/binary-io.md +21 -0
package/pan-wizard-core/learnings/universal/comment-syntax.md +21 -0
package/pan-wizard-core/learnings/universal/composition.md +33 -0
package/pan-wizard-core/learnings/universal/concurrency.md +33 -0
package/pan-wizard-core/learnings/universal/dag-scheduler.md +33 -0
package/pan-wizard-core/learnings/universal/data-driven-design.md +21 -0
package/pan-wizard-core/learnings/universal/design-process.md +21 -0
package/pan-wizard-core/learnings/universal/empirical-spike.md +21 -0
package/pan-wizard-core/learnings/universal/error-handling.md +23 -0
package/pan-wizard-core/learnings/universal/error-paths.md +21 -0
package/pan-wizard-core/learnings/universal/glob-semantics.md +21 -0
package/pan-wizard-core/learnings/universal/idempotency.md +21 -0
package/pan-wizard-core/learnings/universal/invariants.md +21 -0
package/pan-wizard-core/learnings/universal/io-patterns.md +21 -0
package/pan-wizard-core/learnings/universal/numeric-edge-cases.md +21 -0
package/pan-wizard-core/learnings/universal/output-conventions.md +21 -0
package/pan-wizard-core/learnings/universal/parser-design.md +21 -0
package/pan-wizard-core/learnings/universal/phase-locking.md +21 -0
package/pan-wizard-core/learnings/universal/pipe-friendly-cli.md +21 -0
package/pan-wizard-core/learnings/universal/schema-design.md +21 -0
package/pan-wizard-core/learnings/universal/secret-handling.md +21 -0
package/pan-wizard-core/learnings/universal/streaming-io.md +21 -0
package/pan-wizard-core/learnings/universal/test-patterns.md +57 -0
package/pan-wizard-core/learnings/universal/test-strategy.md +33 -0
package/pan-wizard-core/learnings/universal/unicode.md +21 -0
package/pan-wizard-core/learnings/universal/vendor-pattern.md +21 -0
package/pan-wizard-core/references/guardrails.md +58 -0
package/pan-wizard-core/references/handoff-decisions.md +156 -0
package/pan-wizard-core/references/schemas/pan-command.schema.yml +39 -0
package/pan-wizard-core/references/verification-patterns.md +31 -0
package/pan-wizard-core/templates/config.json +2 -1
package/pan-wizard-core/templates/idea.md +52 -0
package/pan-wizard-core/templates/summary-complex.md +14 -5
package/pan-wizard-core/templates/summary-minimal.md +6 -0
package/pan-wizard-core/templates/summary-standard.md +14 -3
package/pan-wizard-core/workflows/discuss-phase.md +108 -1
package/pan-wizard-core/workflows/exec-phase.md +37 -1
package/pan-wizard-core/workflows/execute-plan.md +14 -0
package/pan-wizard-core/workflows/health.md +23 -0
package/pan-wizard-core/workflows/new-project.md +65 -81
package/pan-wizard-core/workflows/plan-phase.md +58 -0
package/pan-wizard-core/workflows/transition.md +102 -7
package/pan-wizard-core/workflows/verify-phase.md +14 -0
package/scripts/build-hooks.js +7 -1
package/scripts/generate-skills-docs.py +10 -8
package/scripts/release-check.js +184 -0

package/pan-wizard-core/learnings/internal/external-research.md ADDED

@@ -0,0 +1,93 @@
+---
+topic: external-research
+last_updated: 2026-05-02T18:16:19.459Z
+patterns:
+  - id: P-RES-001
+    summary: ACE (Zhang et al, arXiv:2510.04618, Oct 2025): summary-based context chains have brevity bias and context collapse. Treat memory as append-and-curate playbook, not paraphrase chain
+    promoted_at: 2026-05-02T18:15:25.976Z
+    source_experiments: [external]
+  - id: P-RES-002
+    summary: Chroma context-rot (July 2025): a single semantically-similar-but-irrelevant distractor degrades performance even at modest context sizes. Distractor density matters more than token count
+    promoted_at: 2026-05-02T18:15:32.003Z
+    source_experiments: [external]
+  - id: P-RES-003
+    summary: Cognition (June 2025) anti-multi-agent argument: parallel sub-agents fail because every action carries unstated decisions; downstream agents reconcile contradictions blindly when they only see artifacts
+    promoted_at: 2026-05-02T18:15:39.391Z
+    source_experiments: [external]
+  - id: P-RES-004
+    summary: Specification Gap paper (arXiv:2603.24284, early 2026): two-agent integration accuracy collapses 58 to 25 percent as spec detail is removed; coordination is quadratically sensitive to spec completeness
+    promoted_at: 2026-05-02T18:15:50.213Z
+    source_experiments: [external]
+  - id: P-RES-005
+    summary: GitHub PR audit (arXiv:2601.15195, Jan 2026): agent PRs fail mostly from spec/intent mismatch, design fit, and repo-norm violation — not buggy code. Code that compiles and tests still gets rejected
+    promoted_at: 2026-05-02T18:15:58.959Z
+    source_experiments: [external]
+  - id: P-RES-006
+    summary: S2R / RLVR (ACL 2025): naive self-critique is largely ineffective; verification gains come from FRESH-CONTEXT RESTART and FILE-MEDIATED STRUCTURE forcing re-reading, not from the judging itself. Verbose self-review can hurt via overthinking
+    promoted_at: 2026-05-02T18:16:09.893Z
+    source_experiments: [external]
+  - id: P-RES-007
+    summary: Sakana DGM (2025): in self-improvement loops, AGENT-DESIGN changes generalize across models and languages; PROMPT-FRAGMENT tweaks do not. Promote structural changes, not phrasing tweaks
+    promoted_at: 2026-05-02T18:16:19.459Z
+    source_experiments: [external]
+---
+# External Research (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-RES-001 — ACE (Zhang et al, arXiv:2510.04618, Oct 2025): summary-based context chains have brevity bias and context collapse. Treat memory as append-and-curate playbook, not paraphrase chain
+**Evidence:** https://arxiv.org/abs/2510.04618 — ACE: Agentic Context Engineering. Empirical: iterative summarization monotonically loses detail on agent and finance benchmarks; structured playbook curation outperforms across runs.
+**Rule:** Reframe memory/<agent>.md and per-phase summary.md as a structured DELTA-LOG (curated by an explicit reviewer step) rather than a paraphrase of the prior phase. Each entry is an addition or amendment to a structured field, not a fresh re-summarization. Curation is its own step, distinct from generation. The pan-optimizer's accrual model should be re-examined under this lens — does it append signal, or summarize away signal?
+**Applies in:** pan-optimizer accrual loop, memory.cjs, summary.md template design, retro --write-memory
+## P-RES-002 — Chroma context-rot (July 2025): a single semantically-similar-but-irrelevant distractor degrades performance even at modest context sizes. Distractor density matters more than token count
+**Evidence:** https://www.trychroma.com/research/context-rot — Hong & Huber, July 2025. Single-distractor experiments showed degradation begins well before 200K, and is non-linear with arrangement and similarity.
+**Rule:** Per-phase context budgets currently track tokens. Add a notion of distractor density: how much of context.md / research.md is plausibly-related-but-off-topic. Codebase mapper and phase researcher should optimize for relevance ratio, not coverage. The phase-budget command should warn when the relevance ratio is low even if token count is healthy.
+**Applies in:** phase-budget, codebase scan filtering, research agent guidance
+## P-RES-003 — Cognition (June 2025) anti-multi-agent argument: parallel sub-agents fail because every action carries unstated decisions; downstream agents reconcile contradictions blindly when they only see artifacts
+**Evidence:** https://cognition.ai/blog/dont-build-multi-agents — Walden Yan, Cognition. Contrast https://www.anthropic.com/engineering/multi-agent-research-system which argues breadth-first reads parallelize fine but writes/decisions need a single coherent trace.
+**Rule:** PAN's serial pipeline (planner -> researcher -> executor -> verifier) is what Cognition endorses, but file-mediated handoff passes only OUTPUTS, not reasoning traces. Consider: should plan.md include an explicit decisions-and-rationale section that the executor reads, beyond just the task list? Should summary.md include a deviations log that the verifier reads? The signal is: when an agent is briefed for a downstream phase, the upstream agent's reasoning trace should be available, not just the artifacts.
+**Applies in:** plan.md template, summary.md template, executor briefing, conductor briefing
+## P-RES-004 — Specification Gap paper (arXiv:2603.24284, early 2026): two-agent integration accuracy collapses 58 to 25 percent as spec detail is removed; coordination is quadratically sensitive to spec completeness
+**Evidence:** https://arxiv.org/abs/2603.24284v1 — The Specification Gap. Two-agent integration: 58 percent accuracy with full spec, 25 percent with stripped spec. Single-agent baseline: 89 to 56 percent. Coordination cost of incomplete specs is quadratic.
+**Rule:** pan-plan-checker currently verifies plan COHERENCE across 8 dimensions. Add a 9th: spec-sufficiency-for-handoff. Question to answer: does this plan contain enough detail that the executor cannot make a divergent decision in the implicit space the plan does not constrain. The check is not is-the-plan-good but is-the-plan-complete-enough-to-survive-the-context-boundary. Specifically: every task has explicit Files, explicit Action, explicit Verify, explicit Done; every architectural choice is locked vs flexible; every assumption is named.
+**Applies in:** agents/pan-plan-checker.md (existing 8 verification dimensions), plan.md template (forcing locked-vs-flexible markers)
+## P-RES-005 — GitHub PR audit (arXiv:2601.15195, Jan 2026): agent PRs fail mostly from spec/intent mismatch, design fit, and repo-norm violation — not buggy code. Code that compiles and tests still gets rejected
+**Evidence:** https://arxiv.org/abs/2601.15195 — Where Do AI Coding Agents Fail. 33K-PR audit. Primary failure modes: spec/intent mismatch (32 percent), design fit (24 percent), repo-norm violation (19 percent). Buggy code is a minority cause of rejection.
+**Rule:** pan-verifier currently checks code-against-plan. The dominant external-world failure is fit-against-repo-norms (style, naming, prior-PR conventions, framework idioms). Verifier should treat codebase/CONVENTIONS.md and codebase/STRUCTURE.md (when they exist from /pan:map-codebase) as first-class verification inputs, not advisory context. project.md and requirements.md may need a Norms section the verifier explicitly tests against. The verification dimensions should add: does this code follow the conventions evident in adjacent files.
+**Applies in:** agents/pan-verifier.md, agents/pan-reviewer.md, codebase/CONVENTIONS.md consumption, project.md template
+## P-RES-006 — S2R / RLVR (ACL 2025): naive self-critique is largely ineffective; verification gains come from FRESH-CONTEXT RESTART and FILE-MEDIATED STRUCTURE forcing re-reading, not from the judging itself. Verbose self-review can hurt via overthinking
+**Evidence:** https://aclanthology.org/2025.acl-long.1104.pdf — S2R. https://magazine.sebastianraschka.com/p/state-of-llms-2025 — Raschka summary. Untrained self-critique provides little gain on reasoning; verification helps when verifier has training or runs against verifiable rewards.
+**Rule:** PAN has multiple judgment-style verification roles: pan-plan-checker (judges plan coherence), pan-meta-reviewer (judges other reviewers), pan-hardener (judges security risk by inspection). The S2R finding suggests these roles' value is mostly the FRESH-CONTEXT structural reset, not the judgment per se. Implication: lean these agents harder on VERIFIABLE signals (test cmd, lint cmd, schema check, type check, dep cycle scan, regex anti-pattern detection) and reduce prose-only verdicts. Where a verifiable check exists, use it instead of prose review. Where one doesn't, ask whether the role earns its compute.
+**Applies in:** agents/pan-plan-checker.md, agents/pan-verifier.md, agents/pan-reviewer.md, agents/pan-meta-reviewer.md, agents/pan-hardener.md, references/verification-patterns.md
+## P-RES-007 — Sakana DGM (2025): in self-improvement loops, AGENT-DESIGN changes generalize across models and languages; PROMPT-FRAGMENT tweaks do not. Promote structural changes, not phrasing tweaks
+**Evidence:** https://sakana.ai/dgm/ — Darwin Godel Machine. Population-based self-improvement showed structural changes transferred across models; specific prompt tweaks did not. The same generalization curve likely holds for human-mediated promote gates.
+**Rule:** When pan-tools learn promote runs (manual gate today, possibly auto-promote in v3.8+), the promote criterion should distinguish: 1) STRUCTURAL pattern (a new agent role, a new file in .planning/, a new verification gate, a new tool-use idiom, an architectural decision) vs 2) PROMPT-FRAGMENT (specific phrasing, a worded instruction, a stylistic preference). Universal scope should be reserved for structural patterns. Prompt fragments belong in internal scope at most — they don't generalize across models or languages, so shipping them to all 5 runtimes is a bet that won't pay.
+**Applies in:** pan-tools learn promote --scope universal gate, optimize.cjs promotePattern criteria, future auto-promote rules
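
The spec-sufficiency check proposed in P-RES-004 (every task has explicit Files, Action, Verify, and Done; every architectural choice is marked locked or flexible) could be sketched roughly as below. This is only an illustration: the plan object shape, the `tasks` and `choices` arrays, and the function names `isBlank` and `specSufficiencyGaps` are all hypothetical and do not come from pan-plan-checker.

```javascript
// Sketch only: the plan shape (tasks[], choices[]) and all field names are
// assumptions for illustration, not pan-plan-checker's real input format.
const REQUIRED_TASK_FIELDS = ['files', 'action', 'verify', 'done'];

// Treats null/undefined, whitespace-only strings, and empty arrays as missing.
function isBlank(value) {
  if (value == null) return true;
  if (typeof value === 'string') return value.trim() === '';
  if (Array.isArray(value)) return value.length === 0;
  return false;
}

// Returns a list of human-readable gaps; an empty list means the plan
// passes this (hypothetical) handoff-sufficiency dimension.
function specSufficiencyGaps(plan) {
  const gaps = [];
  for (const task of plan.tasks ?? []) {
    for (const field of REQUIRED_TASK_FIELDS) {
      if (isBlank(task[field])) gaps.push(`task "${task.name}": missing ${field}`);
    }
  }
  for (const choice of plan.choices ?? []) {
    if (choice.status !== 'locked' && choice.status !== 'flexible') {
      gaps.push(`choice "${choice.name}": not marked locked or flexible`);
    }
  }
  return gaps;
}

const exampleGaps = specSufficiencyGaps({
  tasks: [{ name: 'add CLI stub', files: ['bin/cli.js'], action: 'create stub', verify: '', done: 'stub runs' }],
  choices: [{ name: 'module system', status: 'locked' }, { name: 'config format' }],
});
console.log(exampleGaps);
```

In this example the check reports two gaps: the task has an empty Verify field, and the second choice carries no locked/flexible marker. A structural check like this is also in the spirit of P-RES-006, which favors verifiable signals over prose-only verdicts.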

package/pan-wizard-core/learnings/internal/loop-design.md ADDED

@@ -0,0 +1,33 @@
+---
+topic: loop-design
+last_updated: 2026-05-03T05:00:00.000Z
+patterns:
+  - id: P-1303
+    summary: Exercising PAN's actual surfaces (autonomous run) produces orders-of-magnitude more PAN-relevant signal than building parallel tools — even when shorter
+    promoted_at: 2026-04-27T11:21:36.814Z
+    source_experiments: [panloop]
+  - id: P-1403
+    summary: Track wall-clock-per-commit and tokens-per-commit as autonomous-overhead metrics
+    promoted_at: 2026-04-27T12:01:14.269Z
+    source_experiments: [panloop]
+---
+# Loop Design (PAN-internal)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Internal-scope patterns are PAN-specific and stay in the source repo (stripped at install). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1303 — Exercising PAN's actual surfaces (autonomous run) produces orders-of-magnitude more PAN-relevant signal than building parallel tools — even when shorter
+**Evidence:** Single 25-second autonomous run (panloop) surfaced 2 critical real PAN bugs (P-1301 AskUserQuestion gap, P-1302 runner permissions gap). Compare: 8 prior hand-built mock experiments (whoocsv, whoojson, whooemoji, whoocron, whoohash, whoouuid, whoodag, whoofreq) totaling many hours produced 0 PAN-internal findings — only generic engineering patterns. The autonomous loop validates its own design hypothesis: hitting real surfaces > simulating them.
+**Rule:** When designing self-improvement loops or eval frameworks, the experiments must EXERCISE the system being optimized, not BUILD PARALLEL artifacts. A 25-second real run beats hours of mock work for surfacing system-internal bugs. For PAN specifically: future experiments should run /pan:new-project, /pan:plan-phase, /pan:exec-phase, /pan:focus-* against fresh test projects via the runner, not build standalone CLIs alongside. The mock builds have value for promoting GENERIC patterns; only autonomous runs surface PAN-INTERNAL ones.
+**Applies in:** self-improvement loop design (ADR-0026 update); v3.8+ planning; promote-step heuristics
+## P-1403 — Track wall-clock-per-commit and tokens-per-commit as autonomous-overhead metrics
+**Evidence:** panloop: 25 commits in 29 min = 1.16 min/commit. Cost $12.84 / 25 commits = $0.51/commit. Useful baseline for future optimization.
+**Rule:** PAN's /pan:learn report should compute and surface (a) commits_per_minute, (b) cost_usd_per_commit, (c) cost_usd_per_phase, (d) cost_usd_per_test, by reading harvest.json + git log + harvested cost data. Trend over experiments shows whether autonomous overhead is improving as patterns saturate.
+**Applies in:** v3.8+ pan-optimizer agent prompt; harvest.json schema extension
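
The per-commit arithmetic in the P-1403 evidence can be reproduced with a small helper. This is a minimal sketch: the function name `overheadMetrics` and its input field names are hypothetical, and it takes the raw numbers directly rather than reading harvest.json or git log, whose formats are not shown in this diff.

```javascript
// Hypothetical helper: derives the overhead metrics named in P-1403 from
// three raw inputs. Input field names are illustrative, not PAN's schema.
function overheadMetrics({ commits, wallClockMinutes, costUsd }) {
  if (commits <= 0 || wallClockMinutes <= 0) {
    throw new RangeError('commits and wallClockMinutes must be positive');
  }
  return {
    commits_per_minute: commits / wallClockMinutes,
    minutes_per_commit: wallClockMinutes / commits,
    cost_usd_per_commit: costUsd / commits,
  };
}

// panloop figures quoted in the Evidence line: 25 commits, 29 min, $12.84.
const panloop = overheadMetrics({ commits: 25, wallClockMinutes: 29, costUsd: 12.84 });
console.log(panloop.minutes_per_commit.toFixed(2));  // 1.16
console.log(panloop.cost_usd_per_commit.toFixed(2)); // 0.51
```

Note that the Evidence line reports minutes per commit while the Rule asks for `commits_per_minute`; the sketch returns both so that either can be trended across experiments.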

package/pan-wizard-core/learnings/internal/pan-dev-bugs.md ADDED

@@ -0,0 +1,181 @@
+---
+topic: pan-dev-bugs
+last_updated: 2026-04-27T14:36:56.341Z
+patterns:
+  - id: P-101
+    summary: experiment.cjs newExperiment does not persist status='ready' update after installer success
+    promoted_at: 2026-04-27T09:26:39.526Z
+    source_experiments: [whooo]
+  - id: P-102
+    summary: runner.cjs spawnSync fails on Windows for CLI tools without explicit .cmd resolution
+    promoted_at: 2026-04-27T09:26:39.618Z
+    source_experiments: [whooo]
+  - id: P-301
+    summary: PAN's commands/pan/*.md has 9 real frontmatter consistency bugs surfaced by the whooo dogfood gate
+    promoted_at: 2026-04-27T09:49:20.847Z
+    source_experiments: [whooo]
+  - id: P-1301
+    summary: /pan:new-project --auto workflow invokes AskUserQuestion for depth/execution/git-tracking despite --auto, blocking autonomous runs
+    promoted_at: 2026-04-27T11:21:36.615Z
+    source_experiments: [panloop]
+  - id: P-1302
+    summary: runner.cjs claude adapter must include --dangerously-skip-permissions for autonomous runs
+    promoted_at: 2026-04-27T11:21:36.712Z
+    source_experiments: [panloop]
+  - id: P-1304
+    summary: runner.cjs spawnSync with shell:true on Windows doesn't quote multi-word args; cmd.exe re-splits them
+    promoted_at: 2026-04-27T11:38:25.897Z
+    source_experiments: [panloop]
+  - id: P-1401
+    summary: Lightweight phases (scaffolding-only, single plan) over-ceremonialize: 5 commits + 5-7 min for trivial work
+    promoted_at: 2026-04-27T12:01:14.083Z
+    source_experiments: [panloop]
+  - id: P-1402
+    summary: Per-phase researcher re-derives material already covered by project-level research
+    promoted_at: 2026-04-27T12:01:14.179Z
+    source_experiments: [panloop]
+  - id: P-1404
+    summary: Auto-trace SubagentStop hook covers only some agents — pan-roadmapper logged but pan-planner/executor/verifier did not
+    promoted_at: 2026-04-27T12:01:14.367Z
+    source_experiments: [panloop]
+  - id: P-1501
+    summary: claude -p autonomous session exits after Phase 0 setup; multi-step workflows don't drive headless mode forward without explicit tool calls
+    promoted_at: 2026-04-27T12:11:40.314Z
+    source_experiments: [panloop2]
+    superseded_by: P-1501-r3
+    supersession_note: Original P-1501 hypothesis (workflows don't drive headless forward) was refined by P-1501-r2 (no-TTY root cause) and again by P-1501-r3 (TTY chain inheritance, stdio:'inherit' insufficient). Latest current rule lives at P-1501-r3.
+  - id: P-1502
+    summary: runner.cjs exit_code=0 is too coarse — should validate milestone-completion before declaring success
+    promoted_at: 2026-04-27T12:11:40.408Z
+    source_experiments: [panloop2]
+  - id: P-1701
+    summary: Multi-phase (3+) autonomous workflows exit at phase boundaries with /clear-and-rerun instructions; loop is autonomous WITHIN a phase, not across phases
+    promoted_at: 2026-04-27T12:43:48.397Z
+    source_experiments: [panmd2]
+  - id: P-1501-r2
+    summary: P-1501 root cause refined: runner.cjs spawnSync({stdio:[ignore,pipe,pipe]}) lacks TTY; manual bash invocation has TTY; claude -p detects no-TTY and exits after first response loop
+    promoted_at: 2026-04-27T12:43:48.469Z
+    source_experiments: [panmd2]
+    superseded_by: P-1501-r3
+    supersession_note: stdio:'inherit' fix proposed here turned out to be insufficient when the grandparent itself lacks TTY — see P-1501-r3.
+  - id: P-1501-r3
+    summary: P-1501 stdio:'inherit' fix is INSUFFICIENT when the grandparent (script/CI/tool) has no TTY itself
+    promoted_at: 2026-04-27T14:36:56.341Z
+    source_experiments: [panmd3]
+---
+# Pan Dev Bugs (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-101 — experiment.cjs newExperiment does not persist status='ready' update after installer success
+**Evidence:** whooo experiment: after successful installer run, in-memory manifest had status='ready' but the file write in newExperiment skipped the persistence. On-disk experiment.json shows status='scaffolded'.
+**Rule:** In experiment.cjs newExperiment: after manifest.status='ready' assignment (line ~155 in v3.7.0), add fs.writeFileSync(manifestPath, JSON.stringify(manifest, null, 2)) to persist the success status. 3-line fix; ship as v3.7.1.
+**Applies in:** experiment.cjs maintenance, v3.7.x patches
+## P-102 — runner.cjs spawnSync fails on Windows for CLI tools without explicit .cmd resolution
+**Evidence:** whooo experiment: tried to spawn node via runtime-override; got ENOENT on Windows because spawnSync with shell:false doesn't resolve .cmd shims. Forced fallback to direct build instead of subprocess invocation.
+**Rule:** In runner.cjs runExperiment: on Windows, either set shell:true OR resolve adapter.bin to its .cmd/.exe equivalent before spawnSync. Currently the runner is unusable on Windows for any CLI tool that ships only as .cmd (claude, gemini, codex via npx, etc.).
+**Applies in:** runner.cjs cross-platform fix; v3.7.x patches
+## P-301 — PAN's commands/pan/*.md has 9 real frontmatter consistency bugs surfaced by the whooo dogfood gate
+**Evidence:** whooo dogfood final report (51ms across 52 files): optimize.md missing frontmatter; patches.md missing name field plus description as array; phase-tests.md uses multi-line block-scalar values; todo-add.md and todo-check.md have description as array (should be string). Reproducible via: node bin/whooo.js lint --dir d:/PanWizard/commands/pan --schema test/fixtures/pan-cmd.schema.yml
+**Rule:** Ship a v3.7.x patch fixing the 9 known consistency issues in PAN's commands/pan/. Vendor whooo (or write equivalent) and add pan-tools doc-lint to the /pan:check flow so future drift is caught at author time, not by users at install.
+**Applies in:** v3.7.x patch planning; /pan:check workflow extension; commands/pan/ maintenance
+## P-1301 — /pan:new-project --auto workflow invokes AskUserQuestion for depth/execution/git-tracking despite --auto, blocking autonomous runs
+**Evidence:** panloop sess-real-loop-2026-04-27 11:17:45Z error (critical): claude -p result includes permission_denials with tool_name=AskUserQuestion + 3 questions (Depth, Execution, Git Tracking). Workflow stalled after 5 turns / 910 output tokens / $0.33. The first real autonomous loop run (the loop's own design hypothesis) is blocked by this.
+**Rule:** Audit pan-wizard-core/workflows/new-project.md auto-mode handling. When --auto is set, AskUserQuestion calls must be replaced with: (a) defaults from config.json, (b) overrides from idea.md frontmatter (e.g. planning_depth: quick), or (c) inferred values from idea content. Same audit applies to any other PAN workflow with an --auto/--yes/--non-interactive flag (plan-phase, milestone-new, etc.). Ship as v3.7.2 patch — this blocks the v3.7.0 self-improvement loop's own design intent.
+**Applies in:** pan-wizard-core/workflows/new-project.md auto-mode block; v3.7.2 patch planning; audit of all --auto-flagged workflows
+## P-1302 — runner.cjs claude adapter must include --dangerously-skip-permissions for autonomous runs
+**Evidence:** panloop sess-real-loop-2026-04-27: claude -p WITHOUT this flag prompts for tool permissions, can't be answered in headless mode, exits 1 silently. Manual reproduction with the flag added: workflow proceeds to AskUserQuestion (separate finding P-1301)
+**Rule:** In pan-wizard-core/bin/lib/runner.cjs RUNTIME_RUNNERS, add extraArgs: ['--dangerously-skip-permissions'] to the claude adapter (and equivalent flags for codex/gemini/opencode). The runner's purpose is autonomous execution — defaulting to interactive permission prompts contradicts the runner's design. Optionally gate behind opts.skipPermissions=true for paranoid users, but default ON for headless production. Document trade-off in adapter comment + ADR-0026 update.
+**Applies in:** v3.7.2 patch — runner.cjs adapters
+## P-1304 — runner.cjs spawnSync with shell:true on Windows doesn't quote multi-word args; cmd.exe re-splits them
+**Evidence:** panloop second autonomous run (post-P-1302 fix): claude -p exited 1 in 538ms because the prompt /pan:new-project --auto @.planning/idea.md was passed as 4 args but Node joined them with spaces under shell:true without quoting, so cmd.exe re-split it into 6 args. Manual reproduction with the prompt quoted worked fine (10+ min real autonomous workflow ran).
+**Rule:** When passing args to spawnSync({shell:true}), Node joins them with spaces and the shell re-parses. Multi-word args (prompts, paths with spaces) MUST be quoted by the caller. Fix in runner.cjs: when useShell is true, wrap any arg containing whitespace in double-quotes and double any embedded double-quote (cmd.exe convention). Apply same fix in any other place pan-wizard-core uses spawnSync({shell:true}).
+**Applies in:** v3.7.2 patch — runner.cjs runExperiment, audit other shell:true call sites
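
The quoting rule stated in P-1304 (wrap whitespace-containing args in double quotes and double any embedded double quote) might look like the sketch below. The helper names `quoteForShell` and `buildShellArgs` are hypothetical, and this is not the code that shipped in runner.cjs. It follows only the convention the rule describes and makes no attempt to escape other cmd.exe metacharacters such as `&`, `|`, `^`, or `%`.

```javascript
// Sketch of the P-1304 rule. Helper names are hypothetical; only the
// whitespace/double-quote convention described in the rule is implemented.
function quoteForShell(arg) {
  if (!/\s/.test(arg)) return arg;           // single-word args pass through
  return `"${arg.replace(/"/g, '""')}"`;     // double embedded quotes, then wrap
}

// Quote only when the caller intends to spawn with shell: true; with
// shell: false, Node passes each arg through as-is and no quoting is needed.
function buildShellArgs(args, useShell) {
  return useShell ? args.map(quoteForShell) : args.slice();
}

const quoted = buildShellArgs(['-p', '/pan:new-project --auto @.planning/idea.md'], true);
console.log(quoted.join(' '));
```

With `useShell` set, the multi-word prompt is emitted as one double-quoted token, so the shell should see it as a single argument rather than re-splitting it on spaces.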
+## P-1401 — Lightweight phases (scaffolding-only, single plan) over-ceremonialize: 5 commits + 5-7 min for trivial work
+**Evidence:** panloop run: Phase 1 (project setup — package.json + dirs + CLI stub, ~10 LOC of work) went through full context+research+plan+execute+summary+close. 5 commits, ~5-7 min wall clock.
+**Rule:** PAN should detect 'phase has 1 plan with simple feat/chore-class work' and skip per-phase research + plan-checker stages, deferring directly from context to execute. Save ~3 commits and ~5 min per trivial phase. Heuristic: if plan count == 1 AND plan tasks count <= 3 AND no architectural changes mentioned in idea, treat as lightweight.
+**Applies in:** v3.7.x patch — workflows/exec-phase.md, workflows/plan-phase.md
+## P-1402 — Per-phase researcher re-derives material already covered by project-level research
+**Evidence:** panloop: phase 1 research and phase 2 research both touched ESM scaffolding territory already covered by project-level research/architecture.md, features.md, stack.md. Wasted tokens.
+**Rule:** pan-phase-researcher agent prompt should require reading research/architecture.md, features.md, stack.md as context, AND emit only deltas/specifics not in project-level research. Audit agents/pan-phase-researcher.md.
+**Applies in:** v3.7.x patch — agents/pan-phase-researcher.md prompt
+## P-1404 — Auto-trace SubagentStop hook covers only some agents — pan-roadmapper logged but pan-planner/executor/verifier did not
+**Evidence:** panloop run had ~25 agent invocations across the lifecycle (researcher×2, roadmapper, context, planner×2, executor×3, verifier×2, etc.). Only 14 trace events captured across 4 sub-sessions. Hook coverage gap means /pan:learn analysis is working from incomplete data.
+**Rule:** Audit hooks/pan-trace-logger.js to verify SubagentStop fires for ALL Task-spawned agent types, not just a known list. Either: (a) regex-match agent names broadly, (b) document expected agents and warn if hook payloads come from unknown ones, (c) add a 'fallback' trace event when an agent commits but no trace was captured (would require git-hook integration).
+**Applies in:** v3.7.x patch — hooks/pan-trace-logger.js audit
+## P-1501 — claude -p autonomous session exits after Phase 0 setup; multi-step workflows don't drive headless mode forward without explicit tool calls
+**Evidence:** panloop2 v3.7.3 validation run via patched runner.cjs: status=done, exit_code=0, elapsed=48s, BUT only config.json was written. No project.md, no roadmap, no research, no subagent spawns. The auto-mode workflow block applies defaults then says 'proceed' — model interprets that as completion and exits. Original panloop 29-min success was via MANUAL interactive claude -p, not via runner-spawned.
+**Rule:** Workflow auto-mode blocks must END with an explicit tool call that drives the next step (e.g., Write call to create project.md, or Task call to spawn pan-discusser). 'Proceed' as text instruction is insufficient in headless mode — claude -p exits when the assistant's text response has no pending tool calls. Audit all --auto-flagged workflow paths for this gap. Possible v3.7.4 patch: auto-mode workflow steps explicitly chain via tool invocation, not prose continuation.
+**Applies in:** v3.7.4+ patch — workflows/new-project.md auto-mode chain audit
+## P-1502 — runner.cjs exit_code=0 is too coarse — should validate milestone-completion before declaring success
+**Evidence:** panloop2: runner returned status=done, stop_reason=success, exit_code=0 even though only config.json was written and the workflow halted at Phase 0. Exit code only reflects 'claude -p exited cleanly' — not 'autonomous build completed'.
+**Rule:** After spawnSync returns exit_code=0, runner.cjs runExperiment should also check whether <experiment>/.planning/state.md status field is 'completed' (or whether milestone summary exists). If the workflow never reached milestone-done, set stop_reason='incomplete' or 'partial' even with clean exit. Caller can then act differently (e.g., mark for re-run, alert).
+**Applies in:** v3.7.4 — runner.cjs runExperiment success criteria audit
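The P-1502 rule above could be sketched roughly as follows. This is an illustration only: `classifyRun`, the `status:` line format assumed for state.md, and the `nonzero_exit` reason are hypothetical and are not taken from runner.cjs.

```javascript
// Hypothetical sketch of a stricter success check (not the actual runner.cjs code).
// Assumption: state.md carries a line of the form "status: <value>".
function classifyRun(exitCode, stateText) {
  if (exitCode !== 0) {
    return { status: 'failed', stop_reason: 'nonzero_exit' };
  }
  // stateText is the content of <experiment>/.planning/state.md, or null if the file is missing.
  const match = stateText == null ? null : /^status:\s*(\S+)\s*$/im.exec(stateText);
  const workflowStatus = match ? match[1].toLowerCase() : null;
  if (workflowStatus === 'completed') {
    return { status: 'done', stop_reason: 'success' };
  }
  // A clean exit without a completed milestone is reported as incomplete.
  return { status: 'incomplete', stop_reason: 'incomplete' };
}

console.log(classifyRun(0, 'phase: 0\nstatus: in_progress\n'));
```

A caller can then branch on `stop_reason` to mark the run for re-run or raise an alert.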
+## P-1701 — Multi-phase (3+) autonomous workflows exit at phase boundaries with /clear-and-rerun instructions; loop is autonomous WITHIN a phase, not across phases
+**Evidence:** panmd2 manual claude -p run on a 5-phase project: completed Phase 1 with 13 commits + 20/20 tests passing, then exited cleanly with final assistant message: 'Next Up: Phase 2: Rule Infrastructure — /pan:discuss-phase 2 --auto. /clear first → fresh context window'. Compare panloop (2 phases) which auto-chained both phases without exit. The auto-mode workflow has phase-handoff logic that emits a /clear instruction between phases for context-budget reasons.
+**Rule:** Audit pan-wizard-core/workflows/new-project.md and exec-phase.md for phase-handoff logic. The 'between phases /clear' approach prevents true multi-phase autonomous runs. v3.7.4+ options: (a) detect 'this is the last phase' and skip /clear instruction, (b) provide a '--multi-phase' mode that chains all phases in one session (high token cost, large context), (c) have the runner DETECT 'next up' style exits and auto-spawn next phase via /pan:plan-phase --auto. (c) is most scalable.
+**Applies in:** v3.7.4+ — workflows/new-project.md phase-handoff, runner.cjs continuation logic
+## P-1501-r2 — P-1501 root cause refined: runner.cjs spawnSync({stdio:[ignore,pipe,pipe]}) lacks TTY; manual bash invocation has TTY; claude -p detects no-TTY and exits after first response loop
+**Evidence:** Isolation tests: (1) Manual bash invocation 'claude -p --dangerously-skip-permissions <prompt>' (NO --output-format json, exact same flags as runner) → 13 commits, Phase 1 complete, exit 0. (2) Runner-spawned 'claude -p --dangerously-skip-permissions <prompt>' via spawnSync({stdio:[ignore,pipe,pipe], shell:'win32'}) → 0 commits, only config.json written, exit 0 in ~45s. The ONLY difference is the spawn environment. claude -p likely detects isatty(stdin)=false and exits after first complete response, treating the absence of TTY as 'scripted single-shot' instead of 'autonomous loop'.
+**Rule:** Fix in runner.cjs: either (a) allocate a pseudo-tty using node-pty (requires runtime dep), (b) pipe a 'continue' prompt to keep claude alive across iterations, or (c) wrap claude -p in a script that allocates a TTY (e.g., via 'script -q' on Unix, ConPTY on Windows). Document the environment requirement in runner adapter comment.
+**Applies in:** v3.7.4 — runner.cjs spawn environment fix
+## P-1501-r3 — P-1501 stdio:'inherit' fix is INSUFFICIENT when the grandparent (script/CI/tool) has no TTY itself
+**Evidence:** panmd3 v3.7.4 validation run via patched runner with stdio:[inherit, pipe, pipe]: still 48s, 0 commits, status=incomplete (P-1502 caught the regression honestly). Root cause: 'inherit' inherits from parent (node), which inherits from Bash tool wrapper, which has no TTY. Chain: no-TTY-grandparent → no-TTY-parent → claude sees no-TTY → exits early.
+**Rule:** Real fix for P-1501 requires either: (a) explicit pty allocation via node-pty (would be PAN's first runtime dependency — meaningful trade-off), (b) wrap claude invocation in a TTY-allocating tool (Windows: winpty/ConPTY API, Unix: script -q). Document the current limitation: pan-tools experiment run autonomous claude path WORKS only when invoked from a real terminal (where the entire ancestry chain has a TTY). When invoked from Bash-tool/CI/script wrappers, the run will return status=incomplete (P-1502 reports honestly). For v3.7.4: ship with this limitation documented; v3.8 may bring node-pty integration.
+**Applies in:** v3.8 — runner.cjs pty allocation; v3.7.4 — documentation in commands/pan/experiment.md
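Until pty allocation exists, the documented limitation could be surfaced with a preflight check before spawning. A minimal sketch, assuming a hypothetical `ttyPreflight` helper (not part of runner.cjs) and relying only on Node's `isTTY` stream property:

```javascript
// Hypothetical preflight: warn when the process has no TTY on stdin,
// which is the condition P-1501-r2/r3 associate with early exits.
function ttyPreflight(stdin = process.stdin) {
  // isTTY is true for terminal streams and undefined otherwise.
  if (stdin.isTTY) {
    return { ok: true, warning: null };
  }
  return {
    ok: false,
    warning: 'No TTY detected on stdin; an autonomous run may end with status=incomplete.',
  };
}

const preflight = ttyPreflight();
if (!preflight.ok) console.error(preflight.warning);
```

This does not fix the underlying behavior; it only makes the limitation visible before a run starts.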

package/pan-wizard-core/learnings/universal/.gitkeep ADDED

@@ -0,0 +1,2 @@
+# Placeholder so git tracks the empty directory.
+# First `pan-tools learn promote --scope universal` creates topic files here.

package/pan-wizard-core/learnings/universal/atomic-state.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: atomic-state
+last_updated: 2026-05-02T15:25:15.594Z
+patterns:
+  - id: P-1201
+    summary: Atomic state-file pattern: write to file.tmp, fsync, rename. Survives kill -9 mid-write. Use the existing file's content if read fails on the .tmp
+    promoted_at: 2026-05-02T15:25:15.594Z
+    source_experiments: [whoocache, whooflow]
+---
+# Atomic State (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1201 — Atomic state-file pattern: write to file.tmp, fsync, rename. Survives kill -9 mid-write. Use the existing file's content if read fails on the .tmp
+**Evidence:** whoocache index-file.js, whooflow state.js — both implemented this independently for parallel-process safety. Survived kill -9 mid-write tests. The pattern is well-established in databases (write-ahead log, MANIFEST in LevelDB) and now appears twice in zero-dep Node.js cache/runner contexts.
+**Rule:** When persisting state that must survive crashes or concurrent access (cache index, run-state, session metadata, scheduler state), always: 1) write JSON to <name>.tmp 2) fsync the file descriptor (fs.fsyncSync; note that fs.writeFileSync does not fsync by default) 3) rename <name>.tmp -> <name>. Never write directly to <name>. On read, prefer <name>; if <name>.tmp exists alone, log a warning and treat the original as truth (or recover via the .tmp if you trust it more).
+**Applies in:** any state file accessed by multiple processes or that must survive crashes
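The write-tmp, fsync, rename sequence in P-1201 might look like this in zero-dependency Node.js. The function names `writeStateAtomic` and `readState` are illustrative and are not taken from whoocache or whooflow.

```javascript
const fs = require('node:fs');

// Illustrative sketch of the P-1201 pattern; names are hypothetical.
function writeStateAtomic(file, state) {
  const tmp = file + '.tmp';
  const fd = fs.openSync(tmp, 'w');
  try {
    fs.writeFileSync(fd, JSON.stringify(state)); // 1) write to <name>.tmp
    fs.fsyncSync(fd);                            // 2) flush file data to disk
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, file);                      // 3) swap into place
}

function readState(file) {
  if (fs.existsSync(file + '.tmp')) {
    // A leftover .tmp suggests a writer died before the rename.
    console.warn(`stale ${file}.tmp found; reading ${file} instead`);
  }
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return null; // no state persisted yet
    throw err;
  }
}
```

Readers only ever see the old file or the fully written new one, because the rename replaces the target in a single step on the same filesystem.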

package/pan-wizard-core/learnings/universal/binary-io.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: binary-io
+last_updated: 2026-04-27T10:55:28.062Z
+patterns:
+  - id: P-1101
+    summary: fs.readFileSync(path) without encoding returns Buffer (binary-safe); passing 'utf-8' would corrupt non-text files
+    promoted_at: 2026-04-27T10:55:28.062Z
+    source_experiments: [whoohash]
+---
+# Binary Io (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1101 — fs.readFileSync(path) without encoding returns Buffer (binary-safe); passing 'utf-8' would corrupt non-text files
+**Evidence:** whoohash 15:10Z decision: hashing requires byte-level fidelity. fs.readFileSync(path) returns Buffer; fs.readFileSync(path, 'utf-8') returns String with replacement chars for invalid sequences.
+**Rule:** When code must preserve byte content (hashing, copying, network IO, format conversion), use fs.readFileSync(path) WITHOUT the encoding argument to get a Buffer. Only pass an encoding (utf-8 etc) when you actually want the string. Confusion comes from treating Buffer.length and String.length as equivalent: Buffer.length counts bytes, String.length counts UTF-16 code units.
+**Applies in:** exec-phase (any code that reads files for hashing, copying, integrity checks, format conversion)
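A small demonstration of why P-1101 matters for hashing. The file name and byte values are made up for illustration; only standard `fs` and `crypto` calls are used.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

const sha256 = (data) => crypto.createHash('sha256').update(data).digest('hex');

// Write four bytes; 0xff and 0xfe can never appear in valid UTF-8.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'binio-')), 'blob.bin');
fs.writeFileSync(file, Buffer.from([0xff, 0xfe, 0x00, 0x41]));

const asBuffer = fs.readFileSync(file);          // Buffer: exact bytes
const asString = fs.readFileSync(file, 'utf-8'); // String: invalid bytes become U+FFFD

// Re-encoding the decoded string does not give back the original bytes,
// so a hash computed from the string no longer describes the file.
const roundTripped = Buffer.from(asString, 'utf-8');
console.log('bytes preserved:', asBuffer.equals(roundTripped));
console.log('hash (Buffer):  ', sha256(asBuffer));
console.log('hash (string):  ', sha256(roundTripped));
```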

package/pan-wizard-core/learnings/universal/comment-syntax.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: comment-syntax
+last_updated: 2026-04-27T10:19:16.013Z
+patterns:
+  - id: P-501
+    summary: JSDoc /** block comments self-terminate on the literal **/X byte sequence (no space)
+    promoted_at: 2026-04-27T10:19:16.013Z
+    source_experiments: [whoolen]
+---
+# Comment Syntax (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-501 — JSDoc /** block comments self-terminate on the literal **/X byte sequence (no space)
+**Evidence:** whoolen sess_20260427T132000 13:24Z error event (critical impact). lib/walk.js JSDoc described globToRegex fix saying 'with the **/X glob fix baked in'. Node parsed the /** as block-open and the **/ as block-close, then tried to compile 'X glob fix' as identifier. Confusing 'Unexpected identifier' SyntaxError that points at the wrong line.
+**Rule:** Avoid the literal byte sequence **/ inside a /** block comment: the trailing */ closes the comment. JS block-comment syntax has no escape mechanism. Three safe alternatives: (1) use single-line // comments when documenting glob/filesystem patterns containing **/X, (2) insert a space: ** / X, (3) describe with words: 'double-star slash X'. The same bug bites any language with C-style block comments (TypeScript, JavaScript, Java, C) whenever a block comment contains **/.
+**Applies in:** exec-phase (writing source comments), plan-phase (when documenting glob/filesystem patterns)
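The failure in P-501 can be reproduced without touching a real source file by compiling small snippets with `new Function`. The snippet texts and the `parses` helper are illustrative.

```javascript
// Returns true if the source compiles as a function body, false on SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// The block comment ends at the first "*/", leaving "X glob fix ..." as code.
const broken = '/** walker with the **/X glob fix baked in */\nreturn 1;';
// Alternative 1: a single-line comment has no terminator to collide with.
const lineComment = '// walker with the **/X glob fix baked in\nreturn 1;';
// Alternative 2: a space breaks up the "*/" sequence.
const spaced = '/** walker with the ** / X glob fix baked in */\nreturn 1;';

console.log(parses(broken), parses(lineComment), parses(spaced));
```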

package/pan-wizard-core/learnings/universal/composition.md ADDED

@@ -0,0 +1,33 @@
+---
+topic: composition
+last_updated: 2026-04-27T10:28:48.497Z
+patterns:
+  - id: P-602
+    summary: Design experiment-built CLIs to compose via stdin/stdout pipes (Unix philosophy)
+    promoted_at: 2026-04-27T10:22:49.212Z
+    source_experiments: [whoograph]
+  - id: P-802
+    summary: Cross-tool composition emerges naturally when each tool independently applies P-401 (sync stdin), P-402 (trailing newline), and structured stdout
+    promoted_at: 2026-04-27T10:28:48.497Z
+    source_experiments: [whoorun]
+---
+# Composition (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-602 — Design experiment-built CLIs to compose via stdin/stdout pipes (Unix philosophy)
+**Evidence:** whoograph sess_20260427T133500 13:42Z surprise (critical): piped 'whoolen --format json | jq | whoograph' produced a real chart of PAN's 10 wordiest commands. Cross-tool composition emerged from independent application of P-401 (sync stdin) + P-402 (trailing newline) + structured stdout formats. No coordination required between tools.
+**Rule:** When building CLI tools as part of a self-improvement experiment series, design each tool's I/O for stdin/stdout pipe composition: read structured input from stdin, emit structured output to stdout, use stable line shapes that downstream tools can parse. Cross-tool emergent value (whoolen | whoograph) is greater than the sum of parts, even when neither tool was designed for the other. Patterns P-401 + P-402 + JSON-on-stdout get you composition by default.
+**Applies in:** plan-phase (CLI design), exec-phase (I/O contract decisions)
+## P-802 — Cross-tool composition emerges naturally when each tool independently applies P-401 (sync stdin), P-402 (trailing newline), and structured stdout
+**Evidence:** whoorun 14:13Z surprise (critical): composition.taskfile.md ran whoosort + whoolen as subprocesses, total 128ms, both succeeded. Combined with prior whoolen|whoograph pipe, all 5 whoo* experiments now compose as a real toolchain. None of the 5 was designed for composition with the others — it emerged from each independently applying the same I/O patterns.
+**Rule:** Cross-tool composition is a CONSEQUENCE of consistent patterns, not a design goal. If every experiment-built CLI applies P-401 (sync stdin via fs.readFileSync(0)), P-402 (trailing newline), JSON-on-stdout when --format json, structured (parseable) human output otherwise, AND consistent exit codes (0 ok, 1 logical fail, 2 fatal), then they pipe together by default. Promote P-401+P-402+exit-code-discipline as a bundle: applying any one in isolation has marginal value; applying all three is what produces the emergent composition.
+**Applies in:** plan-phase (CLI design), exec-phase (I/O contract decisions), retrospectives
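The bundle described in P-602 and P-802 (sync stdin, trailing newline, JSON on stdout, exit codes 0/1/2) could be sketched as below. The line-length tool itself is a made-up stand-in, not the actual whoolen code, and `main` is defined but deliberately not invoked here.

```javascript
const EXIT = Object.freeze({ OK: 0, LOGICAL_FAIL: 1, FATAL: 2 });

// Pure core: text in, { stdout, stderr, code } out. Easy to test without a process.
function run(input, { format = 'text' } = {}) {
  const lines = input.split('\n').filter((line) => line.length > 0);
  if (lines.length === 0) {
    return { stdout: '', stderr: 'no input lines\n', code: EXIT.LOGICAL_FAIL };
  }
  const rows = lines.map((line) => ({ line, length: line.length }));
  const body = format === 'json'
    ? JSON.stringify(rows)                                   // structured for downstream tools
    : rows.map((r) => `${r.length}\t${r.line}`).join('\n');  // stable, parseable line shape
  return { stdout: body + '\n', stderr: '', code: EXIT.OK }; // always end with a newline
}

// Thin effectful shell: sync stdin via fs.readFileSync(0), then write and set exit code.
function main(argv = process.argv.slice(2)) {
  const fs = require('node:fs');
  let result;
  try {
    const i = argv.indexOf('--format');
    result = run(fs.readFileSync(0, 'utf8'), { format: i >= 0 ? argv[i + 1] : 'text' });
  } catch (err) {
    result = { stdout: '', stderr: `${err.message}\n`, code: EXIT.FATAL };
  }
  process.stdout.write(result.stdout);
  process.stderr.write(result.stderr);
  process.exitCode = result.code;
}
```

Calling `main()` at the bottom of a script turns this into a pipe-friendly CLI; keeping `run` pure is what lets each tool be tested without spawning a process.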

package/pan-wizard-core/learnings/universal/concurrency.md ADDED

@@ -0,0 +1,33 @@
+---
+topic: concurrency
+last_updated: 2026-05-03T03:29:41.164Z
+patterns:
+  - id: P-1204
+    summary: O_EXCL lockfile + retry with bounded backoff is enough for multi-process file writes in Node — no flock needed
+    promoted_at: 2026-05-02T15:25:45.482Z
+    source_experiments: [whoocache]
+  - id: P-NPRS-003
+    summary: Worker thread + atomic epoch counter for cancellable slow operations: spawn worker tagged with current epoch; main thread bumps epoch on user action; worker checks epoch and discards stale results silently
+    promoted_at: 2026-05-03T03:29:41.163Z
+    source_experiments: [notepadrs]
+---
+# Concurrency (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1204 — O_EXCL lockfile + retry with bounded backoff is enough for multi-process file writes in Node — no flock needed
+**Evidence:** whoocache lock.js + atomic-write.js: parallel-process tests with two child processes each calling set() 1000 times completed with consistent index, zero lost writes. P-1402 (whoocache 02-02 summary): 'O_EXCL lockfile + Windows rename retry' shipped Phase 2.
+**Rule:** For multi-process safe writes (parallel CLI invocations sharing one cache/index/state file), use fs.openSync(lockPath, 'wx') as a 'try-acquire' (EEXIST means held). On failure, retry with random backoff 5-50ms, capped at ~10 attempts. Always wrap acquired work in try/finally and unlink the lockfile in finally. Cross-platform safe (no flock dependency). Combine with the atomic write-tmp-then-rename pattern (P-1201) so even if the lock holder is killed mid-write, recovery is automatic.
+**Applies in:** shared cache/state files accessed by parallel CLI invocations, daemons
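A minimal sketch of the P-1204 try-acquire loop. `withLock` and `sleepMs` are illustrative names, not the whoocache lock.js API; the synchronous sleep uses `Atomics.wait`, which is one of several ways to block briefly in Node.

```javascript
const fs = require('node:fs');

// Block the current thread for roughly `ms` milliseconds.
function sleepMs(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical helper: run `work` while holding an O_EXCL lockfile.
function withLock(lockPath, work, { attempts = 10 } = {}) {
  for (let i = 0; i < attempts; i++) {
    let fd;
    try {
      fd = fs.openSync(lockPath, 'wx'); // fails with EEXIST if the lock is held
    } catch (err) {
      if (err.code !== 'EEXIST') throw err;
      sleepMs(5 + Math.floor(Math.random() * 46)); // random backoff, 5-50ms
      continue;
    }
    try {
      return work();
    } finally {
      fs.closeSync(fd);
      fs.unlinkSync(lockPath); // always release, even if work() throws
    }
  }
  throw new Error(`could not acquire ${lockPath} after ${attempts} attempts`);
}
```

This sketch does not handle a lock left behind by a killed holder; a real implementation would need a staleness policy for that case.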
+## P-NPRS-003 — Worker thread + atomic epoch counter for cancellable slow operations: spawn worker tagged with current epoch; main thread bumps epoch on user action; worker checks epoch and discards stale results silently
+**Evidence:** notepadrs Plan 03-05: find/replace on >1MB buffers spawned a worker thread carrying an Arc<AtomicU64> epoch. Main thread bumped find_epoch on tab-switch / dialog-close / edit / restart. Worker posted WM_APP_FIND_RESULT only if its tagged epoch still matched current. 8 epoch-discipline integration tests verified silent-stale behavior. Closure-injected wakeup<F: Fn(u64)> kept the spawn function pure-testable (4 real-spawn tests).
+**Rule:** When a slow operation (search, fetch, compile, render) may be invalidated mid-flight by a faster user action, pass an Arc<AtomicU64> epoch into the worker; the worker reads the epoch at start AND at result-post time, and discards its result silently if the epoch advanced. The main thread cancels by bumping the epoch — it does NOT join the worker, does NOT use a kill flag, does NOT use channels for cancellation. Cost: one Arc<AtomicU64> per cancellation domain, two atomic loads per worker. Payoff: zero-copy cancellation; no per-cancel-source plumbing; new cancel sources are added by adding a single fetch_add(1) call.
+**Applies in:** find/search workers, async fetches that may be superseded, autocomplete/suggest backends, syntax-highlight workers, file-tree expanders, any operation where 'a newer version exists' silently invalidates an in-flight result
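The notepadrs evidence is Rust (`Arc<AtomicU64>`); the sketch below is only a JavaScript analogue of the same epoch idea, using a shared `Int32Array` with `Atomics`. `createEpochDomain`, `startJob`, and `post` are hypothetical names.

```javascript
// JavaScript analogue of the epoch pattern (not the notepadrs implementation).
// The cell lives in a SharedArrayBuffer, so it could also be handed to worker_threads.
function createEpochDomain(cell = new Int32Array(new SharedArrayBuffer(4))) {
  return {
    cell,
    // Any cancel source (tab switch, edit, dialog close) just bumps the epoch.
    bump() {
      Atomics.add(cell, 0, 1); // note: a 32-bit counter wraps eventually
    },
    // Tag a job with the epoch that was current when it started.
    startJob() {
      const tagged = Atomics.load(cell, 0);
      return {
        isStale: () => Atomics.load(cell, 0) !== tagged,
        // Deliver the result only if no bump happened since the job started.
        post(result, deliver) {
          if (Atomics.load(cell, 0) !== tagged) return false; // discard silently
          deliver(result);
          return true;
        },
      };
    },
  };
}

const find = createEpochDomain();
const job = find.startJob();
find.bump(); // user typed something new while the search was running
console.log(job.post(['old match'], (r) => console.log(r))); // stale result is dropped
```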

package/pan-wizard-core/learnings/universal/dag-scheduler.md ADDED

@@ -0,0 +1,33 @@
+---
+topic: dag-scheduler
+last_updated: 2026-05-03T03:29:27.025Z
+patterns:
+  - id: P-1206
+    summary: Kahn's topological sort with concurrency cap + retry-then-skip-downstream is enough for a 200-line task runner; resume from a state file falls out for free
+    promoted_at: 2026-05-02T15:26:05.473Z
+    source_experiments: [whooflow]
+  - id: P-NPRS-002
+    summary: Pre-allocate cross-plan plumbing in the foundation plan so later parallel plans don't merge-conflict on shared files
+    promoted_at: 2026-05-03T03:29:27.024Z
+    source_experiments: [notepadrs]
+---
+# Dag Scheduler (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1206 — Kahn's topological sort with concurrency cap + retry-then-skip-downstream is enough for a 200-line task runner; resume from a state file falls out for free
+**Evidence:** whooflow scheduler.js + executor.js + state.js: 54 commits, 5/5 phases. Phase 3 dogfood: kill-and-resume test against npm run build:hooks succeeded. Phase 02-04 complete with --list/--dry-run preflight + SIGINT discipline. The pure-vs-effectful split (mergeState pure, executor effectful) made the tests trivial.
+**Rule:** For task graphs (dependencies, build pipelines, multi-step CI), use Kahn's algorithm with these wrinkles: 1) when a wave has >concurrency tasks, schedule first N; release a slot only when one finishes. 2) on task fail after all retries, mark downstream (BFS over depends_on graph) as skipped — don't retry. 3) persist task status atomically (P-1201) after each transition. 4) resume mode: read state, skip success, re-run failed tasks subject to remaining retries, run pending normally. Pure-functional mergeState + propagateSkip + active-children separates the math from the I/O so each is testable in isolation.
+**Applies in:** task runners, build pipelines, ETL orchestrators, CI workflow tools
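Two of the pure pieces named in P-1206 could be sketched like this. The task and status shapes are assumptions for illustration; this is not the whooflow scheduler.js code.

```javascript
// Assumed shapes: tasks = { id: { depends_on: [ids] } }
//                 status = { id: 'pending' | 'running' | 'success' | 'failed' | 'skipped' }

// After a task has failed all retries, mark everything downstream as skipped (BFS).
function propagateSkip(tasks, status, failedId) {
  const next = { ...status };
  const queue = [failedId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const [id, task] of Object.entries(tasks)) {
      if (task.depends_on.includes(current) && next[id] === 'pending') {
        next[id] = 'skipped';
        queue.push(id);
      }
    }
  }
  return next;
}

// Pick pending tasks whose dependencies all succeeded, limited by free slots.
function nextRunnable(tasks, status, concurrency) {
  const running = Object.values(status).filter((s) => s === 'running').length;
  const free = Math.max(0, concurrency - running);
  return Object.keys(tasks)
    .filter((id) => status[id] === 'pending'
      && tasks[id].depends_on.every((dep) => status[dep] === 'success'))
    .slice(0, free);
}
```

Because both functions only map state to state, resume mode reduces to loading the persisted status object and calling `nextRunnable` again; tasks already marked `success` are never selected.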
+## P-NPRS-002 — Pre-allocate cross-plan plumbing in the foundation plan so later parallel plans don't merge-conflict on shared files
+**Evidence:** notepadrs Plan 03-01 (foundation): pre-allocated 12 IDM constants, 8 accelerator entries, find_state, find_dlg_hwnd, find_epoch AtomicU64, find_tx/find_rx mpsc channel, find_pending — all owned by Wave-1 foundation plan. Plans 03-02/03-03/03-04/03-05 in Waves 2-4 each touch only their own subsystem files; no merge conflicts on app.rs across 4 parallel-eligible plans.
+**Rule:** In a multi-plan phase where later plans modify the same shared file (a state struct, an enum of message IDs, a routing table), have the Wave-1 foundation plan PRE-ALLOCATE every cross-plan symbol — even if the value is a stub or zero. Subsequent plans then add only their own subsystem files plus implementation bodies, not signatures. Cost: foundation plan grows by ~10-30 lines of pre-allocated stubs. Payoff: the orchestrator can run later plans in parallel without merge-conflict gymnastics on the shared state file, and reviewers see a clean per-subsystem diff.
+**Applies in:** any multi-plan phase touching a shared coordination file (message routing table, IDM/event ID enum, app-level state struct, registry of subsystems, menu definitions)

package/pan-wizard-core/learnings/universal/data-driven-design.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: data-driven-design
+last_updated: 2026-04-27T10:15:01.073Z
+patterns:
+  - id: P-403
+    summary: Comparator/validator/formatter as a data map beats a switch statement
+    promoted_at: 2026-04-27T10:15:01.073Z
+    source_experiments: [whoosort]
+---
+# Data Driven Design (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-403 — Comparator/validator/formatter as a data map beats a switch statement
+**Evidence:** whoosort 13:02Z decision: COMPARATORS = { alpha, numeric, length } object map is extensible and testable in isolation. Adding modes is data, not control flow.
+**Rule:** When dispatching to one of N strategies based on a string key (sort modes, output formats, validators, parsers), define them as a frozen object map rather than a switch statement. Adding a strategy becomes data + test, not control-flow modification. Generalizes to schema validators, output formatters, command dispatchers.
+**Applies in:** exec-phase, plan-phase (when designing extensibility points)
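A sketch of the frozen comparator map from P-403. The three mode names come from the evidence line; the comparator bodies and `sortLines` are illustrative guesses, not the whoosort source.

```javascript
// Strategies as data: adding a mode means adding one entry and one test.
const COMPARATORS = Object.freeze({
  alpha: (a, b) => (a < b ? -1 : a > b ? 1 : 0),
  numeric: (a, b) => Number(a) - Number(b),
  length: (a, b) => a.length - b.length,
});

function sortLines(lines, mode) {
  // Own-property check so inherited keys like "toString" are not treated as modes.
  if (!Object.prototype.hasOwnProperty.call(COMPARATORS, mode)) {
    return { error: `unknown mode: ${mode}`, modes: Object.keys(COMPARATORS) };
  }
  return { sorted: [...lines].sort(COMPARATORS[mode]) };
}

console.log(sortLines(['10', '9', '100'], 'numeric'));
console.log(sortLines(['10', '9', '100'], 'reverse'));
```

The list of valid modes for help text or error messages falls out of `Object.keys(COMPARATORS)`, which a switch statement cannot provide.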

package/pan-wizard-core/learnings/universal/design-process.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: design-process
+last_updated: 2026-04-27T09:48:51.193Z
+patterns:
+  - id: P-203
+    summary: Document explicit Out-of-Scope cuts in DESIGN_SPEC so deviations become spec corrections, not anonymous bug fixes
+    promoted_at: 2026-04-27T09:48:51.193Z
+    source_experiments: [whooo]
+---
+# Design Process (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-203 — Document explicit Out-of-Scope cuts in DESIGN_SPEC so deviations become spec corrections, not anonymous bug fixes
+**Evidence:** whooo trace.jsonl 11:32Z (decision major) plus 11:51Z (decision minor): block-list scope cut documented in DESIGN_SPEC explicitly. When dogfood revealed the cut was wrong, the deviation event self-documented as a spec correction with a clear reference back to the original cut. Inline source comment in lib/frontmatter.js also points to the trace event.
+**Rule:** Feature specs and DESIGN_SPEC documents should include an explicit Out-of-Scope table listing what v1 deliberately omits. When dogfood reveals an omission was wrong, the deviation is recognized as a spec correction (with a clear pointer to the original cut) rather than a generic bug fix. Adopt the pattern in all feature specs and ADR templates.
+**Applies in:** plan-phase (spec authoring), featureAI workflow, ADR-0026 template descendants

package/pan-wizard-core/learnings/universal/empirical-spike.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: empirical-spike
+last_updated: 2026-05-03T03:29:14.814Z
+patterns:
+  - id: P-NPRS-001
+    summary: Lock undocumented system flag/API behavior with a committed measurable spike before shipping production code that depends on it
+    promoted_at: 2026-05-03T03:29:14.814Z
+    source_experiments: [notepadrs]
+---
+# Empirical Spike (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-NPRS-001 — Lock undocumented system flag/API behavior with a committed measurable spike before shipping production code that depends on it
+**Evidence:** notepadrs Plan 05-01: EM_SETTARGETDEVICE word-wrap polarity is undocumented in MSDN; spike measured EM_POSFROMCHAR Y-coordinate at lParam=0 (y=170, wrapped) vs lParam=1 (x=3993 off-screen). Spike committed at examples/wordwrap_spike.rs and result at 05-01-spike-result.md. Production constants WRAP_ON_LPARAM=0 and WRAP_OFF_LPARAM=1 doc-comment references the spike artifact.
+**Rule:** When a system flag, API constant, or external library has undocumented or contradictory semantics, write a Wave-0 spike test that PROGRAMMATICALLY measures the actual behavior (not visual inspection), commit the spike as a permanent reproducer (e.g. examples/<feature>_spike.rs), commit the measurement record as a phase artifact, and have production constants doc-comment the spike artifact path. Cost: 1 small spike per flag. Payoff: future maintainers can re-validate after OS/library upgrades; the constants are no longer 'magic numbers' — they're empirically locked.
+**Applies in:** system-level integrations with undocumented behavior (Win32 messages, mobile native flags, browser quirks, kernel flags, undocumented vendor APIs, version-specific framework behaviors)

package/pan-wizard-core/learnings/universal/error-handling.md ADDED

@@ -0,0 +1,23 @@
+---
+topic: error-handling
+last_updated: 2026-04-27T10:57:40.942Z
+patterns:
+  - id: P-1209
+    summary: Return errors as result object fields ({error, ...details}) rather than throwing for caller-facing pure functions
+    promoted_at: 2026-04-27T10:57:40.942Z
+    source_experiments: [whoodag]
+    superseded_id: P-1201
+    supersession_note: Renumbered 2026-05-03 — original P-1201 collided with the atomic-state.md pattern of the same ID; this is the error-handling rule.
+---
+# Error Handling (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1209 — Return errors as result object fields ({error, ...details}) rather than throwing for caller-facing pure functions
+**Evidence:** whoodag 15:32Z decision: topoSort returns {sorted} on success or {error, cycleNodes} on failure. Caller branches on .error without try/catch. Pattern reused across whooo (parseFrontmatter), whoodiff (parseTaskFile), whoosort (validation).
+**Rule:** For pure-function APIs that return data, return errors as a field on the result object ({error: string, ...context}) rather than throwing. Reasons: (1) callers don't need try/catch boilerplate, (2) errors carry structured context the caller can branch on, (3) function signatures stay synchronous-typed in JSDoc/TS without union throw types. Reserve throw for programmer errors (invalid arg shape) and for catastrophic conditions; use result-object errors for expected/recoverable failures.
+**Applies in:** exec-phase (any pure-function library API), plan-phase (when designing internal contracts)
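The `{sorted}` / `{error, cycleNodes}` contract from the P-1209 evidence could be sketched as follows. The result shape comes from the evidence line; the function body and the graph shape are assumptions, not the whoodag source.

```javascript
// Assumed input shape: graph = { node: [dependencies] }
function topoSort(graph) {
  // Programmer error (wrong argument shape): throwing is still appropriate here.
  if (graph === null || typeof graph !== 'object') {
    throw new TypeError('graph must be an object');
  }
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const nodes = Object.keys(graph);
  const indegree = Object.fromEntries(nodes.map((n) => [n, 0]));
  const dependents = Object.fromEntries(nodes.map((n) => [n, []]));
  for (const n of nodes) {
    for (const dep of graph[n]) {
      // Expected, recoverable failure: reported as a result field.
      if (!has(indegree, dep)) return { error: 'unknown dependency', node: n, dependency: dep };
      indegree[n] += 1;
      dependents[dep].push(n);
    }
  }
  const ready = nodes.filter((n) => indegree[n] === 0);
  const sorted = [];
  while (ready.length > 0) {
    const n = ready.shift();
    sorted.push(n);
    for (const m of dependents[n]) {
      indegree[m] -= 1;
      if (indegree[m] === 0) ready.push(m);
    }
  }
  if (sorted.length < nodes.length) {
    // Nodes that never reached indegree 0 are in a cycle or downstream of one.
    return { error: 'cycle detected', cycleNodes: nodes.filter((n) => indegree[n] > 0) };
  }
  return { sorted };
}

const result = topoSort({ a: ['b'], b: ['a'], c: [] });
if (result.error) console.error(result.error, result.cycleNodes);
```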

package/pan-wizard-core/learnings/universal/error-paths.md ADDED

@@ -0,0 +1,21 @@
+---
+topic: error-paths
+last_updated: 2026-05-02T15:25:54.602Z
+patterns:
+  - id: P-1205
+    summary: Error-path tracking via a path stack threaded through recursive validators yields readable JSONPath like $.users[2].email instead of opaque pointers
+    promoted_at: 2026-05-02T15:25:54.601Z
+    source_experiments: [whooschema]
+---
+# Error Paths (AI-derived)
+> Auto-maintained by `pan-tools learn promote`. Each pattern was extracted from one or more experiment runs (see source_experiments). Patterns are **advisory** — orchestrators should weight them against current context.
+## P-1205 — Error-path tracking via a path stack threaded through recursive validators yields readable JSONPath like $.users[2].email instead of opaque pointers
+**Evidence:** whooschema validate.js + error-utils.js: makeError(path, rule, value, expected) + sortErrors. Phase 1 verification: 40 tests pass, errors carry JSONPath. Phase 2 added composition (oneOf/anyOf/allOf/$ref) with the same path threading. The path-tracking pattern was the differentiator vs ajv/joi alternatives that ship 'failed at /users/2/email'.
+**Rule:** When recursing through nested data (validators, walkers, transformers), maintain an explicit path array and push/pop as you descend/ascend. Build error messages with $.field.array[0].nested format from that path. This costs ~3 lines per recursion frame but transforms 'validation failed' into 'validation failed at $.users[2].email: must match pattern ^.+@.+$ (got "alice")'. Aggregate ALL errors into a list, sorted by path lex order — don't short-circuit on first failure.
+**Applies in:** validators, transformers, schema-driven UIs, recursive walkers
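The path-stack threading in P-1205 might look like this. The tiny schema dialect (`object`/`array`/`string` with `pattern`) and all function names are made up for illustration and are not the whooschema API.

```javascript
// Turn ['users', 2, 'email'] into '$.users[2].email'.
function formatPath(path) {
  return path.reduce(
    (out, seg) => out + (typeof seg === 'number' ? `[${seg}]` : `.${seg}`),
    '$',
  );
}

// Hypothetical mini-schema: { type: 'object', properties }, { type: 'array', items },
// { type: 'string', pattern }. Errors are collected, never thrown.
function validate(value, schema, path = [], errors = []) {
  const fail = (message) => errors.push({ path: formatPath(path), message });
  if (schema.type === 'object') {
    if (value === null || typeof value !== 'object' || Array.isArray(value)) {
      fail('must be an object');
    } else {
      for (const [key, sub] of Object.entries(schema.properties)) {
        path.push(key);                       // descend
        validate(value[key], sub, path, errors);
        path.pop();                           // ascend
      }
    }
  } else if (schema.type === 'array') {
    if (!Array.isArray(value)) {
      fail('must be an array');
    } else {
      value.forEach((item, i) => {
        path.push(i);
        validate(item, schema.items, path, errors);
        path.pop();
      });
    }
  } else if (schema.type === 'string') {
    if (typeof value !== 'string') {
      fail('must be a string');
    } else if (schema.pattern && !new RegExp(schema.pattern).test(value)) {
      fail(`must match pattern ${schema.pattern} (got ${JSON.stringify(value)})`);
    }
  }
  return errors;
}

// Aggregate every error and sort by path instead of stopping at the first failure.
function validateAll(value, schema) {
  return validate(value, schema).sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}
```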