pmx-canvas 0.1.26 → 0.1.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. package/.github/extensions/pmx-canvas/extension.mjs +191 -0
  2. package/CHANGELOG.md +110 -0
  3. package/Readme.md +74 -27
  4. package/dist/canvas/index.js +82 -82
  5. package/dist/json-render/index.css +1 -1
  6. package/dist/json-render/index.js +944 -164
  7. package/dist/types/json-render/catalog.d.ts +195 -20
  8. package/dist/types/json-render/charts/components.d.ts +17 -0
  9. package/dist/types/json-render/charts/definitions.d.ts +13 -1
  10. package/dist/types/json-render/charts/tufte-components.d.ts +65 -0
  11. package/dist/types/json-render/charts/tufte-definitions.d.ts +164 -0
  12. package/dist/types/json-render/directives.d.ts +33 -0
  13. package/dist/types/json-render/renderer/index.d.ts +1 -0
  14. package/dist/types/json-render/server.d.ts +32 -1
  15. package/dist/types/mcp/canvas-access.d.ts +62 -0
  16. package/dist/types/server/ax-state.d.ts +170 -0
  17. package/dist/types/server/canvas-db.d.ts +17 -1
  18. package/dist/types/server/canvas-operations.d.ts +53 -0
  19. package/dist/types/server/canvas-schema.d.ts +5 -1
  20. package/dist/types/server/canvas-state.d.ts +95 -4
  21. package/dist/types/server/index.d.ts +120 -3
  22. package/dist/types/server/mutation-history.d.ts +1 -1
  23. package/docs/cli.md +42 -0
  24. package/docs/http-api.md +64 -0
  25. package/docs/mcp.md +23 -5
  26. package/docs/node-types.md +1 -1
  27. package/docs/screenshots/codex-app.png +0 -0
  28. package/docs/screenshots/github-copilot-app.png +0 -0
  29. package/docs/sdk.md +23 -5
  30. package/package.json +10 -7
  31. package/skills/control-session-orchestrator/SKILL.md +359 -0
  32. package/skills/control-session-orchestrator/evals/evals.json +75 -0
  33. package/skills/data-analysis/SKILL.md +6 -0
  34. package/skills/pmx-canvas/SKILL.md +50 -4
  35. package/skills/pmx-canvas/references/github-copilot-app-adapter.md +6 -0
  36. package/skills/tufte-viz/SKILL.md +157 -0
  37. package/skills/tufte-viz/references/analytical-design.md +217 -0
  38. package/skills/tufte-viz/references/tufte-principles.md +147 -0
  39. package/src/cli/agent.ts +302 -3
  40. package/src/cli/index.ts +2 -1
  41. package/src/client/nodes/ExtAppFrame.tsx +48 -1
  42. package/src/client/nodes/McpAppNode.tsx +6 -2
  43. package/src/json-render/catalog.ts +22 -1
  44. package/src/json-render/charts/components.tsx +127 -15
  45. package/src/json-render/charts/definitions.ts +19 -2
  46. package/src/json-render/charts/extra-components.tsx +5 -4
  47. package/src/json-render/charts/tufte-components.tsx +395 -0
  48. package/src/json-render/charts/tufte-definitions.ts +128 -0
  49. package/src/json-render/directives.ts +64 -0
  50. package/src/json-render/renderer/index.css +107 -1
  51. package/src/json-render/renderer/index.tsx +33 -0
  52. package/src/json-render/server.ts +275 -5
  53. package/src/mcp/canvas-access.ts +264 -1
  54. package/src/mcp/server.ts +498 -9
  55. package/src/server/ax-context.ts +8 -3
  56. package/src/server/ax-state.ts +447 -0
  57. package/src/server/canvas-db.ts +184 -1
  58. package/src/server/canvas-operations.ts +123 -2
  59. package/src/server/canvas-schema.ts +27 -3
  60. package/src/server/canvas-state.ts +349 -2
  61. package/src/server/index.ts +259 -7
  62. package/src/server/mutation-history.ts +6 -0
  63. package/src/server/server.ts +442 -5
  64. package/src/server/web-artifacts.ts +31 -5
@@ -0,0 +1,359 @@
1
+ ---
2
+ name: control-session-orchestrator
3
+ description: >
4
+ Control-plane workflow for coordinating multi-agent, multi-session project work from a single
5
+ Codex, GitHub Copilot, or agent-app control session. Use this skill whenever the user asks to
6
+ orchestrate agents, create or steer worker sessions, run a workflow-like effort, fan out
7
+ audits/research/migrations, coordinate parallel implementation streams, monitor other project
8
+ sessions, or compare this control-session pattern to Claude Code dynamic workflows. This skill is
9
+ especially relevant when the current session can spawn persistent project sessions and those
10
+ sessions can spawn their own subagents, creating a two-level orchestration hierarchy.
11
+ ---
12
+
13
+ # Control Session Orchestrator
14
+
15
+ Use the current session as the control plane for project work that is too broad, risky, or
16
+ stateful for one conversation. The control session owns intent, decomposition, routing, status,
17
+ verification, and consolidation. Worker sessions own scoped execution. Worker subagents are local
18
+ implementation/research/audit helpers inside each worker session.
19
+
20
+ ## Mental model
21
+
22
+ ```
23
+ User
24
+ -> Control session (strategy, dispatch, tracking, integration)
25
+ -> Worker project session A (persistent branch/workstream)
26
+ -> Subagents for research, implementation, review, tests
27
+ -> Worker project session B (persistent branch/workstream)
28
+ -> Subagents for local fan-out
29
+ -> Verifier/reviewer session (optional independent gate)
30
+ ```
31
+
32
+ This is similar to dynamic workflows, but the orchestration is human-readable and session-native
33
+ instead of a runtime script. Use it when persistence, branches, PRs, human steering, or cross-session
34
+ continuity matter more than fully automated fan-out.
35
+
36
+ A code runtime gets reliability for free (validated results, barriers, budgets, dedup, resume). A
37
+ prompt-driven control plane gets it only if its state is machine-checkable. Two contracts provide that
38
+ without a runtime: a required **worker result block** and a durable **control-state manifest** (see
39
+ [Machine-checkable contracts](#machine-checkable-contracts)). Everything else in this skill keys off
40
+ those two artifacts — without them, "is this worker done and passing?" is a guess, not a field read.
41
+
42
+ ## Supported control apps
43
+
44
+ This skill is app-agnostic. First discover which orchestration tools are available in the current
45
+ session, then adapt the same control workflow to that surface.
46
+
47
+ | Capability | Codex app | GitHub Copilot app | Fallback |
48
+ |---|---|---|---|
49
+ | Find worker sessions | List/search project threads | List/search app sessions | Ask user for target session links/IDs |
50
+ | Create persistent workstreams | Create or reuse Codex threads/worktrees when available | Create or reuse Copilot app sessions/workspaces when available | Use local subagents only |
51
+ | Steer an existing workstream | Send a follow-up prompt to the thread | Send a follow-up prompt to the session | Ask user to paste the prompt into the worker |
52
+ | Local fan-out | Spawn subagents from this session or ask workers to spawn their own | Use Copilot's available agent/session tools | Keep work local |
53
+ | Tracking | Thread titles, pins, branches, PRs, canvas nodes, compact status tables | Session names, branches, PRs, issues, canvas nodes, compact status tables | Markdown status table |
54
+
55
+ Do not assume the GitHub Copilot or Codex tool names. Use the tools exposed in the current
56
+ environment, and say which control surface is active before dispatching workers.
57
+
58
+ ## When to use
59
+
60
+ Use this skill for:
61
+
62
+ - Codebase-wide audits, migrations, or parity checks
63
+ - Parallel investigation across modules, services, features, or PRs
64
+ - Work that benefits from independent implementer and verifier sessions
65
+ - Large features where design, implementation, testing, and review should be split
66
+ - Project-control prompts like "coordinate agents", "spin up sessions", "run a workflow",
67
+ "make workers handle this", "monitor the other sessions", or "act as control"
68
+ - Situations where worker sessions may themselves use subagents for local research, coding, or review
69
+
70
+ Do not use it for a simple one-file fix, a quick answer, or a task where a single local subagent is
71
+ enough. Orchestration has overhead; spend it only when coordination reduces risk or increases
72
+ throughput.
73
+
74
+ ## Machine-checkable contracts
75
+
76
+ These are the session-native analog of a runtime's typed results and durable run state. They stay
77
+ human-readable, but they are **required**, not advisory — the control session parses them instead of
78
+ re-reading prose.
79
+
80
+ ### Worker result block
81
+
82
+ Every worker MUST end its report with a fenced ` ```json ` block tagged `control-result`. The control
83
+ session reads this block (never the surrounding prose) to update state, dedup, and decide routing.
84
+
85
+ ```json control-result
86
+ {
87
+ "worker_id": "auth-api",
88
+ "wave_id": "w1",
89
+ "unit_key": "service/auth",
90
+ "scope": "src/auth/** — refresh-token rotation",
91
+ "status": "complete",
92
+ "files_changed": ["src/auth/rotate.ts"],
93
+ "verification": { "command": "pnpm test auth", "result": "pass", "evidence": "42 passed" },
94
+ "subagents_used": "2 — one research, one test author",
95
+ "risks": ["rotation interacts with logout; covered by test"],
96
+ "next_step": "ready for review session",
97
+ "report_ref": "thread/PR/path to the full report"
98
+ }
99
+ ```
100
+
101
+ The block must be **strict JSON** (no comments/trailing commas) so it parses. `status` is one of
102
+ `complete | blocked | needs-decision | failed`; `verification.result` is one of `pass | fail | not-run`.
103
+
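The contract above can be checked mechanically instead of by reading prose. Here is a minimal TypeScript sketch of that parse step. `parseControlResult`, `ControlResult`, and `ParseOutcome` are hypothetical names and not part of pmx-canvas. The sketch assumes the fence's info string is exactly `json control-result`, and when a report contains several such blocks it uses the last one.

```typescript
// Hypothetical types mirroring the worker result block contract (not a pmx-canvas API).
type WorkerStatus = "complete" | "blocked" | "needs-decision" | "failed";
type VerificationResult = "pass" | "fail" | "not-run";

interface ControlResult {
  worker_id: string;
  wave_id: string;
  unit_key: string;
  scope: string;
  status: WorkerStatus;
  files_changed: string[];
  verification: { command: string; result: VerificationResult; evidence: string };
  subagents_used: string;
  risks: string[];
  next_step: string;
  report_ref: string;
}

type ParseOutcome = { ok: true; result: ControlResult } | { ok: false; errors: string[] };

const WORKER_STATUSES = ["complete", "blocked", "needs-decision", "failed"];
const VERIFICATION_RESULTS = ["pass", "fail", "not-run"];
const STRING_FIELDS = ["worker_id", "wave_id", "unit_key", "scope", "subagents_used", "next_step", "report_ref"];

// Find the last control-result fence in a worker report and validate its fields.
function parseControlResult(report: string): ParseOutcome {
  const fence = /```json control-result[ \t]*\r?\n([\s\S]*?)\r?\n```/g;
  let body: string | undefined;
  let match: RegExpExecArray | null;
  while ((match = fence.exec(report)) !== null) body = match[1]; // later blocks win
  if (body === undefined) return { ok: false, errors: ["no control-result block found"] };

  let data: any;
  try {
    data = JSON.parse(body); // strict JSON: comments or trailing commas throw here
  } catch (err) {
    return { ok: false, errors: [`invalid JSON: ${(err as Error).message}`] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, errors: ["block is not a JSON object"] };
  }

  // Collect every problem so one re-prompt can name all of them.
  const errors: string[] = [];
  for (const key of STRING_FIELDS) {
    if (typeof data[key] !== "string") errors.push(`${key} must be a string`);
  }
  if (WORKER_STATUSES.indexOf(data.status) < 0) errors.push(`unknown status: ${data.status}`);
  if (!Array.isArray(data.files_changed)) errors.push("files_changed must be an array");
  if (!Array.isArray(data.risks)) errors.push("risks must be an array");
  const v = data.verification;
  if (typeof v !== "object" || v === null || typeof v.command !== "string" ||
      typeof v.evidence !== "string" || VERIFICATION_RESULTS.indexOf(v.result) < 0) {
    errors.push("verification needs command, evidence, and result (pass | fail | not-run)");
  }
  return errors.length > 0 ? { ok: false, errors } : { ok: true, result: data as ControlResult };
}
```

Returning a list of errors, rather than throwing on the first one, lets a single standardized re-prompt ask for every missing or malformed field at once.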
104
+ ### Control-state manifest
105
+
106
+ One durable artifact that **is** the source of truth for the mission — a pinned control thread, a
107
+ tracking-issue body, a canvas node, or a committed `control/state.json`. Re-read and update it every
108
+ turn; keep the conversation for decisions, not state. One row per **unit** (unit-keyed, so the same
109
+ unit is never dispatched twice — this is the dedup ledger).
110
+
111
+ ```json
112
+ {
113
+ "mission": "MCP tool parity audit",
114
+ "non_goals": ["no behavior changes"],
115
+ "success_criteria": ["every tool present in server, HTTP, SDK, docs or flagged"],
116
+ "budget": { "max_concurrent_workers": 5, "max_total_workers": 25, "spawned": 0, "in_flight": 0 },
117
+ "convergence": { "rule": "single-pass", "k_empty": 2, "empty_streak": 0, "target": null, "current": 0 },
118
+ "workers": [
119
+ {
120
+ "unit_key": "surface/http",
121
+ "worker_id": "http-audit",
122
+ "session_ref": "thread-or-session id/link",
123
+ "scope": "HTTP API surface",
124
+ "branch_or_pr": "—",
125
+ "status": "pending",
126
+ "wave_id": "w1",
127
+ "last_update": "ISO-8601",
128
+ "evidence_ref": "report_ref from the result block"
129
+ }
130
+ ],
131
+ "decisions": [],
132
+ "open_followups": []
133
+ }
134
+ ```
135
+
136
+ Rules:
137
+
138
+ - **Worker status** (what a worker self-reports in its result block): `complete | blocked |
139
+ needs-decision | failed`.
140
+ - **Manifest unit status** (the superset the control session maintains): `pending | dispatched |
141
+ needs-decision | blocked | stalled | complete | failed | dropped`. Worker-reported values are a
142
+ subset of these, so setting a unit's status from a worker block (Step 5) is always valid.
143
+ - **Terminal** states — a unit is closed — are `complete | failed | dropped`. Everything else is
144
+ non-terminal and must be resolved, or explicitly converted to `dropped` with a reason, before the
145
+ mission closes (Step 8).
146
+ - `budget.in_flight` is the number of rows currently `dispatched`. Increment `spawned` and `in_flight`
147
+ on dispatch; decrement `in_flight` when a unit leaves `dispatched`; recompute it from the rows on
148
+ rehydrate.
149
+ - `convergence.rule` is one of `single-pass | loop-until-dry | loop-until-budget |
150
+ accumulate-to-target`. `k_empty`/`empty_streak` are used only by `loop-until-dry`; `target`/`current`
151
+ only by `accumulate-to-target` (`target` = the count or coverage goal, `current` = progress so far).
152
+ - Dropped and failed units MUST carry a reason in `open_followups`.
153
+
154
+ This manifest is what a fresh control session rehydrates from (Step 0).
155
+
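The status sets and counter rules above map naturally onto types. Below is a sketch of a TypeScript control helper, with hypothetical names (`applyWorkerStatus`, `recomputeInFlight`, `isTerminal`). Defining `WorkerStatus` with `Extract` makes the compiler confirm that every worker-reported status is also a valid manifest status.

```typescript
// Hypothetical model of the manifest rules above; field names follow the example manifest.
type UnitStatus =
  | "pending" | "dispatched" | "needs-decision" | "blocked"
  | "stalled" | "complete" | "failed" | "dropped";
// Worker-reported statuses; Extract guarantees each one is also a UnitStatus.
type WorkerStatus = Extract<UnitStatus, "complete" | "blocked" | "needs-decision" | "failed">;

interface UnitRow { unit_key: string; status: UnitStatus; last_update: string }
interface Budget { max_concurrent_workers: number; max_total_workers: number; spawned: number; in_flight: number }
interface Manifest { budget: Budget; workers: UnitRow[] }

const TERMINAL: ReadonlySet<UnitStatus> = new Set<UnitStatus>(["complete", "failed", "dropped"]);

function isTerminal(status: UnitStatus): boolean {
  return TERMINAL.has(status);
}

// in_flight is derived state: the number of rows currently marked "dispatched".
function recomputeInFlight(m: Manifest): number {
  return m.workers.filter((w) => w.status === "dispatched").length;
}

// Apply a worker-reported status (Step 5), decrementing in_flight if the unit leaves "dispatched".
function applyWorkerStatus(m: Manifest, unitKey: string, status: WorkerStatus, now: string): void {
  const row = m.workers.find((w) => w.unit_key === unitKey);
  if (row === undefined) throw new Error(`unknown unit_key: ${unitKey}`);
  if (row.status === "dispatched") m.budget.in_flight -= 1;
  row.status = status; // type-checks because WorkerStatus is a subset of UnitStatus
  row.last_update = now;
}
```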
156
+ ## Control workflow
157
+
158
+ ### 0. Rehydrate (resume an in-flight mission)
159
+
160
+ On session start, look for an existing control-state manifest for this mission. If one exists:
161
+
162
+ - Load it; treat it as the source of truth.
163
+ - Re-attach to workers by `session_ref` and reconcile each worker's *real* status (read the thread/PR)
164
+ before any new dispatch.
165
+ - Recompute `budget.in_flight` from the rows still marked `dispatched`.
166
+ - Do NOT re-dispatch a unit whose status is `dispatched` or `complete` — route a follow-up instead.
167
+
168
+ If no manifest exists, this is a new mission — create one during Step 1.
169
+
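A rehydrate pass could sort the loaded rows into three groups: rows to re-attach, rows that may only receive follow-ups, and rows that may still be dispatched. This is a hypothetical sketch (`rehydrate` and `RehydratePlan` are illustrative names). Actually reading each thread or PR is left to whatever session tools the app exposes.

```typescript
// Hypothetical rehydrate pass for Step 0; row fields follow the example manifest.
type UnitStatus =
  | "pending" | "dispatched" | "needs-decision" | "blocked"
  | "stalled" | "complete" | "failed" | "dropped";
interface UnitRow { unit_key: string; session_ref: string | null; status: UnitStatus }
interface Manifest { budget: { in_flight: number }; workers: UnitRow[] }

interface RehydratePlan {
  reattach: UnitRow[];     // read the real thread/PR state for these before anything else
  followUpOnly: UnitRow[]; // already dispatched or complete: route a follow-up, never re-dispatch
  dispatchable: UnitRow[]; // still pending: eligible once reconciled and within budget
}

function rehydrate(m: Manifest): RehydratePlan {
  // Recompute the counter from the rows instead of trusting the stored value.
  m.budget.in_flight = m.workers.filter((w) => w.status === "dispatched").length;
  return {
    reattach: m.workers.filter((w) => w.session_ref !== null),
    followUpOnly: m.workers.filter((w) => w.status === "dispatched" || w.status === "complete"),
    dispatchable: m.workers.filter((w) => w.status === "pending"),
  };
}
```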
170
+ ### 1. Frame the mission
171
+
172
+ Before spawning anything, capture (and write into the manifest):
173
+
174
+ - Objective and non-goals
175
+ - Repositories, branches, PRs, or issues in scope
176
+ - File or subsystem boundaries for each workstream
177
+ - Success criteria and verification gates
178
+ - Merge/integration expectations
179
+ - Any "do not touch" constraints
180
+
181
+ Also set explicit limits up front (manifest `budget` and `convergence`):
182
+
183
+ - `max_concurrent_workers` (default ~4–6) — never more in flight at once
184
+ - `max_total_workers` — a lifetime backstop for the whole mission (e.g. 25)
185
+ - optional token / cost / time ceiling
186
+ - the convergence rule: `single-pass` for bounded missions; `loop-until-dry`, `loop-until-budget`,
187
+ or `accumulate-to-target` for open-ended audits/migrations/parity sweeps
188
+
189
+ If any boundary is ambiguous and could cause conflicting edits, ask before dispatch.
190
+
191
+ ### 2. Detect the control surface
192
+
193
+ Before dispatch, identify the available app tools:
194
+
195
+ - Codex app: thread/session tools such as list, create/read, send-message, rename, pin/archive, plus
196
+ optional local subagent tools.
197
+ - GitHub Copilot app: session or workspace tools exposed by the app connector, plus any available
198
+ GitHub issue/PR/branch controls.
199
+ - Generic agent app: any combination of session, task, subagent, branch, issue, PR, or automation
200
+ tools.
201
+
202
+ If no persistent-session tools are available, downgrade to a local multi-agent plan and explain the
203
+ limitation. Do not invent a backend.
204
+
205
+ ### 3. Choose the topology
206
+
207
+ Pick the smallest useful topology:
208
+
209
+ - **One worker**: isolated implementation or bug fix that should live in its own project session
210
+ - **Parallel workers**: independent modules, packages, endpoints, tests, or docs
211
+ - **Research then implementation**: exploratory sessions report findings before coding starts
212
+ - **Implementer + verifier**: one session changes code, another reviews or verifies independently
213
+ - **Control-only**: no workers yet; just inspect state, list sessions, or plan the dispatch
214
+
215
+ Prefer separate sessions when workers may edit overlapping history, need different branches, or need
216
+ long-running context. Prefer local subagents inside one session when the task is exploratory and does
217
+ not need persistent branch state.
218
+
219
+ ### 4. Dispatch workers with complete prompts
220
+
221
+ Respect the budget: **never dispatch while `in_flight >= max_concurrent_workers`** — queue the unit
222
+ (`status: pending`) and log it. On reaching `max_total_workers` or a token/cost ceiling, STOP
223
+ dispatching and surface a *Decision needed* rather than spawning more. Dispatch is an **atomic
224
+ manifest update**: set the unit's row to `status: dispatched` (with `session_ref`, `worker_id`,
225
+ `wave_id`, `last_update`) and increment `spawned` and `in_flight` together; if the dispatch fails to
226
+ start, leave the row `pending` and advance neither counter. Decrement `in_flight` when a unit leaves
227
+ `dispatched` (it reaches a terminal state, or moves to `needs-decision`/`blocked`/`stalled`) so
228
+ queued units can start. This keeps `in_flight` equal to the count of `dispatched` rows that Step 0
229
+ recomputes.
230
+
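The dispatch rule above can be sketched as a single guarded function. This sketch assumes a hypothetical `start` callback that wraps whichever app tool opens or steers a worker session. `tryDispatch` and `DispatchOutcome` are illustrative names, not a pmx-canvas or app API. This sketch checks the lifetime backstop before the concurrency cap: once `max_total_workers` is reached, waiting for a free slot cannot help.

```typescript
// Hypothetical dispatch guard for Step 4; names are illustrative, not a pmx-canvas API.
interface Budget { max_concurrent_workers: number; max_total_workers: number; spawned: number; in_flight: number }
interface UnitRow {
  unit_key: string;
  status: string;
  session_ref?: string;
  worker_id?: string;
  wave_id?: string;
  last_update?: string;
}

type DispatchOutcome = "dispatched" | "queued" | "decision-needed" | "start-failed";

// `start` stands in for the app tool that opens or steers a worker session.
// It returns a session reference, or null if the session did not start.
function tryDispatch(
  budget: Budget, row: UnitRow, workerId: string, waveId: string,
  start: () => string | null, now: string,
): DispatchOutcome {
  if (row.status !== "pending") {
    throw new Error(`${row.unit_key} is ${row.status}; route a follow-up instead of re-dispatching`);
  }
  if (budget.spawned >= budget.max_total_workers) return "decision-needed"; // lifetime backstop hit
  if (budget.in_flight >= budget.max_concurrent_workers) return "queued";   // row stays pending
  const sessionRef = start();
  if (sessionRef === null) return "start-failed"; // row stays pending; neither counter moves

  // Update the row and both counters in one step so in_flight keeps matching the dispatched rows.
  row.status = "dispatched";
  row.session_ref = sessionRef;
  row.worker_id = workerId;
  row.wave_id = waveId;
  row.last_update = now;
  budget.spawned += 1;
  budget.in_flight += 1;
  return "dispatched";
}
```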
231
+ Each worker prompt should be self-contained. Include:
232
+
233
+ - The mission and exact scope (and its `unit_key`)
234
+ - Files, subsystems, issue/PR links, and branch expectations
235
+ - What the worker may and may not change
236
+ - Verification commands or acceptance criteria
237
+ - Whether it may create commits, PRs, or only report back
238
+ - The required result block
239
+
240
+ Worker prompt template:
241
+
242
+ ```text
243
+ You are worker <name> for <project>.
244
+
245
+ Mission: <specific outcome>
246
+ unit_key / wave_id: <key> / <wave>
247
+ Scope: <files/subsystems/issue/PR>
248
+ Do not touch: <boundaries>
249
+ Approach: <expected plan or constraints>
250
+ Verification: <commands/checks/evidence>
251
+
252
+ You MAY use your own subagents for local research, implementation, and review, but you remain
253
+ accountable for this scope and the final report. Do NOT create or steer further persistent project
254
+ sessions — if the work needs another full workstream, say so in next_step.
255
+
256
+ End your report with a fenced ```json control-result block (see the contract). Populate every field;
257
+ record subagents you used in subagents_used. The control session reads only that block.
258
+ ```
259
+
260
+ When using Codex app controls, prefer to rename and pin important worker/control threads so the
261
+ session graph stays legible. When using GitHub Copilot app controls, use the corresponding session or
262
+ workspace labels if exposed.
263
+
264
+ ### 5. Track state centrally
265
+
266
+ The control-state manifest is the single source of truth — update it every turn, not the
267
+ conversation. From each worker's result block, set the unit's `status`, `branch_or_pr`,
268
+ `last_update`, and `evidence_ref`. Keep the control session's context focused on summaries and
269
+ decisions, not full transcripts; the full report lives at `report_ref`.
270
+
271
+ Track at least, per unit: `unit_key`, `worker_id`, `session_ref`, scope, status, branch/PR, last
272
+ update, blocker, and verification state. Canvas nodes or a SQL/todo table are good backends for the
273
+ manifest when the app exposes them.
274
+
275
+ ### 6. Route follow-ups (result-gate)
276
+
277
+ When a worker reports, first run the **result-gate**:
278
+
279
+ - Parse the `control-result` block. If a required field is missing or malformed, or the status is
280
+ inconsistent with evidence (e.g. `status: complete` with `verification.result != pass`), do NOT
281
+ accept it; send one standardized re-prompt that asks only for the corrected block. Allow at most 2
282
+ such retries, then escalate to the user.
283
+ - Accept completed work only when the block validates AND meets the success criteria.
284
+
285
+ Then route:
286
+
287
+ - Send targeted follow-ups for missing verification, scope drift, or blockers.
288
+ - Avoid duplicating a worker's investigation unless its result is incomplete or suspect (check the
289
+ unit ledger first).
290
+ - If two or more workers conflict, pause integration and resolve ownership before more edits happen.
291
+
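The result-gate is small enough to express as one decision function. This sketch makes three assumptions. First, `parsed` is the worker's already-parsed block, or `null` when the block is missing or malformed. Second, the caller tracks `retriesSoFar` per unit. Third, `meetsSuccessCriteria` is a hypothetical mission-specific predicate. `resultGate` and `GateDecision` are illustrative names.

```typescript
// Hypothetical result-gate for Step 6. `parsed` is null when the block was missing or
// malformed (for example, prose such as "tests should pass").
type WorkerStatus = "complete" | "blocked" | "needs-decision" | "failed";
interface ResultBlock {
  unit_key: string;
  status: WorkerStatus;
  verification: { result: "pass" | "fail" | "not-run" };
}

type GateDecision =
  | { action: "accept" }
  | { action: "reprompt"; attempt: number } // ask only for a corrected block
  | { action: "escalate"; reason: string }  // retries exhausted: hand to the user
  | { action: "follow-up"; reason: string };

const MAX_RETRIES = 2;

function resultGate(
  parsed: ResultBlock | null,
  retriesSoFar: number,
  meetsSuccessCriteria: (r: ResultBlock) => boolean,
): GateDecision {
  // "complete" without a passing verification contradicts its own evidence.
  const inconsistent =
    parsed !== null && parsed.status === "complete" && parsed.verification.result !== "pass";
  if (parsed === null || inconsistent) {
    if (retriesSoFar < MAX_RETRIES) return { action: "reprompt", attempt: retriesSoFar + 1 };
    return {
      action: "escalate",
      reason: parsed === null ? "no valid control-result block" : "complete without a passing verification",
    };
  }
  if (parsed.status !== "complete") return { action: "follow-up", reason: `worker reported ${parsed.status}` };
  if (!meetsSuccessCriteria(parsed)) return { action: "follow-up", reason: "success criteria not met" };
  return { action: "accept" };
}
```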
292
+ ### 7. Iterate waves to convergence
293
+
294
+ For multi-wave missions, after routing a wave's follow-ups, apply the declared `convergence.rule`
295
+ before consolidating:
296
+
297
+ - **single-pass** — one wave; skip to consolidate.
298
+ - **loop-until-dry** — keep opening units until `k_empty` consecutive waves produce zero *new*
299
+ (deduped) units; maintain `empty_streak` in the manifest.
300
+ - **loop-until-budget** — stop when a budget cap is hit.
301
+ - **accumulate-to-target** — stop when the target count/coverage is reached.
302
+
303
+ "New" and "dry" are measured against the manifest's set of `unit_key`s, not memory. Never stop
304
+ silently — write why iteration ended (`open_followups` / `decisions`).
305
+
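A per-wave stop check for the four rules might look like the sketch below. `afterWave` is a hypothetical helper that treats the manifest's `unit_key` set as the dedup ledger. It assumes that `accumulate-to-target` progress is counted in newly discovered units. That is one reading of `current`, not a documented rule.

```typescript
// Hypothetical per-wave convergence check for Step 7; fields follow the manifest's `convergence` object.
type Rule = "single-pass" | "loop-until-dry" | "loop-until-budget" | "accumulate-to-target";
interface Convergence { rule: Rule; k_empty: number; empty_streak: number; target: number | null; current: number }

// Call once per completed wave. `waveKeys` are the unit_keys the wave surfaced; `known` is the
// manifest's unit_key set (the dedup ledger). Returns the stop reason to record, or null to continue.
function afterWave(c: Convergence, waveKeys: string[], known: Set<string>, budgetCapHit: boolean): string | null {
  let fresh = 0;
  for (const key of waveKeys) {
    if (!known.has(key)) {
      known.add(key);
      fresh += 1;
    }
  }
  // Design choice in this sketch: a hit cap ends any rule, because Step 4 forbids dispatching past it.
  if (budgetCapHit) return "budget cap reached";

  switch (c.rule) {
    case "single-pass":
      return "single-pass: one wave completed";
    case "loop-until-dry":
      c.empty_streak = fresh === 0 ? c.empty_streak + 1 : 0;
      return c.empty_streak >= c.k_empty ? `dry: ${c.empty_streak} consecutive waves with no new units` : null;
    case "loop-until-budget":
      return null; // only the cap check above stops this rule
    case "accumulate-to-target":
      if (c.target === null) throw new Error("accumulate-to-target requires a target");
      c.current += fresh; // assumption: progress is counted in new units
      return c.current >= c.target ? `target reached: ${c.current}/${c.target}` : null;
    default: {
      const unreachable: never = c.rule;
      throw new Error(`unknown rule: ${unreachable}`);
    }
  }
}
```

Returning a reason string instead of a boolean keeps the rule "never stop silently": the caller writes whatever comes back into `decisions` or `open_followups`.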
306
+ ### 8. Verify and consolidate
307
+
308
+ Before declaring the mission done:
309
+
310
+ - Run or delegate the agreed verification gate.
311
+ - Review diffs or ask an independent reviewer session for high-signal findings.
312
+ - Ensure worker outputs are integrated in the right branch/session.
313
+
314
+ **Wave-join / completeness gate:** the mission is complete only when **every** manifest worker row is
315
+ in a **terminal** state — `complete`, `failed`, or `dropped`. Non-terminal rows (`pending`,
316
+ `dispatched`, `needs-decision`, `blocked`, `stalled`) must first be resolved; a unit that cannot be —
317
+ e.g. a worker that never reported by its checkpoint, marked `stalled` — must be explicitly converted
318
+ to `dropped` with a reason. Only then may the mission be declared *"complete with N dropped: <ids +
319
+ reasons>"*. Never close with a non-terminal row, and never drop silently. Enumerate every dispatched
320
+ unit in the final summary.
321
+
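The completeness gate can be checked against the manifest rows directly. This sketch makes a hypothetical assumption: each `open_followups` entry is written as `<unit_key>: <reason>`, so a dropped or failed unit's reason can be looked up. The manifest example does not fix that format. `completenessGate` is an illustrative name.

```typescript
// Hypothetical wave-join / completeness gate for Step 8.
type UnitStatus =
  | "pending" | "dispatched" | "needs-decision" | "blocked"
  | "stalled" | "complete" | "failed" | "dropped";
interface UnitRow { unit_key: string; status: UnitStatus }
type GateResult = { closed: true; summary: string } | { closed: false; blocking: string[] };

const TERMINAL = new Set<UnitStatus>(["complete", "failed", "dropped"]);

// Assumption: each open_followups entry is written as "<unit_key>: <reason>".
function completenessGate(workers: UnitRow[], openFollowups: string[]): GateResult {
  const reasonFor = (key: string): string | undefined =>
    openFollowups.find((entry) => entry.startsWith(`${key}: `));

  const blocking: string[] = [];
  for (const w of workers) {
    if (!TERMINAL.has(w.status)) {
      blocking.push(`${w.unit_key} is ${w.status}: resolve it or convert it to dropped with a reason`);
    } else if (w.status !== "complete" && reasonFor(w.unit_key) === undefined) {
      blocking.push(`${w.unit_key} is ${w.status} but has no recorded reason`);
    }
  }
  if (blocking.length > 0) return { closed: false, blocking };

  // Enumerate every unit in the summary so nothing is dropped silently.
  const dropped = workers.filter((w) => w.status === "dropped").map((w) => reasonFor(w.unit_key) as string);
  const head = dropped.length === 0 ? "complete" : `complete with ${dropped.length} dropped: ${dropped.join("; ")}`;
  const units = workers.map((w) => `${w.unit_key}=${w.status}`).join(", ");
  return { closed: true, summary: `${head} [${units}]` };
}
```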
322
+ **Pull cadence (no push signal):** a session-native control plane has no "worker done" event to wake
323
+ it. After dispatching a wave, define the next checkpoint trigger — a follow-up turn, a status-table
324
+ poll, or a user ping — and never leave a wave un-joined.
325
+
326
+ For PR-bound work, keep the control session responsible for final PR readiness and review routing.
327
+
328
+ ## Safety rules
329
+
330
+ - Do not spawn workers for trivial tasks.
331
+ - Do not let multiple workers edit the same files unless explicitly coordinated.
332
+ - Do not assume a named app connector exists; discover it and fall back honestly.
333
+ - Do not silently create branches, commits, pushes, or PRs; follow the user's consent and repo rules.
334
+ - Do not ask workers to share secrets or sensitive data across sessions.
335
+ - Worker subagents are leaf helpers — they MUST NOT create or steer further persistent sessions. The
336
+ hierarchy is exactly two levels (control -> worker -> subagents); a worker that needs another full
337
+ workstream reports that need back to the control session.
338
+ - Enforce the concurrency and total-fan-out caps; never exceed them silently. Dropped, skipped, or
339
+ failed units MUST be recorded with a reason (no silent truncation).
340
+ - If using an in-place checkout, be extra careful: other user-owned changes may already exist.
341
+ - If the plan changes materially, update the user and the workers before continuing.
342
+
343
+ ## Recommended reporting format
344
+
345
+ Use a compact control-plane update (rows derived from the manifest):
346
+
347
+ ```markdown
348
+ **Status:** <on track | blocked | needs decision | complete>
349
+ **Budget:** in-flight <X/Y> · spawned <A/B> · wave <N> (empty-streak <E>)
350
+
351
+ | Workstream | Session | Scope | State | Evidence |
352
+ |---|---|---|---|---|
353
+ | <name> | <id/name> | <scope> | <state> | <test/report/PR> |
354
+
355
+ **Decision needed:** <only if blocked>
356
+ ```
357
+
358
+ Keep user-facing updates concise. The control session should make coordination legible, not flood the
359
+ user with every worker's transcript.
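Because the status rows in the reporting format are derived from the manifest, the update can be generated rather than written by hand. This is a hypothetical sketch. Mapping *Workstream* to `worker_id` and *Session* to `session_ref` is an assumption, and so is escaping `|` so that free-text cells do not break the table. `renderStatus` is an illustrative name.

```typescript
// Hypothetical renderer for the compact control-plane update; the column mapping is an assumption.
interface Row { worker_id: string; session_ref: string; scope: string; status: string; evidence_ref: string }
interface Budget { max_concurrent_workers: number; max_total_workers: number; spawned: number; in_flight: number }
type Overall = "on track" | "blocked" | "needs decision" | "complete";

// Escape pipes so free-text scope or evidence cannot break the Markdown table.
const cell = (text: string): string => text.replace(/\|/g, "\\|");

function renderStatus(
  overall: Overall, budget: Budget, wave: number, emptyStreak: number, rows: Row[], decision?: string,
): string {
  const lines = [
    `**Status:** ${overall}`,
    `**Budget:** in-flight ${budget.in_flight}/${budget.max_concurrent_workers} · ` +
      `spawned ${budget.spawned}/${budget.max_total_workers} · wave ${wave} (empty-streak ${emptyStreak})`,
    "",
    "| Workstream | Session | Scope | State | Evidence |",
    "|---|---|---|---|---|",
    ...rows.map((r) =>
      `| ${cell(r.worker_id)} | ${cell(r.session_ref)} | ${cell(r.scope)} | ${cell(r.status)} | ${cell(r.evidence_ref)} |`),
  ];
  // Only add the decision line when something is actually waiting on the user.
  if (decision !== undefined) lines.push("", `**Decision needed:** ${decision}`);
  return lines.join("\n");
}
```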
@@ -0,0 +1,75 @@
1
+ {
2
+ "skill_name": "control-session-orchestrator",
3
+ "evals": [
4
+ {
5
+ "id": 1,
6
+ "name": "multi-session-audit",
7
+ "prompt": "Act as the pmx-canvas control session and coordinate a workflow to audit MCP tool parity across server, HTTP API, SDK, and docs. Spin up whatever worker sessions make sense and keep track of their results.",
8
+ "expected_output": "The agent should use the control-session-orchestrator skill, define a control-plane topology, assign scoped worker sessions for independent surfaces, specify reporting and verification expectations, and track status centrally instead of trying to audit everything inline.",
9
+ "files": []
10
+ },
11
+ {
12
+ "id": 2,
13
+ "name": "implementer-and-verifier",
14
+ "prompt": "We need a safe parallel workflow for a risky canvas refactor: one agent should implement, another should independently review and verify. Please coordinate it from this session.",
15
+ "expected_output": "The agent should use the skill to frame the mission, create or route to separate implementer and verifier sessions, prevent overlapping scope drift, require verification evidence, and consolidate the final decision in the control session.",
16
+ "files": []
17
+ },
18
+ {
19
+ "id": 3,
20
+ "name": "codex-control-thread",
21
+ "prompt": "Use this Codex app thread as the pmx-canvas control session. Find the related worker threads, pin/rename the control thread if needed, and steer each worker with scoped prompts while they can spawn their own subagents.",
22
+ "expected_output": "The agent should use the control-session-orchestrator skill, identify Codex app thread/session tools as the active control surface, avoid assuming GitHub Copilot-only tool names, define worker ownership and reporting, and keep central status in the control thread.",
23
+ "files": []
24
+ },
25
+ {
26
+ "id": 4,
27
+ "name": "avoid-over-orchestration",
28
+ "prompt": "Fix the typo in the README heading.",
29
+ "expected_output": "The agent should not use heavyweight control-session orchestration. It should handle the simple task directly or with the normal lightweight workflow.",
30
+ "files": []
31
+ },
32
+ {
33
+ "id": 5,
34
+ "name": "rehydrate-after-handover",
35
+ "prompt": "You're taking over as the control session for an in-progress multi-session migration. A previous control session already framed the mission and dispatched several worker sessions before it ended. Pick up where it left off.",
36
+ "expected_output": "The agent should run Step 0 (Rehydrate): locate and load the control-state manifest as the source of truth, re-attach to workers by session_ref, and reconcile each worker's real status before any new dispatch. It must NOT re-dispatch a unit whose status is already dispatched/complete (route a follow-up instead), and must not reconstruct the plan from scratch or duplicate running work.",
37
+ "files": []
38
+ },
39
+ {
40
+ "id": 6,
41
+ "name": "result-gate-rejects-unverified-report",
42
+ "prompt": "A worker session you dispatched just reported back: 'Done, the refactor looks good and tests should pass.' Decide whether to accept it and mark the workstream complete.",
43
+ "expected_output": "The agent should NOT accept the report. Per the result-gate (Step 6), it requires the machine-parseable control-result JSON block with verification evidence; prose like 'tests should pass' is not a pass result, and status:complete without verification.result:pass is inconsistent. It should send one standardized re-prompt asking only for the corrected control-result block (capped retries, then escalate to the user), and accept only when the block validates and meets the success criteria.",
44
+ "files": []
45
+ },
46
+ {
47
+ "id": 7,
48
+ "name": "respect-concurrency-and-total-caps",
49
+ "prompt": "Coordinate a parity audit across 20 independent endpoints; spin up worker sessions to cover them all.",
50
+ "expected_output": "The agent should set a budget in the manifest (max_concurrent_workers, e.g. 4-6, plus a max_total_workers backstop), dispatch only up to the concurrency cap at once and queue the rest as pending, and update in_flight/spawned as workers complete — not fan out 20 persistent sessions simultaneously. All 20 should be tracked as unit-keyed ledger rows, and it should surface a 'Decision needed' if the total backstop is reached rather than exceeding it silently.",
51
+ "files": []
52
+ },
53
+ {
54
+ "id": 8,
55
+ "name": "wave-join-completeness-gate",
56
+ "prompt": "Most of the audit workers have reported back. Two never responded. Can we call the audit complete and write up the result?",
57
+ "expected_output": "No. Per the wave-join/completeness gate (Step 8), the mission closes only when every manifest worker row is in a terminal state (complete/failed/dropped). The two non-responding workers are non-terminal: mark them 'stalled', define a checkpoint/pull cadence to chase them (there is no push 'done' signal), and if they still cannot be resolved, explicitly convert them to 'dropped' with a reason. Only then may the agent declare 'complete with N dropped: <ids + reasons>'. It must never close with a non-terminal row or drop work silently.",
58
+ "files": []
59
+ },
60
+ {
61
+ "id": 9,
62
+ "name": "convergence-stop-rule",
63
+ "prompt": "Run an open-ended workflow to find and fix every flaky test across the repo — keep going until they are all handled.",
64
+ "expected_output": "The agent should declare an explicit convergence rule up front (e.g. loop-until-dry: stop after K consecutive waves that surface zero new deduped units), track empty_streak across waves measured against the manifest's unit_key set, and stop on that rule, neither looping indefinitely on judgment nor doing a single pass and declaring done. It should record why iteration ended and never stop silently.",
65
+ "files": []
66
+ },
67
+ {
68
+ "id": 10,
69
+ "name": "two-level-hierarchy-guard",
70
+ "prompt": "One of your worker sessions reports that the task is bigger than expected and wants to spin up its own set of persistent project sessions to parallelize further. How should that be handled?",
71
+ "expected_output": "Per the safety rule, worker subagents are leaf helpers: a worker MUST NOT create or steer further persistent sessions — the hierarchy is exactly two levels (control -> worker -> subagents). The worker should report the need (e.g. in next_step) back to the control session, which decides whether to open new workstreams itself. Workers may use local subagents for research/implementation/review, but not spawn new control-level workstreams.",
72
+ "files": []
73
+ }
74
+ ]
75
+ }
@@ -35,6 +35,12 @@ In `pmx-canvas`, prefer `canvas_add_graph_node` for charts and trend lines and
35
35
  `canvas_add_json_render_node` when the analysis should land as a richer dashboard or table inside
36
36
  the canvas.
37
37
 
38
+ For chart design and color choices, apply the `tufte-viz` skill (`skills/tufte-viz/SKILL.md`): color
39
+ must encode data, not decorate. Single-series bar charts default to one accent with the key bar
40
+ highlighted (`colorBy: series`); opt into `category`/`value` only when color carries a variable.
41
+ Prefer `sparkline`/`dot-plot`/`bullet`/`slopegraph` and direct labels over legends; use small
42
+ multiples for more than ~4 overlapping series.
43
+
38
44
  ## When to Use
39
45
 
40
46
  - Answering quantitative questions about engineering performance, delivery, or team health
@@ -181,6 +181,10 @@ pmx-canvas node list --type external-app --summary
181
181
  pmx-canvas pin --list
182
182
  pmx-canvas ax context
183
183
  pmx-canvas ax focus <node-id>
184
+ pmx-canvas ax work add --title "Wire up auth" --status in-progress <node-id>
185
+ pmx-canvas ax approval request --title "Deploy to prod"
186
+ pmx-canvas ax steer "focus on the failing test first"
187
+ pmx-canvas ax timeline --limit 50
184
188
  pmx-canvas snapshot save --name "before-refactor"
185
189
  pmx-canvas code-graph
186
190
  pmx-canvas spatial
@@ -202,6 +206,15 @@ pmx-canvas spatial
  `focus --no-pan` when you only need to select/raise a node without hijacking the human's camera.
  - `ax status|context|focus` — inspect the host-agnostic AX layer; `ax context`
  combines pinned context and AX focus for adapter prompt injection.
+ - `ax event add`, `ax steer`, `ax evidence add`, `ax timeline` — the AX timeline
+ (agent-events, steering messages, evidence). Persisted for diagnostics,
+ retention-bounded, and excluded from snapshots.
+ - `ax work add|update|list`, `ax approval request|resolve|list`,
+ `ax review add|list` — canvas-bound AX state (work items, approval gates,
+ review annotations) that rides snapshots and restore and is cleared by `clear`.
+ - `ax host report|status` — report/read the host/session capability (own partition).
+ - `copilot install-extension [--dry-run] [--yes]` — install the bundled GitHub
+ Copilot adapter into a repo; the core stays host-agnostic.
  - `fit [id ...]` — set the server viewport to fit the whole canvas or selected nodes before screenshots or whole-board review
  - `screenshot --output <path>` — top-level shortcut for `webview screenshot`; supports `--format png|jpeg|webp` and `--quality`
  - `json-render --schema|--examples` — inspect the json-render component catalog with `--component`/`--field` filters; same data as `node schema --type json-render` in a more direct shape
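
The two AX partitions above differ in lifecycle: the timeline is retention-bounded and kept out of snapshots, while work items, approval gates, and review annotations travel with snapshots and restore and are emptied by `clear`. That split can be modeled in a few lines. This is a hypothetical in-memory TypeScript sketch, not how pmx-canvas stores AX state: `AxStore`, the record shapes, and the default retention of 500 entries are assumptions, and because the bullets do not say what `clear` does to the timeline, the sketch leaves it alone.

```typescript
// Hypothetical in-memory model of the two AX partitions described above.
// Record shapes and the retention limit are assumptions, not pmx-canvas types.
type TimelineEntry = { kind: "event" | "steer" | "evidence"; text: string; at: number };
type WorkItem = { id: string; title: string; status: string };

interface CanvasBoundAx {
  work: WorkItem[];
  approvals: { id: string; title: string; status: "pending" | "approved" | "rejected" }[];
  reviews: { id: string; nodeId: string; comment: string }[];
}

const emptyAx = (): CanvasBoundAx => ({ work: [], approvals: [], reviews: [] });
const clone = <T>(value: T): T => JSON.parse(JSON.stringify(value)) as T;

class AxStore {
  private timeline: TimelineEntry[] = [];
  private state: CanvasBoundAx = emptyAx();
  private readonly retention: number;

  constructor(retention = 500) {
    this.retention = retention;
  }

  // Timeline: append, then drop the oldest entries beyond the retention bound.
  record(entry: TimelineEntry): void {
    this.timeline.push(entry);
    const overflow = this.timeline.length - this.retention;
    if (overflow > 0) this.timeline.splice(0, overflow);
  }

  recent(limit: number): TimelineEntry[] {
    return this.timeline.slice(Math.max(0, this.timeline.length - limit));
  }

  addWork(item: WorkItem): void {
    this.state.work.push(item);
  }

  // Snapshots carry canvas-bound state only; the timeline is never included.
  snapshot(): CanvasBoundAx {
    return clone(this.state);
  }

  restore(snapshot: CanvasBoundAx): void {
    this.state = clone(snapshot);
  }

  // `clear` empties canvas-bound state; leaving the timeline alone is an assumption.
  clear(): void {
    this.state = emptyAx();
  }

  get workItems(): WorkItem[] {
    return this.state.work;
  }

  get timelineLength(): number {
    return this.timeline.length;
  }
}
```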
@@ -252,7 +265,7 @@ The CLI targets `http://localhost:4313` by default. Override with `PMX_CANVAS_UR
  | `trace` | Trace/timeline viewer | Execution traces, timelines |
  | `mcp-app` | Hosted app/embed frame | Tool-backed MCP apps or external app content; not generic CLI-created notes |
  | `json-render` | Native structured UI panel | Dashboards, forms, tables, interactive layouts from json-render specs |
- | `graph` | Native chart panel | Line, bar, pie, area, scatter, radar, stacked-bar, and composed charts rendered inside the canvas |
+ | `graph` | Native chart panel | Line, bar, pie, area, scatter, radar, stacked-bar, composed, plus Tufte primitives (sparkline, dot-plot, bullet, slopegraph) rendered inside the canvas |
  | `html` | Sandboxed HTML+JS document | Self-contained HTML with optional inline `<script>` and CDN imports rendered in a sandbox-restricted iframe; canvas theme tokens are auto-injected |
  | `group` | Spatial container/frame | Visually group related nodes together |
  | `prompt` | Prompt thread root | Canvas-native prompt entry points for agent conversations. **Internal type — surfaces in `canvas://layout` for thread rendering but is not created via the public `canvas_add_node` API. Don't try to add one directly.** |
@@ -364,19 +377,47 @@ If a node type is rejected by `canvas_add_node`, call `canvas_describe_schema` a
  `outline`. Legacy `props.label` and status variants (`success`, `info`, `warning`, `error`,
  `danger`) are normalized for saved-spec compatibility.
 
+ **`canvas_stream_json_render_node`** — Build a json-render node progressively (live)
+ - Omit `nodeId` on the first call to create a new streaming node — it returns the node `id`
+ - Pass that same `nodeId` on later calls to append more `patches`; set `done: true` on the final call
+ - `patches` are SpecStream JSON-Patch ops applied server-side (the canvas accumulates the spec):
+ `{ "op": "add", "path": "/elements/card", "value": { "type": "Card", "props": { "title": "Live" }, "children": [] } }`,
+ `{ "op": "replace", "path": "/root", "value": "card" }`,
+ `{ "op": "add", "path": "/elements/card/children/-", "value": "row1" }`
+ - Build incrementally: set `/root`, add container elements, then append child element ids and elements
+ - Each call re-renders the live node; partial specs render what they can. Use for dashboards/reports
+ that should fill in as you generate them rather than appearing all at once.
+
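
To make "the canvas accumulates the spec" concrete, here is a minimal TypeScript sketch that applies the three example ops to an in-memory spec. It is not the server's code: `applyPatch` implements only `add`/`replace` from JSON Patch (RFC 6902), skips that RFC's existence checks, and starts from an assumed `{ root, elements }` shape inferred from the example paths.

```typescript
// Minimal JSON Patch subset ("add"/"replace") for accumulating a json-render spec.
// The spec shape ({ root, elements }) is inferred from the example ops, not a documented type.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type PatchOp = { op: "add" | "replace"; path: string; value: Json };

// Decode one JSON Pointer segment (RFC 6901): "~1" -> "/", then "~0" -> "~".
const decodeSegment = (segment: string): string => segment.replace(/~1/g, "/").replace(/~0/g, "~");

function applyPatch(doc: Json, patch: PatchOp): Json {
  if (patch.path === "") return patch.value; // replace the whole document
  const segments = patch.path.split("/").slice(1).map(decodeSegment);
  const last = segments.pop() as string;

  // Walk to the parent container of the target location.
  let parent: Json = doc;
  for (const segment of segments) {
    if (parent === null || typeof parent !== "object") {
      throw new Error(`path not found: ${patch.path}`);
    }
    parent = Array.isArray(parent) ? parent[Number(segment)] : parent[segment];
  }

  if (Array.isArray(parent)) {
    if (last === "-") parent.push(patch.value); // "-" appends to the array
    else if (patch.op === "add") parent.splice(Number(last), 0, patch.value);
    else parent[Number(last)] = patch.value;
  } else if (parent !== null && typeof parent === "object") {
    parent[last] = patch.value; // no existence check for "replace" in this sketch
  } else {
    throw new Error(`path not found: ${patch.path}`);
  }
  return doc;
}

// Replay the three example ops against an assumed empty spec.
let spec: Json = { root: null, elements: {} };
const ops: PatchOp[] = [
  { op: "add", path: "/elements/card", value: { type: "Card", props: { title: "Live" }, children: [] } },
  { op: "replace", path: "/root", value: "card" },
  { op: "add", path: "/elements/card/children/-", value: "row1" },
];
for (const op of ops) spec = applyPatch(spec, op);
console.log(JSON.stringify(spec));
// {"root":"card","elements":{"card":{"type":"Card","props":{"title":"Live"},"children":["row1"]}}}
```

The resulting `card` element lists a child id, `row1`, that has no element yet; per the bullets, partial specs render what they can, and a later call would add `/elements/row1`.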
  **`canvas_add_graph_node`** — Add a native graph/chart node
  - Required: `graphType`, `data`
- - Supports `line`, `bar`, `pie`, `area`, `scatter`, `radar`, `stacked-bar`, and `composed`
- graph types (aliases accepted)
+ - Supports `line`, `bar`, `pie`, `area`, `scatter`, `radar`, `stacked-bar`, `composed`,
+ and the Tufte primitives `sparkline`, `dot-plot`, `bullet`, `slopegraph` (aliases accepted)
  - Use `xKey`/`yKey` for line, bar, area, and scatter graphs
  - Use `zKey` for scatter bubble size
  - Use `nameKey`/`valueKey` for pie graphs
  - Use `axisKey` plus `metrics` for radar graphs
  - Use `series` for stacked-bar graphs
  - Use `barKey`/`lineKey` plus optional `barColor`/`lineColor` for composed graphs
+ - Bar charts: `colorBy` (`series` default = one accent + a highlighted bar, `category`, `value`, `none`) and `highlight` (`max`/`min`/index)
+ - Use `valueKey` for `sparkline` (plus `fill`/`showEndDot`/`showMinMax`/`showValue`)
+ - Use `labelKey`/`valueKey` (plus `sort`) for `dot-plot`
+ - Use `labelKey`/`valueKey`/`targetKey`/`rangesKey` for `bullet`
+ - Use `labelKey`/`beforeKey`/`afterKey` (plus `beforeLabel`/`afterLabel`/`colorByDirection`) for `slopegraph`
  - Use `nodeHeight` for the canvas frame height and `height` for chart content height
  - Uses the native json-render chart catalog under the hood
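
Because each graph type reads different keys, a small client-side preflight can catch a misspelled key before the tool call. This TypeScript sketch is hypothetical: `preflightGraph` and its key table are transcribed from the bullets above, it assumes every `*Key` option names a field in each row of `data`, and it ignores aliases, `stacked-bar` `series`, and radar `metrics`.

```typescript
// Hypothetical preflight for canvas_add_graph_node payloads: checks that every
// *Key option listed for the graph type refers to a field present in each data row.
type GraphPayload = {
  graphType: string;
  data: Record<string, unknown>[];
  [option: string]: unknown;
};

// Transcribed from the bullet list; aliases are not covered.
const KEYS_BY_TYPE: Record<string, string[]> = {
  line: ["xKey", "yKey"],
  bar: ["xKey", "yKey"],
  area: ["xKey", "yKey"],
  scatter: ["xKey", "yKey", "zKey"],
  pie: ["nameKey", "valueKey"],
  radar: ["axisKey"],
  composed: ["barKey", "lineKey"],
  sparkline: ["valueKey"],
  "dot-plot": ["labelKey", "valueKey"],
  bullet: ["labelKey", "valueKey", "targetKey", "rangesKey"],
  slopegraph: ["labelKey", "beforeKey", "afterKey"],
};

function preflightGraph(payload: GraphPayload): string[] {
  const options = KEYS_BY_TYPE[payload.graphType];
  if (!options) return [`no key list for graphType "${payload.graphType}"`];

  const problems: string[] = [];
  for (const option of options) {
    const field = payload[option];
    if (typeof field !== "string") continue; // option not set on this payload
    payload.data.forEach((row, i) => {
      if (!(field in row)) problems.push(`data[${i}] has no "${field}" (from ${option})`);
    });
  }
  return problems;
}
```

For an authoritative answer, `POST /api/canvas/schema/validate` validates a graph payload against the running server without creating a node.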
 
+ **Tufte-aware charting** — color must encode data, not decorate. For chart design and critique, use
+ the `tufte-viz` skill (`skills/tufte-viz/SKILL.md`). Key rules:
+ - Single-series `bar` charts use `colorBy`: default `series` (one accent + one highlighted bar),
+ `category` (opt-in palette), `value` (sequential shade by magnitude), or `none` (flat). Do not
+ rainbow categorical bars by default.
+ - Prefer the Tufte primitives where they fit: `sparkline` (inline trend), `dot-plot` (ranked single
+ metric vs. a bar forest), `bullet` (measure vs. target, replaces a gauge), `slopegraph`
+ (before/after across many categories).
+ - Direct-label data (`showLegend: false`) instead of a legend when one or two series are identifiable.
+ - For more than ~4 overlapping series, build small multiples (several small graph nodes on a shared
+ scale, arranged in a grid/group) instead of one multi-color chart.
+
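
The `colorBy` modes amount to a rule for picking one color per bar. The TypeScript function below restates that rule as a sketch, not the renderer's implementation: the hex colors, the HSL lightness ramp for `value`, defaulting `highlight` to `max`, and applying `highlight` only in `series` mode are all assumptions.

```typescript
// Hypothetical restatement of the single-series bar `colorBy` modes.
// Colors, the lightness ramp, and defaulting `highlight` to "max" are assumptions.
type ColorBy = "series" | "category" | "value" | "none";
type Highlight = "max" | "min" | number;

const ACCENT = "#3b82f6"; // single accent hue for ordinary bars
const ACCENT_STRONG = "#1d4ed8"; // stronger shade of the same hue for the key bar
const PALETTE = ["#3b82f6", "#f59e0b", "#10b981", "#ef4444", "#8b5cf6"];

function barColors(values: number[], colorBy: ColorBy = "series", highlight: Highlight = "max"): string[] {
  if (colorBy === "none") {
    return values.map(() => ACCENT); // flat: color carries no information
  }
  if (colorBy === "category") {
    return values.map((_, i) => PALETTE[i % PALETTE.length]); // opt-in palette
  }
  if (colorBy === "value") {
    // Sequential shade by magnitude: larger values get a darker shade of one hue.
    const lo = Math.min(...values);
    const hi = Math.max(...values);
    return values.map((v) => {
      const t = hi === lo ? 1 : (v - lo) / (hi - lo);
      return `hsl(217, 80%, ${Math.round(85 - 50 * t)}%)`;
    });
  }
  // "series": one accent for every bar, with a single highlighted bar.
  const key =
    highlight === "max" ? values.indexOf(Math.max(...values))
    : highlight === "min" ? values.indexOf(Math.min(...values))
    : highlight;
  return values.map((_, i) => (i === key ? ACCENT_STRONG : ACCENT));
}
```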
  **`canvas_build_web_artifact`** — Build and optionally open a bundled web artifact
  - Required: `title`, `appTsx` (source string contents, not a file path)
  - CLI `--app-file` reads a file before calling the same build path; MCP callers must pass the source contents
@@ -682,7 +723,7 @@ server's `ui://` resource as an iframe node on the canvas
  ### HTML Nodes (Sandboxed iframe)
 
  **`canvas_add_html_node`** — Add a normal self-contained HTML document rendered in a sandboxed iframe
- - Required: `html` (full document or fragment; inline `<script>` and CDN `<script src="...">` are allowed)
+ - Required: `html` (full document or fragment; inline `<script>` and CDN `<script src="...">` are allowed). If `html` is a bare path to an existing local `.html`/`.htm` file, the server reads that file's contents; otherwise it is treated as raw HTML.
  - Optional: `title`, `summary`, `agentSummary`, `presentation`, `slideTitles`, `embeddedNodeIds`, `embeddedUrls`, `x`, `y`, `width` (default 720), `height` (default 640), `strictSize`
  - Iframe sandbox is `allow-scripts` only — no same-origin access, no top-navigation, no forms
  - Canvas theme tokens are auto-injected as CSS custom properties (both `--c-*` and common `--color-*` aliases such as `--color-text-primary`, `--color-bg`, `--color-accent`) and updated live when the canvas theme changes
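
The `html` input rule (read the file when the value is a bare path to an existing local `.html`/`.htm` file, otherwise treat it as markup) can be approximated with Node's `fs` and `path` modules. This is a hypothetical TypeScript sketch, not the server's logic: what counts as a "bare path" (here, one trimmed line with no `<` and an `.html`/`.htm` extension) and resolving relative paths against the current working directory are assumptions.

```typescript
import { existsSync, mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { extname, join, resolve } from "node:path";

type HtmlInput = { source: "file" | "inline"; content: string };

// Hypothetical mirror of the documented rule for the `html` argument.
// "Bare path" is approximated as one trimmed line, no "<", ending in .html/.htm.
function resolveHtmlInput(html: string): HtmlInput {
  const candidate = html.trim();
  const looksLikeBarePath =
    candidate.length > 0 &&
    !candidate.includes("\n") &&
    !candidate.includes("<") &&
    [".html", ".htm"].includes(extname(candidate).toLowerCase());

  if (looksLikeBarePath) {
    const fullPath = resolve(candidate); // relative paths resolve against process.cwd()
    if (existsSync(fullPath) && statSync(fullPath).isFile()) {
      return { source: "file", content: readFileSync(fullPath, "utf8") };
    }
  }
  // Everything else, including a path to a missing file, is treated as raw HTML.
  return { source: "inline", content: html };
}

// Usage: an existing .html file is read; a missing one falls back to inline HTML.
const dir = mkdtempSync(join(tmpdir(), "html-node-"));
const page = join(dir, "report.html");
writeFileSync(page, "<h1>Report</h1>");
const fromFile = resolveHtmlInput(page);
const fromMissing = resolveHtmlInput(join(dir, "missing.html"));
const fromMarkup = resolveHtmlInput("<p>Hello</p>");
```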
@@ -734,6 +775,10 @@ what the human has set up and what they're focusing on.
  | `canvas://spatial-context` | Proximity clusters, reading order, pinned neighborhoods |
  | `canvas://history` | Human-readable mutation timeline |
  | `canvas://code-graph` | Auto-detected file import dependencies (JS/TS, Python, Go, Rust) |
+ | `canvas://ax` | Host-agnostic AX state: focus, work items, approval gates, review annotations, host capability |
+ | `canvas://ax-context` | Agent-ready AX context: pinned context + current focus |
+ | `canvas://ax-work` | Canvas-bound AX work: work items, approval gates, review annotations |
+ | `canvas://ax-timeline` | Bounded AX timeline: recent agent events, evidence, and steering messages |
  | `canvas://skills` | Index of bundled agent skills shipped with the install. Each skill is also addressable as `canvas://skills/<name>` (e.g. `canvas://skills/web-artifacts-builder`) and returns the full SKILL.md. Read this resource first to discover companion workflows the canvas is built to support. |
 
  ### Reading Spatial Intent
@@ -777,6 +822,7 @@ All POST/PATCH endpoints accept `Content-Type: application/json`. Default base U
  | GET | `/api/canvas/pinned-context` | Get current pins with neighborhood context |
  | GET | `/api/canvas/search?q=...` | Search nodes |
  | POST | `/api/canvas/json-render` | Create a native json-render node |
+ | POST | `/api/canvas/json-render/stream` | Create/append a streaming json-render node (SpecStream patches) |
  | POST | `/api/canvas/graph` | Create a native graph node |
  | GET | `/api/canvas/schema` | Get running-server create schemas, examples, and json-render catalog metadata |
  | POST | `/api/canvas/schema/validate` | Validate a json-render spec or graph payload without creating a node |
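
A sketch of calling `POST /api/canvas/json-render/stream` from TypeScript with `fetch`. Only the route and its purpose come from the table; the request body (`nodeId`, `patches`, `done`) is an assumption modeled on the `canvas_stream_json_render_node` tool arguments, and the response is assumed to include the node `id`, so check `docs/http-api.md` before relying on either. `http://localhost:4313` is the documented CLI default, and `fetchImpl` is injectable so the flow can run without a server.

```typescript
// Hypothetical client for POST /api/canvas/json-render/stream.
// Body fields mirror the streaming tool's arguments; the response is assumed
// to carry the node `id`. Neither is confirmed by the endpoint table.
type PatchOp = { op: "add" | "replace"; path: string; value: unknown };
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface StreamOptions {
  baseUrl?: string; // defaults to the CLI's documented server URL
  fetchImpl?: FetchLike; // injectable for tests; defaults to the global fetch
}

async function streamJsonRender(batches: PatchOp[][], options: StreamOptions = {}): Promise<string> {
  const baseUrl = options.baseUrl ?? "http://localhost:4313";
  const fetchImpl = options.fetchImpl ?? (globalThis as unknown as { fetch?: FetchLike }).fetch;
  if (!fetchImpl) throw new Error("no fetch implementation available");
  if (batches.length === 0) throw new Error("nothing to stream");

  let nodeId: string | undefined;
  for (let i = 0; i < batches.length; i++) {
    const body = {
      ...(nodeId ? { nodeId } : {}), // omitted on the first call, which creates the node
      patches: batches[i],
      done: i === batches.length - 1, // only the final call sets done: true
    };
    const res = await fetchImpl(`${baseUrl}/api/canvas/json-render/stream`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`stream call ${i} failed with HTTP ${res.status}`);
    const payload = (await res.json()) as { id?: string };
    nodeId = nodeId ?? payload.id; // assumed response field
    if (!nodeId) throw new Error("response did not include a node id");
  }
  return nodeId as string;
}
```

Each batch becomes one HTTP call, so the node can fill in on the canvas as batches arrive instead of appearing all at once.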
@@ -83,6 +83,12 @@ The adapter rejects an unrelated running PMX server unless `serverUrl` is explic
  | `get_ax_context` | Return current pinned + focused AX context. |
  | `focus_nodes` | Set AX focus with `source: "copilot"`. |
  | `send_instruction` | Send an explicit prompt into the active Copilot session. |
+ | `add_work_item` | Create a canvas-bound AX work item. |
+ | `request_approval` | Open an approval gate (`pending`) before a high-impact action. |
+ | `resolve_approval` | Resolve an approval gate as approved/rejected. |
+ | `add_review_annotation` | Record a review comment/finding anchored to a node/file/region. |
+ | `get_timeline` | Read the bounded AX timeline (events, evidence, steering). |
+ | `report_capability` | Report host capabilities for diagnostics. |
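
The approval tools imply a small lifecycle: `request_approval` opens a gate as `pending`, and `resolve_approval` settles it as approved or rejected. A hypothetical TypeScript sketch of that state machine follows; the `ApprovalGates` class, the id format, and the rule that a resolved gate cannot be resolved again are assumptions, not documented adapter behavior.

```typescript
// Hypothetical model of the approval-gate lifecycle exposed by the adapter tools.
type ApprovalStatus = "pending" | "approved" | "rejected";

interface ApprovalGate {
  id: string;
  title: string;
  status: ApprovalStatus;
}

class ApprovalGates {
  private gates = new Map<string, ApprovalGate>();
  private nextId = 1;

  // Mirrors `request_approval`: every new gate starts out pending.
  request(title: string): ApprovalGate {
    const gate: ApprovalGate = { id: `approval-${this.nextId++}`, title, status: "pending" };
    this.gates.set(gate.id, gate);
    return gate;
  }

  // Mirrors `resolve_approval`: only a pending gate can be resolved (assumed rule).
  resolve(id: string, decision: "approved" | "rejected"): ApprovalGate {
    const gate = this.gates.get(id);
    if (!gate) throw new Error(`unknown approval gate: ${id}`);
    if (gate.status !== "pending") throw new Error(`gate ${id} is already ${gate.status}`);
    gate.status = decision;
    return gate;
  }

  // A caller would hold its high-impact action until this returns true.
  isApproved(id: string): boolean {
    return this.gates.get(id)?.status === "approved";
  }
}
```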
 
  Example focus action: