npm - @zenuml/core - Versions diffs - 3.47.1 → 3.47.3 - Mend

@zenuml/core 3.47.1 → 3.47.3

Files changed (53)

package/.agents/skills/babysit-pr/SKILL.md +223 -0
package/.agents/skills/babysit-pr/agents/openai.yaml +7 -0
package/.agents/skills/dia-scoring/SKILL.md +139 -0
package/.agents/skills/dia-scoring/agents/openai.yaml +7 -0
package/.agents/skills/dia-scoring/references/selectors-and-keys.md +253 -0
package/.agents/skills/land-pr/SKILL.md +120 -0
package/.agents/skills/propagate-core-release/SKILL.md +205 -0
package/.agents/skills/propagate-core-release/agents/openai.yaml +7 -0
package/.agents/skills/propagate-core-release/references/downstreams.md +42 -0
package/.agents/skills/ship-branch/SKILL.md +105 -0
package/.agents/skills/submit-branch/SKILL.md +76 -0
package/.agents/skills/validate-branch/SKILL.md +72 -0
package/.claude/skills/emoji-eval/SKILL.md +187 -0
package/.claude/skills/propagate-core-release/SKILL.md +81 -76
package/.claude/skills/propagate-core-release/agents/openai.yaml +2 -2
package/.claude/skills/zenuml-ux-research/SKILL.md +183 -0
package/.claude/skills/zenuml-ux-research/references/assertion-catalog.md +261 -0
package/.claude/skills/zenuml-ux-research/references/best-practices-overview.md +56 -0
package/.claude/skills/zenuml-ux-research/references/report-template.md +89 -0
package/.claude/skills/zenuml-ux-research/references/scenarios/edit-message-label.md +37 -0
package/.claude/skills/zenuml-ux-research/references/scenarios/insert-message.md +36 -0
package/.claude/skills/zenuml-ux-research/references/scenarios/insert-participant.md +31 -0
package/.claude/skills/zenuml-ux-research/references/scenarios/rename-participant.md +33 -0
package/.claude/skills/zenuml-ux-research/references/scenarios/undo-insert.md +35 -0
package/AGENTS.md +1 -1
package/dist/stats.html +1 -1
package/dist/zenuml.esm.mjs +22732 -20169
package/dist/zenuml.js +590 -543
package/docs/superpowers/plans/2026-03-30-emoji-support.md +1220 -0
package/docs/superpowers/plans/2026-03-30-self-correcting-scoring.md +206 -0
package/docs/superpowers/plans/2026-04-15-keyboard-editing-on-diagram.md +1992 -0
package/docs/superpowers/plans/2026-04-15-zenuml-ux-research-skill.md +1452 -0
package/docs/ux-research/.gitkeep +0 -0
package/docs/ux-research/2026-04-15-rename-participant.md +156 -0
package/docs/ux-research/2026-04-18-insert-participant.md +151 -0
package/e2e/data/compare-cases.js +233 -0
package/e2e/fixtures/create-message.html +26 -0
package/e2e/fixtures/editable-label.html +1 -0
package/e2e/fixtures/empty-diagram.html +23 -0
package/e2e/fixtures/insert-participant.html +23 -0
package/e2e/fixtures/reorder-cross-fragment.html +31 -0
package/e2e/fixtures/reorder-fragment.html +29 -0
package/e2e/fixtures/reorder-message.html +27 -0
package/e2e/fixtures/type-switch.html +29 -0
package/e2e/tools/compare-case.html +16 -2
package/index.html +44 -0
package/package.json +3 -3
package/playwright.config.ts +1 -1
package/scripts/analyze-compare-case/collect-data.mjs +139 -16
package/scripts/analyze-compare-case/config.mjs +1 -1
package/scripts/analyze-compare-case/report.mjs +3 -0
package/scripts/analyze-compare-case/residual-scopes.mjs +23 -1
package/scripts/analyze-compare-case/scoring.mjs +1 -0

package/.claude/skills/emoji-eval/SKILL.md ADDED

@@ -0,0 +1,187 @@
+---
+name: emoji-eval
+description: Evaluate emoji rendering quality in ZenUML diagrams. Renders test cases in both DOM and SVG modes, takes screenshots, and scores emoji visibility, position, spacing, box fit, and decorator coexistence. Reports per-case scores with HTML-vs-SVG parity check. Use when testing emoji rendering, after emoji-related code changes, or when the user asks to evaluate/score emoji rendering.
+---
+# Emoji Rendering Evaluator
+Automatically score emoji rendering quality in ZenUML diagrams by rendering test cases in both DOM and SVG modes, taking screenshots, and evaluating what the agent sees.
+## Prerequisites
+- Dev server running on `http://localhost:8080` (`bun dev`)
+- Playwright MCP available for browser automation
+## Test Cases
+Run ALL of these cases unless the user specifies a subset:
+```javascript
+const EMOJI_TEST_CASES = {
+  "emoji-basic": "[rocket] Production\nA->Production.deploy()",
+  "emoji-multi": "[rocket] Production\n[lock] AuthService\n[fire] Cache\nProduction->AuthService.validate()\nAuthService->Cache.get()",
+  "emoji-with-type": "@Database [fire] HotDB\n@Actor [star] Admin\nAdmin->HotDB.query()",
+  "emoji-with-stereotype": '<<service>> [lock] Auth\n<<gateway>> [globe] API\nAPI->Auth.validate()',
+  "emoji-inline": "[rocket]User->[fire]Server.request()",
+  "emoji-async-message": "A->B: [check] validated\nB->C: [warning] review needed",
+  "emoji-comment": "// [eyes] review phase\nA->B.process()",
+  "emoji-colon-override": "[:red:] Alert\nA->Alert.trigger()",
+  "emoji-css-combo": "// [rocket, red] deploy note\nA->B.deploy()",
+  "emoji-complex": "@Database [fire] HotDB\n[rocket] Production\n<<service>> [lock] Auth\nProduction->Auth.validate(token)\n  Auth->HotDB.check(token)\n  Auth-->Production: [check] valid",
+};
+```
+## Procedure
+For each test case:
+### Step 1: Render in DOM mode
+1. Navigate to `http://localhost:8080`
+2. Set the code via CodeMirror:
+   ```javascript
+   page.evaluate((code) => {
+     const cm = document.querySelector('.CodeMirror');
+     cm.CodeMirror.setValue(code);
+   }, testCode);
+   ```
+3. Click the "DOM" button to switch to DOM view
+4. Wait 1 second for rendering to complete
+5. Take a screenshot of the diagram area — save as `emoji-eval-{caseName}-dom.png`
+### Step 2: Render in SVG mode
+1. Click the "SVG" button to switch to SVG view
+2. Wait 1 second for rendering to complete
+3. Take a screenshot of the diagram area — save as `emoji-eval-{caseName}-svg.png`
+### Step 3: Evaluate both screenshots
+Read each screenshot and score on the criteria below. Use your understanding of sequence diagrams to judge:
+- A participant header should be a box with the name inside
+- Emoji should appear inline to the LEFT of the participant name
+- @Type icons (actor, database) should appear ABOVE the name, in their own row
+- Stereotypes (`<<name>>`) should appear ABOVE the name, below the icon
+- Messages should be horizontal arrows with labels
+- Comments should be italicized text above messages
+## Scoring Criteria
+Score each criterion 0-3:
+### 1. Emoji Visibility (per participant with emoji)
+- **0**: Emoji not visible, blank box, or tofu character
+- **1**: Something visible but wrong character or garbled
+- **2**: Correct emoji visible but poor contrast or very small
+- **3**: Correct emoji clearly visible
+### 2. Emoji Position (per participant with emoji)
+- **0**: Emoji in wrong location (after name, outside box, overlapping other elements)
+- **1**: Before name but overlapping the name text
+- **2**: Correct position with minor vertical misalignment
+- **3**: Perfectly aligned inline before the name
+### 3. Spacing (per participant with emoji)
+- **0**: Emoji and name overlap or no gap
+- **1**: Too tight (characters touching) or too wide (looks disconnected)
+- **2**: Acceptable gap, slightly off
+- **3**: Natural, comfortable spacing
+### 4. Box Fit (per participant with emoji)
+- **0**: Emoji or name overflows the participant box boundary
+- **1**: Box boundary clips the emoji or text
+- **2**: Box fits but looks cramped
+- **3**: Box comfortably accommodates emoji + name with padding
+### 5. Decorator Coexistence (only for cases with @Type or stereotype)
+- **0**: @Type icon or stereotype is missing or broken
+- **1**: Both present but overlapping or misaligned
+- **2**: Both present, minor layout issues
+- **3**: Perfect layout — icon above, stereotype below icon, emoji inline with name
+### 6. Message/Comment Emoji (only for cases with emoji in messages or comments)
+- **0**: Emoji not visible in message/comment text
+- **1**: Emoji visible but breaks the message layout
+- **2**: Emoji visible, minor alignment issues
+- **3**: Emoji renders naturally inline with message/comment text
+## Parity Check
+For each criterion scored in both DOM and SVG:
+- **Match**: Both scores are equal → mark as `=`
+- **Close**: Scores differ by 1 → mark as `~`
+- **Divergent**: Scores differ by 2+ → mark as `!=` (flag for investigation)
+## Output Format
+Present results as a markdown report:
+```markdown
+# Emoji Rendering Evaluation Report
+**Date:** YYYY-MM-DD
+**Branch:** {current git branch}
+**Total cases:** {N}
+## Summary
+| Case | DOM Score | SVG Score | Parity | Status |
+|------|----------|-----------|--------|--------|
+| emoji-basic | 12/12 | 10/12 | ~ | PASS |
+| emoji-multi | 12/12 | 11/12 | ~ | PASS |
+| ... | ... | ... | ... | ... |
+**Overall DOM:** {total}/{max} ({percentage}%)
+**Overall SVG:** {total}/{max} ({percentage}%)
+**Parity divergences:** {count}
+## Detailed Results
+### emoji-basic
+**DSL:**
+\`\`\`
+[rocket] Production
+A->Production.deploy()
+\`\`\`
+**DOM render:**
+[screenshot: emoji-eval-emoji-basic-dom.png]
+| Criterion | Score | Notes |
+|-----------|-------|-------|
+| Emoji visibility | 3 | Rocket emoji clearly visible |
+| Emoji position | 3 | Correctly before "Production" |
+| Spacing | 3 | Natural gap |
+| Box fit | 3 | Box fits comfortably |
+| **Total** | **12/12** | |
+**SVG render:**
+[screenshot: emoji-eval-emoji-basic-svg.png]
+| Criterion | Score | Notes |
+|-----------|-------|-------|
+| Emoji visibility | 3 | Rocket emoji visible |
+| Emoji position | 2 | Slightly tighter than DOM |
+| Spacing | 2 | Tighter spacing than DOM |
+| Box fit | 3 | Box fits |
+| **Total** | **10/12** | |
+**Parity:** Spacing is tighter in SVG (~ close)
+---
+(repeat for each case)
+```
+## Pass/Fail Thresholds
+- **PASS**: All criteria >= 2, total >= 75%
+- **WARN**: Any criterion at 1, or total 50-75%
+- **FAIL**: Any criterion at 0, or total < 50%
+## When to use this skill
+- After emoji-related code changes
+- Before creating a PR that touches emoji rendering
+- When the user asks to "evaluate emoji", "score emoji rendering", "check emoji quality"
+- When debugging emoji visual issues
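
The parity check and pass/fail thresholds in the emoji-eval skill above are written for an agent to apply by eye. As a rough illustration only, they could be sketched in code as follows. The function names `parityMark` and `caseStatus` and the array-of-scores input are hypothetical and do not come from the package. The thresholds overlap at exactly 75%, so this sketch assumes FAIL is checked first, then WARN, and that a total of exactly 75% with no criterion below 2 counts as PASS.

```javascript
// Hypothetical helpers: names and input shapes are illustrative, not part of @zenuml/core.

// Parity marker for one criterion scored in both DOM and SVG (each score is 0-3).
function parityMark(domScore, svgScore) {
  const diff = Math.abs(domScore - svgScore);
  if (diff === 0) return "=";  // Match
  if (diff === 1) return "~";  // Close
  return "!=";                 // Divergent: flag for investigation
}

// Status for one render, given a non-empty array of per-criterion scores (each 0-3).
// Assumption: FAIL takes precedence over WARN, and exactly 75% is treated as PASS.
function caseStatus(scores) {
  const total = scores.reduce((sum, score) => sum + score, 0);
  const ratio = total / (scores.length * 3);
  const lowest = Math.min(...scores);
  if (lowest === 0 || ratio < 0.5) return "FAIL";
  if (lowest === 1 || ratio < 0.75) return "WARN";
  return "PASS";
}

// Scores taken from the emoji-basic example in the report template.
const domScores = [3, 3, 3, 3];
const svgScores = [3, 2, 2, 3];
console.log(domScores.map((d, i) => parityMark(d, svgScores[i])).join(" ")); // = ~ ~ =
console.log(caseStatus(svgScores)); // PASS
```

Note that under these rules a render scoring 2 on every criterion totals about 67% and therefore lands in WARN rather than PASS, even though no single criterion is below 2.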

package/.claude/skills/propagate-core-release/SKILL.md CHANGED

@@ -1,11 +1,11 @@
 ---
 name: propagate-core-release
-description: Propagate a published `@zenuml/core` release to downstream projects by updating each consumer on its own branch and opening or reusing draft PRs. Use when the user says "push core to downstreams", "update downstream projects", "propagate release", "open downstream PRs", "submit downstream drafts", or wants the newly published zenuml/core version rolled out across mermaid, mermaid live editor, web-sequence, the IntelliJ plugin, confluence-plugin-cloud, and diagramly.ai.
+description: Propagate a published `@zenuml/core` release by opening or reusing per-repo downstream issues with explicit rollout instructions. Use when the user says "push core to downstreams", "update downstream projects", "propagate release", "open downstream issues", "file rollout issues", or wants the newly published zenuml/core version handed off across mermaid, mermaid live editor, web-sequence, the IntelliJ plugin, confluence-plugin-cloud, and diagramly.ai.
 ---
 # Propagate Core Release
-Update downstream consumers after `@zenuml/core` has already been published. This skill creates or reuses per-repo update branches and draft PRs, but does not merge anything.
+Coordinate downstream consumers after `@zenuml/core` has already been published. This skill creates or reuses per-repo GitHub issues with clear implementation instructions for each downstream team. It does not edit downstream repos or open PRs on their behalf.
 ## Scope
@@ -14,17 +14,17 @@ This skill is for the post-publish propagation step only.
 It should:
 1. identify the published `@zenuml/core` version to roll out
-2. update each downstream repo to that version
-3. create or reuse a branch in each downstream repo
-4. push the branch
-5. create or reuse a **draft** PR
-6. summarize which repos succeeded, failed, or were skipped
+2. inspect each downstream repo's update conventions from [references/downstreams.md](references/downstreams.md)
+3. create or reuse one downstream issue per repo for that version
+4. include explicit repo-specific instructions in each issue body
+5. summarize which repos succeeded, failed, or were skipped
 It should not:
 - publish `@zenuml/core`
-- merge downstream PRs
-- auto-fix unrelated downstream test failures beyond straightforward dependency-update fallout
+- update downstream code directly
+- create downstream branches or PRs
+- auto-fix unrelated downstream test failures or implementation details
 Renderer integration rule:
@@ -33,15 +33,14 @@ Renderer integration rule:
 ## Downstream Repos
-Read [references/downstreams.md](references/downstreams.md) before starting. It contains the canonical downstream repo list and repo slug assumptions.
+Read [references/downstreams.md](references/downstreams.md) before starting. It contains the canonical downstream repo list, repo slug assumptions, and repo-specific update commands that must be copied into the issue instructions.
 ## Preconditions
 Before starting:
 - confirm the target `@zenuml/core` version is already published
-- confirm `gh auth status` is healthy for all target orgs
-- confirm you have local checkout strategy for each downstream repo
+- confirm `gh auth status` is healthy for all target orgs and repos where issues will be filed
 - if the user did not specify the target version, discover the latest published one first
 If the published version is ambiguous, stop and ask.
@@ -51,46 +50,50 @@ If the published version is ambiguous, stop and ask.
 Treat each downstream repo as an independent unit of work.
 - Continue processing the remaining repos if one repo fails.
-- Keep a per-repo status ledger as you go: `updated`, `already-updated`, `draft-pr-open`, `blocked`, `failed`.
-- Prefer deterministic updates and small diffs.
-- Reuse existing update branches or draft PRs when they already target the same core version.
+- Keep a per-repo status ledger as you go: `issue-opened`, `issue-reused`, `already-tracked`, `blocked`, `failed`.
+- Prefer deterministic, reusable issue text.
+- Check for same-version issues before creating anything new.
+- Reuse an existing open issue when it already targets the same core version.
+- If the same version already has a closed issue, treat it as `already-tracked` and report it instead of opening a duplicate unless the user explicitly asks to reopen or replace it.
-## Branch Naming
+## Issue Rules
-Use a consistent branch name across downstream repos:
+Each downstream repo should get at most one open issue per core version.
-```text
-chore/zenuml-core-v<version>
-```
-Example:
-```text
-chore/zenuml-core-v1.2.3
-```
-## Draft PR Rules
-All PRs created by this skill must be draft PRs.
+Before creating a new issue, search that repo for issues matching the target version in the title or body. Prefer exact matches on `@zenuml/core v<version>`.
 Use a consistent title pattern:
 ```text
-chore: update @zenuml/core to v<version>
+chore: roll out @zenuml/core v<version>
 ```
-Use a concise body:
+Use a clear body with actionable instructions:
 ```markdown
 ## Summary
-- update `@zenuml/core` to `v<version>`
-## Notes
-- automated downstream propagation after core publish
-- draft PR for repo-specific verification
+- `@zenuml/core` `v<version>` has been published
+- this repo needs to adopt that release
+## Required Work
+1. Run: `<update-command>`
+2. Run: `<lockfile-refresh-command>` and include the lockfile in the PR when applicable
+3. Run: `<verify-command>` when applicable
+4. Keep the diff scoped to the core upgrade and any required integration fix
+5. Open a downstream PR that links back to this issue
+## Repo-Specific Notes
+- <repo-specific note 1>
+- <repo-specific note 2>
+## Acceptance Criteria
+- repo is updated to `@zenuml/core` `v<version>` or the equivalent vendored build output
+- lockfile is refreshed when the repo uses one
+- verification command passes locally, or failure details are documented in the PR
+- no unrelated dependency or renderer migrations are mixed into the change
 ```
-If a draft PR already exists for the same branch or same target version, reuse it and report it instead of creating a duplicate.
+If an issue already exists for the same target version, do not create a duplicate. Reuse the open one, or report the closed one as already tracked.
 ## Workflow
@@ -110,31 +113,31 @@ Record:
 For each repo in [references/downstreams.md](references/downstreams.md):
-1. Ensure you have a local checkout or clone target.
-2. Fetch latest default branch state.
-3. Create or reuse `chore/zenuml-core-v<version>`.
-4. Update the dependency or bundled artifact according to the repo's conventions.
-5. Inspect the diff and keep it scoped to the propagation work.
-6. Run lightweight repo-appropriate verification if it is cheap and obvious.
-7. Commit with:
-   ```text
-   chore: update @zenuml/core to v<version>
-   ```
-8. Push the branch.
-9. Create or reuse a **draft** PR.
+1. Read the repo row carefully and extract the update command, verification command, and notes.
+2. Search for existing issues in that repo for the same core version, checking both open and closed issues.
+3. If an open match exists, reuse it and record the URL.
+4. If only a closed match exists, record it as `already-tracked` and do not create a duplicate unless the user explicitly asked for that.
+5. Otherwise create a new issue using the standard title and a repo-specific body.
+6. Make sure the issue body includes:
+   - the target core version
+   - the exact update command from the table
+   - the lockfile refresh command when the repo uses pnpm or yarn
+   - the exact verify command when one is defined
+   - the renderer and API caveats from the repo notes
+   - an explicit instruction to open a PR linked to the issue after the work is complete
+   - a version marker that makes future deduplication easy, such as `Core version: v<version>`
 ### Step 3: Handle repo-specific blockers
 If a repo fails, capture exactly why:
+- missing issue creation permissions
+- existing issue search is ambiguous
+- existing closed issue should be reopened but the policy is unclear
 - dependency location unclear
-- package manager / lockfile conflict
-- update compiles locally but tests fail
-- missing permissions
-- repo missing locally and clone failed
-- PR creation failed
+- package manager or package filter is unclear
+- repo notes are insufficient to write a safe instruction
+- issue creation failed
 Do not let one repo failure stop the rest of the batch.
@@ -143,21 +146,23 @@ Do not let one repo failure stop the rest of the batch.
 At the end, produce a per-repo summary with:
 - repo
-- branch
-- PR URL or reused PR URL
+- issue URL or matched prior issue URL
 - final status
 - blocker if any
-## Repo Update Guidance
+## Repo Issue Guidance
-Each downstream has specific update and verification commands documented in [references/downstreams.md](references/downstreams.md). Follow the table exactly — do not guess package managers or update commands.
+Each downstream has specific update and verification commands documented in [references/downstreams.md](references/downstreams.md). Follow the table exactly when drafting instructions. Do not guess package managers, package filters, or update commands.
 For each repo:
-1. Run the **Update Command** from the table
-2. Run the **lockfile refresh** (`pnpm install` or `yarn install`) — always commit the updated lockfile
-3. Run the **Verify Command** from the table — if it fails, report the failure and move on
-4. Commit only the dependency change + lockfile — nothing else
+1. Include the **Update Command** from the table verbatim
+2. Include the lockfile refresh step:
+   - `pnpm install` for pnpm repos
+   - `yarn install` for yarn repos
+3. Include the **Verify Command** from the table verbatim when one exists
+4. Tell the downstream team to keep the change scoped to the core upgrade and any required integration fix
+5. Tell the downstream team to open a PR after verification and link it back to the issue
 Special handling for renderer API changes:
@@ -165,23 +170,24 @@ Special handling for renderer API changes:
 - `mermaid-js/mermaid-live-editor` is an indirect SVG-renderer consumer through `@mermaid-js/mermaid-zenuml`. Do not add `@zenuml/core` directly there just to follow a core release.
 - `web-sequence`, `confluence-plugin-cloud`, `diagramly.ai`, and similar downstreams stay on the HTML-renderer path unless the user explicitly asks for a renderer migration.
-Prefer the smallest change that updates the downstream safely:
+Prefer the smallest downstream task description that updates the repo safely:
 - package dependency bumps
 - lockfile refreshes
-- vendored asset refreshes only when the repo actually vendors core output (e.g., `jetbrains-zenuml`)
+- vendored asset refreshes only when the repo actually vendors core output, such as `jetbrains-zenuml`
-Do not opportunistically clean up unrelated code while touching the downstream repo.
+Do not ask downstream teams to opportunistically clean up unrelated code while doing the upgrade.
-If a downstream repo needs custom update logic that is not obvious from the table or its files, stop on that repo and report the ambiguity.
+If a downstream repo needs custom update logic that is not obvious from the table or its notes, stop on that repo and report the ambiguity instead of inventing instructions.
 ## Safety
+- Never update downstream repos directly from this skill.
 - Never merge downstream PRs from this skill.
-- Never force-push unless the user explicitly asks.
-- Never batch all downstream repos into one branch or one PR.
+- Never batch all downstream repos into one issue.
+- Never file duplicate issues for the same repo and core version.
 - Never hide per-repo failures behind a single "batch failed" message.
-- Never update unrelated dependencies in the same PR.
+- Never ask downstream teams to update unrelated dependencies in the same PR.
 ## Output
@@ -189,12 +195,11 @@ Final report format:
 ```markdown
 ## Downstream Propagation Report
-- Core version: v<version>
+- Core version: `v<version>`
 - Overall: <N> succeeded, <N> reused, <N> skipped, <N> failed
 ### Repo Results
-- `<repo>`: draft PR opened | draft PR reused | already updated | failed
-  branch: `<branch-name>`
-  pr: <url or none>
+- `<repo>`: issue opened | issue reused | already tracked | failed
+  issue: <url or none>
   notes: <short reason or blocker>
 ```
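
The deduplication rule introduced by this change (reuse an open same-version issue, report a closed one as `already-tracked`, otherwise create a new issue) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the `{ title, body, state, url }` issue shape, and the idea that issues for one repo have already been fetched with both open and closed states included. The skill itself does not prescribe an implementation.

```javascript
// Hypothetical sketch: function names and the issue shape are illustrative only.
// `existingIssues` is assumed to be [{ title, body, state, url }] for ONE repo.

// Escape a string so it can be embedded literally in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the issue title or body names exactly this core version.
// The lookahead rejects longer versions (v3.47.30, v3.47.3.1) when looking for v3.47.3.
function targetsVersion(issue, version) {
  const pattern = new RegExp(
    `(@zenuml/core v|Core version: v)${escapeRegExp(version)}(?!\\.?\\d)`
  );
  return pattern.test(`${issue.title}\n${issue.body ?? ""}`);
}

// Decide what to do for one repo, using the ledger statuses named in the skill.
function decideIssueAction(existingIssues, version) {
  const matches = existingIssues.filter((issue) => targetsVersion(issue, version));
  const open = matches.find((issue) => issue.state.toLowerCase() === "open");
  if (open) return { status: "issue-reused", url: open.url };
  if (matches.length > 0) return { status: "already-tracked", url: matches[0].url };
  // No match: the caller creates the issue, then records `issue-opened`.
  return { status: "create", title: `chore: roll out @zenuml/core v${version}` };
}

// Example with made-up issue data.
const issues = [
  { title: "chore: roll out @zenuml/core v3.47.1", body: "", state: "closed", url: "https://example.test/1" },
];
console.log(decideIssueAction(issues, "3.47.1").status); // already-tracked
console.log(decideIssueAction(issues, "3.47.3").status); // create
```

Matching on both the title pattern and the `Core version: v<version>` marker follows the skill's advice to include a version marker that makes future deduplication easy. The exact-version lookahead is a design choice of this sketch, not something the skill specifies.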

package/.claude/skills/propagate-core-release/agents/openai.yaml CHANGED

@@ -1,7 +1,7 @@
 interface:
   display_name: "Propagate Core Release"
-  short_description: "Open downstream draft PRs for a published core version"
-  default_prompt: "Use $propagate-core-release after @zenuml/core has been published to update the configured downstream repos on per-repo branches and open or reuse draft PRs. Do not merge the downstream PRs."
+  short_description: "Open downstream rollout issues for a published core version"
+  default_prompt: "Use $propagate-core-release after @zenuml/core has been published to open or reuse per-repo downstream issues with explicit rollout instructions. Do not update downstream repos directly and do not open PRs on their behalf."
 policy:
   allow_implicit_invocation: true

package/.claude/skills/zenuml-ux-research/SKILL.md ADDED

@@ -0,0 +1,183 @@
+---
+name: zenuml-ux-research
+description: Audit one ZenUML user interaction scenario at a time (e.g., inserting a message, renaming a participant) against diagramming-tool best practices. Uses claude-in-chrome to walk through the flow in a live browser and writes a gap-only markdown report to docs/ux-research/. Use when the user says "audit ux of", "zenuml ux research", "analyze interaction for zenuml", "run ux research on", or "/zenuml-ux-research". Produces a research report, not an audit pass/fail matrix.
+---
+# ZenUML UX Research
+This skill audits a single ZenUML interaction scenario against diagramming-tool best practices and writes a gap-only markdown report. It is a research tool, not an audit or regression tool. Run it interactively, read the report, and act on it by hand. Never wire it into CI.
+## When to invoke
+- User asks "audit ux of X", "zenuml ux research", "analyze interaction for X".
+- User runs `/zenuml-ux-research <scenario-id>` or `/zenuml-ux-research "free-text goal"`.
+- User asks for a specific gap analysis in the ZenUML editor experience.
+Do NOT invoke this skill for pixel-level comparison (that is `dia-scoring`), parser behavior, or build/deploy tasks.
+## Invocation parameters
+- **Scenario identifier (positional):** either a catalog ID like `insert-message` or a free-text goal like `"audit how users insert a message between A and B"`.
+- **`--url <url>` (optional):** target URL. Default `http://localhost:4000`. Can point to a deployed staging URL.
+- **`--allow-prod` (optional):** required for any URL that is not `localhost`, `127.0.0.1`, or a known staging subdomain. The skill is read-only against the target, but this flag forces the human to confirm they know they're pointing at a real-users environment.
+## Dependencies
+- `claude-in-chrome` MCP tools (for walkthrough). If these are not yet loaded in the session, the skill must instruct the user to load them via `ToolSearch` and stop; the walkthrough cannot run in text-only mode.
+- `ZenUML dev server or a reachable URL` (default `http://localhost:4000`).
+- `Read` / `Grep` tools (for static source analysis of `zenuml-core/src/`).
+- `WebSearch` / `WebFetch` tools (for targeted best-practice lookups; optional).
+## Files this skill uses at runtime
+- `references/scenarios/<scenario-id>.md` — loaded at Phase A.
+- `references/assertion-catalog.md` — loaded at Phase B.
+- `references/best-practices-overview.md` — loaded at Phase B for narrative framing.
+- `references/report-template.md` — loaded at Phase F.
+## Report output
+- Written to `zenuml-core/docs/ux-research/YYYY-MM-DD-<scenario-id>.md`.
+- Create the directory if it doesn't exist (`mkdir -p`).
+- On filename collision, append `-2`, `-3`, etc. Never overwrite.
+- Never commit the report automatically — the human decides.
+## Workflow
+### Phase A — Scenario resolution
+1. Determine whether the invocation is a catalog ID or free-text.
+2. **Catalog ID:**
+   - Check that `references/scenarios/<id>.md` exists.
+   - If not, list all available scenario filenames (glob `references/scenarios/*.md`) and stop.
+   - Load the file. Verify it has front matter with `id` and `title`, plus headings for `User intent`, `Starting DSL`, `Target DSL`, `Relevant assertion categories`. If any are missing, print which field is missing from which file and stop.
+3. **Free-text:**
+   - Synthesize a scenario record with the same fields (id, title, user intent, starting DSL, target DSL, relevant categories).
+   - Present the synthesized record to the user and wait for explicit confirmation.
+   - Do NOT proceed on an unconfirmed synthesized scenario.
+4. Check `--url` reachability with a quick HTTP GET.
+   - If unreachable and the URL is local: print the exact fix command (`cd /Users/penxia/ai-personal/zenuml-core && bun run dev`) and stop.
+   - If unreachable and the URL is remote: print the HTTP status and stop.
+   - If reachable and the URL is non-local but does NOT have `--allow-prod`: warn and stop. Non-local URLs that look like known staging patterns (e.g., contain `staging`, `preview`, `github.io` for the gh-pages build) may proceed with a one-line warning but no hard stop.
+5. Confirm `claude-in-chrome` tools are loaded. If not: instruct the user to load them via `ToolSearch` with query `"select:mcp__claude-in-chrome__tabs_context_mcp,mcp__claude-in-chrome__tabs_create_mcp,mcp__claude-in-chrome__navigate,mcp__claude-in-chrome__find,mcp__claude-in-chrome__computer,mcp__claude-in-chrome__read_page,mcp__claude-in-chrome__read_console_messages,mcp__claude-in-chrome__javascript_tool"` and stop.
+### Phase B — Hypothesis formation
+1. Read the scenario's User intent.
+2. Read `references/best-practices-overview.md` for narrative framing.
+3. Scan `references/assertion-catalog.md` for rules whose category is in the scenario's Relevant assertion categories list. Treat these as **priors** — starting points for what you expect to see — not as a checklist.
+4. Form a short list of expectations in working memory. Example for `rename-participant`: "I expect Enter on selected participant to enter edit mode (KBD-03). I expect caret at end (EDT-02). I expect Escape to cancel (KBD-04). I expect undo granularity to be at the label level (UND-02)."
+5. **Hypotheses are NOT limited to the catalog.** Form open-ended expectations based on general best practices and common sense. If the scenario suggests territory the catalog is silent on, run **1–3** targeted `WebSearch` queries (e.g., "how does tldraw handle arrow-key navigation between shapes"). Keep the budget tight.
+### Phase C — Browser walkthrough
+1. `mcp__claude-in-chrome__tabs_context_mcp` → get current tab state (do not reuse existing tabs from prior sessions).
+2. `mcp__claude-in-chrome__tabs_create_mcp` → open a new tab.
+3. `mcp__claude-in-chrome__navigate` → navigate to the `--url`.
+4. Wait for the page to load. `mcp__claude-in-chrome__read_console_messages` at each interaction to catch runtime errors.
+5. **Seed the starting state** by interacting with the DSL editor pane to type the scenario's Starting DSL. This is setup, not walkthrough — failures here are infrastructure errors.
+   - If Starting DSL is empty, no seeding is needed.
+   - If seeding itself fails (e.g., the DSL editor is unreachable), stop, report "could not seed starting state via DSL editor" as a walkthrough-blocker, and do NOT write a report. This is worse than a gap — it's a dead environment.
+6. **Attempt to reach Target DSL via the most discoverable path a new user would try first.** Record each step:
+   - What was attempted (e.g., "clicked canvas area to the right of participant B")
+   - What happened (e.g., "no visible change; console warning: `[zenuml] unknown click target`")
+   - Whether it advanced toward Target DSL
+7. If the first path fails or hits friction, try 1–2 alternative paths (toolbar, keyboard shortcut, DSL edit). Record each.
+8. **Capture screenshots only at decision moments**, not every step — keeps reports readable. Use `mcp__claude-in-chrome__computer` for screenshots if the tool is available.
+9. **Hard stop after 3 failed attempts on the same step.** Record "could not perform step X after 3 attempts" and move on or stop. Do NOT loop.
+### Phase D — Gap detection
+1. For each observation, compare against the corresponding hypothesis.
+2. **If observation matches hypothesis: drop it. Do not record. Silence is correct.**
+3. **If observation diverges from hypothesis: record a gap.** Each gap has:
+   - Headline (short, e.g., "Enter on selected participant does nothing")
+   - Observed (verbatim)
+   - Expected (from hypothesis)
+   - Catalog ID (scan `references/assertion-catalog.md` for a rule whose `Applies when` and `Check` match; cite that ID. If no rule matches, label the gap `novel — candidate for new rule`.)
+   - Exemplars (from the catalog if cited, else from web search)
+   - Rationale
+   - Severity (`low`, `med`, `high`) — use the catalog rule's severity if cited, else judge based on impact
+4. Novel gaps are flagged but NOT auto-written to the catalog. The human reviews them and folds them in manually later.
+### Phase E — Targeted static source analysis
+For each gap, use `Grep` on `/Users/penxia/ai-personal/zenuml-core/src/` to find the relevant code path:
+- For keyboard interactions: grep for the key name (`'Enter'`, `'Escape'`) and keydown listeners.
+- For selection state: grep for `select`, `Selection`, `aria-selected`, and the Jotai atoms.
+- For inline editing: grep for `contenteditable`, `input`, component names like `Participant`, `Message`.
+- For undo/redo: grep for `undo`, `history`, and the Jotai atoms that track history state.
+Attach `file:line` pointers to each gap. If no handler is found, write `"no code path found — this is a missing implementation, not a misrouted one."` This is often the most useful finding.
+### Phase F — Report writing
+1. Load `references/report-template.md`.
+2. Fill in all `{{placeholder}}` fields.
+3. Determine the output path:
+   - Today's date in `YYYY-MM-DD` format.
+   - Filename: `YYYY-MM-DD-<scenario-id>.md`.
+   - Full path: `/Users/penxia/ai-personal/zenuml-core/docs/ux-research/YYYY-MM-DD-<scenario-id>.md`.
+4. Create `docs/ux-research/` if it doesn't exist.
+5. If the filename already exists for today, append `-2`, `-3`, etc.
+6. Write the file.
+7. If gap count is zero, render the zero-gap form (omit Gaps and Playwright snippet sections, collapse the walkthrough to a one-line "No gaps observed on <sha>").
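Steps 3 to 5 describe a dated filename with a numeric suffix on collision. A minimal sketch of that naming logic, assuming a hypothetical helper `report_path` that takes the output directory as a parameter rather than hard-coding it:

```python
from datetime import date
from pathlib import Path


def report_path(out_dir: Path, scenario_id: str, today: date) -> Path:
    """Return the first unused report path for `scenario_id` on `today`."""
    out_dir.mkdir(parents=True, exist_ok=True)  # step 4: create if missing
    stem = f"{today.isoformat()}-{scenario_id}"  # YYYY-MM-DD-<scenario-id>
    candidate = out_dir / f"{stem}.md"
    suffix = 2
    while candidate.exists():  # step 5: append -2, -3, ...
        candidate = out_dir / f"{stem}-{suffix}.md"
        suffix += 1
    return candidate
```

The function only picks the path; writing the file stays a separate step, so nothing is created on disk beyond the directory.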
+### Phase G — Hand-off
+1. Print the report path.
+2. Print a one-line summary: `"Found N gaps (X high, Y med, Z low). Report at <path>."`
+3. Stop. Do NOT:
+   - auto-commit the report
+   - auto-fix any gap
+   - open a PR
+   - notify anyone
+   - run additional scenarios
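The one-line summary in step 2 can be produced from the list of gap severities. A small sketch, assuming severities are the strings `low`, `med`, and `high` used in Phase D; `summary_line` is a hypothetical name:

```python
from collections import Counter


def summary_line(severities: list[str], path: str) -> str:
    """Format the hand-off summary from a list of gap severities."""
    counts = Counter(severities)  # missing severities count as 0
    return (
        f"Found {len(severities)} gaps "
        f"({counts['high']} high, {counts['med']} med, {counts['low']} low). "
        f"Report at {path}."
    )
```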
+## Error handling
+**Invocation-time (fail fast, clear instructions):**
+- Scenario ID not found → list `references/scenarios/*.md` filenames, stop.
+- Free-text goal too ambiguous (e.g., can't infer starting or target DSL) → ask one clarifying question, re-confirm, only proceed on confirmation.
+- Scenario file malformed → print file path and missing field, stop.
+- `claude-in-chrome` tools not loaded → instruct user to load via ToolSearch (query shown above), stop.
+- URL unreachable → print fix command, stop.
+- Non-local URL without `--allow-prod` → warn and stop.
+**Walkthrough-time (observe and record, do not panic):**
+- Target state unreachable after 2–3 paths → this is itself a high-severity gap. Write a report with "scenario target is unreachable via discovered interaction paths" as the primary finding.
+- Console error mid-walkthrough → capture it and include it in the walkthrough step; do not halt unless the app becomes unresponsive.
+- Browser crash → stop, print observations so far, do NOT write a partial report.
+- Screenshot failure → skip the screenshot; still record the walkthrough step with `screenshot: failed`.
+- Same step fails 3 times → stop retrying, record "could not perform step X", move on or stop.
+**Analysis-time (degrade gracefully):**
+- Static analysis finds no handler → say so explicitly.
+- Web search returns nothing → fall back to catalog and common sense.
+- Catalog has no matching rule → label gap `novel`, flag as growth candidate.
+**Output-time:**
+- `docs/ux-research/` does not exist → create it.
+- Filename collision → append `-2`, `-3`, etc.
+- Git SHA capture fails → write `unknown` in metadata. Do not abort.
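The "write `unknown`, do not abort" rule for Git SHA capture can be sketched as follows. The skill does not say how the SHA is read; this example assumes `git rev-parse --short HEAD` run in the repository directory, and `capture_git_sha` is a hypothetical helper name.

```python
import subprocess


def capture_git_sha(repo_dir: str) -> str:
    """Return the short HEAD SHA for `repo_dir`, or "unknown" on any failure."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,  # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not a repository: degrade, do not abort.
        return "unknown"
    return result.stdout.strip() or "unknown"
```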
+## What this skill does NOT do
+- Retry failed walkthrough paths indefinitely.
+- Auto-fix any gap.
+- Commit the report, open a PR, or notify anyone.
+- Run multiple scenarios in one invocation.
+- Run Playwright — only emits a snippet for the human to use.
+- Touch production deploy state.
+## Extending the skill
+- **New scenario:** drop a new file into `references/scenarios/`, matching the format of existing scenarios. The skill discovers it automatically.
+- **New assertion rule:** append it to `references/assertion-catalog.md` with the next sequential ID in its category. Never renumber existing rules.
+- **Catalog growth from novel gaps:** when a run flags a `novel` gap, the human reviews and, if appropriate, adds a new rule to the catalog by hand.
+- **Calibration drift:** any substantial change to this SKILL.md should be followed by re-running both calibration scenarios (see the plan document).