@fredcallagan/arn-spark 5.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +9 -0
- package/.opencode/plugins/arn-spark.js +272 -0
- package/package.json +17 -0
- package/plugins/arn-spark/.claude-plugin/plugin.json +9 -0
- package/plugins/arn-spark/LICENSE +21 -0
- package/plugins/arn-spark/README.md +25 -0
- package/plugins/arn-spark/agents/arn-spark-brand-strategist.md +299 -0
- package/plugins/arn-spark/agents/arn-spark-dev-env-builder.md +228 -0
- package/plugins/arn-spark/agents/arn-spark-doctor.md +92 -0
- package/plugins/arn-spark/agents/arn-spark-forensic-investigator.md +181 -0
- package/plugins/arn-spark/agents/arn-spark-market-researcher.md +232 -0
- package/plugins/arn-spark/agents/arn-spark-marketing-pm.md +225 -0
- package/plugins/arn-spark/agents/arn-spark-persona-architect.md +259 -0
- package/plugins/arn-spark/agents/arn-spark-persona-impersonator.md +183 -0
- package/plugins/arn-spark/agents/arn-spark-product-strategist.md +191 -0
- package/plugins/arn-spark/agents/arn-spark-prototype-builder.md +497 -0
- package/plugins/arn-spark/agents/arn-spark-scaffolder.md +228 -0
- package/plugins/arn-spark/agents/arn-spark-spike-runner.md +209 -0
- package/plugins/arn-spark/agents/arn-spark-style-capture.md +196 -0
- package/plugins/arn-spark/agents/arn-spark-tech-evaluator.md +229 -0
- package/plugins/arn-spark/agents/arn-spark-ui-interactor.md +235 -0
- package/plugins/arn-spark/agents/arn-spark-use-case-writer.md +280 -0
- package/plugins/arn-spark/agents/arn-spark-ux-judge.md +215 -0
- package/plugins/arn-spark/agents/arn-spark-ux-specialist.md +200 -0
- package/plugins/arn-spark/agents/arn-spark-visual-sketcher.md +285 -0
- package/plugins/arn-spark/agents/arn-spark-visual-test-engineer.md +224 -0
- package/plugins/arn-spark/references/copilot-tools.md +62 -0
- package/plugins/arn-spark/skills/arn-brainstorming/SKILL.md +520 -0
- package/plugins/arn-spark/skills/arn-brainstorming/references/add-feature-flow.md +155 -0
- package/plugins/arn-spark/skills/arn-spark-arch-vision/SKILL.md +226 -0
- package/plugins/arn-spark/skills/arn-spark-arch-vision/references/architecture-vision-template.md +153 -0
- package/plugins/arn-spark/skills/arn-spark-arch-vision/references/technology-evaluation-guide.md +86 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/SKILL.md +471 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/clickable-prototype-criteria.md +65 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/journey-template.md +62 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/review-report-template.md +75 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/showcase-capture-guide.md +213 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/SKILL.md +642 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/debate-protocol.md +242 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/debate-review-report-template.md +161 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/expert-interaction-review-template.md +152 -0
- package/plugins/arn-spark/skills/arn-spark-concept-review/SKILL.md +350 -0
- package/plugins/arn-spark/skills/arn-spark-concept-review/references/conflict-resolution-protocol.md +145 -0
- package/plugins/arn-spark/skills/arn-spark-concept-review/references/review-report-template.md +185 -0
- package/plugins/arn-spark/skills/arn-spark-dev-setup/SKILL.md +366 -0
- package/plugins/arn-spark/skills/arn-spark-dev-setup/references/dev-setup-checklist.md +84 -0
- package/plugins/arn-spark/skills/arn-spark-dev-setup/references/dev-setup-template.md +205 -0
- package/plugins/arn-spark/skills/arn-spark-discover/SKILL.md +303 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/competitive-landscape-template.md +87 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/discovery-questions.md +120 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/persona-profile-template.md +97 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/product-concept-template.md +253 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/SKILL.md +23 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md +388 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/references/step-0-fast-path.md +25 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/scripts/cache-check.sh +127 -0
- package/plugins/arn-spark/skills/arn-spark-feature-extract/SKILL.md +483 -0
- package/plugins/arn-spark/skills/arn-spark-feature-extract/references/feature-backlog-template.md +176 -0
- package/plugins/arn-spark/skills/arn-spark-feature-extract/references/feature-entry-template.md +209 -0
- package/plugins/arn-spark/skills/arn-spark-help/SKILL.md +149 -0
- package/plugins/arn-spark/skills/arn-spark-help/references/pipeline-map.md +211 -0
- package/plugins/arn-spark/skills/arn-spark-init/SKILL.md +312 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/agent-models-presets/all-opus.md +23 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/agent-models-presets/balanced.md +23 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/bkt-setup.md +55 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/jira-mcp-setup.md +61 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/platform-labels.md +97 -0
- package/plugins/arn-spark/skills/arn-spark-naming/SKILL.md +275 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/creative-brief-template.md +146 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/naming-methodology.md +237 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/naming-report-template.md +122 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/trademark-databases.md +88 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/whois-server-map.md +164 -0
- package/plugins/arn-spark/skills/arn-spark-naming/scripts/whois-check.js +502 -0
- package/plugins/arn-spark/skills/arn-spark-naming/scripts/whois-check.py +533 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/SKILL.md +260 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/references/lock-report-template.md +68 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/references/pretooluse-hook-template.json +35 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/references/prototype-guardrail-rules.md +38 -0
- package/plugins/arn-spark/skills/arn-spark-report/SKILL.md +144 -0
- package/plugins/arn-spark/skills/arn-spark-report/references/issue-template.md +81 -0
- package/plugins/arn-spark/skills/arn-spark-report/references/spark-knowledge-base.md +293 -0
- package/plugins/arn-spark/skills/arn-spark-scaffold/SKILL.md +239 -0
- package/plugins/arn-spark/skills/arn-spark-scaffold/references/scaffold-checklist.md +79 -0
- package/plugins/arn-spark/skills/arn-spark-scaffold/references/scaffold-summary-template.md +74 -0
- package/plugins/arn-spark/skills/arn-spark-spike/SKILL.md +209 -0
- package/plugins/arn-spark/skills/arn-spark-spike/references/spike-report-template.md +123 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/SKILL.md +362 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/references/review-report-template.md +65 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/references/showcase-capture-guide.md +153 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/references/static-prototype-criteria.md +54 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/SKILL.md +518 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/debate-protocol.md +230 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/debate-review-report-template.md +148 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/expert-visual-review-template.md +130 -0
- package/plugins/arn-spark/skills/arn-spark-stress-competitive/SKILL.md +166 -0
- package/plugins/arn-spark/skills/arn-spark-stress-competitive/references/competitive-report-template.md +139 -0
- package/plugins/arn-spark/skills/arn-spark-stress-competitive/references/gap-analysis-framework.md +111 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/SKILL.md +257 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/references/interview-protocol.md +140 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/references/interview-report-template.md +165 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/references/persona-casting-spec.md +138 -0
- package/plugins/arn-spark/skills/arn-spark-stress-premortem/SKILL.md +181 -0
- package/plugins/arn-spark/skills/arn-spark-stress-premortem/references/premortem-protocol.md +112 -0
- package/plugins/arn-spark/skills/arn-spark-stress-premortem/references/premortem-report-template.md +158 -0
- package/plugins/arn-spark/skills/arn-spark-stress-prfaq/SKILL.md +206 -0
- package/plugins/arn-spark/skills/arn-spark-stress-prfaq/references/prfaq-report-template.md +139 -0
- package/plugins/arn-spark/skills/arn-spark-stress-prfaq/references/prfaq-workflow.md +118 -0
- package/plugins/arn-spark/skills/arn-spark-style-explore/SKILL.md +281 -0
- package/plugins/arn-spark/skills/arn-spark-style-explore/references/style-brief-template.md +198 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/SKILL.md +359 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/expert-review-template.md +94 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/review-protocol.md +150 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/use-case-index-template.md +108 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/use-case-template.md +125 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases-teams/SKILL.md +306 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases-teams/references/debate-protocol.md +272 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases-teams/references/review-report-template.md +112 -0
- package/plugins/arn-spark/skills/arn-spark-visual-readiness/SKILL.md +293 -0
- package/plugins/arn-spark/skills/arn-spark-visual-readiness/references/readiness-checklist.md +196 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/SKILL.md +376 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/references/aesthetic-philosophy.md +210 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/references/sketch-gallery-guide.md +282 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/references/visual-direction-template.md +174 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/SKILL.md +447 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/baseline-capture-script-template.js +89 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/journey-schema.md +375 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/spike-checklist.md +122 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/strategy-layers-guide.md +132 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/visual-strategy-template.md +141 -0
package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/debate-protocol.md
ADDED
@@ -0,0 +1,242 @@

# Interaction Review Debate Protocol

This document defines the structured debate process for team-based interaction review in `arn-spark-clickable-prototype-teams`. Two expert reviewers -- a product strategist and a UX specialist -- independently score interaction criteria against journey screenshots and the interaction report, then cross-review each other's scores and findings to surface insights, resolve disagreements, and produce richer feedback than mechanical lower-of-two scoring.

The skill acts as the **facilitator**: it orchestrates the debate phases, passes file paths between agents, synthesizes the debate report, detects divergence, manages resolution, and presents results to the user. The facilitator does not participate in the scoring itself.
## Team Roles

| Role | Agent | Perspective |
|------|-------|-------------|
| Builder | `arn-spark-prototype-builder` | Creates prototype screens (not part of debate) |
| Interactor | `arn-spark-ui-interactor` | Tests journeys via Playwright (not part of debate) |
| Interaction Strategist | `arn-spark-product-strategist` | Navigation patterns, screen reachability, flow coherence, product goal coverage, journey completability from a product perspective |
| Interaction Flow Reviewer | `arn-spark-ux-specialist` | Interaction clarity, state-change visibility, transition smoothness, form element functionality, error state handling, journey experience |
| Facilitator | The skill itself | Orchestrates debate, synthesizes report, manages divergence |
| Judge | `arn-spark-ux-judge` | Independent interactive verdict (not part of debate) |
## Debate Modes

### Divergence Mode (Default)

Cross-review (Phase 2) triggers only when any criterion score differs by >= 2 points between experts. When the experts mostly agree (all scores within 1 point), Phase 2 is skipped and combined scores use the lower of each pair (identical to base skill behavior). This mode saves tokens on cycles where the experts align.

### Standard Mode

Full cross-review runs every cycle, regardless of score agreement. This produces richer debate findings but costs more tokens. Use it when the project has complex interaction patterns where expert dialogue adds value even on criteria the experts numerically agree about.
## Execution Modes

**Important:** All execution modes use the same file-based review output. Each expert writes its review to a file -- this works identically in Agent Teams mode and sequential mode. The execution mode selection is based ONLY on whether Agent Teams is supported. File-based output does NOT affect mode selection and does NOT favor sequential over Agent Teams. When Agent Teams is supported, always use Agent Teams mode -- it is faster because experts run in parallel.

### Agent Teams Mode (Preferred)

**When:** Agent Teams is supported by your platform.

Both experts are spawned as teammates. Phase 1 runs in parallel -- each expert writes to its own file simultaneously with no contention. Phase 2 uses Teams communication to coordinate cross-review -- each expert reads the other's completed file and writes its cross-review to a separate file.

### Sequential Mode (Fallback)

**When:** Agent Teams is NOT enabled.

The skill simulates the debate through sequential expert invocations, manually passing file paths between agents so each can read the other's review. This produces the same logical result as Agent Teams mode but with serialized invocations.

### Single-Reviewer Mode

**When:** `arn-spark-ux-specialist` is unavailable.

No debate occurs. The product strategist reviews independently, and its scores become the combined scores directly. The debate report notes "Single-Reviewer Mode" throughout. The skill suggests using `/arn-spark-clickable-prototype` instead, which handles the single-reviewer case identically.
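The mode-selection rules above reduce to a small decision function. The sketch below is illustrative only -- the function name and boolean flags are assumptions, not part of the skill's actual API:

```python
def select_execution_mode(agent_teams_supported: bool,
                          ux_specialist_available: bool) -> str:
    """Pick the execution mode per the rules above.

    Selection depends ONLY on platform capability and agent availability --
    the file-based review output never influences the choice.
    """
    if not ux_specialist_available:
        # No second expert: no debate is possible.
        return "single_reviewer"
    # Prefer Agent Teams whenever the platform supports it (parallel Phase 1).
    return "agent_teams" if agent_teams_supported else "sequential"
```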
## Debate Phases

### Phase 1: Independent Scoring

Both experts independently score ALL criteria against the journey screenshots and interaction report. Neither sees the other's scores during this phase.

**Product strategist focus areas (interaction perspective):**
- Can every screen be reached from the hub navigation?
- Does the navigation flow match the product concept's intended user journeys?
- Can each defined user journey be completed from start to finish?
- Do navigation elements consistently indicate the current location?
- Are all screens present that the product concept requires?
- Is the screen organization (functional area grouping) logical for the target user?
- Do error states provide helpful guidance aligned with the product's tone?
- If visual grounding assets exist:
  - References: Do screen layouts and flow align with the reference direction?
  - Designs: Do screen layouts match design mockups in structure and component placement?
  - Brand: Do brand elements appear correctly across all screens?

**UX specialist focus areas (interaction perspective):**
- Do all clickable elements respond with visual feedback?
- Do form elements function correctly (inputs accept text, dropdowns open, checkboxes toggle)?
- Are interactive state changes (selected, active, expanded, collapsed) visually clear?
- Are page transitions smooth rather than jarring?
- At each step of a journey, is the next action obvious to the user?
- Can users navigate back without relying on browser controls?
- Are there any dead ends (a screen with no path forward or back)?
- Are component interactions consistent across screens (same component, same behavior)?
- Does responsive behavior work correctly (if applicable)?
- Are there JavaScript errors or broken assets during normal interaction?

**Phase 1 file output:** Each expert writes their review to a file using the expert interaction review template (`${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-clickable-prototype-teams/references/expert-interaction-review-template.md`). The skill tells each agent the exact file path to write to:
- Product strategist -> `prototypes/clickable/reviews/round-N-strategist-review.md`
- UX specialist -> `prototypes/clickable/reviews/round-N-ux-review.md`

The agent returns a brief summary in conversation -- the full review is in the file. Downstream steps read from the file, not from conversation context.
### Divergence Check (Performed by Skill)

After Phase 1, the skill reads both review files and extracts per-criterion scores from the "Per-Criterion Scores" table.

**In divergence mode:** Calculate `|strategist_score - ux_score|` for each criterion.
- If max divergence < 2: Skip Phase 2. Combined score per criterion = `min(strategist, ux)`. Present to user: "Experts scored within 1 point on all criteria. No divergence detected -- skipping cross-review."
- If any divergence >= 2: Proceed to Phase 2. Present to user: "Divergence detected on [N] criteria (difference >= 2 points): [list criteria names and score pairs]. Triggering cross-review."

**In standard mode:** Always proceed to Phase 2 regardless of score differences.
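The divergence-mode check can be sketched as follows. This is a minimal illustration, assuming the parsed score tables are dicts keyed by criterion name with integer scores (the dict structure and function name are assumptions, not the skill's actual code):

```python
def divergence_check(strategist: dict, ux: dict, threshold: int = 2) -> dict:
    """Decide whether Phase 2 cross-review is needed (divergence mode).

    strategist / ux map criterion name -> integer score, as extracted
    from each expert's "Per-Criterion Scores" table. Both dicts are
    assumed to share the same criterion keys.
    """
    divergent = {
        name: (strategist[name], ux[name])
        for name in strategist
        if abs(strategist[name] - ux[name]) >= threshold
    }
    if divergent:
        # Any pair differing by >= threshold triggers cross-review.
        return {"run_phase_2": True, "divergent": divergent}
    # Experts align: combined score is the lower of each pair.
    combined = {name: min(strategist[name], ux[name]) for name in strategist}
    return {"run_phase_2": False, "combined": combined}
```

In standard mode the skill would skip this check and always run Phase 2.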
### Phase 2: Cross-Review

Each expert reads the other's Phase 1 file and responds per-criterion.

**Instructions for each expert during cross-review:**

For each criterion:
- **Agree** -- the other expert's score is valid. Optionally adjust own score (up or down) with reasoning.
- **Disagree** -- the other expert's score is incorrect. Maintain own score with counter-evidence (reference specific journey steps, screenshots, or interaction behaviors).
- **New concern prompted** -- the other expert's review reveals something not previously noticed. Add observation.

For each journey:
- Note whether you agree with the other expert's journey assessment (Complete/Partial/Failed).
- If you disagree on a journey outcome, provide specific step-level evidence.

Focus on DIVERGENT criteria (score difference >= 2). For criteria with small differences (<= 1 point), a brief acknowledgment suffices.

**Phase 2 file output:** Each expert writes their cross-review to a separate file:
- Product strategist -> `prototypes/clickable/reviews/round-N-strategist-cross-review.md`
- UX specialist -> `prototypes/clickable/reviews/round-N-ux-cross-review.md`

In sequential mode (where the UX specialist writes Phase 1 + Phase 2 combined), the combined output goes to `round-N-ux-review.md` (a single file with both sections).
### Phase 3: Synthesis (Performed by Skill)

The skill reads all review files written by the experts -- never from conversation context. The files to read are:
- `prototypes/clickable/reviews/round-N-strategist-review.md` (Phase 1)
- `prototypes/clickable/reviews/round-N-ux-review.md` (Phase 1, or Phase 1 + Phase 2 combined in sequential mode)
- `prototypes/clickable/reviews/round-N-strategist-cross-review.md` (Phase 2, if written separately)
- `prototypes/clickable/reviews/round-N-ux-cross-review.md` (Phase 2, Agent Teams mode only)

For each criterion, categorize:

**Consensus:** Both experts scored the same, or one adjusted their score in cross-review to match the other. Combined score = the agreed score.

**Additions:** One expert scored lower with specific feedback, and the other did not dispute the lower score in cross-review (neither agreed nor disagreed). Combined score = the lower score.

**Disagreements:** Both experts maintained different scores after cross-review -- one raised a concern and the other explicitly disagreed. These require user resolution in Phase 4.

**No-debate:** Criteria where Phase 2 was skipped (divergence mode, no divergence detected). Combined score = `min(strategist, ux)`.

Also synthesize journey assessments: if experts disagree on whether a journey completed, note the disagreement in the debate report and include it in the resolution step if the journey outcome affects a criterion score.

Write the debate review report using the debate review report template. Save to `prototypes/clickable/reviews/round-N-cycle-M-debate-report.md`.
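The per-criterion score computation implied by these four categories can be sketched as a small helper. The function and parameter names are illustrative, not part of the skill:

```python
def combined_score(category: str, strategist: int, ux: int,
                   agreed: int = None, user_resolved: int = None) -> int:
    """Final combined score for one criterion, per its synthesis category.

    category is one of: "consensus", "addition", "disagreement", "no-debate".
    """
    if category == "consensus":
        return agreed                 # the score both experts settled on
    if category == "addition":
        return min(strategist, ux)    # undisputed lower score wins
    if category == "disagreement":
        if user_resolved is None:
            raise ValueError("disagreements require user resolution (Phase 4)")
        return user_resolved          # score chosen by the user
    if category == "no-debate":
        return min(strategist, ux)    # Phase 2 skipped: lower of the pair
    raise ValueError(f"unknown category: {category}")
```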
### Phase 4: Resolution (Conditional)

**Trigger:** One or more unresolved disagreements exist after Phase 3.

For each unresolved disagreement, present to the user:

"Expert disagreement on **[Criterion Name]** (criterion #[N]):
- **Product Strategist:** Score [X] -- [reasoning and evidence with journey reference]
- **UX Specialist:** Score [Y] -- [reasoning and evidence with journey reference]
- **Trade-off:** [what each score optimizes for]

What score should this criterion receive?"

Record user decisions. Update the debate report with resolutions. The resolved score becomes the final combined score for that criterion.
## Sequential Mode Invocation Detail

When Agent Teams is not enabled, the skill simulates the debate with 3 sequential invocations per round:

**Invocation 1 -- Product Strategist Phase 1:**

Invoke the `arn-spark-product-strategist` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
- Journey screenshots from the interaction testing step
- Interaction report from `arn-spark-ui-interactor`
- All criteria with descriptions, scoring scale, and threshold
- Style brief, product concept, visual grounding assets (with category context)
- Expert interaction review template path: `${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-clickable-prototype-teams/references/expert-interaction-review-template.md`
- File path to write to: `prototypes/clickable/reviews/round-N-strategist-review.md`
- Instruction: "Score every criterion independently against the journey screenshots and interaction report. Assess every journey for completability. Write your complete review to the specified file path using the expert interaction review template. Return a brief summary in conversation."

**Invocation 2 -- UX Specialist Phase 1 + Phase 2 Combined:**

Invoke the `arn-spark-ux-specialist` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
- Same inputs as Invocation 1
- The strategist's file path to read: `prototypes/clickable/reviews/round-N-strategist-review.md`
- File path to write to: `prototypes/clickable/reviews/round-N-ux-review.md`
- Instruction: "First, score every criterion independently from the UX perspective and assess every journey, using the Phase 1 format. Then, read the strategist's review at the specified file path and respond to each criterion score and journey assessment using the Phase 2 cross-review format: agree (optionally adjust your score), disagree (with counter-evidence), or note new concerns. Write your complete review (Phase 1 + Phase 2 combined) to the specified file path. Return a brief summary in conversation."

**Invocation 3 -- Product Strategist Phase 2:**

Invoke the `arn-spark-product-strategist` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
- The UX specialist's file path to read: `prototypes/clickable/reviews/round-N-ux-review.md`
- Expert interaction review template path
- File path to write to: `prototypes/clickable/reviews/round-N-strategist-cross-review.md`
- Instruction: "Read the UX specialist's review at the specified file path. The UX specialist has scored the criteria, assessed journeys, and also responded to your review. Respond to their scores, journey assessments, and cross-review using the Phase 2 format. Write your cross-review to the specified file path. Return a brief summary in conversation."

The skill synthesizes by reading all three review files (not from conversation context).

**Note on sequential asymmetry:** In sequential mode, the UX specialist sees the strategist's review before writing their own Phase 1 review. Instruct the UX specialist to "score independently first" to minimize anchoring bias. The synthesis step normalizes the output regardless of invocation order.
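The three sequential invocations can be sketched as a small driver. Here `invoke_agent` is a hypothetical stand-in for the platform's Task-tool dispatch, and the abbreviated instruction strings are illustrative, not the full prompts above:

```python
def sequential_debate_round(round_n: int, invoke_agent) -> list:
    """Drive one sequential-mode debate round (3 invocations).

    invoke_agent(agent, read_path, write_path, instruction) is an assumed
    callback wrapping the Task tool; it is NOT a real API of the plugin.
    Returns the review file paths for the synthesis step to read.
    """
    reviews = "prototypes/clickable/reviews"
    strat = f"{reviews}/round-{round_n}-strategist-review.md"
    ux = f"{reviews}/round-{round_n}-ux-review.md"
    strat_cross = f"{reviews}/round-{round_n}-strategist-cross-review.md"

    # Invocation 1: strategist scores independently (Phase 1).
    invoke_agent("arn-spark-product-strategist", read_path=None,
                 write_path=strat,
                 instruction="Phase 1: score all criteria independently")
    # Invocation 2: UX specialist scores, then cross-reviews (Phase 1 + 2).
    invoke_agent("arn-spark-ux-specialist", read_path=strat,
                 write_path=ux,
                 instruction="Phase 1 + 2: score independently, then cross-review")
    # Invocation 3: strategist cross-reviews the UX review (Phase 2).
    invoke_agent("arn-spark-product-strategist", read_path=ux,
                 write_path=strat_cross,
                 instruction="Phase 2: cross-review the UX specialist's review")
    return [strat, ux, strat_cross]
```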
## Agent Teams Mode Invocation Detail

**Phase 1:** Spawn both experts simultaneously as teammates. Each receives:
- Journey screenshots, interaction report, criteria, scoring parameters, style brief, product concept, visual grounding assets
- Expert interaction review template path
- File path to write to: strategist -> `round-N-strategist-review.md`, UX specialist -> `round-N-ux-review.md`
- Instruction: "Score every criterion independently. Assess every journey. Write the complete review to the specified file path using the expert interaction review template. Do not communicate with other teammates during this phase."

Both produce Phase 1 reviews independently and write them to their respective files.

**Runtime verification:** After Phase 1, the skill checks that BOTH review files exist and contain per-criterion scores. If one file is missing (Agent Teams silently failed to spawn one expert), invoke the missing expert sequentially and note the issue in the debate report: "Agent Teams Phase 1 partial failure: [agent] did not produce its review file. Invoked sequentially as fallback."

**Phase 2:** Share file paths through Teams communication:
- Tell the UX specialist to read the strategist's file: `round-N-strategist-review.md`
- Tell the strategist to read the UX specialist's file: `round-N-ux-review.md`
- Each reads the other's file and writes their cross-review to a separate file:
  - Product strategist -> `round-N-strategist-cross-review.md`
  - UX specialist -> `round-N-ux-cross-review.md`
- Each responds using the Phase 2 cross-review format from the expert interaction review template

The skill synthesizes by reading all four review files (not from conversation context).
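The runtime verification step can be sketched like this. It is a minimal illustration: the score-row regex assumes the expert review template renders per-criterion scores as markdown table rows ending in a numeric cell (e.g. `| Navigation | 4 |`), which is an assumption about the template's layout:

```python
import os
import re

def verify_phase1_outputs(paths: list) -> list:
    """Return the review files that are missing or contain no score rows.

    A non-empty result means Agent Teams silently failed for those experts;
    the skill should re-invoke them sequentially as a fallback and note
    the partial failure in the debate report.
    """
    # Illustrative pattern: a table row whose last cell is a bare integer.
    score_row = re.compile(r"\|\s*\d+\s*\|\s*$", re.MULTILINE)
    failed = []
    for path in paths:
        if not os.path.exists(path):
            failed.append(path)          # file never written
            continue
        with open(path) as f:
            if not score_row.search(f.read()):
                failed.append(path)      # file exists but has no scores
    return failed
```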
## Invocation Counts per Cycle

| Execution Mode | Debate Mode | Divergence Found | Expert Invocations | Notes |
|----------------|-------------|------------------|--------------------|-------|
| Agent Teams | Standard | N/A (always) | 4 (2 P1 + 2 P2) | Parallel within phases |
| Agent Teams | Divergence | Yes | 4 (2 P1 + 2 P2) | Same as standard |
| Agent Teams | Divergence | No | 2 (P1 only) | Phase 2 skipped |
| Sequential | Standard | N/A (always) | 3 (strat P1, UX P1+P2, strat P2) | |
| Sequential | Divergence | Yes | 3 | |
| Sequential | Divergence | No | 2 (strat P1, UX P1) | Phase 2 skipped |
| Single-reviewer | Any | N/A | 1 | No debate |

For max_cycles=3 with Agent Teams + standard mode: up to 12 expert invocations + 3 build cycles + 3 interaction testing cycles + judge + showcase.
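The table above reduces to simple arithmetic; a hypothetical helper (names are illustrative) makes the budget explicit:

```python
def expert_invocations(execution_mode: str, debate_mode: str,
                       divergence_found: bool, cycles: int = 1) -> int:
    """Expert invocations for N cycles, per the invocation-counts table."""
    if execution_mode == "single_reviewer":
        return 1 * cycles                 # one review, no debate
    # Phase 2 runs in standard mode, or in divergence mode when triggered.
    run_phase_2 = (debate_mode == "standard") or divergence_found
    per_cycle = {
        ("agent_teams", True): 4,         # 2 Phase 1 + 2 Phase 2
        ("agent_teams", False): 2,        # Phase 2 skipped
        ("sequential", True): 3,          # strat P1, UX P1+P2, strat P2
        ("sequential", False): 2,         # strat P1, UX P1
    }
    return per_cycle[(execution_mode, bool(run_phase_2))] * cycles
```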
## Skill's Facilitation Responsibilities

The `arn-spark-clickable-prototype-teams` skill (not the agents) is responsible for:

1. **Agent Teams verification:** Check the env var in Step 1 AND verify both experts write files after Phase 1. If one expert's file is missing, invoke sequentially as a fallback and log the issue.
2. **Mode detection:** Record the execution mode (agent_teams / sequential) and debate mode (divergence / standard) from Step 1 and Step 3.
3. **Directory setup:** Create `prototypes/clickable/reviews/` if it does not exist before invoking experts.
4. **File path assignment:** Tell each expert agent the exact file path to write to and the expert interaction review template path.
5. **Phase orchestration:** Run Phase 1, perform the divergence check, and conditionally run Phase 2, telling each expert to read the other's review file (by file path, not by passing content through conversation).
6. **Synthesis:** Read all expert review files and categorize per criterion into consensus, additions, and disagreements. Also compare per-journey assessments. Never rely on the expert's conversation summary -- always read the file.
7. **Score computation:** For each criterion, compute the final combined score based on the debate outcome (consensus: agreed score; additions: lower score; disagreements: user-resolved score; no-debate: min of the two).
8. **Conflict detection:** Identify disagreements (both criterion-level and journey-level) and present them to the user for resolution with both positions and evidence.
9. **Report writing:** Produce the debate review report per the template and save it to file.
10. **Budget management:** Never exceed the user's configured max_cycles.
11. **User communication:** Present divergence status, debate summaries, journey agreement status, and resolution requests clearly between phases.
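The divergence check in responsibility 5 follows from the documented thresholds (Phase 2 triggers when any criterion diverges by >= 2 points; scores within 1 point skip it). A minimal sketch, with an illustrative function name:

```python
# Minimal sketch of the Phase 2 divergence check (responsibility 5).
# The function name is illustrative; the >= 2 threshold is the documented one.

def divergence_detected(strategist: dict, ux: dict, threshold: int = 2) -> bool:
    """True if any criterion's Phase 1 scores differ by >= threshold points."""
    return any(abs(strategist[c] - ux[c]) >= threshold for c in strategist)

# "nav" differs by 3 points -> Phase 2 runs
assert divergence_detected({"clarity": 4, "nav": 2}, {"clarity": 4, "nav": 5})
# all criteria within 1 point -> Phase 2 skipped in divergence mode
assert not divergence_detected({"clarity": 4}, {"clarity": 5})
```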
@@ -0,0 +1,161 @@

# Debate Review Report Template -- Clickable Prototype Teams

Use this template for each cycle's debate review report produced by the `arn-spark-clickable-prototype-teams` skill. The skill populates this template after synthesizing the expert debate outputs (Phase 1 scores + Phase 2 cross-review responses) into categorized findings with final combined scores.

## Instructions for the Skill

When populating this template:

- Every section MUST appear, even if empty (write "None" for empty sections)
- **Consensus:** both experts agree on the score, or one adjusted to match in cross-review
- **Additions:** one expert scored lower with feedback, the other did not dispute it
- **Disagreements:** experts explicitly disagreed in cross-review and the disagreement persists
- **No-debate:** Phase 2 was skipped (divergence mode, no divergence detected) -- combined = min(strategist, ux)
- In single-reviewer mode (no UX specialist): all scores are from the strategist alone. Omit Disagreements and note "Single-Reviewer Mode" throughout.
- Save each report to `prototypes/clickable/reviews/round-N-cycle-M-debate-report.md`
- Also copy it to `prototypes/clickable/v[M]/review-report.md` for version-local access
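The four categorization rules above map directly to combined-score logic. A hedged sketch, with illustrative names; only the documented rules are assumed:

```python
# Hedged sketch of the combined-score rules above; names are illustrative.

def combined_score(category, strategist, ux, agreed=None, user=None):
    """Final combined score per the category rules in this template."""
    if category == "consensus":
        return agreed                  # both experts converged on this score
    if category == "addition":
        return min(strategist, ux)     # lower score stands, undisputed
    if category == "disagreement":
        return user                    # score the user resolved to
    # no-debate: Phase 2 skipped, combined = min(strategist, ux)
    return min(strategist, ux)

assert combined_score("no-debate", 4, 5) == 4
assert combined_score("disagreement", 3, 5, user=4) == 4
```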
---

## Template

```markdown
# Debate Review Report: Cycle [M], Round [N]

## Debate Participants

| Role | Agent | Status |
|------|-------|--------|
| Interaction Strategist | arn-spark-product-strategist | Participated |
| Interaction Flow Reviewer | arn-spark-ux-specialist | Participated / Unavailable |

## Configuration

- **Debate mode:** [Divergence / Standard]
- **Execution mode:** [Agent Teams / Sequential / Single-Reviewer]
- **Phase 2 triggered:** [Yes -- [N] criteria diverged by >= 2 / No -- all within 1 point / Yes -- standard mode (always)]
- **Divergent criteria:** [list names, or "None"]
- **Scoring scale:** [1-N]
- **Minimum threshold:** [T]

## Criterion Scores

| # | Criterion | Strategist | UX Specialist | Combined | Status | Category |
|---|-----------|-----------|---------------|----------|--------|----------|
| 1 | [name] | [score] | [score] | [combined] | PASS/FAIL | [Consensus/Addition/Disagreement/No-debate] |
| 2 | [name] | [score] | [score] | [combined] | PASS/FAIL | [category] |
| ... | ... | ... | ... | ... | ... | ... |

## Visual Grounding Comparison

**Assets provided to reviewers:**

| Category | Count | Source |
|----------|-------|--------|
| References | [N] | [URL captures, user screenshots] |
| Designs | [N] | [Figma exports, Canva exports, manual mockups] |
| Brand | [N] | [logos, guidelines] |

**Comparison notes:**

- **Reference alignment:** [How well screen layouts and flow feel match the inspirational direction]
- **Design fidelity:** [How closely screen layouts match the design mockups -- only if designs exist]
- **Brand compliance:** [Whether brand elements appear correctly across screens -- only if brand assets exist]

[If no visual grounding assets: "No visual grounding assets provided. Review based on style brief text only."]

## Journey Results Summary

| # | Journey | Strategist Assessment | UX Assessment | Agreed | Issues |
|---|---------|----------------------|---------------|--------|--------|
| 1 | [name] | [Completed/Partial/Failed] | [Completed/Partial/Failed] | [Yes/No] | [brief summary] |
| 2 | [name] | [Completed/Partial/Failed] | [Completed/Partial/Failed] | [Yes/No] | [brief summary] |
| ... | ... | ... | ... | ... | ... |

[If single-reviewer: only one assessment column, "Agreed" column reads "N/A"]

## Debate Findings

### Consensus Criteria

[Criteria where both experts agreed or one adjusted to match]

**Criterion [N]: [Name]** -- Combined [X]/[scale]
- **Strategist:** [brief reasoning with journey/screenshot reference]
- **UX Specialist:** [brief reasoning with journey/screenshot reference]
- **Outcome:** Both agree. [Any shared feedback for builder.]

[Repeat for each consensus criterion, or "None"]

### Addition Criteria

[Criteria where one expert raised feedback the other did not dispute]

**Criterion [N]: [Name]** -- Combined [X]/[scale] (raised by [Strategist / UX Specialist])
- **Lower scorer:** [agent] scored [X] -- [reasoning with journey evidence]
- **Higher scorer:** [agent] scored [Y] -- did not dispute
- **Outcome:** Lower score used. Builder feedback: [specific suggestion with journey step reference]

[Repeat for each addition criterion, or "None"]

### Disagreement Criteria

[Criteria where experts explicitly disagreed after cross-review]

**Criterion [N]: [Name]**
- **Strategist:** Score [X] -- [position + reasoning + journey/screenshot evidence]
- **UX Specialist:** Score [Y] -- [position + reasoning + journey/screenshot evidence]
- **Trade-off:** [what each score optimizes for]
- **Resolution:** [User decided: score [Z] because [reasoning] / Pending user input]

[Repeat for each disagreement criterion, or "None"]

### No-Debate Criteria

[Criteria where Phase 2 was skipped -- divergence mode only]

[If Phase 2 was skipped:] All criteria scored within 1 point. Combined = min(strategist, ux). No cross-review was performed.

[If Phase 2 ran:] N/A -- all criteria were included in the debate.

## Failing Criteria

### [Criterion Name] -- Combined [X]/[scale]
- **Strategist feedback:** [specific observation and suggestion with journey/screen reference]
- **UX specialist feedback:** [specific observation and suggestion with journey/screen reference]
- **Journey evidence:** [which journey step exposed the issue, with screenshot reference]
- **Debate insight:** [anything surfaced during cross-review that adds context beyond individual feedback]
- **Priority:** [Critical / Important]

[Repeat for each failing criterion]

## Passing Criteria Highlights
[Brief notes on particularly strong aspects]

## Summary

- **Passing:** [N] of [M] criteria meet threshold
- **Failing:** [N] criteria below threshold
- **Journeys:** [X] of [Y] completed successfully (agreed by both experts)
- **Phase 2 triggered:** [Yes / No]
- **Consensus criteria:** [N]
- **Addition criteria:** [N]
- **Disagreement criteria:** [N] ([N] resolved by user)
- **No-debate criteria:** [N]
- **Verdict:** PROCEED TO NEXT CYCLE / ALL CRITERIA PASS -- PROCEED TO JUDGE

## Recommended Focus for Next Cycle

[If failing: ordered list of what to fix, most critical first, incorporating debate insights and journey evidence. Each item includes the debate context so the builder understands WHY, not just what to fix.]

1. **[Criterion Name]:** [specific fix] -- Journey [N], Step [M]: [what went wrong]. [debate context: both experts agreed / strategist flagged X while UX specialist noted Y / user resolved in favor of Z]
2. ...
```

## Usage Notes

- The skill writes this report to `prototypes/clickable/reviews/round-N-cycle-M-debate-report.md` after each debate cycle
- Also copy it to `prototypes/clickable/v[M]/review-report.md` for version-local access
- The "Recommended Focus for Next Cycle" section is the primary output fed to the builder for the next cycle -- it must be actionable and specific, enriched by debate context and journey evidence
- In single-reviewer mode, the Category column is always "Single-reviewer", the Disagreements section reads "N/A -- single-reviewer mode", and the UX Specialist column shows "N/A"
- When Phase 2 is skipped in divergence mode, all criteria are categorized as "No-debate" and the Debate Findings section reflects this
- The Journey Results Summary table captures expert agreement on journey outcomes -- if experts disagree on whether a journey completed, this is noted and may trigger additional debate or user resolution
- When writing the final report (`prototypes/clickable/final-report.md`), aggregate all per-cycle debate reports with a summary of the debate arc: how scores evolved, what diverged, what converged, what the user decided
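Aggregating per-cycle debate reports for the final report requires reading them in (round, cycle) order. A minimal sketch, assuming only the documented file-naming convention; the helper name is illustrative:

```python
# Illustrative helper: order debate report filenames by (round, cycle)
# so the final report can trace the debate arc chronologically.
import re

def order_reports(filenames):
    """Sort round-N-cycle-M-debate-report.md filenames by round, then cycle."""
    def key(name):
        m = re.search(r"round-(\d+)-cycle-(\d+)-debate-report\.md", name)
        return (int(m.group(1)), int(m.group(2)))
    return sorted(filenames, key=key)

assert order_reports(["round-2-cycle-1-debate-report.md",
                      "round-1-cycle-2-debate-report.md"])[0] == "round-1-cycle-2-debate-report.md"
```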
@@ -0,0 +1,152 @@

# Expert Interaction Review Template

This template defines the file format that expert agents (`arn-spark-product-strategist`, `arn-spark-ux-specialist`) use when writing their interaction review reports to disk during `arn-spark-clickable-prototype-teams` debate cycles. Writing reviews to files ensures they survive context compression and provides a full audit trail.

## File Naming Convention

All review files go in `prototypes/clickable/reviews/`. Create the directory if it does not exist.

```
prototypes/clickable/reviews/
├── round-N-strategist-review.md        ← Product strategist Phase 1
├── round-N-ux-review.md                ← UX specialist Phase 1 (+ Phase 2 in sequential mode)
├── round-N-strategist-cross-review.md  ← Product strategist Phase 2 response
├── round-N-ux-cross-review.md          ← UX specialist Phase 2 response (Agent Teams only)
└── round-N-cycle-M-debate-report.md    ← Synthesized debate report (written by skill)
```

Where `N` is the overall round number and `M` is the cycle number within the current validation run.
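The convention above can be sketched as a path builder. The helper and its `kind` strings are illustrative, not part of the package:

```python
# Illustrative path builder for the naming convention above.
# The helper name and `kind` strings are hypothetical.
REVIEWS = "prototypes/clickable/reviews"

def review_path(kind, n, m=None):
    """kind: 'strategist-review', 'ux-review', 'strategist-cross-review',
    'ux-cross-review', or 'debate-report' (which also needs cycle m)."""
    if kind == "debate-report":
        return f"{REVIEWS}/round-{n}-cycle-{m}-debate-report.md"
    return f"{REVIEWS}/round-{n}-{kind}.md"

assert review_path("strategist-review", 3) == "prototypes/clickable/reviews/round-3-strategist-review.md"
assert review_path("debate-report", 3, 2) == "prototypes/clickable/reviews/round-3-cycle-2-debate-report.md"
```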
---

## Phase 1 Template (Independent Scoring)

When an expert writes their Phase 1 review, the file must follow this structure:

```markdown
# Interaction Review: Round [N] -- [Product Strategist / UX Specialist]

**Agent:** [arn-spark-product-strategist / arn-spark-ux-specialist]
**Phase:** Phase 1: Independent Scoring
**Execution mode:** [Agent Teams / Sequential / Single-Reviewer]
**Criteria scored:** [count]
**Journey screenshots reviewed:** [count]
**Journeys assessed:** [count]
**Version:** v[X]

---

## Per-Criterion Scores

| # | Criterion | Score | Evidence |
|---|-----------|-------|----------|
| 1 | [name] | [X]/[scale] | [1-2 sentence observation grounded in specific journey screenshot evidence] |
| 2 | [name] | [X]/[scale] | [evidence] |
| ... | ... | ... | ... |

## Per-Journey Assessment

| # | Journey | Steps | Completed | Issues | Key Screenshots |
|---|---------|-------|-----------|--------|----------------|
| 1 | [name] | [total] | [count] | [brief summary or "None"] | [screenshot filenames] |
| 2 | [name] | [total] | [count] | [brief summary or "None"] | [screenshot filenames] |
| ... | ... | ... | ... | ... | ... |

## Failing Criteria Detail

### [Criterion Name] -- [X]/[scale]

- **Observation:** [What specifically is wrong -- reference journey step, screenshot, screen, or interaction]
- **Expected:** [What the criterion description requires]
- **Journey evidence:** [Which journey step exposed the issue, with screenshot reference]
- **Suggestion:** [Specific actionable improvement for the builder]
- **Priority:** [Critical / Important]

[Repeat for each criterion below threshold]

## Passing Criteria Highlights

[Brief notes on particularly strong aspects, optional]

## Cross-Cutting Observations

- **Navigation coherence:** [Overall observation about screen reachability and navigation flow]
- **Interaction quality:** [Overall observation about interactive element responsiveness and feedback]
- **Journey experience:** [Overall observation about journey flow, clarity, and completability]
- **Visual consistency:** [Overall observation about style consistency across screens]
- **Missing elements:** [Anything expected but not present in the prototype]
```

---

## Phase 2 Template (Cross-Review)

When an expert writes their Phase 2 cross-review, the file must follow this structure. In sequential mode, where the UX specialist writes Phase 1 + Phase 2 combined, append this section after the Phase 1 content in the same file.

```markdown
## Cross-Review Response

### Response to [Product Strategist / UX Specialist]'s Scores

| # | Criterion | Their Score | My Score | Response | Adjusted Score |
|---|-----------|-------------|----------|----------|----------------|
| 1 | [name] | [X] | [Y] | [Agree/Disagree/New concern] | [new score or unchanged] |
| 2 | [name] | [X] | [Y] | [Agree/Disagree/New concern] | [new score or unchanged] |
| ... | ... | ... | ... | ... | ... |

### Journey Assessment Comparison

| # | Journey | Their Assessment | My Assessment | Response |
|---|---------|-----------------|---------------|----------|
| 1 | [name] | [Completed/Partial/Failed] | [Completed/Partial/Failed] | [Agree/Disagree with reason] |
| ... | ... | ... | ... | ... |

### Detailed Responses (for divergent criteria)

**Criterion [N]: [Name]**
- **Their score:** [X] -- "[their evidence summary]"
- **My score:** [Y]
- **Response:** [Agree / Disagree / New concern]
- **Reasoning:** [Specific counter-evidence or supporting evidence, referencing journey steps and screenshots]
- **Adjusted score:** [new score, or same if maintaining position]

[Repeat for each criterion with score difference >= 2, or all criteria in standard mode]

### New Concerns Prompted by Other Expert's Review

- [Description of something their review revealed that was not noticed in Phase 1, with journey/screenshot references]
```

---

## Instructions for Expert Agents

When instructed to write an interaction review:

1. Read all journey screenshots provided (visually, via multimodal capabilities)
2. Read the interaction report from `arn-spark-ui-interactor` for journey completion data
3. Read the criteria list and scoring scale
4. Read the style brief and product concept for context
5. Read visual grounding assets if provided (with their category context: references = inspirational direction, designs = specification targets, brand = constraints)
6. Score EVERY criterion -- do not skip or combine criteria
7. Assess EVERY journey -- note completion status, issues, and key screenshots
8. For each score, provide specific evidence grounded in what you observe in the journey screenshots and interaction report
9. Read this template to understand the expected file format
10. Write your review to the exact file path specified by the caller
11. Return a brief summary in conversation (criteria scored, count below threshold, journeys assessed, top concerns) -- the full detail is in the file

The file contains the COMPLETE review with all scores, journey assessments, and evidence. The conversation summary is just an acknowledgment -- downstream steps read from the file, not from conversation context.

## Instructions for the Skill (Facilitator)

When orchestrating expert interaction reviews:

1. Create the `prototypes/clickable/reviews/` directory if it does not exist before invoking any expert
2. Tell each expert agent the exact file path to write to AND the path to this template
3. Tell each expert the criteria list, scoring scale, threshold, and all reference documents (style brief, product concept, visual grounding assets with categories)
4. Provide all journey screenshots from the interaction testing step AND the interaction report
5. When invoking for cross-review (Phase 2), tell the expert to READ the other expert's file by providing the file path -- do not pass the file content through conversation
6. After each expert completes, read the review file (not the conversation summary) to extract scores for divergence calculation and synthesis
7. Extract per-criterion scores from the "Per-Criterion Scores" table in each review file
8. Extract per-journey assessments from the "Per-Journey Assessment" table to compare journey-level agreement
9. When synthesizing the debate report, read ALL review files from the current round -- never rely on conversation context