@fredcallagan/arn-spark 5.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +9 -0
- package/.opencode/plugins/arn-spark.js +272 -0
- package/package.json +17 -0
- package/plugins/arn-spark/.claude-plugin/plugin.json +9 -0
- package/plugins/arn-spark/LICENSE +21 -0
- package/plugins/arn-spark/README.md +25 -0
- package/plugins/arn-spark/agents/arn-spark-brand-strategist.md +299 -0
- package/plugins/arn-spark/agents/arn-spark-dev-env-builder.md +228 -0
- package/plugins/arn-spark/agents/arn-spark-doctor.md +92 -0
- package/plugins/arn-spark/agents/arn-spark-forensic-investigator.md +181 -0
- package/plugins/arn-spark/agents/arn-spark-market-researcher.md +232 -0
- package/plugins/arn-spark/agents/arn-spark-marketing-pm.md +225 -0
- package/plugins/arn-spark/agents/arn-spark-persona-architect.md +259 -0
- package/plugins/arn-spark/agents/arn-spark-persona-impersonator.md +183 -0
- package/plugins/arn-spark/agents/arn-spark-product-strategist.md +191 -0
- package/plugins/arn-spark/agents/arn-spark-prototype-builder.md +497 -0
- package/plugins/arn-spark/agents/arn-spark-scaffolder.md +228 -0
- package/plugins/arn-spark/agents/arn-spark-spike-runner.md +209 -0
- package/plugins/arn-spark/agents/arn-spark-style-capture.md +196 -0
- package/plugins/arn-spark/agents/arn-spark-tech-evaluator.md +229 -0
- package/plugins/arn-spark/agents/arn-spark-ui-interactor.md +235 -0
- package/plugins/arn-spark/agents/arn-spark-use-case-writer.md +280 -0
- package/plugins/arn-spark/agents/arn-spark-ux-judge.md +215 -0
- package/plugins/arn-spark/agents/arn-spark-ux-specialist.md +200 -0
- package/plugins/arn-spark/agents/arn-spark-visual-sketcher.md +285 -0
- package/plugins/arn-spark/agents/arn-spark-visual-test-engineer.md +224 -0
- package/plugins/arn-spark/references/copilot-tools.md +62 -0
- package/plugins/arn-spark/skills/arn-brainstorming/SKILL.md +520 -0
- package/plugins/arn-spark/skills/arn-brainstorming/references/add-feature-flow.md +155 -0
- package/plugins/arn-spark/skills/arn-spark-arch-vision/SKILL.md +226 -0
- package/plugins/arn-spark/skills/arn-spark-arch-vision/references/architecture-vision-template.md +153 -0
- package/plugins/arn-spark/skills/arn-spark-arch-vision/references/technology-evaluation-guide.md +86 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/SKILL.md +471 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/clickable-prototype-criteria.md +65 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/journey-template.md +62 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/review-report-template.md +75 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype/references/showcase-capture-guide.md +213 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/SKILL.md +642 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/debate-protocol.md +242 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/debate-review-report-template.md +161 -0
- package/plugins/arn-spark/skills/arn-spark-clickable-prototype-teams/references/expert-interaction-review-template.md +152 -0
- package/plugins/arn-spark/skills/arn-spark-concept-review/SKILL.md +350 -0
- package/plugins/arn-spark/skills/arn-spark-concept-review/references/conflict-resolution-protocol.md +145 -0
- package/plugins/arn-spark/skills/arn-spark-concept-review/references/review-report-template.md +185 -0
- package/plugins/arn-spark/skills/arn-spark-dev-setup/SKILL.md +366 -0
- package/plugins/arn-spark/skills/arn-spark-dev-setup/references/dev-setup-checklist.md +84 -0
- package/plugins/arn-spark/skills/arn-spark-dev-setup/references/dev-setup-template.md +205 -0
- package/plugins/arn-spark/skills/arn-spark-discover/SKILL.md +303 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/competitive-landscape-template.md +87 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/discovery-questions.md +120 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/persona-profile-template.md +97 -0
- package/plugins/arn-spark/skills/arn-spark-discover/references/product-concept-template.md +253 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/SKILL.md +23 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md +388 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/references/step-0-fast-path.md +25 -0
- package/plugins/arn-spark/skills/arn-spark-ensure-config/scripts/cache-check.sh +127 -0
- package/plugins/arn-spark/skills/arn-spark-feature-extract/SKILL.md +483 -0
- package/plugins/arn-spark/skills/arn-spark-feature-extract/references/feature-backlog-template.md +176 -0
- package/plugins/arn-spark/skills/arn-spark-feature-extract/references/feature-entry-template.md +209 -0
- package/plugins/arn-spark/skills/arn-spark-help/SKILL.md +149 -0
- package/plugins/arn-spark/skills/arn-spark-help/references/pipeline-map.md +211 -0
- package/plugins/arn-spark/skills/arn-spark-init/SKILL.md +312 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/agent-models-presets/all-opus.md +23 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/agent-models-presets/balanced.md +23 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/bkt-setup.md +55 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/jira-mcp-setup.md +61 -0
- package/plugins/arn-spark/skills/arn-spark-init/references/platform-labels.md +97 -0
- package/plugins/arn-spark/skills/arn-spark-naming/SKILL.md +275 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/creative-brief-template.md +146 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/naming-methodology.md +237 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/naming-report-template.md +122 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/trademark-databases.md +88 -0
- package/plugins/arn-spark/skills/arn-spark-naming/references/whois-server-map.md +164 -0
- package/plugins/arn-spark/skills/arn-spark-naming/scripts/whois-check.js +502 -0
- package/plugins/arn-spark/skills/arn-spark-naming/scripts/whois-check.py +533 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/SKILL.md +260 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/references/lock-report-template.md +68 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/references/pretooluse-hook-template.json +35 -0
- package/plugins/arn-spark/skills/arn-spark-prototype-lock/references/prototype-guardrail-rules.md +38 -0
- package/plugins/arn-spark/skills/arn-spark-report/SKILL.md +144 -0
- package/plugins/arn-spark/skills/arn-spark-report/references/issue-template.md +81 -0
- package/plugins/arn-spark/skills/arn-spark-report/references/spark-knowledge-base.md +293 -0
- package/plugins/arn-spark/skills/arn-spark-scaffold/SKILL.md +239 -0
- package/plugins/arn-spark/skills/arn-spark-scaffold/references/scaffold-checklist.md +79 -0
- package/plugins/arn-spark/skills/arn-spark-scaffold/references/scaffold-summary-template.md +74 -0
- package/plugins/arn-spark/skills/arn-spark-spike/SKILL.md +209 -0
- package/plugins/arn-spark/skills/arn-spark-spike/references/spike-report-template.md +123 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/SKILL.md +362 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/references/review-report-template.md +65 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/references/showcase-capture-guide.md +153 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype/references/static-prototype-criteria.md +54 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/SKILL.md +518 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/debate-protocol.md +230 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/debate-review-report-template.md +148 -0
- package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/expert-visual-review-template.md +130 -0
- package/plugins/arn-spark/skills/arn-spark-stress-competitive/SKILL.md +166 -0
- package/plugins/arn-spark/skills/arn-spark-stress-competitive/references/competitive-report-template.md +139 -0
- package/plugins/arn-spark/skills/arn-spark-stress-competitive/references/gap-analysis-framework.md +111 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/SKILL.md +257 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/references/interview-protocol.md +140 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/references/interview-report-template.md +165 -0
- package/plugins/arn-spark/skills/arn-spark-stress-interview/references/persona-casting-spec.md +138 -0
- package/plugins/arn-spark/skills/arn-spark-stress-premortem/SKILL.md +181 -0
- package/plugins/arn-spark/skills/arn-spark-stress-premortem/references/premortem-protocol.md +112 -0
- package/plugins/arn-spark/skills/arn-spark-stress-premortem/references/premortem-report-template.md +158 -0
- package/plugins/arn-spark/skills/arn-spark-stress-prfaq/SKILL.md +206 -0
- package/plugins/arn-spark/skills/arn-spark-stress-prfaq/references/prfaq-report-template.md +139 -0
- package/plugins/arn-spark/skills/arn-spark-stress-prfaq/references/prfaq-workflow.md +118 -0
- package/plugins/arn-spark/skills/arn-spark-style-explore/SKILL.md +281 -0
- package/plugins/arn-spark/skills/arn-spark-style-explore/references/style-brief-template.md +198 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/SKILL.md +359 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/expert-review-template.md +94 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/review-protocol.md +150 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/use-case-index-template.md +108 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases/references/use-case-template.md +125 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases-teams/SKILL.md +306 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases-teams/references/debate-protocol.md +272 -0
- package/plugins/arn-spark/skills/arn-spark-use-cases-teams/references/review-report-template.md +112 -0
- package/plugins/arn-spark/skills/arn-spark-visual-readiness/SKILL.md +293 -0
- package/plugins/arn-spark/skills/arn-spark-visual-readiness/references/readiness-checklist.md +196 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/SKILL.md +376 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/references/aesthetic-philosophy.md +210 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/references/sketch-gallery-guide.md +282 -0
- package/plugins/arn-spark/skills/arn-spark-visual-sketch/references/visual-direction-template.md +174 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/SKILL.md +447 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/baseline-capture-script-template.js +89 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/journey-schema.md +375 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/spike-checklist.md +122 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/strategy-layers-guide.md +132 -0
- package/plugins/arn-spark/skills/arn-spark-visual-strategy/references/visual-strategy-template.md +141 -0
package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/debate-protocol.md
ADDED

@@ -0,0 +1,230 @@

# Visual Review Debate Protocol

This document defines the structured debate process for team-based visual review in `arn-spark-static-prototype-teams`. Two expert reviewers -- a product strategist and a UX specialist -- independently score visual criteria against showcase screenshots, then cross-review each other's scores and findings to surface insights, resolve disagreements, and produce richer feedback than mechanical lower-of-two scoring.

The skill acts as the **facilitator**: it orchestrates the debate phases, passes file paths between agents, synthesizes the debate report, detects divergence, manages resolution, and presents results to the user. The facilitator does not participate in the scoring itself.

## Team Roles

| Role | Agent | Perspective |
|------|-------|-------------|
| Builder | `arn-spark-prototype-builder` | Creates showcase (not part of debate) |
| Visual Strategist | `arn-spark-product-strategist` | Brand alignment, design direction, style brief fidelity, overall impression, product context |
| Visual Flow Reviewer | `arn-spark-ux-specialist` | Component usability, state coverage, interaction patterns, accessibility, spacing/hierarchy |
| Facilitator | The skill itself | Orchestrates debate, synthesizes report, manages divergence |
| Judge | `arn-spark-ux-judge` | Independent verdict (not part of debate) |

## Debate Modes

### Divergence Mode (Default)

Cross-review (Phase 2) triggers only when any criterion score differs by >= 2 points between experts. When experts mostly agree (all scores within 1 point), Phase 2 is skipped and combined scores use the lower of each pair (identical to base skill behavior). This mode saves tokens on cycles where the experts align.

### Standard Mode

Full cross-review every cycle, regardless of score agreement. Produces richer debate findings but costs more tokens. Use when the project has complex visual decisions where expert dialogue adds value even on criteria they numerically agree about.

## Execution Modes

**Important:** All execution modes use the same file-based review output. Each expert writes its review to a file -- this works identically in Agent Teams mode and sequential mode. The execution mode selection is based ONLY on whether Agent Teams is supported. File-based output does NOT affect mode selection and does NOT favor sequential over Agent Teams. When Agent Teams is supported, always use Agent Teams mode -- it is faster because experts run in parallel.

### Agent Teams Mode (Preferred)

**When:** Agent Teams is supported by your platform.

Both experts are spawned as teammates. Phase 1 runs in parallel -- each expert writes to its own file simultaneously with no contention. Phase 2 uses Teams communication to coordinate cross-review -- each expert reads the other's completed file and writes its cross-review to a separate file.

### Sequential Mode (Fallback)

**When:** Agent Teams is NOT enabled.

The skill simulates the debate through sequential expert invocations, manually passing file paths between agents so each can read the other's review. Produces the same logical result as Agent Teams mode but with serialized invocations.

### Single-Reviewer Mode

**When:** `arn-spark-ux-specialist` is unavailable.

No debate occurs. The product strategist reviews independently. Strategist scores become the combined scores directly. The debate report notes "Single-Reviewer Mode" throughout. The skill suggests using `/arn-spark-static-prototype` instead, which handles single-reviewer identically.

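The mode selection above reduces to two checks: whether the UX specialist is available, and whether Agent Teams is supported. A minimal sketch (the function name and boolean flags are illustrative, not part of the skill):

```python
def select_execution_mode(agent_teams_supported, ux_specialist_available):
    """Map platform capabilities to an execution mode (names are illustrative)."""
    if not ux_specialist_available:
        return "single_reviewer"   # no debate partner -> no debate
    if agent_teams_supported:
        return "agent_teams"       # preferred: Phase 1 experts run in parallel
    return "sequential"            # fallback: serialized invocations
```

Note the order: an unavailable UX specialist forces single-reviewer mode even when Agent Teams is supported.
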
## Debate Phases

### Phase 1: Independent Scoring

Both experts independently score ALL criteria against the showcase screenshots. Neither sees the other's scores during this phase.

**Product strategist focus areas:**
- Does the color palette match the style brief's hex values?
- Does the typography match the style brief's specified families, weights, and sizes?
- Do shadows, borders, and radii match the style brief?
- Does the overall visual impression match the style brief's described feel?
- Is the showcase coherent with the product concept's intended audience and tone?
- If visual grounding assets exist:
  - References: Does the overall direction align with inspirational references?
  - Designs: How closely do components match design mockups?
  - Brand: Are brand elements correctly applied?

**UX specialist focus areas:**
- Are all component variants rendered (sizes, states)?
- Are interactive states visually distinct (default, hover, active, disabled, error)?
- Does the component library theme configuration appear applied globally (not as per-component overrides)?
- Does the visual hierarchy (heading, body, caption) establish a clear reading order?
- Are components aligned to a consistent grid or layout system?
- Is content density appropriate for the intended use?
- Does dark mode (if applicable) maintain contrast and readability?
- Are spacing tokens consistent across components?

**Phase 1 file output:** Each expert writes their review to a file using the expert visual review template (`${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-static-prototype-teams/references/expert-visual-review-template.md`). The skill tells each agent the exact file path to write to:
- Product strategist -> `prototypes/static/reviews/round-N-strategist-review.md`
- UX specialist -> `prototypes/static/reviews/round-N-ux-review.md`

The agent returns a brief summary in conversation -- the full review is in the file. Downstream steps read from the file, not from conversation context.

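The round-N file naming convention above can be captured in a small helper; a sketch (the function name is illustrative):

```python
def review_path(round_n, reviewer, cross=False):
    """Build the Phase 1 / Phase 2 review file path for one reviewer.

    `reviewer` is 'strategist' or 'ux', matching the protocol's naming scheme."""
    kind = "cross-review" if cross else "review"
    return f"prototypes/static/reviews/round-{round_n}-{reviewer}-{kind}.md"
```
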
### Divergence Check (Performed by Skill)

After Phase 1, the skill reads both review files and extracts per-criterion scores from the "Per-Criterion Scores" table.

**In divergence mode:** Calculate `|strategist_score - ux_score|` for each criterion.
- If max divergence < 2: Skip Phase 2. Combined score per criterion = `min(strategist, ux)`. Present to user: "Experts scored within 1 point on all criteria. No divergence detected -- skipping cross-review."
- If any divergence >= 2: Proceed to Phase 2. Present to user: "Divergence detected on [N] criteria (difference >= 2 points): [list criteria names and score pairs]. Triggering cross-review."

**In standard mode:** Always proceed to Phase 2 regardless of score differences.

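The divergence check can be sketched in a few lines, assuming per-criterion scores have already been extracted into dicts keyed by criterion name (the data shapes are an assumption, not the skill's actual model):

```python
def divergence_check(strategist_scores, ux_scores, threshold=2):
    """Return (divergent criterion names, lower-of-two combined scores).

    An empty first element means Phase 2 can be skipped in divergence mode."""
    divergent = [name for name, s in strategist_scores.items()
                 if abs(s - ux_scores[name]) >= threshold]
    combined = {name: min(s, ux_scores[name])
                for name, s in strategist_scores.items()}
    return divergent, combined
```

In divergence mode the skill would use `combined` directly when `divergent` is empty; in standard mode it ignores `divergent` and always proceeds to Phase 2.
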
### Phase 2: Cross-Review

Each expert reads the other's Phase 1 file and responds per-criterion.

**Instructions for each expert during cross-review:**

For each criterion:
- **Agree** -- the other expert's score is valid. Optionally adjust own score (up or down) with reasoning.
- **Disagree** -- the other expert's score is incorrect. Maintain own score with counter-evidence (reference specific screenshot areas, style brief sections, or component states).
- **New concern prompted** -- the other expert's review reveals something not previously noticed. Add observation.

Focus on DIVERGENT criteria (score difference >= 2). For criteria with small differences (<= 1 point), a brief acknowledgment suffices.

**Phase 2 file output:** Each expert writes their cross-review to a separate file:
- Product strategist -> `prototypes/static/reviews/round-N-strategist-cross-review.md`
- UX specialist -> `prototypes/static/reviews/round-N-ux-cross-review.md`

In sequential mode (where the UX specialist writes Phase 1 + Phase 2 combined), the combined output goes to `round-N-ux-review.md` (a single file with both sections).

### Phase 3: Synthesis (Performed by Skill)

The skill reads all review files written by the experts -- never from conversation context. The files to read are:
- `prototypes/static/reviews/round-N-strategist-review.md` (Phase 1)
- `prototypes/static/reviews/round-N-ux-review.md` (Phase 1, or Phase 1 + Phase 2 combined in sequential mode)
- `prototypes/static/reviews/round-N-strategist-cross-review.md` (Phase 2, if written separately)
- `prototypes/static/reviews/round-N-ux-cross-review.md` (Phase 2, Agent Teams mode only)

For each criterion, categorize:

**Consensus:** Both experts scored the same, or one adjusted their score in cross-review to match the other. Combined score = the agreed score.

**Additions:** One expert scored lower with specific feedback, the other did not dispute the lower score in cross-review (neither agreed nor disagreed). Combined score = the lower score.

**Disagreements:** Both experts maintained different scores after cross-review -- one raised a concern and the other explicitly disagreed. These require user resolution in Phase 4.

**No-debate:** Criteria where Phase 2 was skipped (divergence mode, no divergence detected). Combined score = `min(strategist, ux)`.

Write the debate review report using the debate review report template. Save to `prototypes/static/reviews/round-N-cycle-M-debate-report.md`.

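The four categories and their score rules can be sketched per criterion. This is a simplified model assuming the scores passed in are already post-cross-review values and that Phase 2 verdicts have been parsed into a list; the parameter shapes are illustrative, not the skill's actual data model:

```python
def combine_score(strat, ux, verdicts, user_resolved=None):
    """Categorize one criterion after the debate and compute its combined score.

    `verdicts` holds each expert's Phase 2 response ('agree' / 'disagree'),
    or is empty when Phase 2 was skipped."""
    if not verdicts:
        return "no-debate", min(strat, ux)      # Phase 2 skipped
    if strat == ux:
        return "consensus", strat               # agreed, or one adjusted to match
    if "disagree" in verdicts:
        return "disagreement", user_resolved    # Phase 4: user decides
    return "addition", min(strat, ux)           # lower score stood undisputed
```

A `disagreement` carries no combined score until Phase 4 resolution supplies one.
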
### Phase 4: Resolution (Conditional)

**Trigger:** One or more unresolved disagreements exist after Phase 3.

For each unresolved disagreement, present to the user:

"Expert disagreement on **[Criterion Name]** (criterion #[N]):
- **Product Strategist:** Score [X] -- [reasoning and evidence]
- **UX Specialist:** Score [Y] -- [reasoning and evidence]
- **Trade-off:** [what each score optimizes for]

What score should this criterion receive?"

Record user decisions. Update the debate report with resolutions. The resolved score becomes the final combined score for that criterion.

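Formatting that prompt is mechanical; a sketch that mirrors the wording above (the `(score, reasoning)` pair shapes are an assumption):

```python
def resolution_prompt(name, num, strat, ux, tradeoff):
    """Format the Phase 4 prompt for one unresolved disagreement.

    `strat` and `ux` are (score, reasoning) pairs."""
    return (
        f"Expert disagreement on **{name}** (criterion #{num}):\n"
        f"- **Product Strategist:** Score {strat[0]} -- {strat[1]}\n"
        f"- **UX Specialist:** Score {ux[0]} -- {ux[1]}\n"
        f"- **Trade-off:** {tradeoff}\n\n"
        "What score should this criterion receive?"
    )
```
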
## Sequential Mode Invocation Detail

When Agent Teams is not enabled, the skill simulates the debate with 3 sequential invocations per round:

**Invocation 1 -- Product Strategist Phase 1:**

Invoke the `arn-spark-product-strategist` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
- Screenshots from the capture step
- All criteria with descriptions, scoring scale, and threshold
- Style brief, product concept, visual grounding assets (with category context)
- Expert visual review template path: `${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-static-prototype-teams/references/expert-visual-review-template.md`
- File path to write to: `prototypes/static/reviews/round-N-strategist-review.md`
- Instruction: "Score every criterion independently against the screenshots. Write your complete review to the specified file path using the expert visual review template. Return a brief summary in conversation."

**Invocation 2 -- UX Specialist Phase 1 + Phase 2 Combined:**

Invoke the `arn-spark-ux-specialist` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
- Same inputs as Invocation 1
- The strategist's file path to read: `prototypes/static/reviews/round-N-strategist-review.md`
- File path to write to: `prototypes/static/reviews/round-N-ux-review.md`
- Instruction: "First, score every criterion independently from the UX perspective using the Phase 1 format. Then, read the strategist's review at the specified file path and respond to each criterion score using the Phase 2 cross-review format: agree (optionally adjust your score), disagree (with counter-evidence), or note new concerns. Write your complete review (Phase 1 + Phase 2 combined) to the specified file path. Return a brief summary in conversation."

**Invocation 3 -- Product Strategist Phase 2:**

Invoke the `arn-spark-product-strategist` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
- The UX specialist's file path to read: `prototypes/static/reviews/round-N-ux-review.md`
- Expert visual review template path
- File path to write to: `prototypes/static/reviews/round-N-strategist-cross-review.md`
- Instruction: "Read the UX specialist's review at the specified file path. The UX specialist has scored the criteria and also responded to your review. Respond to their scores and cross-review using the Phase 2 format. Write your cross-review to the specified file path. Return a brief summary in conversation."

The skill synthesizes by reading all three review files (not from conversation context).

**Note on sequential asymmetry:** In sequential mode, the UX specialist sees the strategist's review before writing their own Phase 1 review. Instruct the UX specialist to "score independently first" to minimize anchoring bias. The synthesis step normalizes the output regardless of invocation order.

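The three invocations above can be sketched as an orchestration function. Here `invoke_agent(agent_name, model, context, instruction)` is a placeholder standing in for the platform's Task-tool dispatch -- it is NOT a real API -- and the instructions are abbreviated; the file paths follow the protocol's naming convention:

```python
def run_sequential_round(round_n, invoke_agent, model, context):
    """Drive one sequential-mode debate round; returns the files synthesis reads."""
    base = f"prototypes/static/reviews/round-{round_n}"
    strat_p1 = f"{base}-strategist-review.md"
    ux_both = f"{base}-ux-review.md"
    strat_p2 = f"{base}-strategist-cross-review.md"

    # Invocation 1: strategist scores independently (Phase 1)
    invoke_agent("arn-spark-product-strategist", model, context,
                 f"Score every criterion independently. Write your review to {strat_p1}.")
    # Invocation 2: UX specialist scores independently, then cross-reviews (Phase 1 + 2)
    invoke_agent("arn-spark-ux-specialist", model, context,
                 f"Score independently first, then read {strat_p1} and respond "
                 f"per criterion. Write Phase 1 + Phase 2 combined to {ux_both}.")
    # Invocation 3: strategist cross-reviews the UX specialist's file (Phase 2)
    invoke_agent("arn-spark-product-strategist", model, context,
                 f"Read {ux_both} and respond using the Phase 2 format. "
                 f"Write your cross-review to {strat_p2}.")
    return [strat_p1, ux_both, strat_p2]
```
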
## Agent Teams Mode Invocation Detail

**Phase 1:** Spawn both experts simultaneously as teammates. Each receives:
- Screenshots, criteria, scoring parameters, style brief, product concept, visual grounding assets
- Expert visual review template path
- File path to write to: strategist -> `round-N-strategist-review.md`, UX specialist -> `round-N-ux-review.md`
- Instruction: "Score every criterion independently. Write the complete review to the specified file path using the expert visual review template. Do not communicate with other teammates during this phase."

Both produce Phase 1 reviews independently and write them to their respective files.

**Runtime verification:** After Phase 1, the skill checks that BOTH review files exist and contain per-criterion scores. If one file is missing (Agent Teams silently failed to spawn one expert), invoke the missing expert sequentially and note the issue in the debate report: "Agent Teams Phase 1 partial failure: [agent] did not produce its review file. Invoked sequentially as fallback."

**Phase 2:** Share file paths through Teams communication:
- Tell the UX specialist to read the strategist's file: `round-N-strategist-review.md`
- Tell the strategist to read the UX specialist's file: `round-N-ux-review.md`
- Each reads the other's file and writes their cross-review to a separate file:
  - Product strategist -> `round-N-strategist-cross-review.md`
  - UX specialist -> `round-N-ux-cross-review.md`
- Each responds using the Phase 2 cross-review format from the expert visual review template

The skill synthesizes by reading all four review files (not from conversation context).

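The runtime verification step can be sketched as a file check. The regex used to detect "contains per-criterion scores" is a heuristic assumption about the review template's table layout, not a documented contract:

```python
import os
import re

def verify_phase1_files(paths):
    """Return review files that are missing or contain no score-bearing table row.

    An empty result means both experts delivered; otherwise the skill invokes
    the missing expert(s) sequentially as a fallback."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append(path)
            continue
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        if not re.search(r"\|\s*\d+\s*\|", text):  # heuristic: numeric table cell
            problems.append(path)
    return problems
```
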
## Invocation Counts per Cycle

| Execution Mode | Debate Mode | Divergence Found | Expert Invocations | Notes |
|---------------|-------------|------------------|-------------------|-------|
| Agent Teams | Standard | N/A (always) | 4 (2 P1 + 2 P2) | Parallel within phases |
| Agent Teams | Divergence | Yes | 4 (2 P1 + 2 P2) | Same as standard |
| Agent Teams | Divergence | No | 2 (P1 only) | Phase 2 skipped |
| Sequential | Standard | N/A (always) | 3 (strat P1, UX P1+P2, strat P2) | |
| Sequential | Divergence | Yes | 3 | |
| Sequential | Divergence | No | 2 (strat P1, UX P1) | Phase 2 skipped |
| Single-reviewer | Any | N/A | 1 | No debate |

For max_cycles=3 with Agent Teams + standard mode: up to 12 expert invocations + 3 build cycles + 3 capture cycles + judge + showcase.

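The table collapses to a small function, which also makes the 12-invocation budget arithmetic explicit (mode strings are illustrative):

```python
def expert_invocations(execution_mode, debate_mode, diverged):
    """Per-cycle expert invocation count, mirroring the table above."""
    if execution_mode == "single_reviewer":
        return 1                           # no debate
    runs_phase2 = debate_mode == "standard" or diverged
    if execution_mode == "agent_teams":
        return 4 if runs_phase2 else 2     # 2 x Phase 1 (+ 2 x Phase 2)
    return 3 if runs_phase2 else 2         # sequential: UX combines P1+P2
```
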
## Skill's Facilitation Responsibilities

The `arn-spark-static-prototype-teams` skill (not the agents) is responsible for:

1. **Agent Teams verification:** Check the env var in Step 1 AND verify both experts write files after Phase 1. If one expert's file is missing, invoke sequentially as fallback and log the issue.
2. **Mode detection:** Record execution mode (agent_teams / sequential) and debate mode (divergence / standard) from Step 1 and Step 3.
3. **Directory setup:** Create `prototypes/static/reviews/` if it does not exist before invoking experts.
4. **File path assignment:** Tell each expert agent the exact file path to write to and the expert visual review template path.
5. **Phase orchestration:** Run Phase 1, perform the divergence check, and conditionally run Phase 2, telling each expert to read the other's review file (by file path, not by passing content through conversation).
6. **Synthesis:** Read all expert review files and categorize each criterion as consensus, addition, or disagreement. Never rely on the expert's conversation summary -- always read the file.
7. **Score computation:** For each criterion, compute the final combined score based on the debate outcome (consensus: agreed score; additions: lower score; disagreements: user-resolved score; no-debate: min of the two).
8. **Conflict detection:** Identify disagreements and present them to the user for resolution with both positions and evidence.
9. **Report writing:** Produce the debate review report per the template and save it to file.
10. **Budget management:** Never exceed the user's configured max_cycles.
11. **User communication:** Present divergence status, debate summaries, and resolution requests clearly between phases.

package/plugins/arn-spark/skills/arn-spark-static-prototype-teams/references/debate-review-report-template.md
ADDED

@@ -0,0 +1,148 @@

# Debate Review Report Template -- Static Prototype Teams
|
|
2
|
+
|
|
3
|
+
Use this template for each cycle's debate review report produced by the `arn-spark-static-prototype-teams` skill. The skill populates this template after synthesizing the expert debate outputs (Phase 1 scores + Phase 2 cross-review responses) into categorized findings with final combined scores.
|
|
4
|
+
|
|
5
|
+
## Instructions for the Skill
|
|
6
|
+
|
|
7
|
+
When populating this template:
|
|
8
|
+
|
|
9
|
+
- Every section MUST appear, even if empty (write "None" for empty sections)
|
|
10
|
+
- **Consensus:** both experts agree on the score, or one adjusted to match in cross-review
|
|
11
|
+
- **Additions:** one expert scored lower with feedback, the other did not dispute
|
|
12
|
+
- **Disagreements:** experts explicitly disagreed in cross-review and disagreement persists
|
|
13
|
+
- **No-debate:** Phase 2 was skipped (divergence mode, no divergence detected) -- combined = min(strategist, ux)
|
|
14
|
+
- In single-reviewer mode (no UX specialist): all scores are from the strategist alone. Omit Disagreements and note "Single-Reviewer Mode" throughout.
|
|
15
|
+
- Save each report to `prototypes/static/reviews/round-N-cycle-M-debate-report.md`
|
|
16
|
+
- Also copy to `prototypes/static/v[M]/review-report.md` for version-local access
|
|
17
|
+
|
|
18
|
+
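The four categories above can be expressed as a small decision function. A sketch, assuming hypothetical field names for a criterion's Phase 1 and Phase 2 data (`adjustedA`/`adjustedB` being the cross-review adjustments, if any):

```javascript
// Illustrative categorization of one criterion's debate outcome.
// Field names are assumptions, not the skill's actual data model.
function categorize({ phase1A, phase1B, adjustedA, adjustedB, disputed, phase2Ran }) {
  if (!phase2Ran) return "no-debate";          // divergence mode, scores within 1 point
  const finalA = adjustedA ?? phase1A;         // cross-review adjustment, if any
  const finalB = adjustedB ?? phase1B;
  if (finalA === finalB) return "consensus";   // agreed, or one adjusted to match
  if (!disputed) return "addition";            // lower score stands, undisputed
  return "disagreement";                       // disagreement persists after cross-review
}
```
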
---

## Template

```markdown
# Debate Review Report: Cycle [M], Round [N]

## Debate Participants

| Role | Agent | Status |
|------|-------|--------|
| Visual Strategist | arn-spark-product-strategist | Participated |
| Visual Flow Reviewer | arn-spark-ux-specialist | Participated / Unavailable |

## Configuration

- **Debate mode:** [Divergence / Standard]
- **Execution mode:** [Agent Teams / Sequential / Single-Reviewer]
- **Phase 2 triggered:** [Yes -- [N] criteria diverged by >= 2 / No -- all within 1 point / Yes -- standard mode (always)]
- **Divergent criteria:** [list names, or "None"]
- **Scoring scale:** [1-N]
- **Minimum threshold:** [T]

## Criterion Scores

| # | Criterion | Strategist | UX Specialist | Combined | Status | Category |
|---|-----------|-----------|---------------|----------|--------|----------|
| 1 | [name] | [score] | [score] | [combined] | PASS/FAIL | [Consensus/Addition/Disagreement/No-debate] |
| 2 | [name] | [score] | [score] | [combined] | PASS/FAIL | [category] |
| ... | ... | ... | ... | ... | ... | ... |

## Visual Grounding Comparison

**Assets provided to reviewers:**

| Category | Count | Source |
|----------|-------|--------|
| References | [N] | [URL captures, user screenshots] |
| Designs | [N] | [Figma exports, Canva exports, manual mockups] |
| Brand | [N] | [logos, guidelines] |

**Comparison notes:**
- **Reference alignment:** [strategist and UX specialist observations, noting any agreement or disagreement]
- **Design fidelity:** [only if designs exist]
- **Brand compliance:** [only if brand assets exist]

[If no visual grounding assets: "No visual grounding assets provided. Review based on style brief text only."]

## Debate Findings

### Consensus Criteria

[Criteria where both experts agreed or one adjusted to match]

**Criterion [N]: [Name]** -- Combined [X]/[scale]
- **Strategist:** [brief reasoning]
- **UX Specialist:** [brief reasoning]
- **Outcome:** Both agree. [Any shared feedback for the builder.]

[Repeat for each consensus criterion, or "None"]

### Addition Criteria

[Criteria where one expert raised feedback the other did not dispute]

**Criterion [N]: [Name]** -- Combined [X]/[scale] (raised by [Strategist / UX Specialist])
- **Lower scorer:** [agent] scored [X] -- [reasoning]
- **Higher scorer:** [agent] scored [Y] -- did not dispute
- **Outcome:** Lower score used. Builder feedback: [specific suggestion]

[Repeat for each addition criterion, or "None"]

### Disagreement Criteria

[Criteria where experts explicitly disagreed after cross-review]

**Criterion [N]: [Name]**
- **Strategist:** Score [X] -- [position + reasoning + evidence]
- **UX Specialist:** Score [Y] -- [position + reasoning + evidence]
- **Trade-off:** [what each score optimizes for]
- **Resolution:** [User decided: score [Z] because [reasoning] / Pending user input]

[Repeat for each disagreement criterion, or "None"]

### No-Debate Criteria

[Criteria where Phase 2 was skipped -- divergence mode only]

[If Phase 2 was skipped:] All criteria scored within 1 point. Combined = min(strategist, ux). No cross-review was performed.

[If Phase 2 ran:] N/A -- all criteria were included in the debate.

## Failing Criteria

### [Criterion Name] -- Combined [X]/[scale]
- **Strategist feedback:** [specific observation and suggestion]
- **UX specialist feedback:** [specific observation and suggestion]
- **Debate insight:** [anything surfaced during cross-review that adds context beyond individual feedback]
- **Priority:** [Critical / Important]

[Repeat for each failing criterion]

## Passing Criteria Highlights

[Brief notes on particularly strong aspects]

## Summary

- **Passing:** [N] of [M] criteria meet threshold
- **Failing:** [N] criteria below threshold
- **Phase 2 triggered:** [Yes / No]
- **Consensus criteria:** [N]
- **Addition criteria:** [N]
- **Disagreement criteria:** [N] ([N] resolved by user)
- **No-debate criteria:** [N]
- **Verdict:** PROCEED TO NEXT CYCLE / ALL CRITERIA PASS -- PROCEED TO JUDGE

## Recommended Focus for Next Cycle

[If failing: ordered list of what to fix, most critical first, incorporating debate insights. Each item includes the debate context so the builder understands WHY, not just what to fix.]

1. **[Criterion Name]:** [specific fix] -- [debate context: both experts agreed / strategist flagged X while UX specialist noted Y / user resolved in favor of Z]
2. ...
```

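The Summary section's verdict line follows mechanically from the combined scores and the minimum threshold. A sketch; `verdict` is an illustrative helper name, shown only to make the pass/fail rule explicit:

```javascript
// Derive the Summary verdict from combined scores vs. the minimum threshold.
// Hypothetical helper, not part of the skill's actual code.
function verdict(combinedScores, threshold) {
  const failing = combinedScores.filter((s) => s < threshold).length;
  return failing === 0
    ? "ALL CRITERIA PASS -- PROCEED TO JUDGE"
    : "PROCEED TO NEXT CYCLE";
}
```
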
## Usage Notes

- The skill writes this report to `prototypes/static/reviews/round-N-cycle-M-debate-report.md` after each debate cycle
- It also copies the report to `prototypes/static/v[M]/review-report.md` for version-local access
- The "Recommended Focus for Next Cycle" section is the primary output fed to the builder for the next cycle -- it must be actionable and specific, enriched by debate context
- In single-reviewer mode, the Category column is always "Single-reviewer", the Disagreements section reads "N/A -- single-reviewer mode", and the UX Specialist column shows "N/A"
- When Phase 2 is skipped in divergence mode, all criteria are categorized as "No-debate" and the Debate Findings section reflects this
- When writing the final report (`prototypes/static/final-report.md`), aggregate all per-cycle debate reports with a summary of the debate arc: how scores evolved, what diverged, what converged, and what the user decided
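The two save locations in the notes above can be derived from the round and cycle numbers. A sketch with an illustrative helper name:

```javascript
// Build both save locations for a cycle's debate report.
// The helper name is hypothetical; the paths follow the convention above.
function reportPaths(round, cycle) {
  return {
    canonical: `prototypes/static/reviews/round-${round}-cycle-${cycle}-debate-report.md`,
    versionLocal: `prototypes/static/v${cycle}/review-report.md`,
  };
}
```
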
@@ -0,0 +1,130 @@

# Expert Visual Review Template

This template defines the file format that expert agents (`arn-spark-product-strategist`, `arn-spark-ux-specialist`) use when writing their visual review reports to disk during `arn-spark-static-prototype-teams` debate cycles. Writing reviews to files ensures they survive context compression and provides a full audit trail.

## File Naming Convention

All review files go in `prototypes/static/reviews/`. Create the directory if it does not exist.

```
prototypes/static/reviews/
├── round-N-strategist-review.md         ← Product strategist Phase 1
├── round-N-ux-review.md                 ← UX specialist Phase 1 (+ Phase 2 in sequential mode)
├── round-N-strategist-cross-review.md   ← Product strategist Phase 2 response
├── round-N-ux-cross-review.md           ← UX specialist Phase 2 response (Agent Teams only)
└── round-N-cycle-M-debate-report.md     ← Synthesized debate report (written by skill)
```

Where `N` is the overall round number and `M` is the cycle number within the current validation run.
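The naming convention above can be sketched as a small path helper; `reviewFilePath` is an illustrative name, not part of the plugin:

```javascript
// Build an expert review file path per the naming convention above.
// role: "strategist" | "ux"; phase: 1 (independent) or 2 (cross-review).
function reviewFilePath(round, role, phase) {
  const suffix = phase === 2 ? "cross-review" : "review";
  return `prototypes/static/reviews/round-${round}-${role}-${suffix}.md`;
}
```
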
---

## Phase 1 Template (Independent Scoring)

When an expert writes their Phase 1 review, the file must follow this structure:

```markdown
# Visual Review: Round [N] -- [Product Strategist / UX Specialist]

**Agent:** [arn-spark-product-strategist / arn-spark-ux-specialist]
**Phase:** Phase 1: Independent Scoring
**Execution mode:** [Agent Teams / Sequential / Single-Reviewer]
**Criteria scored:** [count]
**Screenshots reviewed:** [count]
**Version:** v[X]

---

## Per-Criterion Scores

| # | Criterion | Score | Evidence |
|---|-----------|-------|----------|
| 1 | [name] | [X]/[scale] | [1-2 sentence observation grounded in specific screenshot evidence] |
| 2 | [name] | [X]/[scale] | [evidence] |
| ... | ... | ... | ... |

## Failing Criteria Detail

### [Criterion Name] -- [X]/[scale]

- **Observation:** [What specifically is wrong -- reference the screenshot area, component, or element]
- **Expected:** [What the style brief or criteria description requires]
- **Suggestion:** [Specific actionable improvement for the builder]
- **Priority:** [Critical / Important]

[Repeat for each criterion below threshold]

## Passing Criteria Highlights

[Brief notes on particularly strong aspects, optional]

## Cross-Cutting Observations

- **Style coherence:** [Overall observation about the showcase's visual consistency]
- **Component quality:** [Overall observation about component rendering quality]
- **Missing elements:** [Anything expected but not present in the showcase]
```

---

## Phase 2 Template (Cross-Review)

When an expert writes their Phase 2 cross-review, the file must follow this structure. In sequential mode, where the UX specialist writes Phase 1 and Phase 2 combined, append this section after the Phase 1 content in the same file.

```markdown
## Cross-Review Response

### Response to [Product Strategist / UX Specialist]'s Scores

| # | Criterion | Their Score | My Score | Response | Adjusted Score |
|---|-----------|-------------|----------|----------|----------------|
| 1 | [name] | [X] | [Y] | [Agree/Disagree/New concern] | [new score or unchanged] |
| 2 | [name] | [X] | [Y] | [Agree/Disagree/New concern] | [new score or unchanged] |
| ... | ... | ... | ... | ... | ... |

### Detailed Responses (for divergent criteria)

**Criterion [N]: [Name]**
- **Their score:** [X] -- "[their evidence summary]"
- **My score:** [Y]
- **Response:** [Agree / Disagree / New concern]
- **Reasoning:** [Specific counter-evidence or supporting evidence, referencing screenshot areas]
- **Adjusted score:** [new score, or the same if maintaining position]

[Repeat for each criterion with a score difference >= 2, or all criteria in standard mode]

### New Concerns Prompted by the Other Expert's Review

- [Description of something their review revealed that was not noticed in Phase 1]
```

---

## Instructions for Expert Agents

When instructed to write a visual review:

1. Read all provided screenshots (visually, via multimodal capabilities)
2. Read the criteria list and scoring scale
3. Read the style brief and product concept for context
4. Read visual grounding assets if provided (with their category context: references = inspirational, designs = specification, brand = constraints)
5. Score EVERY criterion -- do not skip or combine criteria
6. For each score, provide specific evidence grounded in what you observe in the screenshots
7. Read this template to understand the expected file format
8. Write your review to the exact file path specified by the caller
9. Return a brief summary in conversation (criteria scored, count below threshold, top concerns) -- the full detail is in the file

The file contains the COMPLETE review with all scores and evidence. The conversation summary is just an acknowledgment -- downstream steps read from the file, not from conversation context.
## Instructions for the Skill (Facilitator)

When orchestrating expert visual reviews:

1. Create the `prototypes/static/reviews/` directory if it does not exist before invoking any expert
2. Tell each expert agent the exact file path to write to AND the path to this template
3. Give each expert the criteria list, scoring scale, threshold, and all reference documents (style brief, product concept, visual grounding assets with categories)
4. Provide all screenshots from the capture step
5. When invoking for cross-review (Phase 2), tell the expert to READ the other expert's file by providing the file path -- do not pass the file content through conversation
6. After each expert completes, read the review file (not the conversation summary) to extract scores for divergence calculation and synthesis
7. Extract per-criterion scores from the "Per-Criterion Scores" table in each review file
8. When synthesizing the debate report, read ALL review files from the current round -- never rely on conversation context
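Step 7 above can be sketched as a small table parser, assuming the `| # | Criterion | Score | Evidence |` row format defined in the Phase 1 template; the function name is illustrative, not part of the plugin:

```javascript
// Pull per-criterion scores out of a review file's "Per-Criterion Scores"
// markdown table. Assumes "[X]/[scale]" score cells as in the template.
function extractScores(markdown) {
  const scores = [];
  for (const line of markdown.split("\n")) {
    // Match rows like: | 1 | Color palette | 4/5 | evidence... |
    const m = line.match(/^\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|\s*(\d+)\s*\/\s*\d+\s*\|/);
    if (m) scores.push({ index: Number(m[1]), criterion: m[2], score: Number(m[3]) });
  }
  return scores;
}
```

Header and separator rows fail the numeric first-cell match, so only data rows are collected.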