npm - @zigrivers/scaffold - Versions diffs - 3.10.1 → 3.12.0 - Mend

@zigrivers/scaffold 3.10.1 → 3.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/README.md +7 -5
package/content/knowledge/core/automated-review-tooling.md +137 -140
package/content/knowledge/core/multi-model-research-dispatch.md +219 -0
package/content/knowledge/core/multi-model-review-dispatch.md +47 -6
package/content/knowledge/game/game-ideation.md +100 -0
package/content/knowledge/product/ideation-craft.md +209 -0
package/content/methodology/game-overlay.yml +2 -0
package/content/pipeline/foundation/tech-stack.md +1 -0
package/content/pipeline/vision/create-vision.md +43 -0
package/content/skills/multi-model-dispatch/SKILL.md +20 -22
package/content/tools/post-implementation-review.md +71 -26
package/content/tools/prompt-pipeline.md +1 -0
package/content/tools/review-code.md +37 -11
package/content/tools/review-pr.md +65 -23
package/content/tools/spark.md +337 -0
package/package.json +1 -1
package/skills/multi-model-dispatch/SKILL.md +20 -22

package/content/knowledge/core/multi-model-review-dispatch.md CHANGED Viewed

@@ -44,21 +44,53 @@ The review never blocks on external model availability.
 ## Deep Guidance
+See `review-methodology` for severity definitions (P0-P3). This entry uses those severities but does not define them.
 ### Dispatch Mechanics
+#### Foreground-Only Execution
+When an AI agent dispatches CLI reviews via a tool runner (Claude Code Bash tool, Codex exec, etc.), always run commands in the foreground. Background execution (`run_in_background`, `&`, `nohup`) produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
 #### CLI Availability Check
-Before dispatching, verify the model CLI is installed and authenticated:
+Before dispatching, verify the model CLI is installed and authenticated using a two-step process that produces distinct statuses for the orchestration layer:
+**Step 1 — Installation check:**
+```bash
+# Codex: not found -> status: "not_installed"
+command -v codex >/dev/null 2>&1
+# Gemini: not found -> status: "not_installed"
+command -v gemini >/dev/null 2>&1
+```
+If the CLI is not found, report status `not_installed` to the orchestration layer. Do not prompt the user to install it.
+**Step 2 — Auth verification (only if installed):**
 ```bash
-# Codex check
-which codex && codex --version 2>/dev/null
+# Codex: fail -> status: "auth_failed"
+codex login status 2>/dev/null
-# Gemini check (via Google Cloud CLI or dedicated tool)
-which gemini 2>/dev/null || (which gcloud && gcloud ai models list 2>/dev/null)
+# Gemini: exit 41 -> status: "auth_failed"
+NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
 ```
-If the CLI is not found, skip dispatch immediately. Do not prompt the user to install it — this is a review enhancement, not a requirement.
+If auth fails, report status `auth_failed` and surface recovery to the user:
+- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
+- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
+If auth check times out (~5 seconds), retry once. If still failing, report `auth_timeout`.
+If auth succeeds, report `ready` and proceed to dispatch.
+**Post-dispatch terminal states:**
+- `completed` — channel produced results, use normally
+- `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
+- `failed` — crashed or unparseable output; triggers compensating pass.
+Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
 #### Prompt Formatting
@@ -241,6 +273,15 @@ Minimum standards for a multi-model review to be considered complete:
 If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
+#### Degraded-Mode Gate Adaptation
+When channels are skipped and compensating passes are used:
+- **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
+- **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
+- **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
+- The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
 ### Common Anti-Patterns
 **Blind trust of external findings.** An external model flags an issue and the reviewer includes it without verification. External models hallucinate — they may flag a "missing section" that actually exists, or cite a "contradiction" based on a misread. Fix: every external finding must be verified against the actual artifact before inclusion in the final report.

package/content/knowledge/game/game-ideation.md ADDED Viewed

@@ -0,0 +1,100 @@
+---
+name: game-ideation
+description: Game-specific ideation techniques for spark — core loop, player fantasy, retention, session design, monetization
+topics: [game-dev, ideation, core-loop, player-fantasy, retention, monetization, session-design]
+---
+Game ideation applies game-specific lenses — core loop, player fantasy, retention mechanics, session design, and monetization — to the spark tool's ideation flow. It supplements the general ideation-craft entry when a user is exploring a game idea.
+## Summary
+### Game Ideation Lenses
+Five lenses to apply during idea exploration: **Core loop** (what the player does every 30 seconds), **Player fantasy** (the emotional experience, not mechanics), **Retention** (what brings players back), **Session design** (how long and how satisfying), **Monetization** (how the game sustains itself).
+### Quick Tests
+- **Core loop**: Can you describe it in one sentence without "and"?
+- **Player fantasy**: Does every major mechanic reinforce it?
+- **Retention**: What happens if the player leaves for a week?
+## Deep Guidance
+### Core Loop Identification
+- **What is the core loop?** The repeating cycle of actions the player performs most often. In a shooter: aim → shoot → loot → repeat. In a puzzle game: observe → plan → execute → evaluate → repeat.
+- **Ask the user**: "What does the player do every 30 seconds? Every 5 minutes? Every session?"
+- **Test**: Can you describe the core loop in one sentence without using the word "and"? If not, it's too complex or undefined.
+### Player Fantasy
+- **What fantasy does the player live out?** Not the game mechanics — the emotional experience. "I am a powerful wizard" not "I cast spells with mana."
+- **Ask the user**: "When the player tells their friend about your game, what do they say it feels like?"
+- **Test**: Does every major mechanic reinforce the fantasy? If a mechanic exists but doesn't serve the fantasy, question why it's there.
+### Retention Mechanics
+- **Session hooks**: What brings the player back tomorrow? (Daily rewards, story cliffhangers, social obligations, unfinished goals)
+- **Progression**: What does the player invest that makes leaving costly? (Character levels, base building, collection progress, social reputation)
+- **Ask the user**: "What happens if the player doesn't open the game for a week? Do they lose anything? Miss anything?"
+### Session Design
+- **Session length**: How long is a typical play session? (Mobile: 3-5 min. PC: 30-90 min. Console: 60+ min.)
+- **Session arc**: Does each session have a beginning, middle, and satisfying end? Can the player stop mid-session without frustration?
+- **Ask the user**: "Where and when does your player play? Commute? Couch? Desk? This determines session length."
+### Monetization Models
+- **Premium**: Pay once, play forever. Best for narrative, creative, or skill-based games.
+- **Free-to-play**: Free entry, monetize through cosmetics, battle pass, or convenience. Best for multiplayer/social games.
+- **Subscription**: Recurring payment for ongoing content. Best for live-service games.
+- **Ask the user**: "How does your player feel about spending money in your game? What would they pay for? What would feel unfair?"
+### Applying Game Lenses During Spark Phases
+**Phase 1 (Seed)**: Ask about the core loop and player fantasy early. These are the foundation — if they're unclear, everything else is built on sand.
+**Phase 2 (Research)**: Research competitors through a game lens. For each competitor: What's their core loop? What fantasy do they deliver? How do they monetize? What's their session design? Where do player reviews complain?
+**Phase 3 (Expand)**: Use game-specific expansion angles:
+- "What if the core loop had a social/multiplayer dimension?"
+- "What if you added a metagame layer on top of the core loop?"
+- "What platform would change the experience most? (Mobile → PC, or vice versa)"
+- "What if monetization was through player-created content?"
+**Phase 4 (Challenge)**: Challenge through game-specific risk lenses:
+- "Core loop fatigue — will this still be fun after 100 hours?"
+- "Monetization pressure — does the business model conflict with the player fantasy?"
+- "Scope vs. team — can a [team size] team build this in [timeline]?"
+- "Platform expectations — does the session design match the platform's usage patterns?"
+### Game-Specific Brief Sections
+When writing the spark brief for a game idea, adapt sections:
+- **Idea & Problem Space** → Include the core loop and player fantasy
+- **Landscape** → Frame competitors by core loop and fantasy, not just features
+- **Expansion Ideas** → Tag which ideas affect the core loop vs. metagame vs. content
+- **Risks** → Include core loop fatigue, monetization/fantasy tension, and scope risks
+### Scoping by Project Scale
+| Scale | Core loop | Content depth | Monetization | Session design |
+|-------|-----------|---------------|-------------|----------------|
+| Game jam (48-72h) | One mechanic, tight loop | Minimal — procedural or template | None (free) | 5-15 min total |
+| Indie (solo/small team) | 1-2 mechanics, polished | Handcrafted, limited scope | Premium or F2P with cosmetics | 15-60 min sessions |
+| AA/studio | Multiple interlocking systems | Extensive content pipeline | Any model, balanced | Platform-appropriate |
+### Common Game Ideation Anti-Patterns
+- **The Kitchen Sink**: Trying to combine too many mechanics before any one is fun. Focus the core loop first.
+- **Fantasy Mismatch**: The monetization model undermines the player fantasy. (Pay-to-win in a skill-based competitive game.)
+- **Platform Blindness**: Designing a 90-minute session game for mobile, or a 3-minute session for PC/console.
+- **Retention Treadmill**: Relying on FOMO and daily login rewards instead of intrinsic motivation. Players resent obligation.
+- **Scope Denial**: "We'll just add multiplayer later." Multiplayer is an architecture decision, not a feature toggle.
+- **Clone Trap**: "Like [popular game] but with [small twist]." The twist must be fundamental enough to justify switching costs.
+### Core Loop Evaluation Worksheet
+When evaluating a proposed core loop, walk through these questions:
+1. **Primary loop**: What does the player do every 30 seconds? Is it inherently satisfying?
+2. **Secondary loop**: What does the player do every 5 minutes? Does it give meaning to the primary loop?
+3. **Tertiary loop**: What does the player do every session? Does it create a sense of progress?
+4. **Friction test**: Remove one mechanic from the loop. Does the game still work? If yes, that mechanic may be unnecessary.
+5. **Fantasy alignment**: Does every step in the loop reinforce the player fantasy? If a step breaks immersion, redesign it.
+6. **Depth test**: Can a skilled player execute the loop differently than a novice? If not, the loop may lack depth.
+7. **Social test**: Would watching someone else do this loop be entertaining? If not, the loop may lack spectacle or surprise.

package/content/knowledge/product/ideation-craft.md ADDED Viewed

@@ -0,0 +1,209 @@
+---
+name: ideation-craft
+description: Questioning techniques, research methodology, lightweight expansion patterns, and brief synthesis for early-stage idea exploration
+topics: [ideation, questioning, research, competitive-analysis, brief-synthesis, socratic-method]
+---
+# Ideation Craft
+Ideation craft covers the questioning, research, and synthesis techniques used during early-stage idea exploration. It guides a conversational flow from a raw idea through competitive research to a structured idea brief.
+## Summary
+### Key Techniques
+- **Questioning**: Socratic method (what → who → why → why not), 5 Whys for root cause, "What would have to be true?" for assumptions. Batch 2-3 questions per turn.
+- **Research**: Scan direct competitors, indirect alternatives, and the "do nothing" option. Capture strengths, weaknesses, positioning per competitor. Check adjacent markets and market timing.
+- **Expansion**: Lightweight one-liner prompts — adjacent markets, ecosystem plays, contrarian angles, tech enablers, AI-native rethinking. These are conversation starters, not full strategic methodology.
+- **Synthesis**: 2-4 sentences per brief section. Tag confidence: validated, hypothesized, or speculative. Never fabricate — write "None identified" for empty sections.
+## Deep Guidance
+### Questioning Techniques
+- **Socratic method**: Ask progressively deeper questions. Start with "what" (the idea), move to "who" (the audience), then "why" (the problem), then "why not" (the assumptions).
+- **The 5 Whys**: When the user states a problem, ask "why?" five times to reach the root cause. Surface-level problems hide deeper opportunities.
+- **"What would have to be true?"**: For every assumption, ask what conditions must hold for it to work. This surfaces hidden dependencies and risks.
+- **Batching**: Group 2-3 related questions per turn. Don't pepper the user with single questions (wastes turns) or overwhelm with 10 at once (causes shallow answers).
+### Progressive Questioning Framework
+**Turn 1 — Capture the spark**: What are you building? Who is it for? What problem does it solve?
+**Turn 2 — Dig into the problem**: How do people solve this today? What's painful about the current approach? How often do they experience this pain?
+**Turn 3 — Understand the audience**: Describe the person who needs this most. What are they doing the moment before they reach for your product? What does "success" look like from their perspective?
+**Turn 4 — Challenge assumptions**: You said [X] — what evidence do you have? What would have to be true for [Y] to work? If [Z] turned out to be wrong, would the idea still make sense?
+**Turn 5+ — Deepen based on gaps**: Follow the thread. If the audience is unclear, keep pulling on that. If the problem is well-defined but the solution is vague, focus there. Don't follow a script — follow the gaps.
+### Research Methodology
+- **Competitor scan**: Search for direct competitors (same problem, same audience), indirect alternatives (different approach, same problem), and the "do nothing" option (how users cope today).
+- **What to capture per competitor**: Name, what they do well (be specific), where they fall short (be honest), pricing model, target audience, and why a user might choose them over the idea.
+- **Adjacent markets**: Look for products solving related problems for the same audience, or the same problem for a different audience. These are expansion opportunities.
+- **Market timing**: Why now? What changed (technology, regulation, culture, behavior) that makes this idea viable today when it wasn't before?
+### Expansion Patterns (Lightweight)
+- **Adjacent market**: "Your users also need X — have you considered expanding into that?"
+- **Ecosystem play**: "If you solve A, you become the natural place to also solve B and C."
+- **Contrarian angle**: "Everyone in this space does X. What if you deliberately did the opposite?"
+- **Technology enabler**: "A new capability (API, model, platform) makes Y possible now — could that reshape your approach?"
+- **AI-native rethinking**: "If you assumed AI could handle Z, how would that change the product?"
+These are conversation starters for Phase 3 (Expand), not full strategic methodology. The pipeline's `innovate-vision` step covers comprehensive strategic expansion later.
+### Brief Synthesis
+- A good directional hypothesis names a specific audience, problem, and approach — not vague aspirations.
+- Bad: "This app will help people be more productive." Good: "Freelance designers who lose 5+ hours/week to invoice tracking — a tool that auto-generates invoices from their time-tracking data."
+- Tag confidence levels: "validated" (user confirmed + research supports), "hypothesized" (user stated but unresearched), "speculative" (surfaced during expansion, unconfirmed).
+- Each brief section should be 2-4 sentences or concise bullet points. If a section has nothing, write "None identified" — don't fabricate.
+### Competitive Research Process
+1. **Start with the obvious**: Search for "[problem] app" or "[problem] tool." The first 5-10 results are the landscape the user will compete against.
+2. **Check review sites**: App Store reviews, G2, Capterra, ProductHunt comments. Users complain about exactly the gaps a new product can fill.
+3. **Look for failures**: Search "[category] startup failed" or "[competitor] shutdown." Failed attempts tell you what didn't work and why.
+4. **Find the "do nothing" option**: How do people cope without any tool? Spreadsheets, manual processes, asking friends? This is often the biggest competitor.
+5. **Assess timing**: Search for recent news, funding rounds, regulatory changes, or technology launches in the space. Timing explains why an idea works now when it didn't before.
+### Framing Research for External Model Dispatch
+When dispatching to an external model for competitive research (depth 4+), frame the prompt as:
+> "Research the competitive landscape for [idea summary]. Identify: (1) Direct competitors solving the same problem for the same audience, (2) Indirect alternatives — different approaches to the same problem, (3) The 'do nothing' option — how users cope today, (4) Recent market signals — funding, launches, shutdowns, regulatory changes. For each competitor, note what they do well and where they fall short. Be honest — acknowledge genuine strengths."
+### Brief Section Guidance
+| Section | Source Phase | What to write | Common mistakes |
+|---------|-------------|---------------|-----------------|
+| Idea & Problem Space | Phase 1 (Seed) | Core idea, specific problem, target audience, why they need it | Too vague ("helps people"), no audience specificity |
+| Landscape | Phase 2 (Research) | 2-5 competitors with strengths/weaknesses, positioning | Dismissing competitors, listing without analysis |
+| Expansion Ideas | Phase 3 (Expand) | Accepted ideas tagged as preliminary, deferred ideas noted | Treating preliminary as committed scope |
+| Constraints & Scope | Phase 4 (Challenge) | Confirmed assumptions, what's in/out, locked decisions | Scope too broad, no explicit "out" list |
+| Technology Opportunities | Phase 2-3 | Tech enablers discovered during research/expansion | Listing technologies without explaining why they matter |
+| Open Questions | All phases | Unresolved items that need answers before building | Ignoring questions that feel uncomfortable |
+| Risks | Phase 4 (Challenge) | Market, technical, feasibility risks with severity | Only listing technical risks, ignoring market risks |
+### Audience Definition Techniques
+Avoid demographic-only definitions ("18-35 year old professionals"). Instead, define audiences by behavior and motivation:
+**Behavior-based**: "People who currently track expenses in a spreadsheet because existing apps are too complex."
+**Motivation-based**: "Freelancers who want to spend less than 10 minutes per week on invoicing so they can focus on client work."
+**Context-based**: "The moment someone finishes a client project and thinks 'now I have to figure out the invoice' — that's when they need this."
+**Questions to sharpen audience definition:**
+- What is this person doing the moment before they reach for your product?
+- What is the last thing they tried? Why did it fail them?
+- How would they describe their problem to a friend (not in your language — in theirs)?
+- If you could only serve ONE type of user, who would it be and why?
+### Problem Validation Framework
+Before accepting a problem statement, test it:
+1. **Specificity test**: Can you name a real person (or type of person) who has this problem? If "everyone has this problem," it's too vague.
+2. **Frequency test**: How often does this problem occur? Daily problems are more valuable than annual ones.
+3. **Severity test**: When this problem occurs, how painful is it? Mild inconvenience or hair-on-fire emergency?
+4. **Workaround test**: How do people cope today? If they have a workable (even if imperfect) solution, your product must be dramatically better.
+5. **Willingness test**: Would someone pay money / change habits / switch tools to solve this? If not, the problem may not be valuable enough.
+### Scope Sharpening Techniques
+When the idea is too broad, use these techniques to find the core:
+- **The one-feature test**: "If your product could only do ONE thing, what would it be?" This reveals the core value proposition.
+- **The removal test**: "If you removed [feature X], would anyone still use the product?" If yes, X is not core.
+- **The first-user test**: "Who is the first person who would use this, and what exactly would they do with it?" This grounds abstract ideas in concrete behavior.
+- **The MVP boundary**: "What is the smallest thing you could build that would make one person's life measurably better?" This defines the initial scope.
+- **The anti-scope list**: Explicitly list what the product does NOT do. This is as important as what it does.
+### Positioning Against Competitors
+When the landscape is crowded, help the user find genuine differentiation:
+- **Head-to-head**: "Competitor X does this well. You would need to be 10x better at this specific thing to win users away. Can you be?"
+- **Underserved segment**: "Competitor X serves enterprise. Is there an underserved segment (freelancers, students, non-profits) that you could own?"
+- **Different job**: "Competitor X solves problem A. Could you solve a related but different problem B for the same audience?"
+- **Channel advantage**: "Competitor X requires a desktop app. Could you win by being mobile-first, browser-based, or embedded in an existing workflow?"
+- **Timing advantage**: "What has changed (new technology, regulation, cultural shift) that makes your approach viable now when it wasn't when competitors launched?"
+### Ideation Anti-Patterns
+| Anti-pattern | What it sounds like | Why it's dangerous | How to challenge |
+|-------------|--------------------|--------------------|-----------------|
+| Solution-first | "I want to build an app that..." | Skips the problem entirely | "What problem does this solve? For whom?" |
+| Everyone-needs-this | "Everyone could use this" | No target audience = no product | "Who needs this MOST? Who would pay?" |
+| Feature soup | "It'll do X and Y and Z and..." | No core value proposition | "Remove one feature. Does it still work?" |
+| Competitor blindness | "Nobody else does this" | Almost certainly false | "How do people solve this today?" |
+| Technology hammer | "I learned [tech] and want to use it" | Technology seeking a problem | "Forget the tech. What problem exists?" |
+| Scale fantasy | "Once we have millions of users..." | Ignores the path to the first user | "How do you get user #1? User #10?" |
+| Uniqueness obsession | "We need a totally new idea" | Execution beats novelty almost always | "What existing idea could you execute 10x better?" |
+### Worked Example: From Vague to Sharp
+**Vague starting point**: "An app for recipes"
+**After Phase 1 (Seed):**
+- Who: Home cooks who meal prep on weekends but waste food because they buy ingredients for recipes they never make.
+- Problem: Planning meals for the week takes 45+ minutes, and existing apps have 50,000 recipes but no help deciding which ones to cook together.
+- Core idea: A meal planning tool that suggests complementary recipes sharing ingredients, minimizing waste and shopping time.
+**After Phase 2 (Research):**
+- Competitors: Mealime (good UI but no ingredient overlap), Paprika (great for saving recipes but no planning), Eat This Much (calorie-focused, not taste-focused).
+- Gap: No tool optimizes for ingredient reuse across a week of meals.
+**After Phase 3 (Expand):**
+- Accepted: Grocery list auto-generation from the meal plan (directly supports core value).
+- Deferred: Social sharing of meal plans (not core, revisit later).
+- Rejected: Calorie tracking (different problem, different audience).
+**After Phase 4 (Challenge):**
+- Confirmed: The ingredient-overlap algorithm is the differentiator.
+- Revised: Scope down from "all cuisines" to "weeknight dinners, 30 min or less" for MVP.
+- Locked out: No restaurant recommendations, no diet tracking, no social features for v1.
+This progression from "an app for recipes" to a tightly scoped meal planning tool with a clear differentiator is what a good spark session produces.
+### Confidence Tagging Guide
+Every claim in the spark brief should carry an implicit confidence level. This helps `create-vision` know what to validate vs. what to build on.
+**Validated** (highest confidence):
+- User stated it AND research supports it.
+- Example: "3 competitors exist in this space" (user said, you verified via search).
+- create-vision can build on this without re-exploring.
+**Hypothesized** (medium confidence):
+- User stated it but it hasn't been independently verified.
+- Example: "Target users are freelance designers" (user's claim, no research to confirm market size).
+- create-vision should probe deeper on these — targeted follow-up questions.
+**Speculative** (lowest confidence):
+- Surfaced during expansion or challenge, not yet confirmed by user.
+- Example: "Meal planning apps retain 3x better than recipe apps" (research finding, user hasn't decided whether to pivot).
+- create-vision should present these as open questions, not assumptions.
+**How to apply in the brief:**
+- Don't tag every sentence explicitly (clutters the document).
+- Tag at the section level: "This section is largely validated — user confirmed the audience and research supports the competitive gap."
+- Call out speculative items explicitly: "Note: the social sharing angle is speculative — surfaced during expansion, not yet confirmed."
+### Market Timing Analysis
+When assessing "why now?", look for these signals:
+**Technology shifts**: A new API, platform, or capability that makes something possible (or dramatically cheaper) that wasn't before. Example: LLMs making personalized recommendation affordable for indie tools.
+**Regulatory changes**: New laws or standards that create demand or remove barriers. Example: GDPR creating demand for privacy-first alternatives.
+**Behavioral changes**: Shifts in how people work, communicate, or consume. Example: Remote work increasing demand for async collaboration tools.
+**Market failures**: Recent shutdowns, pivots, or public failures that leave an underserved audience. Example: A popular tool raising prices 10x, driving users to seek alternatives.
+**Cultural shifts**: Changing attitudes that make new products viable. Example: Growing sustainability awareness creating demand for waste-reduction tools.
+Each timing signal should be specific and verifiable — not "AI is trending" but "GPT-4's function calling API, launched in June 2023, makes it possible to build structured data extraction at 1/100th the cost of custom NLP pipelines."

package/content/methodology/game-overlay.yml CHANGED Viewed

@@ -112,6 +112,8 @@ knowledge-overrides:
     append: [game-design-document]
   critical-path-walkthrough:
     append: [game-design-document]
+  spark:
+    append: [game-ideation]
 # ---------------------------------------------------------------------------
 # reads-overrides

package/content/pipeline/foundation/tech-stack.md CHANGED Viewed

@@ -24,6 +24,7 @@ about ecosystem maturity, alternatives, and gotchas.
 ## Inputs
 - docs/plan.md (required) — PRD features, integrations, and technical requirements
 - User preferences (gathered via questions) — language, framework, deployment target, constraints
+- docs/spark-brief.md (optional) — Technology Opportunities section from spark ideation session. If present and not stale (compare tracking comment date against docs/vision.md and docs/plan.md — if the brief predates both, ignore it), use the Technology Opportunities section as supplementary research context when evaluating technology options.
 ## Expected Outputs
 - docs/tech-stack.md — complete technology reference with architecture overview,

package/content/pipeline/vision/create-vision.md CHANGED Viewed

@@ -19,6 +19,7 @@ throughout the entire pipeline.
 ## Inputs
 - Project idea (provided by user verbally or in a brief)
+- docs/spark-brief.md (optional) — upstream context from spark ideation session
 - Existing project files (if brownfield — any README, docs, or code)
 - Market context or competitive research (if available)
@@ -103,11 +104,53 @@ Before starting, check if `docs/vision.md` already exists:
 - **Related docs**: `docs/plan.md`
 - **Special rules**: Never change guiding principles without user approval. Preserve any strategic decisions that were explicitly made by the user.
+### Spark Brief Detection
+**If `docs/spark-brief.md` exists**: Read it completely. Check its tracking
+comment date against the `docs/vision.md` tracking comment date (if vision
+exists). If the brief predates the current vision, ignore it and note:
+"Spark brief found but predates current vision — ignoring." Check the
+brief's heading (`# Spark Brief: [Idea Name]`) against the current
+`$ARGUMENTS` — if the idea name appears unrelated, ask the user before
+using it.
+Otherwise, this is upstream context from a spark ideation session — the user
+has already explored the problem space, researched competitors, expanded the
+idea, and challenged assumptions.
+**Accelerated mode**: Use the brief's answers as a baseline and ask targeted
+follow-up questions to expand them to create-vision's required depth. Do not
+skip phases — deepen and validate the brief's hypotheses rather than
+re-exploring from scratch.
+If the brief was red-teamed (Session Metadata), treat its competitive
+landscape and risk sections as pre-validated hypotheses — focus discovery on
+gaps or updates rather than re-exploring those areas.
+create-vision uses its own configured depth regardless of the brief's depth.
+The brief's depth metadata is informational — it tells you how thoroughly
+the idea was explored, not how thorough this vision step should be.
+Defer the brief's "Technology Opportunities" section to downstream phases
+(tech-stack, architecture) — the vision document is about purpose and positioning,
+not technical implementation.
+**If `docs/spark-brief.md` does NOT exist**: Proceed normally.
 ## Here's my idea:
 $ARGUMENTS
 ## Phase 1: Strategic Discovery
+### Spark Brief Context
+**If `docs/spark-brief.md` was read during Spark Brief Detection above**, use
+it as your baseline for this phase. Do not skip phases — use the brief's
+answers as a starting point and ask targeted follow-up questions to deepen
+and validate the brief's hypotheses to create-vision's required depth.
+**If no spark brief exists**, proceed normally with the discovery questions below.
 Use AskUserQuestionTool throughout this phase. Batch related questions together — don't ask one at a time.
 ### Understand the Problem Space

package/content/skills/multi-model-dispatch/SKILL.md CHANGED Viewed

@@ -64,7 +64,7 @@ fi
 The `!` prefix runs the command in the user's terminal session, allowing interactive auth flows (browser OAuth, Y/n prompts) that can't work in headless mode.
-**If neither CLI is available or authenticated**: Fall back to structured Claude-only self-review. Re-read the artifact with an adversarial lens — actively try to find issues the initial review missed. Document this as "single-model review (no external CLIs available)."
+**If neither CLI is available or authenticated**: Queue a compensating Claude pass focused on the failed channel's strength area. Document this as "single-model review (no external CLIs available)."
 ## Correct Invocation Patterns
@@ -134,6 +134,12 @@ NO_BROWSER=true gemini -p "REVIEW_PROMPT_HERE" --output-format json -s --approva
 **Output**: JSON on stdout with `{ response, stats, error }` structure.
+## Foreground-Only Execution
+Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from both CLIs. Multiple foreground calls in a single message are fine — the tool runner supports parallel invocations.
+This means: when dispatching reviews, make each CLI call a separate foreground Bash tool invocation. Do NOT use shell `&` or background subshells.
 ## Context Bundling
 When dispatching a review, bundle all relevant context into the prompt. Each CLI gets the same bundle — do NOT share one model's review with the other.
@@ -208,35 +214,27 @@ You are reviewing a pull request diff. Report P0, P1, and P2 issues.
 | PR diff | Full diff | If >2000 lines, split into file groups |
 | Implementation plan | Task list + representative tasks | Include full task list, detail for flagged tasks |
-## Dual-Model Reconciliation
-When both CLIs produce results, reconcile findings using these rules:
+## Finding Reconciliation
-| Scenario | Confidence | Action |
-|----------|-----------|--------|
-| Both flag same issue | **High** | Fix immediately — two independent models agree |
-| Both approve (no findings) | **High** | Proceed confidently |
-| One flags P0, other approves | **High** | Fix it — P0 is critical enough from a single source |
-| One flags P1, other approves | **Medium** | Review the finding carefully before fixing. If the finding is specific and actionable, fix it. If vague, skip. |
-| Models contradict each other | **Low** | Present both findings to the user for adjudication |
+When multiple models produce findings, reconcile them using the rules defined in `multi-model-review-dispatch`. Key principles:
-**Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
+- **Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
+- **Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds with unresolved findings, stop and surface the verdict (`blocked` or `needs-user-decision`) to the user. Do NOT auto-merge.
-**Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds, merge with a warning and create a follow-up issue for remaining findings.
+For the full consensus rules, confidence scoring, and disagreement resolution process, see `multi-model-review-dispatch`.
 ## Fallback Behavior
 | Situation | Fallback |
 |-----------|----------|
-| Neither CLI available | Structured Claude-only adversarial self-review |
-| Codex only | Single-model review with Codex |
-| Gemini only | Single-model review with Gemini |
-| **CLI auth expired** | **Surface to user with recovery command — do NOT silently fall back** |
-| One CLI fails mid-review (non-auth) | Continue with the other; note the failure in summary |
-| Both CLIs fail (non-auth) | Fall back to Claude-only self-review; warn user |
-| CLI output not parseable as JSON | Treat as text, extract findings manually |
-**Auth failures are NOT silent fallbacks.** The difference between "CLI not installed" (fall back quietly) and "CLI auth expired" (user action required) is critical. Auth can be fixed in 30 seconds with an interactive command — silently skipping wastes the user's review infrastructure.
+| Neither CLI available | Queue two compensating Claude passes (one per missing channel's strength area). Label findings. Max verdict: `degraded-pass`. |
+| Codex only | Single-model review with Codex + compensating Claude pass for Gemini |
+| Gemini only | Single-model review with Gemini + compensating Claude pass for Codex |
+| **CLI auth expired** | **Surface to user with `!` recovery command — do NOT silently fall back** |
+| One CLI fails mid-review | Use partial results if available, else queue compensating pass. Note failure in summary. |
+| Both CLIs fail | Two compensating passes, max verdict: `degraded-pass`. Warn user. |
+Auth failures are NOT silent fallbacks.
 ## Integration with Review Steps