@zigrivers/scaffold 3.10.1 → 3.12.0

@@ -44,21 +44,53 @@ The review never blocks on external model availability.
44
44
 
45
45
  ## Deep Guidance
46
46
 
47
+ See `review-methodology` for severity definitions (P0-P3). This entry uses those severities but does not define them.
48
+
47
49
  ### Dispatch Mechanics
48
50
 
51
+ #### Foreground-Only Execution
52
+
53
+ When an AI agent dispatches CLI reviews via a tool runner (Claude Code Bash tool, Codex exec, etc.), always run commands in the foreground. Background execution (`run_in_background`, `&`, `nohup`) produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
54
+
49
55
  #### CLI Availability Check
50
56
 
51
- Before dispatching, verify the model CLI is installed and authenticated:
57
+ Before dispatching, verify the model CLI is installed and authenticated using a two-step process that produces distinct statuses for the orchestration layer:
58
+
59
+ **Step 1 — Installation check:**
60
+
61
+ ```bash
62
+ # Codex: not found -> status: "not_installed"
63
+ command -v codex >/dev/null 2>&1
64
+
65
+ # Gemini: not found -> status: "not_installed"
66
+ command -v gemini >/dev/null 2>&1
67
+ ```
68
+
69
+ If the CLI is not found, report status `not_installed` to the orchestration layer. Do not prompt the user to install it.
70
+
71
+ **Step 2 — Auth verification (only if installed):**
52
72
 
53
73
  ```bash
54
- # Codex check
55
- which codex && codex --version 2>/dev/null
74
+ # Codex: fail -> status: "auth_failed"
75
+ codex login status 2>/dev/null
56
76
 
57
- # Gemini check (via Google Cloud CLI or dedicated tool)
58
- which gemini 2>/dev/null || (which gcloud && gcloud ai models list 2>/dev/null)
77
+ # Gemini: exit 41 -> status: "auth_failed"
78
+ NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
59
79
  ```
60
80
 
61
- If the CLI is not found, skip dispatch immediately. Do not prompt the user to install it — this is a review enhancement, not a requirement.
81
+ If auth fails, report status `auth_failed` and surface recovery to the user:
82
+ - Codex: "Codex auth expired — run `! codex login` to re-authenticate"
83
+ - Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
84
+
85
+ If the auth check times out (~5 seconds), retry once. If it still times out, report `auth_timeout`.
86
+ If auth succeeds, report `ready` and proceed to dispatch.
87
+
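The two-step check and its statuses can be sketched as a single helper. This is a minimal illustration, not the package's implementation: the function name `channel_status` and the `AUTH_TIMEOUT_SECS` variable are hypothetical, GNU coreutils `timeout` is assumed to be available, and every non-zero, non-timeout exit is treated as `auth_failed` for simplicity (the source only names exit 41 for Gemini).

```bash
# Hypothetical helper: prints not_installed, auth_failed, auth_timeout, or ready.
# Usage: channel_status <cli-name> <auth-check-command...>
channel_status() {
  local cli="$1"
  shift
  local limit="${AUTH_TIMEOUT_SECS:-5}"  # assumed knob; the text says ~5 seconds
  local attempt rc

  # Step 1: installation check.
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "not_installed"
    return 0
  fi

  # Step 2: auth check, retried once if it times out.
  for attempt in 1 2; do
    if timeout "$limit" "$@" >/dev/null 2>&1; then
      rc=0
    else
      rc=$?
    fi
    if [ "$rc" -eq 0 ]; then
      echo "ready"
      return 0
    fi
    # GNU timeout exits with 124 when the time limit is reached.
    if [ "$rc" -ne 124 ]; then
      echo "auth_failed"
      return 0
    fi
  done
  echo "auth_timeout"
}
```

Under these assumptions, the checks above would be invoked as `channel_status codex codex login status` and `channel_status gemini env NO_BROWSER=true gemini -p "respond with ok" -o json`, run in the foreground.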
88
+ **Post-dispatch terminal states:**
89
+ - `completed` — channel produced results, use normally
90
+ - `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
91
+ - `failed` — crashed or unparseable output; triggers compensating pass.
92
+
93
+ Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
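The terminal-state rules can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and `pass` is only a placeholder for whatever the best non-degraded verdict is called, since the text names `degraded-pass` alone.

```bash
# Hypothetical helper: prints the highest verdict the review may reach,
# given the terminal state of every dispatched channel.
max_verdict() {
  local state
  for state in "$@"; do
    if [ "$state" != "completed" ]; then
      echo "degraded-pass"
      return 0
    fi
  done
  echo "pass"  # placeholder name, not taken from the source
}

# Hypothetical helper: succeeds only for states that trigger a compensating pass.
# Per the list above, that is `failed` but not `partial_timeout`.
needs_compensating_pass() {
  [ "$1" = "failed" ]
}
```

For example, `max_verdict completed partial_timeout` prints `degraded-pass`, while `needs_compensating_pass partial_timeout` returns non-zero.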
62
94
 
63
95
  #### Prompt Formatting
64
96
 
@@ -241,6 +273,15 @@ Minimum standards for a multi-model review to be considered complete:
241
273
 
242
274
  If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
243
275
 
276
+ #### Degraded-Mode Gate Adaptation
277
+
278
+ When channels are skipped and compensating passes are used:
279
+
280
+ - **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
281
+ - **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
282
+ - **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
283
+ - The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
284
+
244
285
  ### Common Anti-Patterns
245
286
 
246
287
  **Blind trust of external findings.** An external model flags an issue and the reviewer includes it without verification. External models hallucinate — they may flag a "missing section" that actually exists, or cite a "contradiction" based on a misread. Fix: every external finding must be verified against the actual artifact before inclusion in the final report.
@@ -0,0 +1,100 @@
1
+ ---
2
+ name: game-ideation
3
+ description: Game-specific ideation techniques for spark — core loop, player fantasy, retention, session design, monetization
4
+ topics: [game-dev, ideation, core-loop, player-fantasy, retention, monetization, session-design]
5
+ ---
6
+
7
+ Game ideation applies game-specific lenses — core loop, player fantasy, retention mechanics, session design, and monetization — to the spark tool's ideation flow. It supplements the general ideation-craft entry when a user is exploring a game idea.
8
+
9
+ ## Summary
10
+
11
+ ### Game Ideation Lenses
12
+ Five lenses to apply during idea exploration: **Core loop** (what the player does every 30 seconds), **Player fantasy** (the emotional experience, not mechanics), **Retention** (what brings players back), **Session design** (how long and how satisfying), **Monetization** (how the game sustains itself).
13
+
14
+ ### Quick Tests
15
+ - **Core loop**: Can you describe it in one sentence without "and"?
16
+ - **Player fantasy**: Does every major mechanic reinforce it?
17
+ - **Retention**: What happens if the player leaves for a week?
18
+
19
+ ## Deep Guidance
20
+
21
+ ### Core Loop Identification
22
+ - **What is the core loop?** The repeating cycle of actions the player performs most often. In a shooter: aim → shoot → loot → repeat. In a puzzle game: observe → plan → execute → evaluate → repeat.
23
+ - **Ask the user**: "What does the player do every 30 seconds? Every 5 minutes? Every session?"
24
+ - **Test**: Can you describe the core loop in one sentence without using the word "and"? If not, it's too complex or undefined.
25
+
26
+ ### Player Fantasy
27
+ - **What fantasy does the player live out?** Not the game mechanics — the emotional experience. "I am a powerful wizard" not "I cast spells with mana."
28
+ - **Ask the user**: "When the player tells their friend about your game, what do they say it feels like?"
29
+ - **Test**: Does every major mechanic reinforce the fantasy? If a mechanic exists but doesn't serve the fantasy, question why it's there.
30
+
31
+ ### Retention Mechanics
32
+ - **Session hooks**: What brings the player back tomorrow? (Daily rewards, story cliffhangers, social obligations, unfinished goals)
33
+ - **Progression**: What does the player invest that makes leaving costly? (Character levels, base building, collection progress, social reputation)
34
+ - **Ask the user**: "What happens if the player doesn't open the game for a week? Do they lose anything? Miss anything?"
35
+
36
+ ### Session Design
37
+ - **Session length**: How long is a typical play session? (Mobile: 3-5 min. PC: 30-90 min. Console: 60+ min.)
38
+ - **Session arc**: Does each session have a beginning, middle, and satisfying end? Can the player stop mid-session without frustration?
39
+ - **Ask the user**: "Where and when does your player play? Commute? Couch? Desk? This determines session length."
40
+
41
+ ### Monetization Models
42
+ - **Premium**: Pay once, play forever. Best for narrative, creative, or skill-based games.
43
+ - **Free-to-play**: Free entry, monetize through cosmetics, battle pass, or convenience. Best for multiplayer/social games.
44
+ - **Subscription**: Recurring payment for ongoing content. Best for live-service games.
45
+ - **Ask the user**: "How does your player feel about spending money in your game? What would they pay for? What would feel unfair?"
46
+
47
+ ### Applying Game Lenses During Spark Phases
48
+
49
+ **Phase 1 (Seed)**: Ask about the core loop and player fantasy early. These are the foundation — if they're unclear, everything else is built on sand.
50
+
51
+ **Phase 2 (Research)**: Research competitors through a game lens. For each competitor: What's their core loop? What fantasy do they deliver? How do they monetize? What's their session design? Where do player reviews complain?
52
+
53
+ **Phase 3 (Expand)**: Use game-specific expansion angles:
54
+ - "What if the core loop had a social/multiplayer dimension?"
55
+ - "What if you added a metagame layer on top of the core loop?"
56
+ - "What platform would change the experience most? (Mobile → PC, or vice versa)"
57
+ - "What if monetization was through player-created content?"
58
+
59
+ **Phase 4 (Challenge)**: Challenge through game-specific risk lenses:
60
+ - "Core loop fatigue — will this still be fun after 100 hours?"
61
+ - "Monetization pressure — does the business model conflict with the player fantasy?"
62
+ - "Scope vs. team — can a [team size] team build this in [timeline]?"
63
+ - "Platform expectations — does the session design match the platform's usage patterns?"
64
+
65
+ ### Game-Specific Brief Sections
66
+
67
+ When writing the spark brief for a game idea, adapt sections:
68
+ - **Idea & Problem Space** → Include the core loop and player fantasy
69
+ - **Landscape** → Frame competitors by core loop and fantasy, not just features
70
+ - **Expansion Ideas** → Tag which ideas affect the core loop vs. metagame vs. content
71
+ - **Risks** → Include core loop fatigue, monetization/fantasy tension, and scope risks
72
+
73
+ ### Scoping by Project Scale
74
+
75
+ | Scale | Core loop | Content depth | Monetization | Session design |
76
+ |-------|-----------|---------------|-------------|----------------|
77
+ | Game jam (48-72h) | One mechanic, tight loop | Minimal — procedural or template | None (free) | 5-15 min total |
78
+ | Indie (solo/small team) | 1-2 mechanics, polished | Handcrafted, limited scope | Premium or F2P with cosmetics | 15-60 min sessions |
79
+ | AA/studio | Multiple interlocking systems | Extensive content pipeline | Any model, balanced | Platform-appropriate |
80
+
81
+ ### Common Game Ideation Anti-Patterns
82
+
83
+ - **The Kitchen Sink**: Trying to combine too many mechanics before any one is fun. Focus the core loop first.
84
+ - **Fantasy Mismatch**: The monetization model undermines the player fantasy. (Pay-to-win in a skill-based competitive game.)
85
+ - **Platform Blindness**: Designing a 90-minute session game for mobile, or a 3-minute session game for PC/console.
86
+ - **Retention Treadmill**: Relying on FOMO and daily login rewards instead of intrinsic motivation. Players resent obligation.
87
+ - **Scope Denial**: "We'll just add multiplayer later." Multiplayer is an architecture decision, not a feature toggle.
88
+ - **Clone Trap**: "Like [popular game] but with [small twist]." The twist must be fundamental enough to justify switching costs.
89
+
90
+ ### Core Loop Evaluation Worksheet
91
+
92
+ When evaluating a proposed core loop, walk through these questions:
93
+
94
+ 1. **Primary loop**: What does the player do every 30 seconds? Is it inherently satisfying?
95
+ 2. **Secondary loop**: What does the player do every 5 minutes? Does it give meaning to the primary loop?
96
+ 3. **Tertiary loop**: What does the player do every session? Does it create a sense of progress?
97
+ 4. **Friction test**: Remove one mechanic from the loop. Does the game still work? If yes, that mechanic may be unnecessary.
98
+ 5. **Fantasy alignment**: Does every step in the loop reinforce the player fantasy? If a step breaks immersion, redesign it.
99
+ 6. **Depth test**: Can a skilled player execute the loop differently than a novice? If not, the loop may lack depth.
100
+ 7. **Social test**: Would watching someone else do this loop be entertaining? If not, the loop may lack spectacle or surprise.
@@ -0,0 +1,209 @@
1
+ ---
2
+ name: ideation-craft
3
+ description: Questioning techniques, research methodology, lightweight expansion patterns, and brief synthesis for early-stage idea exploration
4
+ topics: [ideation, questioning, research, competitive-analysis, brief-synthesis, socratic-method]
5
+ ---
6
+
7
+ # Ideation Craft
8
+
9
+ Ideation craft covers the questioning, research, and synthesis techniques used during early-stage idea exploration. It guides a conversational flow from a raw idea through competitive research to a structured idea brief.
10
+
11
+ ## Summary
12
+
13
+ ### Key Techniques
14
+ - **Questioning**: Socratic method (what → who → why → why not), 5 Whys for root cause, "What would have to be true?" for assumptions. Batch 2-3 questions per turn.
15
+ - **Research**: Scan direct competitors, indirect alternatives, and the "do nothing" option. Capture strengths, weaknesses, positioning per competitor. Check adjacent markets and market timing.
16
+ - **Expansion**: Lightweight one-liner prompts — adjacent markets, ecosystem plays, contrarian angles, tech enablers, AI-native rethinking. These are conversation starters, not full strategic methodology.
17
+ - **Synthesis**: 2-4 sentences per brief section. Tag confidence: validated, hypothesized, or speculative. Never fabricate — write "None identified" for empty sections.
18
+
19
+ ## Deep Guidance
20
+
21
+ ### Questioning Techniques
22
+
23
+ - **Socratic method**: Ask progressively deeper questions. Start with "what" (the idea), move to "who" (the audience), then "why" (the problem), then "why not" (the assumptions).
24
+ - **The 5 Whys**: When the user states a problem, ask "why?" five times to reach the root cause. Surface-level problems hide deeper opportunities.
25
+ - **"What would have to be true?"**: For every assumption, ask what conditions must hold for it to work. This surfaces hidden dependencies and risks.
26
+ - **Batching**: Group 2-3 related questions per turn. Don't pepper the user with single questions (wastes turns) or overwhelm them with 10 at once (causes shallow answers).
27
+
28
+ ### Progressive Questioning Framework
29
+
30
+ **Turn 1 — Capture the spark**: What are you building? Who is it for? What problem does it solve?
31
+
32
+ **Turn 2 — Dig into the problem**: How do people solve this today? What's painful about the current approach? How often do they experience this pain?
33
+
34
+ **Turn 3 — Understand the audience**: Describe the person who needs this most. What are they doing the moment before they reach for your product? What does "success" look like from their perspective?
35
+
36
+ **Turn 4 — Challenge assumptions**: You said [X] — what evidence do you have? What would have to be true for [Y] to work? If [Z] turned out to be wrong, would the idea still make sense?
37
+
38
+ **Turn 5+ — Deepen based on gaps**: Follow the thread. If the audience is unclear, keep pulling on that. If the problem is well-defined but the solution is vague, focus there. Don't follow a script — follow the gaps.
39
+
40
+ ### Research Methodology
41
+
42
+ - **Competitor scan**: Search for direct competitors (same problem, same audience), indirect alternatives (different approach, same problem), and the "do nothing" option (how users cope today).
43
+ - **What to capture per competitor**: Name, what they do well (be specific), where they fall short (be honest), pricing model, target audience, and why a user might choose them over the idea.
44
+ - **Adjacent markets**: Look for products solving related problems for the same audience, or the same problem for a different audience. These are expansion opportunities.
45
+ - **Market timing**: Why now? What changed (technology, regulation, culture, behavior) that makes this idea viable today when it wasn't before?
46
+
47
+ ### Expansion Patterns (Lightweight)
48
+
49
+ - **Adjacent market**: "Your users also need X — have you considered expanding into that?"
50
+ - **Ecosystem play**: "If you solve A, you become the natural place to also solve B and C."
51
+ - **Contrarian angle**: "Everyone in this space does X. What if you deliberately did the opposite?"
52
+ - **Technology enabler**: "A new capability (API, model, platform) makes Y possible now — could that reshape your approach?"
53
+ - **AI-native rethinking**: "If you assumed AI could handle Z, how would that change the product?"
54
+
55
+ These are conversation starters for Phase 3 (Expand), not full strategic methodology. The pipeline's `innovate-vision` step covers comprehensive strategic expansion later.
56
+
57
+ ### Brief Synthesis
58
+
59
+ - A good directional hypothesis names a specific audience, problem, and approach — not vague aspirations.
60
+ - Bad: "This app will help people be more productive." Good: "Freelance designers who lose 5+ hours/week to invoice tracking — a tool that auto-generates invoices from their time-tracking data."
61
+ - Tag confidence levels: "validated" (user confirmed + research supports), "hypothesized" (user stated but unresearched), "speculative" (surfaced during expansion, unconfirmed).
62
+ - Each brief section should be 2-4 sentences or concise bullet points. If a section has nothing, write "None identified" — don't fabricate.
63
+
64
+ ### Competitive Research Process
65
+
66
+ 1. **Start with the obvious**: Search for "[problem] app" or "[problem] tool." The first 5-10 results are the landscape the user will compete against.
67
+ 2. **Check review sites**: App Store reviews, G2, Capterra, ProductHunt comments. Users complain about exactly the gaps a new product can fill.
68
+ 3. **Look for failures**: Search "[category] startup failed" or "[competitor] shutdown." Failed attempts tell you what didn't work and why.
69
+ 4. **Find the "do nothing" option**: How do people cope without any tool? Spreadsheets, manual processes, asking friends? This is often the biggest competitor.
70
+ 5. **Assess timing**: Search for recent news, funding rounds, regulatory changes, or technology launches in the space. Timing explains why an idea works now when it didn't before.
71
+
72
+ ### Framing Research for External Model Dispatch
73
+
74
+ When dispatching to an external model for competitive research (depth 4+), frame the prompt as:
75
+
76
+ > "Research the competitive landscape for [idea summary]. Identify: (1) Direct competitors solving the same problem for the same audience, (2) Indirect alternatives — different approaches to the same problem, (3) The 'do nothing' option — how users cope today, (4) Recent market signals — funding, launches, shutdowns, regulatory changes. For each competitor, note what they do well and where they fall short. Be honest — acknowledge genuine strengths."
77
+
78
+ ### Brief Section Guidance
79
+
80
+ | Section | Source Phase | What to write | Common mistakes |
81
+ |---------|-------------|---------------|-----------------|
82
+ | Idea & Problem Space | Phase 1 (Seed) | Core idea, specific problem, target audience, why they need it | Too vague ("helps people"), no audience specificity |
83
+ | Landscape | Phase 2 (Research) | 2-5 competitors with strengths/weaknesses, positioning | Dismissing competitors, listing without analysis |
84
+ | Expansion Ideas | Phase 3 (Expand) | Accepted ideas tagged as preliminary, deferred ideas noted | Treating preliminary as committed scope |
85
+ | Constraints & Scope | Phase 4 (Challenge) | Confirmed assumptions, what's in/out, locked decisions | Scope too broad, no explicit "out" list |
86
+ | Technology Opportunities | Phase 2-3 | Tech enablers discovered during research/expansion | Listing technologies without explaining why they matter |
87
+ | Open Questions | All phases | Unresolved items that need answers before building | Ignoring questions that feel uncomfortable |
88
+ | Risks | Phase 4 (Challenge) | Market, technical, feasibility risks with severity | Only listing technical risks, ignoring market risks |
89
+
90
+ ### Audience Definition Techniques
91
+
92
+ Avoid demographic-only definitions ("18-35 year old professionals"). Instead, define audiences by behavior and motivation:
93
+
94
+ **Behavior-based**: "People who currently track expenses in a spreadsheet because existing apps are too complex."
95
+ **Motivation-based**: "Freelancers who want to spend less than 10 minutes per week on invoicing so they can focus on client work."
96
+ **Context-based**: "The moment someone finishes a client project and thinks 'now I have to figure out the invoice' — that's when they need this."
97
+
98
+ **Questions to sharpen audience definition:**
99
+ - What is this person doing the moment before they reach for your product?
100
+ - What is the last thing they tried? Why did it fail them?
101
+ - How would they describe their problem to a friend (not in your language — in theirs)?
102
+ - If you could only serve ONE type of user, who would it be and why?
103
+
104
+ ### Problem Validation Framework
105
+
106
+ Before accepting a problem statement, test it:
107
+
108
+ 1. **Specificity test**: Can you name a real person (or type of person) who has this problem? If "everyone has this problem," it's too vague.
109
+ 2. **Frequency test**: How often does this problem occur? Daily problems are more valuable than annual ones.
110
+ 3. **Severity test**: When this problem occurs, how painful is it? Mild inconvenience or hair-on-fire emergency?
111
+ 4. **Workaround test**: How do people cope today? If they have a workable (even if imperfect) solution, your product must be dramatically better.
112
+ 5. **Willingness test**: Would someone pay money / change habits / switch tools to solve this? If not, the problem may not be valuable enough.
113
+
114
+ ### Scope Sharpening Techniques
115
+
116
+ When the idea is too broad, use these techniques to find the core:
117
+
118
+ - **The one-feature test**: "If your product could only do ONE thing, what would it be?" This reveals the core value proposition.
119
+ - **The removal test**: "If you removed [feature X], would anyone still use the product?" If yes, X is not core.
120
+ - **The first-user test**: "Who is the first person who would use this, and what exactly would they do with it?" This grounds abstract ideas in concrete behavior.
121
+ - **The MVP boundary**: "What is the smallest thing you could build that would make one person's life measurably better?" This defines the initial scope.
122
+ - **The anti-scope list**: Explicitly list what the product does NOT do. This is as important as what it does.
123
+
124
+ ### Positioning Against Competitors
125
+
126
+ When the landscape is crowded, help the user find genuine differentiation:
127
+
128
+ - **Head-to-head**: "Competitor X does this well. You would need to be 10x better at this specific thing to win users away. Can you be?"
129
+ - **Underserved segment**: "Competitor X serves enterprise. Is there an underserved segment (freelancers, students, non-profits) that you could own?"
130
+ - **Different job**: "Competitor X solves problem A. Could you solve a related but different problem B for the same audience?"
131
+ - **Channel advantage**: "Competitor X requires a desktop app. Could you win by being mobile-first, browser-based, or embedded in an existing workflow?"
132
+ - **Timing advantage**: "What has changed (new technology, regulation, cultural shift) that makes your approach viable now when it wasn't when competitors launched?"
133
+
134
+ ### Ideation Anti-Patterns
135
+
136
+ | Anti-pattern | What it sounds like | Why it's dangerous | How to challenge |
137
+ |-------------|--------------------|--------------------|-----------------|
138
+ | Solution-first | "I want to build an app that..." | Skips the problem entirely | "What problem does this solve? For whom?" |
139
+ | Everyone-needs-this | "Everyone could use this" | No target audience = no product | "Who needs this MOST? Who would pay?" |
140
+ | Feature soup | "It'll do X and Y and Z and..." | No core value proposition | "Remove one feature. Does it still work?" |
141
+ | Competitor blindness | "Nobody else does this" | Almost certainly false | "How do people solve this today?" |
142
+ | Technology hammer | "I learned [tech] and want to use it" | Technology seeking a problem | "Forget the tech. What problem exists?" |
143
+ | Scale fantasy | "Once we have millions of users..." | Ignores the path to the first user | "How do you get user #1? User #10?" |
144
+ | Uniqueness obsession | "We need a totally new idea" | Execution almost always beats novelty | "What existing idea could you execute 10x better?" |
145
+
146
+ ### Worked Example: From Vague to Sharp
147
+
148
+ **Vague starting point**: "An app for recipes"
149
+
150
+ **After Phase 1 (Seed):**
151
+ - Who: Home cooks who meal prep on weekends but waste food because they buy ingredients for recipes they never make.
152
+ - Problem: Planning meals for the week takes 45+ minutes, and existing apps have 50,000 recipes but no help deciding which ones to cook together.
153
+ - Core idea: A meal planning tool that suggests complementary recipes sharing ingredients, minimizing waste and shopping time.
154
+
155
+ **After Phase 2 (Research):**
156
+ - Competitors: Mealime (good UI but no ingredient overlap), Paprika (great for saving recipes but no planning), Eat This Much (calorie-focused, not taste-focused).
157
+ - Gap: No tool optimizes for ingredient reuse across a week of meals.
158
+
159
+ **After Phase 3 (Expand):**
160
+ - Accepted: Grocery list auto-generation from the meal plan (directly supports core value).
161
+ - Deferred: Social sharing of meal plans (not core, revisit later).
162
+ - Rejected: Calorie tracking (different problem, different audience).
163
+
164
+ **After Phase 4 (Challenge):**
165
+ - Confirmed: The ingredient-overlap algorithm is the differentiator.
166
+ - Revised: Scope down from "all cuisines" to "weeknight dinners, 30 min or less" for MVP.
167
+ - Locked out: No restaurant recommendations, no diet tracking, no social features for v1.
168
+
169
+ This progression from "an app for recipes" to a tightly scoped meal planning tool with a clear differentiator is what a good spark session produces.
170
+
171
+ ### Confidence Tagging Guide
172
+
173
+ Every claim in the spark brief should carry an implicit confidence level. This helps `create-vision` know what to validate vs. what to build on.
174
+
175
+ **Validated** (highest confidence):
176
+ - User stated it AND research supports it.
177
+ - Example: "3 competitors exist in this space" (the user said so, and you verified it via search).
178
+ - create-vision can build on this without re-exploring.
179
+
180
+ **Hypothesized** (medium confidence):
181
+ - User stated it but it hasn't been independently verified.
182
+ - Example: "Target users are freelance designers" (user's claim, no research to confirm market size).
183
+ - create-vision should probe deeper on these — targeted follow-up questions.
184
+
185
+ **Speculative** (lowest confidence):
186
+ - Surfaced during expansion or challenge, not yet confirmed by user.
187
+ - Example: "Meal planning apps retain 3x better than recipe apps" (research finding, user hasn't decided whether to pivot).
188
+ - create-vision should present these as open questions, not assumptions.
189
+
190
+ **How to apply in the brief:**
191
+ - Don't tag every sentence explicitly (clutters the document).
192
+ - Tag at the section level: "This section is largely validated — user confirmed the audience and research supports the competitive gap."
193
+ - Call out speculative items explicitly: "Note: the social sharing angle is speculative — surfaced during expansion, not yet confirmed."
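The three confidence levels reduce to two yes/no questions: did the user state or confirm the claim, and does research support it? A minimal sketch of that mapping follows; the function name and the `yes`/`no` argument convention are hypothetical.

```bash
# Hypothetical helper: maps the two questions to a confidence tag.
# Usage: confidence_tag <user_confirmed: yes|no> <research_supports: yes|no>
confidence_tag() {
  local user_confirmed="$1" research_supports="$2"
  if [ "$user_confirmed" = "yes" ] && [ "$research_supports" = "yes" ]; then
    echo "validated"      # user stated it AND research supports it
  elif [ "$user_confirmed" = "yes" ]; then
    echo "hypothesized"   # user stated it, not independently verified
  else
    echo "speculative"    # not yet confirmed by the user
  fi
}
```

Note that a research finding the user has not yet confirmed still lands on `speculative`, matching the retention example above.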
194
+
195
+ ### Market Timing Analysis
196
+
197
+ When assessing "why now?", look for these signals:
198
+
199
+ **Technology shifts**: A new API, platform, or capability that makes something possible (or dramatically cheaper) that wasn't before. Example: LLMs making personalized recommendation affordable for indie tools.
200
+
201
+ **Regulatory changes**: New laws or standards that create demand or remove barriers. Example: GDPR creating demand for privacy-first alternatives.
202
+
203
+ **Behavioral changes**: Shifts in how people work, communicate, or consume. Example: Remote work increasing demand for async collaboration tools.
204
+
205
+ **Market failures**: Recent shutdowns, pivots, or public failures that leave an underserved audience. Example: A popular tool raising prices 10x, driving users to seek alternatives.
206
+
207
+ **Cultural shifts**: Changing attitudes that make new products viable. Example: Growing sustainability awareness creating demand for waste-reduction tools.
208
+
209
+ Each timing signal should be specific and verifiable — not "AI is trending" but "GPT-4's function calling API, launched in June 2023, makes it possible to build structured data extraction at 1/100th the cost of custom NLP pipelines."
@@ -112,6 +112,8 @@ knowledge-overrides:
112
112
  append: [game-design-document]
113
113
  critical-path-walkthrough:
114
114
  append: [game-design-document]
115
+ spark:
116
+ append: [game-ideation]
115
117
 
116
118
  # ---------------------------------------------------------------------------
117
119
  # reads-overrides
@@ -24,6 +24,7 @@ about ecosystem maturity, alternatives, and gotchas.
24
24
  ## Inputs
25
25
  - docs/plan.md (required) — PRD features, integrations, and technical requirements
26
26
  - User preferences (gathered via questions) — language, framework, deployment target, constraints
27
+ - docs/spark-brief.md (optional) — Technology Opportunities section from spark ideation session. If present and not stale (compare tracking comment date against docs/vision.md and docs/plan.md — if the brief predates both, ignore it), use the Technology Opportunities section as supplementary research context when evaluating technology options.
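The staleness rule can be sketched as a date comparison. This assumes the three tracking-comment dates have already been extracted as `YYYY-MM-DD` strings; the tracking comment format is not shown here, so the extraction step is left out, and both function names are hypothetical.

```bash
# Hypothetical helper: turns YYYY-MM-DD into a comparable integer, e.g. 20240105.
date_num() {
  printf '%s' "$1" | sed 's/-//g'
}

# Hypothetical helper: succeeds when the brief predates BOTH vision and plan,
# which is the condition under which the brief is ignored.
# Usage: brief_is_stale <brief-date> <vision-date> <plan-date>
brief_is_stale() {
  local brief vision plan
  brief="$(date_num "$1")"
  vision="$(date_num "$2")"
  plan="$(date_num "$3")"
  [ "$brief" -lt "$vision" ] && [ "$brief" -lt "$plan" ]
}
```

A brief dated between the two documents is not stale under this rule, because it predates only one of them.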
27
28
 
28
29
  ## Expected Outputs
29
30
  - docs/tech-stack.md — complete technology reference with architecture overview,
@@ -19,6 +19,7 @@ throughout the entire pipeline.
19
19
 
20
20
  ## Inputs
21
21
  - Project idea (provided by user verbally or in a brief)
22
+ - docs/spark-brief.md (optional) — upstream context from spark ideation session
22
23
  - Existing project files (if brownfield — any README, docs, or code)
23
24
  - Market context or competitive research (if available)
24
25
 
@@ -103,11 +104,53 @@ Before starting, check if `docs/vision.md` already exists:
103
104
  - **Related docs**: `docs/plan.md`
104
105
  - **Special rules**: Never change guiding principles without user approval. Preserve any strategic decisions that were explicitly made by the user.
105
106
 
107
+ ### Spark Brief Detection
108
+
109
+ **If `docs/spark-brief.md` exists**: Read it completely. Check its tracking
110
+ comment date against the `docs/vision.md` tracking comment date (if vision
111
+ exists). If the brief predates the current vision, ignore it and note:
112
+ "Spark brief found but predates current vision — ignoring." Check the
113
+ brief's heading (`# Spark Brief: [Idea Name]`) against the current
114
+ `$ARGUMENTS` — if the idea name appears unrelated, ask the user before
115
+ using it.
116
+
117
+ Otherwise, this is upstream context from a spark ideation session — the user
118
+ has already explored the problem space, researched competitors, expanded the
119
+ idea, and challenged assumptions.
120
+
121
+ **Accelerated mode**: Use the brief's answers as a baseline and ask targeted
122
+ follow-up questions to expand them to create-vision's required depth. Do not
123
+ skip phases — deepen and validate the brief's hypotheses rather than
124
+ re-exploring from scratch.
125
+
126
+ If the brief was red-teamed (Session Metadata), treat its competitive
127
+ landscape and risk sections as pre-validated hypotheses — focus discovery on
128
+ gaps or updates rather than re-exploring those areas.
129
+
130
+ create-vision uses its own configured depth regardless of the brief's depth.
131
+ The brief's depth metadata is informational — it tells you how thoroughly
132
+ the idea was explored, not how thorough this vision step should be.
133
+
134
+ Defer the brief's "Technology Opportunities" section to downstream phases
135
+ (tech-stack, architecture) — the vision document is about purpose and positioning,
136
+ not technical implementation.
137
+
138
+ **If `docs/spark-brief.md` does NOT exist**: Proceed normally.
139
+
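The staleness rule in Spark Brief Detection can be sketched in shell. This is a minimal sketch, assuming both tracking comments carry an ISO `YYYY-MM-DD` date that has already been extracted; the tracking comment format itself is not shown in this diff, and `brief_is_stale` is a hypothetical helper name.

```bash
# Hypothetical helper: returns 0 (true) when the brief predates the vision.
# Assumes ISO YYYY-MM-DD dates, so removing the dashes yields comparable integers.
brief_is_stale() {
  local brief vision
  brief=$(printf '%s' "$1" | tr -d '-')
  vision=$(printf '%s' "$2" | tr -d '-')
  # No vision date (docs/vision.md absent) or no brief date: not stale.
  [ -n "$vision" ] || return 1
  [ -n "$brief" ] || return 1
  [ "$brief" -lt "$vision" ]
}

# Usage: the dates here are made-up illustration values.
if brief_is_stale "2025-01-10" "2025-02-01"; then
  echo "ignore brief"
else
  echo "use brief"
fi
```

A brief dated the same day as the vision is treated as current here, since "predates" is read as strictly earlier; that reading is an assumption.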
106
140
  ## Here's my idea:
107
141
  $ARGUMENTS
108
142
 
109
143
  ## Phase 1: Strategic Discovery
110
144
 
145
+ ### Spark Brief Context
146
+
147
+ **If `docs/spark-brief.md` was read during Spark Brief Detection above**, use
148
+ it as your baseline for this phase. Do not skip phases — use the brief's
149
+ answers as a starting point and ask targeted follow-up questions to deepen
150
+ and validate the brief's hypotheses to create-vision's required depth.
151
+
152
+ **If no spark brief exists**, proceed normally with the discovery questions below.
153
+
111
154
  Use AskUserQuestionTool throughout this phase. Batch related questions together — don't ask one at a time.
112
155
 
113
156
  ### Understand the Problem Space
@@ -64,7 +64,7 @@ fi
64
64
 
65
65
  The `!` prefix runs the command in the user's terminal session, allowing interactive auth flows (browser OAuth, Y/n prompts) that can't work in headless mode.
66
66
 
67
- **If neither CLI is available or authenticated**: Fall back to structured Claude-only self-review. Re-read the artifact with an adversarial lens — actively try to find issues the initial review missed. Document this as "single-model review (no external CLIs available)."
67
+ **If neither CLI is available or authenticated**: Queue two compensating Claude passes, one focused on each missing channel's strength area. Document this as "single-model review (no external CLIs available)."
68
68
 
69
69
  ## Correct Invocation Patterns
70
70
 
@@ -134,6 +134,12 @@ NO_BROWSER=true gemini -p "REVIEW_PROMPT_HERE" --output-format json -s --approva
134
134
 
135
135
  **Output**: JSON on stdout with `{ response, stats, error }` structure.
136
136
 
137
+ ## Foreground-Only Execution
138
+
139
+ Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from both CLIs. Multiple foreground calls in a single message are fine — the tool runner supports parallel invocations.
140
+
141
+ This means: when dispatching reviews, make each CLI call a separate foreground Bash tool invocation. Do NOT use shell `&` or background subshells.
142
+
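One way to apply the foreground-only rule is a pre-dispatch check that rejects command strings which would background the process. The sketch below is a heuristic under stated assumptions: `is_foreground_safe` is a hypothetical helper, it inspects only the command text, and it can produce false positives when a quoted prompt itself contains `&` or the word `nohup`.

```bash
# Hypothetical guard: returns 0 when the command string has no backgrounding
# construct, 1 when it contains a bare '&' or 'nohup'.
is_foreground_safe() {
  local stripped
  # Drop '&&' and the redirections '>&' / '&>' so only a bare '&' can remain.
  stripped=$(printf '%s' "$1" | sed -e 's/&&//g' -e 's/>&//g' -e 's/&>//g')
  case "$stripped" in
    *nohup*|*"&"*) return 1 ;;
  esac
  return 0
}

# Usage: check a command string before handing it to the tool runner.
if is_foreground_safe 'codex login status 2>/dev/null'; then
  echo "ok to dispatch"
fi
```

The check only vets the shell text. Tool-runner options such as `run_in_background` sit outside the command string and have to be avoided separately.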
137
143
  ## Context Bundling
138
144
 
139
145
  When dispatching a review, bundle all relevant context into the prompt. Each CLI gets the same bundle — do NOT share one model's review with the other.
@@ -208,35 +214,27 @@ You are reviewing a pull request diff. Report P0, P1, and P2 issues.
208
214
  | PR diff | Full diff | If >2000 lines, split into file groups |
209
215
  | Implementation plan | Task list + representative tasks | Include full task list, detail for flagged tasks |
210
216
 
211
- ## Dual-Model Reconciliation
212
-
213
- When both CLIs produce results, reconcile findings using these rules:
217
+ ## Finding Reconciliation
214
218
 
215
- | Scenario | Confidence | Action |
216
- |----------|-----------|--------|
217
- | Both flag same issue | **High** | Fix immediately — two independent models agree |
218
- | Both approve (no findings) | **High** | Proceed confidently |
219
- | One flags P0, other approves | **High** | Fix it — P0 is critical enough from a single source |
220
- | One flags P1, other approves | **Medium** | Review the finding carefully before fixing. If the finding is specific and actionable, fix it. If vague, skip. |
221
- | Models contradict each other | **Low** | Present both findings to the user for adjudication |
219
+ When multiple models produce findings, reconcile them using the rules defined in `multi-model-review-dispatch`. Key principles:
222
220
 
223
- **Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
221
+ - **Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
222
+ - **Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds with unresolved findings, stop and surface the verdict (`blocked` or `needs-user-decision`) to the user. Do NOT auto-merge.
224
223
 
225
- **Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds, merge with a warning and create a follow-up issue for remaining findings.
224
+ For the full consensus rules, confidence scoring, and disagreement resolution process, see `multi-model-review-dispatch`.
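The round-tracking rule can be sketched as a small decision function. This is an illustration only: `next_review_action`, its output labels, and the `MAX_FIX_ROUNDS` variable are hypothetical names, and choosing between `blocked` and `needs-user-decision` is left to the rules in `multi-model-review-dispatch`.

```bash
MAX_FIX_ROUNDS=3  # limit taken from the round-tracking rule above

# Hypothetical helper.
# $1 = number of fix rounds completed, $2 = count of unresolved findings.
next_review_action() {
  local round="$1" unresolved="$2"
  if [ "$unresolved" -eq 0 ]; then
    echo "done"
  elif [ "$round" -ge "$MAX_FIX_ROUNDS" ]; then
    # Stop iterating and surface the verdict; never auto-merge.
    echo "surface-to-user"
  else
    echo "fix-and-rereview"
  fi
}

next_review_action 3 2   # prints: surface-to-user
```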
226
225
 
227
226
  ## Fallback Behavior
228
227
 
229
228
  | Situation | Fallback |
230
229
  |-----------|----------|
231
- | Neither CLI available | Structured Claude-only adversarial self-review |
232
- | Codex only | Single-model review with Codex |
233
- | Gemini only | Single-model review with Gemini |
234
- | **CLI auth expired** | **Surface to user with recovery command — do NOT silently fall back** |
235
- | One CLI fails mid-review (non-auth) | Continue with the other; note the failure in summary |
236
- | Both CLIs fail (non-auth) | Fall back to Claude-only self-review; warn user |
237
- | CLI output not parseable as JSON | Treat as text, extract findings manually |
238
-
239
- **Auth failures are NOT silent fallbacks.** The difference between "CLI not installed" (fall back quietly) and "CLI auth expired" (user action required) is critical. Auth can be fixed in 30 seconds with an interactive command — silently skipping wastes the user's review infrastructure.
230
+ | Neither CLI available | Queue two compensating Claude passes (one per missing channel's strength area). Label findings. Max verdict: `degraded-pass`. |
231
+ | Codex only | Single-model review with Codex + compensating Claude pass for Gemini |
232
+ | Gemini only | Single-model review with Gemini + compensating Claude pass for Codex |
233
+ | **CLI auth expired** | **Surface to user with `!` recovery command — do NOT silently fall back** |
234
+ | One CLI fails mid-review | Use partial results if available; otherwise queue a compensating pass. Note the failure in the summary. |
235
+ | Both CLIs fail | Two compensating passes, max verdict: `degraded-pass`. Warn user. |
236
+
237
+ Auth failures are NOT silent fallbacks.
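The updated fallback table can be read as a decision function over the two CLI statuses. The sketch below assumes the `not_installed` and `auth_failed` statuses from the availability check, plus a hypothetical `ok` status for a usable CLI; `plan_fallback` and its output labels are illustrative names, not part of the package.

```bash
# Hypothetical helper. $1 = Codex status, $2 = Gemini status.
# Any status other than "ok" or "auth_failed" is treated as "channel missing".
plan_fallback() {
  local codex="$1" gemini="$2"
  # Auth failures are never silent: surface them before any fallback.
  if [ "$codex" = "auth_failed" ] || [ "$gemini" = "auth_failed" ]; then
    echo "surface-auth-to-user"
  elif [ "$codex" = "ok" ] && [ "$gemini" = "ok" ]; then
    echo "dual-review"
  elif [ "$codex" = "ok" ]; then
    echo "codex+compensating-pass-for-gemini"
  elif [ "$gemini" = "ok" ]; then
    echo "gemini+compensating-pass-for-codex"
  else
    echo "two-compensating-passes max-verdict=degraded-pass"
  fi
}

plan_fallback ok not_installed   # prints: codex+compensating-pass-for-gemini
```

Checking auth first mirrors the table's emphasis: a fixable auth failure is surfaced to the user rather than absorbed by a compensating pass.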
240
238
 
241
239
  ## Integration with Review Steps
242
240