npm - @botlearn/brainstorm - Versions diffs - 0.1.0 - Mend

@botlearn/brainstorm 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/LICENSE +21 -0
package/README.md +35 -0
package/knowledge/anti-patterns.md +80 -0
package/knowledge/best-practices.md +105 -0
package/knowledge/domain.md +152 -0
package/manifest.json +26 -0
package/package.json +35 -0
package/skill.md +44 -0
package/strategies/main.md +101 -0
package/tests/benchmark.json +476 -0
package/tests/smoke.json +54 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 BotLearn
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,35 @@
+# @botlearn/brainstorm
+> Multi-dimensional ideation and brainstorming with structured creativity frameworks, feasibility assessment, and idea clustering for OpenClaw Agent
+## Installation
+```bash
+# via npm
+npm install @botlearn/brainstorm
+# via clawhub
+clawhub install @botlearn/brainstorm
+```
+## Category
+Creative Generation
+## Dependencies
+None
+## Files
+| File | Description |
+|------|-------------|
+| `manifest.json` | Skill metadata and configuration |
+| `skill.md` | Role definition and activation rules |
+| `knowledge/` | Domain knowledge documents |
+| `strategies/` | Behavioral strategy definitions |
+| `tests/` | Smoke and benchmark tests |
+## License
+MIT

package/knowledge/anti-patterns.md ADDED

@@ -0,0 +1,80 @@
+---
+domain: brainstorm
+topic: anti-patterns
+priority: medium
+ttl: 90d
+---
+# Brainstorming — Anti-Patterns
+## Generation Phase Anti-Patterns
+### 1. Premature Evaluation
+- **Problem**: Judging ideas during the divergent generation phase kills creative output. Statements like "That won't work because..." or "We tried that before" shut down exploration before ideas can evolve.
+- **Symptoms**: Low idea count (< 10 ideas), all ideas are "safe" and incremental, participants self-censor
+- **Fix**: Enforce strict phase separation. During generation, the only allowed response to any idea is "Yes, and..." (building on it) or silent capture. Defer ALL evaluation to the convergent phase.
+### 2. Anchoring Bias
+- **Problem**: The first idea mentioned dominates the session. Subsequent ideas cluster tightly around the anchor, producing variations on a theme instead of diverse solutions.
+- **Symptoms**: Most ideas are minor variations of the first 1-2 ideas; lack of category diversity; ideas feel incremental
+- **Fix**:
+  - Start with individual silent ideation (2-3 minutes) before any sharing
+  - Use multiple creativity frameworks (SCAMPER, Six Hats, TRIZ) sequentially to force different angles
+  - Explicitly prompt: "Now generate an idea that has nothing in common with everything so far"
+### 3. Domain Tunnel Vision
+- **Problem**: All ideas come from the same domain, industry, or disciplinary perspective. The brainstorm produces only "obvious" solutions that anyone in the field would suggest.
+- **Symptoms**: Every idea uses the same technology, follows the same business model, or addresses only the most visible part of the problem
+- **Fix**:
+  - Mandate cross-domain analogies: "How would [biology / game design / logistics / healthcare] solve this?"
+  - Apply the Random Entry technique from lateral thinking
+  - Include at least one "what if the opposite were true?" provocation per session
+### 4. Groupthink / Convergent Drift
+- **Problem**: The group unconsciously converges on a shared perspective, suppressing dissenting or unusual ideas. Social pressure favors agreement over diversity.
+- **Symptoms**: Ideas are highly similar; no disagreement or tension; ideas feel "committee-approved"
+- **Fix**:
+  - Assign a Devil's Advocate role (Black Hat thinker)
+  - Use brainwriting: participants write ideas independently before sharing
+  - Explicitly ask: "What idea would a critic of this approach suggest?"
+### 5. Solution Jumping
+- **Problem**: Moving directly to solutions without adequately understanding or reframing the problem. The wrong problem gets solved efficiently.
+- **Symptoms**: Ideas address surface symptoms, not root causes; solutions don't match user needs; "obvious" solutions that ignore underlying complexity
+- **Fix**:
+  - Spend 20% of session time on problem reframing BEFORE any solution generation
+  - Ask "Why?" five times (5 Whys) to reach root cause
+  - Restate the problem from at least 3 different stakeholder perspectives
+## Evaluation Phase Anti-Patterns
+### 6. Halo Effect on Ideas
+- **Problem**: Ideas from authoritative sources or ideas presented eloquently receive higher scores regardless of actual merit. Conversely, poorly articulated but strong ideas get dismissed.
+- **Fix**: Evaluate ideas anonymously when possible. Score against defined criteria (feasibility matrix) rather than gut feeling. Rate each dimension separately before computing composite scores.
+### 7. Survivorship Bias in Clustering
+- **Problem**: Only the loudest or most repeated ideas survive to the clustering phase. Quiet, novel, or singular ideas get lost because they don't fit obvious patterns.
+- **Fix**: Before clustering, explicitly review all "orphan" ideas (those that don't obviously fit any group). Orphans often represent the most innovative directions. Create a dedicated "Wild Cards" cluster.
+### 8. Feasibility-Only Filtering
+- **Problem**: Filtering ideas solely on feasibility eliminates high-impact, transformative ideas that require more effort but could be game-changing.
+- **Fix**: Use a 2x2 matrix: Impact vs. Feasibility. Preserve "High Impact / Low Feasibility" ideas in a "Moonshot" category for future consideration. Never discard based on a single dimension.
+## Output Anti-Patterns
+### 9. Flat List Delivery
+- **Problem**: Presenting brainstorm results as an unstructured, unprioritized list of ideas. The recipient cannot act on 30+ undifferentiated ideas.
+- **Symptoms**: No clustering, no ranking, no feasibility signals; all ideas appear equally important
+- **Fix**: Always deliver ideas in clustered, scored, and ranked format. Lead with the top 3-5 ideas from the highest-priority cluster. Include feasibility ratings and a clear "next steps" recommendation for each top idea.
+### 10. Missing Actionability
+- **Problem**: Ideas are described at too high a level to be actionable. "Use AI" or "Improve the user experience" are directions, not ideas.
+- **Fix**: Each idea should pass the specificity test: Could someone start working on this within a week? Include: what it is, who it's for, what changes, and a rough first step. Minimum: one concrete sentence beyond the headline.
+### 11. No Novelty Signal
+- **Problem**: Failing to flag which ideas are genuinely novel versus well-known approaches. The brainstorm output looks creative but actually contains only conventional solutions.
+- **Fix**: Tag each idea as: "Novel" (not commonly applied in this domain), "Adapted" (borrowed from another domain), or "Standard" (well-known in this domain). Aim for at least 30% Novel or Adapted ideas.
+### 12. Ignoring Constraints
+- **Problem**: Generating ideas that completely disregard the stated constraints (budget, timeline, team size, technology stack). Beautiful ideas that are impossible to execute given the real-world situation.
+- **Fix**: Restate constraints at the start of the convergent phase. During feasibility scoring, explicitly check each idea against the constraint list. Flag ideas that violate hard constraints but may be valuable if constraints change.
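
Reviewer note on anti-pattern 11: the novelty tags and the 30% target can be checked mechanically. The sketch below is not part of the package. The function names and the list-of-tags input are assumptions made for illustration; only the three tag names and the 30% figure come from the file above.

```python
from collections import Counter

# Tag names and the 30% target are taken from anti-pattern 11 above.
NOVELTY_TAGS = ("Novel", "Adapted", "Standard")
MIN_NON_STANDARD_SHARE = 0.30


def non_standard_share(tags):
    """Return the fraction of ideas tagged Novel or Adapted."""
    unknown = set(tags) - set(NOVELTY_TAGS)
    if unknown:
        raise ValueError(f"unknown novelty tags: {sorted(unknown)}")
    if not tags:
        return 0.0
    counts = Counter(tags)
    return (counts["Novel"] + counts["Adapted"]) / len(tags)


def meets_novelty_target(tags):
    """True when at least 30% of ideas are Novel or Adapted."""
    return non_standard_share(tags) >= MIN_NON_STANDARD_SHARE


# Hypothetical session output: 2 of 5 ideas are Novel or Adapted.
tags = ["Standard", "Novel", "Standard", "Adapted", "Standard"]
print(non_standard_share(tags))    # 0.4
print(meets_novelty_target(tags))  # True
```

A session that fails this check would, per the file, look creative while containing mostly conventional solutions.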

package/knowledge/best-practices.md ADDED

@@ -0,0 +1,105 @@
+---
+domain: brainstorm
+topic: ideation-quality-and-process
+priority: high
+ttl: 90d
+---
+# Brainstorming — Best Practices
+## Diverge-Converge Discipline
+### The Double Diamond Model
+Effective brainstorming follows a strict two-phase rhythm:
+1. **Diverge (Diamond 1 — Explore)**: Generate as many ideas as possible without judgment
+   - Quantity over quality — aim for 3x the ideas you think you need
+   - Build on others' ideas ("Yes, and..." not "Yes, but...")
+   - Welcome wild, impractical, and absurd ideas — they seed practical derivatives
+   - No evaluation, no criticism, no feasibility checks during this phase
+2. **Converge (Diamond 2 — Evaluate)**: Filter, assess, and prioritize
+   - Apply feasibility criteria systematically
+   - Cluster related ideas into themes
+   - Score and rank using defined criteria
+   - Kill ideas that fail viability checks — no emotional attachment
+### Phase Separation Rules
+- NEVER mix divergent and convergent thinking in the same step
+- Signal phase transitions explicitly: "Switching from generation to evaluation mode"
+- If evaluation thoughts arise during divergence, note them separately for later
+- Minimum divergent output: 15+ raw ideas before switching to convergence
+## Cross-Domain Ideation
+### Analogical Reasoning Protocol
+1. **Abstract the problem**: Strip domain-specific details to reveal the structural pattern
+   - Example: "How do we reduce customer wait times?" → "How do systems manage queue throughput?"
+2. **Identify analog domains**: Find 3+ domains that face structurally similar challenges
+   - Queue throughput analogies: airport security, hospital triage, packet routing, restaurant seating
+3. **Extract principles**: What solution pattern does each analog domain use?
+   - Airport: parallel processing lanes; Hospital: severity-based prioritization; Networking: load balancing
+4. **Transfer and adapt**: Apply extracted principles back to the original domain
+   - Parallel processing → multiple service channels; Severity-based → VIP fast-track; Load balancing → dynamic routing
+### Cross-Pollination Sources
+| Source Domain | Yields Ideas For |
+|--------------|-----------------|
+| Biology / Nature | Resilience, adaptation, self-organization, swarm behavior |
+| Architecture | Modular design, scalability, user flow, spatial efficiency |
+| Game Design | Engagement, motivation, feedback loops, progressive difficulty |
+| Supply Chain | Efficiency, logistics, just-in-time, bottleneck elimination |
+| Psychology | Behavior change, habit formation, cognitive load, persuasion |
+| Music / Art | Rhythm, composition, contrast, emotional resonance |
+## Feasibility Matrix
+### Scoring Dimensions
+Each idea should be assessed on four axes using a 1-5 scale:
+| Dimension | 1 (Low) | 3 (Medium) | 5 (High) |
+|-----------|---------|-----------|----------|
+| **Technical Viability** | Requires unproven technology or fundamental research | Needs significant engineering but uses known approaches | Can be built with existing tools and proven patterns |
+| **Resource Requirements** | Massive team, budget, and infrastructure needed | Moderate investment; achievable with dedicated effort | Lean implementation; small team, low cost |
+| **Time-to-Implement** | 12+ months to first viable version | 3-6 months to MVP | Days to weeks for prototype |
+| **Impact Potential** | Marginal improvement over status quo | Meaningful improvement for a segment of users | Transformative change for the target audience |
+### Composite Feasibility Score
+- **Viable** (score 14-20): Proceed to detailed planning
+- **Promising** (score 9-13): Worth exploring with further research or reduced scope
+- **Speculative** (score 4-8): Park for future consideration or combine with other ideas
+### Weighted Scoring for Prioritization
+When priorities differ by context, adjust weights:
+- **Startup / MVP context**: Time-to-Implement (35%), Impact (30%), Resources (20%), Technical (15%)
+- **Enterprise / R&D context**: Impact (35%), Technical (30%), Resources (20%), Time (15%)
+- **Innovation lab context**: Impact (40%), Technical (25%), Time (20%), Resources (15%)
+## Idea Volume & Quality Targets
+### The 3x Rule
+- Structured brainstorming should produce at least 3x the ideas compared to unstructured free-association
+- Unstructured session on a typical problem: ~8-12 ideas
+- Structured session target: 25-40 ideas minimum
+### Viability Rate
+- Target: 40%+ of generated ideas should score "Viable" or "Promising" on the feasibility matrix
+- Below 30% viability: generation was too unconstrained — tighten problem framing
+- Above 70% viability: generation was too conservative — push for wilder ideas
+## Idea Clustering
+### Affinity Grouping Process
+1. Spread all ideas visually (list or mind-map form)
+2. Identify natural themes or patterns — group related ideas
+3. Name each cluster with a descriptive theme label
+4. Identify "bridge ideas" that connect two or more clusters
+5. Look for underrepresented clusters — gaps may signal unexplored opportunities
+### Cluster Prioritization
+Rank clusters (not just individual ideas) by:
+- **Cluster density**: More ideas in a cluster = stronger signal of opportunity
+- **Peak idea quality**: Best single idea in the cluster
+- **Strategic alignment**: Fit with stated goals and constraints
+- **Synergy potential**: Can ideas within the cluster reinforce each other?
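
Reviewer note on the feasibility matrix: the composite score, its three bands, and the context weights in this file translate directly into a few lines of code. The sketch below is illustrative only. The dictionary keys and function names are hypothetical, and treating the weighted score as a weighted average on the 1-5 scale is an assumption, since the file lists the weights but not the formula.

```python
DIMENSIONS = ("technical", "resources", "time", "impact")

# Weights copied from "Weighted Scoring for Prioritization" above.
WEIGHTS = {
    "startup": {"time": 0.35, "impact": 0.30, "resources": 0.20, "technical": 0.15},
    "enterprise": {"impact": 0.35, "technical": 0.30, "resources": 0.20, "time": 0.15},
    "innovation_lab": {"impact": 0.40, "technical": 0.25, "time": 0.20, "resources": 0.15},
}


def composite_score(scores):
    """Sum the four 1-5 dimension scores; the result lies in 4..20."""
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5")
    return sum(scores[dim] for dim in DIMENSIONS)


def band(total):
    """Map a composite score to the band names used in the file."""
    if total >= 14:
        return "Viable"
    if total >= 9:
        return "Promising"
    return "Speculative"


def weighted_score(scores, context):
    """Assumed formula: weighted average of the four scores (1-5 scale)."""
    weights = WEIGHTS[context]
    return sum(scores[dim] * weights[dim] for dim in DIMENSIONS)


def viability_rate(bands):
    """Share of ideas rated Viable or Promising (the file targets 40%+)."""
    if not bands:
        return 0.0
    return sum(b in ("Viable", "Promising") for b in bands) / len(bands)


# Hypothetical idea: quick to build, moderate impact.
idea = {"technical": 4, "resources": 3, "time": 5, "impact": 3}
total = composite_score(idea)
print(total, band(total))  # 15 Viable
print(round(weighted_score(idea, "startup"), 2))
```

Note that a 5 on Resource Requirements and Time-to-Implement means cheap and fast, so a higher composite is always better.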

package/knowledge/domain.md ADDED

@@ -0,0 +1,152 @@
+---
+domain: brainstorm
+topic: creativity-frameworks-and-ideation-methods
+priority: high
+ttl: 90d
+---
+# Brainstorming — Creativity Frameworks & Ideation Methods
+## SCAMPER Framework
+SCAMPER is a structured checklist for generating ideas by transforming existing concepts along seven dimensions.
+### S — Substitute
+- What components, materials, or processes can be replaced?
+- Can you substitute a different person, place, time, or approach?
+- Example: Substitute physical meetings with async video → asynchronous standup tools
+### C — Combine
+- What ideas, features, or functions can be merged?
+- Can you combine purposes, audiences, or technologies?
+- Example: Combine fitness tracking + social media → Strava's activity feed
+### A — Adapt
+- What can be borrowed from another domain or context?
+- How can an existing solution be adapted to this problem?
+- Example: Adapt assembly-line thinking to software → CI/CD pipelines
+### M — Modify (Magnify / Minimize)
+- What can be made larger, stronger, more frequent, or exaggerated?
+- What can be made smaller, lighter, simpler, or less frequent?
+- Example: Magnify personalization → hyper-personalized learning paths
+### P — Put to Another Use
+- Can the existing product/process serve a different purpose?
+- Can waste or by-products be repurposed?
+- Example: Put shipping containers to another use → modular housing
+### E — Eliminate
+- What can be removed without losing core value?
+- What steps, features, or constraints are unnecessary?
+- Example: Eliminate the middleman → direct-to-consumer brands
+### R — Reverse / Rearrange
+- What happens if you reverse the process or sequence?
+- Can components be rearranged for a different outcome?
+- Example: Reverse the sales model → customer sets the price (Priceline)
+## Six Thinking Hats (de Bono)
+Each hat represents a distinct thinking mode. Cycling through all six ensures comprehensive idea exploration.
+### White Hat — Facts & Data
+- What information do we have? What is missing?
+- Focus on objective data, numbers, and known constraints
+- Use to establish the factual foundation before ideation
+### Red Hat — Emotions & Intuition
+- What does your gut feeling say? What feels right or wrong?
+- Capture emotional reactions without needing justification
+- Use to surface hunches, excitement, or discomfort about ideas
+### Black Hat — Caution & Risk
+- What could go wrong? What are the risks and downsides?
+- Critical judgment, devil's advocate perspective
+- Use ONLY in convergent phase — never during idea generation
+### Yellow Hat — Optimism & Value
+- What are the benefits? Why could this work?
+- Focus on best-case scenarios and value propositions
+- Use to build on nascent ideas and find hidden strengths
+### Green Hat — Creativity & Alternatives
+- What new ideas are possible? What if we tried something different?
+- Provocative thinking, lateral connections, wild ideas welcome
+- Use as the primary hat during divergent generation
+### Blue Hat — Process & Meta-thinking
+- Where are we in the process? What hat should we use next?
+- Controls the thinking process, sets agenda, summarizes
+- Use to orchestrate transitions between divergent and convergent phases
+## TRIZ (Theory of Inventive Problem Solving)
+TRIZ provides systematic patterns for resolving contradictions in problem-solving.
+### 40 Inventive Principles (Key Subset)
+| # | Principle | Description | Ideation Prompt |
+|---|-----------|-------------|-----------------|
+| 1 | Segmentation | Divide an object or system into parts | "What if we break this into independent modules?" |
+| 2 | Taking Out / Extraction | Remove a problematic part or property | "What if we isolate the essential function from the rest?" |
+| 5 | Merging | Combine identical or similar objects/operations | "What if we merge these parallel processes?" |
+| 10 | Preliminary Action | Perform required changes in advance | "What if we pre-process or pre-position resources?" |
+| 13 | The Other Way Round | Invert the action or process | "What if we reverse who does what, or the sequence?" |
+| 15 | Dynamization | Make a rigid system flexible or adaptive | "What if this adjusted in real-time based on context?" |
+| 17 | Another Dimension | Move to a different dimension (1D→2D→3D, or add a layer) | "What if we add a new axis or layer to this?" |
+| 22 | Blessing in Disguise | Turn a harmful factor into a benefit | "What if this problem is actually an opportunity?" |
+| 25 | Self-Service | Make the object serve or maintain itself | "What if users contribute to the solution automatically?" |
+| 35 | Parameter Change | Change physical or logical state/parameters | "What if we change the frequency, format, or medium?" |
+### Contradiction Matrix
+- Identify the **improving parameter** (what you want to enhance)
+- Identify the **worsening parameter** (what degrades when you improve)
+- Look up suggested inventive principles from the matrix
+- Example: Improving speed (parameter 9) worsens reliability (parameter 27) → Apply principles #10 (Preliminary Action), #28 (Mechanical Substitution), #35 (Parameter Change)
+## Lateral Thinking Techniques
+### Random Entry
+- Pick a random word, image, or concept and force connections to the problem
+- The randomness breaks established thinking patterns
+- Process: Random stimulus → extract attributes → force-fit to problem → generate ideas
+### Challenge Assumptions
+- List all assumptions about the problem (explicit and implicit)
+- Systematically challenge each: "What if this assumption were false?"
+- Generate ideas that operate in the assumption-free space
+- Example: "What if customers don't need to own the product?" → subscription/rental models
+### Analogy Transfer
+- Find a domain that has solved a structurally similar problem
+- Extract the solution principle (abstracted from domain specifics)
+- Apply the principle to the target domain
+- Example: How does nature handle load balancing? → ant colony optimization → distributed computing algorithms
+### Provocation (PO)
+- Make a deliberately provocative, impossible, or absurd statement
+- Use the provocation as a stepping stone to practical ideas
+- Format: "PO: [provocative statement]" → "Movement: [what's interesting about this?]" → "Idea: [practical derivative]"
+- Example: "PO: The product repairs itself" → Movement: self-diagnosing systems → Idea: automated error detection and self-healing code
+## Innovation Patterns
+### Technology Transfer Patterns
+- **Digital twin**: Create a virtual replica of a physical process
+- **Platform shift**: Move from product to platform, enabling ecosystem effects
+- **Automation cascade**: Automate one step, revealing the next bottleneck to automate
+- **Unbundling/rebundling**: Break a monolithic offering into components, then recombine differently
+### Business Model Innovation Patterns
+- **Freemium**: Free basic tier funded by premium conversions
+- **Marketplace**: Connect supply and demand, take a transaction fee
+- **Subscription**: Recurring revenue from ongoing access
+- **Razor-and-blades**: Low-cost entry product, high-margin consumables
+- **Data monetization**: Generate value from data produced by the core product
+### User Experience Innovation Patterns
+- **Reduce friction**: Remove steps between intent and outcome
+- **Progressive disclosure**: Show complexity only when the user is ready
+- **Anticipatory design**: Predict and pre-fill user needs
+- **Social proof integration**: Use community behavior to guide individual decisions
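
Reviewer note on the SCAMPER section: because the framework is a fixed checklist of seven questions, it can be turned into a prompt generator. The sketch below is a hypothetical illustration, not code shipped in the package. The prompt templates paraphrase the questions in the file, and `scamper_prompts` is an invented name.

```python
# One template per SCAMPER dimension, paraphrased from the file above.
SCAMPER_PROMPTS = {
    "Substitute": "What components, materials, or processes of {subject} can be replaced?",
    "Combine": "What ideas, features, or functions can be merged with {subject}?",
    "Adapt": "What can {subject} borrow from another domain or context?",
    "Modify": "What about {subject} can be magnified or minimized?",
    "Put to Another Use": "Can {subject} serve a different purpose?",
    "Eliminate": "What can be removed from {subject} without losing core value?",
    "Reverse / Rearrange": "What happens if the sequence of {subject} is reversed?",
}


def scamper_prompts(subject):
    """Return one ideation prompt per SCAMPER dimension, in order."""
    return [
        f"{name}: {template.format(subject=subject)}"
        for name, template in SCAMPER_PROMPTS.items()
    ]


for prompt in scamper_prompts("the coffee shop ordering process"):
    print(prompt)
```

Each prompt would be applied to every reframed problem statement, which is how the strategy file reaches its per-framework idea counts.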

package/manifest.json ADDED

@@ -0,0 +1,26 @@
+{
+  "name": "@botlearn/brainstorm",
+  "version": "0.1.0",
+  "description": "Multi-dimensional ideation and brainstorming with structured creativity frameworks, feasibility assessment, and idea clustering for OpenClaw Agent",
+  "category": "creative-generation",
+  "author": "BotLearn",
+  "benchmarkDimension": "creative-generation",
+  "expectedImprovement": 40,
+  "dependencies": {},
+  "compatibility": {
+    "openclaw": ">=0.5.0"
+  },
+  "files": {
+    "skill": "skill.md",
+    "knowledge": [
+      "knowledge/domain.md",
+      "knowledge/best-practices.md",
+      "knowledge/anti-patterns.md"
+    ],
+    "strategies": [
+      "strategies/main.md"
+    ],
+    "smokeTest": "tests/smoke.json",
+    "benchmark": "tests/benchmark.json"
+  }
+}

package/package.json ADDED

@@ -0,0 +1,35 @@
+{
+  "name": "@botlearn/brainstorm",
+  "version": "0.1.0",
+  "description": "Multi-dimensional ideation and brainstorming with structured creativity frameworks, feasibility assessment, and idea clustering for OpenClaw Agent",
+  "type": "module",
+  "main": "manifest.json",
+  "files": [
+    "manifest.json",
+    "skill.md",
+    "knowledge/",
+    "strategies/",
+    "tests/",
+    "README.md"
+  ],
+  "keywords": [
+    "botlearn",
+    "openclaw",
+    "skill",
+    "creative-generation"
+  ],
+  "author": "BotLearn",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
+    "directory": "packages/skills/brainstorm"
+  },
+  "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/brainstorm",
+  "bugs": {
+    "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
+  },
+  "publishConfig": {
+    "access": "public"
+  }
+}
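
Reviewer note on manifest.json and package.json: every path named under `files` in the manifest has to be covered by the `files` whitelist in package.json, or it will not ship in the tarball. A minimal consistency check might look like the sketch below. The helper names are hypothetical, and the matcher is a deliberate simplification that handles only exact names and trailing-slash directories, not the full pattern syntax npm accepts.

```python
import json

# Excerpts copied from the two files above.
MANIFEST = json.loads("""
{
  "files": {
    "skill": "skill.md",
    "knowledge": ["knowledge/domain.md", "knowledge/best-practices.md",
                  "knowledge/anti-patterns.md"],
    "strategies": ["strategies/main.md"],
    "smokeTest": "tests/smoke.json",
    "benchmark": "tests/benchmark.json"
  }
}
""")
PACKAGE = json.loads("""
{"files": ["manifest.json", "skill.md", "knowledge/", "strategies/",
           "tests/", "README.md"]}
""")


def referenced_files(manifest):
    """Flatten the manifest's `files` section into a list of paths."""
    paths = []
    for value in manifest["files"].values():
        if isinstance(value, list):
            paths.extend(value)
        else:
            paths.append(value)
    return paths


def is_published(path, whitelist):
    """Simplified match: exact file name, or path under a listed directory."""
    for entry in whitelist:
        if entry.endswith("/"):
            if path.startswith(entry):
                return True
        elif path == entry:
            return True
    return False


def missing_from_package(manifest, package):
    """Paths the manifest references that the whitelist does not cover."""
    return [
        path for path in referenced_files(manifest)
        if not is_published(path, package["files"])
    ]


print(missing_from_package(MANIFEST, PACKAGE))  # []
```

For this release the check passes: all seven referenced paths fall under a whitelisted entry.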

package/skill.md ADDED

@@ -0,0 +1,44 @@
+---
+name: brainstorm
+role: Creative Ideation Specialist
+version: 1.0.0
+triggers:
+  - "brainstorm"
+  - "generate ideas"
+  - "ideation"
+  - "think of ways"
+  - "creative solutions"
+  - "come up with ideas"
+  - "brainstorming session"
+---
+# Role
+You are a Creative Ideation Specialist. When activated, you facilitate structured brainstorming sessions that produce multi-dimensional ideas across diverse categories, assess each idea for feasibility, and deliver clustered, prioritized output. Your goal is to triple the number of ideas generated compared to unstructured ideation while ensuring 40%+ of ideas are viable.
+# Capabilities
+1. Reframe problems using multiple lenses (user-centric, systemic, contrarian, analogical) to unlock non-obvious solution spaces
+2. Apply structured creativity frameworks (SCAMPER, Six Thinking Hats, TRIZ, lateral thinking) to generate ideas across multiple dimensions
+3. Produce cross-domain idea transfer by mapping patterns from unrelated industries and disciplines onto the target problem
+4. Assess each idea along a feasibility matrix covering technical viability, resource requirements, time-to-implement, and impact potential
+5. Cluster related ideas into coherent themes and prioritize clusters using weighted scoring across novelty, feasibility, and strategic fit
+# Constraints
+1. Never evaluate ideas during the divergent generation phase — defer all judgment to the convergent assessment phase
+2. Never anchor the session on a single framework — always apply at least two complementary creativity methods
+3. Never limit ideas to the obvious domain — always include at least one cross-domain or analogical transfer perspective
+4. Never present ideas as a flat list — always cluster, score, and rank for actionability
+5. Never skip feasibility assessment — every idea must carry at least a high/medium/low viability signal
+6. Always separate divergent (generation) from convergent (evaluation) thinking phases
+# Activation
+WHEN the user requests brainstorming, ideation, or creative solution generation:
+1. Analyze the problem space and identify scope, constraints, and desired outcomes
+2. Reframe the problem through multiple lenses following strategies/main.md Step 1
+3. Apply creativity frameworks from knowledge/domain.md to generate ideas across dimensions
+4. Follow diverge-then-converge discipline from knowledge/best-practices.md
+5. Avoid anti-patterns identified in knowledge/anti-patterns.md (premature evaluation, anchoring, domain tunnel vision)
+6. Assess feasibility, cluster ideas, and output a prioritized, actionable idea portfolio
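
Reviewer note on the `triggers` list: the diff does not show how OpenClaw matches trigger phrases against a user message. Assuming a case-insensitive substring match (an assumption, not documented behavior), activation could be sketched as follows; `matching_triggers` is a hypothetical helper.

```python
# Trigger phrases copied from the front matter of skill.md above.
TRIGGERS = [
    "brainstorm",
    "generate ideas",
    "ideation",
    "think of ways",
    "creative solutions",
    "come up with ideas",
    "brainstorming session",
]


def matching_triggers(message, triggers=TRIGGERS):
    """Return the triggers found in the message.

    Assumes case-insensitive substring matching; the real matcher may differ.
    """
    text = message.lower()
    return [trigger for trigger in triggers if trigger in text]


print(matching_triggers("Can we run a Brainstorming Session tomorrow?"))
# ['brainstorm', 'brainstorming session']
print(matching_triggers("Summarize this report"))
# []
```

Under this assumption, "brainstorming session" can never match without "brainstorm" also matching, so the longer phrase only matters if the matcher works on whole phrases.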

package/strategies/main.md ADDED

@@ -0,0 +1,101 @@
+---
+strategy: brainstorm
+version: 1.0.0
+steps: 6
+---
+# Brainstorming Strategy
+## Step 1: Problem Reframing
+- Parse the user's request to identify: **problem statement**, **scope**, **constraints**, **desired outcomes**, **target audience**
+- Reframe the problem through at least 3 lenses:
+  - **User-centric**: What is the core user pain or unmet need?
+  - **Systemic**: What are the upstream causes and downstream effects?
+  - **Contrarian**: What if the opposite of the obvious approach were tried?
+  - **Analogical**: What does this problem resemble in a completely different domain?
+- IF the problem is vague THEN ask one clarifying question about scope or constraints before proceeding
+- IF the problem is well-defined THEN note any implicit assumptions and flag them for challenge
+- OUTPUT: 2-4 reframed problem statements that open different solution spaces
+## Step 2: Multi-Dimensional Divergence
+- SELECT at least 2 creativity frameworks based on problem type:
+  - Incremental improvement → SCAMPER (systematic transformation of existing solution)
+  - Complex trade-offs → TRIZ (contradiction resolution via inventive principles)
+  - Exploratory / open-ended → Six Thinking Hats (rotating perspective lenses)
+  - Stuck / obvious solutions only → Lateral Thinking (random entry, provocation, assumption challenge)
+- FOR EACH selected framework:
+  - Apply the framework systematically to each reframed problem statement from Step 1
+  - Generate 5-10 raw ideas per framework application
+  - Capture ALL ideas without filtering — no evaluation in this step
+- APPLY cross-domain transfer from knowledge/best-practices.md:
+  - Identify 2-3 analog domains using analogical reasoning protocol
+  - Extract solution principles from each analog domain
+  - Force-fit at least 3 cross-domain ideas onto the target problem
+- TARGET: Minimum 20 raw ideas across all frameworks and dimensions
+## Step 3: Idea Generation & Expansion
+- Review the raw idea pool from Step 2
+- FOR EACH promising direction, expand with specificity:
+  - What exactly would this look like in practice?
+  - Who specifically would it serve and how?
+  - What would the first concrete step be?
+- APPLY idea building techniques:
+  - **Combination**: Merge 2-3 complementary ideas into hybrid concepts
+  - **Escalation**: Take a modest idea and push it to its extreme — what's the "10x version"?
+  - **Inversion**: For each strong idea, generate its opposite — does the inverse also work?
+- VERIFY against anti-patterns from knowledge/anti-patterns.md:
+  - Are ideas diverse across categories (not anchored)?
+  - Are there cross-domain ideas present (no tunnel vision)?
+  - Are ideas specific enough to be actionable (no vague directions)?
+- TARGET: 25-40 expanded, specific ideas
+## Step 4: Feasibility Assessment
+- SWITCH to convergent thinking mode — signal the transition explicitly
+- FOR EACH idea, score on 4 dimensions (1-5 scale) from knowledge/best-practices.md:
+  - **Technical Viability** (1-5): Can this be built with known approaches?
+  - **Resource Requirements** (1-5): What investment of people, money, and infrastructure?
+  - **Time-to-Implement** (1-5): How long to a working prototype or MVP?
+  - **Impact Potential** (1-5): How significant is the change for the target audience?
+- COMPUTE composite feasibility score (sum of 4 dimensions, max 20):
+  - Viable (14-20): Ready for detailed planning
+  - Promising (9-13): Worth exploring further or with reduced scope
+  - Speculative (4-8): Park for future or combine with stronger ideas
+- TAG novelty level: "Novel", "Adapted", or "Standard"
+- IF user provided specific constraints THEN check each idea against constraint list
+- TARGET: 40%+ of ideas scoring "Viable" or "Promising"
+## Step 5: Clustering & Theme Identification
+- GROUP ideas by natural affinity:
+  - Same target audience or user segment
+  - Same underlying mechanism or technology
+  - Same strategic direction or business model
+  - Same type of innovation (incremental / adjacent / transformative)
+- NAME each cluster with a descriptive theme label
+- IDENTIFY "bridge ideas" that connect multiple clusters
+- IDENTIFY "orphan ideas" that don't fit any cluster — create a "Wild Cards" group
+  - Review orphans specifically: these often contain the most novel insights
+- RANK clusters by:
+  - **Cluster density**: Number of ideas (more = stronger signal)
+  - **Peak quality**: Highest-scoring individual idea in the cluster
+  - **Strategic alignment**: Fit with the user's stated goals and constraints
+  - **Synergy potential**: Can ideas within the cluster reinforce each other?
+## Step 6: Prioritization & Output
+- Present results in structured format:
+  - **Executive Summary**: Top 3-5 recommended ideas with one-sentence descriptions
+  - **Cluster Map**: All clusters ranked by priority, with theme labels and idea counts
+  - **Top Ideas Detail**: For each top idea, provide:
+    - **Idea name** and one-paragraph description
+    - **Feasibility scores** (Technical / Resources / Time / Impact)
+    - **Novelty tag** (Novel / Adapted / Standard)
+    - **First concrete step** to explore or prototype
+    - **Key risk** and mitigation approach
+  - **Full Idea Portfolio**: Complete list organized by cluster, with scores
+  - **Wild Cards**: Unconventional ideas worth monitoring
+- SELF-CHECK:
+  - Did we generate 3x more ideas than a naive list would produce?
+  - Are 40%+ of ideas rated Viable or Promising?
+  - Are ideas spread across 3+ distinct clusters (not anchored to one theme)?
+  - Does every top idea have a concrete first step?
+  - Are cross-domain and novel ideas represented in the top recommendations?
+  - IF any check fails THEN loop back to Step 2 with additional frameworks or cross-domain prompts
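The Step 4 score bands and the 40% target that Step 6 re-checks can be sketched in a few lines of Python. This is an illustrative sketch, not code shipped in the package. It assumes the composite score is the sum of the four 1-5 dimension scores (so it ranges from 4 to 20) and that "Viable" covers the remaining 14-20 range; the names `classify` and `meets_target` are hypothetical.

```python
from typing import Iterable


def classify(composite: int) -> str:
    """Map a composite feasibility score (assumed range 4-20) to a band."""
    if not 4 <= composite <= 20:
        raise ValueError(f"composite score out of range: {composite}")
    if composite >= 14:  # assumed lower bound of the "Viable" band
        return "Viable"
    if composite >= 9:   # Promising (9-13)
        return "Promising"
    return "Speculative"  # Speculative (4-8)


def meets_target(composites: Iterable[int], target: float = 0.40) -> bool:
    """Return True if at least `target` of the ideas are Viable or Promising."""
    bands = [classify(c) for c in composites]
    if not bands:
        return False  # nothing generated, so the check cannot pass
    strong = sum(band in ("Viable", "Promising") for band in bands)
    return strong / len(bands) >= target
```

If `meets_target` returns `False`, the self-check above would send the run back to Step 2.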

package/tests/benchmark.json ADDED

@@ -0,0 +1,476 @@
+{
+  "version": "0.0.1",
+  "dimension": "creative-generation",
+  "tasks": [
+    {
+      "id": "bench-easy-01",
+      "difficulty": "easy",
+      "description": "Generate ideas for a simple product improvement",
+      "input": "Brainstorm ways to make a coffee shop's ordering process faster and more enjoyable for customers.",
+      "rubric": [
+        {
+          "criterion": "Idea Volume",
+          "weight": 0.3,
+          "scoring": {
+            "5": "15+ ideas generated across technology, service design, layout, and customer experience dimensions",
+            "3": "8-14 ideas mostly focused on one or two dimensions",
+            "1": "3-7 ideas",
+            "0": "Fewer than 3 ideas"
+          }
+        },
+        {
+          "criterion": "Framework Usage",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Applies at least 2 creativity frameworks visibly; ideas clearly derive from structured methods",
+            "3": "One framework applied; some structure visible",
+            "1": "No framework; free association only",
+            "0": "No structured approach"
+          }
+        },
+        {
+          "criterion": "Actionability",
+          "weight": 0.4,
+          "scoring": {
+            "5": "Each idea includes what it is, who benefits, and a first step; feasibility rated; ideas are clustered",
+            "3": "Ideas described at headline level with some feasibility notes",
+            "1": "Vague one-line ideas without detail",
+            "0": "Unusable output"
+          }
+        }
+      ],
+      "expectedScoreWithout": 35,
+      "expectedScoreWith": 75
+    },
+    {
+      "id": "bench-easy-02",
+      "difficulty": "easy",
+      "description": "Brainstorm features and selling points for a new product",
+      "input": "Generate creative ideas for a mobile app that helps people build better daily habits. I need both feature ideas and unique selling points.",
+      "rubric": [
+        {
+          "criterion": "Idea Diversity",
+          "weight": 0.35,
+          "scoring": {
+            "5": "Ideas span gamification, social features, AI personalization, behavioral psychology, integrations, and novel UX; includes cross-domain inspiration",
+            "3": "Ideas cover 2-3 categories; some variety but limited cross-domain thinking",
+            "1": "All ideas are variations of the same approach (e.g., all gamification)",
+            "0": "1-2 generic ideas"
+          }
+        },
+        {
+          "criterion": "Creativity & Novelty",
+          "weight": 0.35,
+          "scoring": {
+            "5": "At least 30% of ideas are novel or adapted from other domains; includes surprising or non-obvious concepts",
+            "3": "Mix of standard and somewhat creative ideas; few surprises",
+            "1": "All ideas are well-known in the habit-tracking space",
+            "0": "No creative effort"
+          }
+        },
+        {
+          "criterion": "Structure & Feasibility",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Ideas clustered by theme; feasibility assessed; top ideas have concrete descriptions and first steps",
+            "3": "Some grouping and feasibility notes but incomplete",
+            "1": "Flat list with no structure or feasibility",
+            "0": "Unorganized dump"
+          }
+        }
+      ],
+      "expectedScoreWithout": 30,
+      "expectedScoreWith": 70
+    },
+    {
+      "id": "bench-easy-03",
+      "difficulty": "easy",
+      "description": "Simple brainstorm for event planning",
+      "input": "Think of creative ways to make a company's annual all-hands meeting more engaging and memorable for 150 remote employees.",
+      "rubric": [
+        {
+          "criterion": "Idea Volume & Range",
+          "weight": 0.35,
+          "scoring": {
+            "5": "15+ ideas covering technology, interactive formats, entertainment, team building, and follow-up activities",
+            "3": "8-14 ideas across 2-3 categories",
+            "1": "3-7 ideas in one category",
+            "0": "Fewer than 3 ideas"
+          }
+        },
+        {
+          "criterion": "Remote-Specific Innovation",
+          "weight": 0.35,
+          "scoring": {
+            "5": "Ideas specifically leverage or address remote context; includes novel virtual engagement mechanics; cross-domain inspiration (gaming, live events, education)",
+            "3": "Ideas acknowledge remote context but mostly adapt in-person activities",
+            "1": "Ideas ignore the remote constraint or are generic",
+            "0": "No remote consideration"
+          }
+        },
+        {
+          "criterion": "Prioritization",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Ideas ranked by impact and effort; feasibility noted; top 3-5 highlighted with implementation steps",
+            "3": "Some ranking but informal; limited implementation detail",
+            "1": "No ranking or prioritization",
+            "0": "No structure"
+          }
+        }
+      ],
+      "expectedScoreWithout": 35,
+      "expectedScoreWith": 75
+    },
+    {
+      "id": "bench-med-01",
+      "difficulty": "medium",
+      "description": "Multi-constraint brainstorm requiring trade-off analysis",
+      "input": "Brainstorm solutions to reduce food waste in a university dining hall that serves 3,000 students daily. Constraints: budget under $10,000 for implementation, must not reduce food variety, and solutions should be deployable within one semester. I need ideas across technology, behavioral, and operational categories.",
+      "rubric": [
+        {
+          "criterion": "Constraint Awareness",
+          "weight": 0.25,
+          "scoring": {
+            "5": "All three constraints (budget, variety, timeline) explicitly addressed in feasibility assessment; ideas flagged when they push constraint boundaries",
+            "3": "Constraints mentioned but not systematically checked against each idea",
+            "1": "Constraints acknowledged but largely ignored in idea generation",
+            "0": "Constraints not considered"
+          }
+        },
+        {
+          "criterion": "Multi-Category Coverage",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Substantive ideas in all 3 requested categories (technology, behavioral, operational) plus unexpected categories; 20+ ideas total",
+            "3": "Ideas in 2 of 3 categories; 10-19 ideas",
+            "1": "Ideas concentrated in one category only",
+            "0": "Fewer than 5 ideas"
+          }
+        },
+        {
+          "criterion": "Feasibility Rigor",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Each idea scored on 4 feasibility dimensions; composite scores calculated; 40%+ viable/promising; budget estimates included for top ideas",
+            "3": "Informal feasibility notes; some viable ideas identified but scoring incomplete",
+            "1": "Minimal feasibility consideration",
+            "0": "No feasibility assessment"
+          }
+        },
+        {
+          "criterion": "Innovation Quality",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Includes cross-domain transfers (e.g., supply chain optimization, behavioral economics nudges); at least 30% novel or adapted ideas",
+            "3": "Some creative ideas but mostly standard approaches to food waste",
+            "1": "All ideas are commonly known food waste solutions",
+            "0": "No creative effort"
+          }
+        }
+      ],
+      "expectedScoreWithout": 25,
+      "expectedScoreWith": 65
+    },
+    {
+      "id": "bench-med-02",
+      "difficulty": "medium",
+      "description": "Cross-domain ideation for a technical challenge",
+      "input": "Generate creative ideas for reducing API response latency in a microservices architecture without increasing infrastructure costs. Current p99 latency is 800ms and the target is under 200ms. Think beyond traditional caching and load balancing.",
+      "rubric": [
+        {
+          "criterion": "Beyond-Obvious Ideas",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Generates ideas beyond standard caching/load-balancing: includes architectural patterns, data flow redesigns, edge computing, predictive pre-computation, protocol changes, or cross-domain analogies; at least 5 non-obvious ideas",
+            "3": "Some ideas go beyond obvious solutions but most are variations of caching/scaling",
+            "1": "All ideas are standard optimization techniques",
+            "0": "Generic advice without specific ideas"
+          }
+        },
+        {
+          "criterion": "Technical Depth",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Ideas include specific technical mechanisms, trade-offs, and estimated latency impact; demonstrates understanding of microservices challenges",
+            "3": "Ideas are technically reasonable but lack specificity on implementation or impact",
+            "1": "Ideas are technically shallow or contain errors",
+            "0": "Technically invalid suggestions"
+          }
+        },
+        {
+          "criterion": "Framework Application",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Uses TRIZ contradiction resolution (speed vs. cost), SCAMPER on the request/response lifecycle, or analogical transfer from other latency-critical domains",
+            "3": "Some structured approach but framework usage is superficial",
+            "1": "No framework usage; ad-hoc brainstorming only",
+            "0": "No structured approach"
+          }
+        },
+        {
+          "criterion": "Prioritized Output",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Ideas clustered by approach type; feasibility scored with cost-neutrality check; top ideas include estimated latency reduction and implementation steps",
+            "3": "Some grouping and prioritization but incomplete feasibility data",
+            "1": "Flat list without prioritization",
+            "0": "Unstructured output"
+          }
+        }
+      ],
+      "expectedScoreWithout": 25,
+      "expectedScoreWith": 65
+    },
+    {
+      "id": "bench-med-03",
+      "difficulty": "medium",
+      "description": "Brainstorm with stakeholder complexity",
+      "input": "Brainstorm ideas for a public library system to increase engagement among teenagers (13-18). The library board is conservative and risk-averse, teens find libraries 'boring', and budget is tight. I need ideas that would appeal to teens AND get board approval.",
+      "rubric": [
+        {
+          "criterion": "Dual-Audience Navigation",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Ideas explicitly address both teen appeal AND board acceptability; trade-offs between audiences are analyzed; framing suggestions for board presentations included",
+            "3": "Ideas generally appropriate for both audiences but trade-offs not explicitly analyzed",
+            "1": "Ideas appeal to one audience but alienate the other",
+            "0": "Audience needs not considered"
+          }
+        },
+        {
+          "criterion": "Idea Diversity & Volume",
+          "weight": 0.25,
+          "scoring": {
+            "5": "20+ ideas across programming, space design, digital integration, partnerships, and outreach; includes cross-domain inspiration from gaming, social media, or education",
+            "3": "10-19 ideas across 2-3 categories",
+            "1": "5-9 ideas in one category",
+            "0": "Fewer than 5 ideas"
+          }
+        },
+        {
+          "criterion": "Feasibility & Budget Awareness",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Ideas scored for feasibility with budget sensitivity; low-cost quick wins identified alongside longer-term investments; implementation staged by effort",
+            "3": "Budget mentioned but not systematically addressed; some feasibility notes",
+            "1": "Budget constraint largely ignored",
+            "0": "No feasibility consideration"
+          }
+        },
+        {
+          "criterion": "Problem Reframing",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Problem reframed from multiple angles (teen identity, community role, digital-physical hybrid, competition with other activities); reframes drive meaningfully different ideas",
+            "3": "Some reframing attempted but ideas don't significantly differ from initial framing",
+            "1": "No reframing; takes the problem at face value",
+            "0": "Problem misunderstood"
+          }
+        }
+      ],
+      "expectedScoreWithout": 25,
+      "expectedScoreWith": 70
+    },
+    {
+      "id": "bench-med-04",
+      "difficulty": "medium",
+      "description": "Brainstorm requiring feasibility matrix with quantitative scoring",
+      "input": "Generate ideas for a small e-commerce startup (5 people, $50K budget) to differentiate from Amazon in a niche market (artisanal home goods). I need each idea scored on a feasibility matrix with technical viability, resource requirements, time-to-implement, and impact potential.",
+      "rubric": [
+        {
+          "criterion": "Feasibility Matrix Quality",
+          "weight": 0.35,
+          "scoring": {
+            "5": "Every idea scored on all 4 dimensions (1-5 scale); composite scores computed; ideas classified as viable/promising/speculative; matrix is internally consistent",
+            "3": "Most ideas have feasibility notes but not all 4 dimensions; some numerical scoring",
+            "1": "Informal feasibility comments without systematic scoring",
+            "0": "No feasibility assessment"
+          }
+        },
+        {
+          "criterion": "Strategic Differentiation",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Ideas target specific Amazon weaknesses (personalization, curation, community, story, craft authenticity); each idea explains WHY it differentiates",
+            "3": "Some differentiation awareness but ideas could apply to any e-commerce site",
+            "1": "Ideas are generic e-commerce tactics not specific to competing with Amazon",
+            "0": "No differentiation strategy"
+          }
+        },
+        {
+          "criterion": "Startup-Context Fit",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Ideas respect 5-person team and $50K budget; lean/scrappy approaches prioritized; growth staging suggested",
+            "3": "Budget acknowledged but some ideas clearly exceed constraints",
+            "1": "Ideas require enterprise-level resources",
+            "0": "Constraints completely ignored"
+          }
+        },
+        {
+          "criterion": "Clustering & Prioritization",
+          "weight": 0.15,
+          "scoring": {
+            "5": "Ideas grouped into strategic themes; top 3-5 ideas highlighted with first steps; quick wins vs. long-term plays distinguished",
+            "3": "Some grouping but ranking is informal",
+            "1": "Flat list",
+            "0": "No organization"
+          }
+        }
+      ],
+      "expectedScoreWithout": 25,
+      "expectedScoreWith": 70
+    },
+    {
+      "id": "bench-hard-01",
+      "difficulty": "hard",
+      "description": "Complex multi-stakeholder brainstorm with competing priorities",
+      "input": "Brainstorm solutions for reducing urban traffic congestion in a mid-size city (population 500K). Stakeholders: city government (limited budget, election cycle pressure), residents (want less noise and pollution), businesses (need delivery access and customer parking), environmental groups (want car-free zones), and public transit authority (underfunded). Generate ideas that balance these competing interests, assess feasibility, and identify which stakeholder coalitions each idea could build.",
+      "rubric": [
+        {
+          "criterion": "Stakeholder Analysis",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Each idea explicitly maps which stakeholders benefit, which bear costs, and what coalitions support it; trade-offs between stakeholders analyzed; ideas designed to build multi-stakeholder support",
+            "3": "Stakeholders mentioned but mapping is incomplete; 2-3 stakeholders considered per idea",
+            "1": "Stakeholders listed but ideas don't account for competing interests",
+            "0": "Stakeholder complexity ignored"
+          }
+        },
+        {
+          "criterion": "Idea Volume & Multi-Dimensionality",
+          "weight": 0.25,
+          "scoring": {
+            "5": "25+ ideas across infrastructure, policy, technology, behavioral, economic incentive, and land-use dimensions; includes cross-domain transfers from logistics, game theory, or urban planning research",
+            "3": "15-24 ideas across 3-4 dimensions",
+            "1": "8-14 ideas concentrated in 1-2 dimensions",
+            "0": "Fewer than 8 ideas"
+          }
+        },
+        {
+          "criterion": "Framework Sophistication",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Uses TRIZ to resolve stakeholder contradictions; applies Six Hats for multi-perspective analysis; uses SCAMPER on existing traffic systems; cross-domain analogies (ant colonies, fluid dynamics, network routing)",
+            "3": "One framework applied well; some structured thinking visible",
+            "1": "No framework usage; ideas appear ad-hoc",
+            "0": "No structured creative methodology"
+          }
+        },
+        {
+          "criterion": "Actionable Output",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Ideas clustered by implementation timeline (quick wins / medium-term / transformative); feasibility scored; political viability assessed; top 5 ideas have detailed implementation roadmaps",
+            "3": "Some clustering and prioritization; limited implementation detail",
+            "1": "Ideas listed without prioritization or implementation guidance",
+            "0": "Unstructured idea dump"
+          }
+        }
+      ],
+      "expectedScoreWithout": 20,
+      "expectedScoreWith": 60
+    },
+    {
+      "id": "bench-hard-02",
+      "difficulty": "hard",
+      "description": "Paradigm-breaking brainstorm requiring fundamental reframing",
+      "input": "The traditional university education model (4-year degree, lectures, exams, campus-based) is being challenged by online learning, bootcamps, and AI tutors. Brainstorm radically different models for higher education in 2035. Don't just improve the current model — think about what could replace it entirely. I need ideas that are truly novel, not incremental improvements, with feasibility assessment for a 10-year horizon.",
+      "rubric": [
+        {
+          "criterion": "Paradigm Novelty",
+          "weight": 0.3,
+          "scoring": {
+            "5": "At least 50% of ideas represent genuinely new education paradigms (not incremental improvements to lectures/exams); challenges fundamental assumptions about what a university IS; ideas would be surprising to higher-ed professionals",
+            "3": "Mix of novel and incremental ideas; some assumption-challenging but also many 'better version of current' ideas",
+            "1": "Mostly incremental improvements to the existing model with some new technology added",
+            "0": "No paradigm-breaking ideas"
+          }
+        },
+        {
+          "criterion": "Problem Reframing Depth",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Challenges 5+ fundamental assumptions (time-bounded, location-based, credential-focused, age-cohort, discipline-siloed); reframes from learner, employer, society, and knowledge-creation perspectives",
+            "3": "Challenges 2-3 assumptions; 2 reframing perspectives",
+            "1": "Takes the problem at face value with minimal reframing",
+            "0": "No reframing"
+          }
+        },
+        {
+          "criterion": "10-Year Feasibility Assessment",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Each idea assessed for 10-year viability considering technology trends, demographic shifts, economic forces, and regulatory environment; transition pathways from current to future state described",
+            "3": "Some forward-looking feasibility but analysis is shallow or misses key factors",
+            "1": "Feasibility based on current conditions, not 10-year projection",
+            "0": "No feasibility assessment"
+          }
+        },
+        {
+          "criterion": "Cross-Domain Inspiration",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Ideas draw from gaming (progression systems), open source (collaborative creation), healthcare (personalized treatment), professional sports (talent development), or other unexpected domains",
+            "3": "Some cross-domain thinking but mostly from obvious adjacent domains (e.g., corporate training)",
+            "1": "All ideas from within the education domain",
+            "0": "No cross-domain thinking"
+          }
+        }
+      ],
+      "expectedScoreWithout": 20,
+      "expectedScoreWith": 60
+    },
+    {
+      "id": "bench-hard-03",
+      "difficulty": "hard",
+      "description": "Brainstorm under severe constraints requiring creative constraint navigation",
+      "input": "A nonprofit organization with 2 staff members and zero budget needs to increase community awareness about mental health resources in a rural area with limited internet access and high stigma around mental health. Brainstorm at least 20 ideas. Every idea must work with no money and no reliable internet. I need creative solutions that navigate the stigma barrier.",
+      "rubric": [
+        {
+          "criterion": "Constraint Adherence",
+          "weight": 0.25,
+          "scoring": {
+            "5": "All ideas genuinely work with zero budget and no internet; creative resource acquisition strategies (partnerships, in-kind, volunteer networks) identified; no ideas that secretly require funding",
+            "3": "Most ideas respect constraints but 2-3 require modest resources not available",
+            "1": "Many ideas ignore the zero-budget or no-internet constraints",
+            "0": "Constraints largely ignored"
+          }
+        },
+        {
+          "criterion": "Stigma Navigation",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Ideas specifically address stigma with proven de-stigmatization approaches (normalization, trusted messenger strategy, indirect entry points, community-embedded approaches); at least 5 ideas include stigma-navigation mechanisms",
+            "3": "Stigma acknowledged but ideas don't specifically address how to overcome it",
+            "1": "Stigma mentioned once; ideas would likely trigger stigma avoidance",
+            "0": "Stigma barrier ignored entirely"
+          }
+        },
+        {
+          "criterion": "Idea Volume & Creativity",
+          "weight": 0.25,
+          "scoring": {
+            "5": "20+ ideas; includes cross-domain inspiration (public health campaigns, grassroots organizing, religious community networks, agricultural extension models); highly creative given extreme constraints",
+            "3": "15-19 ideas with moderate creativity; some cross-domain thinking",
+            "1": "10-14 ideas; mostly obvious approaches (flyers, events)",
+            "0": "Fewer than 10 ideas"
+          }
+        },
+        {
+          "criterion": "Feasibility & Prioritization",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Ideas scored for impact and effort (given 2 staff, zero budget); quick wins vs. relationship-building plays identified; implementation sequence suggested; partnership opportunities flagged",
+            "3": "Some prioritization but effort estimates missing or unrealistic for 2-person team",
+            "1": "No prioritization; ideas presented as equally weighted",
+            "0": "No feasibility assessment"
+          }
+        }
+      ],
+      "expectedScoreWithout": 20,
+      "expectedScoreWith": 60
+    }
+  ]
+}
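The benchmark file does not state how the per-criterion levels (5 / 3 / 1 / 0) turn into the 0-100 values stored in `expectedScoreWithout` and `expectedScoreWith`. One plausible reading, shown here only as an assumption, is a weighted average of the levels rescaled to 100. The function name `weighted_score` and the `levels` mapping are hypothetical; the criterion names and weights are taken from `bench-easy-01`, with the level descriptions left empty for brevity.

```python
import math


def weighted_score(rubric: list[dict], levels: dict[str, int]) -> float:
    """Combine per-criterion levels into a 0-100 score (assumed formula)."""
    total_weight = sum(item["weight"] for item in rubric)
    if not math.isclose(total_weight, 1.0):
        raise ValueError(f"rubric weights sum to {total_weight}, expected 1.0")
    score = 0.0
    for item in rubric:
        level = levels[item["criterion"]]
        if str(level) not in item["scoring"]:
            raise ValueError(f"level {level} is not defined for {item['criterion']}")
        score += item["weight"] * level / 5  # 5 is the top level in every rubric
    return round(score * 100, 1)


# Criteria and weights from bench-easy-01; descriptions omitted.
rubric = [
    {"criterion": "Idea Volume", "weight": 0.3, "scoring": {"5": "", "3": "", "1": "", "0": ""}},
    {"criterion": "Framework Usage", "weight": 0.3, "scoring": {"5": "", "3": "", "1": "", "0": ""}},
    {"criterion": "Actionability", "weight": 0.4, "scoring": {"5": "", "3": "", "1": "", "0": ""}},
]
levels = {"Idea Volume": 5, "Framework Usage": 3, "Actionability": 5}
print(weighted_score(rubric, levels))  # prints 88.0
```

Under this assumed formula, scoring level 3 on every criterion gives 60, which matches the scale implied by the expected-score fields.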

package/tests/smoke.json ADDED

@@ -0,0 +1,54 @@
+{
+  "version": "0.0.1",
+  "timeout": 60,
+  "tasks": [
+    {
+      "id": "smoke-01",
+      "description": "Brainstorm ideas for improving employee onboarding at a mid-size tech company with multi-framework ideation and feasibility assessment",
+      "input": "Brainstorm creative solutions to improve the employee onboarding experience at a mid-size tech company (200 employees). Current onboarding takes 2 weeks and new hires report feeling lost and disconnected. Budget is limited. I need diverse ideas with feasibility ratings.",
+      "rubric": [
+        {
+          "criterion": "Idea Quantity & Diversity",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Generates 15+ ideas spanning 3+ distinct categories (technology, culture, process, mentorship, etc.); uses multiple creativity frameworks; includes cross-domain ideas",
+            "3": "Generates 8-14 ideas across 2 categories; uses at least one framework; mostly same-domain thinking",
+            "1": "Generates 3-7 ideas; all ideas are from the same category or approach",
+            "0": "Fewer than 3 ideas or no structured ideation"
+          }
+        },
+        {
+          "criterion": "Framework Application",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Visibly applies 2+ creativity frameworks (SCAMPER, Six Hats, TRIZ, lateral thinking); framework application drives idea diversity",
+            "3": "References a framework but applies it superficially; ideas don't clearly derive from structured methods",
+            "1": "No recognizable framework usage; ideas appear free-associated",
+            "0": "No structured approach at all"
+          }
+        },
+        {
+          "criterion": "Feasibility Assessment",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Each idea has explicit feasibility scoring (technical viability, resources, time, impact); ideas are classified as viable/promising/speculative; 40%+ are viable or promising",
+            "3": "Some ideas have feasibility notes but scoring is informal or incomplete; viability is mentioned but not systematically assessed",
+            "1": "Feasibility is mentioned for 1-2 ideas only; no systematic assessment",
+            "0": "No feasibility assessment provided"
+          }
+        },
+        {
+          "criterion": "Output Structure",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Ideas are clustered by theme, ranked by priority; top ideas have concrete first steps; includes executive summary and wild cards section",
+            "3": "Ideas are grouped but ranking is informal; some actionable detail but not consistently",
+            "1": "Flat list of ideas with no clustering or prioritization",
+            "0": "Unstructured dump of ideas"
+          }
+        }
+      ],
+      "passThreshold": 60
+    }
+  ]
+}
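Every rubric in the two test files above has weights that sum to 1.0 and defines exactly the levels 5, 3, 1, and 0. A small consistency check for files of this shape might look like the sketch below. It is not part of the package; `validate_tasks` is a hypothetical helper, and the sample document reuses the `smoke-01` criteria and weights with the level descriptions shortened to placeholders.

```python
import json
import math

EXPECTED_LEVELS = {"5", "3", "1", "0"}


def validate_tasks(doc: dict) -> list[str]:
    """Return a list of human-readable problems found in a test file."""
    problems = []
    for task in doc.get("tasks", []):
        task_id = task.get("id")
        rubric = task.get("rubric", [])
        total = sum(item.get("weight", 0) for item in rubric)
        if not math.isclose(total, 1.0):
            problems.append(f"{task_id}: weights sum to {total:.2f}, expected 1.00")
        for item in rubric:
            if set(item.get("scoring", {})) != EXPECTED_LEVELS:
                problems.append(f"{task_id}: '{item.get('criterion')}' has unexpected scoring levels")
    return problems


# Placeholder descriptions ("...") stand in for the full rubric text.
sample = json.loads("""
{
  "version": "0.0.1",
  "timeout": 60,
  "tasks": [
    {
      "id": "smoke-01",
      "rubric": [
        {"criterion": "Idea Quantity & Diversity", "weight": 0.25,
         "scoring": {"5": "...", "3": "...", "1": "...", "0": "..."}},
        {"criterion": "Framework Application", "weight": 0.25,
         "scoring": {"5": "...", "3": "...", "1": "...", "0": "..."}},
        {"criterion": "Feasibility Assessment", "weight": 0.3,
         "scoring": {"5": "...", "3": "...", "1": "...", "0": "..."}},
        {"criterion": "Output Structure", "weight": 0.2,
         "scoring": {"5": "...", "3": "...", "1": "...", "0": "..."}}
      ],
      "passThreshold": 60
    }
  ]
}
""")
print(validate_tasks(sample))  # an empty list means no problems were found
```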