@faviovazquez/deliberate 0.2.1 → 0.2.4

package/BRAINSTORM.md CHANGED
@@ -14,6 +14,16 @@ Brainstorm mode is not a stripped-down deliberation. It's a separate process opt
  /deliberate --brainstorm --members assumption-breaker,reframer,pragmatic-builder "new pricing model"
  /deliberate --brainstorm --triad product "what features should v2 include?"
  /deliberate --brainstorm --visual "landing page redesign"
+ /deliberate --brainstorm --research "what are the best approaches to financial data lineage?"
+ /deliberate --brainstorm --research=code "how should we refactor this module?"
+ /deliberate --brainstorm --research=web "new data governance frameworks we should consider"
+ ```
+
+ On Windsurf:
+ ```
+ @deliberate brainstorm with research: what are the best approaches to financial data lineage?
+ @deliberate brainstorm look at the codebase first: how should we refactor the ingestion pipeline?
+ @deliberate brainstorm search the web: new data governance frameworks we should consider
  ```

  ## HARD GATE
@@ -35,6 +45,25 @@ Read the relevant context before asking any questions:

  Produce a brief context summary (3-5 sentences) so the user can confirm you understand the space.

+ #### Research Grounding (optional — `--research` flag only)
+
+ If `--research`, `--research=web`, or `--research=code` is set (or natural language on Windsurf: "with research", "search the web", "look at the codebase first", "do research first"):
+
+ 1. Read `protocols/research-grounding.md` for the full protocol
+ 2. Execute Steps R1–R4 **before** asking clarifying questions in Phase 3
+ 3. Include the Codebase Context Summary and/or Web Research Summary in the context summary presented to the user
+ 4. Inject grounding evidence into every agent prompt in Phase 5 (Divergent Phase)
+
+ The grounding evidence supplements — but does not replace — the clarifying questions in Phase 3. Agents still receive the clarified scope from Phase 3 alongside the research findings.
+
+ Announce to the user:
+ ```
+ Research mode active. I'll scan [the codebase / the web / both] before we start.
+ This gives the agents real evidence to reason from, not just parametric knowledge.
+ ```
+
+ If `--research` is NOT set, do not spontaneously research. Proceed with standard context exploration only.
+
  ### Phase 2: Visual Companion Offer

  If this brainstorm is likely to involve visual content (UI, architecture diagrams, mockups, design comparisons), offer the visual companion:
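The bracketed scope in the announcement above (`[the codebase / the web / both]`) varies with the research variant. A minimal sketch of how it could be rendered — hypothetical, since the published skill expresses this as a prompt template in BRAINSTORM.md, not shipped code:

```javascript
// Hypothetical rendering of the research-mode announcement.
// The "both" wording is an interpretation of the template's "[... / both]".
const RESEARCH_SCOPE = {
  both: "the codebase and the web",
  web: "the web",
  code: "the codebase",
};

function researchAnnouncement(mode) {
  const scope = RESEARCH_SCOPE[mode] || RESEARCH_SCOPE.both;
  return [
    `Research mode active. I'll scan ${scope} before we start.`,
    "This gives the agents real evidence to reason from, not just parametric knowledge.",
  ].join("\n");
}
```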
package/CHANGELOG.md CHANGED
@@ -5,6 +5,24 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [0.2.4] - 2025-04-03
+
+ ### Added
+ - `assets/research_grounding.png`: diagram explaining the `--research` grounding flow — pipeline stages R1–R5, codebase scan vs web search breakdown, without/with contrast panel
+ - Embedded `research_grounding.png` in README at the `--research` section
+
+ ## [0.2.3] - 2025-04-03
+
+ ### Added
+ - `--research` flag: opt-in grounding phase before Round 1 — scans codebase and/or searches the web before agents analyze
+ - `--research=web`: web search only variant
+ - `--research=code`: codebase scan only variant
+ - `protocols/research-grounding.md`: full grounding protocol (search strategy, codebase scan levels, scope limits, evidence quality rules)
+ - Research grounding integrated into SKILL.md coordinator sequence (Step 0), Quick mode, Duo mode
+ - Research grounding integrated into BRAINSTORM.md Phase 1
+ - Natural language Windsurf triggers: "with research", "search the web", "look at the codebase first", "do research first"
+ - README: Research Grounding section with dual-platform examples, grounding phase description, evidence rules
+
  ## [0.2.1] - 2025-04-03

  ### Added
package/README.md CHANGED
@@ -282,8 +282,75 @@ Pre-defined agent groups for common scenarios. Use when you don't want to pick i
  | `--profile {name}` | Use named profile (full, lean, exploration, execution) | `/deliberate --profile exploration "question"` |
  | `--visual` | Launch browser-based visual companion | `/deliberate --visual --full "question"` |
  | `--save {slug}` | Override auto-generated filename slug for output | `/deliberate --save my-decision "question"` |
+ | `--research` | Grounding phase before Round 1: scan codebase + search web. Agents reason from retrieved evidence. **Opt-in only.** | `/deliberate --research "question"` |
+ | `--research=web` | Web search only (no codebase scan) | `/deliberate --research=web "question"` |
+ | `--research=code` | Codebase scan only (no web search) | `/deliberate --research=code "question"` |

- > **Windsurf/Cursor note:** Flags are expressed as natural language after `@deliberate`. The skill parses intent from your message. Examples: `@deliberate full deliberation: ...`, `@deliberate quick: ...`, `@deliberate architecture triad: ...`, `@deliberate brainstorm: ...`.
+ > **Windsurf/Cursor note:** Flags are expressed as natural language after `@deliberate`. The skill parses intent from your message. Examples: `@deliberate full deliberation: ...`, `@deliberate quick: ...`, `@deliberate architecture triad: ...`, `@deliberate brainstorm: ...`, `@deliberate with research: ...`, `@deliberate search the web before answering: ...`, `@deliberate look at the codebase first: ...`.
+
+ ---
+
+ ## Research Grounding (`--research`)
+
+ By default, agents reason from **parametric knowledge** — what the model internalized during training. This works well for timeless architectural reasoning. It breaks down for:
+
+ - **Your codebase**: agents don't know what's in it unless they read it
+ - **Recent events**: model cutoffs miss new library releases, regulatory changes, pricing shifts, CVEs
+ - **Current benchmarks**: performance comparisons from 18 months ago may be reversed today
+ - **Competitor landscape**: what Bloomberg, Palantir, or a new startup shipped last quarter
+
+ `--research` adds a **grounding phase before Round 1**. The coordinator scans your codebase and/or searches the web, packages the findings into a Codebase Context Summary and Web Research Summary, and injects this evidence into every agent's prompt. Agents then reason from real, retrieved facts — and are required to cite what they found and flag what was missing.
+
+ **This is opt-in only. It is never the default.** Not every question needs research — and research consumes tokens. Use it when parametric knowledge is insufficient.
+
+ <p align="center">
+ <img src="assets/research_grounding.png" alt="Research Grounding — --research flag" width="100%" />
+ </p>
+
+ ### Research modes
+
+ **Claude Code:**
+ ```
+ /deliberate --research "should we adopt Apache Iceberg for our data lake?"
+ /deliberate --research=web "what are the current alternatives to dbt for data transformation?"
+ /deliberate --research=code "is our current ingestion pipeline a good candidate for refactoring?"
+ /deliberate --triad architecture --research "should we migrate our REST APIs to GraphQL?"
+ /deliberate --brainstorm --research "what data quality monitoring approaches should we consider?"
+ ```
+
+ **Windsurf / Cursor:**
+ ```
+ @deliberate with research: should we adopt Apache Iceberg for our data lake?
+ @deliberate search the web before answering: what are the current alternatives to dbt for data transformation?
+ @deliberate look at the codebase first: is our ingestion pipeline a good candidate for refactoring?
+ @deliberate architecture triad with research: should we migrate our REST APIs to GraphQL?
+ @deliberate brainstorm do research first: what data quality monitoring approaches should we consider?
+ ```
+
+ ### What the grounding phase does
+
+ **Codebase scan** (activated by `--research` or `--research=code`):
+ 1. Reads `AGENTS.md`, project README, top-level structure
+ 2. Reads relevant entry points, dependency manifests, config files
+ 3. Follows specific files surfaced by the question domain (auth layers for security questions, hot paths for performance, etc.)
+ 4. Caps at **10 files maximum** — precision over volume
+ 5. Produces a **Codebase Context Summary** shared with all agents
+
+ **Web search** (activated by `--research` or `--research=web`):
+ 1. Runs **5 targeted searches maximum** — bad searches waste context
+ 2. Prioritizes recency (last 12 months) and primary sources (official docs, engineering blogs, CVE databases, academic papers)
+ 3. Produces a **Web Research Summary** with findings + source URLs + recency note
+ 4. `assumption-breaker` is explicitly tasked with scrutinizing source quality
+
+ ### Grounding evidence rules
+
+ Agents treat grounding evidence as **facts to reason from, not conclusions to accept**:
+ - They must cite specific findings ("the codebase currently uses library X at version Y, which has Z implication")
+ - They must flag gaps ("the web research didn't find benchmark data for >10M records/day — my analysis is therefore first-principles")
+ - `assumption-breaker` scrutinizes source quality and flags cherry-picked or outdated evidence
+ - Agents that over-rely on evidence get flagged in the coordinator's Round 2 dispatch
+
+ If the platform has no web search or file access tools, the coordinator announces the limitation and falls back gracefully to parametric knowledge.

  ---

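The README above documents both flag syntax (`--research`, `--research=web`, `--research=code`) and the natural-language Windsurf/Cursor triggers. That intent-matching could be sketched roughly as follows — hypothetical code, since the skill does this parsing inside its prompt instructions rather than in a shipped module:

```javascript
// Hypothetical sketch of research-mode detection. Names are illustrative.
const NL_TRIGGERS = {
  web: [/search the web/i],
  code: [/look at the codebase first/i],
  both: [/\bwith research\b/i, /\bdo research first\b/i],
};

// Returns "web", "code", "both", or null. Research is opt-in only:
// no flag and no trigger means no grounding phase runs.
function researchMode(message) {
  const flag = message.match(/--research(?:=(web|code))?\b/);
  if (flag) return flag[1] || "both";
  for (const [mode, patterns] of Object.entries(NL_TRIGGERS)) {
    if (patterns.some((re) => re.test(message))) return mode;
  }
  return null;
}
```

Returning `null` by default mirrors the documented rule that research is never spontaneous.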
package/SKILL.md CHANGED
@@ -35,6 +35,9 @@ description: "Multi-agent deliberation skill. Forces structured disagreement to
  | `--profile {name}` | Use named profile (full, lean, exploration, execution). |
  | `--visual` | Launch visual companion for this session. |
  | `--save {slug}` | Override auto-generated filename slug for output. |
+ | `--research` | Run a grounding phase before Round 1: scan the codebase and/or search the web. Agents reason from retrieved evidence, not parametric knowledge alone. **Opt-in only — never the default.** See `protocols/research-grounding.md`. |
+ | `--research=web` | Web search only (no codebase scan). |
+ | `--research=code` | Codebase scan only (no web search). |

  ## The 14 Core Agents

@@ -117,14 +120,27 @@ These agents are structural counterweights. When both are present, genuine disag

  The coordinator is the orchestration layer. It does NOT have its own opinion. It routes, enforces protocol, and synthesizes.

- ### Step 0: Platform Detection
+ ### Step 0: Research Grounding (optional — `--research` flag only)
+
+ If `--research`, `--research=web`, or `--research=code` is set (or natural language equivalents on Windsurf: "do research first", "search the web", "look at the codebase first", "with research"):
+
+ 1. Read the full protocol at `protocols/research-grounding.md`
+ 2. Execute Steps R1–R4 from that protocol before dispatching any agents
+ 3. Package the grounding context (Codebase Context Summary and/or Web Research Summary) for injection into all agent prompts in Step 5
+ 4. After Round 1, run Step R5 (evidence quality check)
+
+ If `--research` is NOT set, skip this step entirely. Do not spontaneously research unless the user explicitly requests it.
+
+ ---
+
+ ### Step 1: Platform Detection

  Read `configs/defaults.yaml` to determine:
  - **Claude Code**: Use parallel subagents. Each agent gets its own context window.
  - **Windsurf**: Use sequential role-prompting within single context. Default to "lean" profile unless user overrides.
  - **Other**: Fall back to sequential mode.

- ### Step 1: Problem Restatement
+ ### Step 2: Problem Restatement

  Before any agent speaks, the coordinator restates the user's question in neutral, precise terms. This prevents framing bias from the original question.

@@ -133,7 +149,7 @@ Before any agent speaks, the coordinator restates the user's question in neutral
  {Neutral restatement of the user's question, stripped of loaded language}
  ```

- ### Step 2: Agent Selection
+ ### Step 3: Agent Selection

  Based on flags:
  - `--full`: All 14 core agents
@@ -145,7 +161,7 @@ Based on flags:

  If auto-detection is ambiguous, present the top 2-3 triad matches and let the user choose.

- ### Step 3: Model Routing
+ ### Step 4: Model Routing

  Read `configs/defaults.yaml` for tier mapping:
  - Agents marked `model: high` get the high-tier model
@@ -154,7 +170,7 @@ Read `configs/defaults.yaml` for tier mapping:

  If `configs/provider-model-slots.yaml` exists, use manual overrides instead of auto-routing.

- ### Step 4: Visual Companion (optional)
+ ### Step 5: Visual Companion (optional)

  If `--visual` flag is set, or if the user has previously accepted the visual companion in this session:
  1. Launch `scripts/start-server.sh --project-dir {project_root}`
@@ -162,7 +178,7 @@ If `--visual` flag is set, or if the user has previously accepted the visual com
  3. Tell user to open the URL
  4. Visual companion will be updated after each round

- ### Step 5: Round 1 -- Independent Analysis
+ ### Step 6: Round 1 -- Independent Analysis

  Each selected agent receives:
  ```
@@ -170,7 +186,12 @@
  You are {agent_name}. Read your full agent definition at agents/{agent_name}.md.

  ## Problem
- {Problem restatement from Step 1}
+ {Problem restatement from Step 2}
+
+ ## Grounding Evidence [only present if --research is active]
+ {Codebase Context Summary and/or Web Research Summary from Step 0}
+ Use this evidence in your analysis. You may cite specific findings.
+ If the evidence is insufficient, note what additional information would change your position.

  ## Instructions
  Produce your independent analysis following your Analytical Method.
@@ -181,7 +202,7 @@ Follow your Standalone Output Format.
  **Claude Code execution**: Launch all agents as parallel subagents. Wait for all to complete.
  **Windsurf execution**: Prompt each agent role sequentially. Collect all outputs before proceeding.

- ### Step 6: Round 2 -- Cross-Examination
+ ### Step 7: Round 2 -- Cross-Examination

  Each agent receives ALL Round 1 outputs and must respond:

@@ -209,7 +230,7 @@ You MUST disagree with at least one position.
  **Claude Code execution**: Sequential (each agent needs to see prior cross-examinations for richer engagement).
  **Windsurf execution**: Sequential.

- ### Step 7: Enforcement Scans
+ ### Step 8: Enforcement Scans

  After Round 2, the coordinator checks:

@@ -220,7 +241,7 @@ After Round 2, the coordinator checks:
  5. **Novelty gate**: Did Round 2 introduce at least one idea not present in Round 1? If not, flag stale deliberation.
  6. **Groupthink flag**: If ALL agents agree on the core position, flag: "Unanimous agreement may indicate groupthink. Consider invoking `risk-analyst` or `assumption-breaker` standalone for a second opinion."

- ### Step 8: Round 3 -- Crystallization
+ ### Step 9: Round 3 -- Crystallization

  Each agent states their FINAL position:

@@ -233,7 +254,7 @@ If you changed your mind during cross-examination, state the new position and wh

  **Execution**: Parallel (Claude Code) or sequential (Windsurf). No interaction needed.

- ### Step 9: Verdict Synthesis
+ ### Step 10: Verdict Synthesis

  The coordinator synthesizes all three rounds into a structured verdict:

@@ -278,7 +299,7 @@ The coordinator synthesizes all three rounds into a structured verdict:
  {Questions raised during deliberation that were not resolved}
  ```

- ### Step 10: Save Output
+ ### Step 11: Save Output

  Save the full deliberation record to `deliberations/YYYY-MM-DD-HH-MM-{mode}-{slug}.md`.

@@ -290,10 +311,10 @@ If visual companion is active, push the verdict formation view to the browser.

  Quick mode skips Round 2 (cross-examination):

- 1. Steps 0-5 (same as full)
- 2. Skip Steps 6-7
- 3. Step 8: Crystallization (agents state final position based only on seeing Round 1 outputs)
- 4. Steps 9-10 (same as full)
+ 1. Steps 0-6 (same as full, including research grounding if `--research` is set)
+ 2. Skip Steps 7-8
+ 3. Step 9: Crystallization (agents state final position based only on seeing Round 1 outputs)
+ 4. Steps 10-11 (same as full)

  Quick mode is faster and cheaper but produces less refined disagreement. Use for time-sensitive decisions where the diversity of initial perspectives is more valuable than deep cross-examination.

@@ -303,14 +324,14 @@ Quick mode is faster and cheaper but produces less refined disagreement. Use for

  Duo mode runs two agents in dialectic:

- 1. Steps 0-4 (same as full, but only 2 agents)
+ 1. Steps 0-5 (same as full, but only 2 agents, including research grounding if `--research` is set)
  2. **Exchange Round 1**: Agent A states position (400 words). Agent B states position (400 words).
  3. **Exchange Round 2**: Agent A responds to Agent B (300 words). Agent B responds to Agent A (300 words).
  4. **Synthesis**: Coordinator synthesizes the exchange into a verdict highlighting:
  - Where the two agents agree
  - Where they disagree and why
  - What the user should weigh when deciding
- 5. Step 10 (save output)
+ 5. Step 11 (save output)

  Duo mode is ideal for binary decisions ("should we do X or not?"). The polarity pairs table above lists the most productive pairings.

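SKILL.md above documents the output path convention `deliberations/YYYY-MM-DD-HH-MM-{mode}-{slug}.md`, with `--save` overriding the auto-generated slug. A small hypothetical helper showing the convention concretely (the shipped skill builds this path in its prompt flow, not via this function):

```javascript
// Hypothetical helper mirroring the documented save-path convention.
// `slug` comes from --save or is auto-generated by the coordinator.
function deliberationPath(mode, slug, now = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = [
    now.getFullYear(),
    pad(now.getMonth() + 1), // JS months are 0-indexed
    pad(now.getDate()),
    pad(now.getHours()),
    pad(now.getMinutes()),
  ].join("-");
  return `deliberations/${stamp}-${mode}-${slug}.md`;
}
```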
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@faviovazquez/deliberate",
- "version": "0.2.1",
+ "version": "0.2.4",
  "description": "Multi-agent deliberation skill for AI coding assistants. Agreement is a bug.",
  "license": "MIT",
  "author": "Favio Vazquez",
@@ -27,7 +27,7 @@
  "brainstorming"
  ],
  "bin": {
- "deliberate": "./bin/cli.js"
+ "deliberate": "bin/cli.js"
  },
  "files": [
  "bin/",