@faviovazquez/deliberate 0.2.2 → 0.2.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/BRAINSTORM.md +29 -0
- package/CHANGELOG.md +23 -0
- package/README.md +68 -1
- package/SKILL.md +39 -18
- package/assets/agents_wheel.png +0 -0
- package/assets/banner.png +0 -0
- package/assets/enforcement.png +0 -0
- package/assets/modes_overview.png +0 -0
- package/assets/research_grounding.png +0 -0
- package/assets/sycophancy.png +0 -0
- package/assets/triad_map.png +0 -0
- package/assets/verdict.png +0 -0
- package/package.json +3 -1
- package/protocols/research-grounding.md +183 -0
package/BRAINSTORM.md
CHANGED
@@ -14,6 +14,16 @@ Brainstorm mode is not a stripped-down deliberation. It's a separate process opt
 /deliberate --brainstorm --members assumption-breaker,reframer,pragmatic-builder "new pricing model"
 /deliberate --brainstorm --triad product "what features should v2 include?"
 /deliberate --brainstorm --visual "landing page redesign"
+/deliberate --brainstorm --research "what are the best approaches to financial data lineage?"
+/deliberate --brainstorm --research=code "how should we refactor this module?"
+/deliberate --brainstorm --research=web "new data governance frameworks we should consider"
+```
+
+On Windsurf:
+```
+@deliberate brainstorm with research: what are the best approaches to financial data lineage?
+@deliberate brainstorm look at the codebase first: how should we refactor the ingestion pipeline?
+@deliberate brainstorm search the web: new data governance frameworks we should consider
 ```
 
 ## HARD GATE
@@ -35,6 +45,25 @@ Read the relevant context before asking any questions:
 
 Produce a brief context summary (3-5 sentences) so the user can confirm you understand the space.
 
+#### Research Grounding (optional — `--research` flag only)
+
+If `--research`, `--research=web`, or `--research=code` is set (or natural language on Windsurf: "with research", "search the web", "look at the codebase first", "do research first"):
+
+1. Read `protocols/research-grounding.md` for the full protocol
+2. Execute Steps R1–R4 **before** asking clarifying questions in Phase 3
+3. Include the Codebase Context Summary and/or Web Research Summary in the context summary presented to the user
+4. Inject grounding evidence into every agent prompt in Phase 5 (Divergent Phase)
+
+The grounding evidence supplements — but does not replace — the clarifying questions in Phase 3. Agents still receive the clarified scope from Phase 3 alongside the research findings.
+
+Announce to the user:
+```
+Research mode active. I'll scan [the codebase / the web / both] before we start.
+This gives the agents real evidence to reason from, not just parametric knowledge.
+```
+
+If `--research` is NOT set, do not spontaneously research. Proceed with standard context exploration only.
+
 ### Phase 2: Visual Companion Offer
 
 If this brainstorm is likely to involve visual content (UI, architecture diagrams, mockups, design comparisons), offer the visual companion:
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.2.5] - 2025-04-03
+
+### Fixed
+- Added `protocols/` and `assets/` to npm `files` array — both were missing from published packages, causing `protocols/research-grounding.md` and all README images to be absent after install
+
+## [0.2.4] - 2025-04-03
+
+### Added
+- `assets/research_grounding.png`: diagram explaining the `--research` grounding flow — pipeline stages R1–R5, codebase scan vs web search breakdown, without/with contrast panel
+- Embedded `research_grounding.png` in README at the `--research` section
+
+## [0.2.3] - 2025-04-03
+
+### Added
+- `--research` flag: opt-in grounding phase before Round 1 — scans codebase and/or searches the web before agents analyze
+- `--research=web`: web search only variant
+- `--research=code`: codebase scan only variant
+- `protocols/research-grounding.md`: full grounding protocol (search strategy, codebase scan levels, scope limits, evidence quality rules)
+- Research grounding integrated into SKILL.md coordinator sequence (Step 0), Quick mode, Duo mode
+- Research grounding integrated into BRAINSTORM.md Phase 1
+- Natural language Windsurf triggers: "with research", "search the web", "look at the codebase first", "do research first"
+- README: Research Grounding section with dual-platform examples, grounding phase description, evidence rules
+
 ## [0.2.1] - 2025-04-03
 
 ### Added
package/README.md
CHANGED
@@ -282,8 +282,75 @@ Pre-defined agent groups for common scenarios. Use when you don't want to pick i
 | `--profile {name}` | Use named profile (full, lean, exploration, execution) | `/deliberate --profile exploration "question"` |
 | `--visual` | Launch browser-based visual companion | `/deliberate --visual --full "question"` |
 | `--save {slug}` | Override auto-generated filename slug for output | `/deliberate --save my-decision "question"` |
+| `--research` | Grounding phase before Round 1: scan codebase + search web. Agents reason from retrieved evidence. **Opt-in only.** | `/deliberate --research "question"` |
+| `--research=web` | Web search only (no codebase scan) | `/deliberate --research=web "question"` |
+| `--research=code` | Codebase scan only (no web search) | `/deliberate --research=code "question"` |
 
-> **Windsurf/Cursor note:** Flags are expressed as natural language after `@deliberate`. The skill parses intent from your message. Examples: `@deliberate full deliberation: ...`, `@deliberate quick: ...`, `@deliberate architecture triad: ...`, `@deliberate brainstorm: ...`.
+> **Windsurf/Cursor note:** Flags are expressed as natural language after `@deliberate`. The skill parses intent from your message. Examples: `@deliberate full deliberation: ...`, `@deliberate quick: ...`, `@deliberate architecture triad: ...`, `@deliberate brainstorm: ...`, `@deliberate with research: ...`, `@deliberate search the web before answering: ...`, `@deliberate look at the codebase first: ...`.
+
+---
+
+## Research Grounding (`--research`)
+
+By default, agents reason from **parametric knowledge** — what the model internalized during training. This works well for timeless architectural reasoning. It breaks down for:
+
+- **Your codebase**: agents don't know what's in it unless they read it
+- **Recent events**: model cutoffs miss new library releases, regulatory changes, pricing shifts, CVEs
+- **Current benchmarks**: performance comparisons from 18 months ago may be reversed today
+- **Competitor landscape**: what Bloomberg, Palantir, or a new startup shipped last quarter
+
+`--research` adds a **grounding phase before Round 1**. The coordinator scans your codebase and/or searches the web, packages the findings into a Codebase Context Summary and Web Research Summary, and injects this evidence into every agent's prompt. Agents then reason from real, retrieved facts — and are required to cite what they found and flag what was missing.
+
+**This is opt-in only. It is never the default.** Not every question needs research — and research consumes tokens. Use it when parametric knowledge is insufficient.
+
+<p align="center">
+  <img src="assets/research_grounding.png" alt="Research Grounding — --research flag" width="100%" />
+</p>
+
+### Research modes
+
+**Claude Code:**
+```
+/deliberate --research "should we adopt Apache Iceberg for our data lake?"
+/deliberate --research=web "what are the current alternatives to dbt for data transformation?"
+/deliberate --research=code "is our current ingestion pipeline a good candidate for refactoring?"
+/deliberate --triad architecture --research "should we migrate our REST APIs to GraphQL?"
+/deliberate --brainstorm --research "what data quality monitoring approaches should we consider?"
+```
+
+**Windsurf / Cursor:**
+```
+@deliberate with research: should we adopt Apache Iceberg for our data lake?
+@deliberate search the web before answering: what are the current alternatives to dbt for data transformation?
+@deliberate look at the codebase first: is our ingestion pipeline a good candidate for refactoring?
+@deliberate architecture triad with research: should we migrate our REST APIs to GraphQL?
+@deliberate brainstorm do research first: what data quality monitoring approaches should we consider?
+```
+
+### What the grounding phase does
+
+**Codebase scan** (activated by `--research` or `--research=code`):
+1. Reads `AGENTS.md`, project README, top-level structure
+2. Reads relevant entry points, dependency manifests, config files
+3. Follows specific files surfaced by the question domain (auth layers for security questions, hot paths for performance, etc.)
+4. Caps at **10 files maximum** — precision over volume
+5. Produces a **Codebase Context Summary** shared with all agents
+
+**Web search** (activated by `--research` or `--research=web`):
+1. Runs **5 targeted searches maximum** — bad searches waste context
+2. Prioritizes recency (last 12 months) and primary sources (official docs, engineering blogs, CVE databases, academic papers)
+3. Produces a **Web Research Summary** with findings + source URLs + recency note
+4. `assumption-breaker` is explicitly tasked with scrutinizing source quality
+
+### Grounding evidence rules
+
+Agents treat grounding evidence as **facts to reason from, not conclusions to accept**:
+- They must cite specific findings ("the codebase currently uses library X at version Y, which has Z implication")
+- They must flag gaps ("the web research didn't find benchmark data for >10M records/day — my analysis is therefore first-principles")
+- `assumption-breaker` scrutinizes source quality and flags cherry-picked or outdated evidence
+- Agents that over-rely on evidence get flagged in the coordinator's Round 2 dispatch
+
+If the platform has no web search or file access tools, the coordinator announces the limitation and falls back gracefully to parametric knowledge.
 
 ---
 
package/SKILL.md
CHANGED
@@ -35,6 +35,9 @@ description: "Multi-agent deliberation skill. Forces structured disagreement to
 | `--profile {name}` | Use named profile (full, lean, exploration, execution). |
 | `--visual` | Launch visual companion for this session. |
 | `--save {slug}` | Override auto-generated filename slug for output. |
+| `--research` | Run a grounding phase before Round 1: scan the codebase and/or search the web. Agents reason from retrieved evidence, not parametric knowledge alone. **Opt-in only — never the default.** See `protocols/research-grounding.md`. |
+| `--research=web` | Web search only (no codebase scan). |
+| `--research=code` | Codebase scan only (no web search). |
 
 ## The 14 Core Agents
 
@@ -117,14 +120,27 @@ These agents are structural counterweights. When both are present, genuine disag
 
 The coordinator is the orchestration layer. It does NOT have its own opinion. It routes, enforces protocol, and synthesizes.
 
-### Step 0:
+### Step 0: Research Grounding (optional — `--research` flag only)
+
+If `--research`, `--research=web`, or `--research=code` is set (or natural language equivalents on Windsurf: "do research first", "search the web", "look at the codebase first", "with research"):
+
+1. Read the full protocol at `protocols/research-grounding.md`
+2. Execute Steps R1–R4 from that protocol before dispatching any agents
+3. Package the grounding context (Codebase Context Summary and/or Web Research Summary) for injection into all agent prompts in Step 6 (Round 1)
+4. After Round 1, run Step R5 (evidence quality check)
+
+If `--research` is NOT set, skip this step entirely. Do not spontaneously research unless the user explicitly requests it.
+
+---
+
+### Step 1: Platform Detection
 
 Read `configs/defaults.yaml` to determine:
 - **Claude Code**: Use parallel subagents. Each agent gets its own context window.
 - **Windsurf**: Use sequential role-prompting within single context. Default to "lean" profile unless user overrides.
 - **Other**: Fall back to sequential mode.
 
-### Step
+### Step 2: Problem Restatement
 
 Before any agent speaks, the coordinator restates the user's question in neutral, precise terms. This prevents framing bias from the original question.
 
@@ -133,7 +149,7 @@ Before any agent speaks, the coordinator restates the user's question in neutral
 {Neutral restatement of the user's question, stripped of loaded language}
 ```
 
-### Step
+### Step 3: Agent Selection
 
 Based on flags:
 - `--full`: All 14 core agents
@@ -145,7 +161,7 @@ Based on flags:
 
 If auto-detection is ambiguous, present the top 2-3 triad matches and let the user choose.
 
-### Step
+### Step 4: Model Routing
 
 Read `configs/defaults.yaml` for tier mapping:
 - Agents marked `model: high` get the high-tier model
@@ -154,7 +170,7 @@ Read `configs/defaults.yaml` for tier mapping:
 
 If `configs/provider-model-slots.yaml` exists, use manual overrides instead of auto-routing.
 
-### Step
+### Step 5: Visual Companion (optional)
 
 If `--visual` flag is set, or if the user has previously accepted the visual companion in this session:
 1. Launch `scripts/start-server.sh --project-dir {project_root}`
@@ -162,7 +178,7 @@ If `--visual` flag is set, or if the user has previously accepted the visual com
 3. Tell user to open the URL
 4. Visual companion will be updated after each round
 
-### Step
+### Step 6: Round 1 -- Independent Analysis
 
 Each selected agent receives:
 ```
@@ -170,7 +186,12 @@ Each selected agent receives:
 You are {agent_name}. Read your full agent definition at agents/{agent_name}.md.
 
 ## Problem
-{Problem restatement from Step
+{Problem restatement from Step 2}
+
+## Grounding Evidence [only present if --research is active]
+{Codebase Context Summary and/or Web Research Summary from Step 0}
+Use this evidence in your analysis. You may cite specific findings.
+If the evidence is insufficient, note what additional information would change your position.
 
 ## Instructions
 Produce your independent analysis following your Analytical Method.
@@ -181,7 +202,7 @@ Follow your Standalone Output Format.
 **Claude Code execution**: Launch all agents as parallel subagents. Wait for all to complete.
 **Windsurf execution**: Prompt each agent role sequentially. Collect all outputs before proceeding.
 
-### Step
+### Step 7: Round 2 -- Cross-Examination
 
 Each agent receives ALL Round 1 outputs and must respond:
 
@@ -209,7 +230,7 @@ You MUST disagree with at least one position.
 **Claude Code execution**: Sequential (each agent needs to see prior cross-examinations for richer engagement).
 **Windsurf execution**: Sequential.
 
-### Step
+### Step 8: Enforcement Scans
 
 After Round 2, the coordinator checks:
 
@@ -220,7 +241,7 @@ After Round 2, the coordinator checks:
 5. **Novelty gate**: Did Round 2 introduce at least one idea not present in Round 1? If not, flag stale deliberation.
 6. **Groupthink flag**: If ALL agents agree on the core position, flag: "Unanimous agreement may indicate groupthink. Consider invoking `risk-analyst` or `assumption-breaker` standalone for a second opinion."
 
-### Step
+### Step 9: Round 3 -- Crystallization
 
 Each agent states their FINAL position:
 
@@ -233,7 +254,7 @@ If you changed your mind during cross-examination, state the new position and wh
 
 **Execution**: Parallel (Claude Code) or sequential (Windsurf). No interaction needed.
 
-### Step
+### Step 10: Verdict Synthesis
 
 The coordinator synthesizes all three rounds into a structured verdict:
 
@@ -278,7 +299,7 @@ The coordinator synthesizes all three rounds into a structured verdict:
 {Questions raised during deliberation that were not resolved}
 ```
 
-### Step
+### Step 11: Save Output
 
 Save the full deliberation record to `deliberations/YYYY-MM-DD-HH-MM-{mode}-{slug}.md`.
 
@@ -290,10 +311,10 @@ If visual companion is active, push the verdict formation view to the browser.
 
 Quick mode skips Round 2 (cross-examination):
 
-1. Steps 0-
-2. Skip Steps
-3. Step
-4. Steps
+1. Steps 0-6 (same as full, including research grounding if `--research` is set)
+2. Skip Steps 7-8
+3. Step 9: Crystallization (agents state final positions based only on the Round 1 outputs)
+4. Steps 10-11 (same as full)
 
 Quick mode is faster and cheaper but produces less refined disagreement. Use for time-sensitive decisions where the diversity of initial perspectives is more valuable than deep cross-examination.
 
@@ -303,14 +324,14 @@ Quick mode is faster and cheaper but produces less refined disagreement. Use for
 
 Duo mode runs two agents in dialectic:
 
-1. Steps 0-
+1. Steps 0-5 (same as full, but only 2 agents, including research grounding if `--research` is set)
 2. **Exchange Round 1**: Agent A states position (400 words). Agent B states position (400 words).
 3. **Exchange Round 2**: Agent A responds to Agent B (300 words). Agent B responds to Agent A (300 words).
 4. **Synthesis**: Coordinator synthesizes the exchange into a verdict highlighting:
    - Where the two agents agree
   - Where they disagree and why
   - What the user should weigh when deciding
-5. Step
+5. Step 11 (save output)
 
 Duo mode is ideal for binary decisions ("should we do X or not?"). The polarity pairs table above lists the most productive pairings.
 
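The natural-language Windsurf triggers listed in Step 0 above could be detected with a sketch like the following; this is illustrative only, not the package's actual parser, and the trigger-to-mode mapping is an assumption:

```python
# Illustrative sketch: detect the research intent of a Windsurf-style message
# using the trigger phrases named in the SKILL.md diff above.

RESEARCH_TRIGGERS = {
    "with research": ("code", "web"),        # full grounding pass
    "do research first": ("code", "web"),    # full grounding pass
    "search the web": ("web",),              # web search only
    "look at the codebase first": ("code",), # codebase scan only
}

def detect_research(message: str) -> tuple[str, ...]:
    """Return the research modes a message implies; empty tuple means none."""
    text = message.lower()
    modes: list[str] = []
    for phrase, targets in RESEARCH_TRIGGERS.items():
        if phrase in text:
            for target in targets:
                if target not in modes:
                    modes.append(target)
    return tuple(modes)
```

Substring matching is the simplest possible scheme; a real intent parser would likely be more forgiving about phrasing.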
package/assets/agents_wheel.png, banner.png, enforcement.png, modes_overview.png, research_grounding.png, sycophancy.png, triad_map.png, verdict.png
Binary files (no text diff shown)
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@faviovazquez/deliberate",
-  "version": "0.2.
+  "version": "0.2.5",
   "description": "Multi-agent deliberation skill for AI coding assistants. Agreement is a bug.",
   "license": "MIT",
   "author": "Favio Vazquez",
@@ -32,7 +32,9 @@
   "files": [
     "bin/",
     "agents/",
+    "assets/",
     "configs/",
+    "protocols/",
     "scripts/",
     "templates/",
     "SKILL.md",
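The 0.2.5 fix recorded above can be illustrated with a small sketch (not an npm tool; the path lists are abridged from the diff) that checks which runtime paths a `files` array covers:

```python
# Illustrative check of the package.json "files" fix: directories end in "/",
# and a path is covered if it equals an entry or lives under a listed directory.

FILES_BEFORE = ["bin/", "agents/", "configs/", "scripts/", "templates/", "SKILL.md"]
FILES_AFTER = FILES_BEFORE + ["assets/", "protocols/"]

# Paths the skill needs at runtime, per the 0.2.5 changelog entry.
NEEDED = ["protocols/research-grounding.md", "assets/research_grounding.png", "SKILL.md"]

def missing_paths(files: list[str], needed: list[str]) -> list[str]:
    """Return the needed paths not covered by any entry in the files array."""
    return [
        p for p in needed
        if not any(p == f or (f.endswith("/") and p.startswith(f)) for f in files)
    ]
```

With the pre-0.2.5 list, both `protocols/research-grounding.md` and the README images come back as missing, which matches the symptom the changelog describes. (Real npm `files` semantics include glob patterns and always-included files; this sketch models only the prefix case.)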
package/protocols/research-grounding.md
ADDED

@@ -0,0 +1,183 @@
+# Research Grounding Protocol
+
+This protocol is injected into deliberations and brainstorms when the `--research` flag is active. It runs **before Round 1 / Divergent Phase** and grounds each agent's analysis in real, retrieved evidence rather than parametric knowledge alone.
+
+---
+
+## When This Protocol Activates
+
+Only when the user explicitly passes `--research` (or natural language equivalent on Windsurf: "with research", "do research first", "search before answering").
+
+This is **opt-in only**. It is never the default. The flag exists because:
+- Agents have a training cutoff — recent events, new libraries, current prices, live APIs are not in parametric memory
+- The user's codebase is not in parametric memory — agents must read it to understand the actual context
+- Some decisions require current external data (benchmark results, competitor pricing, regulatory text, recent CVEs, API specs)
+
+Without `--research`, agents reason from internalized knowledge. With `--research`, agents ground first, reason second.
+
+---
+
+## Coordinator Instructions: Pre-Round Grounding Phase
+
+### Step R1: Classify the Research Needs
+
+Before dispatching agents, the coordinator classifies what kind of grounding this deliberation needs. Announce this to the user:
+
+```
+Research mode active. Before agents analyze, I'll gather grounding context.
+Classifying research needs for: "{problem restatement}"
+```
+
+Determine which of the following apply:
+
+| Type | When to activate | Tools to use |
+|------|-----------------|--------------|
+| **Codebase scan** | Question involves existing code, architecture, tech debt, dependencies | Read files, grep, list dirs |
+| **Web search** | Question involves current technology landscape, recent events, pricing, benchmarks, regulatory changes, competitor moves | Web search |
+| **Documentation lookup** | Question involves a specific tool, library, API, or framework | Web search targeting official docs |
+| **Both** | Complex technical decisions that involve current state of the codebase AND the external landscape | Both |
+
+Announce what you're about to do:
+```
+I'll do a [codebase scan / web search / documentation lookup / full research pass] before agents speak.
+This may take a moment. The grounding evidence will be shared with all agents.
+```
+
+---
+
+### Step R2: Codebase Scan (if applicable)
+
+If codebase research is needed, run the following in order, stopping when you have enough context:
+
+**Level 1 — Orientation** (always run if codebase scan is active):
+```
+Read: AGENTS.md (if exists) — understand project soul and current state
+Read: .planning/PROJECT.md or README.md — understand what's being built
+List: top-level directory structure
+```
+
+**Level 2 — Focused scan** (run based on question domain):
+- Architecture questions → read main entry points, config files, dependency manifests (`package.json`, `pyproject.toml`, `go.mod`, `pom.xml`, `Cargo.toml`)
+- Debugging questions → read the failing file + its direct imports + relevant test files
+- Tech debt questions → search for TODO/FIXME/HACK comments, read the most recently modified files
+- Performance questions → read the hot path files + any profiling configs
+- Security questions → read auth/middleware layers, dependency files for known vulnerable versions
+
+**Level 3 — Deep dive** (only if Level 2 reveals something specific to pursue):
+- Follow the specific file/module/function surfaced in Level 2
+
+Cap the codebase scan at **10 files maximum**. If the codebase is larger, surface what you read and note what you skipped.
+
+Produce a **Codebase Context Summary**:
+```markdown
+## Codebase Context
+- Stack: {languages, frameworks, key libraries with versions}
+- Relevant files read: {list}
+- Current state relevant to the question: {3-5 bullet points of facts found}
+- Gaps: {what exists in the codebase that wasn't read, if relevant}
+```
+
+---
+
+### Step R3: Web Search (if applicable)
+
+Perform targeted searches. Each search should be precise — bad searches waste context.
+
+**Search strategy by question type:**
+
+| Question type | Search queries to run |
+|--------------|----------------------|
+| Technology choice (A vs B) | "{tech A} vs {tech B} {year}" + "{tech A} production issues {year}" + "{tech B} benchmark {year}" |
+| Library/framework | "{library name} changelog" + "{library name} known issues" + "{library name} alternatives {year}" |
+| Architecture pattern | "{pattern name} at scale case study" + "{pattern name} failure modes" |
+| Security/compliance | "{CVE or regulation} {year}" + "{tool name} security audit" |
+| Pricing/vendor | "{vendor} pricing {year}" + "{vendor} enterprise SLA" |
+| Competitor landscape | "{domain} tools comparison {year}" + "{competitor} recent updates" |
+
+Run a **maximum of 5 searches**. Prioritize recency (last 12 months) and primary sources (official docs, engineering blogs, academic papers, CVE databases).
+
+Produce a **Web Research Summary**:
+```markdown
+## Web Research
+- Searches run: {list of queries}
+- Key findings: {bullet points of facts found, each with source URL}
+- Recency note: {newest source date found}
+- Gaps: {what couldn't be found or verified}
+```
+
+---
+
+### Step R4: Distribute Grounding Context to Agents
+
+The coordinator packages the grounding evidence and includes it in every agent's Round 1 prompt, as an additional section:
+
+```
+## Grounding Evidence
+{Codebase Context Summary if applicable}
+{Web Research Summary if applicable}
+
+Use this evidence in your analysis. You may cite specific findings.
+If the evidence is insufficient for your analytical method, note what additional information would change your position.
+```
+
+**Critical rule**: Agents must treat grounding evidence as **facts to reason from**, not conclusions to agree with. The `assumption-breaker` in particular should scrutinize the sources and flag any evidence that looks cherry-picked, outdated, or from a biased source.
+
+---
+
+### Step R5: Evidence Quality Check
+
+After agents complete Round 1, the coordinator scans for evidence misuse:
+
+- Did any agent **over-cite** grounding evidence (treating search results as definitive when they're preliminary)?
+- Did any agent **ignore** grounding evidence that was directly relevant to their analysis?
+- Did `assumption-breaker` scrutinize the evidence quality, or just accept it?
+
+If evidence was over-relied on, add to the coordinator's Round 2 dispatch:
+```
+Note to agents: The grounding evidence is current as of the search date but may be incomplete.
+{agent name} may be over-weighting {specific finding}. Challenge this if your analysis warrants.
+```
+
+---
+
+## Agent Instructions: Using Grounding Evidence
+
+When `--research` is active, each agent receives the grounding context. Agents should:
+
+1. **Ground your analysis in the evidence** — reference specific findings from the codebase scan or web research. "The codebase currently uses X version, which has Y implication" is stronger than generic analysis.
+
+2. **Flag evidence gaps** — if the grounding evidence doesn't cover what your analytical method needs, say so explicitly. "The web research didn't surface benchmark data for >10M records/day — my risk analysis is therefore based on first principles rather than empirical data."
+
+3. **Challenge evidence quality** (especially `assumption-breaker`) — search results can be wrong, outdated, or from biased sources. If you find a reason to distrust a finding, say so.
+
+4. **Don't be anchored** — grounding evidence informs but doesn't determine your position. If the evidence points one way but your analytical method points another, hold your position and explain the discrepancy.
+
+---
+
+## Scope Limits
+
+The research grounding phase has strict limits to prevent it from becoming a context-consuming sinkhole:
+
+| Limit | Value |
+|-------|-------|
+| Max files read (codebase) | 10 |
+| Max searches (web) | 5 |
+| Max tokens in Codebase Context Summary | ~500 |
+| Max tokens in Web Research Summary | ~500 |
+| Max total grounding context per agent | ~1000 tokens |
+
+If the research scope would exceed these limits, the coordinator surfaces the most relevant subset and notes what was excluded. More research is not always better — precision matters more than volume.
+
+---
+
+## Platform Notes
+
+**Claude Code**: Agents run as parallel subagents. Each receives the same grounding context package prepared by the coordinator. The coordinator itself does the research before dispatching agents.
+
+**Windsurf**: The coordinator does the research inline (using Windsurf's web search and file tools) before role-prompting each agent. The grounding context is included in every agent's prompt.
+
+**No tool access**: If the platform has no web search or file reading capability, the coordinator announces:
+```
+Research mode requested but this platform doesn't have [web search / file access] available.
+Falling back to parametric knowledge. Consider providing relevant context manually in your question.
+```
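A minimal sketch of how the scope limits table in the protocol above might be enforced, assuming a rough 4-characters-per-token estimate (the protocol does not specify a tokenizer); all names are illustrative:

```python
# Illustrative enforcement of the research-grounding scope limits:
# 10 files, 5 searches, ~500 tokens per summary.

MAX_FILES = 10
MAX_SEARCHES = 5
MAX_SUMMARY_TOKENS = 500

def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (assumption)."""
    return max(1, len(text) // 4)

def clamp_scan(files: list[str], searches: list[str]) -> tuple[list[str], list[str]]:
    """Trim a planned research pass to the protocol's caps, keeping the
    most relevant items (assumed to be listed first)."""
    return files[:MAX_FILES], searches[:MAX_SEARCHES]

def summary_fits(summary: str) -> bool:
    """True if a Codebase Context / Web Research summary stays within ~500 tokens."""
    return approx_tokens(summary) <= MAX_SUMMARY_TOKENS
```

The protocol says to surface the most relevant subset and note what was excluded when limits bite; a real coordinator would rank candidates before truncating rather than rely on list order.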