@faviovazquez/deliberate 0.2.0 → 0.2.1
- package/CHANGELOG.md +8 -0
- package/README.md +466 -94
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [0.2.1] - 2025-04-03
|
|
9
|
+
|
|
10
|
+
### Added
|
|
11
|
+
- 7 Gemini-generated images for the README (banner, sycophancy diagram, modes overview, agents wheel, triad map, enforcement rules, verdict output)
|
|
12
|
+
- Expanded "The Problem" section with deep sycophancy analysis: confirmation bias amplification, delusional spiraling, simulated balance, hidden trade-offs, context collapse
|
|
13
|
+
- "What deliberate brings to the table" section explaining structural disagreement, forced dissent, minority report, cross-examination, transparent verdicts
|
|
14
|
+
- Images embedded throughout README at relevant sections
|
|
15
|
+
|
|
8
16
|
## [0.2.0] - 2025-04-03
|
|
9
17
|
|
|
10
18
|
### Changed
|
package/README.md
CHANGED
|
@@ -1,163 +1,408 @@
|
|
|
1
|
+
<p align="center">
|
|
2
|
+
<img src="assets/banner.png" alt="deliberate — Agreement is a bug" width="100%" />
|
|
3
|
+
</p>
|
|
4
|
+
|
|
1
5
|
# deliberate
|
|
2
6
|
|
|
3
7
|
**Agreement is a bug.**
|
|
4
8
|
|
|
5
9
|
A multi-agent deliberation and brainstorming skill for AI coding assistants. Forces multiple agents to disagree before they agree, surfacing blind spots that single-perspective answers hide.
|
|
6
10
|
|
|
7
|
-
## The Problem
|
|
11
|
+
## The Problem: AI Sycophancy
|
|
12
|
+
|
|
13
|
+
AI chatbots are sycophantic. They validate your claims, confirm your hypotheses, and produce polished answers that sound balanced but come from a single reasoning tradition.
|
|
14
|
+
|
|
15
|
+
This is not a minor UX inconvenience. It is a structural failure mode:
|
|
8
16
|
|
|
9
|
-
|
|
17
|
+
- **Confirmation bias amplification**: LLMs agree with the user's framing by default. If you ask "should we use microservices?", the model builds a case for microservices. If you ask "should we stay monolithic?", the same model builds an equally confident case for monoliths. The answer follows the framing, not the evidence.
|
|
18
|
+
- **Delusional spiraling**: [Chandra et al. (2025)](https://arxiv.org/abs/2602.19141) formalized how prolonged conversations with agreeable AI lead to "sycophancy-induced delusional spiraling" — users develop dangerously confident beliefs because the AI never pushes back. Their model shows that even initially rational users converge toward overconfidence when the AI consistently validates.
|
|
19
|
+
- **Simulated balance**: A single LLM generates one coherent viewpoint per response. When asked for "both sides," it produces a paragraph for each — but both paragraphs come from the same reasoning tradition, the same training distribution, the same latent biases. It simulates balance without achieving genuine adversarial analysis.
|
|
20
|
+
- **Hidden trade-offs**: Complex decisions involve real trade-offs where the correct answer depends on which values you weight. A single model flattens these into one recommendation, hiding the tensions that should be visible to the decision-maker.
|
|
21
|
+
- **Context collapse**: In long conversations, the AI anchors on earlier positions. By session 5, you're in an echo chamber of your own assumptions, reinforced by an eager assistant.
|
|
10
22
|
|
|
11
|
-
|
|
23
|
+
Taken together, these failure modes make a single AI perspective structurally insufficient for complex decisions.
|
|
24
|
+
|
|
25
|
+
<p align="center">
|
|
26
|
+
<img src="assets/sycophancy.png" alt="The Sycophancy Problem — Single AI vs Multi-Agent Deliberation" width="100%" />
|
|
27
|
+
</p>
|
|
12
28
|
|
|
13
29
|
## The Solution
|
|
14
30
|
|
|
15
|
-
`deliberate` externalizes the disagreement layer. Instead of asking one agent for a balanced answer, it spawns multiple agents with distinct analytical methods
|
|
31
|
+
`deliberate` externalizes the disagreement layer. Instead of asking one agent for a balanced answer, it spawns multiple agents with **distinct analytical methods**, **explicit blind spots**, and **structural counterweights**. They analyze independently, cross-examine each other, and produce a verdict that shows you where they agree, where they disagree, and why.
|
|
16
32
|
|
|
17
33
|
The disagreements are the point. A single model averages opposing views into one confident recommendation. `deliberate` keeps them separate so you can decide.
|
|
18
34
|
|
|
35
|
+
**What deliberate brings to the table:**
|
|
36
|
+
|
|
37
|
+
- **Structural disagreement, not simulated balance**: Each agent has a declared analytical method and declared blind spots. The polarity pairs (e.g., `pragmatic-builder` vs `reframer`: "ship it" vs "does it even need to exist?") guarantee genuine tension.
|
|
38
|
+
- **Forced dissent**: The protocol requires at least 30% of agents to disagree in Round 2. Unanimous agreement triggers an explicit groupthink warning. The system is designed to make agreement hard.
|
|
39
|
+
- **Minority report**: Dissenting positions are preserved in full, not averaged away. Sometimes the minority is right — you should see their reasoning.
|
|
40
|
+
- **Multi-round cross-examination**: Agents don't just state opinions in parallel. In Round 2, each agent must name which other agent they most disagree with, and why. This forces genuine engagement with opposing views.
|
|
41
|
+
- **Transparent verdict**: The output shows you agreement, disagreement, the specific tensions, and unresolved questions. No confident recommendation hiding real trade-offs.
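
The forced-dissent and groupthink rules in the list above can be sketched as a small check. This is an illustration under stated assumptions, not the package's implementation: the function name `dissent_report`, the stance labels, and measuring dissent against the majority stance are all hypothetical. Only the 30% threshold and the warning on unanimous agreement come from the text.

```python
from collections import Counter

# From the README: at least 30% of agents must disagree in Round 2.
DISSENT_QUOTA = 0.30


def dissent_report(stances):
    """Summarize Round 2 stances.

    `stances` maps agent name -> stance label (hypothetical shape).
    Dissent is measured against the most common stance (an assumption).
    """
    if not stances:
        raise ValueError("need at least one agent stance")
    counts = Counter(stances.values())
    majority_stance, _ = counts.most_common(1)[0]
    dissenters = sorted(a for a, s in stances.items() if s != majority_stance)
    dissent_ratio = len(dissenters) / len(stances)
    return {
        "majority": majority_stance,
        "dissenters": dissenters,  # kept in full, in the spirit of the minority report
        "dissent_ratio": dissent_ratio,
        "quota_met": dissent_ratio >= DISSENT_QUOTA,
        "groupthink_warning": len(counts) == 1,  # unanimous agreement
    }
```

With three agents where one disagrees, the quota is met (1/3 is above 0.30); with every agent on the same stance, the sketch raises the groupthink flag instead.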
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
19
45
|
## Quick Start
|
|
20
46
|
|
|
47
|
+
### Install
|
|
48
|
+
|
|
21
49
|
```bash
|
|
22
50
|
npx @faviovazquez/deliberate
|
|
23
51
|
```
|
|
24
52
|
|
|
25
|
-
|
|
53
|
+
The interactive installer auto-detects your platform and lets you choose global or workspace installation. You can also pass flags directly:
|
|
26
54
|
|
|
27
|
-
|
|
55
|
+
```bash
|
|
56
|
+
# Claude Code — global (recommended)
|
|
57
|
+
npx @faviovazquez/deliberate --claude --global
|
|
58
|
+
|
|
59
|
+
# Windsurf — global
|
|
60
|
+
npx @faviovazquez/deliberate --windsurf --global
|
|
61
|
+
|
|
62
|
+
# Cursor — workspace only
|
|
63
|
+
npx @faviovazquez/deliberate --cursor
|
|
64
|
+
|
|
65
|
+
# All detected platforms — global
|
|
66
|
+
npx @faviovazquez/deliberate --all --global
|
|
67
|
+
|
|
68
|
+
# Preview without installing
|
|
69
|
+
npx @faviovazquez/deliberate --claude --global --dry-run
|
|
70
|
+
|
|
71
|
+
# Uninstall
|
|
72
|
+
npx @faviovazquez/deliberate --claude --global --uninstall
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
### Your First Deliberation
|
|
76
|
+
|
|
77
|
+
**Claude Code** — invoke with `/deliberate`:
|
|
28
78
|
```
|
|
29
79
|
/deliberate "should we migrate from REST to GraphQL?"
|
|
30
80
|
```
|
|
31
81
|
|
|
32
|
-
**Windsurf
|
|
82
|
+
**Windsurf** — invoke with `@deliberate` or just ask a complex question (Windsurf auto-invokes when the question matches the skill description):
|
|
33
83
|
```
|
|
34
84
|
@deliberate should we migrate from REST to GraphQL?
|
|
35
85
|
```
|
|
36
86
|
|
|
37
|
-
|
|
87
|
+
**Cursor** — invoke with `@deliberate`:
|
|
88
|
+
```
|
|
89
|
+
@deliberate should we migrate from REST to GraphQL?
|
|
90
|
+
```
|
|
38
91
|
|
|
39
|
-
### Manual Installation
|
|
92
|
+
### Manual Installation (git clone)
|
|
40
93
|
|
|
41
|
-
**Claude Code (workspace):**
|
|
42
94
|
```bash
|
|
43
95
|
git clone https://github.com/FavioVazquez/deliberate.git
|
|
44
96
|
cd deliberate
|
|
45
|
-
./install.sh --platform claude-code
|
|
46
97
|
```
|
|
47
98
|
|
|
48
|
-
**Claude Code (global -- all projects):**
|
|
49
99
|
```bash
|
|
100
|
+
# Claude Code
|
|
50
101
|
./install.sh --platform claude-code --global
|
|
51
|
-
```
|
|
52
|
-
|
|
53
|
-
**Windsurf (workspace -- into .windsurf/skills/):**
|
|
54
|
-
```bash
|
|
55
|
-
git clone https://github.com/FavioVazquez/deliberate.git
|
|
56
|
-
cd deliberate
|
|
57
|
-
./install.sh --platform windsurf
|
|
58
|
-
```
|
|
59
102
|
|
|
60
|
-
|
|
61
|
-
```bash
|
|
103
|
+
# Windsurf
|
|
62
104
|
./install.sh --platform windsurf --global
|
|
105
|
+
|
|
106
|
+
# Both
|
|
107
|
+
./install.sh --platform all --global
|
|
63
108
|
```
|
|
64
109
|
|
|
110
|
+
---
|
|
111
|
+
|
|
65
112
|
## Modes
|
|
66
113
|
|
|
114
|
+
`deliberate` has 6 modes. Each mode works on every platform — only the invocation syntax differs.
|
|
115
|
+
|
|
116
|
+
<p align="center">
|
|
117
|
+
<img src="assets/modes_overview.png" alt="Six Deliberation Modes" width="100%" />
|
|
118
|
+
</p>
|
|
119
|
+
|
|
67
120
|
### Full Deliberation (3 rounds)
|
|
121
|
+
|
|
122
|
+
All 14 core agents. Round 1: independent analysis. Round 2: cross-examination (agents must disagree). Round 3: crystallization. Produces a structured verdict with minority report.
|
|
123
|
+
|
|
124
|
+
**Claude Code:**
|
|
68
125
|
```
|
|
69
126
|
/deliberate --full "is this acquisition worth pursuing at 8x revenue?"
|
|
127
|
+
/deliberate --full "should we open-source our core library?"
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
**Windsurf / Cursor:**
|
|
131
|
+
```
|
|
132
|
+
@deliberate full deliberation: is this acquisition worth pursuing at 8x revenue?
|
|
133
|
+
@deliberate run all 14 agents on: should we open-source our core library?
|
|
70
134
|
```
|
|
71
|
-
All 14 agents. Round 1: independent analysis. Round 2: cross-examination (must disagree). Round 3: crystallization. Produces a structured verdict with minority report.
|
|
72
135
|
|
|
73
136
|
### Quick Deliberation (2 rounds)
|
|
137
|
+
|
|
138
|
+
Auto-selected triad. Rounds 1 + 3 only (skips cross-examination). Faster, cheaper, still multi-perspective.
|
|
139
|
+
|
|
140
|
+
**Claude Code:**
|
|
74
141
|
```
|
|
75
142
|
/deliberate --quick "monorepo or polyrepo?"
|
|
143
|
+
/deliberate --quick "should we add Redis caching?"
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
**Windsurf / Cursor:**
|
|
147
|
+
```
|
|
148
|
+
@deliberate quick: monorepo or polyrepo?
|
|
149
|
+
@deliberate quick deliberation on whether to add Redis caching
|
|
76
150
|
```
|
|
77
|
-
Auto-selected triad. Rounds 1 + 3 only (skip cross-examination). Faster, cheaper.
|
|
78
151
|
|
|
79
152
|
### Triad (domain-optimized)
|
|
153
|
+
|
|
154
|
+
3 agents selected for a specific domain. 18 pre-defined triads available (see table below). Use when you know the domain of your question.
|
|
155
|
+
|
|
156
|
+
**Claude Code:**
|
|
80
157
|
```
|
|
81
158
|
/deliberate --triad architecture "should we split the monolith?"
|
|
82
159
|
/deliberate --triad decision "build vs buy for notifications"
|
|
83
|
-
/deliberate --triad risk "should we launch before the audit?"
|
|
160
|
+
/deliberate --triad risk "should we launch before the security audit?"
|
|
161
|
+
/deliberate --triad ai "should we fine-tune or use RAG?"
|
|
162
|
+
/deliberate --triad shipping "can we ship v2 by Friday?"
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
**Windsurf / Cursor:**
|
|
166
|
+
```
|
|
167
|
+
@deliberate architecture triad: should we split the monolith?
|
|
168
|
+
@deliberate use the decision triad for: build vs buy for notifications
|
|
169
|
+
@deliberate risk triad: should we launch before the security audit?
|
|
170
|
+
@deliberate ai triad: should we fine-tune or use RAG?
|
|
171
|
+
@deliberate shipping triad: can we ship v2 by Friday?
|
|
84
172
|
```
|
|
85
|
-
3 agents selected for the domain. 18 pre-defined triads available.
|
|
86
173
|
|
|
87
174
|
### Duo / Dialectic
|
|
175
|
+
|
|
176
|
+
Two agents, two rounds of exchange, then synthesis. Best for binary decisions ("should we X or not?"). Pair agents from the polarity pairs table for maximum disagreement.
|
|
177
|
+
|
|
178
|
+
**Claude Code:**
|
|
88
179
|
```
|
|
89
180
|
/deliberate --duo assumption-breaker,pragmatic-builder "rewrite the auth layer?"
|
|
181
|
+
/deliberate --duo risk-analyst,pragmatic-builder "ship with known tech debt?"
|
|
182
|
+
/deliberate --duo classifier,emergence-reader "impose strict types or keep it flexible?"
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
**Windsurf / Cursor:**
|
|
186
|
+
```
|
|
187
|
+
@deliberate duo with assumption-breaker and pragmatic-builder: should we rewrite the auth layer?
|
|
188
|
+
@deliberate dialectic between risk-analyst and pragmatic-builder on shipping with known tech debt
|
|
189
|
+
@deliberate duo: classifier vs emergence-reader on imposing strict types vs keeping it flexible
|
|
90
190
|
```
|
|
91
|
-
Two agents, two rounds of exchange, then synthesis. Best for binary decisions.
|
|
92
191
|
|
|
93
192
|
### Brainstorm
|
|
193
|
+
|
|
194
|
+
Creative exploration with multiple agents. Divergent ideas → cross-pollination → convergence into actionable designs. Optionally add `--visual` for an interactive browser companion.
|
|
195
|
+
|
|
196
|
+
**Claude Code:**
|
|
94
197
|
```
|
|
95
198
|
/deliberate --brainstorm "how should we redesign onboarding?"
|
|
96
199
|
/deliberate --brainstorm --visual "landing page redesign"
|
|
200
|
+
/deliberate --brainstorm "new pricing model for our API"
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
**Windsurf / Cursor:**
|
|
204
|
+
```
|
|
205
|
+
@deliberate brainstorm: how should we redesign onboarding?
|
|
206
|
+
@deliberate brainstorm with visual companion: landing page redesign
|
|
207
|
+
@deliberate brainstorm: new pricing model for our API
|
|
97
208
|
```
|
|
98
|
-
Creative exploration with multiple agents. Divergent ideas, cross-pollination, convergence into actionable designs. Optional visual companion with interactive idea maps.
|
|
99
209
|
|
|
100
210
|
### Auto-Detect (no flag)
|
|
211
|
+
|
|
212
|
+
Just ask your question. The coordinator parses domain keywords, selects the best-matching triad, and runs the 3-round protocol automatically.
|
|
213
|
+
|
|
214
|
+
**Claude Code:**
|
|
101
215
|
```
|
|
102
216
|
/deliberate "should we migrate from REST to GraphQL?"
|
|
217
|
+
/deliberate "is our microservices architecture causing more problems than it solves?"
|
|
218
|
+
/deliberate "should we hire senior engineers or train juniors?"
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**Windsurf / Cursor:**
|
|
222
|
+
```
|
|
223
|
+
@deliberate should we migrate from REST to GraphQL?
|
|
224
|
+
@deliberate is our microservices architecture causing more problems than it solves?
|
|
225
|
+
@deliberate should we hire senior engineers or train juniors?
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
### Custom Agent Selection
|
|
229
|
+
|
|
230
|
+
Pick specific agents by name for full control over who deliberates.
|
|
231
|
+
|
|
232
|
+
**Claude Code:**
|
|
233
|
+
```
|
|
234
|
+
/deliberate --members assumption-breaker,first-principles,bias-detector "why does our cache keep failing?"
|
|
235
|
+
/deliberate --members pragmatic-builder,risk-analyst,systems-thinker,inverter "refactor the payment system?"
|
|
236
|
+
```
|
|
237
|
+
|
|
238
|
+
**Windsurf / Cursor:**
|
|
239
|
+
```
|
|
240
|
+
@deliberate use agents assumption-breaker, first-principles, and bias-detector: why does our cache keep failing?
|
|
241
|
+
@deliberate members pragmatic-builder, risk-analyst, systems-thinker, inverter: should we refactor the payment system?
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
### Profiles
|
|
245
|
+
|
|
246
|
+
Pre-defined agent groups for common scenarios. Use when you don't want to pick individual agents.
|
|
247
|
+
|
|
248
|
+
| Profile | Agents | When to Use |
|
|
249
|
+
|---------|--------|-------------|
|
|
250
|
+
| `full` | All 14 core agents | Complex decisions with real trade-offs. **Claude Code default.** |
|
|
251
|
+
| `lean` | assumption-breaker, first-principles, bias-detector, pragmatic-builder, inverter | Fast decisions, limited context. **Windsurf/Cursor default.** |
|
|
252
|
+
| `exploration` | assumption-breaker, classifier, emergence-reader, reframer, systems-thinker, inverter, risk-analyst | Discovery, open-ended investigation |
|
|
253
|
+
| `execution` | pragmatic-builder, first-principles, adversarial-strategist, bias-detector, formal-verifier | Shipping decisions, technical trade-offs |
|
|
254
|
+
|
|
255
|
+
**Claude Code:**
|
|
256
|
+
```
|
|
257
|
+
/deliberate --profile exploration "what's the right approach to AI safety for our product?"
|
|
258
|
+
/deliberate --profile execution "can we ship this feature by next sprint?"
|
|
259
|
+
/deliberate --profile lean "quick take: should we use Postgres or MongoDB?"
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
**Windsurf / Cursor:**
|
|
263
|
+
```
|
|
264
|
+
@deliberate exploration profile: what's the right approach to AI safety for our product?
|
|
265
|
+
@deliberate execution profile: can we ship this feature by next sprint?
|
|
266
|
+
@deliberate lean profile: should we use Postgres or MongoDB?
|
|
103
267
|
```
|
|
104
|
-
|
|
268
|
+
|
|
269
|
+
---
|
|
270
|
+
|
|
271
|
+
## Flags Reference
|
|
272
|
+
|
|
273
|
+
| Flag | Effect | Example (Claude Code) |
|
|
274
|
+
|------|--------|----------------------|
|
|
275
|
+
| *(no flag)* | Auto-detect domain, select matching triad | `/deliberate "your question"` |
|
|
276
|
+
| `--full` | All 14 core agents, 3-round protocol | `/deliberate --full "question"` |
|
|
277
|
+
| `--quick` | Auto-detect triad, 2-round protocol (skip cross-examination) | `/deliberate --quick "question"` |
|
|
278
|
+
| `--duo a,b` | Dialectic mode: 2 agents, 2 exchange rounds, then synthesis | `/deliberate --duo risk-analyst,pragmatic-builder "question"` |
|
|
279
|
+
| `--triad {domain}` | Pre-defined triad for domain, 3-round protocol | `/deliberate --triad architecture "question"` |
|
|
280
|
+
| `--members a,b,c` | Custom agent selection (2-14 agents), 3-round protocol | `/deliberate --members a,b,c "question"` |
|
|
281
|
+
| `--brainstorm` | Creative exploration with divergent-convergent flow | `/deliberate --brainstorm "question"` |
|
|
282
|
+
| `--profile {name}` | Use named profile (full, lean, exploration, execution) | `/deliberate --profile exploration "question"` |
|
|
283
|
+
| `--visual` | Launch browser-based visual companion | `/deliberate --visual --full "question"` |
|
|
284
|
+
| `--save {slug}` | Override auto-generated filename slug for output | `/deliberate --save my-decision "question"` |
|
|
285
|
+
|
|
286
|
+
> **Windsurf/Cursor note:** Flags are expressed as natural language after `@deliberate`. The skill parses intent from your message. Examples: `@deliberate full deliberation: ...`, `@deliberate quick: ...`, `@deliberate architecture triad: ...`, `@deliberate brainstorm: ...`.
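
To make the flag table concrete, here is a rough sketch of how a Claude Code style invocation could be turned into a mode and a round count. It uses Python's `argparse` purely as a stand-in: the skill itself interprets flags inside the assistant, and nothing here is taken from its source. Treating the mode flags as mutually exclusive is an assumption, as are the names `build_parser`, `parse_invocation`, and `ROUNDS`.

```python
import argparse
import shlex

# Round counts as stated in the flags table; None where the README gives none.
ROUNDS = {"auto": 3, "full": 3, "quick": 2, "duo": 2,
          "triad": 3, "members": 3, "brainstorm": None}


def build_parser():
    parser = argparse.ArgumentParser(prog="/deliberate", add_help=False)
    mode = parser.add_mutually_exclusive_group()  # assumption: one mode per call
    mode.add_argument("--full", action="store_true")
    mode.add_argument("--quick", action="store_true")
    mode.add_argument("--brainstorm", action="store_true")
    mode.add_argument("--duo", metavar="A,B")
    mode.add_argument("--triad", metavar="DOMAIN")
    mode.add_argument("--members", metavar="A,B,C")
    parser.add_argument("--profile",
                        choices=["full", "lean", "exploration", "execution"])
    parser.add_argument("--visual", action="store_true")
    parser.add_argument("--save", metavar="SLUG")
    parser.add_argument("question")
    return parser


def parse_invocation(text):
    """Parse the part of the command that follows `/deliberate`."""
    args = build_parser().parse_args(shlex.split(text))
    mode, agents = "auto", None
    if args.full:
        mode = "full"
    elif args.quick:
        mode = "quick"
    elif args.brainstorm:
        mode = "brainstorm"
    elif args.triad:
        mode = "triad"
    elif args.duo:
        mode, agents = "duo", args.duo.split(",")
        if len(agents) != 2:
            raise ValueError("--duo takes exactly two agents")
    elif args.members:
        mode, agents = "members", args.members.split(",")
        if not 2 <= len(agents) <= 14:
            raise ValueError("--members takes 2-14 agents")
    return {"mode": mode, "rounds": ROUNDS[mode], "domain": args.triad,
            "agents": agents, "profile": args.profile,
            "visual": args.visual, "save": args.save,
            "question": args.question}
```

For example, `parse_invocation('--triad architecture "should we split the monolith?"')` yields mode `triad` with 3 rounds, while a bare quoted question falls through to the auto-detect mode.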
|
|
287
|
+
|
|
288
|
+
---
|
|
105
289
|
|
|
106
290
|
## The 14 Core Agents
|
|
107
291
|
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
|
115
|
-
|
|
116
|
-
| `
|
|
117
|
-
| `
|
|
118
|
-
| `
|
|
119
|
-
| `
|
|
120
|
-
| `
|
|
121
|
-
| `
|
|
122
|
-
| `
|
|
123
|
-
| `
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
292
|
+
Each agent is named for its analytical function, not after a historical figure. Every agent declares its method, what it sees that others miss, and what it tends to miss. These declared blind spots are why the polarity pairs matter.
|
|
293
|
+
|
|
294
|
+
<p align="center">
|
|
295
|
+
<img src="assets/agents_wheel.png" alt="The 14 Core Agents" width="100%" />
|
|
296
|
+
</p>
|
|
297
|
+
|
|
298
|
+
| # | Agent | Function | Tier |
|
|
299
|
+
|---|-------|----------|------|
|
|
300
|
+
| 1 | `assumption-breaker` | Destroys hidden premises, tests by contradiction, dialectical questioning | high |
|
|
301
|
+
| 2 | `first-principles` | Bottom-up derivation, refuses unexplained complexity | mid |
|
|
302
|
+
| 3 | `classifier` | Taxonomic structure, category errors, four-cause analysis | mid |
|
|
303
|
+
| 4 | `formal-verifier` | Computational skeleton, mechanization boundaries, abstraction | mid |
|
|
304
|
+
| 5 | `bias-detector` | Cognitive bias detection, pre-mortem, de-biasing interventions | high |
|
|
305
|
+
| 6 | `systems-thinker` | Feedback loops, leverage points, unintended consequences | mid |
|
|
306
|
+
| 7 | `resilience-anchor` | Control vs acceptance, moral clarity, anti-panic grounding | mid |
|
|
307
|
+
| 8 | `adversarial-strategist` | Terrain reading, competitive dynamics, strategic timing | mid |
|
|
308
|
+
| 9 | `emergence-reader` | Non-action, subtraction, intervention audit, minimum intervention | high |
|
|
309
|
+
| 10 | `incentive-mapper` | Power dynamics, actor incentives, principal-agent problems | mid |
|
|
310
|
+
| 11 | `pragmatic-builder` | Ship it, maintenance cost, over-engineering detection | mid |
|
|
311
|
+
| 12 | `reframer` | Dissolves false problems, frame audit, false dichotomies | high |
|
|
312
|
+
| 13 | `risk-analyst` | Antifragility, tail risk, fragility profile, barbell strategy | high |
|
|
313
|
+
| 14 | `inverter` | Multi-model reasoning, inversion ("what guarantees failure?"), opportunity cost | mid |
|
|
314
|
+
|
|
315
|
+
### Optional Specialists
|
|
316
|
+
|
|
317
|
+
Activated only when their domain-specific triad is selected:
|
|
318
|
+
|
|
319
|
+
| Agent | Function | Triads |
|
|
320
|
+
|-------|----------|--------|
|
|
321
|
+
| `ml-intuition` | Neural net intuition, training dynamics, jagged frontier | ai, ai-product |
|
|
322
|
+
| `safety-frontier` | Scaling dynamics, capability-safety frontier, phase transitions | ai |
|
|
323
|
+
| `design-lens` | User-centered design, honesty audit, "less but better" | design, ai-product |
|
|
324
|
+
|
|
325
|
+
### Polarity Pairs
|
|
326
|
+
|
|
327
|
+
These agents are structural counterweights. When both are present, genuine disagreement is almost guaranteed. Use these pairings for `--duo` mode:
|
|
328
|
+
|
|
329
|
+
| Pair | Tension |
|
|
330
|
+
|------|---------|
|
|
331
|
+
| `assumption-breaker` vs `first-principles` | Top-down destruction vs bottom-up construction |
|
|
332
|
+
| `classifier` vs `emergence-reader` | Impose structure vs let it emerge |
|
|
333
|
+
| `adversarial-strategist` vs `resilience-anchor` | Win externally vs govern internally |
|
|
334
|
+
| `formal-verifier` vs `incentive-mapper` | Abstract purity vs messy human reality |
|
|
335
|
+
| `pragmatic-builder` vs `reframer` | Ship it vs does it need to exist? |
|
|
336
|
+
| `pragmatic-builder` vs `systems-thinker` | Fix the bug vs redesign the system |
|
|
337
|
+
| `risk-analyst` vs `ml-intuition` | Tail paranoia vs empirical iteration |
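
As a quick way to see which counterweights a given lineup contains, the table above can be encoded as data and filtered. The pairs and tension labels are copied from the table; the names `POLARITY_PAIRS` and `tensions_in` are hypothetical and not part of the package.

```python
# Pairs and tension labels copied from the Polarity Pairs table.
POLARITY_PAIRS = [
    ("assumption-breaker", "first-principles",
     "Top-down destruction vs bottom-up construction"),
    ("classifier", "emergence-reader", "Impose structure vs let it emerge"),
    ("adversarial-strategist", "resilience-anchor",
     "Win externally vs govern internally"),
    ("formal-verifier", "incentive-mapper",
     "Abstract purity vs messy human reality"),
    ("pragmatic-builder", "reframer", "Ship it vs does it need to exist?"),
    ("pragmatic-builder", "systems-thinker",
     "Fix the bug vs redesign the system"),
    ("risk-analyst", "ml-intuition", "Tail paranoia vs empirical iteration"),
]


def tensions_in(agents):
    """Return the polarity pairs whose two members are both in `agents`."""
    chosen = set(agents)
    return [(a, b, tension) for a, b, tension in POLARITY_PAIRS
            if a in chosen and b in chosen]
```

Applied to the `lean` profile, this returns only the `assumption-breaker` vs `first-principles` pair, which is a cheap way to check whether a custom `--members` list includes at least one built-in tension.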
|
|
338
|
+
|
|
339
|
+
---
|
|
128
340
|
|
|
129
341
|
## 18 Pre-defined Triads
|
|
130
342
|
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
|
138
|
-
|
|
139
|
-
|
|
|
140
|
-
|
|
|
141
|
-
|
|
|
142
|
-
|
|
|
143
|
-
|
|
|
144
|
-
|
|
|
145
|
-
|
|
|
146
|
-
|
|
|
147
|
-
|
|
|
148
|
-
|
|
|
149
|
-
|
|
|
150
|
-
|
|
|
343
|
+
Each triad is a team of 3 agents optimized for a domain. The reasoning chain shows the deliberation flow.
|
|
344
|
+
|
|
345
|
+
<p align="center">
|
|
346
|
+
<img src="assets/triad_map.png" alt="18 Pre-defined Triads" width="100%" />
|
|
347
|
+
</p>
|
|
348
|
+
|
|
349
|
+
| Domain | Agents | Reasoning Chain |
|
|
350
|
+
|--------|--------|-----------------|
|
|
351
|
+
| `architecture` | classifier + formal-verifier + first-principles | categorize → formalize → simplicity-test |
|
|
352
|
+
| `strategy` | adversarial-strategist + incentive-mapper + resilience-anchor | terrain → incentives → moral grounding |
|
|
353
|
+
| `ethics` | resilience-anchor + assumption-breaker + emergence-reader | duty → questioning → natural order |
|
|
354
|
+
| `debugging` | first-principles + assumption-breaker + formal-verifier | bottom-up → assumptions → formal verify |
|
|
355
|
+
| `innovation` | formal-verifier + emergence-reader + classifier | abstraction → emergence → classification |
|
|
356
|
+
| `conflict` | assumption-breaker + incentive-mapper + resilience-anchor | expose → predict → ground |
|
|
357
|
+
| `complexity` | emergence-reader + classifier + formal-verifier | emergence → categories → formalism |
|
|
358
|
+
| `risk` | adversarial-strategist + resilience-anchor + first-principles | threats → resilience → empirical verify |
|
|
359
|
+
| `shipping` | pragmatic-builder + adversarial-strategist + first-principles | pragmatism → timing → first-principles |
|
|
360
|
+
| `product` | pragmatic-builder + incentive-mapper + reframer | ship it → incentives → reframing |
|
|
361
|
+
| `decision` | inverter + bias-detector + risk-analyst | inversion → biases → tail risk |
|
|
362
|
+
| `systems` | systems-thinker + emergence-reader + classifier | feedback → emergence → structure |
|
|
363
|
+
| `economics` | adversarial-strategist + inverter + incentive-mapper | terrain → models → power |
|
|
364
|
+
| `uncertainty` | risk-analyst + adversarial-strategist + assumption-breaker | tails → threats → premises |
|
|
365
|
+
| `bias` | bias-detector + reframer + assumption-breaker | biases → frame → premises |
|
|
366
|
+
| `ai` | formal-verifier + ml-intuition + safety-frontier | formalism → empirical ML → safety |
|
|
367
|
+
| `ai-product` | pragmatic-builder + ml-intuition + design-lens | ship → ML reality → user |
|
|
368
|
+
| `design` | design-lens + reframer + pragmatic-builder | user → frame → ship |
|
|
369
|
+
|
|
370
|
+
**Which triad should I use?**
|
|
371
|
+
|
|
372
|
+
| If your question is about... | Use triad |
|
|
373
|
+
|------------------------------|-----------|
|
|
374
|
+
| Code structure, API design, monolith vs micro | `architecture` |
|
|
375
|
+
| Build vs buy, go/no-go, pricing | `decision` |
|
|
376
|
+
| Should we launch/ship? | `shipping` |
|
|
377
|
+
| Competitive moves, market entry | `strategy` |
|
|
378
|
+
| Something feels risky | `risk` or `uncertainty` |
|
|
379
|
+
| Root cause analysis, why is X broken | `debugging` |
|
|
380
|
+
| Feature design, user experience | `product` or `design` |
|
|
381
|
+
| AI/ML architecture, model selection | `ai` |
|
|
382
|
+
| AI product decisions | `ai-product` |
|
|
383
|
+
| Team dynamics, organizational change | `conflict` |
|
|
384
|
+
| System reliability, scaling | `systems` or `complexity` |
|
|
385
|
+
| Ethical implications | `ethics` |
|
|
386
|
+
| Cognitive biases in your decision | `bias` |
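
The lookup table above hints at how auto-detection might work. The following is only a sketch of keyword-based triad selection: the keyword lists are invented for illustration (loosely based on the left-hand column), the substring matching is deliberately naive, and `TRIAD_KEYWORDS` and `pick_triad` are hypothetical names. The coordinator's actual matching logic is not shown in this README.

```python
# Hypothetical keyword lists, loosely derived from the "If your question is
# about..." column; only a subset of the 18 triads is covered here.
TRIAD_KEYWORDS = {
    "architecture": ("code structure", "api design", "monolith", "microservice"),
    "decision": ("build vs buy", "go/no-go", "pricing"),
    "shipping": ("launch", "ship"),
    "strategy": ("competitive", "market entry"),
    "debugging": ("root cause", "broken"),
    "ai": ("model selection", "fine-tune"),
    "ethics": ("ethical",),
    "bias": ("cognitive bias",),
}


def pick_triad(question):
    """Return the triad with the most keyword hits, or None if nothing matches."""
    text = question.lower()
    scores = {
        domain: sum(keyword in text for keyword in keywords)
        for domain, keywords in TRIAD_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first-listed domain
    return best if scores[best] > 0 else None
```

A real selector would need word-boundary matching and a fallback for questions that match nothing; returning `None` here simply marks the point where such a fallback would apply.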
|
|
387
|
+
|
|
388
|
+
---
|
|
151
389
|
|
|
152
390
|
## Visual Companion
|
|
153
391
|
|
|
154
|
-
|
|
392
|
+
A browser-based interface that shows deliberation progress in real time. Available on all platforms.
|
|
155
393
|
|
|
394
|
+
**Claude Code:**
|
|
156
395
|
```
|
|
157
396
|
/deliberate --visual --full "major architecture decision"
|
|
158
397
|
/deliberate --brainstorm --visual "redesign the dashboard"
|
|
159
398
|
```
|
|
160
399
|
|
|
400
|
+
**Windsurf / Cursor:**
|
|
401
|
+
```
|
|
402
|
+
@deliberate full deliberation with visual companion: major architecture decision
|
|
403
|
+
@deliberate brainstorm with visual: redesign the dashboard
|
|
404
|
+
```
|
|
405
|
+
|
|
161
406
|
It provides:
|
|
162
407
|
- **Agent Position Map**: Force-directed graph showing agents as colored nodes positioned by agreement/disagreement
|
|
163
408
|
- **Agreement Matrix**: Heatmap of which agents agree/disagree on which points
|
|
@@ -167,62 +412,189 @@ It provides:
|
|
|
167
412
|
|
|
168
413
|
Built with plain HTML + JS + Canvas 2D. No framework, no build step. Served locally via a lightweight Node.js file-watcher server.
|
|
169
414
|
|
|
415
|
+
---
|
|
416
|
+
|
|
170
417
|
## Platforms
|
|
171
418
|
|
|
172
419
|
| Platform | Execution Model | Default Profile | Invocation | Install Paths |
|
|
173
420
|
|----------|----------------|-----------------|------------|---------------|
|
|
174
|
-
| Claude Code | Parallel subagents (`context: fork`) | full (14 agents) | `/deliberate` | `~/.claude/skills/`
|
|
175
|
-
| Windsurf | Sequential role-prompting | lean (5 agents) | `@deliberate` or auto | `~/.codeium/windsurf/skills
|
|
176
|
-
| Cursor | Sequential role-prompting | lean (5 agents) | `@deliberate`
|
|
421
|
+
| Claude Code | Parallel subagents (`context: fork`) | full (14 agents) | `/deliberate` + flags | `~/.claude/skills/` + `~/.claude/agents/` |
|
|
422
|
+
| Windsurf | Sequential role-prompting | lean (5 agents) | `@deliberate` or auto-invoke | `~/.codeium/windsurf/skills/deliberate/` |
|
|
423
|
+
| Cursor | Sequential role-prompting | lean (5 agents) | `@deliberate` | `.cursor/skills/deliberate/` |
|
|
177
424
|
|
|
178
425
|
### How it works on each platform
|
|
179
426
|
|
|
180
|
-
**Claude Code**: Each agent runs as a parallel subagent with its own isolated context window. The coordinator dispatches all agents simultaneously in Round 1 and Round 3, and sequentially in Round 2 (cross-examination). Agents are installed as separate `.md` files in `~/.claude/agents/` and referenced by the skill
|
|
427
|
+
**Claude Code**: Each agent runs as a parallel subagent with its own isolated context window. The coordinator dispatches all agents simultaneously in Round 1 and Round 3, and sequentially in Round 2 (cross-examination requires seeing prior outputs). Agents are installed as separate `.md` files in `~/.claude/agents/` and referenced by the skill protocol in `~/.claude/skills/deliberate/`.
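
The dispatch pattern described for Claude Code (parallel in Rounds 1 and 3, sequential in Round 2) can be illustrated with `asyncio`. This is a sketch of the scheduling shape only: `run_agent` is a hypothetical stand-in that echoes its input rather than calling a subagent, and the prompts are placeholders.

```python
import asyncio


async def run_agent(agent, prompt):
    """Hypothetical stand-in for a subagent call; just echoes its inputs."""
    await asyncio.sleep(0)
    return f"{agent}: {prompt}"


async def deliberate(agents, question):
    # Round 1: independent analysis, all agents dispatched at once.
    round1 = await asyncio.gather(*(run_agent(a, question) for a in agents))

    # Round 2: cross-examination, one agent at a time so that each one
    # can see every output produced before its turn.
    transcript = list(round1)
    round2 = []
    for agent in agents:
        reply = await run_agent(
            agent, f"respond to {len(transcript)} prior outputs")
        round2.append(reply)
        transcript.append(reply)

    # Round 3: crystallization, dispatched in parallel again.
    round3 = await asyncio.gather(
        *(run_agent(a, "final position") for a in agents))
    return {"round1": list(round1), "round2": round2, "round3": list(round3)}
```

Running `asyncio.run(deliberate(["a", "b"], "q"))` shows the difference between rounds: in Round 2 the second agent sees one more prior output than the first, which is why that round cannot be parallelized.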
|
|
428
|
+
|
|
429
|
+
**Windsurf**: No subagent support. The coordinator adopts each agent's persona sequentially within a single context window. Agent definitions are bundled inside the skill directory (`~/.codeium/windsurf/skills/deliberate/agents/`) and read on demand. The default "lean" profile (5 agents) keeps context usage manageable. Windsurf also auto-invokes the skill when your question matches the skill description — you don't always need to type `@deliberate`.
|
|
181
430
|
|
|
182
|
-
**
|
|
431
|
+
**Cursor**: Same sequential execution as Windsurf. Agents bundled inside `.cursor/skills/deliberate/agents/`. Workspace-local installation only.
|
|
183
432
|
|
|
184
|
-
|
|
433
|
+
---
|
|
185
434
|
|
|
186
435
|
## Configuration
|
|
187
436
|
|
|
188
|
-
### Model
|
|
189
|
-
All agents use sonnet-equivalent models by default. To enable higher-tier models:
|
|
437
|
+
### Model Tiers
|
|
190
438
|
|
|
439
|
+
All agents use sonnet-equivalent models by default. The tier determines model quality:
|
|
440
|
+
|
|
441
|
+
| Tier | Model | Agents at this tier |
|
|
442
|
+
|------|-------|--------------------|
|
|
443
|
+
| `high` | claude-sonnet-4 / equivalent | assumption-breaker, bias-detector, emergence-reader, reframer, risk-analyst |
|
|
444
|
+
| `mid` | claude-sonnet-4 / equivalent | All other agents |
|
|
445
|
+
|
|
446
|
+
To override all agents to a higher tier:
|
|
191
447
|
```yaml
|
|
192
448
|
# In your project's config.yaml:
|
|
193
449
|
model_tier: high
|
|
194
450
|
# WARNING: high-tier models consume significantly more tokens/credits
|
|
195
451
|
```
|
|
196
452
|
|
|
197
|
-
### Custom
|
|
198
|
-
Copy `configs/provider-model-slots.example.yaml` to `configs/provider-model-slots.yaml` and customize per-agent model assignments.
|
|
453
|
+
### Custom Model Routing
|
|
199
454
|
|
|
200
|
-
|
|
201
|
-
|
|
455
|
+
For per-agent model control, copy the example config:
|
|
456
|
+
|
|
457
|
+
```bash
|
|
458
|
+
cp configs/provider-model-slots.example.yaml configs/provider-model-slots.yaml
|
|
459
|
+
```
|
|
460
|
+
|
|
461
|
+
Then edit to assign specific models per agent:
|
|
462
|
+
```yaml
|
|
463
|
+
# configs/provider-model-slots.yaml
|
|
464
|
+
assumption-breaker:
|
|
465
|
+
  provider: anthropic
|
|
466
|
+
  model: claude-sonnet-4-20250514
|
|
467
|
+
first-principles:
|
|
468
|
+
  provider: anthropic
|
|
469
|
+
  model: claude-sonnet-4-20250514
|
|
470
|
+
# ... customize per agent
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
### Output Location
|
|
474
|
+
|
|
475
|
+
All deliberation and brainstorm records are saved to `deliberations/` in your project root:
|
|
476
|
+
```
|
|
477
|
+
deliberations/
|
|
478
|
+
2025-04-03-14-30-triad-architecture-monolith.md
|
|
479
|
+
2025-04-03-15-00-full-acquisition.md
|
|
480
|
+
2025-04-03-16-00-brainstorm-onboarding.md
|
|
481
|
+
```
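The example filenames above suggest a `YYYY-MM-DD-HH-MM-<mode>-<topic>.md` pattern. A minimal Python sketch of that naming scheme, assuming the pattern holds; `record_path` and its slug rules are hypothetical illustrations, not part of `deliberate`:

```python
import re
from datetime import datetime
from pathlib import Path


def record_path(mode: str, topic: str, when: datetime, root: str = "deliberations") -> Path:
    """Build a record path shaped like the example filenames (assumed pattern)."""
    # Lowercase the topic and collapse every non-alphanumeric run into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    # Timestamp prefix: date plus hour and minute, all hyphen-separated.
    stamp = when.strftime("%Y-%m-%d-%H-%M")
    return Path(root) / f"{stamp}-{mode}-{slug}.md"


if __name__ == "__main__":
    print(record_path("brainstorm", "Onboarding", datetime(2025, 4, 3, 16, 0)).as_posix())
```

Because the timestamp leads the filename, a plain alphabetical listing of `deliberations/` is also chronological.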
|
|
482
|
+
|
|
483
|
+
### Platform Defaults
|
|
484
|
+
|
|
485
|
+
The `configs/defaults.yaml` file controls per-platform behavior:
|
|
486
|
+
|
|
487
|
+
```yaml
|
|
488
|
+
platforms:
|
|
489
|
+
  claude-code:
|
|
490
|
+
    execution: parallel # Agents run as parallel subagents
|
|
491
|
+
    default_profile: full # All 14 agents
|
|
492
|
+
  windsurf:
|
|
493
|
+
    execution: sequential # Single context, role-prompting
|
|
494
|
+
    default_profile: lean # 5 agents (saves context)
|
|
495
|
+
  cursor:
|
|
496
|
+
    execution: sequential
|
|
497
|
+
    default_profile: lean
|
|
498
|
+
```
|
|
499
|
+
|
|
500
|
+
You can override the default profile per invocation with `--profile`.
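The precedence described here (an explicit `--profile` wins, otherwise the platform default applies) can be sketched in Python. This is an illustration only: `PLATFORM_DEFAULTS` copies the values listed for `configs/defaults.yaml`, and `resolve_profile` is a hypothetical helper, not a function the skill exposes.

```python
from typing import Optional

# Copied from the platform defaults listed in this section (assumed to be current).
PLATFORM_DEFAULTS = {
    "claude-code": {"execution": "parallel", "default_profile": "full"},
    "windsurf": {"execution": "sequential", "default_profile": "lean"},
    "cursor": {"execution": "sequential", "default_profile": "lean"},
}


def resolve_profile(platform: str, profile_flag: Optional[str] = None) -> str:
    """Return the --profile value when one was given, else the platform default."""
    if profile_flag:
        return profile_flag
    return PLATFORM_DEFAULTS[platform]["default_profile"]


if __name__ == "__main__":
    print(resolve_profile("windsurf"))          # platform default
    print(resolve_profile("windsurf", "full"))  # per-invocation override
```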
|
|
501
|
+
|
|
502
|
+
---
|
|
202
503
|
|
|
203
504
|
## When to Use
|
|
204
505
|
|
|
205
|
-
**Use
|
|
506
|
+
**Use `deliberate` for:**
|
|
507
|
+
- Complex decisions where trade-offs are real: architecture choices, strategic pivots, build-vs-buy
|
|
508
|
+
- Pricing models, go/no-go decisions, risk assessment
|
|
509
|
+
- Any situation where you suspect a single confident answer hides real trade-offs
|
|
510
|
+
- Decisions where you already have an opinion but suspect you're missing something
|
|
206
511
|
|
|
207
|
-
**Don't use
|
|
512
|
+
**Don't use `deliberate` for:**
|
|
513
|
+
- Questions with clear correct answers
|
|
514
|
+
- Trivial questions: don't convene 14 agents to debate tabs vs spaces
|
|
515
|
+
- Domains a triad already covers, at least not with `--full` (14 agents consume significant context and API cost)
|
|
208
516
|
|
|
209
|
-
**The sweet spot:** Decisions where
|
|
517
|
+
**The sweet spot:** Decisions where a single confident answer hides real trade-offs. `deliberate` surfaces what you're not seeing — structured, with the disagreements visible.
|
|
518
|
+
|
|
519
|
+
---
|
|
210
520
|
|
|
211
521
|
## Enforcement
|
|
212
522
|
|
|
213
523
|
The protocol includes safeguards against common failure modes:
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
524
|
+
|
|
525
|
+
<p align="center">
|
|
526
|
+
<img src="assets/enforcement.png" alt="Protocol Enforcement" width="100%" />
|
|
527
|
+
</p>
|
|
528
|
+
|
|
529
|
+
| Rule | What it prevents |
|
|
530
|
+
|------|-----------------|
|
|
531
|
+
| **Hemlock rule** | Infinite questioning spirals — forces 50-word position statement |
|
|
532
|
+
| **3-level depth limit** | Endless depth — forces position commitment after 3 rounds of questioning |
|
|
533
|
+
| **2-message cutoff** | Any pair dominating the discussion |
|
|
534
|
+
| **Dissent quota (30%)** | Groupthink — at least 30% of agents must disagree in Round 2 |
|
|
535
|
+
| **Novelty gate** | Stale deliberation — Round 2 must introduce new ideas not in Round 1 |
|
|
536
|
+
| **Groupthink flag** | Unnoticed groupthink: unanimous agreement triggers an explicit warning to the user |
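Two of these rules are simple arithmetic over Round 2 stances. The sketch below shows one way to read them in Python, assuming each agent's stance is reduced to `"agree"` or `"disagree"`. The function name, input shape, and return keys are hypothetical. In the skill itself the coordinator applies these rules through its prompt, not through code like this.

```python
def check_round_two(stances: dict[str, str], quota: float = 0.30) -> dict[str, bool]:
    """Evaluate the dissent quota and the groupthink flag for one round."""
    if not stances:
        raise ValueError("at least one agent stance is required")
    dissenters = sum(1 for stance in stances.values() if stance == "disagree")
    return {
        # Dissent quota: the share of disagreeing agents must reach the quota.
        "quota_met": dissenters / len(stances) >= quota,
        # Groupthink flag: nobody disagreed, so the user should be warned.
        "groupthink": dissenters == 0,
    }


if __name__ == "__main__":
    lean_round = {
        "assumption-breaker": "disagree",
        "bias-detector": "agree",
        "reframer": "agree",
        "risk-analyst": "disagree",
        "first-principles": "agree",
    }
    print(check_round_two(lean_round))
```

With a 5-agent profile, one dissenter is 20% and misses the quota, so at least two agents must disagree.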
|
|
537
|
+
|
|
538
|
+
---
|
|
539
|
+
|
|
540
|
+
## Verdict Output
|
|
541
|
+
|
|
542
|
+
Every deliberation produces a structured verdict saved to `deliberations/`:
|
|
543
|
+
|
|
544
|
+
<p align="center">
|
|
545
|
+
<img src="assets/verdict.png" alt="Structured Verdict Output" width="100%" />
|
|
546
|
+
</p>
|
|
547
|
+
|
|
548
|
+
```markdown
|
|
549
|
+
## Deliberation Verdict
|
|
550
|
+
|
|
551
|
+
### Problem
|
|
552
|
+
{Original question}
|
|
553
|
+
|
|
554
|
+
### Agents Present
|
|
555
|
+
{List of agents with functions}
|
|
556
|
+
|
|
557
|
+
### Mode
|
|
558
|
+
{Full / Quick / Duo / Triad: {domain}}
|
|
559
|
+
|
|
560
|
+
### Consensus Position
|
|
561
|
+
{Position held by 2/3+ agents, if one exists}
|
|
562
|
+
|
|
563
|
+
### Key Insights by Agent
|
|
564
|
+
{2-3 sentence summary per agent}
|
|
565
|
+
|
|
566
|
+
### Points of Agreement
|
|
567
|
+
{Where agents converged}
|
|
568
|
+
|
|
569
|
+
### Points of Disagreement
|
|
570
|
+
{Where agents diverged, with the specific tension}
|
|
571
|
+
|
|
572
|
+
### Minority Report
|
|
573
|
+
{Dissenting positions with full reasoning — sometimes the minority is right}
|
|
574
|
+
|
|
575
|
+
### Verdict Type
|
|
576
|
+
{consensus | majority | split | dilemma}
|
|
577
|
+
|
|
578
|
+
### Recommended Next Steps
|
|
579
|
+
{1-3 concrete actions}
|
|
580
|
+
|
|
581
|
+
### Unresolved Questions
|
|
582
|
+
{Questions raised but not resolved}
|
|
583
|
+
```
|
|
584
|
+
|
|
585
|
+
**Verdict types:**
|
|
586
|
+
- **consensus**: 2/3+ agree, minority report recorded
|
|
587
|
+
- **majority**: Simple majority, significant dissent recorded
|
|
588
|
+
- **split**: No majority, all positions presented equally
|
|
589
|
+
- **dilemma**: Genuine dilemma with no clear resolution — the agents surfaced the tension, you decide
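The three count-based verdict types can be derived from how many agents hold the most common position. Below is a minimal Python sketch. It assumes "simple majority" means strictly more than half and that each agent's final position is reduced to a single label. `verdict_type` is a hypothetical helper. `dilemma` is deliberately not computed, since it reflects a judgment about the tension itself and cannot be read off a tally.

```python
from collections import Counter
from fractions import Fraction


def verdict_type(positions: list[str]) -> str:
    """Classify a tally of final positions as consensus, majority, or split."""
    if not positions:
        raise ValueError("at least one position is required")
    # Size of the largest bloc of agents sharing one position.
    top_count = Counter(positions).most_common(1)[0][1]
    # Exact fractions avoid float rounding at the 2/3 boundary.
    share = Fraction(top_count, len(positions))
    if share >= Fraction(2, 3):
        return "consensus"
    if share > Fraction(1, 2):
        return "majority"
    return "split"


if __name__ == "__main__":
    print(verdict_type(["monolith", "monolith", "microservices"]))
```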
|
|
590
|
+
|
|
591
|
+
---
|
|
220
592
|
|
|
221
593
|
## References
|
|
222
594
|
|
|
223
|
-
- **Chandra, Y., Mishra, C., & Flynn, B.** (2025). *Can AIeli-bots turn us all delusional? How AI sycophancy, AI psychosis, and human self-correction interact.* arXiv:2602.19141. [Paper](https://arxiv.org/abs/2602.19141) — The formal model of sycophancy-induced delusional spiraling that motivated this project.
|
|
224
|
-
- **Council of High Intelligence** — [github.com/0xNyk/council-of-high-intelligence](https://github.com/0xNyk/council-of-high-intelligence) — The original multi-agent deliberation system for Claude Code.
|
|
225
|
-
- **Superpowers Brainstorming Skill** — [github.com/obra/superpowers](https://github.com/obra/superpowers) — The brainstorming skill and browser-based visual companion architecture.
|
|
595
|
+
- **Chandra, Y., Mishra, C., & Flynn, B.** (2025). *Can AIeli-bots turn us all delusional? How AI sycophancy, AI psychosis, and human self-correction interact.* arXiv:2602.19141. [Paper](https://arxiv.org/abs/2602.19141) — The formal model of sycophancy-induced delusional spiraling that motivated this project.
|
|
596
|
+
- **Council of High Intelligence** — [github.com/0xNyk/council-of-high-intelligence](https://github.com/0xNyk/council-of-high-intelligence) — The original multi-agent deliberation system for Claude Code. `deliberate` redesigns the agent roster around analytical functions rather than personas, adds enforcement rules, and extends to multiple platforms.
|
|
597
|
+
- **Superpowers Brainstorming Skill** — [github.com/obra/superpowers](https://github.com/obra/superpowers) — The brainstorming skill and browser-based visual companion architecture.
|
|
226
598
|
|
|
227
599
|
## License
|
|
228
600
|
|