frontier-council 0.1.2__tar.gz → 0.2.0__tar.gz

@@ -1,7 +1,7 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: frontier-council
3
- Version: 0.1.2
4
- Summary: Multi-model deliberation for important decisions. 5 frontier LLMs debate, then a judge synthesizes consensus.
3
+ Version: 0.2.0
4
+ Summary: Multi-model deliberation for important decisions. 4 frontier LLMs debate with rotating challenger, then Claude judges.
5
5
  Project-URL: Homepage, https://github.com/terry-li-hm/frontier-council
6
6
  Project-URL: Repository, https://github.com/terry-li-hm/frontier-council
7
7
  Project-URL: Issues, https://github.com/terry-li-hm/frontier-council/issues
@@ -19,22 +19,25 @@ Classifier: Programming Language :: Python :: 3.12
19
19
  Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
20
20
  Requires-Python: >=3.11
21
21
  Requires-Dist: httpx>=0.25.0
22
+ Requires-Dist: pydantic>=2.0
23
+ Requires-Dist: pyyaml>=6.0
22
24
  Description-Content-Type: text/markdown
23
25
 
24
26
  # Frontier Council
25
27
 
26
- Multi-model deliberation for important decisions. 5 frontier LLMs debate a question, then a judge synthesizes consensus.
28
+ Multi-model deliberation for important decisions. 4 frontier LLMs debate a question, then Claude judges and synthesizes.
27
29
 
28
- Inspired by [Andrej Karpathy's LLM Council](https://github.com/karpathy/llm-council), with added blind phase (anti-anchoring), explicit engagement requirements, devil's advocate role, and social calibration mode.
30
+ Inspired by [Andrej Karpathy's LLM Council](https://github.com/karpathy/llm-council), with added blind phase (anti-anchoring), explicit engagement requirements, rotating challenger role, and social calibration mode.
29
31
 
30
32
  ## Models
31
33
 
32
- - Claude (claude-opus-4.5)
34
+ **Council (deliberators):**
33
35
  - GPT (gpt-5.2-pro)
34
36
  - Gemini (gemini-3-pro-preview)
35
37
  - Grok (grok-4)
36
38
  - Kimi (kimi-k2.5)
37
- - Judge: Claude Opus 4.5
39
+
40
+ **Judge:** Claude Opus 4.5 (synthesizes + adds own perspective)
38
41
 
39
42
  ## Installation
40
43
 
@@ -99,7 +102,9 @@ All sessions are auto-saved to `~/.frontier-council/sessions/` for later review.
99
102
  | `--share` | Upload transcript to secret GitHub Gist |
100
103
  | `--social` | Enable social calibration mode (auto-detected for interview/networking) |
101
104
  | `--persona TEXT` | Context about the person asking |
102
- | `--advocate N` | Which speaker (1-5) should be devil's advocate (default: random) |
105
+ | `--challenger MODEL` | Which model starts as challenger (gpt/gemini/grok/kimi). Rotates each round. |
106
+ | `--domain DOMAIN` | Regulatory domain context (banking, healthcare, eu, fintech, bio) |
107
+ | `--followup` | Enable interactive drill-down after judge synthesis |
103
108
  | `--quiet` | Suppress progress output |
104
109
  | `--sessions` | List recent saved sessions |
105
110
  | `--no-save` | Don't auto-save transcript to ~/.frontier-council/sessions/ |
@@ -114,9 +119,15 @@ All sessions are auto-saved to `~/.frontier-council/sessions/` for later review.
114
119
  **Deliberation Protocol:**
115
120
  1. All models see everyone's blind claims, then deliberate
116
121
  2. Each model MUST explicitly AGREE, DISAGREE, or BUILD ON previous speakers by name
117
- 3. After each round, the system checks for consensus (4/5 agreement triggers early exit)
122
+ 3. After each round, the system checks for consensus (3/4 non-challengers agreeing triggers early exit)
118
123
  4. Judge synthesizes the full deliberation
119
124
 
125
+ **Rotating Challenger:**
126
+ - One model each round is assigned the "challenger" role
127
+ - The challenger MUST argue the contrarian position and identify weaknesses in emerging consensus
128
+ - Role rotates each round (GPT R1 → Gemini R2 → Grok R3 → Kimi R4...) to ensure sustained disagreement
129
+ - Challenger is excluded from consensus detection (forced disagreement shouldn't block early exit)
130
+
120
131
  **Anonymous Deliberation:**
121
132
  - Models see each other as "Speaker 1", "Speaker 2", etc. during deliberation
122
133
  - Prevents models from playing favorites based on vendor reputation
@@ -1,17 +1,18 @@
1
1
  # Frontier Council
2
2
 
3
- Multi-model deliberation for important decisions. 5 frontier LLMs debate a question, then a judge synthesizes consensus.
3
+ Multi-model deliberation for important decisions. 4 frontier LLMs debate a question, then Claude judges and synthesizes.
4
4
 
5
- Inspired by [Andrej Karpathy's LLM Council](https://github.com/karpathy/llm-council), with added blind phase (anti-anchoring), explicit engagement requirements, devil's advocate role, and social calibration mode.
5
+ Inspired by [Andrej Karpathy's LLM Council](https://github.com/karpathy/llm-council), with added blind phase (anti-anchoring), explicit engagement requirements, rotating challenger role, and social calibration mode.
6
6
 
7
7
  ## Models
8
8
 
9
- - Claude (claude-opus-4.5)
9
+ **Council (deliberators):**
10
10
  - GPT (gpt-5.2-pro)
11
11
  - Gemini (gemini-3-pro-preview)
12
12
  - Grok (grok-4)
13
13
  - Kimi (kimi-k2.5)
14
- - Judge: Claude Opus 4.5
14
+
15
+ **Judge:** Claude Opus 4.5 (synthesizes + adds own perspective)
15
16
 
16
17
  ## Installation
17
18
 
@@ -76,7 +77,9 @@ All sessions are auto-saved to `~/.frontier-council/sessions/` for later review.
76
77
  | `--share` | Upload transcript to secret GitHub Gist |
77
78
  | `--social` | Enable social calibration mode (auto-detected for interview/networking) |
78
79
  | `--persona TEXT` | Context about the person asking |
79
- | `--advocate N` | Which speaker (1-5) should be devil's advocate (default: random) |
80
+ | `--challenger MODEL` | Which model starts as challenger (gpt/gemini/grok/kimi). Rotates each round. |
81
+ | `--domain DOMAIN` | Regulatory domain context (banking, healthcare, eu, fintech, bio) |
82
+ | `--followup` | Enable interactive drill-down after judge synthesis |
80
83
  | `--quiet` | Suppress progress output |
81
84
  | `--sessions` | List recent saved sessions |
82
85
  | `--no-save` | Don't auto-save transcript to ~/.frontier-council/sessions/ |
@@ -91,9 +94,15 @@ All sessions are auto-saved to `~/.frontier-council/sessions/` for later review.
91
94
  **Deliberation Protocol:**
92
95
  1. All models see everyone's blind claims, then deliberate
93
96
  2. Each model MUST explicitly AGREE, DISAGREE, or BUILD ON previous speakers by name
94
- 3. After each round, the system checks for consensus (4/5 agreement triggers early exit)
97
+ 3. After each round, the system checks for consensus (3/4 non-challengers agreeing triggers early exit)
95
98
  4. Judge synthesizes the full deliberation
96
99
 
100
+ **Rotating Challenger:**
101
+ - One model each round is assigned the "challenger" role
102
+ - The challenger MUST argue the contrarian position and identify weaknesses in emerging consensus
103
+ - Role rotates each round (GPT R1 → Gemini R2 → Grok R3 → Kimi R4...) to ensure sustained disagreement
104
+ - Challenger is excluded from consensus detection (forced disagreement shouldn't block early exit)
105
+
97
106
  **Anonymous Deliberation:**
98
107
  - Models see each other as "Speaker 1", "Speaker 2", etc. during deliberation
99
108
  - Prevents models from playing favorites based on vendor reputation
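The "3/4 non-challengers" figure in the protocol text can be checked against the threshold rule in the implementation plan included later in this diff (`threshold = effective_size - 1`). The sketch below is illustrative only: `consensus_threshold` is a hypothetical helper, not a function from the package, and it assumes the released code keeps the plan's all-but-one rule, which this diff does not show.

```python
def consensus_threshold(council_size: int, has_challenger: bool = True) -> tuple:
    """Return (voters, needed) under the plan's all-but-one rule (an assumption)."""
    # The challenger is dropped from the count before the threshold is computed.
    voters = council_size - 1 if has_challenger else council_size
    # The plan requires all but one of the remaining speakers to agree.
    needed = voters - 1
    return voters, needed


for size in (5, 4):
    voters, needed = consensus_threshold(size)
    print(f"{size}-model council: {needed} of {voters} non-challengers must agree")
```

Under that assumption, "3 of 4" matches the five-model council the plan was written for; a four-model council with one challenger excluded would leave three voters and need two of them.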
@@ -0,0 +1,59 @@
1
+ # Brainstorm: Rotating Challenger for Sustained Disagreement
2
+
3
+ **Date:** 2026-02-05
4
+ **Status:** Ready for planning
5
+
6
+ ## What We're Building
7
+
8
+ Modify the frontier-council deliberation architecture so that the challenger role rotates each round instead of only firing in Round 1. This ensures someone is always structurally incentivized to push back, preventing the premature convergence observed in current transcripts.
9
+
10
+ Additionally, strengthen the challenger prompt to produce sharper disagreement.
11
+
12
+ ## Why This Approach
13
+
14
+ ### Problem Observed
15
+
16
+ From transcript analysis:
17
+ - Best insights came from genuine pushback (Kimi's "stop weaving, start haunting", Gemini's "dismantle the OpenCode delegation")
18
+ - These contrarian takes got softened by Round 2
19
+ - Models converge quickly because LLMs are trained to agree
20
+ - Current challenger/advocate roles only fire in Round 1, exactly when they're least needed
21
+
22
+ ### Why Rotating Challenger
23
+
24
+ | Alternative | Why Not |
25
+ |-------------|---------|
26
+ | Position Locking | Requires state tracking, feels artificial, models may defend positions they don't believe |
27
+ | Adversarial Pairing | Major redesign, doesn't work for all question types |
28
+ | Just fix the judge | Symptom not cause — the deliberation itself converges too fast |
29
+
30
+ Rotating challenger is:
31
+ - Minimal code change (move the `if round_num == 0` check)
32
+ - Immediate impact on deliberation dynamics
33
+ - Easy to measure (compare transcripts before/after)
34
+
35
+ ## Key Decisions
36
+
37
+ 1. **Rotation pattern:** Sequential through council order (Claude → GPT → Gemini → Grok → Kimi → repeat)
38
+ 2. **Prompt strengthening:** Add explicit requirements to challenger prompt:
39
+ - Must name one specific thing that would make the emerging consensus WRONG
40
+ - Must identify the weakest assumption being made
41
+ - Cannot use phrases like "building on" or "adding nuance"
42
+ 3. **Merge advocate and challenger:** Remove the redundant devil's advocate role. One challenger role is enough.
43
+ 4. **All rounds:** Challenger fires every round, not just Round 1
44
+
45
+ ## Open Questions
46
+
47
+ 1. Should challenger be excluded from consensus detection? (If the challenger is forced to disagree, they shouldn't count toward "4/5 agree")
48
+ 2. Should we track which model was challenger in the output metadata?
49
+ 3. Does strengthening the prompt risk making disagreement feel forced/artificial?
50
+
51
+ ## Success Criteria
52
+
53
+ - Transcripts show sustained disagreement through Round 2+
54
+ - Contrarian perspectives survive to judge synthesis
55
+ - Judge explicitly notes unresolved tensions (may need separate prompt tweak)
56
+
57
+ ## Next Steps
58
+
59
+ Run `/workflows:plan` to create implementation plan.
@@ -0,0 +1,280 @@
1
+ ---
2
+ title: "feat: Rotating Challenger for Sustained Disagreement"
3
+ type: feat
4
+ date: 2026-02-05
5
+ brainstorm: docs/brainstorms/2026-02-05-rotating-challenger-brainstorm.md
6
+ ---
7
+
8
+ # feat: Rotating Challenger for Sustained Disagreement
9
+
10
+ ## Overview
11
+
12
+ Modify frontier-council so the challenger role rotates each round instead of only firing in Round 1. This ensures someone is always structurally incentivized to push back, preventing the premature convergence observed in transcripts.
13
+
14
+ ## Problem Statement
15
+
16
+ From transcript analysis:
17
+ - Best insights came from genuine pushback (Kimi's "stop weaving, start haunting", Gemini's "dismantle delegation")
18
+ - These contrarian takes got softened by Round 2
19
+ - Current challenger/advocate roles only apply in `round_num == 0`
20
+ - Devil's advocate and challenger prompts are **nearly identical** — redundant complexity
21
+
22
+ ## Proposed Solution
23
+
24
+ 1. **Merge advocate and challenger** into single "challenger" role
25
+ 2. **Rotate challenger each round** — Claude R1 → GPT R2 → Gemini R3 → Grok R4 → Kimi R5 → wrap
26
+ 3. **Strengthen challenger prompt** with explicit anti-convergence requirements
27
+ 4. **Exclude challenger from consensus detection** so forced disagreement doesn't block early exit
28
+
29
+ ## Technical Approach
30
+
31
+ ### Files to Modify
32
+
33
+ | File | Changes |
34
+ |------|---------|
35
+ | `council.py` | Remove `devils_advocate_addition`, modify role application logic, update consensus detection |
36
+ | `cli.py` | Deprecate `--advocate` with warning, update `--challenger` semantics |
37
+
38
+ ### Implementation Steps
39
+
40
+ #### Step 1: Merge Advocate and Challenger Prompts
41
+
42
+ **File:** `council.py` lines 783-837
43
+
44
+ Remove `devils_advocate_addition` (lines 783-794). Keep only `challenger_addition` and strengthen it:
45
+
46
+ ```python
47
+ # council.py ~line 826
48
+ challenger_addition = """
49
+
50
+ SPECIAL ROLE: You are the CHALLENGER for this round. Your job is to argue the CONTRARIAN position.
51
+
52
+ REQUIREMENTS:
53
+ 1. You MUST explicitly DISAGREE with at least one major point from the other speakers
54
+ 2. Identify the weakest assumption in the emerging consensus and attack it
55
+ 3. Name ONE specific thing that would make the consensus WRONG
56
+ 4. You CANNOT use phrases like "building on", "adding nuance", or "I largely agree"
57
+ 5. If everyone is converging too fast, that's a red flag — find the hidden complexity
58
+
59
+ Even if you ultimately agree with the direction, you MUST articulate the strongest possible counter-argument.
60
+ If you can't find real disagreement, explain why the consensus might be groupthink."""
61
+ ```
62
+
63
+ #### Step 2: Rotate Challenger Each Round
64
+
65
+ **File:** `council.py` lines 879-883
66
+
67
+ Current code:
68
+ ```python
69
+ if idx == advocate_idx and round_num == 0:
70
+ system_prompt += devils_advocate_addition
71
+
72
+ if idx == challenger_idx and round_num == 0:
73
+ system_prompt += challenger_addition
74
+ ```
75
+
76
+ Replace with:
77
+ ```python
78
+ # Calculate rotating challenger for this round
79
+ if challenger_idx is not None:
80
+ # Explicit --challenger sets starting point, then rotates
81
+ current_challenger = (challenger_idx + round_num) % len(council_config)
82
+ else:
83
+ # Default: start with Claude (index 0), rotate through council
84
+ current_challenger = round_num % len(council_config)
85
+
86
+ if idx == current_challenger:
87
+ system_prompt += challenger_addition
88
+ ```
89
+
90
+ #### Step 3: Update Consensus Detection
91
+
92
+ **File:** `council.py` lines 545-564
93
+
94
+ Modify `detect_consensus` to accept and exclude challenger:
95
+
96
+ ```python
97
+ def detect_consensus(
98
+ conversation: list[tuple[str, str]],
99
+ council_config: list,
100
+ current_challenger_idx: int | None = None
101
+ ) -> tuple[bool, str]:
102
+ """Detect if council has converged. Returns (converged, reason)."""
103
+ council_size = len(council_config)
104
+
105
+ if len(conversation) < council_size:
106
+ return False, "insufficient responses"
107
+
108
+ recent = conversation[-council_size:]
109
+
110
+ # Exclude challenger from consensus count
111
+ if current_challenger_idx is not None:
112
+ challenger_name = council_config[current_challenger_idx][0]
113
+ recent = [(name, text) for name, text in recent if name != challenger_name]
114
+
115
+ effective_size = len(recent)
116
+ threshold = effective_size - 1 # Need all-but-one non-challengers to agree
117
+
118
+ consensus_count = sum(1 for _, text in recent if "CONSENSUS:" in text.upper())
119
+ if consensus_count >= threshold:
120
+ return True, "explicit consensus signals"
121
+
122
+ agreement_phrases = ["i agree with", "i concur", "we all agree", "consensus emerging"]
123
+ agreement_count = sum(
124
+ 1 for _, text in recent
125
+ if any(phrase in text.lower() for phrase in agreement_phrases)
126
+ )
127
+ if agreement_count >= threshold:
128
+ return True, "agreement language detected"
129
+
130
+ return False, "no consensus"
131
+ ```
132
+
133
+ Update the call site (~line 943):
134
+ ```python
135
+ current_challenger = (challenger_idx + round_num) % len(council_config) if challenger_idx is not None else round_num % len(council_config)
136
+ converged, reason = detect_consensus(conversation, council_config, current_challenger)
137
+ ```
138
+
139
+ #### Step 4: Deprecate --advocate Flag
140
+
141
+ **File:** `cli.py` lines 100-105 and 212
142
+
143
+ Add deprecation warning:
144
+ ```python
145
+ # cli.py ~line 212
146
+ if args.advocate:
147
+ print("Warning: --advocate is deprecated. Use --challenger instead.", file=sys.stderr)
148
+ # Map speaker number (1-5) to model name for backward compat
149
+ model_names = [n for n, _, _ in COUNCIL]
150
+ mapped_model = model_names[args.advocate - 1]
151
+ print(f" Mapping --advocate {args.advocate} to --challenger {mapped_model.lower()}", file=sys.stderr)
152
+ if not args.challenger:
153
+ args.challenger = mapped_model.lower()
154
+ ```
155
+
156
+ #### Step 5: Update Transcript Output
157
+
158
+ Show challenger indicator in round headers:
159
+
160
+ ```python
161
+ # council.py ~line 908 (in the speaker output section)
162
+ challenger_indicator = " (challenger)" if idx == current_challenger else ""
163
+ output_parts.append(f"### {name}{challenger_indicator}\n{response}")
164
+ ```
165
+
166
+ ### Function Signature Changes
167
+
168
+ **`run_council`** (lines 706-723):
169
+ - Remove `advocate_idx` parameter
170
+ - Keep `challenger_idx` (now means "starting challenger")
171
+
172
+ ```python
173
+ def run_council(
174
+ question: str,
175
+ council_config: list[tuple[str, str, tuple[str, str] | None]],
176
+ api_key: str,
177
+ google_api_key: str | None = None,
178
+ moonshot_api_key: str | None = None,
179
+ rounds: int = 1,
180
+ verbose: bool = True,
181
+ anonymous: bool = True,
182
+ blind: bool = True,
183
+ context: str | None = None,
184
+ social_mode: bool = False,
185
+ persona: str | None = None,
186
+ # advocate_idx removed
187
+ domain: str | None = None,
188
+ challenger_idx: int | None = None, # Now means "starting challenger"
189
+ format: str = "prose",
190
+ ) -> tuple[str, list[str]]:
191
+ ```
192
+
193
+ ## Acceptance Criteria
194
+
195
+ ### Functional Requirements
196
+ - [x] Challenger role rotates each round (R1: model 0, R2: model 1, etc.)
197
+ - [x] `--challenger X` sets starting point, then rotates
198
+ - [x] `--advocate` shows deprecation warning and maps to `--challenger`
199
+ - [x] Challenger excluded from consensus detection
200
+ - [x] Transcript shows which model is challenger each round
201
+ - [x] Claude removed from council (judge-only)
202
+ - [x] Judge has own voice with "Judge's Own Take" section
203
+
204
+ ### Non-Functional Requirements
205
+ - [x] No breaking changes to existing scripts (deprecation, not removal)
206
+ - [x] Tests pass for new rotation logic
207
+ - [x] README updated with new behavior
208
+ - [x] Tests updated for 4-model council
209
+
210
+ ## Success Metrics
211
+
212
+ Compare transcripts before/after:
213
+ - Sustained disagreement through Round 2+
214
+ - Contrarian perspectives survive to judge synthesis
215
+ - Judge notes unresolved tensions (may need separate prompt tweak)
216
+
217
+ ## Testing Plan
218
+
219
+ ### Unit Tests
220
+
221
+ Add to `tests/test_utils.py`:
222
+
223
+ ```python
224
+ class TestRotatingChallenger:
225
+ def test_challenger_rotates_default(self):
226
+ """Challenger rotates through council order by default."""
227
+ # R0: index 0, R1: index 1, R2: index 2...
228
+ assert get_challenger_for_round(None, 0, 5) == 0
229
+ assert get_challenger_for_round(None, 1, 5) == 1
230
+ assert get_challenger_for_round(None, 4, 5) == 4
231
+ assert get_challenger_for_round(None, 5, 5) == 0 # wraps
232
+
233
+ def test_challenger_rotates_from_explicit(self):
234
+ """Explicit --challenger sets starting point."""
235
+ # --challenger gemini (index 2): R0=2, R1=3, R2=4, R3=0...
236
+ assert get_challenger_for_round(2, 0, 5) == 2
237
+ assert get_challenger_for_round(2, 1, 5) == 3
238
+ assert get_challenger_for_round(2, 3, 5) == 0 # wraps
239
+
240
+ class TestConsensusWithChallenger:
241
+ def test_consensus_excludes_challenger(self):
242
+ """Challenger's agreement doesn't count toward consensus."""
243
+ conversation = [
244
+ ("Claude", "CONSENSUS: I agree"),
245
+ ("GPT", "CONSENSUS: agreed"),
246
+ ("Gemini", "CONSENSUS: yes"), # challenger
247
+ ("Grok", "CONSENSUS: agreed"),
248
+ ("Kimi", "different view"),
249
+ ]
250
+ council_config = [("Claude",), ("GPT",), ("Gemini",), ("Grok",), ("Kimi",)]
251
+ # Gemini (index 2) is challenger, excluded
252
+ # 3 of 4 non-challengers agree = consensus
253
+ converged, _ = detect_consensus(conversation, council_config, 2)
254
+ assert converged
255
+ ```
256
+
257
+ ### Integration Test
258
+
259
+ ```bash
260
+ # Run with 3 rounds, verify rotation in transcript
261
+ frontier-council "test question" --rounds 3 --output /tmp/test.md
262
+ grep -E "### .+ \(challenger\)" /tmp/test.md
263
+ # Should show 3 different models as challenger
264
+ ```
265
+
266
+ ## Risks and Mitigations
267
+
268
+ | Risk | Mitigation |
269
+ |------|------------|
270
+ | Forced disagreement feels artificial | Prompt says "even if you ultimately agree" — models can agree after challenging |
271
+ | Breaking scripts using `--advocate` | Deprecation warning + automatic mapping, not hard removal |
272
+ | Consensus detection edge cases | Thorough unit tests for threshold math |
273
+
274
+ ## References
275
+
276
+ - Brainstorm: `docs/brainstorms/2026-02-05-rotating-challenger-brainstorm.md`
277
+ - Current challenger impl: `council.py:826-837`
278
+ - Current advocate impl: `council.py:783-794` (to be removed)
279
+ - Consensus detection: `council.py:545-564`
280
+ - Deliberation loop: `council.py:839-947`
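The unit tests in the plan call `get_challenger_for_round`, but the plan never defines it. A minimal sketch consistent with the Step 2 rotation logic and with the assertions in the testing plan might look like the following; the parameter names and the example council order (taken from the README's deliberator list) are assumptions, not the package's actual code.

```python
from typing import Optional


def get_challenger_for_round(start_idx: Optional[int], round_num: int, council_size: int) -> int:
    """Return the index of the challenger for a zero-based round number.

    start_idx is the index chosen via --challenger, or None for the default
    (assumed here to mean "start at index 0", as in the plan's Step 2).
    """
    start = 0 if start_idx is None else start_idx
    # Advance one seat per round and wrap at the end of the council.
    return (start + round_num) % council_size


# Hypothetical council order, mirroring the README's list of deliberators.
council_names = ["GPT", "Gemini", "Grok", "Kimi"]
for round_num in range(5):
    idx = get_challenger_for_round(1, round_num, len(council_names))
    print(f"Round {round_num + 1}: {council_names[idx]}")
```

Starting from index 1, the loop prints Gemini, Grok, Kimi, GPT, and then Gemini again once the rotation wraps.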
@@ -1,6 +1,6 @@
1
1
  """Frontier Council - Multi-model deliberation for important decisions."""
2
2
 
3
- __version__ = "0.1.2"
3
+ __version__ = "0.1.3"
4
4
 
5
5
  from .council import (
6
6
  run_council,
@@ -29,6 +29,8 @@ from .council import (
29
29
  COUNCIL,
30
30
  detect_social_context,
31
31
  run_council,
32
+ DOMAIN_CONTEXTS,
33
+ run_followup_discussion,
32
34
  )
33
35
 
34
36
 
@@ -42,9 +44,10 @@ Examples:
42
44
  frontier-council "What questions should I ask?" --social
43
45
  frontier-council "Career decision" --persona "builder who hates process work"
44
46
  frontier-council "Architecture choice" --rounds 3 --output transcript.md
47
+ frontier-council "Decision" --domain banking --followup --output counsel.md
45
48
  """,
46
49
  )
47
- parser.add_argument("question", help="The question for the council to deliberate")
50
+ parser.add_argument("question", nargs="?", help="The question for the council to deliberate")
48
51
  parser.add_argument(
49
52
  "--rounds",
50
53
  type=int,
@@ -74,6 +77,12 @@ Examples:
74
77
  "--context", "-c",
75
78
  help="Context hint for the judge (e.g., 'architecture decision', 'ethics question')",
76
79
  )
80
+ parser.add_argument(
81
+ "--format", "-f",
82
+ choices=["json", "yaml", "prose"],
83
+ default="prose",
84
+ help="Output format: json (machine-parseable), yaml (structured), prose (default)",
85
+ )
77
86
  parser.add_argument(
78
87
  "--share",
79
88
  action="store_true",
@@ -92,7 +101,20 @@ Examples:
92
101
  "--advocate",
93
102
  type=int,
94
103
  choices=[1, 2, 3, 4, 5],
95
- help="Which speaker (1-5) should be devil's advocate (default: random)",
104
+ help="DEPRECATED: Use --challenger instead. Maps to --challenger by model name.",
105
+ )
106
+ parser.add_argument(
107
+ "--domain",
108
+ help="Regulatory domain context (banking, healthcare, eu, fintech, bio)",
109
+ )
110
+ parser.add_argument(
111
+ "--challenger",
112
+ help="Which model should argue contrarian (claude, gpt, gemini, grok, kimi). Default: claude",
113
+ )
114
+ parser.add_argument(
115
+ "--followup",
116
+ action="store_true",
117
+ help="Enable followup mode to drill into specific points after judge synthesis",
96
118
  )
97
119
  parser.add_argument(
98
120
  "--no-save",
@@ -121,12 +143,44 @@ Examples:
121
143
  print(f"\n ... and {len(sessions) - 20} more")
122
144
  sys.exit(0)
123
145
 
146
+ # Require question for normal operation
147
+ if not args.question:
148
+ parser.error("the following arguments are required: question")
149
+
124
150
  # Auto-detect social context if not explicitly set
125
151
  social_mode = args.social or detect_social_context(args.question)
126
152
  if social_mode and not args.social and not args.quiet:
127
153
  print("(Auto-detected social context - enabling social calibration mode)")
128
154
  print()
129
155
 
156
+ # Validate and resolve domain
157
+ domain_context = None
158
+ if args.domain:
159
+ if args.domain.lower() not in DOMAIN_CONTEXTS:
160
+ print(f"Error: Unknown domain '{args.domain}'. Valid domains: {', '.join(DOMAIN_CONTEXTS.keys())}", file=sys.stderr)
161
+ sys.exit(1)
162
+ domain_context = args.domain.lower()
163
+
164
+ # Resolve challenger model
165
+ challenger_idx = None
166
+ if args.challenger:
167
+ challenger_lower = args.challenger.lower()
168
+ model_name_map = {n.lower(): i for i, (n, _, _) in enumerate(COUNCIL)}
169
+ if challenger_lower not in model_name_map:
170
+ print(f"Error: Unknown model '{args.challenger}'. Valid models: {', '.join(n for n, _, _ in COUNCIL)}", file=sys.stderr)
171
+ sys.exit(1)
172
+ challenger_idx = model_name_map[challenger_lower]
173
+ elif args.domain:
174
+ # Default challenger: GPT (index 0) when domain is set
175
+ # Reasoning: Grok is naturally contrarian anyway, so assigning GPT as challenger
176
+ # gives you two sources of pushback
177
+ challenger_idx = 0
178
+
179
+ if not args.quiet and challenger_idx is not None:
180
+ challenger_name = COUNCIL[challenger_idx][0]
181
+ print(f"(Contrarian challenger: {challenger_name})")
182
+ print()
183
+
130
184
  # Get API keys
131
185
  api_key = os.environ.get("OPENROUTER_API_KEY")
132
186
  if not api_key:
@@ -155,14 +209,28 @@ Examples:
155
209
  print()
156
210
 
157
211
  try:
158
- advocate_idx = (args.advocate - 1) if args.advocate else random.randint(0, len(COUNCIL) - 1)
212
+ # Handle deprecated --advocate flag
213
+ if args.advocate:
214
+ print("Warning: --advocate is deprecated. Use --challenger instead.", file=sys.stderr)
215
+ model_names = [n for n, _, _ in COUNCIL]
216
+ mapped_model = model_names[args.advocate - 1].lower()
217
+ print(f" Mapping --advocate {args.advocate} to --challenger {mapped_model}", file=sys.stderr)
218
+ if not args.challenger:
219
+ args.challenger = mapped_model
220
+ # Re-resolve challenger_idx after mapping
221
+ challenger_lower = args.challenger.lower()
222
+ model_name_map = {n.lower(): i for i, (n, _, _) in enumerate(COUNCIL)}
223
+ challenger_idx = model_name_map.get(challenger_lower, 0)
159
224
 
160
225
  if not args.quiet and args.persona:
161
226
  print(f"(Persona context: {args.persona})")
162
227
  print()
228
+
229
+ # Show starting challenger (now rotates each round)
163
230
  if not args.quiet:
164
- advocate_name = COUNCIL[advocate_idx][0]
165
- print(f"(Devil's advocate: {advocate_name})")
231
+ starting_challenger_idx = challenger_idx if challenger_idx is not None else 0
232
+ starting_challenger_name = COUNCIL[starting_challenger_idx][0]
233
+ print(f"(Starting challenger: {starting_challenger_name}, rotates each round)")
166
234
  print()
167
235
 
168
236
  transcript, failed_models = run_council(
@@ -178,9 +246,32 @@ Examples:
178
246
  context=args.context,
179
247
  social_mode=social_mode,
180
248
  persona=args.persona,
181
- advocate_idx=advocate_idx,
249
+ domain=domain_context,
250
+ challenger_idx=challenger_idx,
251
+ format=args.format,
182
252
  )
183
253
 
254
+ # Followup mode
255
+ followup_transcript = ""
256
+ if args.followup and not args.quiet:
257
+ print("\n" + "=" * 60)
258
+ print("Enter topic to explore further (or 'done'): ", end="", flush=True)
259
+ topic = input().strip()
260
+
261
+ if topic and topic.lower() != "done":
262
+ domain_ctxt = DOMAIN_CONTEXTS.get(domain_context, "") if domain_context else ""
263
+ followup_transcript = run_followup_discussion(
264
+ question=args.question,
265
+ topic=topic,
266
+ council_config=COUNCIL,
267
+ api_key=api_key,
268
+ domain_context=domain_ctxt,
269
+ social_mode=social_mode,
270
+ persona=args.persona,
271
+ verbose=not args.quiet,
272
+ )
273
+ transcript += "\n\n" + followup_transcript
274
+
184
275
  # Print failure summary
185
276
  if failed_models and not args.quiet:
186
277
  print()
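One thing to check in the `--advocate` compatibility path above: the flag still accepts `choices=[1, 2, 3, 4, 5]`, while the README now lists four deliberators. This diff does not show the contents of `COUNCIL`, so the following is only a sketch under the assumption that it has four entries, in which case `model_names[args.advocate - 1]` would raise `IndexError` for `--advocate 5`. `map_advocate_to_challenger` is a hypothetical helper showing one way to bounds-check the mapping.

```python
from typing import Optional, Sequence


def map_advocate_to_challenger(advocate: int, model_names: Sequence[str]) -> Optional[str]:
    """Map a 1-based --advocate speaker number to a lowercase model name.

    Returns None when the number does not correspond to a council member,
    so the caller can warn instead of failing with IndexError.
    """
    if 1 <= advocate <= len(model_names):
        return model_names[advocate - 1].lower()
    return None


# Assumed four-model council, mirroring the README's list of deliberators.
names = ["GPT", "Gemini", "Grok", "Kimi"]
print(map_advocate_to_challenger(2, names))
print(map_advocate_to_challenger(5, names))
```

With the assumed list, speaker 2 maps to `gemini` and speaker 5 maps to `None`, which the caller could treat as "fall back to the default challenger".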