massgen 0.0.3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of massgen might be problematic. Click here for more details.

Files changed (76) hide show
  1. massgen/__init__.py +94 -0
  2. massgen/agent_config.py +507 -0
  3. massgen/backend/CLAUDE_API_RESEARCH.md +266 -0
  4. massgen/backend/Function calling openai responses.md +1161 -0
  5. massgen/backend/GEMINI_API_DOCUMENTATION.md +410 -0
  6. massgen/backend/OPENAI_RESPONSES_API_FORMAT.md +65 -0
  7. massgen/backend/__init__.py +25 -0
  8. massgen/backend/base.py +180 -0
  9. massgen/backend/chat_completions.py +228 -0
  10. massgen/backend/claude.py +661 -0
  11. massgen/backend/gemini.py +652 -0
  12. massgen/backend/grok.py +187 -0
  13. massgen/backend/response.py +397 -0
  14. massgen/chat_agent.py +440 -0
  15. massgen/cli.py +686 -0
  16. massgen/configs/README.md +293 -0
  17. massgen/configs/creative_team.yaml +53 -0
  18. massgen/configs/gemini_4o_claude.yaml +31 -0
  19. massgen/configs/news_analysis.yaml +51 -0
  20. massgen/configs/research_team.yaml +51 -0
  21. massgen/configs/single_agent.yaml +18 -0
  22. massgen/configs/single_flash2.5.yaml +44 -0
  23. massgen/configs/technical_analysis.yaml +51 -0
  24. massgen/configs/three_agents_default.yaml +31 -0
  25. massgen/configs/travel_planning.yaml +51 -0
  26. massgen/configs/two_agents.yaml +39 -0
  27. massgen/frontend/__init__.py +20 -0
  28. massgen/frontend/coordination_ui.py +945 -0
  29. massgen/frontend/displays/__init__.py +24 -0
  30. massgen/frontend/displays/base_display.py +83 -0
  31. massgen/frontend/displays/rich_terminal_display.py +3497 -0
  32. massgen/frontend/displays/simple_display.py +93 -0
  33. massgen/frontend/displays/terminal_display.py +381 -0
  34. massgen/frontend/logging/__init__.py +9 -0
  35. massgen/frontend/logging/realtime_logger.py +197 -0
  36. massgen/message_templates.py +431 -0
  37. massgen/orchestrator.py +1222 -0
  38. massgen/tests/__init__.py +10 -0
  39. massgen/tests/multi_turn_conversation_design.md +214 -0
  40. massgen/tests/multiturn_llm_input_analysis.md +189 -0
  41. massgen/tests/test_case_studies.md +113 -0
  42. massgen/tests/test_claude_backend.py +310 -0
  43. massgen/tests/test_grok_backend.py +160 -0
  44. massgen/tests/test_message_context_building.py +293 -0
  45. massgen/tests/test_rich_terminal_display.py +378 -0
  46. massgen/tests/test_v3_3agents.py +117 -0
  47. massgen/tests/test_v3_simple.py +216 -0
  48. massgen/tests/test_v3_three_agents.py +272 -0
  49. massgen/tests/test_v3_two_agents.py +176 -0
  50. massgen/utils.py +79 -0
  51. massgen/v1/README.md +330 -0
  52. massgen/v1/__init__.py +91 -0
  53. massgen/v1/agent.py +605 -0
  54. massgen/v1/agents.py +330 -0
  55. massgen/v1/backends/gemini.py +584 -0
  56. massgen/v1/backends/grok.py +410 -0
  57. massgen/v1/backends/oai.py +571 -0
  58. massgen/v1/cli.py +351 -0
  59. massgen/v1/config.py +169 -0
  60. massgen/v1/examples/fast-4o-mini-config.yaml +44 -0
  61. massgen/v1/examples/fast_config.yaml +44 -0
  62. massgen/v1/examples/production.yaml +70 -0
  63. massgen/v1/examples/single_agent.yaml +39 -0
  64. massgen/v1/logging.py +974 -0
  65. massgen/v1/main.py +368 -0
  66. massgen/v1/orchestrator.py +1138 -0
  67. massgen/v1/streaming_display.py +1190 -0
  68. massgen/v1/tools.py +160 -0
  69. massgen/v1/types.py +245 -0
  70. massgen/v1/utils.py +199 -0
  71. massgen-0.0.3.dist-info/METADATA +568 -0
  72. massgen-0.0.3.dist-info/RECORD +76 -0
  73. massgen-0.0.3.dist-info/WHEEL +5 -0
  74. massgen-0.0.3.dist-info/entry_points.txt +2 -0
  75. massgen-0.0.3.dist-info/licenses/LICENSE +204 -0
  76. massgen-0.0.3.dist-info/top_level.txt +1 -0
@@ -0,0 +1,10 @@
1
+ """
2
+ MassGen Tests
3
+
4
+ Test suite for MassGen functionality including:
5
+ - Basic agent functionality
6
+ - Tool handling
7
+ - Orchestrator coordination
8
+ - Multi-agent scenarios
9
+ - Frontend displays
10
+ """
@@ -0,0 +1,214 @@
1
+ # Multi-Turn Conversation Design for MassGen Orchestrator
2
+
3
+ ## Overview
4
+
5
+ This document outlines the design approach for implementing multi-turn conversations in the MassGen orchestrator, based on the proven approach used in MassGen v0.0.1.
6
+
7
+ ## Current State
8
+
9
+ The current orchestrator has **partial multi-turn support**:
10
+ - ✅ Accepts conversation history through `chat(messages)` interface
11
+ - ✅ Maintains conversation history at orchestrator level
12
+ - ❌ **Limited**: Only processes the last user message for coordination
13
+ - ❌ **CLI Issue**: Multi-agent mode with history bypasses coordination UI display
14
+
15
+ ## V0.0.1 Approach Analysis
16
+
17
+ ### Key Innovation: Dynamic Context Reconstruction
18
+
19
+ V0.0.1 implements multi-turn conversations through **dynamic message reconstruction** rather than persistent conversation state:
20
+
21
+ ```python
22
+ def work_on_task(self, task: TaskInput) -> List[Dict[str, str]]:
23
+ # Initialize working messages
24
+ working_status, working_messages, all_tools = self._get_curr_messages_and_tools(task)
25
+
26
+ while curr_round < self.max_rounds and self.state.status == "working":
27
+ # Process messages...
28
+
29
+ # When agents need to restart due to updates:
30
+ if renew_conversation:
31
+ # Rebuild conversation with latest context
32
+ working_status, working_messages, all_tools = self._get_curr_messages_and_tools(task)
33
+ ```
34
+
35
+ ### Core Principles
36
+
37
+ 1. **Dynamic Context Generation**: Agents don't maintain persistent conversations - they regenerate context each time they restart
38
+ 2. **Fresh State on Updates**: When other agents provide new answers, the conversation context is rebuilt with latest information
39
+ 3. **Multi-layered Context**: Conversations include:
40
+ - System instructions
41
+ - Original task/question
42
+ - Current answers from all agents
43
+ - Voting information (when applicable)
44
+
45
+ ### Context Reconstruction Method
46
+
47
+ ```python
48
+ def _get_curr_messages_and_tools(self, task: TaskInput):
49
+ """Get the current messages and tools for the agent."""
50
+ working_status, user_input = self._get_task_input(task) # Includes latest agent answers
51
+ working_messages = self._get_task_input_messages(user_input) # System + user messages
52
+ all_tools = self._get_available_tools() # Current tool set
53
+ return working_status, working_messages, all_tools
54
+ ```
55
+
56
+ ## Proposed Implementation
57
+
58
+ ### 1. Orchestrator-Level Conversation Management
59
+
60
+ ```python
61
+ class Orchestrator:
62
+ def __init__(self, ...):
63
+ self.conversation_history: List[Dict[str, Any]] = [] # User conversation
64
+ self.current_coordination_context: Optional[str] = None
65
+
66
+ async def chat(self, messages: List[Dict[str, Any]], ...):
67
+ # Extract full conversation context
68
+ conversation_context = self._build_conversation_context(messages)
69
+
70
+ # Start coordination with full context
71
+ async for chunk in self._coordinate_agents(conversation_context):
72
+ yield chunk
73
+ ```
74
+
75
+ ### 2. Context-Aware Agent Coordination
76
+
77
+ ```python
78
+ async def _coordinate_agents(self, conversation_context: Dict[str, Any]):
79
+ """Coordinate agents with full conversation context."""
80
+
81
+ # Build enriched task context including:
82
+ # - Original conversation history
83
+ # - Current user message
84
+ # - Existing agent answers (if any)
85
+
86
+ for agent_id in self.agents:
87
+ # Each agent gets full context when starting/restarting
88
+ agent_context = self._build_agent_context(
89
+ conversation_history=conversation_context['history'],
90
+ current_task=conversation_context['current_message'],
91
+ agent_answers=self._get_current_answers(),
92
+ voting_state=self._get_voting_state()
93
+ )
94
+
95
+ # Agent processes with full context
96
+ await self._stream_agent_execution(agent_id, agent_context)
97
+ ```
98
+
99
+ ### 3. Dynamic Context Rebuilding
100
+
101
+ ```python
102
+ def _build_agent_context(self, conversation_history, current_task, agent_answers, voting_state):
103
+ """Build agent context dynamically based on current state."""
104
+
105
+ # Format conversation history for agent context
106
+ history_context = self._format_conversation_history(conversation_history)
107
+
108
+ # Format current coordination state
109
+ coordination_context = self.message_templates.build_coordination_context(
110
+ current_task=current_task,
111
+ conversation_history=history_context,
112
+ agent_answers=agent_answers,
113
+ voting_state=voting_state
114
+ )
115
+
116
+ return {
117
+ "system_message": self.message_templates.system_message_with_context(history_context),
118
+ "user_message": coordination_context,
119
+ "tools": self.workflow_tools
120
+ }
121
+ ```
122
+
123
+ ### 4. Message Template Updates
124
+
125
+ ```python
126
+ class MessageTemplates:
127
+ def build_coordination_context(self, current_task, conversation_history, agent_answers, voting_state):
128
+ """Build coordination context including conversation history."""
129
+
130
+ context_parts = []
131
+
132
+ # Add conversation history if present
133
+ if conversation_history:
134
+ context_parts.append(f"""
135
+ <CONVERSATION_HISTORY>
136
+ {self._format_conversation_for_agent(conversation_history)}
137
+ </CONVERSATION_HISTORY>
138
+ """)
139
+
140
+ # Add current task
141
+ context_parts.append(f"""
142
+ <CURRENT_MESSAGE>
143
+ {current_task}
144
+ </CURRENT_MESSAGE>
145
+ """)
146
+
147
+ # Add agent answers if any exist
148
+ if agent_answers:
149
+ context_parts.append(f"""
150
+ <CURRENT_ANSWERS>
151
+ {self._format_agent_answers(agent_answers)}
152
+ </CURRENT_ANSWERS>
153
+ """)
154
+
155
+ return "\n".join(context_parts)
156
+ ```
157
+
158
+ ## Implementation Benefits
159
+
160
+ ### 1. **True Multi-Turn Support**
161
+ - Agents understand full conversation context, not just current message
162
+ - Natural conversation flow maintained across coordination rounds
163
+ - Context-aware responses that reference previous exchanges
164
+
165
+ ### 2. **Dynamic State Management**
166
+ - Agents always work with latest information from all sources
167
+ - No stale conversation state issues
168
+ - Clean restart mechanism when coordination state changes
169
+
170
+ ### 3. **Scalable Architecture**
171
+ - Conversation history managed centrally at orchestrator level
172
+ - Agents remain stateless - context provided on-demand
173
+ - Easy to extend for different conversation patterns
174
+
175
+ ### 4. **Backward Compatibility**
176
+ - Existing single-turn usage patterns continue to work
177
+ - Gradual migration path for CLI and frontend improvements
178
+
179
+ ## Implementation Priority
180
+
181
+ This approach addresses multiple TODO items:
182
+
183
+ - **HIGH PRIORITY**: Support chat with an orchestrator (core multi-agent functionality)
184
+ - **MEDIUM PRIORITY**: Fix CLI multi-turn conversation display in multi-agent mode
185
+ - **MEDIUM PRIORITY**: Port missing features from v0.0.1
186
+
187
+ ## Next Steps
188
+
189
+ 1. **Phase 1**: Update message templates to support conversation context
190
+ 2. **Phase 2**: Modify orchestrator coordination to pass full context to agents
191
+ 3. **Phase 3**: Update CLI to properly display coordination with conversation history
192
+ 4. **Phase 4**: Add conversation management utilities and testing
193
+
194
+ ## Technical Notes
195
+
196
+ ### Context Size Management
197
+ - Monitor conversation history length to prevent token limit issues
198
+ - Implement conversation truncation strategies for very long histories
199
+ - Consider conversation summarization for extended sessions
200
+
201
+ ### Performance Considerations
202
+ - Context rebuilding is lightweight (no persistent state management)
203
+ - Memory usage scales with conversation length, not coordination complexity
204
+ - Caching opportunities for repeated context elements
205
+
206
+ ### Testing Strategy
207
+ - Unit tests for context building methods
208
+ - Integration tests for multi-turn coordination scenarios
209
+ - CLI testing with conversation history of various lengths
210
+ - Edge case testing (empty history, very long conversations, etc.)
211
+
212
+ ---
213
+
214
+ *This design document is based on analysis of MassGen v0.0.1's proven multi-turn approach and adapted for the current async streaming architecture.*
@@ -0,0 +1,189 @@
1
+ # Multi-Turn LLM Input Analysis - MassGen
2
+
3
+ ## Overview
4
+
5
+ This document shows the exact input structure sent to LLMs during multi-turn conversations in MassGen, demonstrating how conversation context is built and passed to agents.
6
+
7
+ ## Context Building Progression
8
+
9
+ ### Turn 1: Initial Question (No History)
10
+
11
+ **Context Size:** 568 characters total
12
+ - System Message: 389 chars
13
+ - User Message: 179 chars
14
+ - Tools: 2 (new_answer, vote)
15
+
16
+ **System Message:**
17
+ ```
18
+ You are evaluating answers from multiple agents for final response to a message. Does the best CURRENT ANSWER address the ORIGINAL MESSAGE?
19
+
20
+ If YES, use the `vote` tool to record your vote and skip the `new_answer` tool.
21
+ Otherwise, do additional work first, then use the `new_answer` tool to record a better answer to the ORIGINAL MESSAGE. Make sure you actually call one of the two tools.
22
+ ```
23
+
24
+ **User Message Structure:**
25
+ ```
26
+ <ORIGINAL MESSAGE> What are the main benefits of renewable energy? <END OF ORIGINAL MESSAGE>
27
+
28
+ <CURRENT ANSWERS from the agents>
29
+ (no answers available yet)
30
+ <END OF CURRENT ANSWERS>
31
+ ```
32
+
33
+ **Key Features:**
34
+ - ❌ No conversation history section
35
+ - ✅ Original message clearly marked
36
+ - ✅ Empty current answers section
37
+ - ❌ Standard system message (no context awareness)
38
+
39
+ ---
40
+
41
+ ### Turn 2: Follow-up with History
42
+
43
+ **Context Size:** 1,152 characters total (+103% from Turn 1)
44
+ - System Message: 574 chars (+47%)
45
+ - User Message: 578 chars (+223%)
46
+ - Tools: 2 (same)
47
+
48
+ **System Message (Enhanced):**
49
+ ```
50
+ You are evaluating answers from multiple agents for final response to a message. Does the best CURRENT ANSWER address the ORIGINAL MESSAGE?
51
+
52
+ If YES, use the `vote` tool to record your vote and skip the `new_answer` tool.
53
+ Otherwise, do additional work first, then use the `new_answer` tool to record a better answer to the ORIGINAL MESSAGE. Make sure you actually call one of the two tools.
54
+
55
+ IMPORTANT: You are responding to the latest message in an ongoing conversation. Consider the full conversation context when evaluating answers and providing your response.
56
+ ```
57
+
58
+ **User Message Structure:**
59
+ ```
60
+ <CONVERSATION_HISTORY>
61
+ User: What are the main benefits of renewable energy?
62
+ Assistant: Renewable energy offers several key benefits including environmental sustainability, economic advantages, and energy security. It reduces greenhouse gas emissions, creates jobs, and decreases dependence on fossil fuel imports.
63
+ <END OF CONVERSATION_HISTORY>
64
+
65
+ <ORIGINAL MESSAGE> What about the challenges and limitations? <END OF ORIGINAL MESSAGE>
66
+
67
+ <CURRENT ANSWERS from the agents>
68
+ <agent1> Key benefits include environmental and economic advantages. <end of agent1>
69
+ <END OF CURRENT ANSWERS>
70
+ ```
71
+
72
+ **Key Features:**
73
+ - ✅ **Conversation history section** with previous exchange
74
+ - ✅ Original message (current question)
75
+ - ✅ Agent answers from coordination
76
+ - ✅ **Context-aware system message**
77
+
78
+ ---
79
+
80
+ ### Turn 3: Extended Conversation
81
+
82
+ **Context Size:** 1,252 characters total (+120% from Turn 1)
83
+ - System Message: 574 chars (same as Turn 2)
84
+ - User Message: 678 chars (+279% from Turn 1)
85
+ - Tools: 2 (same)
86
+
87
+ **User Message Structure:**
88
+ ```
89
+ <CONVERSATION_HISTORY>
90
+ User: What are the main benefits of renewable energy?
91
+ Assistant: Renewable energy offers environmental, economic, and energy security benefits.
92
+ User: What about the challenges and limitations?
93
+ Assistant: Main challenges include high upfront costs, intermittency issues, and infrastructure requirements.
94
+ <END OF CONVERSATION_HISTORY>
95
+
96
+ <ORIGINAL MESSAGE> How can governments support the transition? <END OF ORIGINAL MESSAGE>
97
+
98
+ <CURRENT ANSWERS from the agents>
99
+ <agent2> Benefits include environmental and economic advantages. <end of agent2>
100
+ <agent1> Challenges include costs, intermittency, and infrastructure needs. <end of agent1>
101
+ <END OF CURRENT ANSWERS>
102
+ ```
103
+
104
+ **Key Features:**
105
+ - ✅ **Full conversation history** (2 previous exchanges)
106
+ - ✅ Original message (current question)
107
+ - ✅ **Multiple agent answers** from coordination
108
+ - ✅ Context-aware system message
109
+ - ✅ **Progressive context building**
110
+
111
+ ## Context Growth Analysis
112
+
113
+ ### Size Progression
114
+ ```
115
+ Turn 1: 568 chars (baseline)
116
+ Turn 2: 1,152 chars (+103% growth)
117
+ Turn 3: 1,252 chars (+120% growth)
118
+ ```
119
+
120
+ ### Context Elements by Turn
121
+ | Element | Turn 1 | Turn 2 | Turn 3 |
122
+ |---------|--------|--------|--------|
123
+ | CONVERSATION_HISTORY | ❌ | ✅ (1 exchange) | ✅ (2 exchanges) |
124
+ | ORIGINAL MESSAGE | ✅ | ✅ | ✅ |
125
+ | CURRENT ANSWERS | ✅ (empty) | ✅ (1 agent) | ✅ (2 agents) |
126
+ | Context-aware system | ❌ | ✅ | ✅ |
127
+
128
+ ## Key Implementation Insights
129
+
130
+ ### 1. **Dynamic Context Reconstruction**
131
+ - Each turn rebuilds the complete context from scratch (v0.0.1 approach)
132
+ - No persistent conversation state in agents
133
+ - Context includes conversation history + current coordination state
134
+
135
+ ### 2. **Conversation History Format**
136
+ ```
137
+ <CONVERSATION_HISTORY>
138
+ User: [previous question]
139
+ Assistant: [previous response]
140
+ User: [another question]
141
+ Assistant: [another response]
142
+ <END OF CONVERSATION_HISTORY>
143
+ ```
144
+
145
+ ### 3. **System Message Enhancement**
146
+ - Turn 1: Standard evaluation prompt
147
+ - Turn 2+: Enhanced with context awareness note:
148
+ ```
149
+ IMPORTANT: You are responding to the latest message in an ongoing conversation.
150
+ Consider the full conversation context when evaluating answers and providing your response.
151
+ ```
152
+
153
+ ### 4. **Multi-layered Context**
154
+ Each agent receives:
155
+ 1. **Conversation History**: Previous user-assistant exchanges
156
+ 2. **Original Message**: Current question being coordinated
157
+ 3. **Current Answers**: Existing answers from other agents in this coordination round
158
+ 4. **Tools**: Standard MassGen workflow tools (vote, new_answer)
159
+
160
+ ## Benefits of This Approach
161
+
162
+ ### ✅ **True Multi-Turn Support**
163
+ - Agents understand full conversation context, not just current message
164
+ - Natural conversation flow maintained across coordination rounds
165
+ - Context-aware responses that reference previous exchanges
166
+
167
+ ### ✅ **Scalable Context Management**
168
+ - Context size grows linearly with conversation length
169
+ - Clean separation between conversation history and coordination state
170
+ - Memory-efficient (no persistent agent state)
171
+
172
+ ### ✅ **Robust State Management**
173
+ - Each coordination round starts with fresh, complete context
174
+ - No issues with stale or inconsistent conversation state
175
+ - Easy to debug and understand exactly what agents receive
176
+
177
+ ## Testing and Validation
178
+
179
+ The implementation includes comprehensive tests:
180
+
181
+ 1. **`test_message_context_building.py`**: Shows exact message structure without API calls
182
+ 2. **`test_multiturn_llm_input.py`**: Captures actual LLM calls during coordination with debug backend
183
+ 3. **`test_multiturn_conversation.py`**: End-to-end multi-turn conversation testing
184
+
185
+ Run these tests to see the exact LLM inputs and validate the context building behavior.
186
+
187
+ ---
188
+
189
+ *This analysis confirms that MassGen's multi-turn conversation support properly implements the proven v0.0.1 dynamic context reconstruction approach, adapted for the current async streaming architecture.*
@@ -0,0 +1,113 @@
1
+ # MassGen Case Study Test Commands
2
+
3
+ This document contains commands to test all the case studies from `docs/case_studies/` using the three agents default configuration.
4
+
5
+ ## Quick Commands
6
+
7
+ All tests use the `three_agents_default.yaml` configuration with:
8
+ - **Gemini 2.5 Flash** (web search enabled)
9
+ - **GPT-4o-mini** (web search + code interpreter)
10
+ - **Grok 3 mini** (web search with citations)
11
+
12
+ ### 1. Collaborative Creative Writing
13
+ ```bash
14
+ # From project root:
15
+ python massgen/cli.py --config massgen/configs/three_agents_default.yaml "Write a short story about a robot who discovers music."
16
+
17
+ # From tests directory:
18
+ python ../cli.py --config ../configs/three_agents_default.yaml "Write a short story about a robot who discovers music."
19
+ ```
20
+ **Original:** gpt-4o, gemini-2.5-flash, grok-3-mini
21
+ **Current:** gemini2.5flash, 4omini, grok3mini with builtin tools
22
+
23
+ ### 2. AI News Synthesis
24
+ ```bash
25
+ # From project root:
26
+ python massgen/cli.py --config massgen/configs/three_agents_default.yaml "find big AI news this week"
27
+
28
+ # From tests directory:
29
+ python ../cli.py --config ../configs/three_agents_default.yaml "find big AI news this week"
30
+ ```
31
+ **Original:** gpt-4.1, gemini-2.5-flash, grok-3-mini
32
+ **Current:** gemini2.5flash, 4omini, grok3mini with web search
33
+
34
+ ### 3. Grok HLE Cost Estimation
35
+ ```bash
36
+ # From project root:
37
+ python massgen/cli.py --config massgen/configs/three_agents_default.yaml "How much does it cost to run HLE benchmark with Grok-4"
38
+
39
+ # From tests directory:
40
+ python ../cli.py --config ../configs/three_agents_default.yaml "How much does it cost to run HLE benchmark with Grok-4"
41
+ ```
42
+ **Original:** gpt-4o, gemini-2.5-flash, grok-3-mini
43
+ **Current:** gemini2.5flash, 4omini, grok3mini with web search
44
+
45
+ ### 4. IMO 2025 Winner
46
+ ```bash
47
+ # From project root:
48
+ python massgen/cli.py --config massgen/configs/three_agents_default.yaml "Which AI won IMO 2025?"
49
+
50
+ # From tests directory:
51
+ python ../cli.py --config ../configs/three_agents_default.yaml "Which AI won IMO 2025?"
52
+ ```
53
+ **Original:** gemini-2.5-flash, gpt-4.1 (2 agents)
54
+ **Current:** gemini2.5flash, 4omini, grok3mini (3 agents with web search)
55
+
56
+ ### 5. Stockholm Travel Guide
57
+ ```bash
58
+ # From project root:
59
+ python massgen/cli.py --config massgen/configs/three_agents_default.yaml "what's best to do in Stockholm in October 2025"
60
+
61
+ # From tests directory:
62
+ python ../cli.py --config ../configs/three_agents_default.yaml "what's best to do in Stockholm in October 2025"
63
+ ```
64
+ **Original:** gemini-2.5-flash, gpt-4o (2 agents)
65
+ **Current:** gemini2.5flash, 4omini, grok3mini with web search for current info
66
+
67
+ ## Configuration Details
68
+
69
+ The `three_agents_default.yaml` configuration provides:
70
+
71
+ ### Agent Capabilities
72
+ - **gemini2.5flash**: Gemini 2.5 Flash with web search
73
+ - **4omini**: GPT-4o-mini with web search + code interpreter
74
+ - **grok3mini**: Grok 3 mini with web search and citations
75
+
76
+ ### UI Features
77
+ - Rich terminal display with enhanced visualization
78
+ - Real-time coordination updates
79
+ - Logging enabled for debugging
80
+
81
+ ### Custom Queries
82
+ ```bash
83
+ # Use for any question with the three agents setup:
84
+ python massgen/cli.py --config massgen/configs/three_agents_default.yaml "your question here"
85
+ ```
86
+
87
+ ## Running All Tests
88
+
89
+ Use the interactive test script:
90
+ ```bash
91
+ # From project root:
92
+ ./massgen/tests/test_case_studies.sh
93
+
94
+ # From tests directory:
95
+ ./test_case_studies.sh
96
+ ```
97
+
98
+ ## Requirements
99
+
100
+ - **OpenAI API Key:** Set `OPENAI_API_KEY` environment variable (for GPT-4o-mini)
101
+ - **Gemini API Key:** Set `GOOGLE_API_KEY` environment variable (for Gemini 2.5 Flash)
102
+ - **Grok API Key:** Set `XAI_API_KEY` environment variable (for Grok 3 mini)
103
+
104
+ ## Notes
105
+
106
+ - All tests now use the unified `three_agents_default.yaml` configuration
107
+ - Combines three different model providers for diverse perspectives
108
+ - Built-in tools (web search, code execution) available across agents
109
+ - Rich terminal UI provides enhanced visualization and real-time updates
110
+ - Each agent brings unique strengths:
111
+ - Gemini: Advanced reasoning with web search
112
+ - GPT-4o-mini: Cost-effective with code execution
113
+ - Grok: Real-time information with citations