massgen 0.0.3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of massgen might be problematic. Click here for more details.
- massgen/__init__.py +94 -0
- massgen/agent_config.py +507 -0
- massgen/backend/CLAUDE_API_RESEARCH.md +266 -0
- massgen/backend/Function calling openai responses.md +1161 -0
- massgen/backend/GEMINI_API_DOCUMENTATION.md +410 -0
- massgen/backend/OPENAI_RESPONSES_API_FORMAT.md +65 -0
- massgen/backend/__init__.py +25 -0
- massgen/backend/base.py +180 -0
- massgen/backend/chat_completions.py +228 -0
- massgen/backend/claude.py +661 -0
- massgen/backend/gemini.py +652 -0
- massgen/backend/grok.py +187 -0
- massgen/backend/response.py +397 -0
- massgen/chat_agent.py +440 -0
- massgen/cli.py +686 -0
- massgen/configs/README.md +293 -0
- massgen/configs/creative_team.yaml +53 -0
- massgen/configs/gemini_4o_claude.yaml +31 -0
- massgen/configs/news_analysis.yaml +51 -0
- massgen/configs/research_team.yaml +51 -0
- massgen/configs/single_agent.yaml +18 -0
- massgen/configs/single_flash2.5.yaml +44 -0
- massgen/configs/technical_analysis.yaml +51 -0
- massgen/configs/three_agents_default.yaml +31 -0
- massgen/configs/travel_planning.yaml +51 -0
- massgen/configs/two_agents.yaml +39 -0
- massgen/frontend/__init__.py +20 -0
- massgen/frontend/coordination_ui.py +945 -0
- massgen/frontend/displays/__init__.py +24 -0
- massgen/frontend/displays/base_display.py +83 -0
- massgen/frontend/displays/rich_terminal_display.py +3497 -0
- massgen/frontend/displays/simple_display.py +93 -0
- massgen/frontend/displays/terminal_display.py +381 -0
- massgen/frontend/logging/__init__.py +9 -0
- massgen/frontend/logging/realtime_logger.py +197 -0
- massgen/message_templates.py +431 -0
- massgen/orchestrator.py +1222 -0
- massgen/tests/__init__.py +10 -0
- massgen/tests/multi_turn_conversation_design.md +214 -0
- massgen/tests/multiturn_llm_input_analysis.md +189 -0
- massgen/tests/test_case_studies.md +113 -0
- massgen/tests/test_claude_backend.py +310 -0
- massgen/tests/test_grok_backend.py +160 -0
- massgen/tests/test_message_context_building.py +293 -0
- massgen/tests/test_rich_terminal_display.py +378 -0
- massgen/tests/test_v3_3agents.py +117 -0
- massgen/tests/test_v3_simple.py +216 -0
- massgen/tests/test_v3_three_agents.py +272 -0
- massgen/tests/test_v3_two_agents.py +176 -0
- massgen/utils.py +79 -0
- massgen/v1/README.md +330 -0
- massgen/v1/__init__.py +91 -0
- massgen/v1/agent.py +605 -0
- massgen/v1/agents.py +330 -0
- massgen/v1/backends/gemini.py +584 -0
- massgen/v1/backends/grok.py +410 -0
- massgen/v1/backends/oai.py +571 -0
- massgen/v1/cli.py +351 -0
- massgen/v1/config.py +169 -0
- massgen/v1/examples/fast-4o-mini-config.yaml +44 -0
- massgen/v1/examples/fast_config.yaml +44 -0
- massgen/v1/examples/production.yaml +70 -0
- massgen/v1/examples/single_agent.yaml +39 -0
- massgen/v1/logging.py +974 -0
- massgen/v1/main.py +368 -0
- massgen/v1/orchestrator.py +1138 -0
- massgen/v1/streaming_display.py +1190 -0
- massgen/v1/tools.py +160 -0
- massgen/v1/types.py +245 -0
- massgen/v1/utils.py +199 -0
- massgen-0.0.3.dist-info/METADATA +568 -0
- massgen-0.0.3.dist-info/RECORD +76 -0
- massgen-0.0.3.dist-info/WHEEL +5 -0
- massgen-0.0.3.dist-info/entry_points.txt +2 -0
- massgen-0.0.3.dist-info/licenses/LICENSE +204 -0
- massgen-0.0.3.dist-info/top_level.txt +1 -0
|
@@ -0,0 +1,214 @@
|
|
|
1
|
+
# Multi-Turn Conversation Design for MassGen Orchestrator
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
This document outlines the design approach for implementing multi-turn conversations in the MassGen orchestrator, based on the proven approach used in MassGen v0.0.1.
|
|
6
|
+
|
|
7
|
+
## Current State
|
|
8
|
+
|
|
9
|
+
The current orchestrator has **partial multi-turn support**:
|
|
10
|
+
- ✅ Accepts conversation history through `chat(messages)` interface
|
|
11
|
+
- ✅ Maintains conversation history at orchestrator level
|
|
12
|
+
- ❌ **Limited**: Only processes the last user message for coordination
|
|
13
|
+
- ❌ **CLI Issue**: Multi-agent mode with history bypasses coordination UI display
|
|
14
|
+
|
|
15
|
+
## V0.0.1 Approach Analysis
|
|
16
|
+
|
|
17
|
+
### Key Innovation: Dynamic Context Reconstruction
|
|
18
|
+
|
|
19
|
+
V0.0.1 implements multi-turn conversations through **dynamic message reconstruction** rather than persistent conversation state:
|
|
20
|
+
|
|
21
|
+
```python
|
|
22
|
+
def work_on_task(self, task: TaskInput) -> List[Dict[str, str]]:
|
|
23
|
+
# Initialize working messages
|
|
24
|
+
working_status, working_messages, all_tools = self._get_curr_messages_and_tools(task)
|
|
25
|
+
|
|
26
|
+
while curr_round < self.max_rounds and self.state.status == "working":
|
|
27
|
+
# Process messages...
|
|
28
|
+
|
|
29
|
+
# When agents need to restart due to updates:
|
|
30
|
+
if renew_conversation:
|
|
31
|
+
# Rebuild conversation with latest context
|
|
32
|
+
working_status, working_messages, all_tools = self._get_curr_messages_and_tools(task)
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
### Core Principles
|
|
36
|
+
|
|
37
|
+
1. **Dynamic Context Generation**: Agents don't maintain persistent conversations - they regenerate context each time they restart
|
|
38
|
+
2. **Fresh State on Updates**: When other agents provide new answers, the conversation context is rebuilt with latest information
|
|
39
|
+
3. **Multi-layered Context**: Conversations include:
|
|
40
|
+
- System instructions
|
|
41
|
+
- Original task/question
|
|
42
|
+
- Current answers from all agents
|
|
43
|
+
- Voting information (when applicable)
|
|
44
|
+
|
|
45
|
+
### Context Reconstruction Method
|
|
46
|
+
|
|
47
|
+
```python
|
|
48
|
+
def _get_curr_messages_and_tools(self, task: TaskInput):
|
|
49
|
+
"""Get the current messages and tools for the agent."""
|
|
50
|
+
working_status, user_input = self._get_task_input(task) # Includes latest agent answers
|
|
51
|
+
working_messages = self._get_task_input_messages(user_input) # System + user messages
|
|
52
|
+
all_tools = self._get_available_tools() # Current tool set
|
|
53
|
+
return working_status, working_messages, all_tools
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
## Proposed Implementation
|
|
57
|
+
|
|
58
|
+
### 1. Orchestrator-Level Conversation Management
|
|
59
|
+
|
|
60
|
+
```python
|
|
61
|
+
class Orchestrator:
|
|
62
|
+
def __init__(self, ...):
|
|
63
|
+
self.conversation_history: List[Dict[str, Any]] = [] # User conversation
|
|
64
|
+
self.current_coordination_context: Optional[str] = None
|
|
65
|
+
|
|
66
|
+
async def chat(self, messages: List[Dict[str, Any]], ...):
|
|
67
|
+
# Extract full conversation context
|
|
68
|
+
conversation_context = self._build_conversation_context(messages)
|
|
69
|
+
|
|
70
|
+
# Start coordination with full context
|
|
71
|
+
async for chunk in self._coordinate_agents(conversation_context):
|
|
72
|
+
yield chunk
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
### 2. Context-Aware Agent Coordination
|
|
76
|
+
|
|
77
|
+
```python
|
|
78
|
+
async def _coordinate_agents(self, conversation_context: Dict[str, Any]):
|
|
79
|
+
"""Coordinate agents with full conversation context."""
|
|
80
|
+
|
|
81
|
+
# Build enriched task context including:
|
|
82
|
+
# - Original conversation history
|
|
83
|
+
# - Current user message
|
|
84
|
+
# - Existing agent answers (if any)
|
|
85
|
+
|
|
86
|
+
for agent_id in self.agents:
|
|
87
|
+
# Each agent gets full context when starting/restarting
|
|
88
|
+
agent_context = self._build_agent_context(
|
|
89
|
+
conversation_history=conversation_context['history'],
|
|
90
|
+
current_task=conversation_context['current_message'],
|
|
91
|
+
agent_answers=self._get_current_answers(),
|
|
92
|
+
voting_state=self._get_voting_state()
|
|
93
|
+
)
|
|
94
|
+
|
|
95
|
+
# Agent processes with full context
|
|
96
|
+
await self._stream_agent_execution(agent_id, agent_context)
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### 3. Dynamic Context Rebuilding
|
|
100
|
+
|
|
101
|
+
```python
|
|
102
|
+
def _build_agent_context(self, conversation_history, current_task, agent_answers, voting_state):
|
|
103
|
+
"""Build agent context dynamically based on current state."""
|
|
104
|
+
|
|
105
|
+
# Format conversation history for agent context
|
|
106
|
+
history_context = self._format_conversation_history(conversation_history)
|
|
107
|
+
|
|
108
|
+
# Format current coordination state
|
|
109
|
+
coordination_context = self.message_templates.build_coordination_context(
|
|
110
|
+
current_task=current_task,
|
|
111
|
+
conversation_history=history_context,
|
|
112
|
+
agent_answers=agent_answers,
|
|
113
|
+
voting_state=voting_state
|
|
114
|
+
)
|
|
115
|
+
|
|
116
|
+
return {
|
|
117
|
+
"system_message": self.message_templates.system_message_with_context(history_context),
|
|
118
|
+
"user_message": coordination_context,
|
|
119
|
+
"tools": self.workflow_tools
|
|
120
|
+
}
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
### 4. Message Template Updates
|
|
124
|
+
|
|
125
|
+
```python
|
|
126
|
+
class MessageTemplates:
|
|
127
|
+
def build_coordination_context(self, current_task, conversation_history, agent_answers, voting_state):
|
|
128
|
+
"""Build coordination context including conversation history."""
|
|
129
|
+
|
|
130
|
+
context_parts = []
|
|
131
|
+
|
|
132
|
+
# Add conversation history if present
|
|
133
|
+
if conversation_history:
|
|
134
|
+
context_parts.append(f"""
|
|
135
|
+
<CONVERSATION_HISTORY>
|
|
136
|
+
{self._format_conversation_for_agent(conversation_history)}
|
|
137
|
+
</CONVERSATION_HISTORY>
|
|
138
|
+
""")
|
|
139
|
+
|
|
140
|
+
# Add current task
|
|
141
|
+
context_parts.append(f"""
|
|
142
|
+
<CURRENT_MESSAGE>
|
|
143
|
+
{current_task}
|
|
144
|
+
</CURRENT_MESSAGE>
|
|
145
|
+
""")
|
|
146
|
+
|
|
147
|
+
# Add agent answers if any exist
|
|
148
|
+
if agent_answers:
|
|
149
|
+
context_parts.append(f"""
|
|
150
|
+
<CURRENT_ANSWERS>
|
|
151
|
+
{self._format_agent_answers(agent_answers)}
|
|
152
|
+
</CURRENT_ANSWERS>
|
|
153
|
+
""")
|
|
154
|
+
|
|
155
|
+
return "\n".join(context_parts)
|
|
156
|
+
```
|
|
157
|
+
|
|
158
|
+
## Implementation Benefits
|
|
159
|
+
|
|
160
|
+
### 1. **True Multi-Turn Support**
|
|
161
|
+
- Agents understand full conversation context, not just current message
|
|
162
|
+
- Natural conversation flow maintained across coordination rounds
|
|
163
|
+
- Context-aware responses that reference previous exchanges
|
|
164
|
+
|
|
165
|
+
### 2. **Dynamic State Management**
|
|
166
|
+
- Agents always work with latest information from all sources
|
|
167
|
+
- No stale conversation state issues
|
|
168
|
+
- Clean restart mechanism when coordination state changes
|
|
169
|
+
|
|
170
|
+
### 3. **Scalable Architecture**
|
|
171
|
+
- Conversation history managed centrally at orchestrator level
|
|
172
|
+
- Agents remain stateless - context provided on-demand
|
|
173
|
+
- Easy to extend for different conversation patterns
|
|
174
|
+
|
|
175
|
+
### 4. **Backward Compatibility**
|
|
176
|
+
- Existing single-turn usage patterns continue to work
|
|
177
|
+
- Gradual migration path for CLI and frontend improvements
|
|
178
|
+
|
|
179
|
+
## Implementation Priority
|
|
180
|
+
|
|
181
|
+
This approach addresses multiple TODO items:
|
|
182
|
+
|
|
183
|
+
- **HIGH PRIORITY**: Support chat with an orchestrator (core multi-agent functionality)
|
|
184
|
+
- **MEDIUM PRIORITY**: Fix CLI multi-turn conversation display in multi-agent mode
|
|
185
|
+
- **MEDIUM PRIORITY**: Port missing features from v0.0.1
|
|
186
|
+
|
|
187
|
+
## Next Steps
|
|
188
|
+
|
|
189
|
+
1. **Phase 1**: Update message templates to support conversation context
|
|
190
|
+
2. **Phase 2**: Modify orchestrator coordination to pass full context to agents
|
|
191
|
+
3. **Phase 3**: Update CLI to properly display coordination with conversation history
|
|
192
|
+
4. **Phase 4**: Add conversation management utilities and testing
|
|
193
|
+
|
|
194
|
+
## Technical Notes
|
|
195
|
+
|
|
196
|
+
### Context Size Management
|
|
197
|
+
- Monitor conversation history length to prevent token limit issues
|
|
198
|
+
- Implement conversation truncation strategies for very long histories
|
|
199
|
+
- Consider conversation summarization for extended sessions
|
|
200
|
+
|
|
201
|
+
### Performance Considerations
|
|
202
|
+
- Context rebuilding is lightweight (no persistent state management)
|
|
203
|
+
- Memory usage scales with conversation length, not coordination complexity
|
|
204
|
+
- Caching opportunities for repeated context elements
|
|
205
|
+
|
|
206
|
+
### Testing Strategy
|
|
207
|
+
- Unit tests for context building methods
|
|
208
|
+
- Integration tests for multi-turn coordination scenarios
|
|
209
|
+
- CLI testing with conversation history of various lengths
|
|
210
|
+
- Edge case testing (empty history, very long conversations, etc.)
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
*This design document is based on analysis of MassGen v0.0.1's proven multi-turn approach and adapted for the current async streaming architecture.*
|
|
@@ -0,0 +1,189 @@
|
|
|
1
|
+
# Multi-Turn LLM Input Analysis - MassGen
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
This document shows the exact input structure sent to LLMs during multi-turn conversations in MassGen, demonstrating how conversation context is built and passed to agents.
|
|
6
|
+
|
|
7
|
+
## Context Building Progression
|
|
8
|
+
|
|
9
|
+
### Turn 1: Initial Question (No History)
|
|
10
|
+
|
|
11
|
+
**Context Size:** 568 characters total
|
|
12
|
+
- System Message: 389 chars
|
|
13
|
+
- User Message: 179 chars
|
|
14
|
+
- Tools: 2 (new_answer, vote)
|
|
15
|
+
|
|
16
|
+
**System Message:**
|
|
17
|
+
```
|
|
18
|
+
You are evaluating answers from multiple agents for final response to a message. Does the best CURRENT ANSWER address the ORIGINAL MESSAGE?
|
|
19
|
+
|
|
20
|
+
If YES, use the `vote` tool to record your vote and skip the `new_answer` tool.
|
|
21
|
+
Otherwise, do additional work first, then use the `new_answer` tool to record a better answer to the ORIGINAL MESSAGE. Make sure you actually call one of the two tools.
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
**User Message Structure:**
|
|
25
|
+
```
|
|
26
|
+
<ORIGINAL MESSAGE> What are the main benefits of renewable energy? <END OF ORIGINAL MESSAGE>
|
|
27
|
+
|
|
28
|
+
<CURRENT ANSWERS from the agents>
|
|
29
|
+
(no answers available yet)
|
|
30
|
+
<END OF CURRENT ANSWERS>
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
**Key Features:**
|
|
34
|
+
- ❌ No conversation history section
|
|
35
|
+
- ✅ Original message clearly marked
|
|
36
|
+
- ✅ Empty current answers section
|
|
37
|
+
- ❌ Standard system message (no context awareness)
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
### Turn 2: Follow-up with History
|
|
42
|
+
|
|
43
|
+
**Context Size:** 1,152 characters total (+103% from Turn 1)
|
|
44
|
+
- System Message: 574 chars (+47%)
|
|
45
|
+
- User Message: 578 chars (+223%)
|
|
46
|
+
- Tools: 2 (same)
|
|
47
|
+
|
|
48
|
+
**System Message (Enhanced):**
|
|
49
|
+
```
|
|
50
|
+
You are evaluating answers from multiple agents for final response to a message. Does the best CURRENT ANSWER address the ORIGINAL MESSAGE?
|
|
51
|
+
|
|
52
|
+
If YES, use the `vote` tool to record your vote and skip the `new_answer` tool.
|
|
53
|
+
Otherwise, do additional work first, then use the `new_answer` tool to record a better answer to the ORIGINAL MESSAGE. Make sure you actually call one of the two tools.
|
|
54
|
+
|
|
55
|
+
IMPORTANT: You are responding to the latest message in an ongoing conversation. Consider the full conversation context when evaluating answers and providing your response.
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
**User Message Structure:**
|
|
59
|
+
```
|
|
60
|
+
<CONVERSATION_HISTORY>
|
|
61
|
+
User: What are the main benefits of renewable energy?
|
|
62
|
+
Assistant: Renewable energy offers several key benefits including environmental sustainability, economic advantages, and energy security. It reduces greenhouse gas emissions, creates jobs, and decreases dependence on fossil fuel imports.
|
|
63
|
+
<END OF CONVERSATION_HISTORY>
|
|
64
|
+
|
|
65
|
+
<ORIGINAL MESSAGE> What about the challenges and limitations? <END OF ORIGINAL MESSAGE>
|
|
66
|
+
|
|
67
|
+
<CURRENT ANSWERS from the agents>
|
|
68
|
+
<agent1> Key benefits include environmental and economic advantages. <end of agent1>
|
|
69
|
+
<END OF CURRENT ANSWERS>
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
**Key Features:**
|
|
73
|
+
- ✅ **Conversation history section** with previous exchange
|
|
74
|
+
- ✅ Original message (current question)
|
|
75
|
+
- ✅ Agent answers from coordination
|
|
76
|
+
- ✅ **Context-aware system message**
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
### Turn 3: Extended Conversation
|
|
81
|
+
|
|
82
|
+
**Context Size:** 1,252 characters total (+120% from Turn 1)
|
|
83
|
+
- System Message: 574 chars (same as Turn 2)
|
|
84
|
+
- User Message: 678 chars (+279% from Turn 1)
|
|
85
|
+
- Tools: 2 (same)
|
|
86
|
+
|
|
87
|
+
**User Message Structure:**
|
|
88
|
+
```
|
|
89
|
+
<CONVERSATION_HISTORY>
|
|
90
|
+
User: What are the main benefits of renewable energy?
|
|
91
|
+
Assistant: Renewable energy offers environmental, economic, and energy security benefits.
|
|
92
|
+
User: What about the challenges and limitations?
|
|
93
|
+
Assistant: Main challenges include high upfront costs, intermittency issues, and infrastructure requirements.
|
|
94
|
+
<END OF CONVERSATION_HISTORY>
|
|
95
|
+
|
|
96
|
+
<ORIGINAL MESSAGE> How can governments support the transition? <END OF ORIGINAL MESSAGE>
|
|
97
|
+
|
|
98
|
+
<CURRENT ANSWERS from the agents>
|
|
99
|
+
<agent2> Benefits include environmental and economic advantages. <end of agent2>
|
|
100
|
+
<agent1> Challenges include costs, intermittency, and infrastructure needs. <end of agent1>
|
|
101
|
+
<END OF CURRENT ANSWERS>
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
**Key Features:**
|
|
105
|
+
- ✅ **Full conversation history** (2 previous exchanges)
|
|
106
|
+
- ✅ Original message (current question)
|
|
107
|
+
- ✅ **Multiple agent answers** from coordination
|
|
108
|
+
- ✅ Context-aware system message
|
|
109
|
+
- ✅ **Progressive context building**
|
|
110
|
+
|
|
111
|
+
## Context Growth Analysis
|
|
112
|
+
|
|
113
|
+
### Size Progression
|
|
114
|
+
```
|
|
115
|
+
Turn 1: 568 chars (baseline)
|
|
116
|
+
Turn 2: 1,152 chars (+103% growth)
|
|
117
|
+
Turn 3: 1,252 chars (+120% growth)
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
### Context Elements by Turn
|
|
121
|
+
| Element | Turn 1 | Turn 2 | Turn 3 |
|
|
122
|
+
|---------|--------|--------|--------|
|
|
123
|
+
| CONVERSATION_HISTORY | ❌ | ✅ (1 exchange) | ✅ (2 exchanges) |
|
|
124
|
+
| ORIGINAL MESSAGE | ✅ | ✅ | ✅ |
|
|
125
|
+
| CURRENT ANSWERS | ✅ (empty) | ✅ (1 agent) | ✅ (2 agents) |
|
|
126
|
+
| Context-aware system | ❌ | ✅ | ✅ |
|
|
127
|
+
|
|
128
|
+
## Key Implementation Insights
|
|
129
|
+
|
|
130
|
+
### 1. **Dynamic Context Reconstruction**
|
|
131
|
+
- Each turn rebuilds the complete context from scratch (v0.0.1 approach)
|
|
132
|
+
- No persistent conversation state in agents
|
|
133
|
+
- Context includes conversation history + current coordination state
|
|
134
|
+
|
|
135
|
+
### 2. **Conversation History Format**
|
|
136
|
+
```
|
|
137
|
+
<CONVERSATION_HISTORY>
|
|
138
|
+
User: [previous question]
|
|
139
|
+
Assistant: [previous response]
|
|
140
|
+
User: [another question]
|
|
141
|
+
Assistant: [another response]
|
|
142
|
+
<END OF CONVERSATION_HISTORY>
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
### 3. **System Message Enhancement**
|
|
146
|
+
- Turn 1: Standard evaluation prompt
|
|
147
|
+
- Turn 2+: Enhanced with context awareness note:
|
|
148
|
+
```
|
|
149
|
+
IMPORTANT: You are responding to the latest message in an ongoing conversation.
|
|
150
|
+
Consider the full conversation context when evaluating answers and providing your response.
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
### 4. **Multi-layered Context**
|
|
154
|
+
Each agent receives:
|
|
155
|
+
1. **Conversation History**: Previous user-assistant exchanges
|
|
156
|
+
2. **Original Message**: Current question being coordinated
|
|
157
|
+
3. **Current Answers**: Existing answers from other agents in this coordination round
|
|
158
|
+
4. **Tools**: Standard MassGen workflow tools (vote, new_answer)
|
|
159
|
+
|
|
160
|
+
## Benefits of This Approach
|
|
161
|
+
|
|
162
|
+
### ✅ **True Multi-Turn Support**
|
|
163
|
+
- Agents understand full conversation context, not just current message
|
|
164
|
+
- Natural conversation flow maintained across coordination rounds
|
|
165
|
+
- Context-aware responses that reference previous exchanges
|
|
166
|
+
|
|
167
|
+
### ✅ **Scalable Context Management**
|
|
168
|
+
- Context size grows linearly with conversation length
|
|
169
|
+
- Clean separation between conversation history and coordination state
|
|
170
|
+
- Memory-efficient (no persistent agent state)
|
|
171
|
+
|
|
172
|
+
### ✅ **Robust State Management**
|
|
173
|
+
- Each coordination round starts with fresh, complete context
|
|
174
|
+
- No issues with stale or inconsistent conversation state
|
|
175
|
+
- Easy to debug and understand exactly what agents receive
|
|
176
|
+
|
|
177
|
+
## Testing and Validation
|
|
178
|
+
|
|
179
|
+
The implementation includes comprehensive tests:
|
|
180
|
+
|
|
181
|
+
1. **`test_message_context_building.py`**: Shows exact message structure without API calls
|
|
182
|
+
2. **`test_multiturn_llm_input.py`**: Captures actual LLM calls during coordination with debug backend
|
|
183
|
+
3. **`test_multiturn_conversation.py`**: End-to-end multi-turn conversation testing
|
|
184
|
+
|
|
185
|
+
Run these tests to see the exact LLM inputs and validate the context building behavior.
|
|
186
|
+
|
|
187
|
+
---
|
|
188
|
+
|
|
189
|
+
*This analysis confirms that MassGen's multi-turn conversation support properly implements the proven v0.0.1 dynamic context reconstruction approach, adapted for the current async streaming architecture.*
|
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
# MassGen Case Study Test Commands
|
|
2
|
+
|
|
3
|
+
This document contains commands to test all the case studies from `docs/case_studies/` using the three agents default configuration.
|
|
4
|
+
|
|
5
|
+
## Quick Commands
|
|
6
|
+
|
|
7
|
+
All tests use the `three_agents_default.yaml` configuration with:
|
|
8
|
+
- **Gemini 2.5 Flash** (web search enabled)
|
|
9
|
+
- **GPT-4o-mini** (web search + code interpreter)
|
|
10
|
+
- **Grok 3 mini** (web search with citations)
|
|
11
|
+
|
|
12
|
+
### 1. Collaborative Creative Writing
|
|
13
|
+
```bash
|
|
14
|
+
# From project root:
|
|
15
|
+
python massgen/cli.py --config massgen/configs/three_agents_default.yaml "Write a short story about a robot who discovers music."
|
|
16
|
+
|
|
17
|
+
# From tests directory:
|
|
18
|
+
python ../cli.py --config ../configs/three_agents_default.yaml "Write a short story about a robot who discovers music."
|
|
19
|
+
```
|
|
20
|
+
**Original:** gpt-4o, gemini-2.5-flash, grok-3-mini
|
|
21
|
+
**Current:** gemini2.5flash, 4omini, grok3mini with builtin tools
|
|
22
|
+
|
|
23
|
+
### 2. AI News Synthesis
|
|
24
|
+
```bash
|
|
25
|
+
# From project root:
|
|
26
|
+
python massgen/cli.py --config massgen/configs/three_agents_default.yaml "find big AI news this week"
|
|
27
|
+
|
|
28
|
+
# From tests directory:
|
|
29
|
+
python ../cli.py --config ../configs/three_agents_default.yaml "find big AI news this week"
|
|
30
|
+
```
|
|
31
|
+
**Original:** gpt-4.1, gemini-2.5-flash, grok-3-mini
|
|
32
|
+
**Current:** gemini2.5flash, 4omini, grok3mini with web search
|
|
33
|
+
|
|
34
|
+
### 3. Grok HLE Cost Estimation
|
|
35
|
+
```bash
|
|
36
|
+
# From project root:
|
|
37
|
+
python massgen/cli.py --config massgen/configs/three_agents_default.yaml "How much does it cost to run HLE benchmark with Grok-4"
|
|
38
|
+
|
|
39
|
+
# From tests directory:
|
|
40
|
+
python ../cli.py --config ../configs/three_agents_default.yaml "How much does it cost to run HLE benchmark with Grok-4"
|
|
41
|
+
```
|
|
42
|
+
**Original:** gpt-4o, gemini-2.5-flash, grok-3-mini
|
|
43
|
+
**Current:** gemini2.5flash, 4omini, grok3mini with web search
|
|
44
|
+
|
|
45
|
+
### 4. IMO 2025 Winner
|
|
46
|
+
```bash
|
|
47
|
+
# From project root:
|
|
48
|
+
python massgen/cli.py --config massgen/configs/three_agents_default.yaml "Which AI won IMO 2025?"
|
|
49
|
+
|
|
50
|
+
# From tests directory:
|
|
51
|
+
python ../cli.py --config ../configs/three_agents_default.yaml "Which AI won IMO 2025?"
|
|
52
|
+
```
|
|
53
|
+
**Original:** gemini-2.5-flash, gpt-4.1 (2 agents)
|
|
54
|
+
**Current:** gemini2.5flash, 4omini, grok3mini (3 agents with web search)
|
|
55
|
+
|
|
56
|
+
### 5. Stockholm Travel Guide
|
|
57
|
+
```bash
|
|
58
|
+
# From project root:
|
|
59
|
+
python massgen/cli.py --config massgen/configs/three_agents_default.yaml "what's best to do in Stockholm in October 2025"
|
|
60
|
+
|
|
61
|
+
# From tests directory:
|
|
62
|
+
python ../cli.py --config ../configs/three_agents_default.yaml "what's best to do in Stockholm in October 2025"
|
|
63
|
+
```
|
|
64
|
+
**Original:** gemini-2.5-flash, gpt-4o (2 agents)
|
|
65
|
+
**Current:** gemini2.5flash, 4omini, grok3mini with web search for current info
|
|
66
|
+
|
|
67
|
+
## Configuration Details
|
|
68
|
+
|
|
69
|
+
The `three_agents_default.yaml` configuration provides:
|
|
70
|
+
|
|
71
|
+
### Agent Capabilities
|
|
72
|
+
- **gemini2.5flash**: Gemini 2.5 Flash with web search
|
|
73
|
+
- **4omini**: GPT-4o-mini with web search + code interpreter
|
|
74
|
+
- **grok3mini**: Grok 3 mini with web search and citations
|
|
75
|
+
|
|
76
|
+
### UI Features
|
|
77
|
+
- Rich terminal display with enhanced visualization
|
|
78
|
+
- Real-time coordination updates
|
|
79
|
+
- Logging enabled for debugging
|
|
80
|
+
|
|
81
|
+
### Custom Queries
|
|
82
|
+
```bash
|
|
83
|
+
# Use for any question with the three agents setup:
|
|
84
|
+
python massgen/cli.py --config massgen/configs/three_agents_default.yaml "your question here"
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
## Running All Tests
|
|
88
|
+
|
|
89
|
+
Use the interactive test script:
|
|
90
|
+
```bash
|
|
91
|
+
# From project root:
|
|
92
|
+
./massgen/tests/test_case_studies.sh
|
|
93
|
+
|
|
94
|
+
# From tests directory:
|
|
95
|
+
./test_case_studies.sh
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
## Requirements
|
|
99
|
+
|
|
100
|
+
- **OpenAI API Key:** Set `OPENAI_API_KEY` environment variable (for GPT-4o-mini)
|
|
101
|
+
- **Gemini API Key:** Set `GOOGLE_API_KEY` environment variable (for Gemini 2.5 Flash)
|
|
102
|
+
- **Grok API Key:** Set `XAI_API_KEY` environment variable (for Grok 3 mini)
|
|
103
|
+
|
|
104
|
+
## Notes
|
|
105
|
+
|
|
106
|
+
- All tests now use the unified `three_agents_default.yaml` configuration
|
|
107
|
+
- Combines three different model providers for diverse perspectives
|
|
108
|
+
- Built-in tools (web search, code execution) available across agents
|
|
109
|
+
- Rich terminal UI provides enhanced visualization and real-time updates
|
|
110
|
+
- Each agent brings unique strengths:
|
|
111
|
+
- Gemini: Advanced reasoning with web search
|
|
112
|
+
- GPT-4o-mini: Cost-effective with code execution
|
|
113
|
+
- Grok: Real-time information with citations
|