mageagent-local 2.0.1

@@ -0,0 +1,501 @@
1
+ # MageAgent Orchestration Patterns
2
+
3
+ > Deep dive into multi-model orchestration patterns for optimal AI performance
4
+
5
+ ## Overview
6
+
7
+ MageAgent provides multi-model orchestration on Apple's MLX framework. Instead of relying on a single model, it routes each task through a combination of specialized models, aiming to approach Claude Opus quality (an estimated 80-85% in the Quality Comparison table) while running entirely locally.
8
+
9
+ **Key Innovation**: Every pattern that produces tool calls routes extraction through **Hermes-3 Q8**, so function calling stays reliable regardless of which reasoning model generated the response. (The `primary` pattern returns no tool calls.)
10
+
11
+ ## Pattern Summary
12
+
13
+ | Pattern | Models Used | Quality Gain | Latency | Use Case |
14
+ |---------|-------------|--------------|---------|----------|
15
+ | `hybrid` | 72B + Hermes | Baseline+ | 30-60s | Best capability |
16
+ | `validated` | 72B + 7B + Hermes | +5-10% | 40-80s | Code with error checking |
17
+ | `compete` | 72B + 32B + 7B + Hermes | +10-15% | 60-120s | Critical decisions |
18
+ | `auto` | Dynamic | Variable | Variable | Automatic optimization |
19
+ | `tools` | Hermes only | Fast | 3-5s | Quick tool calls |
20
+ | `primary` | 72B only | Direct | 20-40s | Complex reasoning |
21
+
22
+ ---
23
+
24
+ ## Pattern Details
25
+
26
+ ### 1. Hybrid Pattern (Recommended)
27
+
28
+ **Command**: `mageagent:hybrid`
29
+
30
+ The hybrid pattern combines Qwen-72B Q8's superior reasoning with Hermes-3 Q8's reliable tool calling.
31
+
32
+ ```
33
+ User Request
34
+
35
+
36
+ ┌─────────────────────┐
37
+ │ Qwen-72B Q8 │ ← Primary reasoning
38
+ │ (77GB, 8 tok/s) │
39
+ └─────────────────────┘
40
+
41
+ │ Response with reasoning
42
+
43
+ ┌─────────────────────┐
44
+ │ Hermes-3 Q8 │ ← Tool extraction (if needed)
45
+ │ (9GB, 50 tok/s) │
46
+ └─────────────────────┘
47
+
48
+
49
+ Final Response + Tool Calls
50
+ ```
51
+
52
+ **Flow**:
53
+ 1. Qwen-72B Q8 analyzes the request and generates a comprehensive response
54
+ 2. If tools are needed, Hermes-3 Q8 extracts structured tool calls
55
+ 3. The response is returned with both the reasoning and the executable tool calls
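
The three steps above can be sketched as a small function. This is a minimal illustration, not MageAgent's actual code: `Model`, `run_hybrid`, and the `wants_tools` callback are hypothetical names, and the models are passed in as plain callables so the sketch runs without MLX.

```python
from typing import Callable

# Hypothetical stand-in for a loaded model: takes a prompt, returns generated text.
Model = Callable[[str], str]


def run_hybrid(
    request: str,
    primary: Model,
    tools: Model,
    wants_tools: Callable[[str], bool],
) -> dict:
    """Run the three hybrid steps and return reasoning plus raw tool-call text."""
    # Step 1: the primary (reasoning) model answers the request.
    reasoning = primary(request)

    # Step 2: only call the tool model when the request looks like it needs tools.
    tool_output = "[]"
    if wants_tools(request):
        tool_output = tools(f"Task: {request}\n\nResponse: {reasoning}")

    # Step 3: return both parts together.
    return {"response": reasoning, "tool_calls": tool_output}
```

Passing the models as callables keeps the control flow testable with stubs; in a real deployment they would wrap Qwen-72B Q8 and Hermes-3 Q8.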
56
+
57
+ **Best For**:
58
+ - Tasks requiring both analysis AND file operations
59
+ - Architecture decisions that need implementation
60
+ - Complex code generation with file modifications
61
+ - General-purpose "best effort" requests
62
+
63
+ **Example**:
64
+ ```bash
65
+ curl -X POST http://localhost:3457/v1/chat/completions \
66
+ -H "Content-Type: application/json" \
67
+ -d '{
68
+ "model": "mageagent:hybrid",
69
+ "messages": [{
70
+ "role": "user",
71
+ "content": "Analyze the architecture of src/ and suggest refactoring improvements, then create a plan file"
72
+ }]
73
+ }'
74
+ ```
75
+
76
+ ---
77
+
78
+ ### 2. Validated Pattern
79
+
80
+ **Command**: `mageagent:validated`
81
+
82
+ The validated pattern adds a quality check step that catches errors before output.
83
+
84
+ ```
85
+ User Request
86
+
87
+
88
+ ┌─────────────────────┐
89
+ │ Qwen-72B Q8 │ ← Generate initial response
90
+ │ (Primary) │
91
+ └─────────────────────┘
92
+
93
+
94
+ ┌─────────────────────┐
95
+ │ Qwen-7B Q4 │ ← Validate for errors
96
+ │ (Validator) │
97
+ │ PASS / FAIL │
98
+ └─────────────────────┘
99
+
100
+ │ If FAIL
101
+
102
+ ┌─────────────────────┐
103
+ │ Qwen-72B Q8 │ ← Regenerate with feedback
104
+ │ (Revision) │
105
+ └─────────────────────┘
106
+
107
+
108
+ ┌─────────────────────┐
109
+ │ Hermes-3 Q8 │ ← Extract tool calls
110
+ │ (Tools) │
111
+ └─────────────────────┘
112
+
113
+
114
+ Final Response + Tool Calls
115
+ ```
116
+
117
+ **Flow**:
118
+ 1. Qwen-72B Q8 generates an initial response
119
+ 2. Qwen-7B Q4 validates for:
120
+ - Syntax errors
121
+ - Logic bugs
122
+ - Missing error handling
123
+ - Security vulnerabilities
124
+ - Performance problems
125
+ 3. If issues are found, Qwen-72B Q8 regenerates the response using the validator's feedback
126
+ 4. Hermes-3 Q8 extracts tool calls if needed
127
+
128
+ **Quality Improvement**: +5-10% over single model
129
+
130
+ **Best For**:
131
+ - Code generation where correctness is critical
132
+ - Production code that needs to work first time
133
+ - Complex algorithms requiring validation
134
+ - Code reviews and improvements
135
+
136
+ **Validation Prompt Used**:
137
+ ```
138
+ Review the response for issues:
139
+ 1. Syntax errors
140
+ 2. Logic bugs
141
+ 3. Missing error handling
142
+ 4. Security vulnerabilities
143
+ 5. Performance problems
144
+
145
+ Output ONLY "PASS" if no issues found, or "FAIL: <brief list of issues>"
146
+ ```
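
Because the validator is told to answer only `PASS` or `FAIL: <issues>`, its output can be parsed with a couple of regular expressions. The sketch below shows one way to do that and to drive a single revision round. The names `parse_verdict` and `run_validated` are hypothetical, the models are stub callables, and treating unparseable output as a failure is an assumption rather than documented behavior.

```python
import re
from typing import Callable, Optional, Tuple

Model = Callable[[str], str]  # hypothetical: prompt in, generated text out


def parse_verdict(raw: str) -> Tuple[bool, Optional[str]]:
    """Parse 'PASS' or 'FAIL: <issues>' into (passed, issues)."""
    text = raw.strip()
    if re.match(r"PASS\b", text, re.IGNORECASE):
        return True, None
    failed = re.match(r"FAIL\b[:\s]*(.*)", text, re.IGNORECASE | re.DOTALL)
    if failed:
        return False, failed.group(1).strip() or "unspecified issues"
    # Assumption: malformed validator output is never trusted as a pass.
    return False, f"unparseable verdict: {text[:80]}"


def run_validated(request: str, primary: Model, validator: Model) -> str:
    """Generate, validate, and revise once if the validator reports issues."""
    draft = primary(request)
    passed, issues = parse_verdict(validator(f"Review the response for issues:\n{draft}"))
    if passed:
        return draft
    return primary(f"{request}\n\nFix these issues in your previous answer:\n{issues}")
```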
147
+
148
+ ---
149
+
150
+ ### 3. Compete Pattern
151
+
152
+ **Command**: `mageagent:compete`
153
+
154
+ The compete pattern generates two independent solutions and uses a judge to select the better one.
155
+
156
+ ```
157
+ User Request
158
+
159
+ ├────────────────────────────────────┐
160
+ ▼ ▼
161
+ ┌─────────────────┐ ┌─────────────────┐
162
+ │ Qwen-72B Q8 │ │ Qwen-32B Q4 │
163
+ │ Solution A │ │ Solution B │
164
+ │ (Reasoning) │ │ (Coding) │
165
+ └─────────────────┘ └─────────────────┘
166
+ │ │
167
+ └───────────────┬────────────────────┘
168
+
169
+ ┌─────────────────┐
170
+ │ Qwen-7B Q4 │
171
+ │ (Judge) │
172
+ │ Pick A or B │
173
+ └─────────────────┘
174
+
175
+
176
+ ┌─────────────────┐
177
+ │ Hermes-3 Q8 │
178
+ │ (Tools) │
179
+ └─────────────────┘
180
+
181
+
182
+ Best Solution + Tool Calls
183
+ ```
184
+
185
+ **Flow**:
186
+ 1. Qwen-72B Q8 generates Solution A (reasoning-focused)
187
+ 2. Qwen-32B Q4 generates Solution B (coding-focused)
188
+ 3. Qwen-7B Q4 judges both solutions on:
189
+ - Correctness
190
+ - Efficiency
191
+ - Readability
192
+ - Error handling
193
+ 4. Best solution selected
194
+ 5. Hermes-3 Q8 extracts tool calls if needed
195
+
196
+ **Quality Improvement**: +10-15% over single model
197
+
198
+ **Best For**:
199
+ - Critical production code
200
+ - Complex architectural decisions
201
+ - When you want multiple perspectives
202
+ - Important features where quality > speed
203
+
204
+ **Judge Prompt Used**:
205
+ ```
206
+ Compare two solutions and pick the better one.
207
+ Consider: correctness, efficiency, readability, error handling.
208
+ Output ONLY "A" or "B" followed by a brief one-sentence explanation.
209
+ ```
210
+
211
+ **Note**: Models run sequentially (not parallel) due to Metal GPU limitations with large models.
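
As a rough sketch of the compete flow, the code below runs two solvers one after the other, asks a judge to choose, and parses the leading `A` or `B` from the judge's reply. `parse_judge_choice` and `run_compete` are hypothetical names, the models are stub callables, and falling back to Solution A when the reply cannot be parsed is an assumption.

```python
import re
from typing import Callable

Model = Callable[[str], str]  # hypothetical: prompt in, generated text out


def parse_judge_choice(raw: str) -> str:
    """Return 'A' or 'B' from the start of the judge output (default 'A')."""
    match = re.match(r'\s*"?([AB])\b', raw)
    return match.group(1) if match else "A"


def run_compete(request: str, solver_a: Model, solver_b: Model, judge: Model) -> str:
    """Generate two solutions sequentially and return the one the judge picks."""
    # Sequential on purpose: the document notes the models do not run in parallel.
    solution_a = solver_a(request)
    solution_b = solver_b(request)
    verdict = judge(
        "Compare two solutions and pick the better one.\n\n"
        f"Solution A:\n{solution_a}\n\nSolution B:\n{solution_b}"
    )
    return solution_a if parse_judge_choice(verdict) == "A" else solution_b
```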
212
+
213
+ ---
214
+
215
+ ### 4. Auto Pattern
216
+
217
+ **Command**: `mageagent:auto`
218
+
219
+ The auto pattern classifies each request by keyword matching and routes it to the pattern suited to that task type.
220
+
221
+ ```
222
+ User Request
223
+
224
+
225
+ ┌─────────────────────┐
226
+ │ Task Classifier │
227
+ │ (Pattern matching) │
228
+ └─────────────────────┘
229
+
230
+ ├─── Coding task ──────► Validated Pattern
231
+
232
+ ├─── Reasoning task ───► Hybrid Pattern
233
+
234
+ └─── Simple task ──────► Validator (fast)
235
+ ```
236
+
237
+ **Classification Criteria**:
238
+
239
+ | Task Type | Patterns Detected | Route |
240
+ |-----------|-------------------|-------|
241
+ | Coding | `implement`, `function`, `class`, `refactor`, `fix bug`, code blocks | Validated |
242
+ | Reasoning | `explain`, `analyze`, `plan`, `design`, `architecture`, `compare` | Hybrid |
243
+ | Simple | Everything else | Validator (fast) |
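
The classification table can be approximated with a short keyword classifier. This is a sketch under assumptions: the keyword lists come from the table, but the whole-word matching, the lowercase comparison, and checking coding keywords before reasoning keywords are guesses about the real classifier, and `classify` is a hypothetical name.

```python
import re

CODING = ["implement", "function", "class", "refactor", "fix bug"]
REASONING = ["explain", "analyze", "plan", "design", "architecture", "compare"]
FENCE = "`" * 3  # marker for a Markdown code block inside the prompt


def _has_keyword(text: str, keywords: list) -> bool:
    """True if any keyword appears as a whole word or phrase in the text."""
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def classify(prompt: str) -> str:
    """Return the pattern name that the auto router would pick."""
    text = prompt.lower()
    if FENCE in prompt or _has_keyword(text, CODING):
        return "validated"
    if _has_keyword(text, REASONING):
        return "hybrid"
    return "validator"
```

For example, `classify("Explain how DNS works")` returns `"hybrid"`, while a prompt with no listed keyword falls through to `"validator"`.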
244
+
245
+ **Best For**:
246
+ - General-purpose usage
247
+ - When you don't want to choose a pattern yourself
248
+ - Mixed workloads
249
+ - Default configuration
250
+
251
+ ---
252
+
253
+ ### 5. Tools Pattern
254
+
255
+ **Command**: `mageagent:tools`
256
+
257
+ Direct access to Hermes-3 Q8 for fast tool calling.
258
+
259
+ ```
260
+ User Request
261
+
262
+
263
+ ┌─────────────────────┐
264
+ │ Hermes-3 Q8 │
265
+ │ (9GB, 50 tok/s) │
266
+ │ Tool specialist │
267
+ └─────────────────────┘
268
+
269
+
270
+ Response + Tool Calls
271
+ ```
272
+
273
+ **Best For**:
274
+ - Quick file operations
275
+ - Simple tool calls
276
+ - Fast iterations
277
+ - When you don't need heavy reasoning
278
+
279
+ **Why Q8 for Tools?**
280
+
281
+ Research shows Q4 quantization breaks tool calling in most models. Q8 preserves the precision needed for reliable function calling.
282
+
283
+ | Quantization | Tool Calling | Notes |
284
+ |--------------|--------------|-------|
285
+ | Q8_0 | Reliable | Recommended for tool calling |
286
+ | Q6_K | Partial | May work, less reliable |
287
+ | Q4_K_M | Broken | Do not use for tools |
288
+
289
+ ---
290
+
291
+ ### 6. Primary Pattern
292
+
293
+ **Command**: `mageagent:primary`
294
+
295
+ Direct access to Qwen-72B Q8 without tool extraction.
296
+
297
+ ```
298
+ User Request
299
+
300
+
301
+ ┌─────────────────────┐
302
+ │ Qwen-72B Q8 │
303
+ │ (77GB, 8 tok/s) │
304
+ │ Full reasoning │
305
+ └─────────────────────┘
306
+
307
+
308
+ Response (no tools)
309
+ ```
310
+
311
+ **Best For**:
312
+ - Pure reasoning tasks
313
+ - Analysis and explanation
314
+ - Architecture planning
315
+ - When you don't need file operations
316
+
317
+ ---
318
+
319
+ ## Model Specifications
320
+
321
+ | Model | Role | Quantization | Memory | Speed | Tool Calling |
322
+ |-------|------|--------------|--------|-------|--------------|
323
+ | Qwen2.5-72B | Primary | Q8_0 | 77GB | 8 tok/s | Yes |
324
+ | Qwen2.5-Coder-32B | Competitor | Q4_K_M | 18GB | 25 tok/s | No |
325
+ | Qwen2.5-Coder-7B | Validator | Q4_K_M | 5GB | 105 tok/s | No |
326
+ | Hermes-3-Llama-8B | Tools | Q8_0 | 9GB | 50 tok/s | Yes |
327
+
328
+ **Total Memory**: ~109GB (fits in 128GB with 19GB headroom)
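
The memory figures can be checked with simple arithmetic. The sketch below sums the table's values and also derives a per-pattern footprint from the Pattern Summary, assuming every model in a pattern is resident at the same time; the short model keys are illustrative labels, not identifiers from MageAgent.

```python
MODEL_MEMORY_GB = {"72B": 77, "32B": 18, "7B": 5, "Hermes": 9}
UNIFIED_MEMORY_GB = 128

# Models used by each pattern, taken from the Pattern Summary table.
PATTERN_MODELS = {
    "tools": ["Hermes"],
    "primary": ["72B"],
    "hybrid": ["72B", "Hermes"],
    "validated": ["72B", "7B", "Hermes"],
    "compete": ["72B", "32B", "7B", "Hermes"],
}


def pattern_memory_gb(pattern: str) -> int:
    """Memory needed if all of a pattern's models are loaded together."""
    return sum(MODEL_MEMORY_GB[m] for m in PATTERN_MODELS[pattern])


total_gb = sum(MODEL_MEMORY_GB.values())    # 77 + 18 + 5 + 9 = 109
headroom_gb = UNIFIED_MEMORY_GB - total_gb  # 128 - 109 = 19
```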
329
+
330
+ ---
331
+
332
+ ## Tool Extraction Architecture
333
+
334
+ The key innovation in MageAgent is centralized tool extraction through Hermes-3 Q8.
335
+
336
+ ### Why Centralized Tool Extraction?
337
+
338
+ 1. **Q4 breaks tools**: Lower quantization loses the precision needed for structured JSON output
339
+ 2. **Consistent behavior**: Same tool format regardless of which model reasoned
340
+ 3. **Separation of concerns**: Reasoning models focus on reasoning, tool model focuses on tools
341
+ 4. **Reliability**: Hermes-3 is specifically trained for function calling
342
+
343
+ ### Tool Extraction Prompt
344
+
345
+ ```python
346
+ """You are a tool-calling assistant. Based on the task and response, extract required tool calls.
347
+
348
+ Output tool calls as JSON array:
349
+ [{"tool": "tool_name", "arguments": {"arg1": "value1"}}]
350
+
351
+ Available tools:
352
+ - Read: {"file_path": "path"} - Read file contents
353
+ - Write: {"file_path": "path", "content": "content"} - Write to file
354
+ - Edit: {"file_path": "path", "old_string": "text", "new_string": "text"} - Edit file
355
+ - Bash: {"command": "shell_command"} - Execute shell command
356
+ - Glob: {"pattern": "**/*.py", "path": "dir"} - Find files by pattern
357
+ - Grep: {"pattern": "regex", "path": "dir"} - Search file contents
358
+
359
+ If no tools are needed, output: []"""
360
+ ```
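
The prompt asks for a JSON array, so the caller still has to parse and validate what the model returns. A minimal sketch of that step follows; `parse_tool_calls` is a hypothetical helper, and dropping unknown tools or malformed entries (rather than raising) is an assumption about the desired behavior.

```python
import json

KNOWN_TOOLS = {"Read", "Write", "Edit", "Bash", "Glob", "Grep"}


def parse_tool_calls(raw: str) -> list:
    """Extract a validated list of tool calls from model output."""
    # Tolerate extra prose around the array by slicing from first '[' to last ']'.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []

    calls = []
    for item in data:
        if not isinstance(item, dict):
            continue
        tool, arguments = item.get("tool"), item.get("arguments")
        # Keep only entries that name a known tool and carry an arguments object.
        if isinstance(tool, str) and tool in KNOWN_TOOLS and isinstance(arguments, dict):
            calls.append({"tool": tool, "arguments": arguments})
    return calls
```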
361
+
362
+ ### Tool Detection Heuristics
363
+
364
+ MageAgent detects when tool extraction is needed using pattern matching:
365
+
366
+ ```python
367
+ tool_patterns = [
368
+ r'\bread\b.*\bfile\b', r'\bwrite\b.*\bfile\b', r'\blist\b.*\bdir',
369
+ r'\bexecute\b', r'\brun\b', r'\bcreate\b.*\bfile\b', r'\bdelete\b',
370
+ r'\bsearch\b', r'\bfind\b', r'\bedit\b', r'\bmodify\b',
371
+ r'\btool\b', r'\bfunction\b.*\bcall\b', r'\bapi\b.*\bcall\b',
372
+ r'\bglob\b', r'\bgrep\b', r'\bbash\b', r'\bshell\b'
373
+ ]
374
+ ```
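
One plausible way to apply these patterns is to test the prompt against each regex and stop at the first hit. The sketch below assumes case-insensitive matching, which the source does not state, and `needs_tools` is a hypothetical name.

```python
import re

TOOL_PATTERNS = [
    r'\bread\b.*\bfile\b', r'\bwrite\b.*\bfile\b', r'\blist\b.*\bdir',
    r'\bexecute\b', r'\brun\b', r'\bcreate\b.*\bfile\b', r'\bdelete\b',
    r'\bsearch\b', r'\bfind\b', r'\bedit\b', r'\bmodify\b',
    r'\btool\b', r'\bfunction\b.*\bcall\b', r'\bapi\b.*\bcall\b',
    r'\bglob\b', r'\bgrep\b', r'\bbash\b', r'\bshell\b',
]

# Compile once so repeated checks do not re-parse the expressions.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in TOOL_PATTERNS]


def needs_tools(prompt: str) -> bool:
    """Return True if any tool-triggering pattern matches the prompt."""
    return any(rx.search(prompt) for rx in _COMPILED)
```

Broad patterns such as `\brun\b` and `\bfind\b` also match ordinary phrases ("in the long run", "find it odd"), so a heuristic like this will sometimes trigger tool extraction unnecessarily.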
375
+
376
+ ---
377
+
378
+ ## Performance Benchmarks
379
+
380
+ On M4 Max (128GB unified memory):
381
+
382
+ ### Pattern Latency
383
+
384
+ | Pattern | Min | Typical | Max |
385
+ |---------|-----|---------|-----|
386
+ | tools | 2s | 3-5s | 10s |
387
+ | primary | 15s | 20-40s | 60s |
388
+ | hybrid | 20s | 30-60s | 90s |
389
+ | validated | 30s | 40-80s | 120s |
390
+ | compete | 50s | 60-120s | 180s |
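
These latencies are broadly consistent with the generation speeds in the Model Specifications table. As a back-of-the-envelope sketch, generation time is roughly output tokens divided by tokens per second; the token counts below are hypothetical, and the estimate ignores prompt processing and model loading.

```python
TOKENS_PER_SECOND = {"72B": 8, "32B": 25, "7B": 105, "Hermes": 50}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Approximate time to generate output_tokens at the model's listed speed."""
    return output_tokens / TOKENS_PER_SECOND[model]


def estimate_hybrid_seconds(reasoning_tokens: int, tool_tokens: int) -> float:
    """Hybrid runs the 72B model first, then Hermes for tool extraction."""
    return generation_seconds("72B", reasoning_tokens) + generation_seconds("Hermes", tool_tokens)


# Hypothetical request: 300 reasoning tokens plus 100 tool-call tokens.
estimate = estimate_hybrid_seconds(300, 100)  # 37.5 + 2.0 = 39.5 seconds
```

That figure falls inside the 30-60s typical range listed for `hybrid`.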
391
+
392
+ ### Quality Comparison
393
+
394
+ Based on internal testing and research (Together AI MoA):
395
+
396
+ | Approach | vs Single Model | vs Claude Opus |
397
+ |----------|-----------------|----------------|
398
+ | Single Qwen-72B | Baseline | ~70% |
399
+ | MageAgent Validated | +5-10% | ~80% |
400
+ | MageAgent Compete | +10-15% | ~85% |
401
+ | Claude Opus | N/A | 100% |
402
+
403
+ ---
404
+
405
+ ## Choosing the Right Pattern
406
+
407
+ ### Decision Matrix
408
+
409
+ | Your Priority | Recommended Pattern |
410
+ |---------------|---------------------|
411
+ | Maximum quality | `compete` |
412
+ | Quality + speed | `validated` |
413
+ | Balanced | `hybrid` |
414
+ | Don't know | `auto` |
415
+ | Speed | `tools` or `primary` |
416
+ | Just reasoning | `primary` |
417
+ | Just tools | `tools` |
418
+
419
+ ### Task-Based Recommendations
420
+
421
+ | Task | Pattern | Why |
422
+ |------|---------|-----|
423
+ | Write production code | `validated` | Error checking catches bugs |
424
+ | Architectural design | `compete` | Multiple perspectives |
425
+ | Code review | `hybrid` | Reasoning + suggestions |
426
+ | Quick file operations | `tools` | Fast, reliable tools |
427
+ | Explain complex code | `primary` | Pure reasoning |
428
+ | General coding | `auto` | Intelligent routing |
429
+ | Critical feature | `compete` | Maximum quality |
430
+ | Rapid prototyping | `tools` | Speed over quality |
431
+
432
+ ---
433
+
434
+ ## Custom Patterns (Future)
435
+
436
+ MageAgent is designed to support custom orchestration patterns. Future versions will allow:
437
+
438
+ ```python
439
+ # Example custom pattern (not yet implemented)
440
+ CUSTOM_PATTERNS = {
441
+ "review_and_fix": {
442
+ "steps": [
443
+ {"model": "primary", "prompt": "Review this code..."},
444
+ {"model": "validator", "prompt": "List issues..."},
445
+ {"model": "competitor", "prompt": "Fix these issues..."},
446
+ {"model": "tools", "prompt": "Extract tool calls..."}
447
+ ]
448
+ }
449
+ }
450
+ ```
451
+
452
+ ---
453
+
454
+ ## Troubleshooting Patterns
455
+
456
+ ### Pattern Times Out
457
+
458
+ - Check available memory: `top -l 1 | grep PhysMem`
459
+ - Reduce context length in request
460
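+ - Use a faster pattern (`tools`, or `primary` if you need the 72B model without tool extraction); note that `auto` may still route coding tasks to the slower `validated` pattern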
461
+
462
+ ### Tool Calls Not Extracted
463
+
464
+ - Verify Hermes-3 Q8 model is downloaded
465
+ - Check prompt contains tool-triggering keywords
466
+ - Use `mageagent:hybrid`, which runs tool extraction through Hermes-3 Q8 whenever the request is detected as needing tools
467
+
468
+ ### Quality Lower Than Expected
469
+
470
+ - Use `validated` or `compete` patterns
471
+ - Increase temperature slightly (0.7-0.8)
472
+ - Provide more context in prompt
473
+
474
+ ### Model Not Loading
475
+
476
+ - Check model exists: `ls ~/.cache/mlx-models/`
477
+ - Verify available memory: the 72B model alone needs ~77GB
478
+ - Check logs: `tail -f ~/.claude/debug/mageagent.log`
479
+
480
+ ---
481
+
482
+ ## API Reference
483
+
484
+ All patterns are accessible via the OpenAI-compatible API:
485
+
486
+ ```bash
487
+ curl -X POST http://localhost:3457/v1/chat/completions \
488
+ -H "Content-Type: application/json" \
489
+ -d '{
490
+ "model": "mageagent:PATTERN",
491
+ "messages": [{"role": "user", "content": "Your prompt"}],
492
+ "temperature": 0.7,
493
+ "max_tokens": 2048
494
+ }'
495
+ ```
496
+
497
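+ Replace `PATTERN` with one of: `auto`, `hybrid`, `validated`, `compete`, `tools`, `primary`, `validator`, `competitor`. The last two are not covered in the Pattern Summary; judging by the roles in Model Specifications, they appear to give direct access to the Qwen2.5-Coder-7B and Qwen2.5-Coder-32B models.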
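
The same request can be made from Python with only the standard library. This is a minimal sketch: `build_payload` and `chat` are hypothetical helpers, the 300-second timeout is an assumption chosen to exceed the longest latency listed above, and the response is returned as raw JSON because its exact shape is not shown in this document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3457/v1/chat/completions"
PATTERNS = {
    "auto", "hybrid", "validated", "compete",
    "tools", "primary", "validator", "competitor",
}


def build_payload(pattern: str, prompt: str,
                  temperature: float = 0.7, max_tokens: int = 2048) -> dict:
    """Build the request body, rejecting pattern names the server does not list."""
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern!r}")
    return {
        "model": f"mageagent:{pattern}",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def chat(pattern: str, prompt: str, timeout: float = 300.0) -> dict:
    """POST the request to the local server and return the decoded JSON reply."""
    body = json.dumps(build_payload(pattern, prompt)).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `chat("hybrid", "Your prompt")` requires the MageAgent server to be running on port 3457.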
498
+
499
+ ---
500
+
501
+ *Made with care by [Adverant](https://github.com/adverant)*