moa-cli 0.1.0__tar.gz → 0.2.1__tar.gz

moa_cli-0.2.1/PKG-INFO ADDED
@@ -0,0 +1,242 @@
1
+ Metadata-Version: 2.3
2
+ Name: moa-cli
3
+ Version: 0.2.1
4
+ Summary: Ask one question to multiple local AI coding CLIs in parallel and collect their answers.
5
+ Keywords: llm,agents,cli,claude,codex,agy,opencode,peer-review
6
+ Author: Paul-Louis Pröve
7
+ Author-email: Paul-Louis Pröve <plp@workgenius.com>
8
+ Requires-Dist: typer>=0.25.0
9
+ Requires-Python: >=3.12
10
+ Description-Content-Type: text/markdown
11
+
12
+ <p align="center">
13
+ <img src="assets/logo-full-white.png" alt="moa - mixture of agents" width="360">
14
+ </p>
15
+
16
+ <p align="center">
17
+ <a href="https://github.com/pietz/moa-cli/actions/workflows/ci.yml"><img src="https://github.com/pietz/moa-cli/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
18
+ </p>
19
+
20
+ # MOA - Mixture of Agents
21
+
22
+ Ask one question to multiple local AI coding CLIs **in parallel** and collect their answers. MOA detects which agent CLIs you have installed (Claude Code, Codex, agy, opencode), fans your prompt out to them, and streams each answer back the moment that agent finishes. Or run `moa distill` to have a strong aggregator merge those answers into a single unified response, or `moa debate` to have them critique each other across rounds before a neutral judge gives the verdict.
23
+
24
+ It's a drop-in, batteries-included replacement for hand-rolling parallel `claude -p` / `codex exec` / `opencode run` calls (or a "peer review" agent skill): one command, clean attributed output, made to be called by a human **or** by another agent.
25
+
26
+ The package is named `moa-cli` but installs the command `moa`.
27
+
28
+ ```bash
29
+ uv tool install moa-cli
30
+ moa ask "Is Postgres or SQLite better for a desktop app?"
31
+ ```
32
+
33
+ Or run it once without installing:
34
+
35
+ ```bash
36
+ uvx --from moa-cli moa ask "Review this plan."
37
+ ```
38
+
39
+ ## Why
40
+
41
+ A single model gives you one perspective. Asking three frontier models the same question - and seeing where they agree, diverge, or contradict - is a fast, cheap way to pressure-test an answer. MOA makes that a one-liner using the CLIs you already pay for, with no API keys of its own.
42
+
43
+ ## Usage
44
+
45
+ MOA has three prompt verbs that share the same selection/output options:
46
+
47
+ - **`moa ask PROMPT`** - council / peer review: N agents answer the same prompt in parallel; every answer is returned with attribution, streamed as it lands.
48
+ - **`moa distill PROMPT`** - synthesis: run the council, then one strong aggregator merges the answers into a single unified response.
49
+ - **`moa debate PROMPT`** - sequential debate: two debaters answer and adversarially critique each other across rounds, then a separate neutral judge writes the final verdict. The costliest mode; read the caveats below before reaching for it.
50
+
51
+ ```bash
52
+ moa doctor # show installed CLIs and their default models
53
+ moa ask "Should this feature use SQLite?" # ask the top 3 installed agents (read-only)
54
+ moa ask -n 2 "..." # ask only the top 2 (priority order)
55
+ moa ask -p claude -p agy "..." # pin specific agents
56
+ moa ask -x claude "..." # drop an agent (e.g. exclude the caller's own model)
57
+ moa ask -m claude=sonnet "..." # override which model a tool uses
58
+ moa ask --yolo "..." # grant full write access (default is read-only)
59
+ moa ask --json "..." # machine-readable JSONL (for agents/pipes)
60
+ git diff | moa ask -f - "Review this diff." # read the prompt from stdin
61
+ moa distill "Design a rate limiter." # council, then merge into one answer
62
+ moa distill -s codex "..." # pick who distills (auto | random | provider)
63
+ moa debate "Is this race condition real?" # 2 debaters + a judge (default n=3)
64
+ moa debate -r 3 "..." # more rounds (default 2, hard max 4)
65
+ moa debate -j claude "..." # pin who judges (must not be a debater)
66
+ ```
67
+
68
+ The shared options (`-n/--num`, `-p/--provider`, `-x/--exclude`, `-m/--model`, `-t/--timeout`, `-f/--file`, `--json`, `--yolo`) work identically on all three verbs. `distill` adds `-s/--synthesizer`; `debate` adds `-r/--rounds` and `-j/--judge`.
69
+
70
+ ### Read-only by default
71
+
72
+ MOA is built to be called autonomously, so by default **no agent can write files or
73
+ run mutating commands**. Each agent runs in its tool's safest mode: it may read local
74
+ files (and, where the tool allows, research online), but it cannot edit anything. This
75
+ is enforced by spawning each CLI with its own read-only flags:
76
+
77
+ | Provider | Read-only (default) | Reads files | Web research |
78
+ | ---------- | -------------------------- | ----------- | ------------------------- |
79
+ | `claude` | `--permission-mode plan` | yes | yes |
80
+ | `codex` | `-s read-only` | yes | **no** (sandbox blocks network) |
81
+ | `opencode` | `--agent plan` | yes | yes |
82
+ | `agy` | `--sandbox` (partial: shell only - can still edit files) | yes | yes |
83
+
84
+ `codex`'s read-only mode is a kernel sandbox that also blocks network, so codex does no
85
+ web research in the default mode (it still reads local files). `agy` has **no true
86
+ read-only mode**: its `--sandbox` flag restricts agy's terminal/shell but does **not** stop
87
+ its `write_file` tool, so agy **can still edit files** even in the default mode. This is
88
+ **partial** protection (it closes the shell vector only), not read-only. MOA applies
89
+ `--sandbox` as the next-best safeguard and the selection note on stderr states honestly that
90
+ `agy` is shell-sandboxed but can still edit files.
91
+
92
+ ### `--yolo` (full write access)
93
+
94
+ Pass `--yolo` to grant every agent full write access (file edits and shell commands,
95
+ auto-approved). Use it only when you actually want the agents to change your working tree.
96
+
97
+ ```bash
98
+ moa ask --yolo "Refactor this module and run the tests."
99
+ ```
100
+
101
+ Under `--yolo` every agent gets full write access. For `agy` this means dropping
102
+ `--sandbox`, so `agy --yolo` runs with no shell restrictions at all. In the default mode,
103
+ `agy` runs with `--sandbox` (partial protection: shell only - it can still edit files), and
104
+ MOA states that honestly on stderr.
105
+
106
+ ### How agents are selected
107
+
108
+ `-n/--num` (default 3) picks the first N **installed** agents from a popularity-ordered priority list:
109
+
110
+ ```
111
+ claude -> codex -> agy -> opencode
112
+ ```
113
+
114
+ So `moa ask -n 3` on a machine with all four installed asks Claude, Codex, and agy (opencode is #4). `agy` has no true read-only mode, so in the default mode it runs with `--sandbox` (partial protection: shell only - it can still edit files) and MOA flags that with an honest note on stderr; it is **not** excluded. Use `-p/--provider` (repeatable) to pin an exact set and ignore `-n`.
115
+
116
+ Use `-x/--exclude` (repeatable) to drop one or more agents from the run. Exclusion is applied *before* `-n` takes the first N, and it also drops excluded names from an explicit `-p` set. It is off by default. The motivating case: an agent (e.g. Claude Code) calls `moa` for *other* opinions; `moa ask -x claude` makes sure one "peer" isn't just the caller's own model. So `moa ask -n 3 -x claude` asks Codex, agy, and opencode.
117
+
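The selection rules above can be sketched in a few lines of Python. This is an illustration only, not code from the package: `select_agents` and its parameters are hypothetical names, and it assumes a pinned `-p` set is asked in the order given.

```python
# Priority order as listed in the README above.
PRIORITY = ["claude", "codex", "agy", "opencode"]


def select_agents(installed, num=3, pinned=(), excluded=()):
    """Hypothetical helper: return provider names in the order they would be asked."""
    excluded = set(excluded)
    if pinned:
        # -p pins an exact set and ignores -n; -x still removes names from it.
        return [name for name in pinned if name not in excluded]
    # -x is applied before -n takes the first N installed agents.
    candidates = [n for n in PRIORITY if n in installed and n not in excluded]
    return candidates[:num]


all_four = set(PRIORITY)
print(select_agents(all_four))                       # ['claude', 'codex', 'agy']
print(select_agents(all_four, excluded=["claude"]))  # ['codex', 'agy', 'opencode']
```

Both printed results match the two worked examples in the README text (`moa ask -n 3` and `moa ask -n 3 -x claude`).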
118
+ ### Choosing models
119
+
120
+ Each tool ships with a reasonable default model, but you can override which model any tool uses with `-m/--model PROVIDER=MODEL` (repeatable). Only the providers you name change; the rest keep their defaults.
121
+
122
+ ```bash
123
+ moa ask -m claude=sonnet -m agy="Gemini 3.1 Pro (Low)" "..."
124
+ ```
125
+
126
+ The model-string format differs per tool and is passed through verbatim (the tool's own CLI validates it):
127
+
128
+ | Provider | Default | `-m` format |
129
+ | ---------- | ----------------------- | ------------------------------------------------------ |
130
+ | `claude` | `opus` | short id, e.g. `claude=sonnet` |
131
+ | `codex` | `gpt-5.5` | model id, e.g. `codex=gpt-5.5` |
132
+ | `agy` | `Gemini 3.1 Pro (High)` | exact display name, e.g. `agy="Gemini 3.1 Pro (Low)"` |
133
+ | `opencode` | (tool's authed default) | `provider/model` slug, e.g. `opencode=anthropic/claude-sonnet-4` |
134
+
135
+ `opencode` has no built-in default; without an override it omits `-m` and lets opencode pick. Pass `-m opencode=provider/model` to pin one.
136
+
137
+ ### Configuration
138
+
139
+ To avoid repeating the same flags on every call, persist your own defaults in a config file. MOA reads it for every verb and merges it under your flags.
140
+
141
+ **Location.** `~/.moa/config.toml` (the dir is created on first write). Set `$MOA_CONFIG_DIR` to point the whole config layer somewhere else (useful in tests/CI).
142
+
143
+ **Precedence.** `built-in default < config file < CLI flag`. A flag always wins; the config file only changes a default when that flag is omitted; an absent file means today's built-in behaviour.
144
+
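The precedence rule can be read as a three-layer merge. The sketch below is a minimal illustration under stated assumptions, not the package's implementation: `resolve_options` is a hypothetical name, and an omitted CLI flag is assumed to be represented as `None`. The two defaults shown (`num = 3`, `synthesizer = "auto"`) are the ones the README states.

```python
def resolve_options(defaults, file_config, cli_flags):
    """Hypothetical merge: built-in default < config file < CLI flag."""
    merged = dict(defaults)
    # The config file only replaces a built-in default.
    merged.update(file_config)
    # A flag always wins, but only when it was actually passed (not None).
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged


defaults = {"num": 3, "synthesizer": "auto"}
file_config = {"num": 2}                            # e.g. from ~/.moa/config.toml
cli_flags = {"num": None, "synthesizer": "codex"}   # only -s was passed
print(resolve_options(defaults, file_config, cli_flags))
# {'num': 2, 'synthesizer': 'codex'}
```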
145
+ **Keys** (all shared across `ask`/`distill`/`debate`):
146
+
147
+ | Key | Type | Example |
148
+ | ------------- | ----------------------- | ----------------------------- |
149
+ | `num` | int (>= 1) | `num = 2` |
150
+ | `timeout` | seconds (> 0) | `timeout = 120` |
151
+ | `exclude` | list of provider names | `exclude = ["claude"]` |
152
+ | `synthesizer` | `auto`/`random`/provider | `synthesizer = "codex"` |
153
+ | `[models]` | provider -> model table | `claude = "sonnet"` |
154
+
155
+ ```toml
156
+ # ~/.moa/config.toml
157
+ num = 2
158
+ timeout = 120
159
+ exclude = ["claude"]
160
+ synthesizer = "auto"
161
+
162
+ [models]
163
+ claude = "sonnet"
164
+ agy = "Gemini 3.1 Pro (Low)"
165
+ ```
166
+
167
+ **`moa config`** inspects and edits the file (it creates the dir/file as needed and validates provider names):
168
+
169
+ ```bash
170
+ moa config show # effective config (defaults + file) + path
171
+ moa config path # print the config file path
172
+ moa config set num 2 # set a scalar
173
+ moa config set exclude claude,codex # set the exclude list (comma-separated)
174
+ moa config set model claude=sonnet # set one entry in [models]
175
+ moa config unset num # remove a key
176
+ moa config unset model claude # remove one [models] entry
177
+ ```
178
+
179
+ The synthesizer default is persistable too (e.g. `moa config set synthesizer codex`); `debate`'s `-r/--rounds` and `-j/--judge` are not persisted. CLI `-m` overrides win per-provider over the config `[models]` table.
180
+
181
+ ### Output
182
+
183
+ - **stdout** carries only content: each agent's answer is fronted by a centered separator rule naming it (`──── claude (opus) · OK · 3.5s ────`) with blank lines around it for clear separation, flushed the instant that agent finishes. `moa distill` then appends the merged block (`──── synthesis · via claude · OK · ... ────`) once the aggregator finishes.
184
+ - **stderr** carries progress and selection notes (`Asking claude, codex ...`), so piping stdout stays clean.
185
+ - `--json` emits one JSON object per line (JSONL): a `{"type": "response", ...}` record per agent as it completes; `distill` then adds a `{"type": "synthesis", ...}` record. `debate` instead emits a `{"type": "debate_turn", "round": N, ...}` record per turn plus a final `{"type": "verdict", ...}` record. Ideal when another agent calls MOA and parses the result.
186
+
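For a caller that consumes `--json` output, a minimal parsing sketch might look like the following. It relies only on the `type` and `round` keys named above; any other fields in a record are not documented here and are left untouched. `group_by_type` is a hypothetical helper name.

```python
import json
from collections import defaultdict


def group_by_type(lines):
    """Group JSONL records by their "type" field, skipping blank lines."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        grouped[record.get("type", "unknown")].append(record)
    return dict(grouped)


# Stand-in lines shaped like the documented debate records.
sample_output = [
    '{"type": "debate_turn", "round": 1}',
    '{"type": "debate_turn", "round": 1}',
    '{"type": "verdict"}',
]
grouped = group_by_type(sample_output)
print(len(grouped["debate_turn"]), len(grouped["verdict"]))  # 2 1
```

In a real pipeline, the same function could be fed `sys.stdin` after piping `moa ask --json "..."` into the script, since stdout carries only the records.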
187
+ ### `moa distill` (synthesis)
188
+
189
+ `distill` runs the same council fan-out as `ask`, then one more pass where a strong aggregator merges the collected answers into a single, unified answer. It needs at least two successful proposer answers; with fewer it streams what it has and skips the merge. The aggregator is chosen with `-s/--synthesizer`:
190
+
191
+ - `auto` (default) - the highest-priority agent that ran (deterministic)
192
+ - `random` - pick one of the agents that ran, at random
193
+ - a provider name (`claude`, `codex`, `agy`, `opencode`)
194
+
195
+ The aggregator prompt is adapted from the Mixture-of-Agents "Aggregate-and-Synthesize" prompt (Wang et al. 2024): it tells the aggregator to critically evaluate the inputs (some may be biased or incorrect) and not to simply replicate them but offer a refined, accurate, comprehensive reply.
196
+
197
+ ### `moa debate` (sequential debate + neutral judge)
198
+
199
+ `debate` is the opt-in, highest-cost mode. Instead of fanning out in parallel, it runs a sequential, adversarial exchange and then asks a **separate neutral judge** to write the final answer.
200
+
201
+ **Roles.** By default the top **2** selected agents are the debaters and the **3rd** is the judge - so the default `-n 3` maps to *2 debaters + 1 judge*. Pin a specific judge with `-j/--judge PROVIDER`; the judge must be one of the selected agents and must **not** also be a debater. Debate needs at least 2 debaters and 1 distinct judge, so it needs at least 3 agents; with fewer it exits with a clear message rather than silently degrading.
202
+
203
+ **Rounds.** `-r/--rounds` defaults to **2** (gains plateau around 2-3 rounds while token cost grows multiplicatively) and is hard-capped at **4** - higher values are clamped with a warning on stderr.
204
+
205
+ **The loop.** Round 1: debater A answers cold; debater B sees A's answer with an adversarial-stance instruction ("identify errors/weaknesses before giving your own answer; do not agree merely to reach consensus"). Each later round, every debater sees the other's latest answer and responds in the same spirit. If every debater signals it has *no substantive change* (it may open its reply with `NO SUBSTANTIVE CHANGE`), the debate stops early before the cap.
206
+
207
+ **The judge.** A model that is **not** a debater reads the full transcript - presented **anonymized and order-shuffled** (a model is judging, so brand/position bias is killed, per item 002) - and writes the final answer. Its prompt instructs it to weigh correctness and evidence **above** confidence and fluency. The judge's verdict is the final block (`──── verdict · judge <name> · ... ────`).
208
+
209
+ **Streaming/output.** Each debater's turn streams as it completes (`──── round N · <provider> · ... ────`), then the judge's verdict last. `--json` emits a `{"type": "debate_turn", "round": N, ...}` record per turn plus a final `{"type": "verdict", ...}` record.
210
+
211
+ **Safety.** Debaters and the judge run in the same read-only (or `--yolo`) mode as the other verbs - there is no permission bypass. agy's partial-sandbox caveat (shell only; it can still edit files) applies here too.
212
+
213
+ > **Caveat - use sparingly.** Debate is the costliest mode (roughly `debaters x rounds + 1` model calls) **and the least reliably beneficial.** The research is mixed-to-negative: multi-agent debate can converge on a *wrong* answer through conformity, a confident-but-incorrect debater can win on persuasiveness over correctness, and more rounds can entrench an error rather than fix it. The separate neutral judge and the adversarial-stance prompt are there to fight these failure modes, but they do not eliminate them. For most questions, `ask` or `distill` is the better default; reach for `debate` when you specifically want to surface and stress-test disagreement. (See *Can LLM Agents Really Debate?* arXiv:2511.07784, *Talk Isn't Always Cheap* arXiv:2509.05396, and the conformity/position-bias work cited in the design notes.)
214
+
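As a rough worked example of the `debaters x rounds + 1` estimate, assuming the round cap of 4 described above and no early stop, the upper bound on model calls can be computed like this (`max_debate_calls` is a hypothetical name used only for illustration):

```python
def max_debate_calls(rounds=2, debaters=2, round_cap=4):
    """Upper bound on model calls: every debater speaks each round, plus one judge."""
    rounds = min(rounds, round_cap)  # values above the cap are clamped
    return debaters * rounds + 1


print(max_debate_calls())          # 5
print(max_debate_calls(rounds=4))  # 9
print(max_debate_calls(rounds=7))  # 9
```

A debate that stops early because every debater reports no substantive change would use fewer calls than this bound.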
215
+ ### Attribution policy
216
+
217
+ The human (or agent) reading MOA's output **always gets correct attribution**: every response block shows the real provider name. There is no human-facing anonymization toggle.
218
+
219
+ The `distill` aggregator is a different story. To stop it picking favourites by brand, it **always** receives the proposer answers anonymized as "Response A / B / C" and order-shuffled (no toggle). The merged answer itself is brand-agnostic prose, and the A/B/C labels never leak into stdout, stderr, or the JSON.
220
+
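The anonymize-and-shuffle step for the aggregator could be sketched as below. This is an assumption-laden illustration, not the package's code: `anonymize` is a hypothetical helper, and the sample answers are made up. It shuffles the answers first and then assigns "Response A / B / C" labels, so a label carries no information about which provider wrote it.

```python
import random
import string


def anonymize(answers, rng=None):
    """Map provider -> answer into shuffled, brand-free "Response X" labels."""
    rng = rng or random.Random()
    texts = list(answers.values())
    rng.shuffle(texts)  # order-shuffle so position does not reveal the provider
    return {f"Response {string.ascii_uppercase[i]}": t for i, t in enumerate(texts)}


labelled = anonymize(
    {"claude": "Use SQLite.", "codex": "Use Postgres.", "agy": "It depends."},
    rng=random.Random(0),
)
print(sorted(labelled))  # ['Response A', 'Response B', 'Response C']
```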
221
+ ## Supported agents
222
+
223
+ Invocations below show the default (read-only) flags; `--yolo` swaps in each tool's full-access mode.
224
+
225
+ | Provider | CLI | Invocation (read-only default) |
226
+ | ----------- | ---------- | ------------------------------------------------------------------- |
227
+ | `claude` | `claude` | `claude --model opus --permission-mode plan -p PROMPT` |
228
+ | `codex` | `codex` | `codex exec -m gpt-5.5 --skip-git-repo-check -s read-only PROMPT` |
229
+ | `agy` | `agy` | `agy --sandbox --model "Gemini 3.1 Pro (High)" -p PROMPT` (partial: shell only - can still edit files) |
230
+ | `opencode` | `opencode` | `opencode run --agent plan PROMPT` |
231
+
232
+ Adding a new agent is a single entry in the `PROVIDERS` table in `src/moa_cli/cli.py` (executable, default model, command builder, permission flags); it then participates in detection, `-n` selection, and `distill` automatically.
233
+
234
+ ## Development
235
+
236
+ ```bash
237
+ uv sync
238
+ uv run pytest
239
+ uv run ruff check src tests
240
+ ```
241
+
242
+ MIT licensed.
@@ -0,0 +1,231 @@
1
+ <p align="center">
2
+ <img src="assets/logo-full-white.png" alt="moa - mixture of agents" width="360">
3
+ </p>
4
+
5
+ <p align="center">
6
+ <a href="https://github.com/pietz/moa-cli/actions/workflows/ci.yml"><img src="https://github.com/pietz/moa-cli/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
7
+ </p>
8
+
9
+ # MOA - Mixture of Agents
10
+
11
+ Ask one question to multiple local AI coding CLIs **in parallel** and collect their answers. MOA detects which agent CLIs you have installed (Claude Code, Codex, agy, opencode), fans your prompt out to them, and streams each answer back the moment that agent finishes. Or run `moa distill` to have a strong aggregator merge those answers into a single unified response, or `moa debate` to have them critique each other across rounds before a neutral judge gives the verdict.
12
+
13
+ It's a drop-in, batteries-included replacement for hand-rolling parallel `claude -p` / `codex exec` / `opencode run` calls (or a "peer review" agent skill): one command, clean attributed output, made to be called by a human **or** by another agent.
14
+
15
+ The package is named `moa-cli` but installs the command `moa`.
16
+
17
+ ```bash
18
+ uv tool install moa-cli
19
+ moa ask "Is Postgres or SQLite better for a desktop app?"
20
+ ```
21
+
22
+ Or run it once without installing:
23
+
24
+ ```bash
25
+ uvx --from moa-cli moa ask "Review this plan."
26
+ ```
27
+
28
+ ## Why
29
+
30
+ A single model gives you one perspective. Asking three frontier models the same question - and seeing where they agree, diverge, or contradict - is a fast, cheap way to pressure-test an answer. MOA makes that a one-liner using the CLIs you already pay for, with no API keys of its own.
31
+
32
+ ## Usage
33
+
34
+ MOA has three prompt verbs that share the same selection/output options:
35
+
36
+ - **`moa ask PROMPT`** - council / peer review: N agents answer the same prompt in parallel; every answer is returned with attribution, streamed as it lands.
37
+ - **`moa distill PROMPT`** - synthesis: run the council, then one strong aggregator merges the answers into a single unified response.
38
+ - **`moa debate PROMPT`** - sequential debate: two debaters answer and adversarially critique each other across rounds, then a separate neutral judge writes the final verdict. The costliest mode; read the caveats below before reaching for it.
39
+
40
+ ```bash
41
+ moa doctor # show installed CLIs and their default models
42
+ moa ask "Should this feature use SQLite?" # ask the top 3 installed agents (read-only)
43
+ moa ask -n 2 "..." # ask only the top 2 (priority order)
44
+ moa ask -p claude -p agy "..." # pin specific agents
45
+ moa ask -x claude "..." # drop an agent (e.g. exclude the caller's own model)
46
+ moa ask -m claude=sonnet "..." # override which model a tool uses
47
+ moa ask --yolo "..." # grant full write access (default is read-only)
48
+ moa ask --json "..." # machine-readable JSONL (for agents/pipes)
49
+ git diff | moa ask -f - "Review this diff." # read the prompt from stdin
50
+ moa distill "Design a rate limiter." # council, then merge into one answer
51
+ moa distill -s codex "..." # pick who distills (auto | random | provider)
52
+ moa debate "Is this race condition real?" # 2 debaters + a judge (default n=3)
53
+ moa debate -r 3 "..." # more rounds (default 2, hard max 4)
54
+ moa debate -j claude "..." # pin who judges (must not be a debater)
55
+ ```
56
+
57
+ The shared options (`-n/--num`, `-p/--provider`, `-x/--exclude`, `-m/--model`, `-t/--timeout`, `-f/--file`, `--json`, `--yolo`) work identically on all three verbs. `distill` adds `-s/--synthesizer`; `debate` adds `-r/--rounds` and `-j/--judge`.
58
+
59
+ ### Read-only by default
60
+
61
+ MOA is built to be called autonomously, so by default **no agent can write files or
62
+ run mutating commands**. Each agent runs in its tool's safest mode: it may read local
63
+ files (and, where the tool allows, research online), but it cannot edit anything. This
64
+ is enforced by spawning each CLI with its own read-only flags:
65
+
66
+ | Provider | Read-only (default) | Reads files | Web research |
67
+ | ---------- | -------------------------- | ----------- | ------------------------- |
68
+ | `claude` | `--permission-mode plan` | yes | yes |
69
+ | `codex` | `-s read-only` | yes | **no** (sandbox blocks network) |
70
+ | `opencode` | `--agent plan` | yes | yes |
71
+ | `agy` | `--sandbox` (partial: shell only - can still edit files) | yes | yes |
72
+
73
+ `codex`'s read-only mode is a kernel sandbox that also blocks network, so codex does no
74
+ web research in the default mode (it still reads local files). `agy` has **no true
75
+ read-only mode**: its `--sandbox` flag restricts agy's terminal/shell but does **not** stop
76
+ its `write_file` tool, so agy **can still edit files** even in the default mode. This is
77
+ **partial** protection (it closes the shell vector only), not read-only. MOA applies
78
+ `--sandbox` as the next-best safeguard and the selection note on stderr states honestly that
79
+ `agy` is shell-sandboxed but can still edit files.
80
+
81
+ ### `--yolo` (full write access)
82
+
83
+ Pass `--yolo` to grant every agent full write access (file edits and shell commands,
84
+ auto-approved). Use it only when you actually want the agents to change your working tree.
85
+
86
+ ```bash
87
+ moa ask --yolo "Refactor this module and run the tests."
88
+ ```
89
+
90
+ Under `--yolo` every agent gets full write access. For `agy` this means dropping
91
+ `--sandbox`, so `agy --yolo` runs with no shell restrictions at all. In the default mode,
92
+ `agy` runs with `--sandbox` (partial protection: shell only - it can still edit files), and
93
+ MOA states that honestly on stderr.
94
+
95
+ ### How agents are selected
96
+
97
+ `-n/--num` (default 3) picks the first N **installed** agents from a popularity-ordered priority list:
98
+
99
+ ```
100
+ claude -> codex -> agy -> opencode
101
+ ```
102
+
103
+ So `moa ask -n 3` on a machine with all four installed asks Claude, Codex, and agy (opencode is #4). `agy` has no true read-only mode, so in the default mode it runs with `--sandbox` (partial protection: shell only - it can still edit files) and MOA flags that with an honest note on stderr; it is **not** excluded. Use `-p/--provider` (repeatable) to pin an exact set and ignore `-n`.
104
+
105
+ Use `-x/--exclude` (repeatable) to drop one or more agents from the run. Exclusion is applied *before* `-n` takes the first N, and it also drops excluded names from an explicit `-p` set. It is off by default. The motivating case: an agent (e.g. Claude Code) calls `moa` for *other* opinions; `moa ask -x claude` makes sure one "peer" isn't just the caller's own model. So `moa ask -n 3 -x claude` asks Codex, agy, and opencode.
106
+
107
+ ### Choosing models
108
+
109
+ Each tool ships with a reasonable default model, but you can override which model any tool uses with `-m/--model PROVIDER=MODEL` (repeatable). Only the providers you name change; the rest keep their defaults.
110
+
111
+ ```bash
112
+ moa ask -m claude=sonnet -m agy="Gemini 3.1 Pro (Low)" "..."
113
+ ```
114
+
115
+ The model-string format differs per tool and is passed through verbatim (the tool's own CLI validates it):
116
+
117
+ | Provider | Default | `-m` format |
118
+ | ---------- | ----------------------- | ------------------------------------------------------ |
119
+ | `claude` | `opus` | short id, e.g. `claude=sonnet` |
120
+ | `codex` | `gpt-5.5` | model id, e.g. `codex=gpt-5.5` |
121
+ | `agy` | `Gemini 3.1 Pro (High)` | exact display name, e.g. `agy="Gemini 3.1 Pro (Low)"` |
122
+ | `opencode` | (tool's authed default) | `provider/model` slug, e.g. `opencode=anthropic/claude-sonnet-4` |
123
+
124
+ `opencode` has no built-in default; without an override it omits `-m` and lets opencode pick. Pass `-m opencode=provider/model` to pin one.
125
+
126
+ ### Configuration
127
+
128
+ To avoid repeating the same flags on every call, persist your own defaults in a config file. MOA reads it for every verb and merges it under your flags.
129
+
130
+ **Location.** `~/.moa/config.toml` (the dir is created on first write). Set `$MOA_CONFIG_DIR` to point the whole config layer somewhere else (useful in tests/CI).
131
+
132
+ **Precedence.** `built-in default < config file < CLI flag`. A flag always wins; the config file only changes a default when that flag is omitted; an absent file means today's built-in behaviour.
133
+
134
+ **Keys** (all shared across `ask`/`distill`/`debate`):
135
+
136
+ | Key | Type | Example |
137
+ | ------------- | ----------------------- | ----------------------------- |
138
+ | `num` | int (>= 1) | `num = 2` |
139
+ | `timeout` | seconds (> 0) | `timeout = 120` |
140
+ | `exclude` | list of provider names | `exclude = ["claude"]` |
141
+ | `synthesizer` | `auto`/`random`/provider | `synthesizer = "codex"` |
142
+ | `[models]` | provider -> model table | `claude = "sonnet"` |
143
+
144
+ ```toml
145
+ # ~/.moa/config.toml
146
+ num = 2
147
+ timeout = 120
148
+ exclude = ["claude"]
149
+ synthesizer = "auto"
150
+
151
+ [models]
152
+ claude = "sonnet"
153
+ agy = "Gemini 3.1 Pro (Low)"
154
+ ```
155
+
156
+ **`moa config`** inspects and edits the file (it creates the dir/file as needed and validates provider names):
157
+
158
+ ```bash
159
+ moa config show # effective config (defaults + file) + path
160
+ moa config path # print the config file path
161
+ moa config set num 2 # set a scalar
162
+ moa config set exclude claude,codex # set the exclude list (comma-separated)
163
+ moa config set model claude=sonnet # set one entry in [models]
164
+ moa config unset num # remove a key
165
+ moa config unset model claude # remove one [models] entry
166
+ ```
167
+
168
+ The synthesizer default is persistable too (e.g. `moa config set synthesizer codex`); `debate`'s `-r/--rounds` and `-j/--judge` are not persisted. CLI `-m` overrides win per-provider over the config `[models]` table.
169
+
170
+ ### Output
171
+
172
+ - **stdout** carries only content: each agent's answer is fronted by a centered separator rule naming it (`──── claude (opus) · OK · 3.5s ────`) with blank lines around it for clear separation, flushed the instant that agent finishes. `moa distill` then appends the merged block (`──── synthesis · via claude · OK · ... ────`) once the aggregator finishes.
173
+ - **stderr** carries progress and selection notes (`Asking claude, codex ...`), so piping stdout stays clean.
174
+ - `--json` emits one JSON object per line (JSONL): a `{"type": "response", ...}` record per agent as it completes; `distill` then adds a `{"type": "synthesis", ...}` record. `debate` instead emits a `{"type": "debate_turn", "round": N, ...}` record per turn plus a final `{"type": "verdict", ...}` record. Ideal when another agent calls MOA and parses the result.
175
+
176
+ ### `moa distill` (synthesis)
177
+
178
+ `distill` runs the same council fan-out as `ask`, then one more pass where a strong aggregator merges the collected answers into a single, unified answer. It needs at least two successful proposer answers; with fewer it streams what it has and skips the merge. The aggregator is chosen with `-s/--synthesizer`:
179
+
180
+ - `auto` (default) - the highest-priority agent that ran (deterministic)
181
+ - `random` - pick one of the agents that ran, at random
182
+ - a provider name (`claude`, `codex`, `agy`, `opencode`)
183
+
184
+ The aggregator prompt is adapted from the Mixture-of-Agents "Aggregate-and-Synthesize" prompt (Wang et al. 2024): it tells the aggregator to critically evaluate the inputs (some may be biased or incorrect) and not to simply replicate them but offer a refined, accurate, comprehensive reply.
185
+
186
+ ### `moa debate` (sequential debate + neutral judge)
187
+
188
+ `debate` is the opt-in, highest-cost mode. Instead of fanning out in parallel, it runs a sequential, adversarial exchange and then asks a **separate neutral judge** to write the final answer.
189
+
190
+ **Roles.** By default the top **2** selected agents are the debaters and the **3rd** is the judge - so the default `-n 3` maps to *2 debaters + 1 judge*. Pin a specific judge with `-j/--judge PROVIDER`; the judge must be one of the selected agents and must **not** also be a debater. Debate needs at least 2 debaters and 1 distinct judge, so it needs at least 3 agents; with fewer it exits with a clear message rather than silently degrading.
191
+
192
+ **Rounds.** `-r/--rounds` defaults to **2** (gains plateau around 2-3 rounds while token cost grows multiplicatively) and is hard-capped at **4** - higher values are clamped with a warning on stderr.
193
+
194
+ **The loop.** Round 1: debater A answers cold; debater B sees A's answer with an adversarial-stance instruction ("identify errors/weaknesses before giving your own answer; do not agree merely to reach consensus"). Each later round, every debater sees the other's latest answer and responds in the same spirit. If every debater signals it has *no substantive change* (it may open its reply with `NO SUBSTANTIVE CHANGE`), the debate stops early before the cap.
195
+
196
+ **The judge.** A model that is **not** a debater reads the full transcript - presented **anonymized and order-shuffled** (a model is judging, so brand/position bias is killed, per item 002) - and writes the final answer. Its prompt instructs it to weigh correctness and evidence **above** confidence and fluency. The judge's verdict is the final block (`──── verdict · judge <name> · ... ────`).
197
+
198
+ **Streaming/output.** Each debater's turn streams as it completes (`──── round N · <provider> · ... ────`), then the judge's verdict last. `--json` emits a `{"type": "debate_turn", "round": N, ...}` record per turn plus a final `{"type": "verdict", ...}` record.
199
+
200
+ **Safety.** Debaters and the judge run in the same read-only (or `--yolo`) mode as the other verbs - there is no permission bypass. agy's partial-sandbox caveat (shell only; it can still edit files) applies here too.
201
+
202
+ > **Caveat - use sparingly.** Debate is the costliest mode (roughly `debaters x rounds + 1` model calls) **and the least reliably beneficial.** The research is mixed-to-negative: multi-agent debate can converge on a *wrong* answer through conformity, a confident-but-incorrect debater can win on persuasiveness over correctness, and more rounds can entrench an error rather than fix it. The separate neutral judge and the adversarial-stance prompt are there to fight these failure modes, but they do not eliminate them. For most questions, `ask` or `distill` is the better default; reach for `debate` when you specifically want to surface and stress-test disagreement. (See *Can LLM Agents Really Debate?* arXiv:2511.07784, *Talk Isn't Always Cheap* arXiv:2509.05396, and the conformity/position-bias work cited in the design notes.)
203
+
204
+ ### Attribution policy
205
+
206
+ The human (or agent) reading MOA's output **always gets correct attribution**: every response block shows the real provider name. There is no human-facing anonymization toggle.
207
+
208
+ The `distill` aggregator is a different story. To stop it picking favourites by brand, it **always** receives the proposer answers anonymized as "Response A / B / C" and order-shuffled (no toggle). The merged answer itself is brand-agnostic prose, and the A/B/C labels never leak into stdout, stderr, or the JSON.
209
+
210
+ ## Supported agents
211
+
212
+ Invocations below show the default (read-only) flags; `--yolo` swaps in each tool's full-access mode.
213
+
214
+ | Provider | CLI | Invocation (read-only default) |
215
+ | ----------- | ---------- | ------------------------------------------------------------------- |
216
+ | `claude` | `claude` | `claude --model opus --permission-mode plan -p PROMPT` |
217
+ | `codex` | `codex` | `codex exec -m gpt-5.5 --skip-git-repo-check -s read-only PROMPT` |
218
+ | `agy` | `agy` | `agy --sandbox --model "Gemini 3.1 Pro (High)" -p PROMPT` (partial: shell only - can still edit files) |
219
+ | `opencode` | `opencode` | `opencode run --agent plan PROMPT` |
220
+
221
+ Adding a new agent is a single entry in the `PROVIDERS` table in `src/moa_cli/cli.py` (executable, default model, command builder, permission flags); it then participates in detection, `-n` selection, and `distill` automatically.
222
+
223
+ ## Development
224
+
225
+ ```bash
226
+ uv sync
227
+ uv run pytest
228
+ uv run ruff check src tests
229
+ ```
230
+
231
+ MIT licensed.
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "moa-cli"
3
- version = "0.1.0"
3
+ version = "0.2.1"
4
4
  description = "Ask one question to multiple local AI coding CLIs in parallel and collect their answers."
5
5
  readme = "README.md"
6
6
  authors = [
@@ -1,3 +1,3 @@
1
1
  """MOA CLI package."""
2
2
 
3
- __version__ = "0.1.0"
3
+ __version__ = "0.2.1"