@freibergergarcia/phone-a-friend 2.2.0 → 2.3.0

@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "phone-a-friend",
3
3
  "description": "CLI relay that lets AI coding agents collaborate by sending prompts and repository context to backend agents.",
4
- "version": "2.2.0",
4
+ "version": "2.3.0",
5
5
  "author": {
6
6
  "name": "Bruno Freiberger"
7
7
  }
package/README.md CHANGED
@@ -24,17 +24,18 @@ Relay tasks to any backend, spin up multi-model teams, or run persistent multi-a
24
24
  | **Team** | Iterative multi-backend refinement over N rounds | Collaborative review, converging on a solution |
25
25
  | **Agentic** | Persistent multi-agent sessions with @mention routing | Autonomous collaboration, adversarial review, deep analysis |
26
26
 
27
- <div align="center">
28
-
29
- ### TUI Dashboard
30
-
31
- <img src="https://raw.githubusercontent.com/freibergergarcia/phone-a-friend/main/assets/tui-dashboard.gif" alt="TUI dashboard" width="600">
32
-
33
- ### Web Dashboard
27
+ ### Host parity
34
28
 
35
- <img src="https://raw.githubusercontent.com/freibergergarcia/phone-a-friend/main/assets/web-dashboard.gif" alt="Web dashboard" width="700">
29
+ | Feature | Claude Code | OpenCode |
30
+ |---|:---:|:---:|
31
+ | `/phone-a-friend` (single-backend relay) | ✓ | ✓ |
32
+ | `/curiosity-engine` (Q&A rally) | ✓ | ✓ |
33
+ | `/phone-a-team` (iterative multi-model team) | ✓ | — |
34
+ | Plugin marketplace install | ✓ | — |
35
+ | CLI plugin install (`phone-a-friend plugin install --<host>`) | ✓ | ✓ |
36
+ | Skill auto-discovery | ✓ | ✓ |
36
37
 
37
- </div>
38
+ OpenCode users can replicate `/phone-a-team` by running repeated `/phone-a-friend` calls and synthesizing manually.
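A rough sketch of what "repeated calls, synthesized manually" could look like from the CLI side. It uses only the `--to` and `--prompt` flags shown elsewhere in this README; the `manual_team` function, the `PAF_BIN` override, and the prompt wording are hypothetical and not part of the package.

```bash
# Hypothetical sketch, not shipped with phone-a-friend.
# PAF_BIN lets you point at a different binary (or a stub) when experimenting.
PAF_BIN="${PAF_BIN:-phone-a-friend}"

# manual_team <task> <rounds> <backend>...
# Calls each backend once per round, feeding earlier replies back in as notes,
# then prints the accumulated notes for a human to synthesize.
manual_team() {
  local task="$1" rounds="$2"
  shift 2
  local notes="" round backend reply
  for ((round = 1; round <= rounds; round++)); do
    for backend in "$@"; do
      reply="$("$PAF_BIN" --to "$backend" --prompt "Task: $task
Previous notes: ${notes:-none}")"
      notes="$notes
[round $round, $backend] $reply"
    done
  done
  printf '%s\n' "$notes"
}
```

For example, `manual_team "Review the website copy" 3 claude ollama` would run three rounds across two backends; the final synthesis is still left to the person reading the notes.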
38
39
 
39
40
  ## Quick Start
40
41
 
@@ -57,26 +58,15 @@ The setup wizard detects your backends, offers to install detected host integrat
57
58
 
58
59
  **Claude Code marketplace (commands and skills only):**
59
60
 
60
- If you use [Claude Code](https://docs.anthropic.com/en/docs/claude-code/setup),
61
- you can install directly from the marketplace:
62
-
63
61
  ```
64
62
  /plugin marketplace add freibergergarcia/phone-a-friend
65
63
  /plugin install phone-a-friend@phone-a-friend-marketplace
66
64
  ```
67
65
 
68
- This fetches the latest version from npm automatically. To update later:
69
-
70
- ```
71
- /plugin marketplace update phone-a-friend-marketplace
72
- /plugin update phone-a-friend@phone-a-friend-marketplace
73
- ```
66
+ To update: `/plugin marketplace update phone-a-friend-marketplace` then `/plugin update phone-a-friend@phone-a-friend-marketplace`.
74
67
 
75
68
  > [!NOTE]
76
- > Marketplace install gives you Claude Code integration (slash commands
77
- > and skills). For the full CLI including agentic mode, the TUI dashboard, and
78
- > the web dashboard on localhost, install globally with
79
- > `npm install -g @freibergergarcia/phone-a-friend`.
69
+ > Marketplace install ships only the slash commands and skills. For the full CLI (agentic mode, TUI, web dashboard), install via `npm install -g @freibergergarcia/phone-a-friend`.
80
70
 
81
71
  **OpenCode commands and skills:**
82
72
 
@@ -113,10 +103,10 @@ Build a team with Claude and Ollama. Have them review the website copy,
113
103
  loop through 3 rounds, and converge on final suggestions.
114
104
  ```
115
105
 
116
- No slash commands needed. Once the host integration is installed (the setup wizard offers to do this), the host can route single-backend tasks through `/phone-a-friend`. In Claude Code, mention multiple backends and Claude can use `/phone-a-team` for iterative multi-agent refinement; `/phone-a-team` is Claude-only because it depends on Claude Agent Teams primitives. In OpenCode, use repeated `/phone-a-friend` calls and synthesize the results manually. You can explicitly invoke `/phone-a-friend` in both hosts, and `/phone-a-team` in Claude Code only.
106
+ No slash commands needed once the host integration is installed (see [Host parity](#host-parity) for which slash commands work in which host).
117
107
 
118
108
  > [!TIP]
119
- > **Power-user setup:** Run Claude Code in [**tmux**](https://formulae.brew.sh/formula/tmux) and enable [**bypass permissions**](https://docs.anthropic.com/en/docs/claude-code/security) (`⏵⏵`) for trusted repos. [**Agent teams**](https://docs.anthropic.com/en/docs/claude-code/agent-teams) show up in split panes, so you can watch agents work in parallel without approval pauses. Pair it with **phone-a-friend agentic mode** for fully autonomous multi-agent sessions.
109
+ > **Claude Code power-user setup:** Run in [**tmux**](https://formulae.brew.sh/formula/tmux) with [**bypass permissions**](https://docs.anthropic.com/en/docs/claude-code/security) (`⏵⏵`) and [**Agent Teams**](https://docs.anthropic.com/en/docs/claude-code/agent-teams) to watch agents work in parallel split panes. Pair with **phone-a-friend agentic mode** for fully autonomous sessions.
120
110
 
121
111
  ## CLI Usage
122
112
 
@@ -126,7 +116,7 @@ Delegate a task to any backend and get the result back:
126
116
 
127
117
  ```bash
128
118
  phone-a-friend --to codex --prompt "Review this code"
129
- phone-a-friend --to gemini --prompt "Analyze the architecture" --model gemini-2.5-flash
119
+ phone-a-friend --to gemini --prompt "Analyze the architecture"
130
120
  phone-a-friend --to claude --prompt "Refactor this module"
131
121
  phone-a-friend --to ollama --prompt "Explain this function"
132
122
  phone-a-friend --to opencode --prompt "Audit this repo" --model qwen3-coder # Local agentic (OpenCode + Ollama)
@@ -213,21 +203,22 @@ phone-a-friend config edit # Open in $EDITOR
213
203
 
214
204
  ## Backends
215
205
 
216
- | Backend | Type | Streaming | How it works |
217
- |---------|------|-----------|-------------|
218
- | **Codex** | CLI subprocess | No | Runs `codex exec` with sandbox and repo context |
219
- | **Gemini** | CLI subprocess | No | Runs `gemini --prompt` with `--yolo` auto-approve |
220
- | **Ollama** | HTTP API | Yes (NDJSON) | POSTs to `localhost:11434/api/chat` via native fetch |
221
- | **Claude** | CLI subprocess | Yes (JSON) | Runs `claude` with sandbox-to-tool mapping |
222
- | **OpenCode** | CLI subprocess | Yes (NDJSON) | Runs `opencode run` with repo-local tool access |
206
+ | Backend | Type | Streaming |
207
+ |---------|------|-----------|
208
+ | **Codex** | CLI subprocess | No |
209
+ | **Gemini** | CLI subprocess | No |
210
+ | **Ollama** | HTTP API | Yes (NDJSON) |
211
+ | **Claude** | CLI subprocess | Yes (JSON) |
212
+ | **OpenCode** | CLI subprocess | Yes (NDJSON) |
223
213
 
224
214
  Ollama configuration via environment variables:
225
215
  - `OLLAMA_HOST` -- custom host (default: `http://localhost:11434`)
226
216
  - `OLLAMA_MODEL` -- default model (overridden by `--model` flag)
227
217
 
228
218
  Phone-a-friend environment variables:
229
- - `PHONE_A_FRIEND_INCLUDE_DIFF=false` -- disable diff inclusion across every relay; equivalent to passing `--no-include-diff` on every command. Useful when `defaults.include_diff = true` in your config but you want a session without the diff. Also the canonical mechanism used by OpenCode shims for stale-binary compatibility (the `--no-include-diff` flag was added in v2.2.0+; older binaries reject it but accept this env var since v1.7.2).
230
- - `PHONE_A_FRIEND_HOST=opencode` -- mark the calling process as OpenCode for the recursion guard. Set automatically by the OpenCode shims; only relevant if you're invoking PaF programmatically from inside an OpenCode session.
219
+ - `PHONE_A_FRIEND_INCLUDE_DIFF=false` -- disable diff inclusion globally (equivalent to `--no-include-diff` on every call).
220
+ - `PHONE_A_FRIEND_HOST=opencode` -- mark the calling process as OpenCode for the recursion guard (set automatically by the OpenCode shims).
221
+ - `PHONE_A_FRIEND_GEMINI_DEAD_CACHE=false` -- bypass the Gemini dead-model cache (debugging stale entries).
231
222
 
232
223
  OpenCode configuration via TOML:
233
224
  ```toml
@@ -272,14 +263,6 @@ phone-a-friend agentic dashboard # default: localhost:7777
272
263
  phone-a-friend agentic dashboard --port 8080
273
264
  ```
274
265
 
275
- **How it works:**
276
-
277
- 1. The orchestrator spawns each agent with the initial prompt and a unique name (e.g., `ada.reviewer`, `fern.critic`)
278
- 2. Agents respond and `@mention` other agents (or `@all` / `@user`)
279
- 3. The orchestrator routes messages to the targeted agents
280
- 4. Agents reply in subsequent turns, building on accumulated context
281
- 5. The session ends when agents converge (no new messages), hit the turn limit, or time out
282
-
283
266
  **What you get:**
284
267
 
285
268
  - **Live web dashboard** -- watch agents collaborate in real time at `localhost:7777` (SSE-powered)
@@ -297,11 +280,20 @@ Full usage guide, examples, CLI reference, and configuration details:
297
280
 
298
281
  ## Uninstall
299
282
 
283
+ **npm install:**
284
+
300
285
  ```bash
301
286
  npm uninstall -g @freibergergarcia/phone-a-friend
302
287
  ```
303
288
 
304
- The Claude Code plugin and OpenCode commands/skills are removed automatically when installed through the CLI.
289
+ This automatically removes the Claude Code plugin (CLI-installed), OpenCode commands and skills, and the `~/.config/phone-a-friend` directory (config, sessions, jobs).
290
+
291
+ **Claude Code marketplace:**
292
+
293
+ ```
294
+ /plugin uninstall phone-a-friend@phone-a-friend-marketplace
295
+ /plugin marketplace remove phone-a-friend-marketplace
296
+ ```
305
297
 
306
298
  ## Contributing
307
299
 
@@ -351,28 +351,33 @@ Present the full session summary:
351
351
  <2-3 questions raised during the rally that weren't followed up on, worth exploring in a future session>
352
352
  ```
353
353
 
354
- ## Gemini Model Priority
354
+ ## Gemini model selection
355
355
 
356
- When BACKEND=gemini, always pass `--model` using the first available model from this list:
356
+ For BACKEND=gemini, **omit `--model` by default** and let Gemini CLI's
357
+ auto-routing pick. Set `--model` only when reproducibility, specific
358
+ capability, or debugging requires a pin.
357
359
 
358
- 1. `gemini-2.5-flash`: reliable, confirmed working
359
- 2. `gemini-2.5-pro`: higher capability, frequently at capacity (429)
360
- 3. `gemini-2.5-flash-lite`: last resort
360
+ **Binary mode** (preferred). When a pinned model returns a strong 404
361
+ `ModelNotFoundError`, PaF caches it for 24h at
362
+ `~/.config/phone-a-friend/gemini-models.json` and surfaces a clear error
363
+ with the cache path, expiry, and bypass instructions; it does not auto-substitute:
361
364
 
362
- When BACKEND=gemini, the relay command must include `--model`:
363
-
364
- **Binary mode:**
365
365
  ```bash
366
- phone-a-friend --to gemini --model gemini-2.5-flash --repo "$PWD" --sandbox read-only --fast $PAF_NO_DIFF --prompt "<relay-prompt>"
366
+ phone-a-friend --to gemini --repo "$PWD" --sandbox read-only --fast $PAF_NO_DIFF --prompt "<relay-prompt>"
367
367
  ```
368
368
 
369
- **Direct mode:**
369
+ To bypass the cache, set `PHONE_A_FRIEND_GEMINI_DEAD_CACHE=false`; to clear
370
+ it, delete the cache file.
371
+
372
+ **Direct mode** (no PaF wrapper — orchestrator handles retry):
370
373
  ```bash
371
- gemini --sandbox --yolo --include-directories "$PWD" --output-format text -m gemini-2.5-flash --prompt "<relay-prompt>"
374
+ gemini --sandbox --yolo --include-directories "$PWD" --output-format text --prompt "<relay-prompt>"
372
375
  ```
373
376
 
374
- On capacity/transient errors (429, 500, 503), try the next model before treating as round failure.
375
- Do NOT use aliases like `auto`, `pro`, or `flash` — always use the full model name.
377
+ In direct mode, on capacity/transient errors (429, 500, 503), retry with a
378
+ different model before treating as round failure. On `ModelNotFoundError`,
379
+ surface immediately. Either omit `--model` entirely or pass a concrete
380
+ model name; do NOT use aliases like `auto`, `pro`, or `flash`.
376
381
 
377
382
  ## Constraints
378
383
 
@@ -169,9 +169,9 @@ I'm working on this task and got the above response. Please review it and return
169
169
  **Binary mode** (`RELAY_MODE = binary`):
170
170
  ```bash
171
171
  phone-a-friend --to codex --repo "$PWD" --prompt "<relay-prompt>" --context-text "<context-payload>" $PAF_NO_DIFF [--fast] [--session <id>]
172
- # For gemini, always include --model (see "Gemini Model Priority" below).
172
+ # For gemini, omit --model by default (let auto-routing pick); see "Gemini model selection" below.
173
173
  # Do NOT pass --session to gemini — it will error (see "Session continuity" below):
174
- phone-a-friend --to gemini --repo "$PWD" --prompt "<relay-prompt>" --context-text "<context-payload>" --model <model> $PAF_NO_DIFF [--fast]
174
+ phone-a-friend --to gemini --repo "$PWD" --prompt "<relay-prompt>" --context-text "<context-payload>" $PAF_NO_DIFF [--fast]
175
175
  ```
176
176
 
177
177
  `$PAF_NO_DIFF` comes from the probe in "Diff suppression" above. It
@@ -186,8 +186,8 @@ I'm working on this task and got the above response. Please review it and return
186
186
  ```bash
187
187
  # Codex:
188
188
  codex exec -C "$PWD" --skip-git-repo-check --sandbox read-only "<combined-prompt>" < /dev/null
189
- # Gemini (always include -m, see "Gemini Model Priority" below):
190
- gemini --sandbox --yolo --include-directories "$PWD" --output-format text -m <model> --prompt "<combined-prompt>"
189
+ # Gemini (omit -m for auto-routing; pin only when reproducibility/capability is needed):
190
+ gemini --sandbox --yolo --include-directories "$PWD" --output-format text --prompt "<combined-prompt>"
191
191
  ```
192
192
 
193
193
  In direct mode, build `<combined-prompt>` using the template from the
@@ -284,51 +284,55 @@ This is rarely the right move from inside a Claude Code conversation — the
284
284
  common case is `--session <label>` with a fresh label. Only use
285
285
  `--backend-session` when the user supplied a specific backend thread ID.
286
286
 
287
- ## Gemini Model Priority
287
+ ## Gemini model selection
288
288
 
289
- When using `--to gemini`, **always** pass `--model` using the first model from
290
- this priority list. Never use aliases (`auto`, `pro`, `flash`); use concrete
291
- model names only:
289
+ By default, **omit `--model`** for `--to gemini` and let Gemini CLI's
290
+ auto-routing pick. This mirrors how `--to codex` and `--to claude` are used
291
+ in this command — the CLI's own default is the right default. Pinning
292
+ `--model` ages docs poorly; auto-routing tracks deployed models for you.
292
293
 
293
- ### Why we bypass auto-routing
294
+ Set `--model` explicitly only when you need:
294
295
 
295
- Gemini CLI has built-in model fallback via auto mode, but it does NOT work in
296
- headless/non-interactive mode. `--yolo` (and `--approval-mode yolo`) only
297
- auto-approve tool calls, not model switch prompts. When Gemini hits a capacity
298
- error in headless mode, it tries to prompt for consent and fails
299
- (`google-gemini/gemini-cli#13561`). By passing `--model` explicitly, we bypass
300
- this broken behavior and handle retry/fallback ourselves.
296
+ - **Reproducibility**: pinning produces deterministic behavior across runs.
297
+ - **Capability**: a more capable model for a specific task (e.g.,
298
+ `--model gemini-2.5-pro` for a hard review, accepting more 429s).
299
+ - **Debugging**: isolating model behavior from auto-routing changes.
301
300
 
302
- ### Priority rationale
301
+ ### Cache-aware failure for explicit pins
303
302
 
304
- As of 2026-02-22, `gemini-3.1-pro-preview-*` models return 404 (not yet
305
- deployed) and `gemini-2.5-pro` is perpetually at capacity (429). Based on
306
- empirical testing across 10+ relay sessions, `gemini-2.5-flash` is the only
307
- model that reliably works. Lead with what works; fall forward to newer models
308
- as they become available.
303
+ When you pin a model and it returns a strong 404 (`ModelNotFoundError`),
304
+ PaF caches it as unavailable for 24h at
305
+ `~/.config/phone-a-friend/gemini-models.json` and surfaces a clear error
306
+ that includes the cache path, expiry timestamp, and bypass instructions.
307
+ PaF does **not** auto-substitute another model — explicit pins surface
308
+ explicit failures so the caller decides whether to retry, switch model,
309
+ or omit `--model` and rely on auto-routing.
309
310
 
310
- 1. `gemini-2.5-flash`: reliable, fast, confirmed working
311
- 2. `gemini-2.5-pro` — higher capability but frequently at capacity (429)
312
- 3. `gemini-2.5-flash-lite` — last resort
313
- 4. `gemini-3.1-pro-preview-customtools` — not yet deployed (404 as of 2026-02-22)
314
- 5. `gemini-3.1-pro-preview` — not yet deployed (404 as of 2026-02-22)
311
+ What is and isn't cached:
315
312
 
316
- ### Fallback rule
313
+ - **Cached** (24h): strong 404 (`ModelNotFoundError` from gemini-cli's own classifier).
314
+ - **Not cached**: ambiguous 404s (could be a missing project / file, not the model), 429 / RESOURCE_EXHAUSTED, authentication failures, any other error class.
315
+ - **Not consulted**: when `--model` is unset (auto-routing), or during session resume (`--resume`).
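The consult-or-skip rules above can be sketched as a small predicate. This is an illustration only: the real layout of `gemini-models.json` is not shown in this diff, so the sketch assumes a hypothetical "cached at" epoch timestamp per model, and the function name `dead_cache_applies` is invented here.

```bash
# Hypothetical sketch of the 24h dead-model cache rule; not PaF's actual code.
DEAD_CACHE_TTL_SECONDS=$((24 * 60 * 60))

# dead_cache_applies <model-or-empty> <cached_at_epoch> <now_epoch>
# Prints "blocked" when a pinned model has an unexpired entry, else "allowed".
dead_cache_applies() {
  local model="$1" cached_at="$2" now="$3"
  # The cache is not consulted when --model is unset (auto-routing).
  if [ -z "$model" ]; then
    echo allowed
    return
  fi
  # Documented bypass switch.
  if [ "${PHONE_A_FRIEND_GEMINI_DEAD_CACHE:-}" = "false" ]; then
    echo allowed
    return
  fi
  # Entries expire 24h after they were recorded (assumed timestamp field).
  if [ $((now - cached_at)) -lt "$DEAD_CACHE_TTL_SECONDS" ]; then
    echo blocked
  else
    echo allowed
  fi
}
```

Session resume, which the list above also exempts, is left out of the sketch to keep it short.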
317
316
 
318
- On Gemini relay failure, retry with the next model **only** for transient or
319
- capacity errors:
317
+ To bypass the cache (debugging stale entries, testing recovery):
320
318
 
321
- - **Retry with next model**: HTTP 429, 499, 500, 503, 504; RESOURCE_EXHAUSTED;
322
- "high demand"; model not found; transient/timeout errors
323
- - **Do NOT retry**: authentication failures, invalid arguments, prompt errors,
324
- permission errors
325
- - **Default**: if an error cannot be confidently classified as transient, do
326
- NOT model-fallback — report the error immediately
319
+ ```bash
320
+ PHONE_A_FRIEND_GEMINI_DEAD_CACHE=false phone-a-friend --to gemini --model X --prompt "..."
321
+ ```
322
+
323
+ Or delete `~/.config/phone-a-friend/gemini-models.json` to clear it.
324
+
325
+ ### Direct Gemini CLI mode
326
+
327
+ When the orchestrator is calling `gemini` directly (no PaF wrapper), the
328
+ dead-model cache does NOT apply — the orchestrator is responsible for any
329
+ retry. Retry rules in direct mode:
327
330
 
328
- After exhausting all models, stop and report the error with the list of
329
- attempted models.
331
+ - **Retry**: HTTP 429, 499, 500, 503, 504; RESOURCE_EXHAUSTED; transient/timeout errors.
332
+ - **Do NOT retry**: authentication failures, invalid arguments, permission errors, model-not-found.
333
+ - **Default**: if an error cannot be confidently classified as transient, surface it immediately.
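As a minimal sketch, the direct-mode retry rules above reduce to an allow-list with a conservative default. The function name and the idea of passing a single status code or error label are assumptions made for illustration; a real orchestrator would first have to extract that label from the `gemini` output.

```bash
# Hypothetical classifier for direct-mode errors; not part of phone-a-friend.
# should_retry_gemini <http-status-or-error-label>
# Prints "retry" for transient/capacity errors, "surface" for everything else.
should_retry_gemini() {
  case "$1" in
    429 | 499 | 500 | 503 | 504 | RESOURCE_EXHAUSTED | timeout)
      echo retry
      ;;
    *)
      # Auth failures, invalid arguments, permission errors, model-not-found,
      # and anything unclassified are surfaced immediately.
      echo surface
      ;;
  esac
}
```

Keeping the retryable set explicit means a new or unfamiliar error class falls through to "surface" rather than looping.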
330
334
 
331
- This does NOT apply to `--to codex`.
335
+ This does NOT apply to `--to codex` or `--to claude`.
332
336
 
333
337
  ## Notes
334
338
 
@@ -269,7 +269,7 @@ Resolution matrix:
269
269
  | Friend backend | Include when |
270
270
  |----------------|--------------|
271
271
  | `codex` | `command -v codex` AND `codex --version` succeeds |
272
- | `gemini` | `command -v gemini` succeeds (auth verified at first relay; transient errors handled by Gemini Model Priority retry rules) |
272
+ | `gemini` | `command -v gemini` succeeds (auth verified at first relay; transient errors handled by Gemini auto-routing) |
273
273
  | `ollama` | `curl -sf "${OLLAMA_HOST:-http://localhost:11434}/api/tags"` succeeds AND parsed `models[]` has at least one entry |
274
274
  | `claude` | `command -v claude` AND `claude --version` succeeds. Claude is excluded by default when this skill is running inside Claude Code (we are already orchestrating with Claude). Include only when the user explicitly asked for Claude in addition |
275
275
  | `opencode` | `command -v opencode` succeeds AND the host is NOT OpenCode (`PHONE_A_FRIEND_HOST=opencode` means we are inside OpenCode; relaying back to opencode is blocked by the recursion guard regardless) |
@@ -525,7 +525,7 @@ Delegate the task to the backend via the relay. The lead's job is to
525
525
  "Direct call reference" section. If `--include-diff` is used, run
526
526
  `git diff HEAD` and append the output to the template's "Git Diff" section.
527
527
 
528
- For gemini, always include `--model` / `-m` per the Gemini Model Priority section.
528
+ For gemini, omit `--model` by default and let auto-routing pick (see "Gemini model selection" section).
529
529
  For ollama, always include `--model` / model field using `OLLAMA_SELECTED_MODEL` from preflight.
530
530
  - **Both backends**: Relay to each backend (in parallel if using teams,
531
531
  sequentially otherwise). You may give them the same task or different
@@ -804,53 +804,45 @@ happened and whether the result is complete.
804
804
  targets *unsolicited* repo-content dumps, not user-requested
805
805
  diff-scoped reviews.
806
806
 
807
- ## Gemini Model Priority
807
+ ## Gemini model selection
808
808
 
809
- When using `--to gemini` (including the gemini side of `--backend both`),
810
- **always** pass `--model` using the first model from this priority list. Never
811
- use aliases (`auto`, `pro`, `flash`); use concrete model names only:
809
+ For `--to gemini` (including the gemini side of `--backend both`), **omit
810
+ `--model` by default** and let Gemini CLI's auto-routing pick. This mirrors
811
+ how `--to codex` and `--to claude` are used in this command — the CLI's own
812
+ default is the right default.
812
813
 
813
- ### Why we bypass auto-routing
814
+ Set `--model` explicitly only when reproducibility, specific capability, or
815
+ debugging requires a pin.
814
816
 
815
- Gemini CLI has built-in model fallback via auto mode, but it does NOT work in
816
- headless/non-interactive mode. `--yolo` (and `--approval-mode yolo`) only
817
- auto-approve tool calls, not model switch prompts. When Gemini hits a capacity
818
- error in headless mode, it tries to prompt for consent and fails
819
- (`google-gemini/gemini-cli#13561`). By passing `--model` explicitly, we bypass
820
- this broken behavior and handle retry/fallback ourselves.
817
+ ### Cache-aware failure for explicit pins
821
818
 
822
- ### Priority rationale
819
+ PaF binary mode (`phone-a-friend --to gemini --model X`) caches strong 404s
820
+ (`ModelNotFoundError`) at `~/.config/phone-a-friend/gemini-models.json` for
821
+ 24h and surfaces a clear error with the cache path, expiry, and bypass
822
+ instructions. PaF does **not** auto-substitute another model — explicit
823
+ pins surface explicit failures.
823
824
 
824
- As of 2026-02-22, `gemini-3.1-pro-preview-*` models return 404 (not yet
825
- deployed) and `gemini-2.5-pro` is perpetually at capacity (429). Based on
826
- empirical testing across 10+ relay sessions, `gemini-2.5-flash` is the only
827
- model that reliably works. Lead with what works; fall forward to newer models
828
- as they become available.
825
+ What is and isn't cached:
829
826
 
830
- 1. `gemini-2.5-flash`: reliable, fast, confirmed working
831
- 2. `gemini-2.5-pro`: higher capability but frequently at capacity (429)
832
- 3. `gemini-2.5-flash-lite`: last resort
833
- 4. `gemini-3.1-pro-preview-customtools` — not yet deployed (404 as of 2026-02-22)
834
- 5. `gemini-3.1-pro-preview` — not yet deployed (404 as of 2026-02-22)
827
+ - **Cached** (24h): strong 404 (`ModelNotFoundError` from gemini-cli's own classifier).
828
+ - **Not cached**: ambiguous 404s, 429 / RESOURCE_EXHAUSTED, authentication failures, any other error class.
829
+ - **Not consulted**: when `--model` is unset (auto-routing), or during session resume.
835
830
 
836
- ### Fallback rule
831
+ To bypass the cache, set `PHONE_A_FRIEND_GEMINI_DEAD_CACHE=false`; to clear
832
+ it, delete the cache file.
837
833
 
838
- On Gemini relay failure, retry with the next model **only** for transient or
839
- capacity errors:
834
+ ### Direct Gemini CLI mode
840
835
 
841
- - **Retry with next model**: HTTP 429, 499, 500, 503, 504; RESOURCE_EXHAUSTED;
842
- "high demand"; model not found; transient/timeout errors
843
- - **Do NOT retry**: authentication failures, invalid arguments, prompt errors,
844
- permission errors
845
- - **Default**: if an error cannot be confidently classified as transient, do
846
- NOT model-fallback — treat as immediate round failure
836
+ When invoking `gemini` directly (no PaF wrapper), the dead-model cache does
837
+ NOT apply. Orchestrator-level retry rules:
847
838
 
848
- Model fallback happens **within the current round** (see Step 7 precedence
849
- rule). After exhausting all models in a round, escalate to round-level retry
850
- or stop per Step 7. Each new round resets to model #1.
839
+ - **Retry**: HTTP 429, 499, 500, 503, 504; RESOURCE_EXHAUSTED; transient/timeout.
840
+ - **Do NOT retry**: auth failures, invalid args, permission errors, model-not-found.
841
+ - **Default**: surface unclassified errors immediately, do not loop.
851
842
 
852
- When reporting errors in synthesis, list all attempted models and the error
853
- from each.
843
+ Round-level retry: each new round can attempt a different `--model` if the
844
+ prior round failed (see Step 7). When reporting errors in synthesis, list
845
+ the attempted models and each error.
854
846
 
855
847
  This does NOT apply to `--to codex` or `--to ollama`.
856
848