@runtypelabs/cli 2.0.1 → 2.1.0

package/README.md CHANGED
@@ -220,15 +220,44 @@ EOF
  - `planWritten` — advances when the agent writes its plan artifact
  - `never` — only the agent's `TASK_COMPLETE` signal can advance (if `canAcceptCompletion: true`)
 
+ **Playbook policies**:
+
+ The optional `policy` block lets you restrict what the agent can do at runtime. Policies are additive restrictions — they can only narrow behavior, never override global safety denies (e.g. `.env` files and private keys are always blocked).
+
+ ```yaml
+ name: blog-writer
+ policy:
+   allowedReadGlobs: ['content/**', 'templates/**']
+   allowedWriteGlobs: ['content/**']
+   blockedTools: ['search_repo']
+   blockDiscoveryTools: true
+   requirePlanBeforeWrite: true
+   requireVerification: true
+   outputRoot: 'content/'
+ milestones:
+   - ...
+ ```
+
+ | Field | Type | Description |
+ | --- | --- | --- |
+ | `allowedReadGlobs` | `string[]` | Glob patterns for allowed read paths. If set, reads outside these are blocked. |
+ | `allowedWriteGlobs` | `string[]` | Glob patterns for allowed write paths. If set, writes outside these are blocked. The plan file is always writable regardless. |
+ | `blockedTools` | `string[]` | Tool names to block entirely (e.g. `["write_file", "search_repo"]`). |
+ | `blockDiscoveryTools` | `boolean` | Block `search_repo`, `glob_files`, `tree_directory`, and `list_directory`. |
+ | `requirePlanBeforeWrite` | `boolean` | Require the agent to write its plan before any other file writes. |
+ | `requireVerification` | `boolean` | Require verification before `TASK_COMPLETE`. |
+ | `outputRoot` | `string` | For creation tasks: confine writes to this directory (e.g. `"public/"`). |
+
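Reviewer note: the policy fields added above can be read as a stack of checks in which every layer can only narrow what is allowed. The Python sketch below illustrates that reading and is not the CLI's implementation. The deny list, the plan path `.runtype/plan.md`, the order of the checks, and the use of `fnmatchcase` (whose `*` also matches `/`, unlike most glob libraries) are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from posixpath import basename

DISCOVERY_TOOLS = {"search_repo", "glob_files", "tree_directory", "list_directory"}


def globally_denied(path: str) -> bool:
    """Hypothetical stand-in for the global safety denies (.env files, private keys)."""
    name = basename(path)
    return name == ".env" or name.endswith(".pem")


def tool_allowed(tool: str, policy: dict) -> bool:
    """A tool is blocked if it is listed explicitly or is a blocked discovery tool."""
    if tool in policy.get("blockedTools", []):
        return False
    if policy.get("blockDiscoveryTools") and tool in DISCOVERY_TOOLS:
        return False
    return True


def write_allowed(path: str, policy: dict, plan_path: str, plan_written: bool) -> bool:
    """Apply each restriction in turn; none of them can re-allow a denied path."""
    if globally_denied(path):
        return False  # a policy never overrides a global deny
    if path == plan_path:
        return True  # the plan file is always writable
    if policy.get("requirePlanBeforeWrite") and not plan_written:
        return False
    root = policy.get("outputRoot")
    if root and not path.startswith(root.rstrip("/") + "/"):
        return False
    globs = policy.get("allowedWriteGlobs")
    if globs is not None and not any(fnmatchcase(path, g) for g in globs):
        return False
    return True


# The blog-writer policy from the README example above.
policy = {
    "allowedReadGlobs": ["content/**", "templates/**"],
    "allowedWriteGlobs": ["content/**"],
    "blockedTools": ["search_repo"],
    "blockDiscoveryTools": True,
    "requirePlanBeforeWrite": True,
    "requireVerification": True,
    "outputRoot": "content/",
}
PLAN = ".runtype/plan.md"  # hypothetical plan location

print(write_allowed("content/posts/hello.md", policy, PLAN, plan_written=False))  # False
print(write_allowed("content/posts/hello.md", policy, PLAN, plan_written=True))   # True
print(write_allowed("templates/base.html", policy, PLAN, plan_written=True))      # False
print(tool_allowed("glob_files", policy))                                          # False
```

Under this reading, omitting the `policy` block leaves only `globally_denied` in force, which matches the "only global safety denies apply" line added later in this diff.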
  #### Marathon Anatomy
 
  ```
  ┌─ marathon ──────────────────────────────────────────────────────┐
  │ │
- │ ┌─ playbook (optional) ─────────────────────────────┐
- │ │ Defines milestones, models, verification, rules
- │ │ .runtype/marathons/playbooks/tdd.yaml
- └───────────────────────────────────────────────────┘
+ │ ┌─ playbook (optional) ──────────────────────────────────┐
+ │ │ Defines milestones, models, verification, rules,
+ │ │ and policy constraints
+ │ │ .runtype/marathons/playbooks/tdd.yaml │ │
+ │ └────────────────────────────────────────────────────────┘ │
  │ │ │
  │ ▼ │
  │ ┌─ milestone 1 ──┐ ┌─ milestone 2 ──┐ ┌─ milestone 3 ─────┐ |
@@ -261,8 +290,55 @@ What's optional:
  ✓ Rules Without them, agent follows only playbook/milestone instructions
  ✓ Models Without overrides, uses CLI --model flag or default
  ✓ Verification Without it, no verification gate between milestones
+ ✓ Policy Without one, only global safety denies apply
+ ```
+
+ #### Reasoning / Thinking
+
+ Marathon enables model reasoning by default for models that support it (Gemini 3, o-series, GPT-5, etc.). When active, the model's thinking process streams to the TUI in real time. To disable:
+
+ ```bash
+ runtype marathon "Code Builder" --goal "Fix the bug" --no-reasoning
+ ```
+
+ #### Fallback Models
+
+ When an upstream model provider returns a transient error (e.g. overload, rate limit), marathon can automatically retry and then fall back to a different model instead of dying mid-run.
+
+ **CLI flag** — applies to all phases:
+
+ ```bash
+ # If claude-opus-4-6 fails, retry once then fall back to claude-sonnet-4-5
+ runtype marathon "Code Builder" --goal "Refactor auth" \
+   --model claude-opus-4-6 \
+   --fallback-model claude-sonnet-4-5
+ ```
+
+ **Playbook** — per-milestone fallback chains:
+
+ ```yaml
+ milestones:
+   - name: research
+     model: claude-sonnet-4-5
+     fallbackModels:
+       - gpt-4o # string shorthand
+       - gemini-3-flash
+     instructions: |
+       Research the codebase...
+
+   - name: execution
+     model: claude-opus-4-6
+     fallbackModels:
+       - model: claude-sonnet-4-5 # object form with overrides
+         temperature: 0.5
+       - model: gpt-4o
+         maxTokens: 8192
+     instructions: |
+       Implement the changes...
  ```
 
+ Playbook per-milestone fallbacks take priority over the CLI `--fallback-model` flag. The fallback chain always starts with a retry (5s delay) before trying alternative models.
+
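Reviewer note: the ordering described in the added paragraph (primary model, one retry after 5 seconds, then the fallback models) can be sketched as a small chain builder. This Python sketch illustrates the documented behavior and is not the CLI's code. The function names and the `delay` field are hypothetical. It also assumes that "take priority" means a milestone's `fallbackModels` replace the CLI flag rather than being merged with it, and that fallback attempts start without an extra delay.

```python
RETRY_DELAY_SECONDS = 5  # from the README: the chain starts with a retry after a 5s delay


def normalize(entry):
    """Expand the string shorthand ("gpt-4o") into the object form ({"model": "gpt-4o"})."""
    if isinstance(entry, str):
        return {"model": entry}
    return dict(entry)


def build_attempt_chain(model, milestone_fallbacks=None, cli_fallback=None):
    """Return the ordered attempts for one milestone.

    Assumption: per-milestone fallbacks, when present, replace the CLI
    --fallback-model value instead of being combined with it.
    """
    if milestone_fallbacks:
        fallbacks = [normalize(entry) for entry in milestone_fallbacks]
    elif cli_fallback:
        fallbacks = [{"model": cli_fallback}]
    else:
        fallbacks = []

    chain = [
        {"model": model, "delay": 0},                    # first attempt
        {"model": model, "delay": RETRY_DELAY_SECONDS},  # retry of the same model
    ]
    # Assumption: fallback attempts carry their overrides and no extra delay.
    chain += [{**fallback, "delay": 0} for fallback in fallbacks]
    return chain


# The "execution" milestone from the playbook example, with a CLI flag also set.
chain = build_attempt_chain(
    "claude-opus-4-6",
    milestone_fallbacks=[
        {"model": "claude-sonnet-4-5", "temperature": 0.5},
        {"model": "gpt-4o", "maxTokens": 8192},
    ],
    cli_fallback="gemini-3-flash",
)
for attempt in chain:
    print(attempt)
```

With both sources set, the CLI value is ignored for that milestone. A milestone without `fallbackModels` would fall through to the `--fallback-model` value.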
  #### Tool Context Modes
 
  When a marathon runs multiple sessions, tool call/result pairs from previous sessions are preserved in the conversation history. The `--tool-context` flag controls how older tool results are stored to balance cost and re-readability: