@covibes/zeroshot 5.2.0 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62)
  1. package/CHANGELOG.md +178 -186
  2. package/README.md +199 -248
  3. package/cli/commands/providers.js +150 -0
  4. package/cli/index.js +214 -58
  5. package/cli/lib/first-run.js +40 -3
  6. package/cluster-templates/base-templates/debug-workflow.json +24 -78
  7. package/cluster-templates/base-templates/full-workflow.json +44 -145
  8. package/cluster-templates/base-templates/single-worker.json +23 -15
  9. package/cluster-templates/base-templates/worker-validator.json +47 -34
  10. package/cluster-templates/conductor-bootstrap.json +7 -5
  11. package/lib/docker-config.js +6 -1
  12. package/lib/provider-detection.js +59 -0
  13. package/lib/provider-names.js +56 -0
  14. package/lib/settings.js +191 -6
  15. package/lib/stream-json-parser.js +4 -238
  16. package/package.json +21 -5
  17. package/scripts/validate-templates.js +100 -0
  18. package/src/agent/agent-config.js +37 -13
  19. package/src/agent/agent-context-builder.js +64 -2
  20. package/src/agent/agent-hook-executor.js +82 -9
  21. package/src/agent/agent-lifecycle.js +53 -14
  22. package/src/agent/agent-task-executor.js +196 -194
  23. package/src/agent/output-extraction.js +200 -0
  24. package/src/agent/output-reformatter.js +175 -0
  25. package/src/agent/schema-utils.js +111 -0
  26. package/src/agent-wrapper.js +102 -30
  27. package/src/agents/git-pusher-agent.json +2 -2
  28. package/src/claude-task-runner.js +80 -30
  29. package/src/config-router.js +13 -13
  30. package/src/config-validator.js +231 -10
  31. package/src/github.js +36 -0
  32. package/src/isolation-manager.js +243 -154
  33. package/src/ledger.js +28 -6
  34. package/src/orchestrator.js +391 -96
  35. package/src/preflight.js +85 -82
  36. package/src/providers/anthropic/cli-builder.js +45 -0
  37. package/src/providers/anthropic/index.js +134 -0
  38. package/src/providers/anthropic/models.js +23 -0
  39. package/src/providers/anthropic/output-parser.js +159 -0
  40. package/src/providers/base-provider.js +181 -0
  41. package/src/providers/capabilities.js +51 -0
  42. package/src/providers/google/cli-builder.js +55 -0
  43. package/src/providers/google/index.js +116 -0
  44. package/src/providers/google/models.js +24 -0
  45. package/src/providers/google/output-parser.js +92 -0
  46. package/src/providers/index.js +75 -0
  47. package/src/providers/openai/cli-builder.js +122 -0
  48. package/src/providers/openai/index.js +135 -0
  49. package/src/providers/openai/models.js +21 -0
  50. package/src/providers/openai/output-parser.js +129 -0
  51. package/src/sub-cluster-wrapper.js +18 -3
  52. package/src/task-runner.js +8 -6
  53. package/src/tui/layout.js +20 -3
  54. package/task-lib/attachable-watcher.js +80 -78
  55. package/task-lib/claude-recovery.js +119 -0
  56. package/task-lib/commands/list.js +1 -1
  57. package/task-lib/commands/resume.js +3 -2
  58. package/task-lib/commands/run.js +12 -3
  59. package/task-lib/runner.js +59 -38
  60. package/task-lib/scheduler.js +2 -2
  61. package/task-lib/store.js +43 -30
  62. package/task-lib/watcher.js +81 -62
package/README.md CHANGED
@@ -1,265 +1,229 @@
1
1
  # zeroshot CLI
2
2
 
3
+ > **🎉 New Release:** Now supports **Codex** and **Gemini** CLI in addition to Claude! Use any provider or mix them in multi-agent workflows. See [Providers](#providers) for details.
4
+
5
+ <!-- install-placeholder -->
6
+ <p align="center">
7
+ <code>npm install -g @covibes/zeroshot</code>
8
+ </p>
9
+
10
+ <p align="center">
11
+ <img src="./docs/assets/zeroshot-demo.gif" alt="Demo" width="700">
12
+ <br>
13
+ <em>Demo (100x speed, 90-minute run, 5 iterations to approval)</em>
14
+ </p>
15
+
3
16
  [![CI](https://github.com/covibes/zeroshot/actions/workflows/ci.yml/badge.svg)](https://github.com/covibes/zeroshot/actions/workflows/ci.yml)
4
17
  [![npm version](https://img.shields.io/npm/v/@covibes/zeroshot.svg)](https://www.npmjs.com/package/@covibes/zeroshot)
5
18
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
6
19
  [![Node 18+](https://img.shields.io/badge/node-18%2B-brightgreen.svg)](https://nodejs.org/)
7
- [![Platform: Linux | macOS](https://img.shields.io/badge/platform-Linux%20%7C%20macOS-blue.svg)]()
8
-
9
- > **2024** was the year of LLMs. **2025** was the year of agents. **2026** is the year of agent clusters.
10
-
11
- **Autonomous engineering teams for Claude Code.**
20
+ ![Platform: Linux | macOS](https://img.shields.io/badge/platform-Linux%20%7C%20macOS-blue.svg)
12
21
 
13
- ## Install
22
+ <!-- discord-placeholder -->
14
23
 
15
- **Platforms**: Linux, macOS
24
+ [![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.gg/PdZ3UEXB)
16
25
 
17
- ```bash
18
- npm install -g @covibes/zeroshot
19
- ```
26
+ Zeroshot is an open-source AI coding agent orchestration CLI that runs multi-agent workflows to autonomously implement, review, test, and verify code changes.
20
27
 
21
- **Requires**: Node 18+, [Claude Code CLI](https://claude.com/product/claude-code), [GitHub CLI](https://cli.github.com/)
28
+ It runs a **planner**, an **implementer**, and independent **validators** in isolated environments, looping until changes are **verified** or **rejected** with actionable, reproducible failures.
22
29
 
23
- ```bash
24
- npm i -g @anthropic-ai/claude-code && claude auth login
25
- gh auth login
26
- ```
30
+ Built for tasks where correctness matters more than speed.
27
31
 
28
- ---
29
-
30
- You know the problem. Your AI agent:
31
-
32
- - Says "tests pass" (never ran them)
33
- - Says "done!" (nothing works)
34
- - Implements 60% of what you asked
35
- - Ignores your coding guidelines
36
- - Introduces antipatterns like a junior dev
37
- - Gets sloppy on long tasks
38
-
39
- **AI is extremely capable. But not when one agent does everything in one session.**
32
+ ## How It Works
40
33
 
41
- Context degrades. Attention drifts. Shortcuts get taken.
34
+ - Plan: translate a task into concrete acceptance criteria
35
+ - Implement: make changes in an isolated workspace (local, worktree, or Docker)
36
+ - Validate: run automated checks with independent validators
37
+ - Iterate: repeat until verified, or return actionable failures
38
+ - Resume: crash-safe state persisted for recovery
42
39
 
43
- Zeroshot fixes this with **multiple isolated agents** that check each other's work. The validator didn't write the code, so it can't lie about tests. Fail? Fix and retry until it works.
40
+ ## Quick Start
44
41
 
45
42
  ```bash
46
- zeroshot 123
43
+ zeroshot run 123 # GitHub issue number
47
44
  ```
48
45
 
49
- Point at a GitHub issue, walk away, come back to working code.
50
-
51
- ### Demo
46
+ Or describe the task inline:
52
47
 
53
48
  ```bash
54
- zeroshot "Add optimistic locking with automatic retry: when updating a user,
55
- detect if another request modified it first using version numbers,
56
- automatically retry with exponential backoff up to 3 times,
57
- merge non-conflicting field changes, surface true conflicts to the caller
58
- with details of what conflicted. Handle the ABA problem where version goes A->B->A."
49
+ zeroshot run "Add optimistic locking with automatic retry: when updating a user,
50
+ retry with exponential backoff up to 3 times, merge non-conflicting field changes,
51
+ and surface conflicts with details. Handle the ABA problem where version goes A->B->A."
59
52
  ```
60
53
 
61
- <p align="center">
62
- <img src="./docs/assets/zeroshot-demo.gif" alt="Demo" width="700">
63
- <br>
64
- <em>Sped up 100x — 90 minutes, 5 iterations until validators approved</em>
65
- </p>
66
-
67
- **The full fix cycle.** Initial implementation passed basic tests but validators caught edge cases: race conditions in concurrent updates, ABA problem not fully handled, retry backoff timing issues. Each rejection triggered fixes until all 48 tests passed with 91%+ coverage.
54
+ ## Why Not Just Use a Single AI Agent?
68
55
 
69
- A single agent would say "done!" after the first implementation. Here, the adversarial tester actually *runs* concurrent requests, times the retry backoff, and verifies conflict detection works under load.
70
-
71
- **This is what production-grade looks like.** Not "tests pass" — validators reject until it actually works. 5 iterations, each one fixing real bugs the previous attempt missed.
72
-
73
- ---
74
-
75
- ## When to Use Zeroshot
56
+ | Approach | Writes Code | Runs Tests | Blind Validation | Iterates Until Verified |
57
+ | -------------------------- | ----------- | ---------- | ---------------- | ----------------------- |
58
+ | Chat-based assistant | ✅ | ⚠️ | ❌ | ❌ |
59
+ | Single coding agent | ✅ | ⚠️ | ❌ | ⚠️ |
60
+ | **Zeroshot (multi-agent)** | ✅ | ✅ | ✅ | ✅ |
76
61
 
77
- **Zeroshot requires well-defined tasks with clear acceptance criteria.**
62
+ ## Use Cases
78
63
 
79
- | Scenario | Use? | Why |
80
- |----------|:----:|-----|
81
- | Add rate limiting (sliding window, per-IP, 429) | ✅ | Clear requirements |
82
- | Refactor auth to JWT | ✅ | Defined end state |
83
- | Fix login bug | ✅ | Success is measurable |
84
- | Fix 2410 lint violations | ✅ | Clear completion criteria |
85
- | Make the app faster | ❌ | Needs exploration first |
86
- | Improve the codebase | ❌ | No acceptance criteria |
87
- | Figure out flaky tests | ❌ | Exploratory |
64
+ - Autonomous AI code refactoring
65
+ - AI-powered pull request automation
66
+ - Automated bug fixing with validation
67
+ - Multi-agent code generation for software engineering
68
+ - Agentic coding workflows with blind validation
88
69
 
89
- **Known unknowns** (implementation details unclear) → Zeroshot handles this. The planner figures it out.
70
+ ## Who Is This For?
90
71
 
91
- **Unknown unknowns** (don't know what you'll discover) → Use single-agent Claude Code for exploration first, then come back with a well-defined task.
72
+ - Senior engineers who care about correctness and reproducibility
73
+ - Teams automating PR workflows and code review gates
74
+ - Infra/platform teams standardizing agentic workflows
75
+ - Open-source maintainers working through issue backlogs
76
+ - AI power users who want verification, not vibes
92
77
 
93
- **Long-running batch tasks** → Zeroshot excels here. Run overnight with `-d` (daemon mode):
94
- - "Fix all 2410 semantic linting violations"
95
- - "Add TypeScript types to all 47 untyped files"
96
- - "Migrate all API calls from v1 to v2"
78
+ ## Install and Requirements
97
79
 
98
- Crash recovery (`zeroshot resume`) means multi-hour tasks survive interruptions.
80
+ **Platforms**: Linux, macOS (Windows WSL not yet supported)
99
81
 
100
- **Rule of thumb:** If you can't describe what "done" looks like, zeroshot's validators can't verify it.
101
-
102
- ---
82
+ ```bash
83
+ npm install -g @covibes/zeroshot
84
+ ```
103
85
 
104
- ## Commands
86
+ **Requires**: Node 18+, at least one provider CLI (Claude Code, Codex, Gemini). [GitHub CLI](https://cli.github.com/) is required when running by issue number.
105
87
 
106
88
  ```bash
107
- zeroshot run 123 # Run on GitHub issue
108
- zeroshot run "Add dark mode" # Run from description
89
+ # Install one or more providers
90
+ npm i -g @anthropic-ai/claude-code
91
+ npm i -g @openai/codex
92
+ npm i -g @google/gemini-cli
109
93
 
110
- # Automation levels (cascading: --ship → --pr → --worktree)
111
- zeroshot run 123 --docker # Docker isolation (full container)
112
- zeroshot run 123 --worktree # Git worktree isolation (lightweight)
113
- zeroshot run 123 --pr # Worktree + PR (human reviews)
114
- zeroshot run 123 --ship # Worktree + PR + auto-merge (full automation)
94
+ # Authenticate with the provider CLI
95
+ claude login # Claude
96
+ codex login # Codex
97
+ gemini auth login # Gemini
115
98
 
116
- # Background mode
117
- zeroshot run 123 -d # Detached/daemon
118
- zeroshot run 123 --ship -d # Full automation, background
99
+ # GitHub auth (for issue numbers)
100
+ gh auth login
101
+ ```
119
102
 
120
- # Control
121
- zeroshot list # See all running (--json for scripting)
122
- zeroshot status <id> # Cluster status (--json for scripting)
123
- zeroshot logs <id> -f # Follow output
124
- zeroshot resume <id> # Continue after crash
125
- zeroshot kill <id> # Stop
126
- zeroshot watch # TUI dashboard
103
+ ## Providers
127
104
 
128
- # Agent library
129
- zeroshot agents list # View available agents
130
- zeroshot agents show <name> # Agent details
105
+ Zeroshot shells out to provider CLIs. Pick a default and override per run:
131
106
 
132
- # Maintenance
133
- zeroshot clean # Remove old records
134
- zeroshot purge # NUCLEAR: kill all + delete all
107
+ ```bash
108
+ zeroshot providers
109
+ zeroshot providers set-default codex
110
+ zeroshot run 123 --provider gemini
135
111
  ```
136
112
 
137
- ---
138
-
139
- <details>
140
- <summary><strong>FAQ</strong></summary>
113
+ See `docs/providers.md` for setup, model levels, and Docker mounts.
141
114
 
142
- **Q: Why Claude-only (for now)?**
115
+ ## Why Multiple Agents?
143
116
 
144
- Claude Code is the most capable agentic coding tool available. We wrap it directly - same tools, same reliability, no custom implementations to break.
117
+ Single-agent sessions degrade. Context gets buried under thousands of tokens. The model optimizes for "done" over "correct."
145
118
 
146
- Multi-model support (Codex CLI, Gemini CLI) is planned - see [#19](https://github.com/covibes/zeroshot/issues/19).
119
+ Zeroshot fixes this with isolated agents that check each other's work. Validators can't lie about code they didn't write. Fail the check? Fix and retry until it actually works.
147
120
 
148
- **Q: Why do single-agent coding sessions get sloppy?**
121
+ ## What Makes It Different
149
122
 
150
- Three failure modes compound when one agent does everything in one session:
123
+ - **Blind validation** - Validators never see the worker's context or code history
124
+ - **Repeatable workflows** - Task complexity determines agent count and model selection
125
+ - **Accept/reject loop** - Rejections include actionable findings, not vague complaints
126
+ - **Crash recovery** - All state persisted to SQLite; resume anytime
127
+ - **Isolation modes** - None, git worktree, or Docker container
128
+ - **Cost control** - Model ceilings prevent runaway API spend
151
129
 
152
- - **Context Dilution**: Your initial guidelines compete with thousands of tokens of code, errors, and edits. Instructions from 50 messages ago get buried.
153
- - **Success Bias**: LLMs optimize for "Task Complete" - even if that means skipping steps to get there.
154
- - **Error Snowball**: When fixing mistakes repeatedly, the context fills with broken code. The model starts copying its own bad patterns.
130
+ ## When to Use Zeroshot
155
131
 
156
- Zeroshot fixes this with **isolated agents** where validators check work they didn't write - no self-grading, no shortcuts.
132
+ Zeroshot performs best when tasks have clear acceptance criteria.
157
133
 
158
- **Q: Can I customize the team?**
134
+ | Scenario | Use | Why |
135
+ | ----------------------------------------------- | --- | ------------------------- |
136
+ | Add rate limiting (sliding window, per-IP, 429) | Yes | Clear requirements |
137
+ | Refactor auth to JWT | Yes | Defined end state |
138
+ | Fix login bug | Yes | Success is measurable |
139
+ | Fix 2410 lint violations | Yes | Clear completion criteria |
140
+ | Make the app faster | No | Needs exploration first |
141
+ | Improve the codebase | No | No acceptance criteria |
142
+ | Figure out flaky tests | No | Exploratory |
159
143
 
160
- Yes, see CLAUDE.md. But most people never need to.
144
+ Rule of thumb: if you cannot describe what "done" means, validators cannot verify it.
161
145
 
162
- **Q: Why is it called "zeroshot"?**
146
+ ## Command Overview
163
147
 
164
- In machine learning, "zero-shot" means solving tasks the model has never seen before - using only the task description, no prior examples needed.
148
+ ```bash
149
+ # Run
150
+ zeroshot run 123
151
+ zeroshot run "Add dark mode"
165
152
 
166
- Same idea here: give zeroshot a well-defined task, get back a result. No examples. No iterative feedback. No hand-holding.
153
+ # Isolation
154
+ zeroshot run 123 --worktree # git worktree
155
+ zeroshot run 123 --docker # container
167
156
 
168
- The multi-agent architecture handles planning, implementation, and validation internally. You provide a clear problem statement. Zeroshot handles the rest.
157
+ # Automation (--ship implies --pr implies --worktree)
158
+ zeroshot run 123 --pr # worktree + create PR
159
+ zeroshot run 123 --ship # PR + auto-merge on approval
169
160
 
170
- </details>
161
+ # Background mode
162
+ zeroshot run 123 -d
163
+ zeroshot run 123 --ship -d
171
164
 
172
- ---
165
+ # Control
166
+ zeroshot list
167
+ zeroshot status <id>
168
+ zeroshot logs <id> -f
169
+ zeroshot resume <id>
170
+ zeroshot stop <id>
171
+ zeroshot kill <id>
172
+ zeroshot watch
173
+
174
+ # Providers
175
+ zeroshot providers
176
+ zeroshot providers set-default codex
173
177
 
174
- ## How It Works
178
+ # Agent library
179
+ zeroshot agents list
180
+ zeroshot agents show <name>
175
181
 
176
- Zeroshot is a **multi-agent coordination framework** with smart defaults.
182
+ # Maintenance
183
+ zeroshot clean
184
+ zeroshot purge
185
+ ```
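The automation comment in the added Command Overview says that `--ship` implies `--pr`, which in turn implies `--worktree`. A minimal sketch of that cascade is shown here; `resolveAutomationFlags` and the option keys are hypothetical names chosen for illustration and are not taken from `package/cli/index.js`.

```javascript
// Hypothetical helper illustrating the documented cascade:
//   --ship implies --pr, and --pr implies --worktree.
// The function name and option keys are assumptions, not zeroshot's real API.
function resolveAutomationFlags(opts = {}) {
  const ship = Boolean(opts.ship);
  const pr = ship || Boolean(opts.pr);             // --ship turns on --pr
  const worktree = pr || Boolean(opts.worktree);   // --pr turns on --worktree
  return { ship, pr, worktree };
}

console.log(resolveAutomationFlags({ ship: true }));
// { ship: true, pr: true, worktree: true }
```

Resolving the flags in one place keeps the rest of a CLI free of repeated "is this implied?" checks.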
177
186
 
178
- ### Zero Config
187
+ ## Architecture
179
188
 
180
- ```bash
181
- zeroshot 123 # Analyzes task → picks team → done
182
- ```
189
+ Zeroshot is a message-driven coordination layer with smart defaults.
183
190
 
184
- The conductor classifies your task (complexity × type) and picks the right workflow:
191
+ - The conductor classifies tasks by complexity and type.
192
+ - A workflow template selects agents and validators.
193
+ - Agents publish results to a SQLite ledger.
194
+ - Validators approve or reject with specific findings.
195
+ - Rejections route back to the worker for fixes.
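The accept/reject routing in the bullets above can be sketched as a simple loop. This is an illustration under stated assumptions, not the orchestrator's code: `runUntilVerified`, the `{ approved, findings }` verdict shape, and the `maxIterations` cap are all hypothetical, and real agents would run asynchronously and exchange messages through the SQLite ledger rather than by direct function calls.

```javascript
// Hypothetical sketch of the worker/validator loop described in the README.
// worker(task, findings) returns a result; each validator(task, result)
// returns { approved: boolean, findings: string[] }. All names are assumptions.
function runUntilVerified(task, worker, validators, maxIterations = 5) {
  let findings = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    // The worker sees only the task and the findings from the last rejection.
    const result = worker(task, findings);
    const verdicts = validators.map((validate) => validate(task, result));
    const rejections = verdicts.filter((v) => !v.approved);
    if (rejections.length === 0) {
      return { status: 'verified', iterations: iteration, result };
    }
    // Rejections route back to the worker as concrete findings.
    findings = rejections.flatMap((v) => v.findings);
  }
  return { status: 'rejected', iterations: maxIterations, findings };
}
```

Passing only the task and result to each validator mirrors the "blind validation" idea: a validator in this sketch never receives the worker's history.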
185
196
 
186
- ```
187
- ┌─────────────────┐
188
- │ TASK │
189
- └────────┬────────┘
190
-
191
-
192
- ┌────────────────────────────────────────────┐
193
- │ CONDUCTOR │
194
- │ Complexity × TaskType → Workflow │
195
- └────────────────────────┬───────────────────┘
196
-
197
- ┌─────────────────────────────┼─────────────────────────────┐
198
- │ │ │
199
- ▼ ▼ ▼
200
- ┌───────────┐ ┌───────────┐ ┌───────────┐
201
- │ TRIVIAL │ │ SIMPLE │ │ STANDARD+ │
202
- │ 1 agent │──────────▶ │ worker │ │ planner │
203
- │ (haiku) │ COMPLETE │ + 1 valid.│ │ + worker │
204
- │ no valid. │ └─────┬─────┘ │ + 3-5 val.│
205
- └───────────┘ │ └─────┬─────┘
206
- ▼ │
207
- ┌─────────────┐ ▼
208
- ┌──▶│ WORKER │ ┌─────────────┐
209
- │ └──────┬──────┘ │ PLANNER │
210
- │ │ └──────┬──────┘
211
- │ ▼ │
212
- │ ┌─────────────────────┐ ▼
213
- │ │ ✓ validator │ ┌─────────────┐
214
- │ │ (generic check) │ ┌──▶│ WORKER │
215
- │ └──────────┬──────────┘ │ └──────┬──────┘
216
- │ REJECT │ ALL OK │ │
217
- └──────────────┘ │ │ ▼
218
- │ │ ┌──────────────────────┐
219
- │ │ │ ✓ requirements │
220
- │ │ │ ✓ code (STANDARD+) │
221
- │ │ │ ✓ security (CRIT) │
222
- │ │ │ ✓ tester (CRIT) │
223
- │ │ │ ✓ adversarial │
224
- │ │ │ (real execution) │
225
- │ │ └──────────┬───────────┘
226
- │ │ REJECT │ ALL OK
227
- │ └──────────────┘ │
228
- ▼ ▼
229
- ┌─────────────────────────────────────────────────────────────────────────────┐
230
- │ COMPLETE │
231
- └─────────────────────────────────────────────────────────────────────────────┘
232
- ```
197
+ ### Complexity Model
233
198
 
234
199
  | Task | Complexity | Agents | Validators |
235
200
  | ---------------------- | ---------- | ------ | ------------------------------------------------- |
236
201
  | Fix typo in README | TRIVIAL | 1 | None |
237
- | Add dark mode toggle | SIMPLE | 2 | generic validator |
238
- | Refactor auth system | STANDARD | 4 | requirements, code |
239
- | Implement payment flow | CRITICAL | 7 | requirements, code, security, tester, adversarial |
202
+ | Add dark mode toggle | SIMPLE | 2 | Generic validator |
203
+ | Refactor auth system | STANDARD | 4 | Requirements, code |
204
+ | Implement payment flow | CRITICAL | 7 | Requirements, code, security, tester, adversarial |
240
205
 
241
206
  ### Model Selection by Complexity
242
207
 
243
208
  | Complexity | Planner | Worker | Validators |
244
209
  | ---------- | ------- | ------ | ---------- |
245
- | TRIVIAL | - | haiku | 0 |
246
- | SIMPLE | - | sonnet | 1 (sonnet) |
247
- | STANDARD | sonnet | sonnet | 2 (sonnet) |
248
- | CRITICAL | opus | sonnet | 5 (sonnet) |
210
+ | TRIVIAL | - | level1 | - |
211
+ | SIMPLE | - | level2 | 1 (level2) |
212
+ | STANDARD | level2 | level2 | 2 (level2) |
213
+ | CRITICAL | level3 | level2 | 5 (level2) |
249
214
 
250
- Set model ceiling: `zeroshot settings set maxModel sonnet` (prevents opus)
251
-
252
- ---
215
+ Levels map to provider-specific models. Configure with `zeroshot providers setup <provider>` or
216
+ `settings.providerSettings`. (Legacy `maxModel` applies to Claude only.)
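To make the level indirection concrete, the sketch below encodes the added "Model Selection by Complexity" table and resolves it against a per-provider level map. The table values are copied from the 5.3.0 rows; the Claude mapping (`level1` to `haiku`, `level2` to `sonnet`, `level3` to `opus`) is inferred from the removed 5.2.0 rows and is an assumption, as are the names `MODEL_PLAN`, `CLAUDE_LEVELS`, and `resolveModels`. No Codex or Gemini mapping is shown because the diff does not state one.

```javascript
// Rows copied from the "Model Selection by Complexity" table in the 5.3.0 README.
const MODEL_PLAN = {
  TRIVIAL:  { planner: null,     worker: 'level1', validators: 0, validatorLevel: null },
  SIMPLE:   { planner: null,     worker: 'level2', validators: 1, validatorLevel: 'level2' },
  STANDARD: { planner: 'level2', worker: 'level2', validators: 2, validatorLevel: 'level2' },
  CRITICAL: { planner: 'level3', worker: 'level2', validators: 5, validatorLevel: 'level2' },
};

// Assumption: inferred from the removed 5.2.0 table (haiku / sonnet / opus).
const CLAUDE_LEVELS = { level1: 'haiku', level2: 'sonnet', level3: 'opus' };

// Hypothetical helper: turns a complexity class into concrete model names.
function resolveModels(complexity, levels) {
  const plan = MODEL_PLAN[complexity];
  if (!plan) throw new Error(`Unknown complexity: ${complexity}`);
  const toModel = (level) => (level === null ? null : levels[level]);
  return {
    planner: toModel(plan.planner),
    worker: toModel(plan.worker),
    validators: Array.from({ length: plan.validators }, () => toModel(plan.validatorLevel)),
  };
}
```

For example, `resolveModels('CRITICAL', CLAUDE_LEVELS)` yields an `opus` planner, a `sonnet` worker, and five `sonnet` validators, which matches the removed 5.2.0 row for CRITICAL.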
253
217
 
254
218
  <details>
255
219
  <summary><strong>Custom Workflows (Framework Mode)</strong></summary>
256
220
 
257
- Zeroshot is **message-driven** - define any agent topology:
221
+ Zeroshot is message-driven, so you can define any agent topology.
258
222
 
259
- - **Expert panels**: Parallel specialists → aggregator → decision
260
- - **Staged gates**: Sequential validators, each with veto power
261
- - **Hierarchical**: Supervisor dynamically spawns workers
262
- - **Dynamic**: Conductor adds agents mid-execution
223
+ - Expert panels: parallel specialists -> aggregator -> decision
224
+ - Staged gates: sequential validators, each with veto power
225
+ - Hierarchical: supervisor dynamically spawns workers
226
+ - Dynamic: conductor adds agents mid-execution
263
227
 
264
228
  **Coordination primitives:**
265
229
 
@@ -268,20 +232,14 @@ Zeroshot is **message-driven** - define any agent topology:
268
232
  - Ledger (SQLite, crash recovery)
269
233
  - Dynamic spawning (CLUSTER_OPERATIONS)
270
234
 
271
- #### Creating Custom Clusters with Claude Code
235
+ #### Creating Custom Clusters with a Provider CLI
272
236
 
273
- **The easiest way to create a custom cluster: just ask Claude Code.**
237
+ Start your provider CLI and describe your cluster:
274
238
 
275
- ```bash
276
- # In your zeroshot repo
277
- claude
278
- ```
279
-
280
- **Example prompt:**
281
239
  ```
282
240
  Create a zeroshot cluster config for security-critical features:
283
241
 
284
- 1. Implementation agent (sonnet) implements the feature
242
+ 1. Implementation agent (level2) implements the feature
285
243
  2. FOUR parallel validators:
286
244
  - Security validator: OWASP checks, SQL injection, XSS, CSRF
287
245
  - Performance validator: No N+1 queries, proper indexing
@@ -290,76 +248,64 @@ Create a zeroshot cluster config for security-critical features:
290
248
 
291
249
  3. ALL validators must approve before merge
292
250
  4. If ANY validator rejects, implementation agent fixes and resubmits
293
- 5. Use opus for security validator (highest stakes)
251
+ 5. Use level3 for security validator (highest stakes)
294
252
 
295
253
  Look at cluster-templates/base-templates/full-workflow.json
296
254
  and create a similar cluster. Save to cluster-templates/security-review.json
297
255
  ```
298
256
 
299
- Claude Code will read existing templates, create valid JSON config, and iterate until it works.
257
+ Built-in validation checks for missing triggers, deadlocks, and invalid type wiring before running.
300
258
 
301
- **Built-in validation catches failures before running:**
302
- - Never start (no bootstrap trigger)
303
- - Never complete (no path to completion)
304
- - Loop infinitely (circular dependencies)
305
- - Deadlock (impossible consensus)
306
- - Type mismatches (boolean → string in JSON)
307
-
308
- See [CLAUDE.md](./CLAUDE.md) for cluster config schema and examples.
259
+ See [CLAUDE.md](./CLAUDE.md) for the cluster schema and examples.
309
260
 
310
261
  </details>
311
262
 
312
- ---
313
-
314
263
  ## Crash Recovery
315
264
 
316
- Everything saves to SQLite. If your 2-hour run crashes at 1:59:
265
+ All state is persisted in the SQLite ledger. You can resume at any time:
317
266
 
318
267
  ```bash
319
268
  zeroshot resume cluster-bold-panther
320
- # Continues from exact point
321
269
  ```
322
270
 
323
- ---
324
-
325
271
  ## Isolation Modes
326
272
 
327
273
  ### Git Worktree (Default for --pr/--ship)
328
274
 
329
275
  ```bash
330
- zeroshot 123 --worktree
276
+ zeroshot run 123 --worktree
331
277
  ```
332
278
 
333
- Lightweight isolation using git worktree. Creates a separate working directory with its own branch. Fast (<1s setup), no Docker required. Auto-enabled with `--pr` and `--ship`.
279
+ Lightweight isolation using git worktree. Creates a separate working directory with its own branch. Auto-enabled with `--pr` and `--ship`.
334
280
 
335
281
  ### Docker Container
336
282
 
337
283
  ```bash
338
- zeroshot 123 --docker
284
+ zeroshot run 123 --docker
339
285
  ```
340
286
 
341
- Full isolation in a fresh container. Your workspace stays untouched. Good for risky experiments or parallel agents.
287
+ Full isolation in a fresh container. Your workspace stays untouched. Useful for risky experiments or parallel runs.
342
288
 
343
289
  ### When to Use Which
344
290
 
345
- | Scenario | Recommended |
346
- | -------- | ----------- |
347
- | Quick task, review changes yourself | No isolation (default) |
348
- | PR workflow, code review | `--worktree` or `--pr` |
349
- | Risky experiment, might break things | `--docker` |
350
- | Running multiple tasks in parallel | `--docker` |
351
- | Full automation, no review needed | `--ship` |
291
+ | Scenario | Recommended |
292
+ | ------------------------------------ | ---------------------- |
293
+ | Quick task, review changes yourself | No isolation (default) |
294
+ | PR workflow, code review | `--worktree` or `--pr` |
295
+ | Risky experiment, might break things | `--docker` |
296
+ | Running multiple tasks in parallel | `--docker` |
297
+ | Full automation, no review needed | `--ship` |
352
298
 
353
- **Default mode:** Agents are instructed to only modify files (no git commit/push). You review and commit yourself.
299
+ **Default behavior:** Agents modify files only; they do not commit or push unless using an isolation mode that explicitly allows it.
354
300
 
355
301
  <details>
356
302
  <summary><strong>Docker Credential Mounts</strong></summary>
357
303
 
358
- When using `--docker`, zeroshot mounts credential directories so Claude can access tools like AWS, Azure, kubectl.
304
+ When using `--docker`, zeroshot mounts credential directories so agents can access provider CLIs and tools like AWS, Azure, and kubectl.
359
305
 
360
306
  **Default mounts**: `gh`, `git`, `ssh` (GitHub CLI, git config, SSH keys)
361
307
 
362
- **Available presets**: `gh`, `git`, `ssh`, `aws`, `azure`, `kube`, `terraform`, `gcloud`
308
+ **Available presets**: `gh`, `git`, `ssh`, `aws`, `azure`, `kube`, `terraform`, `gcloud`, `claude`, `codex`, `gemini`
363
309
 
364
310
  ```bash
365
311
  # Configure via settings (persistent)
@@ -371,6 +317,10 @@ zeroshot settings get dockerMounts
371
317
  # Per-run override
372
318
  zeroshot run 123 --docker --mount ~/.aws:/root/.aws:ro
373
319
 
320
+ # Provider credentials
321
+ zeroshot run 123 --docker --mount ~/.config/codex:/home/node/.config/codex:ro
322
+ zeroshot run 123 --docker --mount ~/.config/gemini:/home/node/.config/gemini:ro
323
+
374
324
  # Disable all mounts
375
325
  zeroshot run 123 --docker --no-mounts
376
326
 
@@ -378,7 +328,10 @@ zeroshot run 123 --docker --no-mounts
378
328
  ZEROSHOT_DOCKER_MOUNTS='["aws","azure"]' zeroshot run 123 --docker
379
329
  ```
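The mount options in this block combine preset names with explicit `host:container:ro` paths. The sketch below shows one way such input could be normalized. It is illustrative only: the preset names and the default list (`gh`, `git`, `ssh`) come from the README, but `resolveMounts`, `parseMountSpec`, the returned object shapes, and the choice to accept path strings inside the JSON array are assumptions and do not describe `package/lib/docker-config.js`.

```javascript
// Preset names and defaults copied from the 5.3.0 README; the rest is hypothetical.
const PRESETS = new Set([
  'gh', 'git', 'ssh', 'aws', 'azure', 'kube',
  'terraform', 'gcloud', 'claude', 'codex', 'gemini',
]);
const DEFAULT_MOUNTS = ['gh', 'git', 'ssh'];

// Parses the "host:container[:ro]" form used by the documented --mount flag.
function parseMountSpec(spec) {
  const [host, container, mode = 'rw'] = spec.split(':');
  if (!host || !container) throw new Error(`Invalid mount spec: ${spec}`);
  return { type: 'path', host, container, readonly: mode === 'ro' };
}

// Assumption: the env var, when set, replaces the defaults entirely.
function resolveMounts(env = process.env) {
  const raw = env.ZEROSHOT_DOCKER_MOUNTS;
  const entries = raw ? JSON.parse(raw) : DEFAULT_MOUNTS;
  if (!Array.isArray(entries)) {
    throw new Error('ZEROSHOT_DOCKER_MOUNTS must be a JSON array');
  }
  return entries.map((entry) =>
    PRESETS.has(entry) ? { type: 'preset', name: entry } : parseMountSpec(String(entry))
  );
}
```

Validating the spec up front means a malformed mount fails before any container is started.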
380
330
 
331
+ See `docs/providers.md` for provider CLI setup and mount details.
332
+
381
333
  **Custom mounts** (mix presets with explicit paths):
334
+
382
335
  ```bash
383
336
  zeroshot settings set dockerMounts '[
384
337
  "gh",
@@ -388,57 +341,55 @@ zeroshot settings set dockerMounts '[
388
341
  ```
389
342
 
390
343
  **Container home**: Presets use `$HOME` placeholder. Default: `/root`. Override with:
344
+
391
345
  ```bash
392
346
  zeroshot settings set dockerContainerHome '/home/node'
393
347
  # Or per-run:
394
348
  zeroshot run 123 --docker --container-home /home/node
395
349
  ```
396
350
 
397
- **Env var passthrough**: Presets auto-pass related env vars (e.g., `aws` → `AWS_REGION`, `AWS_PROFILE`). Add custom:
351
+ **Env var passthrough**: Presets auto-pass related env vars (for example, `aws` -> `AWS_REGION`, `AWS_PROFILE`). Add custom:
352
+
398
353
  ```bash
399
354
  zeroshot settings set dockerEnvPassthrough '["MY_API_KEY", "TF_VAR_*"]'
400
355
  ```
401
356
 
402
357
  </details>
403
358
 
404
- ---
405
-
406
- ## More
407
-
408
- - **Debug**: `sqlite3 ~/.zeroshot/cluster-abc.db "SELECT * FROM messages;"`
409
- - **Export**: `zeroshot export <id> --format markdown`
410
- - **Architecture**: See [CLAUDE.md](./CLAUDE.md)
359
+ ## Resources
411
360
 
412
- ---
361
+ - [CLAUDE.md](./CLAUDE.md) - Architecture, cluster config schema, agent primitives
362
+ - `docs/providers.md` - Provider setup, model levels, and Docker mounts
363
+ - [Discord](https://discord.gg/PdZ3UEXB) - Support and community
364
+ - `zeroshot export <id>` - Export conversation to markdown
365
+ - `sqlite3 ~/.zeroshot/*.db` - Direct ledger access for debugging
413
366
 
414
367
  <details>
415
368
  <summary><strong>Troubleshooting</strong></summary>
416
369
 
417
- | Issue | Fix |
418
- | ----------------------------- | -------------------------------------------------------------------- |
419
- | `claude: command not found` | `npm i -g @anthropic-ai/claude-code && claude auth login` |
420
- | `gh: command not found` | [Install GitHub CLI](https://cli.github.com/) |
421
- | `--docker` fails | Docker must be running: `docker ps` to verify |
422
- | Cluster stuck | `zeroshot resume <id>` to continue with guidance |
423
- | Agent keeps failing | Check `zeroshot logs <id>` for actual error |
424
- | `zeroshot: command not found` | `npm install -g @covibes/zeroshot` |
370
+ | Issue | Fix |
371
+ | ----------------------------- | --------------------------------------------------------- |
372
+ | `claude: command not found` | `npm i -g @anthropic-ai/claude-code && claude auth login` |
373
+ | `codex: command not found` | `npm i -g @openai/codex && codex login` |
374
+ | `gemini: command not found` | `npm i -g @google/gemini-cli && gemini auth login` |
375
+ | `gh: command not found` | [Install GitHub CLI](https://cli.github.com/) |
376
+ | `--docker` fails | Docker must be running: `docker ps` to verify |
377
+ | Cluster stuck | `zeroshot resume <id>` to continue |
378
+ | Agent keeps failing | Check `zeroshot logs <id>` for actual error |
379
+ | `zeroshot: command not found` | `npm install -g @covibes/zeroshot` |
425
380
 
426
381
  </details>
427
382
 
428
- ---
429
-
430
383
  ## Contributing
431
384
 
432
385
  See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
433
386
 
434
- Please read our [Code of Conduct](CODE_OF_CONDUCT.md) before participating.
387
+ Please read [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) before participating.
435
388
 
436
389
  For security issues, see [SECURITY.md](SECURITY.md).
437
390
 
438
391
  ---
439
392
 
440
- MIT [Covibes](https://github.com/covibes)
393
+ MIT - [Covibes](https://github.com/covibes)
441
394
 
442
395
  Built on [Claude Code](https://claude.com/product/claude-code) by Anthropic.
443
-
444
-