agent-bober 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (212)
  1. package/.claude-plugin/plugin.json +9 -0
  2. package/LICENSE +21 -0
  3. package/README.md +495 -0
  4. package/agents/bober-evaluator.md +323 -0
  5. package/agents/bober-generator.md +245 -0
  6. package/agents/bober-planner.md +248 -0
  7. package/dist/cli/commands/eval.d.ts +6 -0
  8. package/dist/cli/commands/eval.d.ts.map +1 -0
  9. package/dist/cli/commands/eval.js +129 -0
  10. package/dist/cli/commands/eval.js.map +1 -0
  11. package/dist/cli/commands/init.d.ts +5 -0
  12. package/dist/cli/commands/init.d.ts.map +1 -0
  13. package/dist/cli/commands/init.js +547 -0
  14. package/dist/cli/commands/init.js.map +1 -0
  15. package/dist/cli/commands/plan.d.ts +5 -0
  16. package/dist/cli/commands/plan.d.ts.map +1 -0
  17. package/dist/cli/commands/plan.js +87 -0
  18. package/dist/cli/commands/plan.js.map +1 -0
  19. package/dist/cli/commands/run.d.ts +5 -0
  20. package/dist/cli/commands/run.d.ts.map +1 -0
  21. package/dist/cli/commands/run.js +120 -0
  22. package/dist/cli/commands/run.js.map +1 -0
  23. package/dist/cli/commands/sprint.d.ts +6 -0
  24. package/dist/cli/commands/sprint.d.ts.map +1 -0
  25. package/dist/cli/commands/sprint.js +206 -0
  26. package/dist/cli/commands/sprint.js.map +1 -0
  27. package/dist/cli/index.d.ts +3 -0
  28. package/dist/cli/index.d.ts.map +1 -0
  29. package/dist/cli/index.js +124 -0
  30. package/dist/cli/index.js.map +1 -0
  31. package/dist/config/defaults.d.ts +15 -0
  32. package/dist/config/defaults.d.ts.map +1 -0
  33. package/dist/config/defaults.js +226 -0
  34. package/dist/config/defaults.js.map +1 -0
  35. package/dist/config/index.d.ts +4 -0
  36. package/dist/config/index.d.ts.map +1 -0
  37. package/dist/config/index.js +8 -0
  38. package/dist/config/index.js.map +1 -0
  39. package/dist/config/loader.d.ts +18 -0
  40. package/dist/config/loader.d.ts.map +1 -0
  41. package/dist/config/loader.js +189 -0
  42. package/dist/config/loader.js.map +1 -0
  43. package/dist/config/schema.d.ts +904 -0
  44. package/dist/config/schema.d.ts.map +1 -0
  45. package/dist/config/schema.js +181 -0
  46. package/dist/config/schema.js.map +1 -0
  47. package/dist/contracts/eval-result.d.ts +205 -0
  48. package/dist/contracts/eval-result.d.ts.map +1 -0
  49. package/dist/contracts/eval-result.js +87 -0
  50. package/dist/contracts/eval-result.js.map +1 -0
  51. package/dist/contracts/index.d.ts +4 -0
  52. package/dist/contracts/index.d.ts.map +1 -0
  53. package/dist/contracts/index.js +16 -0
  54. package/dist/contracts/index.js.map +1 -0
  55. package/dist/contracts/spec.d.ts +101 -0
  56. package/dist/contracts/spec.d.ts.map +1 -0
  57. package/dist/contracts/spec.js +51 -0
  58. package/dist/contracts/spec.js.map +1 -0
  59. package/dist/contracts/sprint-contract.d.ts +141 -0
  60. package/dist/contracts/sprint-contract.d.ts.map +1 -0
  61. package/dist/contracts/sprint-contract.js +80 -0
  62. package/dist/contracts/sprint-contract.js.map +1 -0
  63. package/dist/evaluators/builtin/api-check.d.ts +13 -0
  64. package/dist/evaluators/builtin/api-check.d.ts.map +1 -0
  65. package/dist/evaluators/builtin/api-check.js +152 -0
  66. package/dist/evaluators/builtin/api-check.js.map +1 -0
  67. package/dist/evaluators/builtin/build-check.d.ts +17 -0
  68. package/dist/evaluators/builtin/build-check.d.ts.map +1 -0
  69. package/dist/evaluators/builtin/build-check.js +155 -0
  70. package/dist/evaluators/builtin/build-check.js.map +1 -0
  71. package/dist/evaluators/builtin/command-runner.d.ts +26 -0
  72. package/dist/evaluators/builtin/command-runner.d.ts.map +1 -0
  73. package/dist/evaluators/builtin/command-runner.js +114 -0
  74. package/dist/evaluators/builtin/command-runner.js.map +1 -0
  75. package/dist/evaluators/builtin/lint.d.ts +17 -0
  76. package/dist/evaluators/builtin/lint.d.ts.map +1 -0
  77. package/dist/evaluators/builtin/lint.js +264 -0
  78. package/dist/evaluators/builtin/lint.js.map +1 -0
  79. package/dist/evaluators/builtin/playwright.d.ts +16 -0
  80. package/dist/evaluators/builtin/playwright.d.ts.map +1 -0
  81. package/dist/evaluators/builtin/playwright.js +238 -0
  82. package/dist/evaluators/builtin/playwright.js.map +1 -0
  83. package/dist/evaluators/builtin/typescript-check.d.ts +12 -0
  84. package/dist/evaluators/builtin/typescript-check.d.ts.map +1 -0
  85. package/dist/evaluators/builtin/typescript-check.js +155 -0
  86. package/dist/evaluators/builtin/typescript-check.js.map +1 -0
  87. package/dist/evaluators/builtin/unit-test.d.ts +18 -0
  88. package/dist/evaluators/builtin/unit-test.d.ts.map +1 -0
  89. package/dist/evaluators/builtin/unit-test.js +279 -0
  90. package/dist/evaluators/builtin/unit-test.js.map +1 -0
  91. package/dist/evaluators/index.d.ts +11 -0
  92. package/dist/evaluators/index.d.ts.map +1 -0
  93. package/dist/evaluators/index.js +13 -0
  94. package/dist/evaluators/index.js.map +1 -0
  95. package/dist/evaluators/plugin-interface.d.ts +50 -0
  96. package/dist/evaluators/plugin-interface.d.ts.map +1 -0
  97. package/dist/evaluators/plugin-interface.js +2 -0
  98. package/dist/evaluators/plugin-interface.js.map +1 -0
  99. package/dist/evaluators/plugin-loader.d.ts +18 -0
  100. package/dist/evaluators/plugin-loader.d.ts.map +1 -0
  101. package/dist/evaluators/plugin-loader.js +107 -0
  102. package/dist/evaluators/plugin-loader.js.map +1 -0
  103. package/dist/evaluators/registry.d.ts +78 -0
  104. package/dist/evaluators/registry.d.ts.map +1 -0
  105. package/dist/evaluators/registry.js +238 -0
  106. package/dist/evaluators/registry.js.map +1 -0
  107. package/dist/index.d.ts +17 -0
  108. package/dist/index.d.ts.map +1 -0
  109. package/dist/index.js +22 -0
  110. package/dist/index.js.map +1 -0
  111. package/dist/orchestrator/context-handoff.d.ts +543 -0
  112. package/dist/orchestrator/context-handoff.d.ts.map +1 -0
  113. package/dist/orchestrator/context-handoff.js +133 -0
  114. package/dist/orchestrator/context-handoff.js.map +1 -0
  115. package/dist/orchestrator/evaluator-agent.d.ts +15 -0
  116. package/dist/orchestrator/evaluator-agent.d.ts.map +1 -0
  117. package/dist/orchestrator/evaluator-agent.js +233 -0
  118. package/dist/orchestrator/evaluator-agent.js.map +1 -0
  119. package/dist/orchestrator/generator-agent.d.ts +16 -0
  120. package/dist/orchestrator/generator-agent.d.ts.map +1 -0
  121. package/dist/orchestrator/generator-agent.js +147 -0
  122. package/dist/orchestrator/generator-agent.js.map +1 -0
  123. package/dist/orchestrator/pipeline.d.ts +24 -0
  124. package/dist/orchestrator/pipeline.d.ts.map +1 -0
  125. package/dist/orchestrator/pipeline.js +290 -0
  126. package/dist/orchestrator/pipeline.js.map +1 -0
  127. package/dist/orchestrator/planner-agent.d.ts +10 -0
  128. package/dist/orchestrator/planner-agent.d.ts.map +1 -0
  129. package/dist/orchestrator/planner-agent.js +187 -0
  130. package/dist/orchestrator/planner-agent.js.map +1 -0
  131. package/dist/state/helpers.d.ts +5 -0
  132. package/dist/state/helpers.d.ts.map +1 -0
  133. package/dist/state/helpers.js +8 -0
  134. package/dist/state/helpers.js.map +1 -0
  135. package/dist/state/history.d.ts +39 -0
  136. package/dist/state/history.d.ts.map +1 -0
  137. package/dist/state/history.js +162 -0
  138. package/dist/state/history.js.map +1 -0
  139. package/dist/state/index.d.ts +8 -0
  140. package/dist/state/index.d.ts.map +1 -0
  141. package/dist/state/index.js +22 -0
  142. package/dist/state/index.js.map +1 -0
  143. package/dist/state/plan-state.d.ts +21 -0
  144. package/dist/state/plan-state.d.ts.map +1 -0
  145. package/dist/state/plan-state.js +108 -0
  146. package/dist/state/plan-state.js.map +1 -0
  147. package/dist/state/sprint-state.d.ts +20 -0
  148. package/dist/state/sprint-state.d.ts.map +1 -0
  149. package/dist/state/sprint-state.js +98 -0
  150. package/dist/state/sprint-state.js.map +1 -0
  151. package/dist/utils/fs.d.ts +31 -0
  152. package/dist/utils/fs.d.ts.map +1 -0
  153. package/dist/utils/fs.js +67 -0
  154. package/dist/utils/fs.js.map +1 -0
  155. package/dist/utils/git.d.ts +35 -0
  156. package/dist/utils/git.d.ts.map +1 -0
  157. package/dist/utils/git.js +84 -0
  158. package/dist/utils/git.js.map +1 -0
  159. package/dist/utils/index.d.ts +4 -0
  160. package/dist/utils/index.d.ts.map +1 -0
  161. package/dist/utils/index.js +4 -0
  162. package/dist/utils/index.js.map +1 -0
  163. package/dist/utils/logger.d.ts +45 -0
  164. package/dist/utils/logger.d.ts.map +1 -0
  165. package/dist/utils/logger.js +73 -0
  166. package/dist/utils/logger.js.map +1 -0
  167. package/hooks/hooks.json +10 -0
  168. package/package.json +67 -0
  169. package/scripts/detect-stack.sh +287 -0
  170. package/scripts/init-project.sh +206 -0
  171. package/scripts/run-eval.sh +175 -0
  172. package/skills/bober.anchor/SKILL.md +365 -0
  173. package/skills/bober.anchor/references/anchor-guide.md +567 -0
  174. package/skills/bober.brownfield/SKILL.md +422 -0
  175. package/skills/bober.brownfield/references/codebase-analysis.md +304 -0
  176. package/skills/bober.eval/SKILL.md +235 -0
  177. package/skills/bober.eval/references/eval-strategies.md +407 -0
  178. package/skills/bober.eval/references/feedback-format.md +182 -0
  179. package/skills/bober.plan/SKILL.md +244 -0
  180. package/skills/bober.plan/references/clarification-guide.md +124 -0
  181. package/skills/bober.plan/references/spec-schema.md +253 -0
  182. package/skills/bober.react/SKILL.md +330 -0
  183. package/skills/bober.react/references/react-scaffold.md +344 -0
  184. package/skills/bober.run/SKILL.md +303 -0
  185. package/skills/bober.solidity/SKILL.md +416 -0
  186. package/skills/bober.solidity/references/solidity-guide.md +487 -0
  187. package/skills/bober.sprint/SKILL.md +280 -0
  188. package/skills/bober.sprint/references/contract-schema.md +251 -0
  189. package/templates/base/CLAUDE.md +20 -0
  190. package/templates/base/bober.config.json +35 -0
  191. package/templates/brownfield/CLAUDE.md +34 -0
  192. package/templates/brownfield/bober.config.json +37 -0
  193. package/templates/presets/anchor/CLAUDE.md +163 -0
  194. package/templates/presets/anchor/bober.config.json +9 -0
  195. package/templates/presets/api-node/CLAUDE.md +153 -0
  196. package/templates/presets/api-node/bober.config.json +10 -0
  197. package/templates/presets/nextjs/CLAUDE.md +82 -0
  198. package/templates/presets/nextjs/bober.config.json +14 -0
  199. package/templates/presets/python-api/CLAUDE.md +202 -0
  200. package/templates/presets/python-api/bober.config.json +9 -0
  201. package/templates/presets/react-vite/CLAUDE.md +71 -0
  202. package/templates/presets/react-vite/bober.config.json +53 -0
  203. package/templates/presets/react-vite/scaffold/package.json +45 -0
  204. package/templates/presets/react-vite/scaffold/server/index.ts +38 -0
  205. package/templates/presets/react-vite/scaffold/server/tsconfig.json +24 -0
  206. package/templates/presets/react-vite/scaffold/src/App.tsx +37 -0
  207. package/templates/presets/react-vite/scaffold/src/index.html +12 -0
  208. package/templates/presets/react-vite/scaffold/src/main.tsx +12 -0
  209. package/templates/presets/react-vite/scaffold/tsconfig.json +27 -0
  210. package/templates/presets/react-vite/scaffold/vite.config.ts +34 -0
  211. package/templates/presets/solidity/CLAUDE.md +106 -0
  212. package/templates/presets/solidity/bober.config.json +9 -0
package/.claude-plugin/plugin.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "name": "bober",
+   "description": "Generator-Evaluator multi-agent harness for building applications autonomously with Claude",
+   "version": "0.1.0",
+   "author": { "name": "bober4ik" },
+   "homepage": "https://github.com/bober4ik/agent-bober",
+   "repository": "https://github.com/bober4ik/agent-bober",
+   "license": "MIT"
+ }
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 bober4ik
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,495 @@
+ # agent-bober
+
+ **Generator-Evaluator multi-agent harness for building applications autonomously with Claude.**
+
+ Inspired by Anthropic's engineering publication [**"Harness design for long-running application development"**](https://www.anthropic.com/engineering/harness-design-long-running-apps), agent-bober implements the Generator-Evaluator multi-agent pattern as a reusable, installable workflow. It orchestrates multiple Claude agents in a structured loop: a **Planner** decomposes your idea into sprint contracts, a **Generator** writes the code, and an **Evaluator** independently verifies each sprint against its contract before moving on. The result is autonomous, high-quality software development with built-in guardrails, context resets, and brutally honest evaluation.
+
+ ```
+ You describe a feature
+         |
+         v
+   +-----------+
+   |  Planner  |   Asks clarifying questions, produces a PlanSpec
+   +-----------+   with sprint contracts and acceptance criteria.
+         |
+         v
+   +-----------+     +-----------+
+   | Generator | --> | Evaluator |   Writes code, then verifies it:
+   +-----------+     +-----------+   typecheck, lint, build, tests.
+         ^               |
+         |   (rework)    |
+         +---------------+
+                 |
+                 v           Repeats per sprint until all
+           [Next Sprint]     contracts are satisfied.
+ ```
+
+ ---
+
+ ## Installation
+
+ ```bash
+ # Install globally
+ npm install -g agent-bober
+
+ # Or use directly with npx
+ npx agent-bober init
+ ```
+
+ agent-bober also works as a **Claude Code plugin**. If you install it as a dependency or globally, Claude Code will detect the plugin manifest and make `/bober:*` slash commands available in your sessions.
+
+ ## Quick Start
+
+ ### Any Project
+ ```bash
+ npx agent-bober init
+ ```
+
+ Interactive setup -- describe what you want to build, pick a preset or let the planner decide.
+
+ ### With a Preset
+ ```bash
+ npx agent-bober init nextjs       # Next.js full-stack app
+ npx agent-bober init react-vite   # React + Vite
+ npx agent-bober init solidity     # EVM smart contracts (Hardhat)
+ npx agent-bober init anchor       # Solana programs (Anchor)
+ npx agent-bober init api-node     # Node.js API
+ npx agent-bober init python-api   # Python API (FastAPI)
+ ```
+
+ ### Existing Codebase
+ ```bash
+ cd your-existing-project
+ npx agent-bober init brownfield
+ ```
+
+ Then in Claude Code:
+ ```
+ /bober:plan     # Describe your feature, get a structured plan
+ /bober:sprint   # Execute the next sprint
+ /bober:eval     # Evaluate the sprint output
+ /bober:run      # Full autonomous pipeline
+ ```
+
+ Specialized workflows:
+ ```
+ /bober:react        # React web app workflow
+ /bober:solidity     # EVM smart contract workflow
+ /bober:anchor       # Solana program workflow
+ /bober:brownfield   # Existing codebase workflow
+ ```
+
+ ---
+
+ ## Commands
+
+ ### Slash Commands (Claude Code)
+
+ | Command | Description |
+ |---|---|
+ | `/bober:plan` | Plan any feature -- stack-agnostic |
+ | `/bober:sprint` | Execute the next sprint contract |
+ | `/bober:eval` | Evaluate current sprint output |
+ | `/bober:run` | Full autonomous pipeline |
+ | `/bober:react` | React web application workflow |
+ | `/bober:solidity` | EVM smart contract workflow |
+ | `/bober:anchor` | Solana program workflow |
+ | `/bober:brownfield` | Existing codebase workflow |
+
+ ### CLI
+
+ ```bash
+ npx agent-bober init [preset]   # Initialize project (nextjs, react-vite, solidity, anchor, api-node, python-api, brownfield)
+ npx agent-bober plan            # Run the planner
+ npx agent-bober sprint          # Execute next sprint
+ npx agent-bober eval            # Evaluate current sprint
+ npx agent-bober run             # Full autonomous loop
+ npx agent-bober status          # Show plan progress
+ ```
+
+ ---
+
+ ## Configuration
+
+ All configuration lives in `bober.config.json` at your project root. The `init` command creates this file from a template, and you can customize it afterward.
+
+ ### Full Configuration Reference
+
+ ```jsonc
+ {
+   // ── Project ─────────────────────────────────────
+   "project": {
+     "name": "my-app",  // Project name
+     "mode": "greenfield",  // "greenfield" | "brownfield"
+     "preset": "nextjs",  // Optional: "nextjs" | "react-vite" | "solidity" | "anchor" | "api-node" | "python-api"
+     "description": "A task management app with real-time collaboration"
+   },
+
+   // ── Planner ─────────────────────────────────────
+   "planner": {
+     "maxClarifications": 5,  // Max clarifying questions (0 to skip)
+     "model": "opus",  // Model for planning: "opus" | "sonnet" | "haiku"
+     "contextFiles": [  // Extra files the planner should read
+       "docs/architecture.md"
+     ]
+   },
+
+   // ── Generator ───────────────────────────────────
+   "generator": {
+     "model": "sonnet",  // Model for code generation
+     "maxTurnsPerSprint": 50,  // Max tool-use turns per sprint
+     "autoCommit": true,  // Auto-commit after each sprint
+     "branchPattern": "bober/{feature-name}"  // Git branch naming
+   },
+
+   // ── Evaluator ───────────────────────────────────
+   "evaluator": {
+     "model": "sonnet",  // Model for evaluation reasoning
+     "strategies": [  // Evaluation strategies to run
+       { "type": "typecheck", "required": true },
+       { "type": "lint", "required": true },
+       { "type": "build", "required": true },
+       { "type": "unit-test", "required": true },
+       { "type": "playwright", "required": false }
+     ],
+     "maxIterations": 3,  // Max rework cycles per sprint
+     "plugins": []  // Custom evaluator plugin paths
+   },
+
+   // ── Sprint ──────────────────────────────────────
+   "sprint": {
+     "maxSprints": 10,  // Max sprints per plan
+     "requireContracts": true,  // Require contract agreement before coding
+     "sprintSize": "medium"  // "small" | "medium" | "large"
+   },
+
+   // ── Pipeline ────────────────────────────────────
+   "pipeline": {
+     "maxIterations": 20,  // Max total iterations across all sprints
+     "requireApproval": false,  // Pause for user approval between sprints
+     "contextReset": "always"  // "always" | "on-threshold" | "never"
+   },
+
+   // ── Commands ────────────────────────────────────
+   "commands": {
+     "install": "npm install",
+     "build": "npm run build",
+     "test": "npm test",
+     "lint": "npm run lint",
+     "dev": "npm run dev",
+     "typecheck": "npx tsc --noEmit"
+   }
+ }
+ ```
+
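Annotation (not part of the package contents): the enumerated values in the reference above lend themselves to a quick sanity check before a run. The following is a minimal TypeScript sketch under stated assumptions. `checkConfigSubset` and its helpers are hypothetical names, only a handful of the documented fields are checked, and a missing field is treated as a problem even though the package's own loader may fill in defaults instead.

```typescript
// Hypothetical helper, not part of agent-bober: checks a few documented fields.
const SPRINT_SIZES = ["small", "medium", "large"];
const CONTEXT_RESET_MODES = ["always", "on-threshold", "never"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Reads a nested section, falling back to an empty object when it is missing.
function section(config: Record<string, unknown>, key: string): Record<string, unknown> {
  const value = config[key];
  return isRecord(value) ? value : {};
}

function isPositiveInteger(value: unknown): boolean {
  return typeof value === "number" && value > 0 && Math.floor(value) === value;
}

// Returns human-readable problems; an empty array means the checked fields look fine.
function checkConfigSubset(raw: unknown): string[] {
  if (!isRecord(raw)) {
    return ["config must be a JSON object"];
  }
  const problems: string[] = [];

  const mode = section(raw, "project")["mode"];
  if (mode !== "greenfield" && mode !== "brownfield") {
    problems.push('project.mode must be "greenfield" or "brownfield"');
  }

  const sprint = section(raw, "sprint");
  const size = sprint["sprintSize"];
  if (typeof size !== "string" || SPRINT_SIZES.indexOf(size) === -1) {
    problems.push("sprint.sprintSize must be one of: " + SPRINT_SIZES.join(", "));
  }
  if (!isPositiveInteger(sprint["maxSprints"])) {
    problems.push("sprint.maxSprints must be a positive integer");
  }

  const reset = section(raw, "pipeline")["contextReset"];
  if (typeof reset !== "string" || CONTEXT_RESET_MODES.indexOf(reset) === -1) {
    problems.push("pipeline.contextReset must be one of: " + CONTEXT_RESET_MODES.join(", "));
  }
  return problems;
}
```

Passing the parsed contents of `bober.config.json` (with comments stripped, since the reference uses JSONC) would return an empty array when these fields match the documented values.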
+ ### Sprint Sizes
+
+ | Size | Generator Effort | Files Changed | Scope |
+ |---|---|---|---|
+ | `small` | 30-60 min | 1-2 files | Single concern |
+ | `medium` | 1-3 hours | 3-8 files | One cohesive feature slice |
+ | `large` | 3-5 hours | 5-15 files | Full feature vertical |
+
+ ### Context Reset Modes
+
+ | Mode | Behavior |
+ |---|---|
+ | `always` | Fresh context for every sprint (recommended for long plans) |
+ | `on-threshold` | Reset when context usage exceeds 80% |
+ | `never` | Carry context across sprints (only for short plans) |
+
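Annotation (not part of the package contents): the three context reset modes in the table above reduce to a small decision function. This is a sketch only. `shouldResetContext` is a hypothetical name, and expressing context usage as a fraction between 0 and 1 is an assumption; only the 80% threshold comes from the table.

```typescript
type ContextResetMode = "always" | "on-threshold" | "never";

// 80% is the threshold stated in the table; how usage is measured is assumed.
const RESET_THRESHOLD = 0.8;

// Decides whether the next sprint should start from a fresh context.
// `contextUsage` is an assumed fraction of the context window in use (0 to 1).
function shouldResetContext(mode: ContextResetMode, contextUsage: number): boolean {
  switch (mode) {
    case "always":
      return true; // fresh context for every sprint
    case "never":
      return false; // carry context across sprints
    case "on-threshold":
      return contextUsage > RESET_THRESHOLD; // reset only once usage exceeds 80%
  }
}
```

With `"on-threshold"`, a usage of exactly 0.8 does not trigger a reset in this sketch, because the table says "exceeds".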
+ ---
+
+ ## Evaluator Strategies
+
+ ### Built-in Strategies
+
+ | Strategy | What It Does |
+ |---|---|
+ | `typecheck` | Runs the configured typecheck command (e.g., `tsc --noEmit`) |
+ | `lint` | Runs the configured lint command (e.g., `eslint .`) |
+ | `build` | Runs the configured build command and checks for success |
+ | `unit-test` | Runs the configured test command |
+ | `playwright` | Runs Playwright E2E tests |
+ | `api-check` | Validates API endpoints respond correctly |
+
+ ### Inline Command Evaluators
+
+ The strategy type is **open** -- you can use any name and provide a shell command directly. No plugin file needed:
+
+ ```json
+ {
+   "evaluator": {
+     "strategies": [
+       { "type": "typecheck", "required": true },
+       { "type": "lint", "required": true },
+       { "type": "k6", "command": "k6 run load-test.js", "required": false, "label": "Load Test" },
+       { "type": "slither", "command": "slither .", "required": true, "label": "Security Audit" },
+       { "type": "anchor-verify", "command": "anchor verify", "required": true },
+       { "type": "cargo-test", "command": "cargo test", "required": true },
+       { "type": "pytest", "command": "pytest --tb=short", "required": true },
+       { "type": "mypy", "command": "mypy . --strict", "required": false }
+     ]
+   }
+ }
+ ```
+
+ Any strategy with a `command` field runs that command and checks the exit code (0 = pass). Error output is parsed and included in the evaluator feedback. You can set a custom `timeout` in the config:
+
+ ```json
+ { "type": "k6", "command": "k6 run load.js", "required": false, "config": { "timeout": 300000 } }
+ ```
+
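Annotation (not part of the package contents): the pass rule described above (exit code 0 passes, error output becomes feedback) can be sketched as a pure function over an already-finished command. All names here (`CommandStrategy`, `CommandOutcome`, `judgeCommand`) are hypothetical. Treating a timeout as a failure, letting only `required` strategies block a sprint, and keeping the last 20 lines of output as feedback are assumptions rather than documented behavior.

```typescript
// Shape of an inline strategy entry, following the JSON examples above.
interface CommandStrategy {
  type: string;
  command: string;
  required: boolean;
  label?: string;
  config?: { timeout?: number };
}

// Assumed result of running the command, e.g. gathered from a child process.
interface CommandOutcome {
  exitCode: number | null; // null when the process was killed before exiting
  output: string; // combined stdout/stderr
  timedOut: boolean;
}

interface StrategyVerdict {
  name: string;
  passed: boolean;
  blocking: boolean; // assumption: only failed `required` strategies block the sprint
  feedback: string;
}

// Keeps the last `count` lines so feedback stays short.
function lastLines(text: string, count: number): string {
  return text.split("\n").slice(-count).join("\n");
}

function judgeCommand(strategy: CommandStrategy, outcome: CommandOutcome): StrategyVerdict {
  // Exit code 0 means pass; a timeout is treated as a failure (assumption).
  const passed = !outcome.timedOut && outcome.exitCode === 0;
  return {
    name: strategy.label ?? strategy.type,
    passed: passed,
    blocking: strategy.required && !passed,
    feedback: passed ? "" : lastLines(outcome.output, 20),
  };
}
```

Under these assumptions, a failing optional strategy such as the `k6` load test would be reported in the feedback without blocking the sprint.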
+ ### Custom Evaluator Plugins
+
+ For more complex evaluation logic, write a plugin that implements the `EvaluatorPlugin` interface:
+
+ ```typescript
+ import type { EvaluatorPlugin, EvalContext, EvalResult } from "agent-bober";
+
+ const myPlugin: EvaluatorPlugin = {
+   name: "My Custom Check",
+   description: "Validates something specific to my project",
+
+   async canRun(_projectRoot, _config) {
+     return true;
+   },
+
+   async evaluate(context: EvalContext): Promise<EvalResult> {
+     return {
+       evaluator: "my-custom-check",
+       passed: true,
+       score: 100,
+       details: [],
+       summary: "All checks passed",
+       feedback: "Everything looks good.",
+       timestamp: new Date().toISOString(),
+     };
+   },
+ };
+
+ export default () => myPlugin;
+ ```
+
+ Register plugins in `bober.config.json`:
+
+ ```json
+ {
+   "evaluator": {
+     "strategies": [
+       { "type": "custom", "plugin": "./my-evaluator.ts", "required": true }
+     ]
+   }
+ }
+ ```
+
+ ---
+
+ ## Presets
+
+ ### `nextjs`
+
+ Next.js full-stack (App Router, API routes, Prisma). Includes:
+
+ - Next.js with TypeScript, Tailwind CSS, ESLint
+ - API routes for backend logic
+ - Prisma ORM for database access
+ - Vitest for unit tests, Playwright for E2E
+
+ ### `react-vite`
+
+ React + Vite + any backend. Includes:
+
+ - Vite dev server with React and TypeScript
+ - Vitest for unit tests, Playwright for E2E
+ - ESLint configured for TypeScript + React
+ - Flexible backend pairing (Express, Fastify, etc.)
+
+ ### `solidity`
+
+ EVM smart contracts (Hardhat/Foundry). Includes:
+
+ - Hardhat or Foundry project setup
+ - OpenZeppelin Contracts integration
+ - Solhint for linting
+ - Hardhat tests or Forge tests
+ - Deployment and verification scripts
+
+ ### `anchor`
+
+ Solana programs (Anchor/Rust). Includes:
+
+ - Anchor project setup with program scaffold
+ - TypeScript integration tests
+ - Cargo clippy for Rust linting
+ - IDL generation and client SDK
+ - Deployment scripts for devnet/mainnet
+
+ ### `api-node`
+
+ Node.js API (Express/NestJS/Fastify). Includes:
+
+ - TypeScript API project structure
+ - Testing with Vitest or Jest
+ - ESLint and TypeScript strict mode
+ - Database integration (Prisma/Drizzle)
+
+ ### `python-api`
+
+ Python API (FastAPI/Django). Includes:
+
+ - FastAPI or Django project structure
+ - pytest for testing
+ - Ruff/Black for linting and formatting
+ - SQLAlchemy or Django ORM for database access
+
+ ### `brownfield`
+
+ Existing codebase (conservative defaults). No scaffold files -- just configuration:
+
+ - Conservative sprint sizes (`small`)
+ - Higher evaluator iteration limit (5 rework cycles)
+ - Requires user approval between sprints
+ - Emphasizes reading existing patterns before making changes
+
+ ### `base`
+
+ Minimal config, planner decides everything. Just a `bober.config.json` with `build` as the only required evaluator strategy. Intended as a starting point for any tech stack not covered by other presets.
+
+ ---
+
+ ## Architecture
+
+ ### How the Agents Interact
+
+ ```
+                                  bober.config.json
+                                          |
+                                +---------+---------+
+                                |                   |
+                          .bober/specs/     .bober/contracts/
+                                |                   |
+                                v                   v
+ User Idea --> [Planner] --> PlanSpec + SprintContracts
+                                          |
+                                          v
+                                     [Generator]
+                                        |   ^
+                                        v   |  (rework feedback)
+                                     [Evaluator]
+                                          |
+                                pass? ----+---- fail?
+                                  |               |
+                            [Next Sprint]   [Rework Loop]
+                                  |
+                                  v
+                          All sprints done
+                                  |
+                                  v
+                          Feature Complete
+ ```
+
+ ### The Generator-Evaluator Pattern
+
+ This architecture implements the patterns described in Anthropic's [**"Harness design for long-running application development"**](https://www.anthropic.com/engineering/harness-design-long-running-apps) by Prithvi Rajasekaran. The key insight from that research: separating code generation from code evaluation creates a feedback loop that catches errors early and dramatically improves output quality. In their tests, a solo agent produced broken output in 20 minutes, while the full harness produced a polished, working application -- demonstrating that multi-agent orchestration with honest evaluation is worth the investment.
+
+ - **Planner** (Claude Opus): High-reasoning model for decomposing complex features into clear, testable sprint contracts. Thinks about scope, dependencies, and risk.
+ - **Generator** (Claude Sonnet): Fast, capable model for writing code. Works within the boundaries of a single sprint contract.
+ - **Evaluator** (Claude Sonnet): Runs automated checks (typecheck, lint, build, tests) and provides structured feedback. If a sprint fails evaluation, the Generator gets specific rework instructions.
+
+ The separation ensures that:
+ 1. The Generator cannot "mark its own homework" -- an independent evaluation step catches issues.
+ 2. Sprint contracts provide clear scope boundaries, preventing feature creep.
+ 3. Automated checks run after every sprint, not just at the end.
+ 4. Context resets between sprints keep the Generator focused and prevent context degradation.
+
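Annotation (not part of the package contents): the generate, evaluate, and rework cycle described above can be sketched as two small loops. This is an illustration under stated assumptions, not the package's orchestrator. `runSprint`, `runPlan`, and the callback types are hypothetical; the callbacks are synchronous for brevity; `maxIterations` is assumed to count total attempts per sprint, and the plan is assumed to stop at the first sprint that exhausts them.

```typescript
interface EvalVerdict {
  passed: boolean;
  feedback: string; // rework instructions when the sprint fails
}

// Hypothetical stand-ins for the Generator and Evaluator agents.
type Generate = (contract: string, feedback: string | null) => string;
type Evaluate = (contract: string, output: string) => EvalVerdict;

interface SprintOutcome {
  contract: string;
  passed: boolean;
  attempts: number;
}

// Runs one sprint: generate, evaluate, and feed failures back until the
// evaluator passes the output or the attempt budget is spent.
function runSprint(
  contract: string,
  generate: Generate,
  evaluate: Evaluate,
  maxIterations: number
): SprintOutcome {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    const output = generate(contract, feedback);
    const verdict = evaluate(contract, output); // independent check, never self-graded
    if (verdict.passed) {
      return { contract: contract, passed: true, attempts: attempt };
    }
    feedback = verdict.feedback; // becomes the next attempt's rework input
  }
  return { contract: contract, passed: false, attempts: maxIterations };
}

// Runs sprints in order and stops at the first one that cannot pass (assumption).
function runPlan(
  contracts: string[],
  generate: Generate,
  evaluate: Evaluate,
  maxIterations: number
): SprintOutcome[] {
  const outcomes: SprintOutcome[] = [];
  for (const contract of contracts) {
    const outcome = runSprint(contract, generate, evaluate, maxIterations);
    outcomes.push(outcome);
    if (!outcome.passed) {
      break;
    }
  }
  return outcomes;
}
```

Passing the evaluator as a separate callback mirrors the first point in the list above: the code that generates output never decides whether that output passes.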
+ ### State Management
+
+ All bober state lives in the `.bober/` directory:
+
+ ```
+ .bober/
+   specs/           PlanSpec JSON files
+   contracts/       SprintContract JSON files
+   evaluations/     Evaluation result logs
+   snapshots/       Context snapshots (gitignored)
+   progress.md      Human-readable progress tracker
+   history.jsonl    Machine-readable event log
+ ```
+
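Annotation (not part of the package contents): `history.jsonl` is described above as a machine-readable event log, and the `.jsonl` extension suggests one JSON object per line. The sketch below parses such text generically. `parseJsonl` is a hypothetical helper, and because the README does not document the event schema, no field names are assumed.

```typescript
interface ParsedLog {
  events: Record<string, unknown>[]; // one entry per well-formed JSON object line
  skipped: number; // lines that were not JSON objects
}

// Parses JSON Lines text: one JSON object per non-empty line.
function parseJsonl(text: string): ParsedLog {
  const events: Record<string, unknown>[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length === 0) {
      continue; // blank lines are ignored rather than counted
    }
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value === "object" && value !== null && !Array.isArray(value)) {
        events.push(value as Record<string, unknown>);
      } else {
        skipped++; // valid JSON, but not an object
      }
    } catch {
      skipped++; // malformed line, e.g. a partially written entry
    }
  }
  return { events: events, skipped: skipped };
}
```

Counting skipped lines instead of throwing keeps a partially written final line from hiding the earlier events.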
+ ---
+
+ ## Shell Scripts
+
+ For environments where you need to run bober operations outside of Claude Code:
+
+ | Script | Purpose |
+ |---|---|
+ | `scripts/init-project.sh` | Initialize a project with a template |
+ | `scripts/detect-stack.sh` | Auto-detect tech stack (outputs JSON) |
+ | `scripts/run-eval.sh` | Run evaluation strategies from config |
+
+ ```bash
+ # Initialize a new project
+ bash scripts/init-project.sh nextjs
+
+ # Detect an existing project's stack
+ bash scripts/detect-stack.sh /path/to/project
+
+ # Run evaluations
+ bash scripts/run-eval.sh /path/to/project
+ ```
+
+ ---
+
+ ## Contributing
+
+ Contributions are welcome. To set up the development environment:
+
+ ```bash
+ git clone https://github.com/bober4ik/agent-bober.git
+ cd agent-bober
+ npm install
+ npm run build
+ npm run typecheck
+ npm test
+ ```
+
+ ### Project Structure
+
+ ```
+ agent-bober/
+   src/
+     cli/             CLI entry point (commander)
+     config/          Config schema, loader, defaults
+     contracts/       Sprint contract and eval result types
+     evaluators/      Built-in evaluator plugins
+     orchestrator/    Context handoff and agent coordination
+     state/           State management for .bober/ directory
+     utils/           Shared utilities
+   agents/            Agent system prompts (.md files)
+   skills/            Claude Code slash command definitions
+   templates/         Project templates and scaffolds
+   hooks/             Claude Code hooks
+   scripts/           Shell scripts for init, detect, eval
+ ```
+
+ ### Guidelines
+
+ - TypeScript strict mode, no `any`.
+ - ESM only (`"type": "module"`).
+ - All evaluator plugins implement the `EvaluatorPlugin` interface.
+ - Sprint contracts are validated against Zod schemas.
+ - Test with `vitest`. Run `npm test` before submitting.
+
+ ---
+
+ ## Acknowledgments
+
+ This project is inspired by and implements the patterns from Anthropic's [**"Harness design for long-running application development"**](https://www.anthropic.com/engineering/harness-design-long-running-apps) by Prithvi Rajasekaran. The article demonstrated that separating generation from evaluation, using sprint contracts, and applying context resets between agents dramatically improves the quality of autonomously built software. agent-bober packages these patterns into a reusable tool.
+
+ ---
+
+ ## License
+
+ [MIT](LICENSE) -- Copyright (c) 2026 bober4ik