agent-bober 0.1.0
- package/.claude-plugin/plugin.json +9 -0
- package/LICENSE +21 -0
- package/README.md +495 -0
- package/agents/bober-evaluator.md +323 -0
- package/agents/bober-generator.md +245 -0
- package/agents/bober-planner.md +248 -0
- package/dist/cli/commands/eval.d.ts +6 -0
- package/dist/cli/commands/eval.d.ts.map +1 -0
- package/dist/cli/commands/eval.js +129 -0
- package/dist/cli/commands/eval.js.map +1 -0
- package/dist/cli/commands/init.d.ts +5 -0
- package/dist/cli/commands/init.d.ts.map +1 -0
- package/dist/cli/commands/init.js +547 -0
- package/dist/cli/commands/init.js.map +1 -0
- package/dist/cli/commands/plan.d.ts +5 -0
- package/dist/cli/commands/plan.d.ts.map +1 -0
- package/dist/cli/commands/plan.js +87 -0
- package/dist/cli/commands/plan.js.map +1 -0
- package/dist/cli/commands/run.d.ts +5 -0
- package/dist/cli/commands/run.d.ts.map +1 -0
- package/dist/cli/commands/run.js +120 -0
- package/dist/cli/commands/run.js.map +1 -0
- package/dist/cli/commands/sprint.d.ts +6 -0
- package/dist/cli/commands/sprint.d.ts.map +1 -0
- package/dist/cli/commands/sprint.js +206 -0
- package/dist/cli/commands/sprint.js.map +1 -0
- package/dist/cli/index.d.ts +3 -0
- package/dist/cli/index.d.ts.map +1 -0
- package/dist/cli/index.js +124 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/config/defaults.d.ts +15 -0
- package/dist/config/defaults.d.ts.map +1 -0
- package/dist/config/defaults.js +226 -0
- package/dist/config/defaults.js.map +1 -0
- package/dist/config/index.d.ts +4 -0
- package/dist/config/index.d.ts.map +1 -0
- package/dist/config/index.js +8 -0
- package/dist/config/index.js.map +1 -0
- package/dist/config/loader.d.ts +18 -0
- package/dist/config/loader.d.ts.map +1 -0
- package/dist/config/loader.js +189 -0
- package/dist/config/loader.js.map +1 -0
- package/dist/config/schema.d.ts +904 -0
- package/dist/config/schema.d.ts.map +1 -0
- package/dist/config/schema.js +181 -0
- package/dist/config/schema.js.map +1 -0
- package/dist/contracts/eval-result.d.ts +205 -0
- package/dist/contracts/eval-result.d.ts.map +1 -0
- package/dist/contracts/eval-result.js +87 -0
- package/dist/contracts/eval-result.js.map +1 -0
- package/dist/contracts/index.d.ts +4 -0
- package/dist/contracts/index.d.ts.map +1 -0
- package/dist/contracts/index.js +16 -0
- package/dist/contracts/index.js.map +1 -0
- package/dist/contracts/spec.d.ts +101 -0
- package/dist/contracts/spec.d.ts.map +1 -0
- package/dist/contracts/spec.js +51 -0
- package/dist/contracts/spec.js.map +1 -0
- package/dist/contracts/sprint-contract.d.ts +141 -0
- package/dist/contracts/sprint-contract.d.ts.map +1 -0
- package/dist/contracts/sprint-contract.js +80 -0
- package/dist/contracts/sprint-contract.js.map +1 -0
- package/dist/evaluators/builtin/api-check.d.ts +13 -0
- package/dist/evaluators/builtin/api-check.d.ts.map +1 -0
- package/dist/evaluators/builtin/api-check.js +152 -0
- package/dist/evaluators/builtin/api-check.js.map +1 -0
- package/dist/evaluators/builtin/build-check.d.ts +17 -0
- package/dist/evaluators/builtin/build-check.d.ts.map +1 -0
- package/dist/evaluators/builtin/build-check.js +155 -0
- package/dist/evaluators/builtin/build-check.js.map +1 -0
- package/dist/evaluators/builtin/command-runner.d.ts +26 -0
- package/dist/evaluators/builtin/command-runner.d.ts.map +1 -0
- package/dist/evaluators/builtin/command-runner.js +114 -0
- package/dist/evaluators/builtin/command-runner.js.map +1 -0
- package/dist/evaluators/builtin/lint.d.ts +17 -0
- package/dist/evaluators/builtin/lint.d.ts.map +1 -0
- package/dist/evaluators/builtin/lint.js +264 -0
- package/dist/evaluators/builtin/lint.js.map +1 -0
- package/dist/evaluators/builtin/playwright.d.ts +16 -0
- package/dist/evaluators/builtin/playwright.d.ts.map +1 -0
- package/dist/evaluators/builtin/playwright.js +238 -0
- package/dist/evaluators/builtin/playwright.js.map +1 -0
- package/dist/evaluators/builtin/typescript-check.d.ts +12 -0
- package/dist/evaluators/builtin/typescript-check.d.ts.map +1 -0
- package/dist/evaluators/builtin/typescript-check.js +155 -0
- package/dist/evaluators/builtin/typescript-check.js.map +1 -0
- package/dist/evaluators/builtin/unit-test.d.ts +18 -0
- package/dist/evaluators/builtin/unit-test.d.ts.map +1 -0
- package/dist/evaluators/builtin/unit-test.js +279 -0
- package/dist/evaluators/builtin/unit-test.js.map +1 -0
- package/dist/evaluators/index.d.ts +11 -0
- package/dist/evaluators/index.d.ts.map +1 -0
- package/dist/evaluators/index.js +13 -0
- package/dist/evaluators/index.js.map +1 -0
- package/dist/evaluators/plugin-interface.d.ts +50 -0
- package/dist/evaluators/plugin-interface.d.ts.map +1 -0
- package/dist/evaluators/plugin-interface.js +2 -0
- package/dist/evaluators/plugin-interface.js.map +1 -0
- package/dist/evaluators/plugin-loader.d.ts +18 -0
- package/dist/evaluators/plugin-loader.d.ts.map +1 -0
- package/dist/evaluators/plugin-loader.js +107 -0
- package/dist/evaluators/plugin-loader.js.map +1 -0
- package/dist/evaluators/registry.d.ts +78 -0
- package/dist/evaluators/registry.d.ts.map +1 -0
- package/dist/evaluators/registry.js +238 -0
- package/dist/evaluators/registry.js.map +1 -0
- package/dist/index.d.ts +17 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +22 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/context-handoff.d.ts +543 -0
- package/dist/orchestrator/context-handoff.d.ts.map +1 -0
- package/dist/orchestrator/context-handoff.js +133 -0
- package/dist/orchestrator/context-handoff.js.map +1 -0
- package/dist/orchestrator/evaluator-agent.d.ts +15 -0
- package/dist/orchestrator/evaluator-agent.d.ts.map +1 -0
- package/dist/orchestrator/evaluator-agent.js +233 -0
- package/dist/orchestrator/evaluator-agent.js.map +1 -0
- package/dist/orchestrator/generator-agent.d.ts +16 -0
- package/dist/orchestrator/generator-agent.d.ts.map +1 -0
- package/dist/orchestrator/generator-agent.js +147 -0
- package/dist/orchestrator/generator-agent.js.map +1 -0
- package/dist/orchestrator/pipeline.d.ts +24 -0
- package/dist/orchestrator/pipeline.d.ts.map +1 -0
- package/dist/orchestrator/pipeline.js +290 -0
- package/dist/orchestrator/pipeline.js.map +1 -0
- package/dist/orchestrator/planner-agent.d.ts +10 -0
- package/dist/orchestrator/planner-agent.d.ts.map +1 -0
- package/dist/orchestrator/planner-agent.js +187 -0
- package/dist/orchestrator/planner-agent.js.map +1 -0
- package/dist/state/helpers.d.ts +5 -0
- package/dist/state/helpers.d.ts.map +1 -0
- package/dist/state/helpers.js +8 -0
- package/dist/state/helpers.js.map +1 -0
- package/dist/state/history.d.ts +39 -0
- package/dist/state/history.d.ts.map +1 -0
- package/dist/state/history.js +162 -0
- package/dist/state/history.js.map +1 -0
- package/dist/state/index.d.ts +8 -0
- package/dist/state/index.d.ts.map +1 -0
- package/dist/state/index.js +22 -0
- package/dist/state/index.js.map +1 -0
- package/dist/state/plan-state.d.ts +21 -0
- package/dist/state/plan-state.d.ts.map +1 -0
- package/dist/state/plan-state.js +108 -0
- package/dist/state/plan-state.js.map +1 -0
- package/dist/state/sprint-state.d.ts +20 -0
- package/dist/state/sprint-state.d.ts.map +1 -0
- package/dist/state/sprint-state.js +98 -0
- package/dist/state/sprint-state.js.map +1 -0
- package/dist/utils/fs.d.ts +31 -0
- package/dist/utils/fs.d.ts.map +1 -0
- package/dist/utils/fs.js +67 -0
- package/dist/utils/fs.js.map +1 -0
- package/dist/utils/git.d.ts +35 -0
- package/dist/utils/git.d.ts.map +1 -0
- package/dist/utils/git.js +84 -0
- package/dist/utils/git.js.map +1 -0
- package/dist/utils/index.d.ts +4 -0
- package/dist/utils/index.d.ts.map +1 -0
- package/dist/utils/index.js +4 -0
- package/dist/utils/index.js.map +1 -0
- package/dist/utils/logger.d.ts +45 -0
- package/dist/utils/logger.d.ts.map +1 -0
- package/dist/utils/logger.js +73 -0
- package/dist/utils/logger.js.map +1 -0
- package/hooks/hooks.json +10 -0
- package/package.json +67 -0
- package/scripts/detect-stack.sh +287 -0
- package/scripts/init-project.sh +206 -0
- package/scripts/run-eval.sh +175 -0
- package/skills/bober.anchor/SKILL.md +365 -0
- package/skills/bober.anchor/references/anchor-guide.md +567 -0
- package/skills/bober.brownfield/SKILL.md +422 -0
- package/skills/bober.brownfield/references/codebase-analysis.md +304 -0
- package/skills/bober.eval/SKILL.md +235 -0
- package/skills/bober.eval/references/eval-strategies.md +407 -0
- package/skills/bober.eval/references/feedback-format.md +182 -0
- package/skills/bober.plan/SKILL.md +244 -0
- package/skills/bober.plan/references/clarification-guide.md +124 -0
- package/skills/bober.plan/references/spec-schema.md +253 -0
- package/skills/bober.react/SKILL.md +330 -0
- package/skills/bober.react/references/react-scaffold.md +344 -0
- package/skills/bober.run/SKILL.md +303 -0
- package/skills/bober.solidity/SKILL.md +416 -0
- package/skills/bober.solidity/references/solidity-guide.md +487 -0
- package/skills/bober.sprint/SKILL.md +280 -0
- package/skills/bober.sprint/references/contract-schema.md +251 -0
- package/templates/base/CLAUDE.md +20 -0
- package/templates/base/bober.config.json +35 -0
- package/templates/brownfield/CLAUDE.md +34 -0
- package/templates/brownfield/bober.config.json +37 -0
- package/templates/presets/anchor/CLAUDE.md +163 -0
- package/templates/presets/anchor/bober.config.json +9 -0
- package/templates/presets/api-node/CLAUDE.md +153 -0
- package/templates/presets/api-node/bober.config.json +10 -0
- package/templates/presets/nextjs/CLAUDE.md +82 -0
- package/templates/presets/nextjs/bober.config.json +14 -0
- package/templates/presets/python-api/CLAUDE.md +202 -0
- package/templates/presets/python-api/bober.config.json +9 -0
- package/templates/presets/react-vite/CLAUDE.md +71 -0
- package/templates/presets/react-vite/bober.config.json +53 -0
- package/templates/presets/react-vite/scaffold/package.json +45 -0
- package/templates/presets/react-vite/scaffold/server/index.ts +38 -0
- package/templates/presets/react-vite/scaffold/server/tsconfig.json +24 -0
- package/templates/presets/react-vite/scaffold/src/App.tsx +37 -0
- package/templates/presets/react-vite/scaffold/src/index.html +12 -0
- package/templates/presets/react-vite/scaffold/src/main.tsx +12 -0
- package/templates/presets/react-vite/scaffold/tsconfig.json +27 -0
- package/templates/presets/react-vite/scaffold/vite.config.ts +34 -0
- package/templates/presets/solidity/CLAUDE.md +106 -0
- package/templates/presets/solidity/bober.config.json +9 -0
package/.claude-plugin/plugin.json
ADDED
@@ -0,0 +1,9 @@
{
  "name": "bober",
  "description": "Generator-Evaluator multi-agent harness for building applications autonomously with Claude",
  "version": "0.1.0",
  "author": { "name": "bober4ik" },
  "homepage": "https://github.com/bober4ik/agent-bober",
  "repository": "https://github.com/bober4ik/agent-bober",
  "license": "MIT"
}
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 bober4ik

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,495 @@
# agent-bober

**Generator-Evaluator multi-agent harness for building applications autonomously with Claude.**

Inspired by Anthropic's engineering publication [**"Harness design for long-running application development"**](https://www.anthropic.com/engineering/harness-design-long-running-apps), agent-bober implements the Generator-Evaluator multi-agent pattern as a reusable, installable workflow. It orchestrates multiple Claude agents in a structured loop: a **Planner** decomposes your idea into sprint contracts, a **Generator** writes the code, and an **Evaluator** independently verifies each sprint against its contract before moving on. The result is autonomous, high-quality software development with built-in guardrails, context resets, and brutally honest evaluation.

```
  You describe a feature
          |
          v
    +-----------+
    |  Planner  |  Asks clarifying questions, produces a PlanSpec
    +-----------+  with sprint contracts and acceptance criteria.
          |
          v
    +-----------+     +-----------+
    | Generator | --> | Evaluator |  Writes code, then verifies it:
    +-----------+     +-----------+  typecheck, lint, build, tests.
          ^               |
          |   (rework)    |
          +---------------+
          |
          v                          Repeats per sprint until all
    [Next Sprint]                    contracts are satisfied.
```

---

## Installation

```bash
# Install globally
npm install -g agent-bober

# Or use directly with npx
npx agent-bober init
```

agent-bober also works as a **Claude Code plugin**. If you install it as a dependency or globally, Claude Code will detect the plugin manifest and make `/bober:*` slash commands available in your sessions.

## Quick Start

### Any Project
```bash
npx agent-bober init
```

Interactive setup -- describe what you want to build, pick a preset or let the planner decide.

### With a Preset
```bash
npx agent-bober init nextjs        # Next.js full-stack app
npx agent-bober init react-vite    # React + Vite
npx agent-bober init solidity      # EVM smart contracts (Hardhat)
npx agent-bober init anchor        # Solana programs (Anchor)
npx agent-bober init api-node      # Node.js API
npx agent-bober init python-api    # Python API (FastAPI)
```

### Existing Codebase
```bash
cd your-existing-project
npx agent-bober init brownfield
```

Then in Claude Code:
```
/bober:plan      # Describe your feature, get a structured plan
/bober:sprint    # Execute the next sprint
/bober:eval      # Evaluate the sprint output
/bober:run       # Full autonomous pipeline
```

Specialized workflows:
```
/bober:react         # React web app workflow
/bober:solidity      # EVM smart contract workflow
/bober:anchor        # Solana program workflow
/bober:brownfield    # Existing codebase workflow
```

---

## Commands

### Slash Commands (Claude Code)

| Command | Description |
|---|---|
| `/bober:plan` | Plan any feature -- stack-agnostic |
| `/bober:sprint` | Execute the next sprint contract |
| `/bober:eval` | Evaluate current sprint output |
| `/bober:run` | Full autonomous pipeline |
| `/bober:react` | React web application workflow |
| `/bober:solidity` | EVM smart contract workflow |
| `/bober:anchor` | Solana program workflow |
| `/bober:brownfield` | Existing codebase workflow |

### CLI

```bash
npx agent-bober init [preset]    # Initialize project (nextjs, react-vite, solidity, anchor, api-node, python-api, brownfield)
npx agent-bober plan             # Run the planner
npx agent-bober sprint           # Execute next sprint
npx agent-bober eval             # Evaluate current sprint
npx agent-bober run              # Full autonomous loop
npx agent-bober status           # Show plan progress
```

---

## Configuration

All configuration lives in `bober.config.json` at your project root. The `init` command creates this file from a template, and you can customize it afterward.

### Full Configuration Reference

```jsonc
{
  // ── Project ─────────────────────────────────────
  "project": {
    "name": "my-app",  // Project name
    "mode": "greenfield",  // "greenfield" | "brownfield"
    "preset": "nextjs",  // Optional: "nextjs" | "react-vite" | "solidity" | "anchor" | "api-node" | "python-api"
    "description": "A task management app with real-time collaboration"
  },

  // ── Planner ─────────────────────────────────────
  "planner": {
    "maxClarifications": 5,  // Max clarifying questions (0 to skip)
    "model": "opus",  // Model for planning: "opus" | "sonnet" | "haiku"
    "contextFiles": [  // Extra files the planner should read
      "docs/architecture.md"
    ]
  },

  // ── Generator ───────────────────────────────────
  "generator": {
    "model": "sonnet",  // Model for code generation
    "maxTurnsPerSprint": 50,  // Max tool-use turns per sprint
    "autoCommit": true,  // Auto-commit after each sprint
    "branchPattern": "bober/{feature-name}"  // Git branch naming
  },

  // ── Evaluator ───────────────────────────────────
  "evaluator": {
    "model": "sonnet",  // Model for evaluation reasoning
    "strategies": [  // Evaluation strategies to run
      { "type": "typecheck", "required": true },
      { "type": "lint", "required": true },
      { "type": "build", "required": true },
      { "type": "unit-test", "required": true },
      { "type": "playwright", "required": false }
    ],
    "maxIterations": 3,  // Max rework cycles per sprint
    "plugins": []  // Custom evaluator plugin paths
  },

  // ── Sprint ──────────────────────────────────────
  "sprint": {
    "maxSprints": 10,  // Max sprints per plan
    "requireContracts": true,  // Require contract agreement before coding
    "sprintSize": "medium"  // "small" | "medium" | "large"
  },

  // ── Pipeline ────────────────────────────────────
  "pipeline": {
    "maxIterations": 20,  // Max total iterations across all sprints
    "requireApproval": false,  // Pause for user approval between sprints
    "contextReset": "always"  // "always" | "on-threshold" | "never"
  },

  // ── Commands ────────────────────────────────────
  "commands": {
    "install": "npm install",
    "build": "npm run build",
    "test": "npm test",
    "lint": "npm run lint",
    "dev": "npm run dev",
    "typecheck": "npx tsc --noEmit"
  }
}
```
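
The reference above marks each evaluator strategy as `required` or not, but it does not spell out how that flag affects the sprint verdict. The following is a minimal sketch under the assumption that only failed required strategies block a sprint, while failed optional ones are reported as warnings. `StrategyOutcome`, `SprintVerdict`, and `summarizeOutcomes` are hypothetical names for illustration, not part of agent-bober's API.

```typescript
// Hypothetical shapes for illustration; agent-bober's real types may differ.
interface StrategyOutcome {
  type: string; // e.g. "typecheck", "lint", "playwright"
  required: boolean; // mirrors the "required" flag in bober.config.json
  passed: boolean; // whether the strategy's check succeeded
}

interface SprintVerdict {
  passed: boolean; // true when no required strategy failed
  blocking: string[]; // required strategies that failed
  warnings: string[]; // optional strategies that failed
}

// Assumption: a sprint passes when every required strategy passes.
function summarizeOutcomes(outcomes: StrategyOutcome[]): SprintVerdict {
  const failed = outcomes.filter((o) => !o.passed);
  const blocking = failed.filter((o) => o.required).map((o) => o.type);
  const warnings = failed.filter((o) => !o.required).map((o) => o.type);
  return { passed: blocking.length === 0, blocking, warnings };
}
```

Under this reading, a failing `playwright` run configured with `"required": false` would show up in `warnings` without sending the sprint back for rework.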

### Sprint Sizes

| Size | Generator Effort | Files Changed | Scope |
|---|---|---|---|
| `small` | 30-60 min | 1-2 files | Single concern |
| `medium` | 1-3 hours | 3-8 files | One cohesive feature slice |
| `large` | 3-5 hours | 5-15 files | Full feature vertical |

### Context Reset Modes

| Mode | Behavior |
|---|---|
| `always` | Fresh context for every sprint (recommended for long plans) |
| `on-threshold` | Reset when context usage exceeds 80% |
| `never` | Carry context across sprints (only for short plans) |
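
The three modes in the table reduce to a small decision function. This is a sketch only: `shouldResetContext` and `RESET_THRESHOLD` are hypothetical names, and expressing context usage as a fraction between 0 and 1 is an assumption about how the 80% figure is measured.

```typescript
// Mirrors the documented "contextReset" values.
type ContextResetMode = "always" | "on-threshold" | "never";

// The table above gives 80% as the threshold for "on-threshold".
const RESET_THRESHOLD = 0.8;

// Hypothetical helper: decide whether the next sprint starts with a fresh context.
// `contextUsage` is assumed to be the fraction of the context window in use (0 to 1).
function shouldResetContext(mode: ContextResetMode, contextUsage: number): boolean {
  if (mode === "always") return true;
  if (mode === "never") return false;
  // "exceeds 80%" is read here as strictly greater than the threshold.
  return contextUsage > RESET_THRESHOLD;
}
```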

---

## Evaluator Strategies

### Built-in Strategies

| Strategy | What It Does |
|---|---|
| `typecheck` | Runs the configured typecheck command (e.g., `tsc --noEmit`) |
| `lint` | Runs the configured lint command (e.g., `eslint .`) |
| `build` | Runs the configured build command and checks for success |
| `unit-test` | Runs the configured test command |
| `playwright` | Runs Playwright E2E tests |
| `api-check` | Validates API endpoints respond correctly |

### Inline Command Evaluators

The strategy type is **open** — you can use any name and provide a shell command directly. No plugin file needed:

```json
{
  "evaluator": {
    "strategies": [
      { "type": "typecheck", "required": true },
      { "type": "lint", "required": true },
      { "type": "k6", "command": "k6 run load-test.js", "required": false, "label": "Load Test" },
      { "type": "slither", "command": "slither .", "required": true, "label": "Security Audit" },
      { "type": "anchor-verify", "command": "anchor verify", "required": true },
      { "type": "cargo-test", "command": "cargo test", "required": true },
      { "type": "pytest", "command": "pytest --tb=short", "required": true },
      { "type": "mypy", "command": "mypy . --strict", "required": false }
    ]
  }
}
```

Any strategy with a `command` field runs that command and checks the exit code (0 = pass). Error output is parsed and included in the evaluator feedback. You can set a custom `timeout` in the config:

```json
{ "type": "k6", "command": "k6 run load.js", "required": false, "config": { "timeout": 300000 } }
```
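
To make the exit-code rule concrete, here is a hedged sketch of how the outcome of an inline command might be turned into a result object. The field names follow the plugin example in this README; their types, the all-or-nothing score, the treatment of a timeout as a failure, and the names `CommandOutcome` and `toEvalResult` are assumptions made for illustration, not agent-bober's actual implementation.

```typescript
// Local stand-in for the result shape used in the plugin example;
// the field types are assumptions, not agent-bober's published types.
interface CommandEvalResult {
  evaluator: string;
  passed: boolean;
  score: number;
  details: string[];
  summary: string;
  feedback: string;
  timestamp: string;
}

// Hypothetical raw outcome of running a strategy's shell command.
interface CommandOutcome {
  exitCode: number | null; // null when the process was killed, e.g. on timeout
  stderr: string;
  timedOut: boolean;
}

function toEvalResult(
  type: string,
  label: string | undefined,
  outcome: CommandOutcome,
): CommandEvalResult {
  const name = label ?? type; // fall back to the strategy type when no label is set
  const passed = !outcome.timedOut && outcome.exitCode === 0; // exit code 0 = pass
  // Keep non-empty, trimmed stderr lines as the error details.
  const details = outcome.stderr
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  let summary: string;
  if (passed) summary = `${name} passed`;
  else if (outcome.timedOut) summary = `${name} timed out`;
  else summary = `${name} failed with exit code ${outcome.exitCode}`;
  return {
    evaluator: type,
    passed,
    score: passed ? 100 : 0, // all-or-nothing scoring is an assumption
    details,
    summary,
    feedback: passed ? "" : details.join("\n"),
    timestamp: new Date().toISOString(),
  };
}
```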

### Custom Evaluator Plugins

For more complex evaluation logic, write a plugin that implements the `EvaluatorPlugin` interface:

```typescript
import type { EvaluatorPlugin, EvalContext, EvalResult } from "agent-bober";

const myPlugin: EvaluatorPlugin = {
  name: "My Custom Check",
  description: "Validates something specific to my project",

  async canRun(_projectRoot, _config) {
    return true;
  },

  async evaluate(context: EvalContext): Promise<EvalResult> {
    return {
      evaluator: "my-custom-check",
      passed: true,
      score: 100,
      details: [],
      summary: "All checks passed",
      feedback: "Everything looks good.",
      timestamp: new Date().toISOString(),
    };
  },
};

export default () => myPlugin;
```

Register plugins in `bober.config.json`:

```json
{
  "evaluator": {
    "strategies": [
      { "type": "custom", "plugin": "./my-evaluator.ts", "required": true }
    ]
  }
}
```

---

## Presets

### `nextjs`

Next.js full-stack (App Router, API routes, Prisma). Includes:

- Next.js with TypeScript, Tailwind CSS, ESLint
- API routes for backend logic
- Prisma ORM for database access
- Vitest for unit tests, Playwright for E2E

### `react-vite`

React + Vite + any backend. Includes:

- Vite dev server with React and TypeScript
- Vitest for unit tests, Playwright for E2E
- ESLint configured for TypeScript + React
- Flexible backend pairing (Express, Fastify, etc.)

### `solidity`

EVM smart contracts (Hardhat/Foundry). Includes:

- Hardhat or Foundry project setup
- OpenZeppelin Contracts integration
- Solhint for linting
- Hardhat tests or Forge tests
- Deployment and verification scripts

### `anchor`

Solana programs (Anchor/Rust). Includes:

- Anchor project setup with program scaffold
- TypeScript integration tests
- Cargo clippy for Rust linting
- IDL generation and client SDK
- Deployment scripts for devnet/mainnet

### `api-node`

Node.js API (Express/NestJS/Fastify). Includes:

- TypeScript API project structure
- Testing with Vitest or Jest
- ESLint and TypeScript strict mode
- Database integration (Prisma/Drizzle)

### `python-api`

Python API (FastAPI/Django). Includes:

- FastAPI or Django project structure
- pytest for testing
- Ruff/Black for linting and formatting
- SQLAlchemy or Django ORM for database access

### `brownfield`

Existing codebase (conservative defaults). No scaffold files -- just configuration:

- Conservative sprint sizes (`small`)
- Higher evaluator iteration limit (5 rework cycles)
- Requires user approval between sprints
- Emphasizes reading existing patterns before making changes

### `base`

Minimal config, planner decides everything. Just a `bober.config.json` with `build` as the only required evaluator strategy. Intended as a starting point for any tech stack not covered by other presets.

---

## Architecture

### How the Agents Interact

```
                    bober.config.json
                           |
                 +---------+---------+
                 |                   |
           .bober/specs/      .bober/contracts/
                 |                   |
                 v                   v
  User Idea --> [Planner] --> PlanSpec + SprintContracts
                                    |
                                    v
                               [Generator]
                                  |  ^
                                  v  |  (rework feedback)
                               [Evaluator]
                                    |
                          pass? ----+---- fail?
                            |               |
                      [Next Sprint]   [Rework Loop]
                            |
                            v
                     All sprints done
                            |
                            v
                     Feature Complete
```

### The Generator-Evaluator Pattern

This architecture implements the patterns described in Anthropic's [**"Harness design for long-running application development"**](https://www.anthropic.com/engineering/harness-design-long-running-apps) by Prithvi Rajasekaran. The key insight from that research: separating code generation from code evaluation creates a feedback loop that catches errors early and dramatically improves output quality. In their tests, a solo agent produced broken output in 20 minutes, while the full harness produced a polished, working application — demonstrating that multi-agent orchestration with honest evaluation is worth the investment.

- **Planner** (Claude Opus): High-reasoning model for decomposing complex features into clear, testable sprint contracts. Thinks about scope, dependencies, and risk.
- **Generator** (Claude Sonnet): Fast, capable model for writing code. Works within the boundaries of a single sprint contract.
- **Evaluator** (Claude Sonnet): Runs automated checks (typecheck, lint, build, tests) and provides structured feedback. If a sprint fails evaluation, the Generator gets specific rework instructions.

The separation ensures that:
1. The Generator cannot "mark its own homework" -- an independent evaluation step catches issues.
2. Sprint contracts provide clear scope boundaries, preventing feature creep.
3. Automated checks run after every sprint, not just at the end.
4. Context resets between sprints keep the Generator focused and prevent context degradation.
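
The generate, evaluate, and rework cycle described above can be sketched as a bounded loop. Everything named here (`runSprint`, `Evaluation`, `Generate`, `Evaluate`, `SprintOutcome`) is hypothetical and synchronous for brevity; it is not the package's pipeline code. The sketch also assumes that `maxIterations` counts rework cycles after the first attempt, following the "Max rework cycles per sprint" comment in the configuration reference.

```typescript
// Hypothetical types for illustration only.
interface Evaluation {
  passed: boolean;
  feedback: string; // rework instructions for the Generator when passed is false
}

type Generate = (contractId: string, feedback: string | null) => void;
type Evaluate = (contractId: string) => Evaluation;

interface SprintOutcome {
  passed: boolean;
  attempts: number; // generation passes, including the first
}

function runSprint(
  contractId: string,
  generate: Generate,
  evaluate: Evaluate,
  maxIterations: number,
): SprintOutcome {
  let feedback: string | null = null;
  // One initial attempt plus up to maxIterations rework cycles (an assumption).
  for (let attempt = 1; attempt <= maxIterations + 1; attempt++) {
    generate(contractId, feedback); // the Generator works on the contract
    const result = evaluate(contractId); // the Evaluator judges independently
    if (result.passed) return { passed: true, attempts: attempt };
    feedback = result.feedback; // feed rework instructions into the next attempt
  }
  return { passed: false, attempts: maxIterations + 1 };
}
```

Keeping `generate` and `evaluate` as separate callbacks reflects the separation the list above argues for: the generating side never decides whether its own output passes.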

### State Management

All bober state lives in the `.bober/` directory:

```
.bober/
  specs/           PlanSpec JSON files
  contracts/       SprintContract JSON files
  evaluations/     Evaluation result logs
  snapshots/       Context snapshots (gitignored)
  progress.md      Human-readable progress tracker
  history.jsonl    Machine-readable event log
```
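
The `.jsonl` extension suggests the JSON Lines convention, where each event is one JSON object on its own line so that new events can be appended without rewriting the file. The sketch below shows that encoding and its inverse. The `HistoryEvent` fields and the helper names are hypothetical, since this README does not document what agent-bober actually writes to `history.jsonl`.

```typescript
// Hypothetical event shape; the real fields are not documented here.
interface HistoryEvent {
  timestamp: string; // ISO 8601
  event: string; // e.g. "sprint-started", "eval-failed"
  sprintId?: string;
}

// Encode one event as a single newline-terminated JSON line.
// JSON.stringify escapes newlines inside strings, so the line never breaks.
function encodeEvent(e: HistoryEvent): string {
  return JSON.stringify(e) + "\n";
}

// Parse the full contents of a JSON Lines log, skipping blank lines.
function parseHistory(contents: string): HistoryEvent[] {
  return contents
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as HistoryEvent);
}
```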

---

## Shell Scripts

For environments where you need to run bober operations outside of Claude Code:

| Script | Purpose |
|---|---|
| `scripts/init-project.sh` | Initialize a project with a template |
| `scripts/detect-stack.sh` | Auto-detect tech stack (outputs JSON) |
| `scripts/run-eval.sh` | Run evaluation strategies from config |

```bash
# Initialize a new project
bash scripts/init-project.sh nextjs

# Detect an existing project's stack
bash scripts/detect-stack.sh /path/to/project

# Run evaluations
bash scripts/run-eval.sh /path/to/project
```

---

## Contributing

Contributions are welcome. To set up the development environment:

```bash
git clone https://github.com/bober4ik/agent-bober.git
cd agent-bober
npm install
npm run build
npm run typecheck
npm test
```

### Project Structure

```
agent-bober/
  src/
    cli/             CLI entry point (commander)
    config/          Config schema, loader, defaults
    contracts/       Sprint contract and eval result types
    evaluators/      Built-in evaluator plugins
    orchestrator/    Context handoff and agent coordination
    state/           State management for .bober/ directory
    utils/           Shared utilities
  agents/            Agent system prompts (.md files)
  skills/            Claude Code slash command definitions
  templates/         Project templates and scaffolds
  hooks/             Claude Code hooks
  scripts/           Shell scripts for init, detect, eval
```

### Guidelines

- TypeScript strict mode, no `any`.
- ESM only (`"type": "module"`).
- All evaluator plugins implement the `EvaluatorPlugin` interface.
- Sprint contracts are validated against Zod schemas.
- Test with `vitest`. Run `npm test` before submitting.

---

## Acknowledgments

This project is inspired by and implements the patterns from Anthropic's [**"Harness design for long-running application development"**](https://www.anthropic.com/engineering/harness-design-long-running-apps) by Prithvi Rajasekaran. The paper demonstrated that separating generation from evaluation, using sprint contracts, and applying context resets between agents dramatically improves the quality of autonomously built software. agent-bober packages these patterns into a reusable tool.

---

## License

[MIT](LICENSE) -- Copyright (c) 2026 bober4ik