agent-bober 0.1.0
- package/.claude-plugin/plugin.json +9 -0
- package/LICENSE +21 -0
- package/README.md +495 -0
- package/agents/bober-evaluator.md +323 -0
- package/agents/bober-generator.md +245 -0
- package/agents/bober-planner.md +248 -0
- package/dist/cli/commands/eval.d.ts +6 -0
- package/dist/cli/commands/eval.d.ts.map +1 -0
- package/dist/cli/commands/eval.js +129 -0
- package/dist/cli/commands/eval.js.map +1 -0
- package/dist/cli/commands/init.d.ts +5 -0
- package/dist/cli/commands/init.d.ts.map +1 -0
- package/dist/cli/commands/init.js +547 -0
- package/dist/cli/commands/init.js.map +1 -0
- package/dist/cli/commands/plan.d.ts +5 -0
- package/dist/cli/commands/plan.d.ts.map +1 -0
- package/dist/cli/commands/plan.js +87 -0
- package/dist/cli/commands/plan.js.map +1 -0
- package/dist/cli/commands/run.d.ts +5 -0
- package/dist/cli/commands/run.d.ts.map +1 -0
- package/dist/cli/commands/run.js +120 -0
- package/dist/cli/commands/run.js.map +1 -0
- package/dist/cli/commands/sprint.d.ts +6 -0
- package/dist/cli/commands/sprint.d.ts.map +1 -0
- package/dist/cli/commands/sprint.js +206 -0
- package/dist/cli/commands/sprint.js.map +1 -0
- package/dist/cli/index.d.ts +3 -0
- package/dist/cli/index.d.ts.map +1 -0
- package/dist/cli/index.js +124 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/config/defaults.d.ts +15 -0
- package/dist/config/defaults.d.ts.map +1 -0
- package/dist/config/defaults.js +226 -0
- package/dist/config/defaults.js.map +1 -0
- package/dist/config/index.d.ts +4 -0
- package/dist/config/index.d.ts.map +1 -0
- package/dist/config/index.js +8 -0
- package/dist/config/index.js.map +1 -0
- package/dist/config/loader.d.ts +18 -0
- package/dist/config/loader.d.ts.map +1 -0
- package/dist/config/loader.js +189 -0
- package/dist/config/loader.js.map +1 -0
- package/dist/config/schema.d.ts +904 -0
- package/dist/config/schema.d.ts.map +1 -0
- package/dist/config/schema.js +181 -0
- package/dist/config/schema.js.map +1 -0
- package/dist/contracts/eval-result.d.ts +205 -0
- package/dist/contracts/eval-result.d.ts.map +1 -0
- package/dist/contracts/eval-result.js +87 -0
- package/dist/contracts/eval-result.js.map +1 -0
- package/dist/contracts/index.d.ts +4 -0
- package/dist/contracts/index.d.ts.map +1 -0
- package/dist/contracts/index.js +16 -0
- package/dist/contracts/index.js.map +1 -0
- package/dist/contracts/spec.d.ts +101 -0
- package/dist/contracts/spec.d.ts.map +1 -0
- package/dist/contracts/spec.js +51 -0
- package/dist/contracts/spec.js.map +1 -0
- package/dist/contracts/sprint-contract.d.ts +141 -0
- package/dist/contracts/sprint-contract.d.ts.map +1 -0
- package/dist/contracts/sprint-contract.js +80 -0
- package/dist/contracts/sprint-contract.js.map +1 -0
- package/dist/evaluators/builtin/api-check.d.ts +13 -0
- package/dist/evaluators/builtin/api-check.d.ts.map +1 -0
- package/dist/evaluators/builtin/api-check.js +152 -0
- package/dist/evaluators/builtin/api-check.js.map +1 -0
- package/dist/evaluators/builtin/build-check.d.ts +17 -0
- package/dist/evaluators/builtin/build-check.d.ts.map +1 -0
- package/dist/evaluators/builtin/build-check.js +155 -0
- package/dist/evaluators/builtin/build-check.js.map +1 -0
- package/dist/evaluators/builtin/command-runner.d.ts +26 -0
- package/dist/evaluators/builtin/command-runner.d.ts.map +1 -0
- package/dist/evaluators/builtin/command-runner.js +114 -0
- package/dist/evaluators/builtin/command-runner.js.map +1 -0
- package/dist/evaluators/builtin/lint.d.ts +17 -0
- package/dist/evaluators/builtin/lint.d.ts.map +1 -0
- package/dist/evaluators/builtin/lint.js +264 -0
- package/dist/evaluators/builtin/lint.js.map +1 -0
- package/dist/evaluators/builtin/playwright.d.ts +16 -0
- package/dist/evaluators/builtin/playwright.d.ts.map +1 -0
- package/dist/evaluators/builtin/playwright.js +238 -0
- package/dist/evaluators/builtin/playwright.js.map +1 -0
- package/dist/evaluators/builtin/typescript-check.d.ts +12 -0
- package/dist/evaluators/builtin/typescript-check.d.ts.map +1 -0
- package/dist/evaluators/builtin/typescript-check.js +155 -0
- package/dist/evaluators/builtin/typescript-check.js.map +1 -0
- package/dist/evaluators/builtin/unit-test.d.ts +18 -0
- package/dist/evaluators/builtin/unit-test.d.ts.map +1 -0
- package/dist/evaluators/builtin/unit-test.js +279 -0
- package/dist/evaluators/builtin/unit-test.js.map +1 -0
- package/dist/evaluators/index.d.ts +11 -0
- package/dist/evaluators/index.d.ts.map +1 -0
- package/dist/evaluators/index.js +13 -0
- package/dist/evaluators/index.js.map +1 -0
- package/dist/evaluators/plugin-interface.d.ts +50 -0
- package/dist/evaluators/plugin-interface.d.ts.map +1 -0
- package/dist/evaluators/plugin-interface.js +2 -0
- package/dist/evaluators/plugin-interface.js.map +1 -0
- package/dist/evaluators/plugin-loader.d.ts +18 -0
- package/dist/evaluators/plugin-loader.d.ts.map +1 -0
- package/dist/evaluators/plugin-loader.js +107 -0
- package/dist/evaluators/plugin-loader.js.map +1 -0
- package/dist/evaluators/registry.d.ts +78 -0
- package/dist/evaluators/registry.d.ts.map +1 -0
- package/dist/evaluators/registry.js +238 -0
- package/dist/evaluators/registry.js.map +1 -0
- package/dist/index.d.ts +17 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +22 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/context-handoff.d.ts +543 -0
- package/dist/orchestrator/context-handoff.d.ts.map +1 -0
- package/dist/orchestrator/context-handoff.js +133 -0
- package/dist/orchestrator/context-handoff.js.map +1 -0
- package/dist/orchestrator/evaluator-agent.d.ts +15 -0
- package/dist/orchestrator/evaluator-agent.d.ts.map +1 -0
- package/dist/orchestrator/evaluator-agent.js +233 -0
- package/dist/orchestrator/evaluator-agent.js.map +1 -0
- package/dist/orchestrator/generator-agent.d.ts +16 -0
- package/dist/orchestrator/generator-agent.d.ts.map +1 -0
- package/dist/orchestrator/generator-agent.js +147 -0
- package/dist/orchestrator/generator-agent.js.map +1 -0
- package/dist/orchestrator/pipeline.d.ts +24 -0
- package/dist/orchestrator/pipeline.d.ts.map +1 -0
- package/dist/orchestrator/pipeline.js +290 -0
- package/dist/orchestrator/pipeline.js.map +1 -0
- package/dist/orchestrator/planner-agent.d.ts +10 -0
- package/dist/orchestrator/planner-agent.d.ts.map +1 -0
- package/dist/orchestrator/planner-agent.js +187 -0
- package/dist/orchestrator/planner-agent.js.map +1 -0
- package/dist/state/helpers.d.ts +5 -0
- package/dist/state/helpers.d.ts.map +1 -0
- package/dist/state/helpers.js +8 -0
- package/dist/state/helpers.js.map +1 -0
- package/dist/state/history.d.ts +39 -0
- package/dist/state/history.d.ts.map +1 -0
- package/dist/state/history.js +162 -0
- package/dist/state/history.js.map +1 -0
- package/dist/state/index.d.ts +8 -0
- package/dist/state/index.d.ts.map +1 -0
- package/dist/state/index.js +22 -0
- package/dist/state/index.js.map +1 -0
- package/dist/state/plan-state.d.ts +21 -0
- package/dist/state/plan-state.d.ts.map +1 -0
- package/dist/state/plan-state.js +108 -0
- package/dist/state/plan-state.js.map +1 -0
- package/dist/state/sprint-state.d.ts +20 -0
- package/dist/state/sprint-state.d.ts.map +1 -0
- package/dist/state/sprint-state.js +98 -0
- package/dist/state/sprint-state.js.map +1 -0
- package/dist/utils/fs.d.ts +31 -0
- package/dist/utils/fs.d.ts.map +1 -0
- package/dist/utils/fs.js +67 -0
- package/dist/utils/fs.js.map +1 -0
- package/dist/utils/git.d.ts +35 -0
- package/dist/utils/git.d.ts.map +1 -0
- package/dist/utils/git.js +84 -0
- package/dist/utils/git.js.map +1 -0
- package/dist/utils/index.d.ts +4 -0
- package/dist/utils/index.d.ts.map +1 -0
- package/dist/utils/index.js +4 -0
- package/dist/utils/index.js.map +1 -0
- package/dist/utils/logger.d.ts +45 -0
- package/dist/utils/logger.d.ts.map +1 -0
- package/dist/utils/logger.js +73 -0
- package/dist/utils/logger.js.map +1 -0
- package/hooks/hooks.json +10 -0
- package/package.json +67 -0
- package/scripts/detect-stack.sh +287 -0
- package/scripts/init-project.sh +206 -0
- package/scripts/run-eval.sh +175 -0
- package/skills/bober.anchor/SKILL.md +365 -0
- package/skills/bober.anchor/references/anchor-guide.md +567 -0
- package/skills/bober.brownfield/SKILL.md +422 -0
- package/skills/bober.brownfield/references/codebase-analysis.md +304 -0
- package/skills/bober.eval/SKILL.md +235 -0
- package/skills/bober.eval/references/eval-strategies.md +407 -0
- package/skills/bober.eval/references/feedback-format.md +182 -0
- package/skills/bober.plan/SKILL.md +244 -0
- package/skills/bober.plan/references/clarification-guide.md +124 -0
- package/skills/bober.plan/references/spec-schema.md +253 -0
- package/skills/bober.react/SKILL.md +330 -0
- package/skills/bober.react/references/react-scaffold.md +344 -0
- package/skills/bober.run/SKILL.md +303 -0
- package/skills/bober.solidity/SKILL.md +416 -0
- package/skills/bober.solidity/references/solidity-guide.md +487 -0
- package/skills/bober.sprint/SKILL.md +280 -0
- package/skills/bober.sprint/references/contract-schema.md +251 -0
- package/templates/base/CLAUDE.md +20 -0
- package/templates/base/bober.config.json +35 -0
- package/templates/brownfield/CLAUDE.md +34 -0
- package/templates/brownfield/bober.config.json +37 -0
- package/templates/presets/anchor/CLAUDE.md +163 -0
- package/templates/presets/anchor/bober.config.json +9 -0
- package/templates/presets/api-node/CLAUDE.md +153 -0
- package/templates/presets/api-node/bober.config.json +10 -0
- package/templates/presets/nextjs/CLAUDE.md +82 -0
- package/templates/presets/nextjs/bober.config.json +14 -0
- package/templates/presets/python-api/CLAUDE.md +202 -0
- package/templates/presets/python-api/bober.config.json +9 -0
- package/templates/presets/react-vite/CLAUDE.md +71 -0
- package/templates/presets/react-vite/bober.config.json +53 -0
- package/templates/presets/react-vite/scaffold/package.json +45 -0
- package/templates/presets/react-vite/scaffold/server/index.ts +38 -0
- package/templates/presets/react-vite/scaffold/server/tsconfig.json +24 -0
- package/templates/presets/react-vite/scaffold/src/App.tsx +37 -0
- package/templates/presets/react-vite/scaffold/src/index.html +12 -0
- package/templates/presets/react-vite/scaffold/src/main.tsx +12 -0
- package/templates/presets/react-vite/scaffold/tsconfig.json +27 -0
- package/templates/presets/react-vite/scaffold/vite.config.ts +34 -0
- package/templates/presets/solidity/CLAUDE.md +106 -0
- package/templates/presets/solidity/bober.config.json +9 -0
@@ -0,0 +1,407 @@
# Evaluation Strategies Reference

This document describes all built-in evaluation strategies available in the Bober evaluator system. Strategies are configured in `bober.config.json` under `evaluator.strategies`.

## Strategy Configuration Format

Each strategy in the config array follows this structure:
```json
{
  "type": "typecheck | lint | unit-test | playwright | api-check | build | custom",
  "required": true,
  "plugin": "string (optional, for custom strategies)",
  "config": {
    "key": "value (optional, strategy-specific configuration)"
  }
}
```

The `required` field determines whether a strategy failure blocks the sprint from passing:
- `required: true` — Sprint FAILS if this strategy fails
- `required: false` — Strategy result is recorded but does not block the sprint
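
The gating rule for `required` can be sketched as a small predicate. This is an illustration only: `StrategyResult` and `sprintPasses` are hypothetical names, not part of the package, and treating a `skipped` result as non-blocking is an assumption.

```typescript
// Hypothetical types for illustration; the real Bober types may differ.
type StrategyOutcome = "pass" | "fail" | "skipped";

interface StrategyResult {
  strategy: string;
  required: boolean;
  result: StrategyOutcome;
}

// A sprint is blocked only by required strategies that failed.
// Assumption: a "skipped" result never blocks, even when required.
function sprintPasses(strategyResults: StrategyResult[]): boolean {
  return !strategyResults.some((r) => r.required && r.result === "fail");
}

const exampleResults: StrategyResult[] = [
  { strategy: "build", required: true, result: "pass" },
  { strategy: "lint", required: false, result: "fail" },
];
console.log(sprintPasses(exampleResults)); // only an optional strategy failed
```

The optional `lint` failure is still present in the results, so it can be reported even though it does not change the outcome.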

---

## typecheck

**Purpose:** Verify that all TypeScript code compiles without type errors.

**Default command:** `npx tsc --noEmit`
**Config override:** `commands.typecheck` in `bober.config.json`

**What it checks:**
- All `.ts` and `.tsx` files compile under the project's `tsconfig.json`
- No type errors (TS2xxx codes)
- No missing imports or unresolved modules
- Strict mode violations (if `strict: true` in tsconfig)

**Pass criteria:** Zero type errors in output. Warnings do not cause failure.

**Common failures:**
- Missing type imports: `Cannot find module './types' or its corresponding type declarations`
- Type mismatch: `Type 'string' is not assignable to type 'number'`
- Missing properties: `Property 'name' is missing in type '{}' but required in type 'User'`
- Implicit any: `Parameter 'x' implicitly has an 'any' type` (when `noImplicitAny` is enabled)

**Configuration:**
```json
{
  "type": "typecheck",
  "required": true,
  "config": {
    "tsconfig": "tsconfig.json",
    "strict": true
  }
}
```

**Notes:**
- Runs against the full project, not just files changed in the sprint
- Catches regressions in existing code caused by the sprint's changes
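
To apply the "zero type errors" criterion, an evaluator needs to pull individual diagnostics out of the compiler output. The sketch below assumes the non-pretty `tsc` format (`file(line,col): error TSxxxx: message`), which is what `tsc` prints when its output is not a terminal. `parseTscErrors` and `TypeErrorEntry` are hypothetical names, not the package's own parser.

```typescript
// Matches diagnostics such as:
//   src/routes/auth.ts(42,18): error TS2345: Argument of type ...
const TSC_ERROR = /^(.+)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

interface TypeErrorEntry {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

// Collects one entry per diagnostic line; lines that do not match are ignored.
function parseTscErrors(output: string): TypeErrorEntry[] {
  const entries: TypeErrorEntry[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const match = TSC_ERROR.exec(rawLine.trim());
    if (match) {
      const [, file = "", lineNo = "0", column = "0", code = "", message = ""] = match;
      entries.push({ file, line: Number(lineNo), column: Number(column), code, message });
    }
  }
  return entries;
}

const sampleOutput = [
  "src/routes/auth.ts(42,18): error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'.",
  "Found 1 error in src/routes/auth.ts:42",
].join("\n");

const typeErrors = parseTscErrors(sampleOutput);
console.log(typeErrors.length === 0 ? "typecheck: pass" : `typecheck: fail (${typeErrors.length})`);
```

Keeping the file and line of each diagnostic makes it straightforward to turn a failure into located feedback for the Generator.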

---

## lint

**Purpose:** Verify code follows the project's linting rules.

**Default command:** `npm run lint`
**Config override:** `commands.lint` in `bober.config.json`

**Supported linters:**
- **ESLint** (most common): Detected by `eslint.config.js`, `.eslintrc.*`, or `eslint` in devDependencies
- **Biome**: Detected by `biome.json` or `@biomejs/biome` in devDependencies
- **Both:** Some projects use both. Run whatever `commands.lint` specifies.

**What it checks:**
- Code style violations
- Potential bugs (unused variables, unreachable code, implicit type coercion)
- Import order and organization
- Framework-specific rules (React hooks rules, etc.)

**Pass criteria:** Zero errors. Warnings are acceptable (but should be noted in the report).

**Common failures:**
- Unused variables: `'x' is defined but never used`
- Missing dependencies in hook deps: `React Hook useEffect has a missing dependency`
- Prefer const: `'x' is never reassigned. Use 'const' instead`
- Import order violations

**Configuration:**
```json
{
  "type": "lint",
  "required": true,
  "config": {
    "fix": false,
    "maxWarnings": -1
  }
}
```

**Notes:**
- `fix: false` means the evaluator reports violations without auto-fixing them. The Generator must fix them.
- `maxWarnings: -1` means unlimited warnings are tolerated. Set a number to fail on too many warnings.
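
The interaction between the zero-error rule and `maxWarnings` can be written as one small function. This is a sketch under the semantics stated in the notes; `LintCounts` and `lintPasses` are hypothetical names, and how the counts are extracted from linter output is left out.

```typescript
interface LintCounts {
  errors: number;
  warnings: number;
}

// Any error fails the strategy. Warnings fail it only when a non-negative
// maxWarnings is set and exceeded; -1 means "no limit".
function lintPasses(counts: LintCounts, maxWarnings: number = -1): boolean {
  if (counts.errors > 0) return false;
  if (maxWarnings < 0) return true;
  return counts.warnings <= maxWarnings;
}

console.log(lintPasses({ errors: 0, warnings: 12 }));    // unlimited warnings tolerated
console.log(lintPasses({ errors: 0, warnings: 12 }, 5)); // too many warnings
```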

---

## unit-test

**Purpose:** Verify that unit tests pass, including both new tests and pre-existing tests.

**Default command:** `npm test`
**Config override:** `commands.test` in `bober.config.json`

**Supported frameworks:**
- **Vitest**: Detected by `vitest` in devDependencies or `vitest.config.*`
- **Jest**: Detected by `jest` in devDependencies or `jest.config.*`
- **Mocha**: Detected by `mocha` in devDependencies
- **Custom:** Whatever `commands.test` runs

**What it checks:**
- All tests pass (both new and existing)
- No test regressions (existing tests that previously passed should still pass)
- Test coverage (if configured)

**Pass criteria:** All tests pass with exit code 0.

**Common failures:**
- Assertion failures: `Expected 200 but received 500`
- Missing test dependencies: Module not found errors in test files
- Timeout: Tests that hang due to unresolved promises or server connections
- Snapshot mismatches (for snapshot testing)

**Configuration:**
```json
{
  "type": "unit-test",
  "required": true,
  "config": {
    "coverage": false,
    "coverageThreshold": 80,
    "testMatch": "**/*.test.{ts,tsx}",
    "timeout": 30000
  }
}
```

**Notes:**
- If `coverage: true`, the evaluator checks that coverage meets `coverageThreshold`
- The evaluator should count total tests, passed, failed, and skipped
- If no tests exist yet and this is the first sprint, the strategy passes vacuously but the evaluator should note "no tests found" in the report
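
The notes above combine three rules: the vacuous pass on a first sprint, the exit-code check, and the optional coverage threshold. A rough sketch of that decision follows. All names (`TestSummary`, `evaluateUnitTests`, `firstSprint`) are hypothetical, and failing when coverage is enabled but no figure is reported is an assumption.

```typescript
interface TestSummary {
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  exitCode: number;
  coveragePercent?: number; // only present when the runner reported coverage
}

interface UnitTestConfig {
  coverage: boolean;
  coverageThreshold: number;
}

interface UnitTestVerdict {
  passed: boolean;
  note: string;
}

function evaluateUnitTests(
  summary: TestSummary,
  config: UnitTestConfig,
  firstSprint: boolean,
): UnitTestVerdict {
  // Vacuous pass: nothing to run yet, but the report should say so.
  if (summary.total === 0 && firstSprint) {
    return { passed: true, note: "no tests found" };
  }
  if (summary.exitCode !== 0 || summary.failed > 0) {
    return { passed: false, note: `${summary.failed} of ${summary.total} tests failed` };
  }
  if (config.coverage) {
    const pct = summary.coveragePercent;
    // Assumption: a missing coverage figure counts as a failure.
    if (pct === undefined) {
      return { passed: false, note: "coverage enabled but no coverage figure reported" };
    }
    if (pct < config.coverageThreshold) {
      return { passed: false, note: `coverage ${pct}% is below threshold ${config.coverageThreshold}%` };
    }
  }
  return { passed: true, note: `${summary.passed} passed, ${summary.skipped} skipped` };
}
```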

---

## playwright

**Purpose:** Run end-to-end browser tests that verify the application works from a user's perspective.

**Default command:** `npx playwright test`
**Config override:** Strategy-specific config

**Prerequisites:**
- Playwright must be installed: `npx playwright install` (installs browsers)
- A dev server must be running or `webServer` must be configured in `playwright.config.ts`
- Test files must exist (usually in `tests/` or `e2e/` directory)

**What it checks:**
- Full user flows work end-to-end (login, navigation, form submission, etc.)
- UI renders correctly in a real browser
- Client-server interaction works
- No console errors or unhandled exceptions

**Pass criteria:** All Playwright tests pass.

**Common failures:**
- Element not found: `Timeout waiting for selector '#login-form'`
- Navigation error: `Page navigated to unexpected URL`
- Network error: API calls returning errors
- Visual regression: Screenshot comparison failures

**Configuration:**
```json
{
  "type": "playwright",
  "required": false,
  "config": {
    "project": "chromium",
    "retries": 1,
    "timeout": 60000,
    "webServer": {
      "command": "npm run dev",
      "port": 3000,
      "reuseExistingServer": true,
      "timeout": 30000
    }
  }
}
```

**Notes:**
- Default `required: false` because Playwright setup is non-trivial. Mark as `required: true` only when E2E tests are critical and known to be configured.
- If Playwright is not installed, the evaluator marks this as `skipped` (not failed), even if `required: true`. It should flag this as a configuration issue.
- The evaluator should try to start the dev server before running tests if `webServer` is configured.

---

## api-check

**Purpose:** Verify that HTTP API endpoints respond correctly.

**Default command:** Uses `curl` or the configured HTTP client
**Config override:** Strategy-specific config

**What it checks:**
- Endpoints exist and respond
- Correct HTTP status codes
- Response body structure matches expectations
- Error responses are properly formatted
- Content-Type headers are correct

**Pass criteria:** All configured endpoint checks return expected status codes and response shapes.

**Configuration:**
```json
{
  "type": "api-check",
  "required": true,
  "config": {
    "baseUrl": "http://localhost:3000",
    "startServer": true,
    "serverCommand": "npm run dev",
    "serverReadyPattern": "listening on port",
    "serverTimeout": 15000,
    "endpoints": [
      {
        "method": "POST",
        "path": "/api/auth/register",
        "body": { "email": "test@example.com", "password": "testpassword123" },
        "expectedStatus": 201,
        "expectedBodyKeys": ["id", "email"]
      },
      {
        "method": "POST",
        "path": "/api/auth/register",
        "body": { "email": "test@example.com", "password": "testpassword123" },
        "expectedStatus": 400,
        "description": "Duplicate registration should fail"
      }
    ]
  }
}
```

**Notes:**
- The evaluator typically derives endpoint checks from the sprint contract's success criteria rather than relying solely on pre-configured endpoints
- If `startServer: true`, the evaluator starts the dev server, waits for `serverReadyPattern` in stdout, runs checks, then stops the server
- API checks are often used in combination with `manual` verification for the same criterion
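
Once a response has been obtained, comparing it against an entry in `endpoints` is a pure check on the status code and the top-level body keys. The sketch below shows only that comparison; sending the request and managing the server are omitted. `EndpointCheck` mirrors the fields in the configuration above, while `ObservedResponse` and `checkResponse` are hypothetical names.

```typescript
interface EndpointCheck {
  method: string;
  path: string;
  body?: unknown;
  expectedStatus: number;
  expectedBodyKeys?: string[];
  description?: string;
}

// What the evaluator saw after sending the request (hypothetical shape).
interface ObservedResponse {
  status: number;
  body: unknown;
}

// Returns a list of problems; an empty list means the check passed.
function checkResponse(check: EndpointCheck, observed: ObservedResponse): string[] {
  const label = `${check.method} ${check.path}`;
  const problems: string[] = [];
  if (observed.status !== check.expectedStatus) {
    problems.push(`${label}: expected status ${check.expectedStatus}, got ${observed.status}`);
  }
  const keys = check.expectedBodyKeys ?? [];
  if (keys.length > 0) {
    const body = observed.body;
    if (typeof body !== "object" || body === null) {
      problems.push(`${label}: response body is not a JSON object`);
    } else {
      for (const key of keys) {
        if (!(key in body)) problems.push(`${label}: missing key "${key}"`);
      }
    }
  }
  return problems;
}

const registerCheck: EndpointCheck = {
  method: "POST",
  path: "/api/auth/register",
  expectedStatus: 201,
  expectedBodyKeys: ["id", "email"],
};
console.log(checkResponse(registerCheck, { status: 201, body: { id: 1, email: "test@example.com" } }));
```

Returning every problem rather than stopping at the first keeps the result usable as one-issue-per-item feedback.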

---

## build

**Purpose:** Verify that the project compiles/builds without errors.

**Default command:** `npm run build`
**Config override:** `commands.build` in `bober.config.json`

**What it checks:**
- The full build pipeline completes successfully
- No compilation errors
- All assets are generated correctly
- Build output exists in the expected directory

**Pass criteria:** Build command exits with code 0 and no errors in output.

**Common failures:**
- Import errors: Missing modules or circular dependencies
- Syntax errors in new code
- Environment variable issues
- Asset processing failures (CSS, images)
- Bundle size exceeded (if configured)

**Configuration:**
```json
{
  "type": "build",
  "required": true,
  "config": {
    "outputDir": "dist",
    "verifyOutput": true
  }
}
```

**Notes:**
- This should almost always be `required: true`. If the project does not build, nothing else matters.
- `verifyOutput: true` means the evaluator checks that the output directory exists and is non-empty after the build
- This is different from `typecheck` -- `build` runs the full build pipeline (bundling, optimization, etc.), while `typecheck` only verifies types
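
The `verifyOutput` rule can be expressed without touching the file system by passing in what was observed after the build. In this sketch, `BuildObservation` and `buildPasses` are hypothetical names, `outputEntries` stands for a listing of `outputDir` (or `null` if the directory is missing), and scanning the build log for error text is deliberately left out.

```typescript
interface BuildObservation {
  exitCode: number;
  // Names found in outputDir after the build, or null if the directory does not exist.
  outputEntries: string[] | null;
}

function buildPasses(observation: BuildObservation, verifyOutput: boolean): boolean {
  if (observation.exitCode !== 0) return false;
  if (!verifyOutput) return true;
  // verifyOutput: the directory must exist and contain at least one entry.
  return observation.outputEntries !== null && observation.outputEntries.length > 0;
}

console.log(buildPasses({ exitCode: 0, outputEntries: ["index.js"] }, true));
console.log(buildPasses({ exitCode: 0, outputEntries: [] }, true));
```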

---

## custom

**Purpose:** Run a user-defined evaluation command for project-specific checks.

**Default command:** None (must be configured)
**Config override:** Strategy-specific config

**What it checks:** Whatever the custom command checks. The evaluator interprets results based on exit code and output.

**Pass criteria:** Command exits with code 0.

**Configuration:**
```json
{
  "type": "custom",
  "required": false,
  "plugin": "check-bundle-size",
  "config": {
    "command": "node scripts/check-bundle-size.js",
    "maxSizeKb": 500,
    "parseOutput": "json",
    "passCondition": "output.passed === true"
  }
}
```

**How to write a custom evaluator plugin:**

A custom evaluator is a script or command that:
1. Runs a specific check
2. Outputs results to stdout (optionally as JSON for structured parsing)
3. Exits with code 0 for pass, non-zero for fail

**Example custom evaluator script:**
```javascript
// scripts/check-bundle-size.js
import { statSync } from 'fs';
import { glob } from 'glob';

const MAX_SIZE_KB = 500;
const files = glob.sync('dist/**/*.js');
const totalSize = files.reduce((sum, f) => sum + statSync(f).size, 0);
const sizeKb = totalSize / 1024;

if (sizeKb > MAX_SIZE_KB) {
  console.error(`Bundle size ${sizeKb.toFixed(1)}KB exceeds limit of ${MAX_SIZE_KB}KB`);
  process.exit(1);
} else {
  console.log(`Bundle size OK: ${sizeKb.toFixed(1)}KB / ${MAX_SIZE_KB}KB`);
  process.exit(0);
}
```

**Plugin naming:** The `plugin` field is a human-readable name for the check. It appears in evaluation reports.

**Advanced custom evaluators:**
- Output JSON with `parseOutput: "json"` for structured results
- Use `passCondition` to evaluate a JavaScript expression against the parsed output
- Chain multiple commands with `&&` in the command string
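
One way `parseOutput` and `passCondition` might be combined is sketched below. How the package actually evaluates `passCondition` is not shown in this document, so the use of `new Function`, the requirement that the exit code also be 0, and the names `CustomConfig` and `customPasses` are all assumptions made for illustration.

```typescript
interface CustomConfig {
  parseOutput?: "json";
  passCondition?: string;
}

function customPasses(exitCode: number, stdout: string, config: CustomConfig): boolean {
  // Assumption: a non-zero exit code fails regardless of passCondition.
  if (exitCode !== 0) return false;
  if (config.parseOutput !== "json" || !config.passCondition) return true;

  let output: unknown;
  try {
    output = JSON.parse(stdout);
  } catch {
    return false; // structured output was requested but stdout is not valid JSON
  }

  // new Function runs arbitrary code taken from the config file, so this is
  // only acceptable when the config is trusted.
  const predicate = new Function("output", `return (${config.passCondition});`) as (
    output: unknown,
  ) => unknown;
  try {
    return predicate(output) === true;
  } catch {
    return false; // the expression threw, e.g. it read a property of null
  }
}

const bundleConfig: CustomConfig = { parseOutput: "json", passCondition: "output.passed === true" };
console.log(customPasses(0, '{"passed": true, "sizeKb": 312.4}', bundleConfig));
```

For this to work with the configuration above, the custom script would need to print a JSON object containing a `passed` field to stdout.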

---

## Strategy Execution Order

The evaluator runs strategies in this recommended order for fastest feedback:

1. **build** — If the build fails, everything else is likely unreliable
2. **typecheck** — Type errors indicate fundamental code issues
3. **lint** — Style and potential bug detection
4. **unit-test** — Functional correctness of individual units
5. **api-check** — API endpoint verification (requires running server)
6. **playwright** — Full E2E testing (most expensive, most comprehensive)
7. **custom** — Project-specific checks

The evaluator should continue running all strategies even if an early one fails, so the Generator gets complete feedback in one pass.
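
A small sketch of the ordering and the no-short-circuit rule described above. `orderStrategies` and `runAll` are hypothetical helpers, and `runOne` is a stand-in for whatever actually executes a strategy.

```typescript
type StrategyType =
  | "typecheck" | "lint" | "unit-test" | "playwright" | "api-check" | "build" | "custom";

// The recommended order from the list above.
const RECOMMENDED_ORDER: StrategyType[] = [
  "build", "typecheck", "lint", "unit-test", "api-check", "playwright", "custom",
];

interface StrategyEntry {
  type: StrategyType;
  required: boolean;
}

// Returns a sorted copy. Array.prototype.sort is stable, so entries of the
// same type keep their relative order from the config.
function orderStrategies(strategies: StrategyEntry[]): StrategyEntry[] {
  return [...strategies].sort(
    (a, b) => RECOMMENDED_ORDER.indexOf(a.type) - RECOMMENDED_ORDER.indexOf(b.type),
  );
}

// Runs every strategy, never stopping at the first failure.
function runAll(
  strategies: StrategyEntry[],
  runOne: (entry: StrategyEntry) => boolean,
): Array<{ type: StrategyType; passed: boolean }> {
  return orderStrategies(strategies).map((entry) => ({ type: entry.type, passed: runOne(entry) }));
}

const presetStrategies: StrategyEntry[] = [
  { type: "typecheck", required: true },
  { type: "lint", required: true },
  { type: "build", required: true },
  { type: "playwright", required: false },
];
console.log(orderStrategies(presetStrategies).map((s) => s.type).join(" > "));
```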

---

## Default Strategy Sets by Preset

### nextjs / react-vite
```json
[
  { "type": "typecheck", "required": true },
  { "type": "lint", "required": true },
  { "type": "build", "required": true },
  { "type": "playwright", "required": false }
]
```

### brownfield
```json
[
  { "type": "typecheck", "required": true },
  { "type": "lint", "required": true },
  { "type": "unit-test", "required": true }
]
```

### generic
```json
[
  { "type": "build", "required": true },
  { "type": "lint", "required": false }
]
```
@@ -0,0 +1,182 @@
# Evaluation Feedback Format

This document defines how evaluation feedback should be structured for maximum effectiveness when consumed by the Generator agent during retry iterations.

## Principles

1. **Actionable over descriptive.** Every piece of feedback should give the Generator enough information to fix the issue without guessing.
2. **Precise location.** Always include file paths and line numbers when applicable. "There's a bug in the auth code" is useless. "src/routes/auth.ts:42 — the bcrypt.hash call is missing the salt rounds argument" is actionable.
3. **One issue per feedback item.** Do not combine multiple issues into one feedback entry. The Generator processes each item independently.
4. **Prioritized.** Critical issues (build failures, type errors) come before minor issues (style, optimization).

## EvalResult JSON Schema

```json
{
  "evalId": "string (required, format: eval-<contractId>-<iteration>)",
  "contractId": "string (required)",
  "specId": "string (required)",
  "timestamp": "string (required, ISO-8601)",
  "iteration": "number (required, 1-indexed)",
  "overallResult": "string (required, one of: pass, fail)",

  "score": {
    "criteriaTotal": "number",
    "criteriaPassed": "number",
    "criteriaFailed": "number",
    "criteriaSkipped": "number",
    "requiredPassed": "number",
    "requiredFailed": "number",
    "requiredTotal": "number"
  },

  "strategyResults": [
    {
      "strategy": "string (strategy type)",
      "required": "boolean",
      "result": "string (pass | fail | skipped)",
      "exitCode": "number (optional)",
      "output": "string (relevant output excerpt, not full dump)",
      "errorCount": "number (optional)",
      "details": "string (explanation, especially for failures)"
    }
  ],

  "criteriaResults": [
    {
      "criterionId": "string (from contract)",
      "description": "string (from contract)",
      "required": "boolean",
      "result": "string (pass | fail | skipped)",
      "evidence": "string (specific evidence supporting judgment)",
      "feedback": "string (if failed: what went wrong and what should happen instead)"
    }
  ],

  "regressions": [
    {
      "description": "string (what regressed)",
      "evidence": "string (how detected)",
      "severity": "string (critical | major | minor)",
      "affectedFiles": ["string (file paths)"]
    }
  ],

  "generatorFeedback": [
    {
      "priority": "string (critical | high | medium | low)",
      "category": "string (bug | missing-feature | regression | quality | performance)",
      "file": "string (file path, if applicable)",
      "line": "number (line number, if applicable)",
      "description": "string (precise description of the issue)",
      "expected": "string (what should happen instead)",
      "reproduction": "string (steps to reproduce, if applicable)"
    }
  ],

  "summary": "string (2-3 sentence summary)"
}
```
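
The `score` object and the `evalId` format can both be derived mechanically from `criteriaResults`. The sketch below shows one way to do that. `computeScore` and `makeEvalId` are hypothetical helpers, and counting skipped criteria toward `requiredTotal` is an assumption.

```typescript
type Outcome = "pass" | "fail" | "skipped";

// Only the fields needed for scoring; the full entry has more.
interface CriterionResult {
  criterionId: string;
  required: boolean;
  result: Outcome;
}

interface Score {
  criteriaTotal: number;
  criteriaPassed: number;
  criteriaFailed: number;
  criteriaSkipped: number;
  requiredPassed: number;
  requiredFailed: number;
  requiredTotal: number;
}

// Builds an id in the documented format: eval-<contractId>-<iteration>.
function makeEvalId(contractId: string, iteration: number): string {
  return `eval-${contractId}-${iteration}`;
}

function computeScore(criteria: CriterionResult[]): Score {
  const count = (keep: (c: CriterionResult) => boolean): number => criteria.filter(keep).length;
  return {
    criteriaTotal: criteria.length,
    criteriaPassed: count((c) => c.result === "pass"),
    criteriaFailed: count((c) => c.result === "fail"),
    criteriaSkipped: count((c) => c.result === "skipped"),
    requiredPassed: count((c) => c.required && c.result === "pass"),
    requiredFailed: count((c) => c.required && c.result === "fail"),
    requiredTotal: count((c) => c.required), // assumption: includes skipped criteria
  };
}

const sampleCriteria: CriterionResult[] = [
  { criterionId: "c1", required: true, result: "pass" },
  { criterionId: "c2", required: true, result: "fail" },
  { criterionId: "c3", required: false, result: "skipped" },
];
console.log(makeEvalId("sprint-1", 2), computeScore(sampleCriteria));
```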

## Priority Levels

| Priority | Meaning | Examples |
|----------|---------|---------|
| `critical` | Sprint cannot pass until this is fixed. Typically build/type errors or complete feature absence. | Build fails, type error in new code, required feature completely missing |
| `high` | Required criterion failed. Must be fixed for the sprint to pass. | API returns wrong status code, form validation not working, test assertion failing |
| `medium` | Non-required criterion failed or quality issue. Should be fixed but won't block the sprint. | Lint errors, missing error handling for edge case, incomplete accessibility |
| `low` | Minor quality issue or suggestion. Can be deferred. | Code style inconsistency, opportunity for optimization, extra console.log |
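
Since feedback is meant to be prioritized, an evaluator could sort `generatorFeedback` by these levels before emitting it. A minimal sketch, with `FeedbackItem` trimmed to two fields and `sortByPriority` as a hypothetical helper:

```typescript
type Priority = "critical" | "high" | "medium" | "low";

// Lower rank sorts first.
const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, medium: 2, low: 3 };

interface FeedbackItem {
  priority: Priority;
  description: string;
}

// Returns a sorted copy; items with equal priority keep their original order.
function sortByPriority(items: FeedbackItem[]): FeedbackItem[] {
  return [...items].sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority]);
}

const feedbackItems: FeedbackItem[] = [
  { priority: "low", description: "extra console.log" },
  { priority: "critical", description: "build fails" },
  { priority: "high", description: "API returns wrong status code" },
];
console.log(sortByPriority(feedbackItems).map((item) => item.priority).join(", "));
```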

## Category Definitions

| Category | Description |
|----------|-------------|
| `bug` | Code that does not behave as specified. Incorrect logic, wrong return values, unhandled errors. |
| `missing-feature` | A required behavior described in the contract that was not implemented at all. |
| `regression` | Something that worked before the sprint that no longer works. |
| `quality` | Code quality issue: poor naming, missing error handling, no input validation, accessibility gaps. |
| `performance` | Performance issue: unnecessary re-renders, N+1 queries, missing pagination, large bundle size. |

## Writing Effective Feedback

### For Build/Type Errors

```json
{
  "priority": "critical",
  "category": "bug",
  "file": "src/routes/auth.ts",
  "line": 42,
  "description": "TypeScript error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'. The bcrypt.hash function expects a number for salt rounds but receives a string from process.env.SALT_ROUNDS.",
  "expected": "Parse the environment variable to a number: parseInt(process.env.SALT_ROUNDS || '10', 10)",
  "reproduction": "Run: npx tsc --noEmit"
}
```

### For Functional Failures

```json
{
  "priority": "high",
  "category": "bug",
  "file": "src/routes/auth.ts",
  "line": 55,
  "description": "POST /api/auth/register returns 500 with error 'relation \"users\" does not exist' instead of creating a user. The Prisma migration has not been run, so the users table does not exist in the database.",
  "expected": "The endpoint should return 201 with { id, email } after successfully creating a user record. Ensure the Prisma migration is included in the sprint setup or generatorNotes.",
  "reproduction": "1. Start the dev server: npm run dev\n2. Run: curl -X POST http://localhost:3000/api/auth/register -H 'Content-Type: application/json' -d '{\"email\":\"test@test.com\",\"password\":\"password123\"}'\n3. Observe: 500 response with database error"
}
```

### For Missing Features

```json
{
  "priority": "high",
  "category": "missing-feature",
  "file": "src/pages/Register.tsx",
  "line": null,
  "description": "The registration form exists but does not implement client-side password length validation. Contract criterion sc-1-7 requires that submitting a password shorter than 8 characters shows an error message before the form is submitted to the server.",
  "expected": "When the user types a password shorter than 8 characters and attempts to submit (or on blur), the form should display 'Password must be at least 8 characters' below the password input without making an API call.",
  "reproduction": "1. Navigate to /register\n2. Enter 'test@test.com' as email\n3. Enter '123' as password\n4. Click Submit\n5. Observe: form submits to server instead of showing client-side error"
}
```

### For Regressions

```json
{
  "priority": "critical",
  "category": "regression",
  "file": "src/components/Navbar.tsx",
  "line": 23,
  "description": "The Navbar component import was changed from 'react-router-dom' Link to 'next/link' but the project uses React Router, not Next.js. This causes a build failure in an existing component that was working before this sprint.",
  "expected": "The Navbar should continue using Link from 'react-router-dom' as it did before this sprint's changes.",
  "reproduction": "Run: npm run build. The error appears at src/components/Navbar.tsx:23"
}
```

## Evidence Standards

Evidence must be concrete and reproducible. Here is what counts as evidence for different verification methods:

| Method | Good Evidence | Bad Evidence |
|--------|--------------|-------------|
| `build` | "Build command exited with code 1. Error: Module not found: src/utils/auth.ts" | "Build seems broken" |
| `typecheck` | "TS2304: Cannot find name 'UserType' at src/routes/auth.ts:15:22" | "There are type errors" |
| `lint` | "ESLint: 'password' is defined but never used (no-unused-vars) at src/routes/auth.ts:30" | "Lint has warnings" |
| `unit-test` | "Test 'should hash password' failed: Expected bcrypt hash (starting with $2b$) but received plain text 'password123'" | "Tests failed" |
| `manual` | "Reading src/pages/Register.tsx: The component renders two input fields (email, password) but the contract requires three (email, password, confirm-password). No input with name='confirmPassword' or similar exists." | "The form looks incomplete" |
| `api-check` | "curl -s -o /dev/null -w '%{http_code}' -X POST localhost:3000/api/auth/register returned 404. The route is not defined in src/routes/index.ts." | "API doesn't work" |
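
One way to keep command-based evidence concrete is to build the string directly from the command, its exit code, and the first lines of its output, rather than paraphrasing. The following is a minimal sketch under that assumption. `CommandResult` and `formatEvidence` are hypothetical names, and how the command is actually run and captured is left out.

```typescript
// Hypothetical shape for a captured command run (not an agent-bober type).
interface CommandResult {
  command: string;
  exitCode: number;
  output: string; // combined stdout/stderr text
}

// Builds an evidence string from the exact command, its exit code,
// and up to maxLines non-empty lines of its output.
function formatEvidence(result: CommandResult, maxLines: number = 3): string {
  const lines = result.output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .slice(0, maxLines);
  const header = "`" + result.command + "` exited with code " + result.exitCode + ".";
  if (lines.length === 0) {
    return header + " No output captured.";
  }
  return header + " Output: " + lines.join(" | ");
}
```

For example, `formatEvidence({ command: "npm run build", exitCode: 1, output: "Error: Module not found: src/utils/auth.ts" })` yields a string in the spirit of the `build` row above, with the command and error text quoted verbatim.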

## Summary Writing

The summary should be 2-3 sentences that:
1. State the overall result (pass/fail) and score
2. Highlight the most critical issue (if failed)
3. Indicate what the Generator should focus on for the retry (if failed)

**Good summary:**
"Sprint 1 FAILED: 5 of 7 required criteria passed. The two critical failures are: (1) the database migration was not run, causing all API endpoints to return 500 errors, and (2) the registration form is missing the confirm-password field. The Generator should focus on adding the Prisma migration step and the missing form field."

**Bad summary:**
"Some things passed and some things failed. There are a few issues to fix."
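
The three-part structure above can be sketched as a small template function. This is illustrative only: `SummaryInput` and `buildSummary` are hypothetical names, and treating the sprint as passed exactly when every required criterion passed is an assumption made for the example.

```typescript
// Hypothetical input shape for the example (not an agent-bober type).
interface SummaryInput {
  sprint: number;
  requiredPassed: number;
  requiredTotal: number;
  mostCriticalIssue?: string; // used only when the sprint failed
  retryFocus?: string; // used only when the sprint failed
}

// Produces 1-3 sentences: result and score, most critical issue, retry focus.
function buildSummary(input: SummaryInput): string {
  // Assumption: the sprint passes only if all required criteria passed.
  const passed = input.requiredPassed === input.requiredTotal;
  const sentences: string[] = [
    `Sprint ${input.sprint} ${passed ? "PASSED" : "FAILED"}: ` +
      `${input.requiredPassed} of ${input.requiredTotal} required criteria passed.`
  ];
  if (!passed && input.mostCriticalIssue) {
    sentences.push(`Most critical issue: ${input.mostCriticalIssue}.`);
  }
  if (!passed && input.retryFocus) {
    sentences.push(`The Generator should focus on ${input.retryFocus}.`);
  }
  return sentences.join(" ");
}
```

A template like this only guarantees the structure. The issue and focus strings still need to be as specific as the good example above.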