agent-bober 0.1.0

Files changed (212)
  1. package/.claude-plugin/plugin.json +9 -0
  2. package/LICENSE +21 -0
  3. package/README.md +495 -0
  4. package/agents/bober-evaluator.md +323 -0
  5. package/agents/bober-generator.md +245 -0
  6. package/agents/bober-planner.md +248 -0
  7. package/dist/cli/commands/eval.d.ts +6 -0
  8. package/dist/cli/commands/eval.d.ts.map +1 -0
  9. package/dist/cli/commands/eval.js +129 -0
  10. package/dist/cli/commands/eval.js.map +1 -0
  11. package/dist/cli/commands/init.d.ts +5 -0
  12. package/dist/cli/commands/init.d.ts.map +1 -0
  13. package/dist/cli/commands/init.js +547 -0
  14. package/dist/cli/commands/init.js.map +1 -0
  15. package/dist/cli/commands/plan.d.ts +5 -0
  16. package/dist/cli/commands/plan.d.ts.map +1 -0
  17. package/dist/cli/commands/plan.js +87 -0
  18. package/dist/cli/commands/plan.js.map +1 -0
  19. package/dist/cli/commands/run.d.ts +5 -0
  20. package/dist/cli/commands/run.d.ts.map +1 -0
  21. package/dist/cli/commands/run.js +120 -0
  22. package/dist/cli/commands/run.js.map +1 -0
  23. package/dist/cli/commands/sprint.d.ts +6 -0
  24. package/dist/cli/commands/sprint.d.ts.map +1 -0
  25. package/dist/cli/commands/sprint.js +206 -0
  26. package/dist/cli/commands/sprint.js.map +1 -0
  27. package/dist/cli/index.d.ts +3 -0
  28. package/dist/cli/index.d.ts.map +1 -0
  29. package/dist/cli/index.js +124 -0
  30. package/dist/cli/index.js.map +1 -0
  31. package/dist/config/defaults.d.ts +15 -0
  32. package/dist/config/defaults.d.ts.map +1 -0
  33. package/dist/config/defaults.js +226 -0
  34. package/dist/config/defaults.js.map +1 -0
  35. package/dist/config/index.d.ts +4 -0
  36. package/dist/config/index.d.ts.map +1 -0
  37. package/dist/config/index.js +8 -0
  38. package/dist/config/index.js.map +1 -0
  39. package/dist/config/loader.d.ts +18 -0
  40. package/dist/config/loader.d.ts.map +1 -0
  41. package/dist/config/loader.js +189 -0
  42. package/dist/config/loader.js.map +1 -0
  43. package/dist/config/schema.d.ts +904 -0
  44. package/dist/config/schema.d.ts.map +1 -0
  45. package/dist/config/schema.js +181 -0
  46. package/dist/config/schema.js.map +1 -0
  47. package/dist/contracts/eval-result.d.ts +205 -0
  48. package/dist/contracts/eval-result.d.ts.map +1 -0
  49. package/dist/contracts/eval-result.js +87 -0
  50. package/dist/contracts/eval-result.js.map +1 -0
  51. package/dist/contracts/index.d.ts +4 -0
  52. package/dist/contracts/index.d.ts.map +1 -0
  53. package/dist/contracts/index.js +16 -0
  54. package/dist/contracts/index.js.map +1 -0
  55. package/dist/contracts/spec.d.ts +101 -0
  56. package/dist/contracts/spec.d.ts.map +1 -0
  57. package/dist/contracts/spec.js +51 -0
  58. package/dist/contracts/spec.js.map +1 -0
  59. package/dist/contracts/sprint-contract.d.ts +141 -0
  60. package/dist/contracts/sprint-contract.d.ts.map +1 -0
  61. package/dist/contracts/sprint-contract.js +80 -0
  62. package/dist/contracts/sprint-contract.js.map +1 -0
  63. package/dist/evaluators/builtin/api-check.d.ts +13 -0
  64. package/dist/evaluators/builtin/api-check.d.ts.map +1 -0
  65. package/dist/evaluators/builtin/api-check.js +152 -0
  66. package/dist/evaluators/builtin/api-check.js.map +1 -0
  67. package/dist/evaluators/builtin/build-check.d.ts +17 -0
  68. package/dist/evaluators/builtin/build-check.d.ts.map +1 -0
  69. package/dist/evaluators/builtin/build-check.js +155 -0
  70. package/dist/evaluators/builtin/build-check.js.map +1 -0
  71. package/dist/evaluators/builtin/command-runner.d.ts +26 -0
  72. package/dist/evaluators/builtin/command-runner.d.ts.map +1 -0
  73. package/dist/evaluators/builtin/command-runner.js +114 -0
  74. package/dist/evaluators/builtin/command-runner.js.map +1 -0
  75. package/dist/evaluators/builtin/lint.d.ts +17 -0
  76. package/dist/evaluators/builtin/lint.d.ts.map +1 -0
  77. package/dist/evaluators/builtin/lint.js +264 -0
  78. package/dist/evaluators/builtin/lint.js.map +1 -0
  79. package/dist/evaluators/builtin/playwright.d.ts +16 -0
  80. package/dist/evaluators/builtin/playwright.d.ts.map +1 -0
  81. package/dist/evaluators/builtin/playwright.js +238 -0
  82. package/dist/evaluators/builtin/playwright.js.map +1 -0
  83. package/dist/evaluators/builtin/typescript-check.d.ts +12 -0
  84. package/dist/evaluators/builtin/typescript-check.d.ts.map +1 -0
  85. package/dist/evaluators/builtin/typescript-check.js +155 -0
  86. package/dist/evaluators/builtin/typescript-check.js.map +1 -0
  87. package/dist/evaluators/builtin/unit-test.d.ts +18 -0
  88. package/dist/evaluators/builtin/unit-test.d.ts.map +1 -0
  89. package/dist/evaluators/builtin/unit-test.js +279 -0
  90. package/dist/evaluators/builtin/unit-test.js.map +1 -0
  91. package/dist/evaluators/index.d.ts +11 -0
  92. package/dist/evaluators/index.d.ts.map +1 -0
  93. package/dist/evaluators/index.js +13 -0
  94. package/dist/evaluators/index.js.map +1 -0
  95. package/dist/evaluators/plugin-interface.d.ts +50 -0
  96. package/dist/evaluators/plugin-interface.d.ts.map +1 -0
  97. package/dist/evaluators/plugin-interface.js +2 -0
  98. package/dist/evaluators/plugin-interface.js.map +1 -0
  99. package/dist/evaluators/plugin-loader.d.ts +18 -0
  100. package/dist/evaluators/plugin-loader.d.ts.map +1 -0
  101. package/dist/evaluators/plugin-loader.js +107 -0
  102. package/dist/evaluators/plugin-loader.js.map +1 -0
  103. package/dist/evaluators/registry.d.ts +78 -0
  104. package/dist/evaluators/registry.d.ts.map +1 -0
  105. package/dist/evaluators/registry.js +238 -0
  106. package/dist/evaluators/registry.js.map +1 -0
  107. package/dist/index.d.ts +17 -0
  108. package/dist/index.d.ts.map +1 -0
  109. package/dist/index.js +22 -0
  110. package/dist/index.js.map +1 -0
  111. package/dist/orchestrator/context-handoff.d.ts +543 -0
  112. package/dist/orchestrator/context-handoff.d.ts.map +1 -0
  113. package/dist/orchestrator/context-handoff.js +133 -0
  114. package/dist/orchestrator/context-handoff.js.map +1 -0
  115. package/dist/orchestrator/evaluator-agent.d.ts +15 -0
  116. package/dist/orchestrator/evaluator-agent.d.ts.map +1 -0
  117. package/dist/orchestrator/evaluator-agent.js +233 -0
  118. package/dist/orchestrator/evaluator-agent.js.map +1 -0
  119. package/dist/orchestrator/generator-agent.d.ts +16 -0
  120. package/dist/orchestrator/generator-agent.d.ts.map +1 -0
  121. package/dist/orchestrator/generator-agent.js +147 -0
  122. package/dist/orchestrator/generator-agent.js.map +1 -0
  123. package/dist/orchestrator/pipeline.d.ts +24 -0
  124. package/dist/orchestrator/pipeline.d.ts.map +1 -0
  125. package/dist/orchestrator/pipeline.js +290 -0
  126. package/dist/orchestrator/pipeline.js.map +1 -0
  127. package/dist/orchestrator/planner-agent.d.ts +10 -0
  128. package/dist/orchestrator/planner-agent.d.ts.map +1 -0
  129. package/dist/orchestrator/planner-agent.js +187 -0
  130. package/dist/orchestrator/planner-agent.js.map +1 -0
  131. package/dist/state/helpers.d.ts +5 -0
  132. package/dist/state/helpers.d.ts.map +1 -0
  133. package/dist/state/helpers.js +8 -0
  134. package/dist/state/helpers.js.map +1 -0
  135. package/dist/state/history.d.ts +39 -0
  136. package/dist/state/history.d.ts.map +1 -0
  137. package/dist/state/history.js +162 -0
  138. package/dist/state/history.js.map +1 -0
  139. package/dist/state/index.d.ts +8 -0
  140. package/dist/state/index.d.ts.map +1 -0
  141. package/dist/state/index.js +22 -0
  142. package/dist/state/index.js.map +1 -0
  143. package/dist/state/plan-state.d.ts +21 -0
  144. package/dist/state/plan-state.d.ts.map +1 -0
  145. package/dist/state/plan-state.js +108 -0
  146. package/dist/state/plan-state.js.map +1 -0
  147. package/dist/state/sprint-state.d.ts +20 -0
  148. package/dist/state/sprint-state.d.ts.map +1 -0
  149. package/dist/state/sprint-state.js +98 -0
  150. package/dist/state/sprint-state.js.map +1 -0
  151. package/dist/utils/fs.d.ts +31 -0
  152. package/dist/utils/fs.d.ts.map +1 -0
  153. package/dist/utils/fs.js +67 -0
  154. package/dist/utils/fs.js.map +1 -0
  155. package/dist/utils/git.d.ts +35 -0
  156. package/dist/utils/git.d.ts.map +1 -0
  157. package/dist/utils/git.js +84 -0
  158. package/dist/utils/git.js.map +1 -0
  159. package/dist/utils/index.d.ts +4 -0
  160. package/dist/utils/index.d.ts.map +1 -0
  161. package/dist/utils/index.js +4 -0
  162. package/dist/utils/index.js.map +1 -0
  163. package/dist/utils/logger.d.ts +45 -0
  164. package/dist/utils/logger.d.ts.map +1 -0
  165. package/dist/utils/logger.js +73 -0
  166. package/dist/utils/logger.js.map +1 -0
  167. package/hooks/hooks.json +10 -0
  168. package/package.json +67 -0
  169. package/scripts/detect-stack.sh +287 -0
  170. package/scripts/init-project.sh +206 -0
  171. package/scripts/run-eval.sh +175 -0
  172. package/skills/bober.anchor/SKILL.md +365 -0
  173. package/skills/bober.anchor/references/anchor-guide.md +567 -0
  174. package/skills/bober.brownfield/SKILL.md +422 -0
  175. package/skills/bober.brownfield/references/codebase-analysis.md +304 -0
  176. package/skills/bober.eval/SKILL.md +235 -0
  177. package/skills/bober.eval/references/eval-strategies.md +407 -0
  178. package/skills/bober.eval/references/feedback-format.md +182 -0
  179. package/skills/bober.plan/SKILL.md +244 -0
  180. package/skills/bober.plan/references/clarification-guide.md +124 -0
  181. package/skills/bober.plan/references/spec-schema.md +253 -0
  182. package/skills/bober.react/SKILL.md +330 -0
  183. package/skills/bober.react/references/react-scaffold.md +344 -0
  184. package/skills/bober.run/SKILL.md +303 -0
  185. package/skills/bober.solidity/SKILL.md +416 -0
  186. package/skills/bober.solidity/references/solidity-guide.md +487 -0
  187. package/skills/bober.sprint/SKILL.md +280 -0
  188. package/skills/bober.sprint/references/contract-schema.md +251 -0
  189. package/templates/base/CLAUDE.md +20 -0
  190. package/templates/base/bober.config.json +35 -0
  191. package/templates/brownfield/CLAUDE.md +34 -0
  192. package/templates/brownfield/bober.config.json +37 -0
  193. package/templates/presets/anchor/CLAUDE.md +163 -0
  194. package/templates/presets/anchor/bober.config.json +9 -0
  195. package/templates/presets/api-node/CLAUDE.md +153 -0
  196. package/templates/presets/api-node/bober.config.json +10 -0
  197. package/templates/presets/nextjs/CLAUDE.md +82 -0
  198. package/templates/presets/nextjs/bober.config.json +14 -0
  199. package/templates/presets/python-api/CLAUDE.md +202 -0
  200. package/templates/presets/python-api/bober.config.json +9 -0
  201. package/templates/presets/react-vite/CLAUDE.md +71 -0
  202. package/templates/presets/react-vite/bober.config.json +53 -0
  203. package/templates/presets/react-vite/scaffold/package.json +45 -0
  204. package/templates/presets/react-vite/scaffold/server/index.ts +38 -0
  205. package/templates/presets/react-vite/scaffold/server/tsconfig.json +24 -0
  206. package/templates/presets/react-vite/scaffold/src/App.tsx +37 -0
  207. package/templates/presets/react-vite/scaffold/src/index.html +12 -0
  208. package/templates/presets/react-vite/scaffold/src/main.tsx +12 -0
  209. package/templates/presets/react-vite/scaffold/tsconfig.json +27 -0
  210. package/templates/presets/react-vite/scaffold/vite.config.ts +34 -0
  211. package/templates/presets/solidity/CLAUDE.md +106 -0
  212. package/templates/presets/solidity/bober.config.json +9 -0
@@ -0,0 +1,407 @@
1
+ # Evaluation Strategies Reference
2
+
3
+ This document describes all built-in evaluation strategies available in the Bober evaluator system. Strategies are configured in `bober.config.json` under `evaluator.strategies`.
4
+
5
+ ## Strategy Configuration Format
6
+
7
+ Each strategy in the config array follows this structure:
8
+ ```json
9
+ {
10
+ "type": "typecheck | lint | unit-test | playwright | api-check | build | custom",
11
+ "required": true,
12
+ "plugin": "string (optional, for custom strategies)",
13
+ "config": {
14
+ "key": "value (optional, strategy-specific configuration)"
15
+ }
16
+ }
17
+ ```
18
+
19
+ The `required` field determines whether a strategy failure blocks the sprint from passing:
20
+ - `required: true` — Sprint FAILS if this strategy fails
21
+ - `required: false` — Strategy result is recorded but does not block the sprint
22
+
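The blocking rule above can be sketched as a small function. This is only an illustration, not the package's implementation: the helper name `overallResult` and the `blockingStrategies` field are hypothetical, and the sketch looks at strategy results only (it ignores per-criterion results).

```javascript
// Hypothetical helper: derive a sprint-level verdict from per-strategy results.
// Each entry is assumed to look like { strategy, required, result },
// where result is "pass", "fail", or "skipped".
function overallResult(strategyResults) {
  // Only failures of required strategies block the sprint.
  const blocking = strategyResults.filter(
    (r) => r.required && r.result === "fail"
  );
  return {
    overallResult: blocking.length === 0 ? "pass" : "fail",
    blockingStrategies: blocking.map((r) => r.strategy),
  };
}

const verdict = overallResult([
  { strategy: "build", required: true, result: "pass" },
  { strategy: "lint", required: true, result: "fail" },
  { strategy: "playwright", required: false, result: "fail" },
]);
console.log(verdict); // lint blocks the sprint; playwright is only recorded
```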
23
+ ---
24
+
25
+ ## typecheck
26
+
27
+ **Purpose:** Verify that all TypeScript code compiles without type errors.
28
+
29
+ **Default command:** `npx tsc --noEmit`
30
+ **Config override:** `commands.typecheck` in `bober.config.json`
31
+
32
+ **What it checks:**
33
+ - All `.ts` and `.tsx` files compile under the project's `tsconfig.json`
34
+ - No type errors (TS2xxx codes)
35
+ - No missing imports or unresolved modules
36
+ - Strict mode violations (if `strict: true` in tsconfig)
37
+
38
+ **Pass criteria:** Zero type errors in output. Warnings do not cause failure.
39
+
40
+ **Common failures:**
41
+ - Missing type imports: `Cannot find module './types' or its corresponding type declarations`
42
+ - Type mismatch: `Type 'string' is not assignable to type 'number'`
43
+ - Missing properties: `Property 'name' is missing in type '{}' but required in type 'User'`
44
+ - Implicit any: `Parameter 'x' implicitly has an 'any' type` (when `noImplicitAny` is enabled)
45
+
46
+ **Configuration:**
47
+ ```json
48
+ {
49
+ "type": "typecheck",
50
+ "required": true,
51
+ "config": {
52
+ "tsconfig": "tsconfig.json",
53
+ "strict": true
54
+ }
55
+ }
56
+ ```
57
+
58
+ **Notes:**
59
+ - Runs against the full project, not just files changed in the sprint
60
+ - Catches regressions in existing code caused by the sprint's changes
61
+
62
+ ---
63
+
64
+ ## lint
65
+
66
+ **Purpose:** Verify code follows the project's linting rules.
67
+
68
+ **Default command:** `npm run lint`
69
+ **Config override:** `commands.lint` in `bober.config.json`
70
+
71
+ **Supported linters:**
72
+ - **ESLint** (most common): Detected by `eslint.config.js`, `.eslintrc.*`, or `eslint` in devDependencies
73
+ - **Biome**: Detected by `biome.json` or `@biomejs/biome` in devDependencies
74
+ - **Both:** Some projects use both. Run whatever `commands.lint` specifies.
75
+
76
+ **What it checks:**
77
+ - Code style violations
78
+ - Potential bugs (unused variables, unreachable code, implicit type coercion)
79
+ - Import order and organization
80
+ - Framework-specific rules (React hooks rules, etc.)
81
+
82
+ **Pass criteria:** Zero errors. Warnings are acceptable (but should be noted in the report).
83
+
84
+ **Common failures:**
85
+ - Unused variables: `'x' is defined but never used`
86
+ - Missing dependencies in hook deps: `React Hook useEffect has a missing dependency`
87
+ - Prefer const: `'x' is never reassigned. Use 'const' instead`
88
+ - Import order violations
89
+
90
+ **Configuration:**
91
+ ```json
92
+ {
93
+ "type": "lint",
94
+ "required": true,
95
+ "config": {
96
+ "fix": false,
97
+ "maxWarnings": -1
98
+ }
99
+ }
100
+ ```
101
+
102
+ **Notes:**
103
+ - `fix: false` means the evaluator reports violations without auto-fixing them. The Generator must fix them.
104
+ - `maxWarnings: -1` means unlimited warnings are tolerated. Set a number to fail on too many warnings.
105
+
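One way to read the pass criteria together with `maxWarnings` is sketched below. The function name `lintPasses` and the input shape are assumptions made for illustration. The sketch also assumes that a warning count equal to `maxWarnings` still passes, which mirrors how ESLint's own `--max-warnings` flag behaves but is not confirmed by this document.

```javascript
// Hypothetical decision function for the lint strategy.
// counts: { errorCount, warningCount } parsed from the linter output (assumed shape).
// config: the strategy's "config" object; maxWarnings defaults to -1 (unlimited).
function lintPasses(counts, config = {}) {
  const { maxWarnings = -1 } = config;
  if (counts.errorCount > 0) return false; // any error fails the strategy
  if (maxWarnings < 0) return true;        // -1: warnings never cause failure
  return counts.warningCount <= maxWarnings;
}

console.log(lintPasses({ errorCount: 0, warningCount: 12 }));                     // unlimited warnings
console.log(lintPasses({ errorCount: 0, warningCount: 12 }, { maxWarnings: 5 })); // too many warnings
```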
106
+ ---
107
+
108
+ ## unit-test
109
+
110
+ **Purpose:** Verify that unit tests pass, including both new tests and pre-existing tests.
111
+
112
+ **Default command:** `npm test`
113
+ **Config override:** `commands.test` in `bober.config.json`
114
+
115
+ **Supported frameworks:**
116
+ - **Vitest**: Detected by `vitest` in devDependencies or `vitest.config.*`
117
+ - **Jest**: Detected by `jest` in devDependencies or `jest.config.*`
118
+ - **Mocha**: Detected by `mocha` in devDependencies
119
+ - **Custom:** Whatever `commands.test` runs
120
+
121
+ **What it checks:**
122
+ - All tests pass (both new and existing)
123
+ - No test regressions (existing tests that previously passed should still pass)
124
+ - Test coverage (if configured)
125
+
126
+ **Pass criteria:** All tests pass with exit code 0.
127
+
128
+ **Common failures:**
129
+ - Assertion failures: `Expected 200 but received 500`
130
+ - Missing test dependencies: Module not found errors in test files
131
+ - Timeout: Tests that hang due to unresolved promises or server connections
132
+ - Snapshot mismatches (for snapshot testing)
133
+
134
+ **Configuration:**
135
+ ```json
136
+ {
137
+ "type": "unit-test",
138
+ "required": true,
139
+ "config": {
140
+ "coverage": false,
141
+ "coverageThreshold": 80,
142
+ "testMatch": "**/*.test.{ts,tsx}",
143
+ "timeout": 30000
144
+ }
145
+ }
146
+ ```
147
+
148
+ **Notes:**
149
+ - If `coverage: true`, the evaluator checks that coverage meets `coverageThreshold`
150
+ - The evaluator should count total tests, passed, failed, and skipped
151
+ - If no tests exist yet and this is the first sprint, the strategy passes vacuously but the evaluator should note "no tests found" in the report
152
+
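The notes above could translate into logic along these lines. This is a sketch under stated assumptions: `evaluateUnitTests`, the `counts` shape, and the `coveragePct` argument are hypothetical names, and the defaults simply reuse the values from the example configuration.

```javascript
// Hypothetical evaluation of a unit-test run.
// counts: { total, passed, failed, skipped } gathered from the test runner (assumed shape).
// coveragePct: overall coverage percentage, only consulted when coverage is enabled.
function evaluateUnitTests(counts, coveragePct, config = {}) {
  const { coverage = false, coverageThreshold = 80 } = config;
  const notes = [];
  let pass = counts.failed === 0;

  if (counts.total === 0) {
    // Vacuous pass: nothing failed, but the report should say so.
    notes.push("no tests found");
  } else if (coverage && coveragePct < coverageThreshold) {
    pass = false;
    notes.push(`coverage ${coveragePct}% is below threshold ${coverageThreshold}%`);
  }
  return { result: pass ? "pass" : "fail", counts, notes };
}

console.log(evaluateUnitTests({ total: 0, passed: 0, failed: 0, skipped: 0 }, 0));
console.log(
  evaluateUnitTests({ total: 10, passed: 10, failed: 0, skipped: 0 }, 72, { coverage: true })
);
```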
153
+ ---
154
+
155
+ ## playwright
156
+
157
+ **Purpose:** Run end-to-end browser tests that verify the application works from a user's perspective.
158
+
159
+ **Default command:** `npx playwright test`
160
+ **Config override:** Strategy-specific config
161
+
162
+ **Prerequisites:**
163
+ - Playwright must be installed: `npx playwright install` (installs browsers)
164
+ - A dev server must be running or `webServer` must be configured in `playwright.config.ts`
165
+ - Test files must exist (usually in `tests/` or `e2e/` directory)
166
+
167
+ **What it checks:**
168
+ - Full user flows work end-to-end (login, navigation, form submission, etc.)
169
+ - UI renders correctly in a real browser
170
+ - Client-server interaction works
171
+ - No console errors or unhandled exceptions
172
+
173
+ **Pass criteria:** All Playwright tests pass.
174
+
175
+ **Common failures:**
176
+ - Element not found: `Timeout waiting for selector '#login-form'`
177
+ - Navigation error: `Page navigated to unexpected URL`
178
+ - Network error: API calls returning errors
179
+ - Visual regression: Screenshot comparison failures
180
+
181
+ **Configuration:**
182
+ ```json
183
+ {
184
+ "type": "playwright",
185
+ "required": false,
186
+ "config": {
187
+ "project": "chromium",
188
+ "retries": 1,
189
+ "timeout": 60000,
190
+ "webServer": {
191
+ "command": "npm run dev",
192
+ "port": 3000,
193
+ "reuseExistingServer": true,
194
+ "timeout": 30000
195
+ }
196
+ }
197
+ }
198
+ ```
199
+
200
+ **Notes:**
201
+ - Defaults to `required: false` because Playwright setup is non-trivial. Mark as `required: true` only when E2E tests are critical and known to be configured.
202
+ - If Playwright is not installed, the evaluator marks this as `skipped` (not failed), even if `required: true`. It should flag this as a configuration issue.
203
+ - The evaluator should try to start the dev server before running tests if `webServer` is configured.
204
+
205
+ ---
206
+
207
+ ## api-check
208
+
209
+ **Purpose:** Verify that HTTP API endpoints respond correctly.
210
+
211
+ **Default command:** Uses `curl` or the configured HTTP client
212
+ **Config override:** Strategy-specific config
213
+
214
+ **What it checks:**
215
+ - Endpoints exist and respond
216
+ - Correct HTTP status codes
217
+ - Response body structure matches expectations
218
+ - Error responses are properly formatted
219
+ - Content-Type headers are correct
220
+
221
+ **Pass criteria:** All configured endpoint checks return expected status codes and response shapes.
222
+
223
+ **Configuration:**
224
+ ```json
225
+ {
226
+   "type": "api-check",
227
+   "required": true,
228
+   "config": {
229
+     "baseUrl": "http://localhost:3000",
230
+     "startServer": true,
231
+     "serverCommand": "npm run dev",
232
+     "serverReadyPattern": "listening on port",
233
+     "serverTimeout": 15000,
234
+     "endpoints": [
235
+       {
236
+         "method": "POST",
237
+         "path": "/api/auth/register",
238
+         "body": { "email": "test@example.com", "password": "testpassword123" },
239
+         "expectedStatus": 201,
240
+         "expectedBodyKeys": ["id", "email"]
241
+       },
242
+       {
243
+         "method": "POST",
244
+         "path": "/api/auth/register",
245
+         "body": { "email": "test@example.com", "password": "testpassword123" },
246
+         "expectedStatus": 400,
247
+         "description": "Duplicate registration should fail"
248
+       }
249
+     ]
250
+   }
251
+ }
252
+ ```
253
+
254
+ **Notes:**
255
+ - The evaluator typically derives endpoint checks from the sprint contract's success criteria rather than relying solely on pre-configured endpoints
256
+ - If `startServer: true`, the evaluator starts the dev server, waits for `serverReadyPattern` in stdout, runs checks, then stops the server
257
+ - API checks are often used in combination with `manual` verification for the same criterion
258
+
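To make the per-endpoint pass criteria concrete, here is a minimal sketch of comparing one response against one `endpoints` entry. The function name `checkEndpoint` and the `{ status, body }` response shape are assumptions; how the evaluator actually issues requests (for example via `curl`) is outside this sketch.

```javascript
// Hypothetical check of a single endpoint expectation.
// endpoint: one entry from config.endpoints (expectedStatus, optional expectedBodyKeys).
// response: { status, body } where body is the parsed JSON response (assumed shape).
function checkEndpoint(endpoint, response) {
  const problems = [];
  if (response.status !== endpoint.expectedStatus) {
    problems.push(
      `expected status ${endpoint.expectedStatus}, got ${response.status}`
    );
  }
  const body = response.body;
  for (const key of endpoint.expectedBodyKeys ?? []) {
    const hasKey = body !== null && typeof body === "object" && key in body;
    if (!hasKey) problems.push(`missing body key "${key}"`);
  }
  return { pass: problems.length === 0, problems };
}

const registerCheck = {
  method: "POST",
  path: "/api/auth/register",
  expectedStatus: 201,
  expectedBodyKeys: ["id", "email"],
};
console.log(checkEndpoint(registerCheck, { status: 201, body: { id: 1, email: "test@example.com" } }));
console.log(checkEndpoint(registerCheck, { status: 500, body: null }));
```

Collecting every problem instead of stopping at the first keeps the resulting feedback specific enough for the Generator to act on.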
259
+ ---
260
+
261
+ ## build
262
+
263
+ **Purpose:** Verify that the project compiles/builds without errors.
264
+
265
+ **Default command:** `npm run build`
266
+ **Config override:** `commands.build` in `bober.config.json`
267
+
268
+ **What it checks:**
269
+ - The full build pipeline completes successfully
270
+ - No compilation errors
271
+ - All assets are generated correctly
272
+ - Build output exists in the expected directory
273
+
274
+ **Pass criteria:** Build command exits with code 0 and no errors in output.
275
+
276
+ **Common failures:**
277
+ - Import errors: Missing modules or circular dependencies
278
+ - Syntax errors in new code
279
+ - Environment variable issues
280
+ - Asset processing failures (CSS, images)
281
+ - Bundle size exceeded (if configured)
282
+
283
+ **Configuration:**
284
+ ```json
285
+ {
286
+ "type": "build",
287
+ "required": true,
288
+ "config": {
289
+ "outputDir": "dist",
290
+ "verifyOutput": true
291
+ }
292
+ }
293
+ ```
294
+
295
+ **Notes:**
296
+ - This should almost always be `required: true`. If the project does not build, nothing else matters.
297
+ - `verifyOutput: true` means the evaluator checks that the output directory exists and is non-empty after the build
298
+ - This is different from `typecheck`: `build` runs the full build pipeline (bundling, optimization, etc.), while `typecheck` only verifies types
299
+
300
+ ---
301
+
302
+ ## custom
303
+
304
+ **Purpose:** Run a user-defined evaluation command for project-specific checks.
305
+
306
+ **Default command:** None (must be configured)
307
+ **Config override:** Strategy-specific config
308
+
309
+ **What it checks:** Whatever the custom command checks. The evaluator interprets results based on exit code and output.
310
+
311
+ **Pass criteria:** Command exits with code 0.
312
+
313
+ **Configuration:**
314
+ ```json
315
+ {
316
+ "type": "custom",
317
+ "required": false,
318
+ "plugin": "check-bundle-size",
319
+ "config": {
320
+ "command": "node scripts/check-bundle-size.js",
321
+ "maxSizeKb": 500,
322
+ "parseOutput": "json",
323
+ "passCondition": "output.passed === true"
324
+ }
325
+ }
326
+ ```
327
+
328
+ **How to write a custom evaluator plugin:**
329
+
330
+ A custom evaluator is a script or command that:
331
+ 1. Runs a specific check
332
+ 2. Outputs results to stdout (optionally as JSON for structured parsing)
333
+ 3. Exits with code 0 for pass, non-zero for fail
334
+
335
+ **Example custom evaluator script:**
336
+ ```javascript
337
+ // scripts/check-bundle-size.js
338
+ import { statSync } from 'fs';
339
+ import { glob } from 'glob';
340
+
341
+ const MAX_SIZE_KB = 500;
342
+ const files = glob.sync('dist/**/*.js');
343
+ const totalSize = files.reduce((sum, f) => sum + statSync(f).size, 0);
344
+ const sizeKb = totalSize / 1024;
345
+
346
+ if (sizeKb > MAX_SIZE_KB) {
347
+   console.error(`Bundle size ${sizeKb.toFixed(1)}KB exceeds limit of ${MAX_SIZE_KB}KB`);
348
+   process.exit(1);
349
+ } else {
350
+   console.log(`Bundle size OK: ${sizeKb.toFixed(1)}KB / ${MAX_SIZE_KB}KB`);
351
+   process.exit(0);
352
+ }
353
+ ```
354
+
355
+ **Plugin naming:** The `plugin` field is a human-readable name for the check. It appears in evaluation reports.
356
+
357
+ **Advanced custom evaluators:**
358
+ - Output JSON with `parseOutput: "json"` for structured results
359
+ - Use `passCondition` to evaluate a JavaScript expression against the parsed output
360
+ - Chain multiple commands with `&&` in the command string
361
+
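For the structured-output case, a custom script might print a JSON object that a `passCondition` such as `output.passed === true` can be evaluated against. The sketch below shows only the script side. The function name `bundleSizeReport` and the report fields other than `passed` are hypothetical, and how the evaluator parses the output and evaluates the expression is not specified here.

```javascript
// Hypothetical report builder for a bundle-size check that emits JSON.
// fileSizesBytes: sizes of the built files in bytes (for example from statSync).
// maxSizeKb: the limit, matching the "maxSizeKb" key in the example config.
function bundleSizeReport(fileSizesBytes, maxSizeKb) {
  const totalBytes = fileSizesBytes.reduce((sum, n) => sum + n, 0);
  const sizeKb = totalBytes / 1024;
  return {
    passed: sizeKb <= maxSizeKb, // same rule as the script above: fail only when over the limit
    sizeKb: Number(sizeKb.toFixed(1)),
    maxSizeKb,
  };
}

const report = bundleSizeReport([204800, 102400], 500);
// A single JSON line on stdout is easy for a consumer to parse.
console.log(JSON.stringify(report)); // {"passed":true,"sizeKb":300,"maxSizeKb":500}
```

A real script would still exit non-zero when `passed` is false, so the exit-code rule keeps working even if the JSON is not parsed.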
362
+ ---
363
+
364
+ ## Strategy Execution Order
365
+
366
+ The evaluator runs strategies in this recommended order for fastest feedback:
367
+
368
+ 1. **build** — If the build fails, everything else is likely unreliable
369
+ 2. **typecheck** — Type errors indicate fundamental code issues
370
+ 3. **lint** — Style and potential bug detection
371
+ 4. **unit-test** — Functional correctness of individual units
372
+ 5. **api-check** — API endpoint verification (requires running server)
373
+ 6. **playwright** — Full E2E testing (most expensive, most comprehensive)
374
+ 7. **custom** — Project-specific checks
375
+
376
+ The evaluator should continue running all strategies even if an early one fails, so the Generator gets complete feedback in one pass.
377
+
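The ordering and the run-everything behavior can be sketched as follows. This is an illustration, not the registry's actual code: `orderStrategies`, `runAll`, and the `runOne` callback are hypothetical names, and placing unknown types last is an assumption.

```javascript
// Recommended order from the list above.
const RECOMMENDED_ORDER = [
  "build", "typecheck", "lint", "unit-test", "api-check", "playwright", "custom",
];

// Sort a copy of the configured strategies; unknown types go last (assumption).
function orderStrategies(strategies) {
  const rank = (s) => {
    const i = RECOMMENDED_ORDER.indexOf(s.type);
    return i === -1 ? RECOMMENDED_ORDER.length : i;
  };
  return [...strategies].sort((a, b) => rank(a) - rank(b));
}

// Run every strategy even after a failure so feedback is complete in one pass.
// runOne is a hypothetical callback that returns true on pass and may throw.
function runAll(strategies, runOne) {
  return orderStrategies(strategies).map((s) => {
    try {
      return { strategy: s.type, required: s.required, result: runOne(s) ? "pass" : "fail" };
    } catch (err) {
      return { strategy: s.type, required: s.required, result: "fail", details: String(err) };
    }
  });
}

const results = runAll(
  [
    { type: "lint", required: true },
    { type: "playwright", required: false },
    { type: "build", required: true },
  ],
  (s) => s.type !== "build" // pretend only the build fails
);
console.log(results.map((r) => `${r.strategy}:${r.result}`).join(" "));
```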
378
+ ---
379
+
380
+ ## Default Strategy Sets by Preset
381
+
382
+ ### nextjs / react-vite
383
+ ```json
384
+ [
385
+ { "type": "typecheck", "required": true },
386
+ { "type": "lint", "required": true },
387
+ { "type": "build", "required": true },
388
+ { "type": "playwright", "required": false }
389
+ ]
390
+ ```
391
+
392
+ ### brownfield
393
+ ```json
394
+ [
395
+ { "type": "typecheck", "required": true },
396
+ { "type": "lint", "required": true },
397
+ { "type": "unit-test", "required": true }
398
+ ]
399
+ ```
400
+
401
+ ### generic
402
+ ```json
403
+ [
404
+ { "type": "build", "required": true },
405
+ { "type": "lint", "required": false }
406
+ ]
407
+ ```
@@ -0,0 +1,182 @@
1
+ # Evaluation Feedback Format
2
+
3
+ This document defines how evaluation feedback should be structured for maximum effectiveness when consumed by the Generator agent during retry iterations.
4
+
5
+ ## Principles
6
+
7
+ 1. **Actionable over descriptive.** Every piece of feedback should give the Generator enough information to fix the issue without guessing.
8
+ 2. **Precise location.** Always include file paths and line numbers when applicable. "There's a bug in the auth code" is useless. "src/routes/auth.ts:42 — the bcrypt.hash call is missing the salt rounds argument" is actionable.
9
+ 3. **One issue per feedback item.** Do not combine multiple issues into one feedback entry. The Generator processes each item independently.
10
+ 4. **Prioritized.** Critical issues (build failures, type errors) come before minor issues (style, optimization).
11
+
12
+ ## EvalResult JSON Schema
13
+
14
+ ```json
15
+ {
16
+ "evalId": "string (required, format: eval-<contractId>-<iteration>)",
17
+ "contractId": "string (required)",
18
+ "specId": "string (required)",
19
+ "timestamp": "string (required, ISO-8601)",
20
+ "iteration": "number (required, 1-indexed)",
21
+ "overallResult": "string (required, one of: pass, fail)",
22
+
23
+ "score": {
24
+ "criteriaTotal": "number",
25
+ "criteriaPassed": "number",
26
+ "criteriaFailed": "number",
27
+ "criteriaSkipped": "number",
28
+ "requiredPassed": "number",
29
+ "requiredFailed": "number",
30
+ "requiredTotal": "number"
31
+ },
32
+
33
+ "strategyResults": [
34
+ {
35
+ "strategy": "string (strategy type)",
36
+ "required": "boolean",
37
+ "result": "string (pass | fail | skipped)",
38
+ "exitCode": "number (optional)",
39
+ "output": "string (relevant output excerpt, not full dump)",
40
+ "errorCount": "number (optional)",
41
+ "details": "string (explanation, especially for failures)"
42
+ }
43
+ ],
44
+
45
+ "criteriaResults": [
46
+ {
47
+ "criterionId": "string (from contract)",
48
+ "description": "string (from contract)",
49
+ "required": "boolean",
50
+ "result": "string (pass | fail | skipped)",
51
+ "evidence": "string (specific evidence supporting judgment)",
52
+ "feedback": "string (if failed: what went wrong and what should happen instead)"
53
+ }
54
+ ],
55
+
56
+ "regressions": [
57
+ {
58
+ "description": "string (what regressed)",
59
+ "evidence": "string (how detected)",
60
+ "severity": "string (critical | major | minor)",
61
+ "affectedFiles": ["string (file paths)"]
62
+ }
63
+ ],
64
+
65
+ "generatorFeedback": [
66
+ {
67
+ "priority": "string (critical | high | medium | low)",
68
+ "category": "string (bug | missing-feature | regression | quality | performance)",
69
+ "file": "string (file path, if applicable)",
70
+ "line": "number (line number, if applicable)",
71
+ "description": "string (precise description of the issue)",
72
+ "expected": "string (what should happen instead)",
73
+ "reproduction": "string (steps to reproduce, if applicable)"
74
+ }
75
+ ],
76
+
77
+ "summary": "string (2-3 sentence summary)"
78
+ }
79
+ ```
80
+
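As a sketch of how the `score` block and `evalId` relate to the rest of the schema, the counts can be derived directly from `criteriaResults`. The helper names `buildScore` and `makeEvalId` are hypothetical. The field names and the `eval-<contractId>-<iteration>` format come from the schema above.

```javascript
// Hypothetical helper: derive the "score" object from criteriaResults entries.
// Each entry is assumed to carry { required, result } as in the schema.
function buildScore(criteriaResults) {
  const count = (pred) => criteriaResults.filter(pred).length;
  return {
    criteriaTotal: criteriaResults.length,
    criteriaPassed: count((c) => c.result === "pass"),
    criteriaFailed: count((c) => c.result === "fail"),
    criteriaSkipped: count((c) => c.result === "skipped"),
    requiredPassed: count((c) => c.required && c.result === "pass"),
    requiredFailed: count((c) => c.required && c.result === "fail"),
    requiredTotal: count((c) => c.required),
  };
}

// evalId format from the schema: eval-<contractId>-<iteration> (iteration is 1-indexed).
function makeEvalId(contractId, iteration) {
  return `eval-${contractId}-${iteration}`;
}

const score = buildScore([
  { criterionId: "sc-1-1", required: true, result: "pass" },
  { criterionId: "sc-1-2", required: true, result: "fail" },
  { criterionId: "sc-1-3", required: false, result: "skipped" },
]);
console.log(makeEvalId("sprint-1", 2), score);
```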
81
+ ## Priority Levels
82
+
83
+ | Priority | Meaning | Examples |
84
+ |----------|---------|---------|
85
+ | `critical` | Sprint cannot pass until this is fixed. Typically build/type errors or complete feature absence. | Build fails, type error in new code, required feature completely missing |
86
+ | `high` | Required criterion failed. Must be fixed for the sprint to pass. | API returns wrong status code, form validation not working, test assertion failing |
87
+ | `medium` | Non-required criterion failed or quality issue. Should be fixed but won't block the sprint. | Lint errors, missing error handling for edge case, incomplete accessibility |
88
+ | `low` | Minor quality issue or suggestion. Can be deferred. | Code style inconsistency, opportunity for optimization, extra console.log |
89
+
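Ordering `generatorFeedback` by these priority levels could look like the sketch below. The function name `sortFeedback` is hypothetical, and sorting unrecognized priorities to the end is an assumption.

```javascript
// Rank of each priority level from the table above (lower sorts first).
const PRIORITY_RANK = { critical: 0, high: 1, medium: 2, low: 3 };

// Return a new array with the most urgent feedback first.
// Items with an unrecognized priority are placed last (assumption).
function sortFeedback(items) {
  const rank = (item) => PRIORITY_RANK[item.priority] ?? 4;
  return [...items].sort((a, b) => rank(a) - rank(b));
}

const sorted = sortFeedback([
  { priority: "low", description: "extra console.log" },
  { priority: "critical", description: "build fails" },
  { priority: "high", description: "API returns wrong status code" },
]);
console.log(sorted.map((f) => f.priority).join(", "));
```

Because `Array.prototype.sort` is stable, items that share a priority keep the order in which the evaluator recorded them.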
90
+ ## Category Definitions
91
+
92
+ | Category | Description |
93
+ |----------|-------------|
94
+ | `bug` | Code that does not behave as specified. Incorrect logic, wrong return values, unhandled errors. |
95
+ | `missing-feature` | A required behavior described in the contract that was not implemented at all. |
96
+ | `regression` | Something that worked before the sprint that no longer works. |
97
+ | `quality` | Code quality issue: poor naming, missing error handling, no input validation, accessibility gaps. |
98
+ | `performance` | Performance issue: unnecessary re-renders, N+1 queries, missing pagination, large bundle size. |
99
+
100
+ ## Writing Effective Feedback
101
+
102
+ ### For Build/Type Errors
103
+
104
+ ```json
105
+ {
106
+ "priority": "critical",
107
+ "category": "bug",
108
+ "file": "src/routes/auth.ts",
109
+ "line": 42,
110
+ "description": "TypeScript error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'. The bcrypt.hash function expects a number for salt rounds but receives a string from process.env.SALT_ROUNDS.",
111
+ "expected": "Parse the environment variable to a number: parseInt(process.env.SALT_ROUNDS || '10', 10)",
112
+ "reproduction": "Run: npx tsc --noEmit"
113
+ }
114
+ ```
115
+
116
+ ### For Functional Failures
117
+
118
+ ```json
119
+ {
120
+ "priority": "high",
121
+ "category": "bug",
122
+ "file": "src/routes/auth.ts",
123
+ "line": 55,
124
+ "description": "POST /api/auth/register returns 500 with error 'relation \"users\" does not exist' instead of creating a user. The Prisma migration has not been run, so the users table does not exist in the database.",
125
+ "expected": "The endpoint should return 201 with { id, email } after successfully creating a user record. Ensure the Prisma migration is included in the sprint setup or generatorNotes.",
126
+ "reproduction": "1. Start the dev server: npm run dev\n2. Run: curl -X POST http://localhost:3000/api/auth/register -H 'Content-Type: application/json' -d '{\"email\":\"test@test.com\",\"password\":\"password123\"}'\n3. Observe: 500 response with database error"
127
+ }
128
+ ```
129
+
130
+ ### For Missing Features
131
+
132
+ ```json
133
+ {
134
+ "priority": "high",
135
+ "category": "missing-feature",
136
+ "file": "src/pages/Register.tsx",
137
+ "line": null,
138
+ "description": "The registration form exists but does not implement client-side password length validation. Contract criterion sc-1-7 requires that submitting a password shorter than 8 characters shows an error message before the form is submitted to the server.",
139
+ "expected": "When the user types a password shorter than 8 characters and attempts to submit (or on blur), the form should display 'Password must be at least 8 characters' below the password input without making an API call.",
140
+ "reproduction": "1. Navigate to /register\n2. Enter 'test@test.com' as email\n3. Enter '123' as password\n4. Click Submit\n5. Observe: form submits to server instead of showing client-side error"
141
+ }
142
+ ```
143
+
144
+ ### For Regressions
145
+
146
+ ```json
147
+ {
148
+ "priority": "critical",
149
+ "category": "regression",
150
+ "file": "src/components/Navbar.tsx",
151
+ "line": 23,
152
+ "description": "The Navbar component import was changed from 'react-router-dom' Link to 'next/link' but the project uses React Router, not Next.js. This causes a build failure in an existing component that was working before this sprint.",
153
+ "expected": "The Navbar should continue using Link from 'react-router-dom' as it did before this sprint's changes.",
154
+ "reproduction": "Run: npm run build. The error appears at src/components/Navbar.tsx:23"
155
+ }
156
+ ```
157
+
158
+ ## Evidence Standards
159
+
160
+ Evidence must be concrete and reproducible. Here is what counts as evidence for different verification methods:
161
+
162
+ | Method | Good Evidence | Bad Evidence |
163
+ |--------|--------------|-------------|
164
+ | `build` | "Build command exited with code 1. Error: Module not found: src/utils/auth.ts" | "Build seems broken" |
165
+ | `typecheck` | "TS2304: Cannot find name 'UserType' at src/routes/auth.ts:15:22" | "There are type errors" |
166
+ | `lint` | "ESLint: 'password' is defined but never used (no-unused-vars) at src/routes/auth.ts:30" | "Lint has warnings" |
167
+ | `unit-test` | "Test 'should hash password' failed: Expected bcrypt hash (starting with $2b$) but received plain text 'password123'" | "Tests failed" |
168
+ | `manual` | "Reading src/pages/Register.tsx: The component renders two input fields (email, password) but the contract requires three (email, password, confirm-password). No input with name='confirmPassword' or similar exists." | "The form looks incomplete" |
169
+ | `api-check` | "curl -s -o /dev/null -w '%{http_code}' -X POST localhost:3000/api/auth/register returned 404. The route is not defined in src/routes/index.ts." | "API doesn't work" |
170
+
171
+ ## Summary Writing
172
+
173
+ The summary should be 2-3 sentences that:
174
+ 1. State the overall result (pass/fail) and score
175
+ 2. Highlight the most critical issue (if failed)
176
+ 3. Indicate what the Generator should focus on for the retry (if failed)
177
+
178
+ **Good summary:**
179
+ "Sprint 1 FAILED: 5 of 7 required criteria passed. The two critical failures are: (1) the database migration was not run, causing all API endpoints to return 500 errors, and (2) the registration form is missing the confirm-password field. The Generator should focus on adding the Prisma migration step and the missing form field."
180
+
181
+ **Bad summary:**
182
+ "Some things passed and some things failed. There are a few issues to fix."