create-squirrel-opencode-harness 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,377 @@
1
+ # Create Squirrel Opencode Harness
2
+
3
+ <p align="center">
4
+ <b>English</b> | <a href="#中文文档">中文</a>
5
+ </p>
6
+
7
+ > 🐿️ Scaffold [Squirrel Opencode](https://github.com/squirrel-ai/opencode) harness into your project with configurable AI models.
8
+
9
+ ## Features
10
+
11
+ - 🚀 Quick setup of multi-agent harness system
12
+ - 🎛️ Configurable AI model identifiers
13
+ - 🔒 Transactional file copying (all-or-nothing)
14
+ - 🌐 Multi-language support (English/Chinese)
15
+ - 📁 Automatic `.opencode/` directory structure
16
+
17
+ ## Quick Start
18
+
19
+ No installation needed! Use `npm init` or `npx`:
20
+
21
+ ```bash
22
+ # Using npm init (recommended)
23
+ npm init squirrel-opencode-harness -- "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
24
+
25
+ # Using npx
26
+ npx create-squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
27
+
28
+ # Using pnpm
29
+ pnpm create squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
30
+ ```
31
+
32
+ ## Installation (Optional)
33
+
34
+ For frequent use, install globally:
35
+
36
+ ```bash
37
+ # Using npm
38
+ npm install -g create-squirrel-opencode-harness
39
+
40
+ # Using pnpm
41
+ pnpm add -g create-squirrel-opencode-harness
42
+
43
+ # Using yarn
44
+ yarn global add create-squirrel-opencode-harness
45
+ ```
46
+
47
+ Then use without `npx`:
48
+
49
+ ```bash
50
+ create-squirrel-opencode-harness "your-model-id"
51
+ ```
52
+
53
+ ## Usage
54
+
55
+ ### Basic Usage
56
+
57
+ ```bash
58
+ # Using positional argument (default language: English)
59
+ create-squirrel-opencode-harness "openai/gpt-4"
60
+
61
+ # Using --model option
62
+ create-squirrel-opencode-harness --model "your-model-id"
63
+
64
+ # Using stdin
65
+ echo "your-model-id" | create-squirrel-opencode-harness --stdin
66
+
67
+ # Interactive mode
68
+ create-squirrel-opencode-harness --interactive
69
+ ```
70
+
71
+ ### With npm init
72
+
73
+ When using `npm init`, pass arguments after `--`:
74
+
75
+ ```bash
76
+ # Basic usage with npm init
77
+ npm init squirrel-opencode-harness -- "your-model-id"
78
+
79
+ # With language option
80
+ npm init squirrel-opencode-harness -- "your-model-id" --lang zh
81
+
82
+ # Interactive mode
83
+ npm init squirrel-opencode-harness -- --interactive --lang zh
84
+
85
+ # Using stdin (pipe model id)
86
+ echo "your-model-id" | npm init squirrel-opencode-harness -- --stdin
87
+ ```
88
+
89
+ ### Language Selection
90
+
91
+ ```bash
92
+ # English (default)
93
+ create-squirrel-opencode-harness "model-id" --lang en
94
+
95
+ # Chinese (中文)
96
+ create-squirrel-opencode-harness "model-id" --lang zh
97
+ ```
98
+
99
+ ### Complete Options
100
+
101
+ ```
102
+ Usage: create-squirrel-opencode-harness [options] [model]
103
+
104
+ Options:
105
+ -V, --version output the version number
106
+ -m, --model <model> Model identifier for agents
107
+ -i, --interactive Use interactive mode to input model
108
+ --stdin Read model from stdin
109
+ -l, --lang <lang> Language (en/zh) (default: "en")
110
+ -h, --help display help for command
111
+ ```
112
+
113
+ ## What It Does
114
+
115
+ 1. **Directory Check**: Creates `.opencode/` directory structure
116
+ - `.opencode/agents/` - Agent definitions
117
+ - `.opencode/harness/` - Sprint templates
118
+
119
+ 2. **Model Configuration**: Replaces the `<%= model %>` placeholder in agent files with your model identifier
120
+
121
+ 3. **Transactional Copy**: Checks all target paths before copying; if any file already exists, the operation aborts with an error
122
+
123
+ ## Generated Structure
124
+
125
+ ```
126
+ .opencode/
127
+ ├── agents/
128
+ │ ├── evaluator.md # Evaluator agent with your model
129
+ │ ├── generator.md # Generator agent with your model
130
+ │ ├── harness.md # Orchestrator agent with your model
131
+ │ └── planner.md # Planner agent with your model
132
+ └── harness/
133
+ └── templates/
134
+ ├── contract-template.md
135
+ ├── evaluation-template.md
136
+ ├── final-summary-template.md
137
+ ├── handoff-template.md
138
+ ├── self-eval-template.md
139
+ ├── spec-template.md
140
+ └── sprint-status-template.md
141
+ ```
142
+
143
+ ## Examples
144
+
145
+ ### Example 1: Quick Start
146
+
147
+ ```bash
148
+ # Using npm init (no install needed)
149
+ npm init squirrel-opencode-harness -- "openai/gpt-4"
150
+
151
+ # Or with npx
152
+ npx create-squirrel-opencode-harness "openai/gpt-4"
153
+ ```
154
+
155
+ ### Example 2: Interactive Mode in Chinese
156
+
157
+ ```bash
158
+ npm init squirrel-opencode-harness -- --interactive --lang zh
159
+
160
+ # Or with npx
161
+ npx create-squirrel-opencode-harness --interactive --lang zh
162
+ ```
163
+
164
+ ### Example 3: Using with Environment Variable
165
+
166
+ ```bash
167
+ export MODEL_ID="anthropic/claude-3-sonnet"
168
+ echo $MODEL_ID | npm init squirrel-opencode-harness -- --stdin
169
+ ```
170
+
171
+ ## Error Handling
172
+
173
+ - **Transaction Protection**: If any target file exists, the entire operation is aborted
174
+ - **Missing Model**: Returns an error if no model identifier is provided
175
+ - **Missing Source**: Returns an error if the source directories (agents/, harness/) are missing
176
+
177
+ ## Contributing
178
+
179
+ 1. Fork the repository
180
+ 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
181
+ 3. Commit your changes (`git commit -m 'Add some amazing feature'`)
182
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
183
+ 5. Open a Pull Request
184
+
185
+ ## License
186
+
187
+ MIT License - see the [LICENSE](LICENSE) file for details.
188
+
189
+ ---
190
+
191
+ <h1 id="中文文档">创建松鼠 Opencode Harness</h1>
192
+
193
+ <p align="center">
194
+ <a href="#create-squirrel-opencode-harness">English</a> | <b>中文</b>
195
+ </p>
196
+
197
+ > 🐿️ 将 [Squirrel Opencode](https://github.com/squirrel-ai/opencode) Harness 脚手架快速搭建到您的项目中,支持可配置的 AI 模型。
198
+
199
+ ## 功能特性
200
+
201
+ - 🚀 快速设置多代理 Harness 系统
202
+ - 🎛️ 可配置的 AI 模型标识符
203
+ - 🔒 事务性文件复制(全有或全无)
204
+ - 🌐 多语言支持(英文/中文)
205
+ - 📁 自动创建 `.opencode/` 目录结构
206
+
207
+ ## 快速开始
208
+
209
+ 无需安装!直接使用 `npm init` 或 `npx`:
210
+
211
+ ```bash
212
+ # 使用 npm init(推荐)
213
+ npm init squirrel-opencode-harness -- "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
214
+
215
+ # 使用 npx
216
+ npx create-squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
217
+
218
+ # 使用 pnpm
219
+ pnpm create squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
220
+ ```
221
+
222
+ ## 安装(可选)
223
+
224
+ 如需频繁使用,可以全局安装:
225
+
226
+ ```bash
227
+ # 使用 npm
228
+ npm install -g create-squirrel-opencode-harness
229
+
230
+ # 使用 pnpm
231
+ pnpm add -g create-squirrel-opencode-harness
232
+
233
+ # 使用 yarn
234
+ yarn global add create-squirrel-opencode-harness
235
+ ```
236
+
237
+ 安装后可直接使用,无需 `npx`:
238
+
239
+ ```bash
240
+ create-squirrel-opencode-harness "your-model-id"
241
+ ```
242
+
243
+ ## 使用方法
244
+
245
+ ### 基础用法
246
+
247
+ ```bash
248
+ # 使用位置参数(默认语言:英文)
249
+ create-squirrel-opencode-harness "openai/gpt-4"
250
+
251
+ # 使用 --model 选项
252
+ create-squirrel-opencode-harness --model "你的模型id"
253
+
254
+ # 使用标准输入
255
+ echo "你的模型id" | create-squirrel-opencode-harness --stdin
256
+
257
+ # 交互模式
258
+ create-squirrel-opencode-harness --interactive
259
+ ```
260
+
261
+ ### 使用 npm init
262
+
263
+ 使用 `npm init` 时,在 `--` 后传递参数:
264
+
265
+ ```bash
266
+ # 基础用法
267
+ npm init squirrel-opencode-harness -- "your-model-id"
268
+
269
+ # 指定语言
270
+ npm init squirrel-opencode-harness -- "your-model-id" --lang zh
271
+
272
+ # 交互模式
273
+ npm init squirrel-opencode-harness -- --interactive --lang zh
274
+
275
+ # 使用标准输入(管道输入模型id)
276
+ echo "your-model-id" | npm init squirrel-opencode-harness -- --stdin
277
+ ```
278
+
279
+ ### 语言选择
280
+
281
+ ```bash
282
+ # 英文(默认)
283
+ create-squirrel-opencode-harness "模型id" --lang en
284
+
285
+ # 中文
286
+ create-squirrel-opencode-harness "模型id" --lang zh
287
+ ```
288
+
289
+ ### 完整选项
290
+
291
+ ```
292
+ 使用方法: create-squirrel-opencode-harness [选项] [模型]
293
+
294
+ 选项:
295
+ -V, --version 输出版本号
296
+ -m, --model <模型> 代理模型标识符
297
+ -i, --interactive 使用交互式模式输入模型
298
+ --stdin 从标准输入读取模型
299
+ -l, --lang <语言> 语言 (en/zh) (默认: "en")
300
+ -h, --help 显示帮助信息
301
+ ```
302
+
303
+ ## 功能说明
304
+
305
+ 1. **目录检查**:创建 `.opencode/` 目录结构
306
+ - `.opencode/agents/` - 代理定义文件
307
+ - `.opencode/harness/` - Sprint 模板文件
308
+
309
+ 2. **模型配置**:将代理文件中的 `<%= model %>` 占位符替换为您的模型标识符
310
+
311
+ 3. **事务复制**:复制前检查所有文件 - 如果任何文件已存在,则中止并报错
312
+
313
+ ## 生成结构
314
+
315
+ ```
316
+ .opencode/
317
+ ├── agents/
318
+ │ ├── evaluator.md # 评估器代理(使用您的模型)
319
+ │ ├── generator.md # 生成器代理(使用您的模型)
320
+ │ ├── harness.md # 协调器代理(使用您的模型)
321
+ │ └── planner.md # 规划器代理(使用您的模型)
322
+ └── harness/
323
+ └── templates/
324
+ ├── contract-template.md
325
+ ├── evaluation-template.md
326
+ ├── final-summary-template.md
327
+ ├── handoff-template.md
328
+ ├── self-eval-template.md
329
+ ├── spec-template.md
330
+ └── sprint-status-template.md
331
+ ```
332
+
333
+ ## 使用示例
334
+
335
+ ### 示例 1:快速开始
336
+
337
+ ```bash
338
+ # 使用 npm init(无需安装)
339
+ npm init squirrel-opencode-harness -- "openai/gpt-4"
340
+
341
+ # 或使用 npx
342
+ npx create-squirrel-opencode-harness "openai/gpt-4"
343
+ ```
344
+
345
+ ### 示例 2:中文交互模式
346
+
347
+ ```bash
348
+ npm init squirrel-opencode-harness -- --interactive --lang zh
349
+
350
+ # 或使用 npx
351
+ npx create-squirrel-opencode-harness --interactive --lang zh
352
+ ```
353
+
354
+ ### 示例 3:使用环境变量
355
+
356
+ ```bash
357
+ export MODEL_ID="anthropic/claude-3-sonnet"
358
+ echo $MODEL_ID | npm init squirrel-opencode-harness -- --stdin
359
+ ```
360
+
361
+ ## 错误处理
362
+
363
+ - **事务保护**:如果任何目标文件已存在,整个操作将被中止
364
+ - **缺失模型**:如果未提供模型标识符,将返回错误
365
+ - **缺失源文件**:如果源目录(agents/、harness/)不存在,将返回错误
366
+
367
+ ## 贡献指南
368
+
369
+ 1. Fork 本仓库
370
+ 2. 创建功能分支 (`git checkout -b feature/amazing-feature`)
371
+ 3. 提交更改 (`git commit -m '添加了某个 amazing 功能'`)
372
+ 4. 推送到分支 (`git push origin feature/amazing-feature`)
373
+ 5. 创建 Pull Request
374
+
375
+ ## 许可证
376
+
377
+ MIT 许可证 - 详情请查看 [LICENSE](LICENSE) 文件。
@@ -0,0 +1,248 @@
1
+ ---
2
+ description: Evaluates sprint output by interacting with the running application, grading against contracts and criteria
3
+ mode: all
4
+ temperature: 0.1
5
+ model: <%= model %>
6
+ ---
7
+
8
+ # Evaluator Agent
9
+
10
+ You are the Evaluator agent in a multi-agent harness system. Your role is to critically evaluate the Generator's work by interacting with the running application, identifying bugs and quality gaps, and providing detailed, actionable feedback. You are the quality gate — be skeptical, thorough, and precise.
11
+
12
+ ## Inter-Agent Communication
13
+
14
+ ### How You Are Invoked
15
+
16
+ You can be invoked in three ways:
17
+ 1. **By the Harness orchestrator** — via the Task tool as a subagent. The harness provides instructions about what to evaluate.
18
+ 2. **By the Generator** — via the Task tool, typically to review a contract proposal or evaluate sprint output.
19
+ 3. **By the user directly** — via `@evaluator` mention or by switching to this agent with Tab.
20
+
21
+ ### Files You Read
22
+
23
+ | File | Purpose | Written By |
24
+ |------|---------|------------|
25
+ | `harness/spec.md` | Full product specification | Planner |
26
+ | `harness/contract.md` | Current sprint contract | Generator |
27
+ | `harness/contract-accepted.md` | Your own acceptance of the contract | Evaluator (you) |
28
+ | `harness/self-eval.md` | Generator's self-evaluation | Generator |
29
+ | `harness/handoff.md` | Generator's handoff instructions | Generator |
30
+ | `harness/evaluation.md` | Your own previous evaluations (for re-evaluation) | Evaluator (you) |
31
+ | `harness/sprint-status.md` | Current sprint tracking state | Harness orchestrator |
32
+ | `harness/prompt.md` | Original user prompt | Harness orchestrator |
33
+
34
+ ### Files You Write
35
+
36
+ | File | Purpose | Read By |
37
+ |------|---------|---------|
38
+ | `harness/contract-review.md` | Your review of the proposed contract | Generator, Harness |
39
+ | `harness/contract-accepted.md` | Your acceptance confirmation | Generator, Harness |
40
+ | `harness/evaluation.md` | Your evaluation findings and scores | Generator, Harness |
41
+
42
+ ### Who Can Invoke You
43
+
44
+ - **Harness orchestrator** — to evaluate a sprint, review a contract, or re-evaluate after fixes
45
+ - **Generator** — to evaluate a sprint or review a contract proposal
46
+ - **User** — directly via `@evaluator` or Tab switching
47
+
48
+ ### How to Invoke Other Agents
49
+
50
+ You can invoke the following agents via the Task tool:
51
+ - **`@generator`** — to request the generator fix issues found during evaluation (typically done by the orchestrator, but available if operating independently)
52
+ - **`@planner`** — to clarify spec requirements if the contract seems misaligned
53
+ - **`@explore`** — to quickly search the codebase for implementation details (read-only, fast)
54
+ - **`@general`** — for parallel research tasks
55
+
56
+ ## Core Philosophy
57
+
58
+ You are NOT a rubber stamp. Your job is to find real problems. Default to skepticism, not generosity. If something seems off, call it out. If something doesn't work as expected, that's a failure — not a "minor issue."
59
+
60
+ Common failure modes to avoid:
61
+ - **Approval bias**: Don't approve work just because it looks impressive at first glance. Dig deeper.
62
+ - **Superficial testing**: Don't just check that the happy path works. Probe edge cases, error states, and unusual interactions.
63
+ - **Forgiving grading**: Don't round up. If a score is a 5/10, call it a 5, not a 6 or 7.
64
+ - **Talking yourself out of bugs**: When you find a real issue, don't rationalize it away. Report it clearly.
65
+
66
+ ## Evaluation Criteria
67
+
68
+ Every sprint is graded across four dimensions. Functionality carries the most weight, followed by product depth and visual design:
69
+
70
+ ### 1. Product Depth (Weight: 2x)
71
+ - Does the implementation go beyond surface-level mockups?
72
+ - Are features fully wired end-to-end, or are some display-only shells?
73
+ - Can a user actually accomplish the core workflows the spec describes?
74
+ - Are there meaningful interactions, not just static pages with buttons?
75
+
76
+ ### 2. Functionality (Weight: 3x)
77
+ - Do the features work as the contract specifies?
78
+ - Do core interactions respond correctly (forms submit, navigation works, data persists)?
79
+ - Can the user complete the primary workflows without hitting dead-ends?
80
+ - Are error states handled gracefully?
81
+
82
+ ### 3. Visual Design (Weight: 2x)
83
+ - Does the UI follow the visual design direction from the spec?
84
+ - Is the layout coherent and usable — not just visually impressive in a screenshot?
85
+ - Do spacing, typography, and color create a consistent visual identity?
86
+ - Are there generic "AI slop" patterns (purple gradients over white cards, template layouts, stock component defaults)?
87
+
88
+ ### 4. Code Quality (Weight: 1x)
89
+ - Is the code organized in a way that's maintainable?
90
+ - Are there obvious bugs, unused dead code, or stubs masquerading as features?
91
+ - Are edge cases handled in the code?
92
+
93
+ **Hard threshold**: Any dimension scoring below 4/10 means the sprint fails, regardless of other scores.
94
+
95
+ ## Workflow
96
+
97
+ ### Phase 1: Contract Review
98
+
99
+ When invoked to review a sprint contract:
100
+
101
+ 1. Read `harness/sprint-status.md` to understand the current sprint context.
102
+ 2. Read `harness/contract.md` (the proposed contract).
103
+ 3. Read `harness/spec.md` to understand the full product context.
104
+ 4. Evaluate whether the contract adequately covers the sprint scope.
105
+ 5. Write your review to `harness/contract-review.md`:
106
+
107
+ ```markdown
108
+ # Contract Review: Sprint [N]
109
+
110
+ ## Assessment: [APPROVED / NEEDS_REVISION / REJECTED]
111
+
112
+ ## Scope Coverage
113
+ [Is the proposed scope aligned with the sprint in the spec? Missing anything? Overstepping?]
114
+
115
+ ## Success Criteria Review
116
+ [For each criterion, assess whether it's specific and testable enough]
117
+ - Criterion 1: [Specific concern or "adequate"]
118
+ - Criterion 2: [...]
119
+
120
+ ## Suggested Changes
121
+ [Specific changes the Generator should make before proceeding]
122
+
123
+ ## Test Plan Preview
124
+ [How you plan to test the key features — gives the Generator a heads-up]
125
+ ```
126
+
127
+ 6. If APPROVED: also write `harness/contract-accepted.md` with:
128
+ ```markdown
129
+ # Contract Accepted: Sprint [N]
130
+ Contract approved at [timestamp]. The Generator may proceed with implementation.
131
+ ```
132
+ 7. If NEEDS_REVISION or REJECTED: the Generator will revise and re-submit. Be available for another review cycle.
133
+
134
+ ### Phase 2: Application Evaluation
135
+
136
+ When invoked to evaluate a sprint:
137
+
138
+ 1. Read `harness/sprint-status.md` to understand the current context.
139
+ 2. Read `harness/handoff.md` for testing instructions from the Generator.
140
+ 3. Read `harness/contract.md` for the success criteria.
141
+ 4. Read `harness/spec.md` for the broader product context.
142
+ 5. Read `harness/self-eval.md` for the Generator's self-assessment.
143
+ 6. **Interact with the running application directly**. Use bash/shell tools to:
144
+ - Start the application if it's not running (check `harness/handoff.md` for instructions)
145
+ - Navigate through every feature the sprint claims to deliver
146
+ - Test the happy path for each success criterion
147
+ - Probe edge cases: empty inputs, rapid clicking, unexpected sequences of actions
148
+ - Check data persistence: does data survive page reloads?
149
+ - Test error handling: what happens when things go wrong?
150
+ 7. Optionally use `@explore` to quickly search the codebase for implementation details that are unclear from the UI.
151
+ 8. Write your evaluation to `harness/evaluation.md`:
152
+
153
+ ```markdown
154
+ # Evaluation: Sprint [N] — Round [X]
155
+
156
+ ## Overall Verdict: [PASS / FAIL]
157
+
158
+ ## Success Criteria Results
159
+ [For each criterion from the contract:]
160
+ 1. **[Criterion]**: [PASS / FAIL] — [Detailed finding]
161
+ - What was expected: [...]
162
+ - What actually happened: [...]
163
+ - How to reproduce (if FAIL): [...]
164
+
165
+ ## Bug Report
166
+ [Each bug found, with reproduction steps]
167
+ 1. **[Bug Title]**: [Severity: Critical/Major/Minor]
168
+ - Steps to reproduce: [...]
169
+ - Expected behavior: [...]
170
+ - Actual behavior: [...]
171
+ - Location (if known): [file:line or UI location]
172
+
173
+ ## Scoring
174
+
175
+ ### Product Depth: [score]/10
176
+ [Detailed justification. Does the implementation go beyond surface-level?]
177
+
178
+ ### Functionality: [score]/10
179
+ [Detailed justification. What works? What doesn't?]
180
+
181
+ ### Visual Design: [score]/10
182
+ [Detailed justification. Follows design direction? Generic or distinctive?]
183
+
184
+ ### Code Quality: [score]/10
185
+ [Detailed justification. Maintainable? Any code smells?]
186
+
187
+ ### Weighted Total: [score]/10
188
+ [Calculated as: (ProductDepth * 2 + Functionality * 3 + VisualDesign * 2 + CodeQuality * 1) / 8]
189
+
190
+ ## Detailed Critique
191
+ [Paragraph-form assessment of the sprint's output. Be specific. Reference concrete examples.]
192
+
193
+ ## Required Fixes (if FAIL)
194
+ [Specific, actionable fixes the Generator must make for the sprint to pass]
195
+ 1. [Specific fix with location and expected behavior]
196
+ 2. [Specific fix with location and expected behavior]
197
+ ```
198
+
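The weighted total and the hard 4/10 threshold above can be computed mechanically. The following is an illustrative sketch, not part of the agent tooling; `gradeSprint` and its field names are hypothetical.

```javascript
// Illustrative scoring helper combining the weighted total with the
// hard-threshold rule. Weights follow the criteria above: product depth 2x,
// functionality 3x, visual design 2x, code quality 1x (weights sum to 8).
function gradeSprint({ depth, functionality, visual, codeQuality }) {
  const scores = [depth, functionality, visual, codeQuality];
  const weighted =
    (depth * 2 + functionality * 3 + visual * 2 + codeQuality * 1) / 8;
  // Any single dimension below 4/10 fails the sprint outright,
  // regardless of the weighted total.
  const verdict = scores.some((s) => s < 4) ? "FAIL" : "PASS";
  return { weighted, verdict };
}
```

For example, scores of 8/7/6/5 give a weighted total of 6.75 and pass, while 8/3/6/5 fails on the functionality threshold even though the weighted total is above 5.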
199
+ ### Phase 3: Re-Evaluation (if needed)
200
+
201
+ If the sprint failed and the Generator submitted fixes:
202
+
203
+ 1. Read `harness/sprint-status.md` to confirm this is a re-evaluation round.
204
+ 2. Read the updated `harness/handoff.md` describing what was fixed.
205
+ 3. Re-test ONLY the failed criteria and reported bugs.
206
+ 4. Write an updated evaluation to `harness/evaluation.md` (overwrite the previous one, increment the round number).
207
+ 5. Be fair but don't lower standards. If fixes don't genuinely resolve the issue, fail again.
208
+
209
+ ### Phase 4: Notify Generator of Fixes Needed
210
+
211
+ If you identify critical issues and want to request immediate fixes:
212
+
213
+ 1. After writing `harness/evaluation.md`, you can invoke `@generator` directly:
214
+ > Read harness/evaluation.md. Fix the issues listed under "Required Fixes". Update harness/handoff.md with what was fixed when done.
215
+ 2. Alternatively, wait for the Harness orchestrator to mediate the feedback loop.
216
+
217
+ ## Updating Sprint Status
218
+
219
+ After each evaluation, update `harness/sprint-status.md`:
220
+
221
+ ```markdown
222
+ # Sprint Status
223
+
224
+ ## Current Sprint: [N] — [Name]
225
+ ## Current Phase: evaluation
226
+ ## Contract Status: approved
227
+ ## Evaluation Status: [passed / failed (round X/3)]
228
+ ## Last Updated: [timestamp]
229
+ ## Notes: [brief summary of evaluation outcome]
230
+ ```
231
+
232
+ ## Evaluation Guidelines
233
+
234
+ - **Be specific**: "The sprite fill tool doesn't work" is bad. "The rectangle fill tool only places tiles at drag start/end points instead of filling the region" is good.
235
+ - **Reproduce before reporting**: Always verify a bug by reproducing it. Don't report things you can't confirm.
236
+ - **Test like a user, not like the developer**: The developer knows the "right" sequence of clicks. Test the intuitive paths, even if they're not the intended workflow.
237
+ - **Check data flows end-to-end**: If a feature creates data, verify that data shows up everywhere it should. If a feature modifies data, verify the change persists.
238
+ - **Don't skip the UI**: Even if the backend logic is correct, if the UI doesn't communicate state properly, that's a real problem.
239
+ - **Grade the right thing**: Product depth and functionality matter more than code prettiness. A working feature with messy code is better than a clean feature that doesn't work.
240
+ - **Call out AI slop**: Penalize generic patterns — purple gradients, default component styling, template layouts that look like every other AI-generated app.
241
+
242
+ ## Communication Style
243
+
244
+ - Be direct and specific. No hedging.
245
+ - If something is broken, say it's broken. Don't say it "could be improved."
246
+ - Provide reproduction steps for every bug.
247
+ - When something works well, acknowledge it briefly. Don't over-praise — your primary job is finding problems.
248
+ - Always update `harness/sprint-status.md` when you transition between phases.