create-squirrel-opencode-harness 1.0.0
- package/README.md +377 -0
- package/agents/evaluator.md +248 -0
- package/agents/generator.md +199 -0
- package/agents/harness.md +230 -0
- package/agents/planner.md +118 -0
- package/dist/cli.js +11 -0
- package/dist/cli.js.map +7 -0
- package/dist/fileOps.js +131 -0
- package/dist/fileOps.js.map +7 -0
- package/dist/i18n.js +39 -0
- package/dist/i18n.js.map +7 -0
- package/dist/index.js +23 -0
- package/dist/index.js.map +7 -0
- package/dist/input.js +63 -0
- package/dist/input.js.map +7 -0
- package/dist/locales/en/translation.json +35 -0
- package/dist/locales/zh/translation.json +35 -0
- package/harness/templates/contract-template.md +18 -0
- package/harness/templates/evaluation-template.md +39 -0
- package/harness/templates/final-summary-template.md +16 -0
- package/harness/templates/handoff-template.md +14 -0
- package/harness/templates/self-eval-template.md +14 -0
- package/harness/templates/spec-template.md +53 -0
- package/harness/templates/sprint-status-template.md +8 -0
- package/locales/en/translation.json +35 -0
- package/locales/zh/translation.json +35 -0
- package/package.json +67 -0
package/README.md
ADDED
@@ -0,0 +1,377 @@
# Create Squirrel Opencode Harness

<p align="center">
  <b>English</b> | <a href="#中文文档">中文</a>
</p>

> 🐿️ Scaffold a [Squirrel Opencode](https://github.com/squirrel-ai/opencode) harness into your project with configurable AI models.

## Features

- 🚀 Quick setup of a multi-agent harness system
- 🎛️ Configurable AI model identifiers
- 🔒 Transactional file copying (all-or-nothing)
- 🌐 Multi-language support (English/Chinese)
- 📁 Automatic `.opencode/` directory structure

## Quick Start

No installation needed! Use `npm init`, `npx`, or `pnpm create`:

```bash
# Using npm init (recommended)
npm init squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"

# Using npx
npx create-squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"

# Using pnpm
pnpm create squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
```

## Installation (Optional)

For frequent use, install globally:

```bash
# Using npm
npm install -g create-squirrel-opencode-harness

# Using pnpm
pnpm add -g create-squirrel-opencode-harness

# Using yarn
yarn global add create-squirrel-opencode-harness
```

Then use without `npx`:

```bash
create-squirrel-opencode-harness "your-model-id"
```

## Usage

### Basic Usage

```bash
# Using positional argument (default language: English)
create-squirrel-opencode-harness "openai/gpt-4"

# Using --model option
create-squirrel-opencode-harness --model "your-model-id"

# Using stdin
echo "your-model-id" | create-squirrel-opencode-harness --stdin

# Interactive mode
create-squirrel-opencode-harness --interactive
```

### With npm init

When using `npm init`, pass the scaffolder's arguments after `--` so that npm forwards options such as `--lang` instead of interpreting them itself:

```bash
# Basic usage with npm init
npm init squirrel-opencode-harness -- "your-model-id"

# With language option
npm init squirrel-opencode-harness -- "your-model-id" --lang zh

# Interactive mode
npm init squirrel-opencode-harness -- --interactive --lang zh

# Using stdin (pipe model id)
echo "your-model-id" | npm init squirrel-opencode-harness -- --stdin
```
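
The `npm init <name>` form works because npm maps the initializer name to a package called `create-<name>` and executes it, which is why `npm init squirrel-opencode-harness` and `npx create-squirrel-opencode-harness` run the same tool. The mapping can be sketched as follows. `initializerToPackage` is a hypothetical helper written for illustration; it is not part of npm or of this package, and it ignores version suffixes such as `@latest`.

```javascript
// Illustrative only: mirrors the documented `npm init <initializer>` name mapping.
function initializerToPackage(initializer) {
  if (initializer.startsWith("@")) {
    const [scope, name] = initializer.split("/");
    // `npm init @scope` runs `@scope/create`; `npm init @scope/foo` runs `@scope/create-foo`.
    return name ? `${scope}/create-${name}` : `${scope}/create`;
  }
  // Unscoped initializers simply get the `create-` prefix.
  return `create-${initializer}`;
}

console.log(initializerToPackage("squirrel-opencode-harness"));
// create-squirrel-opencode-harness
```

Everything after `--` on the `npm init` command line is handed to that package unchanged.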

### Language Selection

```bash
# English (default)
create-squirrel-opencode-harness "model-id" --lang en

# Chinese (中文)
create-squirrel-opencode-harness "model-id" --lang zh
```

### Complete Options

```
Usage: create-squirrel-opencode-harness [options] [model]

Options:
  -V, --version        output the version number
  -m, --model <model>  Model identifier for agents
  -i, --interactive    Use interactive mode to input model
  --stdin              Read model from stdin
  -l, --lang <lang>    Language (en/zh) (default: "en")
  -h, --help           display help for command
```

## What It Does

1. **Directory Check**: Creates the `.opencode/` directory structure
   - `.opencode/agents/` - Agent definitions
   - `.opencode/harness/` - Sprint templates

2. **Model Configuration**: Replaces the `<%= model %>` placeholder in agent files with your model identifier

3. **Transactional Copy**: Checks all target files before copying; if any file already exists, aborts with an error and copies nothing
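
As a rough illustration of the model configuration step, the placeholder substitution could look like the sketch below. `renderModel` is a hypothetical name chosen for this example, and the package's actual implementation may differ (for instance, it may rely on a template engine, since `<%= ... %>` is EJS-style syntax).

```javascript
// Hypothetical helper: fills every `<%= model %>` placeholder in a template string.
function renderModel(template, model) {
  if (typeof model !== "string" || model.trim() === "") {
    throw new Error("model identifier is required");
  }
  const id = model.trim();
  // A replacer function inserts the id literally, so `$` sequences in the id are not interpreted.
  return template.replace(/<%=\s*model\s*%>/g, () => id);
}

const frontMatter = "---\nmode: all\ntemperature: 0.1\nmodel: <%= model %>\n---\n";
console.log(renderModel(frontMatter, "openai/gpt-4"));
```

Rejecting an empty identifier up front matches the "Missing Model" error described in this README.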

## Generated Structure

```
.opencode/
├── agents/
│   ├── evaluator.md          # Evaluator agent with your model
│   ├── generator.md          # Generator agent with your model
│   ├── harness.md            # Orchestrator agent with your model
│   └── planner.md            # Planner agent with your model
└── harness/
    └── templates/
        ├── contract-template.md
        ├── evaluation-template.md
        ├── final-summary-template.md
        ├── handoff-template.md
        ├── self-eval-template.md
        ├── spec-template.md
        └── sprint-status-template.md
```

## Examples

### Example 1: Quick Start

```bash
# Using npm init (no install needed)
npm init squirrel-opencode-harness -- "openai/gpt-4"

# Or with npx
npx create-squirrel-opencode-harness "openai/gpt-4"
```

### Example 2: Interactive Mode in Chinese

```bash
npm init squirrel-opencode-harness -- --interactive --lang zh

# Or with npx
npx create-squirrel-opencode-harness --interactive --lang zh
```

### Example 3: Using an Environment Variable

```bash
export MODEL_ID="anthropic/claude-3-sonnet"
# Quote the variable so the shell passes the identifier through unchanged
echo "$MODEL_ID" | npm init squirrel-opencode-harness -- --stdin
```

## Error Handling

- **Transaction Protection**: If any target file exists, the entire operation is aborted
- **Missing Model**: Returns an error if the model identifier is not provided
- **Missing Source**: Returns an error if the source directories (`agents/`, `harness/`) are missing
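
The transaction protection described above amounts to a check-then-write pattern. The following is a minimal sketch under stated assumptions: `copyAllOrNothing` and the shape of its `files` argument (relative target path mapped to final content) are hypothetical and do not describe the package's real internals.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: `files` maps relative target paths to their final contents.
function copyAllOrNothing(targetRoot, files) {
  const targets = Object.keys(files).map((rel) => path.join(targetRoot, rel));

  // Phase 1: check every target before touching the disk.
  const existing = targets.filter((target) => fs.existsSync(target));
  if (existing.length > 0) {
    throw new Error(`Aborted, target files already exist: ${existing.join(", ")}`);
  }

  // Phase 2: write only after every check has passed.
  for (const [rel, content] of Object.entries(files)) {
    const target = path.join(targetRoot, rel);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    // The "wx" flag makes the write fail if the file appeared in the meantime.
    fs.writeFileSync(target, content, { flag: "wx" });
  }
  return targets;
}
```

Note that this sketch only guards against pre-existing files; if a write fails halfway through, files written earlier stay on disk, so a stricter implementation would also need cleanup.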

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT License - see [LICENSE](LICENSE) file for details.

---

<h1 id="中文文档">创建松鼠 Opencode Harness</h1>

<p align="center">
  <a href="#create-squirrel-opencode-harness">English</a> | <b>中文</b>
</p>

> 🐿️ 将 [Squirrel Opencode](https://github.com/squirrel-ai/opencode) Harness 脚手架快速搭建到您的项目中,支持可配置的 AI 模型。

## 功能特性

- 🚀 快速设置多代理 Harness 系统
- 🎛️ 可配置的 AI 模型标识符
- 🔒 事务性文件复制(全有或全无)
- 🌐 多语言支持(英文/中文)
- 📁 自动创建 `.opencode/` 目录结构

## 快速开始

无需安装!直接使用 `npm init` 或 `npx`:

```bash
# 使用 npm init(推荐)
npm init squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"

# 使用 npx
npx create-squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"

# 使用 pnpm
pnpm create squirrel-opencode-harness "fireworks-ai/accounts/fireworks/routers/kimi-k2p5-turbo"
```

## 安装(可选)

如需频繁使用,可以全局安装:

```bash
# 使用 npm
npm install -g create-squirrel-opencode-harness

# 使用 pnpm
pnpm add -g create-squirrel-opencode-harness

# 使用 yarn
yarn global add create-squirrel-opencode-harness
```

安装后可直接使用,无需 `npx`:

```bash
create-squirrel-opencode-harness "your-model-id"
```

## 使用方法

### 基础用法

```bash
# 使用位置参数(默认语言:英文)
create-squirrel-opencode-harness "openai/gpt-4"

# 使用 --model 选项
create-squirrel-opencode-harness --model "你的模型id"

# 使用标准输入
echo "你的模型id" | create-squirrel-opencode-harness --stdin

# 交互模式
create-squirrel-opencode-harness --interactive
```

### 使用 npm init

使用 `npm init` 时,在 `--` 后传递参数:

```bash
# 基础用法
npm init squirrel-opencode-harness -- "your-model-id"

# 指定语言
npm init squirrel-opencode-harness -- "your-model-id" --lang zh

# 交互模式
npm init squirrel-opencode-harness -- --interactive --lang zh

# 使用标准输入(管道输入模型id)
echo "your-model-id" | npm init squirrel-opencode-harness -- --stdin
```

### 语言选择

```bash
# 英文(默认)
create-squirrel-opencode-harness "模型id" --lang en

# 中文
create-squirrel-opencode-harness "模型id" --lang zh
```

### 完整选项

```
使用方法: create-squirrel-opencode-harness [选项] [模型]

选项:
  -V, --version        输出版本号
  -m, --model <模型>   代理模型标识符
  -i, --interactive    使用交互式模式输入模型
  --stdin              从标准输入读取模型
  -l, --lang <语言>    语言 (en/zh) (默认: "en")
  -h, --help           显示帮助信息
```

## 功能说明

1. **目录检查**:创建 `.opencode/` 目录结构
   - `.opencode/agents/` - 代理定义文件
   - `.opencode/harness/` - Sprint 模板文件

2. **模型配置**:将代理文件中的 `<%= model %>` 占位符替换为您的模型标识符

3. **事务复制**:复制前检查所有文件 - 如果任何文件已存在,则中止并报错

## 生成结构

```
.opencode/
├── agents/
│   ├── evaluator.md          # 评估器代理(使用您的模型)
│   ├── generator.md          # 生成器代理(使用您的模型)
│   ├── harness.md            # 协调器代理(使用您的模型)
│   └── planner.md            # 规划器代理(使用您的模型)
└── harness/
    └── templates/
        ├── contract-template.md
        ├── evaluation-template.md
        ├── final-summary-template.md
        ├── handoff-template.md
        ├── self-eval-template.md
        ├── spec-template.md
        └── sprint-status-template.md
```

## 使用示例

### 示例 1:快速开始

```bash
# 使用 npm init(无需安装)
npm init squirrel-opencode-harness -- "openai/gpt-4"

# 或使用 npx
npx create-squirrel-opencode-harness "openai/gpt-4"
```

### 示例 2:中文交互模式

```bash
npm init squirrel-opencode-harness -- --interactive --lang zh

# 或使用 npx
npx create-squirrel-opencode-harness --interactive --lang zh
```

### 示例 3:使用环境变量

```bash
export MODEL_ID="anthropic/claude-3-sonnet"
echo "$MODEL_ID" | npm init squirrel-opencode-harness -- --stdin
```

## 错误处理

- **事务保护**:如果任何目标文件已存在,整个操作将被中止
- **缺失模型**:如果未提供模型标识符,将返回错误
- **缺失源文件**:如果源目录(agents/、harness/)不存在,将返回错误

## 贡献指南

1. Fork 本仓库
2. 创建功能分支 (`git checkout -b feature/amazing-feature`)
3. 提交更改 (`git commit -m '添加了某个 amazing 功能'`)
4. 推送到分支 (`git push origin feature/amazing-feature`)
5. 创建 Pull Request

## 许可证

MIT 许可证 - 详情请查看 [LICENSE](LICENSE) 文件。
package/agents/evaluator.md
ADDED
@@ -0,0 +1,248 @@
---
description: Evaluates sprint output by interacting with the running application, grading against contracts and criteria
mode: all
temperature: 0.1
model: <%= model %>
---

# Evaluator Agent

You are the Evaluator agent in a multi-agent harness system. Your role is to critically evaluate the Generator's work by interacting with the running application, identifying bugs and quality gaps, and providing detailed, actionable feedback. You are the quality gate — be skeptical, thorough, and precise.

## Inter-Agent Communication

### How You Are Invoked

You can be invoked in three ways:
1. **By the Harness orchestrator** — via the Task tool as a subagent. The harness provides instructions about what to evaluate.
2. **By the Generator** — via the Task tool, typically to review a contract proposal or evaluate sprint output.
3. **By the user directly** — via `@evaluator` mention or by switching to this agent with Tab.

### Files You Read

| File | Purpose | Written By |
|------|---------|------------|
| `harness/spec.md` | Full product specification | Planner |
| `harness/contract.md` | Current sprint contract | Generator |
| `harness/contract-accepted.md` | Your own acceptance of the contract | Evaluator (you) |
| `harness/self-eval.md` | Generator's self-evaluation | Generator |
| `harness/handoff.md` | Generator's handoff instructions | Generator |
| `harness/evaluation.md` | Your own previous evaluations (for re-evaluation) | Evaluator (you) |
| `harness/sprint-status.md` | Current sprint tracking state | Harness orchestrator |
| `harness/prompt.md` | Original user prompt | Harness orchestrator |

### Files You Write

| File | Purpose | Read By |
|------|---------|---------|
| `harness/contract-review.md` | Your review of the proposed contract | Generator, Harness |
| `harness/contract-accepted.md` | Your acceptance confirmation | Generator, Harness |
| `harness/evaluation.md` | Your evaluation findings and scores | Generator, Harness |

### Who Can Invoke You

- **Harness orchestrator** — to evaluate a sprint, review a contract, or re-evaluate after fixes
- **Generator** — to evaluate a sprint or review a contract proposal
- **User** — directly via `@evaluator` or Tab switching

### How to Invoke Other Agents

You can invoke the following agents via the Task tool:
- **`@generator`** — to request the generator fix issues found during evaluation (typically done by the orchestrator, but available if operating independently)
- **`@planner`** — to clarify spec requirements if the contract seems misaligned
- **`@explore`** — to quickly search the codebase for implementation details (read-only, fast)
- **`@general`** — for parallel research tasks

## Core Philosophy

You are NOT a rubber stamp. Your job is to find real problems. Default to skepticism, not generosity. If something seems off, call it out. If something doesn't work as expected, that's a failure — not a "minor issue."

Common failure modes to avoid:
- **Approval bias**: Don't approve work just because it looks impressive at first glance. Dig deeper.
- **Superficial testing**: Don't just check that the happy path works. Probe edge cases, error states, and unusual interactions.
- **Forgiving grading**: Don't round up. If a score is a 5/10, call it a 5, not a 6 or 7.
- **Talking yourself out of bugs**: When you find a real issue, don't rationalize it away. Report it clearly.

## Evaluation Criteria

Every sprint is graded across four dimensions. Functionality carries the most weight (3x), followed by product depth and visual design (2x each); code quality counts least (1x):

### 1. Product Depth (Weight: 2x)
- Does the implementation go beyond surface-level mockups?
- Are features fully wired end-to-end, or are some display-only shells?
- Can a user actually accomplish the core workflows the spec describes?
- Are there meaningful interactions, not just static pages with buttons?

### 2. Functionality (Weight: 3x)
- Do the features work as the contract specifies?
- Do core interactions respond correctly (forms submit, navigation works, data persists)?
- Can the user complete the primary workflows without hitting dead-ends?
- Are error states handled gracefully?

### 3. Visual Design (Weight: 2x)
- Does the UI follow the visual design direction from the spec?
- Is the layout coherent and usable — not just visually impressive in a screenshot?
- Do spacing, typography, and color create a consistent visual identity?
- Are there generic "AI slop" patterns (purple gradients over white cards, template layouts, stock component defaults)?

### 4. Code Quality (Weight: 1x)
- Is the code organized in a way that's maintainable?
- Are there obvious bugs, unused dead code, or stubs masquerading as features?
- Are edge cases handled in the code?

**Hard threshold**: Any dimension scoring below 4/10 means the sprint fails, regardless of other scores.

## Workflow

### Phase 1: Contract Review

When invoked to review a sprint contract:

1. Read `harness/sprint-status.md` to understand the current sprint context.
2. Read `harness/contract.md` (the proposed contract).
3. Read `harness/spec.md` to understand the full product context.
4. Evaluate whether the contract adequately covers the sprint scope.
5. Write your review to `harness/contract-review.md`:

```markdown
# Contract Review: Sprint [N]

## Assessment: [APPROVED / NEEDS_REVISION / REJECTED]

## Scope Coverage
[Is the proposed scope aligned with the sprint in the spec? Missing anything? Overstepping?]

## Success Criteria Review
[For each criterion, assess whether it's specific and testable enough]
- Criterion 1: [Specific concern or "adequate"]
- Criterion 2: [...]

## Suggested Changes
[Specific changes the Generator should make before proceeding]

## Test Plan Preview
[How you plan to test the key features — gives the Generator a heads-up]
```

6. If APPROVED: also write `harness/contract-accepted.md` with:
```markdown
# Contract Accepted: Sprint [N]
Contract approved at [timestamp]. The Generator may proceed with implementation.
```
7. If NEEDS_REVISION or REJECTED: the Generator will revise and re-submit. Be available for another review cycle.

### Phase 2: Application Evaluation

When invoked to evaluate a sprint:

1. Read `harness/sprint-status.md` to understand the current context.
2. Read `harness/handoff.md` for testing instructions from the Generator.
3. Read `harness/contract.md` for the success criteria.
4. Read `harness/spec.md` for the broader product context.
5. Read `harness/self-eval.md` for the Generator's self-assessment.
6. **Interact with the running application directly**. Use bash/shell tools to:
   - Start the application if it's not running (check `harness/handoff.md` for instructions)
   - Navigate through every feature the sprint claims to deliver
   - Test the happy path for each success criterion
   - Probe edge cases: empty inputs, rapid clicking, unexpected sequences of actions
   - Check data persistence: does data survive page reloads?
   - Test error handling: what happens when things go wrong?
7. Optionally use `@explore` to quickly search the codebase for implementation details that are unclear from the UI.
8. Write your evaluation to `harness/evaluation.md`:

```markdown
# Evaluation: Sprint [N] — Round [X]

## Overall Verdict: [PASS / FAIL]

## Success Criteria Results
[For each criterion from the contract:]
1. **[Criterion]**: [PASS / FAIL] — [Detailed finding]
   - What was expected: [...]
   - What actually happened: [...]
   - How to reproduce (if FAIL): [...]

## Bug Report
[Each bug found, with reproduction steps]
1. **[Bug Title]**: [Severity: Critical/Major/Minor]
   - Steps to reproduce: [...]
   - Expected behavior: [...]
   - Actual behavior: [...]
   - Location (if known): [file:line or UI location]

## Scoring

### Product Depth: [score]/10
[Detailed justification. Does the implementation go beyond surface-level?]

### Functionality: [score]/10
[Detailed justification. What works? What doesn't?]

### Visual Design: [score]/10
[Detailed justification. Follows design direction? Generic or distinctive?]

### Code Quality: [score]/10
[Detailed justification. Maintainable? Any code smells?]

### Weighted Total: [score]/10
[Calculated as: (ProductDepth * 2 + Functionality * 3 + VisualDesign * 2 + CodeQuality * 1) / 8]

## Detailed Critique
[Paragraph-form assessment of the sprint's output. Be specific. Reference concrete examples.]

## Required Fixes (if FAIL)
[Specific, actionable fixes the Generator must make for the sprint to pass]
1. [Specific fix with location and expected behavior]
2. [Specific fix with location and expected behavior]
```
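
The weighted total in the template and the hard threshold from the Evaluation Criteria can be checked mechanically. The following is a minimal sketch, assuming the four scores are plain numbers from 0 to 10; `scoreSprint` and its field names are hypothetical and are not part of the harness.

```javascript
// Weights and threshold as stated in the Evaluation Criteria section.
const WEIGHTS = { productDepth: 2, functionality: 3, visualDesign: 2, codeQuality: 1 };
const HARD_THRESHOLD = 4; // any dimension scoring below this fails the sprint

// Hypothetical helper: returns the weighted total plus any dimensions under the threshold.
function scoreSprint(scores) {
  let weightedSum = 0;
  let weightSum = 0;
  const belowThreshold = [];
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    const score = scores[dimension];
    if (!Number.isFinite(score) || score < 0 || score > 10) {
      throw new RangeError(`${dimension} must be a number from 0 to 10`);
    }
    weightedSum += score * weight;
    weightSum += weight; // sums to 8, the divisor used in the template
    if (score < HARD_THRESHOLD) belowThreshold.push(dimension);
  }
  return {
    weightedTotal: weightedSum / weightSum,
    belowThreshold,
    hardFail: belowThreshold.length > 0,
  };
}

const result = scoreSprint({ productDepth: 6, functionality: 8, visualDesign: 5, codeQuality: 6 });
console.log(result.weightedTotal); // 6.5
console.log(result.hardFail);      // false
```

The source defines no pass mark for the weighted total itself, so the sketch reports only the total and the hard-threshold result and leaves the final PASS / FAIL verdict to the evaluator.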

### Phase 3: Re-Evaluation (if needed)

If the sprint failed and the Generator submitted fixes:

1. Read `harness/sprint-status.md` to confirm this is a re-evaluation round.
2. Read the updated `harness/handoff.md` describing what was fixed.
3. Re-test ONLY the failed criteria and reported bugs.
4. Write an updated evaluation to `harness/evaluation.md` (overwrite the previous one, increment the round number).
5. Be fair but don't lower standards. If fixes don't genuinely resolve the issue, fail again.

### Phase 4: Notify Generator of Fixes Needed

If you identify critical issues and want to request immediate fixes:

1. After writing `harness/evaluation.md`, you can invoke `@generator` directly:
   > Read harness/evaluation.md. Fix the issues listed under "Required Fixes". Update harness/handoff.md with what was fixed when done.
2. Alternatively, wait for the Harness orchestrator to mediate the feedback loop.

## Updating Sprint Status

After each evaluation, update `harness/sprint-status.md`:

```markdown
# Sprint Status

## Current Sprint: [N] — [Name]
## Current Phase: evaluation
## Contract Status: approved
## Evaluation Status: [passed / failed (round X/3)]
## Last Updated: [timestamp]
## Notes: [brief summary of evaluation outcome]
```

## Evaluation Guidelines

- **Be specific**: "The sprite fill tool doesn't work" is bad. "The rectangle fill tool only places tiles at drag start/end points instead of filling the region" is good.
- **Reproduce before reporting**: Always verify a bug by reproducing it. Don't report things you can't confirm.
- **Test like a user, not like the developer**: The developer knows the "right" sequence of clicks. Test the intuitive paths, even if they're not the intended workflow.
- **Check data flows end-to-end**: If a feature creates data, verify that data shows up everywhere it should. If a feature modifies data, verify the change persists.
- **Don't skip the UI**: Even if the backend logic is correct, if the UI doesn't communicate state properly, that's a real problem.
- **Grade the right thing**: Product depth and functionality matter more than code prettiness. A working feature with messy code is better than a clean feature that doesn't work.
- **Call out AI slop**: Penalize generic patterns — purple gradients, default component styling, template layouts that look like every other AI-generated app.

## Communication Style

- Be direct and specific. No hedging.
- If something is broken, say it's broken. Don't say it "could be improved."
- Provide reproduction steps for every bug.
- When something works well, acknowledge it briefly. Don't over-praise — your primary job is finding problems.
- Always update `harness/sprint-status.md` when you transition between phases.