codex-harness-engineering 0.1.4

@@ -0,0 +1,302 @@
+ # Harness Artifact Templates
+
+ Use these templates selectively. Do not create every artifact by default.
+
+ Each artifact must answer at least one question:
+
+ - What should the agent know?
+ - What state survives context loss?
+ - What can the agent observe?
+ - How does the agent verify work?
+ - What constraint is mechanically enforced?
+
+ ## Contents
+
+ - Minimal Repository Harness
+ - AGENTS.md
+ - progress.md
+ - feature_list.json
+ - init.sh
+ - Makefile
+ - Acceptance Contract
+ - Sprint Contract
+ - Evaluator Notes
+ - Legibility Map
+ - Cleanup Task
+
+ ## Minimal Repository Harness
+
+ Start here unless a named failure mode requires more.
+
+ ```text
+ AGENTS.md
+ README.md
+ progress.md
+ feature_list.json
+ init.sh
+ Makefile or task runner
+ tests/ or smoke test
+ ```
+
+ Optional only when needed:
+
+ ```text
+ docs/architecture.md
+ docs/product-spec.md
+ docs/tool-contracts.md
+ evals/
+ cleanup.md
+ ```
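
A quick way to see whether a repository already has the minimal harness is to check for the files listed above. The following is a minimal sketch, not part of the package: the function name `missing_harness_files` and its return shape are assumptions, and because the list allows "Makefile or task runner" and "tests/ or smoke test", the last two checks are only hints.

```python
from pathlib import Path

# File names come from the minimal harness list; everything else here
# (function name, return shape) is an illustrative assumption.
REQUIRED_FILES = ["AGENTS.md", "README.md", "progress.md", "feature_list.json", "init.sh"]


def missing_harness_files(repo_root: str) -> list[str]:
    """Return the minimal-harness entries that are absent under repo_root."""
    root = Path(repo_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # The list allows alternatives ("or task runner", "or smoke test"),
    # so these two entries are reported as hints rather than hard failures.
    if not (root / "Makefile").is_file():
        missing.append("Makefile (or another task runner)")
    if not (root / "tests").is_dir():
        missing.append("tests/ (or a smoke test)")
    return missing
```

An empty result means every minimal artifact is present. The optional `docs/`, `evals/`, and `cleanup.md` entries are deliberately not checked, since they should exist only when needed.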
+
+ ## AGENTS.md
+
+ ```markdown
+ # Agent Instructions
+
+ ## Start Here
+ 1. Read `README.md`.
+ 2. Read latest entries in `progress.md`.
+ 3. Check `feature_list.json`.
+ 4. Run `./init.sh` or the standard setup command.
+ 5. Run the cheapest smoke test before editing.
+
+ ## Commands
+ - Setup:
+ - Test:
+ - Lint:
+ - Build:
+ - Smoke:
+
+ ## Rules
+ - Keep changes scoped to the requested feature/fix.
+ - Update feature status only after verification passes.
+ - Record durable progress before ending a long session.
+ - Do not refactor unrelated code.
+ ```
+
+ ## progress.md
+
+ ```markdown
+ # Progress
+
+ ## YYYY-MM-DD
+
+ ### Context
+ - Task:
+ - Current branch:
+ - Relevant files:
+
+ ### Done
+ - ...
+
+ ### Verification
+ - Command:
+ - Result:
+
+ ### Open Issues
+ - ...
+
+ ### Next
+ - ...
+ ```
+
+ Keep entries short and recoverable. Prefer file paths, command names, failing
+ test names, and artifact paths over vague prose.
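
To keep entries uniform across sessions, an entry can be rendered from structured fields instead of being typed freehand. This is a sketch under assumptions: `render_progress_entry` and its parameters are hypothetical names, and only the section layout comes from the template above.

```python
from datetime import date
from typing import Optional


def _bullets(items: list[str]) -> str:
    """Render items as markdown bullets; fall back to the template placeholder."""
    return "\n".join(f"- {item}" for item in items) if items else "- ..."


def render_progress_entry(
    task: str,
    branch: str,
    files: list[str],
    done: list[str],
    command: str,
    result: str,
    open_issues: list[str],
    next_steps: list[str],
    day: Optional[date] = None,
) -> str:
    """Return one dated entry in the layout of the progress.md template."""
    heading = (day or date.today()).isoformat()  # YYYY-MM-DD
    return "\n".join([
        f"## {heading}",
        "",
        "### Context",
        f"- Task: {task}",
        f"- Current branch: {branch}",
        f"- Relevant files: {', '.join(files)}",
        "",
        "### Done",
        _bullets(done),
        "",
        "### Verification",
        f"- Command: {command}",
        f"- Result: {result}",
        "",
        "### Open Issues",
        _bullets(open_issues),
        "",
        "### Next",
        _bullets(next_steps),
        "",
    ])
```

Passing file paths and exact command names as arguments nudges the entry toward the recoverable detail the template asks for.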
+
+ ## feature_list.json
+
+ ```json
+ [
+   {
+     "id": "F001",
+     "title": "Feature or capability",
+     "status": "not_started",
+     "acceptance": [
+       "User can ...",
+       "System rejects ...",
+       "Regression check passes ..."
+     ],
+     "verify": [
+       "make test",
+       "make smoke"
+     ],
+     "evidence": []
+   }
+ ]
+ ```
+
+ Use status values consistently: `not_started`, `in_progress`, `blocked`,
+ `verified`. Only set `verified` after listed checks pass.
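
The status rule can be checked mechanically. The sketch below is illustrative only: `feature_list_problems` is a hypothetical helper, and treating a non-empty `evidence` list as proof that the listed checks passed is an assumption, since the template does not define what `evidence` must contain.

```python
import json

# The four status values named in the template.
ALLOWED_STATUSES = {"not_started", "in_progress", "blocked", "verified"}


def feature_list_problems(features: list[dict]) -> list[str]:
    """Return human-readable problems found in a parsed feature_list.json."""
    problems = []
    for feature in features:
        fid = feature.get("id", "<missing id>")
        status = feature.get("status")
        if status not in ALLOWED_STATUSES:
            problems.append(f"{fid}: unknown status {status!r}")
        # Assumption: a verified feature lists at least one check and
        # records at least one piece of evidence for it.
        if status == "verified":
            if not feature.get("verify"):
                problems.append(f"{fid}: verified but no verify commands listed")
            if not feature.get("evidence"):
                problems.append(f"{fid}: verified but no evidence recorded")
    return problems


def load_and_check(path: str) -> list[str]:
    """Read feature_list.json from path and return its problems."""
    with open(path, encoding="utf-8") as handle:
        return feature_list_problems(json.load(handle))
```

A check like this could run as part of a lint target so that a premature `verified` is caught before it misleads the next session.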
+
+ ## init.sh
+
+ ```bash
+ #!/usr/bin/env bash
+ set -euo pipefail
+
+ # Keep this script idempotent. It should be safe for a new session to run first.
+ make setup
+ make smoke
+ ```
+
+ ## Makefile
+
+ ```makefile
+ .PHONY: setup test lint build smoke verify
+
+ setup:
+ # install dependencies or prepare local environment
+
+ test:
+ # run unit tests
+
+ lint:
+ # run lint or structural checks
+
+ build:
+ # run build
+
+ smoke:
+ # run the cheapest end-to-end confidence check
+
+ verify: lint test build smoke
+ ```
+
+ Keep command names stable. Agent instructions should point to these targets
+ instead of repeating long command lines across files.
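
One way to keep instructions and targets in sync is to confirm that every target named elsewhere (for example in `AGENTS.md` or in a feature's `verify` list) is actually defined. The following is a rough sketch, assuming simple explicit rules like those in the template; `parse_targets` and `undefined_targets` are hypothetical helpers and do not handle pattern rules, includes, or variables.

```python
import re

# Matches a simple rule line at column 0, e.g. "verify: lint test build smoke".
# The (?!=) lookahead skips ":=" variable assignments.
RULE = re.compile(r"^([A-Za-z0-9_.-]+)\s*:(?!=)\s*(.*)$")


def parse_targets(makefile_text: str) -> dict[str, list[str]]:
    """Map each explicit target to its prerequisites (simple rules only)."""
    targets = {}
    for line in makefile_text.splitlines():
        if line.startswith(("\t", "#")):
            continue  # skip recipe lines and comments
        match = RULE.match(line)
        # Names starting with "." (such as .PHONY) are special targets.
        if match and not match.group(1).startswith("."):
            targets[match.group(1)] = match.group(2).split()
    return targets


def undefined_targets(makefile_text: str, referenced: list[str]) -> list[str]:
    """Return referenced target names that the Makefile does not define."""
    targets = parse_targets(makefile_text)
    return [name for name in referenced if name not in targets]
```

Run against the template, `parse_targets` would report `verify` as depending on `lint`, `test`, `build`, and `smoke`, which is the aggregate check an agent can call once.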
+
+ ## Acceptance Contract
+
+ Use this for a small bug or feature when planner/evaluator would be too much.
+
+ ```markdown
+ # Acceptance Contract
+
+ ## Scope
+ - Feature/fix:
+ - User-visible behavior:
+ - Likely files:
+
+ ## Acceptance Criteria
+ - [ ] ...
+ - [ ] ...
+
+ ## Verification
+ - Unit:
+ - Integration:
+ - Browser/API:
+ - Log/metric/trace:
+
+ ## Out of Scope
+ - ...
+ ```
+
+ ## Sprint Contract
+
+ Use this when work spans multiple files, runtime behavior, or subjective quality.
+
+ ```markdown
+ # Sprint Contract
+
+ ## Scope
+ - Feature:
+ - User path:
+ - API/data path:
+ - Likely files/modules:
+
+ ## Done Means
+ - [ ] User can ...
+ - [ ] API or data reflects ...
+ - [ ] Error state handles ...
+ - [ ] No regression in ...
+
+ ## Verification
+ - Unit:
+ - Integration:
+ - Browser/API:
+ - Log/metric/trace:
+
+ ## Evaluator Focus
+ - Runtime behavior:
+ - Negative cases:
+ - UX or quality concerns:
+
+ ## Out of Scope
+ - ...
+ ```
+
+ If the sprint contract becomes longer than the work, split the work or fall back
+ to a smaller acceptance contract.
+
+ ## Evaluator Notes
+
+ Use this when generator self-review is not enough.
+
+ ```markdown
+ # Evaluator Notes
+
+ ## Contract
+ - Sprint:
+ - Expected behavior:
+
+ ## Checks Run
+ - Command/check:
+ - Result:
+ - Artifact:
+
+ ## Findings
+ - [ ] P0/P1/P2:
+   - Evidence:
+   - Repro:
+   - Suggested next step:
+
+ ## Verdict
+ - pass/fail:
+ - Reason:
+ ```
+
+ Evaluator feedback should cite observed evidence: screenshots, DOM state, API
+ response, database state, logs, traces, or command output.
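
The evidence requirement can be made checkable if findings are held as structured records before being written into the notes. This is a sketch under stated assumptions: the `Finding` class and both helpers are hypothetical, and the rule that any P0 or P1 finding yields a `fail` verdict is an assumption, because the template does not say how priorities map to the verdict.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    priority: str       # "P0", "P1", or "P2", as in the template
    summary: str
    evidence: str = ""  # e.g. an artifact path, log excerpt, or command output
    repro: str = ""


def uncited_findings(findings: list[Finding]) -> list[Finding]:
    """Return findings that cite no observed evidence."""
    return [f for f in findings if not f.evidence.strip()]


def verdict(findings: list[Finding], blocking: tuple[str, ...] = ("P0", "P1")) -> str:
    """Return "fail" if any finding has a blocking priority, else "pass".

    Assumption: P0 and P1 block; adjust `blocking` to match local policy.
    """
    return "fail" if any(f.priority in blocking for f in findings) else "pass"
```

Findings returned by `uncited_findings` are candidates to send back to the evaluator for a screenshot, log line, or command output before they are acted on.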
+
+ ## Legibility Map
+
+ Use this when the agent cannot see enough runtime behavior.
+
+ ```markdown
+ # Legibility Map
+
+ | Area | Signal | How to collect | Owner/check |
+ | --- | --- | --- | --- |
+ | UI | Screenshot/DOM | | |
+ | API | Request/response | | |
+ | Backend runtime | Structured log/trace | | |
+ | Data | Schema/query/seed | | |
+ | Build | Build log/CI log | | |
+ | Architecture | Lint/structural test | | |
+ ```
+
+ ## Cleanup Task
+
+ Use this when agent throughput creates repeated drift.
+
+ ```markdown
+ # Cleanup Task
+
+ ## Trigger
+ - Repeated pattern:
+ - Evidence:
+
+ ## Scope
+ - Include:
+ - Exclude:
+
+ ## Acceptance Criteria
+ - [ ] ...
+
+ ## Verification
+ - Lint:
+ - Test:
+ - Smoke:
+
+ ## Rollback
+ - ...
+ ```