@glrs-dev/harness-plugin-opencode 1.2.0 → 2.0.1
- package/CHANGELOG.md +36 -0
- package/README.md +68 -26
- package/dist/agents/prompts/pilot-assessor.md +77 -0
- package/dist/agents/prompts/pilot-builder.md +24 -116
- package/dist/agents/prompts/pilot-planner.md +38 -160
- package/dist/agents/prompts/pilot-scoper.md +58 -0
- package/dist/{chunk-BWERBERN.js → chunk-6CZPRUMJ.js} +12 -62
- package/dist/chunk-DZG4D3OH.js +54 -0
- package/dist/chunk-OYRKOEXK.js +88 -0
- package/dist/cli.js +1631 -4224
- package/dist/index.js +831 -166
- package/dist/{install-5JKWK6Z4.js → install-6775ZBDG.js} +1 -1
- package/dist/paths-WZ23ZQOV.js +18 -0
- package/package.json +1 -1
- package/dist/agents/prompts/pilot-builder.open.md +0 -129
- package/dist/chunk-57EOY72Y.js +0 -174
- package/dist/chunk-5TAMY7P6.js +0 -67
- package/dist/chunk-BKTFWXLG.js +0 -204
- package/dist/chunk-EK7K4NTV.js +0 -747
- package/dist/chunk-KB7M7JXU.js +0 -145
- package/dist/chunk-RNRCXQ65.js +0 -56
- package/dist/paths-LT3QQKCF.js +0 -18
- package/dist/pilot/mcp/status-server.d.ts +0 -1
- package/dist/pilot/mcp/status-server.js +0 -228
- package/dist/pilot-config-7LJZ23YK.js +0 -55
- package/dist/runs-QWPL3TKV.js +0 -18
- package/dist/safety-gate-WM3EWOCY.js +0 -10
- package/dist/setup-hook-FHTXMAQL.js +0 -88
- package/dist/skills/pilot-planning/SKILL.md +0 -80
- package/dist/skills/pilot-planning/rules/dag-shape.md +0 -47
- package/dist/skills/pilot-planning/rules/decomposition.md +0 -63
- package/dist/skills/pilot-planning/rules/first-principles.md +0 -29
- package/dist/skills/pilot-planning/rules/milestones.md +0 -57
- package/dist/skills/pilot-planning/rules/qa-expectations.md +0 -120
- package/dist/skills/pilot-planning/rules/self-review.md +0 -46
- package/dist/skills/pilot-planning/rules/task-context.md +0 -47
- package/dist/skills/pilot-planning/rules/touches-scope.md +0 -81
- package/dist/skills/pilot-planning/rules/verify-design.md +0 -163
- package/dist/tasks-KJ3WN2KY.js +0 -32
|
@@ -1,163 +0,0 @@
|
|
|
1
|
-
# Rule 3 — Verify-command design
|
|
2
|
-
|
|
3
|
-
**Each task's `verify:` commands must succeed iff the task is correctly done.**
|
|
4
|
-
|
|
5
|
-
The verify list is the contract between the planner and the builder. It is the ONLY signal pilot uses to decide "did this task work?". A weak verify means you're shipping work the run thinks is fine but really isn't. An over-broad verify means the task fails for reasons unrelated to the work — pre-existing test failures, missing infrastructure, flaky integration tests — and the agent wastes its retry budget on something it can't fix.
|
|
6
|
-
|
|
7
|
-
## The cardinal rule: verify ONLY what the task changed
|
|
8
|
-
|
|
9
|
-
A verify command must exercise **exactly the code the task produced** — no more, no less. If the task adds `src/entities/audit-log/schema.ts` and its test file, the verify is:
|
|
10
|
-
```yaml
verify:
  - pnpm --filter @kn/core test -- --run src/entities/audit-log/__tests__/schema.test.ts
```
|
|
15
|
-
|
|
16
|
-
NOT:
|
|
17
|
-
```yaml
verify:
  - pnpm --filter @kn/core test -- --run src/entities/audit-log
```
|
|
22
|
-
|
|
23
|
-
The second form runs EVERY test under that directory — including integration tests that need a running database, tests for pre-existing code the task didn't touch, and tests that may already be failing on the base branch. The agent cannot fix those failures. It will exhaust its retry budget and STOP.
|
|
24
|
-
|
|
25
|
-
**The verify command's scope must be as tight as the `touches:` scope.** If you wouldn't put a file in `touches:`, don't let the verify command exercise it.
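The scoping rule above could be checked mechanically before a plan is run. The following is a minimal sketch, not part of pilot: `isFileScopedVerify` and its test-file pattern are hypothetical, and the whitespace tokenizer only handles simple commands like the ones shown in this rule.

```javascript
// Hypothetical lint helper (an assumption, not a pilot API).
// Matches names such as schema.test.ts, rules.spec.tsx, foo.test.mjs.
const TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;

// Returns true when at least one argument names a single test file.
// Directory-level or whole-package commands return false.
function isFileScopedVerify(command) {
  const tokens = command.trim().split(/\s+/);
  return tokens.some((token) => TEST_FILE.test(token));
}
```

A command such as `bun run typecheck` also returns `false` here, which is expected: it belongs in `verify_after_each`, not in a task's own `verify:` list. The helper does not catch a command that mixes a test file with a directory argument.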
|
|
26
|
-
|
|
27
|
-
## What a good verify looks like
|
|
28
|
-
|
|
29
|
-
- `pnpm test -- --run path/to/specific.test.ts` — runs ONE test file
|
|
30
|
-
- `bun test test/api/specific.test.ts` — same, bun flavor
|
|
31
|
-
- `bun run typecheck` — semantic check, catches real type failures (good as `verify_after_each`)
|
|
32
|
-
- `node scripts/check-schema.ts` — your own probe script (write it as part of the task)
|
|
33
|
-
- `grep -q 'export function newThing' src/file.ts && bun test test/file.test.ts` — existence + behavior
|
|
34
|
-
|
|
35
|
-
## What's not OK
|
|
36
|
-
|
|
37
|
-
- `echo done` — proves nothing
|
|
38
|
-
- `test -f src/foo.ts` — file existence is necessary but rarely sufficient
|
|
39
|
-
- `bun run build` ALONE — build success without tests means "TypeScript was happy"; insufficient for behavior tasks
|
|
40
|
-
- `pnpm test` (whole package) — pulls in every test in the package; pre-existing failures block the task
|
|
41
|
-
- `pnpm --filter @pkg test -- --run src/module` (directory-level) — same problem; runs integration tests the task didn't write
|
|
42
|
-
- `grep -q 'newFunction' src/file.ts` — proves text presence, not behavior
|
|
43
|
-
- `git diff --name-only | grep src/api` — proves edits happened, not that they're correct
|
|
44
|
-
|
|
45
|
-
## The pre-existing-failure trap
|
|
46
|
-
|
|
47
|
-
Pilot runs a **baseline check** before the agent starts: every verify command is executed on the clean tree. If ANY command fails in baseline, the task aborts immediately with a clear message:
|
|
48
|
-
|
|
49
|
-
> baseline verify failed: `pnpm --filter @kn/core test` → exit 1.
|
|
50
|
-
> This command fails on the clean tree BEFORE the agent starts —
|
|
51
|
-
> fix your environment or narrow the verify scope.
|
|
52
|
-
|
|
53
|
-
This prevents the agent from wasting its 5-attempt retry budget on failures it didn't cause and can't fix. The baseline is the planner's contract: "these commands WILL pass if the environment is set up correctly."
|
|
54
|
-
|
|
55
|
-
**If your verify command fails in baseline, the fix is one of:**
|
|
56
|
-
1. Start the missing infrastructure (the setup hook should handle this).
|
|
57
|
-
2. Narrow the verify to only the specific test file the task creates.
|
|
58
|
-
3. Fix the pre-existing test failure on the base branch first.
|
|
59
|
-
|
|
60
|
-
The agent gets 5 attempts (with escalating "try a different approach" nudges) for failures it introduces AFTER the baseline passes. Pre-existing failures never reach the agent.
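The control flow described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not pilot's implementation: `runTask`, `run`, and `attempt` are hypothetical names, `run(command)` is assumed to return an exit code, and re-running the baseline commands after each attempt is a guess at how the checks combine.

```javascript
// Hypothetical sketch; `run` and `attempt` are injected stand-ins, not pilot APIs.
function runTask({ baseline, verify, run, attempt, maxAttempts = 5 }) {
  // 1. Baseline: every command must pass on the clean tree before the agent starts.
  for (const command of baseline) {
    const code = run(command);
    if (code !== 0) {
      return {
        status: "aborted",
        reason: `baseline verify failed: ${command} -> exit ${code}`,
        attempts: 0,
      };
    }
  }
  // 2. Agent attempts: only failures introduced after the baseline count here.
  for (let n = 1; n <= maxAttempts; n++) {
    attempt(n);
    const failed = [...baseline, ...verify].find((command) => run(command) !== 0);
    if (failed === undefined) {
      return { status: "succeeded", attempts: n };
    }
  }
  return { status: "failed", reason: "retry budget exhausted", attempts: maxAttempts };
}
```

The point of the ordering is that a baseline failure returns with zero attempts spent, so the retry budget is reserved for failures the agent can actually fix.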
|
|
61
|
-
|
|
62
|
-
## Milestone and defaults verify run in the baseline too
|
|
63
|
-
|
|
64
|
-
The baseline check does not run the task's own `verify:` list; it runs **everything else** that applies to the task. That means:
|
|
65
|
-
|
|
66
|
-
- `defaults.verify_after_each` commands
|
|
67
|
-
- The task's milestone `verify` commands
|
|
68
|
-
- `pilot.json` `baseline` and `after_each` commands
|
|
69
|
-
|
|
70
|
-
These commands run on the clean tree **before every task in their scope**. If a milestone verify is `pnpm --filter @pkg test` and the first task in that milestone scaffolds the package with a test runner config but zero test files, the *second* task's baseline fails — vitest/jest exit 1 on "no test files found", and the entire downstream DAG cascades to failure.
|
|
71
|
-
|
|
72
|
-
**The rule: every milestone and defaults verify command must pass at every point in the DAG where it applies — including immediately after scaffold tasks that create zero test files.**
|
|
73
|
-
|
|
74
|
-
### The empty-test-suite trap
|
|
75
|
-
|
|
76
|
-
Test runners treat "no test files found" as a failure by default:
|
|
77
|
-
| Runner | Behavior on zero tests | Fix |
|---|---|---|
| vitest | exit 1 | `--passWithNoTests` |
| jest | exit 1 | `--passWithNoTests` |
| bun test | exit 0 (safe by default) | none needed |
|
|
83
|
-
|
|
84
|
-
When a plan scaffolds a new package or module, the scaffold task creates the test runner config but typically no test files — the first real task creates those. Any milestone or defaults verify that runs the package's test suite will hit the empty-suite exit code.
|
|
85
|
-
|
|
86
|
-
**Fix: always use `--passWithNoTests` (or equivalent) on milestone and defaults verify commands that run a test suite.** This is not a weakening of the verify — it's acknowledging that "zero tests, zero failures" is a valid baseline state for a package under construction.
|
|
87
|
-
```yaml
# WRONG: fails baseline after scaffold task
milestones:
  - name: M1-ENGINE
    verify:
      - pnpm --filter @pkg test

# RIGHT: tolerates the empty state between scaffold and first real task
milestones:
  - name: M1-ENGINE
    verify:
      - pnpm --filter @pkg test -- --passWithNoTests
```
|
|
101
|
-
|
|
102
|
-
Task-specific verify does NOT need `--passWithNoTests` — it targets the exact test file the task creates, and the baseline excludes task-specific verify commands (they'd fail before the task runs by design — that's TDD).
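A planner-side helper could apply the flag to milestone and defaults commands automatically. The sketch below is hypothetical (`tolerateEmptySuite` is not a pilot function) and assumes the command invokes a vitest or jest script through a package manager that forwards arguments after a standalone `--`, as in the milestone example.

```javascript
// Hypothetical helper (an assumption, not a pilot API): appends
// --passWithNoTests to a package-manager test script invocation.
function tolerateEmptySuite(command) {
  // Already tolerant: leave the command untouched so the helper is idempotent.
  if (command.includes("--passWithNoTests")) return command;
  // Arguments after a standalone "--" are forwarded to the underlying script.
  const hasSeparator = command.trim().split(/\s+/).includes("--");
  return hasSeparator
    ? `${command} --passWithNoTests`
    : `${command} -- --passWithNoTests`;
}
```

It should only be applied to suite-level commands; task-specific commands target a single file and are meant to fail before the task runs.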
|
|
103
|
-
|
|
104
|
-
## Two-tier verify
|
|
105
|
-
|
|
106
|
-
Use BOTH a per-task verify and `defaults.verify_after_each`:
|
|
107
|
-
```yaml
defaults:
  verify_after_each:
    - bun run typecheck # always must pass; catches cross-file breakage
tasks:
  - id: T1
    verify:
      - bun test test/api/create-rule.test.ts # task-specific behavior proof
```
|
|
117
|
-
|
|
118
|
-
`verify_after_each` catches global breakage (a syntax error in a file the task didn't even touch); per-task verify catches task-specific behavior. Together they form a tight net without over-reaching.
|
|
119
|
-
|
|
120
|
-
## Touches and verify must agree
|
|
121
|
-
|
|
122
|
-
If the task `touches: [src/api/rules.ts, test/api/rules.test.ts]` but the verify command runs `bun test test/web/`, you have a wrong scope. The verify must exercise files in the touched scope — and ONLY those files.
|
|
123
|
-
|
|
124
|
-
Conversely: if the verify runs `test/api/rules.test.ts` but `touches:` doesn't include `test/api/rules.test.ts`, the agent can't create or edit that test file. Both must agree.
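The agreement between `touches:` and `verify:` can be approximated with a simple set check. This is a sketch under assumptions: `pathsOutsideTouches` is a hypothetical name, it matches paths exactly (real `touches:` entries may be globs), and it treats any argument containing `/` that is not a flag or a scoped package name as a path.

```javascript
// Hypothetical check (an assumption, not a pilot API): lists path-like
// arguments in a verify command that are not declared in `touches:`.
function pathsOutsideTouches(touches, verifyCommand) {
  const allowed = new Set(touches);
  return verifyCommand
    .trim()
    .split(/\s+/)
    // Path-like: contains "/", is not a flag, is not a scoped package (@scope/name).
    .filter((t) => t.includes("/") && !t.startsWith("-") && !t.startsWith("@"))
    .filter((t) => !allowed.has(t));
}
```

An empty result means every path the verify command names is inside the touched scope; anything returned is either an over-broad verify or a file missing from `touches:`.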
|
|
125
|
-
|
|
126
|
-
## Verify must be deterministic and self-contained
|
|
127
|
-
|
|
128
|
-
- No `sleep` to wait for a service that may not start.
|
|
129
|
-
- No external network calls that could flake — mock or skip.
|
|
130
|
-
- No dependency on infrastructure the setup hook didn't start. If the verify needs postgres, the setup hook must start it. If the verify needs an API server, the setup hook must start it.
|
|
131
|
-
- No dependency on other tasks' output being committed (use `depends_on` to sequence).
|
|
132
|
-
|
|
133
|
-
If a verify command flakes, the agent burns its retry attempts on it and the task fails for environmental reasons. Pilot has no way to distinguish a real failure from a flake.
|
|
134
|
-
|
|
135
|
-
## Always include a "before" check
|
|
136
|
-
|
|
137
|
-
For non-trivial tasks, write a verify that would HAVE FAILED before the task ran. This makes the task's value observable. If the verify passed before AND passes after, the task didn't actually move the system.
|
|
138
|
-
|
|
139
|
-
Good pattern: the test file the agent creates IS the "before" check — it didn't exist before, so `bun test path/to/new.test.ts` would have failed (file not found). After the task, it exists and passes.
|
|
140
|
-
|
|
141
|
-
## Port and environment awareness
|
|
142
|
-
|
|
143
|
-
If the setup hook starts services on non-default ports (to avoid collisions with the user's dev stack), verify commands must use those ports. Three patterns:
|
|
144
|
-
|
|
145
|
-
**A. Source the env file the hook wrote:**
```yaml
verify:
  - bash -c 'source .env.pilot && pnpm --filter @pkg test -- --run path/to/test.ts'
```
|
|
150
|
-
|
|
151
|
-
**B. Use `defaults.verify_after_each` for the env-sourcing wrapper:**
```yaml
defaults:
  verify_after_each:
    - bash -c 'source .env.pilot && bun run typecheck'
```
|
|
157
|
-
|
|
158
|
-
**C. Tests read from `process.env` at runtime** (best — no wrapper needed):
|
|
159
|
-
If the test framework reads `DATABASE_URL` from the environment, and the setup hook exports it, the verify command just works. This is the cleanest pattern.
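On the test side, pattern C might look like the sketch below. The helper name `databaseUrlFrom` is hypothetical; in real use the argument would be `process.env`, and failing loudly when the variable is missing is a design choice assumed here rather than something the rule prescribes.

```javascript
// Hypothetical test helper (an assumption, not a pilot API).
// Pass process.env in real use; a plain object works for illustration.
function databaseUrlFrom(env) {
  const url = env.DATABASE_URL;
  if (!url) {
    // Fail fast with a hint instead of silently falling back to a default port.
    throw new Error("DATABASE_URL is not set; did the setup hook export it?");
  }
  return url;
}
```

Avoiding a hard-coded fallback keeps the tests from accidentally connecting to the user's dev stack on the default port.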
|
|
160
|
-
|
|
161
|
-
## Cross-reference: per-surface tooling menu
|
|
162
|
-
|
|
163
|
-
For the per-surface tooling menu (Playwright for UI, curl for API, Postgres for DB), see rule 9 (`qa-expectations.md`). That rule applies these principles to specific tools; this rule defines the principles themselves.
|
package/dist/tasks-KJ3WN2KY.js
DELETED
|
@@ -1,32 +0,0 @@
import {
  countByStatus,
  getTask,
  listTasks,
  markAborted,
  markBlocked,
  markFailed,
  markPending,
  markReady,
  markRunning,
  markSucceeded,
  readyTasks,
  resetTasksForResume,
  setCostUsd,
  upsertFromPlan
} from "./chunk-57EOY72Y.js";
export {
  countByStatus,
  getTask,
  listTasks,
  markAborted,
  markBlocked,
  markFailed,
  markPending,
  markReady,
  markRunning,
  markSucceeded,
  readyTasks,
  resetTasksForResume,
  setCostUsd,
  upsertFromPlan
};