devlyn-cli 1.15.0 → 2.1.0
- package/AGENTS.md +104 -0
- package/CLAUDE.md +135 -21
- package/README.md +43 -125
- package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
- package/benchmark/auto-resolve/README.md +114 -0
- package/benchmark/auto-resolve/RUBRIC.md +162 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
- package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
- package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
- package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
- package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
- package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
- package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
- package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
- package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
- package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
- package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
- package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
- package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
- package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
- package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
- package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
- package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
- package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
- package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
- package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
- package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
- package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
- package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
- package/benchmark/auto-resolve/scripts/judge.sh +359 -0
- package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
- package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
- package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
- package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
- package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
- package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
- package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
- package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
- package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
- package/bin/devlyn.js +175 -17
- package/config/skills/_shared/adapters/README.md +64 -0
- package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
- package/config/skills/_shared/adapters/opus-4-7.md +29 -0
- package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
- package/config/skills/_shared/codex-config.md +54 -0
- package/config/skills/_shared/codex-monitored.sh +141 -0
- package/config/skills/_shared/engine-preflight.md +35 -0
- package/config/skills/_shared/expected.schema.json +93 -0
- package/config/skills/_shared/pair-plan-schema.md +298 -0
- package/config/skills/_shared/runtime-principles.md +110 -0
- package/config/skills/_shared/spec-verify-check.py +519 -0
- package/config/skills/devlyn:ideate/SKILL.md +99 -429
- package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
- package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
- package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
- package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
- package/config/skills/devlyn:resolve/SKILL.md +172 -184
- package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
- package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
- package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
- package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
- package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
- package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
- package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
- package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
- package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
- package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
- package/package.json +12 -2
- package/scripts/lint-skills.sh +431 -0
- package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
- package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
- package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
- package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
- package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
- package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
- package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
- package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
- package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
- package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
- package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
- package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
- package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
- package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
- package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
- package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
- package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
- package/config/skills/devlyn:clean/SKILL.md +0 -285
- package/config/skills/devlyn:design-ui/SKILL.md +0 -351
- package/config/skills/devlyn:discover-product/SKILL.md +0 -124
- package/config/skills/devlyn:evaluate/SKILL.md +0 -564
- package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
- package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
- package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
- package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
- package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
- package/config/skills/devlyn:preflight/SKILL.md +0 -355
- package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
- package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
- package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
- package/config/skills/devlyn:product-spec/SKILL.md +0 -603
- package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
- package/config/skills/devlyn:review/SKILL.md +0 -161
- package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
- package/config/skills/devlyn:team-review/SKILL.md +0 -493
- package/config/skills/devlyn:update-docs/SKILL.md +0 -463
- package/config/skills/workflow-routing/SKILL.md +0 -73
- package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
- package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
|
@@ -1,187 +1,175 @@
|
|
|
1
|
-
|
|
2
|
-
|
|
3
|
-
|
|
1
|
+
---
|
|
2
|
+
name: devlyn:resolve
|
|
3
|
+
description: Hands-free pipeline for any coding task — bug fix, feature, refactor, debug, modify, PR review. Free-form goal or formal spec input. Plan → Implement → Build-gate → Cleanup → Verify (fresh subagent, findings-only). Mechanical-first verification; pair-mode optional in Verify. Use when the user says "resolve this", "fix this", "implement this", "refactor this", "debug this", "review this PR", or wants hands-off completion.
|
|
4
|
+
---
|
|
4
5
|
|
|
5
|
-
|
|
6
|
+
Orchestrator for the 2-skill harness pipeline. One subagent per phase; file-based handoff via `.devlyn/pipeline.state.json`. VERIFY spawns a fresh-context subagent so independence is structural — not advisory.
|
|
6
7
|
|
|
7
|
-
<
|
|
8
|
+
<pipeline_config>
|
|
8
9
|
$ARGUMENTS
|
|
9
|
-
</
|
|
10-174
|
-
[removed lines 10-174: content not recoverable]
|
|
175
|
-
- Why 3: Try/catch at line 72 catches the error but doesn't run cleanup in finally block
|
|
176
|
-
- Root cause: Missing `finally` block for connection cleanup in error path
|
|
177
|
-
- Fix: Move `connection.release()` to `finally` block, add test simulating query failure
|
|
178
|
-
|
|
179
|
-
</examples>
|
|
180
|
-
|
|
181
|
-
<next_steps>
|
|
182
|
-
1. If Complexity is "Multiple files" or "Architectural change" → enter plan mode immediately
|
|
183
|
-
2. In plan mode, present fix options if multiple valid solutions exist
|
|
184
|
-
3. Write failing test before implementing
|
|
185
|
-
4. Only mark complete after full test suite passes
|
|
186
|
-
5. If stuck after 2 hypothesis attempts → escalate to `/devlyn:team-resolve`
|
|
187
|
-
</next_steps>
|
|
10
|
+
</pipeline_config>
|
|
11
|
+
|
|
12
|
+
<orchestrator_context>
|
|
13
|
+
Long-horizon agentic work; context auto-compacts. State lives in `.devlyn/pipeline.state.json` — the single authoritative verdict source. Schemas in `references/state-schema.md`. Best at `xhigh` effort.
|
|
14
|
+
</orchestrator_context>
|
|
15
|
+
|
|
16
|
+
<autonomy_contract>
|
|
17
|
+
Hands-free. Measured by how far we get without human intervention.
|
|
18
|
+
|
|
19
|
+
1. Do not prompt the user mid-pipeline. When tempted to ask, pick the safe default, proceed, and log it in the final report.
|
|
20
|
+
2. Codex availability: on `--engine auto`/`codex`, follow `_shared/engine-preflight.md`. On failure, silently fall back to Claude and log `engine downgraded: codex-unavailable` in the final report.
|
|
21
|
+
3. Phases run in declared order. No extra phases.
|
|
22
|
+
4. Orchestrator does not write code. It parses input, spawns phases, reads state, branches on verdicts, emits the report.
|
|
23
|
+
5. Continue by default. Halt only on (a) unrecoverable subagent failure, (b) IMPLEMENT producing zero code changes, (c) BUILD_GATE or VERIFY fix-loop exhausting `max_rounds`.
|
|
24
|
+
</autonomy_contract>
|
|
25
|
+
|
|
26
|
+
<harness_principles>
|
|
27
|
+
Every phase reads `_shared/runtime-principles.md` (Subtractive-first / Goal-locked / No-workaround / Evidence). Codex routings receive the contract excerpt inlined in their prompt body.
|
|
28
|
+
</harness_principles>
|
|
29
|
+
|
|
30
|
+
<engine_routing>
|
|
31
|
+
Each phase routes to an engine and prepends the per-engine adapter header from `_shared/adapters/<model>.md` to the canonical phase body. Adapter is the per-model delta (Anthropic Opus 4.7 guide for Claude, OpenAI GPT-5.5 guide for Codex). Canonical body is engine-agnostic.
|
|
32
|
+
|
|
33
|
+
- Claude phases: spawn `Agent` (`mode: "bypassPermissions"`); prompt = adapter-header + canonical-body + task-context.
|
|
34
|
+
- Codex phases: shell out via `bash _shared/codex-monitored.sh` with the same compounded prompt. The wrapper closes stdin and emits a heartbeat. No MCP.
|
|
35
|
+
- Default engine: Claude. `--engine codex` routes IMPLEMENT to Codex; orchestration stays Claude. Pair-mode (only in VERIFY/JUDGE) selects a different engine for the fresh subagent than IMPLEMENT used.
|
|
36
|
+
- Multi-LLM evolution: when a new model adapter ships in `_shared/adapters/`, that engine becomes selectable via `--engine <model>` without further skill changes (NORTH-STAR.md "Multi-LLM evolution direction").
|
|
37
|
+
</engine_routing>
|
|
38
|
+
|
|
39
|
+
<modes>
|
|
40
|
+
Three input shapes:
|
|
41
|
+
|
|
42
|
+
1. **Free-form**: `/devlyn:resolve "fix the login bug"`. PHASE 0 runs the complexity classifier and either proceeds with an internal mini-spec (trivial), drafts focused questions for in-prompt resolution (medium), or escalates to `/devlyn:ideate` (large/ambiguous). No mid-pipeline prompts in any branch.
|
|
43
|
+
2. **Spec**: `/devlyn:resolve --spec docs/roadmap/phase-N/X.md`. Spec is read-only. Verification commands pre-staged from spec's `## Verification` block.
|
|
44
|
+
3. **Verify-only**: `/devlyn:resolve --verify-only <diff-or-PR-ref> --spec <path>`. Skips PHASE 1-4. Runs PHASE 5 (VERIFY) on the supplied diff against the spec.
|
|
45
|
+
</modes>
|
|
46
|
+
|
|
47
|
+
<post_implement_invariant>
|
|
48
|
+
Once `state.implement_passed_sha` is non-null (PHASE 2 returned and produced a diff), the post-IMPLEMENT phases (CLEANUP, VERIFY) operate under structural constraints:
|
|
49
|
+
|
|
50
|
+
- CLEANUP may only mutate files in the cleanup allowlist (tooling artifacts, dead code added by this diff, doc references this diff invalidated). Other paths trigger revert.
|
|
51
|
+
- VERIFY runs in a fresh subagent context with no code-mutation tools. Findings only — never edits files. The fresh-context spawn is the structural guarantee; the prompt body reinforces it but the spawn is what makes independence real.
|
|
52
|
+
</post_implement_invariant>
|
|
53
|
+
|
|
54
|
+
## PHASE 0: PARSE + CLASSIFY + ROUTE
|
|
55
|
+
|
|
56
|
+
1. Parse flags from `<pipeline_config>`:
|
|
57
|
+
- `--max-rounds N` (default 4) — fix-loop budget shared across BUILD_GATE and VERIFY.
|
|
58
|
+
- `--engine MODE` (default `claude`) — picks the adapter for IMPLEMENT and CLEANUP.
|
|
59
|
+
- `--spec <path>` — switches to spec mode.
|
|
60
|
+
- `--verify-only <ref>` — switches to verify-only mode. Requires `--spec`.
|
|
61
|
+
- `--pair-verify` — force pair-mode JUDGE in PHASE 5 even when not auto-triggered.
|
|
62
|
+
- `--bypass <phase>[,...]` — skip specific phases. Valid: `build-gate`, `cleanup`. PLAN, IMPLEMENT, VERIFY are non-bypassable.
|
|
63
|
+
- `--perf` — opt in to per-phase timing.
|
|
64
|
+
|
|
65
|
+
2. Engine pre-flight: follow `_shared/engine-preflight.md`. The downgrade banner surfaces in the final report.
|
|
66
|
+
|
|
67
|
+
3. Initialize `.devlyn/pipeline.state.json` per `references/state-schema.md`. Set `state.run_id`, `started_at`, `engine`, `base_ref.{branch, sha}`, `rounds.{max_rounds, global: 0}`, `bypasses`, empty `phases`, empty `criteria`.
|
|
68
|
+
|
|
69
|
+
4. **Mode-specific init**:
|
|
70
|
+
- **Free-form**: read `references/free-form-mode.md`. Run the complexity classifier deterministically (rules over keyword density / file count / spec-shape signals). Set `state.complexity ∈ {trivial, medium, large}`. Trivial: write internal mini-spec to `.devlyn/criteria.generated.md` and proceed. Medium: synthesize a minimal spec from the goal + add 1-2 context anchors from the codebase, write to `.devlyn/criteria.generated.md`, proceed. Large: log `recommend: /devlyn:ideate first` in the final report and either halt (default) or proceed with assumed defaults if `--continue-on-large` flag set.
|
|
71
|
+
- **Spec**: validate spec exists + `## Verification` block parses (run `python3 .claude/skills/_shared/spec-verify-check.py --check <spec-path>` to validate carrier shape). Compute `state.source.spec_sha256`. Stage `.devlyn/spec-verify.json` from the spec's verification block.
|
|
72
|
+
- **Verify-only**: skip to PHASE 5 with `state.source.spec_path` set, the supplied diff captured at `.devlyn/external-diff.patch`.
|
|
73
|
+
|
|
74
|
+
5. Announce one line: `resolve starting — run <run_id> — engine <engine> — mode <mode> — complexity <complexity-or-na>`.
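As a rough illustration of steps 1 and 3 above, the sketch below parses the listed flags and builds the initial state object. It is a minimal sketch under stated assumptions, not the harness's own code: `parse_flags`, `input_mode`, and `initial_state` are hypothetical helper names, the `run_id` format is invented, and modelling empty `phases` as a dict and empty `criteria` as a list is a guess, since the real schema lives in `references/state-schema.md`.

```python
import argparse
import uuid
from datetime import datetime, timezone

# Only these two phases may be skipped; PLAN, IMPLEMENT, VERIFY are non-bypassable.
VALID_BYPASSES = {"build-gate", "cleanup"}


def parse_flags(argv):
    """Parse the PHASE 0 flags. Returns (args, leftover goal words)."""
    parser = argparse.ArgumentParser(prog="devlyn:resolve", add_help=False)
    parser.add_argument("--max-rounds", type=int, default=4)
    parser.add_argument("--engine", default="claude")
    parser.add_argument("--spec")
    parser.add_argument("--verify-only", dest="verify_only")
    parser.add_argument("--pair-verify", action="store_true")
    parser.add_argument("--bypass", default="")
    parser.add_argument("--perf", action="store_true")
    args, goal = parser.parse_known_args(argv)

    # "--bypass a,b" becomes ["a", "b"]; an absent flag becomes [].
    args.bypass = [name for name in args.bypass.split(",") if name]
    unknown = set(args.bypass) - VALID_BYPASSES
    if unknown:
        raise ValueError(f"cannot bypass: {sorted(unknown)}")
    if args.verify_only and not args.spec:
        raise ValueError("--verify-only requires --spec")
    return args, goal


def input_mode(args):
    """Map the parsed flags to one of the three input shapes."""
    if args.verify_only:
        return "verify-only"
    return "spec" if args.spec else "free-form"


def initial_state(args, branch, sha):
    """Build the fields step 3 names (container types are assumptions)."""
    return {
        "run_id": uuid.uuid4().hex[:12],  # hypothetical id format
        "started_at": datetime.now(timezone.utc).isoformat(),
        "engine": args.engine,
        "base_ref": {"branch": branch, "sha": sha},
        "rounds": {"max_rounds": args.max_rounds, "global": 0},
        "bypasses": args.bypass,
        "phases": {},
        "criteria": [],
    }
```

In this sketch the branch name and SHA are passed in by the caller, for example from `git rev-parse --abbrev-ref HEAD` and `git rev-parse HEAD`, so the functions stay free of side effects.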
|
|
75
|
+
|
|
76
|
+
## PHASE 1: PLAN
|
|
77
|
+
|
|
78
|
+
Skip in verify-only mode. The heaviest phase by design — spec/criteria define non-negotiable invariants; plan formalizes how the implementation hits them.
|
|
79
|
+
|
|
80
|
+
Engine: Claude (PLAN-pair is **unmeasured at HEAD** — iter-0033d is the first L1-vs-L2 measurement; iter-0020 falsified Codex-BUILD/IMPLEMENT, NOT PLAN-pair). Prompt body: `references/phases/plan.md`.
|
|
81
|
+
|
|
82
|
+
Subagent output (writes `.devlyn/plan.md`): file list to touch, risk list (out-of-scope expansions, ambiguous spec sections), acceptance restatement (what `## Verification` actually requires verbatim).
|
|
83
|
+
|
|
84
|
+
State write: `phases.plan.{started_at, verdict, completed_at, duration_ms}`.
|
|
85
|
+
|
|
86
|
+
After return:
|
|
87
|
+
1. If `.devlyn/plan.md` lists zero files → halt with verdict `BLOCKED:plan-empty`.
|
|
88
|
+
2. If risk list flags an out-of-scope expansion the user did not authorize → re-spawn once with the reminder; second fail → halt.
|
|
89
|
+
|
|
90
|
+
## PHASE 2: IMPLEMENT
|
|
91
|
+
|
|
92
|
+
Skip in verify-only mode. Constrained design judgment within PLAN's invariants. Writes code, tests, and inline doc-comments. No standalone DOCS phase — what the spec licenses is updated here, what it does not is out of scope.
|
|
93
|
+
|
|
94
|
+
Engine: per `--engine`. Prompt body: `references/phases/implement.md`.
|
|
95
|
+
|
|
96
|
+
State write: `phases.implement.{started_at, verdict, completed_at, duration_ms}`.
|
|
97
|
+
|
|
98
|
+
After return:
|
|
99
|
+
1. `git diff --stat` — empty diff → halt with `BLOCKED:implement-empty`.
|
|
100
|
+
2. Set `state.implement_passed_sha = git rev-parse HEAD` (activates `<post_implement_invariant>`).
|
|
101
|
+
3. Checkpoint: `git add -A && git commit -m "chore(pipeline): implement"`.
|
|
102
|
+
|
|
103
|
+
## PHASE 3: BUILD_GATE
|
|
104
|
+
|
|
105
|
+
Skip in verify-only mode OR when `build-gate` in `state.bypasses`. Deterministic — same commands CI / Docker / production run.
|
|
106
|
+
|
|
107
|
+
Spawn Claude `Agent` (`mode: "bypassPermissions"`) with prompt body `references/phases/build-gate.md`. The agent:
|
|
108
|
+
1. Detects language/framework via project files (`package.json`, `pyproject.toml`, etc.).
|
|
109
|
+
2. Runs language-specific gates (tsc / lint / test).
|
|
110
|
+
3. Always runs `python3 .claude/skills/_shared/spec-verify-check.py` (verification_commands literal-match).
|
|
111
|
+
4. If `spec.expected.json.browser_flows` declared OR diff touches web-surface files: invokes the browser runner (Chrome MCP → Playwright → curl tier as available).
|
|
112
|
+
5. Emits `.devlyn/build_gate.findings.jsonl` + `.devlyn/build_gate.log.md`.
|
|
113
|
+
|
|
114
|
+
State write: `phases.build_gate.{started_at, verdict, completed_at, duration_ms, artifacts}`.
|
|
115
|
+
|
|
116
|
+
Branch:
|
|
117
|
+
- `PASS` → PHASE 4.
|
|
118
|
+
- `FAIL` → fix loop. Spawn IMPLEMENT-engine agent with the build_gate findings as input. Increment `state.rounds.global`. On second FAIL with `state.rounds.global >= state.rounds.max_rounds` → halt with verdict `BLOCKED:build-gate-exhausted`.
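The `FAIL` branch above can be read as a bounded retry loop over a shared round counter. The following is a minimal sketch of one such reading, assuming the gate and the fix agent are exposed as plain callables (`run_gate` and `run_fix` are hypothetical stand-ins, not harness APIs) and that the loop halts as soon as a `FAIL` arrives with the budget already spent.

```python
def run_gate_with_fix_loop(state, run_gate, run_fix,
                           exhausted_verdict="BLOCKED:build-gate-exhausted"):
    """Re-run the gate after each fix until it passes or the budget is spent.

    state    -- dict holding rounds.global and rounds.max_rounds
    run_gate -- hypothetical callable returning "PASS" or "FAIL"
    run_fix  -- hypothetical callable that spawns the fix agent
    """
    rounds = state["rounds"]
    while True:
        if run_gate() == "PASS":
            return "PASS"
        # A FAIL with no rounds left halts the pipeline.
        if rounds["global"] >= rounds["max_rounds"]:
            return exhausted_verdict
        run_fix()
        rounds["global"] += 1  # the budget is shared with the VERIFY fix loop
```

Because the counter lives in `state`, rounds consumed here reduce what a later VERIFY fix loop can spend, which matches the "shared across BUILD_GATE and VERIFY" wording of `--max-rounds`.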
|
|
119
|
+
|
|
120
|
+
## PHASE 4: CLEANUP
|
|
121
|
+
|
|
122
|
+
Skip if `cleanup` in `state.bypasses`. Task-scoped pass.
|
|
123
|
+
|
|
124
|
+
Engine: per `--engine`. Prompt body: `references/phases/cleanup.md`. Allowlist enforced post-spawn:
|
|
125
|
+
- Tooling artifacts the spec did not list as deliverables (`test-results/`, `playwright-report/`, `.last-run.json`, coverage HTML).
|
|
126
|
+
- Dead code added by this diff (not pre-existing dead code).
|
|
127
|
+
- Doc references whose target this diff renamed or removed.
|
|
128
|
+
|
|
129
|
+
Before spawn: capture `state.phases.cleanup.pre_sha = git rev-parse HEAD`.
|
|
130
|
+
|
|
131
|
+
State write: `phases.cleanup.{started_at, verdict, completed_at, duration_ms}`.
|
|
132
|
+
|
|
133
|
+
After return:
|
|
134
|
+
1. Run `git diff --name-only <pre_sha>` — any path outside the cleanup allowlist → revert to `pre_sha` and emit `invariant.cleanup-out-of-scope` finding into `.devlyn/cleanup.findings.jsonl`.
|
|
135
|
+
2. If allowlist honored and diff non-empty: `git add -A && git commit -m "chore(pipeline): cleanup"`.
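The post-spawn allowlist check in step 1 could look roughly like the sketch below. It only covers the path-based part of the allowlist: the glob patterns are assumptions derived from the artifact names listed above (the `coverage/*` pattern in particular is a guess), and `extra_allowed` is a hypothetical way to pass in paths that qualify as dead code or invalidated doc references, which cannot be decided from the path alone.

```python
import fnmatch

# Assumed glob forms of the tooling artifacts named in the allowlist.
TOOLING_GLOBS = [
    "test-results/*",
    "playwright-report/*",
    "*.last-run.json",
    "coverage/*",  # assumption: where coverage HTML lands
]


def out_of_scope_paths(changed_paths, extra_allowed=()):
    """Return the changed paths that fall outside the cleanup allowlist.

    changed_paths -- e.g. the lines printed by `git diff --name-only <pre_sha>`
    extra_allowed -- hypothetical: paths judged in scope by other means
    """
    allowed_exact = set(extra_allowed)
    return [
        path
        for path in changed_paths
        if path not in allowed_exact
        and not any(fnmatch.fnmatchcase(path, glob) for glob in TOOLING_GLOBS)
    ]
```

A non-empty result would correspond to the revert-and-emit-finding branch; an empty result with a non-empty diff would correspond to the commit in step 2.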
|
|
136
|
+
|
|
137
|
+
## PHASE 5: VERIFY (fresh subagent, findings-only)
|
|
138
|
+
|
|
139
|
+
Independent quality layer. **Spawned with empty conversation context** — no carry-over from PHASE 1-4. Inputs limited to `spec.md` (or `.devlyn/criteria.generated.md`), `spec.expected.json`, the cumulative diff, and the spec hash. The fresh-context spawn is the structural guarantee of independence; the prompt body reinforces it.
|
|
140
|
+
|
|
141
|
+
Two sub-phases:
|
|
142
|
+
|
|
143
|
+
1. **MECHANICAL** (deterministic): re-run `python3 .claude/skills/_shared/spec-verify-check.py` against the post-CLEANUP code (independent of BUILD_GATE's earlier run). Re-scan `spec.expected.json.forbidden_patterns` against the diff. Re-check `required_files` and `forbidden_files`. Emit `.devlyn/verify-mechanical.findings.jsonl`.
|
|
144
|
+
|
|
145
|
+
2. **JUDGE** (fresh-context Agent): grade the diff against the spec on rubric axes (spec compliance, scope, quality, consistency). Default engine = same as IMPLEMENT (solo). Pair-mode (cross-model JUDGE) fires when:
|
|
146
|
+
- `--pair-verify` flag set, OR
|
|
147
|
+
- MECHANICAL emits findings flagged `severity: warning` (not disqualifier — those route to fix loop directly), OR
|
|
148
|
+
- `state.verify.coverage_failed == true` (judge could not exercise a required spec axis from available evidence).
|
|
149
|
+
|
|
150
|
+
Pair-mode JUDGE: spawn a second Agent with the OTHER engine's adapter; both judgments merge with the rule "any HIGH/CRITICAL finding either model surfaces is the verdict-binding finding." Cross-model disagreement on lower-severity findings is logged but does not change the verdict.
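The merge rule for pair-mode can be sketched as follows. This is an illustration only: the finding shape (dicts with `id` and `severity` keys) and the function name are assumptions, and the sketch stops at collecting the verdict-binding findings rather than deriving the final verdict string.

```python
BINDING_SEVERITIES = {"HIGH", "CRITICAL"}


def merge_pair_judgments(primary, secondary):
    """Merge two judges' findings (lists of dicts with 'id' and 'severity')."""
    # Any HIGH/CRITICAL finding from either judge is verdict-binding.
    binding = {}
    for finding in primary + secondary:
        if finding["severity"] in BINDING_SEVERITIES:
            binding.setdefault(finding["id"], finding)

    # Lower-severity findings raised by only one judge are logged, nothing more.
    low_primary = {f["id"] for f in primary if f["severity"] not in BINDING_SEVERITIES}
    low_secondary = {f["id"] for f in secondary if f["severity"] not in BINDING_SEVERITIES}
    return {
        "binding": list(binding.values()),
        "disagreements": sorted(low_primary ^ low_secondary),
    }
```

Deduplicating by `id` keeps a finding that both judges raise from being counted twice; whether the real findings carry a stable shared id is not stated in the source.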
|
|
151
|
+
|
|
152
|
+
Findings written to `.devlyn/verify.findings.jsonl`. **VERIFY agents have no code-mutation tools.** State write: `phases.verify.{started_at, verdict, completed_at, duration_ms, sub_verdicts: {mechanical, judge, pair_judge?}, artifacts}`.
|
|
153
|
+
|
|
154
|
+
Branch:
|
|
155
|
+
- `PASS` → PHASE 6.
|
|
156
|
+
- `PASS_WITH_ISSUES` (LOW severity only) → PHASE 6 with banner.
|
|
157
|
+
- `NEEDS_WORK` / `BLOCKED` → fix loop with `triggered_by: "verify"`. Spawn IMPLEMENT-engine agent with the verify findings; increment `state.rounds.global`. Second `NEEDS_WORK` → halt with verdict `BLOCKED:verify-exhausted`.
|
|
158
|
+
|
|
159
|
+
## PHASE 6: FINAL REPORT + ARCHIVE
|
|
160
|
+
|
|
161
|
+
State write: `phases.final_report.started_at` at the top of this phase.
|
|
162
|
+
|
|
163
|
+
1. **Terminal verdict** — derive from `state.phases.{plan, implement, build_gate, cleanup, verify}.verdict` per the precedence rules in `references/state-schema.md#terminal-verdict`. Verify-only mode short-circuits to `state.phases.verify.verdict`.
|
|
164
|
+
|
|
165
|
+
2. **Render report** — sections: header (run_id, engine, mode, verdict, wall-time), per-phase summary, findings table (verify findings only — post-IMPLEMENT phases are findings-only), follow-up notes (any `--continue-on-large` assumptions, any silent fallbacks).
|
|
166
|
+
|
|
167
|
+
3. State write: `phases.final_report.{verdict, completed_at, duration_ms}` BEFORE archive runs (archive prune logic skips runs whose `final_report.verdict` is null).
|
|
168
|
+
|
|
169
|
+
4. **Archive** — invoke the deterministic script: `python3 .claude/skills/_shared/archive_run.py`. The script reads `run_id` from `.devlyn/pipeline.state.json`, moves per-run artifacts (state.json + `*.findings.jsonl` + `*.log.md` + `fix-batch.round-*.json` + `criteria.generated.md` + `spec-verify*.json` + `spec-verify-findings.jsonl`) into `.devlyn/runs/<run_id>/`, then best-effort prunes to last 10 completed runs. Archive must run; running this step as deterministic-script-not-prose ensures the move actually happens (iter-0033a Smoke 3 caught a case where the agent claimed archive ran without moving the files).
|
|
170
|
+
|
|
171
|
+
5. Kill any dev server PHASE 3 left running.
|
|
172
|
+
|
|
173
|
+
## State management
|
|
174
|
+
|
|
175
|
+
`.devlyn/pipeline.state.json` is the single authoritative verdict source. Branch on `state.phases.<name>.verdict` directly; never parse `.devlyn/*.findings.jsonl` for routing decisions. Schema: `references/state-schema.md`.
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
# Free-form mode — complexity classifier
|
|
2
|
+
|
|
3
|
+
When `/devlyn:resolve` is invoked with a free-form goal (no `--spec`), PHASE 0 runs this classifier to set `state.complexity ∈ {trivial, medium, large}` and either proceeds with an internal mini-spec, drafts focused questions for in-prompt resolution, or recommends `/devlyn:ideate` first.
|
|
4
|
+
|
|
5
|
+
The classifier is rules-based / deterministic — not an LLM judgment call. Decision rules below.
|
|
6
|
+
|
|
7
|
+
## Classification rules
|
|
8
|
+
|
|
9
|
+
Compute these signals from the goal text + project state:
|
|
10
|
+
|
|
11
|
+
1. **goal_length** — word count of the user's goal.
|
|
12
|
+
2. **file_scope_signals** — count of file paths or symbol names mentioned in the goal (`bin/cli.js`, `Login.tsx`, `parseArgs`, etc.).
|
|
13
|
+
3. **verb_class** — primary verb of the goal: `fix | add | refactor | debug | review | rewrite | migrate | ...`.
|
|
14
|
+
4. **codebase_size** — `git ls-files | wc -l`. Coarse buckets: `<50` / `<500` / `≥500`.
|
|
15
|
+
5. **has_failing_test** — does the goal mention a specific failing test or include a stack trace?

### Trivial branch

Conditions (all must hold):
- `goal_length ≤ 30` words.
- `file_scope_signals` between 1 and 3 (inclusive).
- `verb_class ∈ {fix, add}`.
- `has_failing_test == true` OR the goal names a single specific symbol/file.

Action: synthesize a minimal internal spec from the goal:
- Write `.devlyn/criteria.generated.md` with sections `## Requirements` (the goal as a single bullet, optionally split into 2-3 if obviously separable), `## Out of Scope` ("anything not in the listed files"), `## Verification` (one runnable command if discoverable from the goal, e.g. the failing test or a smoke command).
- Set `state.complexity = "trivial"`. Proceed to PHASE 1.

### Medium branch

Conditions (any one):
- `goal_length` above 30 and at most 80 words.
- `file_scope_signals` between 4 and 10.
- `verb_class ∈ {refactor, debug, review}` AND scope is a single subsystem.
- `has_failing_test == false` but the goal implies a runnable acceptance check.

Action: synthesize a richer internal spec:
- Read the named files (or grep for the named symbols) to extract 1-2 context anchors (existing patterns, related tests).
- Write `.devlyn/criteria.generated.md` with `## Requirements` (split into 3-5 testable bullets), `## Constraints` (anything implied by the existing patterns), `## Out of Scope` (adjacent code that "looks fixable"), `## Verification` (commands or checks discoverable from existing tests / patterns).
- Set `state.complexity = "medium"`. Proceed to PHASE 1.

### Large branch

Conditions (any one):
- `goal_length > 80` words.
- `file_scope_signals > 10` OR zero signals (vague enough that the classifier cannot pick scope).
- `verb_class ∈ {rewrite, migrate}` and scope is multi-subsystem.
- The goal mentions a new feature whose surface area requires design decisions the harness cannot make from a one-shot prompt.

Action: log `recommend: /devlyn:ideate first` in `.devlyn/criteria.generated.md` and in the final report. Two policies:
- Default: halt with terminal verdict `BLOCKED:large-needs-ideation`.
- `--continue-on-large` flag: synthesize a best-effort spec from the goal with an explicit "assumptions made" block; proceed to PHASE 1; the final report flags every assumption for user review.

## Anti-pattern: drift to LLM judgment

The classifier MUST stay deterministic. If you're tempted to add "and the model assesses whether it's complex", that is the failure mode this rule exists to prevent. LLM-judgment classifiers swing on prompt-prelude noise; rules over signals do not.

When the rules are silent (rare; pathological goal text), default to `medium` and proceed.
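
A sketch of the three branches as a pure function over precomputed signals. The source does not state which branch wins when conditions overlap, so the evaluation order here (trivial, then large, then medium as the fallback) is an assumption, as are the field names `names_single_target`, `multi_subsystem`, and `needs_design_decisions`, which stand in for conditions the text describes in prose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    goal_length: int
    file_scope_signals: int
    verb_class: str
    has_failing_test: bool
    names_single_target: bool = False      # hypothetical: goal names one symbol/file
    multi_subsystem: bool = False          # hypothetical: scope spans subsystems
    needs_design_decisions: bool = False   # hypothetical: new feature needing design


def classify(s: Signals) -> str:
    """Return "trivial", "medium", or "large" using fixed rules only."""
    # Trivial: every condition must hold.
    if (
        s.goal_length <= 30
        and 1 <= s.file_scope_signals <= 3
        and s.verb_class in {"fix", "add"}
        and (s.has_failing_test or s.names_single_target)
    ):
        return "trivial"
    # Large: any one condition is enough.
    if (
        s.goal_length > 80
        or s.file_scope_signals > 10
        or s.file_scope_signals == 0
        or (s.verb_class in {"rewrite", "migrate"} and s.multi_subsystem)
        or s.needs_design_decisions
    ):
        return "large"
    # Medium conditions and the "rules are silent" default both land here.
    return "medium"
```

Because the function takes no model output as input, two runs on the same signals cannot disagree, which is the property the anti-pattern section protects.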

## Mini-spec quality bar

The internal mini-spec written for trivial / medium / `--continue-on-large` paths must satisfy:

- `## Requirements` non-empty, each bullet testable (CLI command, test command, observable file change).
- `## Verification` non-empty if the goal implies any runnable acceptance check. Empty Verification is allowed only when all Requirements are pure-design (e.g. "follow existing pattern X").
- Free-form mode mini-specs are written to `.devlyn/criteria.generated.md` (not to a roadmap path); this is a run-scoped artifact, not a documented spec.

PLAN reads the mini-spec the same way it reads a real spec. The downstream pipeline cannot tell the difference.
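
The first two rules of the quality bar lend themselves to a mechanical check. The sketch below is illustrative: `parse_sections` and `check_mini_spec` are hypothetical helpers, and whether a goal "implies a runnable acceptance check" is passed in as a flag rather than inferred. It does not judge whether each bullet is testable.

```python
import re


def parse_sections(markdown: str) -> dict[str, list[str]]:
    """Collect the bullet items found under each level-2 heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current is not None and line.strip().startswith("- "):
            sections[current].append(line.strip()[2:].strip())
    return sections


def check_mini_spec(markdown: str, implies_runnable_check: bool) -> list[str]:
    """Return a list of quality-bar violations; an empty list means the spec passes."""
    sections = parse_sections(markdown)
    problems = []
    if not sections.get("Requirements"):
        problems.append("Requirements is empty")
    if implies_runnable_check and not sections.get("Verification"):
        problems.append("Verification is empty but the goal implies a runnable check")
    return problems
```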

@@ -0,0 +1,45 @@

# PHASE 3: BUILD_GATE (canonical body)

Per-engine adapter header is prepended at runtime. BUILD_GATE is mechanical and deterministic: it runs the same commands that CI, Docker, and production run.

<role>
Run language-specific gates and the spec literal-match verification. Emit findings; the orchestrator's fix loop consumes them.
</role>

<detection>
Detect the project shape from files in `state.base_ref.sha`:

- `package.json` → Node. Use the declared package manager; default `npm`. If `tsconfig.json` exists → run `tsc --noEmit`.
- `pyproject.toml` / `requirements.txt` → Python. If `pyproject.toml` declares a tool config (`ruff`, `mypy`, `pytest`), run the declared tool.
- `go.mod` → Go. Run `go build ./... && go vet ./... && go test ./...`.
- `Cargo.toml` → Rust. Run `cargo build && cargo clippy && cargo test`.
- Mixed / monorepo: detect per-workspace; run only against changed workspaces (use `git diff --name-only <state.base_ref.sha>`).
</detection>
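
A minimal sketch of the detection table, assuming a single workspace checked out on disk (the source detects against `state.base_ref.sha`; reading the working tree is a simplification). The function name `detect_gate_commands`, the substring test on `pyproject.toml`, and the specific Python invocations are assumptions for illustration. Package-manager detection and monorepo handling are left out.

```python
from pathlib import Path


def detect_gate_commands(root: Path) -> dict[str, list[str]]:
    """Map each detected ecosystem to the gate commands named in the detection list."""
    plans: dict[str, list[str]] = {}

    if (root / "package.json").exists():
        node = ["tsc --noEmit"] if (root / "tsconfig.json").exists() else []
        node.append("npm test")  # assumes the default package manager, npm
        plans["node"] = node

    pyproject = root / "pyproject.toml"
    if pyproject.exists() or (root / "requirements.txt").exists():
        text = pyproject.read_text() if pyproject.exists() else ""
        # Hypothetical mapping from a declared [tool.*] table to its usual invocation.
        declared = {
            "[tool.ruff": "ruff check .",
            "[tool.mypy": "mypy .",
            "[tool.pytest": "pytest",
        }
        plans["python"] = [cmd for marker, cmd in declared.items() if marker in text]

    if (root / "go.mod").exists():
        plans["go"] = ["go build ./...", "go vet ./...", "go test ./..."]

    if (root / "Cargo.toml").exists():
        plans["rust"] = ["cargo build", "cargo clippy", "cargo test"]

    return plans
```

The result depends only on which marker files exist, so the gate selects the same commands on every run against the same tree.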

<gates>
Run in this order; each emits findings into `.devlyn/build_gate.findings.jsonl`:

1. **Type check** (TypeScript / mypy / etc.). Each error → one finding, severity `HIGH`, rule `correctness.type-check`.
2. **Lint** (eslint / ruff / clippy / etc.). Each error → finding, severity `MEDIUM`, rule `quality.lint`. Warnings stay LOW unless the spec elevates them.
3. **Test suite** (npm test / pytest / go test / cargo test). Each failing test → finding, severity `HIGH`, rule `correctness.test-failure`. Include the failing test's file:line and the assertion.
4. **Spec literal verification**: `python3 .claude/skills/_shared/spec-verify-check.py`. The script reads `.devlyn/spec-verify.json` (pre-staged from spec or self-staged from `state.source.spec_path`). Each command mismatch → finding `correctness.spec-literal-mismatch`, severity `CRITICAL`. Missing/malformed carrier on a generated source → finding `correctness.spec-verify-malformed`, severity `CRITICAL`.
5. **Browser** (only when `spec.expected.json.browser_flows` declared OR diff touches `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `page.*`, `layout.*`, `route.*`, `*.css`, `*.html`): start dev server, run declared flows via Chrome MCP if available, falling back to Playwright, falling back to curl. Each failed flow → finding, severity `HIGH`, rule `correctness.browser-flow-failed`.

Append all findings; do not stop on the first failure.
</gates>

<output>
- `.devlyn/build_gate.findings.jsonl`: JSONL stream, one finding per line. Schema: `{id, rule_id, severity, file, line, message, fix_hint, criterion_ref}`.
- `.devlyn/build_gate.log.md`: human-readable summary of which gates ran and their raw output.
- `state.phases.build_gate.{verdict, completed_at, duration_ms, artifacts}`. Verdict: `PASS` if zero CRITICAL/HIGH findings; `FAIL` otherwise.
</output>
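
The verdict rule is small enough to sketch directly. The function name `gate_verdict` and the sample findings are illustrative; the field names follow the schema listed in the output block, and the values are invented for the example.

```python
import json

BLOCKING = {"CRITICAL", "HIGH"}


def gate_verdict(jsonl_text: str) -> str:
    """PASS when no finding is CRITICAL or HIGH, FAIL otherwise."""
    findings = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return "FAIL" if any(f["severity"] in BLOCKING for f in findings) else "PASS"


# Two sample findings, one per line, using the documented schema with made-up values.
sample = "\n".join(
    json.dumps(finding)
    for finding in [
        {
            "id": "bg-1",
            "rule_id": "quality.lint",
            "severity": "MEDIUM",
            "file": "src/cli.ts",
            "line": 12,
            "message": "unused variable",
            "fix_hint": "remove the variable",
            "criterion_ref": None,
        },
        {
            "id": "bg-2",
            "rule_id": "correctness.test-failure",
            "severity": "HIGH",
            "file": "test/cli.test.ts",
            "line": 40,
            "message": "expected exit code 0, got 1",
            "fix_hint": "handle the --verbose flag",
            "criterion_ref": "R1",
        },
    ]
)
```

With the sample above, the single `HIGH` finding is enough to fail the gate, while a stream containing only the `MEDIUM` lint finding would pass.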

<quality_bar>
- Same commands every time. Configuration drift between this gate and CI is a defect; raise as a finding rather than soften this gate.
- Forbidden-pattern check (regex against `git diff`) for `spec.expected.json.forbidden_patterns` runs as part of step 4. Disqualifier-severity matches → CRITICAL findings.
- Reporter artifacts the gate generates (Playwright traces, coverage HTML) belong in gitignored paths. If they leak into `git diff --stat`, flag as `scope.tooling-artifact-leak` MEDIUM and let the fix loop / cleanup handle removal.
</quality_bar>

<runtime_principles>
Read `_shared/runtime-principles.md`. The gate is mechanical: its discipline is "do not skip a check, do not paraphrase a verification command, do not narrow severity to mute noise." Findings drive the fix loop; muting findings without a justified spec exception is a workaround.
</runtime_principles>

@@ -0,0 +1,39 @@

# PHASE 4: CLEANUP (canonical body)

Per-engine adapter header is prepended at runtime. Task-scoped pass: only what this diff introduced or invalidated.

<role>
Remove tooling artifacts, dead code added by this diff, and doc references invalidated by this diff. The cleanup is bounded by an allowlist enforced post-spawn.
</role>

<input>
- Cumulative diff since `state.base_ref.sha`.
- Spec at `state.source.spec_path` or `state.source.criteria_path`.
- `state.phases.cleanup.pre_sha` (the orchestrator captured this before spawn; your post-cleanup diff against this SHA must stay within the allowlist).
</input>

<allowlist>
You may modify or delete:

1. **Tooling artifacts** the spec did not list as deliverables: `test-results/`, `playwright-report/`, `.last-run.json`, coverage HTML output, build artifacts, runtime caches (`__pycache__/`, `*.pyc`, `.cache/`).
2. **Dead code added by this diff**: symbols (functions, classes, types, exports) introduced by this diff that no other code added by this diff references AND that are not part of the spec's required surface. Pre-existing dead code is out of scope.
3. **Doc references this diff invalidated**: links / file paths / symbol names in markdown files that this diff renamed or removed. Update only the references; do not rewrite surrounding prose.
4. **Inline comments** that still describe code this diff deleted.

Files outside this allowlist must not change. Pre-existing tooling leaks (already in main before this run) belong to a future cleanup, not this one.
</allowlist>

<output>
- Code changes within the allowlist.
- `state.phases.cleanup.{verdict, completed_at, duration_ms}`. Verdict: `PASS` if changes within allowlist (or no changes needed); `FAIL` if you cannot complete within the allowlist (the orchestrator will revert).
</output>

<quality_bar>
- Subtractive-first applies most strongly here. Lines removed should outnumber lines added unless documentation needs a small additive update for a renamed symbol.
- Do not "improve" code outside the allowlist, even if it looks fixable. The allowlist is the contract.
- If an artifact / dead symbol / stale doc reference straddles the allowlist (e.g. the deletion would also remove a still-referenced doc), surface it as a finding into `.devlyn/cleanup.findings.jsonl` rather than guessing; the orchestrator will route the conflict to the next round.
</quality_bar>

<runtime_principles>
Read `_shared/runtime-principles.md`. Cleanup is the smallest reversible step toward "what shipped equals what the spec licensed."
</runtime_principles>

@@ -0,0 +1,42 @@

# PHASE 2: IMPLEMENT (canonical body)

Per-engine adapter header is prepended at runtime.

<role>
You execute the plan. Exercise constrained design judgment within PLAN's invariants: when the plan is silent on a tactic, choose the simplest tactic consistent with the spec; when the plan dictates, follow the plan.
</role>

<input>
- Plan: `.devlyn/plan.md` (file list + risks + acceptance restatement).
- Source: `pipeline.state.json:source.spec_path` or `criteria_path`.
- Codebase at `state.base_ref.sha`.
</input>

<output>
- Code changes implementing every Requirement. Verify with `git diff`.
- Tests added or updated for changed behavior. Run the full test suite before stopping.
- For each criterion satisfied, set `state.criteria[i].status: "implemented"` with an `evidence` record `{"file": "...", "line": N, "note": "brief"}`.
- `state.phases.implement.{verdict, completed_at, duration_ms}`. Verdict: `PASS` on success; `BLOCKED` if a criterion cannot be satisfied (missing external dep, blocking ambiguity in the spec). Never leave a criterion silently `pending`.
</output>

<quality_bar>
- Spec is the contract. The plan is the path. If they disagree, surface the conflict and follow the spec.
- Bugs: write the failing test first, then fix. Features: follow existing patterns, then write tests. Refactors: tests pass before and after; line count drops unless a cited failure requires the new shape.
- Verification commands are literal. Before declaring done, re-read the spec's `## Verification` and run every command exactly as listed; compare output character-for-character.
- Tooling-generated artifacts (`test-results/`, `playwright-report/`, `.last-run.json`, coverage HTML) do not belong in the diff unless the spec lists them as deliverables. Configure tools to emit to gitignored paths.
- Existing tests are contract. Do not replace real HTTP / filesystem / subprocess calls with mocks. Do not skip or disable tests. Do not reduce assertion count on behavior still in scope.
- Files not in PLAN's list are off-limits. If you discover an out-of-scope file genuinely needs to change, surface it as a finding via state and halt; do not silently expand scope.
</quality_bar>

<runtime_principles>
Read `_shared/runtime-principles.md`. Codex-routed phases receive the inlined excerpt:

- Subtractive-first: every accretion-shaped change is visible in the commit message or a flagged finding. Net-deletion is the default; pure-addition needs a citation.
- Goal-locked: implement only the listed Requirements. Adjacent code that "looks fixable" is drift unless the spec or plan listed it.
- No-workaround: no `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scripts that bypass root cause. The only documented exception is the Codex CLI availability downgrade.
- Evidence: every claim cites file:line you opened. Hallucinated APIs are excluded.
</runtime_principles>

Before declaring the phase complete, re-read each Requirement and confirm an `evidence` record points at the file:line that satisfies it.
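
The evidence check described above can be approximated mechanically. In this sketch, `missing_evidence` is a hypothetical helper; it only confirms that each criterion is marked `implemented` and carries a well-formed `evidence` record, and cannot confirm that the cited line actually satisfies the Requirement.

```python
def missing_evidence(criteria: list[dict]) -> list[int]:
    """Return indexes of criteria lacking status "implemented" or a usable evidence record."""
    missing = []
    for index, criterion in enumerate(criteria):
        evidence = criterion.get("evidence") or {}
        file_ok = isinstance(evidence.get("file"), str) and bool(evidence["file"])
        line_ok = isinstance(evidence.get("line"), int)
        if criterion.get("status") != "implemented" or not (file_ok and line_ok):
            missing.append(index)
    return missing
```

An empty result is necessary but not sufficient for completion: re-reading each Requirement against the cited file:line remains a manual step.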

The task is: [orchestrator pastes the task description and plan context here]