devlyn-cli 1.13.0 → 1.15.0
- package/CLAUDE.md +28 -149
- package/README.md +30 -1
- package/config/skills/devlyn:auto-resolve/SKILL.md +167 -453
- package/config/skills/devlyn:auto-resolve/evals/evals.json +21 -0
- package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +42 -0
- package/config/skills/devlyn:auto-resolve/references/build-gate.md +36 -22
- package/config/skills/devlyn:auto-resolve/references/engine-routing.md +43 -165
- package/config/skills/devlyn:auto-resolve/references/findings-schema.md +103 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +54 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +45 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +84 -0
- package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +114 -0
- package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +201 -0
- package/config/skills/devlyn:auto-resolve/scripts/archive_run.py +104 -0
- package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +96 -0
- package/config/skills/devlyn:ideate/SKILL.md +17 -78
- package/config/skills/devlyn:ideate/references/codex-critic-template.md +42 -0
- package/config/skills/devlyn:ideate/references/templates/item-spec.md +4 -0
- package/config/skills/devlyn:preflight/SKILL.md +25 -40
- package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +6 -10
- package/config/skills/devlyn:reap/SKILL.md +104 -0
- package/config/skills/devlyn:reap/scripts/reap.sh +129 -0
- package/config/skills/devlyn:reap/scripts/scan.sh +116 -0
- package/package.json +5 -1
package/CLAUDE.md
CHANGED
@@ -2,175 +2,54 @@

 ## Quick Start

-
+Three commands cover most work. All default to `--engine auto` — Codex GPT-5.4 builds, Claude Opus 4.7 critiques (cross-model GAN dynamic).

-1. `/devlyn:ideate` —
-2. `/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/X-name.md"` — hands-free build → evaluate →
-3. `/devlyn:preflight` — verify the implementation matches the roadmap
+1. `/devlyn:ideate` — unstructured idea → VISION/ROADMAP/item specs
+2. `/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/X-name.md"` — hands-free build → evaluate → ship
+3. `/devlyn:preflight` — verify the implementation matches the roadmap

-
-
-## General
-
-- Proactively use subagents and skills where needed
-- Follow commit conventions in `.claude/commit-conventions.md`
-- Follow design system in `docs/design-system.md` for UI/UX work if exist
+Each skill's `SKILL.md` is the source of truth for its flags and workflow — don't duplicate them here.

 ## Error Handling Philosophy

 **No silent fallbacks.** Handle errors explicitly and show the user what happened.

-- **Default
-- **Fallbacks are the exception
-- **
-- **Pattern**: `try { doThing() } catch (error) { showErrorUI(error) }` — NOT `try { doThing() } catch { return fallbackValue }`
+- **Default**: when something fails, display a clear error state — message, retry option, or actionable guidance. Do NOT silently fall back to default/placeholder data.
+- **Fallbacks are the exception.** Only use them when it's a widely accepted best practice (CSS fallback fonts, CDN failover, image placeholders). Otherwise handle the error explicitly.
+- **Pattern**: `try { doThing() } catch (error) { showErrorUI(error) }` — NOT `try { doThing() } catch { return fallbackValue }`.

 ## Investigation Workflow

 When investigating bugs, analyzing features, or exploring code:

-1. **Define exit criteria upfront**
-2. **Checkpoint progress**
-3. **
-4. **Always deliver findings**
-- Files examined
-- Key findings
-- Remaining unknowns
-- Recommended next steps
-
-For complex investigations, use `/devlyn:team-resolve` to assemble a multi-perspective investigation team, or spawn parallel Agent subagents to explore different areas simultaneously.
-
-## UI/UX Workflow
-
-The full design-to-implementation pipeline:
-
-1. `/devlyn:design-ui` → Generate 5 style explorations
-2. `/devlyn:design-system [N]` → Extract tokens from chosen style
-3. `/devlyn:implement-ui` → Team implements or improves UI from design system
-4. `/devlyn:team-resolve [feature]` → Add features on top
-
-## Feature Development
-
-1. **Plan first** - Always output a concrete implementation plan with specific file changes before writing code
-2. **Track progress** - Use the task tools (TaskCreate / TaskUpdate) to checkpoint each phase
-3. **Test validation** - Write tests alongside implementation; iterate until green
-4. **Small commits** - Commit working increments rather than large changesets
-
-For complex features, spawn the `Plan` subagent (`Agent` tool with `subagent_type: "Plan"`) to design the approach before implementation.
-
-## Automated Pipeline (Recommended Starting Point)
-
-For hands-free build-evaluate-polish cycles — works for bugs, features, refactors, and chores:
-
-```
-/devlyn:auto-resolve [task description]
-```
-
-This runs the full pipeline automatically: **Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Challenge → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`, `.devlyn/CHALLENGE-FINDINGS.md`).
-
-The **Build Gate** (Phase 1.4) runs real compilers, typecheckers, and linters — the same commands CI/Docker/production will run. It auto-detects project types (Next.js, Rust, Go, Solidity, Expo, Swift, etc.) and Dockerfiles. This is the primary defense against "tests pass locally, breaks in CI/Docker" class of bugs (type errors in un-tested files, cross-package drift, Dockerfile copy mismatches).
-
-The **Challenge** phase (Phase 4.5) is a fresh skeptical review with no checklist — a subagent reads the entire diff cold with zero context from prior phases and asks "would I ship this to production with my name on it?" This catches the subtle issues that structured checklist-driven reviews miss: wrong-but-working approaches, unstated assumptions, non-idiomatic patterns, and integration gaps.
-
-For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.
-
-Optional flags:
-- `--max-rounds 6` — increase max evaluate-fix iterations (default: 4)
-- `--skip-browser` — skip browser validation phase (auto-skipped for non-web changes)
-- `--skip-build-gate` — skip the deterministic build gate (not recommended)
-- `--build-gate strict` — treat warnings as errors; `--build-gate no-docker` — skip Docker builds for speed
-- `--skip-review` — skip team-review phase
-- `--skip-clean` — skip clean phase
-- `--skip-docs` — skip update-docs phase
-- `--engine auto|codex|claude` — intelligent model routing. `auto` (default) routes each phase and team role to the optimal model based on benchmark data: Codex GPT-5.4 handles BUILD and FIX (SWE-bench Pro lead), Claude Opus 4.7 handles EVALUATE and CHALLENGE (long-context retrieval + skeptical reasoning). Different models build vs critique — the cross-model GAN dynamic catches what single-model pipelines miss. `codex` forces Codex for implementation, Claude for orchestration and Chrome MCP. `claude` uses Claude for everything. Requires codex-mcp-server for `auto` and `codex` modes.
-
-## Preflight Check (Post-Roadmap Verification)
-
-After completing a roadmap (or a phase), verify that everything was actually implemented correctly:
-
-```
-/devlyn:preflight
-```
+1. **Define exit criteria upfront** — ask "what does 'done' look like?" before starting.
+2. **Checkpoint progress** — use `TaskCreate`/`TaskUpdate` every 5–10 minutes.
+3. **Intermediate summaries** — output "current understanding" snapshots so work isn't lost if interrupted.
+4. **Always deliver findings** — never end mid-analysis. Minimum output: files examined, key findings, remaining unknowns, recommended next steps.

-
+For complex investigations, use `/devlyn:team-resolve` for a multi-perspective team, or spawn parallel `Agent` subagents.

-
-
-Optional flags:
-- `--phase N` — audit only phase N items
-- `--autofix` — auto-promote CRITICAL/HIGH findings and run auto-resolve
-- `--skip-browser` — skip browser validation
-- `--skip-docs` — skip documentation audit
-- `--engine auto|codex|claude` — `auto` (default) routes the code-auditor to Codex (SWE-bench Pro +11.7pp on code analysis); the docs-auditor and browser-auditor always use Claude regardless of `--engine` (writing-quality strength on docs drift; Chrome MCP tools are session-bound to Claude Code)
-
-**Recommended workflow**: `/devlyn:ideate` → `/devlyn:auto-resolve` (repeat) → `/devlyn:preflight` → fix gaps → `/devlyn:preflight` (verify)
-
-## Manual Pipeline (Step-by-Step Control)
-
-When you want to run each step yourself with review between phases:
-
-1. `/devlyn:team-resolve [issue]` → Investigate + implement (writes `.devlyn/done-criteria.md`)
-2. `/devlyn:evaluate` → Grade against done-criteria (writes `.devlyn/EVAL-FINDINGS.md`)
-3. If findings exist: `/devlyn:team-resolve "Fix issues in .devlyn/EVAL-FINDINGS.md"` → Fix loop
-4. `/simplify` → Quick cleanup pass
-5. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
-6. `/devlyn:clean` → Codebase hygiene
-7. `/devlyn:update-docs` → Keep docs in sync
-
-Steps 5-7 are optional depending on scope.
-
-## Vibe Coding Workflow
-
-The recommended sequence after writing code:
-
-1. **Write code** (vibe coding)
-2. `/simplify` → Quick cleanup pass (reuse, quality, efficiency)
-3. `/devlyn:review` → Thorough solo review with security-first checklist
-4. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
-5. `/devlyn:clean` → Periodic codebase-wide hygiene
-6. `/devlyn:update-docs` → Keep docs in sync
-
-Steps 4-6 are optional depending on the scope of changes. `/simplify` should always run before `/devlyn:review` to catch low-hanging fruit cheaply.
-
-## Documentation Workflow
-
-- **Sync docs with codebase**: Use `/devlyn:update-docs` to clean up stale content, update outdated info, and generate missing docs
-- **Focused doc update**: Use `/devlyn:update-docs [area]` for targeted updates (e.g., "API reference", "getting-started")
-- Preserves all forward-looking content: roadmaps, future plans, visions, open questions
-- If no docs exist, proposes a tailored docs structure and generates initial content
-
-## Browser Testing Workflow
-
-- **Standalone**: Use `/devlyn:browser-validate` to test any web feature in the browser — starts the dev server, tests the feature end-to-end, fixes issues it finds
-- **In pipeline**: Auto-resolve includes browser validation automatically for web projects (between Build and Evaluate phases)
-- **Tiered**: Uses chrome MCP tools if available, falls back to Playwright, then curl
-- **Feature-first**: Tests the implemented feature (from done-criteria), not just "does the page load"
-
-## Debugging Workflow
+## Context Window Management

--
-- **Complex bugs**: Use `/devlyn:team-resolve` for multi-perspective investigation with a full agent team
-- **Hands-free**: Use `/devlyn:auto-resolve` for fully automated resolve → evaluate → fix → polish pipeline
-- **Post-fix review**: Use `/devlyn:team-review` for thorough multi-reviewer validation
+Claude 4.5/4.6/4.7 auto-compact as context approaches the limit — you can work indefinitely without manual handoffs in most cases. Don't stop early due to token-budget concerns; the model resumes from where it left off after compaction.

-
+For multi-context-window work (e.g., a large roadmap), persist state to disk:
+- auto-resolve writes durable state to `.devlyn/runs/<run_id>/` (pipeline.state.json, `<phase>.findings.jsonl`, `<phase>.log.md`) plus git commits. Pick up by reading `state.json` first; drill into JSONL/log files as needed.
+- preflight writes `.devlyn/PREFLIGHT-REPORT.md`.
+- For long investigations, write progress to `HANDOFF.md`; resume with `@HANDOFF.md continue`.

-
-- **Focused cleanup**: Use `/devlyn:clean [category]` for targeted sweeps (dead code, deps, tests, complexity, hygiene)
-- **Periodic maintenance sequence**: `/devlyn:clean` → `/simplify` → `/devlyn:update-docs` → `/devlyn:review`
+Manually `/clear` only when context is genuinely irrelevant to the next task.

-##
+## Communication Style

-
+- Lead with **objective data** (popularity, benchmarks, community adoption) before personal opinions.
+- When the user asks "what's popular" or "what do others use", answer with data.
+- Keep recommendations actionable and specific.

-
-- All `auto-resolve` and `preflight` runs already write durable state to `.devlyn/*.md` (done-criteria, BUILD-GATE, EVAL-FINDINGS, BROWSER-RESULTS, CHALLENGE-FINDINGS, PREFLIGHT-REPORT) and to git commits — pick up by reading those files plus `git log`.
-- For long investigations, write progress notes to a `HANDOFF.md` and resume with `@HANDOFF.md continue from where this left off` if you need a fresh window.
+## Commit Conventions

-
+Follow `.claude/commit-conventions.md`.

-##
+## Design System

-
-- When user asks "what's popular" or "what do others use", provide data-driven answers
-- Keep recommendations actionable and specific
+When doing UI/UX work, follow `docs/design-system.md` if it exists.
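The Error Handling Philosophy section in the CLAUDE.md diff above states its pattern in JavaScript-style shorthand. A minimal Python rendering of the same idea might look like the sketch below. The names `ProfileError`, `load_profile`, `render_error`, and `show_profile` are hypothetical and exist only for illustration; none of them are part of devlyn-cli.

```python
class ProfileError(Exception):
    """Hypothetical error type raised when a profile cannot be loaded."""


def load_profile(store: dict, user_id: str) -> dict:
    # Hypothetical data source: fail loudly rather than return placeholder data.
    try:
        return store[user_id]
    except KeyError as exc:
        raise ProfileError(f"no profile found for user {user_id!r}") from exc


def render_error(error: Exception) -> str:
    # Stand-in for showErrorUI(error): a clear message plus actionable guidance.
    return f"Error: {error}. Check the user id and retry."


def show_profile(store: dict, user_id: str) -> str:
    try:
        profile = load_profile(store, user_id)
    except ProfileError as error:
        # Explicit error state; the anti-pattern would be `return fallback_value`.
        return render_error(error)
    return f"Profile: {profile['name']}"
```

The failure path surfaces the cause and a next step to the caller, which corresponds to the "Default" bullet; a silent fallback would instead hide the missing record behind default data.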
package/README.md
CHANGED
@@ -109,7 +109,7 @@ Install the Codex MCP server during setup, then:
 /devlyn:auto-resolve "fix the auth bug" --engine auto
 ```

-**`--engine auto`** routes each pipeline phase and team role to the optimal model (Claude Opus 4.
+**`--engine auto`** routes each pipeline phase and team role to the optimal model (Claude Opus 4.7 or GPT-5.4) — validated through A/B testing, not just benchmarks.

 > `--engine auto` (default, recommended) · `--engine codex` (force Codex for build) · `--engine claude` (Claude only)

@@ -146,6 +146,35 @@ Works across the full pipeline:

 </details>

+<details>
+<summary><strong>What's new in 1.14.0</strong> — CPO lens + handoff enforcement</summary>
+
+`/devlyn:ideate` now thinks like a world-class Product Owner, and `/devlyn:auto-resolve` finally honors the spec contract the ideate skill was already designed to produce. Validated with 19 parallel eval subagents, 1.2M tokens of evidence — Customer Frame propagation went from 0/20 to 20/20 across seven test scenarios.
+
+- **Jobs-to-be-Done forcing in FRAME** — ideate's opening FRAME phase now requires a one-sentence JTBD statement ("When [situation], [user] wants [motivation] so they can [outcome]") before anything else. A bare problem statement is a state description, not a job — downstream specs built without this frame describe system behavior instead of customer progress.
+- **Customer Frame field on every item spec** — item-spec template gains a `## Customer Frame` section between Context and Objective that carries the per-item JTBD sentence all the way through to auto-resolve's build agent. The build agent uses this line to resolve ambiguity in Requirements rather than inventing interpretations.
+- **PHASE 0.5 SPEC PREFLIGHT on auto-resolve** — when the task names a `docs/roadmap/phase-N/...md` spec, auto-resolve now reads it BEFORE BUILD, verifies internal dependencies are `status: done`, and writes `.devlyn/SPEC-CONTEXT.md` so downstream phases stop re-deriving what the spec already owns. Un-done deps halt the pipeline with `BLOCKED` rather than shipping out-of-sequence code.
+- **Done-criteria verbatim copy** — when PHASE 0.5 found a spec, BUILD's Phase B copies the spec's `Requirements`, `Out of Scope`, and `Verification` sections verbatim into `.devlyn/done-criteria.md`. No silent re-derivation; the ideate CHALLENGE rubric's validation is preserved through the handoff.
+- **Spec-bounded exploration** — BUILD's Phase A uses the spec's `Architecture Notes` + `Dependencies` as the exploration boundary instead of re-classifying the task type open-endedly.
+- **Complexity-gated team ceremony** — `complexity: low` specs with no security/auth/API/data risk keywords skip TeamCreate entirely. Medium/high complexity or risk-flagged specs still assemble the team as before.
+- **Evidence discipline in ideate EXPLORE** — research phase now labels unsourced market/tech claims `[UNVERIFIED]` inline rather than presenting recall as fact. The CHALLENGE rubric's NO GUESSWORK axis fires on unlabeled authoritative claims.
+- **Mode tie-break rule** — when a request matches two ideate modes (Quick Add vs Expand, Research-first vs Deep-dive), the narrowest mode wins. Deterministic selection replaces intuitive match.
+- **Bloat removal** — three redundant motivational blocks deleted from ideate SKILL.md (`<why_this_matters>` rationale, duplicate CHALLENGE preamble, external engine-routing pointer). SKILL.md shrank from 529 to 519 lines despite the new features.
+
+</details>
+
+<details>
+<summary><strong>What's new in 1.13.0</strong> — Opus 4.7 pipeline pass</summary>
+
+Core pipeline skills (`ideate`, `auto-resolve`, `preflight`) rewritten against Anthropic's Opus 4.7 prompting guidance, validated by multi-round comprehension and quality-grading subagents.
+
+- **4.7 prompt patterns** — `<investigate_before_answering>` on evaluator and challenge, `<coverage_over_filtering>` with per-finding confidence, 3 few-shot examples in the Challenge phase, `<orchestrator_context>` (auto-compaction + xhigh effort), `<use_parallel_tool_calls>` in ideate EXPLORE and preflight Phase 0.
+- **`--with-codex` consolidated into `--engine auto`** — auto now covers BUILD/FIX + team roles + ideate CHALLENGE critic (broader than `--with-codex both` ever was). Legacy flag still accepted with a graceful handoff.
+- **Bug fixes** — PHASE 1.5 BLOCKED browser failures re-route correctly via PHASE 2.5; PHASE 1.4-fix and PHASE 2.5 share one global round counter; preflight PHASE 1 numbering fixed; build-gate-exhausted now produces a graceful final report.
+- **CLAUDE.md refresh** (shipped to `npx` installers) — Quick Start pointing to ideate → auto-resolve → preflight, Context Window Management updated for Opus 4.7 auto-compaction, terminology refresh (TodoWrite → task tools, Task agents → Agent subagents).
+
+</details>
+
 ---

 ## Manual Commands
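The PHASE 0.5 SPEC PREFLIGHT entry in the 1.14.0 notes describes halting with `BLOCKED` when a spec's internal dependencies are not `status: done`. A rough sketch of that gate follows, assuming the specs have already been parsed into dictionaries. Only the `done` status value and the `BLOCKED` verdict come from the notes; the function name `preflight_verdict`, the `dependencies` key, the `PROCEED` verdict, and the dictionary layout are assumptions for illustration and do not reflect the skill's actual implementation.

```python
from __future__ import annotations


def preflight_verdict(spec_id: str, specs: dict[str, dict]) -> tuple[str, list[str]]:
    """Return ("PROCEED", []) or ("BLOCKED", [ids of unfinished dependencies])."""
    # An unknown spec_id raises KeyError: an explicit failure, not a silent fallback.
    spec = specs[spec_id]
    unfinished = [
        dep
        for dep in spec.get("dependencies", [])
        # A dependency missing from the roadmap also counts as not done.
        if specs.get(dep, {}).get("status") != "done"
    ]
    if unfinished:
        return "BLOCKED", unfinished
    return "PROCEED", []
```

Returning the list of unfinished dependencies alongside the verdict lets the caller report exactly which items block the build instead of a bare failure.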