devlyn-cli 1.14.0 → 1.15.0
- package/CLAUDE.md +28 -149
- package/config/skills/devlyn:auto-resolve/SKILL.md +165 -515
- package/config/skills/devlyn:auto-resolve/evals/evals.json +21 -0
- package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +42 -0
- package/config/skills/devlyn:auto-resolve/references/build-gate.md +36 -22
- package/config/skills/devlyn:auto-resolve/references/engine-routing.md +43 -165
- package/config/skills/devlyn:auto-resolve/references/findings-schema.md +103 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +54 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +45 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +84 -0
- package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +114 -0
- package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +201 -0
- package/config/skills/devlyn:auto-resolve/scripts/archive_run.py +104 -0
- package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +96 -0
- package/config/skills/devlyn:ideate/SKILL.md +12 -64
- package/config/skills/devlyn:ideate/references/codex-critic-template.md +42 -0
- package/config/skills/devlyn:preflight/SKILL.md +25 -40
- package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +6 -10
- package/config/skills/devlyn:reap/SKILL.md +104 -0
- package/config/skills/devlyn:reap/scripts/reap.sh +129 -0
- package/config/skills/devlyn:reap/scripts/scan.sh +116 -0
- package/package.json +5 -1
package/CLAUDE.md
CHANGED
@@ -2,175 +2,54 @@
 
 ## Quick Start
 
-
+Three commands cover most work. All default to `--engine auto` — Codex GPT-5.4 builds, Claude Opus 4.7 critiques (cross-model GAN dynamic).
 
-1. `/devlyn:ideate` —
-2. `/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/X-name.md"` — hands-free build → evaluate →
-3. `/devlyn:preflight` — verify the implementation matches the roadmap
+1. `/devlyn:ideate` — unstructured idea → VISION/ROADMAP/item specs
+2. `/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/X-name.md"` — hands-free build → evaluate → ship
+3. `/devlyn:preflight` — verify the implementation matches the roadmap
 
-
-
-## General
-
-- Proactively use subagents and skills where needed
-- Follow commit conventions in `.claude/commit-conventions.md`
-- Follow design system in `docs/design-system.md` for UI/UX work if exist
+Each skill's `SKILL.md` is the source of truth for its flags and workflow — don't duplicate them here.
 
 ## Error Handling Philosophy
 
 **No silent fallbacks.** Handle errors explicitly and show the user what happened.
 
-- **Default
-- **Fallbacks are the exception
-- **
-- **Pattern**: `try { doThing() } catch (error) { showErrorUI(error) }` — NOT `try { doThing() } catch { return fallbackValue }`
+- **Default**: when something fails, display a clear error state — message, retry option, or actionable guidance. Do NOT silently fall back to default/placeholder data.
+- **Fallbacks are the exception.** Only use them when it's a widely accepted best practice (CSS fallback fonts, CDN failover, image placeholders). Otherwise handle the error explicitly.
+- **Pattern**: `try { doThing() } catch (error) { showErrorUI(error) }` — NOT `try { doThing() } catch { return fallbackValue }`.
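
Reviewer note: the **Pattern** bullet above is written as a JavaScript one-liner. The contrast it draws can be sketched in Python as follows. This is only an illustration under stated assumptions: `Result`, `load_explicit`, `load_silent`, and `failing_fetch` are hypothetical names that do not appear in the package.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    """Hypothetical container: either a value or a user-facing error message."""
    value: Optional[T] = None
    error: Optional[str] = None


def load_explicit(fetch: Callable[[], T]) -> Result[T]:
    """Preferred shape: keep the failure visible so the caller can show an error state."""
    try:
        return Result(value=fetch())
    except Exception as exc:
        # The message carries the cause plus actionable guidance.
        return Result(error=f"Could not load data: {exc}. Retry or check the source.")


def load_silent(fetch: Callable[[], T], fallback: T) -> T:
    """Discouraged shape: the failure disappears behind placeholder data."""
    try:
        return fetch()
    except Exception:
        return fallback


def failing_fetch() -> list:
    """Hypothetical data source that always fails."""
    raise ConnectionError("upstream timed out")


explicit = load_explicit(failing_fetch)
silent = load_silent(failing_fetch, fallback=[])

print(explicit.error)  # the caller can render this to the user
print(silent)          # an empty list that hides the outage
```

With `load_silent`, an empty list from a healthy source and an empty list from a failed one look identical to the caller, which is the situation the rule is meant to prevent.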
 
 ## Investigation Workflow
 
 When investigating bugs, analyzing features, or exploring code:
 
-1. **Define exit criteria upfront**
-2. **Checkpoint progress**
-3. **
-4. **Always deliver findings**
-- Files examined
-- Key findings
-- Remaining unknowns
-- Recommended next steps
-
-For complex investigations, use `/devlyn:team-resolve` to assemble a multi-perspective investigation team, or spawn parallel Agent subagents to explore different areas simultaneously.
-
-## UI/UX Workflow
-
-The full design-to-implementation pipeline:
-
-1. `/devlyn:design-ui` → Generate 5 style explorations
-2. `/devlyn:design-system [N]` → Extract tokens from chosen style
-3. `/devlyn:implement-ui` → Team implements or improves UI from design system
-4. `/devlyn:team-resolve [feature]` → Add features on top
-
-## Feature Development
-
-1. **Plan first** - Always output a concrete implementation plan with specific file changes before writing code
-2. **Track progress** - Use the task tools (TaskCreate / TaskUpdate) to checkpoint each phase
-3. **Test validation** - Write tests alongside implementation; iterate until green
-4. **Small commits** - Commit working increments rather than large changesets
-
-For complex features, spawn the `Plan` subagent (`Agent` tool with `subagent_type: "Plan"`) to design the approach before implementation.
-
-## Automated Pipeline (Recommended Starting Point)
-
-For hands-free build-evaluate-polish cycles — works for bugs, features, refactors, and chores:
-
-```
-/devlyn:auto-resolve [task description]
-```
-
-This runs the full pipeline automatically: **Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Challenge → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`, `.devlyn/CHALLENGE-FINDINGS.md`).
-
-The **Build Gate** (Phase 1.4) runs real compilers, typecheckers, and linters — the same commands CI/Docker/production will run. It auto-detects project types (Next.js, Rust, Go, Solidity, Expo, Swift, etc.) and Dockerfiles. This is the primary defense against "tests pass locally, breaks in CI/Docker" class of bugs (type errors in un-tested files, cross-package drift, Dockerfile copy mismatches).
-
-The **Challenge** phase (Phase 4.5) is a fresh skeptical review with no checklist — a subagent reads the entire diff cold with zero context from prior phases and asks "would I ship this to production with my name on it?" This catches the subtle issues that structured checklist-driven reviews miss: wrong-but-working approaches, unstated assumptions, non-idiomatic patterns, and integration gaps.
-
-For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.
-
-Optional flags:
-- `--max-rounds 6` — increase max evaluate-fix iterations (default: 4)
-- `--skip-browser` — skip browser validation phase (auto-skipped for non-web changes)
-- `--skip-build-gate` — skip the deterministic build gate (not recommended)
-- `--build-gate strict` — treat warnings as errors; `--build-gate no-docker` — skip Docker builds for speed
-- `--skip-review` — skip team-review phase
-- `--skip-clean` — skip clean phase
-- `--skip-docs` — skip update-docs phase
-- `--engine auto|codex|claude` — intelligent model routing. `auto` (default) routes each phase and team role to the optimal model based on benchmark data: Codex GPT-5.4 handles BUILD and FIX (SWE-bench Pro lead), Claude Opus 4.7 handles EVALUATE and CHALLENGE (long-context retrieval + skeptical reasoning). Different models build vs critique — the cross-model GAN dynamic catches what single-model pipelines miss. `codex` forces Codex for implementation, Claude for orchestration and Chrome MCP. `claude` uses Claude for everything. Requires codex-mcp-server for `auto` and `codex` modes.
-
-## Preflight Check (Post-Roadmap Verification)
-
-After completing a roadmap (or a phase), verify that everything was actually implemented correctly:
-
-```
-/devlyn:preflight
-```
+1. **Define exit criteria upfront** — ask "what does 'done' look like?" before starting.
+2. **Checkpoint progress** — use `TaskCreate`/`TaskUpdate` every 5–10 minutes.
+3. **Intermediate summaries** — output "current understanding" snapshots so work isn't lost if interrupted.
+4. **Always deliver findings** — never end mid-analysis. Minimum output: files examined, key findings, remaining unknowns, recommended next steps.
 
-
+For complex investigations, use `/devlyn:team-resolve` for a multi-perspective team, or spawn parallel `Agent` subagents.
 
-
-
-Optional flags:
-- `--phase N` — audit only phase N items
-- `--autofix` — auto-promote CRITICAL/HIGH findings and run auto-resolve
-- `--skip-browser` — skip browser validation
-- `--skip-docs` — skip documentation audit
-- `--engine auto|codex|claude` — `auto` (default) routes the code-auditor to Codex (SWE-bench Pro +11.7pp on code analysis); the docs-auditor and browser-auditor always use Claude regardless of `--engine` (writing-quality strength on docs drift; Chrome MCP tools are session-bound to Claude Code)
-
-**Recommended workflow**: `/devlyn:ideate` → `/devlyn:auto-resolve` (repeat) → `/devlyn:preflight` → fix gaps → `/devlyn:preflight` (verify)
-
-## Manual Pipeline (Step-by-Step Control)
-
-When you want to run each step yourself with review between phases:
-
-1. `/devlyn:team-resolve [issue]` → Investigate + implement (writes `.devlyn/done-criteria.md`)
-2. `/devlyn:evaluate` → Grade against done-criteria (writes `.devlyn/EVAL-FINDINGS.md`)
-3. If findings exist: `/devlyn:team-resolve "Fix issues in .devlyn/EVAL-FINDINGS.md"` → Fix loop
-4. `/simplify` → Quick cleanup pass
-5. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
-6. `/devlyn:clean` → Codebase hygiene
-7. `/devlyn:update-docs` → Keep docs in sync
-
-Steps 5-7 are optional depending on scope.
-
-## Vibe Coding Workflow
-
-The recommended sequence after writing code:
-
-1. **Write code** (vibe coding)
-2. `/simplify` → Quick cleanup pass (reuse, quality, efficiency)
-3. `/devlyn:review` → Thorough solo review with security-first checklist
-4. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
-5. `/devlyn:clean` → Periodic codebase-wide hygiene
-6. `/devlyn:update-docs` → Keep docs in sync
-
-Steps 4-6 are optional depending on the scope of changes. `/simplify` should always run before `/devlyn:review` to catch low-hanging fruit cheaply.
-
-## Documentation Workflow
-
-- **Sync docs with codebase**: Use `/devlyn:update-docs` to clean up stale content, update outdated info, and generate missing docs
-- **Focused doc update**: Use `/devlyn:update-docs [area]` for targeted updates (e.g., "API reference", "getting-started")
-- Preserves all forward-looking content: roadmaps, future plans, visions, open questions
-- If no docs exist, proposes a tailored docs structure and generates initial content
-
-## Browser Testing Workflow
-
-- **Standalone**: Use `/devlyn:browser-validate` to test any web feature in the browser — starts the dev server, tests the feature end-to-end, fixes issues it finds
-- **In pipeline**: Auto-resolve includes browser validation automatically for web projects (between Build and Evaluate phases)
-- **Tiered**: Uses chrome MCP tools if available, falls back to Playwright, then curl
-- **Feature-first**: Tests the implemented feature (from done-criteria), not just "does the page load"
-
-## Debugging Workflow
+## Context Window Management
 
--
-- **Complex bugs**: Use `/devlyn:team-resolve` for multi-perspective investigation with a full agent team
-- **Hands-free**: Use `/devlyn:auto-resolve` for fully automated resolve → evaluate → fix → polish pipeline
-- **Post-fix review**: Use `/devlyn:team-review` for thorough multi-reviewer validation
+Claude 4.5/4.6/4.7 auto-compact as context approaches the limit — you can work indefinitely without manual handoffs in most cases. Don't stop early due to token-budget concerns; the model resumes from where it left off after compaction.
 
-
+For multi-context-window work (e.g., a large roadmap), persist state to disk:
+- auto-resolve writes durable state to `.devlyn/runs/<run_id>/` (pipeline.state.json, `<phase>.findings.jsonl`, `<phase>.log.md`) plus git commits. Pick up by reading `state.json` first; drill into JSONL/log files as needed.
+- preflight writes `.devlyn/PREFLIGHT-REPORT.md`.
+- For long investigations, write progress to `HANDOFF.md`; resume with `@HANDOFF.md continue`.
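
Reviewer note: the "state first, then JSONL" pickup order described in the added bullets could look roughly like the sketch below. It assumes the state file is literally named `pipeline.state.json` and that each `<phase>.findings.jsonl` holds one JSON object per line. The diff does not show either schema, so the sample contents, the run id `example-run`, and the helper `load_run` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_run(run_dir: Path) -> tuple[dict, dict[str, list[dict]]]:
    """Read the run state first, then every per-phase findings file."""
    state_path = run_dir / "pipeline.state.json"
    if not state_path.is_file():
        # Fail loudly instead of inventing a default state.
        raise FileNotFoundError(f"No pipeline.state.json in {run_dir}")
    state = json.loads(state_path.read_text(encoding="utf-8"))

    findings: dict[str, list[dict]] = {}
    suffix = ".findings.jsonl"
    for path in sorted(run_dir.glob(f"*{suffix}")):
        phase = path.name[: -len(suffix)]
        lines = path.read_text(encoding="utf-8").splitlines()
        findings[phase] = [json.loads(line) for line in lines if line.strip()]
    return state, findings


# Demo against a throwaway directory; the file contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / ".devlyn" / "runs" / "example-run"
    run_dir.mkdir(parents=True)
    (run_dir / "pipeline.state.json").write_text(
        json.dumps({"phase": "evaluate"}), encoding="utf-8"
    )
    (run_dir / "evaluate.findings.jsonl").write_text(
        '{"id": 1}\n{"id": 2}\n', encoding="utf-8"
    )
    state, findings = load_run(run_dir)

print(state)
print({phase: len(items) for phase, items in findings.items()})
```

Raising on a missing state file rather than returning an empty dict keeps the helper in line with the no-silent-fallbacks rule stated earlier in the same file.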
 
-
-- **Focused cleanup**: Use `/devlyn:clean [category]` for targeted sweeps (dead code, deps, tests, complexity, hygiene)
-- **Periodic maintenance sequence**: `/devlyn:clean` → `/simplify` → `/devlyn:update-docs` → `/devlyn:review`
+Manually `/clear` only when context is genuinely irrelevant to the next task.
 
-##
+## Communication Style
 
-
+- Lead with **objective data** (popularity, benchmarks, community adoption) before personal opinions.
+- When the user asks "what's popular" or "what do others use", answer with data.
+- Keep recommendations actionable and specific.
 
-
-- All `auto-resolve` and `preflight` runs already write durable state to `.devlyn/*.md` (done-criteria, BUILD-GATE, EVAL-FINDINGS, BROWSER-RESULTS, CHALLENGE-FINDINGS, PREFLIGHT-REPORT) and to git commits — pick up by reading those files plus `git log`.
-- For long investigations, write progress notes to a `HANDOFF.md` and resume with `@HANDOFF.md continue from where this left off` if you need a fresh window.
+## Commit Conventions
 
-
+Follow `.claude/commit-conventions.md`.
 
-##
+## Design System
 
-
-- When user asks "what's popular" or "what do others use", provide data-driven answers
-- Keep recommendations actionable and specific
+When doing UI/UX work, follow `docs/design-system.md` if it exists.