devlyn-cli 1.0.1 → 1.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CLAUDE.md
CHANGED
@@ -63,6 +63,7 @@ Optional flags:
 - `--skip-review` — skip team-review phase
 - `--skip-clean` — skip clean phase
 - `--skip-docs` — skip update-docs phase
+- `--with-codex [evaluate|review|both]` — use OpenAI Codex as cross-model evaluator/reviewer (requires codex-mcp-server)
 
 ## Manual Pipeline (Step-by-Step Control)
 
package/README.md
CHANGED
@@ -6,9 +6,9 @@
 <img alt="DEVLYN" src="assets/logo.svg" width="540" />
 </picture>
 
-###
+### Context Engineering & Harness Engineering Toolkit for Claude Code
 
-**
+**Structured prompts, agent orchestration, and automated pipelines — debugging, code review, UI design, product specs, and more.**
 
 [](https://www.npmjs.com/package/devlyn-cli)
 [](https://opensource.org/licenses/MIT)
@@ -24,13 +24,26 @@
 
 [Claude Code](https://docs.anthropic.com/en/docs/claude-code) is powerful out of the box — but teams need **consistent, repeatable workflows**. Without shared conventions, every developer prompts differently, reviews differently, and debugs differently.
 
-devlyn-cli solves this
+devlyn-cli solves this with two complementary engineering approaches:
+
+### Context Engineering
+
+Structured prompts and role-based instructions that shape _what the AI knows and how it thinks_ for each task.
 
 - **16 slash commands** for debugging, code review, UI design, documentation, and more
 - **5 core skills** that activate automatically based on conversation context
-- **Agent team workflows** that spawn specialized AI teammates
+- **Agent team workflows** that spawn specialized AI teammates with role-specific expertise
 - **Product & feature spec templates** for structured planning
-
+
+### Harness Engineering
+
+Pipeline orchestration that controls _how agents execute_ — permissions, state management, multi-phase workflows, and cross-model evaluation.
+
+- **`/devlyn:auto-resolve`** — 8-phase automated pipeline (build → evaluate → fix loop → simplify → review → security → clean → docs)
+- **`bypassPermissions` mode** for autonomous subagent execution
+- **File-based state machine** — agents communicate via `.claude/done-criteria.md` and `EVAL-FINDINGS.md`
+- **Git checkpoints** at each phase for rollback safety
+- **Cross-model evaluation** via `--with-codex` flag (OpenAI Codex as independent evaluator)
 
 **Zero dependencies. One command. Works with any project.**
 
@@ -76,7 +89,7 @@ Slash commands are invoked directly in Claude Code conversations (e.g., type `/d
 |---|---|
 | `/devlyn:resolve` | Systematic bug fixing with root-cause analysis and test-driven validation |
 | `/devlyn:team-resolve` | Spawns a full agent team — root cause analyst, test engineer, security auditor — to investigate complex issues |
-| `/devlyn:auto-resolve` | Fully automated pipeline for any task — bugs, features, refactors, chores. Build → evaluate → fix loop → simplify → review → clean → docs. One command, zero human intervention |
+| `/devlyn:auto-resolve` | Fully automated pipeline for any task — bugs, features, refactors, chores. Build → evaluate → fix loop → simplify → review → clean → docs. One command, zero human intervention. Supports `--with-codex` for cross-model evaluation via OpenAI Codex |
 
 ### Code Review & Quality
 
@@ -146,7 +159,7 @@ One command runs the full cycle — no human intervention needed:
 | **Clean** | Remove dead code and unused dependencies |
 | **Docs** | Sync documentation with changes |
 
-Each phase runs as a separate subagent (fresh context), communicates via files, and commits a git checkpoint for rollback safety. Skip phases with flags: `--skip-review`, `--skip-clean`, `--skip-docs`, `--max-rounds 3
+Each phase runs as a separate subagent (fresh context), communicates via files, and commits a git checkpoint for rollback safety. Skip phases with flags: `--skip-review`, `--skip-clean`, `--skip-docs`, `--max-rounds 3`, `--with-codex` (cross-model evaluation via OpenAI Codex).
 
 ### Manual Workflow
 
@@ -208,6 +221,9 @@ Copied directly into your `.claude/skills/` directory.
 | `prompt-engineering` | Claude 4 prompt optimization using official Anthropic best practices |
 | `better-auth-setup` | Production-ready Better Auth + Hono + Drizzle + PostgreSQL auth setup |
 | `pyx-scan` | Check whether an AI agent skill is safe before installing |
+| `dokkit` | Document template filling for DOCX/HWPX — ingest, fill, review, export |
+| `devlyn:pencil-pull` | Pull Pencil designs into code with exact visual fidelity |
+| `devlyn:pencil-push` | Push codebase UI to Pencil canvas for design sync |
 
 ### Community Packs
 
@@ -219,6 +235,7 @@ Installed via the [skills CLI](https://github.com/anthropics/skills) (`npx skill
 | `supabase/agent-skills` | Supabase integration patterns |
 | `coreyhaines31/marketingskills` | Marketing automation and content skills |
 | `anthropics/skills` | Official Anthropic skill-creator with eval framework and description optimizer |
+| `Leonxlnx/taste-skill` | Premium frontend design skills — modern layouts, animations, and visual refinement |
 
 > **Want to add a pack?** Open a PR adding your pack to the `OPTIONAL_ADDONS` array in [`bin/devlyn.js`](bin/devlyn.js).
 
@@ -20,21 +20,26 @@ $ARGUMENTS
 - `--security-review` (auto) — run dedicated security audit. Auto-detects: runs when changes touch auth, secrets, user data, API endpoints, env/config, or crypto. Force with `--security-review always` or skip with `--security-review skip`
 - `--skip-clean` (false) — skip clean phase
 - `--skip-docs` (false) — skip update-docs phase
+- `--with-codex` (false) — use OpenAI Codex as a cross-model evaluator/reviewer via `mcp__codex-cli__*` MCP tools. Accepts: `evaluate`, `review`, or `both` (default when flag is present without value). When enabled, Codex provides an independent second opinion from a different model family, creating a GAN-like dynamic where Claude builds and Codex critiques.
 
 Flags can be passed naturally: `/devlyn:auto-resolve fix the auth bug --max-rounds 3 --skip-docs`
+Codex examples: `--with-codex` (both), `--with-codex evaluate`, `--with-codex review`
 If no flags are present, use defaults.
 
-3.
+3. **If `--with-codex` is enabled**: Read `references/codex-integration.md` and run the "PRE-FLIGHT CHECK" section to verify Codex MCP server availability before proceeding.
+
+4. Announce the pipeline plan:
 ```
 Auto-resolve pipeline starting
 Task: [extracted task description]
 Phases: Build → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs]
 Max evaluation rounds: [N]
+Cross-model evaluation (Codex): [evaluate / review / both / disabled]
 ```
 
 ## PHASE 1: BUILD
 
-Spawn a subagent using the Agent tool to investigate and implement the fix. The subagent does NOT have access to skills, so include all necessary instructions inline.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to investigate and implement the fix. The subagent does NOT have access to skills, so include all necessary instructions inline.
 
 Agent prompt — pass this to the Agent tool:
 
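The flag rules above can be sketched as a small Python parser plus the phase list the pipeline announces. These rules are: a bare `--with-codex` means `both`, the skip flags turn phases off, and `--security-review` takes `auto`, `always`, or `skip`. This is an illustration only. The skill is a prompt that Claude interprets, and `PipelineFlags`, `parse_flags`, and `phase_plan` are hypothetical names, not part of devlyn-cli. The sketch assumes whitespace-separated arguments and treats every unrecognized token as task text. `max_rounds` stays `None` when `--max-rounds` is absent, because the default is not shown in this diff. The sketch also drops the Codex review whenever `--skip-review` is set; that is its own reading of where Phase 4B sits.

```python
from dataclasses import dataclass
from typing import List, Optional

CODEX_MODES = ("evaluate", "review", "both")
SECURITY_MODES = ("auto", "always", "skip")


@dataclass
class PipelineFlags:
    task: str = ""
    max_rounds: Optional[int] = None   # None: use the skill's default (not shown in this diff)
    skip_review: bool = False
    skip_clean: bool = False
    skip_docs: bool = False
    security_review: str = "auto"      # "auto", "always", or "skip"
    with_codex: Optional[str] = None   # None = disabled, else one of CODEX_MODES


def parse_flags(arguments: str) -> PipelineFlags:
    """Hypothetical helper: split free-form $ARGUMENTS into task text and flags."""
    tokens = arguments.split()
    flags = PipelineFlags()
    words: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "--with-codex":
            # A bare --with-codex means "both"; an explicit mode consumes the next token.
            flags.with_codex = nxt if nxt in CODEX_MODES else "both"
            i += 2 if nxt in CODEX_MODES else 1
            continue
        if tok == "--security-review" and nxt in SECURITY_MODES:
            flags.security_review = nxt
            i += 2
            continue
        if tok == "--max-rounds" and nxt is not None and nxt.isdigit():
            flags.max_rounds = int(nxt)
            i += 2
            continue
        if tok in ("--skip-review", "--skip-clean", "--skip-docs"):
            setattr(flags, tok[2:].replace("-", "_"), True)
        elif tok != "--security-review":  # a bare --security-review keeps "auto"
            words.append(tok)
        i += 1
    flags.task = " ".join(words)
    return flags


def phase_plan(flags: PipelineFlags) -> List[str]:
    """Phases in announcement order; Codex steps follow the Claude step they pair with."""
    phases = ["Build", "Evaluate"]
    if flags.with_codex in ("evaluate", "both"):
        phases.append("Evaluate (Codex)")
    phases += ["Fix loop (if needed)", "Simplify"]
    if not flags.skip_review:
        phases.append("Review")
        if flags.with_codex in ("review", "both"):
            phases.append("Review (Codex)")
    if flags.security_review != "skip":
        phases.append("Security")  # in "auto" mode this still depends on the diff scan
    if not flags.skip_clean:
        phases.append("Clean")
    if not flags.skip_docs:
        phases.append("Docs")
    return phases
```

For example, `parse_flags("add login --with-codex evaluate --skip-review")` yields the task `add login` and Codex mode `evaluate`. Its plan contains neither review phase.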
@@ -72,7 +77,7 @@ The task is: [paste the task description here]
 
 ## PHASE 2: EVALUATE
 
-Spawn a subagent using the Agent tool to evaluate the work. Include all evaluation instructions inline.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to evaluate the work. Include all evaluation instructions inline.
 
 Agent prompt — pass this to the Agent tool:
 
@@ -120,18 +125,19 @@ Do NOT delete `.claude/done-criteria.md` or `.claude/EVAL-FINDINGS.md` — the o
 **After the agent completes**:
 1. Read `.claude/EVAL-FINDINGS.md`
 2. Extract the verdict
-3.
+3. **If `--with-codex` includes `evaluate` or `both`**: Read `references/codex-integration.md` and follow the "PHASE 2-CODEX: CROSS-MODEL EVALUATE" section. This runs Codex as a second evaluator and merges findings into `EVAL-FINDINGS.md`.
+4. Branch on verdict (from the merged findings if Codex was used):
 - `PASS` → skip to PHASE 3
 - `PASS WITH ISSUES` → skip to PHASE 3 (issues are shippable)
 - `NEEDS WORK` → go to PHASE 2.5 (fix loop)
 - `BLOCKED` → go to PHASE 2.5 (fix loop)
-
+5. If `.claude/EVAL-FINDINGS.md` was not created, treat as PASS WITH ISSUES and log a warning
 
 ## PHASE 2.5: FIX LOOP (conditional)
 
 Track the current round number. If `round >= max-rounds`, stop the loop and proceed to PHASE 3 with a warning that unresolved findings remain.
 
-Spawn a subagent using the Agent tool to fix the evaluation findings.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to fix the evaluation findings.
 
 Agent prompt — pass this to the Agent tool:
 
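The verdict branching and the fix-loop cap above fit in a few lines of Python. The sketch is hypothetical. This diff does not show exactly where the verdict appears in `EVAL-FINDINGS.md`, so the sketch assumes it sits on the line containing "Verdict" or within the two lines after it. Treating an unparseable file like a missing one is also an assumption, not a documented rule. `rounds_done` is read as the number of fix rounds already completed.

```python
from pathlib import Path
from typing import Optional, Tuple

# Checked longest-first so "PASS WITH ISSUES" is not misread as "PASS".
VERDICTS = ("PASS WITH ISSUES", "NEEDS WORK", "BLOCKED", "PASS")
FINDINGS_PATH = Path(".claude/EVAL-FINDINGS.md")


def read_findings(path: Path = FINDINGS_PATH) -> Optional[str]:
    """Return the findings text, or None if the evaluator never wrote the file."""
    return path.read_text(encoding="utf-8") if path.exists() else None


def extract_verdict(findings: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (verdict, warning)."""
    if findings is None:
        # Documented rule: a missing EVAL-FINDINGS.md counts as PASS WITH ISSUES.
        return "PASS WITH ISSUES", "EVAL-FINDINGS.md was not created"
    lines = findings.splitlines()
    start = next((i for i, line in enumerate(lines) if "verdict" in line.lower()), None)
    if start is not None:
        # Assumed layout: the verdict is on the "Verdict" line or the next two lines.
        for line in lines[start:start + 3]:
            upper = line.upper()
            for verdict in VERDICTS:
                if verdict in upper:
                    return verdict, None
    # Assumption of this sketch: an unreadable verdict gets the same fallback.
    return "PASS WITH ISSUES", "no verdict found in EVAL-FINDINGS.md"


def next_phase(verdict: str, rounds_done: int, max_rounds: int) -> Tuple[str, Optional[str]]:
    """Map a verdict to the next phase, capping the fix loop at max_rounds."""
    if verdict in ("PASS", "PASS WITH ISSUES"):
        return "PHASE 3", None
    if rounds_done >= max_rounds:
        return "PHASE 3", "max rounds reached; unresolved findings remain"
    return "PHASE 2.5", None
```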
@@ -148,7 +154,7 @@ For each finding: read the referenced file:line, understand the issue, implement
 
 ## PHASE 3: SIMPLIFY
 
-Spawn a subagent using the Agent tool for a quick cleanup pass.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` for a quick cleanup pass.
 
 Agent prompt — pass this to the Agent tool:
 
@@ -161,7 +167,7 @@ Review the recently changed files (use `git diff HEAD~1` to see what changed). L
 
 Skip if `--skip-review` was set.
 
-Spawn a subagent using the Agent tool for a multi-perspective review.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` for a multi-perspective review.
 
 Agent prompt — pass this to the Agent tool:
 
@@ -171,7 +177,9 @@ Each reviewer evaluates from their perspective, sends findings with file:line ev
 
 Clean up the team after completion.
 
-**
+**If `--with-codex` includes `review` or `both`**: Read `references/codex-integration.md` and follow the "PHASE 4B: CODEX REVIEW" section. This runs Codex's independent code review and reconciles findings with the Claude team review.
+
+**After the review phase completes**:
 1. If CRITICAL issues remain unfixed, log a warning in the final report
 2. **Checkpoint**: Run `git add -A && git commit -m "chore(pipeline): review fixes complete"` if there are changes
 
@@ -185,7 +193,7 @@ Determine whether to run this phase:
 - Also run `git diff main` and scan for patterns: `API_KEY`, `SECRET`, `TOKEN`, `PASSWORD`, `PRIVATE_KEY`, `Bearer`, `jwt`, `bcrypt`, `crypto`, `env.`, `process.env`
 - If any match → run. If no matches → skip and note "Security review skipped — no security-sensitive changes detected."
 
-Spawn a subagent using the Agent tool for a dedicated security audit.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` for a dedicated security audit.
 
 Agent prompt — pass this to the Agent tool:
 
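The pattern-based half of the security auto-detect can be sketched as a scan over the output of `git diff main`. The pattern list is copied from the skill text. Two choices are this sketch's own: the patterns are matched as case-sensitive substrings, and only added and removed lines are checked (not context lines). The path-based triggers in the flag description (auth, secrets, user data, API endpoints, env/config, crypto) are not modeled, and the function names are hypothetical.

```python
import subprocess
from typing import List

# Patterns copied from the skill text; case-sensitive substring matching is an assumption.
SECURITY_PATTERNS = (
    "API_KEY", "SECRET", "TOKEN", "PASSWORD", "PRIVATE_KEY",
    "Bearer", "jwt", "bcrypt", "crypto", "env.", "process.env",
)


def changed_lines(diff_text: str) -> List[str]:
    """Added/removed lines of a unified diff, minus the ---/+++ file headers."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def security_hits(diff_text: str) -> List[str]:
    """Patterns found on at least one changed line, in SECURITY_PATTERNS order."""
    lines = changed_lines(diff_text)
    return [p for p in SECURITY_PATTERNS if any(p in line for line in lines)]


def should_run_security(mode: str, diff_text: str) -> bool:
    """mode is the --security-review value: "always", "skip", or "auto"."""
    if mode == "always":
        return True
    if mode == "skip":
        return False
    return bool(security_hits(diff_text))


def diff_against_main(repo: str = ".") -> str:
    """Runs the command the skill names: git diff main."""
    result = subprocess.run(
        ["git", "diff", "main"], cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout
```

In `auto` mode, a `False` from `should_run_security` corresponds to the skip-and-note branch above.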
@@ -213,7 +221,7 @@ Fix any CRITICAL findings directly. For HIGH findings, fix if straightforward, o
 
 Skip if `--skip-clean` was set.
 
-Spawn a subagent using the Agent tool
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.
 
 Agent prompt — pass this to the Agent tool:
 
@@ -226,7 +234,7 @@ Scan the codebase for dead code, unused dependencies, and code hygiene issues in
 
 Skip if `--skip-docs` was set.
 
-Spawn a subagent using the Agent tool
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.
 
 Agent prompt — pass this to the Agent tool:
 
@@ -256,10 +264,12 @@ After all phases complete:
 | Phase | Status | Notes |
 |-------|--------|-------|
 | Build (team-resolve) | [completed] | [brief summary] |
-| Evaluate | [PASS/NEEDS WORK after N rounds] | [verdict + key findings] |
+| Evaluate (Claude) | [PASS/NEEDS WORK after N rounds] | [verdict + key findings] |
+| Evaluate (Codex) | [completed / skipped] | [Codex-only findings count, merged verdict] |
 | Fix rounds | [N rounds / skipped] | [what was fixed] |
 | Simplify | [completed / skipped] | [changes made] |
-| Review (team
+| Review (Claude team) | [completed / skipped] | [findings summary] |
+| Review (Codex) | [completed / skipped] | [Codex-only findings, agreed findings] |
 | Security review | [completed / skipped / auto-skipped] | [findings or "no security-sensitive changes"] |
 | Clean | [completed / skipped] | [items cleaned] |
 | Docs (update-docs) | [completed / skipped] | [docs updated] |
@@ -0,0 +1,103 @@
+# Codex Cross-Model Integration
+
+Instructions for using OpenAI Codex as an independent evaluator/reviewer in the auto-resolve pipeline. Only read this file when `--with-codex` is enabled.
+
+Codex is accessed via `mcp__codex-cli__*` MCP tools (provided by codex-mcp-server). This creates a GAN-like adversarial dynamic — Claude builds and Codex critiques, reducing shared blind spots between model families.
+
+---
+
+## PRE-FLIGHT CHECK
+
+Before starting the pipeline, verify the Codex MCP server is available by calling `mcp__codex-cli__ping`.
+
+- **If ping succeeds**: continue normally.
+- **If ping fails or `mcp__codex-cli__ping` tool is not found**: warn the user and ask how to proceed:
+```
+⚠ Codex MCP server not detected. --with-codex requires codex-mcp-server.
+
+To install:
+npm i -g @openai/codex
+claude mcp add codex-cli -- npx -y codex-mcp-server
+
+Options:
+[1] Continue without --with-codex (Claude-only evaluation/review)
+[2] Abort pipeline
+```
+If the user chooses [1], disable `--with-codex` and continue. If [2], stop.
+
+---
+
+## PHASE 2-CODEX: CROSS-MODEL EVALUATE
+
+Run after the Claude evaluator (Phase 2) completes, only if `--with-codex` includes `evaluate` or `both`.
+
+### Step 1 — Get Codex's evaluation
+
+Call `mcp__codex-cli__codex` with:
+- `prompt`: Include the full content of `.claude/done-criteria.md` and the output of `git diff HEAD~1`. Ask Codex to evaluate the changes against the done criteria and report issues by severity (CRITICAL, HIGH, MEDIUM, LOW) with file:line references.
+- `workingDirectory`: the project root
+- `sandbox`: `"read-only"` (Codex should only read, not modify files)
+- `reasoningEffort`: `"high"`
+
+Example prompt to pass:
+```
+You are an independent code evaluator. Grade the following code changes against the done criteria below. Be strict — when in doubt, flag it.
+
+## Done Criteria
+[paste contents of .claude/done-criteria.md]
+
+## Code Changes
+[paste output of git diff HEAD~1]
+
+For each criterion, mark VERIFIED (with evidence) or FAILED (with file:line and what's wrong).
+Then list all issues found grouped by severity: CRITICAL, HIGH, MEDIUM, LOW.
+For each issue provide: file:line, description, and suggested fix.
+End with a verdict: PASS, PASS WITH ISSUES, NEEDS WORK, or BLOCKED.
+```
+
+### Step 2 — Merge findings
+
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to merge Claude's and Codex's evaluations.
+
+Agent prompt:
+
+Read `.claude/EVAL-FINDINGS.md` (Claude's evaluation) and the Codex evaluation output below. Merge them into a single unified `.claude/EVAL-FINDINGS.md` following the existing format. Rules:
+- Take the MORE SEVERE verdict between the two evaluators
+- Deduplicate findings that reference the same file:line or describe the same issue
+- When both evaluators flag the same issue, keep the more detailed description
+- Prefix Codex-only findings with `[codex]` so the fix loop knows the source
+- Preserve the exact structure: Verdict, Done Criteria Results, Findings Requiring Action (CRITICAL/HIGH), Cross-Cutting Patterns
+
+Codex evaluation:
+[paste Codex's response here]
+
+---
+
+## PHASE 4B: CODEX REVIEW
+
+Run after the Claude team review (Phase 4A) completes, only if `--with-codex` includes `review` or `both`.
+
+### Step 1 — Run Codex review
+
+Call `mcp__codex-cli__review` with:
+- `base`: `"main"` — review all changes since main
+- `workingDirectory`: the project root
+- `title`: `"Cross-model review (Codex)"`
+
+This runs OpenAI Codex's built-in code review against the diff. The review tool returns structured findings automatically — no custom prompt needed.
+
+### Step 2 — Reconcile both reviews
+
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to reconcile both reviews.
+
+Agent prompt:
+
+Two independent reviews have been conducted on recent changes — one by a Claude team review and one by OpenAI Codex. Reconcile them:
+
+Claude team review findings: [paste Phase 4A agent's output summary]
+Codex review findings: [paste mcp__codex-cli__review output]
+
+1. Deduplicate findings that describe the same issue
+2. For unique Codex findings not caught by Claude's team, prefix with `[codex]` and assess severity
+3. Fix any CRITICAL issues directly. For HIGH issues, fix if straightforward.
+4. Write a brief reconciliation summary to stdout listing: findings from both (agreed), Claude-only, Codex-only, and what was fixed