devlyn-cli 1.14.0 → 1.15.0
- package/CLAUDE.md +28 -149
- package/config/skills/devlyn:auto-resolve/SKILL.md +165 -515
- package/config/skills/devlyn:auto-resolve/evals/evals.json +21 -0
- package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +42 -0
- package/config/skills/devlyn:auto-resolve/references/build-gate.md +36 -22
- package/config/skills/devlyn:auto-resolve/references/engine-routing.md +43 -165
- package/config/skills/devlyn:auto-resolve/references/findings-schema.md +103 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +54 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +45 -0
- package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +84 -0
- package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +114 -0
- package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +201 -0
- package/config/skills/devlyn:auto-resolve/scripts/archive_run.py +104 -0
- package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +96 -0
- package/config/skills/devlyn:ideate/SKILL.md +12 -64
- package/config/skills/devlyn:ideate/references/codex-critic-template.md +42 -0
- package/config/skills/devlyn:preflight/SKILL.md +25 -40
- package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +6 -10
- package/config/skills/devlyn:reap/SKILL.md +104 -0
- package/config/skills/devlyn:reap/scripts/reap.sh +129 -0
- package/config/skills/devlyn:reap/scripts/scan.sh +116 -0
- package/package.json +5 -1
|
@@ -31,7 +31,7 @@ Parse these from the user's invocation message:
|
|
|
31
31
|
|
|
32
32
|
**Engine pre-flight** (runs unless `--engine claude` was explicitly passed):
|
|
33
33
|
- The default engine is `auto`. If the user did not pass `--engine`, the engine is `auto` — not `claude`.
|
|
34
|
-
- Call `mcp__codex-cli__ping` to verify
|
|
34
|
+
- Call `mcp__codex-cli__ping` to verify Codex MCP availability. On failure, **silently fall back to `--engine claude`** and note `engine downgraded: codex-ping failed` in your eventual output summary. Do not present a menu; do not abort. This matches auto-resolve's hands-off contract.
|
|
35
35
|
- Read `references/challenge-rubric.md` up front.
|
|
36
36
|
|
|
37
37
|
**Consolidated flag**: `--with-codex` was rolled into the smarter `--engine auto` default. If the user passes it, inform them once and proceed with `--engine auto`: "Note: `--with-codex` was consolidated into `--engine auto` (default), which routes the CHALLENGE rubric pass to Codex automatically. No flag needed. Continuing with `--engine auto`."
|
|
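The engine pre-flight rule in this hunk (default to `auto`, ping Codex, silently downgrade on failure) can be sketched roughly as follows. This is an illustrative sketch only: `resolve_engine` and the `ping` callable are hypothetical names standing in for the `mcp__codex-cli__ping` call, not part of devlyn-cli.

```python
from typing import Callable, Optional, Tuple

# Note text taken from the skill instructions above.
DOWNGRADE_NOTE = "engine downgraded: codex-ping failed"


def resolve_engine(
    requested: Optional[str], ping: Callable[[], bool]
) -> Tuple[str, Optional[str]]:
    """Return (engine, note) following the pre-flight rules.

    `ping` is a hypothetical stand-in for the Codex MCP availability check.
    """
    engine = requested or "auto"  # no --engine flag means auto, not claude
    if engine == "claude":
        return engine, None  # explicit claude: the pre-flight is skipped
    try:
        available = ping()
    except Exception:
        available = False  # treat any ping error as "Codex unavailable"
    if available:
        return engine, None
    # Silent fallback: no menu, no abort, only a note for the final summary.
    return "claude", DOWNGRADE_NOTE
```

The returned note is meant to be carried into the eventual output summary rather than shown as a prompt, which keeps the hands-off behavior described above.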
@@ -151,23 +151,20 @@ To implement:
|
|
|
151
151
|
|
|
152
152
|
### Context Archiving
|
|
153
153
|
|
|
154
|
-
ROADMAP.md is the tactical index.
|
|
154
|
+
ROADMAP.md is the tactical index. Done work should move to a collapsed `## Completed` block at the bottom, not clutter the active view. Item spec files stay on disk at `docs/roadmap/phase-N/{id}.md` — only the index row moves.
|
|
155
155
|
|
|
156
|
-
The
|
|
156
|
+
#### The Archive Pass (conditional)
|
|
157
157
|
|
|
158
|
-
|
|
158
|
+
Run this at the start of Quick Add / Expand / Replan **only when** `docs/ROADMAP.md` contains at least one phase where every row is `Done`. A quick scan tells you within seconds. Skip the pass otherwise — running it on a roadmap with no fully-done phases is no-op bookkeeping that burns the user's turn.
|
|
159
159
|
|
|
160
|
-
|
|
160
|
+
When it runs:
|
|
161
161
|
|
|
162
|
-
1.
|
|
163
|
-
2.
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
4. **Scan the Backlog table.** Surface any row whose "Revisit" date has passed — mention it to the user as a replan candidate. Don't auto-promote it; that's a conversation.
|
|
169
|
-
5. **Scan `docs/roadmap/decisions/`.** Flag any decision whose status is `accepted` but whose reasoning is visibly contradicted by the work that's now Done. Don't silently edit decisions; raise them as open questions.
|
|
170
|
-
6. **Report what you did.** Before moving on to the mode's main work, tell the user in one short paragraph: "Archived Phase 1 (3 items). Active roadmap is now Phase 2 (2 items). Proceeding with [Quick Add / Expand / Replan]." Skip the report only if nothing changed.
|
|
162
|
+
1. Read `docs/ROADMAP.md`.
|
|
163
|
+
2. For each phase where every row is `Done`: cut the `## Phase N: …` heading and table, and move them into a new or existing `## Completed` block at the bottom as a `<details>` entry (see format below). Use the latest completion date found in the item spec frontmatter (`completed:`), or today's date if none is set. The item count is the row count.
|
|
164
|
+
3. Individual `Done` rows inside an otherwise-active phase stay put — mixed phases show recent wins alongside open work.
|
|
165
|
+
4. Scan the Backlog table; surface any row whose `Revisit` date has passed as a replan candidate (don't auto-promote — that's a conversation).
|
|
166
|
+
5. Scan `docs/roadmap/decisions/` for `accepted` decisions whose reasoning is visibly contradicted by newly-Done work; raise them as open questions rather than silently editing.
|
|
167
|
+
6. One-sentence report of what was archived, then proceed with the mode's main work. Skip the report if nothing changed.
|
|
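The trigger condition for the Archive Pass (at least one phase where every row is `Done`) can be checked with a quick scan along these lines. This is a minimal sketch under an assumed layout: the source does not show the ROADMAP.md table format, so the `## Phase N: …` heading followed by a Markdown table whose last column is the status is an assumption, and `fully_done_phases` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Optional

# Assumed layout (not confirmed by the source): "## Phase N: title" headings,
# each followed by a Markdown table whose LAST column holds the status.
PHASE_HEADING = re.compile(r"^## (Phase \d+:.*)$")


def fully_done_phases(roadmap_text: str) -> List[str]:
    """Return the headings of phases in which every table row is Done."""
    statuses: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in roadmap_text.splitlines():
        heading = PHASE_HEADING.match(line)
        if heading:
            current = heading.group(1).strip()
            statuses[current] = []
            continue
        if line.startswith("## "):
            current = None  # left the phases (e.g. Backlog, Decisions, Completed)
            continue
        if current is None or not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the |---|---| separator row.
        if cells[-1] == "Status" or set("".join(cells)) <= set("-: "):
            continue
        statuses[current].append(cells[-1])
    # A phase with no rows is not treated as fully done.
    return [name for name, rows in statuses.items() if rows and all(s == "Done" for s in rows)]


SAMPLE = """\
## Phase 1: Foundations
| ID | Item | Status |
|----|------|--------|
| 1.1 | Auth | Done |
| 1.2 | Billing | Done |

## Phase 2: Growth
| ID | Item | Status |
|----|------|--------|
| 2.1 | Invites | Done |
| 2.2 | Search | In Progress |

## Backlog
| Item | Revisit |
|------|---------|
| Dark mode | 2025-01-01 |
"""

if __name__ == "__main__":
    print(fully_done_phases(SAMPLE))
```

An empty result means the pass is skipped; mixed phases such as the second one in the sample stay in the active view.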
171
168
|
|
|
172
169
|
**Completed block format** (place at the bottom of ROADMAP.md, below Decisions):
|
|
173
170
|
|
|
@@ -327,40 +324,7 @@ For Quick Add with one new item, one solo pass is enough. For a full greenfield
|
|
|
327
324
|
|
|
328
325
|
Call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "read-only"`, `workingDirectory: <project root>`. The `prompt` parameter is built from the packaged plan + the inlined rubric + the appended Codex instructions. Codex has no filesystem access to this project, so everything it needs travels in the prompt.
|
|
329
326
|
|
|
330
|
-
**Step 1 — Package the post-solo plan.** Build the prompt
|
|
331
|
-
|
|
332
|
-
```
|
|
333
|
-
## Problem framing (from FRAME phase)
|
|
334
|
-
[problem statement, constraints, success criteria, anti-goals]
|
|
335
|
-
|
|
336
|
-
## Confirmed facts vs assumptions
|
|
337
|
-
Confirmed by user: [list each fact the user explicitly confirmed]
|
|
338
|
-
Assumptions (not yet confirmed): [list each assumption the agent made]
|
|
339
|
-
|
|
340
|
-
## Plan (post-solo-CHALLENGE)
|
|
341
|
-
Vision: [one sentence]
|
|
342
|
-
Phase 1 ([theme]): [items with one-line descriptions and dependencies]
|
|
343
|
-
Phase 2 ([theme]): ...
|
|
344
|
-
Architecture decisions: [each with what / why / alternatives considered]
|
|
345
|
-
Deferred to backlog: [items + reason]
|
|
346
|
-
|
|
347
|
-
## Findings from the solo rubric pass
|
|
348
|
-
[list each with: severity, axis, quote, why, fix, whether applied]
|
|
349
|
-
|
|
350
|
-
## Rubric
|
|
351
|
-
[INLINE the full text of references/challenge-rubric.md here verbatim — Codex needs the rubric definition in the prompt itself]
|
|
352
|
-
|
|
353
|
-
## Your job
|
|
354
|
-
You are applying an independent rubric pass to the PLANNING document above. This is a roadmap, not code — judge the shape of the plan, not implementation details. The user explicitly asked to be challenged because soft-pedaled plans waste their time.
|
|
355
|
-
|
|
356
|
-
You are running AFTER a solo pass by Claude. Catch what the solo pass missed; do not just agree with what it already caught. For each existing solo finding, reply either "confirmed" (with one-line agreement) or "I would frame this differently" (with a reason). Then add your own findings that the solo pass missed.
|
|
357
|
-
|
|
358
|
-
Use the finding format from the rubric above: Severity / Quote / Axis / Why / Fix. The Quote field is load-bearing — anchor each finding to a specific line from the plan.
|
|
359
|
-
|
|
360
|
-
Respect explicit user intent. If the user confirmed something in the "Confirmed facts" section, the rubric does not override it silently. Raise the conflict as a note and let the orchestrator surface it to the user.
|
|
361
|
-
|
|
362
|
-
End with a verdict: PASS / PASS WITH MINOR FIXES / FAIL — REVISION REQUIRED, plus a one-line explanation.
|
|
363
|
-
```
|
|
327
|
+
**Step 1 — Package the post-solo plan.** Build the prompt per `references/codex-critic-template.md` (section order, rubric inlining, Codex-specific instructions all live there verbatim — follow the template structure, fill in the plan/findings sections).
|
|
364
328
|
|
|
365
329
|
**Step 2 — Reconcile.** Merge the two finding lists:
|
|
366
330
|
- Same finding from both → keep the more specific wording, mark "confirmed by both"
|
|
@@ -498,22 +462,6 @@ After completing each item:
|
|
|
498
462
|
|
|
499
463
|
The auto-resolve prompt explicitly tells the build agent to read the spec file — this ensures done-criteria are adopted from the spec rather than generated from scratch, preserving the ideation context through to implementation.
|
|
500
464
|
|
|
501
|
-
## Quality Checklist
|
|
502
|
-
|
|
503
|
-
Before finalizing, verify:
|
|
504
|
-
- [ ] Every roadmap item has a linked spec file
|
|
505
|
-
- [ ] Every spec has testable requirements (not vague statements)
|
|
506
|
-
- [ ] Every spec has an Out of Scope section
|
|
507
|
-
- [ ] Every spec's Context section is 3 sentences or fewer
|
|
508
|
-
- [ ] ROADMAP.md is an index only — no inline specifications
|
|
509
|
-
- [ ] No spec requires reading VISION.md to be understood (self-contained)
|
|
510
|
-
- [ ] Dependencies between items are documented in both specs
|
|
511
|
-
- [ ] Architecture decisions include reasoning and alternatives considered
|
|
512
|
-
- [ ] CHALLENGE ran against `references/challenge-rubric.md` (solo, plus Codex critic on `--engine auto`); no item still fails any axis at CRITICAL or HIGH severity
|
|
513
|
-
- [ ] User saw the post-challenge plan as the first and only confirmation prompt — no pre-challenge draft was shown first
|
|
514
|
-
- [ ] Any rubric finding that conflicted with explicit user intent was surfaced as an open question, not silently applied
|
|
515
|
-
- [ ] Every requirement is traceable to a confirmed fact, a verified source, or an explicitly labeled assumption — no unmarked guesses slipped into the specs
|
|
516
|
-
|
|
517
465
|
## Language
|
|
518
466
|
|
|
519
467
|
Generate all documents in the language the user communicates in. If the user mixes languages, match their primary language for prose and keep technical terms in English.
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
# Codex Critic Prompt Template (Phase 3.5)
|
|
2
|
+
|
|
3
|
+
Used by `devlyn:ideate` when `--engine auto` or `--engine claude` (role reversal). Call `mcp__codex-cli__codex` with `model: "gpt-5.4"`, `reasoningEffort: "xhigh"`, `sandbox: "read-only"`, `workingDirectory: <project root>`. Codex has no filesystem access to this project — everything it needs travels in the prompt.
|
|
4
|
+
|
|
5
|
+
Assemble the prompt with these sections in this exact order, filling in placeholders:
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
## Problem framing (from FRAME phase)
|
|
9
|
+
[problem statement, constraints, success criteria, anti-goals]
|
|
10
|
+
|
|
11
|
+
## Confirmed facts vs assumptions
|
|
12
|
+
Confirmed by user: [list each fact the user explicitly confirmed]
|
|
13
|
+
Assumptions (not yet confirmed): [list each assumption the agent made]
|
|
14
|
+
|
|
15
|
+
## Plan (post-solo-CHALLENGE)
|
|
16
|
+
Vision: [one sentence]
|
|
17
|
+
Phase 1 ([theme]): [items with one-line descriptions and dependencies]
|
|
18
|
+
Phase 2 ([theme]): ...
|
|
19
|
+
Architecture decisions: [each with what / why / alternatives considered]
|
|
20
|
+
Deferred to backlog: [items + reason]
|
|
21
|
+
|
|
22
|
+
## Findings from the solo rubric pass
|
|
23
|
+
[list each with: severity, axis, quote, why, fix, whether applied]
|
|
24
|
+
|
|
25
|
+
## Rubric
|
|
26
|
+
[INLINE the full text of references/challenge-rubric.md here verbatim — Codex needs the rubric definition in the prompt itself]
|
|
27
|
+
|
|
28
|
+
## Your job
|
|
29
|
+
You are applying an independent rubric pass to the PLANNING document above. This is a roadmap, not code — judge the shape of the plan, not implementation details. The user explicitly asked to be challenged because soft-pedaled plans waste their time.
|
|
30
|
+
|
|
31
|
+
You are running AFTER a solo pass by Claude. Catch what the solo pass missed; do not just agree with what it already caught. For each existing solo finding, reply either "confirmed" (with one-line agreement) or "I would frame this differently" (with a reason). Then add your own findings that the solo pass missed.
|
|
32
|
+
|
|
33
|
+
Use the finding format from the rubric above: Severity / Quote / Axis / Why / Fix. The Quote field is load-bearing — anchor each finding to a specific line from the plan.
|
|
34
|
+
|
|
35
|
+
Respect explicit user intent. If the user confirmed something in the "Confirmed facts" section, the rubric does not override it silently. Raise the conflict as a note and let the orchestrator surface it to the user.
|
|
36
|
+
|
|
37
|
+
End with a verdict: PASS / PASS WITH MINOR FIXES / FAIL — REVISION REQUIRED, plus a one-line explanation.
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
## Why a separate file
|
|
41
|
+
|
|
42
|
+
Inlining the rubric and the boilerplate instructions into the orchestrator SKILL.md burned ~30 lines per load of the ideate skill. The critic packaging runs exactly once per session; the template only needs to be read at Phase 3.5 time. On-demand loading matches the progressive-disclosure pattern used across the devlyn harness.
|
|
@@ -1,18 +1,14 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: devlyn:preflight
|
|
3
3
|
description: >
|
|
4
|
-
Final alignment check between vision/roadmap documents and the actual codebase
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
"final check before shipping", or when the user says they've finished implementing and wants
|
|
13
|
-
verification. This is different from /devlyn:evaluate (which grades a single changeset) and
|
|
14
|
-
/devlyn:review (which reviews code quality) — preflight audits the ENTIRE project against its
|
|
15
|
-
planning documents holistically.
|
|
4
|
+
Final alignment check between vision/roadmap documents and the actual codebase before declaring
|
|
5
|
+
a roadmap phase complete. Reads commitments from VISION.md, ROADMAP.md, and item specs, then
|
|
6
|
+
audits the implementation with file:line evidence. Catches missing/incomplete features, spec
|
|
7
|
+
divergence, bugs, and doc drift; validates browser behavior for web projects. Use when
|
|
8
|
+
implementation is finished and you want a holistic roadmap-vs-code verification. Triggers on
|
|
9
|
+
"preflight", "gap analysis", "did I miss anything", "check against the roadmap", "verify
|
|
10
|
+
implementation", "are we done". Differs from /devlyn:evaluate (single changeset) and
|
|
11
|
+
/devlyn:review (code quality) — preflight audits the entire project against planning docs.
|
|
16
12
|
---
|
|
17
13
|
|
|
18
14
|
# Vision-to-Implementation Preflight Check
|
|
@@ -54,7 +50,7 @@ Example with engine: `/devlyn:preflight --engine auto`
|
|
|
54
50
|
|
|
55
51
|
**Engine pre-flight** (runs unless `--engine claude` was explicitly passed):
|
|
56
52
|
- The default engine is `auto`. If the user did not pass `--engine`, the engine is `auto` — NOT `claude`.
|
|
57
|
-
- Call `mcp__codex-cli__ping` to verify Codex MCP availability.
|
|
53
|
+
- Call `mcp__codex-cli__ping` to verify Codex MCP availability. On failure, **silently fall back to `--engine claude`** and note `engine downgraded: codex-ping failed` in the final preflight report header. Do not abort. Matches the hands-off contract used by auto-resolve and ideate.
|
|
58
54
|
|
|
59
55
|
## PHASE 0: DISCOVER & SCOPE
|
|
60
56
|
|
|
@@ -104,11 +100,12 @@ Read all in-scope planning documents and build a **commitment registry** — eve
|
|
|
104
100
|
4. **Filter out** (excluded from audit entirely):
|
|
105
101
|
- Items in `backlog/` or `deferred.md`
|
|
106
102
|
- Items with `status: cut` in ROADMAP.md
|
|
107
|
-
- Out of Scope entries — these are anti-commitments (things promised NOT to build)
|
|
108
103
|
|
|
109
|
-
5. **
|
|
104
|
+
5. **Anti-commitments ARE audited** (Out of Scope entries in each spec). These are "must NOT build" claims — if the codebase has shipped something the spec explicitly excluded, that is a WORKAROUND / scope-creep finding, not a success. The code-auditor checks each anti-commitment: "is this excluded behavior present in the code?" If yes → emit a finding with `rule_id: "scope.anti-commitment-violation"` (severity HIGH).
|
|
110
105
|
|
|
111
|
-
6. **
|
|
106
|
+
6. **Separate planned items**: Items with `status: planned` in their spec frontmatter or "Planned" in ROADMAP.md are not expected to be implemented yet. Include them in a `[PLANNED]` section of the registry for visibility, but do **not** audit them as missing. Flagging planned items as MISSING creates noise and buries the real gaps in work that was supposed to be done.
|
|
107
|
+
|
|
108
|
+
7. **Write to `.devlyn/commitment-registry.md`**:
|
|
112
109
|
|
|
113
110
|
```markdown
|
|
114
111
|
# Commitment Registry
|
|
@@ -124,15 +121,15 @@ Total commitments: [N]
|
|
|
124
121
|
- [INTEGRATION] Auth middleware applied to all /api/* routes
|
|
125
122
|
- [TEST] Auth flow covered by E2E tests
|
|
126
123
|
|
|
127
|
-
## Anti-Commitments (Out of Scope)
|
|
128
|
-
- [item 1.1]
|
|
129
|
-
- [item 1.2]
|
|
124
|
+
## Anti-Commitments (Out of Scope — audited as "must NOT exist in code")
|
|
125
|
+
- [item 1.1] Must NOT include social login
|
|
126
|
+
- [item 1.2] Must NOT include real-time inventory sync
|
|
130
127
|
|
|
131
|
-
## Not Started (Planned —
|
|
128
|
+
## Not Started (Planned: not audited for presence, but anti-commitments inside them still apply)
|
|
132
129
|
### 2.1 [item title] (spec status: planned)
|
|
133
130
|
- [FEATURE] WebSocket connection on page load
|
|
134
131
|
- [FEATURE] Real-time task list updates
|
|
135
|
-
[
|
|
132
|
+
[Planned items are tracked for visibility; code-auditor does not flag as MISSING.]
|
|
136
133
|
```
|
|
137
134
|
|
|
138
135
|
## PHASE 2: AUDIT
|
|
@@ -168,36 +165,23 @@ Tests user-facing features in the browser against commitment registry. Writes to
|
|
|
168
165
|
|
|
169
166
|
## PHASE 3: SYNTHESIZE & REPORT
|
|
170
167
|
|
|
171
|
-
|
|
168
|
+
Auditors already emit each finding with its category (`MISSING`/`INCOMPLETE`/`DIVERGENT`/`BROKEN`/`SCOPE_VIOLATION`/`UNDOCUMENTED`/`STALE_DOC`) and severity (`CRITICAL`/`HIGH`/`MEDIUM`/`LOW`). Synthesis passes them through unchanged: do NOT re-classify findings or re-assign severity. Doing so would replace the auditors' domain judgment with orchestrator mechanics.
|
|
172
169
|
|
|
173
170
|
1. **Read all audit files** in parallel:
|
|
174
171
|
- `.devlyn/audit-code.md`
|
|
175
172
|
- `.devlyn/audit-docs.md` (if exists)
|
|
176
173
|
- `.devlyn/audit-browser.md` (if exists)
|
|
177
174
|
|
|
178
|
-
2. **Deduplicate**:
|
|
179
|
-
|
|
180
|
-
3. **Filter accepted divergences**: If `.devlyn/preflight-accepted.md` exists, remove any findings that match accepted entries.
|
|
181
|
-
|
|
182
|
-
4. **Classify each finding** using these categories:
|
|
183
|
-
|
|
184
|
-
| Category | Description | Typical source |
|
|
185
|
-
|----------|-------------|----------------|
|
|
186
|
-
| `MISSING` | In roadmap but not implemented | code-auditor |
|
|
187
|
-
| `INCOMPLETE` | Implementation started but unfinished | code-auditor |
|
|
188
|
-
| `DIVERGENT` | Implemented differently than spec says | code-auditor |
|
|
189
|
-
| `BROKEN` | Implemented but has a bug | code-auditor, browser-auditor |
|
|
190
|
-
| `UNDOCUMENTED` | Implemented but not in docs | docs-auditor |
|
|
191
|
-
| `STALE_DOC` | Docs don't match current code | docs-auditor |
|
|
175
|
+
2. **Deduplicate**: if multiple auditors flagged the same issue (same category + file:line), merge them into one finding at the highest severity any reporting auditor assigned. Trust the auditors' severity; do not override it.
|
|
192
176
|
|
|
193
|
-
|
|
177
|
+
3. **Filter accepted divergences**: if `.devlyn/preflight-accepted.md` exists, remove findings whose (category, commitment) matches an accepted entry.
|
|
194
178
|
|
|
195
|
-
|
|
179
|
+
4. **Compare with previous run** (if `.devlyn/PREFLIGHT-REPORT.md` existed):
|
|
196
180
|
- `RESOLVED`: finding from previous run no longer present
|
|
197
181
|
- `PERSISTS`: finding still present
|
|
198
182
|
- `NEW`: finding not in previous run
|
|
199
183
|
|
|
200
|
-
|
|
184
|
+
5. **Generate `.devlyn/PREFLIGHT-REPORT.md`**:
|
|
201
185
|
|
|
202
186
|
```markdown
|
|
203
187
|
# Preflight Report
|
|
@@ -212,6 +196,7 @@ Previous run: [timestamp / none]
|
|
|
212
196
|
| INCOMPLETE | [N] |
|
|
213
197
|
| DIVERGENT | [N] |
|
|
214
198
|
| BROKEN | [N] |
|
|
199
|
+
| SCOPE_VIOLATION | [N] |
|
|
215
200
|
| UNDOCUMENTED | [N] |
|
|
216
201
|
| STALE_DOC | [N] |
|
|
217
202
|
| **Total findings** | **[N]** |
|
|
@@ -266,7 +251,7 @@ These items are acknowledged future work per the roadmap. They will be audited w
|
|
|
266
251
|
- [list any, or "None"]
|
|
267
252
|
```
|
|
268
253
|
|
|
269
|
-
|
|
254
|
+
6. **Present the report** to the user with a summary.
|
|
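The deduplication and previous-run comparison steps of Phase 3 can be sketched as below. This is a rough illustration, not the skill's implementation: the dictionary keys `category`, `location`, and `severity` are assumed field names (the findings schema is not shown in this diff), and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

FindingKey = Tuple[str, str]  # (category, file:line)


def deduplicate(findings: Iterable[dict]) -> List[dict]:
    """Merge findings with the same category and file:line.

    The surviving finding is the one with the highest auditor-assigned
    severity; severities are never recomputed here.
    """
    merged: Dict[FindingKey, dict] = {}
    for finding in findings:
        key = (finding["category"], finding["location"])
        kept = merged.get(key)
        if kept is None or SEVERITY_RANK[finding["severity"]] > SEVERITY_RANK[kept["severity"]]:
            merged[key] = finding
    return list(merged.values())


def compare_runs(previous: Set[FindingKey], current: Set[FindingKey]) -> Dict[str, Set[FindingKey]]:
    """Label finding keys relative to the previous preflight run."""
    return {
        "RESOLVED": previous - current,  # in the previous run, gone now
        "PERSISTS": previous & current,  # present in both runs
        "NEW": current - previous,       # first seen in this run
    }
```

Keying on category plus file:line mirrors the merge rule in step 2; matching accepted divergences by (category, commitment) would need a different key and is left out of this sketch.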
270
255
|
|
|
271
256
|
## PHASE 4: TRIAGE & PROMOTE
|
|
272
257
|
|
|
@@ -8,16 +8,7 @@ You are auditing a codebase against its planning commitments. Your job is to ver
|
|
|
8
8
|
|
|
9
9
|
Read `.devlyn/commitment-registry.md` for the full list of commitments to verify. Skip any items in the "Not Started (Planned)" section — those are acknowledged future work, not gaps.
|
|
10
10
|
|
|
11
|
-
**Step 0 — Build health check**: Before auditing individual commitments, verify the project actually builds.
|
|
12
|
-
- `package.json` with `next` → `npx tsc --noEmit && npx next build`
|
|
13
|
-
- `package.json` with `vite` + `tsconfig.json` → `npx tsc --noEmit`
|
|
14
|
-
- `Cargo.toml` → `cargo check --all-targets`
|
|
15
|
-
- `go.mod` → `go build ./... && go vet ./...`
|
|
16
|
-
- `foundry.toml` → `forge build`
|
|
17
|
-
- `hardhat.config.*` → `npx hardhat compile`
|
|
18
|
-
- Monorepo (`pnpm-workspace.yaml`/`turbo.json`) → workspace-wide build
|
|
19
|
-
- `Dockerfile*` → `docker build` (if Docker available)
|
|
20
|
-
- For other project types, look for a `build` script in `package.json` or equivalent
|
|
11
|
+
**Step 0 — Build health check**: Before auditing individual commitments, verify the project actually builds. Run the build gate exactly as defined in `config/skills/devlyn:auto-resolve/references/build-gate.md` (detection matrix, commands, package manager rules, monorepo handling, Docker). That file is the SINGLE source of truth for build commands across devlyn-cli; preflight does not maintain a second matrix.
|
|
21
12
|
|
|
22
13
|
Any build/typecheck failure is a BROKEN finding at CRITICAL severity — code that doesn't compile cannot fulfill any commitment. Include the full compiler error output with file:line references. This catches type errors, missing imports, cross-package drift, and Dockerfile build failures that text-based code reading alone cannot detect.
|
|
23
14
|
|
|
@@ -33,6 +24,11 @@ Any build/typecheck failure is a BROKEN finding at CRITICAL severity — code th
|
|
|
33
24
|
| INCOMPLETE | Implementation started but doesn't fully satisfy | What's there + what's missing, both with file:line |
|
|
34
25
|
| DIVERGENT | Implementation does something different than specified | Spec requirement vs actual behavior, with file:line |
|
|
35
26
|
| BROKEN | Implementation exists but has a bug preventing it from working | The bug with file:line |
|
|
27
|
+
| SCOPE_VIOLATION | Code ships behavior an anti-commitment (`Out of Scope`) explicitly excluded | file:line showing the prohibited behavior |
|
|
28
|
+
|
|
29
|
+
**Anti-commitment audit** (new in v3.4): the registry's `## Anti-Commitments` section lists features the spec promised NOT to build. Check each one against the code:
|
|
30
|
+
- If the excluded behavior is present, emit a finding with `rule_id: "scope.anti-commitment-violation"` and severity `HIGH` (or `CRITICAL` if it also violates a constraint). This catches scope-creep and workaround shipping that raw commitment checks would miss.
|
|
31
|
+
- If the excluded behavior is absent, no finding — anti-commitments are satisfied by absence.
|
|
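The anti-commitment check above amounts to "emit a finding only when excluded behavior is present". A minimal sketch, assuming a hypothetical `find_evidence` callable that returns a `file:line` string when the behavior is found in the code and `None` otherwise; apart from `rule_id` and `severity`, the finding keys are illustrative and not taken from the findings schema.

```python
from typing import Callable, Iterable, List, Optional


def audit_anti_commitments(
    anti_commitments: Iterable[str],
    find_evidence: Callable[[str], Optional[str]],
    violates_constraint: Callable[[str], bool] = lambda commitment: False,
) -> List[dict]:
    """Return one SCOPE_VIOLATION finding per anti-commitment found in code."""
    findings = []
    for commitment in anti_commitments:
        evidence = find_evidence(commitment)  # hypothetical code search
        if evidence is None:
            continue  # satisfied by absence: no finding
        findings.append(
            {
                "rule_id": "scope.anti-commitment-violation",
                "category": "SCOPE_VIOLATION",
                # HIGH by default, CRITICAL if a constraint is also violated.
                "severity": "CRITICAL" if violates_constraint(commitment) else "HIGH",
                "commitment": commitment,
                "evidence": evidence,
            }
        )
    return findings
```

How `find_evidence` actually searches the codebase is the auditor's judgment call; the sketch only fixes the shape of the result.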
36
32
|
|
|
37
33
|
**Beyond the commitment checklist**, also investigate:
|
|
38
34
|
- Cross-feature integration gaps: features that should connect but don't
|
|
@@ -0,0 +1,104 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Safely count and kill orphaned child processes (PPID=1) left behind by Claude Code MCP plugins, Superset terminal tabs, and codex wrappers. Use this whenever the user says "too many processes", "can't open terminals", "pty/process limit", "hundreds of bun/codex/workerd piling up", "clean up orphans", "reap processes", or reports new terminals failing to spawn on macOS. Also use proactively after long Claude sessions to prevent hitting kern.maxprocperuid or kern.tty.ptmx_max limits. ONLY touches a conservative whitelist of known leaks — never guesses on unknown processes.
|
|
3
|
+
allowed-tools: Read, Bash(ps:*), Bash(lsof:*), Bash(pgrep:*), Bash(awk:*), Bash(id:*), Bash(sysctl:*), Bash(bash:*)
|
|
4
|
+
argument-hint: [scan | kill | kill --force | kill --include workerd | kill --only telegram-bun]
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
<role>
|
|
8
|
+
You are a process-hygiene janitor for macOS. Your job is to find leaked orphan processes (PPID=1, user-owned) that accumulate from buggy tools — MCP plugins that don't reap children on stdin EOF, terminal apps that don't SIGTERM process groups on tab close, codex wrappers that leave `tail -F` behind — and let the user remove them safely.
|
|
9
|
+
|
|
10
|
+
Your operating principle: **the user's trust costs more than one missed cleanup.** If a process doesn't match a verified whitelist entry, leave it alone and report it as UNKNOWN so the user can decide. Never guess.
|
|
11
|
+
</role>
|
|
12
|
+
|
|
13
|
+
<user_input>
|
|
14
|
+
$ARGUMENTS
|
|
15
|
+
</user_input>
|
|
16
|
+
|
|
17
|
+
<process>
|
|
18
|
+
|
|
19
|
+
## Phase 1: Parse intent
|
|
20
|
+
|
|
21
|
+
Look at `$ARGUMENTS` and classify:
|
|
22
|
+
|
|
23
|
+
| Input | Mode |
|
|
24
|
+
|---|---|
|
|
25
|
+
| empty, `scan`, `status`, `count`, `list`, or anything non-imperative | **SCAN only** (default) |
|
|
26
|
+
| starts with `kill`, `reap`, `clean`, `prune`, `죽여`, `정리` | **KILL** mode |
|
|
27
|
+
|
|
28
|
+
In KILL mode, also parse:
|
|
29
|
+
- `--force` → SIGKILL instead of SIGTERM
|
|
30
|
+
- `--include workerd` → extend the default whitelist with the workerd-dev category
|
|
31
|
+
- `--only <category>` → restrict to a single category
|
|
32
|
+
- `--dry-run` → list kills but don't send signals
|
|
33
|
+
|
|
34
|
+
If the user's intent is ambiguous (e.g., they say "지워줘" (Korean for "delete it") but don't specify force or include), **default to SCAN first**, show the result, and then ask whether to proceed with kill. Never escalate to `--force` without an explicit request.
|
|
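The classification table and flag list in Phase 1 can be sketched as a small parser. This is only an illustration of the rules as written: `Intent` and `parse_intent` are hypothetical names, and the sketch does not model the "ambiguous intent" case, which still calls for judgment.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Leading words that select KILL mode, copied from the table above.
KILL_VERBS = ("kill", "reap", "clean", "prune", "죽여", "정리")


@dataclass
class Intent:
    mode: str = "SCAN"  # SCAN is the default for empty or non-imperative input
    force: bool = False
    dry_run: bool = False
    include: List[str] = field(default_factory=list)
    only: Optional[str] = None


def parse_intent(arguments: str) -> Intent:
    """Classify the raw argument string into SCAN or KILL plus flags."""
    tokens = arguments.split()
    if not tokens or not tokens[0].startswith(KILL_VERBS):
        return Intent()
    intent = Intent(mode="KILL")
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token == "--force":
            intent.force = True  # only set when explicitly requested
        elif token == "--dry-run":
            intent.dry_run = True
        elif token in ("--include", "--only") and i + 1 < len(tokens):
            i += 1  # consume the category that follows the flag
            if token == "--include":
                intent.include.append(tokens[i])
            else:
                intent.only = tokens[i]
        i += 1
    return intent
```

Even when this returns KILL mode, Phase 2 still runs the scan first, so the parser only decides what happens after the scan output has been shown.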
35
|
+
|
|
36
|
+
## Phase 2: SCAN
|
|
37
|
+
|
|
38
|
+
Always run scan first — even in KILL mode — so the user sees what is about to happen.
|
|
39
|
+
|
|
40
|
+
Run the bundled scanner. The skill is installed at `~/.claude/skills/devlyn:reap/`:
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
bash ~/.claude/skills/devlyn:reap/scripts/scan.sh
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
Report the output verbatim to the user. Then add your own 2-line summary:
|
|
47
|
+
|
|
48
|
+
- total orphan count across whitelist categories
|
|
49
|
+
- any UNKNOWN_ORPHANS that the user might want to investigate manually
|
|
50
|
+
|
|
51
|
+
Also surface the macOS limits for context, only once per session:
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
sysctl kern.maxprocperuid kern.tty.ptmx_max 2>/dev/null
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
## Phase 3: KILL (only when requested)
|
|
58
|
+
|
|
59
|
+
Run the reap script with the parsed flags:
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
bash ~/.claude/skills/devlyn:reap/scripts/reap.sh [flags]
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Show the output verbatim. The script re-verifies `PPID==1 && user==current` for every PID right before signaling — a process that was legitimately adopted since the scan will be skipped, not killed.
|
|
66
|
+
|
|
67
|
+
After the kill, re-run the scan to confirm that the counts dropped. If any whitelisted PIDs are still present 2 seconds after SIGTERM, mention that `--force` (SIGKILL) is available.
|
|
68
|
+
|
|
69
|
+
## Phase 4: Recommend (only if there are signs of a chronic leak)
|
|
70
|
+
|
|
71
|
+
If the `telegram-bun` count exceeds 10 or the oldest whitelisted orphan is more than 24h old, tell the user this is a recurring leak and suggest one of:
|
|
72
|
+
|
|
73
|
+
1. **Patch the telegram plugin** — add `process.stdin.on('end', () => process.exit(0))` to `server.ts` so the child dies when Claude Code exits.
|
|
74
|
+
2. **Schedule this skill** — run `/devlyn:reap kill` periodically (e.g., via the `/loop` skill or a launchd agent).
|
|
75
|
+
3. **Update Superset** — newer versions may SIGTERM process groups on tab close.
|
|
76
|
+
|
|
77
|
+
Do NOT apply these automatically. Recommend and let the user choose.
|
|
78
|
+
|
|
79
|
+
</process>
|
|
80
|
+
|
|
81
|
+
<safety>
|
|
82
|
+
|
|
83
|
+
## Never-touch rules
|
|
84
|
+
|
|
85
|
+
- **NEVER kill** a process whose command does not match a whitelist category in `scan.sh`. Unknown = informational only.
|
|
86
|
+
- **NEVER kill** anything where `ps -o ppid=` returns something other than `1` at signal time.
|
|
87
|
+
- **NEVER kill** processes owned by another user (the scripts check `id -un`).
|
|
88
|
+
- **NEVER use** `killall`, `pkill -9`, or wildcard `kill $(pgrep ...)` in this skill. Always iterate PIDs individually with per-PID re-verification.
|
|
89
|
+
- **NEVER suggest** `sudo` escalation — this is a user-scope cleanup tool.
|
|
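The per-PID re-verification required by these rules can be sketched as follows. This is a rough Python analogue of what the rules describe, not the contents of `reap.sh`: the function names are hypothetical, and the `ps -o ppid=,user= -p <pid>` call is an assumed way to read both fields at once.

```python
import os
import signal
import subprocess
from typing import Callable, Iterable, List


def still_reapable(ps_output: str, me: str) -> bool:
    """Check the stdout of `ps -o ppid=,user= -p <pid>` at signal time."""
    fields = ps_output.split()
    if len(fields) != 2:
        return False  # process already gone or unexpected output: do not signal
    ppid, user = fields
    return ppid == "1" and user == me


def read_ps(pid: int) -> str:
    """Ask ps for the parent PID and owner of one process."""
    result = subprocess.run(
        ["ps", "-o", "ppid=,user=", "-p", str(pid)],
        capture_output=True,
        text=True,
    )
    return result.stdout


def send_sigterm(pid: int) -> None:
    os.kill(pid, signal.SIGTERM)


def reap_pids(
    pids: Iterable[int],
    me: str,
    read_ps: Callable[[int], str] = read_ps,
    send: Callable[[int], None] = send_sigterm,
) -> List[int]:
    """Signal PIDs one at a time, re-verifying each right before the signal."""
    signaled = []
    for pid in pids:
        if not still_reapable(read_ps(pid), me):
            continue  # adopted, exited, or owned by someone else: skip
        send(pid)
        signaled.append(pid)
    return signaled
```

Passing `read_ps` and `send` as parameters keeps the check testable without touching real processes, and iterating one PID at a time avoids the bulk `killall` or `pkill` patterns the rules forbid.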
90
|
+
|
|
91
|
+
## Whitelist definitions
|
|
92
|
+
|
|
93
|
+
These are the ONLY categories reap.sh will touch:
|
|
94
|
+
|
|
95
|
+
| Category | Match | Why safe |
|
|
96
|
+
|---|---|---|
|
|
97
|
+
| `telegram-bun` | `bun server.ts` **AND** cwd contains `/plugins/cache/claude-plugins-official/telegram/` | Telegram MCP plugin leaks one per Claude session. Verified by cwd, not just cmdline. |
|
|
98
|
+
| `superset-codex-bash` | `/bin/bash .*/.superset/bin/codex` with PPID=1 | `.superset/bin/codex` wrapper exits without killing its tail child; bash copies left behind. |
|
|
99
|
+
| `superset-codex-tail` | `tail -F .*superset-codex-session-*.jsonl` with PPID=1 | Log tail from the same wrapper, always safe to stop. |
|
|
100
|
+
| `workerd` (opt-in) | `@cloudflare/workerd-darwin-*/bin/workerd serve ` with PPID=1 | moonmaker-engine dev server that survives tab close. Opt-in because the user may have an active dev session. |
|
|
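The whitelist table can be read as a small classifier over a process's command line and working directory. In the sketch below, the `telegram-bun` and `superset-codex-bash` patterns follow the `grep -E` expressions in `reap.sh`; the `superset-codex-tail` and `workerd` patterns are my own translation of the table's "Match" column into regular expressions and should be treated as assumptions, as should the `classify` name itself.

```python
import re
from typing import Optional

TELEGRAM_CWD = "/plugins/cache/claude-plugins-official/telegram/"
BUN_SERVER = re.compile(r"/bun[^ ]* server\.ts( |$)")

PATTERNS = [
    ("superset-codex-bash", re.compile(r"/bin/bash .*/\.superset/bin/codex( |$)")),
    # Assumed regex forms of the glob-style entries in the table:
    ("superset-codex-tail", re.compile(r"tail -F .*superset-codex-session-.*\.jsonl")),
    ("workerd", re.compile(r"@cloudflare/workerd-darwin-[^/]*/bin/workerd serve ")),
]


def classify(command: str, cwd: str = "") -> Optional[str]:
    """Return the whitelist category for an orphan, or None for UNKNOWN."""
    if BUN_SERVER.search(command):
        # The command line alone is not enough: the cwd must confirm the plugin.
        return "telegram-bun" if TELEGRAM_CWD in cwd else None
    for name, pattern in PATTERNS:
        if pattern.search(command):
            return name
    return None  # unknown processes are reported, never killed
```

The caller is still expected to filter to PPID=1 and the current user first, and to treat `workerd` as opt-in; the classifier only answers which category, if any, a command belongs to.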
101
|
+
|
|
102
|
+
If the user asks to add a new category, **edit scan.sh and reap.sh together** — both must know the same pattern so scan never promises a cleanup that reap won't deliver.
|
|
103
|
+
|
|
104
|
+
</safety>
|
|
@@ -0,0 +1,129 @@
|
|
|
1
|
+
#!/usr/bin/env bash
|
|
2
|
+
# devlyn:reap — kill orphan processes from safe whitelist categories.
|
|
3
|
+
# Verifies PPID==1 and user-ownership AGAIN at kill time to avoid racing a
|
|
4
|
+
# legitimately-reparented process. Unknown orphans are never killed.
|
|
5
|
+
#
|
|
6
|
+
# Usage:
|
|
7
|
+
# reap.sh # default categories, SIGTERM
|
|
8
|
+
# reap.sh --force # SIGKILL instead of SIGTERM
|
|
9
|
+
#   reap.sh --include workerd   # add the opt-in workerd category to the default set
|
|
10
|
+
# reap.sh --only telegram-bun # restrict to a single category
|
|
11
|
+
# reap.sh --dry-run # print what WOULD be killed, kill nothing
|
|
12
|
+
|
|
13
|
+
set -u
|
|
14
|
+
LC_ALL=C
|
|
15
|
+
export LC_ALL
|
|
16
|
+
|
|
17
|
+
ME="$(id -un)"
|
|
18
|
+
SIGNAL="TERM"
|
|
19
|
+
DRY=0
|
|
20
|
+
INCLUDE=""
|
|
21
|
+
ONLY=""
|
|
22
|
+
|
|
23
|
+
while [ $# -gt 0 ]; do
|
|
24
|
+
case "$1" in
|
|
25
|
+
--force) SIGNAL="KILL" ;;
|
|
26
|
+
--dry-run) DRY=1 ;;
|
|
27
|
+
    --include) shift; INCLUDE="${INCLUDE},${1:?--include requires a category name}" ;;
|
|
28
|
+
    --only) shift; ONLY="${1:?--only requires a category name}" ;;
|
|
29
|
+
-h|--help)
|
|
30
|
+
      sed -n '2,11p' "$0"; exit 0 ;;
|
|
31
|
+
*)
|
|
32
|
+
printf 'unknown flag: %s\n' "$1" >&2; exit 2 ;;
|
|
33
|
+
esac
|
|
34
|
+
shift
|
|
35
|
+
done
|
|
36
|
+
|
|
37
|
+
DEFAULT_CATEGORIES="telegram-bun,superset-codex-bash,superset-codex-tail"
|
|
38
|
+
if [ -n "$ONLY" ]; then
|
|
39
|
+
CATEGORIES="$ONLY"
|
|
40
|
+
else
|
|
41
|
+
CATEGORIES="${DEFAULT_CATEGORIES}${INCLUDE}"
|
|
42
|
+
fi
|
|
43
|
+
|
|
44
|
+
SNAPSHOT="$(ps -eo pid=,ppid=,user=,etime=,command= 2>/dev/null | awk -v me="$ME" '$2==1 && $3==me')"
|
|
45
|
+
|
|
46
|
+
collect_pids() {
|
|
47
|
+
local category="$1"
|
|
48
|
+
case "$category" in
|
|
49
|
+
telegram-bun)
|
|
50
|
+
# cwd-verified — same logic as scan.sh
|
|
51
|
+
printf '%s\n' "$SNAPSHOT" \
|
|
52
|
+
| grep -E '/bun[^ ]* server\.ts( |$)' \
|
|
53
|
+
| awk '{print $1}' \
|
|
54
|
+
| while read -r pid; do
|
|
55
|
+
cwd="$(lsof -a -d cwd -p "$pid" 2>/dev/null | awk 'NR==2 {for(i=9;i<=NF;i++) printf "%s ", $i; print ""}')"
|
|
56
|
+
case "$cwd" in
|
|
57
|
+
*"/plugins/cache/claude-plugins-official/telegram/"*) printf '%s\n' "$pid" ;;
|
|
58
|
+
esac
|
|
59
|
+
done
|
|
60
|
+
;;
|
|
61
|
+
superset-codex-bash)
|
|
62
|
+
printf '%s\n' "$SNAPSHOT" | grep -E '/bin/bash .*/\.superset/bin/codex( |$)' | awk '{print $1}' ;;
|
|
63
|
+
superset-codex-tail)
|
|
64
|
+
printf '%s\n' "$SNAPSHOT" | grep -E 'tail .*superset-codex-session-.*\.jsonl' | awk '{print $1}' ;;
|
|
65
|
+
workerd)
|
|
66
|
+
printf '%s\n' "$SNAPSHOT" | grep -E '@cloudflare/workerd-darwin-[^/]+/bin/workerd serve ' | awk '{print $1}' ;;
|
|
67
|
+
*)
|
|
68
|
+
printf 'unknown category: %s\n' "$category" >&2
|
|
69
|
+
return 1 ;;
|
|
70
|
+
esac
|
|
71
|
+
}
|
|
72
|
+
|
|
73
|
+
TOTAL_KILLED=0
|
|
74
|
+
TOTAL_SKIPPED=0
|
|
75
|
+
|
|
76
|
+
# Split the comma-separated category list without letting IFS leak into the
|
|
77
|
+
# inner loop that iterates newline-separated PIDs.
|
|
78
|
+
CATS_ARR=()
|
|
79
|
+
OLD_IFS="$IFS"
|
|
80
|
+
IFS=,
|
|
81
|
+
for c in $CATEGORIES; do
|
|
82
|
+
[ -n "$c" ] && CATS_ARR+=("$c")
|
|
83
|
+
done
|
|
84
|
+
IFS="$OLD_IFS"
|
|
85
|
+
|
|
86
|
+
for cat in "${CATS_ARR[@]}"; do
|
|
87
|
+
pids="$(collect_pids "$cat")" || continue
|
|
88
|
+
if [ -z "$pids" ]; then
|
|
89
|
+
printf '[%s] nothing to kill\n' "$cat"
|
|
90
|
+
continue
|
|
91
|
+
fi
|
|
92
|
+
while IFS= read -r pid; do
|
|
93
|
+
[ -z "$pid" ] && continue
|
|
94
|
+
# Re-verify right before killing. Any of these mean "don't touch":
|
|
95
|
+
# - process already gone
|
|
96
|
+
# - PPID is no longer 1 (got adopted by a real parent — not our target)
|
|
97
|
+
# - owner changed (extremely unlikely but cheap to check)
|
|
98
|
+
live_info="$(ps -o ppid=,user= -p "$pid" 2>/dev/null)"
|
|
99
|
+
if [ -z "$live_info" ]; then
|
|
100
|
+
printf '[%s] %s skipped (already exited)\n' "$cat" "$pid"
|
|
101
|
+
TOTAL_SKIPPED=$((TOTAL_SKIPPED+1))
|
|
102
|
+
continue
|
|
103
|
+
fi
|
|
104
|
+
live_ppid="$(printf '%s' "$live_info" | awk '{print $1}')"
|
|
105
|
+
live_user="$(printf '%s' "$live_info" | awk '{print $2}')"
|
|
106
|
+
if [ "$live_ppid" != "1" ] || [ "$live_user" != "$ME" ]; then
|
|
107
|
+
printf '[%s] %s skipped (ppid=%s user=%s — no longer orphan)\n' "$cat" "$pid" "$live_ppid" "$live_user"
|
|
108
|
+
TOTAL_SKIPPED=$((TOTAL_SKIPPED+1))
|
|
109
|
+
continue
|
|
110
|
+
fi
|
|
111
|
+
if [ "$DRY" -eq 1 ]; then
|
|
112
|
+
printf '[%s] %s would SIG%s\n' "$cat" "$pid" "$SIGNAL"
|
|
113
|
+
else
|
|
114
|
+
if kill -s "$SIGNAL" "$pid" 2>/dev/null; then
|
|
115
|
+
printf '[%s] %s SIG%s sent\n' "$cat" "$pid" "$SIGNAL"
|
|
116
|
+
TOTAL_KILLED=$((TOTAL_KILLED+1))
|
|
117
|
+
else
|
|
118
|
+
printf '[%s] %s kill failed\n' "$cat" "$pid"
|
|
119
|
+
TOTAL_SKIPPED=$((TOTAL_SKIPPED+1))
|
|
120
|
+
fi
|
|
121
|
+
fi
|
|
122
|
+
done <<< "$pids"
|
|
123
|
+
done
|
|
124
|
+
|
|
125
|
+
if [ "$DRY" -eq 1 ]; then
|
|
126
|
+
printf '\ndry-run complete.\n'
|
|
127
|
+
else
|
|
128
|
+
printf '\ndone. killed=%s skipped=%s\n' "$TOTAL_KILLED" "$TOTAL_SKIPPED"
|
|
129
|
+
fi
|
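The re-verification that the kill loop performs before each signal can be isolated as a pure function, which makes the rule easy to test without touching real processes. This is a minimal sketch under one assumption: the helper name `is_safe_target` is hypothetical. reap.sh itself inlines this logic and splits the fields with awk.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; reap.sh inlines the same checks.
#
# $1 = output of `ps -o ppid=,user= -p PID` (empty if the process is gone)
# $2 = the user who must own the process
# Succeeds only when the process still exists, is still parented by PID 1,
# and is still owned by the expected user.
is_safe_target() {
  local live_info="$1" me="$2" ppid user
  [ -n "$live_info" ] || return 1        # process already exited
  read -r ppid user <<< "$live_info"     # default IFS trims the leading padding
  [ "$ppid" = "1" ] && [ "$user" = "$me" ]
}
```

A caller would capture `ps -o ppid=,user= -p "$pid"` immediately before `kill` and skip the PID whenever the function fails, which keeps the check and the signal as close together as the shell allows.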