npm - devlyn-cli - Versions diffs - 1.15.0 → 2.1.0 - Mend

devlyn-cli 1.15.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (158) hide show

package/config/skills/devlyn:resolve/references/phases/plan.md ADDED Viewed

@@ -0,0 +1,42 @@
+# PHASE 1 — PLAN (canonical body)
+The per-engine adapter header from `_shared/adapters/<model>.md` is prepended at runtime. This file is engine-agnostic.
+<role>
+You translate a spec or generated criteria into a concrete plan: the file list to touch, the risks the implementation must navigate, and a verbatim restatement of what acceptance requires. The plan is the contract IMPLEMENT executes against.
+</role>
+<input>
+- Source: `pipeline.state.json:source.spec_path` (real spec) or `state.source.criteria_path` (`.devlyn/criteria.generated.md`).
+- Codebase at `state.base_ref.sha`.
+- For free-form mode: also `state.complexity` (trivial / medium / large) — informs depth.
+</input>
+<output>
+Write `.devlyn/plan.md` with three sections:
+1. **Files to touch** — explicit list. Each entry: path, change type (`new` / `edit` / `delete`), one-line rationale tied to a specific Requirement.
+2. **Risks** — out-of-scope expansions to refuse, ambiguous spec sections to interpret strictly, known failure modes for this language/framework.
+3. **Acceptance restatement** — verbatim copy of the spec's `## Verification` block (or generated criteria's equivalent). The plan is wrong if any verification command later fails because of a planning oversight.
+Also update `pipeline.state.json:phases.plan.{verdict, completed_at, duration_ms}`. Verdict: `PASS` if plan is shippable; `BLOCKED` if spec is internally contradictory or cannot be planned without violating constraints.
+</output>
+<quality_bar>
+- Scope first, then implementation. Decide what files to touch before deciding how to implement. Files not in the list are off-limits to IMPLEMENT.
+- Tooling artifacts and reporter output are not deliverables unless the spec lists them. Plan to configure tools to emit to gitignored paths.
+- Existing tests are contract. Plan to extend them; do not plan to remove or weaken them.
+- Spec frontmatter is read-only to PLAN and IMPLEMENT. The DOCS-style status flip happens in CLEANUP under a tight allowlist.
+- If a Requirement says "match the literal output X", restate the literal in the plan. Paraphrasing the contract here propagates into IMPLEMENT.
+</quality_bar>
+<runtime_principles>
+Read `_shared/runtime-principles.md` (Subtractive-first / Goal-locked / No-workaround / Evidence). Codex-routed phases receive the contract excerpt inlined:
+- Subtractive-first: prefer trimming an existing helper to introducing a new one. Pure-addition needs a cited prior failure mode or an explicit spec/user requirement.
+- Goal-locked: refuse "while I'm here" cleanups, speculative robustness, mid-flight re-scoping. Single test before any deviation: "did the user ask for this OR does the stated goal strictly require it?"
+- No-workaround: no `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scripts that bypass root cause.
+- Evidence: every claim cites file:line opened at planning time. Vague claims excluded.
+</runtime_principles>
+The task is: [orchestrator pastes the task description and spec context here]

package/config/skills/devlyn:resolve/references/phases/verify.md ADDED Viewed

@@ -0,0 +1,69 @@
+# PHASE 5 — VERIFY (canonical body, fresh subagent context)
+Per-engine adapter header is prepended at runtime. **You are spawned with empty conversation context.** No carry-over from PLAN / IMPLEMENT / BUILD_GATE / CLEANUP. This is the structural guarantee of independence — the prompt body reinforces it but the spawn is what makes it real.
+<role>
+Independent quality layer. You answer one question: did the diff deliver what the spec said it would, with no scope creep, no quality regression, and no constraint violation? You produce findings only — you have no code-mutation tools.
+</role>
+<input>
+- `spec.md` (or `.devlyn/criteria.generated.md` for free-form mode) — the contract.
+- `spec.expected.json` — the mechanical acceptance contract per `_shared/expected.schema.json`.
+- The cumulative diff against `state.base_ref.sha`.
+- The spec hash (`state.source.spec_sha256`) — re-read the spec from disk and confirm the hash matches; if it does not, write `state.phases.verify.verdict: "BLOCKED"` with reason `spec_sha256_mismatch` and stop.
+You do NOT receive: PLAN, IMPLEMENT's reasoning, BUILD_GATE's findings, CLEANUP's allowlist negotiations. Reading those would compromise independence.
+</input>
+<sub_phases>
+### MECHANICAL (deterministic)
+Re-run the mechanical checks fresh, independent of BUILD_GATE's earlier run:
+1. `python3 .claude/skills/_shared/spec-verify-check.py` against the post-CLEANUP code.
+2. Re-scan `spec.expected.json.forbidden_patterns` against the diff (Python re.search; honor each pattern's `files` allowlist).
+3. Confirm `required_files` exist post-diff; confirm `forbidden_files` do not appear in the diff.
+4. Confirm `max_deps_added` is not exceeded (`git diff -- package.json` for Node; equivalent for other ecosystems).
+Emit findings to `.devlyn/verify-mechanical.findings.jsonl`. Each match = one finding. Severity from the pattern's `severity` field (disqualifier → CRITICAL, warning → MEDIUM).
+### JUDGE (fresh-context grading)
+Grade the diff against the spec on rubric axes:
+- **Spec compliance** — did every Requirement get an `evidence` record pointing at code that satisfies it?
+- **Scope** — does the diff touch only files PLAN listed (or the cleanup allowlist)? Out-of-scope file = HIGH finding `scope.out-of-scope-violation`.
+- **Quality** — does the implementation follow the framework's idiomatic patterns, or are there hand-rolled helpers replacing standard primitives? `design.unidiomatic-pattern` MEDIUM if so.
+- **Consistency** — internal style (naming, error shape, module boundaries) consistent with the surrounding code.
+For each finding, write file:line evidence. Do not paraphrase code; quote it.
+**Coverage check**: before declaring done, confirm you have evidence for every spec axis. If you could not exercise an axis (the spec asks for behavior X but the diff does not touch the code that produces X), set `state.verify.coverage_failed: true` and surface the missing-evidence finding rather than passing on assumption.
+**Anti-self-filter rule**: report every finding you observe, including ones you consider low-severity or low-confidence. Tag each with `confidence: high|medium|low` and let the harness's downstream filter rank them. Filtering at this stage suppresses recall.
+### Pair-mode (when triggered by orchestrator)
+When the orchestrator spawns a second VERIFY agent with the OTHER engine's adapter, both judgments are merged:
+- Any HIGH/CRITICAL finding either model surfaces is verdict-binding.
+- Lower-severity disagreements are logged but do not change the verdict.
+- The orchestrator handles merge; you only emit your own findings.
+</sub_phases>
+<output>
+- `.devlyn/verify-mechanical.findings.jsonl` — MECHANICAL findings.
+- `.devlyn/verify.findings.jsonl` — JUDGE findings.
+- `state.phases.verify.{verdict, completed_at, duration_ms, sub_verdicts: {mechanical, judge}, artifacts}`. Verdict: WORSE of the two sub-verdicts. `PASS` requires zero CRITICAL/HIGH findings AND coverage met.
+</output>
+<quality_bar>
+- Independence is structural (fresh context) and behavioral (no code mutation). Both must hold.
+- Quote, do not paraphrase. Findings without quoted file:line evidence are excluded.
+- Coverage > confidence. Missing-evidence findings outrank a confident "looks fine."
+</quality_bar>
+<runtime_principles>
+Read `_shared/runtime-principles.md`. VERIFY's discipline is "the spec is the contract, the diff is the evidence, the verdict is the comparison."
+</runtime_principles>

package/config/skills/devlyn:resolve/references/state-schema.md ADDED Viewed

@@ -0,0 +1,106 @@
+# pipeline.state.json schema
+Single authoritative verdict source for `/devlyn:resolve`. The orchestrator branches on `state.phases.<name>.verdict` directly — never parses `.devlyn/*.findings.jsonl` for routing. Living document; bump `version` on a breaking change.
+## Top-level shape
+```json
+{
+  "version": "2.0",
+  "run_id": "rs-<UTC-timestamp>-<12-hex>",
+  "started_at": "2026-04-30T12:00:00Z",
+  "engine": "claude",
+  "mode": "spec",
+  "complexity": null,
+  "base_ref": { "branch": "main", "sha": "abc123..." },
+  "rounds": { "max_rounds": 4, "global": 0 },
+  "bypasses": [],
+  "implement_passed_sha": null,
+  "source": {
+    "type": "spec",
+    "spec_path": "docs/roadmap/phase-1/X.md",
+    "spec_sha256": "...",
+    "criteria_path": null,
+    "criteria_sha256": null
+  },
+  "criteria": [
+    { "id": "C1", "ref": "spec://requirements/0", "status": "pending", "evidence": [], "failed_by_finding_ids": [] }
+  ],
+  "phases": {
+    "plan": null,
+    "implement": null,
+    "build_gate": null,
+    "cleanup": null,
+    "verify": null,
+    "final_report": null
+  },
+  "verify": { "coverage_failed": false }
+}
+```
+## Field rules
+- **version** — string. Bump major on a breaking schema change.
+- **mode** — `"free-form" | "spec" | "verify-only"`.
+- **complexity** — `null | "trivial" | "medium" | "large"`. Free-form mode populates this; spec/verify-only mode leaves it null.
+- **engine** — `"claude" | "codex" | "auto"` initially; rewritten by engine-preflight if a downgrade fired.
+- **rounds.global** — incremented every fix-loop pass (BUILD_GATE → fix-loop OR VERIFY → fix-loop).
+- **bypasses** — array of phase names from `--bypass`. Valid: `"build-gate" | "cleanup"`. PLAN, IMPLEMENT, VERIFY are non-bypassable (orchestrator rejects at parse time).
+- **implement_passed_sha** — captured at end of PHASE 2; null until then. Activates the post-implement invariant for CLEANUP and VERIFY.
+- **criteria** — generated from spec's `## Requirements` checklist (one per `- [ ]`). `status: pending → implemented` is the legal transition. `failed_by_finding_ids` populates when VERIFY surfaces a finding tied to a criterion.
+- **verify.coverage_failed** — set by VERIFY's JUDGE sub-phase when a spec axis could not be exercised against the diff. Triggers pair-mode escalation when set.
+## Per-phase shape
+Each entry under `phases.<name>` (for `plan`, `implement`, `build_gate`, `cleanup`, `verify`, `final_report`):
+```json
+{
+  "started_at": "2026-04-30T12:00:01Z",
+  "completed_at": "2026-04-30T12:00:30Z",
+  "duration_ms": 29000,
+  "round": 0,
+  "triggered_by": null,
+  "verdict": "PASS",
+  "engine": "claude",
+  "model": "claude-opus-4-7",
+  "pre_sha": null,
+  "artifacts": { "findings_file": null, "log_file": null },
+  "sub_verdicts": null
+}
+```
+- `verdict` — `"PASS" | "PASS_WITH_ISSUES" | "FAIL" | "NEEDS_WORK" | "BLOCKED"`. PHASE 6 (FINAL_REPORT) writes its own verdict per the terminal-verdict precedence.
+- `triggered_by` — null on first run; one of `"build_gate" | "verify"` when the phase is a fix-loop respawn.
+- `pre_sha` — captured by orchestrator before CLEANUP and (if needed) other allowlist-enforced phases. Used to validate the post-spawn diff.
+- `sub_verdicts` — only populated for VERIFY: `{ "mechanical": "PASS|FAIL", "judge": "PASS|...", "pair_judge": "PASS|..." | null }`.
+## Write protocol
+1. **Before each phase spawn**: orchestrator writes `phases.<name>.{started_at, round, triggered_by}` and (when applicable) `pre_sha`.
+2. **After each agent returns**: orchestrator validates `verdict`, `completed_at`, `duration_ms`, `artifacts` are populated. Missing fields → orchestrator fills from observable state. Branching on a null verdict is undefined behavior.
+3. **Before archive** (PHASE 6 step 3): `phases.final_report.verdict` must be non-null. Archive prune skips runs whose final_report verdict is null (treated as in-flight).
+## Terminal verdict (PHASE 6)
+Precedence:
+1. `phases.<any>.verdict == "BLOCKED"` → terminal `BLOCKED:<reason>`.
+2. `phases.verify.verdict == "NEEDS_WORK"` after fix-loop exhaustion → terminal `NEEDS_WORK`.
+3. `phases.verify.verdict == "PASS_WITH_ISSUES"` → terminal `PASS_WITH_ISSUES`.
+4. `phases.verify.verdict == "PASS"` → terminal `PASS`.
+5. Verify-only mode: terminal = `phases.verify.verdict` directly (PHASE 1-4 are skipped).
+## Final-report shape
+Header: `run_id | engine | mode | complexity | verdict | wall_time_s`.
+Per-phase summary table: `phase | verdict | duration_ms | round | triggered_by | findings_count`.
+Findings table (post-IMPLEMENT phases only — they are findings-only): each finding's `severity | rule_id | file:line | message | confidence`.
+Follow-up notes: any `--continue-on-large` assumptions, any silent fallbacks (engine downgrade), any `state.verify.coverage_failed` axes.
+## Archive contract
+PHASE 6 step 4 moves `.devlyn/*` (excluding `.devlyn/runs/`) into `.devlyn/runs/<run_id>/`. The `.devlyn/runs/` directory keeps the last 10 completed runs (sorted by `started_at`). Best-effort prune; archive failure does not change the run's verdict.

package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md RENAMED Viewed

@@ -1,4 +1,5 @@
 ---
+name: devlyn:design-system
 description: Extract all design values from selected style for exact reproduction
 argument-hint: <style-number> [platform] (e.g., "3", "3 flutter", "style 2 web")
 allowed-tools: Bash(ls:*), Bash(cat:*), Bash(grep:*), View, Edit, Write

package/{config/skills → optional-skills}/devlyn:reap/SKILL.md RENAMED Viewed

@@ -1,4 +1,5 @@
 ---
+name: devlyn:reap
 description: Safely count and kill orphaned child processes (PPID=1) left behind by Claude Code MCP plugins, Superset terminal tabs, and codex wrappers. Use this whenever the user says "too many processes", "can't open terminals", "pty/process limit", "hundreds of bun/codex/workerd piling up", "clean up orphans", "reap processes", or reports new terminals failing to spawn on macOS. Also use proactively after long Claude sessions to prevent hitting kern.maxprocperuid or kern.tty.ptmx_max limits. ONLY touches a conservative whitelist of known leaks — never guesses on unknown processes.
 allowed-tools: Read, Bash(ps:*), Bash(lsof:*), Bash(pgrep:*), Bash(awk:*), Bash(id:*), Bash(sysctl:*), Bash(bash:*)
 argument-hint: [scan | kill | kill --force | kill --include workerd | kill --only telegram-bun]

package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md RENAMED Viewed

@@ -1,3 +1,8 @@
+---
+name: devlyn:team-design-ui
+description: Assemble a world-class design team to generate 5 radically distinct, portfolio-worthy UI style explorations. Like /devlyn:design-ui but powered by a full team of design specialists.
+---
 Assemble a world-class design team to generate 5 radically distinct, portfolio-worthy UI style explorations. Like `/devlyn:design-ui` but powered by a full team of design specialists — Creative Director, Product Designer, Visual Designer, Interaction Designer, and Accessibility Designer — who collaborate to produce 5 stunning HTML design samples that go far beyond what a single designer could achieve.
 This is design exploration only. After the user picks a style:

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "devlyn-cli",
-  "version": "1.15.0",
+  "version": "2.1.0",
   "description": "AI development toolkit for Claude Code — ideate, auto-resolve, and ship with context engineering and agent orchestration",
   "homepage": "https://github.com/fysoul17/devlyn-cli#readme",
   "bin": {
@@ -19,7 +19,17 @@
     "!config/skills/roadmap-archival-workspace/**",
     "agents-config",
     "optional-skills",
-    "CLAUDE.md"
+    "benchmark/auto-resolve/BENCHMARK-DESIGN.md",
+    "benchmark/auto-resolve/README.md",
+    "benchmark/auto-resolve/RUBRIC.md",
+    "benchmark/auto-resolve/fixtures/SCHEMA.md",
+    "benchmark/auto-resolve/fixtures/F*/**",
+    "benchmark/auto-resolve/fixtures/test-repo/**",
+    "!benchmark/auto-resolve/fixtures/test-repo/node_modules/**",
+    "benchmark/auto-resolve/scripts/**",
+    "scripts/lint-skills.sh",
+    "CLAUDE.md",
+    "AGENTS.md"
   ],
   "keywords": [
     "claude",