npm - devlyn-cli - Versions diffs - 1.14.0 → 2.0.0 - Mend

devlyn-cli 1.14.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (148) hide show

package/AGENTS.md ADDED Viewed

@@ -0,0 +1,104 @@
+# Codex Project Instructions
+devlyn-cli installs `/devlyn:ideate` (optional) and `/devlyn:resolve` (required) into Claude Code, plus the contract below. Codex CLI reads this file when invoked inside a project that has it. The principles are non-negotiable on every change.
+## Core principles
+Seven rules govern every change. Cite them by name when a decision touches one.
+1. **No workaround** — fix the root cause, never the symptom. No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded fallback that hides a broken contract.
+2. **No overengineering** — smallest change that closes the goal. New abstractions require an observed failure mode they prevent. Subtractive-first: ask "what can I delete instead?" before writing anything new.
+3. **No guesswork** — verify with the actual files, logs, diffs, and run output before forming conclusions. State the falsifiable prediction BEFORE the experiment; record raw results AFTER.
+4. **Worldclass** — code that survives review at a non-trivial codebase. Zero CRITICAL, zero HIGH security/design findings on the shippable path.
+5. **Best practice** — idiomatic for the language and framework. Use standard primitives; do not hand-roll what the library already provides.
+6. **Optimized** — efficient on the resource that matters (wall-time, tokens, attention, cognitive load on the next reader). "Slower but more thoughtful" is not free.
+7. **Production ready** — error states are explicit and visible; behavior under failure is what the user expects, not silent corruption.
+Three discipline rules govern HOW the principles are applied:
+- **Root cause via flexible why-chain.** Keep asking "why?" until you find the violated invariant. **If the answer surfaces in 2 questions, stop.** If it takes 5 or 7, keep going. Strict counts are wrong; until-found is right.
+- **First-principles thinking.** Challenge the requirement before optimizing the answer. Most "we have to do X" assumptions are habit, not necessity. Reduce to irreducible truths and rebuild from there.
+- **Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.** — Saint-Exupéry. The operating definition of "done." A change is finished when no further line, branch, flag, or doc paragraph can be removed without breaking a learned failure mode.
+## Quick Start
+```text
+ideate (optional)  ->  resolve  ->  ship
+```
+- `/devlyn:ideate` (optional) — unstructured idea → `docs/specs/<id>/spec.md` + `spec.expected.json`. Modes: default Q&A, `--quick` (autonomous-pipeline-safe), `--from-spec <path>`, `--project` (multi-feature).
+- `/devlyn:resolve` — hands-free pipeline for any coding task. Free-form goal, `--spec <path>`, or `--verify-only <ref> --spec <path>`. Phases run inline: PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY (fresh-subagent, findings-only).
+- Three creative power-user skills (`/devlyn:reap`, `/devlyn:design-system`, `/devlyn:team-design-ui`) live in `optional-skills/` and install only when the user opts in.
+Each skill's `SKILL.md` is the source of truth for flags and workflow. Do not duplicate.
+## Subtractive-First Editing
+Before writing any change, answer in this order:
+1. What can I delete that makes the addition unnecessary?
+2. What can I delete that makes the addition smaller?
+3. What is the minimum addition still required?
+Hard rules:
+- A pure-addition diff needs a citation: an explicit user/spec requirement OR an observed failure mode.
+- Refactor-only changes should reduce line count unless a cited failure requires the new shape.
+- Do not add flags, branches, or options for hypothetical users.
+- Do not add defensive wrappers when an upstream contract can be corrected instead.
+- Doc growth has the same cost as code growth. Delete the now-stale sentence before adding new prose.
+- A change is not done until you have attempted one more deletion and confirmed it would break something.
+## Goal-Locked Execution
+Default mode is execution toward the user's stated goal. Do not drift.
+Refuse these patterns:
+- Unrequested work ("while here, also fix...").
+- Tangential cleanup in files the task does not require.
+- Speculative robustness for cases not observed in production, tests, findings, or the spec.
+- Mid-flight re-scoping without user approval.
+- Curiosity exploration that is not on the critical path.
+Drift test: **did the user ask for this, OR does the stated goal strictly require it?** If both no, surface it as a follow-up note and continue on the original path.
+In interactive sessions, ask a concise clarification when scope expansion is real. In hands-free pipelines, stay on scope and log the assumption in the final artifact.
+## Error Handling
+No silent fallbacks.
+- Show clear errors, retry paths, or actionable guidance.
+- Fallbacks allowed only when widely accepted and harmless (CSS fallback fonts, CDN failover, image placeholders).
+- Silent `catch` blocks are bugs.
+- Logging is not user-visible error handling.
+- The Codex CLI availability downgrade is the one documented exception: emit `engine downgraded: codex-unavailable` and behave exactly like explicit Claude routing.
+## Evidence Over Claim
+Every finding cites concrete evidence:
+- Code: `file:line` you opened.
+- Missing implementation: state exactly what you searched and found absent.
+- Doc: cite the stale text + section/line.
+- Browser: route/URL + screenshot or observed evidence.
+- Benchmark: run id, fixture id, metric, raw result path.
+Exclude vague claims. They produce vague fixes.
+## Working In a devlyn-cli Project
+- Check `git status --short` before editing.
+- Never revert user changes unless explicitly asked.
+- Use `rg` / `rg --files` for search.
+- Keep changes scoped to the task; stop when the core request is answered or the change is verified.
+- For installer or skill edits inside the devlyn-cli source repo, run `bash scripts/lint-skills.sh`.
+- Treat the project's own `docs/VISION.md`, `docs/ROADMAP.md`, `docs/roadmap/**`, and any local `AGENTS.md` / `CLAUDE.md` overrides as authoritative when present.
+## Communication
+- Lead with objective evidence before opinion.
+- Be concise and specific.
+- State blockers plainly.
+- Separate completed work, verification, and remaining risks.

package/CLAUDE.md CHANGED Viewed

@@ -1,176 +1,169 @@
 # Project Instructions
-## Quick Start
-For most work, the recommended sequence is:
+devlyn-cli installs `/devlyn:ideate` (optional) and `/devlyn:resolve` (required) into Claude Code, plus the contract below. These principles are non-negotiable on every change — yours and any sub-agent's.
-1. `/devlyn:ideate` — turn an idea into roadmap-ready specs
-2. `/devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/X-name.md"` — hands-free build → evaluate → polish
-3. `/devlyn:preflight` — verify the implementation matches the roadmap before shipping
+## Core principles
-All three default to `--engine auto`, which routes each phase to the optimal model (Codex GPT-5.4 for hard coding, Claude Opus 4.7 for evaluation/critique). The cross-model GAN dynamic — different models build vs critique — catches what single-model pipelines miss.
+Seven rules govern every change. Cite them by name when a decision touches one.
-## General
+1. **No workaround** — fix the root cause, never the symptom. No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded fallback that hides a broken contract. Configuration-level skips that bypass the real issue are also rejects.
+2. **No overengineering** — smallest change that closes the goal. New abstractions require an observed failure mode they prevent. Subtractive-first: ask "what can I delete instead?" before writing anything new.
+3. **No guesswork** — verify with the actual files, logs, diffs, and run output before forming conclusions. State the falsifiable prediction BEFORE the experiment; record raw results AFTER. Retroactive prediction edits are dishonest.
+4. **Worldclass** — code that survives review at a non-trivial codebase. Zero CRITICAL, zero HIGH security/design findings on the shippable path.
+5. **Best practice** — idiomatic for the language and framework. Use standard primitives; do not hand-roll what the library already provides.
+6. **Optimized** — efficient on the resource that matters (wall-time, tokens, attention, cognitive load on the next reader). "Slower but more thoughtful" is not free. Each layer of composition or process must beat the simpler baseline.
+7. **Production ready** — error states are explicit and visible; behavior under failure is what the user expects, not silent corruption.
-- Proactively use subagents and skills where needed
-- Follow commit conventions in `.claude/commit-conventions.md`
-- Follow design system in `docs/design-system.md` for UI/UX work if exist
+Three discipline rules govern HOW the principles are applied:
-## Error Handling Philosophy
+- **Root cause via flexible why-chain.** Keep asking "why?" until you find the violated invariant. **If the answer surfaces in 2 questions, stop.** If it takes 5 or 7, keep going. Strict counts are wrong; until-found is right.
+- **First-principles thinking.** Challenge the requirement before optimizing the answer. Most "we have to do X" assumptions are habit, not necessity. Reduce the problem to its irreducible truths and rebuild from there.
+- **Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.** — Saint-Exupéry. This is the operating definition of "done." A change is finished when no further line, branch, flag, or doc paragraph can be removed without breaking a learned failure mode. Not before.
-**No silent fallbacks.** Handle errors explicitly and show the user what happened.
+The runtime sub-agent contract below (Subtractive-first / Goal-locked / No-workaround discipline / Evidence over claim) expands these principles into concrete operational tests. Sub-agents in `/devlyn:resolve` and `/devlyn:ideate` enforce them at every phase.
-- **Default behavior**: When something fails, display a clear error state in the UI (error message, retry option, or actionable guidance). Do NOT silently fall back to default/placeholder data.
-- **Fallbacks are the exception, not the rule.** Only use fallbacks when it is a widely accepted best practice (e.g., fallback fonts in CSS, CDN failover, graceful image loading with placeholder). If unsure, handle the error explicitly instead.
-- **Never hide failures.** The user should always know when something went wrong. A visible error with a retry button is better UX than silently showing stale/default data.
-- **Pattern**: `try { doThing() } catch (error) { showErrorUI(error) }` — NOT `try { doThing() } catch { return fallbackValue }`
+## Quick Start
-## Investigation Workflow
+Two skills cover the full cycle post iter-0034 Phase 4 cutover (2026-05-04). `/devlyn:ideate` is OPTIONAL; `/devlyn:resolve` is REQUIRED. **Both default to `--engine claude`** — pair / multi-engine routing is research-only at HEAD per the iter-0020 + iter-0033g + iter-0034 close-outs (see [`autoresearch/iterations/0020-pair-policy-narrow.md`](autoresearch/iterations/0020-pair-policy-narrow.md) + [`autoresearch/iterations/0034-phase-4-cutover.md`](autoresearch/iterations/0034-phase-4-cutover.md)). Pass `--engine auto` or `--engine codex` explicitly to opt into the research path; the harness silently downgrades to `claude` and emits a banner if the Codex CLI is missing.
-When investigating bugs, analyzing features, or exploring code:
+1. `/devlyn:ideate` (optional) — unstructured idea → `docs/specs/<id>/spec.md` + `spec.expected.json`. Modes: default Q&A, `--quick` (autonomous-pipeline-safe), `--from-spec <path>`, `--project`.
+2. `/devlyn:resolve` — hands-free pipeline for any coding task. Free-form goal, `--spec <path>`, or `--verify-only <diff> --spec <path>`. Phases: PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY (fresh subagent, findings-only).
-1. **Define exit criteria upfront** - Ask "What does 'done' look like?" before starting
-2. **Checkpoint progress** - Use the task tools (TaskCreate / TaskUpdate) every 5-10 minutes to save findings
-3. **Output intermediate summaries** - Provide "Current Understanding" snapshots so work isn't lost if interrupted
-4. **Always deliver findings** - Never end mid-analysis; at minimum output:
-   - Files examined
-   - Key findings
-   - Remaining unknowns
-   - Recommended next steps
+Each skill's `SKILL.md` is the source of truth for its flags and workflow — don't duplicate them here.
-For complex investigations, use `/devlyn:team-resolve` to assemble a multi-perspective investigation team, or spawn parallel Agent subagents to explore different areas simultaneously.
+### When to use which
-## UI/UX Workflow
+| Situation | Command |
+|-----------|---------|
+| New project / multi-feature plan / external spec to normalize | `/devlyn:ideate` (`--project`, default, or `--from-spec`) |
+| One-line goal you want turned into a spec autonomously | `/devlyn:ideate --quick` |
+| Spec already in hand | `/devlyn:resolve --spec <path>` |
+| Free-form fix / feature / refactor / debug / PR review | `/devlyn:resolve "<describe the work>"` |
+| Verify a diff or PR against a spec | `/devlyn:resolve --verify-only <ref> --spec <path>` |
-The full design-to-implementation pipeline:
+### Subtractive-first editing — perfection = nothing left to remove
+<!-- runtime-principles:section=subtractive-first:begin -->
-1. `/devlyn:design-ui` → Generate 5 style explorations
-2. `/devlyn:design-system [N]` → Extract tokens from chosen style
-3. `/devlyn:implement-ui` → Team implements or improves UI from design system
-4. `/devlyn:team-resolve [feature]` → Add features on top
+> "Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away." — Saint-Exupéry. **This is the operating definition of "done" in this repo.** A change is finished when no further line, branch, flag, or doc paragraph can be removed without breaking a learned failure mode. Not before.
-## Feature Development
+This rule overrides instinct. LLMs (including you) are trained on corpora that reward elaborate, defensive, "thorough" code — so the default impulse is to add. That impulse is wrong here. Read the rules below as hard tests, not aesthetic preferences. They are not optional, not negotiable, and not satisfiable by writing more careful additions.
-1. **Plan first** - Always output a concrete implementation plan with specific file changes before writing code
-2. **Track progress** - Use the task tools (TaskCreate / TaskUpdate) to checkpoint each phase
-3. **Test validation** - Write tests alongside implementation; iterate until green
-4. **Small commits** - Commit working increments rather than large changesets
+**Mandatory pre-edit question.** Before writing any change, you must answer in this order:
-For complex features, spawn the `Plan` subagent (`Agent` tool with `subagent_type: "Plan"`) to design the approach before implementation.
+1. **What can I delete that makes the addition unnecessary?** If the addition becomes redundant after the deletion, ship the deletion alone.
+2. **What can I delete that makes the addition smaller?** Trim the surrounding accretion before adding.
+3. **Only then**, what is the minimum addition required?
-## Automated Pipeline (Recommended Starting Point)
+If you skip question 1 or 2, you are violating this rule even if the resulting code looks clean.
-For hands-free build-evaluate-polish cycles — works for bugs, features, refactors, and chores:
+**Hard tests every edit must pass:**
-```
-/devlyn:auto-resolve [task description]
-```
+- **Net-negative is the default; pure-addition needs a citation.** A diff that adds N lines and removes 0 must point to a specific cause: a previously-observed failure mode (commit hash, fixture ID, finding ID, user-reported incident), OR an explicit user request / spec requirement that demands new user-visible behavior. The latter is a sufficient citation — do not block legitimate requested additions on the absence of a past failure. What is rejected: vague justifications like "it seems clearer," "for future flexibility," "just in case," "to be safe," "for completeness," "to handle edge cases" — these are the exact phrases that produce accretion.
+- **Delete the line that makes the bug impossible, not the line that catches it.** Defensive wrappers, validation layers, error normalizers, and `try/catch` shells are usually evidence that an upstream contract is unclear. Fix the contract upstream and remove the defenses downstream. The trap: adding the wrapper feels like progress because it makes a test pass. The wrapper is debt; the contract fix is the work. **Scope guard**: if the upstream contract fix is outside the user's stated scope, stop and surface the scope expansion to the user before editing — Goal-locked execution overrides this. The right scope-expansion outcome is "user authorizes the upstream fix" or "user accepts a scoped local fix and a follow-up for upstream"; never silently restructure something the user didn't ask you to.
+- **A new flag, branch, or option is admitting two failures**: (a) the default was wrong, (b) every reader pays attention cost forever. Default-fix-and-delete-flag beats add-flag-with-better-default. The bar for adding a configuration knob is "I have observed two real users with genuinely conflicting needs," not "this might be useful someday."
+- **Doc additions are subject to the same rule.** Before adding a section to any `.md` file (CLAUDE.md, SKILL.md, README, references/), find the now-stale sentence or section the new one supersedes — delete that first. A growing instructions file dilutes the instructions that actually need to be followed; readers (human and LLM) skim long files and miss load-bearing rules.
+- **A "cleaner" refactor that grows line count is not cleaner.** It is a sideways move that increases context, parsing, and review cost. **For refactor-only changes**, line count must drop unless a cited observed failure requires the new shape. **Never delete tests, contracts, public API, comments documenting non-obvious WHY, or user-facing behavior just to win the count** — that is gaming the metric, not honoring the principle. The metric serves complexity reduction; if a deletion would lose information not recoverable from code + commit history, it is the wrong deletion.
+- **Stop adding when no further deletion is possible.** This is the Saint-Exupéry test inverted into a stopping rule: if you have made an addition and you cannot identify anything else that can be removed, examine the addition itself — is part of it still removable? Iterate until the diff is irreducible.
-This runs the full pipeline automatically: **Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Challenge → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`, `.devlyn/CHALLENGE-FINDINGS.md`).
+**Anti-rationalization clause** — explicitly guarding against LLM-style hedging:
-The **Build Gate** (Phase 1.4) runs real compilers, typecheckers, and linters — the same commands CI/Docker/production will run. It auto-detects project types (Next.js, Rust, Go, Solidity, Expo, Swift, etc.) and Dockerfiles. This is the primary defense against "tests pass locally, breaks in CI/Docker" class of bugs (type errors in un-tested files, cross-package drift, Dockerfile copy mismatches).
+- "More explicit is safer" is **not** a justification. Explicitness has a cost in attention and rot. Required-explicit goes in; nice-to-explicit gets cut.
+- "Adding context for future readers" is **not** a justification. Future readers benefit more from shorter files than from explanatory prose. The code and the commit message together carry the why.
+- "Defense-in-depth" is **not** a justification at the harness layer. Two layers that catch the same bug are evidence one of them should be the only layer.
+- If you find yourself writing the phrase "in case" in a comment, code reviewer note, or doc, **stop and re-evaluate** — that phrase predicts an unjustified addition.
-The **Challenge** phase (Phase 4.5) is a fresh skeptical review with no checklist — a subagent reads the entire diff cold with zero context from prior phases and asks "would I ship this to production with my name on it?" This catches the subtle issues that structured checklist-driven reviews miss: wrong-but-working approaches, unstated assumptions, non-idiomatic patterns, and integration gaps.
+**Stopping rule.** A change is done when (a) all hypotheses it was meant to close are closed, AND (b) you have attempted at least one further deletion and confirmed it would break something. If you have not tried to delete more, you are not done. If nothing can be deleted to justify the current addition, the addition itself is too large — re-scope or surface the conflict to the user before proceeding.
-For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.
+**Never grow surface area silently.** Every accretion-shaped change must be visible: in the commit message, in the iteration file, or in a flagged review. Silent growth is the failure mode this rule exists to prevent.
+<!-- runtime-principles:section=subtractive-first:end -->
-Optional flags:
-- `--max-rounds 6` — increase max evaluate-fix iterations (default: 4)
-- `--skip-browser` — skip browser validation phase (auto-skipped for non-web changes)
-- `--skip-build-gate` — skip the deterministic build gate (not recommended)
-- `--build-gate strict` — treat warnings as errors; `--build-gate no-docker` — skip Docker builds for speed
-- `--skip-review` — skip team-review phase
-- `--skip-clean` — skip clean phase
-- `--skip-docs` — skip update-docs phase
-- `--engine auto|codex|claude` — intelligent model routing. `auto` (default) routes each phase and team role to the optimal model based on benchmark data: Codex GPT-5.4 handles BUILD and FIX (SWE-bench Pro lead), Claude Opus 4.7 handles EVALUATE and CHALLENGE (long-context retrieval + skeptical reasoning). Different models build vs critique — the cross-model GAN dynamic catches what single-model pipelines miss. `codex` forces Codex for implementation, Claude for orchestration and Chrome MCP. `claude` uses Claude for everything. Requires codex-mcp-server for `auto` and `codex` modes.
+### Goal-locked execution — stay on the North Star, do not wander
+<!-- runtime-principles:section=goal-locked:begin -->
-## Preflight Check (Post-Roadmap Verification)
+Even with a North Star defined, work drifts off-course ("산으로 간다" / "삼천포로 빠진다" — going up the wrong mountain instead of forward). The harness must **actively block** this drift at run time, not merely discourage it. The default is ruler-straight execution toward the user's stated goal; any deviation requires explicit justification, not the inverse.
-After completing a roadmap (or a phase), verify that everything was actually implemented correctly:
+This rule exists because LLMs (including you) are trained to be helpful, comprehensive, and thorough — and "helpful" easily becomes "did more than asked." Doing more than asked is not helpfulness; it is scope creep. Read the rules below as hard blocks, not soft preferences.
-```
-/devlyn:preflight
-```
+**The five drift patterns you must refuse to execute on:**
-This reads every commitment from VISION.md, ROADMAP.md, and item specs, then audits the codebase evidence-based. The code auditor now runs real build/typecheck commands as its first step — any project that doesn't compile is flagged as BROKEN at CRITICAL severity before individual commitments are even checked. Also checks in the browser for web projects.
+1. **Unrequested work.** "While I'm here, I noticed X is broken/ugly/inefficient" → **stop**. The user did not ask for X. If X is a real defect, surface it as a finding, a follow-up suggestion, or an entry in a TODO list — do NOT fix it inside the current change. Mixing unrequested work with requested work is what makes diffs unreviewable and PRs eternal.
+2. **Tangential cleanup.** "This file looks messy, let me also tidy..." → **stop**. The current task is the only task. Unrelated cleanup is a separate change requiring its own justification, scope, and pre-flight 0 check.
+3. **Speculative robustness.** "Just adding a check / fallback / handler for the case where..." → **stop**. If the case has not been observed (in production, in tests, in a finding), it does not belong in this change. Defensive code added for unobserved cases is the most common form of accretion debt — it never gets removed because nobody can prove the case never happens.
+4. **Re-scoping mid-flight.** "Actually, the better way to do this is to also restructure / rename / migrate..." → **stop**. If you discover the requested approach is wrong, surface that to the user with evidence and let them adjudicate. Do NOT silently expand scope. The user's explicit redirect is the only authorization to enlarge a task.
+5. **Curiosity detours.** "Let me also explore how Y works to understand this better..." → **stop**, unless Y is provably on the goal's critical path. Curiosity-driven exploration is creative-mode; default is execution-mode.
-Output: `.devlyn/PREFLIGHT-REPORT.md` with categorized findings (MISSING, INCOMPLETE, DIVERGENT, BROKEN, STALE_DOC). Confirmed gaps can be promoted to new roadmap items for auto-resolve.
+**The single drift test before any deviation from the stated goal:** *"Did the user ask for this, OR does the user's stated goal strictly require it?"* If the answer to both is no, do not do it. Surface it as a note (commit message, end-of-turn summary, finding) and continue on the original path.
-Optional flags:
-- `--phase N` — audit only phase N items
-- `--autofix` — auto-promote CRITICAL/HIGH findings and run auto-resolve
-- `--skip-browser` — skip browser validation
-- `--skip-docs` — skip documentation audit
-- `--engine auto|codex|claude` — `auto` (default) routes the code-auditor to Codex (SWE-bench Pro +11.7pp on code analysis); the docs-auditor and browser-auditor always use Claude regardless of `--engine` (writing-quality strength on docs drift; Chrome MCP tools are session-bound to Claude Code)
+**Creative-mode is the narrow exception, not the default.** Creative-mode applies only when (a) the user explicitly invoked an ideation/exploration surface (`/devlyn:ideate`, optional `/devlyn:design-system`, "let's brainstorm", "explore options for"), OR (b) the goal is genuinely under-specified and a clarifying question is impossible (extremely rare — usually you should ask). For everything else — bug fixes, feature work, refactors, doc updates, pipeline runs, code review, debugging — execution-mode is the default and drift is a defect, not a feature.
-**Recommended workflow**: `/devlyn:ideate` → `/devlyn:auto-resolve` (repeat) → `/devlyn:preflight` → fix gaps → `/devlyn:preflight` (verify)
+**Anti-rationalization clause** — explicitly guarding against LLM hedging:
-## Manual Pipeline (Step-by-Step Control)
+- "It's a small extra change" is **not** a justification. Small accretions compound; one of them is always small.
+- "It's related to what they asked for" is **not** a justification. Related ≠ requested. Requested is the only standard.
+- "It would be incomplete without this" is **not** a justification. The user defines completeness, not your sense of it.
+- "I'm being thorough" is **not** a justification. Thoroughness on the requested goal is required; thoroughness extending past the goal is drift.
-When you want to run each step yourself with review between phases:
+**When in doubt, ask — outside hands-free pipelines.** In interactive sessions a short clarification ("the requested fix touches the X code path; I notice Y also looks broken — should I fix it in this change or surface it as a follow-up?") is always cheaper than a wrong-scope diff. Asking is not a weakness; silently expanding scope is. **Inside hands-free pipelines** (`/devlyn:resolve`, scheduled remote agents, autonomous skill runs) the contract forbids mid-pipeline prompts — there asking is unsafe because there is no user to answer. The substitute is: stay strictly on the requested goal, do not expand scope, and log the question/assumption explicitly in the final report (or `.devlyn/runs/<run_id>/` artifacts) so the user can adjudicate after the run completes. Choosing scope creep over logging-and-staying-on-path is always wrong.
-1. `/devlyn:team-resolve [issue]` → Investigate + implement (writes `.devlyn/done-criteria.md`)
-2. `/devlyn:evaluate` → Grade against done-criteria (writes `.devlyn/EVAL-FINDINGS.md`)
-3. If findings exist: `/devlyn:team-resolve "Fix issues in .devlyn/EVAL-FINDINGS.md"` → Fix loop
-4. `/simplify` → Quick cleanup pass
-5. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
-6. `/devlyn:clean` → Codebase hygiene
-7. `/devlyn:update-docs` → Keep docs in sync
+**Stopping rule.** A task is done when the user's stated goal is closed AND no off-path work was added. If you find yourself hesitating because "I should also do Z" — Z is drift. Note it for follow-up, do not execute.
+<!-- runtime-principles:section=goal-locked:end -->
-Steps 5-7 are optional depending on scope.
+## Error Handling Philosophy
-## Vibe Coding Workflow
+**No silent fallbacks.** Handle errors explicitly and show the user what happened.
-The recommended sequence after writing code:
+- **Default**: when something fails, display a clear error state — message, retry option, or actionable guidance. Do NOT silently fall back to default/placeholder data.
+- **Fallbacks are the exception.** Only use them when it's a widely accepted best practice (CSS fallback fonts, CDN failover, image placeholders). Otherwise handle the error explicitly.
+- **Pattern**: `try { doThing() } catch (error) { showErrorUI(error) }` — NOT `try { doThing() } catch { return fallbackValue }`.
-1. **Write code** (vibe coding)
-2. `/simplify` → Quick cleanup pass (reuse, quality, efficiency)
-3. `/devlyn:review` → Thorough solo review with security-first checklist
-4. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
-5. `/devlyn:clean` → Periodic codebase-wide hygiene
-6. `/devlyn:update-docs` → Keep docs in sync
+### No-workaround discipline (runtime salience)
+<!-- runtime-principles:section=no-workaround:begin -->
-Steps 4-6 are optional depending on the scope of changes. `/simplify` should always run before `/devlyn:review` to catch low-hanging fruit cheaply.
+No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scripts that bypass the root cause. Fix root causes; handle errors with user-visible state per the rule above.
-## Documentation Workflow
+**Permitted exceptions** (explicitly carved out):
+- CSS fallback fonts, CDN failover, image placeholders — widely-accepted best practices.
+- Codex CLI availability downgrade — the one documented silent fallback in this repo. Fires when the resolved engine is `auto` or `codex` (either via skill default or explicit `--engine` flag) and the Codex CLI is absent. Banner `engine downgraded: codex-unavailable` always prints; verdict identical to `--engine claude`. Any other silent fallback in skills code is a bug — file it against the skill that introduced it.
+<!-- runtime-principles:section=no-workaround:end -->
-- **Sync docs with codebase**: Use `/devlyn:update-docs` to clean up stale content, update outdated info, and generate missing docs
-- **Focused doc update**: Use `/devlyn:update-docs [area]` for targeted updates (e.g., "API reference", "getting-started")
-- Preserves all forward-looking content: roadmaps, future plans, visions, open questions
-- If no docs exist, proposes a tailored docs structure and generates initial content
+### Evidence over claim
+<!-- runtime-principles:section=evidence:begin -->
-## Browser Testing Workflow
+Every finding cites concrete evidence. Vague claims are speculation; exclude them.
-- **Standalone**: Use `/devlyn:browser-validate` to test any web feature in the browser — starts the dev server, tests the feature end-to-end, fixes issues it finds
-- **In pipeline**: Auto-resolve includes browser validation automatically for web projects (between Build and Evaluate phases)
-- **Tiered**: Uses chrome MCP tools if available, falls back to Playwright, then curl
-- **Feature-first**: Tests the implemented feature (from done-criteria), not just "does the page load"
+- **Code findings**: `file:line` you have opened.
+- **Missing findings**: explicit "searched X and found no implementation" statement.
+- **Doc findings**: quote of the stale text + section/line reference.
+- **Browser findings**: screenshot reference + URL/route.
-## Debugging Workflow
+A finding without one of these forms is excluded. Vague findings produce vague fixes.
+<!-- runtime-principles:section=evidence:end -->
-- **Simple bugs**: Use `/devlyn:resolve` for systematic bug fixing with test-driven validation
-- **Complex bugs**: Use `/devlyn:team-resolve` for multi-perspective investigation with a full agent team
-- **Hands-free**: Use `/devlyn:auto-resolve` for fully automated resolve → evaluate → fix → polish pipeline
-- **Post-fix review**: Use `/devlyn:team-review` for thorough multi-reviewer validation
+## Codex invocation
-## Maintenance Workflow
+When `/devlyn:resolve` or `/devlyn:ideate` route a phase to Codex (`--engine codex` or `--engine auto`), the wrapper-form contract lives in `config/skills/_shared/codex-config.md` (or `.claude/skills/_shared/codex-config.md` once installed). Omit `-m <model>` — the CLI's current flagship is used automatically. MCP is not in the loop. If the Codex CLI is absent the harness silently downgrades to Claude and prints `engine downgraded: codex-unavailable` in the final report.
-- **Codebase cleanup**: Use `/devlyn:clean` to detect and remove dead code, unused dependencies, complexity hotspots, and tech debt
-- **Focused cleanup**: Use `/devlyn:clean [category]` for targeted sweeps (dead code, deps, tests, complexity, hygiene)
-- **Periodic maintenance sequence**: `/devlyn:clean` → `/simplify` → `/devlyn:update-docs` → `/devlyn:review`
+## Working Mode
-## Context Window Management
+- **Checkpoint with TaskCreate / TaskUpdate.** Long investigations or multi-phase work: create tasks at start, mark completed as each one closes — don't batch.
+- **Don't stop early on token-budget concerns.** Context auto-compacts; the model resumes after compaction. Run the work to a real stopping point.
+- **Persist across context windows via disk.** `/devlyn:resolve` writes `.devlyn/pipeline.state.json` plus per-phase log/findings under `.devlyn/runs/<run_id>/`; for ad-hoc long work use `HANDOFF.md` and resume with `@HANDOFF.md continue`.
+- **Parallelize independent tool calls; reserve `Agent` subagents for independent fan-out or isolated-context verification** — a single perspective is the default on the resolve hot path.
-Claude 4.5 / 4.6 / 4.7 models auto-compact the conversation as it approaches the context limit, so you can keep working indefinitely without manual handoffs in most cases. Don't stop early due to token-budget concerns — the model continues from where it left off after compaction.
+## Skill Boundary Policy
-For genuinely multi-context-window work (e.g., a roadmap with many phases), persist state to disk so the next instance can resume:
-- All `auto-resolve` and `preflight` runs already write durable state to `.devlyn/*.md` (done-criteria, BUILD-GATE, EVAL-FINDINGS, BROWSER-RESULTS, CHALLENGE-FINDINGS, PREFLIGHT-REPORT) and to git commits — pick up by reading those files plus `git log`.
-- For long investigations, write progress notes to a `HANDOFF.md` and resume with `@HANDOFF.md continue from where this left off` if you need a fresh window.
+Post iter-0034 Phase 4 cutover (2026-05-04) the runtime product surface is two skills — `/devlyn:resolve` and `/devlyn:ideate`. `/devlyn:resolve` runs PLAN → IMPLEMENT → BUILD_GATE → CLEANUP → VERIFY inline; verification, cleanup, and security review (delegated to the native `security-review` Claude Code skill from BUILD_GATE) all live inside the pipeline. There are no standalone `/devlyn:review`, `/devlyn:evaluate`, `/devlyn:team-resolve`, etc. surfaces to delegate to — those skills were folded into resolve's phases or removed in iter-0034. Three creative power-user skills (`/devlyn:reap`, `/devlyn:design-system`, `/devlyn:team-design-ui`) live in `optional-skills/` and are user-invoked only; resolve never delegates to them.
-Manually clearing with `/clear` is rarely necessary — only do it when context is genuinely irrelevant to the next task.
+Browser validation routes through `_shared/browser-runner.sh` (Chrome MCP → Playwright → curl tier) directly from BUILD_GATE — there is no separate `/devlyn:browser-validate` skill at HEAD.
 ## Communication Style
-- Lead with **objective data** (popularity, benchmarks, community adoption) before personal opinions
-- When user asks "what's popular" or "what do others use", provide data-driven answers
-- Keep recommendations actionable and specific
+Lead with **objective data** (popularity, benchmarks, community adoption) before opinions — especially when the user asks "what's popular" or "what do others use."
+## Commit Conventions
+Follow `.claude/commit-conventions.md`.
+## Design System
+When doing UI/UX work, follow `docs/design-system.md` if it exists.