npm - devlyn-cli - Versions diffs - 1.8.1 → 1.9.0 - Mend

devlyn-cli 1.8.1 → 1.9.0

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their public registries.

Files changed (6)

package/CLAUDE.md +6 -2
package/README.md +5 -3
package/config/skills/devlyn:auto-resolve/SKILL.md +62 -2
package/config/skills/devlyn:auto-resolve/references/build-gate.md +116 -0
package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +13 -0
package/package.json +1 -1

package/CLAUDE.md CHANGED

@@ -56,13 +56,17 @@ For hands-free build-evaluate-polish cycles — works for bugs, features, refact
 /devlyn:auto-resolve [task description]
 ```
-This runs the full pipeline automatically: **Build → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`).
+This runs the full pipeline automatically: **Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`).
+The **Build Gate** (Phase 1.4) runs real compilers, typecheckers, and linters — the same commands CI/Docker/production will run. It auto-detects project types (Next.js, Rust, Go, Solidity, Expo, Swift, etc.) and Dockerfiles. This is the primary defense against "tests pass locally, breaks in CI/Docker" class of bugs (type errors in un-tested files, cross-package drift, Dockerfile copy mismatches).
 For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.
 Optional flags:
 - `--max-rounds 6` — increase max evaluate-fix iterations (default: 4)
 - `--skip-browser` — skip browser validation phase (auto-skipped for non-web changes)
+- `--skip-build-gate` — skip the deterministic build gate (not recommended)
+- `--build-gate strict` — treat warnings as errors; `--build-gate no-docker` — skip Docker builds for speed
 - `--skip-review` — skip team-review phase
 - `--skip-clean` — skip clean phase
 - `--skip-docs` — skip update-docs phase
@@ -76,7 +80,7 @@ After completing a roadmap (or a phase), verify that everything was actually imp
 /devlyn:preflight
 ```
-This reads every commitment from VISION.md, ROADMAP.md, and item specs, then audits the codebase evidence-based. Finds: missing features, incomplete implementations, spec divergence, bugs, stale documentation. Also checks in the browser for web projects.
+This reads every commitment from VISION.md, ROADMAP.md, and item specs, then audits the codebase evidence-based. The code auditor now runs real build/typecheck commands as its first step — any project that doesn't compile is flagged as BROKEN at CRITICAL severity before individual commitments are even checked. Also checks in the browser for web projects.
 Output: `.devlyn/PREFLIGHT-REPORT.md` with categorized findings (MISSING, INCOMPLETE, DIVERGENT, BROKEN, STALE_DOC). Confirmed gaps can be promoted to new roadmap items for auto-resolve.

package/README.md CHANGED

@@ -65,18 +65,20 @@ Point it at a spec (or just describe what you want) and walk away.
 /devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-1/1.1-user-auth.md"
 ```
-It runs a **9-phase pipeline** autonomously:
+It runs a **10-phase pipeline** autonomously:
 ```
-Build → Browser Test → Evaluate → Fix Loop → Simplify → Review → Security → Clean → Docs
+Build → Build Gate → Browser Test → Evaluate → Fix Loop → Simplify → Review → Security → Clean → Docs
 ```
 - Each phase runs as a separate agent with fresh context
 - Git checkpoints at every phase for safe rollback
+- **Build Gate** runs your project's real compilers, typecheckers, and linters — catches type errors, cross-package drift, and Docker build failures that tests alone miss. Auto-detects project type (Next.js, Rust, Go, Solidity, Expo, Swift, and more) and Dockerfiles.
 - Browser validation tests your feature end-to-end (clicks, forms, verification)
 - Evaluation grades against done-criteria — if it fails, auto-fix and re-evaluate
-Skip phases you don't need: `--skip-browser`, `--skip-review`, `--skip-clean`, `--skip-docs`, `--max-rounds 6`
+Skip phases you don't need: `--skip-browser`, `--skip-review`, `--skip-clean`, `--skip-docs`, `--skip-build-gate`, `--max-rounds 6`
+Customize the build gate: `--build-gate strict` (warnings = errors), `--build-gate no-docker` (skip Docker builds for speed)
 ### Step 3 — Verify with `/devlyn:preflight`

package/config/skills/devlyn:auto-resolve/SKILL.md CHANGED

@@ -31,6 +31,8 @@ This pipeline runs hands-free. The user launches it to walk away and come back t
    - `--skip-clean` (false) — skip clean phase
    - `--skip-browser` (false) — skip browser validation phase (auto-skipped for non-web changes)
    - `--skip-docs` (false) — skip update-docs phase
+   - `--skip-build-gate` (false) — skip the deterministic build gate (Phase 1.4). Not recommended — the build gate is the primary defense against "tests pass locally, breaks in CI/Docker/production" class of bugs.
+   - `--build-gate MODE` (auto) — controls build gate behavior. `auto`: detect project type and run appropriate build/typecheck/lint commands; if Dockerfile(s) are present, Docker builds are included automatically. `strict`: auto + treat warnings as errors. `no-docker`: auto but skip Docker builds even if Dockerfiles exist (for faster iteration). `skip`: same as --skip-build-gate.
    - `--with-codex` (false) — use OpenAI Codex as a cross-model evaluator/reviewer via `mcp__codex-cli__*` MCP tools. Accepts: `evaluate`, `review`, or `both` (default when flag is present without value). When enabled, Codex provides an independent second opinion from a different model family, creating a GAN-like dynamic where Claude builds and Codex critiques.
    Flags can be passed naturally: `/devlyn:auto-resolve fix the auth bug --max-rounds 3 --skip-docs`
@@ -43,7 +45,7 @@ This pipeline runs hands-free. The user launches it to walk away and come back t
 ```
 Auto-resolve pipeline starting
 Task: [extracted task description]
-Phases: Build → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs]
+Phases: Build → Build Gate → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs]
 Max evaluation rounds: [N]
 Cross-model evaluation (Codex): [evaluate / review / both / disabled]
 ```
@@ -86,6 +88,63 @@ The task is: [paste the task description here]
 3. If no changes were made, report failure and stop
 4. **Checkpoint**: Run `git add -A && git commit -m "chore(pipeline): phase 1 — build complete"` to create a rollback point
+## PHASE 1.4: BUILD GATE
+Skip if `--skip-build-gate` or `--build-gate skip` was set.
+This phase runs the project's real build, typecheck, and lint commands — the same ones CI, Docker, and production environments will run. It catches the entire class of bugs that LLM-based evaluation and test suites cannot: type errors in un-tested files, cross-package type drift in monorepos, lint violations, missing production dependencies, and Dockerfile copy mismatches.
+This is deterministic — if the compiler says no, the pipeline stops. No LLM judgment involved.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.
+Agent prompt — pass this to the Agent tool:
+You are the build gate agent. Read `references/build-gate.md` from the auto-resolve skill directory for the full project-type detection matrix and execution rules.
+Your job: detect every project type in this repo, run their build/typecheck/lint commands, and report results. You do NOT reason about code quality — you run commands and faithfully report what they output.
+1. Read the detection matrix in `references/build-gate.md`
+2. Scan the repo to detect all matching project types (a monorepo may match several)
+3. Detect the package manager (npm/pnpm/yarn/bun) per the rules in the reference file
+4. Run all gate commands. Sequential within a project type, parallel across unrelated types.
+5. If `--build-gate strict` is set, apply strict-mode flags per the reference file
+6. Run Dockerfile builds if Dockerfiles are detected, UNLESS `--build-gate no-docker` is set (see reference file)
+7. Write results to `.devlyn/BUILD-GATE.md` following the output format in the reference file
+For failures: include the FULL error output (not truncated) and extract root file:line references with concrete fix guidance so the fix agent knows exactly where to look.
+**After the agent completes**:
+1. Read `.devlyn/BUILD-GATE.md`
+2. Extract verdict
+3. Branch:
+   - `PASS` → continue to PHASE 1.5
+   - `FAIL` → go to PHASE 1.4-fix (build gate fix loop)
+## PHASE 1.4-fix: BUILD GATE FIX LOOP
+Triggered only when PHASE 1.4 returns FAIL.
+Track a round counter (shared with the main fix loop counter against `max-rounds`). If `round >= max-rounds`, stop with a clear failure report — do NOT continue to evaluate/browser/etc. Code that doesn't build cannot be meaningfully evaluated or tested.
+Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.
+Agent prompt — pass this to the Agent tool:
+Read `.devlyn/BUILD-GATE.md` — it contains deterministic build/typecheck/lint failures from real compiler output. These are not opinions; the compiler rejected this code. Fix every listed failure at the root cause level.
+For each failure:
+1. Read the referenced file:line and enough surrounding context to understand the error
+2. For type errors: check BOTH sides of the type contract — the consumer AND the type definition. The fix may belong to either side. Do NOT suppress errors with `any`, `@ts-ignore`, `as unknown as`, `// eslint-disable`, or equivalent escape hatches.
+3. For lint errors: fix the underlying issue, do not disable the rule.
+4. For missing module/dependency errors: investigate the cause — it may be a missing dep in package.json, a typo in the import path, or a tsconfig paths misconfiguration.
+5. After fixing, do NOT re-run the build yourself. The orchestrator re-runs PHASE 1.4.
+**After the agent completes**:
+1. **Checkpoint**: `git add -A && git commit -m "chore(pipeline): build gate fix round [N]"`
+2. Increment round counter
+3. Go back to PHASE 1.4 (re-run the gate)
 ## PHASE 1.5: BROWSER VALIDATE (conditional)
 Skip if `--skip-browser` was set.
@@ -284,7 +343,7 @@ Synchronize documentation with recent code changes. Use `git log --oneline -20`
 After all phases complete:
 1. Clean up temporary files:
-   - Delete the `.devlyn/` directory entirely (contains done-criteria.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, screenshots/, playwright temp files)
+   - Delete the `.devlyn/` directory entirely (contains done-criteria.md, BUILD-GATE.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, screenshots/, playwright temp files)
    - Kill any dev server process still running from browser validation
 2. Run `git log --oneline -10` to show commits made during the pipeline
@@ -300,6 +359,7 @@ After all phases complete:
 | Phase | Status | Notes |
 |-------|--------|-------|
 | Build (team-resolve) | [completed] | [brief summary] |
+| Build gate | [completed / skipped / FAIL after N rounds] | [project types detected, commands run, pass/fail per command] |
 | Browser validate | [completed / skipped / auto-skipped] | [verdict, tier used, console errors, flow results] |
 | Evaluate (Claude) | [PASS/NEEDS WORK after N rounds] | [verdict + key findings] |
 | Evaluate (Codex) | [completed / skipped] | [Codex-only findings count, merged verdict] |
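
Reviewer sketch (not part of the package): the PHASE 1.4 / PHASE 1.4-fix cycle added in this file can be read as a small loop. The Python below is only an illustration under stated assumptions. `run_gate` and `run_fix` are hypothetical callables standing in for the two subagents, the verdict parser assumes the `## Verdict: PASS` or `## Verdict: FAIL` heading from the output format in `references/build-gate.md`, and treating a missing verdict as FAIL is an assumption of this sketch rather than something the diff specifies. The git checkpoint between rounds is omitted.

```python
import re
from typing import Callable, Tuple

# Matches the "## Verdict: PASS" / "## Verdict: FAIL" heading in BUILD-GATE.md.
VERDICT_RE = re.compile(r"^## Verdict:\s*(PASS|FAIL)\s*$", re.MULTILINE)


def parse_verdict(report: str) -> str:
    """Extract PASS/FAIL from report text; a missing verdict counts as FAIL (assumption)."""
    match = VERDICT_RE.search(report)
    return match.group(1) if match else "FAIL"


def build_gate_loop(
    run_gate: Callable[[], str],      # hypothetical: runs PHASE 1.4, returns the report text
    run_fix: Callable[[str], None],   # hypothetical: runs PHASE 1.4-fix on that report
    max_rounds: int = 4,
    round_counter: int = 0,
) -> Tuple[str, int]:
    """Re-run the gate after each fix round until PASS or the rounds are used up.

    Returns (verdict, round_counter) so the caller can keep sharing the
    counter with the main evaluate/fix loop.
    """
    while True:
        report = run_gate()
        if parse_verdict(report) == "PASS":
            return "PASS", round_counter
        if round_counter >= max_rounds:
            # Code that does not build is not passed on to later phases.
            return "FAIL", round_counter
        run_fix(report)
        round_counter += 1
```

Returning the counter alongside the verdict is one way to honor the "shared with the main fix loop counter" wording: the later evaluate/fix loop would start from whatever value the build gate left behind.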

package/config/skills/devlyn:auto-resolve/references/build-gate.md ADDED

@@ -0,0 +1,116 @@
+# Build Gate — Project Type Detection & Commands
+Reference for PHASE 1.4 (Build Gate). The build gate agent reads this file to determine which commands to run.
+---
+## Project Type Detection Matrix
+Inspect the repository root and subdirectories (up to 2 levels). A repo can match **multiple** signals — run ALL matching gates. Do not pick "the main one"; a monorepo with a Next.js dashboard + Rust service needs both.
+| Signal file(s) | Project type | Gate commands (run in order) |
+|---|---|---|
+| `package.json` with `next` dep | Next.js | `npx tsc --noEmit` → `npx next build` |
+| `package.json` with `nuxt` dep | Nuxt | `npx nuxi typecheck` → `npx nuxi build` |
+| `package.json` with `vite` + `tsconfig.json` | Vite+TS | `npx tsc --noEmit` → `npm run build` (if script exists) |
+| `package.json` with `expo` dep | Expo (React Native) | `npx tsc --noEmit` → `npx expo-doctor` |
+| `package.json` with `react-native` (no expo) | React Native | `npx tsc --noEmit` |
+| `package.json` with `svelte` + `@sveltejs/kit` | SvelteKit | `npm run check` → `npm run build` |
+| `package.json` only, has `build` script | Generic Node | `npm run build` |
+| `package.json` only, has `tsconfig.json` but no `build` | TS library | `npx tsc --noEmit` |
+| `pnpm-workspace.yaml` / `turbo.json` / `lerna.json` | Monorepo | `pnpm -r build` or `turbo run build typecheck lint` — **workspace-wide**, NOT just the changed package |
+| `Cargo.toml` | Rust | `cargo check --all-targets` → `cargo clippy -- -D warnings` |
+| `go.mod` | Go | `go build ./...` → `go vet ./...` |
+| `foundry.toml` | Foundry (Solidity) | `forge build` |
+| `hardhat.config.{js,ts,cjs}` | Hardhat (Solidity) | `npx hardhat compile` |
+| `Anchor.toml` | Anchor (Solana) | `anchor build` |
+| `Move.toml` | Move (Sui/Aptos) | `sui move build` or `aptos move compile` |
+| `pyproject.toml` / `setup.py` + mypy config | Python+mypy | `mypy .` |
+| `pyproject.toml` with `ruff` | Python+Ruff | `ruff check .` |
+| `Package.swift` | Swift package | `swift build` |
+| `*.xcodeproj` / `*.xcworkspace` | iOS/macOS (Xcode) | Skip by default — log "Xcode project detected, manual build gate recommended". Too project-specific without knowing the scheme. |
+| `build.gradle*` / `settings.gradle*` | Gradle/Android | `./gradlew assembleDebug` (debug, not release — keep it fast) |
+| `CMakeLists.txt` | C/C++ (CMake) | `cmake -B build && cmake --build build` |
+| `Makefile` (with no other signals) | Generic Make | `make` (only if no other type matched — Makefiles are too generic) |
+| `Unity/ProjectSettings/` or `ProjectSettings/ProjectVersion.txt` | Unity | Skip by default — log "Unity project detected, manual build gate recommended" |
+| `project.godot` | Godot | Skip by default — log "Godot project detected, manual build gate recommended" |
+| `Dockerfile*` | Docker | `docker build -f <dockerfile> -t _pipeline_gate_test .` — included by default in `auto` mode. Skip with `--build-gate no-docker`. |
+## Package Manager Detection
+Respect the project's package manager. Check in order:
+1. `packageManager` field in root `package.json` → use that
+2. `pnpm-lock.yaml` exists → `pnpm`
+3. `yarn.lock` exists → `yarn`
+4. `bun.lockb` / `bun.lock` exists → `bun`
+5. Default → `npm`
+Replace `npm run build` / `npx` accordingly: `pnpm build` / `pnpm exec`, `yarn build` / `yarn`, `bun run build` / `bunx`.
+## Monorepo Handling
+Monorepo is the most critical case — cross-package type drift is the #1 source of "tests pass locally, build fails in CI."
+1. Detect workspace root markers: `pnpm-workspace.yaml`, `turbo.json`, `lerna.json`, `workspaces` in root `package.json`
+2. Run gates at the **workspace root** level, not per-changed-package:
+   - Turbo: `turbo run build typecheck lint` (respects dependency graph)
+   - pnpm: `pnpm -r build` (runs in topological order)
+   - yarn workspaces: `yarn workspaces foreach -A run build`
+   - npm workspaces: `npm run build --workspaces`
+3. This ensures Package A's type change that breaks Package B's consumer is caught, even if only Package A was directly modified.
+## Strict Mode (`--build-gate strict`)
+When strict mode is set, treat warnings as failures:
+- TypeScript: add `--strict` if not already in tsconfig (or verify it's set)
+- Clippy: `-D warnings` (already default in the matrix)
+- ESLint: `--max-warnings 0`
+- Go vet: already treats warnings as errors
+- Foundry: `--deny-warnings`
+In default (auto) mode, only hard errors (non-zero exit code from the tool's perspective) block.
+## Docker Build (default in `auto` mode)
+When `Dockerfile*` files are detected AND `--build-gate no-docker` is NOT set:
+1. Run all non-Docker gates first (they're faster and catch most errors before the slow Docker step)
+2. Then run `docker build -f <dockerfile> -t _pipeline_gate_test .` for each Dockerfile found in the repo root and subdirectories (up to 2 levels)
+3. If Docker daemon is not available, log the skip with a warning but do NOT fail — developers without Docker should not be blocked. The warning should note: "Docker builds were skipped because the Docker daemon is unavailable. Use `--build-gate no-docker` to suppress this warning, or ensure Docker is running to catch Dockerfile-specific issues."
+4. This catches Dockerfile-specific issues that no other gate can: COPY paths referencing files excluded by .dockerignore, multi-stage build failures, production-only dependency resolution, and environment differences between dev and container builds
+Use `--build-gate no-docker` to skip Docker builds for faster iteration during development — the language-level gates (tsc, cargo check, etc.) still run and catch the majority of issues. Docker builds are most valuable as a final gate before shipping.
+## Output Format
+Write results to `.devlyn/BUILD-GATE.md`:
+```markdown
+# Build Gate Results
+## Verdict: [PASS / FAIL]
+## Detected Project Types
+- [type] ([path/])
+## Gate Commands Run
+| # | Command | Dir | Exit | Status | Time |
+|---|---|---|---|---|---|
+| 1 | `npx tsc --noEmit` | dashboard/ | 0 | PASS | 4.2s |
+| 2 | `npx next build` | dashboard/ | 1 | FAIL | 9.8s |
+| 3 | `cargo check --all-targets` | services/indexer/ | 0 | PASS | 12.1s |
+## Failures
+### Command #2: `npx next build` (dashboard/, exit 1)
+```
+[full error output — do NOT truncate. Build errors reference files from earlier in output.]
+```
+**Root file:line(s)**:
+- `dashboard/app/(dashboard)/settings/page.tsx:90` — Type error: Property 'config' does not exist on type 'SettingsTabsProps'
+**Fix guidance**:
+Read `dashboard/app/(dashboard)/settings/page.tsx:88-93` and `dashboard/components/settings/SettingsTabs.tsx` (the SettingsTabsProps type definition). Either add `config` to SettingsTabsProps or remove the prop from the parent. Then re-run `npx next build` from `dashboard/` to verify.
+```
+Verdict rules:
+- Any exit code != 0 → **FAIL**
+- All exit codes == 0 → **PASS**
+- No gates detected → **PASS** with note "No build gate detected — project type unknown. Consider adding `--build-gate deploy` if Dockerfiles are present."
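
Reviewer sketch (not part of the package): the "Package Manager Detection" order in this file can be expressed as a short function. This is a minimal Python illustration, assuming the `packageManager` field uses the usual `name@version` form; the function name `detect_package_manager` and the `RUNNERS` table are hypothetical names introduced here, with the command pairs copied from the replacement rule in the added text.

```python
import json
from pathlib import Path

# (build command, exec-style runner) per package manager, as listed in build-gate.md.
RUNNERS = {
    "npm": ("npm run build", "npx"),
    "pnpm": ("pnpm build", "pnpm exec"),
    "yarn": ("yarn build", "yarn"),
    "bun": ("bun run build", "bunx"),
}


def detect_package_manager(root: Path) -> str:
    """Return 'npm', 'pnpm', 'yarn', or 'bun' using the documented check order."""
    manifest = root / "package.json"
    if manifest.is_file():
        try:
            field = json.loads(manifest.read_text(encoding="utf-8")).get("packageManager", "")
        except (json.JSONDecodeError, AttributeError):
            field = ""  # unreadable or non-object manifest: fall through to lockfiles
        # Assumed format "pnpm@9.1.0"; keep only the tool name.
        name = field.split("@", 1)[0] if isinstance(field, str) else ""
        if name in RUNNERS:
            return name
    if (root / "pnpm-lock.yaml").exists():
        return "pnpm"
    if (root / "yarn.lock").exists():
        return "yarn"
    if (root / "bun.lockb").exists() or (root / "bun.lock").exists():
        return "bun"
    return "npm"
```

A caller could then look up `RUNNERS[detect_package_manager(Path("."))]` to decide which commands to substitute for `npm run build` and `npx` in the detection matrix.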

package/config/skills/devlyn:preflight/references/auditors/code-auditor.md CHANGED

@@ -8,6 +8,19 @@ You are auditing a codebase against its planning commitments. Your job is to ver
 Read `.devlyn/commitment-registry.md` for the full list of commitments to verify. Skip any items in the "Not Started (Planned)" section — those are acknowledged future work, not gaps.
+**Step 0 — Build health check**: Before auditing individual commitments, verify the project actually builds. Detect the project type(s) and run their build/typecheck commands:
+- `package.json` with `next` → `npx tsc --noEmit && npx next build`
+- `package.json` with `vite` + `tsconfig.json` → `npx tsc --noEmit`
+- `Cargo.toml` → `cargo check --all-targets`
+- `go.mod` → `go build ./... && go vet ./...`
+- `foundry.toml` → `forge build`
+- `hardhat.config.*` → `npx hardhat compile`
+- Monorepo (`pnpm-workspace.yaml`/`turbo.json`) → workspace-wide build
+- `Dockerfile*` → `docker build` (if Docker available)
+- For other project types, look for a `build` script in `package.json` or equivalent
+Any build/typecheck failure is a BROKEN finding at CRITICAL severity — code that doesn't compile cannot fulfill any commitment. Include the full compiler error output with file:line references. This catches type errors, missing imports, cross-package drift, and Dockerfile build failures that text-based code reading alone cannot detect.
 **For each active commitment (not planned):**
 1. Search the codebase for its implementation (use Grep, Glob, Read in parallel where possible)
 2. Read the implementing code thoroughly — line by line for critical paths
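
Reviewer sketch (not part of the package): the Step 0 mapping from signal files to build commands could look roughly like the Python below. It is an illustration only. It inspects a single directory, covers a subset of the listed signals, and checks both `dependencies` and `devDependencies` for `next` and `vite`, which is an assumption since the diff only says "`package.json` with `next`". The names `SIGNAL_COMMANDS` and `health_check_commands` are hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Single-file signals and their commands, copied from the Step 0 list.
SIGNAL_COMMANDS = {
    "Cargo.toml": ["cargo check --all-targets"],
    "go.mod": ["go build ./...", "go vet ./..."],
    "foundry.toml": ["forge build"],
}


def health_check_commands(root: Path) -> List[str]:
    """Collect the build/typecheck commands for every signal found directly in root."""
    commands: List[str] = []
    for signal, cmds in SIGNAL_COMMANDS.items():
        if (root / signal).is_file():
            commands.extend(cmds)

    manifest = root / "package.json"
    if manifest.is_file():
        pkg = json.loads(manifest.read_text(encoding="utf-8"))
        # Assumption: a dep may be declared in either section.
        deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
        if "next" in deps:
            commands += ["npx tsc --noEmit", "npx next build"]
        elif "vite" in deps and (root / "tsconfig.json").is_file():
            commands.append("npx tsc --noEmit")
    return commands
```

Any command in the returned list that exits non-zero would correspond to a BROKEN finding at CRITICAL severity under the rule added in this hunk.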

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "devlyn-cli",
-  "version": "1.8.1",
+  "version": "1.9.0",
   "description": "AI development toolkit for Claude Code — ideate, auto-resolve, and ship with context engineering and agent orchestration",
   "homepage": "https://github.com/fysoul17/devlyn-cli#readme",
   "bin": {