npm - devlyn-cli - Versions diffs - 1.3.2 → 1.4.0 - Mend

devlyn-cli 1.3.2 → 1.4.0

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (12)

package/CLAUDE.md +4 -4
package/README.md +2 -2
package/agents-config/evaluator.md +3 -3
package/bin/devlyn.js +16 -7
package/config/skills/devlyn:auto-resolve/SKILL.md +15 -18
package/config/skills/devlyn:auto-resolve/references/codex-integration.md +3 -3
package/config/skills/devlyn:browser-validate/SKILL.md +3 -3
package/config/skills/devlyn:browser-validate/references/flow-testing.md +2 -2
package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +14 -14
package/config/skills/devlyn:evaluate/SKILL.md +5 -5
package/config/skills/devlyn:team-resolve/SKILL.md +1 -1
package/package.json +1 -1

package/CLAUDE.md CHANGED

@@ -56,7 +56,7 @@ For hands-free build-evaluate-polish cycles — works for bugs, features, refact
 /devlyn:auto-resolve [task description]
 ```
-This runs the full pipeline automatically: **Build → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.claude/done-criteria.md`, `.claude/EVAL-FINDINGS.md`, `.claude/BROWSER-RESULTS.md`).
+This runs the full pipeline automatically: **Build → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`).
 For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.
@@ -72,9 +72,9 @@ Optional flags:
 When you want to run each step yourself with review between phases:
-1. `/devlyn:team-resolve [issue]` → Investigate + implement (writes `.claude/done-criteria.md`)
-2. `/devlyn:evaluate` → Grade against done-criteria (writes `.claude/EVAL-FINDINGS.md`)
-3. If findings exist: `/devlyn:team-resolve "Fix issues in .claude/EVAL-FINDINGS.md"` → Fix loop
+1. `/devlyn:team-resolve [issue]` → Investigate + implement (writes `.devlyn/done-criteria.md`)
+2. `/devlyn:evaluate` → Grade against done-criteria (writes `.devlyn/EVAL-FINDINGS.md`)
+3. If findings exist: `/devlyn:team-resolve "Fix issues in .devlyn/EVAL-FINDINGS.md"` → Fix loop
 4. `/simplify` → Quick cleanup pass
 5. `/devlyn:team-review` → Multi-perspective team review (for important PRs)
 6. `/devlyn:clean` → Codebase hygiene

package/README.md CHANGED

@@ -42,7 +42,7 @@ Pipeline orchestration that controls _how agents execute_ — permissions, state
 - **`/devlyn:auto-resolve`** — 9-phase automated pipeline (build → browser validate → evaluate → fix loop → simplify → review → security → clean → docs)
 - **`/devlyn:browser-validate`** — feature verification in a real browser with tiered fallback (Chrome MCP → Playwright → curl)
 - **`bypassPermissions` mode** for autonomous subagent execution
-- **File-based state machine** — agents communicate via `.claude/done-criteria.md`, `EVAL-FINDINGS.md`, and `BROWSER-RESULTS.md`
+- **File-based state machine** — agents communicate via `.devlyn/done-criteria.md`, `EVAL-FINDINGS.md`, and `BROWSER-RESULTS.md`
 - **Git checkpoints** at each phase for rollback safety
 - **Cross-model evaluation** via `--with-codex` flag (OpenAI Codex as independent evaluator)
@@ -172,7 +172,7 @@ For step-by-step control between phases:
 |---|---|---|
 | 1. **Resolve** | `/devlyn:resolve` or `/devlyn:team-resolve` | Fix the issue — solo for focused bugs (1-2 modules), team for complex issues (3+ modules) |
 | 2. **Evaluate** | `/devlyn:evaluate` | Independent quality evaluation — grades against done criteria written in step 1 |
-| | | *If the evaluation finds issues: `/devlyn:team-resolve "Fix issues in .claude/EVAL-FINDINGS.md"`* |
+| | | *If the evaluation finds issues: `/devlyn:team-resolve "Fix issues in .devlyn/EVAL-FINDINGS.md"`* |
 | 3. **Simplify** | `/simplify` | Quick cleanup pass for reuse, quality, and efficiency *(built-in Claude Code command)* |
 | 4. **Review** | `/devlyn:review` or `/devlyn:team-review` | Audit the changes — solo for small PRs (< 10 files), team for large PRs (10+ files) |
 | 5. **Clean** | `/devlyn:clean` | Remove dead code, unused dependencies, and complexity hotspots |

package/agents-config/evaluator.md CHANGED

@@ -4,7 +4,7 @@ You are a code quality evaluator. Your job is to audit work produced by another
 ## Before You Start
-1. **Check for done criteria**: Read `.claude/done-criteria.md` if it exists. When present, this is your primary grading rubric — every criterion must be verified with evidence. When absent, fall back to the checklists below.
+1. **Check for done criteria**: Read `.devlyn/done-criteria.md` if it exists. When present, this is your primary grading rubric — every criterion must be verified with evidence. When absent, fall back to the checklists below.
 ## Calibration
@@ -36,7 +36,7 @@ You will be too lenient by default. You will identify real issues, then talk you
 ## Output
-Write findings to `.claude/EVAL-FINDINGS.md` for downstream consumption:
+Write findings to `.devlyn/EVAL-FINDINGS.md` for downstream consumption:
 ```markdown
 # Evaluation Findings
@@ -61,4 +61,4 @@ Write findings to `.claude/EVAL-FINDINGS.md` for downstream consumption:
 - [positive observations]
 ```
-Do NOT delete `.claude/done-criteria.md` or `.claude/EVAL-FINDINGS.md` — the orchestrator or user is responsible for cleanup.
+Do NOT delete `.devlyn/done-criteria.md` or `.devlyn/EVAL-FINDINGS.md` — the orchestrator or user is responsible for cleanup.

package/bin/devlyn.js CHANGED

@@ -528,6 +528,19 @@ async function init(skipPrompts = false) {
     log('  → CLAUDE.md', 'dim');
   }
+  // Add .devlyn/ (pipeline state directory) to .gitignore
+  const gitignorePath = path.join(process.cwd(), '.gitignore');
+  const gitignoreEntry = '.devlyn/';
+  let gitignoreContent = fs.existsSync(gitignorePath)
+    ? fs.readFileSync(gitignorePath, 'utf8')
+    : '';
+  if (!gitignoreContent.split('\n').some((line) => line.trim() === gitignoreEntry || line.trim() === '.devlyn')) {
+    const prefix = gitignoreContent && !gitignoreContent.endsWith('\n') ? '\n' : '';
+    const header = gitignoreContent ? '\n# devlyn-cli pipeline state\n' : '# devlyn-cli pipeline state\n';
+    fs.writeFileSync(gitignorePath, gitignoreContent + prefix + header + gitignoreEntry + '\n');
+    log('  → .gitignore (added .devlyn/)', 'dim');
+  }
   // Enable agent teams in project settings
   const settingsPath = path.join(targetDir, 'settings.json');
   let settings = {};
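The `.gitignore` update added to `init()` in the hunk above only writes when neither `.devlyn/` nor `.devlyn` is already listed. The same decision can be sketched as a pure function over the file's text, which is easier to check than the file-writing version. `addDevlynIgnore` is a hypothetical name used for illustration; it is not part of the package.

```javascript
// Hypothetical helper mirroring the logic in the hunk above.
// Returns the new .gitignore text, or null when no write is needed.
function addDevlynIgnore(content) {
  const entry = '.devlyn/';
  const alreadyIgnored = content
    .split('\n')
    .some((line) => line.trim() === entry || line.trim() === '.devlyn');
  if (alreadyIgnored) return null;

  // Terminate an unterminated last line before appending anything.
  const prefix = content && !content.endsWith('\n') ? '\n' : '';
  // Separate the new block from existing rules with a blank line.
  const header = content
    ? '\n# devlyn-cli pipeline state\n'
    : '# devlyn-cli pipeline state\n';
  return content + prefix + header + entry + '\n';
}

console.log(JSON.stringify(addDevlynIgnore('node_modules')));
console.log(addDevlynIgnore('dist\n.devlyn\n')); // null: already ignored
```

Note that the check compares whole trimmed lines, so an equivalent rule written differently (for example `/.devlyn/`) would not be recognised and the entry would still be appended.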
@@ -539,16 +552,12 @@ async function init(skipPrompts = false) {
     }
   }
   if (!settings.env) settings.env = {};
-  // Auto-allow pipeline files so auto-resolve doesn't prompt for permission
+  // Auto-allow pipeline state directory and common git commands so auto-resolve doesn't prompt
   if (!settings.permissions) settings.permissions = {};
   if (!settings.permissions.allow) settings.permissions.allow = [];
   const pipelinePermissions = [
-    'Write(.claude/done-criteria.md)',
-    'Write(.claude/EVAL-FINDINGS.md)',
-    'Write(.claude/BROWSER-RESULTS.md)',
-    'Edit(.claude/done-criteria.md)',
-    'Edit(.claude/EVAL-FINDINGS.md)',
-    'Edit(.claude/BROWSER-RESULTS.md)',
+    'Write(.devlyn/**)',
+    'Edit(.devlyn/**)',
     'Bash(git add *)',
     'Bash(git commit *)',
     'Bash(git diff *)',
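This hunk replaces six per-file `Write`/`Edit` rules with two globs over `.devlyn/**`. The code that merges `pipelinePermissions` into `settings.permissions.allow` falls outside the hunk, so the following is only a sketch of one way such a merge could avoid duplicate entries. `mergeAllowList` and the sample settings object are hypothetical, and the permission list is truncated to the entries visible in the diff.

```javascript
// Entries visible in the 1.4.0 hunk; the real list continues past the hunk.
const pipelinePermissions = [
  'Write(.devlyn/**)',
  'Edit(.devlyn/**)',
  'Bash(git add *)',
  'Bash(git commit *)',
  'Bash(git diff *)',
];

// Hypothetical helper: append rules that are missing, keeping existing order.
function mergeAllowList(settings, rules) {
  const permissions = settings.permissions ?? {};
  const allow = [...(permissions.allow ?? [])];
  for (const rule of rules) {
    if (!allow.includes(rule)) allow.push(rule);
  }
  return { ...settings, permissions: { ...permissions, allow } };
}

// Sample settings as an older install might have left them (assumed shape).
const merged = mergeAllowList(
  { permissions: { allow: ['Bash(git diff *)', 'Write(.claude/done-criteria.md)'] } },
  pipelinePermissions,
);
console.log(merged.permissions.allow);
```

In this sketch the older `.claude/...` rule stays in the allow list. Whether the CLI removes such entries on upgrade is not visible in the diff.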

package/config/skills/devlyn:auto-resolve/SKILL.md CHANGED

@@ -53,7 +53,7 @@ Investigate and implement the following task. Work through these phases in order
 - **UI/UX**: review existing components, design system, and user flows.
 Read relevant files in parallel. Build a clear picture of what exists and what needs to change.
-**Phase B — Define done criteria**: Before writing any code, create `.claude/done-criteria.md` with testable success criteria. Each criterion must be verifiable (a test can assert it or a human can observe it in under 30 seconds), specific (not vague like "handles errors correctly"), and scoped to this task. Include an "Out of Scope" section and a "Verification Method" section. This file is required — downstream evaluation depends on it.
+**Phase B — Define done criteria**: Before writing any code, create `.devlyn/done-criteria.md` with testable success criteria. Each criterion must be verifiable (a test can assert it or a human can observe it in under 30 seconds), specific (not vague like "handles errors correctly"), and scoped to this task. Include an "Out of Scope" section and a "Verification Method" section. This file is required — downstream evaluation depends on it.
 **Phase C — Assemble a team**: Use TeamCreate to create a team. Select teammates based on task type:
 - Bug fix: root-cause-analyst + test-engineer (+ security-auditor, performance-engineer as needed)
@@ -64,14 +64,14 @@ Each teammate investigates from their perspective and sends findings back.
 **Phase D — Synthesize and implement**: After all teammates report, compile findings into a unified plan. Implement the solution — no workarounds, no hardcoded values, no silent error swallowing. For bugs: write a failing test first, then fix. For features: implement following existing patterns, then write tests. For refactors: ensure tests pass before and after.
-**Phase E — Update done criteria**: Mark each criterion in `.claude/done-criteria.md` as satisfied. Run the full test suite.
+**Phase E — Update done criteria**: Mark each criterion in `.devlyn/done-criteria.md` as satisfied. Run the full test suite.
 **Phase F — Cleanup**: Shut down all teammates and delete the team.
 The task is: [paste the task description here]
 **After the agent completes**:
-1. Verify `.claude/done-criteria.md` exists — if missing, create a basic one from the agent's output summary
+1. Verify `.devlyn/done-criteria.md` exists — if missing, create a basic one from the agent's output summary
 2. Run `git diff --stat` to confirm code was actually changed
 3. If no changes were made, report failure and stop
 4. **Checkpoint**: Run `git add -A && git commit -m "chore(pipeline): phase 1 — build complete"` to create a rollback point
@@ -86,16 +86,16 @@ Skip if `--skip-browser` was set.
 Agent prompt — pass this to the Agent tool:
-You are a browser validation agent. Read the skill instructions at `.claude/skills/devlyn:browser-validate/SKILL.md` and follow the full workflow to validate this web application. The dev server should be started, tested, and left running (pass `--keep-server` internally) — the pipeline will clean it up later. Write your findings to `.claude/BROWSER-RESULTS.md`.
+You are a browser validation agent. Read the skill instructions at `.claude/skills/devlyn:browser-validate/SKILL.md` and follow the full workflow to validate this web application. The dev server should be started, tested, and left running (pass `--keep-server` internally) — the pipeline will clean it up later. Write your findings to `.devlyn/BROWSER-RESULTS.md`.
 **After the agent completes**:
-1. Read `.claude/BROWSER-RESULTS.md`
+1. Read `.devlyn/BROWSER-RESULTS.md`
 2. Extract the verdict
 3. Branch on verdict:
    - `PASS` → continue to PHASE 2
    - `PASS WITH ISSUES` → continue to PHASE 2 (evaluator reads browser results as extra context)
    - `PARTIALLY VERIFIED` → continue to PHASE 2, but flag to the evaluator that browser coverage was incomplete — unverified features should be weighted more heavily
-   - `NEEDS WORK` → features don't work in the browser. Go to PHASE 2.5 fix loop. Fix agent reads `.claude/BROWSER-RESULTS.md` for which criterion failed, at what step, with what error. After fixing, re-run PHASE 1.5 to verify the fix before proceeding to Evaluate.
+   - `NEEDS WORK` → features don't work in the browser. Go to PHASE 2.5 fix loop. Fix agent reads `.devlyn/BROWSER-RESULTS.md` for which criterion failed, at what step, with what error. After fixing, re-run PHASE 1.5 to verify the fix before proceeding to Evaluate.
    - `BLOCKED` → app doesn't render. Go to PHASE 2.5 fix loop. After fixing, re-run PHASE 1.5.
 ## PHASE 2: EVALUATE
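The verdict branching described in the hunk above for `.devlyn/BROWSER-RESULTS.md` can be read as a small routing table. The sketch below encodes the five verdicts and their next phase as listed there. The function name, the field names, and the choice to throw on an unrecognised verdict are assumptions made for illustration; the skill file does not say what should happen in that case.

```javascript
// Hypothetical routing table for the browser-validate verdicts listed above.
const BROWSER_VERDICT_ROUTES = new Map([
  ['PASS', { next: 'PHASE 2', rerunBrowser: false, flagIncompleteCoverage: false }],
  ['PASS WITH ISSUES', { next: 'PHASE 2', rerunBrowser: false, flagIncompleteCoverage: false }],
  // Evaluator is told that browser coverage was incomplete.
  ['PARTIALLY VERIFIED', { next: 'PHASE 2', rerunBrowser: false, flagIncompleteCoverage: true }],
  // Fix loop first, then PHASE 1.5 is re-run before Evaluate.
  ['NEEDS WORK', { next: 'PHASE 2.5', rerunBrowser: true, flagIncompleteCoverage: false }],
  ['BLOCKED', { next: 'PHASE 2.5', rerunBrowser: true, flagIncompleteCoverage: false }],
]);

function routeBrowserVerdict(verdict) {
  const route = BROWSER_VERDICT_ROUTES.get(verdict.trim().toUpperCase());
  // Assumption: an unknown verdict is treated as an error rather than a pass.
  if (!route) throw new Error(`Unknown browser verdict: ${verdict}`);
  return route;
}

console.log(routeBrowserVerdict('Partially Verified'));
```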
@@ -106,7 +106,7 @@ Agent prompt — pass this to the Agent tool:
 You are an independent evaluator. Your job is to grade work produced by another agent, not to praise it. You will be too lenient by default — fight this tendency. When in doubt, score DOWN, not up. A false negative (missing a bug) ships broken code. A false positive (flagging a non-issue) costs minutes of review. The cost is asymmetric.
-**Step 1 — Read the done criteria**: Read `.claude/done-criteria.md`. This is your primary grading rubric. Every criterion must be verified with evidence.
+**Step 1 — Read the done criteria**: Read `.devlyn/done-criteria.md`. This is your primary grading rubric. Every criterion must be verified with evidence.
 **Step 2 — Discover changes**: Run `git diff HEAD~1` and `git status` to see what changed. Read all changed/new files in parallel.
@@ -119,7 +119,7 @@ You are an independent evaluator. Your job is to grade work produced by another
 **Step 4 — Grade against done criteria**: For each criterion in done-criteria.md, mark VERIFIED (with evidence) or FAILED (with file:line and what's wrong).
-**Step 5 — Write findings**: Write `.claude/EVAL-FINDINGS.md` with this exact structure:
+**Step 5 — Write findings**: Write `.devlyn/EVAL-FINDINGS.md` with this exact structure:
 ```
 # Evaluation Findings
@@ -143,10 +143,10 @@ Calibration examples to guide your judgment:
 - A `let` that could be `const` = LOW note only. Linters catch this.
 - "The error handling is generally quite good" = WRONG. Count the instances. Name the files. "3 of 7 async ops have error states. 4 are missing: file:line, file:line..."
-Do NOT delete `.claude/done-criteria.md` or `.claude/EVAL-FINDINGS.md` — the orchestrator needs them.
+Do NOT delete `.devlyn/done-criteria.md` or `.devlyn/EVAL-FINDINGS.md` — the orchestrator needs them.
 **After the agent completes**:
-1. Read `.claude/EVAL-FINDINGS.md`
+1. Read `.devlyn/EVAL-FINDINGS.md`
 2. Extract the verdict
 3. **If `--with-codex` includes `evaluate` or `both`**: Read `references/codex-integration.md` and follow the "PHASE 2-CODEX: CROSS-MODEL EVALUATE" section. This runs Codex as a second evaluator and merges findings into `EVAL-FINDINGS.md`.
 4. Branch on verdict (from the merged findings if Codex was used):
@@ -154,7 +154,7 @@ Do NOT delete `.claude/done-criteria.md` or `.claude/EVAL-FINDINGS.md` — the o
    - `PASS WITH ISSUES` → skip to PHASE 3 (issues are shippable)
    - `NEEDS WORK` → go to PHASE 2.5 (fix loop)
    - `BLOCKED` → go to PHASE 2.5 (fix loop)
-5. If `.claude/EVAL-FINDINGS.md` was not created, treat as PASS WITH ISSUES and log a warning
+5. If `.devlyn/EVAL-FINDINGS.md` was not created, treat as PASS WITH ISSUES and log a warning
 ## PHASE 2.5: FIX LOOP (conditional)
@@ -164,11 +164,11 @@ Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to fix th
 Agent prompt — pass this to the Agent tool:
-Read `.claude/EVAL-FINDINGS.md` — it contains specific issues found by an independent evaluator. Fix every CRITICAL and HIGH finding. Address MEDIUM findings if straightforward.
+Read `.devlyn/EVAL-FINDINGS.md` — it contains specific issues found by an independent evaluator. Fix every CRITICAL and HIGH finding. Address MEDIUM findings if straightforward.
-The original done criteria are in `.claude/done-criteria.md` — your fixes must still satisfy those criteria. Do not delete or weaken criteria to make them pass.
+The original done criteria are in `.devlyn/done-criteria.md` — your fixes must still satisfy those criteria. Do not delete or weaken criteria to make them pass.
-For each finding: read the referenced file:line, understand the issue, implement the fix. No workarounds — fix the actual root cause. Run tests after fixing. Update `.claude/done-criteria.md` to mark fixed items.
+For each finding: read the referenced file:line, understand the issue, implement the fix. No workarounds — fix the actual root cause. Run tests after fixing. Update `.devlyn/done-criteria.md` to mark fixed items.
 **After the agent completes**:
 1. **Checkpoint**: Run `git add -A && git commit -m "chore(pipeline): fix round [N] complete"` to preserve the fix
@@ -271,10 +271,7 @@ Synchronize documentation with recent code changes. Use `git log --oneline -20`
 After all phases complete:
 1. Clean up temporary files:
-   - Delete `.claude/done-criteria.md`
-   - Delete `.claude/EVAL-FINDINGS.md`
-   - Delete `.claude/BROWSER-RESULTS.md` (if exists)
-   - Delete `.claude/screenshots/` directory (if exists)
+   - Delete the `.devlyn/` directory entirely (contains done-criteria.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, screenshots/, playwright temp files)
    - Kill any dev server process still running from browser validation
 2. Run `git log --oneline -10` to show commits made during the pipeline
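With all pipeline state under one directory, the cleanup step above reduces to a single recursive delete. A minimal sketch using Node's `fs.rmSync` follows. `removePipelineState` is a hypothetical helper name, and the demo runs against a throwaway temp directory rather than a real project.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: delete <projectRoot>/.devlyn and report whether it is gone.
function removePipelineState(projectRoot) {
  const stateDir = path.join(projectRoot, '.devlyn');
  // force: true makes this a no-op when the directory does not exist.
  fs.rmSync(stateDir, { recursive: true, force: true });
  return !fs.existsSync(stateDir);
}

// Demo: build a fake state directory, then remove it.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'devlyn-demo-'));
fs.mkdirSync(path.join(demoRoot, '.devlyn', 'screenshots'), { recursive: true });
fs.writeFileSync(path.join(demoRoot, '.devlyn', 'done-criteria.md'), '# Done Criteria\n');
const removed = removePipelineState(demoRoot);
console.log(removed);
```

Because `force: true` ignores a missing path, the helper can be called again safely if an earlier run already cleaned up.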

package/config/skills/devlyn:auto-resolve/references/codex-integration.md CHANGED

@@ -34,7 +34,7 @@ Run after the Claude evaluator (Phase 2) completes, only if `--with-codex` inclu
 ### Step 1 — Get Codex's evaluation
 Call `mcp__codex-cli__codex` with:
-- `prompt`: Include the full content of `.claude/done-criteria.md` and the output of `git diff HEAD~1`. Ask Codex to evaluate the changes against the done criteria and report issues by severity (CRITICAL, HIGH, MEDIUM, LOW) with file:line references.
+- `prompt`: Include the full content of `.devlyn/done-criteria.md` and the output of `git diff HEAD~1`. Ask Codex to evaluate the changes against the done criteria and report issues by severity (CRITICAL, HIGH, MEDIUM, LOW) with file:line references.
 - `workingDirectory`: the project root
 - `sandbox`: `"read-only"` (Codex should only read, not modify files)
 - `reasoningEffort`: `"high"`
@@ -44,7 +44,7 @@ Example prompt to pass:
 You are an independent code evaluator. Grade the following code changes against the done criteria below. Be strict — when in doubt, flag it.
 ## Done Criteria
-[paste contents of .claude/done-criteria.md]
+[paste contents of .devlyn/done-criteria.md]
 ## Code Changes
 [paste output of git diff HEAD~1]
@@ -61,7 +61,7 @@ Spawn a subagent using the Agent tool with `mode: "bypassPermissions"` to merge
 Agent prompt:
-Read `.claude/EVAL-FINDINGS.md` (Claude's evaluation) and the Codex evaluation output below. Merge them into a single unified `.claude/EVAL-FINDINGS.md` following the existing format. Rules:
+Read `.devlyn/EVAL-FINDINGS.md` (Claude's evaluation) and the Codex evaluation output below. Merge them into a single unified `.devlyn/EVAL-FINDINGS.md` following the existing format. Rules:
 - Take the MORE SEVERE verdict between the two evaluators
 - Deduplicate findings that reference the same file:line or describe the same issue
 - When both evaluators flag the same issue, keep the more detailed description
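The merge rules above are carried out by a subagent from a prompt, not by code. As an illustration of what they ask for, the sketch below picks the more severe verdict and deduplicates findings by location. Three things here are assumptions: the severity order (`PASS` < `PASS WITH ISSUES` < `NEEDS WORK` < `BLOCKED`), the finding shape `{ location, description }`, and treating "more detailed" as "longer description".

```javascript
// Assumed severity order, least to most severe.
const VERDICT_SEVERITY = ['PASS', 'PASS WITH ISSUES', 'NEEDS WORK', 'BLOCKED'];

function moreSevereVerdict(a, b) {
  const rank = (verdict) => {
    const index = VERDICT_SEVERITY.indexOf(verdict);
    if (index === -1) throw new Error(`Unknown verdict: ${verdict}`);
    return index;
  };
  return rank(a) >= rank(b) ? a : b;
}

// Hypothetical finding shape: { location: 'file:line', description: string }.
function mergeFindings(claudeFindings, codexFindings) {
  const byLocation = new Map();
  for (const finding of [...claudeFindings, ...codexFindings]) {
    const existing = byLocation.get(finding.location);
    // Keep the longer description when both evaluators flag the same location.
    if (!existing || finding.description.length > existing.description.length) {
      byLocation.set(finding.location, finding);
    }
  }
  return [...byLocation.values()];
}

console.log(moreSevereVerdict('PASS WITH ISSUES', 'NEEDS WORK'));
```

Matching on `file:line` alone covers only the first half of the deduplication rule. Recognising two findings that "describe the same issue" at different locations needs judgment, which is presumably why the package delegates the merge to an agent.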

package/config/skills/devlyn:browser-validate/SKILL.md CHANGED

@@ -15,7 +15,7 @@ $ARGUMENTS
 ## PHASE 1: DETECT
-1. **What was built**: This is the most important input. Read `.claude/done-criteria.md` if it exists — it tells you what the feature is supposed to do. If it doesn't exist, read `git diff --stat` and `git log -1` to understand what changed. You need to know what to test before anything else.
+1. **What was built**: This is the most important input. Read `.devlyn/done-criteria.md` if it exists — it tells you what the feature is supposed to do. If it doesn't exist, read `git diff --stat` and `git log -1` to understand what changed. You need to know what to test before anything else.
 2. **Framework detection**: Read `package.json` → identify framework and start command from `scripts.dev`, `scripts.start`, or `scripts.preview`.
@@ -65,7 +65,7 @@ If the app isn't rendering, the verdict is BLOCKED — feature testing can't hap
 This is the primary purpose of browser validation. Everything else is in service of getting here.
-Read `.claude/done-criteria.md` (or infer from git diff what was built). For each criterion that describes something a user can do or see in the UI, test it end-to-end in the browser:
+Read `.devlyn/done-criteria.md` (or infer from git diff what was built). For each criterion that describes something a user can do or see in the UI, test it end-to-end in the browser:
 1. **Plan the test**: What would a user do to verify this feature works? Navigate where, click what, type what, expect what result?
 2. **Execute it**: Navigate to the page, find the interactive elements, perform the actions, verify the outcome. Read `references/flow-testing.md` for patterns on converting criteria to browser steps.
@@ -88,7 +88,7 @@ Judgment-based — look at the screenshots and report visible issues.
 ## PHASE 6: REPORT
-Write `.claude/BROWSER-RESULTS.md`:
+Write `.devlyn/BROWSER-RESULTS.md`:
 ```markdown
 # Browser Validation Results

package/config/skills/devlyn:browser-validate/references/flow-testing.md CHANGED

@@ -1,6 +1,6 @@
 # Flow Testing: Done-Criteria to Browser Steps
-How to read `.claude/done-criteria.md` and convert testable criteria into browser action sequences. This is the bridge between "what should work" and "prove it works in the browser."
+How to read `.devlyn/done-criteria.md` and convert testable criteria into browser action sequences. This is the bridge between "what should work" and "prove it works in the browser."
 Read this file only during PHASE 4 (FLOW) when done-criteria exists.
@@ -8,7 +8,7 @@ Read this file only during PHASE 4 (FLOW) when done-criteria exists.
 ## Step 1: Classify Each Criterion
-Read `.claude/done-criteria.md` and classify each criterion:
+Read `.devlyn/done-criteria.md` and classify each criterion:
 **Browser-testable** — the criterion describes something a user can see or do in the UI:
 - "User can create a new project from the dashboard"

package/config/skills/devlyn:browser-validate/references/tier2-playwright.md CHANGED

@@ -44,7 +44,7 @@ Generate a temporary test script from the test steps, run it with Playwright's J
 ## Script Generation
-For each phase (smoke, flow, visual), generate a test script at `.claude/browser-test.spec.ts`.
+For each phase (smoke, flow, visual), generate a test script at `.devlyn/browser-test.spec.ts`.
 ### Smoke Test Script Template
@@ -89,7 +89,7 @@ test.describe('Smoke Tests', () => {
       const pageUrl = page.url();
       expect(title, 'Page shows a browser error — server may be down').not.toBe(pageUrl);
-      await page.screenshot({ path: `.claude/screenshots/smoke${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
+      await page.screenshot({ path: `.devlyn/screenshots/smoke${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
       if (errors.length > 0) {
         test.info().annotations.push({ type: 'console_errors', description: errors.join(' | ') });
@@ -123,7 +123,7 @@ test('flow: [criterion description]', async ({ page }) => {
   await expect(page.locator('[verification selector]')).toBeVisible();
   // Screenshot
-  await page.screenshot({ path: '.claude/screenshots/flow-[name].png' });
+  await page.screenshot({ path: '.devlyn/screenshots/flow-[name].png' });
 });
 ```
@@ -135,7 +135,7 @@ test.describe('Visual - Mobile', () => {
   for (const route of ROUTES) {
     test(`visual-mobile: ${route}`, async ({ page }) => {
       await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle' });
-      await page.screenshot({ path: `.claude/screenshots/visual-mobile${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
+      await page.screenshot({ path: `.devlyn/screenshots/visual-mobile${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
     });
   }
 });
@@ -145,7 +145,7 @@ test.describe('Visual - Desktop', () => {
   for (const route of ROUTES) {
     test(`visual-desktop: ${route}`, async ({ page }) => {
       await page.goto(`http://localhost:${PORT}${route}`, { waitUntil: 'networkidle' });
-      await page.screenshot({ path: `.claude/screenshots/visual-desktop${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
+      await page.screenshot({ path: `.devlyn/screenshots/visual-desktop${route.replace(/\//g, '-') || '-root'}.png`, fullPage: true });
     });
   }
 });
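The screenshot templates in the hunks above build file names with `route.replace(/\//g, '-') || '-root'`. To see which names that expression produces, the sketch below evaluates it for a few sample routes. `screenshotPath` is a hypothetical wrapper and the routes are made up for illustration.

```javascript
// Hypothetical wrapper around the path expression used in the templates.
function screenshotPath(prefix, route) {
  return `.devlyn/screenshots/${prefix}${route.replace(/\//g, '-') || '-root'}.png`;
}

for (const route of ['/', '/about', '/settings/profile', '']) {
  console.log(JSON.stringify(route), '->', screenshotPath('smoke', route));
}
```

Because `'/'` becomes the non-empty string `'-'`, the `'-root'` fallback only applies to an empty route string. The root route `/` is saved as `smoke-.png`.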
@@ -154,16 +154,16 @@ test.describe('Visual - Desktop', () => {
 ## Execution
 ```bash
-mkdir -p .claude/screenshots
-npx playwright test .claude/browser-test.spec.ts \
+mkdir -p .devlyn/screenshots
+npx playwright test .devlyn/browser-test.spec.ts \
   --reporter=json \
-  --output=.claude/playwright-results \
-  2>&1 | tee .claude/playwright-output.json
+  --output=.devlyn/playwright-results \
+  2>&1 | tee .devlyn/playwright-output.json
 ```
 ## Parsing Results
-Read `.claude/playwright-output.json`. The JSON structure contains:
+Read `.devlyn/playwright-output.json`. The JSON structure contains:
 - `suites[].specs[].tests[].results[].status` — `"passed"`, `"failed"`, `"timedOut"`
 - `suites[].specs[].tests[].results[].errors` — error messages with stack traces
 - `suites[].specs[].tests[].annotations` — custom annotations (console_errors, network_failures)
@@ -177,12 +177,12 @@ Map these to BROWSER-RESULTS.md findings:
 After parsing results:
 ```bash
-rm -f .claude/browser-test.spec.ts
-rm -rf .claude/playwright-results
-rm -f .claude/playwright-output.json
+rm -f .devlyn/browser-test.spec.ts
+rm -rf .devlyn/playwright-results
+rm -f .devlyn/playwright-output.json
 ```
-Keep `.claude/screenshots/` — those are evidence referenced by the report.
+Keep `.devlyn/screenshots/` — those are evidence referenced by the report.
 ## Limitations vs Tier 1
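The "Parsing Results" hunk above names the fields to read from `.devlyn/playwright-output.json`. A minimal sketch of collecting failed and timed-out results from that shape is shown below. `collectFailures` is a hypothetical helper, the sample report is hand-written rather than real Playwright output, and the recursion into nested `suites` is an assumption about how `test.describe` blocks appear in the report.

```javascript
// Hypothetical helper: walk suites[].specs[].tests[].results[] and keep failures.
function collectFailures(report) {
  const failures = [];
  const visit = (suite) => {
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests ?? []) {
        for (const result of test.results ?? []) {
          if (result.status === 'failed' || result.status === 'timedOut') {
            failures.push({
              title: spec.title,
              status: result.status,
              errors: (result.errors ?? []).map((error) => error.message),
            });
          }
        }
      }
    }
    // Assumption: describe blocks show up as nested suites.
    for (const child of suite.suites ?? []) visit(child);
  };
  for (const suite of report.suites ?? []) visit(suite);
  return failures;
}

// Hand-written sample in the shape described above (not real output).
const sampleReport = {
  suites: [{
    specs: [],
    suites: [{
      specs: [
        { title: 'smoke: /', tests: [{ results: [{ status: 'passed', errors: [] }] }] },
        { title: 'smoke: /login', tests: [{ results: [{ status: 'timedOut', errors: [{ message: 'Test timeout exceeded' }] }] }] },
      ],
    }],
  }],
};

const failures = collectFailures(sampleReport);
console.log(failures);
```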

package/config/skills/devlyn:evaluate/SKILL.md CHANGED

@@ -23,7 +23,7 @@ Before spawning any evaluators, understand what you're evaluating:
    - **"recent changes"** or no argument: Use `git diff HEAD` for unstaged changes, `git status` for new files
    - **Running session / live monitoring**: Take a baseline snapshot with `git status --short | wc -l`, then poll every 30-45 seconds for new changes using `git status` and `find . -newer <reference-file> -type f`. Report findings incrementally as changes appear.
-2. **Check for done criteria**: Read `.claude/done-criteria.md` if it exists. This file contains testable success criteria written by the generator (e.g., `/devlyn:team-resolve` Phase 1.5). When present, it is the primary grading rubric — every criterion in it must be verified. When absent, fall back to the evaluation checklists below.
+2. **Check for done criteria**: Read `.devlyn/done-criteria.md` if it exists. This file contains testable success criteria written by the generator (e.g., `/devlyn:team-resolve` Phase 1.5). When present, it is the primary grading rubric — every criterion in it must be verified. When absent, fall back to the evaluation checklists below.
 3. Build the evaluation baseline:
    - Run `git status --short` to see all changed and new files
@@ -297,9 +297,9 @@ LOW (note):
 4. For each catch block: is the error surfaced to the user or silently swallowed?
 5. Check for React anti-patterns: uncontrolled-to-controlled switches, direct DOM mutation, missing cleanup
 6. Compare against existing components for pattern consistency
-7. **Browser evidence** (when available): Read `.claude/BROWSER-RESULTS.md` if it exists — it contains pre-collected smoke test results, flow test results, console errors, network failures, and screenshots from the `devlyn:browser-validate` skill. Use this as additional evidence in your evaluation. Do not re-run smoke tests that are already covered.
+7. **Browser evidence** (when available): Read `.devlyn/BROWSER-RESULTS.md` if it exists — it contains pre-collected smoke test results, flow test results, console errors, network failures, and screenshots from the `devlyn:browser-validate` skill. Use this as additional evidence in your evaluation. Do not re-run smoke tests that are already covered.
    If the dev server is still running and you need deeper investigation on a specific interaction, use browser tools directly (check if `mcp__claude-in-chrome__*` tools are available, or fall back to Playwright). Focus on verifying specific findings, not duplicating the full smoke/flow suite.
-   If neither `.claude/BROWSER-RESULTS.md` exists nor browser tools are available, note "Live testing skipped — no browser validation available" in your deliverable.
+   If neither `.devlyn/BROWSER-RESULTS.md` exists nor browser tools are available, note "Live testing skipped — no browser validation available" in your deliverable.
 **Your deliverable**: Send a message to the team lead with:
 1. Component quality assessment for each new/changed component
@@ -480,7 +480,7 @@ After receiving all evaluator findings:
 1. Present the evaluation report to the user (format below).
-2. **Write findings to `.claude/EVAL-FINDINGS.md`** for downstream consumption by other agents (e.g., `/devlyn:auto-resolve` orchestrator or a follow-up `/devlyn:team-resolve`). This file enables the feedback loop — the generator can read it and fix the issues without human relay.
+2. **Write findings to `.devlyn/EVAL-FINDINGS.md`** for downstream consumption by other agents (e.g., `/devlyn:auto-resolve` orchestrator or a follow-up `/devlyn:team-resolve`). This file enables the feedback loop — the generator can read it and fix the issues without human relay.
 ```markdown
 # Evaluation Findings
@@ -502,7 +502,7 @@ After receiving all evaluator findings:
 - [pattern description]
 ```
-3. Do NOT delete `.claude/done-criteria.md` or `.claude/EVAL-FINDINGS.md` — downstream consumers (e.g., `/devlyn:auto-resolve` orchestrator or a follow-up `/devlyn:team-resolve`) may need to read them. The orchestrator or user is responsible for cleanup.
+3. Do NOT delete `.devlyn/done-criteria.md` or `.devlyn/EVAL-FINDINGS.md` — downstream consumers (e.g., `/devlyn:auto-resolve` orchestrator or a follow-up `/devlyn:team-resolve`) may need to read them. The orchestrator or user is responsible for cleanup.
 ## Phase 6: CLEANUP

package/config/skills/devlyn:team-resolve/SKILL.md CHANGED

@@ -93,7 +93,7 @@ Teammates: [list of roles being spawned and why each was chosen]
 Before any code is written, define what "done" looks like. This prevents self-evaluation bias and gives external evaluators (like `/devlyn:evaluate`) concrete criteria to grade against.
-1. Based on your Phase 1 investigation, write testable success criteria to `.claude/done-criteria.md`:
+1. Based on your Phase 1 investigation, write testable success criteria to `.devlyn/done-criteria.md`:
 ```markdown
 # Done Criteria: [issue summary]

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "devlyn-cli",
-  "version": "1.3.2",
+  "version": "1.4.0",
   "description": "Claude Code configuration toolkit for teams",
   "bin": {
     "devlyn": "bin/devlyn.js"