devlyn-cli 1.15.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (158)
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +135 -21
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +175 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -429
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
  117. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  118. package/package.json +12 -2
  119. package/scripts/lint-skills.sh +431 -0
  120. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
  121. package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
  122. package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
  125. package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
  126. package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
  127. package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
  128. package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
  129. package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
  130. package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
  131. package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
  132. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  133. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  134. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  135. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  136. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  137. package/config/skills/devlyn:clean/SKILL.md +0 -285
  138. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  139. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  140. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  141. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  142. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  143. package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
  144. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  145. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  146. package/config/skills/devlyn:preflight/SKILL.md +0 -355
  147. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  148. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
  149. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  150. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  151. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  152. package/config/skills/devlyn:review/SKILL.md +0 -161
  153. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  154. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  155. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  156. package/config/skills/workflow-routing/SKILL.md +0 -73
  157. /package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
  158. /package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
@@ -1,355 +0,0 @@
1
- ---
2
- name: devlyn:preflight
3
- description: >
4
- Final alignment check between vision/roadmap documents and the actual codebase before declaring
5
- a roadmap phase complete. Reads commitments from VISION.md, ROADMAP.md, and item specs, then
6
- audits the implementation with file:line evidence. Catches missing/incomplete features, spec
7
- divergence, bugs, and doc drift; validates browser behavior for web projects. Use when
8
- implementation is finished and you want a holistic roadmap-vs-code verification. Triggers on
9
- "preflight", "gap analysis", "did I miss anything", "check against the roadmap", "verify
10
- implementation", "are we done". Differs from /devlyn:evaluate (single changeset) and
11
- /devlyn:review (code quality) — preflight audits the entire project against planning docs.
12
- ---
13
-
14
- # Vision-to-Implementation Preflight Check
15
-
16
- The final gate before you declare "done." Read every promise the planning documents made, then verify each one against the actual codebase — evidence-based, no guessing.
17
-
18
- <preflight_config>
19
- $ARGUMENTS
20
- </preflight_config>
21
-
22
- <why_this_matters>
23
- After implementing a full roadmap, gaps are almost inevitable. Features get partially implemented, edge cases from specs get skipped, implementations drift from the original design, and docs fall out of sync. These gaps compound — a missing integration here, a forgotten error state there — until the shipped product doesn't match the vision.
24
-
25
- This skill catches those gaps systematically, before users do. The difference between "we built everything on the list" and "we actually delivered what we promised."
26
- </why_this_matters>
27
-
28
- <evidence_standard>
29
- Every finding must cite evidence: file:line for code, specific doc section for documentation, screenshot for browser issues. A finding without evidence is speculation — exclude it.
30
-
31
- The corollary: if you search thoroughly and can't find evidence that something exists, that IS evidence it's missing. "Searched for X across src/ and found no implementation" is a valid, evidence-based finding.
32
-
33
- This matters because the report feeds into auto-resolve. Vague findings produce vague fixes.
34
- </evidence_standard>
35
-
36
- ## Flags
37
-
38
- Parse from `<preflight_config>`:
39
- - `--phase N` — audit only phase N items (default: all phases)
40
- - `--autofix` — auto-promote CRITICAL and HIGH findings to roadmap items and run auto-resolve on each
41
- - `--skip-browser` — skip browser validation
42
- - `--skip-docs` — skip documentation audit
43
- - `--engine MODE` (auto) — controls which model handles audit phases. Modes:
44
- - `auto` (default): code-auditor uses Codex (SWE-bench Pro +11.7pp for code analysis), docs-auditor uses Claude (writing quality), browser-auditor uses Claude (Chrome MCP). Requires Codex MCP server.
45
- - `codex`: code-auditor uses Codex, docs-auditor and browser-auditor use Claude.
46
- - `claude`: all auditors use Claude subagents. No Codex calls.
47
-
48
- Example: `/devlyn:preflight --phase 2 --skip-browser`
49
- Example with engine: `/devlyn:preflight --engine auto`
50
-
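The flag list above can be sketched as a small parser. This is a minimal illustration, assuming the contents of `<preflight_config>` arrive as one argument string; the `parse_preflight_flags` helper and the use of `argparse` are assumptions for illustration, not something the removed skill file defines.

```python
import argparse
import shlex


def parse_preflight_flags(raw: str) -> argparse.Namespace:
    """Parse the flag string from <preflight_config> (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="/devlyn:preflight", add_help=False)
    # None means "audit all phases", matching the documented default.
    parser.add_argument("--phase", type=int, default=None)
    parser.add_argument("--autofix", action="store_true")
    parser.add_argument("--skip-browser", action="store_true")
    parser.add_argument("--skip-docs", action="store_true")
    # The documented default engine is auto, not claude.
    parser.add_argument("--engine", choices=["auto", "codex", "claude"], default="auto")
    return parser.parse_args(shlex.split(raw))


flags = parse_preflight_flags("--phase 2 --skip-browser")
print(flags.phase, flags.skip_browser, flags.engine)  # 2 True auto
```

Restricting `--engine` to the three documented modes makes an unknown mode fail at parse time instead of silently picking a route.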
51
- **Engine pre-flight** (runs unless `--engine claude` was explicitly passed):
52
- - The default engine is `auto`. If the user did not pass `--engine`, the engine is `auto` — NOT `claude`.
53
- - Call `mcp__codex-cli__ping` to verify Codex MCP availability. On failure, **silently fall back to `--engine claude`** and note `engine downgraded: codex-ping failed` in the final preflight report header. Do not abort. Matches the hands-off contract used by auto-resolve and ideate.
54
-
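A rough sketch of the engine pre-flight rule described above: an absent flag means `auto`, an explicit `claude` skips the ping, and a failed ping downgrades to `claude` with a note for the report header. The `resolve_engine` function and its `ping` callable are hypothetical stand-ins; the real check is the `mcp__codex-cli__ping` tool call, which is not modeled here.

```python
from typing import Callable, Optional, Tuple

DOWNGRADE_NOTE = "engine downgraded: codex-ping failed"


def resolve_engine(
    requested: Optional[str], ping: Callable[[], bool]
) -> Tuple[str, Optional[str]]:
    """Return (engine, report_note). `ping` stands in for the Codex MCP ping."""
    engine = requested or "auto"  # no --engine flag means auto, not claude
    if engine == "claude":
        return engine, None  # explicit claude: pre-flight does not run
    try:
        available = ping()
    except Exception:
        available = False  # any ping error counts as unavailable
    if available:
        return engine, None
    # Fall back without aborting and leave a note for the report header.
    return "claude", DOWNGRADE_NOTE


print(resolve_engine(None, lambda: False))
```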
55
- ## PHASE 0: DISCOVER & SCOPE
56
-
57
- <use_parallel_tool_calls>
58
- Phase 0 and Phase 1 do many independent reads (planning docs, item specs, prior state). When tool calls have no dependencies between them, issue them in parallel in a single response — that includes globbing for spec files and reading several specs at once. Only chain calls that depend on values from a previous call.
59
- </use_parallel_tool_calls>
60
-
61
- 1. **Find planning documents** — search in parallel:
62
- - `docs/VISION.md`
63
- - `docs/ROADMAP.md`
64
- - `docs/roadmap/` directory (item specs)
65
- - If none found, stop clearly: "No vision/roadmap documents found. Run `/devlyn:ideate` first to create them."
66
-
67
- 2. **Determine scope**:
68
- - If `--phase N` specified → only read specs in `docs/roadmap/phase-N/`
69
- - Otherwise → read all phases
70
- - Read `docs/roadmap/backlog/` to identify deferred items (excluded from audit)
71
-
72
- 3. **Check for prior state**:
73
- - If `.devlyn/PREFLIGHT-REPORT.md` exists from a previous run → note it for delta comparison in PHASE 4
74
- - If `.devlyn/preflight-accepted.md` exists → load accepted divergences to filter in PHASE 4
75
-
76
- 4. **Announce**:
77
- ```
78
- Preflight check starting
79
- Scope: [Phase N / All phases]
80
- Documents: VISION.md, ROADMAP.md, [N] item specs
81
- Deferred items (excluded): [N]
82
- Previous run: [found — will show delta / none]
83
- Phases: 1 Extract → 2 Audit (code + docs + browser) → 3 Report → 4 Triage
84
- ```
85
-
86
- ## PHASE 1: EXTRACT COMMITMENTS
87
-
88
- Read all in-scope planning documents and build a **commitment registry** — every concrete promise the documents make. This registry is the grading rubric for all auditors.
89
-
90
- 1. **Read in parallel**: VISION.md, ROADMAP.md, all in-scope item specs, phase `_overview.md` files
91
-
92
- 2. **Extract from each item spec**:
93
- - Requirements section → each bullet becomes a `FEATURE` or `BEHAVIOR` commitment
94
- - Constraints section → each becomes a `CONSTRAINT` commitment
95
- - Dependencies section → each becomes an `INTEGRATION` commitment
96
- - Explicit test requirements → `TEST` commitments
97
-
98
- 3. **Extract from VISION.md**: high-level success criteria — checked at a broader level ("the system supports X" rather than "file Y has function Z")
99
-
100
- 4. **Filter out** (excluded from audit entirely):
101
- - Items in `backlog/` or `deferred.md`
102
- - Items with `status: cut` in ROADMAP.md
103
-
104
- 5. **Anti-commitments ARE audited** (Out of Scope entries in each spec). These are "must NOT build" claims — if the codebase has shipped something the spec explicitly excluded, that is a WORKAROUND / scope-creep finding, not a success. The code-auditor checks each anti-commitment: "is this excluded behavior present in the code?" If yes → emit a finding with `rule_id: "scope.anti-commitment-violation"` (severity HIGH).
105
-
106
- 6. **Separate planned items**: Items with `status: planned` in their spec frontmatter or "Planned" in ROADMAP.md are not expected to be implemented yet. Include them in a `[PLANNED]` section of the registry for visibility, but do **not** audit them as missing. Flagging planned items as MISSING creates noise and buries the real gaps in work that was supposed to be done.
107
-
108
- 7. **Write to `.devlyn/commitment-registry.md`**:
109
-
110
- ```markdown
111
- # Commitment Registry
112
- Generated: [timestamp]
113
- Scope: [phase N / all]
114
- Total commitments: [N]
115
-
116
- ## Phase 1: [name]
117
- ### 1.1 [item title] (spec status: [done/in-progress/planned])
118
- - [FEATURE] User can sign up with email and password
119
- - [BEHAVIOR] Failed login returns 401 with specific error message
120
- - [CONSTRAINT] Passwords hashed with bcrypt, min 8 characters
121
- - [INTEGRATION] Auth middleware applied to all /api/* routes
122
- - [TEST] Auth flow covered by E2E tests
123
-
124
- ## Anti-Commitments (Out of Scope — audited as "must NOT exist in code")
125
- - [item 1.1] Must NOT include social login
126
- - [item 1.2] Must NOT include real-time inventory sync
127
-
128
- ## Not Started (Planned — not audited for presence, but anti-commitments inside them still apply)
129
- ### 2.1 [item title] (spec status: planned)
130
- - [FEATURE] WebSocket connection on page load
131
- - [FEATURE] Real-time task list updates
132
- [Planned items are tracked for visibility; code-auditor does not flag as MISSING.]
133
- ```
134
-
135
- ## PHASE 2: AUDIT
136
-
137
- Spawn all applicable auditors in parallel. Each reads `.devlyn/commitment-registry.md` and investigates from their perspective.
138
-
139
- ### code-auditor (always)
140
-
141
- Engine routes per the auto-resolve skill's `references/engine-routing.md` ("Pipeline Phase Routing (preflight)" → CODE AUDIT row): Codex on `--engine auto`/`codex`, Claude on `--engine claude`. When the route is **Codex**, call `mcp__codex-cli__codex` with the auditor prompt inline (Codex cannot read `.devlyn/commitment-registry.md` directly under `read-only` sandbox, so paste the registry into the prompt). When the route is **Claude**, spawn a subagent with `mode: "bypassPermissions"`. Read the auditor prompt from `references/auditors/code-auditor.md` either way.
142
-
143
- The code-auditor classifies each commitment as IMPLEMENTED, MISSING, INCOMPLETE, DIVERGENT, or BROKEN — with file:line evidence. Also catches cross-feature integration gaps and constraint violations. Writes to `.devlyn/audit-code.md`.
144
-
145
- ### docs-auditor (unless --skip-docs)
146
-
147
- Always Claude (writing-quality strength) regardless of `--engine`. Spawn a subagent with `mode: "bypassPermissions"`. Read the full prompt from `references/auditors/docs-auditor.md` and pass it to the subagent.
148
-
149
- Checks: ROADMAP.md status accuracy, README alignment, API doc coverage, VISION.md currency, item spec status. Writes to `.devlyn/audit-docs.md`.
150
-
151
- ### browser-auditor (conditional)
152
-
153
- Always Claude (Chrome MCP tools are session-bound) regardless of `--engine`.
154
-
155
- **Skip conditions** (check in order):
156
- 1. `--skip-browser` flag → skip
157
- 2. No web-relevant files in project (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.html`, `page.*`, `layout.*`) → skip with note "Browser validation skipped — no web files detected"
158
- 3. Otherwise → spawn
159
-
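The skip conditions above amount to a flag check followed by a file scan. The sketch below is one way to express that, assuming a plain recursive glob from the project root; `browser_skip_reason` and its return values are illustrative names, and a real scan would probably also exclude directories such as `node_modules`.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Patterns copied from the skip condition in the skill text.
WEB_PATTERNS = ("*.tsx", "*.jsx", "*.vue", "*.svelte", "*.html", "page.*", "layout.*")


def browser_skip_reason(root: Path, skip_browser: bool) -> Optional[str]:
    """Return why the browser-auditor is skipped, or None if it should spawn."""
    if skip_browser:
        return "flag"  # condition 1: --skip-browser was passed
    for pattern in WEB_PATTERNS:
        # rglob is lazy, so next() stops at the first matching file.
        if next(root.rglob(pattern), None) is not None:
            return None  # condition 3: web files exist, spawn the auditor
    return "no-web-files"  # condition 2: nothing web-relevant found


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    print(browser_skip_reason(project, skip_browser=False))  # no-web-files
    (project / "web").mkdir()
    (project / "web" / "index.html").write_text("<html></html>")
    print(browser_skip_reason(project, skip_browser=False))  # None
```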
160
- Spawn a subagent with `mode: "bypassPermissions"`. Read the full prompt from `references/auditors/browser-auditor.md` and pass it to the subagent.
161
-
162
- Tests user-facing features in the browser against commitment registry. Writes to `.devlyn/audit-browser.md`.
163
-
164
- **After all auditors complete**: Read each audit file and proceed to PHASE 3.
165
-
166
- ## PHASE 3: SYNTHESIZE & REPORT
167
-
168
- Auditors already emit each finding with its category (`MISSING`/`INCOMPLETE`/`DIVERGENT`/`BROKEN`/`UNDOCUMENTED`/`STALE_DOC`/`scope.anti-commitment-violation`) and severity (`CRITICAL`/`HIGH`/`MEDIUM`/`LOW`). Synthesis passes them through — do NOT re-classify or re-severity-label. That would replace domain judgment with orchestrator mechanics.
169
-
170
- 1. **Read all audit files** in parallel:
171
- - `.devlyn/audit-code.md`
172
- - `.devlyn/audit-docs.md` (if exists)
173
- - `.devlyn/audit-browser.md` (if exists)
174
-
175
- 2. **Deduplicate**: if multiple auditors flagged the same issue (same category + file:line), merge into one finding at the highest severity the reporting auditor assigned. Trust the auditor's severity — do not override.
176
-
177
- 3. **Filter accepted divergences**: if `.devlyn/preflight-accepted.md` exists, remove findings whose (category, commitment) matches an accepted entry.
178
-
179
- 4. **Compare with previous run** (if `.devlyn/PREFLIGHT-REPORT.md` existed):
180
- - `RESOLVED`: finding from previous run no longer present
181
- - `PERSISTS`: finding still present
182
- - `NEW`: finding not in previous run
183
-
184
- 5. **Generate `.devlyn/PREFLIGHT-REPORT.md`**:
185
-
186
- ```markdown
187
- # Preflight Report
188
- Generated: [timestamp]
189
- Scope: [phase N / all]
190
- Previous run: [timestamp / none]
191
-
192
- ## Summary
193
- | Category | Count |
194
- |----------|-------|
195
- | MISSING | [N] |
196
- | INCOMPLETE | [N] |
197
- | DIVERGENT | [N] |
198
- | BROKEN | [N] |
199
- | SCOPE_VIOLATION | [N] |
200
- | UNDOCUMENTED | [N] |
201
- | STALE_DOC | [N] |
202
- | **Total findings** | **[N]** |
203
-
204
- ## Delta (vs previous run)
205
- - Resolved: [N]
206
- - Persists: [N]
207
- - New: [N]
208
-
209
- ## Commitment Coverage
210
- - Active commitments (done/in-progress specs): [N]
211
- - Verified (IMPLEMENTED): [N] ([%])
212
- - Issues found: [N] ([%])
213
- - Planned items (excluded from audit): [N] across [M] specs
214
-
215
- ## Findings
216
-
217
- ### CRITICAL
218
- - **[MISSING]** `1.2` — Order cancellation flow
219
- - **Commitment**: "User can cancel pending orders within 24 hours"
220
- - **Evidence**: No cancellation endpoint in `src/api/orders/`. No cancel button in `src/components/OrderDetail.tsx`.
221
- - **Impact**: Core user workflow completely absent.
222
-
223
- ### HIGH
224
- - **[INCOMPLETE]** `1.1` — Error handling on signup
225
- - **Commitment**: "Failed signup shows specific validation errors"
226
- - **Evidence**: `src/api/auth/signup.ts:34` returns generic 500. No field-level validation.
227
- - **Impact**: Users see "Something went wrong" instead of actionable feedback.
228
-
229
- ### MEDIUM
230
- ...
231
-
232
- ### LOW
233
- ...
234
-
235
- ## Documentation Findings
236
- - [STALE_DOC] ROADMAP.md: Item 1.3 status "In Progress" → should be "Done"
237
- - [UNDOCUMENTED] WebSocket real-time updates not mentioned in README
238
-
239
- ## What's Verified
240
- [Explicitly list areas that passed — balanced feedback prevents over-correction]
241
- - Auth flow: all 5 commitments verified (signup, login, logout, password reset, session management)
242
- - Database schema: matches all spec constraints
243
-
244
- ## Not Started (Expected — Planned Items)
245
- [List planned items here for visibility, not as findings]
246
- - 2.1 Real-time Updates — status: planned, 5 commitments
247
- - 2.2 Team Management — status: planned, 6 commitments
248
- These items are acknowledged future work per the roadmap. They will be audited when their status changes to in-progress or done.
249
-
250
- ## Accepted Divergences (from previous runs)
251
- - [list any, or "None"]
252
- ```
253
-
254
- 6. **Present the report** to the user with a summary.
255
-
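Steps 2 through 4 of the synthesis phase (deduplicate, filter accepted divergences, compare with the previous run) can be sketched as three small functions. This is a simplified illustration under stated assumptions: findings are modeled as dictionaries with `category`, `commitment`, `location`, and `severity` keys, accepted divergences as a set of `(category, commitment)` pairs, and the `src/api/orders/list.ts:10` location is made up for the example. The skill text specifies none of these data shapes.

```python
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def dedupe(findings):
    """Merge findings sharing (category, file:line), keeping the highest severity."""
    merged = {}
    for f in findings:
        key = (f["category"], f["location"])
        kept = merged.get(key)
        if kept is None or SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[kept["severity"]]:
            merged[key] = f
    return list(merged.values())


def drop_accepted(findings, accepted):
    """Remove findings whose (category, commitment) pair was accepted earlier."""
    return [f for f in findings if (f["category"], f["commitment"]) not in accepted]


def delta(previous_keys, current_keys):
    """Label finding keys relative to the previous run."""
    return {
        "RESOLVED": sorted(previous_keys - current_keys),
        "PERSISTS": sorted(previous_keys & current_keys),
        "NEW": sorted(current_keys - previous_keys),
    }


findings = [
    {"category": "INCOMPLETE", "commitment": "1.1",
     "location": "src/api/auth/signup.ts:34", "severity": "MEDIUM"},
    {"category": "INCOMPLETE", "commitment": "1.1",
     "location": "src/api/auth/signup.ts:34", "severity": "HIGH"},
    {"category": "DIVERGENT", "commitment": "1.3",
     "location": "src/api/orders/list.ts:10", "severity": "LOW"},  # hypothetical path
]
current = drop_accepted(dedupe(findings), accepted={("DIVERGENT", "1.3")})
current_keys = {(f["category"], f["commitment"]) for f in current}
report_delta = delta({("MISSING", "1.2"), ("INCOMPLETE", "1.1")}, current_keys)
print(report_delta)
```

Note that severity is only compared between duplicates, never reassigned, which keeps the sketch in line with the instruction to trust the auditor's label.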
256
- ## PHASE 4: TRIAGE & PROMOTE
257
-
258
- How this phase runs depends on the `--autofix` flag:
259
-
260
- ### Without --autofix (default — interactive)
261
-
262
- Present findings and guide the user through triage:
263
-
264
- ```
265
- Preflight found [N] findings across [categories].
266
-
267
- For each finding, you can:
268
- 1. **Promote** → creates a roadmap item spec, adds to ROADMAP.md
269
- 2. **Accept** → marks as intentional divergence (won't flag on future runs)
270
- 3. **Skip** → leave for later
271
-
272
- Which findings would you like to promote to the roadmap?
273
- ```
274
-
275
- **When the user confirms findings to promote:**
276
-
277
- 1. **Generate item specs** for each confirmed finding, following the ideate template format:
278
- ```markdown
279
- ---
280
- id: "[phase].[next-number]"
281
- title: "[Fix/Add: description]"
282
- phase: [N]
283
- status: planned
284
- priority: [derived from finding severity]
285
- complexity: [estimated from finding scope]
286
- depends-on: []
287
- ---
288
-
289
- # [id] [Title]
290
-
291
- ## Context
292
- Preflight check identified this gap against the original roadmap specification.
293
- [Brief context from the original commitment and what's wrong]
294
-
295
- ## Objective
296
- [What needs to be true after this is fixed]
297
-
298
- ## Requirements
299
- - [ ] [Specific fix requirement derived from the finding]
300
- - [ ] [Verification step]
301
-
302
- ## Constraints
303
- - Must align with original spec at docs/roadmap/phase-N/[original-item].md
304
-
305
- ## Out of Scope
306
- - Changes beyond what the original spec requires
307
- ```
308
-
309
- 2. **Place specs** in the appropriate roadmap phase directory (same phase as the original item, or a new "fixes" phase if multiple phases are affected)
310
-
311
- 3. **Update ROADMAP.md** with new rows for promoted findings
312
-
313
- 4. **Record accepted divergences** in `.devlyn/preflight-accepted.md`:
314
- ```markdown
315
- # Accepted Divergences
316
- # Findings marked as intentional — excluded from future preflight runs
317
-
318
- - [item-id] [commitment]: [reason accepted]
319
- ```
320
-
321
- 5. **STALE_DOC findings**: Fix these directly — update ROADMAP.md statuses, item spec frontmatter, and VISION.md "What's Next" sections. These are factual corrections, not implementation decisions.
322
-
323
- 6. **Suggest next steps**:
324
- ```
325
- Triage complete.
326
- - [N] findings promoted to roadmap ([list item IDs])
327
- - [N] divergences accepted
328
- - [N] doc issues fixed directly
329
-
330
- Next steps:
331
- - To implement fixes: /devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/[id]-[name].md"
332
- - For CRITICAL severity or complex DIVERGENT findings, the default `--engine auto` already routes BUILD/FIX to Codex and EVALUATE/CHALLENGE to Claude (cross-model GAN dynamic). No extra flag needed.
333
- - To re-run preflight after fixes: /devlyn:preflight [same flags]
334
- - To add new features discovered during audit: /devlyn:ideate expand
335
- ```
336
-
337
- ### With --autofix
338
-
339
- 1. Auto-promote all CRITICAL and HIGH findings to roadmap items (steps 1-3 above)
340
- 2. Fix all STALE_DOC findings directly
341
- 3. MEDIUM and LOW findings are reported but not auto-promoted (include in report with note "manually promote if needed")
342
- 4. For each promoted item, spawn `/devlyn:auto-resolve` sequentially:
343
- ```
344
- /devlyn:auto-resolve "Implement per spec at docs/roadmap/phase-N/[id]-[name].md"
345
- ```
346
- 5. After all auto-resolve runs complete, re-run preflight (without --autofix) as a verification pass
347
- 6. Present final delta report showing what was resolved
348
-
349
- <autofix_safety>
350
- Auto-promoting only CRITICAL and HIGH findings prevents noise — MEDIUM/LOW findings often benefit from human judgment on whether they're worth fixing or should be accepted as intentional divergence. The user can always manually promote remaining findings after reviewing the report.
351
- </autofix_safety>
352
-
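The `--autofix` routing above can be summarized as a three-way partition of findings. The sketch below assumes the same dictionary shape for findings as elsewhere (a `category` and a `severity` key) and checks `STALE_DOC` before severity; that ordering is an assumption, since the text does not say how a high-severity doc finding would be routed. `plan_autofix` is a hypothetical name.

```python
def plan_autofix(findings):
    """Split findings into (promote, fix_docs, report_only) for --autofix."""
    promote, fix_docs, report_only = [], [], []
    for f in findings:
        if f["category"] == "STALE_DOC":
            fix_docs.append(f)       # step 2: factual doc fixes applied directly
        elif f["severity"] in ("CRITICAL", "HIGH"):
            promote.append(f)        # step 1: becomes a roadmap item
        else:
            report_only.append(f)    # step 3: MEDIUM/LOW left for manual triage
    return promote, fix_docs, report_only


sample = [
    {"category": "MISSING", "severity": "CRITICAL"},
    {"category": "INCOMPLETE", "severity": "MEDIUM"},
    {"category": "STALE_DOC", "severity": "LOW"},
]
promote, fix_docs, report_only = plan_autofix(sample)
print(len(promote), len(fix_docs), len(report_only))  # 1 1 1
```

Each item in `promote` would then be handed to auto-resolve one at a time, followed by a verification run without `--autofix`.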
353
- ## Language
354
-
355
- Generate all documents and reports in the language the user communicates in. Keep technical terms (file paths, code references, category names like MISSING/DIVERGENT) in English for consistency with the rest of the devlyn toolchain.
@@ -1,32 +0,0 @@
1
- # Browser Auditor Prompt
2
-
3
- Use this as the subagent prompt when spawning the browser-auditor in PHASE 2.
4
-
5
- **Skip conditions** (check in order before spawning):
6
- 1. `--skip-browser` flag → skip
7
- 2. No web-relevant files in project (no `*.tsx`, `*.jsx`, `*.vue`, `*.svelte`, `*.html`, `page.*`, `layout.*`) → skip with note "Browser validation skipped — no web files detected"
8
- 3. Otherwise → spawn
9
-
10
- ---
11
-
12
- You are performing browser-based verification of a web application against its planning commitments.
13
-
14
- Read `.devlyn/commitment-registry.md` for the user-facing features that should be working.
15
-
16
- **Your workflow:**
17
- 1. Read `.claude/skills/devlyn:browser-validate/SKILL.md` for the browser testing methodology and tier system
18
- 2. Start the dev server
19
- 3. For each user-facing FEATURE and BEHAVIOR commitment:
20
- - Navigate to the relevant page
21
- - Perform the user action described in the commitment
22
- - Verify the expected outcome
23
- - Take screenshots as evidence
24
- 4. Pay special attention to:
25
- - Error states: trigger errors and verify error UI appears
26
- - Empty states: verify empty state UI for lists/collections
27
- - Loading states: verify loading indicators during async operations
28
- - Edge cases explicitly mentioned in specs
29
-
30
- Write findings to `.devlyn/audit-browser.md` with screenshot paths as evidence.
31
-
32
- If browser tools are unavailable, fall back to HTTP smoke testing (curl endpoints, verify response codes and shapes). Note the reduced coverage in your findings.
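Reviewer note: the skip conditions at the top of this removed prompt are checked in order, which a short sketch makes concrete. The function name, the return values, and matching patterns against file base names are assumptions for illustration. The glob patterns are the ones listed in the removed text.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the removed prompt's "web-relevant files" list.
WEB_PATTERNS = ("*.tsx", "*.jsx", "*.vue", "*.svelte", "*.html", "page.*", "layout.*")


def browser_audit_decision(skip_browser, file_paths):
    """Return (action, reason) following the ordered skip conditions."""
    # Condition 1: the explicit flag always wins.
    if skip_browser:
        return ("skip", "flag")
    # Condition 2: no web-relevant file anywhere in the project.
    names = [PurePosixPath(path).name for path in file_paths]
    has_web_file = any(
        fnmatchcase(name, pattern) for name in names for pattern in WEB_PATTERNS
    )
    if not has_web_file:
        return ("skip", "no-web-files")
    # Condition 3: otherwise spawn the browser-auditor.
    return ("spawn", None)
```

Matching is case-sensitive here (`fnmatchcase`), which is a design choice of the sketch rather than something the removed prompt specifies.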
@@ -1,86 +0,0 @@
1
- # Code Auditor Prompt
2
-
3
- Use this as the subagent prompt when spawning the code-auditor in PHASE 2.
4
-
5
- ---
6
-
7
- You are auditing a codebase against its planning commitments. Your job is to verify that every commitment was actually implemented — and implemented correctly.
8
-
9
- Read `.devlyn/commitment-registry.md` for the full list of commitments to verify. Skip any items in the "Not Started (Planned)" section — those are acknowledged future work, not gaps.
10
-
11
- **Step 0 — Build health check**: Before auditing individual commitments, verify the project actually builds. Run the build gate exactly as defined in `config/skills/devlyn:auto-resolve/references/build-gate.md` (detection matrix, commands, package manager rules, monorepo handling, Docker). That file is the SINGLE source of truth for build commands across devlyn-cli; preflight does not maintain a second matrix.
12
-
13
- Any build/typecheck failure is a BROKEN finding at CRITICAL severity — code that doesn't compile cannot fulfill any commitment. Include the full compiler error output with file:line references. This catches type errors, missing imports, cross-package drift, and Dockerfile build failures that text-based code reading alone cannot detect.
14
-
15
- **For each active commitment (not planned):**
16
- 1. Search the codebase for its implementation (use Grep, Glob, Read in parallel where possible)
17
- 2. Read the implementing code thoroughly — line by line for critical paths
18
- 3. Classify the commitment:
19
-
20
- | Classification | Meaning | Evidence required |
21
- |---|---|---|
22
- | IMPLEMENTED | Code exists and fulfills the commitment | file:line showing the implementation |
23
- | MISSING | No implementation found after thorough search | What you searched for and where |
24
- | INCOMPLETE | Implementation started but doesn't fully satisfy | What's there + what's missing, both with file:line |
25
- | DIVERGENT | Implementation does something different than specified | Spec requirement vs actual behavior, with file:line |
26
- | BROKEN | Implementation exists but has a bug preventing it from working | The bug with file:line |
27
- | SCOPE_VIOLATION | Code ships behavior an anti-commitment (`Out of Scope`) explicitly excluded | file:line showing the prohibited behavior |
28
-
29
- **Anti-commitment audit** (new in v3.4): the registry's `## Anti-Commitments` section lists features the spec promised NOT to build. Check each one against the code:
30
- - If the excluded behavior is present, emit a finding with `rule_id: "scope.anti-commitment-violation"` and severity `HIGH` (or `CRITICAL` if it also violates a constraint). This catches scope-creep and workaround shipping that raw commitment checks would miss.
31
- - If the excluded behavior is absent, no finding — anti-commitments are satisfied by absence.
32
-
33
- **Beyond the commitment checklist**, also investigate:
34
- - Cross-feature integration gaps: features that should connect but don't
35
- - Error handling specified in specs but not implemented in code
36
- - Constraints specified but violated (e.g., spec says "use bcrypt" but code uses plaintext)
37
- - Edge cases explicitly mentioned in specs but unhandled
38
-
39
- <code_auditor_calibration>
40
- Calibrate your judgment with these examples:
41
-
42
- **This IS a finding (INCOMPLETE)**:
43
- Spec says "failed API calls display an error banner with retry button."
44
- Code at `src/components/Dashboard.tsx:42` has `catch (e) { console.error(e) }` — error is logged but no UI feedback. The user sees a blank screen on failure.
45
- Why: logging is not user-facing error handling. The commitment specifies visible feedback.
46
-
47
- **This IS a finding (DIVERGENT)**:
48
- Spec says "alert admin via push notification when stock below threshold."
49
- Code at `src/inventory/alerts.ts:28` sends an email instead.
50
- Why: the channel matters — push notification has different urgency characteristics than email.
51
-
52
- **This is NOT a finding**:
53
- Spec says "store user preferences." Code stores them in localStorage instead of the database.
54
- Why: unless the spec explicitly requires server-side persistence, the implementation choice is reasonable. The commitment is fulfilled.
55
-
56
- **General rule**: focus on whether the user-facing OUTCOME matches the commitment, not on internal implementation details. But when the spec explicitly constrains HOW something should work, verify that too.
57
- </code_auditor_calibration>
58
-
59
- Write findings to `.devlyn/audit-code.md`:
60
-
61
- ```markdown
62
- # Code Audit Findings
63
-
64
- ## Summary
65
- - Commitments checked: [N]
66
- - IMPLEMENTED: [N]
67
- - MISSING: [N]
68
- - INCOMPLETE: [N]
69
- - DIVERGENT: [N]
70
- - BROKEN: [N]
71
-
72
- ## Findings
73
-
74
- ### [MISSING] 1.1 — Email validation on signup
75
- **Commitment**: "Email format validated on signup"
76
- **Evidence**: Searched `src/auth/`, `src/validators/`, `src/api/auth*`. No validation found. `src/api/auth/signup.ts:15` accepts email parameter without any format check.
77
- **Severity**: HIGH
78
- **Impact**: Invalid emails enter the database, breaking password reset flow.
79
-
80
- ### [DIVERGENT] 1.3 — Inventory threshold alerts
81
- **Commitment**: "Alert admin via push notification when stock below threshold"
82
- **Spec says**: Push notification
83
- **Code does**: Email only (`src/inventory/alerts.ts:28`)
84
- **Severity**: MEDIUM
85
- **Impact**: Alerts work but through a lower-urgency channel than specified.
86
- ```
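Reviewer note: the Summary counts in the removed `.devlyn/audit-code.md` template could be derived from the finding headings rather than maintained by hand. The sketch below assumes headings follow the `### [CLASSIFICATION] ...` form shown in the template. The `tally` function is hypothetical, and including SCOPE_VIOLATION is an assumption based on the classification table, because the template's Summary does not list it.

```python
import re

# Classifications from the removed prompt's table.
CLASSIFICATIONS = (
    "IMPLEMENTED",
    "MISSING",
    "INCOMPLETE",
    "DIVERGENT",
    "BROKEN",
    "SCOPE_VIOLATION",
)

# Matches a level-3 heading that starts with a bracketed classification.
HEADING = re.compile(r"^### \[([A-Z_]+)\] ")


def tally(markdown):
    """Count finding headings per classification in an audit report."""
    counts = {name: 0 for name in CLASSIFICATIONS}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and match.group(1) in counts:
            counts[match.group(1)] += 1
    return counts
```

Headings with an unrecognized tag are ignored rather than counted, so a typo in a classification shows up as a total that is lower than the number of findings.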
@@ -1,38 +0,0 @@
1
- # Docs Auditor Prompt
2
-
3
- Use this as the subagent prompt when spawning the docs-auditor in PHASE 2.
4
-
5
- ---
6
-
7
- You are auditing documentation alignment for a project. Your job is to find mismatches between what the docs say and what the code actually does.
8
-
9
- Read `.devlyn/commitment-registry.md` for context on what was planned.
10
-
11
- **Check these dimensions:**
12
-
13
- 1. **ROADMAP.md status accuracy**: For each item marked "Done" in ROADMAP.md, verify the implementation exists. For items marked "In Progress", check if they're actually complete or still in progress. Status mismatches are common and misleading.
14
-
15
- 2. **README alignment**: Compare features listed in README.md against actual implementation. Find features claimed but not built (misleading) and features built but not mentioned (undocumented).
16
-
17
- 3. **API documentation**: If API docs exist (`docs/api*`, swagger, openapi), compare documented endpoints against actual route files. Find undocumented endpoints and documented-but-missing endpoints.
18
-
19
- 4. **VISION.md currency**: Check if "What's Next" or future sections reference work that's already done, or if success criteria have been met without acknowledgment.
20
-
21
- 5. **Item spec status accuracy**: For each item spec, verify the frontmatter `status` field matches reality. An item marked `planned` that's fully implemented should be updated to `done`.
22
-
23
- Write findings to `.devlyn/audit-docs.md`:
24
-
25
- ```markdown
26
- # Documentation Audit Findings
27
-
28
- ## ROADMAP.md Status Accuracy
29
- - [STALE_DOC] Item 1.3 marked "In Progress" — implementation is complete (evidence: src/inventory/ fully implemented)
30
- - [STALE_DOC] Item 2.1 marked "Done" — only partially implemented (missing: webhook handler)
31
-
32
- ## README Alignment
33
- - [UNDOCUMENTED] Real-time notifications exist in code but README doesn't mention them
34
- - [STALE_DOC] README claims "SSO support" — no SSO implementation found
35
-
36
- ## Item Spec Status
37
- - [STALE_DOC] docs/roadmap/phase-1/1.2-order-mgmt.md: status says "planned", should be "done"
38
- ```
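Reviewer note: dimension 5 of this removed prompt (item spec status accuracy) compares the frontmatter `status` field with what the code shows. A minimal sketch of that comparison follows. It assumes simple `key: value` frontmatter delimited by `---` lines and does not use a YAML parser. Both function names are hypothetical, and the `observed` status is taken as an input because judging completion is the auditor's job.

```python
def frontmatter_status(text):
    """Return the frontmatter `status` value, or None if it is absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip().strip("\"'")
    return None


def status_finding(path, text, observed):
    """Build a STALE_DOC line when declared and observed status differ."""
    declared = frontmatter_status(text)
    if declared is None or declared == observed:
        return None
    return f'[STALE_DOC] {path}: status says "{declared}", should be "{observed}"'
```

The output string mirrors the example line in the removed template's "Item Spec Status" section.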