devlyn-cli 2.3.0 → 2.3.1

Files changed (219)
  1. package/AGENTS.md +1 -1
  2. package/CLAUDE.md +2 -2
  3. package/README.md +80 -29
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +61 -44
  5. package/benchmark/auto-resolve/BENCHMARK-RESULTS.md +341 -0
  6. package/benchmark/auto-resolve/README.md +307 -44
  7. package/benchmark/auto-resolve/RUBRIC.md +23 -14
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +7 -3
  9. package/benchmark/auto-resolve/fixtures/F10-persist-write-collision/NOTES.md +8 -3
  10. package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/NOTES.md +8 -3
  11. package/benchmark/auto-resolve/fixtures/F12-webhook-raw-body-signature/NOTES.md +10 -4
  12. package/benchmark/auto-resolve/fixtures/F15-frozen-diff-race-review/NOTES.md +10 -4
  13. package/benchmark/auto-resolve/fixtures/F16-cli-quote-tax-rules/NOTES.md +12 -0
  14. package/benchmark/auto-resolve/fixtures/F16-cli-quote-tax-rules/spec.md +6 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +7 -4
  16. package/benchmark/auto-resolve/fixtures/F21-cli-scheduler-priority/NOTES.md +12 -0
  17. package/benchmark/auto-resolve/fixtures/F21-cli-scheduler-priority/spec.md +6 -0
  18. package/benchmark/auto-resolve/fixtures/F22-cli-ledger-close/NOTES.md +8 -0
  19. package/benchmark/auto-resolve/fixtures/F23-cli-fulfillment-wave/NOTES.md +12 -0
  20. package/benchmark/auto-resolve/fixtures/F23-cli-fulfillment-wave/spec.md +6 -0
  21. package/benchmark/auto-resolve/fixtures/F25-cli-cart-promotion-rules/NOTES.md +16 -4
  22. package/benchmark/auto-resolve/fixtures/F25-cli-cart-promotion-rules/spec.md +7 -0
  23. package/benchmark/auto-resolve/fixtures/F26-cli-payout-ledger-rules/NOTES.md +11 -5
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +8 -1
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +4 -2
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +1 -1
  27. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/NOTES.md +34 -0
  28. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/expected.json +57 -0
  29. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/metadata.json +10 -0
  30. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/setup.sh +2 -0
  31. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/spec.md +67 -0
  32. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/task.txt +7 -0
  33. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/verifiers/duplicate-event-error.js +35 -0
  34. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/verifiers/priority-transfer-rollback.js +53 -0
  35. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/expected.json +57 -0
  37. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/setup.sh +2 -0
  39. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/spec.md +70 -0
  40. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/task.txt +3 -0
  41. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/verifiers/duplicate-renewal-error.js +42 -0
  42. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/verifiers/priority-credit-rollback.js +70 -0
  43. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +10 -3
  44. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +7 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +5 -0
  46. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +7 -0
  47. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +3 -0
  48. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +1 -1
  49. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +15 -3
  50. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +1 -1
  51. package/benchmark/auto-resolve/fixtures/SCHEMA.md +53 -7
  52. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/NOTES.md +37 -0
  53. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/RETIRED.md +13 -0
  54. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/expected.json +56 -0
  55. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/setup.sh +18 -0
  57. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/spec.md +69 -0
  58. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/task.txt +7 -0
  59. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/verifiers/exact-proration.js +48 -0
  60. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/verifiers/rules-source-and-conflict.js +79 -0
  61. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/NOTES.md +54 -0
  62. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/RETIRED.md +7 -0
  63. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/expected.json +67 -0
  64. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/metadata.json +10 -0
  65. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/setup.sh +2 -0
  66. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/spec.md +67 -0
  67. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/task.txt +5 -0
  68. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/policy-precedence.js +72 -0
  69. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/validation-and-immutability.js +43 -0
  70. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/validation-boundary.js +116 -0
  71. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/NOTES.md +35 -0
  72. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/RETIRED.md +12 -0
  73. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/expected.json +58 -0
  74. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/metadata.json +10 -0
  75. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/setup.sh +2 -0
  76. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/spec.md +73 -0
  77. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/task.txt +17 -0
  78. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/verifiers/mixed-idempotent-settlement.js +53 -0
  79. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/verifiers/rejection-boundaries.js +74 -0
  80. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/NOTES.md +60 -0
  81. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/RETIRED.md +29 -0
  82. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/expected.json +73 -0
  83. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/metadata.json +10 -0
  84. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/setup.sh +28 -0
  85. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/spec.md +58 -0
  86. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/task.txt +5 -0
  87. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/full-pipeline-pair-gate.json +82 -0
  88. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/full-pipeline-pair-gate.md +18 -0
  89. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/headroom-gate.json +46 -0
  90. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/headroom-gate.md +17 -0
  91. package/benchmark/auto-resolve/run-real-benchmark.md +303 -0
  92. package/benchmark/auto-resolve/scripts/audit-headroom-rejections.py +441 -0
  93. package/benchmark/auto-resolve/scripts/audit-pair-evidence.py +1256 -0
  94. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +147 -15
  95. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +28 -16
  96. package/benchmark/auto-resolve/scripts/collect-swebench-predictions.py +11 -1
  97. package/benchmark/auto-resolve/scripts/compile-report.py +208 -46
  98. package/benchmark/auto-resolve/scripts/fetch-swebench-instances.py +22 -4
  99. package/benchmark/auto-resolve/scripts/frozen-verify-gate.py +175 -30
  100. package/benchmark/auto-resolve/scripts/full-pipeline-pair-gate.py +408 -46
  101. package/benchmark/auto-resolve/scripts/headroom-gate.py +270 -39
  102. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +164 -33
  103. package/benchmark/auto-resolve/scripts/iter-0033c-l1-summary.py +97 -0
  104. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +150 -38
  105. package/benchmark/auto-resolve/scripts/judge.sh +153 -26
  106. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +12 -5
  107. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +25 -2
  108. package/benchmark/auto-resolve/scripts/pair-candidate-frontier.py +469 -0
  109. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +5 -5
  110. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +9 -2
  111. package/benchmark/auto-resolve/scripts/pair-rejected-fixtures.sh +91 -0
  112. package/benchmark/auto-resolve/scripts/pair_evidence_contract.py +269 -0
  113. package/benchmark/auto-resolve/scripts/prepare-swebench-frozen-case.py +39 -10
  114. package/benchmark/auto-resolve/scripts/prepare-swebench-frozen-corpus.py +34 -4
  115. package/benchmark/auto-resolve/scripts/prepare-swebench-solver-worktree.py +23 -5
  116. package/benchmark/auto-resolve/scripts/recent-benchmark-summary.py +232 -0
  117. package/benchmark/auto-resolve/scripts/run-fixture.sh +118 -51
  118. package/benchmark/auto-resolve/scripts/run-frozen-verify-pair.sh +211 -39
  119. package/benchmark/auto-resolve/scripts/run-full-pipeline-pair-candidate.sh +335 -39
  120. package/benchmark/auto-resolve/scripts/run-headroom-candidate.sh +249 -6
  121. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +22 -48
  122. package/benchmark/auto-resolve/scripts/run-suite.sh +44 -7
  123. package/benchmark/auto-resolve/scripts/run-swebench-frozen-corpus.sh +120 -19
  124. package/benchmark/auto-resolve/scripts/run-swebench-solver-batch.sh +32 -14
  125. package/benchmark/auto-resolve/scripts/ship-gate.py +219 -50
  126. package/benchmark/auto-resolve/scripts/solo-ceiling-avoidance.py +53 -0
  127. package/benchmark/auto-resolve/scripts/solo-headroom-hypothesis.py +77 -0
  128. package/benchmark/auto-resolve/scripts/swebench-frozen-matrix.py +239 -26
  129. package/benchmark/auto-resolve/scripts/test-audit-headroom-rejections.sh +288 -0
  130. package/benchmark/auto-resolve/scripts/test-audit-pair-evidence.sh +1672 -0
  131. package/benchmark/auto-resolve/scripts/test-benchmark-arg-parsing.sh +933 -0
  132. package/benchmark/auto-resolve/scripts/test-build-pair-eligible-manifest.sh +491 -0
  133. package/benchmark/auto-resolve/scripts/test-check-f9-artifacts.sh +91 -0
  134. package/benchmark/auto-resolve/scripts/test-frozen-verify-gate.sh +328 -3
  135. package/benchmark/auto-resolve/scripts/test-full-pipeline-pair-gate.sh +497 -18
  136. package/benchmark/auto-resolve/scripts/test-headroom-gate.sh +331 -14
  137. package/benchmark/auto-resolve/scripts/test-iter-0033c-compare.sh +525 -0
  138. package/benchmark/auto-resolve/scripts/test-iter-0033c-l1-summary.sh +254 -0
  139. package/benchmark/auto-resolve/scripts/test-lint-fixtures.sh +580 -0
  140. package/benchmark/auto-resolve/scripts/test-pair-candidate-frontier.sh +591 -0
  141. package/benchmark/auto-resolve/scripts/test-run-full-pipeline-pair-candidate.sh +497 -0
  142. package/benchmark/auto-resolve/scripts/test-run-headroom-candidate.sh +401 -0
  143. package/benchmark/auto-resolve/scripts/test-run-swebench-solver-batch.sh +111 -0
  144. package/benchmark/auto-resolve/scripts/test-ship-gate.sh +1189 -0
  145. package/benchmark/auto-resolve/scripts/test-swebench-frozen-case.sh +924 -5
  146. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/NOTES.md +28 -0
  147. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/expected.json +63 -0
  148. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/metadata.json +10 -0
  149. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/setup.sh +3 -0
  150. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/spec.md +47 -0
  151. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/task.txt +1 -0
  152. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/NOTES.md +34 -0
  153. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/expected.json +53 -0
  154. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/metadata.json +10 -0
  155. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/setup.sh +3 -0
  156. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/spec.md +50 -0
  157. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/task.txt +1 -0
  158. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/verifiers/duplicate-order-error.js +27 -0
  159. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/verifiers/priority-stock-reservation.js +44 -0
  160. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/NOTES.md +34 -0
  161. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/expected.json +55 -0
  162. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/metadata.json +10 -0
  163. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/setup.sh +3 -0
  164. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/spec.md +52 -0
  165. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/task.txt +1 -0
  166. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/verifiers/duplicate-ticket-error.js +29 -0
  167. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/verifiers/priority-agent-assignment.js +48 -0
  168. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/NOTES.md +34 -0
  169. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/expected.json +55 -0
  170. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/metadata.json +10 -0
  171. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/setup.sh +3 -0
  172. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/spec.md +55 -0
  173. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/task.txt +1 -0
  174. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/verifiers/duplicate-return-error.js +43 -0
  175. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/verifiers/priority-return-routing.js +70 -0
  176. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/NOTES.md +37 -0
  177. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/expected.json +54 -0
  178. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/metadata.json +10 -0
  179. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/setup.sh +3 -0
  180. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/spec.md +59 -0
  181. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/task.txt +1 -0
  182. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/verifiers/credit-ledger-priority.js +98 -0
  183. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/verifiers/duplicate-charge-error.js +38 -0
  184. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/NOTES.md +36 -0
  185. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/expected.json +56 -0
  186. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/metadata.json +10 -0
  187. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/setup.sh +3 -0
  188. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/spec.md +59 -0
  189. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/task.txt +1 -0
  190. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/verifiers/duplicate-refund-error.js +41 -0
  191. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/verifiers/priority-refund-ledger.js +65 -0
  192. package/bin/devlyn.js +210 -17
  193. package/config/skills/_shared/adapters/README.md +3 -0
  194. package/config/skills/_shared/adapters/gpt-5-5.md +5 -1
  195. package/config/skills/_shared/adapters/opus-4-7.md +9 -1
  196. package/config/skills/_shared/archive_run.py +78 -6
  197. package/config/skills/_shared/codex-config.md +3 -2
  198. package/config/skills/_shared/codex-monitored.sh +46 -1
  199. package/config/skills/_shared/collect-codex-findings.py +20 -5
  200. package/config/skills/_shared/engine-preflight.md +1 -1
  201. package/config/skills/_shared/runtime-principles.md +5 -8
  202. package/config/skills/_shared/spec-verify-check.py +2664 -107
  203. package/config/skills/_shared/verify-merge-findings.py +1369 -19
  204. package/config/skills/devlyn:ideate/SKILL.md +7 -4
  205. package/config/skills/devlyn:ideate/references/elicitation.md +50 -4
  206. package/config/skills/devlyn:ideate/references/from-spec-mode.md +26 -4
  207. package/config/skills/devlyn:ideate/references/project-mode.md +20 -1
  208. package/config/skills/devlyn:ideate/references/spec-template.md +10 -1
  209. package/config/skills/devlyn:resolve/SKILL.md +49 -18
  210. package/config/skills/devlyn:resolve/references/free-form-mode.md +15 -0
  211. package/config/skills/devlyn:resolve/references/phases/build-gate.md +2 -2
  212. package/config/skills/devlyn:resolve/references/phases/probe-derive.md +74 -2
  213. package/config/skills/devlyn:resolve/references/phases/verify.md +62 -28
  214. package/config/skills/devlyn:resolve/references/state-schema.md +7 -4
  215. package/package.json +47 -2
  216. package/scripts/lint-fixtures.sh +349 -0
  217. package/scripts/lint-shadow-fixtures.sh +58 -0
  218. package/scripts/lint-skills.sh +3642 -92
  219. /package/{optional-skills → config/skills}/devlyn:design-ui/SKILL.md +0 -0
@@ -0,0 +1,35 @@
+ # F30 - Notes
+
+ ## Failure Mode
+
+ This fixture detects payment-style implementations that pass simple happy-path
+ tests while double-applying duplicate operations, letting rejected operations
+ consume available credit, failing to release holds on capture/release, or
+ mutating input files while computing settlement state.
+
+ ## Pipeline Phases
+
+ It stresses IMPLEMENT and VERIFY. The visible spec names the state transition
+ rules, idempotency output shape, and exact account summary; hidden verifiers
+ combine those rules so a one-axis implementation is not enough.
+
+ ## Why Existing Fixtures Do Not Cover This
+
+ F16 covers quote math, F23 covers warehouse allocation rollback, F25 covers cart
+ promotions, and F28 return authorization was rejected after corrected scoring
+ showed solo saturation. None focus on duplicate operation idempotency plus
+ credit-hold mutation, capture/release transitions, and validation immutability.
+
+ ## Retirement Criteria
+
+ Retire or rotate this fixture if both `solo_claude` and the selected pair arm
+ score near the ceiling for two shipped versions, or if another fixture covers
+ idempotent financial hold mutation with clearer pair headroom.
+
+ ## Headroom Status
+
+ Retired after headroom run `20260511-f30-headroom-v1`: bare 33 /
+ solo_claude 98, headroom FAIL because `solo_claude score 98 > 80`.
+
+ Do not count F30 as pair-lift evidence. Rework the visible contract or hidden
+ verifiers before spending pair arms on this idea again.
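The headroom verdict quoted in these notes can be reproduced with a small check. This is a hedged sketch and not the logic of the package's `headroom-gate.py`, whose contents are not shown here: the function name `headroomVerdict` and the `soloCeiling` parameter are hypothetical, and the sketch models only the solo-ceiling rule. The ceiling of 80 and the scores 33 and 98 are taken from the notes.

```javascript
'use strict';

// Hypothetical helper: fails a fixture when the solo arm already scores above
// the ceiling, because that leaves no room to measure pair lift.
function headroomVerdict(scores, soloCeiling = 80) {
  const failed = scores.solo_claude > soloCeiling;
  return {
    verdict: failed ? 'FAIL' : 'PASS',
    reason: failed ? `solo_claude score ${scores.solo_claude} > ${soloCeiling}` : null
  };
}

// Scores reported for run 20260511-f30-headroom-v1.
const f30 = headroomVerdict({ bare: 33, solo_claude: 98 });
console.log(JSON.stringify(f30));
```

Any additional conditions the real gate applies, for example on the bare score, are outside this sketch.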
@@ -0,0 +1,12 @@
+ # F30 retired
+
+ Retired from the active golden suite after headroom run
+ `20260511-f30-headroom-v1`.
+
+ Reason: `solo_claude` scored 98, exceeding the headroom ceiling of 80, while
+ bare scored 33. The fixture is useful as a record of an idempotent hold
+ settlement candidate that proved too explicit for solo, but it is not
+ pair-lift evidence.
+
+ Future use: rework the visible contract or hidden verifiers so the task creates
+ a fair pair-risk-probe gap without hiding requirements from the spec.
@@ -0,0 +1,58 @@
+ {
+   "verification_commands": [
+     {
+       "cmd": "node --test tests/cli.test.js",
+       "exit_code": 0,
+       "stdout_contains": [],
+       "stdout_not_contains": ["not ok "]
+     },
+     {
+       "cmd": "node \"$BENCH_FIXTURE_DIR/verifiers/mixed-idempotent-settlement.js\"",
+       "exit_code": 0,
+       "stdout_contains": ["\"ok\":true"],
+       "stdout_not_contains": [],
+       "contract_refs": [
+         "A failed authorization does not reserve credit and does not block a later valid authorization.",
+         "Capture removes the active hold and increases `balance_cents`.",
+         "Release removes the active hold without changing `balance_cents`.",
+         "Duplicate operation ids do not mutate state and report the original status."
+       ]
+     },
+     {
+       "cmd": "node \"$BENCH_FIXTURE_DIR/verifiers/rejection-boundaries.js\"",
+       "exit_code": 0,
+       "stdout_contains": ["\"ok\":true"],
+       "stdout_not_contains": [],
+       "contract_refs": [
+         "Reusing an active `hold_id` is rejected as `\"duplicate_hold\"`.",
+         "Capture or release with a wrong amount is rejected as `\"amount_mismatch\"` and does not mutate state.",
+         "Invalid unknown-account input exits `2`, prints one JSON error object to stderr, and prints no stdout.",
+         "The input file contents are unchanged after settlement."
+       ]
+     }
+   ],
+   "forbidden_patterns": [
+     {
+       "pattern": "catch\\s*\\([^)]*\\)\\s*\\{[^}]*return\\s+(null|undefined|''|\\{\\})",
+       "description": "silent catch returning fallback in credit hold settlement path",
+       "files": ["bin/cli.js"],
+       "severity": "disqualifier"
+     },
+     {
+       "pattern": "catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}",
+       "description": "empty catch block",
+       "files": ["bin/cli.js"],
+       "severity": "disqualifier"
+     },
+     {
+       "pattern": "/\\*\\s*eslint-disable",
+       "description": "eslint-disable without scoped justification",
+       "files": ["bin/cli.js"],
+       "severity": "disqualifier"
+     }
+   ],
+   "required_files": ["bin/cli.js", "tests/cli.test.js"],
+   "forbidden_files": [],
+   "max_deps_added": 0,
+   "spec_output_files": ["bin/cli.js", "tests/cli.test.js"]
+ }
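To see what the three `forbidden_patterns` entries would catch, the sketch below unescapes them from JSON into JavaScript regex literals and runs them over short sample snippets. How the benchmark harness actually applies the patterns (regex engine, flags, per-file scanning) is not shown in this diff, so the `findViolations` helper, the pattern labels, and the sample snippets are assumptions made for illustration only.

```javascript
'use strict';

// Patterns copied from the forbidden_patterns entries, JSON-unescaped.
// The short labels are illustrative and are not part of the fixture.
const forbidden = [
  { name: 'silent catch', re: /catch\s*\([^)]*\)\s*\{[^}]*return\s+(null|undefined|''|\{\})/ },
  { name: 'empty catch', re: /catch\s*\([^)]*\)\s*\{\s*\}/ },
  { name: 'eslint-disable', re: /\/\*\s*eslint-disable/ }
];

// Hypothetical scanner: returns the labels of every pattern found in the source text.
function findViolations(source) {
  return forbidden.filter(({ re }) => re.test(source)).map(({ name }) => name);
}

const samples = {
  silent: 'try { load(); } catch (err) { return null; }',
  empty: 'try { load(); } catch (err) { }',
  loud: 'try { load(); } catch (err) { fail(err); }'
};

for (const [label, source] of Object.entries(samples)) {
  console.log(label, JSON.stringify(findViolations(source)));
}
```

Because `[^}]*` stops at the first closing brace, the silent-catch pattern only inspects the text up to the first `}` inside the catch body; a fallback `return` that follows a nested block would not be matched by this sketch.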
@@ -0,0 +1,10 @@
+ {
+   "id": "F30-cli-credit-hold-settlement",
+   "category": "high-risk",
+   "difficulty": "high",
+   "timeout_seconds": 1500,
+   "required_tools": ["node"],
+   "browser": false,
+   "deps_change_expected": false,
+   "intent": "Add a bench-cli settle-holds command that reads account credit holds and operations, applies authorization/capture/release with idempotency and rollback, and emits exact cents-based settlement state without mutating input."
+ }
@@ -0,0 +1,73 @@
+ ---
+ id: "F30-cli-credit-hold-settlement"
+ title: "Credit hold settlement"
+ status: planned
+ complexity: high
+ depends-on: []
+ ---
+
+ # F30 Credit hold settlement
+
+ ## Context
+
+ `bench-cli` currently has greeting and version commands only. The task:
+ add a `settle-holds` command that reads account credit holds and operations,
+ applies authorization/capture/release with idempotency and rollback, and emits
+ exact cents-based settlement state without mutating input.
+
+ Credit holds feed payment and ledger workflows, so duplicate operations,
+ failed operations, and available-credit calculations must be deterministic.
+
+ ## Requirements
+
+ - [ ] `bench-cli settle-holds --input <path>` reads JSON shaped as `{ "accounts": Array<Account>, "operations": Array<Operation> }`.
+ - [ ] Each account has `{ "id": string, "balance_cents": number, "credit_limit_cents": number }`.
+ - [ ] Each operation has `{ "id": string, "account_id": string, "type": "authorize" | "capture" | "release", "hold_id": string, "amount_cents": number }`.
+ - [ ] Validate before settlement: ids and hold ids must be non-empty strings, account ids must be unique, balances and credit limits must be non-negative integers, amount cents must be positive integers, operation types must be one of the allowed strings, and every operation account must exist.
+ - [ ] Invalid input exits `2`, writes exactly one JSON error object to stderr, and writes nothing to stdout.
+ - [ ] Business rejections do not exit non-zero and do not mutate settlement state.
+ - [ ] Process operations in input order.
+ - [ ] An `authorize` operation creates one active hold when `credit_limit_cents - balance_cents - active_hold_cents >= amount_cents`; otherwise it is rejected with reason `"insufficient_credit"`.
+ - [ ] An `authorize` operation for a `hold_id` that is already active is rejected with reason `"duplicate_hold"`.
+ - [ ] A `capture` operation requires an active hold for the same account and exactly the requested amount available on that hold; otherwise it is rejected with reason `"unknown_hold"` or `"amount_mismatch"`.
+ - [ ] An approved `capture` increases `balance_cents` by `amount_cents` and removes the active hold.
+ - [ ] A `release` operation requires an active hold for the same account and exactly the requested amount available on that hold; otherwise it is rejected with reason `"unknown_hold"` or `"amount_mismatch"`.
+ - [ ] An approved `release` removes the active hold without changing `balance_cents`.
+ - [ ] Duplicate operation ids are idempotent: the first occurrence is processed normally; each later operation with the same `id` must not mutate state and emits `{ "id": string, "status": "duplicate", "original_status": "approved" | "rejected" }`.
+ - [ ] Approved rows have keys `id`, `status`, `type`, `account_id`, `hold_id`, `amount_cents`.
+ - [ ] Rejected rows have keys `id`, `status`, `reason`.
+ - [ ] Duplicate rows have keys `id`, `status`, `original_status`.
+ - [ ] On success, write exactly one JSON object to stdout and no stderr. Keys: `results`, `accounts`.
+ - [ ] `results` is ordered by input operation order.
+ - [ ] `accounts` is sorted by account id. Each account row has keys `id`, `balance_cents`, `active_hold_cents`, `available_cents`.
+ - [ ] The command must not modify the input file.
+ - [ ] `tests/cli.test.js` is updated. Existing tests still pass AND at least two new tests cover one mixed approved/rejected/duplicate settlement and one validation failure.
+
+ ## Constraints
+
+ - **No new npm dependencies.**
+ - **No floating-money output.** All public amounts are integer cents.
+ - **No silent catches.** Invalid input and file-read failures must surface as JSON errors with exit `2`.
+ - **No extra stdout/stderr text** on the success path; downstream tooling parses stdout as JSON.
+ - **Touch only `bin/cli.js` and `tests/cli.test.js`.**
+
+ ## Out of Scope
+
+ - Payment processor calls.
+ - Persistence beyond stdout.
+ - Partial captures or partial releases.
+ - Currencies, interest, fees, or statement generation.
+ - Touching `server/`, `web/`, or `tests/server.test.js`.
+
+ ## Verification
+
+ - `node --test tests/cli.test.js` exits 0.
+ - A failed authorization does not reserve credit and does not block a later valid authorization.
+ - Capture removes the active hold and increases `balance_cents`.
+ - Release removes the active hold without changing `balance_cents`.
+ - Duplicate operation ids do not mutate state and report the original status.
+ - Reusing an active `hold_id` is rejected as `"duplicate_hold"`.
+ - Capture or release with a wrong amount is rejected as `"amount_mismatch"` and does not mutate state.
+ - Invalid unknown-account input exits `2`, prints one JSON error object to stderr, and prints no stdout.
+ - The input file contents are unchanged after settlement.
+ - `git diff --stat` shows only `bin/cli.js` and `tests/cli.test.js` touched.
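The settlement rules in this spec can be sketched as a pure function. This is one possible reading and not the fixture's reference solution: `settleHolds` is a hypothetical name, validation and CLI wiring are left out, and two points the spec leaves open are decided by assumption. Hold ids are treated as global across accounts, and `duplicate_hold` is checked before `insufficient_credit`.

```javascript
'use strict';

// Sketch of the settlement rules only. Assumes already-validated input.
function settleHolds(input) {
  // Copy each account so the caller's input object is never mutated.
  const balances = new Map(input.accounts.map((a) => [a.id, { ...a }]));
  const holds = new Map(); // hold_id -> { account_id, amount_cents }
  const firstStatus = new Map(); // operation id -> status of its first occurrence
  const results = [];

  const heldCents = (accountId) => {
    let sum = 0;
    for (const hold of holds.values()) {
      if (hold.account_id === accountId) sum += hold.amount_cents;
    }
    return sum;
  };

  // Returns a rejection reason, or null after applying the operation.
  // Every check runs before any mutation, so rejections leave state untouched.
  const apply = (op) => {
    const acct = balances.get(op.account_id);
    const hold = holds.get(op.hold_id);
    if (op.type === 'authorize') {
      if (hold) return 'duplicate_hold';
      const available = acct.credit_limit_cents - acct.balance_cents - heldCents(acct.id);
      if (available < op.amount_cents) return 'insufficient_credit';
      holds.set(op.hold_id, { account_id: acct.id, amount_cents: op.amount_cents });
      return null;
    }
    // capture and release share the same preconditions
    if (!hold || hold.account_id !== acct.id) return 'unknown_hold';
    if (hold.amount_cents !== op.amount_cents) return 'amount_mismatch';
    holds.delete(op.hold_id);
    if (op.type === 'capture') acct.balance_cents += op.amount_cents;
    return null;
  };

  for (const op of input.operations) {
    if (firstStatus.has(op.id)) {
      results.push({ id: op.id, status: 'duplicate', original_status: firstStatus.get(op.id) });
      continue;
    }
    const reason = apply(op);
    const row = reason
      ? { id: op.id, status: 'rejected', reason }
      : { id: op.id, status: 'approved', type: op.type, account_id: op.account_id,
          hold_id: op.hold_id, amount_cents: op.amount_cents };
    firstStatus.set(op.id, row.status);
    results.push(row);
  }

  const accounts = [...balances.values()]
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((a) => {
      const active = heldCents(a.id);
      return {
        id: a.id,
        balance_cents: a.balance_cents,
        active_hold_cents: active,
        available_cents: a.credit_limit_cents - a.balance_cents - active
      };
    });
  return { results, accounts };
}
```

Running every check before any mutation is what keeps rejected operations from consuming credit, and recording only the first status per operation id is what makes later duplicates report `original_status` without touching state.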
@@ -0,0 +1,17 @@
+ Add a bench-cli settle-holds command that reads account credit holds and
+ operations, applies authorization/capture/release with idempotency and rollback,
+ and emits exact cents-based settlement state without mutating input.
+
+ The command should read `--input <path>` JSON with accounts and operations.
+ Validate the input before settlement. Use integer cents only, write exactly one
+ JSON object to stdout on success, and write exactly one JSON error object to
+ stderr with exit code 2 for invalid input. Do not add npm dependencies.
+
+ Business rejections should stay on exit code 0. Process operations in input
+ order. Authorizations create active holds only when available credit is enough.
+ Captures and releases require an active hold for the same account and exact
+ amount. Failed operations must not mutate state. Duplicate operation ids are
+ idempotent: only the first occurrence mutates state, and later duplicates report
+ the original approved/rejected status.
+
+ Update the CLI tests. Touch only `bin/cli.js` and `tests/cli.test.js`.
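The validate-before-settlement step can be sketched as a function that reports the first problem it finds, paired with a CLI-side helper that turns a problem into one JSON object on stderr and exit code 2. The names `validateInput` and `failIfInvalid`, the message wording, and the `{ "error": ... }` shape are assumptions; the fixture requires a single JSON error object but does not fix its keys here.

```javascript
'use strict';

const TYPES = new Set(['authorize', 'capture', 'release']);
const isId = (v) => typeof v === 'string' && v.length > 0;
const isCents = (v, min) => Number.isInteger(v) && v >= min;

// Returns a message for the first problem found, or null when the input is valid.
function validateInput(data) {
  if (!data || !Array.isArray(data.accounts) || !Array.isArray(data.operations)) {
    return 'accounts and operations must be arrays';
  }
  const accountIds = new Set();
  for (const a of data.accounts) {
    if (!a || !isId(a.id)) return 'account id must be a non-empty string';
    if (accountIds.has(a.id)) return `duplicate account id: ${a.id}`;
    if (!isCents(a.balance_cents, 0) || !isCents(a.credit_limit_cents, 0)) {
      return `balance and credit limit must be non-negative integers: ${a.id}`;
    }
    accountIds.add(a.id);
  }
  for (const op of data.operations) {
    if (!op || !isId(op.id) || !isId(op.hold_id)) {
      return 'operation id and hold_id must be non-empty strings';
    }
    if (!TYPES.has(op.type)) return `unknown operation type: ${op.id}`;
    if (!isCents(op.amount_cents, 1)) return `amount_cents must be a positive integer: ${op.id}`;
    if (!accountIds.has(op.account_id)) return `unknown account: ${op.id}`;
  }
  return null;
}

// CLI-side use: exactly one JSON object on stderr, nothing on stdout, exit code 2.
function failIfInvalid(data) {
  const message = validateInput(data);
  if (message !== null) {
    process.stderr.write(JSON.stringify({ error: message }) + '\n');
    process.exit(2);
  }
}
```

Keeping validation separate from settlement means no operation is applied until the whole input has been checked, which matches the requirement that invalid input produces no stdout.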
@@ -0,0 +1,53 @@
+ 'use strict';
+
+ const assert = require('node:assert');
+ const { execFileSync } = require('node:child_process');
+ const fs = require('node:fs');
+ const os = require('node:os');
+ const path = require('node:path');
+
+ const work = process.env.BENCH_WORKDIR || process.cwd();
+ const cli = path.join(work, 'bin', 'cli.js');
+ const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'f30-mixed-'));
+ const input = path.join(tmp, 'holds.json');
+
+ fs.writeFileSync(input, JSON.stringify({
+   accounts: [
+     { id: 'acct-a', balance_cents: 2000, credit_limit_cents: 10000 },
+     { id: 'acct-b', balance_cents: 0, credit_limit_cents: 5000 }
+   ],
+   operations: [
+     { id: 'op-auth-1', account_id: 'acct-a', type: 'authorize', hold_id: 'h-1', amount_cents: 5000 },
+     { id: 'op-too-large', account_id: 'acct-a', type: 'authorize', hold_id: 'h-2', amount_cents: 4000 },
+     { id: 'op-release', account_id: 'acct-a', type: 'release', hold_id: 'h-1', amount_cents: 5000 },
+     { id: 'op-auth-3', account_id: 'acct-a', type: 'authorize', hold_id: 'h-3', amount_cents: 3000 },
+     { id: 'op-capture', account_id: 'acct-a', type: 'capture', hold_id: 'h-3', amount_cents: 3000 },
+     { id: 'op-capture', account_id: 'acct-a', type: 'capture', hold_id: 'h-3', amount_cents: 3000 },
+     { id: 'op-auth-b', account_id: 'acct-b', type: 'authorize', hold_id: 'h-b', amount_cents: 5000 }
+   ]
+ }), 'utf8');
+
+ const stdout = execFileSync('node', [cli, 'settle-holds', '--input', input], {
+   cwd: work,
+   encoding: 'utf8',
+   stdio: ['ignore', 'pipe', 'pipe']
+ });
+ const parsed = JSON.parse(stdout);
+
+ assert.deepStrictEqual(parsed, {
+   results: [
+     { id: 'op-auth-1', status: 'approved', type: 'authorize', account_id: 'acct-a', hold_id: 'h-1', amount_cents: 5000 },
+     { id: 'op-too-large', status: 'rejected', reason: 'insufficient_credit' },
+     { id: 'op-release', status: 'approved', type: 'release', account_id: 'acct-a', hold_id: 'h-1', amount_cents: 5000 },
+     { id: 'op-auth-3', status: 'approved', type: 'authorize', account_id: 'acct-a', hold_id: 'h-3', amount_cents: 3000 },
+     { id: 'op-capture', status: 'approved', type: 'capture', account_id: 'acct-a', hold_id: 'h-3', amount_cents: 3000 },
+     { id: 'op-capture', status: 'duplicate', original_status: 'approved' },
+     { id: 'op-auth-b', status: 'approved', type: 'authorize', account_id: 'acct-b', hold_id: 'h-b', amount_cents: 5000 }
+   ],
+   accounts: [
+     { id: 'acct-a', balance_cents: 5000, active_hold_cents: 0, available_cents: 5000 },
+     { id: 'acct-b', balance_cents: 0, active_hold_cents: 5000, available_cents: 0 }
+   ]
+ });
+
+ console.log(JSON.stringify({ ok: true }));
@@ -0,0 +1,74 @@
+ 'use strict';
+
+ const assert = require('node:assert');
+ const { spawnSync } = require('node:child_process');
+ const fs = require('node:fs');
+ const os = require('node:os');
+ const path = require('node:path');
+
+ const work = process.env.BENCH_WORKDIR || process.cwd();
+ const cli = path.join(work, 'bin', 'cli.js');
+
+ function runPayload(label, payload) {
+   const tmp = fs.mkdtempSync(path.join(os.tmpdir(), `f30-${label}-`));
+   const input = path.join(tmp, 'holds.json');
+   const original = JSON.stringify(payload, null, 2);
+   fs.writeFileSync(input, original, 'utf8');
+   const result = spawnSync('node', [cli, 'settle-holds', '--input', input], {
+     cwd: work,
+     encoding: 'utf8'
+   });
+   assert.strictEqual(fs.readFileSync(input, 'utf8'), original, `${label}: input mutated`);
+   return result;
+ }
+
+ const boundary = runPayload('boundary', {
+   accounts: [
+     { id: 'acct-a', balance_cents: 1000, credit_limit_cents: 7000 }
+   ],
+   operations: [
+     { id: 'op-auth', account_id: 'acct-a', type: 'authorize', hold_id: 'h-1', amount_cents: 3000 },
+     { id: 'op-dupe-hold', account_id: 'acct-a', type: 'authorize', hold_id: 'h-1', amount_cents: 1000 },
+     { id: 'op-bad-capture', account_id: 'acct-a', type: 'capture', hold_id: 'h-1', amount_cents: 2000 },
+     { id: 'op-release', account_id: 'acct-a', type: 'release', hold_id: 'h-1', amount_cents: 3000 },
+     { id: 'op-bad-release', account_id: 'acct-a', type: 'release', hold_id: 'h-1', amount_cents: 3000 },
+     { id: 'op-after', account_id: 'acct-a', type: 'authorize', hold_id: 'h-2', amount_cents: 6000 },
+     { id: 'op-dupe-reject', account_id: 'acct-a', type: 'authorize', hold_id: 'h-3', amount_cents: 1 },
+     { id: 'op-dupe-reject', account_id: 'acct-a', type: 'authorize', hold_id: 'h-4', amount_cents: 1 }
+   ]
+ });
+
+ assert.strictEqual(boundary.status, 0);
+ assert.strictEqual(boundary.stderr, '');
+ assert.deepStrictEqual(JSON.parse(boundary.stdout), {
+   results: [
+     { id: 'op-auth', status: 'approved', type: 'authorize', account_id: 'acct-a', hold_id: 'h-1', amount_cents: 3000 },
+     { id: 'op-dupe-hold', status: 'rejected', reason: 'duplicate_hold' },
+     { id: 'op-bad-capture', status: 'rejected', reason: 'amount_mismatch' },
+     { id: 'op-release', status: 'approved', type: 'release', account_id: 'acct-a', hold_id: 'h-1', amount_cents: 3000 },
+     { id: 'op-bad-release', status: 'rejected', reason: 'unknown_hold' },
+     { id: 'op-after', status: 'approved', type: 'authorize', account_id: 'acct-a', hold_id: 'h-2', amount_cents: 6000 },
+     { id: 'op-dupe-reject', status: 'rejected', reason: 'insufficient_credit' },
+     { id: 'op-dupe-reject', status: 'duplicate', original_status: 'rejected' }
+   ],
+   accounts: [
+     { id: 'acct-a', balance_cents: 1000, active_hold_cents: 6000, available_cents: 0 }
+   ]
+ });
+
+ const invalid = runPayload('invalid', {
+   accounts: [
+     { id: 'acct-a', balance_cents: 0, credit_limit_cents: 1000 }
+   ],
+   operations: [
+     { id: 'op-missing-account', account_id: 'acct-missing', type: 'authorize', hold_id: 'h-1', amount_cents: 100 }
+   ]
+ });
+
+ assert.strictEqual(invalid.status, 2);
+ assert.strictEqual(invalid.stdout, '');
+ const err = JSON.parse(invalid.stderr);
+ assert.strictEqual(typeof err.error, 'string');
+ assert.notStrictEqual(err.error.length, 0);
+
+ console.log(JSON.stringify({ ok: true }));
@@ -0,0 +1,60 @@
+ # F9 — Notes
+
+ ## Purpose
+
+ **Load-bearing for the novice-user contract.** The suite ship-gate requires
+ F9 to pass (margin ≥ +5) on every shipped version. If F9 fails, the "type
+ `/devlyn:ideate` and ship worldclass software" promise is not being met.
+
+ ## What the variant arm does
+
+ A novice-simulating prompt (`task.txt` is identical to what the user typed)
+ is delivered to a fresh Claude session. The session has our skills installed.
+ The pipeline arm is expected to:
+
+ 1. Recognize this is a vague idea, not a spec → invoke `/devlyn:ideate`.
+ 2. Ideate produces `docs/VISION.md`, `docs/ROADMAP.md`, and
+ `docs/roadmap/phase-1/1.1-gitstats.md` (or similar).
+ 3. Run `/devlyn:auto-resolve` on the generated spec.
+ 4. Run `/devlyn:preflight` for verification.
+
+ The variant arm's prompt explicitly instructs this chain so we're not
+ relying on Claude to invent it. That's fair because the novice contract is
+ about the TOOLS being available + discoverable; the user in this benchmark
+ is already primed to use them.
+
+ ## What the bare arm does
+
+ Same raw task delivered as a direct prompt. Bare implements `gitstats`
+ using its own judgment. Bare does NOT produce VISION/ROADMAP documents
+ (and isn't expected to).
+
+ ## Why margin ≥ +5 is required
+
+ The pipeline's whole value prop is that it trades some bare-case tokens for
+ quality uplift on novice flows. If this fixture can't show ≥ +5 margin,
+ we're paying pipeline cost without delivering on the novice promise.
+
+ ## Scoring notes
+
+ - The variant's `docs/VISION.md` + `ROADMAP.md` + spec files ARE part of
+ the judge's evaluation. The judge sees the full product (code + docs +
+ roadmap state), not just the diff to `bin/cli.js`.
+ - Bare doesn't produce roadmap files, so bare's judge payload is
+ code+test only. This asymmetry is INTENTIONAL — the fixture tests
+ total-output quality, not per-file quality.
+
+ ## Failure modes detected
+
+ - **Pipeline skips ideate.** Variant goes straight to auto-resolve with a
+ vague spec → downstream implementation is weak. Caught by judge:
+ `docs/roadmap/` files missing.
+ - **Bare over-engineers.** Without a skeleton, bare builds too much,
+ touches wrong files, adds deps. Caught by spec constraints.
+ - **Pipeline ships "done" but preflight was a no-op.** If `.devlyn/PREFLIGHT-REPORT.md` exists but shows no commitment audit, something is broken upstream.
+
+ ## Rotation trigger
+
+ F9 is the last fixture we rotate — it's the anchor. If it saturates
+ (variant consistently > 95), the whole suite needs a harder novice-flow
+ anchor before we retire this one.
@@ -0,0 +1,29 @@
+ # RETIRED — F9-e2e-ideate-to-preflight
+
+ **Retired**: 2026-04-30 (iter-0033a)
+ **Replaced by**: `benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/`
+ **Source SHA**: `8d4d57f` (commit before the 2-skill-contract rename).
+
+ ## Why retired
+
+ The 2-skill redesign (Phases 1-3, iter-0029 / 0031 / 0032) replaced
+ `/devlyn:ideate` (greenfield) and folded `/devlyn:preflight` into
+ `/devlyn:resolve`'s VERIFY phase. The OLD F9 fixture's contract assumed
+ the 3-skill chain (`/devlyn:ideate` → `/devlyn:auto-resolve` →
+ `/devlyn:preflight`), which is unobtainable at HEAD post-iter-0032
+ because OLD ideate was deleted.
+
+ iter-0033a redesigned F9 to match the shipped 2-skill contract.
+
+ ## When to consult this archive
+
+ - Replaying a regression suspected from the OLD chain.
+ - Migrating a pre-2026-04-30 historical run record back to readable shape.
+ - Auditing what changed when the new fixture's measurements diverge from
+ pre-redesign baselines.
+
+ ## What lives here
+
+ The exact file contents of the F9 fixture as of `8d4d57f` (the last commit
+ before the rename). DO NOT use this directory as a live fixture — it is
+ not picked up by `run-suite.sh`. Restore-and-run requires a manual copy.
@@ -0,0 +1,73 @@
+ {
+   "verification_commands": [
+     {
+       "cmd": "node bin/cli.js gitstats",
+       "exit_code": 0,
+       "stdout_contains": [
+         "Commits:",
+         "Last commit:"
+       ],
+       "stdout_not_contains": [
+         "Error:"
+       ]
+     },
+     {
+       "cmd": "node bin/cli.js gitstats --json",
+       "exit_code": 0,
+       "stdout_contains": [
+         "{",
+         "commits",
+         "authors"
+       ],
+       "stdout_not_contains": []
+     },
+     {
+       "cmd": "cd /tmp && node -e 'const { spawnSync } = require(\"child_process\"); const p = process.env.BENCH_WORKDIR || process.cwd(); console.log(spawnSync(\"node\", [p + \"/bin/cli.js\", \"gitstats\"], { encoding: \"utf8\", cwd: \"/tmp\" }).status)'",
+       "exit_code": 0,
+       "stdout_contains": [
+         "2"
+       ],
+       "stdout_not_contains": [
+         "0"
+       ]
+     },
+     {
+       "cmd": "node bin/cli.js hello",
+       "exit_code": 0,
+       "stdout_contains": [
+         "Hello, world!"
+       ],
+       "stdout_not_contains": []
+     },
+     {
+       "cmd": "node --test tests/",
+       "exit_code": 0,
+       "stdout_contains": [],
+       "stdout_not_contains": []
+     }
+   ],
+   "forbidden_patterns": [
+     {
+       "pattern": "catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}",
+       "description": "empty catch",
+       "files": [
+         "bin/cli.js"
+       ],
+       "severity": "disqualifier"
+     }
+   ],
+   "required_files": [
+     "bin/cli.js"
+   ],
+   "forbidden_files": [],
+   "max_deps_added": 0,
+   "tier_a_waivers": [
+     "docs/VISION.md",
+     "docs/ROADMAP.md",
+     "docs/roadmap/**"
+   ],
+   "spec_output_files": [
+     "bin/**",
+     "tests/**"
+   ]
+ }
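As a rough illustration of how the fields above could be evaluated, here is a sketch of one plausible reading. The helper name `checkOutput` is hypothetical, and the real harness may interpret `exit_code`, `stdout_contains`, and `stdout_not_contains` differently. The regular expression is the `forbidden_patterns` entry after JSON unescaping.

```javascript
'use strict';

// Hypothetical helper: compares one command result against one
// verification_commands entry and returns a list of failure messages.
function checkOutput(expectation, actual) {
  const failures = [];
  if (actual.status !== expectation.exit_code) {
    failures.push(`exit code ${actual.status}, expected ${expectation.exit_code}`);
  }
  for (const needle of expectation.stdout_contains) {
    if (!actual.stdout.includes(needle)) failures.push(`stdout missing "${needle}"`);
  }
  for (const needle of expectation.stdout_not_contains) {
    if (actual.stdout.includes(needle)) failures.push(`stdout contains forbidden "${needle}"`);
  }
  return failures;
}

// The "empty catch" pattern from forbidden_patterns, after JSON unescaping.
const emptyCatch = new RegExp('catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}');

// The third command prints the child's exit status, so the outer process
// exits 0 while stdout is expected to carry "2".
const outsideRepo = { exit_code: 0, stdout_contains: ['2'], stdout_not_contains: ['0'] };
console.log(checkOutput(outsideRepo, { status: 0, stdout: '2\n' }).length === 0);
console.log(emptyCatch.test('try { run(); } catch (e) { }'));
```

Under this reading, the pattern flags `catch (e) { }` but not a handler with a body. It also requires a parenthesized binding, so a binding-less `catch {}` would not match.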
@@ -0,0 +1,10 @@
+ {
+   "id": "F9-e2e-ideate-to-preflight",
+   "category": "e2e",
+   "difficulty": "high",
+   "timeout_seconds": 3600,
+   "required_tools": ["node"],
+   "browser": false,
+   "deps_change_expected": false,
+   "intent": "End-to-end novice flow: from a vague idea ('git stats CLI for the current repo') the variant must run /devlyn:ideate → /devlyn:auto-resolve → /devlyn:preflight to produce Vision/Roadmap + implemented code + preflight sign-off. The bare arm receives the same vague idea as a direct prompt. This fixture gates the novice-user contract."
+ }
@@ -0,0 +1,28 @@
+ #!/usr/bin/env bash
+ # F9 setup — seed a few synthetic commits with different authors so the
+ # `gitstats` subcommand's "top 3 authors by commit count" requirement is
+ # meaningfully exercised. Without this, every commit author is the runner's
+ # default and the ranking test is a no-op.
+ set -e
+
+ commit_as() {
+   local name="$1" email="$2" file="$3" message="$4"
+   echo "$(date +%s%N) $name" >> "$file"
+   git add "$file"
+   git -c user.name="$name" -c user.email="$email" commit -q -m "$message"
+ }
+
+ mkdir -p .bench-seed
+
+ commit_as "Alpha Author" "alpha@bench.test" .bench-seed/log "seed: alpha 1"
+ commit_as "Alpha Author" "alpha@bench.test" .bench-seed/log "seed: alpha 2"
+ commit_as "Alpha Author" "alpha@bench.test" .bench-seed/log "seed: alpha 3"
+ commit_as "Alpha Author" "alpha@bench.test" .bench-seed/log "seed: alpha 4"
+ commit_as "Beta Author" "beta@bench.test" .bench-seed/log "seed: beta 1"
+ commit_as "Beta Author" "beta@bench.test" .bench-seed/log "seed: beta 2"
+ commit_as "Beta Author" "beta@bench.test" .bench-seed/log "seed: beta 3"
+ commit_as "Gamma Author" "gamma@bench.test" .bench-seed/log "seed: gamma 1"
+ commit_as "Gamma Author" "gamma@bench.test" .bench-seed/log "seed: gamma 2"
+ commit_as "Delta Author" "delta@bench.test" .bench-seed/log "seed: delta 1"
+
+ echo "F9 setup: seeded 10 commits across 4 authors (Alpha 4 / Beta 3 / Gamma 2 / Delta 1)"
@@ -0,0 +1,58 @@
+ ---
+ id: "F9-e2e-ideate-to-preflight"
+ title: "End-to-end: idea → shipped CLI feature"
+ status: planned
+ complexity: high
+ depends-on: []
+ ---
+
+ # F9 End-to-End Novice Flow
+
+ ## Context
+
+ A first-time user has a vague idea:
+
+ > "I want a CLI subcommand that shows basic stats about the current git repo — commit count, last commit date, top 3 authors. Call it `gitstats`."
+
+ The variant arm is expected to use the pipeline: `/devlyn:ideate` to
+ produce a VISION/ROADMAP + spec, then `/devlyn:auto-resolve` to implement
+ per spec, then `/devlyn:preflight` to verify. The bare arm receives the
+ same idea as a direct prompt and implements it without the pipeline.
+
+ This fixture is the suite's most important gate for the "novice user contract":
+ a first-time user typing `/devlyn:ideate` should land at working, well-structured software.
+
+ ## Requirements
+
+ - [ ] A new `gitstats` subcommand exists in `bin/cli.js`.
+ - [ ] `node bin/cli.js gitstats` (run inside a git repo) prints:
+   - Line 1: commit count (e.g., `Commits: 42`).
+   - Line 2: last commit ISO date (e.g., `Last commit: 2026-04-23T12:00:00Z`).
+   - Lines 3-5: top 3 authors by commit count, format `<rank>. <name> <count>`.
+ - [ ] Run outside a git repo → stderr message `Error: not a git repository` and exit 2.
+ - [ ] `node bin/cli.js gitstats --json` emits valid JSON with the same data.
+ - [ ] Existing subcommands (`hello`, `version`) unchanged.
+ - [ ] Add at least one test.
+ - [ ] For variant: a `docs/VISION.md`, `docs/ROADMAP.md`, and a `docs/roadmap/phase-1/` spec file must exist after the run (evidence the ideate stage happened).
+
+ ## Constraints
+
+ - **No new npm dependencies.** Use `child_process` to shell out to `git`.
+ - **No silent catches.**
+ - **Non-git-repo handling.** Do not assume the user is always in a repo.
+
+ - **Lifecycle note.** The harness's DOCS phase flips this spec's frontmatter `status` after implementation completes — that is benchmark lifecycle bookkeeping, not a scope violation.
+
+ ## Out of Scope
+
+ - Parsing commit messages, tags, branches.
+ - Remote API calls.
+ - Touching `server/` or `web/`.
+
+ ## Verification
+
+ - Inside this worktree (which IS a git repo): `node bin/cli.js gitstats` exits 0 and prints at least 5 lines of summary.
+ - `node bin/cli.js gitstats --json | node -e 'const d=JSON.parse(require("fs").readFileSync(0,"utf8")); console.log(typeof d.commits)'` prints `number`.
+ - `cd /tmp && node <worktree>/bin/cli.js gitstats` (from outside a repo — use the worktree's absolute path) exits 2.
+ - For variant: `test -f docs/VISION.md && test -f docs/ROADMAP.md && ls docs/roadmap/phase-1/*.md | head -1`.
+ - `node --test tests/` passes.
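The "top 3 authors" requirement above can be illustrated with a small pure function. This is a sketch under stated assumptions, not the fixture's reference solution: `topAuthors` is a hypothetical name, the input is assumed to be one author name per line (for example, text collected from `git log --format=%an` via `child_process`), and breaking ties alphabetically is an assumption the spec does not state.

```javascript
'use strict';

// Hypothetical helper: ranks authors from one-name-per-line text and
// formats each entry as "<rank>. <name> <count>".
function topAuthors(authorLines, limit = 3) {
  const counts = new Map();
  for (const line of authorLines.split('\n')) {
    const name = line.trim();
    if (name === '') continue; // skip the trailing blank line git output usually has
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return [...counts.entries()]
    // Highest count first; ties fall back to name order (an assumption).
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([name, count], index) => `${index + 1}. ${name} ${count}`);
}

// Mirrors the distribution seeded by the fixture's setup script.
const seeded = [
  ...Array(4).fill('Alpha Author'),
  ...Array(3).fill('Beta Author'),
  ...Array(2).fill('Gamma Author'),
  'Delta Author'
].join('\n');
console.log(topAuthors(seeded).join('\n'));
```

Keeping the ranking separate from the `git` call makes it testable without a repository. The not-a-repo branch (stderr message and exit 2) would live in the code that shells out.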
@@ -0,0 +1,5 @@
+ I want a CLI subcommand that shows basic stats about the current git repo — commit count, last commit date, top 3 authors. Call it `gitstats`.
+
+ Should work inside this repo when I run `node bin/cli.js gitstats`, and fail cleanly if I'm not in a git repo. A `--json` flag for machine-readable output would be useful too.
+
+ Keep the existing `hello` and `version` subcommands working. Add a test. No new npm dependencies.