devlyn-cli 1.15.0 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (158)
  1. package/AGENTS.md +104 -0
  2. package/CLAUDE.md +135 -21
  3. package/README.md +43 -125
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +272 -0
  5. package/benchmark/auto-resolve/README.md +114 -0
  6. package/benchmark/auto-resolve/RUBRIC.md +162 -0
  7. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +30 -0
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/expected.json +68 -0
  9. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/metadata.json +10 -0
  10. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/setup.sh +4 -0
  11. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/spec.md +45 -0
  12. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/task.txt +8 -0
  13. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +54 -0
  14. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json +170 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json +84 -0
  16. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json +21 -0
  17. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-fail.json +214 -0
  18. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json +223 -0
  19. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh +5 -0
  20. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md +56 -0
  21. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt +14 -0
  22. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +28 -0
  23. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json +162 -0
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +65 -0
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json +19 -0
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/setup.sh +4 -0
  27. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +56 -0
  28. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/task.txt +9 -0
  29. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +40 -0
  30. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/expected.json +57 -0
  31. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/metadata.json +10 -0
  32. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/setup.sh +6 -0
  33. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/spec.md +49 -0
  34. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/task.txt +9 -0
  35. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/expected.json +65 -0
  37. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/setup.sh +55 -0
  39. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/spec.md +49 -0
  40. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/task.txt +7 -0
  41. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +38 -0
  42. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/expected.json +77 -0
  43. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/metadata.json +10 -0
  44. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/setup.sh +4 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/spec.md +49 -0
  46. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/task.txt +10 -0
  47. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +50 -0
  48. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/expected.json +76 -0
  49. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/metadata.json +10 -0
  50. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/setup.sh +36 -0
  51. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/spec.md +46 -0
  52. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/task.txt +7 -0
  53. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +50 -0
  54. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/expected.json +63 -0
  55. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/setup.sh +4 -0
  57. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +48 -0
  58. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/task.txt +1 -0
  59. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +93 -0
  60. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/expected.json +74 -0
  61. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/metadata.json +10 -0
  62. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/setup.sh +28 -0
  63. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +62 -0
  64. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/task.txt +5 -0
  65. package/benchmark/auto-resolve/fixtures/SCHEMA.md +130 -0
  66. package/benchmark/auto-resolve/fixtures/test-repo/README.md +27 -0
  67. package/benchmark/auto-resolve/fixtures/test-repo/bin/cli.js +63 -0
  68. package/benchmark/auto-resolve/fixtures/test-repo/package-lock.json +823 -0
  69. package/benchmark/auto-resolve/fixtures/test-repo/package.json +22 -0
  70. package/benchmark/auto-resolve/fixtures/test-repo/playwright.config.js +17 -0
  71. package/benchmark/auto-resolve/fixtures/test-repo/server/index.js +37 -0
  72. package/benchmark/auto-resolve/fixtures/test-repo/tests/cli.test.js +25 -0
  73. package/benchmark/auto-resolve/fixtures/test-repo/tests/server.test.js +58 -0
  74. package/benchmark/auto-resolve/fixtures/test-repo/web/index.html +37 -0
  75. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +174 -0
  76. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +256 -0
  77. package/benchmark/auto-resolve/scripts/compile-report.py +331 -0
  78. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +552 -0
  79. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +430 -0
  80. package/benchmark/auto-resolve/scripts/judge.sh +359 -0
  81. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +260 -0
  82. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +274 -0
  83. package/benchmark/auto-resolve/scripts/oracle-test-fidelity.py +328 -0
  84. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +401 -0
  85. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +468 -0
  86. package/benchmark/auto-resolve/scripts/run-fixture.sh +691 -0
  87. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +234 -0
  88. package/benchmark/auto-resolve/scripts/run-suite.sh +214 -0
  89. package/benchmark/auto-resolve/scripts/ship-gate.py +222 -0
  90. package/bin/devlyn.js +175 -17
  91. package/config/skills/_shared/adapters/README.md +64 -0
  92. package/config/skills/_shared/adapters/gpt-5-5.md +29 -0
  93. package/config/skills/_shared/adapters/opus-4-7.md +29 -0
  94. package/config/skills/{devlyn:auto-resolve/scripts → _shared}/archive_run.py +26 -0
  95. package/config/skills/_shared/codex-config.md +54 -0
  96. package/config/skills/_shared/codex-monitored.sh +141 -0
  97. package/config/skills/_shared/engine-preflight.md +35 -0
  98. package/config/skills/_shared/expected.schema.json +93 -0
  99. package/config/skills/_shared/pair-plan-schema.md +298 -0
  100. package/config/skills/_shared/runtime-principles.md +110 -0
  101. package/config/skills/_shared/spec-verify-check.py +519 -0
  102. package/config/skills/devlyn:ideate/SKILL.md +99 -429
  103. package/config/skills/devlyn:ideate/references/elicitation.md +97 -0
  104. package/config/skills/devlyn:ideate/references/from-spec-mode.md +54 -0
  105. package/config/skills/devlyn:ideate/references/project-mode.md +76 -0
  106. package/config/skills/devlyn:ideate/references/spec-template.md +102 -0
  107. package/config/skills/devlyn:resolve/SKILL.md +172 -184
  108. package/config/skills/devlyn:resolve/references/free-form-mode.md +68 -0
  109. package/config/skills/devlyn:resolve/references/phases/build-gate.md +45 -0
  110. package/config/skills/devlyn:resolve/references/phases/cleanup.md +39 -0
  111. package/config/skills/devlyn:resolve/references/phases/implement.md +42 -0
  112. package/config/skills/devlyn:resolve/references/phases/plan.md +42 -0
  113. package/config/skills/devlyn:resolve/references/phases/verify.md +69 -0
  114. package/config/skills/devlyn:resolve/references/state-schema.md +106 -0
  115. package/{config/skills → optional-skills}/devlyn:design-system/SKILL.md +1 -0
  116. package/{config/skills → optional-skills}/devlyn:reap/SKILL.md +1 -0
  117. package/{config/skills → optional-skills}/devlyn:team-design-ui/SKILL.md +5 -0
  118. package/package.json +12 -2
  119. package/scripts/lint-skills.sh +431 -0
  120. package/config/skills/devlyn:auto-resolve/SKILL.md +0 -252
  121. package/config/skills/devlyn:auto-resolve/evals/evals.json +0 -21
  122. package/config/skills/devlyn:auto-resolve/evals/task-doctor-subcommand.md +0 -42
  123. package/config/skills/devlyn:auto-resolve/references/build-gate.md +0 -130
  124. package/config/skills/devlyn:auto-resolve/references/engine-routing.md +0 -82
  125. package/config/skills/devlyn:auto-resolve/references/findings-schema.md +0 -103
  126. package/config/skills/devlyn:auto-resolve/references/phases/phase-1-build.md +0 -54
  127. package/config/skills/devlyn:auto-resolve/references/phases/phase-2-evaluate.md +0 -45
  128. package/config/skills/devlyn:auto-resolve/references/phases/phase-3-critic.md +0 -84
  129. package/config/skills/devlyn:auto-resolve/references/pipeline-routing.md +0 -114
  130. package/config/skills/devlyn:auto-resolve/references/pipeline-state.md +0 -201
  131. package/config/skills/devlyn:auto-resolve/scripts/terminal_verdict.py +0 -96
  132. package/config/skills/devlyn:browser-validate/SKILL.md +0 -164
  133. package/config/skills/devlyn:browser-validate/references/flow-testing.md +0 -118
  134. package/config/skills/devlyn:browser-validate/references/tier1-chrome.md +0 -137
  135. package/config/skills/devlyn:browser-validate/references/tier2-playwright.md +0 -195
  136. package/config/skills/devlyn:browser-validate/references/tier3-curl.md +0 -57
  137. package/config/skills/devlyn:clean/SKILL.md +0 -285
  138. package/config/skills/devlyn:design-ui/SKILL.md +0 -351
  139. package/config/skills/devlyn:discover-product/SKILL.md +0 -124
  140. package/config/skills/devlyn:evaluate/SKILL.md +0 -564
  141. package/config/skills/devlyn:feature-spec/SKILL.md +0 -630
  142. package/config/skills/devlyn:ideate/references/challenge-rubric.md +0 -122
  143. package/config/skills/devlyn:ideate/references/codex-critic-template.md +0 -42
  144. package/config/skills/devlyn:ideate/references/templates/item-spec.md +0 -90
  145. package/config/skills/devlyn:implement-ui/SKILL.md +0 -466
  146. package/config/skills/devlyn:preflight/SKILL.md +0 -355
  147. package/config/skills/devlyn:preflight/references/auditors/browser-auditor.md +0 -32
  148. package/config/skills/devlyn:preflight/references/auditors/code-auditor.md +0 -86
  149. package/config/skills/devlyn:preflight/references/auditors/docs-auditor.md +0 -38
  150. package/config/skills/devlyn:product-spec/SKILL.md +0 -603
  151. package/config/skills/devlyn:recommend-features/SKILL.md +0 -286
  152. package/config/skills/devlyn:review/SKILL.md +0 -161
  153. package/config/skills/devlyn:team-resolve/SKILL.md +0 -631
  154. package/config/skills/devlyn:team-review/SKILL.md +0 -493
  155. package/config/skills/devlyn:update-docs/SKILL.md +0 -463
  156. package/config/skills/workflow-routing/SKILL.md +0 -73
  157. package/{config/skills → optional-skills}/devlyn:reap/scripts/reap.sh +0 -0
  158. package/{config/skills → optional-skills}/devlyn:reap/scripts/scan.sh +0 -0
package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/pair-plan.sample-pass.json
@@ -0,0 +1,223 @@
+ {
+   "accepted_invariants": [
+     {
+       "authority": "expected.json/forbidden_patterns",
+       "id": "forbidden_pattern__silent_catch_returning_a_fallback_value_violates_no_silent_c__bin_cli_js",
+       "operational_check": "variant arm output MUST NOT contain regex pattern \"catch\\\\s*\\\\([^)]*\\\\)\\\\s*\\\\{[^}]*return\\\\s+(\\\\[\\\\]|null|undefined|\\\\{|false|'')\" in files ['bin/cli.js']; rationale: silent catch returning a fallback value — violates no-silent-catches policy",
+       "paraphrase": "variant arm output MUST NOT contain regex pattern \"catch\\\\s*\\\\([^)]*\\\\)\\\\s*\\\\{[^}]*return\\\\s+(\\\\[\\\\]|null|undefined|\\\\{|",
+       "source_refs": [
+         "expected.json:forbidden_patterns[0]"
+       ]
+     },
+     {
+       "authority": "expected.json/forbidden_patterns",
+       "id": "forbidden_pattern__ts_ignore_escape_hatch__bin_cli_js",
+       "operational_check": "variant arm output MUST NOT contain regex pattern '@ts-ignore' in files ['bin/cli.js']; rationale: @ts-ignore escape hatch",
+       "paraphrase": "variant arm output MUST NOT contain regex pattern '@ts-ignore' in files ['bin/cli.js']; rationale: @ts-ignore escape hat",
+       "source_refs": [
+         "expected.json:forbidden_patterns[1]"
+       ]
+     },
+     {
+       "authority": "expected.json/max_deps_added",
+       "id": "max_deps_added__0",
+       "operational_check": "variant arm MUST NOT add more than 0 new npm dependencies (count delta of package.json:dependencies + devDependencies)",
+       "paraphrase": "variant arm MUST NOT add more than 0 new npm dependencies (count delta of package.json:dependencies + devDependencies)",
+       "source_refs": [
+         "expected.json:max_deps_added"
+       ]
+     },
+     {
+       "authority": "expected.json/required_files",
+       "id": "required_file__bin_cli_js",
+       "operational_check": "variant arm output MUST contain file 'bin/cli.js' (created or preserved)",
+       "paraphrase": "variant arm output MUST contain file 'bin/cli.js' (created or preserved)",
+       "source_refs": [
+         "expected.json:required_files[bin/cli.js]"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "scope-tier-a:lockfile-deletion",
+       "operational_check": "variant arm MUST NOT delete a scaffold-present lockfile",
+       "paraphrase": "variant arm MUST NOT delete a scaffold-present lockfile",
+       "source_refs": [
+         "oracle-scope-tier-a.py"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "scope-tier-a:tier-a-violation",
+       "operational_check": "variant arm MUST NOT add or modify paths matching: docs/roadmap/** | docs/VISION.md | docs/ROADMAP.md | .github/** | node_modules/** | **/node_modules/** | test-results/** | coverage/** | .nyc_output/** | basename suffix .log | basename prefix .env or secrets.",
+       "paraphrase": "variant arm MUST NOT add or modify paths matching: docs/roadmap/** | docs/VISION.md | docs/ROADMAP.md | .github/** | nod",
+       "source_refs": [
+         "oracle-scope-tier-a.py"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "scope-tier-b:scope-unmatched",
+       "operational_check": "every variant-touched file MUST be either inside spec_output_files (Tier C) OR reachable from a Tier C seed via static JS/TS imports OR matched by expected.json:tier_a_waivers",
+       "paraphrase": "every variant-touched file MUST be either inside spec_output_files (Tier C) OR reachable from a Tier C seed via static J",
+       "source_refs": [
+         "oracle-scope-tier-b.py"
+       ]
+     },
+     {
+       "authority": "expected.json/spec_output_files",
+       "id": "spec_output_file__bin_cli_js",
+       "operational_check": "variant-touched files MUST be inside (or reachable via static imports from) the spec_output_files set; 'bin/cli.js' is one Tier C seed",
+       "paraphrase": "variant-touched files MUST be inside (or reachable via static imports from) the spec_output_files set; 'bin/cli.js' is o",
+       "source_refs": [
+         "expected.json:spec_output_files[bin/cli.js]"
+       ]
+     },
+     {
+       "authority": "expected.json/spec_output_files",
+       "id": "spec_output_file__tests_cli_test_js",
+       "operational_check": "variant-touched files MUST be inside (or reachable via static imports from) the spec_output_files set; 'tests/cli.test.js' is one Tier C seed",
+       "paraphrase": "variant-touched files MUST be inside (or reachable via static imports from) the spec_output_files set; 'tests/cli.test.j",
+       "source_refs": [
+         "expected.json:spec_output_files[tests/cli.test.js]"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "test-fidelity:assertion-regression",
+       "operational_check": "effective assertion count MUST NOT drop and skipped-test count MUST NOT rise; vacuous expect.assertions(0) is treated as a real regression",
+       "paraphrase": "effective assertion count MUST NOT drop and skipped-test count MUST NOT rise; vacuous expect.assertions(0) is treated as",
+       "source_refs": [
+         "oracle-test-fidelity.py"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "test-fidelity:mock-swap",
+       "operational_check": "post-arm test file MUST NOT swap REAL_PATTERNS hits for MOCK_PATTERNS hits (jest/vi/sinon, nock/msw, app.handle/inject/callback, hand-rolled IncomingMessage/ServerResponse, etc.); a drop in real_calls combined with a rise in mock_calls is a mock-swap flag",
+       "paraphrase": "post-arm test file MUST NOT swap REAL_PATTERNS hits for MOCK_PATTERNS hits (jest/vi/sinon, nock/msw, app.handle/inject/c",
+       "source_refs": [
+         "oracle-test-fidelity.py"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "test-fidelity:test-file-deleted",
+       "operational_check": "no scaffold-present test file may be deleted by the variant arm; deletion of an existing tests/*.test.* / *.spec.* / *.e2e.* file is a flag-severity finding",
+       "paraphrase": "no scaffold-present test file may be deleted by the variant arm; deletion of an existing tests/*.test.* / *.spec.* / *.e",
+       "source_refs": [
+         "oracle-test-fidelity.py"
+       ]
+     },
+     {
+       "authority": "metadata/oracle-allowlist",
+       "id": "test-fidelity:test-file-renamed",
+       "operational_check": "rename of a scaffold-present test file is warn-severity (content fidelity not verified across renames in step 1)",
+       "paraphrase": "rename of a scaffold-present test file is warn-severity (content fidelity not verified across renames in step 1)",
+       "source_refs": [
+         "oracle-test-fidelity.py"
+       ]
+     },
+     {
+       "authority": "expected.json/verification_commands",
+       "id": "verification__3f35982a",
+       "operational_check": "running `node bin/cli.js doctor` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doctor:']; stdout MUST NOT contain any of ['undefined', 'Error:']",
+       "paraphrase": "running `node bin/cli.js doctor` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doctor:'];",
+       "source_refs": [
+         "expected.json:verification_commands[0]"
+       ]
+     },
+     {
+       "authority": "expected.json/verification_commands",
+       "id": "verification__460fce04",
+       "operational_check": "running `HOME=/nonexistent node bin/cli.js doctor` in the post-arm work dir MUST exit with code 1; stdout MUST contain all of ['/nonexistent']; stdout MUST NOT contain any of []",
+       "paraphrase": "running `HOME=/nonexistent node bin/cli.js doctor` in the post-arm work dir MUST exit with code 1; stdout MUST contain a",
+       "source_refs": [
+         "expected.json:verification_commands[1]"
+       ]
+     },
+     {
+       "authority": "expected.json/verification_commands",
+       "id": "verification__973e287e",
+       "operational_check": "running `python3 -c \"import subprocess; r = subprocess.run(['node', 'bin/cli.js', 'doctor'], capture_output=True); n = r.stdout.count(b'\\x1b['); print(n); exit(0 if n == 0 else 1)\"` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['0']; stdout MUST NOT contain any of []",
+       "paraphrase": "running `python3 -c \"import subprocess; r = subprocess.run(['node', 'bin/cli.js', 'doctor'], capture_output=True); n = r",
+       "source_refs": [
+         "expected.json:verification_commands[2]"
+       ]
+     },
+     {
+       "authority": "expected.json/verification_commands",
+       "id": "verification__d6253a97",
+       "operational_check": "running `node bin/cli.js doctor --help` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doctor']; stdout MUST NOT contain any of []",
+       "paraphrase": "running `node bin/cli.js doctor --help` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doc",
+       "source_refs": [
+         "expected.json:verification_commands[3]"
+       ]
+     },
+     {
+       "authority": "expected.json/verification_commands",
+       "id": "verification__e0f149e4",
+       "operational_check": "running `node bin/cli.js --help` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doctor']; stdout MUST NOT contain any of []",
+       "paraphrase": "running `node bin/cli.js --help` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doctor']; ",
+       "source_refs": [
+         "expected.json:verification_commands[4]"
+       ]
+     },
+     {
+       "authority": "expected.json/verification_commands",
+       "id": "verification__fdbcd321",
+       "operational_check": "running `node bin/cli.js doctor --verbose` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['doctor:']; stdout MUST NOT contain any of ['Error:']",
+       "paraphrase": "running `node bin/cli.js doctor --verbose` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['",
+       "source_refs": [
+         "expected.json:verification_commands[5]"
+       ]
+     }
+   ],
+   "authority_order": [
+     "spec.md",
+     "expected.json/rubric",
+     "phase prompt",
+     "model preference"
+   ],
+   "escalated_to_user": [],
+   "fixture_id": "F2-cli-medium-subcommand",
+   "model_stamps": {
+     "claude": {
+       "blocked_ids": [],
+       "model": "claude-opus-4-7",
+       "signed_plan_sha256": "7329b8955a94ac680a7a58d09bfc53ce4de17609495b4bf4658cf2d3a43dacd5",
+       "status": "sign",
+       "timestamp": "2026-04-29T18:30:00Z"
+     },
+     "codex": {
+       "blocked_ids": [],
+       "model": "gpt-5.5",
+       "signed_plan_sha256": "7329b8955a94ac680a7a58d09bfc53ce4de17609495b4bf4658cf2d3a43dacd5",
+       "status": "sign",
+       "timestamp": "2026-04-29T18:31:00Z"
+     }
+   },
+   "plan_status": "final",
+   "planning_mode": "pair",
+   "rejected_alternatives": [],
+   "rounds": [
+     {
+       "claude_draft_sha256": "0000000000000000000000000000000000000000000000000000000000000000",
+       "codex_draft_sha256": "1111111111111111111111111111111111111111111111111111111111111111",
+       "merged_sha256": "2222222222222222222222222222222222222222222222222222222222222222",
+       "note": "sample-pass synthetic round (test fixture)",
+       "round": 1
+     }
+   ],
+   "schema_version": "1",
+   "source": {
+     "canonical_id_registry_path": "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json",
+     "canonical_id_registry_sha256": "98ac16e4536ea3ef2e51d3c728982c014211c193a742cea74f1331e4fbba76be",
+     "expected_path": "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json",
+     "expected_sha256": "ddef8feba49f20b6957e37840bc6a03e78e554776e380d81ad6390944c72fcab",
+     "rubric_path": "benchmark/auto-resolve/RUBRIC.md",
+     "rubric_sha256": "5b5b709a0b57f7e6f4fbc072af91e1edbc8d7910ae16b9b7be7170616aeaa9af",
+     "spec_path": "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md",
+     "spec_sha256": "9b0949c2afd4a522de2bdbbf267d93907fd908bf0f1d0dc5e111ee30ba875bb7"
+   },
+   "unresolved": []
+ }
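The first accepted invariant stores its regex twice escaped (once for JSON, once as a Python-style quoted string inside `operational_check`). Decoded, it reads as the raw pattern below. This is a minimal sketch of how such a `forbidden_patterns` entry could be applied to a file's text; `SILENT_CATCH` and `silent_catch_hits` are hypothetical names, and the harness's actual matching code is not part of this hunk. Two caveats fall out of the pattern as written: `catch\s*\(` requires a parenthesized binding, so an ES2019 optional-binding block such as `catch { return null }` does not match even though the F2 spec's "No silent error catches" constraint names exactly that shape; and `[^}]*` stops at the first closing brace, so a fallback `return` after a nested block inside the catch is missed. The F3 registry further down also uses a shorter alternation, `(null|undefined|'')`, so a `catch (e) { return [] }` would not match that variant.

```python
import re

# Raw regex decoded from the operational_check of the first accepted
# invariant (silent catch returning a fallback value, bin/cli.js).
SILENT_CATCH = re.compile(
    r"catch\s*\([^)]*\)\s*\{[^}]*return\s+(\[\]|null|undefined|\{|false|'')"
)


def silent_catch_hits(source: str) -> list[str]:
    """Return every substring of `source` that the forbidden pattern matches.

    `[^}]` also matches newlines, so multi-line catch bodies are scanned
    up to their first closing brace.
    """
    return [m.group(0) for m in SILENT_CATCH.finditer(source)]
```

An empty list means the file passes this particular invariant; any hit would be reported against `forbidden_patterns[0]`.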
package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/setup.sh
@@ -0,0 +1,5 @@
+ #!/usr/bin/env bash
+ # F2 setup — no base-repo modifications needed. The task starts from the
+ # stock test-repo baseline (hello/version CLI) and adds a new subcommand.
+ set -e
+ exit 0
package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md
@@ -0,0 +1,56 @@
+ ---
+ id: "F2-cli-medium-subcommand"
+ title: "Add `doctor` subcommand to bench-test-repo CLI"
+ status: planned
+ complexity: medium
+ depends-on: []
+ ---
+
+ # F2 Add `doctor` subcommand
+
+ ## Context
+
+ `bench-test-repo` users need a one-command way to diagnose their local
+ environment — node version, Claude Code install, plugins, skills — without
+ digging through the filesystem. A `doctor` subcommand lands that capability
+ inside the CLI itself.
+
+ ## Requirements
+
+ - [ ] `node bin/cli.js doctor` produces a status report and exits 0 on a clean machine.
+ - [ ] Node version check — requires `process.version >= v18.0.0`, emits a status line, marks FAIL if below.
+ - [ ] `$HOME/.claude/` check — exists as directory AND is writable. Missing → FAIL. Exists but not writable (EACCES) → FAIL with a distinct "permission" message.
+ - [ ] Installed plugins scan — read subdirectories of `$HOME/.claude/plugins/cache/` and print a summary line with the count; `--verbose` lists names.
+ - [ ] Installed skills scan — count files matching `$HOME/.claude/skills/**/SKILL.md`; print count; `--verbose` lists relative paths.
+ - [ ] Colored output with `[OK]` (green), `[WARN]` (yellow), `[FAIL]` (red) via ANSI escape codes **only when `process.stdout.isTTY` is true** — piped output must contain no `\x1b[` sequences.
+ - [ ] Summary line: `doctor: <N> ok, <M> warn, <K> fail`.
+ - [ ] Exit code: `0` if zero fails, `1` otherwise.
+ - [ ] `--verbose` flag expands details for plugins/skills scans.
+ - [ ] `node bin/cli.js doctor --help` prints a short help block and exits 0.
+ - [ ] `node bin/cli.js --help` lists `doctor` as an available subcommand.
+ - [ ] `HOME=/nonexistent node bin/cli.js doctor` prints a FAIL line clearly referencing the missing `/nonexistent/.claude` and exits 1.
+
+ ## Constraints
+
+ - **Zero new npm dependencies.** Use only Node.js built-ins (`fs`, `path`, `os`, `process`).
+ - **No silent error catches.** Do not wrap operations in `try { … } catch { return fallbackValue }`. All errors visible to the user with actionable messages.
+ - **HOME guard.** If `process.env.HOME` is undefined or empty, emit a clear FAIL line ("HOME environment variable is not set") and exit 1.
+ - **EACCES handling.** If `readdirSync` fails with EACCES, emit a permission-specific message quoting the offending path. Do not silently return an empty list.
+
+ - **Lifecycle note.** The harness's DOCS phase flips this spec's frontmatter `status` after implementation completes — that is benchmark lifecycle bookkeeping, not a scope violation.
+
+ ## Out of Scope
+
+ - Auto-repair (report only; do not offer to fix detected problems).
+ - Checking remote/registry state (npm, GitHub).
+ - Any feature requiring a new npm dependency.
+
+ ## Verification
+
+ - `node bin/cli.js doctor` exits 0 on a machine with `~/.claude` present.
+ - `HOME=/nonexistent node bin/cli.js doctor` prints a FAIL line referencing `/nonexistent/.claude` and exits 1.
+ - `node bin/cli.js doctor | cat` contains no `\x1b[` sequences.
+ - `node bin/cli.js doctor --help` prints help, exits 0.
+ - `node bin/cli.js --help` mentions `doctor`.
+ - `git diff -- package.json` is empty.
+ - `node bin/cli.js doctor --verbose` lists plugins and skills.
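Most of the Verification items above, and every `verification_commands` invariant in the sample pair plan, share one shape: run a command, compare its exit code, then require some substrings in stdout and forbid others. The sketch below captures that shape in Python, assuming commands arrive as argv lists plus an optional environment; the fixture itself stores shell strings such as `HOME=/nonexistent node bin/cli.js doctor`, so a real runner would presumably go through a shell. `run_check` is a hypothetical helper, not one of the harness scripts listed in this diff, and like the `operational_check` wording it inspects stdout only.

```python
import subprocess
from typing import Mapping, Optional, Sequence


def run_check(
    argv: Sequence[str],
    *,
    expect_exit: int,
    must_contain: Sequence[str] = (),
    must_not_contain: Sequence[str] = (),
    cwd: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Run one verification command; return failure messages (empty list = pass)."""
    # stderr is captured so it does not leak into the runner's own output,
    # but only stdout is checked, mirroring the invariant text.
    result = subprocess.run(
        list(argv), capture_output=True, text=True, cwd=cwd, env=env
    )
    failures = []
    if result.returncode != expect_exit:
        failures.append(f"exit {result.returncode} != {expect_exit}")
    for needle in must_contain:
        if needle not in result.stdout:
            failures.append(f"stdout missing {needle!r}")
    for needle in must_not_contain:
        if needle in result.stdout:
            failures.append(f"stdout contains forbidden {needle!r}")
    return failures
```

For example, the missing-HOME case could be expressed as `run_check(["node", "bin/cli.js", "doctor"], expect_exit=1, must_contain=["/nonexistent"], env={**os.environ, "HOME": "/nonexistent"}, cwd=work_dir)`, where `work_dir` stands for the post-arm work dir.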
package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/task.txt
@@ -0,0 +1,14 @@
+ Add a `doctor` subcommand to bench-test-repo's CLI (`bin/cli.js`) that diagnoses the local environment. When a user runs `node bin/cli.js doctor`, it should check:
+
+ 1. Node version is >= v18.0.0
+ 2. $HOME/.claude/ exists and is writable
+ 3. How many plugins are installed under $HOME/.claude/plugins/cache/
+ 4. How many skills are installed (SKILL.md files under $HOME/.claude/skills/)
+
+ Each check produces a status line with a tag — `[OK]`, `[WARN]`, or `[FAIL]`. The tags should be colored (green/yellow/red) when output is a TTY, but plain text when piped. End with a summary line like `doctor: 3 ok, 1 warn, 0 fail` and exit 0 if no fails, 1 otherwise.
+
+ Add a `--verbose` flag that lists the individual plugins and skill paths. Add `--help` for the subcommand. Make sure `node bin/cli.js --help` mentions `doctor`.
+
+ Keep it to Node.js built-ins only (fs, path, os, process) — no new npm dependencies. Handle errors explicitly: if HOME is unset, fail cleanly with a helpful message; if a directory can't be read due to permissions (EACCES), say so with the path.
+
+ Running `HOME=/nonexistent node bin/cli.js doctor` should exit 1 and mention the missing `/nonexistent/.claude`.
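The reporting rules in the task (version threshold, TTY-only color, summary line, exit code) are small but easy to get subtly wrong; a naive string comparison, for instance, says `"v9.11.2" >= "v18.0.0"`. The sketch below is a Python analog for illustration only, since the fixture requires the real implementation in `bin/cli.js` using Node built-ins, where the TTY test would be `process.stdout.isTTY` (in Python, `sys.stdout.isatty()`). `tag`, `version_at_least` and `summarize` are hypothetical names; the color codes are the standard ANSI SGR values for green, yellow and red.

```python
import re

# Standard ANSI SGR codes: 32 green, 33 yellow, 31 red; 0 resets.
COLORS = {"OK": "\x1b[32m", "WARN": "\x1b[33m", "FAIL": "\x1b[31m"}
RESET = "\x1b[0m"


def tag(status: str, is_tty: bool) -> str:
    """Render [OK]/[WARN]/[FAIL], colored only when writing to a terminal."""
    label = f"[{status}]"
    return f"{COLORS[status]}{label}{RESET}" if is_tty else label


def version_at_least(version: str, minimum=(18, 0, 0)) -> bool:
    """Compare a Node-style version such as 'v18.19.0' numerically."""
    # re.match anchors at the start only, so suffixes like '-nightly' are ignored.
    match = re.match(r"v(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        # Surface the problem instead of silently passing or failing the check.
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= minimum


def summarize(statuses: list[str]) -> tuple[str, int]:
    """Build the summary line and the exit code (1 if any check failed)."""
    ok, warn, fail = (statuses.count(s) for s in ("OK", "WARN", "FAIL"))
    return f"doctor: {ok} ok, {warn} warn, {fail} fail", (0 if fail == 0 else 1)
```

Keeping the color decision in one function is what makes the piped-output requirement (no `\x1b[` sequences) easy to test.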
package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md
@@ -0,0 +1,28 @@
+ # F3 — Notes
+
+ ## Purpose
+
+ High-risk contract change. Exercises the pipeline's ability to catch
+ **breaking changes hidden inside a reasonable-looking refactor**: lazy
+ implementations wrap `items` in an envelope but forget to update tests, or
+ update tests but forget backward-compat requirements (single-item route,
+ `items` key), or paginate without validating query params.
+
+ ## Failure modes detected
+
+ - **Test lie**: arm changes the handler but leaves old `assert.ok(Array.isArray(body.items))` that still passes against `{ items: [...] }` inside the envelope → test passes but new paging fields aren't asserted. Fixture requires ≥ 2 NEW tests.
+ - **Query-param trust**: accepts `?per_page=abc` → `parseInt` returns `NaN` → handler explodes or silently treats as default. Fixture requires explicit 400.
+ - **Contract drift on single-item lookup**: arm paginates `/items/:id` too, breaking existing clients.
+ - **Silent catch**: wrapping `Number(req.query.page)` in a `try/catch { return [] }` — caught by forbidden pattern.
+
+ ## Pipeline exercise
+
+ - Phase 0 routing: likely `strict` (no risk keywords, but cross-file multi-function change may escalate in Stage B).
+ - Phase 1 BUILD: Codex BUILD produces the implementation.
+ - Phase 1.4 BUILD GATE: `node --test tests/server.test.js` must pass.
+ - Phase 2 EVAL: scrutinizes that new tests cover new behavior (not just rename passes).
+ - Phase 3 CRITIC design: checks invalid-query branch and backward-compat.
+
+ ## Rotation trigger
+
+ Retire when both arms consistently score > 95 AND produce 2+ new tests covering paging edge cases without pipeline intervention.
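The "query-param trust" and "contract drift" failure modes above can be made concrete with a small pagination sketch. It is written in Python for illustration, although the fixture's server is Node (`server/index.js`); the F3 `spec.md` is not shown in this section, so the envelope field names (`page`, `per_page`, `total`), the default page size of 10, and the `BadRequest` exception are all assumptions. The point it shows: validate the raw strings before converting them, so `?per_page=abc` becomes an explicit 400 rather than a `NaN`, and keep the original `items` key inside the envelope.

```python
import re


class BadRequest(Exception):
    """Raised for invalid query parameters; a handler would map it to HTTP 400."""


def positive_int_param(query: dict, name: str, default: int) -> int:
    raw = query.get(name)
    if raw is None:
        return default
    # ASCII digits only: "abc", "-1", "1.5" and "" are rejected up front
    # instead of turning into NaN-style garbage further down.
    if not re.fullmatch(r"[0-9]+", raw) or int(raw) < 1:
        raise BadRequest(f"{name} must be a positive integer, got {raw!r}")
    return int(raw)


def paginate(items: list, query: dict, default_per_page: int = 10) -> dict:
    page = positive_int_param(query, "page", 1)
    per_page = positive_int_param(query, "per_page", default_per_page)
    start = (page - 1) * per_page
    # Keep the existing `items` key inside the new envelope for old clients.
    return {
        "items": items[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": len(items),
    }
```

Only the list route would call `paginate`; the single-item lookup (`/items/:id`) stays untouched, which is exactly the backward-compat branch the CRITIC phase is expected to check.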
package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected-pair-plan-registry.json
@@ -0,0 +1,162 @@
+ {
+   "fixture_id": "F3-backend-contract-risk",
+   "generated_at": "2026-04-29T09:57:53Z",
+   "generated_from": {
+     "expected_path": "benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json",
+     "expected_sha256": "c0925ee948179fbc1c76836d98fba0c14c7eba56f462f2922903951923cb22e6",
+     "metadata_path": "benchmark/auto-resolve/fixtures/F3-backend-contract-risk/metadata.json",
+     "metadata_sha256": "c54530db26dbb04ce50b698fed2608206eae6f9a5dc2f666f127695e15d3fa30",
+     "oracle_script_shas": {
+       "scope-tier-a": "baaf21ed4a67f35d2a8af825e72869ef9737b5dfe08d65dd1a11c26fafe297ae",
+       "scope-tier-b": "9349d00a5c7456a4df9142923334e7004407d53f2443f2e210945bb771971e25",
+       "test-fidelity": "401184da51ae500cecfc75a6c5819b0d28acb63a397f788fb628c2913562f903"
+     }
+   },
+   "required_invariants": [
+     {
+       "authority": "expected.json/forbidden_patterns",
+       "id": "forbidden_pattern__eslint_disable_without_explicit_per_issue_justification__server_index_js",
+       "operational_check": "variant arm output MUST NOT contain regex pattern '/\\\\*\\\\s*eslint-disable' in files ['server/index.js']; rationale: eslint-disable without explicit per-issue justification",
+       "severity": "disqualifier",
+       "source_field": "expected.json/forbidden_patterns/1",
+       "source_ref": "expected.json:forbidden_patterns[1]"
+     },
+     {
+       "authority": "expected.json/forbidden_patterns",
+       "id": "forbidden_pattern__silent_catch_returning_fallback__server_index_js",
+       "operational_check": "variant arm output MUST NOT contain regex pattern \"catch\\\\s*\\\\([^)]*\\\\)\\\\s*\\\\{[^}]*return\\\\s+(null|undefined|'')\" in files ['server/index.js']; rationale: silent catch returning fallback",
+       "severity": "disqualifier",
+       "source_field": "expected.json/forbidden_patterns/0",
+       "source_ref": "expected.json:forbidden_patterns[0]"
+     },
+     {
+       "authority": "expected.json/max_deps_added",
+       "id": "max_deps_added__0",
+       "operational_check": "variant arm MUST NOT add more than 0 new npm dependencies (count delta of package.json:dependencies + devDependencies)",
+       "severity": "hard",
+       "source_field": "expected.json/max_deps_added",
+       "source_ref": "expected.json:max_deps_added"
+     },
+     {
+       "authority": "expected.json/required_files",
+       "id": "required_file__server_index_js",
+       "operational_check": "variant arm output MUST contain file 'server/index.js' (created or preserved)",
+       "severity": "hard",
+       "source_field": "expected.json/required_files",
+       "source_ref": "expected.json:required_files[server/index.js]"
+     },
+     {
+       "authority": "expected.json/required_files",
+       "id": "required_file__tests_server_test_js",
51
+ "operational_check": "variant arm output MUST contain file 'tests/server.test.js' (created or preserved)",
52
+ "severity": "hard",
53
+ "source_field": "expected.json/required_files",
54
+ "source_ref": "expected.json:required_files[tests/server.test.js]"
55
+ },
56
+ {
57
+ "authority": "metadata/oracle-allowlist",
58
+ "id": "scope-tier-a:lockfile-deletion",
59
+ "operational_check": "variant arm MUST NOT delete a scaffold-present lockfile",
60
+ "severity": "hard",
61
+ "source_field": "oracle/scope-tier-a/scope-tier-a:lockfile-deletion",
62
+ "source_ref": "oracle-scope-tier-a.py"
63
+ },
64
+ {
65
+ "authority": "metadata/oracle-allowlist",
66
+ "id": "scope-tier-a:tier-a-violation",
67
+ "operational_check": "variant arm MUST NOT add or modify paths matching: docs/roadmap/** | docs/VISION.md | docs/ROADMAP.md | .github/** | node_modules/** | **/node_modules/** | test-results/** | coverage/** | .nyc_output/** | basename suffix .log | basename prefix .env or secrets.",
68
+ "severity": "hard",
69
+ "source_field": "oracle/scope-tier-a/scope-tier-a:tier-a-violation",
70
+ "source_ref": "oracle-scope-tier-a.py"
71
+ },
72
+ {
73
+ "authority": "metadata/oracle-allowlist",
74
+ "id": "scope-tier-b:scope-unmatched",
75
+ "operational_check": "every variant-touched file MUST be either inside spec_output_files (Tier C) OR reachable from a Tier C seed via static JS/TS imports OR matched by expected.json:tier_a_waivers",
76
+ "severity": "warn",
77
+ "source_field": "oracle/scope-tier-b/scope-tier-b:scope-unmatched",
78
+ "source_ref": "oracle-scope-tier-b.py"
79
+ },
80
+ {
81
+ "authority": "expected.json/spec_output_files",
82
+ "id": "spec_output_file__server_index_js",
83
+ "operational_check": "variant-touched files MUST be inside (or reachable via static imports from) the spec_output_files set; 'server/index.js' is one Tier C seed",
84
+ "severity": "warn",
85
+ "source_field": "expected.json/spec_output_files",
86
+ "source_ref": "expected.json:spec_output_files[server/index.js]"
87
+ },
88
+ {
89
+ "authority": "expected.json/spec_output_files",
90
+ "id": "spec_output_file__tests_server_test_js",
91
+ "operational_check": "variant-touched files MUST be inside (or reachable via static imports from) the spec_output_files set; 'tests/server.test.js' is one Tier C seed",
92
+ "severity": "warn",
93
+ "source_field": "expected.json/spec_output_files",
94
+ "source_ref": "expected.json:spec_output_files[tests/server.test.js]"
95
+ },
96
+ {
97
+ "authority": "metadata/oracle-allowlist",
98
+ "id": "test-fidelity:assertion-regression",
99
+ "operational_check": "effective assertion count MUST NOT drop and skipped-test count MUST NOT rise; vacuous expect.assertions(0) is treated as a real regression",
100
+ "severity": "warn",
101
+ "source_field": "oracle/test-fidelity/test-fidelity:assertion-regression",
102
+ "source_ref": "oracle-test-fidelity.py"
103
+ },
104
+ {
105
+ "authority": "metadata/oracle-allowlist",
106
+ "id": "test-fidelity:mock-swap",
107
+ "operational_check": "post-arm test file MUST NOT swap REAL_PATTERNS hits for MOCK_PATTERNS hits (jest/vi/sinon, nock/msw, app.handle/inject/callback, hand-rolled IncomingMessage/ServerResponse, etc.); a drop in real_calls combined with a rise in mock_calls is a mock-swap flag",
108
+ "severity": "flag",
109
+ "source_field": "oracle/test-fidelity/test-fidelity:mock-swap",
110
+ "source_ref": "oracle-test-fidelity.py"
111
+ },
112
+ {
113
+ "authority": "metadata/oracle-allowlist",
114
+ "id": "test-fidelity:test-file-deleted",
115
+ "operational_check": "no scaffold-present test file may be deleted by the variant arm; deletion of an existing tests/*.test.* / *.spec.* / *.e2e.* file is a flag-severity finding",
116
+ "severity": "flag",
117
+ "source_field": "oracle/test-fidelity/test-fidelity:test-file-deleted",
118
+ "source_ref": "oracle-test-fidelity.py"
119
+ },
120
+ {
121
+ "authority": "metadata/oracle-allowlist",
122
+ "id": "test-fidelity:test-file-renamed",
123
+ "operational_check": "rename of a scaffold-present test file is warn-severity (content fidelity not verified across renames in step 1)",
124
+ "severity": "warn",
125
+ "source_field": "oracle/test-fidelity/test-fidelity:test-file-renamed",
126
+ "source_ref": "oracle-test-fidelity.py"
127
+ },
128
+ {
129
+ "authority": "expected.json/verification_commands",
130
+ "id": "verification__6001efe2",
131
+ "operational_check": "running `node -e 'const { app } = require(\"./server\"); const http = require(\"http\"); const s = http.createServer(app).listen(0, () => { const { port } = s.address(); http.get(`http://127.0.0.1:${port}/items?per_page=abc`, r => { console.log(r.statusCode); s.close(); }); });'` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['400']; stdout MUST NOT contain any of []",
132
+ "severity": "hard",
133
+ "source_field": "expected.json/verification_commands/3",
134
+ "source_ref": "expected.json:verification_commands[3]"
135
+ },
136
+ {
137
+ "authority": "expected.json/verification_commands",
138
+ "id": "verification__6517d995",
139
+ "operational_check": "running `node -e 'const { app } = require(\"./server\"); const http = require(\"http\"); const s = http.createServer(app).listen(0, () => { const { port } = s.address(); http.get(`http://127.0.0.1:${port}/items`, r => { let b = \"\"; r.on(\"data\", c=>b+=c); r.on(\"end\", () => { const d = JSON.parse(b); console.log(JSON.stringify({ total: d.total, page: d.page, per_page: d.per_page, items_len: d.items.length })); s.close(); }); }); });'` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['\"total\":2', '\"page\":1']; stdout MUST NOT contain any of []",
140
+ "severity": "hard",
141
+ "source_field": "expected.json/verification_commands/1",
142
+ "source_ref": "expected.json:verification_commands[1]"
143
+ },
144
+ {
145
+ "authority": "expected.json/verification_commands",
146
+ "id": "verification__73df5e81",
147
+ "operational_check": "running `node -e 'const { app } = require(\"./server\"); const http = require(\"http\"); const s = http.createServer(app).listen(0, () => { const { port } = s.address(); http.get(`http://127.0.0.1:${port}/items?page=2&per_page=1`, r => { let b = \"\"; r.on(\"data\", c=>b+=c); r.on(\"end\", () => { const d = JSON.parse(b); console.log(d.items[0] && d.items[0].name); s.close(); }); }); });'` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of ['beta']; stdout MUST NOT contain any of []",
148
+ "severity": "hard",
149
+ "source_field": "expected.json/verification_commands/2",
150
+ "source_ref": "expected.json:verification_commands[2]"
151
+ },
152
+ {
153
+ "authority": "expected.json/verification_commands",
154
+ "id": "verification__7c5f3637",
155
+ "operational_check": "running `node --test tests/server.test.js` in the post-arm work dir MUST exit with code 0; stdout MUST contain all of []; stdout MUST NOT contain any of ['not ok ']",
156
+ "severity": "hard",
157
+ "source_field": "expected.json/verification_commands/0",
158
+ "source_ref": "expected.json:verification_commands[0]"
159
+ }
160
+ ],
161
+ "schema_version": "1"
162
+ }
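The two disqualifier invariants above are regex scans of `server/index.js`. A minimal sketch of such a scan, assuming each pattern string is compiled with `new RegExp` and no flags (the registry does not say which flags the oracle uses); `findForbidden` and the sample snippets are hypothetical:

```javascript
// Hypothetical re-implementation of the forbidden_patterns scan (not the harness's code).
// Pattern strings are copied from expected.json; JSON and JS share the same escaping here.
const FORBIDDEN_PATTERNS = [
  {
    pattern: "catch\\s*\\([^)]*\\)\\s*\\{[^}]*return\\s+(null|undefined|'')",
    description: "silent catch returning fallback",
  },
  {
    pattern: "/\\*\\s*eslint-disable",
    description: "eslint-disable without explicit per-issue justification",
  },
];

// Returns the description of every forbidden pattern that matches `source`.
function findForbidden(source) {
  return FORBIDDEN_PATTERNS
    .filter(({ pattern }) => new RegExp(pattern).test(source))
    .map(({ description }) => description);
}

// Hypothetical snippets probing what the patterns do and do not match.
const samples = {
  nullFallback: "try { n = parse(q) } catch (e) { return null }",
  arrayFallback: "try { n = parse(q) } catch (e) { return [] }",
  optionalBinding: "try { n = parse(q) } catch { return null }",
  blockDisable: "/* eslint-disable no-unused-vars */",
  lineDisable: "// eslint-disable-next-line no-console",
};
for (const [name, src] of Object.entries(samples)) {
  console.log(name, findForbidden(src));
}
```

The sample results mark the silent-catch pattern's blind spots: a `return []` fallback and an optional catch binding (`catch { ... }`) do not match, and neither does a `// eslint-disable` line comment, so those variants would have to be caught by other checks.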
@@ -0,0 +1,65 @@
1
+ {
2
+ "verification_commands": [
3
+ {
4
+ "cmd": "node --test tests/server.test.js",
5
+ "exit_code": 0,
6
+ "stdout_contains": [],
7
+ "stdout_not_contains": [
8
+ "not ok "
9
+ ]
10
+ },
11
+ {
12
+ "cmd": "node -e 'const { app } = require(\"./server\"); const http = require(\"http\"); const s = http.createServer(app).listen(0, () => { const { port } = s.address(); http.get(`http://127.0.0.1:${port}/items`, r => { let b = \"\"; r.on(\"data\", c=>b+=c); r.on(\"end\", () => { const d = JSON.parse(b); console.log(JSON.stringify({ total: d.total, page: d.page, per_page: d.per_page, items_len: d.items.length })); s.close(); }); }); });'",
13
+ "exit_code": 0,
14
+ "stdout_contains": [
15
+ "\"total\":2",
16
+ "\"page\":1"
17
+ ],
18
+ "stdout_not_contains": []
19
+ },
20
+ {
21
+ "cmd": "node -e 'const { app } = require(\"./server\"); const http = require(\"http\"); const s = http.createServer(app).listen(0, () => { const { port } = s.address(); http.get(`http://127.0.0.1:${port}/items?page=2&per_page=1`, r => { let b = \"\"; r.on(\"data\", c=>b+=c); r.on(\"end\", () => { const d = JSON.parse(b); console.log(d.items[0] && d.items[0].name); s.close(); }); }); });'",
22
+ "exit_code": 0,
23
+ "stdout_contains": [
24
+ "beta"
25
+ ],
26
+ "stdout_not_contains": []
27
+ },
28
+ {
29
+ "cmd": "node -e 'const { app } = require(\"./server\"); const http = require(\"http\"); const s = http.createServer(app).listen(0, () => { const { port } = s.address(); http.get(`http://127.0.0.1:${port}/items?per_page=abc`, r => { console.log(r.statusCode); s.close(); }); });'",
30
+ "exit_code": 0,
31
+ "stdout_contains": [
32
+ "400"
33
+ ],
34
+ "stdout_not_contains": []
35
+ }
36
+ ],
37
+ "forbidden_patterns": [
38
+ {
39
+ "pattern": "catch\\s*\\([^)]*\\)\\s*\\{[^}]*return\\s+(null|undefined|'')",
40
+ "description": "silent catch returning fallback",
41
+ "files": [
42
+ "server/index.js"
43
+ ],
44
+ "severity": "disqualifier"
45
+ },
46
+ {
47
+ "pattern": "/\\*\\s*eslint-disable",
48
+ "description": "eslint-disable without explicit per-issue justification",
49
+ "files": [
50
+ "server/index.js"
51
+ ],
52
+ "severity": "disqualifier"
53
+ }
54
+ ],
55
+ "required_files": [
56
+ "server/index.js",
57
+ "tests/server.test.js"
58
+ ],
59
+ "forbidden_files": [],
60
+ "max_deps_added": 0,
61
+ "spec_output_files": [
62
+ "server/index.js",
63
+ "tests/server.test.js"
64
+ ]
65
+ }
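Each `verification_commands` entry pairs a shell command with an expected exit code and stdout substring checks. A minimal sketch of evaluating one entry, assuming a POSIX shell and plain substring matching; `runVerification` is hypothetical, and the sample entries are `node -e` stand-ins that do not start the fixture's server:

```javascript
// Hypothetical evaluator for one expected.json verification_commands entry (not harness code).
const { spawnSync } = require("node:child_process");

// Runs entry.cmd through the shell in `cwd` and returns a list of failure reasons
// (an empty list means the entry passed).
function runVerification(entry, cwd = process.cwd()) {
  const res = spawnSync(entry.cmd, { cwd, shell: true, encoding: "utf8" });
  const stdout = res.stdout || "";
  const failures = [];
  if (res.error) failures.push(`spawn error: ${res.error.message}`);
  if (res.status !== entry.exit_code) {
    failures.push(`exit code ${res.status}, expected ${entry.exit_code}`);
  }
  for (const needle of entry.stdout_contains) {
    if (!stdout.includes(needle)) failures.push(`stdout missing ${JSON.stringify(needle)}`);
  }
  for (const needle of entry.stdout_not_contains) {
    if (stdout.includes(needle)) failures.push(`stdout contains forbidden ${JSON.stringify(needle)}`);
  }
  return failures;
}

// Stand-in entries shaped like expected.json. process.execPath (quoted) keeps them
// independent of whether `node` is on PATH.
const NODE = JSON.stringify(process.execPath);
const passing = {
  cmd: `${NODE} -e 'console.log(JSON.stringify({ total: 2, page: 1 }))'`,
  exit_code: 0,
  stdout_contains: ['"total":2', '"page":1'],
  stdout_not_contains: [],
};
const failing = {
  cmd: `${NODE} -e 'console.log("not ok 1 - paging"); process.exit(1)'`,
  exit_code: 0,
  stdout_contains: [],
  stdout_not_contains: ["not ok "],
};
console.log(runVerification(passing));
console.log(runVerification(failing));
```

Substring checks need distinctive markers: the summary that `node --test` prints includes a `fail 0` line even when every test passes, whereas `not ok ` is the TAP reporter's marker for a failed test.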
@@ -0,0 +1,19 @@
1
+ {
2
+ "id": "F3-backend-contract-risk",
3
+ "category": "high-risk",
4
+ "difficulty": "high",
5
+ "timeout_seconds": 1500,
6
+ "required_tools": ["node"],
7
+ "browser": false,
8
+ "deps_change_expected": false,
9
+ "intent": "Change the GET /items response shape from { items } to a paginated { items, total, page, per_page } while keeping the existing 1-based id semantics and updating tests. A lazy implementation will leave tests broken or drop the items array; the spec requires both passing tests and the items array at its original key.",
10
+ "pair_plan_oracle_categories": [
11
+ "scope-tier-a:lockfile-deletion",
12
+ "scope-tier-a:tier-a-violation",
13
+ "scope-tier-b:scope-unmatched",
14
+ "test-fidelity:assertion-regression",
15
+ "test-fidelity:mock-swap",
16
+ "test-fidelity:test-file-deleted",
17
+ "test-fidelity:test-file-renamed"
18
+ ]
19
+ }
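With `deps_change_expected: false`, the `max_deps_added__0` invariant reduces to comparing the scaffold and post-arm `package.json` files. A minimal sketch, assuming both manifests are already parsed into objects; `depsAdded` is hypothetical. It counts names present after the arm but not before, across `dependencies` and `devDependencies`, which is stricter than a raw count delta: swapping one package for another leaves the count unchanged but still adds a dependency.

```javascript
// Hypothetical sketch of the max_deps_added check (not the harness's code).

// Union of dependency names across dependencies + devDependencies.
function depNames(pkg) {
  return new Set([
    ...Object.keys(pkg.dependencies || {}),
    ...Object.keys(pkg.devDependencies || {}),
  ]);
}

// Number of dependency names present after the arm ran but absent before.
// Moving a package between dependencies and devDependencies counts as 0.
function depsAdded(beforePkg, afterPkg) {
  const before = depNames(beforePkg);
  return [...depNames(afterPkg)].filter((name) => !before.has(name)).length;
}

// Sample manifests with placeholder versions.
const scaffold = { dependencies: { express: "^4.0.0" } };
const arm = { dependencies: { express: "^4.0.0", lodash: "^4.0.0" } };
console.log(depsAdded(scaffold, arm));
```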
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env bash
2
+ # F3 setup — no changes to base test-repo. Task modifies existing server/index.js.
3
+ set -e
4
+ exit 0
@@ -0,0 +1,56 @@
1
+ ---
2
+ id: "F3-backend-contract-risk"
3
+ title: "Paginate GET /items response"
4
+ status: planned
5
+ complexity: high
6
+ depends-on: []
7
+ ---
8
+
9
+ # F3 Paginate `GET /items`
10
+
11
+ ## Context
12
+
13
+ `server/index.js` currently returns `{ items: [...] }` for `GET /items` with
14
+ no pagination metadata. As the dataset grows, clients need a `total` and
15
+ paging information. The task: wrap the existing response in a pagination
16
+ envelope, accept `?page` and `?per_page` query parameters, and update tests
17
+ so existing assertions continue to pass alongside new paging assertions.
18
+
19
+ ## Requirements
20
+
21
+ - [ ] `GET /items` (no query) returns `{ items, total, page, per_page }` where:
22
+ - `items` is the full list (baseline repo has 2 items).
23
+ - `total` is the full item count.
24
+ - `page` is `1`.
25
+ - `per_page` is the full item count when no pagination was requested.
26
+ - [ ] `GET /items?page=1&per_page=1` returns the first item wrapped in the envelope with `items.length === 1`, `total === 2`, `page === 1`, `per_page === 1`.
27
+ - [ ] `GET /items?page=2&per_page=1` returns the second item similarly.
28
+ - [ ] `GET /items?page=99&per_page=1` returns `items: []`, `total === 2`, `page === 99`, `per_page === 1` (out-of-range page is allowed — bare empty array, never a 404).
29
+ - [ ] `GET /items/:id` behavior unchanged (the per-item route does NOT get paginated).
30
+ - [ ] `tests/server.test.js` is updated so every existing assertion still holds (semantically) AND the new paging behavior is covered by at least two new tests.
31
+ - [ ] `GET /health` continues to return `{ status: 'ok' }` unchanged.
32
+
33
+ ## Constraints
34
+
35
+ - **No new npm dependencies.** Use only Express + built-ins already in the repo.
36
+ - **No silent catches.** Invalid `page` or `per_page` (non-numeric, zero, negative) must respond 400 with `{ error: 'invalid_query', field }`.
37
+ - **No breaking change to `/items/:id`.** The per-item route must keep its current contract (the fixture explicitly does NOT paginate single-item lookups).
38
+ - **Backward-compat note**: clients that previously read `response.items` MUST still get the array at the same key inside the new envelope.
39
+
40
+ - **Lifecycle note**: the harness's DOCS phase flips this spec's frontmatter `status` after implementation completes. That is benchmark lifecycle bookkeeping, not a scope violation.
41
+
42
+ ## Out of Scope
43
+
44
+ - Caching, rate limiting, authentication.
45
+ - Converting `items` to a database-backed list.
46
+ - Touching `bin/cli.js`, `web/`, or `tests/cli.test.js`.
47
+ - Adding a new route.
48
+
49
+ ## Verification
50
+
51
+ - Server start: `node server/index.js` listens on port 3000 (exit via SIGINT).
52
+ - `curl -s http://127.0.0.1:3000/items | jq '.total'` returns `2`.
53
+ - `curl -s 'http://127.0.0.1:3000/items?per_page=1&page=2' | jq '.items[0].name'` returns `"beta"`.
54
+ - `curl -s 'http://127.0.0.1:3000/items?per_page=abc' -o /dev/null -w '%{http_code}'` returns `400`.
55
+ - `node --test tests/server.test.js` passes; must include ≥ 2 new paging tests.
56
+ - `git diff --stat` shows only `server/index.js` and `tests/server.test.js` touched.
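The envelope requirements come down to one slice plus three metadata fields. A minimal sketch, assuming `page` and `per_page` were validated beforehand (non-numeric, zero, and negative values already answered with 400) and using a stand-in list in which only the second name, `beta`, comes from the spec (`alpha` is a placeholder); `paginate` is a hypothetical helper:

```javascript
// Hypothetical sketch of the paginated envelope described in the spec (not fixture code).
// Assumes page and perPage are already validated integers >= 1.
function paginate(items, page, perPage) {
  const start = (page - 1) * perPage;
  return {
    items: items.slice(start, start + perPage),
    total: items.length,
    page,
    per_page: perPage,
  };
}

// Stand-in data: "alpha" is a placeholder; only "beta" is named in the spec.
const items = [{ id: 1, name: "alpha" }, { id: 2, name: "beta" }];
console.log(paginate(items, 1, items.length)); // no query: everything on page 1
console.log(paginate(items, 2, 1));
console.log(paginate(items, 99, 1));
```

`Array.prototype.slice` clamps out-of-range indices, so page 99 yields `items: []` with `total`, `page`, and `per_page` intact, matching the spec's no-404 rule; `/items/:id` and `/health` are untouched by this helper.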