devlyn-cli 2.2.2 → 2.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (220)
  1. package/AGENTS.md +2 -2
  2. package/CLAUDE.md +4 -4
  3. package/README.md +85 -34
  4. package/benchmark/auto-resolve/BENCHMARK-DESIGN.md +61 -44
  5. package/benchmark/auto-resolve/BENCHMARK-RESULTS.md +341 -0
  6. package/benchmark/auto-resolve/README.md +307 -44
  7. package/benchmark/auto-resolve/RUBRIC.md +23 -14
  8. package/benchmark/auto-resolve/fixtures/F1-cli-trivial-flag/NOTES.md +7 -3
  9. package/benchmark/auto-resolve/fixtures/F10-persist-write-collision/NOTES.md +8 -3
  10. package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/NOTES.md +8 -3
  11. package/benchmark/auto-resolve/fixtures/F12-webhook-raw-body-signature/NOTES.md +10 -4
  12. package/benchmark/auto-resolve/fixtures/F15-frozen-diff-race-review/NOTES.md +10 -4
  13. package/benchmark/auto-resolve/fixtures/F16-cli-quote-tax-rules/NOTES.md +12 -0
  14. package/benchmark/auto-resolve/fixtures/F16-cli-quote-tax-rules/spec.md +6 -0
  15. package/benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/NOTES.md +7 -4
  16. package/benchmark/auto-resolve/fixtures/F21-cli-scheduler-priority/NOTES.md +12 -0
  17. package/benchmark/auto-resolve/fixtures/F21-cli-scheduler-priority/spec.md +6 -0
  18. package/benchmark/auto-resolve/fixtures/F22-cli-ledger-close/NOTES.md +8 -0
  19. package/benchmark/auto-resolve/fixtures/F23-cli-fulfillment-wave/NOTES.md +12 -0
  20. package/benchmark/auto-resolve/fixtures/F23-cli-fulfillment-wave/spec.md +6 -0
  21. package/benchmark/auto-resolve/fixtures/F25-cli-cart-promotion-rules/NOTES.md +16 -4
  22. package/benchmark/auto-resolve/fixtures/F25-cli-cart-promotion-rules/spec.md +7 -0
  23. package/benchmark/auto-resolve/fixtures/F26-cli-payout-ledger-rules/NOTES.md +11 -5
  24. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/NOTES.md +8 -1
  25. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/expected.json +4 -2
  26. package/benchmark/auto-resolve/fixtures/F3-backend-contract-risk/spec.md +1 -1
  27. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/NOTES.md +34 -0
  28. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/expected.json +57 -0
  29. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/metadata.json +10 -0
  30. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/setup.sh +2 -0
  31. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/spec.md +67 -0
  32. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/task.txt +7 -0
  33. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/verifiers/duplicate-event-error.js +35 -0
  34. package/benchmark/auto-resolve/fixtures/F31-cli-seat-rebalance/verifiers/priority-transfer-rollback.js +53 -0
  35. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/NOTES.md +38 -0
  36. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/expected.json +57 -0
  37. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/metadata.json +10 -0
  38. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/setup.sh +2 -0
  39. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/spec.md +70 -0
  40. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/task.txt +3 -0
  41. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/verifiers/duplicate-renewal-error.js +42 -0
  42. package/benchmark/auto-resolve/fixtures/F32-cli-subscription-renewal/verifiers/priority-credit-rollback.js +70 -0
  43. package/benchmark/auto-resolve/fixtures/F4-web-browser-design/NOTES.md +10 -3
  44. package/benchmark/auto-resolve/fixtures/F5-fix-loop-red-green/NOTES.md +7 -0
  45. package/benchmark/auto-resolve/fixtures/F6-dep-audit-native-module/NOTES.md +5 -0
  46. package/benchmark/auto-resolve/fixtures/F7-out-of-scope-trap/NOTES.md +7 -0
  47. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/NOTES.md +3 -0
  48. package/benchmark/auto-resolve/fixtures/F8-known-limit-ambiguous/spec.md +1 -1
  49. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/NOTES.md +15 -3
  50. package/benchmark/auto-resolve/fixtures/F9-e2e-ideate-to-resolve/spec.md +1 -1
  51. package/benchmark/auto-resolve/fixtures/SCHEMA.md +53 -7
  52. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/NOTES.md +37 -0
  53. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/RETIRED.md +13 -0
  54. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/expected.json +56 -0
  55. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/metadata.json +10 -0
  56. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/setup.sh +18 -0
  57. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/spec.md +69 -0
  58. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/task.txt +7 -0
  59. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/verifiers/exact-proration.js +48 -0
  60. package/benchmark/auto-resolve/fixtures/retired/F27-cli-subscription-proration/verifiers/rules-source-and-conflict.js +79 -0
  61. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/NOTES.md +54 -0
  62. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/RETIRED.md +7 -0
  63. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/expected.json +67 -0
  64. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/metadata.json +10 -0
  65. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/setup.sh +2 -0
  66. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/spec.md +67 -0
  67. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/task.txt +5 -0
  68. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/policy-precedence.js +72 -0
  69. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/validation-and-immutability.js +43 -0
  70. package/benchmark/auto-resolve/fixtures/retired/F28-cli-return-authorization/verifiers/validation-boundary.js +116 -0
  71. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/NOTES.md +35 -0
  72. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/RETIRED.md +12 -0
  73. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/expected.json +58 -0
  74. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/metadata.json +10 -0
  75. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/setup.sh +2 -0
  76. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/spec.md +73 -0
  77. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/task.txt +17 -0
  78. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/verifiers/mixed-idempotent-settlement.js +53 -0
  79. package/benchmark/auto-resolve/fixtures/retired/F30-cli-credit-hold-settlement/verifiers/rejection-boundaries.js +74 -0
  80. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/NOTES.md +60 -0
  81. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/RETIRED.md +29 -0
  82. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/expected.json +73 -0
  83. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/metadata.json +10 -0
  84. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/setup.sh +28 -0
  85. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/spec.md +58 -0
  86. package/benchmark/auto-resolve/fixtures/retired/F9-e2e-ideate-to-preflight/task.txt +5 -0
  87. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/full-pipeline-pair-gate.json +82 -0
  88. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/full-pipeline-pair-gate.md +18 -0
  89. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/headroom-gate.json +46 -0
  90. package/benchmark/auto-resolve/results/20260510-f16-f23-f25-combined-proof/headroom-gate.md +17 -0
  91. package/benchmark/auto-resolve/run-real-benchmark.md +303 -0
  92. package/benchmark/auto-resolve/scripts/audit-headroom-rejections.py +441 -0
  93. package/benchmark/auto-resolve/scripts/audit-pair-evidence.py +1256 -0
  94. package/benchmark/auto-resolve/scripts/build-pair-eligible-manifest.py +147 -15
  95. package/benchmark/auto-resolve/scripts/check-f9-artifacts.py +28 -16
  96. package/benchmark/auto-resolve/scripts/collect-swebench-predictions.py +11 -1
  97. package/benchmark/auto-resolve/scripts/compile-report.py +208 -46
  98. package/benchmark/auto-resolve/scripts/fetch-swebench-instances.py +22 -4
  99. package/benchmark/auto-resolve/scripts/frozen-verify-gate.py +175 -30
  100. package/benchmark/auto-resolve/scripts/full-pipeline-pair-gate.py +408 -46
  101. package/benchmark/auto-resolve/scripts/headroom-gate.py +270 -39
  102. package/benchmark/auto-resolve/scripts/iter-0033c-compare.py +164 -33
  103. package/benchmark/auto-resolve/scripts/iter-0033c-l1-summary.py +97 -0
  104. package/benchmark/auto-resolve/scripts/judge-opus-pass.sh +150 -38
  105. package/benchmark/auto-resolve/scripts/judge.sh +153 -26
  106. package/benchmark/auto-resolve/scripts/oracle-scope-tier-a.py +12 -5
  107. package/benchmark/auto-resolve/scripts/oracle-scope-tier-b.py +25 -2
  108. package/benchmark/auto-resolve/scripts/pair-candidate-frontier.py +469 -0
  109. package/benchmark/auto-resolve/scripts/pair-plan-idgen.py +5 -5
  110. package/benchmark/auto-resolve/scripts/pair-plan-lint.py +9 -2
  111. package/benchmark/auto-resolve/scripts/pair-rejected-fixtures.sh +91 -0
  112. package/benchmark/auto-resolve/scripts/pair_evidence_contract.py +269 -0
  113. package/benchmark/auto-resolve/scripts/prepare-swebench-frozen-case.py +39 -10
  114. package/benchmark/auto-resolve/scripts/prepare-swebench-frozen-corpus.py +34 -4
  115. package/benchmark/auto-resolve/scripts/prepare-swebench-solver-worktree.py +23 -5
  116. package/benchmark/auto-resolve/scripts/recent-benchmark-summary.py +232 -0
  117. package/benchmark/auto-resolve/scripts/run-fixture.sh +118 -51
  118. package/benchmark/auto-resolve/scripts/run-frozen-verify-pair.sh +211 -39
  119. package/benchmark/auto-resolve/scripts/run-full-pipeline-pair-candidate.sh +335 -39
  120. package/benchmark/auto-resolve/scripts/run-headroom-candidate.sh +249 -6
  121. package/benchmark/auto-resolve/scripts/run-iter-0033c.sh +22 -48
  122. package/benchmark/auto-resolve/scripts/run-suite.sh +44 -7
  123. package/benchmark/auto-resolve/scripts/run-swebench-frozen-corpus.sh +120 -19
  124. package/benchmark/auto-resolve/scripts/run-swebench-solver-batch.sh +32 -14
  125. package/benchmark/auto-resolve/scripts/ship-gate.py +219 -50
  126. package/benchmark/auto-resolve/scripts/solo-ceiling-avoidance.py +53 -0
  127. package/benchmark/auto-resolve/scripts/solo-headroom-hypothesis.py +77 -0
  128. package/benchmark/auto-resolve/scripts/swebench-frozen-matrix.py +239 -26
  129. package/benchmark/auto-resolve/scripts/test-audit-headroom-rejections.sh +288 -0
  130. package/benchmark/auto-resolve/scripts/test-audit-pair-evidence.sh +1672 -0
  131. package/benchmark/auto-resolve/scripts/test-benchmark-arg-parsing.sh +933 -0
  132. package/benchmark/auto-resolve/scripts/test-build-pair-eligible-manifest.sh +491 -0
  133. package/benchmark/auto-resolve/scripts/test-check-f9-artifacts.sh +91 -0
  134. package/benchmark/auto-resolve/scripts/test-frozen-verify-gate.sh +328 -3
  135. package/benchmark/auto-resolve/scripts/test-full-pipeline-pair-gate.sh +497 -18
  136. package/benchmark/auto-resolve/scripts/test-headroom-gate.sh +331 -14
  137. package/benchmark/auto-resolve/scripts/test-iter-0033c-compare.sh +525 -0
  138. package/benchmark/auto-resolve/scripts/test-iter-0033c-l1-summary.sh +254 -0
  139. package/benchmark/auto-resolve/scripts/test-lint-fixtures.sh +580 -0
  140. package/benchmark/auto-resolve/scripts/test-pair-candidate-frontier.sh +591 -0
  141. package/benchmark/auto-resolve/scripts/test-run-full-pipeline-pair-candidate.sh +497 -0
  142. package/benchmark/auto-resolve/scripts/test-run-headroom-candidate.sh +401 -0
  143. package/benchmark/auto-resolve/scripts/test-run-swebench-solver-batch.sh +111 -0
  144. package/benchmark/auto-resolve/scripts/test-ship-gate.sh +1189 -0
  145. package/benchmark/auto-resolve/scripts/test-swebench-frozen-case.sh +924 -5
  146. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/NOTES.md +28 -0
  147. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/expected.json +63 -0
  148. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/metadata.json +10 -0
  149. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/setup.sh +3 -0
  150. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/spec.md +47 -0
  151. package/benchmark/auto-resolve/shadow-fixtures/S1-cli-lang-flag/task.txt +1 -0
  152. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/NOTES.md +34 -0
  153. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/expected.json +53 -0
  154. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/metadata.json +10 -0
  155. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/setup.sh +3 -0
  156. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/spec.md +50 -0
  157. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/task.txt +1 -0
  158. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/verifiers/duplicate-order-error.js +27 -0
  159. package/benchmark/auto-resolve/shadow-fixtures/S2-cli-inventory-reservation/verifiers/priority-stock-reservation.js +44 -0
  160. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/NOTES.md +34 -0
  161. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/expected.json +55 -0
  162. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/metadata.json +10 -0
  163. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/setup.sh +3 -0
  164. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/spec.md +52 -0
  165. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/task.txt +1 -0
  166. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/verifiers/duplicate-ticket-error.js +29 -0
  167. package/benchmark/auto-resolve/shadow-fixtures/S3-cli-ticket-assignment/verifiers/priority-agent-assignment.js +48 -0
  168. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/NOTES.md +34 -0
  169. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/expected.json +55 -0
  170. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/metadata.json +10 -0
  171. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/setup.sh +3 -0
  172. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/spec.md +55 -0
  173. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/task.txt +1 -0
  174. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/verifiers/duplicate-return-error.js +43 -0
  175. package/benchmark/auto-resolve/shadow-fixtures/S4-cli-return-routing/verifiers/priority-return-routing.js +70 -0
  176. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/NOTES.md +37 -0
  177. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/expected.json +54 -0
  178. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/metadata.json +10 -0
  179. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/setup.sh +3 -0
  180. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/spec.md +59 -0
  181. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/task.txt +1 -0
  182. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/verifiers/credit-ledger-priority.js +98 -0
  183. package/benchmark/auto-resolve/shadow-fixtures/S5-cli-credit-grant-ledger/verifiers/duplicate-charge-error.js +38 -0
  184. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/NOTES.md +36 -0
  185. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/expected.json +56 -0
  186. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/metadata.json +10 -0
  187. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/setup.sh +3 -0
  188. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/spec.md +59 -0
  189. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/task.txt +1 -0
  190. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/verifiers/duplicate-refund-error.js +41 -0
  191. package/benchmark/auto-resolve/shadow-fixtures/S6-cli-refund-window-ledger/verifiers/priority-refund-ledger.js +65 -0
  192. package/bin/devlyn.js +221 -17
  193. package/config/skills/_shared/adapters/README.md +3 -0
  194. package/config/skills/_shared/adapters/gpt-5-5.md +5 -1
  195. package/config/skills/_shared/adapters/opus-4-7.md +9 -1
  196. package/config/skills/_shared/archive_run.py +78 -6
  197. package/config/skills/_shared/codex-config.md +5 -4
  198. package/config/skills/_shared/codex-monitored.sh +46 -1
  199. package/config/skills/_shared/collect-codex-findings.py +20 -5
  200. package/config/skills/_shared/engine-preflight.md +17 -13
  201. package/config/skills/_shared/runtime-principles.md +6 -9
  202. package/config/skills/_shared/spec-verify-check.py +2664 -107
  203. package/config/skills/_shared/verify-merge-findings.py +1369 -19
  204. package/config/skills/devlyn:design-ui/SKILL.md +364 -0
  205. package/config/skills/devlyn:ideate/SKILL.md +7 -4
  206. package/config/skills/devlyn:ideate/references/elicitation.md +50 -4
  207. package/config/skills/devlyn:ideate/references/from-spec-mode.md +26 -4
  208. package/config/skills/devlyn:ideate/references/project-mode.md +20 -1
  209. package/config/skills/devlyn:ideate/references/spec-template.md +10 -1
  210. package/config/skills/devlyn:resolve/SKILL.md +78 -26
  211. package/config/skills/devlyn:resolve/references/free-form-mode.md +15 -0
  212. package/config/skills/devlyn:resolve/references/phases/build-gate.md +2 -2
  213. package/config/skills/devlyn:resolve/references/phases/implement.md +1 -1
  214. package/config/skills/devlyn:resolve/references/phases/probe-derive.md +74 -2
  215. package/config/skills/devlyn:resolve/references/phases/verify.md +80 -29
  216. package/config/skills/devlyn:resolve/references/state-schema.md +9 -4
  217. package/package.json +47 -2
  218. package/scripts/lint-fixtures.sh +349 -0
  219. package/scripts/lint-shadow-fixtures.sh +58 -0
  220. package/scripts/lint-skills.sh +3645 -95
@@ -1,5 +1,5 @@
  #!/usr/bin/env python3
- """Archive auto-resolve run artifacts per references/pipeline-state.md#archive-contract.
+ """Archive devlyn:resolve run artifacts per references/pipeline-state.md#archive-contract.

  Usage:
  python3 scripts/archive_run.py [--devlyn-dir .devlyn]
@@ -16,8 +16,10 @@ from __future__ import annotations
  import argparse
  import json
  import pathlib
+ import re
  import shutil
  import sys
+ import tempfile


  PER_RUN_PATTERNS = (
@@ -57,18 +59,30 @@ PER_RUN_PATTERNS = (
      "codex-judge.*",
  )

+ SAFE_RUN_ID_RE = re.compile(r"^[A-Za-z0-9_.-]+$")
+
+
+ def reject_json_constant(token: str) -> None:
+     raise ValueError(f"invalid JSON numeric constant: {token}")
+
+
+ def loads_strict_json(text: str):
+     return json.loads(text, parse_constant=reject_json_constant)
+

  def read_run_id(devlyn: pathlib.Path) -> str:
      state_path = devlyn / "pipeline.state.json"
      if not state_path.is_file():
          raise SystemExit(f"error: {state_path} not found")
      try:
-         state = json.loads(state_path.read_text(encoding="utf-8"))
-     except json.JSONDecodeError as e:
+         state = loads_strict_json(state_path.read_text(encoding="utf-8"))
+     except ValueError as e:
          raise SystemExit(f"error: {state_path} is not valid JSON: {e}")
      run_id = state.get("run_id")
-     if not run_id:
+     if not isinstance(run_id, str) or not run_id:
          raise SystemExit(f"error: {state_path} has no run_id")
+     if not SAFE_RUN_ID_RE.fullmatch(run_id):
+         raise SystemExit(f"error: {state_path} run_id must match [A-Za-z0-9_.-]+")
      return run_id


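The hunk above tightens two things at once: JSON parsing now rejects the non-standard constants `NaN`, `Infinity` and `-Infinity`, and `run_id` must be a non-empty string made only of `[A-Za-z0-9_.-]`. A minimal standalone sketch of that behaviour follows. `check_run_id` and `outcome` are hypothetical helpers introduced for illustration (the real function is `read_run_id`, which reads from disk and raises `SystemExit`).

```python
import json
import re

# Same pattern as SAFE_RUN_ID_RE in the hunk above.
SAFE_RUN_ID_RE = re.compile(r"^[A-Za-z0-9_.-]+$")


def reject_json_constant(token: str) -> None:
    # json.loads calls this hook for the bare tokens NaN, Infinity, -Infinity.
    raise ValueError(f"invalid JSON numeric constant: {token}")


def loads_strict_json(text: str):
    return json.loads(text, parse_constant=reject_json_constant)


def check_run_id(text: str) -> str:
    """Hypothetical helper: parse state JSON text and return a safe run_id."""
    # json.JSONDecodeError subclasses ValueError, so one except clause
    # covers both malformed JSON and rejected constants.
    state = loads_strict_json(text)
    run_id = state.get("run_id")
    if not isinstance(run_id, str) or not run_id:
        raise ValueError("no run_id")
    if not SAFE_RUN_ID_RE.fullmatch(run_id):
        raise ValueError("run_id must match [A-Za-z0-9_.-]+")
    return run_id


def outcome(text: str) -> str:
    """Hypothetical helper: describe what check_run_id does with the input."""
    try:
        return "ok: " + check_run_id(text)
    except ValueError as exc:
        return "rejected: " + str(exc)


if __name__ == "__main__":
    samples = (
        '{"run_id": "run-1"}',
        '{"run_id": "../escape"}',
        '{"run_id": NaN}',
        '{"run_id": 7}',
    )
    for sample in samples:
        print(outcome(sample))
```

One observation from the sketch: the pattern blocks `../escape` because of the slash, but a bare `.` or `..` consists only of allowed characters and would still match.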
@@ -91,8 +105,8 @@ def prune(runs_dir: pathlib.Path, keep: int = 10) -> int:
          if not state_file.is_file():
              continue
          try:
-             s = json.loads(state_file.read_text(encoding="utf-8"))
-         except json.JSONDecodeError:
+             s = loads_strict_json(state_file.read_text(encoding="utf-8"))
+         except ValueError:
              # Can't decide flight-state safely; skip (never prune)
              continue
          verdict = s.get("phases", {}).get("final_report", {}).get("verdict")
@@ -109,11 +123,69 @@ def prune(runs_dir: pathlib.Path, keep: int = 10) -> int:
      return pruned


+ def self_test() -> int:
+     with tempfile.TemporaryDirectory() as tmp:
+         devlyn = pathlib.Path(tmp) / ".devlyn"
+         devlyn.mkdir()
+         (devlyn / "pipeline.state.json").write_text(
+             json.dumps({
+                 "run_id": "run-1",
+                 "phases": {"final_report": {"verdict": "PASS"}},
+             }) + "\n",
+             encoding="utf-8",
+         )
+         for name in (
+             "risk-probes.jsonl",
+             "verify.pair.findings.jsonl",
+             "verify-merge.summary.json",
+             "codex-judge.stdout",
+             "codex-judge.summary.json",
+         ):
+             (devlyn / name).write_text("{}\n", encoding="utf-8")
+         run_id = read_run_id(devlyn)
+         assert run_id == "run-1", run_id
+         moved = move_artifacts(devlyn, devlyn / "runs" / run_id)
+         assert moved >= 6, moved
+         for name in (
+             "pipeline.state.json",
+             "risk-probes.jsonl",
+             "verify.pair.findings.jsonl",
+             "verify-merge.summary.json",
+             "codex-judge.stdout",
+             "codex-judge.summary.json",
+         ):
+             assert (devlyn / "runs" / run_id / name).is_file(), name
+
+         bad = pathlib.Path(tmp) / "bad"
+         bad.mkdir()
+         (bad / "pipeline.state.json").write_text('{"run_id": "../escape"}\n', encoding="utf-8")
+         try:
+             read_run_id(bad)
+         except SystemExit as exc:
+             assert "run_id must match" in str(exc)
+         else:
+             raise AssertionError("unsafe archive run_id was accepted")
+
+         nan = pathlib.Path(tmp) / "nan"
+         nan.mkdir()
+         (nan / "pipeline.state.json").write_text('{"run_id": NaN}\n', encoding="utf-8")
+         try:
+             read_run_id(nan)
+         except SystemExit as exc:
+             assert "invalid JSON numeric constant: NaN" in str(exc)
+         else:
+             raise AssertionError("NaN archive run_id was accepted")
+     return 0
+
+
  def main() -> int:
      ap = argparse.ArgumentParser(description=__doc__.splitlines()[0])
      ap.add_argument("--devlyn-dir", default=".devlyn")
      ap.add_argument("--keep", type=int, default=10, help="keep N most recent completed runs")
+     ap.add_argument("--self-test", action="store_true")
      args = ap.parse_args()
+     if args.self_test:
+         return self_test()

      devlyn = pathlib.Path(args.devlyn_dir)
      if not devlyn.is_dir():
@@ -6,10 +6,10 @@ Single source of truth for how every skill calls Codex. **MCP is not used.** Ski

  All long-running Codex calls go through `codex-monitored.sh` — a thin wrapper that closes stdin (codex 0.124.0 hangs when both stdin is open and a prompt arg is given), streams Codex stdout fully (no `tail -n` truncation), and prints a `[codex-monitored] heartbeat` line every 30s so the outer `claude -p` byte-watchdog stays fed during long reasoning gaps. The wrapper passes its arguments through verbatim to the underlying CLI, so the canonical flag set is unchanged from a raw call — only the launcher differs.

- **Read-only critique / adversarial review / debate** (ideate CHALLENGE phase, `/devlyn:resolve` VERIFY pair-mode when triggered). Security review is delegated to the native `security-review` Claude Code skill, invoked from `/devlyn:resolve` BUILD_GATE rather than from Codex. Read-only critique returns findings on stdout; the orchestrator writes any files.
+ **Read-only critique / adversarial review / debate** (`/devlyn:resolve` VERIFY pair-mode, plus any future ideate read-only critique). Security review stays native to Claude Code BUILD_GATE. Codex returns findings on stdout; the orchestrator writes files.

  ```bash
- bash .claude/skills/_shared/codex-monitored.sh \
+ CODEX_MONITORED_ISOLATED=1 bash .claude/skills/_shared/codex-monitored.sh \
  -C <project-root> \
  -s read-only \
  -c model_reasoning_effort=xhigh \
@@ -31,6 +31,7 @@ Notes:
  - `-s read-only` / `--full-auto` — sandbox policy. `--full-auto` = `-s workspace-write` with auto-approval of sandboxed commands.
  - `-c model_reasoning_effort=xhigh` — config override for reasoning depth. Required for deep critique; skills may choose `high` or `medium` when thoroughness doesn't warrant xhigh.
  - **Omit `-m <model>`** — Codex CLI uses its configured flagship (currently `gpt-5.5`, automatically whatever ships next). This is the zero-touch mechanism. Only name `-m` when a role explicitly needs a different model (e.g., `gpt-5.3-codex` for SWE-bench-heavy coding tasks, `gpt-5.3-codex-spark` for speed).
+ - `CODEX_MONITORED_ISOLATED=1` — required for bounded read-only critique/probe/judge calls. The wrapper adds `--ignore-user-config --ignore-rules --ephemeral --disable codex_hooks --disable hooks` so user config, AGENTS.md, pyx-memory, hooks, and project rules cannot add hidden context, tool calls, or transcript side effects. Do not set it for workspace-write implementation phases.
  - Raw `codex exec ...` invocations are **forbidden** in skill prompts. The benchmark variant arm runs a PATH shim (`scripts/codex-shim/codex`) that transparently re-routes any raw `codex exec` to the wrapper as a safety net, but skills should always emit the wrapper form directly so the orchestrator's first-attempt has the right shape. Two prior iterations (iter-0006 universal foreground ban, iter-0008 prompt-level kill-shape contract) failed because the orchestrator picked starvation-prone shapes (`codex exec ... 2>&1 | tail -200`) from its own pattern prior — the wrapper plus the shim is the runtime binding layer those iters lacked. See `autoresearch/iterations/0009-wrapper-and-hook.md`.

  ## Availability check
@@ -41,11 +42,11 @@ Before the first Codex call in a run, verify the CLI is on PATH:
  command -v codex >/dev/null 2>&1
  ```

- If the check fails, the skill follows the `_shared/engine-preflight.md` downgrade rule silently switch to Claude for this run and log `engine downgraded: codex-unavailable` in the final report. Never prompt, never abort.
+ If the check fails while Codex is explicitly selected or conditionally required by pair/risk-probe VERIFY, follow `_shared/engine-preflight.md`: stop with `BLOCKED:codex-unavailable`, preserve run evidence, and print setup guidance. Do not convert the run to Claude. `--no-pair` and `--no-risk-probes` are explicit user opt-outs for reruns, not automatic fallbacks.

  ## Why CLI over other paths

- The local Codex CLI (fronted by `codex-monitored.sh`) is the primary (and only) integration. It beats alternatives on three dimensions: the model is inherited from the CLI's own default so no skill edits are needed when OpenAI ships a new flagship; flags compose on the command line and the skill docs stay grep-friendly; the invocation has one failure mode (the binary is on PATH or it isn't), which the shared availability check covers cleanly.
+ The local Codex CLI (fronted by `codex-monitored.sh`) is the primary (and only) integration. It beats alternatives on three dimensions: the model is inherited from the CLI's own default so no skill edits are needed when OpenAI ships a new flagship; flags compose on the command line and the skill docs stay grep-friendly; the invocation has one failure mode (the binary is on PATH or it isn't), which the shared availability check reports explicitly.

  ## Invocation from inside a skill prompt

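The revised availability rule in this hunk replaces the silent downgrade to Claude with a hard stop. As a rough sketch of that decision (not the skill's actual implementation, which is prose in `_shared/engine-preflight.md`), the check could look like the following. `codex_preflight`, its parameters, and the `"SKIPPED"` and `"OK"` return values are hypothetical. Only the `BLOCKED:codex-unavailable` string comes from the diff. `shutil.which` stands in for `command -v codex`.

```python
import shutil


def codex_preflight(codex_required: bool, binary: str = "codex") -> str:
    """Hypothetical preflight: decide whether a run may proceed.

    codex_required is True when Codex is explicitly selected or is
    conditionally required by pair/risk-probe VERIFY.
    """
    if not codex_required:
        # Nothing depends on Codex in this run, so no PATH lookup is needed.
        return "SKIPPED"
    if shutil.which(binary) is None:
        # New behaviour: stop instead of converting the run to Claude.
        return "BLOCKED:codex-unavailable"
    return "OK"


if __name__ == "__main__":
    print(codex_preflight(codex_required=True))
    print(codex_preflight(codex_required=False))
```

The key design point carried over from the diff is that there is no fallback branch: a missing binary yields a blocked status, and opting out of pair or risk-probe verification is left to the user.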
@@ -47,6 +47,10 @@
  # CODEX_REAL_BIN when set, else `codex`.
  # Set this when the shim has put us first
  # on PATH.
+ # CODEX_MONITORED_ISOLATED — set non-empty for bounded read-only
+ # probe/judge calls that must ignore
+ # user config, project rules, session
+ # persistence, and hook side effects.
  # CODEX_MONITORED_ALLOW_PIPED — set non-empty to skip the pipe-stdout
  # refusal. Reserved for tests; don't use
  # in skill prompts.
@@ -70,6 +74,44 @@ TIMEOUT_SEC="${CODEX_MONITORED_TIMEOUT_SEC:-0}"
  CODEX_BIN="${CODEX_BIN:-${CODEX_REAL_BIN:-codex}}"
  START=$(date +%s)
  TIMEOUT_FLAG=""
+ CODEX_ARGS=("$@")
+
+ require_nonnegative_int() {
+   local name="$1"
+   local value="$2"
+   case "$value" in
+     ''|*[!0-9]*)
+       printf '[codex-monitored] error: %s must be a non-negative integer (got %s)\n' \
+         "$name" "$value" >&2
+       exit 64
+       ;;
+   esac
+ }
+
+ require_positive_int() {
+   local name="$1"
+   local value="$2"
+   require_nonnegative_int "$name" "$value"
+   if [ "$value" -le 0 ]; then
+     printf '[codex-monitored] error: %s must be > 0 (got %s)\n' \
+       "$name" "$value" >&2
+     exit 64
+   fi
+ }
+
+ require_positive_int CODEX_MONITORED_HEARTBEAT "$HEARTBEAT_SEC"
+ require_nonnegative_int CODEX_MONITORED_TIMEOUT_SEC "$TIMEOUT_SEC"
+
+ if [ -n "${CODEX_MONITORED_ISOLATED:-}" ]; then
+   CODEX_ARGS=(
+     --ignore-user-config
+     --ignore-rules
+     --ephemeral
+     --disable codex_hooks
+     --disable hooks
+     "${CODEX_ARGS[@]}"
+   )
+ fi

  # --- Pipe-stdout refusal (iter-0009 R2 finding #1) -------------------------
  # `[ -p /dev/stdout ]` is the POSIX test for "is fd 1 a FIFO/pipe". Verified
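The additions in this hunk do two things before Codex is launched: they validate the heartbeat and timeout settings as integers, and they prepend a fixed set of isolation flags when `CODEX_MONITORED_ISOLATED` is non-empty. The same logic, rendered in Python purely for illustration, might look like this. The function names mirror the shell ones, `build_codex_args` is a hypothetical name, and raising `ValueError` stands in for the wrapper's `exit 64`.

```python
# Flags copied from the CODEX_MONITORED_ISOLATED branch in the hunk above.
ISOLATION_FLAGS = [
    "--ignore-user-config",
    "--ignore-rules",
    "--ephemeral",
    "--disable", "codex_hooks",
    "--disable", "hooks",
]


def require_nonnegative_int(name: str, value: str) -> int:
    # Mirrors the shell `case` guard: empty input or any non-digit fails.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"{name} must be a non-negative integer (got {value})")
    return int(value)


def require_positive_int(name: str, value: str) -> int:
    number = require_nonnegative_int(name, value)
    if number <= 0:
        raise ValueError(f"{name} must be > 0 (got {value})")
    return number


def build_codex_args(args: list[str], env: dict[str, str]) -> list[str]:
    # Any non-empty value switches isolation on, like `[ -n "${VAR:-}" ]`.
    if env.get("CODEX_MONITORED_ISOLATED"):
        return ISOLATION_FLAGS + list(args)
    return list(args)


if __name__ == "__main__":
    env = {"CODEX_MONITORED_ISOLATED": "1"}
    print(build_codex_args(["-s", "read-only"], env))
    print(require_positive_int("CODEX_MONITORED_HEARTBEAT", "30"))
```

Because the isolation flags are prepended rather than appended, the caller's own arguments keep their relative order and still follow the wrapper-controlled flags.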
@@ -169,10 +211,13 @@ trap cleanup EXIT

  printf '[codex-monitored] start: ts=%s heartbeat=%ds timeout=%ss bin=%s\n' \
    "$(date -u +%FT%TZ)" "$HEARTBEAT_SEC" "$TIMEOUT_SEC" "$CODEX_BIN" >&2
+ if [ -n "${CODEX_MONITORED_ISOLATED:-}" ]; then
+   printf '[codex-monitored] isolated=1\n' >&2
+ fi

  # Launch codex with stdin closed; output streams directly to OUR stdout/stderr.
  set -m
- "$CODEX_BIN" exec "$@" < /dev/null &
+ "$CODEX_BIN" exec "${CODEX_ARGS[@]}" < /dev/null &
  CODEX_PID=$!
  set +m
  printf '[codex-monitored] codex pid=%d\n' "$CODEX_PID" >&2
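The launch pattern in this hunk (stdin closed, output inherited, a periodic heartbeat while the child runs) can be sketched in Python as below. This is an illustration of the pattern only, not the wrapper's code: `run_monitored` is a hypothetical name, the timeout handling and process-group management of the real script are omitted, and writing the heartbeat to stderr is an assumption, since the diff does not show which stream the wrapper uses.

```python
import subprocess
import sys


def run_monitored(cmd: list[str], heartbeat_sec: float = 30.0) -> int:
    """Run cmd with stdin closed, printing a heartbeat until it exits."""
    # stdin=DEVNULL plays the role of `< /dev/null`; stdout and stderr are
    # inherited, so the child's output streams through untruncated.
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL)
    print(f"[codex-monitored] codex pid={proc.pid}", file=sys.stderr, flush=True)
    while True:
        try:
            # wait() returns the exit code as soon as the child finishes.
            return proc.wait(timeout=heartbeat_sec)
        except subprocess.TimeoutExpired:
            # Assumed stream: stderr. Keeps an outer watchdog fed.
            print("[codex-monitored] heartbeat", file=sys.stderr, flush=True)


if __name__ == "__main__":
    code = run_monitored([sys.executable, "-c", "print('child ran')"], heartbeat_sec=5)
    print("exit code:", code)
```

Waiting with a timeout instead of sleeping means the caller sees the exit code promptly, without waiting out the rest of a heartbeat interval.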
@@ -14,6 +14,14 @@ from typing import Any
14
14
  FINDING_SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"}
15
15
 
16
16
 
17
+ def reject_json_constant(token: str) -> None:
18
+     raise ValueError(f"invalid JSON numeric constant: {token}")
19
+
20
+
21
+ def loads_strict_json(text: str) -> Any:
22
+     return json.loads(text, parse_constant=reject_json_constant)
23
+
24
+
17
25
  def atomic_write(path: pathlib.Path, text: str) -> None:
18
26
  path.parent.mkdir(parents=True, exist_ok=True)
19
27
  with tempfile.NamedTemporaryFile(
@@ -34,8 +42,8 @@ def collect(stdout_path: pathlib.Path) -> tuple[list[dict[str, Any]], dict[str,
34
42
  continue
35
43
  if raw.startswith("# SUMMARY "):
36
44
  try:
37
- item = json.loads(raw.removeprefix("# SUMMARY ").strip())
38
- except json.JSONDecodeError as exc:
45
+ item = loads_strict_json(raw.removeprefix("# SUMMARY ").strip())
46
+ except ValueError as exc:
39
47
  raise SystemExit(f"error: invalid SUMMARY JSON at {stdout_path}:{line_no}: {exc}")
40
48
  if not isinstance(item, dict):
41
49
  raise SystemExit(f"error: SUMMARY is not an object at {stdout_path}:{line_no}")
@@ -44,8 +52,8 @@ def collect(stdout_path: pathlib.Path) -> tuple[list[dict[str, Any]], dict[str,
44
52
  if raw.startswith("#"):
45
53
  continue
46
54
  try:
47
- item = json.loads(raw)
48
- except json.JSONDecodeError as exc:
55
+ item = loads_strict_json(raw)
56
+ except ValueError as exc:
49
57
  raise SystemExit(f"error: invalid JSONL at {stdout_path}:{line_no}: {exc}")
50
58
  if not isinstance(item, dict):
51
59
  raise SystemExit(f"error: JSONL item is not an object at {stdout_path}:{line_no}")
@@ -74,7 +82,14 @@ def self_test() -> int:
74
82
  findings, summary = collect(stdout_path)
75
83
  write_outputs(findings, summary, out_path, summary_path)
76
84
  assert out_path.read_text(encoding="utf-8").count("\n") == 1
77
- assert json.loads(summary_path.read_text(encoding="utf-8"))["verdict"] == "NEEDS_WORK"
85
+ assert loads_strict_json(summary_path.read_text(encoding="utf-8"))["verdict"] == "NEEDS_WORK"
86
+ stdout_path.write_text('{"id":"nan","severity":NaN}\n', encoding="utf-8")
87
+ try:
88
+ collect(stdout_path)
89
+ except SystemExit as exc:
90
+ assert "invalid JSON numeric constant: NaN" in str(exc)
91
+ else:
92
+ raise AssertionError("NaN Codex stdout finding must not normalize")
78
93
  stdout_path.write_text("", encoding="utf-8")
79
94
  try:
80
95
  collect(stdout_path)
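Why the `except` clauses widen from `json.JSONDecodeError` to `ValueError`: Python's `json.loads` accepts the non-standard tokens `NaN`, `Infinity`, and `-Infinity` by default, and the `parse_constant` hook is the standard way to reject them. The hook here raises a plain `ValueError`, which `except json.JSONDecodeError` would not catch. The standalone sketch below reuses the two helpers from the hunk above; the sample payload is made up for illustration.

```python
import json
from typing import Any


def reject_json_constant(token: str) -> None:
    raise ValueError(f"invalid JSON numeric constant: {token}")


def loads_strict_json(text: str) -> Any:
    return json.loads(text, parse_constant=reject_json_constant)


SAMPLE = '{"id": "nan", "severity": NaN}'  # hypothetical finding line

# The default parser silently turns the token into float("nan").
lenient = json.loads(SAMPLE)
assert lenient["severity"] != lenient["severity"]  # NaN never equals itself

# The strict loader raises instead, with the token named in the message.
try:
    loads_strict_json(SAMPLE)
except ValueError as exc:
    message = str(exc)

# JSONDecodeError subclasses ValueError, so one handler covers both cases.
assert issubclass(json.JSONDecodeError, ValueError)
```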
@@ -1,34 +1,38 @@
1
- # Shared — `--engine` Pre-flight
1
+ # Shared — Engine Pre-flight
2
2
 
3
3
  Used by `/devlyn:resolve` and `/devlyn:ideate`. One shared availability rule so every skill routes identically.
4
4
 
5
5
  ## Rule
6
6
 
7
- Each skill resolves the effective engine from its own SKILL.md default plus any explicit `--engine` flag passed by the user. This pre-flight runs **only when the resolved engine is `auto` or `codex`** — when the resolved engine is `claude` (whether by skill default or explicit flag), the Codex check is skipped entirely.
7
+ Each skill resolves the effective engine from its own SKILL.md default plus any explicit `--engine` flag passed by the user. `/devlyn:resolve` also computes conditional pair/risk-probe requirements before the phase that needs the OTHER engine.
8
8
 
9
- When the resolved engine is `auto` or `codex`, on entry (before spawning any phase that could route to Codex):
9
+ When a run or phase requires Codex, before spawning that phase:
10
10
 
11
11
  1. Check if the Codex CLI is installed: `command -v codex >/dev/null 2>&1` (or equivalent bash test).
12
- 2. On failure silently set `engine = "claude"` for the remainder of this run AND log `engine downgraded: codex-unavailable` into the skill's final summary/report header.
13
- 3. On success proceed with the original engine value.
12
+ 2. On failure -> set the current phase/run verdict to `BLOCKED:codex-unavailable`, preserve the failed check evidence, and show setup guidance: install/configure the Codex CLI, run the current Codex auth/login flow, verify `codex --version`, then rerun. If the user intentionally wants solo VERIFY, they may rerun with `--no-pair`.
13
+ 3. On success -> proceed with the original engine value.
14
14
 
15
- Never prompt the user. Never abort the run on missing CLI.
15
+ When a run or phase requires Claude, before spawning that phase:
16
16
 
17
- Per-skill defaults: `/devlyn:resolve` defaults to `claude` for PLAN/IMPLEMENT (post iter-0020 close-out — Codex BUILD/IMPLEMENT below quality floor; iter-0033g + iter-0034 close-out — PLAN-pair research-only until container/sandbox infra justifies a measurement). `/devlyn:resolve` VERIFY is the exception: gated pair-JUDGE may invoke the OTHER engine when its SKILL.md trigger policy fires. `/devlyn:ideate` defaults to `auto` for the CHALLENGE phase's cross-model GAN-critic dynamic. Each skill's SKILL.md flag block is the source of truth for that skill's default.
17
+ 1. Confirm the runtime can spawn Claude agents. Where the CLI is the launcher, `command -v claude >/dev/null 2>&1` is the equivalent check.
18
+ 2. On failure -> set the current phase/run verdict to `BLOCKED:claude-unavailable` and show setup guidance: install/configure Claude Code, verify `claude --version` where available, then rerun.
19
+ 3. On success -> proceed.
18
20
 
19
- ## Why this is the one permitted silent fallback
21
+ Never prompt the user mid-pipeline. Missing required engines are explicit BLOCKED states, not silent fallbacks.
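The two checks above can be sketched as one function that returns a BLOCKED verdict instead of falling back. This is a minimal illustration, not code from the package: `preflight_verdict` and its `which` parameter are hypothetical names, and `shutil.which` only approximates `command -v` (it looks for executables on `PATH` and does not see shell functions or builtins). For Claude, the rule allows other ways of confirming the runtime can spawn agents; the sketch covers only the case where the CLI is the launcher.

```python
import shutil
from typing import Callable, Optional


def preflight_verdict(
    engine: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return None when the engine CLI is found, else a BLOCKED verdict string."""
    if engine not in {"codex", "claude"}:
        raise ValueError(f"unknown engine: {engine}")
    if which(engine) is None:
        # No downgrade to the other engine: the caller stops and shows setup guidance.
        return f"BLOCKED:{engine}-unavailable"
    return None
```

Injecting `which` keeps the check testable without touching the real `PATH`, for example `preflight_verdict("codex", which=lambda name: None)`.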
20
22
 
21
- `CLAUDE.md` sets the no-silent-fallback rule for this repo. This downgrade is documented there as the single explicit exception because skills under the hands-free contract, which the user starts and walks away from, would otherwise fail every run whenever the Codex CLI is absent. The user-visible behavior is identical to an explicit `--engine claude` invocation, and the banner in the final report removes the silence. Any other silent fallback in skills code is a bug.
23
+ Per-skill defaults: `/devlyn:resolve` uses Claude for PLAN/IMPLEMENT; VERIFY may invoke the OTHER engine when its pair-JUDGE trigger fires. `/devlyn:ideate` defaults to Claude; `--engine` selects the elicitation/normalization adapter, not an automatic cross-model challenge phase. Any future ideate read-only critique must follow `_shared/codex-config.md` isolation rules. Each SKILL.md flag block is the source of truth for that skill's default.
22
24
 
23
- ## What a skill must log after downgrade
25
+ ## What a skill must report after a BLOCKED engine check
24
26
 
25
- When the resolved engine was `auto` / `codex` and the Codex CLI was absent, the final user-facing report/summary shows both the requested and effective mode:
27
+ When an engine required by the selected route or conditional pair trigger is absent, the final user-facing report/summary shows the requested route, the missing engine, and setup steps:
26
28
 
27
29
  ```
28
- Engine: claude (downgraded from auto — codex-unavailable)
30
+ Engine: claude + codex pair required
31
+ Verdict: BLOCKED:codex-unavailable
32
+ Setup: install/configure Codex CLI; run the current Codex auth/login flow; verify `codex --version`; rerun. Use `--no-pair` only for an intentional solo VERIFY run.
29
33
  ```
30
34
 
31
- If no downgrade happened (either Codex was available, or the resolved engine was already `claude`), omit the parenthetical. That single line is the contract — the user can always see why Codex did or did not participate.
35
+ Do not report a downgraded successful run when a required engine is missing.
32
36
 
33
37
  ## Canonical Codex invocation
34
38
 
@@ -1,6 +1,6 @@
1
1
  # Runtime principles — sub-agent contract
2
2
 
3
- The runtime contract every sub-agent inside `/devlyn:resolve` (PLAN / IMPLEMENT / BUILD_GATE / CLEANUP / VERIFY) and `/devlyn:ideate` (FRAME / EXPLORE / SPEC / CHALLENGE) must satisfy. Source of truth for sub-agent behavior on user tasks. NOT for autoresearch-loop / harness-developer concerns (see `autoresearch/PRINCIPLES.md`).
3
+ The runtime contract every sub-agent inside `/devlyn:resolve` (PLAN / IMPLEMENT / BUILD_GATE / CLEANUP / VERIFY) and `/devlyn:ideate` must satisfy. Source of truth for sub-agent behavior on user tasks. NOT for autoresearch-loop / harness-developer concerns (see `autoresearch/PRINCIPLES.md`).
4
4
 
5
5
  The four sections below mirror the corresponding CLAUDE.md sections (Subtractive-first editing, Goal-locked execution, No-workaround discipline, Evidence over claim). Each section is wrapped in `<!-- runtime-principles:section=NAME:begin -->` / `:end -->` markers in BOTH this file and CLAUDE.md; lint Check 12 (added in iter-0019.A Step 5) extracts each named block from both files and diffs them to detect drift.
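The drift check described above can be sketched as follows. This is an illustration under stated assumptions, not the package's lint Check 12: `extract_sections` and `drifted_sections` are hypothetical names, the end marker is assumed to be `<!-- runtime-principles:section=NAME:end -->` (as in the `no-workaround` marker later in this file), and only `section=` markers are handled.

```python
import re

# Matches one named block, from its begin marker to the end marker with the same name.
MARKER = re.compile(
    r"<!-- runtime-principles:section=(?P<name>[\w-]+):begin -->\n"
    r"(?P<body>.*?)"
    r"<!-- runtime-principles:section=(?P=name):end -->",
    re.DOTALL,
)


def extract_sections(text: str) -> dict[str, str]:
    """Map each section name to the text between its begin and end markers."""
    return {m.group("name"): m.group("body") for m in MARKER.finditer(text)}


def drifted_sections(contract: str, claude_md: str) -> list[str]:
    """Names whose bodies differ, or that appear in only one of the two files."""
    a, b = extract_sections(contract), extract_sections(claude_md)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list means the two files agree on every named block; any returned name identifies the section to reconcile.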
6
6
 
@@ -79,7 +79,7 @@ No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scr
79
79
 
80
80
  **Permitted exceptions** (explicitly carved out):
81
81
  - CSS fallback fonts, CDN failover, image placeholders — widely-accepted best practices.
82
- - Codex CLI availability downgrade — the one documented silent fallback in this repo. Fires when the resolved engine is `auto` or `codex` (either via skill default or explicit `--engine` flag) and the Codex CLI is absent. Banner `engine downgraded: codex-unavailable` always prints; verdict identical to `--engine claude`. Any other silent fallback in skills code is a bug — file it against the skill that introduced it.
82
+ - No engine-availability fallback is permitted for `/devlyn:resolve` pair/risk-probe routes. If Codex or Claude is required and unavailable, the run stops with `BLOCKED:codex-unavailable` or `BLOCKED:claude-unavailable` plus setup guidance. `--no-pair` / `--no-risk-probes` are explicit user opt-outs, not fallbacks.
83
83
  <!-- runtime-principles:section=no-workaround:end -->
84
84
 
85
85
  ## Evidence over claim
@@ -97,14 +97,11 @@ A finding without one of these forms is excluded. Vague findings produce vague f
97
97
  <!-- runtime-principles:contract:end -->
98
98
 
99
99
  <!-- runtime-principles:consumption:begin -->
100
- ## Consumption (as of iter-0019.A)
100
+ ## Consumption
101
101
 
102
102
  **Consumers**:
103
- - `auto-resolve/SKILL.md` `<harness_principles>` block points here as the contract source. Phase prompt bodies (`phase-1-build.md`, `phase-2-evaluate.md`, `phase-3-critic.md`) inline a compact operational excerpt derived from the contract — phase-specific rule_id mappings + the four section names — not the full text.
104
- - `preflight/SKILL.md` PHASE 3 (Synthesize) and PHASE 3.5 (RND2) reference this file. Auditor prompts (`code-auditor.md`, `browser-auditor.md`) emit `principle.*` rule_ids derived from the rules above.
103
+ - `devlyn:resolve/SKILL.md` `<harness_principles>` points here as the contract source. Phase prompt bodies inline or reference the operational excerpt needed for each phase.
104
+ - `devlyn:ideate/SKILL.md` consumes this file for spec-shaping and conversation discipline through its own `<harness_principles>` block.
105
105
 
106
- **Codex routing**: skills that route to Codex (auto-resolve fix-loop on `--engine auto`/`codex`, preflight code-auditor on `--engine auto`/`codex`) MUST inline the contract excerpt directly into the Codex prompt, because Codex has no filesystem access under the `read-only` sandbox.
107
-
108
- **Non-consumers**:
109
- - `ideate/SKILL.md` does NOT consume this file. Ideate is planning-layer; its CHALLENGE rubric (`references/challenge-rubric.md`) covers analogous concerns at planning scope, with deliberate one-shot Codex critic discipline.
106
+ **Codex routing**: Codex-routed phases must inline the contract excerpt directly into the prompt body. Bounded read-only Codex critique, probe, or judge calls must also follow `_shared/codex-config.md` isolation rules.
110
107
  <!-- runtime-principles:consumption:end -->