martin-loop 0.1.5 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (274)
  1. package/CODE_OF_CONDUCT.md +32 -0
  2. package/LICENSE +21 -21
  3. package/README.md +307 -398
  4. package/demo/seeded-workspace/README.md +35 -35
  5. package/demo/seeded-workspace/TASKS.md +29 -29
  6. package/demo/seeded-workspace/martin.config.yaml +11 -11
  7. package/demo/seeded-workspace/package.json +8 -8
  8. package/demo/seeded-workspace/src/invoice-summary.js +11 -11
  9. package/demo/seeded-workspace/test/invoice-summary.test.js +20 -20
  10. package/dist/bin/martin-loop.js +0 -0
  11. package/dist/vendor/adapters/counter.d.ts +1 -0
  12. package/dist/vendor/adapters/counter.js +4 -0
  13. package/dist/vendor/adapters/git-baseline.d.ts +50 -0
  14. package/dist/vendor/adapters/git-baseline.js +233 -0
  15. package/dist/vendor/adapters/openrouter-adapter.d.ts +15 -0
  16. package/dist/vendor/adapters/openrouter-adapter.js +302 -0
  17. package/dist/vendor/adapters/usage.d.ts +48 -0
  18. package/dist/vendor/adapters/usage.js +66 -0
  19. package/dist/vendor/cli/bin/exit.d.ts +12 -0
  20. package/dist/vendor/cli/bin/exit.js +28 -0
  21. package/dist/vendor/cli/commands/analyze.d.ts +5 -0
  22. package/dist/vendor/cli/commands/analyze.js +58 -0
  23. package/dist/vendor/cli/commands/audit-log-verify.d.ts +34 -0
  24. package/dist/vendor/cli/commands/audit-log-verify.js +99 -0
  25. package/dist/vendor/cli/commands/audit.d.ts +8 -0
  26. package/dist/vendor/cli/commands/audit.js +199 -0
  27. package/dist/vendor/cli/commands/corpus.d.ts +5 -0
  28. package/dist/vendor/cli/commands/corpus.js +60 -0
  29. package/dist/vendor/cli/commands/doctor.d.ts +8 -0
  30. package/dist/vendor/cli/commands/doctor.js +219 -0
  31. package/dist/vendor/cli/commands/explain.d.ts +17 -0
  32. package/dist/vendor/cli/commands/explain.js +176 -0
  33. package/dist/vendor/cli/commands/export.d.ts +5 -0
  34. package/dist/vendor/cli/commands/export.js +60 -0
  35. package/dist/vendor/cli/commands/governance.d.ts +8 -0
  36. package/dist/vendor/cli/commands/governance.js +95 -0
  37. package/dist/vendor/cli/commands/improve.d.ts +18 -0
  38. package/dist/vendor/cli/commands/improve.js +396 -0
  39. package/dist/vendor/cli/commands/init.d.ts +8 -0
  40. package/dist/vendor/cli/commands/init.js +281 -0
  41. package/dist/vendor/cli/commands/migration.d.ts +8 -0
  42. package/dist/vendor/cli/commands/migration.js +67 -0
  43. package/dist/vendor/cli/commands/prior.d.ts +23 -0
  44. package/dist/vendor/cli/commands/prior.js +145 -0
  45. package/dist/vendor/cli/commands/resume.d.ts +21 -0
  46. package/dist/vendor/cli/commands/resume.js +73 -0
  47. package/dist/vendor/cli/commands/verify.d.ts +6 -0
  48. package/dist/vendor/cli/commands/verify.js +43 -0
  49. package/dist/vendor/cli/research/public-corpus.d.ts +43 -0
  50. package/dist/vendor/cli/research/public-corpus.js +151 -0
  51. package/dist/vendor/cli/ui/error-card.d.ts +38 -0
  52. package/dist/vendor/cli/ui/error-card.js +103 -0
  53. package/dist/vendor/cli/ui/mission-brief.d.ts +41 -0
  54. package/dist/vendor/cli/ui/mission-brief.js +173 -0
  55. package/dist/vendor/cli/ui/summary-card.d.ts +34 -0
  56. package/dist/vendor/cli/ui/summary-card.js +102 -0
  57. package/dist/vendor/contracts/audit.d.ts +46 -0
  58. package/dist/vendor/contracts/audit.js +360 -0
  59. package/dist/vendor/contracts/post-phase15.d.ts +240 -0
  60. package/dist/vendor/contracts/post-phase15.js +166 -0
  61. package/dist/vendor/core/agent/mandates.d.ts +46 -0
  62. package/dist/vendor/core/agent/mandates.js +178 -0
  63. package/dist/vendor/core/agent/receipts.d.ts +38 -0
  64. package/dist/vendor/core/agent/receipts.js +131 -0
  65. package/dist/vendor/core/agent/signing.d.ts +17 -0
  66. package/dist/vendor/core/agent/signing.js +91 -0
  67. package/dist/vendor/core/attestation/sign.d.ts +25 -0
  68. package/dist/vendor/core/attestation/sign.js +216 -0
  69. package/dist/vendor/core/autonomy/autonomous-promotion.d.ts +120 -0
  70. package/dist/vendor/core/autonomy/autonomous-promotion.js +346 -0
  71. package/dist/vendor/core/autonomy/envelope-v2.d.ts +29 -0
  72. package/dist/vendor/core/autonomy/envelope-v2.js +60 -0
  73. package/dist/vendor/core/autonomy/envelope.d.ts +17 -0
  74. package/dist/vendor/core/autonomy/envelope.js +27 -0
  75. package/dist/vendor/core/autonomy/escalation-ledger.d.ts +20 -0
  76. package/dist/vendor/core/autonomy/escalation-ledger.js +18 -0
  77. package/dist/vendor/core/autonomy/resume.d.ts +15 -0
  78. package/dist/vendor/core/autonomy/resume.js +23 -0
  79. package/dist/vendor/core/circuit/circuit-breaker.d.ts +60 -0
  80. package/dist/vendor/core/circuit/circuit-breaker.js +143 -0
  81. package/dist/vendor/core/context-distillation.d.ts +3 -0
  82. package/dist/vendor/core/context-distillation.js +44 -0
  83. package/dist/vendor/core/context-flow/compile-context.d.ts +8 -0
  84. package/dist/vendor/core/context-flow/compile-context.js +111 -0
  85. package/dist/vendor/core/context-flow/entities.d.ts +2 -0
  86. package/dist/vendor/core/context-flow/entities.js +44 -0
  87. package/dist/vendor/core/context-flow/evaluate-policy.d.ts +2 -0
  88. package/dist/vendor/core/context-flow/evaluate-policy.js +42 -0
  89. package/dist/vendor/core/context-flow/index.d.ts +11 -0
  90. package/dist/vendor/core/context-flow/index.js +24 -0
  91. package/dist/vendor/core/context-flow/labels.d.ts +3 -0
  92. package/dist/vendor/core/context-flow/labels.js +17 -0
  93. package/dist/vendor/core/context-flow/normalizer.d.ts +9 -0
  94. package/dist/vendor/core/context-flow/normalizer.js +69 -0
  95. package/dist/vendor/core/context-flow/profiles.d.ts +33 -0
  96. package/dist/vendor/core/context-flow/profiles.js +36 -0
  97. package/dist/vendor/core/context-flow/redaction.d.ts +1 -0
  98. package/dist/vendor/core/context-flow/redaction.js +6 -0
  99. package/dist/vendor/core/context-flow/sensitivity.d.ts +2 -0
  100. package/dist/vendor/core/context-flow/sensitivity.js +27 -0
  101. package/dist/vendor/core/context-flow/sync-preview.d.ts +2 -0
  102. package/dist/vendor/core/context-flow/sync-preview.js +22 -0
  103. package/dist/vendor/core/context-flow/token-estimator.d.ts +3 -0
  104. package/dist/vendor/core/context-flow/token-estimator.js +13 -0
  105. package/dist/vendor/core/context-flow/types.d.ts +91 -0
  106. package/dist/vendor/core/context-flow/types.js +2 -0
  107. package/dist/vendor/core/context-utility.d.ts +47 -0
  108. package/dist/vendor/core/context-utility.js +405 -0
  109. package/dist/vendor/core/cost/pipeline.d.ts +92 -0
  110. package/dist/vendor/core/cost/pipeline.js +141 -0
  111. package/dist/vendor/core/cost/tagged-cost.d.ts +27 -0
  112. package/dist/vendor/core/cost/tagged-cost.js +55 -0
  113. package/dist/vendor/core/cost-governor.d.ts +2 -0
  114. package/dist/vendor/core/cost-governor.js +50 -0
  115. package/dist/vendor/core/cve/cve-check.d.ts +80 -0
  116. package/dist/vendor/core/cve/cve-check.js +172 -0
  117. package/dist/vendor/core/digital-twin/index.d.ts +27 -0
  118. package/dist/vendor/core/digital-twin/index.js +90 -0
  119. package/dist/vendor/core/drift/drift-graph.d.ts +47 -0
  120. package/dist/vendor/core/drift/drift-graph.js +100 -0
  121. package/dist/vendor/core/drift/objective-lock.d.ts +69 -0
  122. package/dist/vendor/core/drift/objective-lock.js +88 -0
  123. package/dist/vendor/core/drift/scope.d.ts +46 -0
  124. package/dist/vendor/core/drift/scope.js +102 -0
  125. package/dist/vendor/core/drift/signature-lock.d.ts +48 -0
  126. package/dist/vendor/core/drift/signature-lock.js +202 -0
  127. package/dist/vendor/core/drift/stale-proof-gate.d.ts +21 -0
  128. package/dist/vendor/core/drift/stale-proof-gate.js +19 -0
  129. package/dist/vendor/core/eval/known-bad-world-runner.d.ts +24 -0
  130. package/dist/vendor/core/eval/known-bad-world-runner.js +256 -0
  131. package/dist/vendor/core/evidence/claim-audit.d.ts +18 -0
  132. package/dist/vendor/core/evidence/claim-audit.js +89 -0
  133. package/dist/vendor/core/exit-intelligence.d.ts +2 -0
  134. package/dist/vendor/core/exit-intelligence.js +58 -0
  135. package/dist/vendor/core/explain/formatter.d.ts +42 -0
  136. package/dist/vendor/core/explain/formatter.js +171 -0
  137. package/dist/vendor/core/explain/timeline.d.ts +29 -0
  138. package/dist/vendor/core/explain/timeline.js +213 -0
  139. package/dist/vendor/core/failure-taxonomy.d.ts +2 -0
  140. package/dist/vendor/core/failure-taxonomy.js +76 -0
  141. package/dist/vendor/core/gateway/index.d.ts +10 -0
  142. package/dist/vendor/core/gateway/index.js +12 -0
  143. package/dist/vendor/core/gateway/registry.d.ts +40 -0
  144. package/dist/vendor/core/gateway/registry.js +97 -0
  145. package/dist/vendor/core/gateway/transport.d.ts +31 -0
  146. package/dist/vendor/core/gateway/transport.js +82 -0
  147. package/dist/vendor/core/gateway/vault.d.ts +19 -0
  148. package/dist/vendor/core/gateway/vault.js +29 -0
  149. package/dist/vendor/core/graph/adapters.d.ts +43 -0
  150. package/dist/vendor/core/graph/adapters.js +91 -0
  151. package/dist/vendor/core/graph/hotspots.d.ts +22 -0
  152. package/dist/vendor/core/graph/hotspots.js +30 -0
  153. package/dist/vendor/core/graph/index.d.ts +1 -0
  154. package/dist/vendor/core/graph/index.js +2 -0
  155. package/dist/vendor/core/honey/honey-tokens.d.ts +32 -0
  156. package/dist/vendor/core/honey/honey-tokens.js +44 -0
  157. package/dist/vendor/core/index.d.ts +2 -2
  158. package/dist/vendor/core/index.js +38 -12
  159. package/dist/vendor/core/learning/bayesian-update.d.ts +31 -0
  160. package/dist/vendor/core/learning/bayesian-update.js +60 -0
  161. package/dist/vendor/core/learning/prior-sets.d.ts +42 -0
  162. package/dist/vendor/core/learning/prior-sets.js +111 -0
  163. package/dist/vendor/core/learning/promotion-gate.d.ts +17 -0
  164. package/dist/vendor/core/learning/promotion-gate.js +23 -0
  165. package/dist/vendor/core/leash/blast-radius.d.ts +42 -0
  166. package/dist/vendor/core/leash/blast-radius.js +156 -0
  167. package/dist/vendor/core/leash/policy-leash.d.ts +31 -0
  168. package/dist/vendor/core/leash/policy-leash.js +117 -0
  169. package/dist/vendor/core/memo/memo.d.ts +63 -0
  170. package/dist/vendor/core/memo/memo.js +97 -0
  171. package/dist/vendor/core/memory/learning-pipeline.d.ts +154 -0
  172. package/dist/vendor/core/memory/learning-pipeline.js +391 -0
  173. package/dist/vendor/core/memory/palace.d.ts +84 -0
  174. package/dist/vendor/core/memory/palace.js +379 -0
  175. package/dist/vendor/core/merge/ast-merge.d.ts +22 -0
  176. package/dist/vendor/core/merge/ast-merge.js +350 -0
  177. package/dist/vendor/core/merge/text-merge.d.ts +12 -0
  178. package/dist/vendor/core/merge/text-merge.js +182 -0
  179. package/dist/vendor/core/otel/tracer.d.ts +45 -0
  180. package/dist/vendor/core/otel/tracer.js +116 -0
  181. package/dist/vendor/core/parallel/parallel-attempts.d.ts +28 -0
  182. package/dist/vendor/core/parallel/parallel-attempts.js +41 -0
  183. package/dist/vendor/core/parallel/scorer.d.ts +24 -0
  184. package/dist/vendor/core/parallel/scorer.js +65 -0
  185. package/dist/vendor/core/pattern-detection.d.ts +64 -0
  186. package/dist/vendor/core/pattern-detection.js +108 -0
  187. package/dist/vendor/core/persistence/checkpoint.d.ts +44 -0
  188. package/dist/vendor/core/persistence/checkpoint.js +156 -0
  189. package/dist/vendor/core/persistence/cleanup.d.ts +22 -0
  190. package/dist/vendor/core/persistence/cleanup.js +131 -0
  191. package/dist/vendor/core/persistence/index.d.ts +2 -0
  192. package/dist/vendor/core/persistence/index.js +1 -0
  193. package/dist/vendor/core/persistence/runs-reader.d.ts +52 -0
  194. package/dist/vendor/core/persistence/runs-reader.js +84 -0
  195. package/dist/vendor/core/persistence/store.d.ts +6 -1
  196. package/dist/vendor/core/persistence/store.js +5 -0
  197. package/dist/vendor/core/policy/file-touch-quota.d.ts +60 -0
  198. package/dist/vendor/core/policy/file-touch-quota.js +105 -0
  199. package/dist/vendor/core/policy/policy-loader.d.ts +30 -0
  200. package/dist/vendor/core/policy/policy-loader.js +170 -0
  201. package/dist/vendor/core/policy/policy-schema.d.ts +55 -0
  202. package/dist/vendor/core/policy/policy-schema.js +78 -0
  203. package/dist/vendor/core/probe/probe.d.ts +49 -0
  204. package/dist/vendor/core/probe/probe.js +115 -0
  205. package/dist/vendor/core/proof/patch-proof.d.ts +58 -0
  206. package/dist/vendor/core/proof/patch-proof.js +84 -0
  207. package/dist/vendor/core/proof/semantic-probe.d.ts +25 -0
  208. package/dist/vendor/core/proof/semantic-probe.js +82 -0
  209. package/dist/vendor/core/recovery/failure-mode-runner.d.ts +29 -0
  210. package/dist/vendor/core/recovery/failure-mode-runner.js +39 -0
  211. package/dist/vendor/core/red-blue/red-phase.d.ts +64 -0
  212. package/dist/vendor/core/red-blue/red-phase.js +141 -0
  213. package/dist/vendor/core/red-blue/risk-tiers.d.ts +22 -0
  214. package/dist/vendor/core/red-blue/risk-tiers.js +33 -0
  215. package/dist/vendor/core/replay/replay.d.ts +85 -0
  216. package/dist/vendor/core/replay/replay.js +109 -0
  217. package/dist/vendor/core/router/engine.d.ts +54 -0
  218. package/dist/vendor/core/router/engine.js +131 -0
  219. package/dist/vendor/core/router/index.d.ts +1 -0
  220. package/dist/vendor/core/router/index.js +2 -0
  221. package/dist/vendor/core/router/trust-calibration.d.ts +57 -0
  222. package/dist/vendor/core/router/trust-calibration.js +127 -0
  223. package/dist/vendor/core/run-martin.d.ts +2 -0
  224. package/dist/vendor/core/run-martin.js +287 -0
  225. package/dist/vendor/core/security/cve-scanner.d.ts +62 -0
  226. package/dist/vendor/core/security/cve-scanner.js +178 -0
  227. package/dist/vendor/core/sentinel/efficiency-sentinel.d.ts +29 -0
  228. package/dist/vendor/core/sentinel/efficiency-sentinel.js +30 -0
  229. package/dist/vendor/core/sentinel/progress-guard.d.ts +35 -0
  230. package/dist/vendor/core/sentinel/progress-guard.js +46 -0
  231. package/dist/vendor/core/siem/siem-emitter.d.ts +49 -0
  232. package/dist/vendor/core/siem/siem-emitter.js +157 -0
  233. package/dist/vendor/core/strategy/attempt-brief.d.ts +22 -0
  234. package/dist/vendor/core/strategy/attempt-brief.js +89 -0
  235. package/dist/vendor/core/summarize/diff-summary.d.ts +35 -0
  236. package/dist/vendor/core/summarize/diff-summary.js +204 -0
  237. package/dist/vendor/core/surface-signals.d.ts +21 -0
  238. package/dist/vendor/core/surface-signals.js +139 -0
  239. package/dist/vendor/core/truth/truth-wall.d.ts +51 -0
  240. package/dist/vendor/core/truth/truth-wall.js +69 -0
  241. package/dist/vendor/core/truth-spine.d.ts +26 -0
  242. package/dist/vendor/core/truth-spine.js +62 -0
  243. package/dist/vendor/core/types.d.ts +115 -0
  244. package/dist/vendor/core/types.js +2 -0
  245. package/dist/vendor/core/verification/tiered-verify.d.ts +17 -0
  246. package/dist/vendor/core/verification/tiered-verify.js +29 -0
  247. package/dist/vendor/core/verifier-pyramid.d.ts +32 -0
  248. package/dist/vendor/core/verifier-pyramid.js +111 -0
  249. package/dist/vendor/core/workflow-artifacts.d.ts +99 -0
  250. package/dist/vendor/core/workflow-artifacts.js +668 -0
  251. package/dist/vendor/core/wrap/supervised-run.d.ts +96 -0
  252. package/dist/vendor/core/wrap/supervised-run.js +178 -0
  253. package/docs/assets/cli-animated.svg +139 -0
  254. package/docs/assets/cli-static.svg +34 -0
  255. package/docs/assets/github-hero-v2.svg +23 -0
  256. package/docs/assets/martin-raplph.png.jpg +0 -0
  257. package/docs/assets/martinloop-logo.png +0 -0
  258. package/docs/assets/nvidia-inception-program-light.png +0 -0
  259. package/docs/assets/nvidia-inception-program.png +0 -0
  260. package/docs/assets/phase3c-sidesidebyside-demo.html +228 -0
  261. package/docs/assets/side-by-side.svg +134 -0
  262. package/docs/oss/CLAUDE-CODE-WALKTHROUGH.md +142 -142
  263. package/docs/oss/EXAMPLES.md +134 -134
  264. package/docs/oss/OSS-BOUNDARY-REPORT.json +1 -1
  265. package/docs/oss/OSS-BOUNDARY-REPORT.md +1 -1
  266. package/docs/oss/QUICKSTART.md +170 -165
  267. package/docs/oss/RALPH-LOOP-SAFETY.md +113 -113
  268. package/docs/oss/README.md +96 -96
  269. package/docs/oss/RELEASE-SURFACE-REPORT.json +2 -1
  270. package/docs/oss/RELEASE-SURFACE-REPORT.md +2 -1
  271. package/package.json +130 -58
  272. package/docs/distribution/DIRECTORY-SUBMISSIONS.md +0 -89
  273. package/docs/distribution/INTEGRATION-OUTREACH.md +0 -61
  274. package/docs/distribution/UNDER-3-CHALLENGE.md +0 -65
package/docs/assets/phase3c-sidesidebyside-demo.html
@@ -0,0 +1,228 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>MartinLoop vs Ralph — Repair Flaky CI Gate</title>
+ <link href="https://fonts.googleapis.com/css2?family=Fraunces:ital,opsz,wght@0,9..144,700;0,9..144,900&family=JetBrains+Mono:wght@400;600;700&display=swap" rel="stylesheet">
+ <style>
+ *{box-sizing:border-box;margin:0;padding:0;}
+ :root{
+ --cream:#f8f5ed;--paper:#fffdf8;--border:#e2d9cc;
+ --ink:#18181b;--ink-2:#52525b;--ink-3:#a1a1aa;
+ --purple:#5b50d6;--purple-lt:#ede9ff;
+ --green:#15803d;--red:#b91c1c;--navy:#1a1a2e;
+ --fd:'Fraunces',Georgia,serif;--fm:'JetBrains Mono',monospace;
+ }
+ body{background:var(--cream);color:var(--ink);font-family:-apple-system,sans-serif;padding:48px 24px;}
+ .wrap{max-width:1060px;margin:0 auto;display:flex;flex-direction:column;gap:48px;}
+ .logo{font-family:var(--fd);font-size:1.6rem;font-weight:700;letter-spacing:-.04em;}
+ .logo span{color:var(--purple);}
+ .ey{font-family:var(--fm);font-size:11px;font-weight:700;letter-spacing:.14em;text-transform:uppercase;color:var(--purple);margin-bottom:10px;display:block;}
+ .headline{font-family:var(--fd);font-size:clamp(2rem,3.5vw,3rem);font-weight:700;letter-spacing:-.04em;margin-bottom:8px;}
+ .sub{font-size:14px;color:var(--ink-2);line-height:1.65;max-width:580px;margin-bottom:24px;}
+
+ /* SCENARIO BADGE */
+ .scenario-badge{display:inline-flex;align-items:center;gap:8px;background:var(--navy);color:#fff;border-radius:999px;padding:8px 18px;font-family:var(--fm);font-size:11px;font-weight:700;margin-bottom:20px;}
+ .badge-dot{width:7px;height:7px;border-radius:50%;background:var(--purple);}
+
+ /* COMPARE GRID */
+ .compare{display:grid;grid-template-columns:1fr 1fr;gap:20px;}
+ .col-label{font-family:var(--fm);font-size:10.5px;font-weight:700;text-transform:uppercase;letter-spacing:.12em;margin-bottom:10px;padding:6px 12px;border-radius:6px;display:inline-block;}
+ .col-label.bad{background:#fee2e2;color:var(--red);font-weight:700;}
+ .col-label.good{background:#dcfce7;color:var(--green);}
+
+ /* TERMINAL */
+ .term-wrap{border-radius:14px;overflow:hidden;box-shadow:0 6px 24px rgba(0,0,0,.12);}
+ .term-bar{background:#1a1d2e;padding:10px 14px;display:flex;align-items:center;gap:7px;}
+ .d{width:11px;height:11px;border-radius:50%;}
+ .dr{background:#ff5f57;}.dy{background:#ffbd2e;}.dg{background:#28c840;}
+ .tt{flex:1;text-align:center;font-family:var(--fm);font-size:10px;color:#8b8fa8;}
+ .term-body{background:#0d1117;padding:16px 18px;font-family:var(--fm);font-size:12px;line-height:1.85;min-height:320px;}
+ .ln{display:block;white-space:pre;}
+ .cp{color:#3d4060;}.cc{color:#e8e8f0;}.ca{color:#7c72f0;}
+ .cok{color:#2ab97a;}.ce{color:#f87171;}.cw{color:#fbbf24;}.cd{color:#4a4d62;}
+
+ /* ANIMATED LINES */
+ @keyframes fadeUp{from{opacity:0;transform:translateY(3px)}to{opacity:1;transform:none}}
+ .hidden{opacity:0;}
+ .show{animation:fadeUp .18s ease forwards;}
+
+ /* OUTCOME */
+ .outcome{display:flex;align-items:flex-start;gap:12px;padding:14px 18px;border-radius:12px;margin-top:12px;}
+ .outcome.pass{background:#dcfce7;border:1px solid #86efac;}
+ .outcome.fail{background:#fee2e2;border:1px solid #fca5a5;}
+ .oi{font-size:18px;flex-shrink:0;}
+ .ol{font-size:13.5px;font-weight:800;color:var(--ink);margin-bottom:3px;}
+ .od{font-size:12.5px;color:var(--ink-2);line-height:1.5;}
+
+ /* VERDICT BAR */
+ .verdict{background:var(--navy);border-radius:16px;padding:24px 32px;display:flex;gap:28px;align-items:center;flex-wrap:wrap;}
+ .vs{text-align:center;}
+ .vn{font-family:var(--fd);font-size:2rem;font-weight:700;letter-spacing:-.04em;}
+ .vn.g{color:#2ab97a;}.vn.r{color:#f87171;}
+ .vl{font-family:var(--fm);font-size:9px;text-transform:uppercase;letter-spacing:.1em;color:#4a4d62;margin-top:3px;}
+ .vdiv{width:1px;height:44px;background:#1e2030;}
+ .vnote{font-size:12.5px;color:#8b8fa8;line-height:1.6;max-width:340px;}
+ .vnote strong{color:#f0f0f5;}
+
+ /* REPLAY */
+ .replay-btn{background:var(--purple);color:#fff;border:none;padding:10px 24px;border-radius:999px;font-family:var(--fm);font-size:11px;font-weight:700;cursor:pointer;letter-spacing:.08em;text-transform:uppercase;transition:opacity .15s;}
+ .replay-btn:hover{opacity:.85;}
+
+ /* SOURCE NOTE */
+ .source{background:var(--paper);border:1px solid var(--border);border-radius:12px;padding:14px 18px;display:flex;gap:16px;align-items:center;}
+ .source-icon{font-size:20px;flex-shrink:0;}
+ .source-text{font-size:12px;color:var(--ink-2);line-height:1.6;}
+ .source-text strong{color:var(--ink);}
+ .source-text a{color:var(--purple);}
+ </style>
+ </head>
+ <body>
+ <div class="wrap">
+
+ <div>
+ <div class="logo">Martin<span>Loop</span></div>
+ <div style="font-family:var(--fm);font-size:11px;color:var(--ink-3);margin-top:4px;text-transform:uppercase;letter-spacing:.08em;">Side-by-Side Demo · Real Benchmark Numbers</div>
+ </div>
+
+ <div>
+ <div class="scenario-badge"><div class="badge-dot"></div>Benchmark Task: Repair Flaky CI Gate</div>
+ <span class="ey">The Most Common Engineering Nightmare</span>
+ <div class="headline">Same task. Same starting state.<br>Wildly different outcomes.</div>
+ <div class="sub">This is a real task from the public MartinLoop benchmark suite. The costs, outcomes, and behavior you see below are from actual benchmark runs — not hypotheticals. Run <code style="font-family:monospace;background:var(--purple-lt);color:var(--purple);padding:1px 5px;border-radius:4px;">pnpm --filter @martin/benchmarks eval</code> to reproduce.</div>
+
+ <div class="compare">
+ <div>
+ <div class="col-label bad">Without MartinLoop — Ralph</div>
+ <div class="term-wrap">
+ <div class="term-bar"><div class="d dr"></div><div class="d dy"></div><div class="d dg"></div><div class="tt">ralph — no governance</div></div>
+ <div class="term-body" id="ralph-body">
+ <span class="ln"><span class="cp">$ </span><span class="cc">ralph run "repair flaky CI gate"</span></span>
+ </div>
+ </div>
+ <div class="outcome fail" id="ralph-outcome" style="display:none;">
+ <div class="oi">✗</div>
+ <div><div class="ol">Failed — incorrect/unknown_failure</div><div class="od">4 uncontrolled retries. $5.20 spent. No verifier. No receipt. No rollback available.</div></div>
+ </div>
+ </div>
+ <div>
+ <div class="col-label good">With MartinLoop</div>
+ <div class="term-wrap">
+ <div class="term-bar"><div class="d dr"></div><div class="d dy"></div><div class="d dg"></div><div class="tt">martin — governed</div></div>
+ <div class="term-body" id="martin-body">
+ <span class="ln"><span class="cp">$ </span><span class="cc">martin run "repair flaky CI gate" --budget 5.00 --verify "pnpm test"</span></span>
+ </div>
+ </div>
+ <div class="outcome pass" id="martin-outcome" style="display:none;">
+ <div class="oi">✓</div>
+ <div><div class="ol">Passed — lifecycleState: completed · verified</div><div class="od">1 attempt. $2.30 spent. Tests pass. Structured JSONL audit record. Rollback ready.</div></div>
+ </div>
+ </div>
+ </div>
+
+ <div class="verdict" id="verdict" style="display:none;">
+ <div class="vs"><div class="vn g">$2.30</div><div class="vl">Martin cost</div></div>
+ <div class="vdiv"></div>
+ <div class="vs"><div class="vn r">$5.20</div><div class="vl">Ralph cost</div></div>
+ <div class="vdiv"></div>
+ <div class="vs"><div class="vn g">Completed</div><div class="vl">Martin result</div></div>
+ <div class="vdiv"></div>
+ <div class="vs"><div class="vn r">Failed</div><div class="vl">Ralph result</div></div>
+ <div class="vdiv"></div>
+ <div class="vnote"><strong>55% lower cost. Verified pass vs failed outcome.</strong> Martin completed in 1 attempt with test-verified results. Ralph retried 4 times with no verifier, no governance, and left no audit trail.</div>
+ <div style="margin-left:auto;"><button class="replay-btn" onclick="replay()">↺ Replay</button></div>
+ </div>
+ </div>
+
+ <div class="source">
+ <div class="source-icon">📋</div>
+ <div class="source-text">
+ <strong>These are real benchmark numbers.</strong> Task: "Repair flaky CI gate" — from <code style="font-family:monospace;font-size:11px;">benchmarks/comparative/history/latest.md</code>, MartinLoop v0.1.2, April 2026.<br>
+ Martin: $2.30, lifecycleState: completed, verified · Ralph: $5.20, incorrect/unknown_failure, no verifier.<br>
+ Source: <a href="https://github.com/Keesan12/MartinLoop">github.com/Keesan12/MartinLoop</a> · Reproduce: <code style="font-family:monospace;font-size:11px;">pnpm --filter @martin/benchmarks eval</code>
+ </div>
+ </div>
+
+ </div>
+
+ <script>
+ const ralphLines = [
+ ['cd','⟳ Attempt 1/∞ ...'],
+ ['cd',' Analyzing CI config...'],
+ ['ce',' ✗ Test suite: 3 failures'],
+ ['cw',' ↻ Retrying (full context)... [$1.10 spent]'],
+ ['cd','⟳ Attempt 2/∞ ...'],
+ ['ce',' ✗ Test suite: 2 failures'],
+ ['cw',' ↻ Retrying... [$2.40 spent]'],
+ ['cd','⟳ Attempt 3/∞ ...'],
+ ['ce',' ✗ Test suite: 2 failures'],
+ ['cw',' ↻ Retrying... [$3.85 spent]'],
+ ['cd','⟳ Attempt 4/∞ ...'],
+ ['ce',' ✗ Unknown failure — model lost context'],
+ ['ce',' ⚠ Halted by API limit'],
+ [' ',' '],
+ ['cd','Cost: '],['ce','$5.20 (no cap enforced)'],
+ ['cd','Audit trail: '],['ce','None'],
+ ['cd','Rollback: '],['ce','Not available'],
+ ];
+ const martinLines = [
+ ['ca','✓ Budget cap $5.00 hard stop'],
+ ['ca','✓ Safety leash 11 failure classes'],
+ ['ca','✓ Verifier pnpm test'],
+ ['cd','⟳ Attempt 1...'],
+ ['cd',' Analyzing CI config...'],
+ ['cok',' ✓ Race condition identified in test runner'],
+ ['cok',' ✓ Fix applied: jest.config.js + test helper'],
+ ['cd',' ⟳ Running verifier: pnpm test...'],
+ ['cok',' ✓ 44/44 tests pass — verified'],
+ [' ',' '],
+ ['ca','────────────────────────────────'],
+ ['cok','✓ Cost: $2.30 / $5.00 cap'],
+ ['cok','✓ lifecycleState: completed · verified'],
+ ['cok','✓ JSONL audit record (cost, files, exit)'],
+ ['cok','✓ Rollback ready (0 files on failure)'],
+ ];
+
+ function addLine(el, cls, text, delay) {
+ setTimeout(() => {
+ const s = document.createElement('span');
+ s.className = 'ln show';
+ s.innerHTML = `<span class="${cls}">${text}</span>`;
+ el.appendChild(s);
+ }, delay);
+ }
+
+ let running = false;
+ function run() {
+ if (running) return; running = true;
+ const rb = document.getElementById('ralph-body');
+ const mb = document.getElementById('martin-body');
+ const baseDelay = 600;
+ ralphLines.forEach(([cls, txt], i) => addLine(rb, cls, txt, baseDelay + i * 420));
+ martinLines.forEach(([cls, txt], i) => addLine(mb, cls, txt, baseDelay + 300 + i * 380));
+ const totalTime = baseDelay + Math.max(ralphLines.length * 420, martinLines.length * 380) + 200;
+ setTimeout(() => {
+ document.getElementById('ralph-outcome').style.display = 'flex';
+ document.getElementById('martin-outcome').style.display = 'flex';
+ document.getElementById('verdict').style.display = 'flex';
+ }, totalTime);
+ }
+
+ function replay() {
+ const rb = document.getElementById('ralph-body');
+ const mb = document.getElementById('martin-body');
+ rb.innerHTML = '<span class="ln"><span class="cp">$ </span><span class="cc">ralph run "repair flaky CI gate"</span></span>';
+ mb.innerHTML = '<span class="ln"><span class="cp">$ </span><span class="cc">martin run "repair flaky CI gate" --budget 5.00 --verify "pnpm test"</span></span>';
+ document.getElementById('ralph-outcome').style.display = 'none';
+ document.getElementById('martin-outcome').style.display = 'none';
+ document.getElementById('verdict').style.display = 'none';
+ running = false;
+ setTimeout(run, 100);
+ }
+
+ // Auto-start
+ setTimeout(run, 800);
+ </script>
+ </body>
+ </html>
package/docs/assets/side-by-side.svg
@@ -0,0 +1,134 @@
+ <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 960 582" width="960" height="582">
+ <defs>
+ <style>
+ .m{font-family:"SF Mono","JetBrains Mono","Cascadia Code",Consolas,monospace}
+ </style>
+ </defs>
+
+ <!-- Root background -->
+ <rect width="960" height="582" rx="14" fill="#0d1117"/>
+ <rect x=".5" y=".5" width="959" height="581" rx="14" fill="none" stroke="#30363d"/>
+
+ <!-- Header -->
+ <text x="480" y="28" text-anchor="middle" class="m" font-size="9" fill="#bc8cff" letter-spacing="1.2">BENCHMARK · REPAIR FLAKY CI GATE · MartinLoop v0.1.2</text>
+ <text x="480" y="50" text-anchor="middle" class="m" font-size="13" fill="#e6edf3">Same task. Same starting state. Wildly different outcomes.</text>
+ <text x="480" y="68" text-anchor="middle" class="m" font-size="9.5" fill="#8b949e">Real benchmark numbers · Reproduce: pnpm --filter @martin/benchmarks eval</text>
+
+ <!-- ═══════════════ LEFT PANEL: RALPH ═══════════════ -->
+
+ <!-- Badge -->
+ <rect x="20" y="78" width="232" height="20" rx="4" fill="#3d1515"/>
+ <text x="32" y="92" class="m" font-size="10" fill="#f85149" font-weight="bold">WITHOUT MARTINLOOP — RALPH</text>
+
+ <!-- Window frame -->
+ <rect x="20" y="102" width="450" height="340" rx="10" fill="#161b22"/>
+ <rect x="20.5" y="102.5" width="449" height="339" rx="10" fill="none" stroke="#30363d"/>
+ <!-- Title bar -->
+ <rect x="20" y="102" width="450" height="36" rx="10" fill="#1c2128"/>
+ <rect x="20" y="124" width="450" height="14" fill="#1c2128"/>
+ <line x1="20" y1="138" x2="470" y2="138" stroke="#30363d"/>
+ <!-- Traffic lights -->
+ <circle cx="40" cy="120" r="5.5" fill="#ff5f57"/>
+ <circle cx="58" cy="120" r="5.5" fill="#ffbd2e"/>
+ <circle cx="76" cy="120" r="5.5" fill="#28c840"/>
+ <text x="245" y="125" text-anchor="middle" class="m" font-size="10" fill="#8b949e">ralph — no governance</text>
+
+ <!-- Ralph content: y starts at 154, step 16 -->
+ <text x="34" y="154" class="m" font-size="11" fill="#3fb950">$ ralph run "repair flaky CI gate"</text>
+
+ <text x="34" y="186" class="m" font-size="11" fill="#8b949e">&#x27F3; Attempt 1/&#x221E; ...</text>
+ <text x="34" y="202" class="m" font-size="11" fill="#8b949e"> Analyzing CI config...</text>
+ <text x="34" y="218" class="m" font-size="11" fill="#f85149"> &#x2717; Test suite: 3 failures</text>
+ <text x="34" y="234" class="m" font-size="11" fill="#d29922"> &#x21BB; Retrying... [$1.10 spent]</text>
+ <text x="34" y="250" class="m" font-size="11" fill="#8b949e">&#x27F3; Attempt 2/&#x221E; ...</text>
+ <text x="34" y="266" class="m" font-size="11" fill="#f85149"> &#x2717; Test suite: 2 failures</text>
+ <text x="34" y="282" class="m" font-size="11" fill="#d29922"> &#x21BB; Retrying... [$2.40 spent]</text>
+ <text x="34" y="298" class="m" font-size="11" fill="#8b949e">&#x27F3; Attempt 3/&#x221E; ... 4/&#x221E; ...</text>
+ <text x="34" y="314" class="m" font-size="11" fill="#f85149"> &#x2717; Unknown failure — model lost context</text>
+ <text x="34" y="330" class="m" font-size="11" fill="#d29922"> &#x26A0; Halted by API limit [$5.20 total]</text>
+
+ <text x="34" y="362" class="m" font-size="11" fill="#f85149">Cost: $5.20 (no cap enforced)</text>
+ <text x="34" y="378" class="m" font-size="11" fill="#8b949e">Audit trail: <tspan fill="#f85149">None</tspan></text>
+ <text x="34" y="394" class="m" font-size="11" fill="#8b949e">Rollback: <tspan fill="#f85149">Not available</tspan></text>
+ <text x="34" y="410" class="m" font-size="11" fill="#8b949e">Exit: <tspan fill="#f85149">unknown_failure</tspan></text>
+
+ <!-- Ralph outcome box -->
+ <rect x="20" y="450" width="450" height="48" rx="8" fill="#3d1515"/>
+ <rect x="20.5" y="450.5" width="449" height="47" rx="8" fill="none" stroke="#f85149" stroke-opacity=".4"/>
+ <text x="38" y="470" class="m" font-size="12" fill="#f85149" font-weight="bold">&#x2717; Failed — unknown_failure</text>
+ <text x="38" y="488" class="m" font-size="10" fill="#8b949e">4 retries · $5.20 · no verifier · no receipt · no rollback</text>
+
+ <!-- ═══════════════ RIGHT PANEL: MARTIN ═══════════════ -->
+
+ <!-- Badge -->
+ <rect x="490" y="78" width="172" height="20" rx="4" fill="#0d2b17"/>
+ <text x="502" y="92" class="m" font-size="10" fill="#3fb950" font-weight="bold">WITH MARTINLOOP</text>
+
+ <!-- Window frame -->
+ <rect x="490" y="102" width="450" height="340" rx="10" fill="#161b22"/>
+ <rect x="490.5" y="102.5" width="449" height="339" rx="10" fill="none" stroke="#30363d"/>
70
+ <!-- Title bar -->
71
+ <rect x="490" y="102" width="450" height="36" rx="10" fill="#1c2128"/>
72
+ <rect x="490" y="124" width="450" height="14" fill="#1c2128"/>
73
+ <line x1="490" y1="138" x2="940" y2="138" stroke="#30363d"/>
74
+ <!-- Traffic lights -->
75
+ <circle cx="510" cy="120" r="5.5" fill="#ff5f57"/>
76
+ <circle cx="528" cy="120" r="5.5" fill="#ffbd2e"/>
77
+ <circle cx="546" cy="120" r="5.5" fill="#28c840"/>
78
+ <text x="715" y="125" text-anchor="middle" class="m" font-size="10" fill="#8b949e">martin — governed</text>
79
+
80
+ <!-- Martin content -->
81
+ <text x="504" y="154" class="m" font-size="11" fill="#3fb950">$ martin run "repair flaky CI gate" \</text>
82
+ <text x="504" y="170" class="m" font-size="11" fill="#3fb950"> --budget 5.00 --verify "pnpm test"</text>
83
+ <text x="504" y="191" class="m" font-size="11" fill="#79c0ff">&#x2713; Budget $5.00 · Safety leash · Verifier set</text>
84
+
85
+ <text x="504" y="223" class="m" font-size="11" fill="#8b949e">&#x27F3; Attempt 1/3 ...</text>
86
+ <text x="504" y="239" class="m" font-size="11" fill="#8b949e"> Analyzing CI config...</text>
87
+ <text x="504" y="255" class="m" font-size="11" fill="#3fb950"> &#x2713; Race condition found in test runner</text>
88
+ <text x="504" y="271" class="m" font-size="11" fill="#3fb950"> &#x2713; Fix: jest.config.js + test helper</text>
89
+ <text x="504" y="287" class="m" font-size="11" fill="#8b949e"> &#x27F3; Running verifier: pnpm test...</text>
90
+ <text x="504" y="303" class="m" font-size="11" fill="#3fb950"> &#x2713; 44/44 tests pass — verified</text>
91
+
92
+ <!-- Separator line -->
93
+ <rect x="504" y="312" width="422" height="1" fill="#30363d"/>
94
+
95
+ <text x="504" y="328" class="m" font-size="11" fill="#3fb950">&#x2713; Cost: $2.30 / $5.00 cap</text>
96
+ <text x="504" y="344" class="m" font-size="11" fill="#3fb950">&#x2713; Status: completed · verified</text>
97
+ <text x="504" y="360" class="m" font-size="11" fill="#bc8cff">&#x2713; Record: JSONL audit record written</text>
98
+ <text x="504" y="376" class="m" font-size="11" fill="#3fb950">&#x2713; Rollback ready</text>
99
+
100
+ <!-- Martin outcome box -->
101
+ <rect x="490" y="450" width="450" height="48" rx="8" fill="#0d2b17"/>
102
+ <rect x="490.5" y="450.5" width="449" height="47" rx="8" fill="none" stroke="#3fb950" stroke-opacity=".4"/>
103
+ <text x="508" y="470" class="m" font-size="12" fill="#3fb950" font-weight="bold">&#x2713; Passed — lifecycleState: completed</text>
104
+ <text x="508" y="488" class="m" font-size="10" fill="#8b949e">1 attempt · $2.30 · verifier passed · JSONL record · rollback ready</text>
105
+
106
+ <!-- ═══════════════ VERDICT BAR ═══════════════ -->
107
+ <rect x="20" y="506" width="920" height="62" rx="10" fill="#161b22"/>
108
+ <rect x="20.5" y="506.5" width="919" height="61" rx="10" fill="none" stroke="#30363d"/>
109
+
110
+ <!-- Stat: Martin cost -->
111
+ <text x="88" y="529" text-anchor="middle" class="m" font-size="18" fill="#3fb950" font-weight="bold">$2.30</text>
112
+ <text x="88" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">MARTIN COST</text>
113
+ <line x1="155" y1="516" x2="155" y2="558" stroke="#30363d"/>
114
+
115
+ <!-- Stat: Ralph cost -->
116
+ <text x="218" y="529" text-anchor="middle" class="m" font-size="18" fill="#f85149" font-weight="bold">$5.20</text>
117
+ <text x="218" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">RALPH COST</text>
118
+ <line x1="285" y1="516" x2="285" y2="558" stroke="#30363d"/>
119
+
120
+ <!-- Stat: Martin result -->
121
+ <text x="360" y="529" text-anchor="middle" class="m" font-size="14" fill="#3fb950" font-weight="bold">Completed</text>
122
+ <text x="360" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">MARTIN RESULT</text>
123
+ <line x1="430" y1="516" x2="430" y2="558" stroke="#30363d"/>
124
+
125
+ <!-- Stat: Ralph result -->
126
+ <text x="498" y="529" text-anchor="middle" class="m" font-size="14" fill="#f85149" font-weight="bold">Failed</text>
127
+ <text x="498" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">RALPH RESULT</text>
128
+ <line x1="558" y1="516" x2="558" y2="558" stroke="#30363d"/>
129
+
130
+ <!-- Note -->
131
+ <text x="576" y="528" class="m" font-size="11" fill="#e6edf3" font-weight="bold">55% lower cost. Verified pass vs failed outcome.</text>
132
+ <text x="576" y="549" class="m" font-size="10" fill="#8b949e">Reproduce: pnpm --filter @martin/benchmarks eval</text>
133
+
134
+ </svg>
@@ -1,142 +1,142 @@
1
- # Claude Code Walkthrough
2
-
3
- This walkthrough shows how to put MartinLoop around a Claude Code-driven coding task so the run has a budget, a verifier gate, an explicit stop reason, and an inspectable run record.
4
-
5
- Back to the repo overview: [README.md](../../README.md)
6
-
7
- ## What MartinLoop adds around Claude Code
8
-
9
- Claude Code is the coding engine. MartinLoop is the governance layer around it.
10
-
11
- - **Budget**: hard USD, token, and iteration limits decide how far the run can go.
12
- - **Verifier**: the run only counts as complete when the post-run verification command passes.
13
- - **Stop reason**: MartinLoop records why the run stopped, such as `completed`, `budget_exit`, or `human_escalation`.
14
- - **Run record**: each run appends a JSONL record under `~/.martin/runs/` so you can inspect it later.
15
-
16
- ## Prerequisites
17
-
18
- - Node.js 20+
19
- - `pnpm` 10.x if you are running from this repo
20
- - Claude Code CLI installed and authenticated
21
- - A repo you want Claude Code to work in
22
-
23
- ## Install MartinLoop
24
-
25
- For the published CLI:
26
-
27
- ```bash
28
- npm install -g martin-loop
29
- ```
30
-
31
- For repo-local development in this monorepo:
32
-
33
- ```bash
34
- pnpm install
35
- pnpm build
36
- ```
37
-
38
- ## Simple local run
39
-
40
- Run MartinLoop with the default Claude adapter and a verifier command:
41
-
42
- ```bash
43
- martin run "fix the auth regression" \
44
- --engine claude \
45
- --budget 3.00 \
46
- --verify "pnpm test"
47
- ```
48
-
49
- What happens:
50
-
51
- - MartinLoop hands the objective to Claude Code
52
- - Claude Code attempts the work
53
- - MartinLoop runs the verifier command
54
- - the loop only finishes as `completed` when the agent result and verifier both pass
55
-
56
- ## Budget example
57
-
58
- Use a hard cap and a smaller iteration budget when you want Claude Code to stay tightly bounded:
59
-
60
- ```bash
61
- martin run "tighten the login retry handling" \
62
- --engine claude \
63
- --budget 2.00 \
64
- --soft-limit-usd 1.25 \
65
- --max-iterations 2 \
66
- --max-tokens 20000 \
67
- --verify "pnpm --filter @martin/core test"
68
- ```
69
-
70
- This is the key MartinLoop value-add for Claude Code workflows: the agent can keep trying, but only inside a contract you can review before the spend drifts.
71
-
72
- ## Verifier example
73
-
74
- Use a verifier that matches the exact scope of the change:
75
-
76
- ```bash
77
- martin run "update the OSS quickstart wording" \
78
- --engine claude \
79
- --cwd . \
80
- --allow-path README.md \
81
- --allow-path docs/oss/** \
82
- --deny-path apps/control-plane/** \
83
- --accept "Only documentation files may change" \
84
- --verify "pnpm --filter @martin/core test"
85
- ```
86
-
87
- The verifier gate matters because Claude Code producing a patch is not the same thing as the repo being in a valid state.
88
-
89
- ## Inspect example
90
-
91
- After a run, inspect the persisted JSONL record:
92
-
93
- ```bash
94
- martin inspect --file ~/.martin/runs/<workspaceId>.jsonl
95
- ```
96
-
97
- Look for:
98
-
99
- - the final lifecycle state and stop reason
100
- - budget and token totals
101
- - verifier outcome
102
- - attempt count and failure classification
103
-
104
- ## Safe repo-local dry run
105
-
106
- If you want to validate the MartinLoop flow without real model spend, use stub mode first:
107
-
108
- ### PowerShell
109
-
110
- ```powershell
111
- $env:MARTIN_LIVE='false'
112
- $repoRoot = (Get-Location).Path
113
- pnpm run:cli -- run `
114
- --cwd $repoRoot `
115
- --objective "Summarize the current runtime state" `
116
- --verify "pnpm --filter @martin/core test"
117
- Remove-Item Env:MARTIN_LIVE
118
- ```
119
-
120
- This does not invoke Claude Code, and it will usually end with a recorded non-success stop reason because no live provider request was attempted. That is still the fastest way to confirm the loop, persistence, and verifier path are wired correctly before you switch to a live Claude run.
121
-
122
- ## Common errors and troubleshooting
123
-
124
- ### `claude` is not found
125
-
126
- MartinLoop can only use the Claude adapter when the Claude Code CLI is installed and available on `PATH`. Confirm the CLI itself works before you debug MartinLoop.
127
-
128
- ### The run stops with `budget_exit`
129
-
130
- The configured budget, iteration limit, or token ceiling was too tight for the requested task. Either narrow the task or raise the budget intentionally.
131
-
132
- ### The verifier fails even though Claude Code produced a patch
133
-
134
- That means MartinLoop did its job. The patch was attempted, but the repo did not reach a verified state. Tighten the scope, change the verifier, or ask Claude Code to address the failing checks directly.
135
-
136
- ### The run exits with `human_escalation`
137
-
138
- That usually means MartinLoop detected a path that should not proceed unattended, such as an unsafe verifier or a control boundary that needs review.
139
-
140
- ### `martin inspect` cannot find the file
141
-
142
- Run another task first, or point `inspect` at the correct JSONL file under `~/.martin/runs/`.
1
+ # Claude Code Walkthrough
2
+
3
+ This walkthrough shows how to put MartinLoop around a Claude Code-driven coding task so the run has a budget, a verifier gate, an explicit stop reason, and an inspectable run record.
4
+
5
+ Back to the repo overview: [README.md](../../README.md)
6
+
7
+ ## What MartinLoop adds around Claude Code
8
+
9
+ Claude Code is the coding engine. MartinLoop is the governance layer around it.
10
+
11
+ - **Budget**: hard USD, token, and iteration limits decide how far the run can go.
12
+ - **Verifier**: the run only counts as complete when the post-run verification command passes.
13
+ - **Stop reason**: MartinLoop records why the run stopped, such as `completed`, `budget_exit`, or `human_escalation`.
14
+ - **Run record**: each run appends a JSONL record under `~/.martin/runs/` so you can inspect it later.
15
+
16
+ ## Prerequisites
17
+
18
+ - Node.js 20+
19
+ - `pnpm` 10.x if you are running from this repo
20
+ - Claude Code CLI installed and authenticated
21
+ - A repo you want Claude Code to work in
22
+
23
+ ## Install MartinLoop
24
+
25
+ For the published CLI:
26
+
27
+ ```bash
28
+ npm install -g martin-loop
29
+ ```
30
+
31
+ For repo-local development in this monorepo:
32
+
33
+ ```bash
34
+ pnpm install
35
+ pnpm build
36
+ ```
37
+
38
+ ## Simple local run
39
+
40
+ Run MartinLoop with the default Claude adapter and a verifier command:
41
+
42
+ ```bash
43
+ martin run "fix the auth regression" \
44
+ --engine claude \
45
+ --budget 3.00 \
46
+ --verify "pnpm test"
47
+ ```
48
+
49
+ What happens:
50
+
51
+ - MartinLoop hands the objective to Claude Code
52
+ - Claude Code attempts the work
53
+ - MartinLoop runs the verifier command
54
+ - the loop only finishes as `completed` when the agent result and verifier both pass
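
The completion rule in the list above can be sketched in a few lines of Python. This is only an illustration of the gating logic, not MartinLoop's implementation: the function names and the `not_completed` label are hypothetical, and the verifier commands are stand-ins chosen so the sketch runs without `pnpm`.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_verifier(command: Sequence[str], cwd: Optional[str] = None) -> bool:
    """Return True only when the verifier command exits with status 0."""
    result = subprocess.run(list(command), cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0


def finish_state(agent_succeeded: bool, verifier_passed: bool) -> str:
    """Map the two outcomes to a final state.

    `completed` is the state named in the walkthrough; `not_completed`
    is a hypothetical placeholder used only in this sketch.
    """
    if agent_succeeded and verifier_passed:
        return "completed"
    return "not_completed"


# Stand-in verifier commands (assumption): a real run would use
# something like ["pnpm", "test"] instead.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(finish_state(True, run_verifier(passing)))  # completed
print(finish_state(True, run_verifier(failing)))  # not_completed
```

The point of the gate is that a successful agent result alone never yields `completed`; the verifier's exit status has the final say.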
55
+
56
+ ## Budget example
57
+
58
+ Use a hard cap and a smaller iteration budget when you want Claude Code to stay tightly bounded:
59
+
60
+ ```bash
61
+ martin run "tighten the login retry handling" \
62
+ --engine claude \
63
+ --budget 2.00 \
64
+ --soft-limit-usd 1.25 \
65
+ --max-iterations 2 \
66
+ --max-tokens 20000 \
67
+ --verify "pnpm --filter @martin/core test"
68
+ ```
69
+
70
+ This is the key MartinLoop value-add for Claude Code workflows: the agent can keep trying, but only inside a contract you can review before the spend drifts.
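
One way to picture that contract is a loop that checks every limit around each attempt. The sketch below is a simplified model under stated assumptions, not MartinLoop's code: the `Budget` class and `run_bounded` function are hypothetical, attempts are simulated tuples, limits are checked after each attempt's cost is known, and the soft limit is assumed to produce only a warning.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    hard_limit_usd: float  # mirrors --budget
    soft_limit_usd: float  # mirrors --soft-limit-usd
    max_iterations: int    # mirrors --max-iterations
    max_tokens: int        # mirrors --max-tokens


def run_bounded(budget, attempts):
    """Consume simulated attempts until one verifies or a limit is hit.

    Each attempt is a tuple (cost_usd, tokens, verified).
    Returns (stop_reason, spent_usd, tokens_used, warnings).
    """
    spent, tokens, warnings = 0.0, 0, []
    for iteration, (cost, used, verified) in enumerate(attempts, start=1):
        # Refuse to start an attempt beyond the iteration limit.
        if iteration > budget.max_iterations:
            return "budget_exit", spent, tokens, warnings
        spent += cost
        tokens += used
        # Hard limits end the run even if the attempt looked promising.
        if spent > budget.hard_limit_usd or tokens > budget.max_tokens:
            return "budget_exit", spent, tokens, warnings
        if verified:
            return "completed", spent, tokens, warnings
        # Assumption: the soft limit only records a warning.
        if spent >= budget.soft_limit_usd:
            warnings.append(f"soft limit reached after attempt {iteration}")
    # Hypothetical label for running out of simulated attempts.
    return "attempts_exhausted", spent, tokens, warnings


budget = Budget(hard_limit_usd=2.00, soft_limit_usd=1.25,
                max_iterations=2, max_tokens=20000)
attempts = [(0.75, 8000, False), (0.75, 8000, False), (0.50, 4000, True)]
print(run_bounded(budget, attempts))
```

With the flag values from the example above, the third simulated attempt never starts: the run stops with `budget_exit` after two iterations and $1.50 of spend, even though that third attempt would have verified.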
71
+
72
+ ## Verifier example
73
+
74
+ Use a verifier that matches the exact scope of the change:
75
+
76
+ ```bash
77
+ martin run "update the OSS quickstart wording" \
78
+ --engine claude \
79
+ --cwd . \
80
+ --allow-path README.md \
81
+ --allow-path "docs/oss/**" \
82
+ --deny-path "apps/control-plane/**" \
83
+ --accept "Only documentation files may change" \
84
+ --verify "pnpm --filter @martin/core test"
85
+ ```
86
+
87
+ The verifier gate matters because Claude Code producing a patch is not the same thing as the repo being in a valid state.
88
+
89
+ ## Inspect example
90
+
91
+ After a run, inspect the persisted JSONL record:
92
+
93
+ ```bash
94
+ martin inspect --file ~/.martin/runs/<workspaceId>.jsonl
95
+ ```
96
+
97
+ Look for:
98
+
99
+ - the final lifecycle state and stop reason
100
+ - budget and token totals
101
+ - verifier outcome
102
+ - attempt count and failure classification
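
If you would rather script that check than read the file by eye, a JSONL record is easy to summarize. The sketch below is a hedged illustration: the walkthrough does not document the record schema, so every field name here (`lifecycleState`, `stopReason`, `costUsd`, `verifierPassed`) is an assumption and the sample lines are invented for the example.

```python
import json


def summarize_run(lines):
    """Parse JSONL lines and summarize the final record.

    All field names are assumptions for illustration; check a real
    run file for the actual keys before relying on them.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return None
    last = records[-1]
    return {
        "records": len(records),
        "lifecycleState": last.get("lifecycleState"),
        "stopReason": last.get("stopReason"),
        "costUsd": last.get("costUsd"),
        "verifierPassed": last.get("verifierPassed"),
    }


# Invented sample data standing in for a file under ~/.martin/runs/.
sample = [
    '{"lifecycleState": "running", "costUsd": 1.1}',
    '{"lifecycleState": "completed", "stopReason": "completed", '
    '"costUsd": 2.3, "verifierPassed": true}',
]
print(summarize_run(sample))
```

To use it on a real run, pass the open file object instead of `sample`, since iterating a text file yields one line at a time.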
103
+
104
+ ## Safe repo-local dry run
105
+
106
+ If you want to validate the MartinLoop flow without real model spend, use stub mode first:
107
+
108
+ ### PowerShell
109
+
110
+ ```powershell
111
+ $env:MARTIN_LIVE='false'
112
+ $repoRoot = (Get-Location).Path
113
+ pnpm run:cli -- run `
114
+ --cwd $repoRoot `
115
+ --objective "Summarize the current runtime state" `
116
+ --verify "pnpm --filter @martin/core test"
117
+ Remove-Item Env:MARTIN_LIVE
118
+ ```
119
+
120
+ This does not invoke Claude Code, and it will usually end with a recorded non-success stop reason because no live provider request was attempted. That is still the fastest way to confirm the loop, persistence, and verifier path are wired correctly before you switch to a live Claude run.
121
+
122
+ ## Common errors and troubleshooting
123
+
124
+ ### `claude` is not found
125
+
126
+ MartinLoop can only use the Claude adapter when the Claude Code CLI is installed and available on `PATH`. Confirm the CLI itself works before you debug MartinLoop.
127
+
128
+ ### The run stops with `budget_exit`
129
+
130
+ The configured budget, iteration limit, or token ceiling was too tight for the requested task. Either narrow the task or raise the budget intentionally.
131
+
132
+ ### The verifier fails even though Claude Code produced a patch
133
+
134
+ That means MartinLoop did its job. The patch was attempted, but the repo did not reach a verified state. Tighten the scope, change the verifier, or ask Claude Code to address the failing checks directly.
135
+
136
+ ### The run exits with `human_escalation`
137
+
138
+ That usually means MartinLoop detected a path that should not proceed unattended, such as an unsafe verifier or a control boundary that needs review.
139
+
140
+ ### `martin inspect` cannot find the file
141
+
142
+ Run another task first, or point `inspect` at the correct JSONL file under `~/.martin/runs/`.