martin-loop 0.1.5 → 1.3.0
- package/CODE_OF_CONDUCT.md +32 -0
- package/LICENSE +21 -21
- package/README.md +307 -398
- package/demo/seeded-workspace/README.md +35 -35
- package/demo/seeded-workspace/TASKS.md +29 -29
- package/demo/seeded-workspace/martin.config.yaml +11 -11
- package/demo/seeded-workspace/package.json +8 -8
- package/demo/seeded-workspace/src/invoice-summary.js +11 -11
- package/demo/seeded-workspace/test/invoice-summary.test.js +20 -20
- package/dist/bin/martin-loop.js +0 -0
- package/dist/vendor/adapters/counter.d.ts +1 -0
- package/dist/vendor/adapters/counter.js +4 -0
- package/dist/vendor/adapters/git-baseline.d.ts +50 -0
- package/dist/vendor/adapters/git-baseline.js +233 -0
- package/dist/vendor/adapters/openrouter-adapter.d.ts +15 -0
- package/dist/vendor/adapters/openrouter-adapter.js +302 -0
- package/dist/vendor/adapters/usage.d.ts +48 -0
- package/dist/vendor/adapters/usage.js +66 -0
- package/dist/vendor/cli/bin/exit.d.ts +12 -0
- package/dist/vendor/cli/bin/exit.js +28 -0
- package/dist/vendor/cli/commands/analyze.d.ts +5 -0
- package/dist/vendor/cli/commands/analyze.js +58 -0
- package/dist/vendor/cli/commands/audit-log-verify.d.ts +34 -0
- package/dist/vendor/cli/commands/audit-log-verify.js +99 -0
- package/dist/vendor/cli/commands/audit.d.ts +8 -0
- package/dist/vendor/cli/commands/audit.js +199 -0
- package/dist/vendor/cli/commands/corpus.d.ts +5 -0
- package/dist/vendor/cli/commands/corpus.js +60 -0
- package/dist/vendor/cli/commands/doctor.d.ts +8 -0
- package/dist/vendor/cli/commands/doctor.js +219 -0
- package/dist/vendor/cli/commands/explain.d.ts +17 -0
- package/dist/vendor/cli/commands/explain.js +176 -0
- package/dist/vendor/cli/commands/export.d.ts +5 -0
- package/dist/vendor/cli/commands/export.js +60 -0
- package/dist/vendor/cli/commands/governance.d.ts +8 -0
- package/dist/vendor/cli/commands/governance.js +95 -0
- package/dist/vendor/cli/commands/improve.d.ts +18 -0
- package/dist/vendor/cli/commands/improve.js +396 -0
- package/dist/vendor/cli/commands/init.d.ts +8 -0
- package/dist/vendor/cli/commands/init.js +281 -0
- package/dist/vendor/cli/commands/migration.d.ts +8 -0
- package/dist/vendor/cli/commands/migration.js +67 -0
- package/dist/vendor/cli/commands/prior.d.ts +23 -0
- package/dist/vendor/cli/commands/prior.js +145 -0
- package/dist/vendor/cli/commands/resume.d.ts +21 -0
- package/dist/vendor/cli/commands/resume.js +73 -0
- package/dist/vendor/cli/commands/verify.d.ts +6 -0
- package/dist/vendor/cli/commands/verify.js +43 -0
- package/dist/vendor/cli/research/public-corpus.d.ts +43 -0
- package/dist/vendor/cli/research/public-corpus.js +151 -0
- package/dist/vendor/cli/ui/error-card.d.ts +38 -0
- package/dist/vendor/cli/ui/error-card.js +103 -0
- package/dist/vendor/cli/ui/mission-brief.d.ts +41 -0
- package/dist/vendor/cli/ui/mission-brief.js +173 -0
- package/dist/vendor/cli/ui/summary-card.d.ts +34 -0
- package/dist/vendor/cli/ui/summary-card.js +102 -0
- package/dist/vendor/contracts/audit.d.ts +46 -0
- package/dist/vendor/contracts/audit.js +360 -0
- package/dist/vendor/contracts/post-phase15.d.ts +240 -0
- package/dist/vendor/contracts/post-phase15.js +166 -0
- package/dist/vendor/core/agent/mandates.d.ts +46 -0
- package/dist/vendor/core/agent/mandates.js +178 -0
- package/dist/vendor/core/agent/receipts.d.ts +38 -0
- package/dist/vendor/core/agent/receipts.js +131 -0
- package/dist/vendor/core/agent/signing.d.ts +17 -0
- package/dist/vendor/core/agent/signing.js +91 -0
- package/dist/vendor/core/attestation/sign.d.ts +25 -0
- package/dist/vendor/core/attestation/sign.js +216 -0
- package/dist/vendor/core/autonomy/autonomous-promotion.d.ts +120 -0
- package/dist/vendor/core/autonomy/autonomous-promotion.js +346 -0
- package/dist/vendor/core/autonomy/envelope-v2.d.ts +29 -0
- package/dist/vendor/core/autonomy/envelope-v2.js +60 -0
- package/dist/vendor/core/autonomy/envelope.d.ts +17 -0
- package/dist/vendor/core/autonomy/envelope.js +27 -0
- package/dist/vendor/core/autonomy/escalation-ledger.d.ts +20 -0
- package/dist/vendor/core/autonomy/escalation-ledger.js +18 -0
- package/dist/vendor/core/autonomy/resume.d.ts +15 -0
- package/dist/vendor/core/autonomy/resume.js +23 -0
- package/dist/vendor/core/circuit/circuit-breaker.d.ts +60 -0
- package/dist/vendor/core/circuit/circuit-breaker.js +143 -0
- package/dist/vendor/core/context-distillation.d.ts +3 -0
- package/dist/vendor/core/context-distillation.js +44 -0
- package/dist/vendor/core/context-flow/compile-context.d.ts +8 -0
- package/dist/vendor/core/context-flow/compile-context.js +111 -0
- package/dist/vendor/core/context-flow/entities.d.ts +2 -0
- package/dist/vendor/core/context-flow/entities.js +44 -0
- package/dist/vendor/core/context-flow/evaluate-policy.d.ts +2 -0
- package/dist/vendor/core/context-flow/evaluate-policy.js +42 -0
- package/dist/vendor/core/context-flow/index.d.ts +11 -0
- package/dist/vendor/core/context-flow/index.js +24 -0
- package/dist/vendor/core/context-flow/labels.d.ts +3 -0
- package/dist/vendor/core/context-flow/labels.js +17 -0
- package/dist/vendor/core/context-flow/normalizer.d.ts +9 -0
- package/dist/vendor/core/context-flow/normalizer.js +69 -0
- package/dist/vendor/core/context-flow/profiles.d.ts +33 -0
- package/dist/vendor/core/context-flow/profiles.js +36 -0
- package/dist/vendor/core/context-flow/redaction.d.ts +1 -0
- package/dist/vendor/core/context-flow/redaction.js +6 -0
- package/dist/vendor/core/context-flow/sensitivity.d.ts +2 -0
- package/dist/vendor/core/context-flow/sensitivity.js +27 -0
- package/dist/vendor/core/context-flow/sync-preview.d.ts +2 -0
- package/dist/vendor/core/context-flow/sync-preview.js +22 -0
- package/dist/vendor/core/context-flow/token-estimator.d.ts +3 -0
- package/dist/vendor/core/context-flow/token-estimator.js +13 -0
- package/dist/vendor/core/context-flow/types.d.ts +91 -0
- package/dist/vendor/core/context-flow/types.js +2 -0
- package/dist/vendor/core/context-utility.d.ts +47 -0
- package/dist/vendor/core/context-utility.js +405 -0
- package/dist/vendor/core/cost/pipeline.d.ts +92 -0
- package/dist/vendor/core/cost/pipeline.js +141 -0
- package/dist/vendor/core/cost/tagged-cost.d.ts +27 -0
- package/dist/vendor/core/cost/tagged-cost.js +55 -0
- package/dist/vendor/core/cost-governor.d.ts +2 -0
- package/dist/vendor/core/cost-governor.js +50 -0
- package/dist/vendor/core/cve/cve-check.d.ts +80 -0
- package/dist/vendor/core/cve/cve-check.js +172 -0
- package/dist/vendor/core/digital-twin/index.d.ts +27 -0
- package/dist/vendor/core/digital-twin/index.js +90 -0
- package/dist/vendor/core/drift/drift-graph.d.ts +47 -0
- package/dist/vendor/core/drift/drift-graph.js +100 -0
- package/dist/vendor/core/drift/objective-lock.d.ts +69 -0
- package/dist/vendor/core/drift/objective-lock.js +88 -0
- package/dist/vendor/core/drift/scope.d.ts +46 -0
- package/dist/vendor/core/drift/scope.js +102 -0
- package/dist/vendor/core/drift/signature-lock.d.ts +48 -0
- package/dist/vendor/core/drift/signature-lock.js +202 -0
- package/dist/vendor/core/drift/stale-proof-gate.d.ts +21 -0
- package/dist/vendor/core/drift/stale-proof-gate.js +19 -0
- package/dist/vendor/core/eval/known-bad-world-runner.d.ts +24 -0
- package/dist/vendor/core/eval/known-bad-world-runner.js +256 -0
- package/dist/vendor/core/evidence/claim-audit.d.ts +18 -0
- package/dist/vendor/core/evidence/claim-audit.js +89 -0
- package/dist/vendor/core/exit-intelligence.d.ts +2 -0
- package/dist/vendor/core/exit-intelligence.js +58 -0
- package/dist/vendor/core/explain/formatter.d.ts +42 -0
- package/dist/vendor/core/explain/formatter.js +171 -0
- package/dist/vendor/core/explain/timeline.d.ts +29 -0
- package/dist/vendor/core/explain/timeline.js +213 -0
- package/dist/vendor/core/failure-taxonomy.d.ts +2 -0
- package/dist/vendor/core/failure-taxonomy.js +76 -0
- package/dist/vendor/core/gateway/index.d.ts +10 -0
- package/dist/vendor/core/gateway/index.js +12 -0
- package/dist/vendor/core/gateway/registry.d.ts +40 -0
- package/dist/vendor/core/gateway/registry.js +97 -0
- package/dist/vendor/core/gateway/transport.d.ts +31 -0
- package/dist/vendor/core/gateway/transport.js +82 -0
- package/dist/vendor/core/gateway/vault.d.ts +19 -0
- package/dist/vendor/core/gateway/vault.js +29 -0
- package/dist/vendor/core/graph/adapters.d.ts +43 -0
- package/dist/vendor/core/graph/adapters.js +91 -0
- package/dist/vendor/core/graph/hotspots.d.ts +22 -0
- package/dist/vendor/core/graph/hotspots.js +30 -0
- package/dist/vendor/core/graph/index.d.ts +1 -0
- package/dist/vendor/core/graph/index.js +2 -0
- package/dist/vendor/core/honey/honey-tokens.d.ts +32 -0
- package/dist/vendor/core/honey/honey-tokens.js +44 -0
- package/dist/vendor/core/index.d.ts +2 -2
- package/dist/vendor/core/index.js +38 -12
- package/dist/vendor/core/learning/bayesian-update.d.ts +31 -0
- package/dist/vendor/core/learning/bayesian-update.js +60 -0
- package/dist/vendor/core/learning/prior-sets.d.ts +42 -0
- package/dist/vendor/core/learning/prior-sets.js +111 -0
- package/dist/vendor/core/learning/promotion-gate.d.ts +17 -0
- package/dist/vendor/core/learning/promotion-gate.js +23 -0
- package/dist/vendor/core/leash/blast-radius.d.ts +42 -0
- package/dist/vendor/core/leash/blast-radius.js +156 -0
- package/dist/vendor/core/leash/policy-leash.d.ts +31 -0
- package/dist/vendor/core/leash/policy-leash.js +117 -0
- package/dist/vendor/core/memo/memo.d.ts +63 -0
- package/dist/vendor/core/memo/memo.js +97 -0
- package/dist/vendor/core/memory/learning-pipeline.d.ts +154 -0
- package/dist/vendor/core/memory/learning-pipeline.js +391 -0
- package/dist/vendor/core/memory/palace.d.ts +84 -0
- package/dist/vendor/core/memory/palace.js +379 -0
- package/dist/vendor/core/merge/ast-merge.d.ts +22 -0
- package/dist/vendor/core/merge/ast-merge.js +350 -0
- package/dist/vendor/core/merge/text-merge.d.ts +12 -0
- package/dist/vendor/core/merge/text-merge.js +182 -0
- package/dist/vendor/core/otel/tracer.d.ts +45 -0
- package/dist/vendor/core/otel/tracer.js +116 -0
- package/dist/vendor/core/parallel/parallel-attempts.d.ts +28 -0
- package/dist/vendor/core/parallel/parallel-attempts.js +41 -0
- package/dist/vendor/core/parallel/scorer.d.ts +24 -0
- package/dist/vendor/core/parallel/scorer.js +65 -0
- package/dist/vendor/core/pattern-detection.d.ts +64 -0
- package/dist/vendor/core/pattern-detection.js +108 -0
- package/dist/vendor/core/persistence/checkpoint.d.ts +44 -0
- package/dist/vendor/core/persistence/checkpoint.js +156 -0
- package/dist/vendor/core/persistence/cleanup.d.ts +22 -0
- package/dist/vendor/core/persistence/cleanup.js +131 -0
- package/dist/vendor/core/persistence/index.d.ts +2 -0
- package/dist/vendor/core/persistence/index.js +1 -0
- package/dist/vendor/core/persistence/runs-reader.d.ts +52 -0
- package/dist/vendor/core/persistence/runs-reader.js +84 -0
- package/dist/vendor/core/persistence/store.d.ts +6 -1
- package/dist/vendor/core/persistence/store.js +5 -0
- package/dist/vendor/core/policy/file-touch-quota.d.ts +60 -0
- package/dist/vendor/core/policy/file-touch-quota.js +105 -0
- package/dist/vendor/core/policy/policy-loader.d.ts +30 -0
- package/dist/vendor/core/policy/policy-loader.js +170 -0
- package/dist/vendor/core/policy/policy-schema.d.ts +55 -0
- package/dist/vendor/core/policy/policy-schema.js +78 -0
- package/dist/vendor/core/probe/probe.d.ts +49 -0
- package/dist/vendor/core/probe/probe.js +115 -0
- package/dist/vendor/core/proof/patch-proof.d.ts +58 -0
- package/dist/vendor/core/proof/patch-proof.js +84 -0
- package/dist/vendor/core/proof/semantic-probe.d.ts +25 -0
- package/dist/vendor/core/proof/semantic-probe.js +82 -0
- package/dist/vendor/core/recovery/failure-mode-runner.d.ts +29 -0
- package/dist/vendor/core/recovery/failure-mode-runner.js +39 -0
- package/dist/vendor/core/red-blue/red-phase.d.ts +64 -0
- package/dist/vendor/core/red-blue/red-phase.js +141 -0
- package/dist/vendor/core/red-blue/risk-tiers.d.ts +22 -0
- package/dist/vendor/core/red-blue/risk-tiers.js +33 -0
- package/dist/vendor/core/replay/replay.d.ts +85 -0
- package/dist/vendor/core/replay/replay.js +109 -0
- package/dist/vendor/core/router/engine.d.ts +54 -0
- package/dist/vendor/core/router/engine.js +131 -0
- package/dist/vendor/core/router/index.d.ts +1 -0
- package/dist/vendor/core/router/index.js +2 -0
- package/dist/vendor/core/router/trust-calibration.d.ts +57 -0
- package/dist/vendor/core/router/trust-calibration.js +127 -0
- package/dist/vendor/core/run-martin.d.ts +2 -0
- package/dist/vendor/core/run-martin.js +287 -0
- package/dist/vendor/core/security/cve-scanner.d.ts +62 -0
- package/dist/vendor/core/security/cve-scanner.js +178 -0
- package/dist/vendor/core/sentinel/efficiency-sentinel.d.ts +29 -0
- package/dist/vendor/core/sentinel/efficiency-sentinel.js +30 -0
- package/dist/vendor/core/sentinel/progress-guard.d.ts +35 -0
- package/dist/vendor/core/sentinel/progress-guard.js +46 -0
- package/dist/vendor/core/siem/siem-emitter.d.ts +49 -0
- package/dist/vendor/core/siem/siem-emitter.js +157 -0
- package/dist/vendor/core/strategy/attempt-brief.d.ts +22 -0
- package/dist/vendor/core/strategy/attempt-brief.js +89 -0
- package/dist/vendor/core/summarize/diff-summary.d.ts +35 -0
- package/dist/vendor/core/summarize/diff-summary.js +204 -0
- package/dist/vendor/core/surface-signals.d.ts +21 -0
- package/dist/vendor/core/surface-signals.js +139 -0
- package/dist/vendor/core/truth/truth-wall.d.ts +51 -0
- package/dist/vendor/core/truth/truth-wall.js +69 -0
- package/dist/vendor/core/truth-spine.d.ts +26 -0
- package/dist/vendor/core/truth-spine.js +62 -0
- package/dist/vendor/core/types.d.ts +115 -0
- package/dist/vendor/core/types.js +2 -0
- package/dist/vendor/core/verification/tiered-verify.d.ts +17 -0
- package/dist/vendor/core/verification/tiered-verify.js +29 -0
- package/dist/vendor/core/verifier-pyramid.d.ts +32 -0
- package/dist/vendor/core/verifier-pyramid.js +111 -0
- package/dist/vendor/core/workflow-artifacts.d.ts +99 -0
- package/dist/vendor/core/workflow-artifacts.js +668 -0
- package/dist/vendor/core/wrap/supervised-run.d.ts +96 -0
- package/dist/vendor/core/wrap/supervised-run.js +178 -0
- package/docs/assets/cli-animated.svg +139 -0
- package/docs/assets/cli-static.svg +34 -0
- package/docs/assets/github-hero-v2.svg +23 -0
- package/docs/assets/martin-raplph.png.jpg +0 -0
- package/docs/assets/martinloop-logo.png +0 -0
- package/docs/assets/nvidia-inception-program-light.png +0 -0
- package/docs/assets/nvidia-inception-program.png +0 -0
- package/docs/assets/phase3c-sidesidebyside-demo.html +228 -0
- package/docs/assets/side-by-side.svg +134 -0
- package/docs/oss/CLAUDE-CODE-WALKTHROUGH.md +142 -142
- package/docs/oss/EXAMPLES.md +134 -134
- package/docs/oss/OSS-BOUNDARY-REPORT.json +1 -1
- package/docs/oss/OSS-BOUNDARY-REPORT.md +1 -1
- package/docs/oss/QUICKSTART.md +170 -165
- package/docs/oss/RALPH-LOOP-SAFETY.md +113 -113
- package/docs/oss/README.md +96 -96
- package/docs/oss/RELEASE-SURFACE-REPORT.json +2 -1
- package/docs/oss/RELEASE-SURFACE-REPORT.md +2 -1
- package/package.json +130 -58
- package/docs/distribution/DIRECTORY-SUBMISSIONS.md +0 -89
- package/docs/distribution/INTEGRATION-OUTREACH.md +0 -61
- package/docs/distribution/UNDER-3-CHALLENGE.md +0 -65
@@ -0,0 +1,228 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>MartinLoop vs Ralph — Repair Flaky CI Gate</title>
+<link href="https://fonts.googleapis.com/css2?family=Fraunces:ital,opsz,wght@0,9..144,700;0,9..144,900&family=JetBrains+Mono:wght@400;600;700&display=swap" rel="stylesheet">
+<style>
+*{box-sizing:border-box;margin:0;padding:0;}
+:root{
+--cream:#f8f5ed;--paper:#fffdf8;--border:#e2d9cc;
+--ink:#18181b;--ink-2:#52525b;--ink-3:#a1a1aa;
+--purple:#5b50d6;--purple-lt:#ede9ff;
+--green:#15803d;--red:#b91c1c;--navy:#1a1a2e;
+--fd:'Fraunces',Georgia,serif;--fm:'JetBrains Mono',monospace;
+}
+body{background:var(--cream);color:var(--ink);font-family:-apple-system,sans-serif;padding:48px 24px;}
+.wrap{max-width:1060px;margin:0 auto;display:flex;flex-direction:column;gap:48px;}
+.logo{font-family:var(--fd);font-size:1.6rem;font-weight:700;letter-spacing:-.04em;}
+.logo span{color:var(--purple);}
+.ey{font-family:var(--fm);font-size:11px;font-weight:700;letter-spacing:.14em;text-transform:uppercase;color:var(--purple);margin-bottom:10px;display:block;}
+.headline{font-family:var(--fd);font-size:clamp(2rem,3.5vw,3rem);font-weight:700;letter-spacing:-.04em;margin-bottom:8px;}
+.sub{font-size:14px;color:var(--ink-2);line-height:1.65;max-width:580px;margin-bottom:24px;}
+
+/* SCENARIO BADGE */
+.scenario-badge{display:inline-flex;align-items:center;gap:8px;background:var(--navy);color:#fff;border-radius:999px;padding:8px 18px;font-family:var(--fm);font-size:11px;font-weight:700;margin-bottom:20px;}
+.badge-dot{width:7px;height:7px;border-radius:50%;background:var(--purple);}
+
+/* COMPARE GRID */
+.compare{display:grid;grid-template-columns:1fr 1fr;gap:20px;}
+.col-label{font-family:var(--fm);font-size:10.5px;font-weight:700;text-transform:uppercase;letter-spacing:.12em;margin-bottom:10px;padding:6px 12px;border-radius:6px;display:inline-block;}
+.col-label.bad{background:#fee2e2;color:var(--red);font-weight:700;}
+.col-label.good{background:#dcfce7;color:var(--green);}
+
+/* TERMINAL */
+.term-wrap{border-radius:14px;overflow:hidden;box-shadow:0 6px 24px rgba(0,0,0,.12);}
+.term-bar{background:#1a1d2e;padding:10px 14px;display:flex;align-items:center;gap:7px;}
+.d{width:11px;height:11px;border-radius:50%;}
+.dr{background:#ff5f57;}.dy{background:#ffbd2e;}.dg{background:#28c840;}
+.tt{flex:1;text-align:center;font-family:var(--fm);font-size:10px;color:#8b8fa8;}
+.term-body{background:#0d1117;padding:16px 18px;font-family:var(--fm);font-size:12px;line-height:1.85;min-height:320px;}
+.ln{display:block;white-space:pre;}
+.cp{color:#3d4060;}.cc{color:#e8e8f0;}.ca{color:#7c72f0;}
+.cok{color:#2ab97a;}.ce{color:#f87171;}.cw{color:#fbbf24;}.cd{color:#4a4d62;}
+
+/* ANIMATED LINES */
+@keyframes fadeUp{from{opacity:0;transform:translateY(3px)}to{opacity:1;transform:none}}
+.hidden{opacity:0;}
+.show{animation:fadeUp .18s ease forwards;}
+
+/* OUTCOME */
+.outcome{display:flex;align-items:flex-start;gap:12px;padding:14px 18px;border-radius:12px;margin-top:12px;}
+.outcome.pass{background:#dcfce7;border:1px solid #86efac;}
+.outcome.fail{background:#fee2e2;border:1px solid #fca5a5;}
+.oi{font-size:18px;flex-shrink:0;}
+.ol{font-size:13.5px;font-weight:800;color:var(--ink);margin-bottom:3px;}
+.od{font-size:12.5px;color:var(--ink-2);line-height:1.5;}
+
+/* VERDICT BAR */
+.verdict{background:var(--navy);border-radius:16px;padding:24px 32px;display:flex;gap:28px;align-items:center;flex-wrap:wrap;}
+.vs{text-align:center;}
+.vn{font-family:var(--fd);font-size:2rem;font-weight:700;letter-spacing:-.04em;}
+.vn.g{color:#2ab97a;}.vn.r{color:#f87171;}
+.vl{font-family:var(--fm);font-size:9px;text-transform:uppercase;letter-spacing:.1em;color:#4a4d62;margin-top:3px;}
+.vdiv{width:1px;height:44px;background:#1e2030;}
+.vnote{font-size:12.5px;color:#8b8fa8;line-height:1.6;max-width:340px;}
+.vnote strong{color:#f0f0f5;}
+
+/* REPLAY */
+.replay-btn{background:var(--purple);color:#fff;border:none;padding:10px 24px;border-radius:999px;font-family:var(--fm);font-size:11px;font-weight:700;cursor:pointer;letter-spacing:.08em;text-transform:uppercase;transition:opacity .15s;}
+.replay-btn:hover{opacity:.85;}
+
+/* SOURCE NOTE */
+.source{background:var(--paper);border:1px solid var(--border);border-radius:12px;padding:14px 18px;display:flex;gap:16px;align-items:center;}
+.source-icon{font-size:20px;flex-shrink:0;}
+.source-text{font-size:12px;color:var(--ink-2);line-height:1.6;}
+.source-text strong{color:var(--ink);}
+.source-text a{color:var(--purple);}
+</style>
+</head>
+<body>
+<div class="wrap">
+
+<div>
+<div class="logo">Martin<span>Loop</span></div>
+<div style="font-family:var(--fm);font-size:11px;color:var(--ink-3);margin-top:4px;text-transform:uppercase;letter-spacing:.08em;">Side-by-Side Demo · Real Benchmark Numbers</div>
+</div>
+
+<div>
+<div class="scenario-badge"><div class="badge-dot"></div>Benchmark Task: Repair Flaky CI Gate</div>
+<span class="ey">The Most Common Engineering Nightmare</span>
+<div class="headline">Same task. Same starting state.<br>Wildly different outcomes.</div>
+<div class="sub">This is a real task from the public MartinLoop benchmark suite. The costs, outcomes, and behavior you see below are from actual benchmark runs — not hypotheticals. Run <code style="font-family:monospace;background:var(--purple-lt);color:var(--purple);padding:1px 5px;border-radius:4px;">pnpm --filter @martin/benchmarks eval</code> to reproduce.</div>
+
+<div class="compare">
+<div>
+<div class="col-label bad">Without MartinLoop — Ralph</div>
+<div class="term-wrap">
+<div class="term-bar"><div class="d dr"></div><div class="d dy"></div><div class="d dg"></div><div class="tt">ralph — no governance</div></div>
+<div class="term-body" id="ralph-body">
+<span class="ln"><span class="cp">$ </span><span class="cc">ralph run "repair flaky CI gate"</span></span>
+</div>
+</div>
+<div class="outcome fail" id="ralph-outcome" style="display:none;">
+<div class="oi">✗</div>
+<div><div class="ol">Failed — incorrect/unknown_failure</div><div class="od">4 uncontrolled retries. $5.20 spent. No verifier. No receipt. No rollback available.</div></div>
+</div>
+</div>
+<div>
+<div class="col-label good">With MartinLoop</div>
+<div class="term-wrap">
+<div class="term-bar"><div class="d dr"></div><div class="d dy"></div><div class="d dg"></div><div class="tt">martin — governed</div></div>
+<div class="term-body" id="martin-body">
+<span class="ln"><span class="cp">$ </span><span class="cc">martin run "repair flaky CI gate" --budget 5.00 --verify "pnpm test"</span></span>
+</div>
+</div>
+<div class="outcome pass" id="martin-outcome" style="display:none;">
+<div class="oi">✓</div>
+<div><div class="ol">Passed — lifecycleState: completed · verified</div><div class="od">1 attempt. $2.30 spent. Tests pass. Structured JSONL audit record. Rollback ready.</div></div>
+</div>
+</div>
+</div>
+
+<div class="verdict" id="verdict" style="display:none;">
+<div class="vs"><div class="vn g">$2.30</div><div class="vl">Martin cost</div></div>
+<div class="vdiv"></div>
+<div class="vs"><div class="vn r">$5.20</div><div class="vl">Ralph cost</div></div>
+<div class="vdiv"></div>
+<div class="vs"><div class="vn g">Completed</div><div class="vl">Martin result</div></div>
+<div class="vdiv"></div>
+<div class="vs"><div class="vn r">Failed</div><div class="vl">Ralph result</div></div>
+<div class="vdiv"></div>
+<div class="vnote"><strong>55% lower cost. Verified pass vs failed outcome.</strong> Martin completed in 1 attempt with test-verified results. Ralph retried 4 times with no verifier, no governance, and left no audit trail.</div>
+<div style="margin-left:auto;"><button class="replay-btn" onclick="replay()">↺ Replay</button></div>
+</div>
+</div>
+
+<div class="source">
+<div class="source-icon">📋</div>
+<div class="source-text">
+<strong>These are real benchmark numbers.</strong> Task: "Repair flaky CI gate" — from <code style="font-family:monospace;font-size:11px;">benchmarks/comparative/history/latest.md</code>, MartinLoop v0.1.2, April 2026.<br>
+Martin: $2.30, lifecycleState: completed, verified · Ralph: $5.20, incorrect/unknown_failure, no verifier.<br>
+Source: <a href="https://github.com/Keesan12/MartinLoop">github.com/Keesan12/MartinLoop</a> · Reproduce: <code style="font-family:monospace;font-size:11px;">pnpm --filter @martin/benchmarks eval</code>
+</div>
+</div>
+
+</div>
+
+<script>
+const ralphLines = [
+['cd','⟳ Attempt 1/∞ ...'],
+['cd',' Analyzing CI config...'],
+['ce',' ✗ Test suite: 3 failures'],
+['cw',' ↻ Retrying (full context)... [$1.10 spent]'],
+['cd','⟳ Attempt 2/∞ ...'],
+['ce',' ✗ Test suite: 2 failures'],
+['cw',' ↻ Retrying... [$2.40 spent]'],
+['cd','⟳ Attempt 3/∞ ...'],
+['ce',' ✗ Test suite: 2 failures'],
+['cw',' ↻ Retrying... [$3.85 spent]'],
+['cd','⟳ Attempt 4/∞ ...'],
+['ce',' ✗ Unknown failure — model lost context'],
+['ce',' ⚠ Halted by API limit'],
+[' ',' '],
+['cd','Cost: '],['ce','$5.20 (no cap enforced)'],
+['cd','Audit trail: '],['ce','None'],
+['cd','Rollback: '],['ce','Not available'],
+];
+const martinLines = [
+['ca','✓ Budget cap $5.00 hard stop'],
+['ca','✓ Safety leash 11 failure classes'],
+['ca','✓ Verifier pnpm test'],
+['cd','⟳ Attempt 1...'],
+['cd',' Analyzing CI config...'],
+['cok',' ✓ Race condition identified in test runner'],
+['cok',' ✓ Fix applied: jest.config.js + test helper'],
+['cd',' ⟳ Running verifier: pnpm test...'],
+['cok',' ✓ 44/44 tests pass — verified'],
+[' ',' '],
+['ca','────────────────────────────────'],
+['cok','✓ Cost: $2.30 / $5.00 cap'],
+['cok','✓ lifecycleState: completed · verified'],
+['cok','✓ JSONL audit record (cost, files, exit)'],
+['cok','✓ Rollback ready (0 files on failure)'],
+];
+
+function addLine(el, cls, text, delay) {
+setTimeout(() => {
+const s = document.createElement('span');
+s.className = 'ln show';
+s.innerHTML = `<span class="${cls}">${text}</span>`;
+el.appendChild(s);
+}, delay);
+}
+
+let running = false;
+function run() {
+if (running) return; running = true;
+const rb = document.getElementById('ralph-body');
+const mb = document.getElementById('martin-body');
+const baseDelay = 600;
+ralphLines.forEach(([cls, txt], i) => addLine(rb, cls, txt, baseDelay + i * 420));
+martinLines.forEach(([cls, txt], i) => addLine(mb, cls, txt, baseDelay + 300 + i * 380));
+const totalTime = baseDelay + Math.max(ralphLines.length * 420, martinLines.length * 380) + 200;
+setTimeout(() => {
+document.getElementById('ralph-outcome').style.display = 'flex';
+document.getElementById('martin-outcome').style.display = 'flex';
+document.getElementById('verdict').style.display = 'flex';
+}, totalTime);
+}
+
+function replay() {
+const rb = document.getElementById('ralph-body');
+const mb = document.getElementById('martin-body');
+rb.innerHTML = '<span class="ln"><span class="cp">$ </span><span class="cc">ralph run "repair flaky CI gate"</span></span>';
+mb.innerHTML = '<span class="ln"><span class="cp">$ </span><span class="cc">martin run "repair flaky CI gate" --budget 5.00 --verify "pnpm test"</span></span>';
+document.getElementById('ralph-outcome').style.display = 'none';
+document.getElementById('martin-outcome').style.display = 'none';
+document.getElementById('verdict').style.display = 'none';
+running = false;
+setTimeout(run, 100);
+}
+
+// Auto-start
+setTimeout(run, 800);
+</script>
+</body>
+</html>
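Reviewer note on the 228-line demo page in the hunk above (not part of the package). The sketch below re-derives two numbers a reader may want to check: when the page's script reveals the outcome cards, and the cost reduction implied by the $2.30 and $5.20 figures. The timing constants and array lengths are copied from the hunk. The helper names (lineDelay, outcomeDelay, percentLower) are hypothetical and exist only for this illustration.

```javascript
// Sketch only: mirrors the arithmetic in the demo page's run() function.
// Helper names are hypothetical; the constants are copied from the hunk.

const BASE_DELAY_MS = 600;    // baseDelay in run()
const RALPH_STEP_MS = 420;    // per-line step for the Ralph column
const MARTIN_OFFSET_MS = 300; // extra head start delay for the Martin column
const MARTIN_STEP_MS = 380;   // per-line step for the Martin column
const OUTCOME_PAD_MS = 200;   // padding added before the outcome cards show

// Entry counts as they appear in the hunk. ralphLines has 20 entries
// because three of its source lines each carry two [class, text] pairs.
const RALPH_ENTRIES = 20;
const MARTIN_ENTRIES = 15;

// Delay (ms after run() is called) at which line `index` is appended.
function lineDelay(index, stepMs, offsetMs = 0) {
  return BASE_DELAY_MS + offsetMs + index * stepMs;
}

// Delay (ms after run() is called) at which outcome and verdict are shown.
function outcomeDelay(ralphCount, martinCount) {
  const longest = Math.max(ralphCount * RALPH_STEP_MS, martinCount * MARTIN_STEP_MS);
  return BASE_DELAY_MS + longest + OUTCOME_PAD_MS;
}

// Percentage by which `candidate` is lower than `baseline`.
function percentLower(baseline, candidate) {
  return ((baseline - candidate) / baseline) * 100;
}

const schedule = {
  lastRalphLineMs: lineDelay(RALPH_ENTRIES - 1, RALPH_STEP_MS),
  lastMartinLineMs: lineDelay(MARTIN_ENTRIES - 1, MARTIN_STEP_MS, MARTIN_OFFSET_MS),
  outcomeMs: outcomeDelay(RALPH_ENTRIES, MARTIN_ENTRIES),
};
const costReductionPct = percentLower(5.20, 2.30);

console.log(schedule);
console.log(costReductionPct.toFixed(1) + "% lower");
```

All delays are measured from the moment run() is called, which the page itself schedules 800 ms after load. Under these numbers the outcome cards appear after the last line in either column has been appended, and the page's "55% lower cost" reads as a truncation of roughly 55.8%.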
@@ -0,0 +1,134 @@
|
|
|
1
|
+
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 960 582" width="960" height="582">
|
|
2
|
+
<defs>
|
|
3
|
+
<style>
|
|
4
|
+
.m{font-family:"SF Mono","JetBrains Mono","Cascadia Code",Consolas,monospace}
|
|
5
|
+
</style>
|
|
6
|
+
</defs>
|
|
7
|
+
|
|
8
|
+
<!-- Root background -->
|
|
9
|
+
<rect width="960" height="582" rx="14" fill="#0d1117"/>
|
|
10
|
+
<rect x=".5" y=".5" width="959" height="581" rx="14" fill="none" stroke="#30363d"/>
|
|
11
|
+
|
|
12
|
+
<!-- Header -->
|
|
13
|
+
<text x="480" y="28" text-anchor="middle" class="m" font-size="9" fill="#bc8cff" letter-spacing="1.2">BENCHMARK · REPAIR FLAKY CI GATE · MartinLoop v0.1.2</text>
|
|
14
|
+
<text x="480" y="50" text-anchor="middle" class="m" font-size="13" fill="#e6edf3">Same task. Same starting state. Wildly different outcomes.</text>
|
|
15
|
+
<text x="480" y="68" text-anchor="middle" class="m" font-size="9.5" fill="#8b949e">Real benchmark numbers · Reproduce: pnpm --filter @martin/benchmarks eval</text>
|
|
16
|
+
|
|
17
|
+
<!-- ═══════════════ LEFT PANEL: RALPH ═══════════════ -->
|
|
18
|
+
|
|
19
|
+
<!-- Badge -->
|
|
20
|
+
<rect x="20" y="78" width="232" height="20" rx="4" fill="#3d1515"/>
|
|
21
|
+
<text x="32" y="92" class="m" font-size="10" fill="#f85149" font-weight="bold">WITHOUT MARTINLOOP — RALPH</text>
|
|
22
|
+
|
|
23
|
+
<!-- Window frame -->
|
|
24
|
+
<rect x="20" y="102" width="450" height="340" rx="10" fill="#161b22"/>
|
|
25
|
+
<rect x="20.5" y="102.5" width="449" height="339" rx="10" fill="none" stroke="#30363d"/>
|
|
26
|
+
<!-- Title bar -->
|
|
27
|
+
<rect x="20" y="102" width="450" height="36" rx="10" fill="#1c2128"/>
|
|
28
|
+
<rect x="20" y="124" width="450" height="14" fill="#1c2128"/>
|
|
29
|
+
<line x1="20" y1="138" x2="470" y2="138" stroke="#30363d"/>
|
|
30
|
+
<!-- Traffic lights -->
|
|
31
|
+
<circle cx="40" cy="120" r="5.5" fill="#ff5f57"/>
|
|
32
|
+
<circle cx="58" cy="120" r="5.5" fill="#ffbd2e"/>
|
|
33
|
+
<circle cx="76" cy="120" r="5.5" fill="#28c840"/>
|
|
34
|
+
<text x="245" y="125" text-anchor="middle" class="m" font-size="10" fill="#8b949e">ralph — no governance</text>
|
|
35
|
+
|
|
36
|
+
<!-- Ralph content: y starts at 154, step 16 -->
|
|
37
|
+
<text x="34" y="154" class="m" font-size="11" fill="#3fb950">$ ralph run "repair flaky CI gate"</text>
|
|
38
|
+
|
|
39
|
+
<text x="34" y="186" class="m" font-size="11" fill="#8b949e">⟳ Attempt 1/∞ ...</text>
|
|
40
|
+
<text x="34" y="202" class="m" font-size="11" fill="#8b949e"> Analyzing CI config...</text>
|
|
41
|
+
<text x="34" y="218" class="m" font-size="11" fill="#f85149"> ✗ Test suite: 3 failures</text>
|
|
42
|
+
<text x="34" y="234" class="m" font-size="11" fill="#d29922"> ↻ Retrying... [$1.10 spent]</text>
|
|
43
|
+
<text x="34" y="250" class="m" font-size="11" fill="#8b949e">⟳ Attempt 2/∞ ...</text>
|
|
44
|
+
<text x="34" y="266" class="m" font-size="11" fill="#f85149"> ✗ Test suite: 2 failures</text>
|
|
45
|
+
<text x="34" y="282" class="m" font-size="11" fill="#d29922"> ↻ Retrying... [$2.40 spent]</text>
|
|
46
|
+
<text x="34" y="298" class="m" font-size="11" fill="#8b949e">⟳ Attempt 3/∞ ... 4/∞ ...</text>
|
|
47
|
+
<text x="34" y="314" class="m" font-size="11" fill="#f85149"> ✗ Unknown failure — model lost context</text>
|
|
48
|
+
<text x="34" y="330" class="m" font-size="11" fill="#d29922"> ⚠ Halted by API limit [$5.20 total]</text>
|
|
49
|
+
|
|
50
|
+
<text x="34" y="362" class="m" font-size="11" fill="#f85149">Cost: $5.20 (no cap enforced)</text>
|
|
51
|
+
<text x="34" y="378" class="m" font-size="11" fill="#8b949e">Audit trail: <tspan fill="#f85149">None</tspan></text>
|
|
52
|
+
<text x="34" y="394" class="m" font-size="11" fill="#8b949e">Rollback: <tspan fill="#f85149">Not available</tspan></text>
|
|
53
|
+
<text x="34" y="410" class="m" font-size="11" fill="#8b949e">Exit: <tspan fill="#f85149">unknown_failure</tspan></text>
|
|
54
|
+
|
|
55
|
+
<!-- Ralph outcome box -->
|
|
56
|
+
<rect x="20" y="450" width="450" height="48" rx="8" fill="#3d1515"/>
|
|
57
|
+
<rect x="20.5" y="450.5" width="449" height="47" rx="8" fill="none" stroke="#f85149" stroke-opacity=".4"/>
|
|
58
|
+
<text x="38" y="470" class="m" font-size="12" fill="#f85149" font-weight="bold">✗ Failed — unknown_failure</text>
|
|
59
|
+
<text x="38" y="488" class="m" font-size="10" fill="#8b949e">4 attempts · $5.20 · no verifier · no receipt · no rollback</text>
|
|
60
|
+
|
|
61
|
+
<!-- ═══════════════ RIGHT PANEL: MARTIN ═══════════════ -->
|
|
62
|
+
|
|
63
|
+
<!-- Badge -->
|
|
64
|
+
<rect x="490" y="78" width="172" height="20" rx="4" fill="#0d2b17"/>
|
|
65
|
+
<text x="502" y="92" class="m" font-size="10" fill="#3fb950" font-weight="bold">WITH MARTINLOOP</text>
|
|
66
|
+
|
|
67
|
+
<!-- Window frame -->
|
|
68
|
+
<rect x="490" y="102" width="450" height="340" rx="10" fill="#161b22"/>
|
|
69
|
+
<rect x="490.5" y="102.5" width="449" height="339" rx="10" fill="none" stroke="#30363d"/>
|
|
70
|
+
<!-- Title bar -->
|
|
71
|
+
<rect x="490" y="102" width="450" height="36" rx="10" fill="#1c2128"/>
|
|
72
|
+
<rect x="490" y="124" width="450" height="14" fill="#1c2128"/>
|
|
73
|
+
<line x1="490" y1="138" x2="940" y2="138" stroke="#30363d"/>
|
|
74
|
+
<!-- Traffic lights -->
|
|
75
|
+
<circle cx="510" cy="120" r="5.5" fill="#ff5f57"/>
|
|
76
|
+
<circle cx="528" cy="120" r="5.5" fill="#ffbd2e"/>
|
|
77
|
+
<circle cx="546" cy="120" r="5.5" fill="#28c840"/>
|
|
78
|
+
<text x="715" y="125" text-anchor="middle" class="m" font-size="10" fill="#8b949e">martin — governed</text>
|
|
79
|
+
|
|
80
|
+
<!-- Martin content -->
|
|
81
|
+
<text x="504" y="154" class="m" font-size="11" fill="#3fb950">$ martin run "repair flaky CI gate" \</text>
|
|
82
|
+
<text x="504" y="170" class="m" font-size="11" fill="#3fb950"> --budget 5.00 --verify "pnpm test"</text>
|
|
83
|
+
<text x="504" y="191" class="m" font-size="11" fill="#79c0ff">✓ Budget $5.00 · Safety leash · Verifier set</text>
|
|
84
|
+
|
|
85
|
+
<text x="504" y="223" class="m" font-size="11" fill="#8b949e">⟳ Attempt 1/3 ...</text>
|
|
86
|
+
<text x="504" y="239" class="m" font-size="11" fill="#8b949e"> Analyzing CI config...</text>
|
|
87
|
+
<text x="504" y="255" class="m" font-size="11" fill="#3fb950"> ✓ Race condition found in test runner</text>
|
|
88
|
+
<text x="504" y="271" class="m" font-size="11" fill="#3fb950"> ✓ Fix: jest.config.js + test helper</text>
|
|
89
|
+
<text x="504" y="287" class="m" font-size="11" fill="#8b949e"> ⟳ Running verifier: pnpm test...</text>
|
|
90
|
+
<text x="504" y="303" class="m" font-size="11" fill="#3fb950"> ✓ 44/44 tests pass — verified</text>
|
|
91
|
+
|
|
92
|
+
<!-- Separator line -->
|
|
93
|
+
<rect x="504" y="312" width="422" height="1" fill="#30363d"/>
|
|
94
|
+
|
|
95
|
+
<text x="504" y="328" class="m" font-size="11" fill="#3fb950">✓ Cost: $2.30 / $5.00 cap</text>
|
|
96
|
+
<text x="504" y="344" class="m" font-size="11" fill="#3fb950">✓ Status: completed · verified</text>
|
|
97
|
+
<text x="504" y="360" class="m" font-size="11" fill="#bc8cff">✓ Record: JSONL audit record written</text>
|
|
98
|
+
<text x="504" y="376" class="m" font-size="11" fill="#3fb950">✓ Rollback ready</text>
|
|
99
|
+
|
|
100
|
+
<!-- Martin outcome box -->
|
|
101
|
+
<rect x="490" y="450" width="450" height="48" rx="8" fill="#0d2b17"/>
|
|
102
|
+
<rect x="490.5" y="450.5" width="449" height="47" rx="8" fill="none" stroke="#3fb950" stroke-opacity=".4"/>
|
|
103
|
+
<text x="508" y="470" class="m" font-size="12" fill="#3fb950" font-weight="bold">✓ Passed — lifecycleState: completed</text>
|
|
104
|
+
<text x="508" y="488" class="m" font-size="10" fill="#8b949e">1 attempt · $2.30 · verifier passed · JSONL record · rollback ready</text>
|
|
105
|
+
|
|
106
|
+
<!-- ═══════════════ VERDICT BAR ═══════════════ -->
|
|
107
|
+
<rect x="20" y="506" width="920" height="62" rx="10" fill="#161b22"/>
|
|
108
|
+
<rect x="20.5" y="506.5" width="919" height="61" rx="10" fill="none" stroke="#30363d"/>
|
|
109
|
+
|
|
110
|
+
<!-- Stat: Martin cost -->
|
|
111
|
+
<text x="88" y="529" text-anchor="middle" class="m" font-size="18" fill="#3fb950" font-weight="bold">$2.30</text>
|
|
112
|
+
<text x="88" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">MARTIN COST</text>
|
|
113
|
+
<line x1="155" y1="516" x2="155" y2="558" stroke="#30363d"/>
|
|
114
|
+
|
|
115
|
+
<!-- Stat: Ralph cost -->
|
|
116
|
+
<text x="218" y="529" text-anchor="middle" class="m" font-size="18" fill="#f85149" font-weight="bold">$5.20</text>
|
|
117
|
+
<text x="218" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">RALPH COST</text>
|
|
118
|
+
<line x1="285" y1="516" x2="285" y2="558" stroke="#30363d"/>
|
|
119
|
+
|
|
120
|
+
<!-- Stat: Martin result -->
|
|
121
|
+
<text x="360" y="529" text-anchor="middle" class="m" font-size="14" fill="#3fb950" font-weight="bold">Completed</text>
|
|
122
|
+
<text x="360" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">MARTIN RESULT</text>
|
|
123
|
+
<line x1="430" y1="516" x2="430" y2="558" stroke="#30363d"/>
|
|
124
|
+
|
|
125
|
+
<!-- Stat: Ralph result -->
|
|
126
|
+
<text x="498" y="529" text-anchor="middle" class="m" font-size="14" fill="#f85149" font-weight="bold">Failed</text>
|
|
127
|
+
<text x="498" y="550" text-anchor="middle" class="m" font-size="8" fill="#8b949e" letter-spacing=".6">RALPH RESULT</text>
|
|
128
|
+
<line x1="558" y1="516" x2="558" y2="558" stroke="#30363d"/>
|
|
129
|
+
|
|
130
|
+
<!-- Note -->
|
|
131
|
+
<text x="576" y="528" class="m" font-size="11" fill="#e6edf3" font-weight="bold">55% lower cost. Verified pass vs failed outcome.</text>
|
|
132
|
+
<text x="576" y="549" class="m" font-size="10" fill="#8b949e">Reproduce: pnpm --filter @martin/benchmarks eval</text>
|
|
133
|
+
|
|
134
|
+
</svg>
|
|
@@ -1,142 +1,142 @@
|
|
|
1
|
-
# Claude Code Walkthrough
|
|
2
|
-
|
|
3
|
-
This walkthrough shows how to put MartinLoop around a Claude Code-driven coding task so the run has a budget, a verifier gate, an explicit stop reason, and an inspectable run record.
|
|
4
|
-
|
|
5
|
-
Back to the repo overview: [README.md](../../README.md)
|
|
6
|
-
|
|
7
|
-
## What MartinLoop adds around Claude Code
|
|
8
|
-
|
|
9
|
-
Claude Code is the coding engine. MartinLoop is the governance layer around it.
|
|
10
|
-
|
|
11
|
-
- **Budget**: hard USD, token, and iteration limits decide how far the run can go.
|
|
12
|
-
- **Verifier**: the run only counts as complete when the post-run verification command passes.
|
|
13
|
-
- **Stop reason**: MartinLoop records why the run stopped, such as `completed`, `budget_exit`, or `human_escalation`.
|
|
14
|
-
- **Run record**: each run appends a JSONL record under `~/.martin/runs/` so you can inspect it later.
|
|
15
|
-
|
|
16
|
-
## Prerequisites
|
|
17
|
-
|
|
18
|
-
- Node.js 20+
|
|
19
|
-
- `pnpm` 10.x if you are running from this repo
|
|
20
|
-
- Claude Code CLI installed and authenticated
|
|
21
|
-
- A repo you want Claude Code to work in
|
|
22
|
-
|
|
23
|
-
## Install MartinLoop
|
|
24
|
-
|
|
25
|
-
For the published CLI:
|
|
26
|
-
|
|
27
|
-
```bash
|
|
28
|
-
npm install -g martin-loop
|
|
29
|
-
```
|
|
30
|
-
|
|
31
|
-
For repo-local development in this monorepo:
|
|
32
|
-
|
|
33
|
-
```bash
|
|
34
|
-
pnpm install
|
|
35
|
-
pnpm build
|
|
36
|
-
```
|
|
37
|
-
|
|
38
|
-
## Simple local run
|
|
39
|
-
|
|
40
|
-
Run MartinLoop with the default Claude adapter and a verifier command:
|
|
41
|
-
|
|
42
|
-
```bash
|
|
43
|
-
martin run "fix the auth regression" \
|
|
44
|
-
--engine claude \
|
|
45
|
-
--budget 3.00 \
|
|
46
|
-
--verify "pnpm test"
|
|
47
|
-
```
|
|
48
|
-
|
|
49
|
-
What happens:
|
|
50
|
-
|
|
51
|
-
- MartinLoop hands the objective to Claude Code
|
|
52
|
-
- Claude Code attempts the work
|
|
53
|
-
- MartinLoop runs the verifier command
|
|
54
|
-
- the loop only finishes as `completed` when the agent result and verifier both pass
|
|
55
|
-
|
|
56
|
-
## Budget example
|
|
57
|
-
|
|
58
|
-
Use a hard cap and a smaller iteration budget when you want Claude Code to stay tightly bounded:
|
|
59
|
-
|
|
60
|
-
```bash
|
|
61
|
-
martin run "tighten the login retry handling" \
|
|
62
|
-
--engine claude \
|
|
63
|
-
--budget 2.00 \
|
|
64
|
-
--soft-limit-usd 1.25 \
|
|
65
|
-
--max-iterations 2 \
|
|
66
|
-
--max-tokens 20000 \
|
|
67
|
-
--verify "pnpm --filter @martin/core test"
|
|
68
|
-
```
|
|
69
|
-
|
|
70
|
-
This is the key MartinLoop value-add for Claude Code workflows: the agent can keep trying, but only inside a contract you can review before the spend drifts.
|
|
71
|
-
|
|
72
|
-
## Verifier example
|
|
73
|
-
|
|
74
|
-
Use a verifier that matches the exact scope of the change:
|
|
75
|
-
|
|
76
|
-
```bash
|
|
77
|
-
martin run "update the OSS quickstart wording" \
|
|
78
|
-
--engine claude \
|
|
79
|
-
--cwd . \
|
|
80
|
-
--allow-path README.md \
|
|
81
|
-
--allow-path docs/oss/** \
|
|
82
|
-
--deny-path apps/control-plane/** \
|
|
83
|
-
--accept "Only documentation files may change" \
|
|
84
|
-
--verify "pnpm --filter @martin/core test"
|
|
85
|
-
```
|
|
86
|
-
|
|
87
|
-
The verifier gate matters because Claude Code producing a patch is not the same thing as the repo being in a valid state.
|
|
88
|
-
|
|
89
|
-
## Inspect example
|
|
90
|
-
|
|
91
|
-
After a run, inspect the persisted JSONL record:
|
|
92
|
-
|
|
93
|
-
```bash
|
|
94
|
-
martin inspect --file ~/.martin/runs/<workspaceId>.jsonl
|
|
95
|
-
```
|
|
96
|
-
|
|
97
|
-
Look for:
|
|
98
|
-
|
|
99
|
-
- the final lifecycle state and stop reason
|
|
100
|
-
- budget and token totals
|
|
101
|
-
- verifier outcome
|
|
102
|
-
- attempt count and failure classification
|
|
103
|
-
|
|
104
|
-
## Safe repo-local dry run
|
|
105
|
-
|
|
106
|
-
If you want to validate the MartinLoop flow without real model spend, use stub mode first:
|
|
107
|
-
|
|
108
|
-
### PowerShell
|
|
109
|
-
|
|
110
|
-
```powershell
|
|
111
|
-
$env:MARTIN_LIVE='false'
|
|
112
|
-
$repoRoot = (Get-Location).Path
|
|
113
|
-
pnpm run:cli -- run `
|
|
114
|
-
--cwd $repoRoot `
|
|
115
|
-
--objective "Summarize the current runtime state" `
|
|
116
|
-
--verify "pnpm --filter @martin/core test"
|
|
117
|
-
Remove-Item Env:MARTIN_LIVE
|
|
118
|
-
```
|
|
119
|
-
|
|
120
|
-
This does not invoke Claude Code, and it will usually end with a recorded non-success stop reason because no live provider request was attempted. That is still the fastest way to confirm the loop, persistence, and verifier path are wired correctly before you switch to a live Claude run.
|
|
121
|
-
|
|
122
|
-
## Common errors and troubleshooting
|
|
123
|
-
|
|
124
|
-
### `claude` is not found
|
|
125
|
-
|
|
126
|
-
MartinLoop can only use the Claude adapter when the Claude Code CLI is installed and available on `PATH`. Confirm the CLI itself works before you debug MartinLoop.
|
|
127
|
-
|
|
128
|
-
### The run stops with `budget_exit`
|
|
129
|
-
|
|
130
|
-
The configured budget, iteration limit, or token ceiling was too tight for the requested task. Either narrow the task or raise the budget intentionally.
|
|
131
|
-
|
|
132
|
-
### The verifier fails even though Claude Code produced a patch
|
|
133
|
-
|
|
134
|
-
That means MartinLoop did its job. The patch was attempted, but the repo did not reach a verified state. Tighten the scope, change the verifier, or ask Claude Code to address the failing checks directly.
|
|
135
|
-
|
|
136
|
-
### The run exits with `human_escalation`
|
|
137
|
-
|
|
138
|
-
That usually means MartinLoop detected a path that should not proceed unattended, such as an unsafe verifier or a control boundary that needs review.
|
|
139
|
-
|
|
140
|
-
### `martin inspect` cannot find the file
|
|
141
|
-
|
|
142
|
-
Run another task first, or point `inspect` at the correct JSONL file under `~/.martin/runs/`.
|
|
1
|
+
# Claude Code Walkthrough
|
|
2
|
+
|
|
3
|
+
This walkthrough shows how to put MartinLoop around a Claude Code-driven coding task so the run has a budget, a verifier gate, an explicit stop reason, and an inspectable run record.
|
|
4
|
+
|
|
5
|
+
Back to the repo overview: [README.md](../../README.md)
|
|
6
|
+
|
|
7
|
+
## What MartinLoop adds around Claude Code
|
|
8
|
+
|
|
9
|
+
Claude Code is the coding engine. MartinLoop is the governance layer around it.
|
|
10
|
+
|
|
11
|
+
- **Budget**: hard USD, token, and iteration limits decide how far the run can go.
|
|
12
|
+
- **Verifier**: the run only counts as complete when the post-run verification command passes.
|
|
13
|
+
- **Stop reason**: MartinLoop records why the run stopped, such as `completed`, `budget_exit`, or `human_escalation`.
|
|
14
|
+
- **Run record**: each run appends a JSONL record under `~/.martin/runs/` so you can inspect it later.
|
|
15
|
+
|
|
16
|
+
## Prerequisites
|
|
17
|
+
|
|
18
|
+
- Node.js 20+
|
|
19
|
+
- `pnpm` 10.x if you are running from this repo
|
|
20
|
+
- Claude Code CLI installed and authenticated
|
|
21
|
+
- A repo you want Claude Code to work in
|
|
22
|
+
|
|
23
|
+
## Install MartinLoop
|
|
24
|
+
|
|
25
|
+
For the published CLI:
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
npm install -g martin-loop
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
For repo-local development in this monorepo:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
pnpm install
|
|
35
|
+
pnpm build
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Simple local run
|
|
39
|
+
|
|
40
|
+
Run MartinLoop with the default Claude adapter and a verifier command:
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
martin run "fix the auth regression" \
|
|
44
|
+
--engine claude \
|
|
45
|
+
--budget 3.00 \
|
|
46
|
+
--verify "pnpm test"
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
What happens:
|
|
50
|
+
|
|
51
|
+
- MartinLoop hands the objective to Claude Code
|
|
52
|
+
- Claude Code attempts the work
|
|
53
|
+
- MartinLoop runs the verifier command
|
|
54
|
+
- The loop finishes as `completed` only when the agent result and the verifier both pass
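
The four steps above can be sketched as a small loop. This is a hedged illustration only, not MartinLoop's actual implementation: `AttemptResult`, `governed_run`, and the callback signatures are hypothetical names chosen for the example, and it assumes that hitting either the USD cap or the iteration limit is recorded as `budget_exit`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttemptResult:
    ok: bool          # did the agent report success? (hypothetical field)
    cost_usd: float   # spend attributed to this attempt (hypothetical field)


def governed_run(
    attempt: Callable[[], AttemptResult],
    verify: Callable[[], bool],
    budget_usd: float,
    max_iterations: int,
) -> dict:
    """Retry until the agent and the verifier both pass, or a limit is hit."""
    spent = 0.0
    for iteration in range(1, max_iterations + 1):
        result = attempt()
        spent += result.cost_usd
        if spent > budget_usd:
            # Assumption: exceeding the hard cap stops the run immediately.
            return {"stop_reason": "budget_exit", "attempts": iteration, "spent_usd": spent}
        # The verifier is only consulted when the agent itself claims success.
        if result.ok and verify():
            return {"stop_reason": "completed", "attempts": iteration, "spent_usd": spent}
    # Iteration limit reached without a verified result.
    return {"stop_reason": "budget_exit", "attempts": max_iterations, "spent_usd": spent}


# Usage: first attempt fails, second succeeds and the verifier passes.
outcomes = iter([AttemptResult(ok=False, cost_usd=0.5), AttemptResult(ok=True, cost_usd=0.75)])
print(governed_run(lambda: next(outcomes), lambda: True, budget_usd=3.0, max_iterations=3))
```

The point of the sketch is the final condition: an agent-reported success without a passing verifier never produces `completed`.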
|
|
55
|
+
|
|
56
|
+
## Budget example
|
|
57
|
+
|
|
58
|
+
Use a hard cap and a smaller iteration budget when you want Claude Code to stay tightly bounded:
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
martin run "tighten the login retry handling" \
|
|
62
|
+
--engine claude \
|
|
63
|
+
--budget 2.00 \
|
|
64
|
+
--soft-limit-usd 1.25 \
|
|
65
|
+
--max-iterations 2 \
|
|
66
|
+
--max-tokens 20000 \
|
|
67
|
+
--verify "pnpm --filter @martin/core test"
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
This is the key MartinLoop value-add for Claude Code workflows: the agent can keep trying, but only inside a contract you can review before the spend drifts.
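
To make the contract idea concrete, here is a minimal sketch of how the four limits in the command above could be evaluated before another attempt starts. The `Budget` class and `budget_status` function are hypothetical, and the sketch assumes the soft limit only signals a warning while the hard limits end the run with `budget_exit`; the real CLI may treat the soft limit differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    hard_usd: float       # mirrors --budget
    soft_usd: float       # mirrors --soft-limit-usd
    max_iterations: int   # mirrors --max-iterations
    max_tokens: int       # mirrors --max-tokens


def budget_status(b: Budget, spent_usd: float, iterations: int, tokens: int) -> str:
    """Return 'budget_exit', 'soft_limit', or 'ok' for the current totals."""
    if spent_usd >= b.hard_usd or iterations >= b.max_iterations or tokens >= b.max_tokens:
        return "budget_exit"
    if spent_usd >= b.soft_usd:
        return "soft_limit"  # assumption: warn, but allow the run to continue
    return "ok"


budget = Budget(hard_usd=2.00, soft_usd=1.25, max_iterations=2, max_tokens=20_000)
print(budget_status(budget, spent_usd=0.80, iterations=1, tokens=6_000))  # ok
print(budget_status(budget, spent_usd=1.40, iterations=1, tokens=9_000))  # soft_limit
print(budget_status(budget, spent_usd=1.40, iterations=2, tokens=9_000))  # budget_exit
```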
|
|
71
|
+
|
|
72
|
+
## Verifier example
|
|
73
|
+
|
|
74
|
+
Use a verifier that matches the exact scope of the change:
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
martin run "update the OSS quickstart wording" \
|
|
78
|
+
--engine claude \
|
|
79
|
+
--cwd . \
|
|
80
|
+
--allow-path README.md \
|
|
81
|
+
--allow-path "docs/oss/**" \
|
|
82
|
+
--deny-path "apps/control-plane/**" \
|
|
83
|
+
--accept "Only documentation files may change" \
|
|
84
|
+
--verify "pnpm --filter @martin/core test"
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
The verifier gate matters because Claude Code producing a patch is not the same thing as the repo being in a valid state.
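
The `--allow-path` and `--deny-path` flags in the command above scope which files may change. As a rough sketch of that kind of check, assuming deny patterns take precedence over allow patterns (the walkthrough does not state the precedence), a changed path could be tested like this. `path_allowed` is a hypothetical helper, and Python's `fnmatch` is a stand-in for whatever glob matcher MartinLoop actually uses.

```python
from fnmatch import fnmatchcase

ALLOW = ["README.md", "docs/oss/**"]
DENY = ["apps/control-plane/**"]


def path_allowed(path: str, allow: list[str], deny: list[str]) -> bool:
    """True when the path matches an allow pattern and no deny pattern."""
    # Note: fnmatch's `*` also matches `/`, so `**` acts as a recursive
    # wildcard here. Other glob implementations may behave differently.
    if any(fnmatchcase(path, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(path, pattern) for pattern in allow)


for changed in ["README.md", "docs/oss/guides/quickstart.md", "apps/control-plane/src/api.ts", "src/index.ts"]:
    print(changed, path_allowed(changed, ALLOW, DENY))
```

In this sketch a path that matches neither list is rejected, which keeps the allow list authoritative.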
|
|
88
|
+
|
|
89
|
+
## Inspect example
|
|
90
|
+
|
|
91
|
+
After a run, inspect the persisted JSONL record:
|
|
92
|
+
|
|
93
|
+
```bash
|
|
94
|
+
martin inspect --file ~/.martin/runs/<workspaceId>.jsonl
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
Look for:
|
|
98
|
+
|
|
99
|
+
- the final lifecycle state and stop reason
|
|
100
|
+
- budget and token totals
|
|
101
|
+
- verifier outcome
|
|
102
|
+
- attempt count and failure classification
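
If you prefer to check those items in a script, a JSONL file can be summarized line by line. The sketch below is illustrative only: the field names `stopReason`, `costUsd`, and `attempts` and the sample values are assumptions, not the documented record schema, so adjust them to whatever your records actually contain.

```python
import json
from collections import Counter

# Hypothetical sample records; real MartinLoop records may use different fields.
SAMPLE_JSONL = """\
{"lifecycleState": "completed", "stopReason": "completed", "costUsd": 2.3, "attempts": 1}
{"lifecycleState": "failed", "stopReason": "budget_exit", "costUsd": 2.0, "attempts": 2}
"""


def summarize(jsonl_text: str) -> dict:
    """Tally stop reasons and total cost across JSONL run records."""
    stop_reasons = Counter()
    total_cost = 0.0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        stop_reasons[record.get("stopReason", "unknown")] += 1
        total_cost += float(record.get("costUsd", 0.0))
    return {"stop_reasons": dict(stop_reasons), "total_cost_usd": round(total_cost, 2)}


print(summarize(SAMPLE_JSONL))
```

To run it against a real file, read the file's text first (for example with `pathlib.Path(...).read_text()`) and pass that string to `summarize`.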
|
|
103
|
+
|
|
104
|
+
## Safe repo-local dry run
|
|
105
|
+
|
|
106
|
+
If you want to validate the MartinLoop flow without real model spend, use stub mode first:
|
|
107
|
+
|
|
108
|
+
### PowerShell
|
|
109
|
+
|
|
110
|
+
```powershell
|
|
111
|
+
$env:MARTIN_LIVE='false'
|
|
112
|
+
$repoRoot = (Get-Location).Path
|
|
113
|
+
pnpm run:cli -- run `
|
|
114
|
+
--cwd $repoRoot `
|
|
115
|
+
--objective "Summarize the current runtime state" `
|
|
116
|
+
--verify "pnpm --filter @martin/core test"
|
|
117
|
+
Remove-Item Env:MARTIN_LIVE
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
This does not invoke Claude Code, and it will usually end with a recorded non-success stop reason because no live provider request was attempted. That is still the fastest way to confirm the loop, persistence, and verifier path are wired correctly before you switch to a live Claude run.
|
|
121
|
+
|
|
122
|
+
## Common errors and troubleshooting
|
|
123
|
+
|
|
124
|
+
### `claude` is not found
|
|
125
|
+
|
|
126
|
+
MartinLoop can only use the Claude adapter when the Claude Code CLI is installed and available on `PATH`. Confirm the CLI itself works before you debug MartinLoop.
|
|
127
|
+
|
|
128
|
+
### The run stops with `budget_exit`
|
|
129
|
+
|
|
130
|
+
The configured budget, iteration limit, or token ceiling was too tight for the requested task. Either narrow the task or raise the budget intentionally.
|
|
131
|
+
|
|
132
|
+
### The verifier fails even though Claude Code produced a patch
|
|
133
|
+
|
|
134
|
+
That means MartinLoop did its job. The patch was attempted, but the repo did not reach a verified state. Tighten the scope, change the verifier, or ask Claude Code to address the failing checks directly.
|
|
135
|
+
|
|
136
|
+
### The run exits with `human_escalation`
|
|
137
|
+
|
|
138
|
+
That usually means MartinLoop detected a path that should not proceed unattended, such as an unsafe verifier or a control boundary that needs review.
|
|
139
|
+
|
|
140
|
+
### `martin inspect` cannot find the file
|
|
141
|
+
|
|
142
|
+
Run a task first so that a record exists, or point `inspect` at the correct JSONL file under `~/.martin/runs/`.
|