martin-loop 0.1.4 → 0.1.5
- package/LICENSE +21 -21
- package/README.md +398 -362
- package/demo/seeded-workspace/README.md +35 -0
- package/demo/seeded-workspace/TASKS.md +29 -0
- package/demo/seeded-workspace/martin.config.yaml +11 -0
- package/demo/seeded-workspace/package.json +8 -0
- package/demo/seeded-workspace/src/invoice-summary.js +11 -0
- package/demo/seeded-workspace/test/invoice-summary.test.js +20 -0
- package/dist/vendor/adapters/claude-cli.d.ts +19 -4
- package/dist/vendor/adapters/claude-cli.js +55 -24
- package/dist/vendor/adapters/cli-bridge.d.ts +1 -0
- package/dist/vendor/adapters/cli-bridge.js +154 -28
- package/dist/vendor/adapters/index.d.ts +1 -0
- package/dist/vendor/adapters/index.js +1 -0
- package/dist/vendor/adapters/verifier-only.d.ts +7 -0
- package/dist/vendor/adapters/verifier-only.js +57 -0
- package/dist/vendor/cli/index.d.ts +6 -1
- package/dist/vendor/cli/index.js +124 -7
- package/dist/vendor/contracts/index.d.ts +3 -1
- package/dist/vendor/core/compiler.d.ts +2 -0
- package/dist/vendor/core/compiler.js +10 -4
- package/dist/vendor/core/context-integrity.d.ts +26 -0
- package/dist/vendor/core/context-integrity.js +56 -0
- package/dist/vendor/core/index.d.ts +5 -2
- package/dist/vendor/core/index.js +186 -54
- package/dist/vendor/core/policy.d.ts +6 -0
- package/docs/distribution/DIRECTORY-SUBMISSIONS.md +89 -0
- package/docs/distribution/INTEGRATION-OUTREACH.md +61 -0
- package/docs/distribution/UNDER-3-CHALLENGE.md +65 -0
- package/docs/oss/CLAUDE-CODE-WALKTHROUGH.md +142 -0
- package/docs/oss/EXAMPLES.md +134 -126
- package/docs/oss/OSS-BOUNDARY-REPORT.json +109 -113
- package/docs/oss/OSS-BOUNDARY-REPORT.md +48 -48
- package/docs/oss/QUICKSTART.md +165 -135
- package/docs/oss/RALPH-LOOP-SAFETY.md +113 -0
- package/docs/oss/README.md +96 -93
- package/docs/oss/RELEASE-SURFACE-REPORT.json +45 -45
- package/docs/oss/RELEASE-SURFACE-REPORT.md +35 -35
- package/package.json +19 -11
package/docs/oss/OSS-BOUNDARY-REPORT.md
CHANGED
@@ -1,48 +1,48 @@
-# Martin Loop Phase 13 OSS Core Boundary
-
-Generated: 2026-
-
-## Verdict
-**GO**
-
-## Summary
-- Public package target: martin-loop
-- Canonical public package manager: npm
-- Intended OSS core packages: 5
-- Non-OSS workspace packages: 2
-- Local-only surfaces: 1
-- Private OSS-core packages still gated from publish: 3
-- OSS-core packages already publish-configured: 2
-- Dependency leaks: 0
-- No workspace dependency leaks detected between the intended OSS core and the non-OSS workspace surfaces.
-
-## Public Package Surface
-- Install target: `npm install martin-loop`
-- CLI target: `npx martin-loop`
-- SDK target: `import { MartinLoop } from 'martin-loop'`
-- Root `npx martin-loop` support shipped: yes
-- Root SDK import shipped: yes
-
-## Intended OSS Core Packages
-| Package | Path | Private | Publish Access | Workspace Deps |
-|---|---|---|---|---|
-| @martin/contracts | packages/contracts | yes | n/a | none |
-| @martin/core | packages/core | yes | n/a | @martin/contracts |
-| @martin/adapters | packages/adapters | yes | n/a | @martin/core |
-| @martin/cli | packages/cli | no | public | @martin/adapters, @martin/contracts, @martin/core |
-| @
-
-## Non-OSS Workspace Packages
-| Package | Path | Reason |
-|---|---|---|
-| @martin/control-plane | apps/control-plane | Managed or RC-only workspace surface that stays out of the initial OSS boundary. |
-| @martin/benchmarks | benchmarks | Managed or RC-only workspace surface that stays out of the initial OSS boundary. |
-
-## Local-Only Surfaces
-| Path | Reason |
-|---|---|
-| apps/local-dashboard | Local read-model viewer that is not yet packaged as a publishable OSS workspace. |
-
-## Dependency Leak Review
-- No workspace dependency leaks detected.
-
+# Martin Loop Phase 13 OSS Core Boundary
+
+Generated: 2026-05-11T21:47:36.834Z
+
+## Verdict
+**GO**
+
+## Summary
+- Public package target: martin-loop
+- Canonical public package manager: npm
+- Intended OSS core packages: 5
+- Non-OSS workspace packages: 2
+- Local-only surfaces: 1
+- Private OSS-core packages still gated from publish: 3
+- OSS-core packages already publish-configured: 2
+- Dependency leaks: 0
+- No workspace dependency leaks detected between the intended OSS core and the non-OSS workspace surfaces.
+
+## Public Package Surface
+- Install target: `npm install martin-loop`
+- CLI target: `npx martin-loop`
+- SDK target: `import { MartinLoop } from 'martin-loop'`
+- Root `npx martin-loop` support shipped: yes
+- Root SDK import shipped: yes
+
+## Intended OSS Core Packages
+| Package | Path | Private | Publish Access | Workspace Deps |
+|---|---|---|---|---|
+| @martin/contracts | packages/contracts | yes | n/a | none |
+| @martin/core | packages/core | yes | n/a | @martin/contracts |
+| @martin/adapters | packages/adapters | yes | n/a | @martin/core |
+| @martin/cli | packages/cli | no | public | @martin/adapters, @martin/contracts, @martin/core |
+| @martinloop/mcp | packages/mcp | no | public | none |
+
+## Non-OSS Workspace Packages
+| Package | Path | Reason |
+|---|---|---|
+| @martin/control-plane | apps/control-plane | Managed or RC-only workspace surface that stays out of the initial OSS boundary. |
+| @martin/benchmarks | benchmarks | Managed or RC-only workspace surface that stays out of the initial OSS boundary. |
+
+## Local-Only Surfaces
+| Path | Reason |
+|---|---|
+| apps/local-dashboard | Local read-model viewer that is not yet packaged as a publishable OSS workspace. |
+
+## Dependency Leak Review
+- No workspace dependency leaks detected.
+
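The "Dependency leaks: 0" line in the boundary report can be illustrated with a small check. This is a minimal sketch, assuming a leak means an intended OSS core package listing a non-OSS workspace package among its workspace dependencies; the `WorkspacePackage` shape and `findDependencyLeaks` helper are hypothetical and are not taken from the package, whose report generator is not shown in this diff. The rows are copied from the 0.1.5 side of the tables.

```typescript
// Hypothetical shape for one row of the "Intended OSS Core Packages" table.
interface WorkspacePackage {
  name: string;
  workspaceDeps: string[];
}

// Rows copied from the 0.1.5 side of the report ("none" becomes an empty list).
const ossCore: WorkspacePackage[] = [
  { name: "@martin/contracts", workspaceDeps: [] },
  { name: "@martin/core", workspaceDeps: ["@martin/contracts"] },
  { name: "@martin/adapters", workspaceDeps: ["@martin/core"] },
  {
    name: "@martin/cli",
    workspaceDeps: ["@martin/adapters", "@martin/contracts", "@martin/core"],
  },
  { name: "@martinloop/mcp", workspaceDeps: [] },
];

// Packages listed under "Non-OSS Workspace Packages".
const nonOss = new Set<string>(["@martin/control-plane", "@martin/benchmarks"]);

// Assumed definition of a leak: an OSS core package that depends on an excluded package.
function findDependencyLeaks(
  core: WorkspacePackage[],
  excluded: Set<string>,
): string[] {
  const leaks: string[] = [];
  for (const pkg of core) {
    for (const dep of pkg.workspaceDeps) {
      if (excluded.has(dep)) {
        leaks.push(`${pkg.name} -> ${dep}`);
      }
    }
  }
  return leaks;
}

const leaks = findDependencyLeaks(ossCore, nonOss);
console.log(`Dependency leaks: ${leaks.length}`);
```

With the table rows as listed, no core package depends on `@martin/control-plane` or `@martin/benchmarks`, so the list is empty. That agrees with the report's count under this assumed definition.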
package/docs/oss/QUICKSTART.md
CHANGED
@@ -1,135 +1,165 @@
-# Quickstart
-
-This quickstart is intentionally conservative. It is written for a fresh engineer validating the current Phase 13 release-candidate state, not for a hypothetical future public release.
-
-## Public launch target vs current RC path
-
-The frozen public launch target is:
-
-- `npm install martin-loop`
-- `npx martin-loop ...`
-- `import { MartinLoop } from "martin-loop"`
-
-
-
-
-
-
--
--
-
-
-
-
--
-
-
-
-
-
-
-
-pnpm
-
-
-
-
-
-
-
-
-
-
-
--
--
--
-
-
-
-
-
-
-
-
-- `pnpm
-- `pnpm
-- `pnpm
-- `pnpm
-- `pnpm
-
-
-
-
-
-pnpm
-pnpm
-pnpm
-pnpm
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-```
-
-
-
-
-
--
-
-
-
-
-
-
-
+# Quickstart
+
+This quickstart is intentionally conservative. It is written for a fresh engineer validating the current Phase 13 release-candidate state, not for a hypothetical future public release.
+
+## Public launch target vs current RC path
+
+The frozen public launch target is:
+
+- `npm install martin-loop`
+- `npx martin-loop ...`
+- `import { MartinLoop } from "martin-loop"`
+- `npx @martinloop/mcp`
+
+That runtime launch surface is implemented in the root package facade and smoke-validated from a clean temporary install. The MCP package shape is also smoke-validated from a packed tarball. This quickstart still documents the honest RC-from-source path because public registry publication is a separate release step.
+
+## Prerequisites
+
+- Node.js 20+ recommended
+- `pnpm` 10.x
+- A clean local checkout of this repo
+
+Optional for live runs:
+
+- Claude Code CLI for the Claude adapter path
+- OpenAI Codex CLI plus credentials for the Codex adapter path
+
+## Install and build
+
+From the repo root:
+
+```bash
+pnpm install
+pnpm build
+```
+
+## Run the RC validation matrix
+
+```bash
+pnpm rc:validate
+```
+
+What this does:
+
+- creates an isolated temporary home or profile directory
+- points Martin run artifacts at that clean location
+- runs the current build, lint, test, benchmark, and certification matrix
+- writes step logs into a temp `martin-rc-validation-*` directory
+
+Use this when you want to answer, "Can a fresh environment still reproduce the current RC baseline?"
+
+## RC gate commands
+
+The current Phase 13 RC gate is made of these commands:
+
+- `pnpm oss:validate`
+- `pnpm public:smoke`
+- `pnpm repo:smoke`
+- `pnpm rc:validate`
+- `pnpm pilot:prep:validate`
+- `pnpm release:matrix:local`
+
+Recommended order for a fresh local reviewer:
+
+```bash
+pnpm oss:validate
+pnpm public:smoke
+pnpm repo:smoke
+pnpm rc:validate
+pnpm release:matrix:local
+```
+
+`pnpm release:matrix:local` runs the full local OS lane for the current machine. The repository also defines Windows, macOS, and Linux CI lanes in `.github/workflows/phase13-release-matrix.yml`.
+
+## Stub-safe CLI run
+
+This is the safest first run because it avoids real provider spend.
+
+### PowerShell
+
+```powershell
+$env:MARTIN_LIVE='false'
+pnpm run:cli -- run --objective "Summarize the current runtime state" --verify "pnpm --filter @martin/core test"
+Remove-Item Env:MARTIN_LIVE
+```
+
+### Bash
+
+```bash
+MARTIN_LIVE=false pnpm run:cli -- run --objective "Summarize the current runtime state" --verify "pnpm --filter @martin/core test"
+```
+
+This path uses the stub adapter and still exercises the loop, persistence, and policy surfaces.
+
+## Config-driven run
+
+The repo ships an example config at `martin.config.example.yaml`.
+
+Martin auto-looks for `martin.config.yaml` in the invocation root, or you can pass `--config <path>`.
+
+Example:
+
+```bash
+pnpm run:cli -- run --config martin.config.example.yaml --objective "Run with repo defaults" --verify "pnpm --filter @martin/core test"
+```
+
+## Inspect a saved run
+
+Martin persists runs under `~/.martin/runs/` by default, or under `MARTIN_RUNS_DIR` if you override it.
+
+```bash
+pnpm run:cli -- inspect --file path/to/loop-record.json
+```
+
+For persisted run folders, inspect the `contract.json`, `state.json`, `ledger.jsonl`, and `artifacts/attempt-XXX/` files together. Those artifacts are the source of truth for runtime behavior.
+
+## MCP server
+
+The publish-ready MCP install target is:
+
+```bash
+npx @martinloop/mcp
+```
+
+Claude Code one-line install:
+
+```bash
+# macOS/Linux
+claude mcp add --scope user martin-loop -- npx @martinloop/mcp
+
+# Windows PowerShell/cmd
+claude mcp add --scope user martin-loop cmd /c "npx @martinloop/mcp"
+```
+
+Official MCP Registry publication has an extra metadata step beyond npm packaging. Do not mark `@martinloop/mcp` registry-ready unless both of these exist and match:
+
+- `packages/mcp/package.json` with `mcpName`
+- `packages/mcp/server.json` with the official server metadata
+
+After publishing `@martinloop/mcp` to npm, run the official registry publisher from `packages/mcp`:
+
+```bash
+mcp-publisher login github
+mcp-publisher publish
+```
+
+For repo-local verification from source:
+
+```bash
+pnpm --filter @martinloop/mcp build
+pnpm --filter @martinloop/mcp smoke:pack
+node packages/mcp/dist/server.js
+```
+
+The current MCP tools are:
+
+- `martin_run`
+- `martin_inspect`
+- `martin_status`
+
+## Notes for reviewers
+
+- Fresh-home behavior matters. Do not rely only on a long-lived `~/.martin` directory.
+- Exact-versus-estimated cost labels are meaningful and should not be merged in docs or dashboards.
+- The repo contains control-plane code, but the public OSS boundary is still being finalized during Phase 13.
+- The benchmark harness remains a workspace-level RC surface; `martin bench` is not part of the publishable CLI boundary yet.
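The "Inspect a saved run" section of the new QUICKSTART.md names a default runs directory, an override variable, and a set of files to read together. The sketch below restates those rules as two pure functions. The helper names `resolveRunsDir` and `missingRunFiles` are hypothetical, and treating an empty `MARTIN_RUNS_DIR` as unset is an assumption; the actual resolution logic in `martin-loop` is not part of this diff.

```typescript
// Hypothetical helper: pick the runs directory the way the quickstart describes it.
// Assumption: an empty or whitespace-only MARTIN_RUNS_DIR counts as "not set".
function resolveRunsDir(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  const override = env["MARTIN_RUNS_DIR"];
  if (override !== undefined && override.trim() !== "") {
    return override;
  }
  // Strip trailing separators so the joined path has no doubled slash.
  const home = homeDir.replace(/[\\\/]+$/, "");
  return `${home}/.martin/runs`;
}

// Files the quickstart says to inspect together in a persisted run folder.
// The artifacts/attempt-XXX/ directories are left out of this simple check.
const EXPECTED_RUN_FILES: string[] = ["contract.json", "state.json", "ledger.jsonl"];

// Hypothetical helper: report which expected files a run folder listing lacks.
function missingRunFiles(present: string[]): string[] {
  const have = new Set<string>(present);
  return EXPECTED_RUN_FILES.filter((name) => !have.has(name));
}

const defaultDir = resolveRunsDir({}, "/home/dev");
const overridden = resolveRunsDir({ MARTIN_RUNS_DIR: "/tmp/martin-runs" }, "/home/dev");
console.log(defaultDir); // value: "/home/dev/.martin/runs"
console.log(overridden); // value: "/tmp/martin-runs"
console.log(missingRunFiles(["contract.json", "ledger.jsonl"])); // value: ["state.json"]
```

Keeping both helpers free of filesystem calls makes them easy to exercise against a fresh-home scenario, which the reviewer notes single out as the case that matters.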
package/docs/oss/RALPH-LOOP-SAFETY.md
ADDED
@@ -0,0 +1,113 @@
+# Ralph-Style Loop Safety Guide
+
+Ralph-style loops are useful because they keep trying until a coding task reaches a stopping condition. MartinLoop is not a replacement for that pattern. It is the governance layer that makes the pattern safer to run unattended.
+
+For install and first-run steps, start with the repo quickstart: [README.md#quick-start](../../README.md#quick-start)
+
+## 1. What Ralph-style loops do well
+
+Ralph-style loops are good at persistence:
+
+- they retry after a failed attempt
+- they keep working toward a concrete objective
+- they help teams automate long-running coding tasks that would otherwise need constant supervision
+
+That persistence is the reason teams use them. The problem is not the existence of the loop. The problem is what happens when the loop keeps running without a clear governance contract.
+
+## 2. Where unattended loops fail
+
+An unattended coding loop can fail in ways that are expensive even when no single attempt looks dramatic on its own:
+
+- spend keeps accumulating across retries
+- verifier failures repeat without a meaningful strategy change
+- file edits drift outside the intended task boundary
+- the final outcome is hard to audit because the reasoning trail is incomplete
+- operators know that the loop stopped, but not whether it stopped for success, safety, or exhaustion
+
+Those are governance failures, not only model failures.
+
+## 3. Why max iterations alone are not enough
+
+A max-iteration limit is helpful, but it only answers one question: "How many times may this loop try?"
+
+It does not answer:
+
+- how much budget can be spent before the next attempt is rejected
+- whether the verifier command is safe to run
+- whether the patch stayed inside the approved file scope
+- whether a failed run left rollback evidence behind
+- whether the recorded outcome is trustworthy enough to resume or inspect later
+
+Iteration caps are one guardrail. They are not a full control layer.
+
+## 4. What MartinLoop adds
+
+MartinLoop governs the loop before, during, and after execution:
+
+- **Budget governance** rejects work that would exceed the configured spend, token, or iteration envelope
+- **Verifier gates** only allow a run to finish as `completed` when the agent result and verification state both pass
+- **Safety leash checks** evaluate verifier commands, file boundaries, and approval-sensitive actions before work is accepted
+- **Stop reasons** make the final lifecycle state explicit, such as `completed`, `budget_exit`, or `human_escalation`
+- **Run records** append JSONL evidence under `~/.martin/runs/` so operators can inspect what happened later
+- **Rollback evidence** preserves the recovery boundary for repo-backed runs when persistence is configured
+
+That is why MartinLoop should be thought of as a companion governance layer around a Ralph-style loop, not an argument against using one.
+
+## 5. Example governed run
+
+```bash
+martin run "fix the auth regression" \
+  --budget 3.00 \
+  --soft-limit-usd 2.00 \
+  --max-iterations 2 \
+  --verify "pnpm test"
+```
+
+This changes the operator contract in a few important ways:
+
+- the next attempt can be rejected before overspend happens
+- the run still has to satisfy the verifier
+- the final state is inspectable instead of being inferred from logs alone
+
+## 6. Example stop reason
+
+MartinLoop returns an explicit lifecycle state and reason when a run stops:
+
+```json
+{
+  "decision": {
+    "shouldExit": true,
+    "lifecycleState": "budget_exit",
+    "status": "exited",
+    "reason": "Martin exited because the budget governor hit a hard limit."
+  }
+}
+```
+
+That answer is more useful than "the loop stopped" because it tells the operator whether the run ended for success, safety, or exhaustion.
+
+## 7. Example JSONL run record
+
+Each run appends a JSONL record shaped like:
+
+```json
+{
+  "loopId": "loop_example123",
+  "workspaceId": "ws_demo",
+  "projectId": "proj_demo",
+  "status": "exited",
+  "lifecycleState": "budget_exit",
+  "budget": {
+    "maxUsd": 3,
+    "softLimitUsd": 2,
+    "maxIterations": 2,
+    "maxTokens": 20000
+  },
+  "metadata": {
+    "policyProfile": "balanced",
+    "telemetryDestination": "local-only"
+  }
+}
+```
+
+The full record can also include attempts, events, verifier outcomes, and persisted artifact references. That is the evidence trail MartinLoop adds around a retrying coding loop.
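The run-record example in the safety guide suggests how an operator might read stop reasons back out of a JSONL file. The following is a minimal sketch, assuming one JSON object per line and only the fields shown in the guide's example. The `RunRecord` type, the helper names, and the grouping of `budget_exit` under "exhaustion" and `human_escalation` under "safety" are illustrative assumptions, not the package's documented schema or API.

```typescript
// Subset of fields taken from the guide's example record; the full schema is not shown.
interface RunRecord {
  loopId: string;
  status: string;
  lifecycleState: string;
  budget: {
    maxUsd: number;
    softLimitUsd: number;
    maxIterations: number;
    maxTokens: number;
  };
}

// JSONL: one JSON object per non-empty line.
function parseRunRecords(jsonl: string): RunRecord[] {
  return jsonl
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunRecord);
}

type StopKind = "success" | "safety" | "exhaustion" | "unknown";

// Assumed mapping from the lifecycle states named in the guide to the three
// outcomes it says operators care about. Other states fall through to "unknown".
function classifyStop(record: RunRecord): StopKind {
  switch (record.lifecycleState) {
    case "completed":
      return "success";
    case "human_escalation":
      return "safety";
    case "budget_exit":
      return "exhaustion";
    default:
      return "unknown";
  }
}

// The guide's example record, flattened to a single JSONL line.
const sampleJsonl = [
  '{"loopId":"loop_example123","status":"exited","lifecycleState":"budget_exit",' +
    '"budget":{"maxUsd":3,"softLimitUsd":2,"maxIterations":2,"maxTokens":20000}}',
  "",
].join("\n");

const records = parseRunRecords(sampleJsonl);
for (const record of records) {
  console.log(`${record.loopId}: ${record.lifecycleState} (${classifyStop(record)})`);
}
```

Returning "unknown" for unrecognized states keeps the sketch honest about lifecycle values the guide does not list, so it never guesses a category.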