npm - martin-loop - Versions diffs - 0.1.3 → 0.1.5 - Mend

martin-loop 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (38)

package/README.md +52 -16
package/demo/seeded-workspace/README.md +35 -0
package/demo/seeded-workspace/TASKS.md +29 -0
package/demo/seeded-workspace/martin.config.yaml +11 -0
package/demo/seeded-workspace/package.json +8 -0
package/demo/seeded-workspace/src/invoice-summary.js +11 -0
package/demo/seeded-workspace/test/invoice-summary.test.js +20 -0
package/dist/vendor/adapters/claude-cli.d.ts +19 -4
package/dist/vendor/adapters/claude-cli.js +55 -24
package/dist/vendor/adapters/cli-bridge.d.ts +1 -0
package/dist/vendor/adapters/cli-bridge.js +154 -28
package/dist/vendor/adapters/index.d.ts +1 -0
package/dist/vendor/adapters/index.js +1 -0
package/dist/vendor/adapters/verifier-only.d.ts +7 -0
package/dist/vendor/adapters/verifier-only.js +57 -0
package/dist/vendor/cli/index.d.ts +6 -1
package/dist/vendor/cli/index.js +124 -7
package/dist/vendor/contracts/index.d.ts +3 -1
package/dist/vendor/core/compiler.d.ts +2 -0
package/dist/vendor/core/compiler.js +10 -4
package/dist/vendor/core/context-integrity.d.ts +26 -0
package/dist/vendor/core/context-integrity.js +56 -0
package/dist/vendor/core/index.d.ts +5 -2
package/dist/vendor/core/index.js +186 -54
package/dist/vendor/core/policy.d.ts +6 -0
package/docs/distribution/DIRECTORY-SUBMISSIONS.md +89 -0
package/docs/distribution/INTEGRATION-OUTREACH.md +61 -0
package/docs/distribution/UNDER-3-CHALLENGE.md +65 -0
package/docs/oss/CLAUDE-CODE-WALKTHROUGH.md +142 -0
package/docs/oss/EXAMPLES.md +9 -1
package/docs/oss/OSS-BOUNDARY-REPORT.json +3 -7
package/docs/oss/OSS-BOUNDARY-REPORT.md +2 -2
package/docs/oss/QUICKSTART.md +33 -3
package/docs/oss/RALPH-LOOP-SAFETY.md +113 -0
package/docs/oss/README.md +6 -3
package/docs/oss/RELEASE-SURFACE-REPORT.json +1 -1
package/docs/oss/RELEASE-SURFACE-REPORT.md +1 -1
package/package.json +8 -2

package/docs/distribution/UNDER-3-CHALLENGE.md ADDED

@@ -0,0 +1,65 @@
+# Can your AI coding agent finish this task under $3?
+MartinLoop is testing a simple question:
+Can an AI coding agent complete a task under a fixed budget, with verifier-passed completion and an inspectable run record?
+## Current repo-backed comparison
+Same task, same starting state:
+- governed MartinLoop run: `$2.30`
+- uncontrolled retry loop: `$5.20`
+- governed outcome: `completed` and verifier-passed with an inspectable record
+- uncontrolled outcome: failed after repeated retries with no comparable audit trail
+These numbers match the current public benchmark story shown in the repo README and visualized in [`docs/assets/side-by-side.svg`](../assets/side-by-side.svg).
+## Why this matters
+The claim is not that every governed run is always cheaper. The claim is that the run becomes inspectable and enforceable:
+- budget policy is explicit
+- verifier success is explicit
+- stop reasons are explicit
+- artifacts are inspectable after the run
+That makes a coding-agent result easier to trust, replay, compare, and audit.
+## Reproduce it
+From the repo root:
+```bash
+pnpm --filter @martin/benchmarks test
+pnpm --filter @martin/benchmarks eval
+pnpm --filter @martin/benchmarks eval:phase12
+```
+## What to share back
+If you run a similar challenge with Claude Code, Codex CLI, Cursor, Aider, Cline, Continue, OpenHands, SWE-agent, Goose, or an internal coding agent, share:
+- total budget used
+- number of attempts
+- verifier result
+- whether the final run was auditable
+- whether rollback evidence was available
+## Try MartinLoop without risking your repo
+You can copy the public demo sandbox first:
+```bash
+npx martin-loop demo
+```
+Then run the sandbox locally with the printed next steps.
+## Claim boundary
+This page intentionally stays inside the current public evidence boundary:
+- the `$2.30` and `$5.20` figures are the current repo-backed benchmark story used in the public README
+- the reproduction commands above are real commands from this repository
+- the benchmark harness remains a workspace-level surface, so challenge claims should stay tied to repo-backed outputs rather than generic marketing numbers
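
The "What to share back" list in the added challenge page maps onto a small record type. The following is a minimal sketch, not part of the package: `ChallengeReport` and `isUnderBudget` are hypothetical names, and the attempt counts are placeholders because the page does not state them. Only the `$2.30` and `$5.20` figures, the `$3` threshold, and the pass/fail outcomes come from the page.

```typescript
// Hypothetical shape for a shared-back challenge result. Field names are
// illustrative and are not defined by the martin-loop package.
interface ChallengeReport {
  agent: string;
  totalUsd: number;          // total budget used
  attempts: number;          // number of attempts
  verifierPassed: boolean;   // verifier result
  auditable: boolean;        // whether the final run was auditable
  rollbackEvidence: boolean; // whether rollback evidence was available
}

// A run meets the challenge only if it stayed under the cap AND the verifier passed.
function isUnderBudget(report: ChallengeReport, capUsd: number): boolean {
  return report.verifierPassed && report.totalUsd < capUsd;
}

const governed: ChallengeReport = {
  agent: "governed MartinLoop run",
  totalUsd: 2.3,
  attempts: 1, // placeholder: not stated in the page
  verifierPassed: true,
  auditable: true,
  rollbackEvidence: true, // placeholder: not stated in the page
};

const uncontrolled: ChallengeReport = {
  agent: "uncontrolled retry loop",
  totalUsd: 5.2,
  attempts: 3, // placeholder: the page only says "repeated retries"
  verifierPassed: false,
  auditable: false,
  rollbackEvidence: false, // placeholder: not stated in the page
};

const results = [governed, uncontrolled].map((r) => ({
  agent: r.agent,
  underThreeDollars: isUnderBudget(r, 3),
}));
```

Requiring the verifier result alongside the spend check mirrors the page's framing that cost alone is not the claim.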

package/docs/oss/CLAUDE-CODE-WALKTHROUGH.md ADDED

@@ -0,0 +1,142 @@
+# Claude Code Walkthrough
+This walkthrough shows how to put MartinLoop around a Claude Code-driven coding task so the run has a budget, a verifier gate, an explicit stop reason, and an inspectable run record.
+Back to the repo overview: [README.md](../../README.md)
+## What MartinLoop adds around Claude Code
+Claude Code is the coding engine. MartinLoop is the governance layer around it.
+- **Budget**: hard USD, token, and iteration limits decide how far the run can go.
+- **Verifier**: the run only counts as complete when the post-run verification command passes.
+- **Stop reason**: MartinLoop records why the run stopped, such as `completed`, `budget_exit`, or `human_escalation`.
+- **Run record**: each run appends a JSONL record under `~/.martin/runs/` so you can inspect it later.
+## Prerequisites
+- Node.js 20+
+- `pnpm` 10.x if you are running from this repo
+- Claude Code CLI installed and authenticated
+- A repo you want Claude Code to work in
+## Install MartinLoop
+For the published CLI:
+```bash
+npm install -g martin-loop
+```
+For repo-local development in this monorepo:
+```bash
+pnpm install
+pnpm build
+```
+## Simple local run
+Run MartinLoop with the default Claude adapter and a verifier command:
+```bash
+martin run "fix the auth regression" \
+  --engine claude \
+  --budget 3.00 \
+  --verify "pnpm test"
+```
+What happens:
+- MartinLoop hands the objective to Claude Code
+- Claude Code attempts the work
+- MartinLoop runs the verifier command
+- the loop only finishes as `completed` when the agent result and verifier both pass
+## Budget example
+Use a hard cap and a smaller iteration budget when you want Claude Code to stay tightly bounded:
+```bash
+martin run "tighten the login retry handling" \
+  --engine claude \
+  --budget 2.00 \
+  --soft-limit-usd 1.25 \
+  --max-iterations 2 \
+  --max-tokens 20000 \
+  --verify "pnpm --filter @martin/core test"
+```
+This is the key MartinLoop value-add for Claude Code workflows: the agent can keep trying, but only inside a contract you can review before the spend drifts.
+## Verifier example
+Use a verifier that matches the exact scope of the change:
+```bash
+martin run "update the OSS quickstart wording" \
+  --engine claude \
+  --cwd . \
+  --allow-path README.md \
+  --allow-path docs/oss/** \
+  --deny-path apps/control-plane/** \
+  --accept "Only documentation files may change" \
+  --verify "pnpm --filter @martin/core test"
+```
+The verifier gate matters because Claude Code producing a patch is not the same thing as the repo being in a valid state.
+## Inspect example
+After a run, inspect the persisted JSONL record:
+```bash
+martin inspect --file ~/.martin/runs/<workspaceId>.jsonl
+```
+Look for:
+- the final lifecycle state and stop reason
+- budget and token totals
+- verifier outcome
+- attempt count and failure classification
+## Safe repo-local dry run
+If you want to validate the MartinLoop flow without real model spend, use stub mode first:
+### PowerShell
+```powershell
+$env:MARTIN_LIVE='false'
+$repoRoot = (Get-Location).Path
+pnpm run:cli -- run `
+  --cwd $repoRoot `
+  --objective "Summarize the current runtime state" `
+  --verify "pnpm --filter @martin/core test"
+Remove-Item Env:MARTIN_LIVE
+```
+This does not invoke Claude Code, and it will usually end with a recorded non-success stop reason because no live provider request was attempted. That is still the fastest way to confirm the loop, persistence, and verifier path are wired correctly before you switch to a live Claude run.
+## Common errors and troubleshooting
+### `claude` is not found
+MartinLoop can only use the Claude adapter when the Claude Code CLI is installed and available on `PATH`. Confirm the CLI itself works before you debug MartinLoop.
+### The run stops with `budget_exit`
+The configured budget, iteration limit, or token ceiling was too tight for the requested task. Either narrow the task or raise the budget intentionally.
+### The verifier fails even though Claude Code produced a patch
+That means MartinLoop did its job. The patch was attempted, but the repo did not reach a verified state. Tighten the scope, change the verifier, or ask Claude Code to address the failing checks directly.
+### The run exits with `human_escalation`
+That usually means MartinLoop detected a path that should not proceed unattended, such as an unsafe verifier or a control boundary that needs review.
+### `martin inspect` cannot find the file
+Run another task first, or point `inspect` at the correct JSONL file under `~/.martin/runs/`.
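
The stop reasons described in the added walkthrough (`completed`, `budget_exit`, `human_escalation`) can be read as the output of a small decision function. The sketch below is an assumption-laden illustration, not the runtime's actual logic: `BudgetEnvelope`, `RunTotals`, `decideStop`, and the order of the checks are all hypothetical. Only the three state names and the idea of USD, iteration, and token limits come from the walkthrough.

```typescript
// Lifecycle states named in the walkthrough; the real runtime may define more.
type StopReason = "completed" | "budget_exit" | "human_escalation";

// Hypothetical envelope mirroring the --budget, --max-iterations,
// and --max-tokens flags shown in the walkthrough.
interface BudgetEnvelope {
  maxUsd: number;
  maxIterations: number;
  maxTokens: number;
}

// Hypothetical running totals for one loop.
interface RunTotals {
  spentUsd: number;
  iterations: number;
  tokens: number;
  agentSucceeded: boolean;
  verifierPassed: boolean;
  needsHumanReview: boolean;
}

// Returns a stop reason, or null when another attempt is still allowed.
// Check order (escalation, then success, then exhaustion) is an assumption.
function decideStop(budget: BudgetEnvelope, totals: RunTotals): StopReason | null {
  if (totals.needsHumanReview) return "human_escalation";
  if (totals.agentSucceeded && totals.verifierPassed) return "completed";
  const exhausted =
    totals.spentUsd >= budget.maxUsd ||
    totals.iterations >= budget.maxIterations ||
    totals.tokens >= budget.maxTokens;
  return exhausted ? "budget_exit" : null;
}

// Values taken from the walkthrough's budget example flags.
const envelope: BudgetEnvelope = { maxUsd: 2.0, maxIterations: 2, maxTokens: 20000 };

const afterSecondFailure = decideStop(envelope, {
  spentUsd: 1.4,
  iterations: 2,
  tokens: 12000,
  agentSucceeded: true,
  verifierPassed: false, // a patch exists, but the repo is not in a verified state
  needsHumanReview: false,
});
```

In this sketch a produced patch with a failing verifier never yields `completed`, which matches the walkthrough's point that a patch is not the same as a verified repo state.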

package/docs/oss/EXAMPLES.md CHANGED

@@ -107,7 +107,15 @@ Example `martin_run` payload:
 }
 ```
-## 6. What to inspect in artifacts
+## 6. GitHub Actions budget gate example
+See [`examples/github-actions-budget-gate/`](../../examples/github-actions-budget-gate/) for a CI-safe example that runs MartinLoop with a budget cap, an explicit verifier, and an uploaded JSONL run record artifact.
+## 7. OpenCode-style adapter example
+If you want a runnable, no-credentials-required adapter sketch for another coding runtime, see [`examples/opencode-adapter/`](../../examples/opencode-adapter/). It shows how to keep MartinLoop's budget, verifier, and JSONL record shape stable around an OpenCode-style workflow without claiming a native adapter already exists.
+## 8. What to inspect in artifacts
 For a repo-backed attempt, look at:

package/docs/oss/OSS-BOUNDARY-REPORT.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "generatedAt": "2026-04-22T11:35:54.851Z",
+  "generatedAt": "2026-05-11T21:47:36.834Z",
   "verdict": "go",
   "publicSurface": {
     "packageName": "martin-loop",
@@ -56,15 +56,11 @@
       "classificationReason": "Intended Phase 13 OSS core surface."
     },
     {
-      "name": "@martin/mcp",
+      "name": "@martinloop/mcp",
       "path": "packages/mcp",
       "private": false,
       "publishAccess": "public",
-      "workspaceDependencies": [
-        "@martin/adapters",
-        "@martin/contracts",
-        "@martin/core"
-      ],
+      "workspaceDependencies": [],
       "classification": "oss_core",
       "classificationReason": "Intended Phase 13 OSS core surface."
     }

package/docs/oss/OSS-BOUNDARY-REPORT.md CHANGED

@@ -1,6 +1,6 @@
 # Martin Loop Phase 13 OSS Core Boundary
-Generated: 2026-04-22T11:35:54.851Z
+Generated: 2026-05-11T21:47:36.834Z
 ## Verdict
 **GO**
@@ -30,7 +30,7 @@ Generated: 2026-04-22T11:35:54.851Z
 | @martin/core | packages/core | yes | n/a | @martin/contracts |
 | @martin/adapters | packages/adapters | yes | n/a | @martin/core |
 | @martin/cli | packages/cli | no | public | @martin/adapters, @martin/contracts, @martin/core |
-| @martin/mcp | packages/mcp | no | public | @martin/adapters, @martin/contracts, @martin/core |
+| @martinloop/mcp | packages/mcp | no | public | none |
 ## Non-OSS Workspace Packages
 | Package | Path | Reason |

package/docs/oss/QUICKSTART.md CHANGED

@@ -9,8 +9,9 @@ The frozen public launch target is:
 - `npm install martin-loop`
 - `npx martin-loop ...`
 - `import { MartinLoop } from "martin-loop"`
+- `npx @martinloop/mcp`
-That launch surface is now implemented in the root package facade and smoke-validated from a clean temporary install. This quickstart still documents the honest RC-from-source path because public registry publication is a later release step.
+That runtime launch surface is implemented in the root package facade and smoke-validated from a clean temporary install. The MCP package shape is also smoke-validated from a packed tarball. This quickstart still documents the honest RC-from-source path because public registry publication is a separate release step.
 ## Prerequisites
@@ -114,10 +115,39 @@ For persisted run folders, inspect the `contract.json`, `state.json`, `ledger.js
 ## MCP server
-Build first, then start the server from the workspace:
+The publish-ready MCP install target is:
 ```bash
-pnpm --filter @martin/mcp build
+npx @martinloop/mcp
+```
+Claude Code one-line install:
+```bash
+# macOS/Linux
+claude mcp add --scope user martin-loop -- npx @martinloop/mcp
+# Windows PowerShell/cmd
+claude mcp add --scope user martin-loop cmd /c "npx @martinloop/mcp"
+```
+Official MCP Registry publication has an extra metadata step beyond npm packaging. Do not mark `@martinloop/mcp` registry-ready unless both of these exist and match:
+- `packages/mcp/package.json` with `mcpName`
+- `packages/mcp/server.json` with the official server metadata
+After publishing `@martinloop/mcp` to npm, run the official registry publisher from `packages/mcp`:
+```bash
+mcp-publisher login github
+mcp-publisher publish
+```
+For repo-local verification from source:
+```bash
+pnpm --filter @martinloop/mcp build
+pnpm --filter @martinloop/mcp smoke:pack
 node packages/mcp/dist/server.js
 ```

package/docs/oss/RALPH-LOOP-SAFETY.md ADDED

@@ -0,0 +1,113 @@
+# Ralph-Style Loop Safety Guide
+Ralph-style loops are useful because they keep trying until a coding task reaches a stopping condition. MartinLoop is not a replacement for that pattern. It is the governance layer that makes the pattern safer to run unattended.
+For install and first-run steps, start with the repo quickstart: [README.md#quick-start](../../README.md#quick-start)
+## 1. What Ralph-style loops do well
+Ralph-style loops are good at persistence:
+- they retry after a failed attempt
+- they keep working toward a concrete objective
+- they help teams automate long-running coding tasks that would otherwise need constant supervision
+That persistence is the reason teams use them. The problem is not the existence of the loop. The problem is what happens when the loop keeps running without a clear governance contract.
+## 2. Where unattended loops fail
+An unattended coding loop can fail in ways that are expensive even when no single attempt looks dramatic on its own:
+- spend keeps accumulating across retries
+- verifier failures repeat without a meaningful strategy change
+- file edits drift outside the intended task boundary
+- the final outcome is hard to audit because the reasoning trail is incomplete
+- operators know that the loop stopped, but not whether it stopped for success, safety, or exhaustion
+Those are governance failures, not only model failures.
+## 3. Why max iterations alone are not enough
+A max-iteration limit is helpful, but it only answers one question: "How many times may this loop try?"
+It does not answer:
+- how much budget can be spent before the next attempt is rejected
+- whether the verifier command is safe to run
+- whether the patch stayed inside the approved file scope
+- whether a failed run left rollback evidence behind
+- whether the recorded outcome is trustworthy enough to resume or inspect later
+Iteration caps are one guardrail. They are not a full control layer.
+## 4. What MartinLoop adds
+MartinLoop governs the loop before, during, and after execution:
+- **Budget governance** rejects work that would exceed the configured spend, token, or iteration envelope
+- **Verifier gates** only allow a run to finish as `completed` when the agent result and verification state both pass
+- **Safety leash checks** evaluate verifier commands, file boundaries, and approval-sensitive actions before work is accepted
+- **Stop reasons** make the final lifecycle state explicit, such as `completed`, `budget_exit`, or `human_escalation`
+- **Run records** append JSONL evidence under `~/.martin/runs/` so operators can inspect what happened later
+- **Rollback evidence** preserves the recovery boundary for repo-backed runs when persistence is configured
+That is why MartinLoop should be thought of as a companion governance layer around a Ralph-style loop, not an argument against using one.
+## 5. Example governed run
+```bash
+martin run "fix the auth regression" \
+  --budget 3.00 \
+  --soft-limit-usd 2.00 \
+  --max-iterations 2 \
+  --verify "pnpm test"
+```
+This changes the operator contract in a few important ways:
+- the next attempt can be rejected before overspend happens
+- the run still has to satisfy the verifier
+- the final state is inspectable instead of being inferred from logs alone
+## 6. Example stop reason
+MartinLoop returns an explicit lifecycle state and reason when a run stops:
+```json
+{
+  "decision": {
+    "shouldExit": true,
+    "lifecycleState": "budget_exit",
+    "status": "exited",
+    "reason": "Martin exited because the budget governor hit a hard limit."
+  }
+}
+```
+That answer is more useful than "the loop stopped" because it tells the operator whether the run ended for success, safety, or exhaustion.
+## 7. Example JSONL run record
+Each run appends a JSONL record shaped like:
+```json
+{
+  "loopId": "loop_example123",
+  "workspaceId": "ws_demo",
+  "projectId": "proj_demo",
+  "status": "exited",
+  "lifecycleState": "budget_exit",
+  "budget": {
+    "maxUsd": 3,
+    "softLimitUsd": 2,
+    "maxIterations": 2,
+    "maxTokens": 20000
+  },
+  "metadata": {
+    "policyProfile": "balanced",
+    "telemetryDestination": "local-only"
+  }
+}
+```
+The full record can also include attempts, events, verifier outcomes, and persisted artifact references. That is the evidence trail MartinLoop adds around a retrying coding loop.
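
The run-record example in the added safety guide suggests a simple way to summarize a run file. The sketch below assumes one JSON object per line (the usual JSONL convention) and types only the fields shown in the guide's example. `RunRecord`, `parseRunRecords`, and `countByLifecycleState` are hypothetical helpers, not exports of the package, and the second sample record is invented for illustration.

```typescript
// Subset of the fields shown in the guide's example record.
// The full record may carry more (attempts, events, verifier outcomes).
interface RunRecord {
  loopId: string;
  workspaceId: string;
  status: string;
  lifecycleState: string;
  budget: {
    maxUsd: number;
    softLimitUsd: number;
    maxIterations: number;
    maxTokens: number;
  };
}

// Parse JSONL text: one JSON object per non-empty line.
function parseRunRecords(jsonl: string): RunRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as RunRecord);
}

// Count how many records ended in each lifecycle state.
function countByLifecycleState(records: RunRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const record of records) {
    counts[record.lifecycleState] = (counts[record.lifecycleState] ?? 0) + 1;
  }
  return counts;
}

const budget = { maxUsd: 3, softLimitUsd: 2, maxIterations: 2, maxTokens: 20000 };

// First record mirrors the guide's example; the second is an invented sample.
const sampleJsonl = [
  JSON.stringify({ loopId: "loop_example123", workspaceId: "ws_demo", status: "exited", lifecycleState: "budget_exit", budget }),
  JSON.stringify({ loopId: "loop_example456", workspaceId: "ws_demo", status: "completed", lifecycleState: "completed", budget }),
].join("\n");

const summary = countByLifecycleState(parseRunRecords(sampleJsonl));
```

For a real file, the text would come from the JSONL path under `~/.martin/runs/`; the sketch takes a string so that it stays independent of any file-system API.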

package/docs/oss/README.md CHANGED

@@ -8,11 +8,11 @@ Martin Loop is a governed AI coding-loop runtime. The core runtime is real and v
 - `@martin/core`: the runtime controller, persistence layer, grounding scanner, leash engine, patch-truth scoring, and rollback restoration logic
 - `@martin/adapters`: normalized Claude CLI, Codex CLI, and direct-provider or stub adapter surfaces
 - `@martin/cli`: the local operator CLI for `run`, `inspect`, and `resume`
-- `@martin/mcp`: the MCP server surface for `martin_run`, `martin_inspect`, and `martin_status`
+- `@martinloop/mcp`: the MCP server surface for `martin_run`, `martin_inspect`, and `martin_status`
 ## What is still outside the initial OSS promise
-- The root workspace now exposes the `martin-loop` public package facade, but registry publication is still a later release step.
+- The root workspace now exposes the `martin-loop` public package facade, and `@martinloop/mcp` now has a standalone tarball shape validated via `pnpm --filter @martinloop/mcp smoke:pack`, but registry publication is still a separate release step.
 - `@martin/contracts`, `@martin/core`, and `@martin/adapters` are still marked `private` in their package manifests.
 - The hosted control-plane and local dashboard remain in the repo, but they are not yet the finalized public OSS boundary.
 - The benchmark harness remains a workspace-only RC surface under `benchmarks/` and is not part of the publishable CLI boundary yet.
@@ -55,8 +55,9 @@ The current engineering memo freezes these public-launch targets for release pla
 - install target: `npm install martin-loop`
 - CLI target: `npx martin-loop ...`
 - SDK target: `import { MartinLoop } from "martin-loop"`
+- MCP target (publish-ready): `npx @martinloop/mcp`
-Those targets are now implemented in the root package facade and verified through a clean-install smoke test. During the current RC phase, the honest operator path still includes the repo-local workflow documented below and in the quickstart, because public registry publication and broader release packaging remain later steps.
+Those runtime targets are implemented in the root package facade and verified through a clean-install smoke test. The MCP target is packaged and verified through a tarball launch smoke test. During the current RC phase, the honest operator path still includes the repo-local workflow documented below and in the quickstart, because public registry publication and broader release packaging remain separate release steps.
 ## Reproducibility
@@ -87,6 +88,8 @@ The current release-candidate gate is:
 - [`docs/oss/QUICKSTART.md`](./QUICKSTART.md) for clone-to-first-run setup
 - [`docs/oss/EXAMPLES.md`](./EXAMPLES.md) for grounded CLI and MCP examples
+- [`docs/oss/CLAUDE-CODE-WALKTHROUGH.md`](./CLAUDE-CODE-WALKTHROUGH.md) for a Claude Code-specific governed-run walkthrough
+- [`docs/oss/RALPH-LOOP-SAFETY.md`](./RALPH-LOOP-SAFETY.md) for a technical guide to governing Ralph-style loops safely
 - [`docs/oss/OSS-BOUNDARY-REPORT.md`](./OSS-BOUNDARY-REPORT.md) for the current machine-checked OSS boundary and public-surface status
 - [`docs/oss/RELEASE-SURFACE-REPORT.md`](./RELEASE-SURFACE-REPORT.md) for the current machine-checked release-surface audit
 - [`docs/pilot/README.md`](../pilot/README.md) for the pilot-prep package that remains explicitly gated behind Phase 13 completion

package/docs/oss/RELEASE-SURFACE-REPORT.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "generatedAt": "2026-04-23T14:55:08.167Z",
+  "generatedAt": "2026-05-11T21:47:37.407Z",
   "publicSurface": {
     "packageName": "martin-loop",
     "installCommand": "npm install martin-loop",

package/docs/oss/RELEASE-SURFACE-REPORT.md CHANGED

@@ -1,6 +1,6 @@
 # Martin Loop Phase 13 Release Surface Audit
-Generated: 2026-04-23T14:55:08.167Z
+Generated: 2026-05-11T21:47:37.407Z
 ## Verdict
 **GO**

package/package.json CHANGED

@@ -1,10 +1,14 @@
 {
   "name": "martin-loop",
   "private": false,
-  "version": "0.1.3",
+  "version": "0.1.5",
   "type": "module",
   "description": "Martin Loop dual-track monorepo for the OSS runtime and hosted SaaS control plane.",
   "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/Keesan12/martin-loop.git"
+  },
   "packageManager": "pnpm@10.17.1",
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
@@ -21,8 +25,10 @@
   },
   "files": [
     "dist",
+    "demo",
     "README.md",
-    "docs/oss"
+    "docs/oss",
+    "docs/distribution"
   ],
   "publishConfig": {
     "access": "public"