@openthink/stamp 1.3.0 → 1.4.0

package/README.md CHANGED
@@ -292,6 +292,53 @@ models pinned, each operator records their own verdict in their own
  state.db (same as today's reviewer-prompt model). Stamp does not assume
  verdicts are model-portable.
 
+ ### Reviewer execution budgets
+
+ Each reviewer subprocess runs under bounds that can be set in three
+ places, narrowest-wins: per-reviewer fields in `.stamp/config.yml`
+ (committed policy, hashed into the attestation), operator env vars on
+ the calling shell (per-shell, not committed), or the built-in default.
+
+ | Knob | Env var (default) | `.stamp/config.yml` field | What it caps |
+ |---|---|---|---|
+ | Turn cap | `STAMP_REVIEWER_MAX_TURNS` (`8`) | `reviewers.<name>.max_turns` | Model/tool round-trips. Hitting it surfaces as `reviewer "<name>" run failed (subtype=error_max_turns) — turn trace at <path>; raise STAMP_REVIEWER_MAX_TURNS or set reviewers.<name>.max_turns to extend it`. |
+ | Wall-clock | `STAMP_REVIEWER_TIMEOUT_MS` (`300000`) | `reviewers.<name>.timeout_ms` | Time per reviewer. Hitting it aborts the SDK call and writes a turn trace. |
+ | Diff size | `STAMP_REVIEW_DIFF_CAP_BYTES` (`204800`) | — (operator-side only) | Per-reviewer diff size; bypass per-invocation with `--allow-large`. Lives here because diff size is operator-bounded input rather than per-reviewer execution policy. |
+
+ The defaults are tight enough that a pathological reviewer gives up in
+ single-digit minutes rather than racking up Anthropic spend silently.
+ Reach for the committed `.stamp/config.yml` form when one reviewer
+ legitimately needs headroom (e.g. a `product` reviewer that does Linear
+ ticket reconciliation) but raising the global env would over-budget the
+ others; reach for the env vars for ad-hoc operator overrides.
+
+ ```yaml
+ # .stamp/config.yml — example: heavy product reviewer
+ reviewers:
+   security: { prompt: .stamp/reviewers/security.md }
+   standards: { prompt: .stamp/reviewers/standards.md }
+   product:
+     prompt: .stamp/reviewers/product.md
+     max_turns: 20
+     timeout_ms: 600000
+ ```
+
+ ```sh
+ # Operator-side global override for a one-off ad-hoc run
+ STAMP_REVIEWER_MAX_TURNS=20 STAMP_REVIEWER_TIMEOUT_MS=600000 \
+   stamp review --diff main..HEAD
+ ```
+
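The narrowest-wins lookup described in this hunk can be sketched as follows. This is a hypothetical Python illustration, not stamp's actual implementation: the function name `resolve_budget` and the dict shapes are assumptions, and it assumes "narrowest-wins" means a per-reviewer config field beats the env var, which in turn beats the built-in default. Only the env var names and default values come from the table in the README.

```python
import os

# Built-in defaults and env var names as listed in the README table.
DEFAULTS = {"max_turns": 8, "timeout_ms": 300_000}
ENV_VARS = {
    "max_turns": "STAMP_REVIEWER_MAX_TURNS",
    "timeout_ms": "STAMP_REVIEWER_TIMEOUT_MS",
}


def resolve_budget(knob, reviewer_cfg, env=None):
    """Return the effective value for one knob (hypothetical helper).

    reviewer_cfg stands in for the parsed `reviewers.<name>` mapping from
    .stamp/config.yml; env stands in for the calling shell's environment.
    """
    env = os.environ if env is None else env
    # 1. Narrowest scope: a field committed for this specific reviewer.
    if knob in reviewer_cfg:
        return int(reviewer_cfg[knob])
    # 2. Operator env var on the calling shell.
    raw = env.get(ENV_VARS[knob])
    if raw:
        return int(raw)
    # 3. Built-in default.
    return DEFAULTS[knob]
```

Under that assumed ordering, the `product` reviewer from the YAML example would keep `max_turns: 20` even if the shell exported a different `STAMP_REVIEWER_MAX_TURNS`, while `security` and `standards` would pick up the env value.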
+ When a reviewer trips the cap, a structured turn trace is written to
+ `<repoRoot>/.git/stamp/failed-runs/<unix-ms>-<reviewer>.log` (mode
+ `0600`, parent `0700`, JSON; lists the tool-call sequence and input
+ hashes that the reviewer made before failure — never raw model prose
+ or unhashed inputs). Use it to distinguish a looping prompt from a
+ legitimately under-budgeted reviewer. `stamp prune --older-than <dur>`
+ walks both `failed-runs/` and `failed-parses/`. See
+ [`docs/troubleshooting.md`](./docs/troubleshooting.md) for the full
+ runbook.
+
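One way to use a turn trace for the looping-versus-under-budgeted question is to look for identical tool calls repeated many times. The sketch below is a hypothetical Python illustration: the README only says the trace is JSON listing the tool-call sequence and input hashes, so the field names `tool` and `input_hash`, the list-of-objects shape, and the threshold of 3 are all assumptions to adapt to the real file.

```python
from collections import Counter


def repeated_calls(turns, threshold=3):
    """Return (tool, input_hash) pairs seen at least `threshold` times.

    `turns` is assumed to be the already-parsed trace: a list of dicts with
    hypothetical keys "tool" and "input_hash". Many repeats of the same pair
    suggest a looping prompt; a trace of mostly distinct calls suggests the
    reviewer simply ran out of budget.
    """
    counts = Counter((t["tool"], t["input_hash"]) for t in turns)
    return {call: n for call, n in counts.items() if n >= threshold}
```

With a real trace, the parsed content would come from `json.load` on the `.log` file, possibly after picking out whichever key holds the call sequence.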
  ## Deployment shapes
 
  Three ways to run stamp-cli in a real setting, trading setup cost for