@bookedsolid/rea 0.13.3 → 0.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/MIGRATING.md +46 -0
package/agents/codex-adversarial.md +1 -1
package/commands/codex-review.md +6 -11
package/commands/freeze.md +1 -0
package/commands/review.md +1 -0
package/dist/cli/doctor.js +12 -6
package/dist/cli/install/settings-merge.js +3 -2
package/dist/hooks/push-gate/codex-runner.d.ts +14 -0
package/dist/hooks/push-gate/codex-runner.js +37 -1
package/dist/hooks/push-gate/index.js +8 -0
package/dist/hooks/push-gate/policy.d.ts +34 -0
package/dist/hooks/push-gate/policy.js +25 -0
package/dist/policy/loader.d.ts +38 -0
package/dist/policy/loader.js +30 -0
package/dist/policy/types.d.ts +28 -0
package/hooks/_lib/cmd-segments.sh +155 -0
package/hooks/_lib/protected-paths.sh +39 -0
package/hooks/attribution-advisory.sh +16 -7
package/hooks/blocked-paths-enforcer.sh +47 -1
package/hooks/changeset-security-gate.sh +45 -7
package/hooks/dangerous-bash-interceptor.sh +70 -46
package/hooks/dependency-audit-gate.sh +54 -24
package/hooks/env-file-protection.sh +14 -4
package/hooks/protected-paths-bash-gate.sh +213 -0
package/hooks/secret-scanner.sh +30 -0
package/hooks/settings-protection.sh +49 -8
package/package.json +4 -2

package/MIGRATING.md CHANGED

@@ -261,6 +261,50 @@ After migration, run `pnpm rea doctor`. The relevant lines:
 - `[info] extension-hook fragments detected: N pre-push.d, M
   commit-msg.d` — your fragment chain is active
+## Codex model knobs (added in 0.14.0)
+The push-gate now pins the flagship codex model and `high` reasoning
+effort by default. Pre-0.14.0 it used codex's built-in default, which
+is the special-purpose `codex-auto-review` model at `medium`
+reasoning — a meaningfully weaker reviewer than the flagship.
+Same-code-different-verdict thrashing on long-running branches was
+substantially driven by the lower-reasoning default.
+**Defaults (0.14.0+):**
+```yaml
+review:
+  codex_model: gpt-5.4              # was codex-auto-review (codex's own default)
+  codex_reasoning_effort: high      # was medium (codex's own default)
+```
+You don't need to set these — `gpt-5.4` + `high` are baked in at the
+package level. The policy keys exist for cost-bounded environments
+that want to opt into a weaker model:
+```yaml
+review:
+  codex_model: codex-auto-review    # opts back into the prior default
+  codex_reasoning_effort: medium
+```
+The model name is passed through to codex's TOML config layer
+(`-c model="…"`); codex itself validates it. An unknown model name
+surfaces as a clear runtime error at first push, not a silent
+fallback. Codex's current catalog (as of 2026-05-03):
+- `gpt-5.4` — flagship, reasoning-capable (recommended for review)
+- `gpt-5.4-mini` — smaller, faster, cheaper, less reasoning depth
+- `gpt-5.3-codex` — prior generation, code-specialized
+- `gpt-5.3-codex-spark` — even faster prior gen
+- `gpt-5.2` — older, generally avoid for security-relevant review
+- `codex-auto-review` — special-purpose, lower reasoning ceiling
+Reasoning effort is `low | medium | high`. `high` spends more compute
+per finding and produces more consistent verdicts — fewer
+same-code-different-verdict round-trips. Trade-off is push-gate
+latency.
 ## Policy knobs worth setting
 For consumers with a long-running migration branch (>30 commits since
@@ -274,6 +318,8 @@ review:
   timeout_ms: 1800000              # 30 min — explicit pin
   auto_narrow_threshold: 30        # 0 to disable auto-narrow
   last_n_commits: 10               # explicit scope window
+  codex_model: gpt-5.4             # 0.14.0+ default; iron-gate
+  codex_reasoning_effort: high     # 0.14.0+ default; iron-gate
 ```
 ## Bypass when you genuinely need to

package/agents/codex-adversarial.md CHANGED

@@ -36,7 +36,7 @@ You may read additional files in the repo if needed for context, but do so read-
 5. **Parse the Codex output** — extract structured findings.
 6. **Classify findings** by category: security, correctness, edge cases, test gaps, API design, performance.
 7. **Assign verdict**: `pass` (no material findings), `concerns` (findings worth addressing but not blocking), `blocking` (findings that must be fixed before merge).
-8. **Emit an audit entry** (optional in 0.11.0+) — the pre-push gate does not consult audit records to decide pass/fail, so you are no longer REQUIRED to emit a `codex.review` record on every interactive review. However, append one anyway via the public `@bookedsolid/rea/audit` helper when it helps forensic traceability (investigation of an intermittent verdict, review-history audit, etc.):
+8. **Emit an audit entry — REQUIRED** for every `/codex-review` invocation. The pre-push gate does not consult audit records to decide pass/fail (post-0.11.0 the gate is stateless), but the `/codex-review` slash command's Step 3 verifies an audit entry was appended for this run and surfaces "review never happened" to the user when one is missing. The two specs are a contract pair — audit emission is what tells the operator their interactive review actually completed. Append via the public `@bookedsolid/rea/audit` helper:
    ```ts
    import { appendAuditRecord, CODEX_REVIEW_TOOL_NAME, CODEX_REVIEW_SERVER_NAME, Tier, InvocationStatus } from '@bookedsolid/rea/audit';

package/commands/codex-review.md CHANGED

@@ -7,6 +7,7 @@ allowed-tools:
   - Bash(git branch:*)
   - Bash(git rev-parse:*)
   - Read
+  - Agent
 ---
 # /codex-review — Adversarial Review via Codex
@@ -54,23 +55,17 @@ Invoke the `codex-adversarial` agent with:
 The agent wraps `/codex:adversarial-review` and returns structured findings.
-## Step 3 — Record to audit log
+## Step 3 — (Optional) verify audit entry
-Every Codex invocation produces an audit entry. The `codex-adversarial` agent writes it via the middleware chain automatically, but verify the entry was recorded:
+Audit emission is **optional** in 0.11.0+. The pre-push gate is stateless and does not consult audit records to decide pass/fail; the agent's structured findings ARE the review. The agent will append an audit entry when it helps forensic traceability (intermittent verdicts, review-history audits) but its absence is not a failure.
+If you want to confirm an entry was written for this run:
 ```bash
 tail -n 1 .rea/audit.jsonl
 ```
-The entry must include:
-- `tool: "codex-adversarial-review"`
-- `head_sha: <SHA>`
-- `target: <ref>`
-- `finding_count: <N>`
-- `verdict: pass | concerns | blocking`
-If the audit entry is missing, report it clearly — do not proceed as if the review happened.
+A `codex-adversarial-review` entry with `head_sha`, `target`, `finding_count`, and `verdict` fields is informative — but DO NOT treat its absence as a failure. The review happened if the agent returned text. (Pre-0.15.0 this step was a hard verification gate that contradicted the agent's "audit optional" contract — see Helix Finding 3, 2026-05-03.)
 ## Step 4 — Report

package/commands/freeze.md CHANGED

@@ -3,6 +3,7 @@ description: Activate the REA kill switch — writes .rea/HALT with a reason, bl
 argument-hint: "<reason>"
 allowed-tools:
   - Bash(npx rea freeze:*)
+  - Read
 ---
 # /freeze — Activate the Kill Switch

package/commands/review.md CHANGED

@@ -7,6 +7,7 @@ allowed-tools:
   - Bash(git branch:*)
   - Bash(git status:*)
   - Read
+  - Agent
 ---
 # /review — Code Review on Current Changes

package/dist/cli/doctor.js CHANGED

@@ -203,7 +203,7 @@ function checkSettingsJson(baseDir) {
     const settingsPath = path.join(baseDir, '.claude', 'settings.json');
     if (!fs.existsSync(settingsPath)) {
         return {
-            label: 'settings.json matchers cover Bash + Write|Edit',
+            label: 'settings.json matchers cover Bash + Write|Edit|MultiEdit',
             status: 'fail',
             detail: `missing: ${settingsPath}`,
         };
@@ -216,23 +216,29 @@ function checkSettingsJson(baseDir) {
         const needs = [];
         if (!matchers.has('Bash'))
             needs.push('Bash');
-        if (!matchers.has('Write|Edit'))
-            needs.push('Write|Edit');
+        // 0.15.0: matcher was widened from `Write|Edit` to `Write|Edit|MultiEdit`
+        // in 0.14.0; doctor's check missed the rename. Accept either form so
+        // pre-0.14.0 installs that haven't run `rea upgrade` still report
+        // accurately, but the canonical produced by `defaultDesiredHooks()` is
+        // the wider matcher.
+        if (!matchers.has('Write|Edit|MultiEdit') && !matchers.has('Write|Edit')) {
+            needs.push('Write|Edit|MultiEdit');
+        }
         if (needs.length === 0) {
             return {
-                label: 'settings.json matchers cover Bash + Write|Edit',
+                label: 'settings.json matchers cover Bash + Write|Edit|MultiEdit',
                 status: 'pass',
             };
         }
         return {
-            label: 'settings.json matchers cover Bash + Write|Edit',
+            label: 'settings.json matchers cover Bash + Write|Edit|MultiEdit',
             status: 'fail',
             detail: `missing PreToolUse matchers: ${needs.join(', ')}`,
         };
     }
     catch (e) {
         return {
-            label: 'settings.json matchers cover Bash + Write|Edit',
+            label: 'settings.json matchers cover Bash + Write|Edit|MultiEdit',
             status: 'fail',
             detail: e instanceof Error ? e.message : String(e),
         };
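
The widened matcher check in this hunk can be exercised on its own. What follows is a minimal sketch, assuming a hypothetical helper name `missingMatchers` (not part of the package) that takes the set of PreToolUse matcher strings already read from `settings.json`:

```javascript
// Hypothetical helper mirroring the check in checkSettingsJson().
// `matchers` is assumed to be a Set of PreToolUse matcher strings.
function missingMatchers(matchers) {
  const needs = [];
  if (!matchers.has('Bash')) {
    needs.push('Bash');
  }
  // Accept the legacy `Write|Edit` form as well as the wider canonical one,
  // so installs that have not been upgraded are not reported as failing.
  if (!matchers.has('Write|Edit|MultiEdit') && !matchers.has('Write|Edit')) {
    needs.push('Write|Edit|MultiEdit');
  }
  return needs;
}

// A legacy install passes; an install with no write matcher is flagged.
console.log(missingMatchers(new Set(['Bash', 'Write|Edit'])));
console.log(missingMatchers(new Set(['Bash'])));
```

Note that only the wider form is ever reported as missing, which matches the canonical matcher produced by `defaultDesiredHooks()`.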

package/dist/cli/install/settings-merge.js CHANGED

@@ -259,6 +259,7 @@ export function defaultDesiredHooks() {
             hooks: [
                 { type: 'command', command: `${base}/dangerous-bash-interceptor.sh`, timeout: 10000, statusMessage: 'Checking command safety...' },
                 { type: 'command', command: `${base}/env-file-protection.sh`, timeout: 5000, statusMessage: 'Checking for .env file reads...' },
+                { type: 'command', command: `${base}/protected-paths-bash-gate.sh`, timeout: 5000, statusMessage: 'Checking for shell-redirect to protected paths...' },
                 { type: 'command', command: `${base}/dependency-audit-gate.sh`, timeout: 15000, statusMessage: 'Verifying package exists...' },
                 { type: 'command', command: `${base}/security-disclosure-gate.sh`, timeout: 5000, statusMessage: 'Checking disclosure policy...' },
                 { type: 'command', command: `${base}/pr-issue-link-gate.sh`, timeout: 5000, statusMessage: 'Checking PR for issue reference...' },
@@ -267,7 +268,7 @@ export function defaultDesiredHooks() {
         },
         {
             event: 'PreToolUse',
-            matcher: 'Write|Edit',
+            matcher: 'Write|Edit|MultiEdit',
             hooks: [
                 { type: 'command', command: `${base}/secret-scanner.sh`, timeout: 15000, statusMessage: 'Scanning for credentials...' },
                 { type: 'command', command: `${base}/settings-protection.sh`, timeout: 5000, statusMessage: 'Checking settings protection...' },
@@ -277,7 +278,7 @@ export function defaultDesiredHooks() {
         },
         {
             event: 'PostToolUse',
-            matcher: 'Write|Edit',
+            matcher: 'Write|Edit|MultiEdit',
             hooks: [
                 { type: 'command', command: `${base}/architecture-review-gate.sh`, timeout: 10000, statusMessage: 'Checking architecture impact...' },
             ],

package/dist/hooks/push-gate/codex-runner.d.ts CHANGED

@@ -72,6 +72,20 @@ export interface CodexRunOptions {
     timeoutMs: number;
     /** Optional custom review prompt; defaults to Codex's built-in. */
     prompt?: string;
+    /**
+     * Codex CLI model override (0.13.4+). When set, the runner passes
+     * `-c model="<value>"` to `codex exec review`. Codex itself validates
+     * the name. `undefined` falls back to codex's own default
+     * (`codex-auto-review` today, NOT the `gpt-5.4` flagship).
+     */
+    model?: string;
+    /**
+     * Codex reasoning effort (0.13.4+). When set, the runner passes
+     * `-c model_reasoning_effort="<value>"`. Only meaningful when paired
+     * with a reasoning-capable model (gpt-5.4, gpt-5.3-codex). Codex's
+     * own default is `medium`.
+     */
+    reasoningEffort?: 'low' | 'medium' | 'high';
     /**
      * Env passthrough. Tests inject a clean env to prevent ambient overrides.
      * Production passes `process.env`.

package/dist/hooks/push-gate/codex-runner.js CHANGED

@@ -110,6 +110,22 @@ export function createRealGitExecutor(cwd) {
         },
     };
 }
+// ---------------------------------------------------------------------------
+// Codex invocation
+// ---------------------------------------------------------------------------
+/**
+ * Escape a string for safe inclusion inside a TOML basic-string literal.
+ * Codex's `-c key=value` parser runs the value through TOML, so we have to
+ * close over the same escape contract — namely backslash and double-quote
+ * (TOML basic strings forbid raw `"` and `\` in the body). The model names
+ * and reasoning levels we expect (`gpt-5.4`, `high`, etc.) never contain
+ * either character; this guard exists so a future model-name typo with a
+ * shell metacharacter cannot smuggle a TOML escape that codex misparses
+ * into something dangerous.
+ */
+function escapeTomlString(value) {
+    return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
+}
 /**
  * Execute `codex exec review` and return the concatenated review text on
  * success. Callers then pass the text to `summarizeReview()` to get a
@@ -120,7 +136,27 @@ export function createRealGitExecutor(cwd) {
  */
 export async function runCodexReview(options) {
     const spawner = options.spawnImpl ?? spawn;
-    const baseArgs = ['exec', 'review', '--base', options.baseRef, '--json', '--ephemeral'];
+    // Model + reasoning overrides go BEFORE the `exec` subcommand because
+    // `-c key=value` is a top-level codex CLI flag, not an `exec` flag.
+    // Codex's TOML parser interprets the value, so we wrap strings in TOML
+    // quotes — `-c model="gpt-5.4"` not `-c model=gpt-5.4` — to ensure the
+    // value lands as a string regardless of upstream parsing changes.
+    const overrideArgs = [];
+    if (options.model !== undefined && options.model.length > 0) {
+        overrideArgs.push('-c', `model="${escapeTomlString(options.model)}"`);
+    }
+    if (options.reasoningEffort !== undefined) {
+        overrideArgs.push('-c', `model_reasoning_effort="${escapeTomlString(options.reasoningEffort)}"`);
+    }
+    const baseArgs = [
+        ...overrideArgs,
+        'exec',
+        'review',
+        '--base',
+        options.baseRef,
+        '--json',
+        '--ephemeral',
+    ];
     const args = options.prompt !== undefined && options.prompt.length > 0 ? [...baseArgs, options.prompt] : baseArgs;
     let child;
     try {
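
The argument assembly in this hunk can be tried without spawning codex. This is a minimal sketch, assuming a hypothetical helper name `buildCodexArgs` (not part of the package); `escapeTomlString` is copied from the hunk above:

```javascript
// Copied from the hunk: escape backslashes first, then double quotes, so the
// value is a valid TOML basic-string body.
function escapeTomlString(value) {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// Hypothetical helper: returns the argv that runCodexReview() would hand to
// the spawner, without launching anything.
function buildCodexArgs({ baseRef, model, reasoningEffort, prompt }) {
  const overrideArgs = [];
  if (model !== undefined && model.length > 0) {
    overrideArgs.push('-c', `model="${escapeTomlString(model)}"`);
  }
  if (reasoningEffort !== undefined) {
    overrideArgs.push('-c', `model_reasoning_effort="${escapeTomlString(reasoningEffort)}"`);
  }
  // Overrides come before `exec` because `-c` is a top-level flag.
  const baseArgs = [...overrideArgs, 'exec', 'review', '--base', baseRef, '--json', '--ephemeral'];
  return prompt !== undefined && prompt.length > 0 ? [...baseArgs, prompt] : baseArgs;
}

console.log(buildCodexArgs({ baseRef: 'origin/main', model: 'gpt-5.4', reasoningEffort: 'high' }));
console.log(buildCodexArgs({ baseRef: 'origin/main' }));
```

With neither option set, the argv is identical to the pre-change `baseArgs`, so callers that omit both keys see no behavioral difference at this layer.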

package/dist/hooks/push-gate/index.js CHANGED

@@ -342,6 +342,14 @@ export async function runPushGate(deps) {
             cwd: deps.baseDir,
             timeoutMs: policy.timeout_ms,
             env,
+            // 0.14.0+: pass the resolved policy's model + reasoning overrides so
+            // codex spawns with `-c model="<name>" -c model_reasoning_effort="<level>"`.
+            // Defaults (gpt-5.4 + high) are baked into resolvePushGatePolicy so
+            // policies that omit these keys still get the iron-gate defaults.
+            ...(policy.codex_model !== undefined ? { model: policy.codex_model } : {}),
+            ...(policy.codex_reasoning_effort !== undefined
+                ? { reasoningEffort: policy.codex_reasoning_effort }
+                : {}),
         });
         const summary = summarizeReview(codexResult.reviewText);
         const blocked = summary.verdict === 'blocking'

package/dist/hooks/push-gate/policy.d.ts CHANGED

@@ -43,6 +43,19 @@ export interface ResolvedReviewPolicy {
      * emits a stderr warning. Defaults to 30 when unset; 0 disables.
      */
     auto_narrow_threshold: number;
+    /**
+     * Codex CLI model override (0.13.4+). When set, the runner passes
+     * `-c model="<value>"` to every `codex exec review`. `undefined` falls
+     * back to codex's own default (currently `codex-auto-review`, NOT the
+     * flagship `gpt-5.4`).
+     */
+    codex_model: string | undefined;
+    /**
+     * Codex reasoning effort (0.13.4+). When set, the runner passes
+     * `-c model_reasoning_effort="<value>"`. `undefined` falls back to
+     * codex's own default (currently `medium`).
+     */
+    codex_reasoning_effort: 'low' | 'medium' | 'high' | undefined;
     /** `true` when `.rea/policy.yaml` was absent; defaults apply. */
     policyMissing: boolean;
 }
@@ -63,6 +76,27 @@ export declare const PUSH_GATE_DEFAULT_AUTO_NARROW_THRESHOLD = 30;
  * recent work.
  */
 export declare const PUSH_GATE_DEFAULT_LAST_N_COMMITS_FALLBACK = 10;
+/**
+ * Default codex model for the push-gate (0.14.0+). Pinned to the flagship
+ * (`gpt-5.4`) instead of falling through to codex's own default of
+ * `codex-auto-review` (a lower-reasoning special-purpose model). Verdict
+ * stability matters more than per-push compute cost for adversarial
+ * review of consumer codebases — the helixir 2026-04-26 thrashing came
+ * from the lower-reasoning default.
+ *
+ * Override via `policy.review.codex_model: <name>` in `.rea/policy.yaml`
+ * for cost-bounded environments. `codex-auto-review` is the explicit
+ * opt-in to the prior 0.13.x behavior.
+ */
+export declare const PUSH_GATE_DEFAULT_CODEX_MODEL = "gpt-5.4";
+/**
+ * Default codex reasoning effort (0.14.0+). Pinned to `high` for maximum
+ * compute per finding — fewer same-code-different-verdict round-trips.
+ * Trades latency for stability. Override via
+ * `policy.review.codex_reasoning_effort: medium | low` in
+ * `.rea/policy.yaml` for cost-bounded environments.
+ */
+export declare const PUSH_GATE_DEFAULT_CODEX_REASONING_EFFORT: 'low' | 'medium' | 'high';
 /**
  * Resolve the push-gate policy for `baseDir`. Never throws — a malformed
  * policy file surfaces as a typed error via the underlying zod validator,

package/dist/hooks/push-gate/policy.js CHANGED

@@ -45,6 +45,27 @@ export const PUSH_GATE_DEFAULT_AUTO_NARROW_THRESHOLD = 30;
  * recent work.
  */
 export const PUSH_GATE_DEFAULT_LAST_N_COMMITS_FALLBACK = 10;
+/**
+ * Default codex model for the push-gate (0.14.0+). Pinned to the flagship
+ * (`gpt-5.4`) instead of falling through to codex's own default of
+ * `codex-auto-review` (a lower-reasoning special-purpose model). Verdict
+ * stability matters more than per-push compute cost for adversarial
+ * review of consumer codebases — the helixir 2026-04-26 thrashing came
+ * from the lower-reasoning default.
+ *
+ * Override via `policy.review.codex_model: <name>` in `.rea/policy.yaml`
+ * for cost-bounded environments. `codex-auto-review` is the explicit
+ * opt-in to the prior 0.13.x behavior.
+ */
+export const PUSH_GATE_DEFAULT_CODEX_MODEL = 'gpt-5.4';
+/**
+ * Default codex reasoning effort (0.14.0+). Pinned to `high` for maximum
+ * compute per finding — fewer same-code-different-verdict round-trips.
+ * Trades latency for stability. Override via
+ * `policy.review.codex_reasoning_effort: medium | low` in
+ * `.rea/policy.yaml` for cost-bounded environments.
+ */
+export const PUSH_GATE_DEFAULT_CODEX_REASONING_EFFORT = 'high';
 /**
  * Resolve the push-gate policy for `baseDir`. Never throws — a malformed
  * policy file surfaces as a typed error via the underlying zod validator,
@@ -64,6 +85,8 @@ export async function resolvePushGatePolicy(baseDir) {
             timeout_ms: PUSH_GATE_DEFAULT_TIMEOUT_MS,
             last_n_commits: undefined,
             auto_narrow_threshold: PUSH_GATE_DEFAULT_AUTO_NARROW_THRESHOLD,
+            codex_model: PUSH_GATE_DEFAULT_CODEX_MODEL,
+            codex_reasoning_effort: PUSH_GATE_DEFAULT_CODEX_REASONING_EFFORT,
             policyMissing: true,
         };
     }
@@ -75,6 +98,8 @@ export async function resolvePushGatePolicy(baseDir) {
         timeout_ms: review.timeout_ms ?? PUSH_GATE_DEFAULT_TIMEOUT_MS,
         last_n_commits: review.last_n_commits,
         auto_narrow_threshold: review.auto_narrow_threshold ?? PUSH_GATE_DEFAULT_AUTO_NARROW_THRESHOLD,
+        codex_model: review.codex_model ?? PUSH_GATE_DEFAULT_CODEX_MODEL,
+        codex_reasoning_effort: review.codex_reasoning_effort ?? PUSH_GATE_DEFAULT_CODEX_REASONING_EFFORT,
         policyMissing: false,
     };
 }
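
The default resolution added in this hunk comes down to two `??` fallbacks. Here is a minimal sketch, assuming a hypothetical function name `resolveCodexKnobs` (not part of the package) that stands in for the codex-related part of `resolvePushGatePolicy`:

```javascript
const PUSH_GATE_DEFAULT_CODEX_MODEL = 'gpt-5.4';
const PUSH_GATE_DEFAULT_CODEX_REASONING_EFFORT = 'high';

// Hypothetical stand-in: `review` is the parsed `review:` block, or
// undefined when the policy file (or the block) is absent.
function resolveCodexKnobs(review) {
  return {
    // `??` falls back only on null/undefined, so any explicit value wins.
    codex_model: review?.codex_model ?? PUSH_GATE_DEFAULT_CODEX_MODEL,
    codex_reasoning_effort:
      review?.codex_reasoning_effort ?? PUSH_GATE_DEFAULT_CODEX_REASONING_EFFORT,
  };
}

// No policy: defaults apply.
console.log(resolveCodexKnobs(undefined));
// Explicit opt-in to the weaker model from MIGRATING.md.
console.log(resolveCodexKnobs({ codex_model: 'codex-auto-review', codex_reasoning_effort: 'medium' }));
```

Setting only one of the two keys leaves the other at its default, so a policy can change the model while keeping `high` reasoning.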

package/dist/policy/loader.d.ts CHANGED

@@ -45,18 +45,52 @@ declare const PolicySchema: z.ZodObject<{
          * intent and auto-narrow stays out of the way).
          */
         auto_narrow_threshold: z.ZodOptional<z.ZodNumber>;
+        /**
+         * Codex CLI model override (0.13.4+). Pinned via `-c model="<name>"` on
+         * every `codex exec review` invocation. When unset, codex's own default
+         * applies — which today is the special-purpose `codex-auto-review`
+         * model at `medium` reasoning, NOT the flagship.
+         *
+         * For serious adversarial review on consumer codebases (where verdict
+         * stability matters) the recommended setting is `gpt-5.4` with
+         * `codex_reasoning_effort: high`. Higher reasoning trades push-gate
+         * latency for finding consistency — fewer same-code-different-verdict
+         * round-trips like the 2026-04-26 helixir migration session.
+         *
+         * Loose string type: codex's model catalog evolves over time and we do
+         * NOT want to lock consumers to a hardcoded enum that drifts behind
+         * upstream. Codex itself validates the model name at exec time.
+         */
+        codex_model: z.ZodOptional<z.ZodString>;
+        /**
+         * Codex reasoning effort knob (0.13.4+). Pinned via
+         * `-c model_reasoning_effort="<level>"` on every invocation. Only
+         * meaningful when paired with a reasoning-capable model (gpt-5.4,
+         * gpt-5.3-codex, etc.). The `codex-auto-review` model honors this
+         * but caps lower than gpt-5.4.
+         *
+         * Recommended: `high` for serious review on long-running branches
+         * (more compute spent per finding, fewer flips). `medium` is codex's
+         * own default. `low` for cost-bounded environments where consistency
+         * matters less than throughput.
+         */
+        codex_reasoning_effort: z.ZodOptional<z.ZodEnum<["low", "medium", "high"]>>;
     }, "strict", z.ZodTypeAny, {
         codex_required?: boolean | undefined;
         concerns_blocks?: boolean | undefined;
         timeout_ms?: number | undefined;
         last_n_commits?: number | undefined;
         auto_narrow_threshold?: number | undefined;
+        codex_model?: string | undefined;
+        codex_reasoning_effort?: "low" | "medium" | "high" | undefined;
     }, {
         codex_required?: boolean | undefined;
         concerns_blocks?: boolean | undefined;
         timeout_ms?: number | undefined;
         last_n_commits?: number | undefined;
         auto_narrow_threshold?: number | undefined;
+        codex_model?: string | undefined;
+        codex_reasoning_effort?: "low" | "medium" | "high" | undefined;
     }>>;
     redact: z.ZodOptional<z.ZodObject<{
         match_timeout_ms: z.ZodOptional<z.ZodNumber>;
@@ -152,6 +186,8 @@ declare const PolicySchema: z.ZodObject<{
         timeout_ms?: number | undefined;
         last_n_commits?: number | undefined;
         auto_narrow_threshold?: number | undefined;
+        codex_model?: string | undefined;
+        codex_reasoning_effort?: "low" | "medium" | "high" | undefined;
     } | undefined;
     redact?: {
         match_timeout_ms?: number | undefined;
@@ -197,6 +233,8 @@ declare const PolicySchema: z.ZodObject<{
         timeout_ms?: number | undefined;
         last_n_commits?: number | undefined;
         auto_narrow_threshold?: number | undefined;
+        codex_model?: string | undefined;
+        codex_reasoning_effort?: "low" | "medium" | "high" | undefined;
     } | undefined;
     redact?: {
         match_timeout_ms?: number | undefined;

package/dist/policy/loader.js CHANGED

@@ -38,6 +38,36 @@ const ReviewPolicySchema = z
      * intent and auto-narrow stays out of the way).
      */
     auto_narrow_threshold: z.number().int().nonnegative().optional(),
+    /**
+     * Codex CLI model override (0.13.4+). Pinned via `-c model="<name>"` on
+     * every `codex exec review` invocation. When unset, codex's own default
+     * applies — which today is the special-purpose `codex-auto-review`
+     * model at `medium` reasoning, NOT the flagship.
+     *
+     * For serious adversarial review on consumer codebases (where verdict
+     * stability matters) the recommended setting is `gpt-5.4` with
+     * `codex_reasoning_effort: high`. Higher reasoning trades push-gate
+     * latency for finding consistency — fewer same-code-different-verdict
+     * round-trips like the 2026-04-26 helixir migration session.
+     *
+     * Loose string type: codex's model catalog evolves over time and we do
+     * NOT want to lock consumers to a hardcoded enum that drifts behind
+     * upstream. Codex itself validates the model name at exec time.
+     */
+    codex_model: z.string().min(1).optional(),
+    /**
+     * Codex reasoning effort knob (0.13.4+). Pinned via
+     * `-c model_reasoning_effort="<level>"` on every invocation. Only
+     * meaningful when paired with a reasoning-capable model (gpt-5.4,
+     * gpt-5.3-codex, etc.). The `codex-auto-review` model honors this
+     * but caps lower than gpt-5.4.
+     *
+     * Recommended: `high` for serious review on long-running branches
+     * (more compute spent per finding, fewer flips). `medium` is codex's
+     * own default. `low` for cost-bounded environments where consistency
+     * matters less than throughput.
+     */
+    codex_reasoning_effort: z.enum(['low', 'medium', 'high']).optional(),
 })
     .strict();
 /**
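
The constraints the schema places on the two new keys can be illustrated without zod. This is a hand-rolled sketch and not the package's validator: the function name `checkReviewKnobs` and the error strings are hypothetical, and the key list is taken from the `review` shape shown in `loader.d.ts`:

```javascript
// Keys of the `review` block as listed in loader.d.ts; anything else is
// rejected, mirroring the schema's `.strict()` mode.
const KNOWN_REVIEW_KEYS = new Set([
  'codex_required',
  'concerns_blocks',
  'timeout_ms',
  'last_n_commits',
  'auto_narrow_threshold',
  'codex_model',
  'codex_reasoning_effort',
]);
const EFFORT_LEVELS = ['low', 'medium', 'high'];

// Hypothetical checker: returns a list of problems, empty when the two
// codex knobs (and the key set) are acceptable. Other keys are not validated.
function checkReviewKnobs(review) {
  const errors = [];
  for (const key of Object.keys(review)) {
    if (!KNOWN_REVIEW_KEYS.has(key)) errors.push(`unrecognized key: ${key}`);
  }
  const model = review.codex_model;
  // Mirrors z.string().min(1).optional(): any non-empty string is allowed.
  if (model !== undefined && (typeof model !== 'string' || model.length < 1)) {
    errors.push('codex_model must be a non-empty string');
  }
  const effort = review.codex_reasoning_effort;
  // Mirrors z.enum(['low', 'medium', 'high']).optional().
  if (effort !== undefined && !EFFORT_LEVELS.includes(effort)) {
    errors.push('codex_reasoning_effort must be low, medium, or high');
  }
  return errors;
}

console.log(checkReviewKnobs({ codex_model: 'gpt-5.4', codex_reasoning_effort: 'high' }));
console.log(checkReviewKnobs({ codex_model: '', codex_reasoning_effort: 'max' }));
```

The model name is deliberately left unchecked beyond being non-empty, since the schema comment defers name validation to codex at exec time.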

package/dist/policy/types.d.ts CHANGED

@@ -130,6 +130,34 @@ export interface ReviewPolicy {
      * Non-negative integer. The loader rejects negative values.
      */
     auto_narrow_threshold?: number;
+    /**
+     * Codex CLI model override (0.13.4+). Pinned via `-c model="<name>"` on
+     * every `codex exec review` invocation. When unset, codex's own default
+     * applies — which today is the special-purpose `codex-auto-review` model
+     * at medium reasoning, NOT the flagship.
+     *
+     * Recommended for serious adversarial review: `gpt-5.4` paired with
+     * `codex_reasoning_effort: high`. Higher reasoning trades push-gate
+     * latency for verdict consistency — fewer same-code-different-verdict
+     * round-trips like the 2026-04-26 helixir migration session.
+     *
+     * Loose string type — codex's model catalog evolves. Codex itself
+     * validates the model name at exec time; an unknown name surfaces as
+     * a clear runtime error rather than a silent fallback.
+     */
+    codex_model?: string;
+    /**
+     * Codex reasoning effort (0.13.4+). Pinned via
+     * `-c model_reasoning_effort="<level>"` on every invocation. Only
+     * meaningful when paired with a reasoning-capable model (gpt-5.4,
+     * gpt-5.3-codex). Codex's own default is `medium`.
+     *
+     * Recommended: `high` for serious review on long-running branches
+     * (more compute spent per finding, fewer flips). `low` for
+     * cost-bounded environments where consistency matters less than
+     * throughput.
+     */
+    codex_reasoning_effort?: 'low' | 'medium' | 'high';
 }
 /**
  * User-supplied redaction pattern entry. Each pattern has a stable `name` used