@dtechvision/fabrik-runtime 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,70 +1,59 @@
  # @dtechvision/fabrik-runtime

- Shared TypeScript utilities for Fabrik workflow pods.
+ TypeScript helpers for Fabrik/Smithers workflows.

- - **Credential pool** — read from mounted `/etc/fabrik/credentials`, rotate on failure, notify operators
- - **Codex auth rotation** — rotate among `auth.json` / `*.auth.json` credentials for Codex-backed workflows
- - **K8s jobs** — dispatch child verification jobs from a running workflow
- - **JJ shell** — deterministic JJ/Git snapshot, bookmark push, workspace prep
+ ## Scope

- ## Import Surface
+ Use this package for server-side workflow code.

- Workflows should import from `@dtechvision/fabrik-runtime/...`.
+ Do not use it for:
+ - browser apps
+ - generic Node libraries

+ Published entrypoints:
+ - `@dtechvision/fabrik-runtime`
  - `@dtechvision/fabrik-runtime/credential-pool`
  - `@dtechvision/fabrik-runtime/codex-auth`
  - `@dtechvision/fabrik-runtime/jj-shell`
  - `@dtechvision/fabrik-runtime/k8s-jobs`

- For in-cluster Fabrik runs, the Smithers runtime image ships this package in its `node_modules`.
- For local workflow development in another repo, add the package as a dependency from a release or local path.
-
- ## Installation
+ ## Requirements

- Install from npm:
-
- ```bash
- bun add @dtechvision/fabrik-runtime
- ```
+ - ESM-capable runtime
+ - TypeScript source consumption
+ - Bun/Smithers-style workflow execution is the primary target

- or:
+ Module-specific requirements:

- ```bash
- npm install @dtechvision/fabrik-runtime
- ```
+ | Module | Requirement |
+ |---|---|
+ | `credential-pool` | mounted credential files for file-pool features |
+ | `codex-auth` | Codex auth files in the credential pool layout |
+ | `jj-shell` | `jj` and `git` in `PATH` |
+ | `k8s-jobs` | Kubernetes runtime access |

- Smithers workflows also need their normal workflow dependencies in the consuming repo:
+ ## Install

  ```bash
- bun add smithers-orchestrator zod
+ bun add @dtechvision/fabrik-runtime smithers-orchestrator zod
  ```

- or:
+ or

  ```bash
- npm install smithers-orchestrator zod
+ npm install @dtechvision/fabrik-runtime smithers-orchestrator zod
  ```

- Package releases follow the same `v*` tag version as the Fabrik CLI release flow.
-
- ## Smithers Integration
-
- Use the package from ordinary Smithers workflows:
+ ## Quickstart

  ```ts
  /** @jsxImportSource smithers-orchestrator */
  import { createSmithers, Task, Workflow } from "smithers-orchestrator";
  import { z } from "zod";
  import { withCodexAuthPoolEnv } from "@dtechvision/fabrik-runtime/codex-auth";
- import { prepareWorkspaces } from "@dtechvision/fabrik-runtime/jj-shell";

  const { smithers, outputs } = createSmithers(
-   {
-     report: z.object({
-       codexHomeSet: z.boolean(),
-       jjHelpersLoaded: z.boolean(),
-     }),
-   },
+   { report: z.object({ codexHomeSet: z.boolean() }) },
    { dbPath: process.env.SMITHERS_DB_PATH ?? ".smithers/runtime-check.db" },
  );

@@ -73,42 +62,30 @@ export default smithers(() => (
      <Task id="verify" output={outputs.report}>
        {async () => {
          const env = withCodexAuthPoolEnv({});
-         return {
-           codexHomeSet: typeof env.CODEX_HOME === "string" && env.CODEX_HOME.length > 0,
-           jjHelpersLoaded: typeof prepareWorkspaces === "function",
-         };
+         return { codexHomeSet: typeof env.CODEX_HOME === "string" && env.CODEX_HOME.length > 0 };
        }}
      </Task>
    </Workflow>
  ));
  ```

- Run it locally with Smithers from a repo that has installed:
-
- - `@dtechvision/fabrik-runtime`
- - `smithers-orchestrator`
- - `zod`
-
- Then:
+ Run:

  ```bash
  bunx smithers run path/to/workflow.tsx --run-id runtime-package-check
  ```

- The workflow file should live in the consuming project tree so normal Node/Bun package resolution can find the installed dependencies.
+ ## Common use

- ## Credentials
-
- Operators manage `fabrik-credentials` in `fabrik-system` via kubectl. The CLI mirrors it into the run namespace at dispatch time. The secret is directory-mounted (no subPath) at `/etc/fabrik/credentials/` so running pods observe file replacements.
+ Read a credential into `process.env`:

  ```ts
  import { injectCredentialEnv } from "@dtechvision/fabrik-runtime/credential-pool";

- // Reads /etc/fabrik/credentials/ANTHROPIC_API_KEY → process.env.ANTHROPIC_API_KEY
  injectCredentialEnv("ANTHROPIC_API_KEY");
  ```

- For file-pool rotation (e.g. multiple Codex auth files):
+ Rotate across credential files:

  ```ts
  import { CredentialFilePool } from "@dtechvision/fabrik-runtime/credential-pool";
@@ -120,63 +97,49 @@ const pool = new CredentialFilePool({
    activeFilename: "auth.json",
    agent: "codex",
  });
- pool.init();

- // On auth failure:
+ pool.init();
  const rotated = await pool.handleError(err);
  ```

- For Codex-specific rotation, use the higher-level helper:
+ Create a Codex agent with auth rotation:

  ```ts
- import { createCodexAgentWithPool } from "@dtechvision/fabrik-runtime/codex-auth";
+ import {
+   CodexAuthBlockedError,
+   createCodexAgentWithPool,
+ } from "@dtechvision/fabrik-runtime/codex-auth";

  const codex = createCodexAgentWithPool({
    model: "gpt-5",
    cwd: process.cwd(),
    env: {},
  });
- ```

- ## Local Verification
-
- Runtime package tests:
-
- ```bash
- cd src/fabrik-runtime
- bun test ./src
- ```
-
- Repo-wide CLI and workflow verification:
-
- ```bash
- make verify-cli
- make verify-cli-k3d
- ```
-
- Focused runtime-package k3d import verification:
-
- ```bash
- cd src/fabrik-cli
- FABRIK_K3D_E2E=1 FABRIK_K3D_CLUSTER=dev-single \
- go test ./internal/run -run TestK3dWorkflowRuntimePackageImports -timeout 10m -v
+ try {
+   await codex.generate({ prompt: "Hello" });
+ } catch (err) {
+   if (err instanceof CodexAuthBlockedError) {
+     // resumable auth exhaustion; restore credentials and resume the run
+     console.log(err.details); // { total, failed, remaining, activeAuthName, failedAuths }
+   }
+   throw err;
+ }
  ```

- The complex sample in [examples/complex/README.md](/Users/samuel/git/local-isolated-ralph/examples/complex/README.md) shows how workflow code consumes the package surface in practice.
+ Read the auth home directory at runtime (lazy, respects `CODEX_AUTH_HOME` env var):

- Local Smithers CLI verification:
+ ```ts
+ import { getCodexAuthHome } from "@dtechvision/fabrik-runtime/codex-auth";

- ```bash
- bunx smithers run path/to/workflow.tsx --run-id runtime-package-check
+ const home = getCodexAuthHome(); // e.g. /tmp/codex-auth-pool
  ```

- The expected result is a successful run whose output reports:
-
- - `codexHomeSet: true`
- - `jjHelpersLoaded: true`
-
- ## Precedence
+ ## Notes

- 1. Fabrik runtime metadata (`SMITHERS_*`, `FABRIK_*`, `KUBERNETES_*`)
- 2. Project env (`fabrik-env-<project>-<env>`) via `envFrom`
- 3. Shared credentials (`fabrik-credentials`) via file mount
+ - Each `RotatingCodexAgent` instance owns its own pool state. Multiple agents in the same process do not interfere with each other.
+ - Auth failures are tracked by file path + content hash, so replacing a credential file on disk clears its failure history.
+ - Codex auth rotation emits OTEL metrics/events when an OpenTelemetry SDK is configured (events are attached to the active span, not standalone).
+ - Fabrik runtime images may already ship this package.
+ - Package versions follow the same `v*` tag line as Fabrik releases.
+ - Prefer aligned Fabrik image and package versions when both are in use.
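The new README says auth failures are keyed by file path plus content hash; in the diffed `codex-auth.ts` that key is `path:length:Bun.hash(contents)`. A minimal sketch of the same idea, assuming Node's `node:crypto` SHA-256 as a stand-in for `Bun.hash` so it runs outside Bun. `AuthEntry` and `failureKey` here are local illustrations, not the package's exports. It shows why replacing a credential file clears its failure record: new contents produce a key the failure map has never seen.

```typescript
import { createHash } from "node:crypto";

// Illustrative copy of the internal AuthEntry shape from codex-auth.ts.
type AuthEntry = { path: string; authName: string; contents: string };

// Path + length + digest: the same file with new contents gets a new key.
// SHA-256 stands in for Bun.hash, which the package actually uses.
function failureKey(entry: AuthEntry): string {
  const digest = createHash("sha256").update(entry.contents, "utf8").digest("hex");
  return `${entry.path}:${entry.contents.length}:${digest}`;
}

const failures = new Map<string, string>();

const original: AuthEntry = {
  path: "/etc/fabrik/credentials/a.auth.json",
  authName: "a.auth.json",
  contents: '{"tokens":"old"}',
};
failures.set(failureKey(original), "usage_limit");

// An operator replaces the file in place: same path, new contents.
const replaced: AuthEntry = { ...original, contents: '{"tokens":"new"}' };

console.log(failures.has(failureKey(original))); // true
console.log(failures.has(failureKey(replaced))); // false
```

Including the path in the key keeps two files with identical contents from sharing one failure record.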
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@dtechvision/fabrik-runtime",
-   "version": "0.1.0",
+   "version": "0.1.1",
    "type": "module",
    "main": "src/index.ts",
    "types": "src/index.ts",
@@ -15,6 +15,7 @@
      "src/index.ts",
      "src/credential-pool.ts",
      "src/codex-auth.ts",
+     "src/codex-auth-telemetry.ts",
      "src/k8s-jobs.ts",
      "src/jj-shell.ts",
      "README.md"
@@ -32,6 +33,7 @@
      "test": "bun test ./src"
    },
    "dependencies": {
+     "@opentelemetry/api": "1.9.0",
      "smithers-orchestrator": "0.9.1"
    },
    "devDependencies": {
package/src/codex-auth-telemetry.ts ADDED
@@ -0,0 +1,153 @@
+ import * as otel from "@opentelemetry/api";
+ import type { FailureKind } from "./credential-pool";
+
+ export type CodexAuthBlockedDetails = {
+   total: number;
+   failed: number;
+   remaining: number;
+   activeAuthName: string | null;
+   failedAuths: Array<{ authName: string; kind: FailureKind }>;
+ };
+
+ export type CodexAuthTelemetryContext = {
+   runId?: string;
+   namespace?: string;
+ };
+
+ type CodexAuthTelemetrySink = {
+   counter(name: string, value: number, attrs?: Record<string, unknown>): void;
+   gauge(name: string, value: number, attrs?: Record<string, unknown>): void;
+   event(name: string, attrs?: Record<string, unknown>): void;
+ };
+
+ const createOtelSink = (): CodexAuthTelemetrySink => {
+   const meter = otel.metrics.getMeter("@dtechvision/fabrik-runtime/codex-auth");
+
+   const counters = new Map<string, ReturnType<typeof meter.createCounter>>();
+   const gauges = new Map<string, ReturnType<typeof meter.createGauge>>();
+
+   const getCounter = (name: string) => {
+     let counter = counters.get(name);
+     if (!counter) {
+       counter = meter.createCounter(name);
+       counters.set(name, counter);
+     }
+     return counter;
+   };
+
+   const getGauge = (name: string) => {
+     let gauge = gauges.get(name);
+     if (!gauge) {
+       gauge = meter.createGauge(name);
+       gauges.set(name, gauge);
+     }
+     return gauge;
+   };
+
+   return {
+     counter(name, value, attrs) {
+       getCounter(name).add(value, attrs);
+     },
+     gauge(name, value, attrs) {
+       getGauge(name).record(value, attrs);
+     },
+     event(name, attrs) {
+       const span = otel.trace.getActiveSpan();
+       if (span) {
+         span.addEvent(name, attrs);
+       }
+     },
+   };
+ };
+
+ let sink: CodexAuthTelemetrySink = createOtelSink();
+
+ export function __setCodexAuthTelemetrySinkForTests(next: CodexAuthTelemetrySink): void {
+   sink = next;
+ }
+
+ export function __resetCodexAuthTelemetrySinkForTests(): void {
+   sink = createOtelSink();
+ }
+
+ const baseAttrs = (ctx: CodexAuthTelemetryContext) => ({
+   ...(ctx.runId ? { run_id: ctx.runId } : {}),
+   ...(ctx.namespace ? { kubernetes_namespace: ctx.namespace } : {}),
+ });
+
+ const failureKindCounts = (details: CodexAuthBlockedDetails) => ({
+   usage_limit: details.failedAuths.filter((entry) => entry.kind === "usage_limit").length,
+   refresh_token_reused: details.failedAuths.filter((entry) => entry.kind === "refresh_token_reused").length,
+   auth_invalid: details.failedAuths.filter((entry) => entry.kind === "auth_invalid").length,
+ });
+
+ export function recordCodexAuthPoolSnapshot(
+   details: CodexAuthBlockedDetails,
+   ctx: CodexAuthTelemetryContext,
+ ): void {
+   const attrs = baseAttrs(ctx);
+   const counts = failureKindCounts(details);
+   sink.gauge("fabrik.codex_auth.pool.total", details.total, attrs);
+   sink.gauge("fabrik.codex_auth.pool.failed", details.failed, attrs);
+   sink.gauge("fabrik.codex_auth.pool.remaining", details.remaining, attrs);
+   sink.gauge("fabrik.codex_auth.pool.failed_usage_limit", counts.usage_limit, attrs);
+   sink.gauge(
+     "fabrik.codex_auth.pool.failed_refresh_token_reused",
+     counts.refresh_token_reused,
+     attrs,
+   );
+   sink.gauge("fabrik.codex_auth.pool.failed_auth_invalid", counts.auth_invalid, attrs);
+ }
+
+ export function recordCodexAuthFailure(
+   failure: { authName: string; kind: FailureKind },
+   details: CodexAuthBlockedDetails,
+   ctx: CodexAuthTelemetryContext,
+ ): void {
+   const attrs = {
+     ...baseAttrs(ctx),
+     auth_name: failure.authName,
+     failure_kind: failure.kind,
+   };
+   sink.counter("fabrik.codex_auth.failure_total", 1, attrs);
+   sink.event("codex.auth.failure", {
+     ...attrs,
+     total: details.total,
+     failed: details.failed,
+     remaining: details.remaining,
+   });
+ }
+
+ export function recordCodexAuthRotation(
+   rotation: { fromAuthName?: string; toAuthName: string; reason: string },
+   details: CodexAuthBlockedDetails,
+   ctx: CodexAuthTelemetryContext,
+ ): void {
+   const attrs = {
+     ...baseAttrs(ctx),
+     ...(rotation.fromAuthName ? { from_auth_name: rotation.fromAuthName } : {}),
+     to_auth_name: rotation.toAuthName,
+     reason: rotation.reason,
+   };
+   sink.counter("fabrik.codex_auth.rotation_total", 1, attrs);
+   sink.event("codex.auth.rotation", {
+     ...attrs,
+     total: details.total,
+     failed: details.failed,
+     remaining: details.remaining,
+   });
+ }
+
+ export function recordCodexAuthExhausted(
+   details: CodexAuthBlockedDetails,
+   ctx: CodexAuthTelemetryContext,
+ ): void {
+   const attrs = baseAttrs(ctx);
+   sink.counter("fabrik.codex_auth.exhausted_total", 1, attrs);
+   sink.event("codex.auth.exhausted", {
+     ...attrs,
+     total: details.total,
+     failed: details.failed,
+     remaining: details.remaining,
+   });
+ }
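Every metric in the new telemetry module goes through the `CodexAuthTelemetrySink` interface, and `__setCodexAuthTelemetrySinkForTests` lets a test swap out the OpenTelemetry-backed sink. A minimal sketch of a recording sink with that three-method shape, plus a `snapshotGauges` helper that emits the same gauge names as `recordCodexAuthPoolSnapshot` but counts failure kinds in one pass instead of three `filter` calls. `createRecordingSink`, `snapshotGauges`, and the `FailureKind` union are assumptions for illustration; the real union lives in `credential-pool.ts`, which this diff does not show.

```typescript
// Assumed union; the real FailureKind is defined in credential-pool.ts and may differ.
type FailureKind = "usage_limit" | "refresh_token_reused" | "auth_invalid" | "unknown";

type Attrs = Record<string, unknown>;

// Same three-method shape as CodexAuthTelemetrySink in the diff.
interface TelemetrySink {
  counter(name: string, value: number, attrs?: Attrs): void;
  gauge(name: string, value: number, attrs?: Attrs): void;
  event(name: string, attrs?: Attrs): void;
}

type Call = { type: "counter" | "gauge" | "event"; name: string; value?: number; attrs?: Attrs };

// Hypothetical test double: records every call in order.
function createRecordingSink(): TelemetrySink & { calls: Call[] } {
  const calls: Call[] = [];
  return {
    calls,
    counter(name, value, attrs) {
      calls.push({ type: "counter", name, value, attrs });
    },
    gauge(name, value, attrs) {
      calls.push({ type: "gauge", name, value, attrs });
    },
    event(name, attrs) {
      calls.push({ type: "event", name, attrs });
    },
  };
}

type Details = {
  total: number;
  failed: number;
  remaining: number;
  failedAuths: Array<{ authName: string; kind: FailureKind }>;
};

// Emits the gauge names used by recordCodexAuthPoolSnapshot; kinds outside
// the three tracked ones (e.g. "unknown") are simply not counted.
function snapshotGauges(sink: TelemetrySink, details: Details, attrs: Attrs = {}): void {
  const counts = { usage_limit: 0, refresh_token_reused: 0, auth_invalid: 0 };
  for (const { kind } of details.failedAuths) {
    if (kind in counts) counts[kind as keyof typeof counts] += 1;
  }
  sink.gauge("fabrik.codex_auth.pool.total", details.total, attrs);
  sink.gauge("fabrik.codex_auth.pool.failed", details.failed, attrs);
  sink.gauge("fabrik.codex_auth.pool.remaining", details.remaining, attrs);
  sink.gauge("fabrik.codex_auth.pool.failed_usage_limit", counts.usage_limit, attrs);
  sink.gauge("fabrik.codex_auth.pool.failed_refresh_token_reused", counts.refresh_token_reused, attrs);
  sink.gauge("fabrik.codex_auth.pool.failed_auth_invalid", counts.auth_invalid, attrs);
}

const sink = createRecordingSink();
snapshotGauges(sink, {
  total: 3,
  failed: 2,
  remaining: 1,
  failedAuths: [
    { authName: "a.auth.json", kind: "usage_limit" },
    { authName: "b.auth.json", kind: "usage_limit" },
  ],
});
```

Because the sink keeps calls in order, a test can assert exact gauge values without an OpenTelemetry SDK or exporter installed.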
package/src/codex-auth.ts CHANGED
@@ -7,9 +7,23 @@ import {
    getCredentialMountPath,
    type FailureKind,
  } from "./credential-pool";
+ import {
+   recordCodexAuthExhausted,
+   recordCodexAuthFailure,
+   recordCodexAuthPoolSnapshot,
+   recordCodexAuthRotation,
+   type CodexAuthBlockedDetails,
+ } from "./codex-auth-telemetry";
+
+ export type { CodexAuthBlockedDetails } from "./codex-auth-telemetry";

  const DEFAULT_CODEX_DIR = resolve(process.env.HOME ?? "", ".codex");

+ // Resolution order:
+ // 1. CODEX_AUTH_SOURCE_DIR — test/dev override
+ // 2. FABRIK_SHARED_CREDENTIALS_DIR — production (set by fabrik-cli dispatch)
+ // 3. credential mount path — fallback if directory exists
+ // 4. ~/.codex — local dev default
  function getCodexAuthSourceDir(): string {
    const sourceDir =
      process.env.CODEX_AUTH_SOURCE_DIR ??
@@ -18,18 +32,20 @@ function getCodexAuthSourceDir(): string {
    return resolve(sourceDir);
  }

- export const CODEX_AUTH_HOME = resolve(
-   process.env.CODEX_AUTH_HOME ?? resolve(tmpdir(), "codex-auth-pool"),
- );
+ export function getCodexAuthHome(): string {
+   return resolve(process.env.CODEX_AUTH_HOME ?? resolve(tmpdir(), "codex-auth-pool"));
+ }
+

- const NOTIFY_WEBHOOK_URL = process.env.CODEX_AUTH_NOTIFY_WEBHOOK_URL?.trim() ?? "";
- const NOTIFY_CLUSTER = process.env.KUBERNETES_NAMESPACE?.trim() ?? "";
- const NOTIFY_RUN_ID = process.env.SMITHERS_RUN_ID?.trim() ?? "";
+ const getNotifyWebhookUrl = () => process.env.CODEX_AUTH_NOTIFY_WEBHOOK_URL?.trim() ?? "";
+ const getNotifyCluster = () => process.env.KUBERNETES_NAMESPACE?.trim() ?? "";
+ const getNotifyRunID = () => process.env.SMITHERS_RUN_ID?.trim() ?? "";

- const AUTH_ROTATE_PATTERN =
-   /no last agent message|usage limit|quota|rate limit|insufficient (?:credits|balance|quota)|payment required|billing|exceeded.*(quota|limit)|not signed in|please run 'codex login'|unauthorized|authentication required|authentication failed|forbidden|invalid (?:api key|token|credentials)|expired (?:token|credentials)/i;
- const AUTH_REFRESH_REUSED_PATTERN =
-   /refresh_token_reused|refresh token has already been used|could not be refreshed because your refresh token was already used/i;
+ type AuthEntry = {
+   path: string;
+   authName: string;
+   contents: string;
+ };

  const listAuthFiles = (): string[] => {
    const sourceDir = getCodexAuthSourceDir();
@@ -40,83 +56,63 @@ const listAuthFiles = (): string[] => {
      .sort();
  };

- const ensureCodexHome = () => {
-   if (!existsSync(CODEX_AUTH_HOME)) {
-     mkdirSync(CODEX_AUTH_HOME, { recursive: true });
+ const ensureDir = (dir: string) => {
+   if (!existsSync(dir)) {
+     mkdirSync(dir, { recursive: true });
    }
  };

- let authPool = listAuthFiles();
- let authIndex = 0;
- let activeAuth = "";
- const authFailures = new Map<string, FailureKind>();
+ export class CodexAuthBlockedError extends Error {
+   readonly code = "CODEX_AUTH_BLOCKED" as const;
+   readonly reason = "auth_pool_exhausted" as const;
+   readonly details: CodexAuthBlockedDetails;
+   readonly runId?: string;
+   readonly namespace?: string;

- export function resetCodexAuthStateForTests(): void {
-   authPool = [];
-   authIndex = 0;
-   activeAuth = "";
-   authFailures.clear();
+   constructor(args: {
+     message?: string;
+     details: CodexAuthBlockedDetails;
+     runId?: string;
+     namespace?: string;
+     cause?: unknown;
+   }) {
+     super(args.message ?? "Codex auth pool exhausted", { cause: args.cause });
+     this.name = "CodexAuthBlockedError";
+     this.details = args.details;
+     this.runId = args.runId;
+     this.namespace = args.namespace;
+   }
  }

- const setActiveAuth = (authPath: string, reason: string) => {
-   ensureCodexHome();
-   const authContents = readFileSync(authPath, "utf8");
-   writeFileSync(resolve(CODEX_AUTH_HOME, "auth.json"), authContents, "utf8");
-   const previous = activeAuth ? ` from ${basename(activeAuth)}` : "";
-   activeAuth = authPath;
-   console.error(
-     `[fabrik-runtime] codex auth rotation${previous} -> ${basename(authPath)} (${reason})`,
-   );
- };
-
- const initAuthPool = () => {
-   ensureCodexHome();
-   authPool = listAuthFiles();
-   if (authPool.length === 0 || activeAuth) return;
-   const defaultAuth = resolve(getCodexAuthSourceDir(), "auth.json");
-   if (existsSync(defaultAuth)) {
-     setActiveAuth(defaultAuth, "initial");
-     return;
-   }
-   setActiveAuth(authPool[0]!, "initial");
- };
+ const telemetryContext = () => ({
+   runId: getNotifyRunID() || undefined,
+   namespace: getNotifyCluster() || undefined,
+ });

- const logAuthSummary = () => {
-   const total = authPool.length;
-   const failed = [...authFailures.entries()].map(
-     ([path, status]) => `${basename(path)}:${status}`,
-   );
-   const failedCount = authFailures.size;
-   const remaining = Math.max(total - failedCount, 0);
-   const active = activeAuth ? basename(activeAuth) : "none";
-   console.error(
-     `[fabrik-runtime] codex auth pool summary: total=${total} failed=${failedCount} remaining=${remaining} active=${active}`,
+ const writeBlockerArtifact = (details: CodexAuthBlockedDetails) => {
+   const smithersHome = resolve(process.env.SMITHERS_HOME ?? ".");
+   const blockerPath = resolve(smithersHome, ".smithers", "blockers", "codex-auth.json");
+   mkdirSync(resolve(blockerPath, ".."), { recursive: true });
+   writeFileSync(
+     blockerPath,
+     JSON.stringify(
+       {
+         kind: "auth_pool_exhausted",
+         resumable: true,
+         runId: getNotifyRunID() || undefined,
+         namespace: getNotifyCluster() || undefined,
+         details,
+       },
+       null,
+       2,
+     ),
+     "utf8",
    );
-   if (failed.length > 0) {
-     console.error(`[fabrik-runtime] failed auths: ${failed.join(", ")}`);
-   }
- };
-
- const rotateAuth = (reason: string): boolean => {
-   authPool = listAuthFiles();
-   if (authPool.length === 0) return false;
-   for (let i = 0; i < authPool.length; i += 1) {
-     const next = authPool[authIndex % authPool.length];
-     authIndex += 1;
-     if (next && next !== activeAuth && !authFailures.has(next)) {
-       setActiveAuth(next, reason);
-       logAuthSummary();
-       return true;
-     }
-   }
-   console.error("[fabrik-runtime] no codex auth left to rotate to");
-   logAuthSummary();
-   return false;
  };

  export const withCodexAuthPoolEnv = (env: Record<string, string>) => ({
    ...env,
-   CODEX_HOME: CODEX_AUTH_HOME,
+   CODEX_HOME: getCodexAuthHome(),
  });

  export type AuthFailureKind = FailureKind;
@@ -142,9 +138,10 @@ const notifyAuthFailure = async (
    if (onAuthFailure) {
      await onAuthFailure(event);
    }
-   if (!NOTIFY_WEBHOOK_URL) return;
+   const webhookUrl = getNotifyWebhookUrl();
+   if (!webhookUrl) return;
    try {
-     const response = await fetch(NOTIFY_WEBHOOK_URL, {
+     const response = await fetch(webhookUrl, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(event),
@@ -175,6 +172,10 @@ export const createCodexAgentWithPool = (
  export class RotatingCodexAgent {
    private readonly inner: CodexAgent;
    private readonly onAuthFailure?: RotatingCodexAgentOptions["onAuthFailure"];
+   private authPool: AuthEntry[] = [];
+   private authIndex = 0;
+   private activeAuth: AuthEntry | null = null;
+   private readonly authFailures = new Map<string, FailureKind>();

    constructor(inner: CodexAgent, opts: RotatingCodexAgentOptions = {}) {
      this.inner = inner;
@@ -189,9 +190,136 @@ export class RotatingCodexAgent {
      return this.inner.tools;
    }

+   private failureKey(entry: AuthEntry): string {
+     return `${entry.path}:${entry.contents.length}:${Bun.hash(entry.contents)}`;
+   }
+
+   private scanAuthPool(): AuthEntry[] {
+     return listAuthFiles().map((path) => ({
+       path,
+       authName: basename(path),
+       contents: readFileSync(path, "utf8"),
+     }));
+   }
+
+   private refreshAuthPool(): void {
+     this.authPool = this.scanAuthPool();
+     if (!this.activeAuth) return;
+     const nextActive = this.authPool.find((entry) => entry.path === this.activeAuth?.path) ?? null;
+     this.activeAuth = nextActive;
+   }
+
+   private getBlockedDetails(): CodexAuthBlockedDetails {
+     this.refreshAuthPool();
+     const failedAuths = this.authPool
+       .filter((entry) => this.authFailures.has(this.failureKey(entry)))
+       .map((entry) => ({
+         authName: entry.authName,
+         kind: this.authFailures.get(this.failureKey(entry))!,
+       }));
+     return {
+       total: this.authPool.length,
+       failed: failedAuths.length,
+       remaining: Math.max(this.authPool.length - failedAuths.length, 0),
+       activeAuthName: this.activeAuth?.authName ?? null,
+       failedAuths,
+     };
+   }
+
+   private logAuthSummary(): void {
+     const details = this.getBlockedDetails();
+     recordCodexAuthPoolSnapshot(details, telemetryContext());
+     const failed = details.failedAuths.map(({ authName, kind }) => `${authName}:${kind}`);
+     const active = details.activeAuthName ?? "none";
+     console.error(
+       `[fabrik-runtime] codex auth pool summary: total=${details.total} failed=${details.failed} remaining=${details.remaining} active=${active}`,
+     );
+     if (failed.length > 0) {
+       console.error(`[fabrik-runtime] failed auths: ${failed.join(", ")}`);
+     }
+   }
+
+   /** Write the active auth file to the codex auth home dir without emitting telemetry or logs. */
+   private syncActiveAuthFile(entry: AuthEntry): void {
+     const home = getCodexAuthHome();
+     ensureDir(home);
+     writeFileSync(resolve(home, "auth.json"), entry.contents, "utf8");
+   }
+
+   /** Activate a credential: write file, update state, emit telemetry + log. */
+   private setActiveAuth(entry: AuthEntry, reason: string): void {
+     this.syncActiveAuthFile(entry);
+     const previousAuth = this.activeAuth?.authName;
+     const previous = previousAuth ? ` from ${previousAuth}` : "";
+     this.activeAuth = entry;
+     recordCodexAuthRotation(
+       {
+         fromAuthName: previousAuth,
+         toAuthName: entry.authName,
+         reason,
+       },
+       this.getBlockedDetails(),
+       telemetryContext(),
+     );
+     console.error(
+       `[fabrik-runtime] codex auth rotation${previous} -> ${entry.authName} (${reason})`,
+     );
+   }
+
+   private ensureActiveAuth(): void {
+     const home = getCodexAuthHome();
+     ensureDir(home);
+     this.refreshAuthPool();
+     if (this.authPool.length === 0) return;
+     const currentFailed =
+       this.activeAuth && this.authFailures.has(this.failureKey(this.activeAuth));
+     if (this.activeAuth && !currentFailed) {
+       // Re-sync file if contents changed on disk (operator rotated credentials),
+       // or if the file doesn't exist yet. No telemetry — this is not a rotation.
+       const authFile = resolve(home, "auth.json");
+       const onDisk = existsSync(authFile) ? readFileSync(authFile, "utf8") : null;
+       if (onDisk !== this.activeAuth.contents) {
+         this.syncActiveAuthFile(this.activeAuth);
+       }
+       return;
+     }
+     const defaultAuth = resolve(getCodexAuthSourceDir(), "auth.json");
+     const initial =
+       this.authPool.find((entry) => entry.path === defaultAuth && !this.authFailures.has(this.failureKey(entry))) ??
+       this.authPool.find((entry) => !this.authFailures.has(this.failureKey(entry)));
+     if (!initial) {
+       this.activeAuth = this.authPool.find((entry) => entry.path === this.activeAuth?.path) ?? null;
+       return;
+     }
+     const reason = this.activeAuth ? "refresh" : "initial";
+     this.setActiveAuth(initial, reason);
+     this.authIndex = this.authPool.findIndex((entry) => entry.path === initial.path) + 1;
+   }
+
+   private rotateAuth(reason: string): boolean {
+     this.refreshAuthPool();
+     if (this.authPool.length === 0) return false;
+     for (let i = 0; i < this.authPool.length; i += 1) {
+       const next = this.authPool[this.authIndex % this.authPool.length];
+       this.authIndex += 1;
+       if (
+         next &&
+         next.path !== this.activeAuth?.path &&
+         !this.authFailures.has(this.failureKey(next))
+       ) {
+         this.setActiveAuth(next, reason);
+         this.logAuthSummary();
+         return true;
+       }
+     }
+     console.error("[fabrik-runtime] no codex auth left to rotate to");
+     this.logAuthSummary();
+     return false;
+   }
+
    async generate(args: Parameters<CodexAgent["generate"]>[0]) {
-     initAuthPool();
-     const attempts = Math.max(authPool.length, 1);
+     this.ensureActiveAuth();
+     const attempts = Math.max(this.authPool.length, 1);
      let lastError: unknown = null;
      for (let i = 0; i < attempts; i += 1) {
        try {
@@ -199,33 +327,56 @@ export class RotatingCodexAgent {
        } catch (err) {
          lastError = err;
          const message = err instanceof Error ? err.message : String(err);
-         if (!AUTH_ROTATE_PATTERN.test(message)) {
+         const kind = classifyFailure(message);
+         if (kind === "unknown") {
            throw err;
          }
-         if (activeAuth) {
-           const kind = classifyFailure(message);
-           authFailures.set(activeAuth, kind);
-           if (AUTH_REFRESH_REUSED_PATTERN.test(message)) {
+         if (this.activeAuth) {
+           this.authFailures.set(this.failureKey(this.activeAuth), kind);
+           recordCodexAuthFailure(
+             { authName: this.activeAuth.authName, kind },
+             this.getBlockedDetails(),
+             telemetryContext(),
+           );
+           if (kind === "refresh_token_reused") {
              console.error("[fabrik-runtime] codex refresh token reused; re-auth required");
            }
            await notifyAuthFailure(
              {
-               authPath: activeAuth,
-               authName: basename(activeAuth),
+               authPath: this.activeAuth.path,
+               authName: this.activeAuth.authName,
                reason: "codex generate failed and rotation was requested",
                kind,
                message,
-               clusterNamespace: NOTIFY_CLUSTER || undefined,
-               runId: NOTIFY_RUN_ID || undefined,
+               clusterNamespace: getNotifyCluster() || undefined,
+               runId: getNotifyRunID() || undefined,
              },
              this.onAuthFailure,
            );
          }
-         if (!rotateAuth("codex auth / usage failure")) {
-           break;
+         if (!this.rotateAuth("codex auth / usage failure")) {
+           const details = this.getBlockedDetails();
+           recordCodexAuthExhausted(details, telemetryContext());
+           writeBlockerArtifact(details);
+           throw new CodexAuthBlockedError({
+             message: "Codex auth pool exhausted",
+             details,
+             runId: getNotifyRunID() || undefined,
+             namespace: getNotifyCluster() || undefined,
+             cause: err,
+           });
          }
        }
      }
-     throw lastError ?? new Error("Codex auth pool exhausted");
+     const details = this.getBlockedDetails();
+     recordCodexAuthExhausted(details, telemetryContext());
+     writeBlockerArtifact(details);
+     throw new CodexAuthBlockedError({
+       message: "Codex auth pool exhausted",
+       details,
+       runId: getNotifyRunID() || undefined,
+       namespace: getNotifyCluster() || undefined,
+       cause: lastError,
+     });
    }
  }
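`rotateAuth` now keeps its pool, cursor, and failure map on the agent instance instead of module globals, but the selection rule is the same: walk the pool round-robin from `authIndex`, skip the active entry and anything with a recorded failure key, and give up after one full lap. A pure-function sketch of that selection step with file I/O, telemetry, and logging removed; `pickNextAuth`, `Entry`, and `Pick` are hypothetical names, and `key` stands in for the value `failureKey` would compute.

```typescript
// Illustrative pool entry: `key` plays the role of failureKey(entry).
type Entry = { path: string; key: string };

type Pick = { next: Entry | null; cursor: number };

// One lap over the pool starting at `cursor`, skipping the active entry and any
// entry whose key is in `failedKeys`. Mirrors RotatingCodexAgent.rotateAuth's loop.
function pickNextAuth(
  pool: readonly Entry[],
  cursor: number,
  activePath: string | null,
  failedKeys: ReadonlySet<string>,
): Pick {
  let position = cursor;
  for (let i = 0; i < pool.length; i += 1) {
    const candidate = pool[position % pool.length];
    position += 1;
    if (candidate && candidate.path !== activePath && !failedKeys.has(candidate.key)) {
      return { next: candidate, cursor: position };
    }
  }
  return { next: null, cursor: position };
}

const pool: Entry[] = [
  { path: "auth.json", key: "k0" },
  { path: "b.auth.json", key: "k1" },
  { path: "c.auth.json", key: "k2" },
];

// auth.json was activated first (cursor = its index + 1) and then failed;
// b.auth.json has also failed, so the lap lands on c.auth.json.
const first = pickNextAuth(pool, 1, "auth.json", new Set(["k0", "k1"]));
console.log(first.next?.path, first.cursor); // c.auth.json 3

// Once every key has failed, a full lap finds nothing; in the diff the agent
// then writes the blocker artifact and throws CodexAuthBlockedError.
const exhausted = pickNextAuth(pool, first.cursor, "c.auth.json", new Set(["k0", "k1", "k2"]));
console.log(exhausted.next); // null
```

Keeping the selection pure makes the one-lap bound easy to test; in the agent, the same loop bounds how many rotations a single failure can trigger.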
package/src/index.ts CHANGED
@@ -25,13 +25,15 @@ export {
  } from "./credential-pool";

  export {
-   CODEX_AUTH_HOME,
+   getCodexAuthHome,
    withCodexAuthPoolEnv,
    createCodexAgentWithPool,
    RotatingCodexAgent,
+   CodexAuthBlockedError,
    type AuthFailureKind,
    type AuthFailureEvent,
    type RotatingCodexAgentOptions,
+   type CodexAuthBlockedDetails,
  } from "./codex-auth";

  export {
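The root entrypoint now re-exports `CodexAuthBlockedError` and `CodexAuthBlockedDetails`, so callers can use `err instanceof CodexAuthBlockedError` as the README shows. If a process might load two copies of the package (the README notes that Fabrik runtime images may already ship it), `instanceof` only matches the class from the copy the caller imported. A minimal sketch of a structural guard keyed on the error's `code` field (`"CODEX_AUTH_BLOCKED"` in the diff); `isCodexAuthBlocked` and `ForeignBlockedError` are hypothetical, and the details type is copied from the diff rather than imported.

```typescript
// Copied from CodexAuthBlockedDetails in the diff; `kind` widened to string here.
type CodexAuthBlockedDetails = {
  total: number;
  failed: number;
  remaining: number;
  activeAuthName: string | null;
  failedAuths: Array<{ authName: string; kind: string }>;
};

type CodexAuthBlockedLike = Error & {
  code: "CODEX_AUTH_BLOCKED";
  details: CodexAuthBlockedDetails;
};

// Matches on the `code` discriminator instead of class identity.
function isCodexAuthBlocked(err: unknown): err is CodexAuthBlockedLike {
  if (!(err instanceof Error)) return false;
  const candidate = err as Partial<CodexAuthBlockedLike>;
  return (
    candidate.code === "CODEX_AUTH_BLOCKED" &&
    typeof candidate.details === "object" &&
    candidate.details !== null
  );
}

// Stands in for the same class loaded from a second copy of the package.
class ForeignBlockedError extends Error {
  readonly code = "CODEX_AUTH_BLOCKED" as const;
  readonly details: CodexAuthBlockedDetails;
  constructor(details: CodexAuthBlockedDetails) {
    super("Codex auth pool exhausted");
    this.name = "CodexAuthBlockedError";
    this.details = details;
  }
}

const foreign = new ForeignBlockedError({
  total: 2,
  failed: 2,
  remaining: 0,
  activeAuthName: null,
  failedAuths: [{ authName: "auth.json", kind: "usage_limit" }],
});

console.log(isCodexAuthBlocked(foreign)); // true
console.log(isCodexAuthBlocked(new Error("boom"))); // false
```

Where only one copy of the package is ever loaded, the plain `instanceof` check from the README is simpler and sufficient.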
package/src/jj-shell.ts CHANGED
@@ -22,6 +22,12 @@ type JjResult = {
    exitCode: number;
  };

+ const PUSH_RETRY_LIMIT = 3;
+ const STALE_REF_PATTERNS = [
+   "unexpectedly moved on the remote",
+   "reason: stale info",
+ ];
+
  async function jj(args: string[], cwd: string): Promise<JjResult> {
    const result = await $`jj ${args}`.cwd(cwd).nothrow().quiet();
    return {
@@ -32,6 +38,47 @@ async function jj(args: string[], cwd: string): Promise<JjResult> {
    };
  }

+ function isStaleRefPushFailure(result: JjResult): boolean {
+   const text = `${result.stdout}\n${result.stderr}`;
+   return STALE_REF_PATTERNS.some((pattern) => text.includes(pattern));
+ }
+
+ function summarizeJjResult(label: string, result: JjResult): string {
+   const details = [result.stdout, result.stderr].filter(Boolean).join(" | ");
+   return `${label} exit=${result.exitCode}${details ? ` ${details}` : ""}`;
+ }
+
+ async function hasConflicts(workspacePath: string): Promise<boolean> {
+   const conflicts = await jj(["log", "-r", "conflicts()", "--no-graph", "-T", "commit_id"], workspacePath);
+   if (!conflicts.ok) {
+     return true;
+   }
+   return conflicts.stdout.trim().length > 0;
+ }
+
+ async function trackRemoteBookmark(
+   workspacePath: string,
+   bookmarkName: string,
+ ): Promise<JjResult> {
+   return jj(["bookmark", "track", `glob:${bookmarkName}`, "--remote", "origin"], workspacePath);
+ }
+
+ async function setBookmarkToTarget(
+   workspacePath: string,
+   bookmarkName: string,
+   targetRev: string,
+ ): Promise<JjResult> {
+   const move = await jj(
+     ["bookmark", "set", bookmarkName, "-r", targetRev, "--allow-backwards"],
+     workspacePath,
+   );
+   if (move.ok) {
+     return move;
+   }
+
+   return jj(["bookmark", "create", "-r", targetRev, bookmarkName], workspacePath);
+ }
+
  export async function prepareWorkspaces(
    repoRoot: string,
    workspacesDir: string,
@@ -137,76 +184,131 @@ export async function pushBookmark(
  ticketId: string,
  ): Promise<ReportOutput> {
  const targetRev = "@-";
- const track = await jj(
- ["bookmark", "track", bookmarkName, "--remote", "origin"],
- workspacePath,
- );
+ const track = await trackRemoteBookmark(workspacePath, bookmarkName);
  const trackSummary =
  track.ok || track.stderr === ""
  ? ""
  : ` Tracking remote bookmark reported: ${track.stderr}`;
 
- const targetCommit = await jj(
- ["log", "-r", targetRev, "--no-graph", "-T", "commit_id"],
- workspacePath,
- );
- if (!targetCommit.ok || !targetCommit.stdout) {
- return {
- ticketId,
- status: "blocked",
- summary: `Failed to resolve target revision for bookmark push: ${targetCommit.stderr}`,
- };
- }
+ let lastAttemptSummary = "";
 
- const move = await jj(
- ["bookmark", "set", bookmarkName, "-r", targetRev, "--allow-backwards"],
- workspacePath,
- );
+ for (let attempt = 1; attempt <= PUSH_RETRY_LIMIT; attempt += 1) {
+ if (attempt > 1) {
+ const fetch = await jj(["git", "fetch"], workspacePath);
+ if (!fetch.ok) {
+ return {
+ ticketId,
+ status: "blocked",
+ summary:
+ `Bookmark push retry ${attempt}/${PUSH_RETRY_LIMIT} failed during fetch: ${fetch.stderr || fetch.stdout}.` +
+ trackSummary +
+ (lastAttemptSummary ? ` Last push state: ${lastAttemptSummary}` : ""),
+ };
+ }
+
+ const retryTrack = await trackRemoteBookmark(workspacePath, bookmarkName);
+ if (!retryTrack.ok && retryTrack.stderr !== "") {
+ lastAttemptSummary = `${lastAttemptSummary} ${summarizeJjResult("track", retryTrack)}`.trim();
+ }
 
- if (!move.ok) {
- const create = await jj(
- ["bookmark", "create", "-r", targetRev, bookmarkName],
+ const rebase = await jj(
+ ["rebase", "-s", `roots(${bookmarkName}@origin..${targetRev})`, "-d", `${bookmarkName}@origin`],
+ workspacePath,
+ );
+ if (!rebase.ok) {
+ return {
+ ticketId,
+ status: "blocked",
+ summary:
+ `Bookmark push retry ${attempt}/${PUSH_RETRY_LIMIT} failed during rebase: ${rebase.stderr || rebase.stdout}.` +
+ trackSummary +
+ (lastAttemptSummary ? ` Last push state: ${lastAttemptSummary}` : ""),
+ };
+ }
+
+ if (await hasConflicts(workspacePath)) {
+ return {
+ ticketId,
+ status: "blocked",
+ summary:
+ `Bookmark push retry ${attempt}/${PUSH_RETRY_LIMIT} stopped after rebase conflict on '${bookmarkName}'.` +
+ ` ${summarizeJjResult("rebase", rebase)}` +
+ trackSummary,
+ };
+ }
+ }
+
+ const targetCommit = await jj(
+ ["log", "-r", targetRev, "--no-graph", "-T", "commit_id"],
  workspacePath,
  );
- if (!create.ok) {
+ if (!targetCommit.ok || !targetCommit.stdout) {
  return {
  ticketId,
  status: "blocked",
- summary: `Failed to set bookmark '${bookmarkName}': ${create.stderr}`,
+ summary: `Failed to resolve target revision for bookmark push: ${targetCommit.stderr}`,
  };
  }
- }
 
- const push = await jj(
- ["git", "push", "--bookmark", bookmarkName],
- workspacePath,
- );
- if (!push.ok) {
- return {
- ticketId,
- status: "blocked",
- summary: `Bookmark set but push failed: ${push.stderr}${trackSummary}`,
- };
- }
+ const move = await setBookmarkToTarget(workspacePath, bookmarkName, targetRev);
+ if (!move.ok) {
+ return {
+ ticketId,
+ status: "blocked",
+ summary: `Failed to set bookmark '${bookmarkName}': ${move.stderr}`,
+ };
+ }
+
+ const push = await jj(
+ ["git", "push", "--bookmark", bookmarkName],
+ workspacePath,
+ );
+ if (!push.ok) {
+ lastAttemptSummary = [
+ `attempt ${attempt}/${PUSH_RETRY_LIMIT}`,
+ summarizeJjResult("push", push),
+ ].join(" ");
+
+ if (isStaleRefPushFailure(push) && attempt < PUSH_RETRY_LIMIT) {
+ continue;
+ }
+
+ const exhausted = isStaleRefPushFailure(push) && attempt === PUSH_RETRY_LIMIT;
+ return {
+ ticketId,
+ status: "blocked",
+ summary:
+ `${exhausted ? `Bookmark push retries exhausted for '${bookmarkName}'.` : "Bookmark set but push failed:"} ${push.stderr || push.stdout}` +
+ trackSummary +
+ (lastAttemptSummary ? ` Last attempt: ${lastAttemptSummary}` : ""),
+ };
+ }
+
+ const remote = await $`git ls-remote origin refs/heads/${bookmarkName}`
+ .cwd(workspacePath)
+ .nothrow()
+ .quiet();
+ const remoteCommit = remote.stdout.toString().trim().split(/\s+/)[0] ?? "";
+ if (remote.exitCode !== 0 || remoteCommit !== targetCommit.stdout) {
+ return {
+ ticketId,
+ status: "blocked",
+ summary:
+ `Bookmark push returned success but remote ${bookmarkName} is ${remoteCommit || "missing"} instead of ${targetCommit.stdout}.` +
+ trackSummary,
+ };
+ }
 
- const remote = await $`git ls-remote origin refs/heads/${bookmarkName}`
- .cwd(workspacePath)
- .nothrow()
- .quiet();
- const remoteCommit = remote.stdout.toString().trim().split(/\s+/)[0] ?? "";
- if (remote.exitCode !== 0 || remoteCommit !== targetCommit.stdout) {
  return {
  ticketId,
- status: "blocked",
- summary:
- `Bookmark push returned success but remote ${bookmarkName} is ${remoteCommit || "missing"} instead of ${targetCommit.stdout}.` +
- trackSummary,
+ status: "done",
+ summary: `Pushed bookmark '${bookmarkName}' to origin at ${targetCommit.stdout}.${trackSummary}`,
  };
  }
 
  return {
  ticketId,
- status: "done",
- summary: `Pushed bookmark '${bookmarkName}' to origin at ${targetCommit.stdout}.${trackSummary}`,
+ status: "blocked",
+ summary: `Bookmark push retries exhausted for '${bookmarkName}'.${trackSummary}${lastAttemptSummary ? ` Last attempt: ${lastAttemptSummary}` : ""}`,
  };
  }
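In 0.1.1, `pushBookmark` wraps the push in a bounded retry of up to `PUSH_RETRY_LIMIT` attempts. Every attempt after the first runs `jj git fetch`, re-tracks the remote bookmark, rebases `roots(<bookmark>@origin..@-)` onto `<bookmark>@origin`, and stops if the rebase leaves conflicts. A push failure is retried only when its output contains one of the `STALE_REF_PATTERNS` strings. The following is a minimal sketch of that control flow alone, under stated assumptions: `refresh` and `push` are hypothetical injected callbacks standing in for the jj invocations, `CmdResult` mirrors the `JjResult` fields used above, and the conflict check and the `git ls-remote` verification are omitted.

```typescript
// Mirrors the JjResult fields the diff relies on.
type CmdResult = { ok: boolean; stdout: string; stderr: string; exitCode: number };

const STALE_REF_PATTERNS = ["unexpectedly moved on the remote", "reason: stale info"];

// Substring match over combined stdout/stderr, as in isStaleRefPushFailure.
function isStaleRef(result: CmdResult): boolean {
  const text = `${result.stdout}\n${result.stderr}`;
  return STALE_REF_PATTERNS.some((pattern) => text.includes(pattern));
}

// Hypothetical callbacks: in pushBookmark these are jj commands.
interface PushSteps {
  refresh: () => Promise<CmdResult>; // stands in for fetch + track + rebase
  push: () => Promise<CmdResult>; // stands in for `jj git push --bookmark <name>`
}

type Outcome = { status: "done" | "blocked"; attempts: number; summary: string };

async function pushWithStaleRefRetry(steps: PushSteps, limit = 3): Promise<Outcome> {
  let last = "";
  for (let attempt = 1; attempt <= limit; attempt += 1) {
    // Only later attempts refresh local state against the remote first.
    if (attempt > 1) {
      const refreshed = await steps.refresh();
      if (!refreshed.ok) {
        return { status: "blocked", attempts: attempt, summary: `refresh failed: ${refreshed.stderr}` };
      }
    }
    const result = await steps.push();
    if (result.ok) {
      return { status: "done", attempts: attempt, summary: "pushed" };
    }
    last = result.stderr || result.stdout;
    // Any non-stale failure is a hard block; stale-ref rejections loop again.
    if (!isStaleRef(result)) {
      return { status: "blocked", attempts: attempt, summary: `push failed: ${last}` };
    }
  }
  return { status: "blocked", attempts: limit, summary: `retries exhausted: ${last}` };
}
```

Keeping the classifier a pure string check, as the diff does, means only rejections that jj reports as stale remote refs trigger another fetch and rebase; authentication or network errors surface immediately as `blocked`.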