codex-api-proxy 0.1.0__tar.gz

@@ -0,0 +1,347 @@
1
+ Metadata-Version: 2.4
2
+ Name: codex-api-proxy
3
+ Version: 0.1.0
4
+ Summary: Local OpenAI-compatible HTTP proxy backed by Codex CLI
5
+ Author: codex-api-proxy contributors
6
+ License-Expression: MIT
7
+ Keywords: codex,openai,proxy,api,local
8
+ Classifier: Development Status :: 3 - Alpha
9
+ Classifier: Environment :: Console
10
+ Classifier: Framework :: FastAPI
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
16
+ Requires-Python: >=3.11
17
+ Description-Content-Type: text/markdown
18
+ Requires-Dist: fastapi<1,>=0.115
19
+ Requires-Dist: pydantic<3,>=2.7
20
+ Requires-Dist: uvicorn[standard]<1,>=0.30
21
+ Provides-Extra: dev
22
+ Requires-Dist: httpx<1,>=0.27; extra == "dev"
23
+ Requires-Dist: pytest<9,>=8; extra == "dev"
24
+ Requires-Dist: pytest-asyncio<1,>=0.23; extra == "dev"
25
+
26
+ # codex-api-proxy
27
+
28
+ Local OpenAI-compatible HTTP proxy backed by local Codex credentials.
29
+
30
+ This project exposes a minimal `/v1/chat/completions` API for local automation. By default, requests are executed through `codex exec --json --skip-git-repo-check --ignore-user-config --ignore-rules --sandbox read-only --ephemeral`, using the local Codex installation and its existing authentication.
31
+
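As a rough illustration of the default invocation described above, the argv could be assembled like this. This is a minimal sketch, not the package's actual code: `build_exec_command` and `BASE_FLAGS` are hypothetical names, and only the flags quoted in this README are used.

```python
from collections.abc import Sequence

# Flags quoted in the README for the default `exec` engine.
BASE_FLAGS = [
    "exec",
    "--json",
    "--skip-git-repo-check",
    "--ignore-user-config",
    "--ignore-rules",
    "--sandbox",
    "read-only",
]


def build_exec_command(
    codex_bin: str = "codex",
    ephemeral: bool = True,
    configs: Sequence[str] = (),
) -> list[str]:
    """Assemble the argv for one request (hypothetical helper)."""
    argv = [codex_bin, *BASE_FLAGS]
    if ephemeral:
        argv.append("--ephemeral")
    for override in configs:
        # Each override becomes its own `-c key=value` pair.
        argv.extend(["-c", override])
    return argv


if __name__ == "__main__":
    print(" ".join(build_exec_command(configs=['model_reasoning_effort="low"'])))
```

Building the command as a list (rather than a shell string) keeps request-derived values from being interpreted by a shell if the list is later handed to `subprocess`.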
32
+ ## Safety
33
+
34
+ The proxy defaults to `127.0.0.1` and should not be exposed publicly. Any client with access can spend your local Codex quota and can ask Codex to inspect files that are available to the selected Codex sandbox and workspace.
35
+
36
+ Set `CODEX_PROXY_API_KEY` to require `Authorization: Bearer <key>` on API requests.
37
+
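A bearer check of this kind can be sketched as follows. The function name `is_authorized` is an assumption made for illustration, and the constant-time comparison is a common hardening choice rather than something this README states the proxy does.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], api_key: Optional[str]) -> bool:
    """Return True when the request may proceed (hypothetical helper)."""
    if not api_key:
        # No key configured: requests are accepted without authentication.
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key prefixes through timing.
    return hmac.compare_digest(token.strip().encode(), api_key.encode())


if __name__ == "__main__":
    print(is_authorized("Bearer local-secret", "local-secret"))
```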
38
+ If you start with `--host 0.0.0.0` or another non-loopback bind address without `--api-key`, `codex-api-proxy` prints a warning. Use a bearer token before exposing the service to anything other than a trusted local machine.
39
+
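The warning condition above amounts to "non-loopback bind address and no API key". A minimal sketch, assuming hypothetical helper names and treating any hostname other than `localhost` as non-loopback to stay on the cautious side:

```python
import ipaddress
from typing import Optional


def is_loopback_host(host: str) -> bool:
    """True for loopback addresses such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; assume it may be reachable from other machines.
        return False


def needs_exposure_warning(host: str, api_key: Optional[str]) -> bool:
    """True when the bind address is non-loopback and no key is set."""
    return not is_loopback_host(host) and not api_key


if __name__ == "__main__":
    print(needs_exposure_warning("0.0.0.0", None))
```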
40
+ With the default `exec` engine, Codex subprocesses are launched with `--ignore-user-config` and `--ignore-rules`. This prevents proxy requests from loading user Codex config, MCP servers, plugins, skills, and rule files.
41
+
42
+ Codex subprocesses also use `--sandbox read-only` and `--ephemeral` by default. This keeps calls closer to one-shot model calls where the caller owns conversation context.
43
+
44
+ The experimental `app-server` engine uses Codex's long-lived app-server protocol to reduce process startup latency and stream assistant deltas. Each API request starts a fresh Codex thread and archives it after completion, so callers must continue sending full chat history in `messages`. The app-server process uses an isolated `CODEX_HOME` at `~/.codex-api-proxy/codex-home` by default. `codex-api-proxy` symlinks only the current Codex `auth.json` into that isolated home, so the app-server worker can reuse the existing login while not seeing the current user's `config.toml`, MCP config, or plugins. The app-server process is also started with `--disable apps`, `--disable plugins`, `--disable skill_mcp_dependency_install`, and `-c mcp_servers={}`. To keep skills out of the model-visible prompt, `codex-api-proxy` generates a `skills.config=[{name=...,enabled=false}]` override for known system skills and locally discovered skill names. Each request uses an empty `dynamicTools` list, empty `environments`, `approvalPolicy: never`, `sandbox: read-only`, and `ephemeral: true` by default.
45
+
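The isolated-home idea can be sketched as below: create an empty directory and link only `auth.json` into it. The function name and the explicit path arguments are assumptions for illustration; the real package may resolve these paths differently.

```python
from pathlib import Path


def prepare_isolated_home(source_home: Path, isolated_home: Path) -> Path:
    """Create isolated_home and symlink only auth.json into it (sketch)."""
    isolated_home.mkdir(parents=True, exist_ok=True)
    link = isolated_home / "auth.json"
    target = source_home / "auth.json"
    if link.is_symlink() or link.exists():
        # Replace a stale link so it tracks the current login.
        link.unlink()
    link.symlink_to(target)
    return link
```

Because nothing else is copied or linked, a process started with `CODEX_HOME` pointing at the isolated directory can read the login but finds no `config.toml` or plugin files there.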
46
+ ## Install
47
+
48
+ ```bash
49
+ pip3 install codex-api-proxy
50
+ ```
51
+
52
+ For local development from this checkout:
53
+
54
+ ```bash
55
+ python3 -m pip install -e '.[dev]'
56
+ ```
57
+
58
+ Make targets are available for local build and release tasks:
59
+
60
+ ```bash
61
+ make build-tools
62
+ make test
63
+ make build
64
+ make release-check
65
+ make publish VERSION=0.1.1
66
+ ```
67
+
68
+ `make publish VERSION=...` first syncs that version into `pyproject.toml` and `src/codex_api_proxy/__init__.py`, then runs tests, builds the package, validates the generated artifacts, and uploads them to PyPI.
69
+
70
+ ## Run
71
+
72
+ Start in the background:
73
+
74
+ ```bash
75
+ codex-api-proxy start
76
+ ```
77
+
78
+ By default, the service listens on `127.0.0.1:8765`.
79
+ The default Codex working directory is an empty workspace at `~/.codex-api-proxy/workspace`.
80
+
81
+ Bind to all interfaces:
82
+
83
+ ```bash
84
+ codex-api-proxy start --host 0.0.0.0
85
+ ```
86
+
87
+ Check status:
88
+
89
+ ```bash
90
+ codex-api-proxy status
91
+ ```
92
+
93
+ Show saved runtime settings:
94
+
95
+ ```bash
96
+ codex-api-proxy status --verbose
97
+ ```
98
+
99
+ Restart with the last successful `start` settings:
100
+
101
+ ```bash
102
+ codex-api-proxy restart
103
+ ```
104
+
105
+ Restart and override one setting:
106
+
107
+ ```bash
108
+ codex-api-proxy restart --proxy=http://127.0.0.1:8118
109
+ ```
110
+
111
+ Start with faster defaults:
112
+
113
+ ```bash
114
+ codex-api-proxy start --fast
115
+ ```
116
+
117
+ Start with experimental long-lived app-server workers:
118
+
119
+ ```bash
120
+ codex-api-proxy start --engine app-server --workers 2
121
+ ```
122
+
123
+ Start with an outbound proxy, faster defaults, and multiple app-server workers:
124
+
125
+ ```bash
126
+ codex-api-proxy start --proxy=http://127.0.0.1:8118 --fast --engine app-server --workers 4
127
+ ```
128
+
129
+ Stop:
130
+
131
+ ```bash
132
+ codex-api-proxy stop
133
+ ```
134
+
135
+ Run in the foreground for debugging:
136
+
137
+ ```bash
138
+ codex-api-proxy start --foreground
139
+ ```
140
+
141
+ ## Configuration
142
+
143
+ CLI options:
144
+
145
+ - `--host`: bind host, default `127.0.0.1`
146
+ - `--port`: bind port, default `8765`
147
+ - `--api-key`: require bearer auth
148
+ - `--codex-bin`: Codex executable, default `codex`
149
+ - `--proxy`: proxy URL passed to Codex as `http_proxy` and `https_proxy`
150
+ - `--model`: model passed to Codex
151
+ - `--engine`: execution engine, `exec` or `app-server`, default `exec`
152
+ - `--workers`: number of long-lived `app-server` workers, default `1`
153
+ - `--max-queue-size`: maximum queued `app-server` requests before returning `429`, default `64`
154
+ - `--queue-timeout-seconds`: maximum time to wait for an `app-server` worker, default `30`
155
+ - `--app-server-codex-home`: isolated `CODEX_HOME` used by `app-server` workers, default `~/.codex-api-proxy/codex-home`
156
+ - `--codex-config`: Codex config override passed as `-c key=value`, repeatable
157
+ - `--ephemeral`: run `codex exec` with `--ephemeral`, enabled by default
158
+ - `--fast`: use fast defaults: `--codex-config model_reasoning_effort="low"`
159
+ - `--default-cwd`: default Codex working directory, default `~/.codex-api-proxy/workspace`
160
+ - `--allowed-root`: allowed cwd root, repeatable, default `--default-cwd`
161
+ - `--timeout-seconds`: per-request timeout, default `300`
162
+ - `--max-concurrency`: maximum concurrent Codex executions, default `1`
163
+ - `--log-level`: Uvicorn log level, one of `debug`, `info`, `warning`, or `error`, default `info`
164
+ - `--pid-file`: daemon pid file, default `~/.codex-api-proxy/codex-api-proxy.pid`
165
+ - `--log-file`: daemon log file for `start`, default `~/.codex-api-proxy/codex-api-proxy.log`
166
+ - `--state-file`: daemon state file, default `~/.codex-api-proxy/codex-api-proxy.state.json`
167
+
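To make the interaction between `--workers`, `--max-queue-size`, and `--queue-timeout-seconds` concrete, here is a minimal admission-control sketch. All class and exception names are hypothetical, and the README does not say how the package implements queuing or which status a queue timeout maps to; only the `429` on a full queue comes from the option list above.

```python
import asyncio


class QueueFull(Exception):
    """Too many requests are already waiting (HTTP 429 in this sketch)."""


class QueueTimeout(Exception):
    """No worker became free within the queue timeout."""


class Admission:
    def __init__(self, workers: int = 1, max_queue_size: int = 64,
                 queue_timeout_seconds: float = 30.0) -> None:
        self._slots = asyncio.Semaphore(workers)
        self._max_queue_size = max_queue_size
        self._timeout = queue_timeout_seconds
        self._waiting = 0

    async def acquire(self) -> None:
        # Reject immediately when all workers are busy and the queue is full.
        if self._slots.locked() and self._waiting >= self._max_queue_size:
            raise QueueFull
        self._waiting += 1
        try:
            await asyncio.wait_for(self._slots.acquire(), self._timeout)
        except asyncio.TimeoutError:
            raise QueueTimeout from None
        finally:
            self._waiting -= 1

    def release(self) -> None:
        self._slots.release()
```

A request handler would call `await gate.acquire()` before handing work to a worker and `gate.release()` in a `finally` block afterwards.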
168
+ `start` prints the state file path and the effective startup parameters. The state file is written with `0600` permissions and is used by `restart` to reuse the previous start settings. If `--api-key` is used, the key is redacted in terminal output but stored in the state file so `restart` can reuse it.
169
+
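The state-file behaviour described above (owner-only permissions, key stored on disk but redacted on screen) could look roughly like this. `write_state`, `redact`, and the `api_key` field name are assumptions for illustration, not the package's actual schema.

```python
import json
import os
from pathlib import Path


def write_state(path: Path, settings: dict) -> None:
    """Write settings as JSON readable only by the owner (sketch)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)
    # os.open applies the mode only on creation, so tighten existing files too.
    os.chmod(path, 0o600)


def redact(settings: dict) -> dict:
    """Return a copy of settings that is safe to print to the terminal."""
    shown = dict(settings)
    if shown.get("api_key"):
        shown["api_key"] = "***"
    return shown
```

Since the key is kept in plain text inside the file, the `0600` mode is what protects it from other local users.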
170
+ Environment variables are also supported when running the FastAPI app directly:
171
+
172
+ - `CODEX_PROXY_HOST`: bind host, default `127.0.0.1`
173
+ - `CODEX_PROXY_PORT`: bind port, default `8765`
174
+ - `CODEX_PROXY_API_KEY`: optional bearer token
175
+ - `CODEX_PROXY_CODEX_BIN`: Codex executable, default `codex`
176
+ - `CODEX_PROXY_PROXY`: proxy URL passed to Codex
177
+ - `CODEX_PROXY_MODEL`: model passed to Codex
178
+ - `CODEX_PROXY_ENGINE`: execution engine, `exec` or `app-server`, default `exec`
179
+ - `CODEX_PROXY_WORKERS`: number of long-lived `app-server` workers, default `1`
180
+ - `CODEX_PROXY_MAX_QUEUE_SIZE`: maximum queued `app-server` requests, default `64`
181
+ - `CODEX_PROXY_QUEUE_TIMEOUT_SECONDS`: maximum time to wait for an `app-server` worker, default `30`
182
+ - `CODEX_PROXY_APP_SERVER_CODEX_HOME`: isolated `CODEX_HOME` used by `app-server` workers
183
+ - `CODEX_PROXY_CODEX_CONFIGS`: `;;`-separated Codex config overrides passed as repeated `-c`
184
+ - `CODEX_PROXY_EPHEMERAL`: set to `1`, `true`, or `yes` to run `codex exec` with `--ephemeral`; defaults to `true`
185
+ - `CODEX_PROXY_DEFAULT_CWD`: default Codex working directory, default current directory
186
+ - `CODEX_PROXY_ALLOWED_ROOTS`: colon-separated allowed cwd roots, default `CODEX_PROXY_DEFAULT_CWD`
187
+ - `CODEX_PROXY_TIMEOUT_SECONDS`: per-request timeout, default `300`
188
+ - `CODEX_PROXY_MAX_CONCURRENCY`: maximum concurrent Codex executions, default `1`
189
+ - `CODEX_PROXY_LOG_LEVEL`: Uvicorn log level, default `info`
190
+
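The separators used by a few of these variables are easy to get wrong, so here is a small parsing sketch. `load_env_settings` is a hypothetical helper covering only a subset of the variables; treating the truthy values case-insensitively is an assumption.

```python
import os
from collections.abc import Mapping

TRUTHY = {"1", "true", "yes"}


def load_env_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse a subset of the CODEX_PROXY_* variables (sketch)."""
    default_cwd = env.get("CODEX_PROXY_DEFAULT_CWD") or os.getcwd()
    raw_roots = env.get("CODEX_PROXY_ALLOWED_ROOTS", "")
    raw_configs = env.get("CODEX_PROXY_CODEX_CONFIGS", "")
    return {
        "host": env.get("CODEX_PROXY_HOST", "127.0.0.1"),
        "port": int(env.get("CODEX_PROXY_PORT", "8765")),
        "ephemeral": env.get("CODEX_PROXY_EPHEMERAL", "true").strip().lower() in TRUTHY,
        "default_cwd": default_cwd,
        # Colon-separated roots; fall back to the default cwd when unset.
        "allowed_roots": [r for r in raw_roots.split(":") if r] or [default_cwd],
        # `;;`-separated overrides, each later passed as its own `-c`.
        "codex_configs": [c for c in raw_configs.split(";;") if c],
    }
```

For example, `CODEX_PROXY_CODEX_CONFIGS='model_reasoning_effort="low";;foo=1'` yields two overrides in this sketch.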
191
+ ## API
192
+
193
+ Health:
194
+
195
+ ```bash
196
+ curl -sS http://127.0.0.1:8765/health
197
+ ```
198
+
199
+ Models:
200
+
201
+ ```bash
202
+ curl -sS http://127.0.0.1:8765/v1/models
203
+ ```
204
+
205
+ Readiness:
206
+
207
+ ```bash
208
+ curl -sS http://127.0.0.1:8765/ready
209
+ ```
210
+
211
+ Local counters:
212
+
213
+ ```bash
214
+ curl -sS http://127.0.0.1:8765/metrics
215
+ ```
216
+
217
+ Chat completion:
218
+
219
+ ```bash
220
+ curl -sS http://127.0.0.1:8765/v1/chat/completions \
221
+ -H 'Content-Type: application/json' \
222
+ -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
223
+ ```
224
+
225
+ Streaming chat completion:
226
+
227
+ ```bash
228
+ curl -N http://127.0.0.1:8765/v1/chat/completions \
229
+ -H 'Content-Type: application/json' \
230
+ -d '{"model":"codex-local","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
231
+ ```
232
+
233
+ Streaming responses use OpenAI-compatible SSE events:
234
+
235
+ - `data: {"object":"chat.completion.chunk",...}` for assistant chunks
236
+ - `data: [DONE]` when the response is complete
237
+
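A client can consume these events with a few lines of parsing. The sketch below assumes the usual OpenAI chunk layout, where text arrives in `choices[].delta.content`; `iter_content` is a hypothetical helper name, and `lines` can be any iterable of decoded response lines.

```python
import json
from collections.abc import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield assistant text fragments from SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments reconstructs the full assistant message.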
238
+ With the default `exec` engine, streaming happens only at the HTTP layer: the response is framed as SSE, but the proxy obtains the assistant answer from `codex exec --json`. If Codex emits only the final assistant text for a request, the single content chunk arrives after Codex completes.
239
+
240
+ With `--engine app-server`, the proxy maps Codex `item/agentMessage/delta` notifications to OpenAI-compatible SSE content chunks. This is experimental because Codex's app-server protocol is itself experimental.
241
+
242
+ ## Compatibility
243
+
244
+ `codex-api-proxy` implements the OpenAI chat-completions request and response shape for local use; it is not a complete OpenAI API implementation.
245
+
246
+ Supported:
247
+
248
+ - `GET /v1/models`
249
+ - `POST /v1/chat/completions`
250
+ - `model`
251
+ - `messages`
252
+ - `stream`
253
+ - `metadata.cwd` for request-scoped working directory selection inside `--allowed-root`
254
+ - OpenAI-compatible non-streaming response envelope
255
+ - OpenAI-compatible SSE chunk envelope for streaming responses
256
+
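The `metadata.cwd` entry in the list above implies a containment check against the allowed roots. A minimal sketch of such a check, assuming a hypothetical `resolve_cwd` helper and that rejected paths raise `PermissionError` (the README does not say how the proxy reports them):

```python
from pathlib import Path


def resolve_cwd(requested, default_cwd, allowed_roots) -> Path:
    """Return the working directory if it lies inside an allowed root."""
    candidate = Path(requested or default_cwd).expanduser().resolve()
    for root in allowed_roots:
        # Resolve both sides so `..` segments and symlinks cannot escape.
        if candidate.is_relative_to(Path(root).expanduser().resolve()):
            return candidate
    raise PermissionError(f"cwd {candidate} is outside the allowed roots")
```

Resolving before comparing matters: a naive string-prefix test would accept a path such as `<root>/../elsewhere`.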
257
+ Accepted but currently ignored:
258
+
259
+ - `temperature`
260
+ - `top_p`
261
+ - `max_tokens`
262
+ - `presence_penalty`
263
+ - `frequency_penalty`
264
+
265
+ Not supported:
266
+
267
+ - `tools` and `tool_choice`
268
+ - `response_format`
269
+ - `n` greater than one
270
+ - `stop`
271
+ - embeddings, responses, assistants, files, batches, audio, images, and other OpenAI endpoints
272
+ - accurate token `usage`; the response currently returns zero token counts because Codex CLI does not expose stable token accounting through this path
273
+
274
+ The app-server engine starts a fresh Codex thread for each API request and archives it after completion. Callers must include the full chat history in `messages`; `codex-api-proxy` does not preserve conversation state between API requests.
275
+
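Because the proxy keeps no state between requests, the client has to carry the history itself. A minimal sketch of that bookkeeping, with a hypothetical `Conversation` class that only builds request payloads and leaves the HTTP call to the caller:

```python
class Conversation:
    """Accumulates chat history so every request carries all prior turns."""

    def __init__(self, model: str = "codex-local") -> None:
        self.model = model
        self.messages: list[dict] = []

    def build_request(self, user_text: str) -> dict:
        """Append the user turn and return the full request body."""
        self.messages.append({"role": "user", "content": user_text})
        return {"model": self.model, "messages": list(self.messages)}

    def record_reply(self, assistant_text: str) -> None:
        """Store the assistant turn so the next request includes it."""
        self.messages.append({"role": "assistant", "content": assistant_text})
```

Each payload returned by `build_request` can be posted to `/v1/chat/completions` as-is; after reading the reply, pass its text to `record_reply` before the next turn.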
276
+ OpenAI Python SDK smoke test:
277
+
278
+ ```python
279
+ from openai import OpenAI
280
+
281
+ client = OpenAI(base_url="http://127.0.0.1:8765/v1", api_key="local-secret")
282
+
283
+ response = client.chat.completions.create(
284
+ model="codex-local",
285
+ messages=[{"role": "user", "content": "Reply with exactly: pong"}],
286
+ )
287
+ print(response.choices[0].message.content)
288
+ ```
289
+
290
+ When no `--api-key` is configured, most OpenAI SDKs still require a placeholder `api_key`; any non-empty value is fine.
291
+
292
+ ## Operations
293
+
294
+ Use `/health` for a lightweight process check and `/ready` for a readiness check that includes the selected engine and Codex executable availability. Use `/metrics` for local JSON counters:
295
+
296
+ - `requests_total`
297
+ - `requests_ok`
298
+ - `requests_error`
299
+ - `errors_by_status`
300
+ - `engine`
301
+ - `uptime_seconds`
302
+ - `app_server_pool_started`
303
+
304
+ Daemon logs are written to `~/.codex-api-proxy/codex-api-proxy.log` by default. `codex-api-proxy` does not rotate logs itself; use your OS log rotation mechanism if you run it long-term.
305
+
306
+ Latency logs:
307
+
308
+ Each chat completion writes a single-line JSON log with logger `codex_api_proxy.latency` and event `chat_completion_latency`. Streaming responses also write `chat_completion_first_sse` when the first SSE chunk is yielded.
309
+
310
+ For background daemon runs, inspect:
311
+
312
+ ```bash
313
+ rg 'codex_api_proxy\.latency|chat_completion_latency|chat_completion_first_sse' ~/.codex-api-proxy/codex-api-proxy.log
314
+ ```
315
+
316
+ Important fields:
317
+
318
+ - `request_id`: correlates latency lines for the same request
319
+ - `stream`: whether the request used `stream: true`
320
+ - `engine`: `exec` or `app-server`
321
+ - `phases_ms.cwd_resolve`: cwd validation time
322
+ - `phases_ms.prompt_build`: OpenAI messages to Codex prompt conversion time
323
+ - `phases_ms.queue_wait`: time waiting for local admission before engine execution
324
+ - `phases_ms.codex_exec`: time spent inside `codex exec`
325
+ - `phases_ms.app_server_exec`: time spent inside the app-server worker turn
326
+ - `phases_ms.codex_command_build`: Codex command construction time
327
+ - `phases_ms.codex_process_spawn`: local subprocess spawn time
328
+ - `phases_ms.codex_stdin_write`: prompt write and stdin close time
329
+ - `phases_ms.codex_first_stdout_event`: elapsed time from Codex IO start until the first non-empty stdout JSONL line
330
+ - `phases_ms.codex_first_assistant_event`: elapsed time from Codex IO start until the first assistant message event
331
+ - `phases_ms.codex_stdout_read`: total time spent reading Codex stdout until EOF
332
+ - `phases_ms.codex_process_wait`: time waiting for the Codex process after stdout EOF
333
+ - `phases_ms.codex_communicate`: total Codex subprocess IO time
334
+ - `phases_ms.codex_output_parse`: Codex JSONL final-message parse time
335
+ - `phases_ms.response_build`: response object/SSE setup time
336
+ - `phases_ms.total`: total server-side request time before response is ready
337
+ - `time_to_first_sse_ms`: stream request time until the first SSE chunk is yielded
338
+ - `time_to_first_content_sse_ms`: app-server stream request time until the first content chunk is yielded
339
+
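The fields above can be pulled out of the log with a short script. This sketch rests on several assumptions that the README does not confirm: that the JSON object starts at the first `{` on the line, that the event name is stored under an `event` key, and that `phases_ms` is a nested object. `total_latencies` is a hypothetical helper name.

```python
import json
from collections.abc import Iterable


def total_latencies(lines: Iterable[str]) -> dict:
    """Map request_id to phases_ms.total for chat_completion_latency lines."""
    totals = {}
    for line in lines:
        start = line.find("{")
        if start < 0:
            continue  # no JSON payload on this line
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if record.get("event") != "chat_completion_latency":
            continue
        total = record.get("phases_ms", {}).get("total")
        if total is not None and "request_id" in record:
            totals[record["request_id"]] = total
    return totals
```

Feeding it the lines of `~/.codex-api-proxy/codex-api-proxy.log` gives one total per request, which is a quick way to spot slow calls before drilling into individual phases.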
340
+ Chat completion request with bearer auth (when `--api-key` or `CODEX_PROXY_API_KEY` is set):
341
+
342
+ ```bash
343
+ curl -sS http://127.0.0.1:8765/v1/chat/completions \
344
+ -H 'Authorization: Bearer local-secret' \
345
+ -H 'Content-Type: application/json' \
346
+ -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
347
+ ```