PyPI - codex-api-proxy - Versions diffs - 0.1.0__tar.gz - Mend

codex-api-proxy 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

codex_api_proxy-0.1.0/PKG-INFO +347 -0
codex_api_proxy-0.1.0/README.md +322 -0
codex_api_proxy-0.1.0/pyproject.toml +42 -0
codex_api_proxy-0.1.0/setup.cfg +4 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/__init__.py +3 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/app_server_runner.py +554 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/cli.py +570 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/codex_runner.py +278 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/config.py +83 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/main.py +561 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/prompt.py +31 -0
codex_api_proxy-0.1.0/src/codex_api_proxy/schemas.py +48 -0
codex_api_proxy-0.1.0/src/codex_api_proxy.egg-info/PKG-INFO +347 -0
codex_api_proxy-0.1.0/src/codex_api_proxy.egg-info/SOURCES.txt +23 -0
codex_api_proxy-0.1.0/src/codex_api_proxy.egg-info/dependency_links.txt +1 -0
codex_api_proxy-0.1.0/src/codex_api_proxy.egg-info/entry_points.txt +2 -0
codex_api_proxy-0.1.0/src/codex_api_proxy.egg-info/requires.txt +8 -0
codex_api_proxy-0.1.0/src/codex_api_proxy.egg-info/top_level.txt +1 -0
codex_api_proxy-0.1.0/tests/test_api.py +366 -0
codex_api_proxy-0.1.0/tests/test_app_server_runner.py +197 -0
codex_api_proxy-0.1.0/tests/test_cli.py +565 -0
codex_api_proxy-0.1.0/tests/test_codex_runner.py +209 -0
codex_api_proxy-0.1.0/tests/test_config.py +50 -0
codex_api_proxy-0.1.0/tests/test_prompt.py +36 -0
codex_api_proxy-0.1.0/tests/test_release_version.py +50 -0

codex_api_proxy-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,347 @@
+Metadata-Version: 2.4
+Name: codex-api-proxy
+Version: 0.1.0
+Summary: Local OpenAI-compatible HTTP proxy backed by Codex CLI
+Author: codex-api-proxy contributors
+License-Expression: MIT
+Keywords: codex,openai,proxy,api,local
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
+Classifier: Framework :: FastAPI
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: fastapi<1,>=0.115
+Requires-Dist: pydantic<3,>=2.7
+Requires-Dist: uvicorn[standard]<1,>=0.30
+Provides-Extra: dev
+Requires-Dist: httpx<1,>=0.27; extra == "dev"
+Requires-Dist: pytest<9,>=8; extra == "dev"
+Requires-Dist: pytest-asyncio<1,>=0.23; extra == "dev"
+# codex-api-proxy
+Local OpenAI-compatible HTTP proxy backed by local Codex credentials.
+This project exposes a minimal `/v1/chat/completions` API for local automation. By default, requests are executed through `codex exec --json --skip-git-repo-check --ignore-user-config --ignore-rules --sandbox read-only --ephemeral`, using the local Codex installation and its existing authentication.
+## Safety
+The proxy defaults to `127.0.0.1` and should not be exposed publicly. Any client with access can spend your local Codex quota and can ask Codex to inspect files that are available to the selected Codex sandbox and workspace.
+Set `CODEX_PROXY_API_KEY` to require `Authorization: Bearer <key>` on API requests.
+If you start with `--host 0.0.0.0` or another non-loopback bind address without `--api-key`, `codex-api-proxy` prints a warning. Use a bearer token before exposing the service to anything other than a trusted local machine.
+With the default `exec` engine, Codex subprocesses are launched with `--ignore-user-config` and `--ignore-rules`. This prevents proxy requests from loading user Codex config, MCP servers, plugins, skills, and rule files.
+Codex subprocesses also use `--sandbox read-only` and `--ephemeral` by default. This keeps calls closer to one-shot model calls where the caller owns conversation context.
+The experimental `app-server` engine uses Codex's long-lived app-server protocol to reduce process startup latency and stream assistant deltas. Each API request starts a fresh Codex thread and archives it after completion, so callers must continue sending full chat history in `messages`. The app-server process uses an isolated `CODEX_HOME` at `~/.codex-api-proxy/codex-home` by default. `codex-api-proxy` symlinks only the current Codex `auth.json` into that isolated home, so the app-server worker can reuse the existing login while not seeing the current user's `config.toml`, MCP config, or plugins. The app-server process is also started with `--disable apps`, `--disable plugins`, `--disable skill_mcp_dependency_install`, and `-c mcp_servers={}`. To keep skills out of the model-visible prompt, `codex-api-proxy` generates a `skills.config=[{name=...,enabled=false}]` override for known system skills and locally discovered skill names. Each request uses an empty `dynamicTools` list, empty `environments`, `approvalPolicy: never`, `sandbox: read-only`, and `ephemeral: true` by default.
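A minimal sketch of the isolated `CODEX_HOME` idea above, assuming a POSIX system where symlinks need no special privileges. The helper name `prepare_isolated_home` is hypothetical; only the "link `auth.json` and nothing else" behavior comes from the description.

```python
from pathlib import Path


def prepare_isolated_home(real_home: Path, isolated_home: Path) -> Path:
    """Create isolated_home and link only auth.json from real_home into it."""
    isolated_home.mkdir(parents=True, exist_ok=True)
    link = isolated_home / "auth.json"
    target = real_home / "auth.json"
    # Replace a stale link so the isolated home tracks the current login.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)
    # config.toml, MCP config, and plugins are deliberately not linked.
    return link
```

Because the link points at the live `auth.json`, a refreshed login in the real Codex home is picked up without copying credentials.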
+## Install
+```bash
+pip3 install codex-api-proxy
+```
+For local development from this checkout:
+```bash
+python3 -m pip install -e '.[dev]'
+```
+Make targets are available for local build and release tasks:
+```bash
+make build-tools
+make test
+make build
+make release-check
+make publish VERSION=0.1.1
+```
+`make publish VERSION=...` first syncs that version into `pyproject.toml` and `src/codex_api_proxy/__init__.py`, then runs tests, builds the package, validates the generated artifacts, and uploads them to PyPI.
+## Run
+Start in the background:
+```bash
+codex-api-proxy start
+```
+By default, the service listens on `127.0.0.1:8765`.
+The default Codex working directory is an empty workspace at `~/.codex-api-proxy/workspace`.
+Bind to all interfaces:
+```bash
+codex-api-proxy start --host 0.0.0.0
+```
+Check status:
+```bash
+codex-api-proxy status
+```
+Show saved runtime settings:
+```bash
+codex-api-proxy status --verbose
+```
+Restart with the last successful `start` settings:
+```bash
+codex-api-proxy restart
+```
+Restart and override one setting:
+```bash
+codex-api-proxy restart --proxy=http://127.0.0.1:8118
+```
+Start with faster defaults:
+```bash
+codex-api-proxy start --fast
+```
+Start with experimental long-lived app-server workers:
+```bash
+codex-api-proxy start --engine app-server --workers 2
+```
+Start with an outbound proxy, faster defaults, and multiple app-server workers:
+```bash
+codex-api-proxy start --proxy=http://127.0.0.1:8118 --fast --engine app-server --workers 4
+```
+Stop:
+```bash
+codex-api-proxy stop
+```
+Run in the foreground for debugging:
+```bash
+codex-api-proxy start --foreground
+```
+## Configuration
+CLI options:
+- `--host`: bind host, default `127.0.0.1`
+- `--port`: bind port, default `8765`
+- `--api-key`: require bearer auth
+- `--codex-bin`: Codex executable, default `codex`
+- `--proxy`: proxy URL passed to Codex as `http_proxy` and `https_proxy`
+- `--model`: model passed to Codex
+- `--engine`: execution engine, `exec` or `app-server`, default `exec`
+- `--workers`: number of long-lived `app-server` workers, default `1`
+- `--max-queue-size`: maximum queued `app-server` requests before returning `429`, default `64`
+- `--queue-timeout-seconds`: maximum time to wait for an `app-server` worker, default `30`
+- `--app-server-codex-home`: isolated `CODEX_HOME` used by `app-server` workers, default `~/.codex-api-proxy/codex-home`
+- `--codex-config`: Codex config override passed as `-c key=value`, repeatable
+- `--ephemeral`: run `codex exec` with `--ephemeral`, enabled by default
+- `--fast`: use fast defaults: `--codex-config model_reasoning_effort="low"`
+- `--default-cwd`: default Codex working directory, default `~/.codex-api-proxy/workspace`
+- `--allowed-root`: allowed cwd root, repeatable, default `--default-cwd`
+- `--timeout-seconds`: per-request timeout, default `300`
+- `--max-concurrency`: maximum concurrent Codex executions, default `1`
+- `--log-level`: Uvicorn log level, one of `debug`, `info`, `warning`, or `error`, default `info`
+- `--pid-file`: daemon pid file, default `~/.codex-api-proxy/codex-api-proxy.pid`
+- `--log-file`: daemon log file for `start`, default `~/.codex-api-proxy/codex-api-proxy.log`
+- `--state-file`: daemon state file, default `~/.codex-api-proxy/codex-api-proxy.state.json`
+`start` prints the state file path and the effective startup parameters. The state file is written with `0600` permissions and is used by `restart` to reuse the previous start settings. If `--api-key` is used, the key is redacted in terminal output but stored in the state file so `restart` can reuse it.
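One way to get the `0600` state file and the redacted terminal output described above is sketched here. It assumes a POSIX system (`os.fchmod`), and the names `write_state`, `redact`, and the `api_key` field are illustrative rather than the package's actual ones.

```python
import json
import os
from pathlib import Path


def write_state(path: Path, settings: dict) -> None:
    """Persist settings as JSON readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The mode passed to os.open only applies when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # fchmod also tightens permissions on a file that already existed.
        os.fchmod(handle.fileno(), 0o600)
        json.dump(settings, handle, indent=2)


def redact(settings: dict) -> dict:
    """Return a copy that is safe to print: the API key is masked."""
    shown = dict(settings)
    if shown.get("api_key"):
        shown["api_key"] = "***"
    return shown
```

The file keeps the real key so a later restart can reuse it, which is why the permissions matter more than the masking.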
+Environment variables are also supported when running the FastAPI app directly:
+- `CODEX_PROXY_HOST`: bind host, default `127.0.0.1`
+- `CODEX_PROXY_PORT`: bind port, default `8765`
+- `CODEX_PROXY_API_KEY`: optional bearer token
+- `CODEX_PROXY_CODEX_BIN`: Codex executable, default `codex`
+- `CODEX_PROXY_PROXY`: proxy URL passed to Codex
+- `CODEX_PROXY_MODEL`: model passed to Codex
+- `CODEX_PROXY_ENGINE`: execution engine, `exec` or `app-server`, default `exec`
+- `CODEX_PROXY_WORKERS`: number of long-lived `app-server` workers, default `1`
+- `CODEX_PROXY_MAX_QUEUE_SIZE`: maximum queued `app-server` requests, default `64`
+- `CODEX_PROXY_QUEUE_TIMEOUT_SECONDS`: maximum time to wait for an `app-server` worker, default `30`
+- `CODEX_PROXY_APP_SERVER_CODEX_HOME`: isolated `CODEX_HOME` used by `app-server` workers
+- `CODEX_PROXY_CODEX_CONFIGS`: `;;`-separated Codex config overrides passed as repeated `-c`
+- `CODEX_PROXY_EPHEMERAL`: set to `1`, `true`, or `yes` to run `codex exec` with `--ephemeral`; defaults to `true`
+- `CODEX_PROXY_DEFAULT_CWD`: default Codex working directory, default current directory
+- `CODEX_PROXY_ALLOWED_ROOTS`: colon-separated allowed cwd roots, default `CODEX_PROXY_DEFAULT_CWD`
+- `CODEX_PROXY_TIMEOUT_SECONDS`: per-request timeout, default `300`
+- `CODEX_PROXY_MAX_CONCURRENCY`: maximum concurrent Codex executions, default `1`
+- `CODEX_PROXY_LOG_LEVEL`: Uvicorn log level, default `info`
+## API
+Health:
+```bash
+curl -sS http://127.0.0.1:8765/health
+```
+Models:
+```bash
+curl -sS http://127.0.0.1:8765/v1/models
+```
+Readiness:
+```bash
+curl -sS http://127.0.0.1:8765/ready
+```
+Local counters:
+```bash
+curl -sS http://127.0.0.1:8765/metrics
+```
+Chat completion:
+```bash
+curl -sS http://127.0.0.1:8765/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
+```
+Streaming chat completion:
+```bash
+curl -N http://127.0.0.1:8765/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{"model":"codex-local","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
+```
+Streaming responses use OpenAI-compatible SSE events:
+- `data: {"object":"chat.completion.chunk",...}` for assistant chunks
+- `data: [DONE]` when the response is complete
+With the default `exec` engine, the proxy streams at the HTTP protocol layer. The underlying Codex CLI currently provides the assistant answer through `codex exec --json`; if Codex only emits final assistant text for a request, the streamed content chunk will arrive after Codex completes.
+With `--engine app-server`, the proxy maps Codex `item/agentMessage/delta` notifications to OpenAI-compatible SSE content chunks. This is experimental because Codex's app-server protocol is itself experimental.
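On the client side, the SSE framing described above can be consumed with a few lines of standard-library code. This is a sketch that assumes the chunks follow the usual OpenAI `choices[].delta.content` layout; `iter_content` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield assistant text fragments from SSE lines until `data: [DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields carry no content
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

With the `exec` engine the whole answer may arrive in a single fragment; with `app-server` the same loop would see many smaller ones.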
+## Compatibility
+`codex-api-proxy` is OpenAI-compatible for the local chat-completions shape, not a complete OpenAI API implementation.
+Supported:
+- `GET /v1/models`
+- `POST /v1/chat/completions`
+- `model`
+- `messages`
+- `stream`
+- `metadata.cwd` for request-scoped working directory selection inside `--allowed-root`
+- OpenAI-compatible non-streaming response envelope
+- OpenAI-compatible SSE chunk envelope for streaming responses
+Accepted but currently ignored:
+- `temperature`
+- `top_p`
+- `max_tokens`
+- `presence_penalty`
+- `frequency_penalty`
+Not supported:
+- `tools` and `tool_choice`
+- `response_format`
+- `n` greater than one
+- `stop`
+- embeddings, responses, assistants, files, batches, audio, images, and other OpenAI endpoints
+- accurate token `usage`; the response currently returns zero token counts because Codex CLI does not expose stable token accounting through this path
+The app-server engine starts a fresh Codex thread for each API request and archives it after completion. Callers must include the full chat history in `messages`; `codex-api-proxy` does not preserve conversation state between API requests.
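Since the proxy keeps no conversation state, the caller has to accumulate it. The sketch below shows one way to do that without any network code; the `Conversation` class is hypothetical, and the response shape it reads is the standard OpenAI non-streaming envelope.

```python
import json


class Conversation:
    """Client-side chat history that is resent in full on every request."""

    def __init__(self, model: str = "codex-local") -> None:
        self.model = model
        self.messages: list[dict[str, str]] = []

    def next_payload(self, user_text: str) -> str:
        """Append the user turn and return the JSON body to POST."""
        self.messages.append({"role": "user", "content": user_text})
        return json.dumps({"model": self.model, "messages": self.messages})

    def record_reply(self, response: dict) -> str:
        """Store the assistant turn from a parsed response body."""
        text = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": text})
        return text
```

Each body returned by `next_payload` would be sent to `/v1/chat/completions`, and the parsed reply passed to `record_reply` before the next turn.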
+OpenAI Python SDK smoke test:
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://127.0.0.1:8765/v1", api_key="local-secret")
+response = client.chat.completions.create(
+    model="codex-local",
+    messages=[{"role": "user", "content": "Reply with exactly: pong"}],
+)
+print(response.choices[0].message.content)
+```
+When no `--api-key` is configured, most OpenAI SDKs still require a placeholder `api_key`; any non-empty value is fine.
+## Operations
+Use `/health` for a lightweight process check and `/ready` for a readiness check that includes the selected engine and Codex executable availability. Use `/metrics` for local JSON counters:
+- `requests_total`
+- `requests_ok`
+- `requests_error`
+- `errors_by_status`
+- `engine`
+- `uptime_seconds`
+- `app_server_pool_started`
+Daemon logs are written to `~/.codex-api-proxy/codex-api-proxy.log` by default. `codex-api-proxy` does not rotate logs itself; use your OS log rotation mechanism if you run it long-term.
+Latency logs:
+Each chat completion writes a single-line JSON log with logger `codex_api_proxy.latency` and event `chat_completion_latency`. Streaming responses also write `chat_completion_first_sse` when the first SSE chunk is yielded.
+For background daemon runs, inspect:
+```bash
+rg 'codex_api_proxy.latency|chat_completion_latency|chat_completion_first_sse' ~/.codex-api-proxy/codex-api-proxy.log
+```
+Important fields:
+- `request_id`: correlates latency lines for the same request
+- `stream`: whether the request used `stream: true`
+- `engine`: `exec` or `app-server`
+- `phases_ms.cwd_resolve`: cwd validation time
+- `phases_ms.prompt_build`: OpenAI messages to Codex prompt conversion time
+- `phases_ms.queue_wait`: time waiting for local admission before engine execution
+- `phases_ms.codex_exec`: time spent inside `codex exec`
+- `phases_ms.app_server_exec`: time spent inside the app-server worker turn
+- `phases_ms.codex_command_build`: Codex command construction time
+- `phases_ms.codex_process_spawn`: local subprocess spawn time
+- `phases_ms.codex_stdin_write`: prompt write and stdin close time
+- `phases_ms.codex_first_stdout_event`: elapsed time from Codex IO start until the first non-empty stdout JSONL line
+- `phases_ms.codex_first_assistant_event`: elapsed time from Codex IO start until the first assistant message event
+- `phases_ms.codex_stdout_read`: total time spent reading Codex stdout until EOF
+- `phases_ms.codex_process_wait`: time waiting for the Codex process after stdout EOF
+- `phases_ms.codex_communicate`: total Codex subprocess IO time
+- `phases_ms.codex_output_parse`: Codex JSONL final-message parse time
+- `phases_ms.response_build`: response object/SSE setup time
+- `phases_ms.total`: total server-side request time before response is ready
+- `time_to_first_sse_ms`: stream request time until the first SSE chunk is yielded
+- `time_to_first_content_sse_ms`: app-server stream request time until the first content chunk is yielded
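For a quick look at slow requests, the latency lines can be filtered and ranked offline. This sketch assumes each log line contains one JSON object, possibly after a logger prefix, with `event`, `request_id`, and `phases_ms.total` keys laid out as the field names above suggest; the exact line format is not confirmed here, and both helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_latency_records(
    lines: Iterable[str], event: str = "chat_completion_latency"
) -> Iterator[dict]:
    """Yield parsed JSON records for the given event, skipping other lines."""
    for line in lines:
        start = line.find("{")  # tolerate a timestamp or logger prefix
        if start < 0:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("event") == event:
            yield record


def slowest(records: Iterable[dict], count: int = 5) -> list[tuple[str, float]]:
    """Return (request_id, total_ms) pairs for the slowest requests."""
    def total(record: dict) -> float:
        return (record.get("phases_ms") or {}).get("total", 0.0)

    ranked = sorted(records, key=total, reverse=True)
    return [(r.get("request_id", "?"), total(r)) for r in ranked[:count]]
```

Feeding it the open log file, for example `slowest(iter_latency_records(open(path)))`, gives the worst `phases_ms.total` values first.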
+With auth:
+```bash
+curl -sS http://127.0.0.1:8765/v1/chat/completions \
+  -H 'Authorization: Bearer local-secret' \
+  -H 'Content-Type: application/json' \
+  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
+```

codex_api_proxy-0.1.0/README.md ADDED

@@ -0,0 +1,322 @@
+# codex-api-proxy
+Local OpenAI-compatible HTTP proxy backed by local Codex credentials.
+This project exposes a minimal `/v1/chat/completions` API for local automation. By default, requests are executed through `codex exec --json --skip-git-repo-check --ignore-user-config --ignore-rules --sandbox read-only --ephemeral`, using the local Codex installation and its existing authentication.
+## Safety
+The proxy defaults to `127.0.0.1` and should not be exposed publicly. Any client with access can spend your local Codex quota and can ask Codex to inspect files that are available to the selected Codex sandbox and workspace.
+Set `CODEX_PROXY_API_KEY` to require `Authorization: Bearer <key>` on API requests.
+If you start with `--host 0.0.0.0` or another non-loopback bind address without `--api-key`, `codex-api-proxy` prints a warning. Use a bearer token before exposing the service to anything other than a trusted local machine.
+With the default `exec` engine, Codex subprocesses are launched with `--ignore-user-config` and `--ignore-rules`. This prevents proxy requests from loading user Codex config, MCP servers, plugins, skills, and rule files.
+Codex subprocesses also use `--sandbox read-only` and `--ephemeral` by default. This keeps calls closer to one-shot model calls where the caller owns conversation context.
+The experimental `app-server` engine uses Codex's long-lived app-server protocol to reduce process startup latency and stream assistant deltas. Each API request starts a fresh Codex thread and archives it after completion, so callers must continue sending full chat history in `messages`. The app-server process uses an isolated `CODEX_HOME` at `~/.codex-api-proxy/codex-home` by default. `codex-api-proxy` symlinks only the current Codex `auth.json` into that isolated home, so the app-server worker can reuse the existing login while not seeing the current user's `config.toml`, MCP config, or plugins. The app-server process is also started with `--disable apps`, `--disable plugins`, `--disable skill_mcp_dependency_install`, and `-c mcp_servers={}`. To keep skills out of the model-visible prompt, `codex-api-proxy` generates a `skills.config=[{name=...,enabled=false}]` override for known system skills and locally discovered skill names. Each request uses an empty `dynamicTools` list, empty `environments`, `approvalPolicy: never`, `sandbox: read-only`, and `ephemeral: true` by default.
+## Install
+```bash
+pip3 install codex-api-proxy
+```
+For local development from this checkout:
+```bash
+python3 -m pip install -e '.[dev]'
+```
+Make targets are available for local build and release tasks:
+```bash
+make build-tools
+make test
+make build
+make release-check
+make publish VERSION=0.1.1
+```
+`make publish VERSION=...` first syncs that version into `pyproject.toml` and `src/codex_api_proxy/__init__.py`, then runs tests, builds the package, validates the generated artifacts, and uploads them to PyPI.
+## Run
+Start in the background:
+```bash
+codex-api-proxy start
+```
+By default, the service listens on `127.0.0.1:8765`.
+The default Codex working directory is an empty workspace at `~/.codex-api-proxy/workspace`.
+Bind to all interfaces:
+```bash
+codex-api-proxy start --host 0.0.0.0
+```
+Check status:
+```bash
+codex-api-proxy status
+```
+Show saved runtime settings:
+```bash
+codex-api-proxy status --verbose
+```
+Restart with the last successful `start` settings:
+```bash
+codex-api-proxy restart
+```
+Restart and override one setting:
+```bash
+codex-api-proxy restart --proxy=http://127.0.0.1:8118
+```
+Start with faster defaults:
+```bash
+codex-api-proxy start --fast
+```
+Start with experimental long-lived app-server workers:
+```bash
+codex-api-proxy start --engine app-server --workers 2
+```
+Start with an outbound proxy, faster defaults, and multiple app-server workers:
+```bash
+codex-api-proxy start --proxy=http://127.0.0.1:8118 --fast --engine app-server --workers 4
+```
+Stop:
+```bash
+codex-api-proxy stop
+```
+Run in the foreground for debugging:
+```bash
+codex-api-proxy start --foreground
+```
+## Configuration
+CLI options:
+- `--host`: bind host, default `127.0.0.1`
+- `--port`: bind port, default `8765`
+- `--api-key`: require bearer auth
+- `--codex-bin`: Codex executable, default `codex`
+- `--proxy`: proxy URL passed to Codex as `http_proxy` and `https_proxy`
+- `--model`: model passed to Codex
+- `--engine`: execution engine, `exec` or `app-server`, default `exec`
+- `--workers`: number of long-lived `app-server` workers, default `1`
+- `--max-queue-size`: maximum queued `app-server` requests before returning `429`, default `64`
+- `--queue-timeout-seconds`: maximum time to wait for an `app-server` worker, default `30`
+- `--app-server-codex-home`: isolated `CODEX_HOME` used by `app-server` workers, default `~/.codex-api-proxy/codex-home`
+- `--codex-config`: Codex config override passed as `-c key=value`, repeatable
+- `--ephemeral`: run `codex exec` with `--ephemeral`, enabled by default
+- `--fast`: use fast defaults: `--codex-config model_reasoning_effort="low"`
+- `--default-cwd`: default Codex working directory, default `~/.codex-api-proxy/workspace`
+- `--allowed-root`: allowed cwd root, repeatable, default `--default-cwd`
+- `--timeout-seconds`: per-request timeout, default `300`
+- `--max-concurrency`: maximum concurrent Codex executions, default `1`
+- `--log-level`: Uvicorn log level, one of `debug`, `info`, `warning`, or `error`, default `info`
+- `--pid-file`: daemon pid file, default `~/.codex-api-proxy/codex-api-proxy.pid`
+- `--log-file`: daemon log file for `start`, default `~/.codex-api-proxy/codex-api-proxy.log`
+- `--state-file`: daemon state file, default `~/.codex-api-proxy/codex-api-proxy.state.json`
+`start` prints the state file path and the effective startup parameters. The state file is written with `0600` permissions and is used by `restart` to reuse the previous start settings. If `--api-key` is used, the key is redacted in terminal output but stored in the state file so `restart` can reuse it.
+Environment variables are also supported when running the FastAPI app directly:
+- `CODEX_PROXY_HOST`: bind host, default `127.0.0.1`
+- `CODEX_PROXY_PORT`: bind port, default `8765`
+- `CODEX_PROXY_API_KEY`: optional bearer token
+- `CODEX_PROXY_CODEX_BIN`: Codex executable, default `codex`
+- `CODEX_PROXY_PROXY`: proxy URL passed to Codex
+- `CODEX_PROXY_MODEL`: model passed to Codex
+- `CODEX_PROXY_ENGINE`: execution engine, `exec` or `app-server`, default `exec`
+- `CODEX_PROXY_WORKERS`: number of long-lived `app-server` workers, default `1`
+- `CODEX_PROXY_MAX_QUEUE_SIZE`: maximum queued `app-server` requests, default `64`
+- `CODEX_PROXY_QUEUE_TIMEOUT_SECONDS`: maximum time to wait for an `app-server` worker, default `30`
+- `CODEX_PROXY_APP_SERVER_CODEX_HOME`: isolated `CODEX_HOME` used by `app-server` workers
+- `CODEX_PROXY_CODEX_CONFIGS`: `;;`-separated Codex config overrides passed as repeated `-c`
+- `CODEX_PROXY_EPHEMERAL`: set to `1`, `true`, or `yes` to run `codex exec` with `--ephemeral`; defaults to `true`
+- `CODEX_PROXY_DEFAULT_CWD`: default Codex working directory, default current directory
+- `CODEX_PROXY_ALLOWED_ROOTS`: colon-separated allowed cwd roots, default `CODEX_PROXY_DEFAULT_CWD`
+- `CODEX_PROXY_TIMEOUT_SECONDS`: per-request timeout, default `300`
+- `CODEX_PROXY_MAX_CONCURRENCY`: maximum concurrent Codex executions, default `1`
+- `CODEX_PROXY_LOG_LEVEL`: Uvicorn log level, default `info`
+## API
+Health:
+```bash
+curl -sS http://127.0.0.1:8765/health
+```
+Models:
+```bash
+curl -sS http://127.0.0.1:8765/v1/models
+```
+Readiness:
+```bash
+curl -sS http://127.0.0.1:8765/ready
+```
+Local counters:
+```bash
+curl -sS http://127.0.0.1:8765/metrics
+```
+Chat completion:
+```bash
+curl -sS http://127.0.0.1:8765/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
+```
+Streaming chat completion:
+```bash
+curl -N http://127.0.0.1:8765/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{"model":"codex-local","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
+```
+Streaming responses use OpenAI-compatible SSE events:
+- `data: {"object":"chat.completion.chunk",...}` for assistant chunks
+- `data: [DONE]` when the response is complete
+With the default `exec` engine, the proxy streams at the HTTP protocol layer. The underlying Codex CLI currently provides the assistant answer through `codex exec --json`; if Codex only emits final assistant text for a request, the streamed content chunk will arrive after Codex completes.
+With `--engine app-server`, the proxy maps Codex `item/agentMessage/delta` notifications to OpenAI-compatible SSE content chunks. This is experimental because Codex's app-server protocol is itself experimental.
+## Compatibility
+`codex-api-proxy` is OpenAI-compatible for the local chat-completions shape, not a complete OpenAI API implementation.
+Supported:
+- `GET /v1/models`
+- `POST /v1/chat/completions`
+- `model`
+- `messages`
+- `stream`
+- `metadata.cwd` for request-scoped working directory selection inside `--allowed-root`
+- OpenAI-compatible non-streaming response envelope
+- OpenAI-compatible SSE chunk envelope for streaming responses
+Accepted but currently ignored:
+- `temperature`
+- `top_p`
+- `max_tokens`
+- `presence_penalty`
+- `frequency_penalty`
+Not supported:
+- `tools` and `tool_choice`
+- `response_format`
+- `n` greater than one
+- `stop`
+- embeddings, responses, assistants, files, batches, audio, images, and other OpenAI endpoints
+- accurate token `usage`; the response currently returns zero token counts because Codex CLI does not expose stable token accounting through this path
+The app-server engine starts a fresh Codex thread for each API request and archives it after completion. Callers must include the full chat history in `messages`; `codex-api-proxy` does not preserve conversation state between API requests.
+OpenAI Python SDK smoke test:
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://127.0.0.1:8765/v1", api_key="local-secret")
+response = client.chat.completions.create(
+    model="codex-local",
+    messages=[{"role": "user", "content": "Reply with exactly: pong"}],
+)
+print(response.choices[0].message.content)
+```
+When no `--api-key` is configured, most OpenAI SDKs still require a placeholder `api_key`; any non-empty value is fine.
+## Operations
+Use `/health` for a lightweight process check and `/ready` for a readiness check that includes the selected engine and Codex executable availability. Use `/metrics` for local JSON counters:
+- `requests_total`
+- `requests_ok`
+- `requests_error`
+- `errors_by_status`
+- `engine`
+- `uptime_seconds`
+- `app_server_pool_started`
+Daemon logs are written to `~/.codex-api-proxy/codex-api-proxy.log` by default. `codex-api-proxy` does not rotate logs itself; use your OS log rotation mechanism if you run it long-term.
+Latency logs:
+Each chat completion writes a single-line JSON log with logger `codex_api_proxy.latency` and event `chat_completion_latency`. Streaming responses also write `chat_completion_first_sse` when the first SSE chunk is yielded.
+For background daemon runs, inspect:
+```bash
+rg 'codex_api_proxy.latency|chat_completion_latency|chat_completion_first_sse' ~/.codex-api-proxy/codex-api-proxy.log
+```
+Important fields:
+- `request_id`: correlates latency lines for the same request
+- `stream`: whether the request used `stream: true`
+- `engine`: `exec` or `app-server`
+- `phases_ms.cwd_resolve`: cwd validation time
+- `phases_ms.prompt_build`: OpenAI messages to Codex prompt conversion time
+- `phases_ms.queue_wait`: time waiting for local admission before engine execution
+- `phases_ms.codex_exec`: time spent inside `codex exec`
+- `phases_ms.app_server_exec`: time spent inside the app-server worker turn
+- `phases_ms.codex_command_build`: Codex command construction time
+- `phases_ms.codex_process_spawn`: local subprocess spawn time
+- `phases_ms.codex_stdin_write`: prompt write and stdin close time
+- `phases_ms.codex_first_stdout_event`: elapsed time from Codex IO start until the first non-empty stdout JSONL line
+- `phases_ms.codex_first_assistant_event`: elapsed time from Codex IO start until the first assistant message event
+- `phases_ms.codex_stdout_read`: total time spent reading Codex stdout until EOF
+- `phases_ms.codex_process_wait`: time waiting for the Codex process after stdout EOF
+- `phases_ms.codex_communicate`: total Codex subprocess IO time
+- `phases_ms.codex_output_parse`: Codex JSONL final-message parse time
+- `phases_ms.response_build`: response object/SSE setup time
+- `phases_ms.total`: total server-side request time before response is ready
+- `time_to_first_sse_ms`: stream request time until the first SSE chunk is yielded
+- `time_to_first_content_sse_ms`: app-server stream request time until the first content chunk is yielded
+With auth:
+```bash
+curl -sS http://127.0.0.1:8765/v1/chat/completions \
+  -H 'Authorization: Bearer local-secret' \
+  -H 'Content-Type: application/json' \
+  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
+```