PyPI - python-codex - Versions diffs - 0.1.13__tar.gz → 0.1.14__tar.gz - Mend

python-codex 0.1.13tar.gz → 0.1.14tar.gz


Files changed (124)

{python_codex-0.1.13 → python_codex-0.1.14}/AGENTS.md RENAMED

@@ -18,7 +18,9 @@
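 - Real vLLM `0.19.0` returns `400` from `/v1/messages` when `max_tokens` is missing, so the messages adapter layer must always fill in this field. The current convention is to pass through the request's `max_output_tokens`/`max_tokens` when present, and otherwise fall back to the default `32000`.
 - With `return_token_ids=true` enabled for vLLM chat-completions, streaming `prompt_token_ids` appears only in the first chunk, and each later chunk's `choices[*].token_ids` is a decode delta; to export a trajectory on the `responses_server` side, rebuild it as "the first `prompt_token_ids` + the `token_ids` of all chunks concatenated in order".
 - `pycodex` is a minimal interactive CLI by default: with no prompt it enters the REPL and runs the outer submission loop through `AgentRuntime`. It currently shows a minimal event stream, streamed assistant output, and simple title/history (`/title`, `/history`), and by default registers a subset of local tools that map one-to-one to the original.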
+- Web workspace lives in the standalone `workspace_server/` package and is launched with `pycodex-ws --listen <host:port> --board <html>`, not through `pycodex` CLI dispatch. CLI and web share `pycodex.interactive_session.run_interactive_session`; slash-command semantics such as `/resume`, `/compact`, `/model`, and `/link` belong to that shared interactive shell loop, while workspace only supplies a web view/input adapter and tab/session lifecycle.
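 - The interactive CLI's event stream should prioritize user-perceivable phases (for example tool start/finish, or the model reviewing tool results); do not expose the internal `iteration` count as the primary status text. `iterations` should remain in programmatic results such as `TurnResult`.
+- In the interactive CLI, `stream_error` means the current Responses stream attempt failed and the model client may retry automatically right away; do not call `finish_stream()` on this event to flush the current assistant delta buffer, otherwise the text of the first failed attempt and the final reply after a successful retry are displayed twice. Truly fatal failures still go through the generic flush via `turn_failed`, which preserves partial output.
 - Prompt/context logic lives in `pycodex/context.py`: `AgentLoop` maintains only the real conversation history; before each request, `ContextManager` injects base instructions, the developer message, `AGENTS.md` instructions, and `<environment_context>`, and these injected items are not written back to history.
 - For local model slugs that need a model-specific prompt, add an entry directly to the vendored `pycodex/prompts/models.json`; `step-3.5-flash` / `step-3.5-flash-2603` / `step-3.6` are currently wired in this way.
 - The interactive REPL's context-usage hint should also stay close to upstream semantics: show the "percentage of context remaining" rather than raw token counts; normalize with the same `BASELINE_TOKENS=12000` as upstream, and when model metadata only has `context_window`, default to a `95%` effective window. As long as the current model's context window can be resolved, the initial prompt shows `100%` first and refreshes to the real value once the first usage arrives.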
@@ -58,4 +60,9 @@
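 - `pycodex` local session saving now also follows the upstream approach: a new session is assigned a stable uuidv7 thread/session id from the start, and history increments are appended to `CODEX_HOME/sessions/.../rollout-*.jsonl`; the `/resume` list should show only rollouts with at least one real user message, so empty new sessions do not pollute the resume list.
 - auto-compact aligns with the upstream config name `model_auto_compact_token_limit`; it is disabled when empty, triggers on the most recent model-reported `usage.total_tokens`, compacts the previous turn's history pre-turn and the current history before the tool follow-up mid-turn, and continues to reuse the existing compacted rollout records.
 - `response.incomplete` in Responses streaming is not a dropped connection: do not let `ResponsesModelClient` treat it as a retryable incomplete stream and reconnect repeatedly. A normal turn should report `response.incomplete` explicitly; a compact request that has already received an assistant partial summary can use that partial summary to complete the replacement history, so mid-turn auto-compact does not get stuck at 5 retries.
+- Upstream Codex Responses requests currently do not pass a model-level `max_output_tokens`, nor do they read a `model_max_output_tokens` config key; upstream uses that name mainly for tool-output truncation, so do not add it to model requests for the sake of upstream alignment.
+- When a normal turn hits `ResponsesIncompleteError(reason="max_output_tokens")`, upstream semantics are to keep the `response.output_item.done` items received before the exception; because the pycodex model client returns per whole turn, the exception path must write these done assistant/reasoning items into history and the rollout so the user's next `continue` can pick up from there. Do not synthesize history from bare `response.output_text.delta`, and do not persist tool calls that have no tool result.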
 - Feishu card tests read `~/.codex/.feishu_refresh_token` through production code; when running `tests/test_feishu_card.py` locally, isolate HOME (for example `HOME=/tmp/pycodex-empty-home env -u VIRTUAL_ENV uv run pytest tests/test_feishu_card.py tests/test_feishu_link.py`) unless the test itself controls `HOME`.
+- `lark_oapi.ws.client` creates a module-level asyncio `loop` at import time and `Client.start()` always uses that global. For `/link` long-connection listeners, bind that SDK global to a listener-thread-owned loop before constructing the client, and stop it through private `_disconnect()` plus `loop.stop()` on `/unlink`; otherwise unlink/link can reuse a still-running SDK loop and fail with `RuntimeError: This event loop is already running`.
+- `exec_command` background completion auto-resume is intentionally Agent-idle-only: when a session exits, it may call `Agent.maybe_invoke(...)` and start a synthetic `<exec_command_completed>` turn only if that Agent is not already running a turn. Do not enqueue/cache these events in `CliSubmissionQueue`; direct Agent/IPython use should share the same Agent-level hook.
+- The tool description JSON fallbacks (`pycodex/prompts/exec_tools.json` and `pycodex/prompts/subagent_tools.json`) were deleted after moving schemas into class-level `BaseTool` specs. `ToolSpec.serialize()` intentionally skips function-tool `output_schema`, matching upstream `ResponsesApiTool.output_schema #[serde(skip)]`; keep output schemas as local metadata only unless upstream wire format changes.

{python_codex-0.1.13 → python_codex-0.1.14}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-codex
-Version: 0.1.13
+Version: 0.1.14
 Summary: A minimal Python extraction of Codex's main agent loop
 License-File: LICENSE
 Requires-Python: >=3.6.2
@@ -165,12 +165,15 @@ pycodex --put @127.0.0.1:5577
 pycodex --put /data/.codex/@127.0.0.1:5577
 pycodex --call SECRET-CALLID@127.0.0.1:5577 "Reply with exactly OK."
 pycodex doctor
+pycodex-ws --listen 0.0.0.0:6007 --board ./board.html
 ```
 Current behavior:
 - with no argv prompt and a TTY stdin, enter interactive mode
 - with an argv prompt or piped stdin, run a single turn
+- `pycodex-ws` starts the standalone browser workspace with a board pane and a
+  pycodex session pane
 - interactive mode supports `/exit` and `/quit`
 - interactive mode shows a compact event stream for user-visible phases such as
   tool execution and model follow-up after tool results

{python_codex-0.1.13 → python_codex-0.1.14}/README.md RENAMED

@@ -144,12 +144,15 @@ pycodex --put @127.0.0.1:5577
 pycodex --put /data/.codex/@127.0.0.1:5577
 pycodex --call SECRET-CALLID@127.0.0.1:5577 "Reply with exactly OK."
 pycodex doctor
+pycodex-ws --listen 0.0.0.0:6007 --board ./board.html
 ```
 Current behavior:
 - with no argv prompt and a TTY stdin, enter interactive mode
 - with an argv prompt or piped stdin, run a single turn
+- `pycodex-ws` starts the standalone browser workspace with a board pane and a
+  pycodex session pane
 - interactive mode supports `/exit` and `/quit`
 - interactive mode shows a compact event stream for user-visible phases such as
   tool execution and model follow-up after tool results
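The "Current behavior" bullets in the README describe how `pycodex` picks between interactive mode and a single turn. A small sketch of that decision is below; `choose_mode` and its return strings are hypothetical names used only to illustrate the rule, not the actual CLI dispatch code.

```python
from typing import Optional


def choose_mode(argv_prompt: Optional[str], stdin_is_tty: bool) -> str:
    """Return the run mode implied by the README bullets.

    Hypothetical helper: no argv prompt plus a TTY stdin means
    interactive mode; an argv prompt or piped stdin means a single turn.
    """
    if argv_prompt is None and stdin_is_tty:
        return "interactive"
    return "single_turn"
```

A caller would typically pass `sys.stdin.isatty()` as the second argument.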

{python_codex-0.1.13 → python_codex-0.1.14}/docs/ALIGNMENT.md RENAMED

@@ -1,17 +1,23 @@
 # Alignment
-This document records the current prompt/context alignment work between
-`pycodex` and upstream Codex from `https://github.com/openai/codex`.
+This document records the current alignment work between `pycodex` and upstream
+Codex from `https://github.com/openai/codex`: prompt/context assembly,
+model-visible tool schemas, and observed tool round-trip behavior.
 ## Scope
-The comparison in this pass focuses on the model-visible prompt assembly:
+The original comparison pass focused on the model-visible prompt assembly:
 - `instructions`
 - `input` items
 - developer/contextual user message shape
 - `AGENTS.md` / environment-context injection
+The current document also tracks tool alignment at two different layers:
+- request-visible payloads captured from real outbound `/responses` requests
+- class-level `BaseTool` descriptions, schemas, and runtime result shapes
 It does not claim full request parity for every runtime mode yet.
 ## Comparison method
@@ -37,10 +43,47 @@ The repository copy of that helper server now lives at
 The temporary capture artifacts used during debugging are intentionally not part
 of the repository contract and are not documented here as stable project files.
+Tool-specific status uses two inputs:
+- proxy captures of actual upstream Codex and `pycodex` requests/results
+- source inspection against the latest upstream tool specifications
+## Status checkpoint (2026-06-23)
+- Latest upstream source checked: `openai/codex` `83c4934`
+  (`2026-06-23 00:31:56 -0700`, `Remove redundant Codex Apps cache guard`).
+- Live request captures in this pass used installed `codex-cli 0.138.0`.
+- Prompt/context parity remains aligned for the compared non-interactive `exec`
+  path and the captured default two-turn `codex-tui` path, modulo dynamic ids.
+- The raw JSON tool fallback files have been deleted:
+  `pycodex/prompts/exec_tools.json` and
+  `pycodex/prompts/subagent_tools.json`.
+- Class-level descriptions/schemas/runtime result shapes have now been refreshed
+  across the default local tool set, not only the tools previously hidden by the
+  JSON fallback.
+- Request-visible tool serialization now comes from class-level `BaseTool`
+  specs. Function-tool `output_schema` remains available as local metadata but
+  is intentionally not serialized into `/responses` requests, matching upstream
+  `ResponsesApiTool.output_schema #[serde(skip)]`.
+- Latest upstream-facing fixes included in the class/runtime layer:
+  `request_user_input.autoResolutionMs` with `[60000, 240000]` clamping,
+  `view_image.detail` (`high` default, `original` opt-in), `close_agent`
+  returning `previous_status`, and `spawn_agent` guidance that spawned agents
+  inherit the current model by default.
+- Post-delete proxy compare:
+  `uv run python tests/compare_tool_schemas.py --root .tmp/tool_schema_proxy_compare_after_fallback_delete_2 --timeout-seconds 240`.
+  The request-visible payloads are now equal for `write_stdin`, `web_search`,
+  `update_plan`, `apply_patch`, and `view_image` on the captured default path.
+  `exec_command` intentionally omits upstream approval/sandbox parameters
+  (`sandbox_permissions`, `justification`, `prefix_rule`) because pycodex skips
+  that authorization path by design; its description also tells the agent that
+  it can reply first for long tasks and will be invoked to continue when the
+  task finishes.
 ## Result
-As of this snapshot, prompt/context parity is achieved for the non-interactive
-`exec` comparison:
+As of this snapshot, prompt/context parity is achieved for the compared
+non-interactive `exec` path:
 - `instructions` match exactly
 - `input` match exactly
@@ -48,10 +91,19 @@ As of this snapshot, prompt/context parity is achieved for the non-interactive
 In other words, the model-visible prompt dump for `pycodex` and upstream Codex
 is currently identical for this comparison scenario.
+Tool alignment is also materially improved: all default local tool classes have
+been reviewed/refreshed against the latest upstream-facing specs where a current
+upstream builtin exists. The request-visible payload for the compared official
+tools now comes from class-level specs rather than vendored JSON snapshots.
 ## Current non-prompt status
 After prompt/context parity, the next comparison layer is the full outbound
-request shape. That work is in progress.
+request shape. That work is still layered:
+- request-visible parity for captured paths
+- class-level tool spec/runtime parity after deleting the JSON fallback
+- broader runtime parity for uncaptured modes
 At the time of writing:
@@ -69,7 +121,10 @@ At the time of writing:
 - transport/header parity is now aligned for the compared path, including the
   sub-agent `x-openai-subagent` header and the observed `workspaces` omission
   on later sub-agent turns
-- tool schema parity is aligned for the compared exec-mode tool subset
+- request-visible tool schema parity is aligned for the compared exec-mode and
+  default TUI captured paths where upstream still exposes the same tool names
+- class-level tool descriptions, input schemas, output schemas, and notable
+  runtime result shapes have been refreshed across the default local tool set
 The current implementation already matches:
@@ -80,7 +135,8 @@ The current implementation already matches:
 - session-scoped request id headers
 - turn metadata header shape (`turn_id` + `sandbox`)
 - mode-aware `originator` header
-- exact exec-mode tool schema payloads via vendored snapshot at the tool layer
+- exact exec-mode tool schema payloads on the compared path, now generated from
+  class-level tool specs rather than vendored JSON snapshots
 - `User-Agent` string for the compared non-interactive path
 The main remaining deltas are now outside the prompt dump itself:
@@ -88,6 +144,9 @@ The main remaining deltas are now outside the prompt dump itself:
 - dynamic run-specific values such as generated session ids and turn ids
 - behavior outside the compared non-interactive `exec` path and the captured
   default two-turn TUI path, especially other runtime modes not yet captured
+- upstream's current default-path migration to `tool_search` / deferred
+  multi-agent tools and goal tools, while `pycodex` still exposes the legacy
+  flat sub-agent tools on the first request
 ## Proxy tool-schema compare
@@ -102,21 +161,45 @@ The main remaining deltas are now outside the prompt dump itself:
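 - read the tool order from the real smoke tool table in `tests/TESTS.md`
 - compare, one by one, the tool schemas actually exposed to the model on this captured request path
-On the captured default CLI non-exec / `codex-tui` path, the tools confirmed to have
-matching schemas are:
+Note: this comparison verifies "the payload the model actually sees on this request". With the raw JSON fallback
+deleted, the comparison now shows that the official tool payloads on the captured path come from class-level
+`BaseTool` specs.
+Latest result after deleting the fallback: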
+- command:
+  `env -u VIRTUAL_ENV uv run python tests/compare_tool_schemas.py --root .tmp/tool_schema_proxy_compare_after_fallback_delete_2 --timeout-seconds 240`
+- upstream request:
+  `.tmp/tool_schema_proxy_compare_after_fallback_delete_2/upstream/008_POST_v1_responses.json`
+- `pycodex` request:
+  `.tmp/tool_schema_proxy_compare_after_fallback_delete_2/pycodex/001_POST_v1_responses.json`
+- comparison:
+  `.tmp/tool_schema_proxy_compare_after_fallback_delete_2/comparison.json`
+On the captured default CLI non-exec / `codex-tui` path, the tools confirmed to have matching request-visible
+schemas are:
-- `exec_command`
 - `write_stdin`
 - `update_plan`
-- `request_user_input`
 - `apply_patch`
 - `web_search`
 - `view_image`
-- `spawn_agent`
-- `send_input`
-- `resume_agent`
-- `wait_agent`
-- `close_agent`
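+Differences that still need a layered explanation:
+- `exec_command`: `pycodex` intentionally does not expose upstream's
+  `sandbox_permissions` / `justification` / `prefix_rule`, because the current implementation explicitly skips
+  approval/sandbox escalation logic; the description additionally notes a local difference: for long tasks the agent can reply to
+  the user first and will be invoked to continue when the task finishes; the remaining parameters and runtime default/range constraints are aligned through the class-level
+  schema.
+- `request_user_input`: `pycodex` is modeled on upstream source main and includes
+  `autoResolutionMs`; the live capture from installed `codex-cli 0.138.0` still lacks this field.
+- `spawn_agent` / `send_input` / `resume_agent` / `wait_agent` /
+  `close_agent`: upstream's current first request no longer exposes these tools flat; it exposes
+  `tool_search` and loads the Multi-agent tools through deferred discovery. `pycodex` still exposes the legacy sub-agent tools flat on the first
+  request.
+- upstream currently also exposes `get_goal` / `create_goal` / `update_goal`; `pycodex`
+  has not implemented goal tools yet.
 Under the same captured path, neither current upstream Codex nor `pycodex` exposes these tools: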
@@ -131,6 +214,7 @@ The main remaining deltas are now outside the prompt dump itself:
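 These conclusions apply only to the currently captured default `codex-tui` request path; they do not mean these
 tools do not exist upstream globally, only that this actual context capture did not bring them into the first request.
+The class-level description/schema status of these tools is in the per-tool table below.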
 ## Tool-call / tool-result schema compare
@@ -185,6 +269,8 @@ The main remaining deltas are now outside the prompt dump itself:
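   - The Plan-mode happy path is now also modeled on upstream source: the handler requires every question to carry
     non-empty `options`, automatically adds `isOther=true` to every question, and serializes the structured answers into
     a JSON string returned in the next turn's `function_call_output.output`, while also adding `success=true`
+  - The class-level schema now includes upstream's latest `autoResolutionMs` field; the runtime clamps non-null values
+    to `[60000, 240000]` before handing them to the interaction layer
   - The repository now has a new deterministic proxy compare script
     `uv run python tests/compare_request_user_input_roundtrip.py`
   - The script uses the same fixed origin SSE + proxy capture to compare, in lockstep, upstream Codex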
@@ -200,7 +286,9 @@ The main remaining deltas are now outside the prompt dump itself:
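   - `function_call` item schema matches
   - the `function_call_output` schema in the next turn matches
   - In the current sample, both sides return the tool result as the same `input_image` list,
-    and the `image_url` data URL also matches; the captured default sample has no explicit `detail`
+    and the `image_url` data URL also matches
+  - The class-level schema/runtime now includes upstream's latest `detail` parameter: when omitted it returns
+    `high`, and an explicit `original` is preserved and returned in `input_image.detail`
 - `spawn_agent`
   - For now a minimal validation path has been added: when the model force-calls `spawn_agent` without `message` / `items`,
     upstream Codex and `pycodex` now both return the same fixed error: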
@@ -209,6 +297,8 @@ The main remaining deltas are now outside the prompt dump itself:
     `agent_id` / `nickname`
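   - `pycodex` has also switched to uuid7 agent ids and hooked up a default nickname pool with the same candidate names as upstream;
     the remaining difference is mainly dynamic values such as which nickname gets drawn
+  - The class-level description has been refreshed toward upstream's latest direction: spawned agents inherit the current model by default,
+    and the tool desc no longer hard-codes a model picker list
 - `send_input`
   - `function_call` item schema matches
   - the `function_call_output` schema in the next turn matches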
@@ -221,8 +311,8 @@ The main remaining deltas are now outside the prompt dump itself:
 - `close_agent`
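   - `function_call` item schema matches
   - the `function_call_output` schema in the next turn matches
-  - The repository has renamed the returned key from `previous_status` to `status`, aligning with upstream's current happy
-    path
+  - The output key in upstream's latest source is `previous_status`; the repository's schema/runtime has
+    gone back to `previous_status`
 - `resume_agent`
   - The real happy path has been captured: the full chain of sub-agent completion, `close_agent`, `resume_agent`, and then
     `send_input` is now aligned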
@@ -236,8 +326,8 @@ The main remaining deltas are now outside the prompt dump itself:
   - `prompt_cache_key` in the request body now also matches upstream:
     the parent thread keeps its own stable session id, while the sub-agent thread uses
     the `agent_id` itself and no longer wrongly reuses the parent's cache key
-  - These 6 sub-agent tool schemas are now also pinned in
-    `pycodex/prompts/subagent_tools.json` and locked byte-for-byte by tests
+  - These 6 sub-agent tool schemas now come from class-level `BaseTool` specs and are covered by CLI
+    serialization tests; `pycodex/prompts/subagent_tools.json` has been deleted
 - `sub-agent notification`
   - After `wait_agent`, upstream injects an extra `user` message into the parent thread
     history:
@@ -271,30 +361,37 @@ The main remaining deltas are now outside the prompt dump itself:
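 - `not exposed`: under the current default `codex-tui` first-request path, neither side includes this tool in `tools`
 - `first-request same`: the first-request `tools` schema is confirmed identical
 - `round-trip same`: the outer schema of the `tool_call` / `tool_result` after a real trigger is confirmed identical
+- `class aligned`: the class-level description/schema/runtime has been refreshed to the current upstream-facing spec
+  and no longer depends on the raw JSON fallback
+- `local shim`: the local tool has an implementation and smoke coverage, but the current upstream default CLI capture has no same-named
+  official model-visible tool to align against byte-for-byte
+- `legacy-flat mismatch`: `pycodex` still exposes the legacy flat tool directly on the first request; upstream's first request
+  has migrated to `tool_search` / deferred discovery
 - `pending`: the real-trigger comparison for this tool chain has not been completed yet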
-| tool | current status | note |
-|---|---|---|
-| `shell` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `shell_command` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `exec_command` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches; the default `10_000`-token truncation and the `1 MiB` head/tail cap on unread output are also in place; only dynamic-value differences remain |
-| `write_stdin` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches; the default `10_000`-token truncation and the `1 MiB` head/tail cap on unread output are also in place; only dynamic-value differences remain |
-| `exec` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `wait` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `web_search` | `round-trip same` | `web_search_call` shape matches; the provider-native tool has no separate client-side `tool_result` |
-| `update_plan` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches |
-| `request_user_input` | `round-trip same (Default mode); Plan mode delta` | the Default-mode unavailable path is capture-aligned; the Plan-mode deterministic proxy compare has been done: in the live capture from the locally installed `codex-cli 0.115.0`, `function_call` matches and `function_call_output` differs only in `pycodex` additionally carrying `success=true` |
-| `request_permissions` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `apply_patch` | `round-trip same` | outer shape of `custom_tool_call` / `custom_tool_call_output` matches; in the current sample the output wrapping is also aligned; only concrete file-path differences remain |
-| `grep_files` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `read_file` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `list_dir` | `not exposed` | the current default `codex-tui` first-request path does not include this tool |
-| `view_image` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches; in the current sample the `input_image` data URL also matches |
-| `spawn_agent` | `round-trip same` | both the validation path and the happy path have been captured; what remains is mainly dynamic agent id / nickname values |
-| `send_input` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches; only the dynamic `submission_id` remains |
-| `resume_agent` | `round-trip same` | the real happy path has been captured; the `pending_init` return value after `resume_agent`, the sub-agent tool subset, and the sub-agent context are all aligned |
-| `wait_agent` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches; only the dynamic agent id remains |
-| `close_agent` | `round-trip same` | outer shape of `function_call` / `function_call_output` matches; the parent-thread notification message is also in place |
+| tool | request-visible status | class/runtime status | note |
+|---|---|---|---|
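+| `shell` | `not exposed` | `local shim` | the default `codex-tui` first-request path does not include this tool; the class-level argv execution semantics and schema have been tidied, but the current upstream default CLI has no same-named payload to align against byte-for-byte |
+| `shell_command` | `not exposed` | `class aligned` | not on the default first-request path; the class-level desc/schema has been refreshed to shell-string command semantics |
+| `exec_command` | `intentional approval-field/description delta; round-trip same` | `class aligned except skipped auth + local idle resume` | after deleting the fallback, `sandbox_permissions` / `justification` / `prefix_rule` are no longer exposed, a difference that follows from pycodex intentionally skipping the authorization logic; the description additionally notes that for long tasks the agent can reply to the user first and will be invoked to continue when the task finishes; the remaining parameters run per the schema, and the outer shape of `function_call` / `function_call_output` matches; the default `10_000`-token truncation and the `1 MiB` head/tail cap on unread output are in place; only dynamic-value differences remain |
+| `write_stdin` | `first-request same; round-trip same` | `class aligned` | after deleting the fallback the first-request schema is equal; outer shape of `function_call` / `function_call_output` matches; the default `10_000`-token truncation and the `1 MiB` head/tail cap on unread output are in place; only dynamic-value differences remain |
+| `exec` | `not exposed` | `class aligned` | not on the default first-request path; the code-mode custom/freeform desc and grammar have been refreshed, but a code-mode request-visible capture still needs to be re-run |
+| `wait` | `not exposed` | `class aligned` | not on the default first-request path; the code-mode wait schema/runtime has been refreshed, but a code-mode request-visible capture still needs to be re-run |
+| `web_search` | `first-request same; round-trip same` | `class aligned` | after deleting the fallback the provider-native payload is equal, including `search_content_types=["text","image"]`; `web_search_call` shape matches; the provider-native tool has no separate client-side `tool_result` |
+| `update_plan` | `first-request same; round-trip same` | `class aligned` | after deleting the fallback the first-request schema is equal; outer shape of `function_call` / `function_call_output` matches |
+| `request_user_input` | `round-trip same (Default mode); Plan mode source-aligned` | `class aligned` | the Default-mode unavailable path is capture-aligned; Plan mode is modeled on upstream main, including `success=true` and the `autoResolutionMs` clamp; the live capture from the locally installed `codex-cli 0.115.0` still lacks `success=true` |
+| `request_permissions` | `not exposed` | `class aligned` | not on the default first-request path; the class-level desc/schema now has `environment_id` passthrough, and the interaction handler is still a minimal implementation |
+| `apply_patch` | `first-request same; round-trip same` | `class aligned` | after deleting the fallback the custom grammar is equal; outer shape of `custom_tool_call` / `custom_tool_call_output` matches; the output wrapping is aligned; only concrete file-path differences remain |
+| `grep_files` | `not exposed` | `local shim` | not on the default first-request path; the local file-search helper has schema/smoke coverage, but the current upstream default CLI has no same-named official payload to align against directly |
+| `read_file` | `not exposed` | `local shim` | not on the default first-request path; the local slice/indentation file-reading helper has schema/smoke coverage, but the current upstream default CLI has no same-named official payload to align against directly |
+| `list_dir` | `not exposed` | `local shim` | not on the default first-request path; the local directory-tree helper has schema/smoke coverage, but the current upstream default CLI has no same-named official payload to align against directly |
+| `view_image` | `first-request same; round-trip same` | `class aligned` | after deleting the fallback the first-request schema is equal; outer shape of `function_call` / `function_call_output` matches; the class now supports `detail=high|original` and returns `high` by default |
+| `spawn_agent` | `legacy-flat mismatch; round-trip same` | `class aligned` | pycodex still exposes this tool flat on the first request, while upstream's first request uses `tool_search` instead; the historical validation path and happy path have been captured; the class-level desc no longer hard-codes the model picker |
+| `send_input` | `legacy-flat mismatch; round-trip same` | `class aligned` | pycodex still exposes this tool flat on the first request, while upstream's first request uses `tool_search` instead; the historical round-trip outer shape matches; only the dynamic `submission_id` remains |
+| `resume_agent` | `legacy-flat mismatch; round-trip same` | `class aligned` | pycodex still exposes this tool flat on the first request, while upstream's first request uses `tool_search` instead; the real happy path has been captured; the `pending_init` return value, the sub-agent tool subset, and the sub-agent context are all aligned |
+| `wait_agent` | `legacy-flat mismatch; round-trip same` | `class aligned` | pycodex still exposes this tool flat on the first request, while upstream's first request uses `tool_search` instead; the historical round-trip outer shape matches; only the dynamic agent id remains |
+| `close_agent` | `legacy-flat mismatch; round-trip same` | `class aligned` | pycodex still exposes this tool flat on the first request, while upstream's first request uses `tool_search` instead; the schema/runtime output is upstream's current `previous_status`; the parent-thread notification message is also in place |
+| `ipython` | `not in default registry` | `local shim` | this is an optional IPython attach helper and is not part of the default CLI tool set; there is currently no same-named upstream default CLI payload to align against |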
 ### Redacted example: request-level diff categories
@@ -308,12 +405,20 @@ same:
 - exec-mode tool subset membership
 - request context field presence
 - exec-mode tool schemas
+- current default-path schemas for `write_stdin`, `web_search`, `update_plan`,
+  `apply_patch`, and `view_image`
 - user-agent semantics and compared string
 different:
 - dynamic request metadata values
+- intentional `exec_command` approval/sandbox field omission and idle-resume
+  description in pycodex
 - transport-layer header casing / normalization
-- paths and modes not yet aligned beyond non-interactive `exec`
+- paths and modes not yet aligned beyond captured `exec` / default TUI paths
+- installed upstream `request_user_input` lacks source-main `autoResolutionMs`
+- upstream default-path tool discovery now exposes `tool_search` instead of
+  legacy flat sub-agent tools
+- upstream default-path goal tools are not implemented locally yet
 ```
 ## Redacted examples
@@ -457,7 +562,6 @@ Vendored upstream prompt data:
 - `pycodex/prompts/models.json`
 - `pycodex/prompts/permissions/sandbox_mode/`
 - `pycodex/prompts/permissions/approval_policy/`
-- `pycodex/prompts/exec_tools.json`
 Tests:
@@ -467,16 +571,31 @@ Tests:
 ## What is still out of scope here
-Prompt parity is not the same thing as full request parity.
-At the time this file was written, the remaining request-level differences are
-outside the prompt/context dump itself, for example:
-- dynamic request metadata values such as generated session ids and turn ids
-- behavior outside the currently aligned non-interactive `exec` path
-- broader runtime features such as sandbox / approvals / compact / memory
-Those are the next alignment target after the prompt/context pass.
+Prompt/tool-schema parity is not the same thing as full Codex parity.
+The remaining explicit alignment work is:
+- Implement upstream's current default-path `tool_search` / deferred multi-agent
+  discovery, or explicitly decide to keep legacy flat agent tools as a
+  compatibility surface.
+- Add goal tools (`get_goal`, `create_goal`, `update_goal`) if the local CLI is
+  intended to match the current upstream default tool set.
+- Capture/compare code-mode request-visible payloads for `exec` / `wait`, not
+  only their class-level schemas and smoke behavior.
+- Decide whether local helper tools (`shell`, `grep_files`, `read_file`,
+  `list_dir`, optional `ipython`) are intended to stay as local shims or should
+  be replaced/renamed as upstream evolves.
+- Broaden runtime parity beyond the currently aligned non-interactive `exec`
+  path and captured default two-turn TUI path.
+- App-server `AdditionalTools` handling from upstream `6e0c8b4` is not
+  implemented locally; this pass only verified it is not a builtin CLI tool spec
+  change for the local path.
+- Newer multi-agent v2-style tools such as `send_message`, `followup_task`,
+  `interrupt_agent`, and `list_agents` are not implemented in this local tool
+  registry yet.
+- Broader runtime features such as sandbox/approval enforcement, WebSocket/HTTP
+  transport fallback, cancellation markers, MCP/connectors/plugins, memory, and
+  review flows remain partial or out of scope for this pass.
 ## Steer semantics
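The docs/ALIGNMENT.md changes above state that the `request_user_input` runtime clamps non-null `autoResolutionMs` values to `[60000, 240000]` before passing them to the interaction layer. A minimal sketch of such a clamp is shown here; the function and constant names are hypothetical, and treating `None` as "not set" is an assumption about how the field is represented in Python.

```python
from typing import Optional

# Bounds taken from the documented clamp range, in milliseconds.
AUTO_RESOLUTION_MIN_MS = 60_000
AUTO_RESOLUTION_MAX_MS = 240_000


def clamp_auto_resolution_ms(value: Optional[int]) -> Optional[int]:
    """Clamp a non-null autoResolutionMs into the documented range.

    Hypothetical helper: None is passed through unchanged, so a caller
    can still tell "not set" apart from a clamped value.
    """
    if value is None:
        return None
    return max(AUTO_RESOLUTION_MIN_MS, min(AUTO_RESOLUTION_MAX_MS, value))
```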

{python_codex-0.1.13 → python_codex-0.1.14}/docs/CONTEXT.md RENAMED

@@ -228,7 +228,7 @@ A skill is a set of local instructions to follow that is stored in a `SKILL.md`
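 - The source directory of the `permissions` prompt differs: Codex reads from `codex-rs/protocol/src/prompts/permissions/...`, while `pycodex` reads from `./pycodex/prompts/permissions/...`.
 - The source of the `collaboration_mode` block differs: Codex uses the upstream collaboration prompt templates, while `pycodex` uses `./pycodex/prompts/collaboration_default.md` / `./pycodex/prompts/collaboration_plan.md`.
 - The source of `skills guidance` differs: Codex uses the fixed upstream guidance, while `pycodex` uses `./pycodex/context.py::SKILLS_GUIDANCE`.
-- The construction source of `tools` differs: Codex produces them from the upstream runtime tool registry, while `pycodex` produces them from `./pycodex/prompts/exec_tools.json + ToolSpec.serialize()`.
+- The construction source of `tools` differs: Codex produces them from the upstream runtime tool registry, while `pycodex` produces them from local `BaseTool` class specs through `ToolSpec.serialize()`.
 ### 1.3 First-request invariants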
@@ -462,5 +462,5 @@ ProviderBuiltinToolSchema = {
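 Current implementation:
 - The prompt-level `serialized_tools` override is no longer used.
-- The upstream snapshot is reused directly at the tool layer.
-- The snapshot file lives at `./pycodex/prompts/exec_tools.json`.
+- Raw JSON fallbacks such as `pycodex/prompts/exec_tools.json` are no longer used.
+- The description / input schema is maintained inside the tool classes; `ToolSpec.serialize()` generates the request-visible payload.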
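The notes above say that the request-visible tool payload is generated from class-level specs and that a function tool's `output_schema` stays local metadata and is not serialized. The sketch below illustrates that idea with a stand-in dataclass. `FunctionToolSpec` and its field set are hypothetical and are not the real `ToolSpec` or `BaseTool` API; the payload keys are an assumption about what a function tool entry looks like.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FunctionToolSpec:
    """Hypothetical stand-in for a class-level function-tool spec."""

    name: str
    description: str
    parameters: Dict[str, Any]
    # Kept for local validation or documentation only; never sent on the wire.
    output_schema: Optional[Dict[str, Any]] = None

    def serialize(self) -> Dict[str, Any]:
        # Build the request-visible payload and deliberately leave
        # out output_schema.
        return {
            "type": "function",
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters,
        }
```

Keeping `output_schema` on the object while omitting it in `serialize()` means local code can still use it without changing what the model sees.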

{python_codex-0.1.13 → python_codex-0.1.14}/pycodex/agent.py RENAMED

@@ -5,7 +5,7 @@ import re
 from typing import Callable
 from .context import ContextManager
-from .model import ModelClient
+from .model import ModelClient, ResponsesIncompleteError
 from .protocol import (
     AgentEvent,
     AssistantMessage,
@@ -17,7 +17,7 @@ from .protocol import (
     TurnResult,
     UserMessage,
 )
-from .tools import ToolContext, ToolRegistry
+from .tools import ExecCommandTool, ToolContext, ToolRegistry, UnifiedExecManager
 from .utils import uuid7_string
 import typing
@@ -46,6 +46,7 @@ _CONTEXT_LENGTH_ERROR_MARKERS = (
     "exceeds the context window",
     "exceeded the context window",
 )
+TERMINAL_TURN_EVENTS = {"turn_completed", "turn_failed", "turn_interrupted"}
 class TurnInterrupted(RuntimeError):
@@ -85,6 +86,15 @@ class Agent:
         self._last_total_usage_tokens: 'typing.Union[int, None]' = None
         self.runtime_environment = runtime_environment
         self.interrupt_asap = False
+        self._turn_running = False
+        exec_command_tool = self._tool_registry.get_tool("exec_command")
+        self._exec_manager = (
+            exec_command_tool._manager
+            if isinstance(exec_command_tool, ExecCommandTool)
+            else None
+        )
+        if self._exec_manager is not None:
+            self._exec_manager.set_notify_hook(self.maybe_invoke)
     @property
     def history(self) -> 'typing.Tuple[ConversationItem, ...]':
@@ -129,6 +139,7 @@ class Agent:
     async def run_turn(
         self, texts: 'typing.List[str]', turn_id: 'typing.Union[str, None]' = None
     ) -> 'TurnResult':
+        self._turn_running = True
         turn_id = turn_id or uuid7_string()
         self.interrupt_asap = False
         new_user_messages = [UserMessage(text=text) for text in texts]
@@ -168,16 +179,10 @@ class Agent:
                     item_count=len(response.items),
                 )
-                tool_calls: 'typing.List[ToolCall]' = []
-                persisted_response_items: 'typing.List[ConversationItem]' = []
-                for item in response.items:
-                    self._history.append(item)
-                    persisted_response_items.append(item)
-                    if isinstance(item, AssistantMessage):
-                        last_assistant_message = item.text
-                    elif isinstance(item, ToolCall):
-                        tool_calls.append(item)
-                self._persist_history_items(persisted_response_items)
+                recorded_items = self._record_model_response_items(response.items)
+                tool_calls = recorded_items[1]
+                if recorded_items[2] is not None:
+                    last_assistant_message = recorded_items[2]
                 if not tool_calls:
                     self._raise_if_interrupt_requested(
@@ -191,6 +196,7 @@ class Agent:
                         iteration=iteration,
                         output_text=last_assistant_message,
                     )
+                    self._turn_running = False
                     return TurnResult(
                         turn_id=turn_id,
                         output_text=last_assistant_message,
@@ -211,6 +217,7 @@ class Agent:
                     output_text=last_assistant_message,
                 )
         except TurnInterrupted:
+            self._turn_running = False
             raise
         except Exception as exc:
             context_usage = _usage_from_context_length_error(str(exc))
@@ -224,8 +231,29 @@ class Agent:
                 error=str(exc),
                 error_type=type(exc).__name__,
             )
+            self._turn_running = False
             raise
+    async def maybe_invoke(self, event: 'typing.Dict[str, object]') -> 'bool':
+        if self._turn_running or event.get("type") != "exec_command_completed":
+            return False
+        payload = {
+            "session_id": event.get("session_id"),
+            "exit_code": event.get("exit_code"),
+            "command": event.get("command"),
+        }
+        text = (
+            "<exec_command_completed>\n"
+            f"{json.dumps(payload, ensure_ascii=False, separators=(',', ':'))}\n"
+            "</exec_command_completed>"
+        )
+        self._turn_running = True
+        task = asyncio.create_task(self.run_turn([text]))
+        task.add_done_callback(
+            lambda task: None if task.cancelled() else task.exception()
+        )
+        return True
     async def _execute_tool_batch(
         self,
         turn_id: 'str',
@@ -294,10 +322,18 @@ class Agent:
         return result
     def _emit(self, kind: 'str', turn_id: 'str', **payload: 'object') -> 'None':
+        if kind in TERMINAL_TURN_EVENTS:
+            payload["background_exec_count"] = self._background_exec_count()
         self._event_handler(
             AgentEvent(kind=kind, turn_id=turn_id, payload=dict(payload))
         )
+    def _background_exec_count(self) -> 'int':
+        manager: 'typing.Union[UnifiedExecManager, None]' = self._exec_manager
+        if manager is None:
+            return 0
+        return manager.running_session_count()
     def _persist_history_items(
         self,
         items: 'typing.Iterable[ConversationItem]',
@@ -310,6 +346,28 @@ class Agent:
         except Exception:  # pragma: no cover - persistence should not break turns
             return
+    def _record_model_response_items(
+        self,
+        items: 'typing.Iterable[object]',
+        include_tool_calls: 'bool' = True,
+    ) -> 'typing.Tuple[typing.Tuple[ConversationItem, ...], typing.List[ToolCall], typing.Union[str, None]]':
+        persisted_response_items: 'typing.List[ConversationItem]' = []
+        tool_calls: 'typing.List[ToolCall]' = []
+        last_assistant_message = None
+        for item in items:
+            if isinstance(item, ToolCall) and not include_tool_calls:
+                continue
+            if not isinstance(item, (AssistantMessage, ToolCall, ReasoningItem)):
+                continue
+            self._history.append(item)
+            persisted_response_items.append(item)
+            if isinstance(item, AssistantMessage):
+                last_assistant_message = item.text
+            elif isinstance(item, ToolCall):
+                tool_calls.append(item)
+        self._persist_history_items(persisted_response_items)
+        return tuple(persisted_response_items), tool_calls, last_assistant_message
     def _handle_model_stream_event(self, turn_id: 'str', event: 'ModelStreamEvent') -> 'None':
         if event.kind == "token_count":
             self._remember_token_usage(event.payload.get("usage"))
@@ -355,6 +413,13 @@ class Agent:
                     prompt,
                     lambda event: self._handle_model_stream_event(turn_id, event),
                 )
+            except ResponsesIncompleteError as exc:
+                if exc.reason == "max_output_tokens":
+                    self._record_model_response_items(
+                        exc.partial_items,
+                        include_tool_calls=False,
+                    )
+                raise
             except Exception as exc:
                 error_message = str(exc)
                 if (

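Note on the `maybe_invoke` hunk above: the new hook starts a follow-up turn only when no turn is running and the event type is `exec_command_completed`. It then schedules `run_turn` as a background task whose done callback retrieves any exception, so a failed task does not trigger asyncio's "exception was never retrieved" warning. The sketch below reproduces that pattern in isolation. `TinyAgent`, `demo`, and `_last_task` are hypothetical names used only for illustration, and the `try/finally` reset is an assumption of this sketch; the diff itself resets `_turn_running` by hand on each exit path.

```python
import asyncio
import json


class TinyAgent:
    """Hypothetical stand-in for the Agent class in the diff; not the real API."""

    def __init__(self):
        self._turn_running = False
        self.turns = []
        self._last_task = None  # kept only so the demo can await the task

    async def run_turn(self, texts):
        self._turn_running = True
        try:
            await asyncio.sleep(0)  # placeholder for the model round trip
            self.turns.append(texts)
        finally:
            # Assumption of this sketch: reset the flag in one place.
            self._turn_running = False

    async def maybe_invoke(self, event):
        # Ignore unrelated events and anything arriving while a turn is active.
        if self._turn_running or event.get("type") != "exec_command_completed":
            return False
        payload = {
            "session_id": event.get("session_id"),
            "exit_code": event.get("exit_code"),
            "command": event.get("command"),
        }
        text = (
            "<exec_command_completed>\n"
            f"{json.dumps(payload, ensure_ascii=False, separators=(',', ':'))}\n"
            "</exec_command_completed>"
        )
        # Claim the slot before the task starts so a second event is rejected.
        self._turn_running = True
        task = asyncio.create_task(self.run_turn([text]))
        # Retrieve the exception (if any) so it is marked as handled.
        task.add_done_callback(
            lambda t: None if t.cancelled() else t.exception()
        )
        self._last_task = task
        return True


async def demo():
    agent = TinyAgent()
    event = {
        "type": "exec_command_completed",
        "session_id": 7,
        "exit_code": 0,
        "command": "make test",
    }
    first = await agent.maybe_invoke(event)   # accepted, turn scheduled
    second = await agent.maybe_invoke(event)  # rejected, turn still pending
    await agent._last_task                    # let the scheduled turn finish
    third = await agent.maybe_invoke({"type": "other"})  # wrong event type
    return first, second, third, agent.turns


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Setting the flag before `asyncio.create_task` returns control to the event loop is what makes the second call return `False` even though the scheduled turn has not started yet.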