codex-516-guard 0.1.0__py3-none-any.whl
- codex_516_guard-0.1.0.dist-info/METADATA +202 -0
- codex_516_guard-0.1.0.dist-info/RECORD +9 -0
- codex_516_guard-0.1.0.dist-info/WHEEL +4 -0
- codex_516_guard-0.1.0.dist-info/entry_points.txt +2 -0
- codex_516_guard-0.1.0.dist-info/licenses/LICENSE +27 -0
- guard/__init__.py +0 -0
- guard/cli.py +39 -0
- guard/fold.py +361 -0
- guard/server.py +248 -0
@@ -0,0 +1,202 @@
Metadata-Version: 2.4
Name: codex-516-guard
Version: 0.1.0
Summary: Local Responses proxy for OpenAI Codex CLI: folds gpt-5.5 518n-2 reasoning truncation (516 degradation) via the official openai_base_url wiring — no provider change, WebSocket-first, no fallback noise.
Project-URL: Homepage, https://github.com/dzshzx/codex-516-guard
Project-URL: Repository, https://github.com/dzshzx/codex-516-guard
Project-URL: Issues, https://github.com/dzshzx/codex-516-guard/issues
Author: dzshzx
License-Expression: MIT
License-File: LICENSE
Keywords: 516,codex,gpt-5.5,openai,proxy,reasoning
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: Proxy Servers
Requires-Python: >=3.12
Requires-Dist: httpx>=0.27
Requires-Dist: starlette>=0.41
Requires-Dist: uvicorn[standard]>=0.32
Requires-Dist: zstandard>=0.23
Description-Content-Type: text/markdown

# codex-516-guard

> Local Responses proxy for OpenAI Codex CLI: detects the gpt-5.5 "516" reasoning-truncation
> fingerprint (`reasoning_tokens == 518*n - 2`), auto-continues the model's thinking, and folds
> all rounds into one response, **without changing `model_provider`**, so session grouping,
> remote compaction and remote-control stay intact. WebSocket-first: no
> "Falling back from WebSockets" retry noise.

A self-built local Responses proxy that mitigates the "516 degradation" of gpt-5.5 in Codex: reasoning is
truncated at `reasoning_tokens == 518*n - 2` (516, 1034, 1552…) and answer quality drops sharply
(upstream issue: [openai/codex#30364](https://github.com/openai/codex/issues/30364), no official fix).
When the proxy detects this fingerprint it automatically lets the model keep thinking and **folds the multi-round continuation into a single downstream response**.

The mechanism is inspired by [neteroster/CodexCont](https://github.com/neteroster/CodexCont) (MIT);
the implementation is entirely new code. Key differences:

| | codex-516-guard | CodexCont |
| --- | --- | --- |
| Codex-side wiring | Top-level `openai_base_url` (**no new provider**) | New `[model_providers]` entry (sessions are grouped by provider and get hidden, remote-control is unavailable, remote compaction is lost) |
| Downstream transport | **WebSocket as the primary transport** (full implementation of the `responses_websockets` protocol) + SSE fallback | SSE only (codex tries ws first → 405 → falls back after about 5 reconnect warnings per session) |
| zstd request compression (on by default for the built-in provider in 0.142.x) | Decompressed natively; no codex config change needed | Requires `[features] enable_request_compression = false` |
| `GET /v1/models` model catalog refresh | `/v1/*` passthrough | Not proxied (fails silently, relies on the local cache) |
| Continuation method | Commentary method (`phase:"commentary"` message + encrypted reasoning replay) | commentary + tool_pair legacy + cross-round repair and more configurable options |
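
As a rough illustration of the commentary method in the last table row, the sketch below builds the request body for one continuation round. It follows what `guard/fold.py` in this package does: replay the original input, append the truncated round's reasoning items plus a `phase:"commentary"` nudge, request `reasoning.encrypted_content`, and drop `previous_response_id`. The function name `continuation_body` and all sample values are hypothetical.

```python
from typing import Any

MARKER_TEXT = "Continue thinking..."
ENC_INCLUDE = "reasoning.encrypted_content"


def continuation_body(base_body: dict[str, Any],
                      reasoning_items: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: reshape the agent's request for one continuation round."""
    nudge = {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": MARKER_TEXT}],
        "phase": "commentary",
    }
    body = dict(base_body)  # shallow copy; the caller's dict is left untouched
    body["stream"] = True
    body["input"] = [*(base_body.get("input") or []), *reasoning_items, nudge]
    include = [str(x) for x in (base_body.get("include") or [])]
    if ENC_INCLUDE not in include:
        include.append(ENC_INCLUDE)
    body["include"] = include
    body.pop("previous_response_id", None)  # state travels in the replayed items
    return body


# Sample values below are made up for illustration only.
request = {
    "model": "gpt-5.5",
    "input": [{"role": "user", "content": "hi"}],
    "previous_response_id": "resp_123",
}
truncated_reasoning = [{"type": "reasoning", "encrypted_content": "opaque"}]
next_body = continuation_body(request, truncated_reasoning)
print(len(next_body["input"]), next_body["include"])
```

The replayed input ends with the nudge message, so the upstream model sees its own (encrypted) reasoning followed by a prompt to resume it.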
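
## How it works

1. At the end of each upstream round, read `usage.output_tokens_details.reasoning_tokens`; a value matching `518n-2` (n∈[1,6], at most 3 continuation rounds) means the reasoning was truncated.
2. Discard that round's **tentative output** (message / tool calls, which are based on the truncated reasoning), append the round's reasoning items (including `encrypted_content`) plus a `Continue thinking...` assistant message with `phase:"commentary"` to the input, replay it, and open the next round.
3. The reasoning stream is passed through to the agent in real time, and only the final output of the round that ends cleanly is released. The terminal event is rebuilt with usage on a single-response basis (input taken from round 1 to avoid a fake context-window blowup, reasoning summed); the true cumulative cost is recorded in `metadata.proxy_billed_usage`.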

## Installation

Requirements: [uv](https://docs.astral.sh/uv/) (manages Python itself) and Codex CLI (logged in via ChatGPT OAuth; tested on 0.142.x).

```bash
uv tool install codex-516-guard # install from PyPI
# or straight from the source repository:
# uv tool install git+https://github.com/dzshzx/codex-516-guard
```

uv creates an isolated environment and places the executable in uv's bin directory (`~/.local/bin` by default on Unix/macOS;
on Windows use `where.exe codex-516-guard` to find the actual path; `uv tool update-shell` adds that directory to PATH).
Then:

```bash
codex-516-guard # run in the foreground (default 127.0.0.1:8787)
codex-516-guard --port 8790 --log-level debug # optional flags: --host/--port/--upstream/--log-level
```

Upgrade / uninstall: `uv tool upgrade codex-516-guard` / `uv tool uninstall codex-516-guard`.

Codex-side wiring: add one line at the top level of `~/.codex/config.toml` (it must come before the first `[table]`):

```toml
openai_base_url = "http://127.0.0.1:8787/v1"
```

This is the **official config key** for overriding the built-in openai provider's base_url
([#16719](https://github.com/openai/codex/issues/16719); overriding via a same-named `[model_providers.openai]`
was rejected by the maintainers, and the `OPENAI_BASE_URL` environment variable has been removed). The provider id stays `openai`,
so session-history grouping, remote compaction and remote-control are unaffected.

**Disabling**: comment out the `openai_base_url` line and stop the proxy process. If the proxy is stopped while the key is still set, Codex reports errors because the upstream is unreachable.

## Start on login

The proxy is a loopback service that Codex calls **inside the user session**, so on all three platforms the chosen approach is "start at user login, run in the user context"
(rather than a system-level service: system services run in a session without the user environment and cannot reach the uv executable or proxy settings under the user profile).
First get the absolute path with `which codex-516-guard` (Unix/macOS) or `where.exe codex-516-guard` (Windows) for later use.

### Linux / WSL: systemd user unit

See `systemd/codex-516-guard.service.example`:

```bash
cp systemd/codex-516-guard.service.example ~/.config/systemd/user/codex-516-guard.service
systemctl --user daemon-reload && systemctl --user enable --now codex-516-guard
```

### macOS: launchd LaunchAgent

macOS manages background jobs with launchd. A plist in `~/Library/LaunchAgents/` is a **LaunchAgent**: it starts at **user login** and
runs in the user's GUI session (the right choice for a loopback proxy). A LaunchDaemon in `/Library/LaunchDaemons/` starts at boot but has no user session, so it does not fit this scenario.

Fill the executable's absolute path into the plist below and save it as `~/Library/LaunchAgents/com.dzshzx.codex-516-guard.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key> <string>com.dzshzx.codex-516-guard</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/YOU/.local/bin/codex-516-guard</string>
  </array>
  <key>RunAtLoad</key> <true/>
  <key>KeepAlive</key> <true/>
  <key>StandardOutPath</key> <string>/tmp/codex-516-guard.log</string>
  <key>StandardErrorPath</key><string>/tmp/codex-516-guard.log</string>
</dict>
</plist>
```

Note: launchd does not read shell configuration, so `ProgramArguments` must be an **absolute path**; after a crash `KeepAlive` restarts the job (throttled to 10 seconds).
Load / unload (modern `launchctl`; `load`/`unload` are legacy):

```bash
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.dzshzx.codex-516-guard.plist
launchctl enable gui/$(id -u)/com.dzshzx.codex-516-guard
launchctl kickstart -k gui/$(id -u)/com.dzshzx.codex-516-guard # (re)start immediately
launchctl bootout gui/$(id -u)/com.dzshzx.codex-516-guard # unload
```

### Windows: a scheduled task triggered at logon (not a system service)

Users often ask whether this can run as a Windows **service**. It can, but a native service must implement the SCM control protocol:
pointing `sc.exe create` directly at an ordinary console program times out at startup (error 1053), so a wrapper such as WinSW or NSSM is required.
Even with a wrapper, a service runs by default in **session 0 under the SYSTEM account**, without your user environment or proxy settings, and cannot easily reach the executable that uv installed under the user profile.

For this scenario a **scheduled task (onlogon) is therefore recommended**: it starts when you log in, carries the full user environment and PATH, and matches "a loopback proxy called inside the user session".
After getting the path with `where.exe codex-516-guard` (PowerShell):

```powershell
$exe = (Get-Command codex-516-guard).Source
schtasks /create /tn "codex-516-guard" /tr "`"$exe`"" /sc onlogon /rl limited /f
schtasks /run /tn "codex-516-guard" # start it once right now
# delete: schtasks /delete /tn "codex-516-guard" /f
```

If you really need a system service that starts at boot (and runs without anyone logged in), wrap the exe as a service with [WinSW](https://github.com/winswhq/winsw) (MIT).
Make sure the service runs under your user account; otherwise it cannot reach the installation under the user profile.

## Verification

```bash
curl -sS http://127.0.0.1:8787/healthz # {"ok":true,...}
journalctl --user -u codex-516-guard -f | grep -E 'round|done' # Linux/WSL; on macOS read the log file named in the plist
```

Log output when a fold triggers (measured sample: two consecutive 516 truncations were overcome and the answer was correct):

```
round 1: in=21550 out=664 reason=516 total=22214 | n=1 buffered=['function_call'] -> continue
round 2: in=22078 out=652 reason=516 total=22730 | n=1 buffered=['function_call'] -> continue
round 3: in=22606 out=566 reason=291 total=23172 | n=None buffered=[...] -> clean
done: 3 round(s) | ... | status=completed stop=natural
```
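
The two usage figures the proxy reports can be worked out by hand from the three rounds in the sample log. The sketch below is an illustration under the accounting rules stated in this README (input from round 1, reasoning summed, plus the non-reasoning output of the final round; billed usage is the plain sum). The helper names `folded_usage` and `billed_usage` are hypothetical.

```python
# Per-round numbers copied from the sample log above.
rounds = [
    {"input": 21550, "output": 664, "reasoning": 516},
    {"input": 22078, "output": 652, "reasoning": 516},
    {"input": 22606, "output": 566, "reasoning": 291},
]


def folded_usage(rounds: list[dict[str, int]]) -> dict[str, int]:
    """Usage as if the whole fold had been a single response."""
    first, last = rounds[0], rounds[-1]
    reasoning = sum(r["reasoning"] for r in rounds)
    # Only the final round's visible (non-reasoning) output reached the agent.
    final_part = max(0, last["output"] - last["reasoning"])
    output = reasoning + final_part
    return {
        "input_tokens": first["input"],
        "output_tokens": output,
        "total_tokens": first["input"] + output,
        "reasoning_tokens": reasoning,
    }


def billed_usage(rounds: list[dict[str, int]]) -> dict[str, int]:
    """What was actually consumed upstream: every round counted in full."""
    inp = sum(r["input"] for r in rounds)
    out = sum(r["output"] for r in rounds)
    return {"input_tokens": inp, "output_tokens": out, "total_tokens": inp + out}


folded = folded_usage(rounds)
billed = billed_usage(rounds)
print(folded)
print(billed)
```

For this sample the agent-facing total is 23148 tokens while the billed total is 68116, which is why the real cost is kept separately in `metadata.proxy_billed_usage`.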

## Development

```bash
git clone https://github.com/dzshzx/codex-516-guard && cd codex-516-guard
uv sync
uv run python test_fold.py # fold state machine self-test; should print ALL PASS
uv run codex-516-guard # run locally
```

Releases go through PyPI Trusted Publishing (`.github/workflows/release.yml`, OIDC, no token): pushing a `v*` tag builds and uploads automatically.

## Layout

- `guard/fold.py`: fingerprint detection + fold state machine (transport-agnostic; `test_fold.py` covers discard/release, renumbering, and dual-basis usage)
- `guard/server.py`: starlette transport layer: ws / SSE downstream, SSE upstream, zstd/gzip request decompression, `/v1/*` passthrough
- `guard/cli.py`: CLI entry point (`codex-516-guard`; listens on loopback only; auth passthrough, stores no credentials)

## Security and disclaimer

- The proxy only does auth **passthrough**: it forwards the Authorization header sent by Codex and neither reads nor persists any credentials.
- It listens on the loopback address only; do not expose it on a non-loopback interface.
- Unofficial project that depends on undocumented upstream behavior (the truncation fingerprint, the ws frame format); changes on OpenAI's side may break it. Use at your own risk.
- Continuation consumes additional real tokens (see `metadata.proxy_billed_usage`); the guard bounds this with the n window and the 3-round cap.
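
Because the `--host` flag accepts any bind address, it can be worth validating a value before passing it in. The following is a minimal sketch using the standard-library `ipaddress` module; the helper `is_loopback` is hypothetical (it is not part of this package), and treating the literal name `localhost` as loopback is an assumption of the sketch.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """Hypothetical pre-flight check for a --host value."""
    if host == "localhost":  # assumption: resolves to a loopback address
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): refuse to guess.
        return False


for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
    print(candidate, is_loopback(candidate))
```

`0.0.0.0` is rejected here on purpose: it binds every interface, which is exactly the exposure the list above warns against.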

## License

MIT (see LICENSE; mechanism credit to neteroster/CodexCont).
@@ -0,0 +1,9 @@
guard/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
guard/cli.py,sha256=QMc1IOXV44rtoCcqPZ-GbwlGEATdqBNOFkHhU5s3zjI,1517
guard/fold.py,sha256=tkUn0rwHtzbYT4WmHbrpkotIyFuMWxHQJGPoefVk55E,14471
guard/server.py,sha256=9QppzZLpoNiAK-sEzwwuUKtAasiwQbp_Qzwxfy4cVP8,9103
codex_516_guard-0.1.0.dist-info/METADATA,sha256=TBLZ8siLW7vp6P8RbiRdY6QnD5AgkGd9kK3IoEZZSu8,10639
codex_516_guard-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
codex_516_guard-0.1.0.dist-info/entry_points.txt,sha256=k3OUjGaVDXtw83goZUWH_9PdFS42H9N1y6GCIpbBHf8,51
codex_516_guard-0.1.0.dist-info/licenses/LICENSE,sha256=xUs31HROJwQ3ywBsM36wSEgCHZbFGQLJXm0iiR06iJY,1258
codex_516_guard-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,27 @@
MIT License

Copyright (c) 2026 dzshzx

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

Mechanism inspiration: the 518n-2 truncation detection + fold-continuation
approach originates from neteroster/CodexCont (MIT). This project is an
independent, from-scratch implementation.
guard/__init__.py ADDED
File without changes
guard/cli.py ADDED
@@ -0,0 +1,39 @@
"""codex-516-guard CLI entry point (installed via [project.scripts])."""
from __future__ import annotations

import argparse
import logging
import os

import uvicorn


def main() -> None:
    parser = argparse.ArgumentParser(
        prog="codex-516-guard",
        description=(
            "Local Responses proxy for Codex CLI: detects the gpt-5.5 518n-2 "
            "reasoning-truncation fingerprint, auto-continues thinking, and folds "
            "all rounds into one response. Wire Codex to it with the top-level "
            'config key: openai_base_url = "http://127.0.0.1:8787/v1"'
        ),
    )
    parser.add_argument("--host", default="127.0.0.1",
                        help="bind address (default: 127.0.0.1; keep it loopback)")
    parser.add_argument("--port", type=int, default=8787, help="bind port (default: 8787)")
    parser.add_argument("--upstream", default=None,
                        help="upstream base URL (default: https://chatgpt.com/backend-api/codex)")
    parser.add_argument("--log-level", default="info",
                        choices=["critical", "error", "warning", "info", "debug"])
    args = parser.parse_args()

    if args.upstream:
        os.environ["GUARD_UPSTREAM_BASE"] = args.upstream
    logging.basicConfig(level=args.log_level.upper(),
                        format="%(levelname)s:%(name)s:%(message)s")
    uvicorn.run("guard.server:app", host=args.host, port=args.port,
                log_level=args.log_level)


if __name__ == "__main__":
    main()
guard/fold.py ADDED
@@ -0,0 +1,361 @@
"""518n-2 truncation detection + round folding for the Codex Responses event stream.

gpt-5.5 reasoning gets cut at reasoning_tokens == 518*n - 2 (openai/codex#30364).
When a round ends on that fingerprint we replay the conversation plus the round's
reasoning items and a phase:"commentary" nudge, then fold every round into ONE
downstream response: reasoning streams live, each round's tentative final output
(message / tool calls) is buffered and only the clean round's output is flushed.

Transport-agnostic: `fold()` consumes upstream events as dicts and yields
downstream events as dicts; serialization (SSE / WebSocket) lives in server.py.

Mechanism credit: neteroster/CodexCont (MIT). Implementation is original.
"""
from __future__ import annotations

import logging
from typing import Any, AsyncIterator, Awaitable, Callable

log = logging.getLogger("guard.fold")

STEP = 518
MIN_N = 1  # continue only when truncation tier n >= MIN_N
MAX_N = 6  # stop forcing once n > MAX_N (0 = no cap)
MAX_CONTINUE = 3  # continuation rounds after round 1 (runaway guard)
MARKER_TEXT = "Continue thinking..."
ENC_INCLUDE = "reasoning.encrypted_content"

TERMINAL_TYPES = ("response.completed", "response.failed", "response.incomplete")

# An opener returns the upstream event iterator for one round's body.
RoundOpener = Callable[[dict[str, Any]], Awaitable[AsyncIterator[dict[str, Any]]]]


class RoundOpenError(Exception):
    """Continuation round could not be opened (upstream HTTP >= 400)."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"upstream {status}: {detail[:200]}")
        self.status = status


DONE = object()  # sentinel an opener may yield to signal upstream sent [DONE]


# --- fingerprint -------------------------------------------------------------


def reasoning_tokens(usage: dict[str, Any] | None) -> int | None:
    val = ((usage or {}).get("output_tokens_details") or {}).get("reasoning_tokens")
    return int(val) if val is not None else None


def tier_n(tokens: int | None) -> int | None:
    """n for reasoning_tokens == STEP*n - 2 (516, 1034, ...), else None."""
    if tokens is None or tokens < STEP - 2 or (tokens + 2) % STEP != 0:
        return None
    return (tokens + 2) // STEP


def in_continue_window(n: int | None) -> bool:
    return n is not None and n >= MIN_N and (MAX_N == 0 or n <= MAX_N)


# --- continuation payload ----------------------------------------------------


def commentary_nudge() -> dict[str, Any]:
    """phase:"commentary" assistant message that provokes the model to resume
    reasoning when replayed together with the encrypted reasoning items."""
    return {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": MARKER_TEXT}],
        "phase": "commentary",
    }


def next_round_body(base_body: dict[str, Any], input_items: list[Any]) -> dict[str, Any]:
    """The agent's request re-shaped for a continuation round: explicit input,
    always streamed, encrypted reasoning included, no previous_response_id
    (state is carried in the replayed items)."""
    body = dict(base_body)
    body["stream"] = True
    body["input"] = input_items
    include = [str(x) for x in (base_body.get("include") or [])]
    if ENC_INCLUDE not in include:
        include.append(ENC_INCLUDE)
    body["include"] = include
    body.pop("previous_response_id", None)
    return body


# --- usage accounting --------------------------------------------------------


def _sum_usage(acc: dict[str, Any], usage: dict[str, Any] | None) -> None:
    if not usage:
        return
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        if usage.get(key) is not None:
            acc[key] = acc.get(key, 0) + int(usage[key])
    cached = (usage.get("input_tokens_details") or {}).get("cached_tokens")
    if cached is not None:
        acc.setdefault("input_tokens_details", {})
        acc["input_tokens_details"]["cached_tokens"] = (
            acc["input_tokens_details"].get("cached_tokens", 0) + int(cached)
        )
    rt = reasoning_tokens(usage)
    if rt is not None:
        acc.setdefault("output_tokens_details", {})
        acc["output_tokens_details"]["reasoning_tokens"] = (
            acc["output_tokens_details"].get("reasoning_tokens", 0) + rt
        )


def agent_usage(
    first: dict[str, Any] | None,
    summed: dict[str, Any],
    final_round: dict[str, Any] | None,
    flushed_final: bool,
) -> dict[str, Any]:
    """Usage as if the fold were one response. input/cached come from round 1
    (summing hidden rounds would fake a blown context window); reasoning is
    summed because every round's reasoning reached the agent; output adds only
    the flushed final round's non-reasoning part."""
    first = first or {}
    in_tok = first.get("input_tokens") or 0
    cached = (first.get("input_tokens_details") or {}).get("cached_tokens")
    reason = (summed.get("output_tokens_details") or {}).get("reasoning_tokens") or 0
    final_part = 0
    if flushed_final and final_round:
        out = final_round.get("output_tokens") or 0
        final_part = max(0, out - (reasoning_tokens(final_round) or 0))
    usage: dict[str, Any] = {
        "input_tokens": in_tok,
        "output_tokens": reason + final_part,
        "total_tokens": in_tok + reason + final_part,
        "output_tokens_details": {"reasoning_tokens": reason},
    }
    if cached is not None:
        usage["input_tokens_details"] = {"cached_tokens": cached}
    return usage


def _fmt(usage: dict[str, Any] | None) -> str:
    u = usage or {}
    return (
        f"in={u.get('input_tokens')} out={u.get('output_tokens')} "
        f"reason={reasoning_tokens(u)} total={u.get('total_tokens')}"
    )


# --- terminal reconstruction ---------------------------------------------------


def _terminal_event(
    upstream_terminal: dict[str, Any] | None,
    base_response: dict[str, Any] | None,
    output: list[dict[str, Any]],
    usage: dict[str, Any],
    rounds: list[dict[str, Any]],
    billed: dict[str, Any],
    stopped_reason: str | None,
    *,
    incomplete_reason: str | None = None,
) -> dict[str, Any]:
    """Downstream terminal: round-1 response identity, upstream status (or a
    synthetic incomplete), our reconstructed output + single-response usage,
    true billed cost and per-round breakdown in metadata."""
    tresp = (upstream_terminal or {}).get("response") or {}
    resp = dict(base_response or tresp)
    resp["output"] = output
    resp["usage"] = usage
    metadata = dict(resp.get("metadata") or {})
    metadata["proxy_rounds"] = rounds
    metadata["proxy_billed_usage"] = billed
    if stopped_reason:
        metadata["proxy_stopped_reason"] = stopped_reason
    resp["metadata"] = metadata
    if incomplete_reason is not None:
        resp["status"] = "incomplete"
        resp["incomplete_details"] = {"reason": incomplete_reason}
        return {"type": "response.incomplete", "response": resp}
    resp["status"] = tresp.get("status", "completed")
    if "incomplete_details" in tresp:
        resp["incomplete_details"] = tresp["incomplete_details"]
    return {"type": (upstream_terminal or {}).get("type", "response.completed"), "response": resp}


# --- the fold ----------------------------------------------------------------


async def fold(
    base_body: dict[str, Any],
    open_round: RoundOpener,
) -> AsyncIterator[dict[str, Any] | object]:
    """Yield downstream events (dicts, plus the DONE sentinel when upstream sent
    one). Every yielded event gets a proxy-owned sequence_number; output_index
    is renumbered into one downstream space across rounds."""
    orig_input = list(base_body.get("input") or [])
    seq = 0
    ds_oi = 0
    base_response: dict[str, Any] | None = None
    saw_done = False
    final_output: list[dict[str, Any]] = []
    replay_tail: list[Any] = []
    summed_usage: dict[str, Any] = {}
    first_usage: dict[str, Any] | None = None
    rounds_info: list[dict[str, Any]] = []

    def stamp(ev: dict[str, Any]) -> dict[str, Any]:
        nonlocal seq
        ev["sequence_number"] = seq
        seq += 1
        return ev

    round_no = 0
    events = await open_round(next_round_body(base_body, orig_input))

    while True:
        round_no += 1
        oi_to_ds: dict[Any, int] = {}
        kind: dict[Any, str] = {}
        buffered: list[dict[str, Any]] = []  # {oi, item, events}
        round_reasoning: list[dict[str, Any]] = []
        terminal: dict[str, Any] | None = None
        usage: dict[str, Any] | None = None

        try:
            async for ev in events:
                if ev is DONE:
                    saw_done = True
                    continue
                etype = ev.get("type", "")

                if etype in ("response.created", "response.in_progress"):
                    if round_no == 1:
                        if etype == "response.created":
                            base_response = ev.get("response") or {}
                        yield stamp(ev)
                    continue
                if etype in TERMINAL_TYPES:
                    terminal = ev
                    usage = (ev.get("response") or {}).get("usage")
                    break

                oi = ev.get("output_index")
                if etype == "response.output_item.added":
                    item = ev.get("item") or {}
                    if item.get("type") == "reasoning":
                        kind[oi] = "reasoning"
                        oi_to_ds[oi] = ds_oi
                        ev["output_index"] = ds_oi
                        ds_oi += 1
                        yield stamp(ev)
                    else:
                        kind[oi] = "buffered"
                        buffered.append({"oi": oi, "item": item, "events": [ev]})
                    continue

                k = kind.get(oi)
                if k == "reasoning":
                    if oi in oi_to_ds:
                        ev["output_index"] = oi_to_ds[oi]
                    if etype == "response.output_item.done":
                        item = ev.get("item") or {}
                        round_reasoning.append(item)
                        final_output.append(item)
                    yield stamp(ev)
                elif k == "buffered":
                    entry = next(e for e in buffered if e["oi"] == oi)
                    entry["events"].append(ev)
                    if etype == "response.output_item.done":
                        entry["item"] = ev.get("item") or entry["item"]
                else:
                    yield stamp(ev)  # unknown scope: forward best-effort
        except RoundOpenError:
            raise  # only raised before any event; handled by caller for round 1
        except Exception as exc:  # upstream died mid-stream
            log.warning("round %d: upstream error mid-stream: %r", round_no, exc)
            _sum_usage(summed_usage, usage)
            yield stamp(_terminal_event(
                None, base_response, final_output,
                agent_usage(first_usage, summed_usage, usage, flushed_final=False),
                rounds_info, summed_usage, "upstream_error",
                incomplete_reason="upstream_error"))
            return

        # ---- round ended: decide continue / stop ----------------------------
        _sum_usage(summed_usage, usage)
        if round_no == 1:
            first_usage = usage
        rt = reasoning_tokens(usage)
        n = tier_n(rt)
        rounds_info.append({"round": round_no, "reasoning_tokens": rt, "n": n})
        has_enc = bool(round_reasoning and round_reasoning[-1].get("encrypted_content"))

        do_continue = (
            terminal is not None
            and in_continue_window(n)
            and has_enc
            and round_no <= MAX_CONTINUE
        )
        stopped_reason = None
        if not do_continue and n is not None:
            stopped_reason = (
                "no_encrypted_content" if not has_enc
                else "max_continue" if round_no > MAX_CONTINUE
                else "tier_out_of_window"
            )

        log.info(
            "round %d: %s | n=%s buffered=%s -> %s",
            round_no, _fmt(usage), n,
            [e["item"].get("type") for e in buffered],
            "continue" if do_continue else
            "upstream_eof" if terminal is None else stopped_reason or "clean",
        )

        if do_continue:
            replay_tail.extend([*round_reasoning, commentary_nudge()])
            try:
                events = await open_round(next_round_body(base_body, orig_input + replay_tail))
            except RoundOpenError as exc:
                log.warning("continuation round %d failed to open: %s", round_no + 1, exc)
                yield stamp(_terminal_event(
                    None, base_response, final_output,
                    agent_usage(first_usage, summed_usage, usage, flushed_final=False),
                    rounds_info, summed_usage, "upstream_error",
                    incomplete_reason="upstream_error"))
                return
            continue

        if terminal is None:  # EOF with no terminal: tentative output is NOT an answer
            log.warning("round %d: upstream EOF with no terminal event", round_no)
            yield stamp(_terminal_event(
                None, base_response, final_output,
                agent_usage(first_usage, summed_usage, usage, flushed_final=False),
                rounds_info, summed_usage, "upstream_eof",
                incomplete_reason="upstream_eof"))
            return

        # Clean stop: flush this round's buffered output as the real answer.
        for entry in buffered:
            for ev in entry["events"]:
                if "output_index" in ev:
                    ev["output_index"] = ds_oi
                yield stamp(ev)
            ds_oi += 1
            final_output.append(entry["item"])

        status = (terminal.get("response") or {}).get("status", "completed")
        log.info("done: %d round(s) | %s | status=%s stop=%s",
                 round_no, _fmt(summed_usage), status, stopped_reason or "natural")
        yield stamp(_terminal_event(
            terminal, base_response, final_output,
            agent_usage(first_usage, summed_usage, usage, flushed_final=True),
            rounds_info, summed_usage, stopped_reason))
        if saw_done:
            yield DONE
        return
guard/server.py ADDED
@@ -0,0 +1,248 @@
"""codex-516-guard transport layer.

Downstream (Codex, wired via top-level `openai_base_url`):
  * WebSocket /v1/responses — Codex's preferred transport (openai-beta
    responses_websockets): client sends {"type":"response.create", ...body...}
    frames, we answer with response.* event frames; the connection is reused
    for sequential requests (prewarm + turns).
  * POST /v1/responses — SSE fallback; request body may be zstd/gzip
    compressed (built-in provider sends zstd when request compression is on).
  * anything else under /v1/ — transparent passthrough to the upstream base
    (Codex refreshes its model catalog via GET /v1/models).

Upstream is always the SSE POST endpoint; the fold state machine (fold.py) is
transport-agnostic.
"""
from __future__ import annotations

import gzip
import json
import logging
import os
import zlib
from typing import Any, AsyncIterator

import httpx
import zstandard
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse, Response, StreamingResponse
from starlette.routing import Route, WebSocketRoute
from starlette.websockets import WebSocket, WebSocketDisconnect

from .fold import DONE, RoundOpenError, fold

log = logging.getLogger("guard.server")

UPSTREAM_BASE = os.environ.get(
    "GUARD_UPSTREAM_BASE", "https://chatgpt.com/backend-api/codex"
).rstrip("/")
RESPONSES_URL = UPSTREAM_BASE + "/responses"

# hop-by-hop / transport-specific headers never forwarded upstream
_DROP_HEADERS = {
    "host", "connection", "upgrade", "keep-alive", "te", "trailer",
    "transfer-encoding", "proxy-authorization", "proxy-connection",
    "content-length", "content-encoding", "accept-encoding",
    "sec-websocket-key", "sec-websocket-version", "sec-websocket-extensions",
    "sec-websocket-protocol",
    "openai-beta",  # advertises the ws protocol; upstream round is plain SSE
}


def upstream_headers(raw: Any) -> dict[str, str]:
    out = {}
    for key, value in raw:
        k = key.decode() if isinstance(key, bytes) else key
        if k.lower() in _DROP_HEADERS:
            continue
        out[k] = value.decode() if isinstance(value, bytes) else value
    out["accept"] = "text/event-stream"
    return out


def decompress_body(data: bytes, encoding: str | None) -> bytes:
    enc = (encoding or "").lower().strip()
|
|
66
|
+
if not enc or enc == "identity":
|
|
67
|
+
return data
|
|
68
|
+
if enc == "zstd":
|
|
69
|
+
return zstandard.ZstdDecompressor().decompressobj().decompress(data)
|
|
70
|
+
if enc == "gzip":
|
|
71
|
+
return gzip.decompress(data)
|
|
72
|
+
if enc == "deflate":
|
|
73
|
+
return zlib.decompress(data)
|
|
74
|
+
raise ValueError(f"unsupported content-encoding: {enc}")
|
|
75
|
+
|
|
76
|
+
|
|
77
|
+
# --- upstream SSE rounds ------------------------------------------------------
|
|
78
|
+
|
|
79
|
+
|
|
80
|
+
def parse_sse(text_chunks: AsyncIterator[str]) -> AsyncIterator[dict | object]:
|
|
81
|
+
"""Incremental SSE parser: yields event dicts (from data: lines) and the
|
|
82
|
+
DONE sentinel for `data: [DONE]`."""
|
|
83
|
+
|
|
84
|
+
async def gen():
|
|
85
|
+
buf = ""
|
|
86
|
+
async for chunk in text_chunks:
|
|
87
|
+
buf += chunk
|
|
88
|
+
while "\n\n" in buf:
|
|
89
|
+
block, buf = buf.split("\n\n", 1)
|
|
90
|
+
data_lines = [
|
|
91
|
+
line[5:].lstrip()
|
|
92
|
+
for line in block.splitlines()
|
|
93
|
+
if line.startswith("data:")
|
|
94
|
+
]
|
|
95
|
+
if not data_lines:
|
|
96
|
+
continue
|
|
97
|
+
data = "\n".join(data_lines)
|
|
98
|
+
if data == "[DONE]":
|
|
99
|
+
yield DONE
|
|
100
|
+
continue
|
|
101
|
+
try:
|
|
102
|
+
yield json.loads(data)
|
|
103
|
+
except json.JSONDecodeError:
|
|
104
|
+
log.warning("unparseable SSE data (len=%d), dropped", len(data))
|
|
105
|
+
|
|
106
|
+
return gen()
|
|
107
|
+
|
|
108
|
+
|
|
109
|
+
class UpstreamRounds:
|
|
110
|
+
"""RoundOpener bound to one downstream request's headers; closes the
|
|
111
|
+
previous round's response before opening the next."""
|
|
112
|
+
|
|
113
|
+
def __init__(self, client: httpx.AsyncClient, headers: dict[str, str]):
|
|
114
|
+
self.client = client
|
|
115
|
+
self.headers = headers
|
|
116
|
+
self._resp: httpx.Response | None = None
|
|
117
|
+
|
|
118
|
+
async def open(self, body: dict[str, Any]) -> AsyncIterator[dict | object]:
|
|
119
|
+
await self.aclose()
|
|
120
|
+
req = self.client.build_request(
|
|
121
|
+
"POST", RESPONSES_URL,
|
|
122
|
+
content=json.dumps(body, ensure_ascii=False).encode(),
|
|
123
|
+
headers={**self.headers, "content-type": "application/json"},
|
|
124
|
+
timeout=httpx.Timeout(connect=30, read=600, write=60, pool=30),
|
|
125
|
+
)
|
|
126
|
+
resp = await self.client.send(req, stream=True)
|
|
127
|
+
if resp.status_code >= 400:
|
|
128
|
+
detail = (await resp.aread()).decode(errors="replace")
|
|
129
|
+
await resp.aclose()
|
|
130
|
+
raise RoundOpenError(resp.status_code, detail)
|
|
131
|
+
self._resp = resp
|
|
132
|
+
return parse_sse(resp.aiter_text())
|
|
133
|
+
|
|
134
|
+
async def aclose(self) -> None:
|
|
135
|
+
if self._resp is not None:
|
|
136
|
+
try:
|
|
137
|
+
await self._resp.aclose()
|
|
138
|
+
except Exception:
|
|
139
|
+
pass
|
|
140
|
+
self._resp = None
|
|
141
|
+
|
|
142
|
+
|
|
143
|
+
# --- downstream endpoints -----------------------------------------------------
|
|
144
|
+
|
|
145
|
+
|
|
146
|
+
def sse_bytes(ev: dict | object) -> bytes:
|
|
147
|
+
if ev is DONE:
|
|
148
|
+
return b"data: [DONE]\n\n"
|
|
149
|
+
etype = ev.get("type", "message") # type: ignore[union-attr]
|
|
150
|
+
return f"event: {etype}\ndata: {json.dumps(ev, ensure_ascii=False)}\n\n".encode()
|
|
151
|
+
|
|
152
|
+
|
|
153
|
+
async def responses_post(request: Request) -> Response:
|
|
154
|
+
raw = await request.body()
|
|
155
|
+
try:
|
|
156
|
+
raw = decompress_body(raw, request.headers.get("content-encoding"))
|
|
157
|
+
body = json.loads(raw)
|
|
158
|
+
except (ValueError, json.JSONDecodeError) as exc:
|
|
159
|
+
return JSONResponse({"error": f"bad request body: {exc}"}, status_code=400)
|
|
160
|
+
|
|
161
|
+
rounds = UpstreamRounds(request.app.state.client, upstream_headers(request.headers.raw))
|
|
162
|
+
|
|
163
|
+
async def stream() -> AsyncIterator[bytes]:
|
|
164
|
+
try:
|
|
165
|
+
async for ev in fold(body, rounds.open):
|
|
166
|
+
yield sse_bytes(ev)
|
|
167
|
+
except RoundOpenError as exc: # round 1 rejected: surface upstream error
|
|
168
|
+
yield sse_bytes({
|
|
169
|
+
"type": "response.failed",
|
|
170
|
+
"response": {"status": "failed",
|
|
171
|
+
"error": {"message": str(exc), "code": exc.status}},
|
|
172
|
+
})
|
|
173
|
+
finally:
|
|
174
|
+
await rounds.aclose()
|
|
175
|
+
|
|
176
|
+
return StreamingResponse(stream(), media_type="text/event-stream")
|
|
177
|
+
|
|
178
|
+
|
|
179
|
+
async def responses_ws(ws: WebSocket) -> None:
|
|
180
|
+
await ws.accept()
|
|
181
|
+
headers = upstream_headers(ws.headers.raw)
|
|
182
|
+
rounds = UpstreamRounds(ws.app.state.client, headers)
|
|
183
|
+
try:
|
|
184
|
+
while True:
|
|
185
|
+
try:
|
|
186
|
+
envelope = json.loads(await ws.receive_text())
|
|
187
|
+
except (WebSocketDisconnect, json.JSONDecodeError):
|
|
188
|
+
return
|
|
189
|
+
if envelope.get("type") != "response.create":
|
|
190
|
+
log.info("ws: ignoring frame type %s", envelope.get("type"))
|
|
191
|
+
continue
|
|
192
|
+
body = {k: v for k, v in envelope.items() if k != "type"}
|
|
193
|
+
try:
|
|
194
|
+
async for ev in fold(body, rounds.open):
|
|
195
|
+
if ev is DONE:
|
|
196
|
+
continue
|
|
197
|
+
await ws.send_text(json.dumps(ev, ensure_ascii=False))
|
|
198
|
+
except RoundOpenError as exc:
|
|
199
|
+
await ws.send_text(json.dumps({
|
|
200
|
+
"type": "response.failed",
|
|
201
|
+
"response": {"status": "failed",
|
|
202
|
+
"error": {"message": str(exc), "code": exc.status}},
|
|
203
|
+
}))
|
|
204
|
+
except WebSocketDisconnect:
|
|
205
|
+
pass
|
|
206
|
+
finally:
|
|
207
|
+
await rounds.aclose()
|
|
208
|
+
|
|
209
|
+
|
|
210
|
+
async def passthrough(request: Request) -> Response:
|
|
211
|
+
"""Transparent proxy for every other /v1/* call (e.g. GET /v1/models)."""
|
|
212
|
+
suffix = request.path_params["path"]
|
|
213
|
+
url = f"{UPSTREAM_BASE}/{suffix}"
|
|
214
|
+
if request.url.query:
|
|
215
|
+
url += "?" + request.url.query
|
|
216
|
+
content = await request.body()
|
|
217
|
+
if content:
|
|
218
|
+
content = decompress_body(content, request.headers.get("content-encoding"))
|
|
219
|
+
headers = upstream_headers(request.headers.raw)
|
|
220
|
+
headers.pop("accept", None)
|
|
221
|
+
upstream = await request.app.state.client.request(
|
|
222
|
+
request.method, url, content=content or None, headers=headers,
|
|
223
|
+
timeout=httpx.Timeout(60),
|
|
224
|
+
)
|
|
225
|
+
drop = {"content-encoding", "transfer-encoding", "connection", "content-length"}
|
|
226
|
+
return Response(
|
|
227
|
+
upstream.content, status_code=upstream.status_code,
|
|
228
|
+
headers={k: v for k, v in upstream.headers.items() if k.lower() not in drop},
|
|
229
|
+
)
|
|
230
|
+
|
|
231
|
+
|
|
232
|
+
async def health(_: Request) -> JSONResponse:
|
|
233
|
+
return JSONResponse({"ok": True, "upstream": UPSTREAM_BASE})
|
|
234
|
+
|
|
235
|
+
|
|
236
|
+
def build_app() -> Starlette:
|
|
237
|
+
app = Starlette(routes=[
|
|
238
|
+
Route("/healthz", health),
|
|
239
|
+
Route("/v1/responses", responses_post, methods=["POST"]),
|
|
240
|
+
WebSocketRoute("/v1/responses", responses_ws),
|
|
241
|
+
Route("/v1/{path:path}", passthrough,
|
|
242
|
+
methods=["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD"]),
|
|
243
|
+
])
|
|
244
|
+
app.state.client = httpx.AsyncClient(trust_env=True, http2=False)
|
|
245
|
+
return app
|
|
246
|
+
|
|
247
|
+
|
|
248
|
+
app = build_app()
|
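The incremental buffering in `parse_sse` is easiest to see when an event is split across chunk boundaries. The sketch below is a synchronous, standalone restatement of the same block-splitting logic for illustration only: `iter_sse_events` is a hypothetical name, the local `DONE` object stands in for `guard.fold.DONE`, and unparseable data is silently skipped here instead of logged.

```python
import json
from typing import Any, Iterable, Iterator

DONE = object()  # stand-in for the sentinel exported by guard.fold


def iter_sse_events(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON object per SSE block, and DONE for `data: [DONE]`."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # A blank line terminates a block; anything after it stays buffered.
        while "\n\n" in buf:
            block, buf = buf.split("\n\n", 1)
            data_lines = [
                line[5:].lstrip()
                for line in block.splitlines()
                if line.startswith("data:")
            ]
            if not data_lines:
                continue  # comment or event-name-only block
            data = "\n".join(data_lines)
            if data == "[DONE]":
                yield DONE
                continue
            try:
                yield json.loads(data)
            except json.JSONDecodeError:
                continue  # the original logs a warning and drops the block


# The first event and the [DONE] marker are each split across two chunks.
chunks = [
    'event: response.created\nda',
    'ta: {"type": "response.created"}\n\n: keepalive\n\ndata: [DO',
    'NE]\n\n',
]
parsed = list(iter_sse_events(chunks))
```

Like the original, this sketch only recognises `\n\n` as a block terminator, so a stream that uses CRLF line endings would need its line endings normalised first.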