@zhiman_innies/innies-codex 0.122.33 → 0.122.35

package/README.md CHANGED
@@ -11,17 +11,13 @@
  </p>
 
  <p>
- <img alt="license" src="https://img.shields.io/badge/license-Apache--2.0-1f6feb?style=flat-square">
- <img alt="node" src="https://img.shields.io/badge/node-%E2%89%A516-3fb950?style=flat-square">
- <img alt="platform" src="https://img.shields.io/badge/platform-macOS_%7C_Windows_%7C_Linux-8b949e?style=flat-square">
- <img alt="model" src="https://img.shields.io/badge/model-qwen35__35b-01d280?style=flat-square">
- <img alt="ctx" src="https://img.shields.io/badge/context-272k-01d280?style=flat-square">
+ <img alt="license" src="https://img.shields.io/badge/license-Apache--2.0-7B5CFF?style=flat-square&labelColor=0d1117">
+ <img alt="node" src="https://img.shields.io/badge/node-%E2%89%A516-01D280?style=flat-square&labelColor=0d1117">
+ <img alt="platform" src="https://img.shields.io/badge/platform-macOS_%7C_Windows_%7C_Linux-8b949e?style=flat-square&labelColor=0d1117">
+ <img alt="model" src="https://img.shields.io/badge/model-qwen35__35b-00B8D9?style=flat-square&labelColor=0d1117">
+ <img alt="context" src="https://img.shields.io/badge/context-272k-7B5CFF?style=flat-square&labelColor=0d1117">
  </p>
 
- ```bash
- npm install -g @zhiman_innies/innies-codex@latest
- ```
-
  <sub>
  <a href="#速览">Overview</a> ·
  <a href="#安装">Installation</a> ·
@@ -29,6 +25,7 @@ npm install -g @zhiman_innies/innies-codex@latest
  <a href="#快速上手">Quick Start</a> ·
  <a href="#与原生-codex-的差异">Differences</a> ·
  <a href="#-迭代路线">🛣️ Roadmap</a> ·
+ <a href="#-质量验证现状">🧪 Validation</a> ·
  <a href="#-已知限制与风险">⚠️ Risks</a> ·
  <a href="#文档">Docs</a>
  </sub>
@@ -39,6 +36,7 @@ npm install -g @zhiman_innies/innies-codex@latest
 
  ## Overview
 
+ > [!NOTE]
  > Swaps the "brain" of the official Codex CLI for Zhiman's in-house model `qwen35_35b`, while keeping the skeleton (agent orchestration, TUI, MCP, sandbox, SDK) intact.
 
  | Dimension | Details |
@@ -57,7 +55,7 @@ npm install -g @zhiman_innies/innies-codex@latest
  innies --version
  ```
 
- > Requirements: Node.js ≥ 16 · macOS 12+ / Windows 10/11 / Linux x64 or arm64
+ > Requirements: Node.js ≥ 16 · macOS 12+ / Windows 10/11 (**Linux is not supported**; a compile-time guard forces an exit at startup)
  >
  > To build from source, see [`docs/install.md`](docs/install.md).
 
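@@ -78,7 +76,8 @@ wire_api = "responses"
  env_key = "ZHIMAN_35B_API_KEY"
  ```
 
- > ℹ️ **The default `base_url` is for evaluation and POC only**.
+ > [!IMPORTANT]
+ > **The default `base_url` is for evaluation and POC only**.
  > Production rollouts usually deploy the model privately on the customer's data-center intranet. Once it is deployed, **add a new** provider section and point `model_provider` at it (the launcher periodically resets the default section, so do not edit it directly):
 
  ```toml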
@@ -168,6 +167,7 @@ innies app-server # JSON-RPC + WebSocket, for IDE/system
  </tbody>
  </table>
 
+ > [!TIP]
  > **Design principle**: *swap the brain, keep the skeleton.* Every upstream Codex release is tracked; customizations are confined to the model layer, the branding layer, and the stability-patch layer.
 
  ---
@@ -197,12 +197,60 @@ innies app-server # JSON-RPC + WebSocket, for IDE/system
  | `code_change` | Spec | Code patch |
  | `test_case_lookup_and_generation` | Change | Unit / integration test cases |
 
+ > [!NOTE]
  > Currently triggered internally via the `INNIES_CODING_STAGE` environment variable; the public release will expose it as the `innies coding <stage>` subcommand. For the implementation entry point, see [`codex-cli/bin/innies-coding-runtime.js`](codex-cli/bin/innies-coding-runtime.js).
 
  ---
 
+ ## 🧪 Quality Validation Status
+
+ > [!NOTE]
+ > Based on real stress-test output from the [`docs/innies-qwen35-*`](docs/) report series. Most recent acceptance: **2026-05-24**.
+
+ ### Completed
+
+ | Validation item | Result | Evidence |
+ | :--- | :---: | :--- |
+ | Full-chain feature matrix (CLI / API / Skills / Output Schema / Spawn Agent) | ✅ | [full-chain 2026-05-17](docs/innies-qwen35-full-chain-test-report-2026-05-17.md) |
+ | Single-user baseline | ✅ | [single-user 2026-05-17](docs/innies-qwen35-single-user-test-report-2026-05-17.md) |
+ | 30-concurrency API burst | ✅ 30/30 | full-chain 2026-05-17 |
+ | **Complex-task pass rate (Plan A acceptance)** | **✅ 118/120 = 98.3%** | [complex-task-fix-report 2026-05-24](docs/innies-qwen35-complex-task-fix-report-2026-05-24.md) |
+ | · `multi_agent_review` | ✅ 40/40 = 100% (60% → 100%) | Same as above |
+ | · `system_design` | ✅ 38/40 = 95% (75% → 95%) | Same as above |
+ | · `superpowers_chain` | ✅ 40/40 = 100% | Same as above |
+ | Long-soak harness + monitoring scripts ready | ✅ | [`scripts/long_soak.py`](scripts/long_soak.py) · [`scripts/long_soak_monitor.py`](scripts/long_soak_monitor.py) |
+ | Long-soak rehearsal (1h subset) | ✅ | corpus `long-soak-preflight-20260524-*` |
+ | **8-hour long-soak formal run (Plan B acceptance)** | **✅ 302/303 = 99.67%** | [long-soak-report 2026-05-26](docs/innies-qwen35-long-soak-report-2026-05-26.md) |
+ | · `superpowers_chain` | ✅ 27/27 = 100% | Same as above |
+ | · `multi_agent_review` | ✅ 27/27 = 100% | Same as above |
+ | · `system_design` | ✅ 26/27 = 96.3% (1 timeout, no circuit-breaker trip) | Same as above |
+ | · KV cache hit rate | ✅ 83.6% (37.4M input / 31.3M cached) | Same as above |
+ | Failure corpus (for later SFT/DPO) | ✅ 4 samples | [`docs/qwen35-finetune-corpus/`](docs/qwen35-finetune-corpus/) |
+
+ ### In Progress / Not Done
+
+ | Validation item | Status | Notes |
+ | :--- | :---: | :--- |
+ | 100+ concurrency API burst | ❌ Known failure | 43/102 with a 45s timeout; to be rerun after server-side capacity remediation |
+ | Long-soak rerun (next day, different time window) | ⚪ Not started | The first Plan B run has been accepted; the rerun will be scheduled around the customer rollout pace |
+ | 24h / 7×24 continuous operation | ⚪ Not started | The 8h target is met; the 24h run would validate long-tail service stability |
+ | Windows platform behavior | ⚪ Not verified | All tests so far have run on macOS only; Linux is force-disabled at compile time |
+ | Long-soak after private deployment in the customer's data center | ⚪ Not verified | To be validated jointly with the customer during implementation |
+ | Multi-machine distributed agents / fault injection / multi-tenancy | ⚪ Not started | Plan B follow-up items |
+
+ ### Engineering Artifacts Delivered (Plan A, 2026-05-24)
+
+ - Failure-attribution script [`scripts/classify_complex_failures.py`](scripts/classify_complex_failures.py)
+ - output_schema stress test [`scripts/schema_pressure.py`](scripts/schema_pressure.py)
+ - Long-soak harness [`scripts/long_soak.py`](scripts/long_soak.py) (419 lines)
+ - Long-soak monitor [`scripts/long_soak_monitor.py`](scripts/long_soak_monitor.py) (covers RSS / FD / directory size / crash signals)
+ - Three-path fix (B1: marker pinned at the end of the prompt · B2: enforced output_schema · B3: harness-level re-prompt fallback) merged into the main stress-test runner
+
+ ---
+
  ## ⚠️ Known Limitations and Risks
 
+ > [!WARNING]
  > The following is listed **based on facts in the code and configuration**; read it in full before rollout.
 
  ### 1. The model capability gap is real
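
The pass rates quoted in the validation tables of the hunk above can be re-derived from their raw counts. The following is a minimal sketch, not part of the package: the helper name `passRate` and the `planA` array are illustrative, and the counts are copied from the README rows.

```javascript
// Recompute the percentages in the validation tables from their raw counts.
// `passRate` and `planA` are hypothetical names used only for this sketch.
function passRate(passed, total, digits = 1) {
  if (total <= 0) throw new RangeError("total must be positive");
  return ((passed / total) * 100).toFixed(digits) + "%";
}

// Per-task counts from the Plan A acceptance rows.
const planA = [
  { task: "multi_agent_review", passed: 40, total: 40 },
  { task: "system_design", passed: 38, total: 40 },
  { task: "superpowers_chain", passed: 40, total: 40 },
];

// Sum the per-task rows to get the headline Plan A figure.
const totals = planA.reduce(
  (acc, row) => ({ passed: acc.passed + row.passed, total: acc.total + row.total }),
  { passed: 0, total: 0 }
);

console.log(`Plan A: ${totals.passed}/${totals.total} = ${passRate(totals.passed, totals.total)}`);
console.log(`Plan B: 302/303 = ${passRate(302, 303, 2)}`);
console.log(`system_design (long soak): 26/27 = ${passRate(26, 27)}`);
```

The three Plan A sub-rows sum to the 118/120 headline figure. The long-soak sub-rows listed in the table cover only 81 of the 303 runs, so the Plan B total is taken directly from its own row.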
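@@ -246,7 +294,8 @@ innies app-server # JSON-RPC + WebSocket, for IDE/system
  - Architecture-level refactors spanning 10+ files need more manual decomposition and multiple iterations; **do not expect them to land in one pass**
  - `model_reasoning_effort` cannot make the model "think longer": this switch has no effect on `qwen35_35b`
 
- > These characteristics can be compared directly via each model's fields in [`codex-cli/assets/innies-catalog.json`](codex-cli/assets/innies-catalog.json).
+ > [!TIP]
+ > **Mitigated through engineering**: failures of the "unstable instruction following" kind in complex tasks (such as a missing final marker) have been addressed by the three-path fix delivered under Plan A (marker pinned at the end of the prompt + enforced output_schema + harness re-prompt fallback), which raised `multi_agent_review` from 60% to 100% and `system_design` from 75% to 95%. See [Quality Validation Status](#-质量验证现状) for details. The model's own gap in parallelism and reasoning tiers remains.
 
  ### 2. Following upstream carries lag and conflict risk
 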
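@@ -256,7 +305,7 @@ innies app-server # JSON-RPC + WebSocket, for IDE/system
 
  ### 3. Governance boundaries extend only to the local machine
 
- - The macOS sandbox is based on Seatbelt (`sandbox-exec`), the Linux sandbox on landlock, and Windows sandboxing is limited (filesystem ACLs only
+ - The macOS sandbox is based on Seatbelt (`sandbox-exec`), and Windows sandboxing is limited (filesystem ACLs only); **Linux is not supported**, and the program exits with an error at startup
  - There is no organization-level IAM, role model, or audit backend; compliance governance for multi-person collaboration must integrate with external systems
  - The API key is a single token with no scopes and no expiry; if it leaks, it must be rotated manually for everyone
 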
@@ -278,14 +327,24 @@ innies app-server # JSON-RPC + WebSocket, for IDE/system
  | [`docs/config.md`](docs/config.md) | `config.toml` field reference |
  | [`docs/sandbox.md`](docs/sandbox.md) | Sandbox and approval policies |
  | [`docs/skills.md`](docs/skills.md) | Skill injection mechanism |
- | [`docs/innies-qwen35-*.md`](docs/) | qwen35 full-chain / stability / high-concurrency test reports |
+ | [`docs/innies-qwen35-complex-task-fix-plan.md`](docs/innies-qwen35-complex-task-fix-plan.md) | **Plan A**: complex-task success-rate remediation plan |
+ | [`docs/innies-qwen35-complex-task-fix-report-2026-05-24.md`](docs/innies-qwen35-complex-task-fix-report-2026-05-24.md) | **Plan A acceptance report** (98.3% pass rate) |
+ | [`docs/innies-qwen35-long-soak-plan.md`](docs/innies-qwen35-long-soak-plan.md) | **Plan B**: 8-hour long-soak run plan |
+ | [`docs/innies-qwen35-long-soak-report-2026-05-26.md`](docs/innies-qwen35-long-soak-report-2026-05-26.md) | **Plan B acceptance report** (302/303 = 99.67% pass rate) |
+ | [`docs/innies-qwen35-high-concurrency-plan.md`](docs/innies-qwen35-high-concurrency-plan.md) | High-concurrency remediation plan (multi-key pool / multi-endpoint / circuit breaking) |
+ | [`docs/innies-qwen35-stability-test-report-2026-05-19.md`](docs/innies-qwen35-stability-test-report-2026-05-19.md) | Pre-fix stability baseline report |
+ | [`docs/innies-qwen35-*.md`](docs/) | Other qwen35 full-chain / single-user / early reports |
  | [`AGENTS.md`](AGENTS.md) | Required reading for contributors (Rust conventions, testing conventions) |
  | [`SECURITY.md`](SECURITY.md) | Security disclosure process |
 
  ---
 
  <div align="center">
- <sub>For issue reports and customization requests, contact <a href="https://zhiman.tech/">知满科技</a></sub>
- <br>
- <sub><i>Zhiman · intent in motion</i></sub>
+
+ <sub>· ✦ ·</sub>
+
+ <sub>Issue reports · customization requests → <a href="https://zhiman.tech/"><b>知满科技</b></a></sub>
+ <br>
+ <sub><i>Zhiman · intent in motion</i></sub>
+
  </div>
@@ -17,6 +17,19 @@ Do NOT invoke any implementation skill, write any code, scaffold any project, or
 
  Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
 
+ ## Anti-Pattern: "This Is Conceptual, Not An Implementation Task"
+
+ STOP. If you find yourself thinking any of the following — you are rationalizing your way out of the skill, and you MUST NOT act on the thought:
+
+ - "This is a conceptual question, not a code task — I should just answer directly."
+ - "The user is asking how X works / how to design X — they want an explanation, not collaboration."
+ - "Exploration is unnecessary for a design discussion — I'll just lay out the standard answer."
+ - "I can describe the three-layer / N-tier / standard architecture from memory; brainstorming is overkill."
+
+ Brainstorming exists **precisely** for design conversations. "How should I design X" is the canonical case, not the exception. The user said "我要设计 / I want to design / let's build" — that is an explicit invitation to collaborate, not a request for a textbook answer.
+
+ The correct first move is **always** the checklist below, starting with project context exploration and clarifying questions. Jumping straight to "## 1. Short-term memory ## 2. Long-term memory ## 3. Working memory" with no questions asked is the failure mode this skill is designed to prevent.
+
  ## Checklist
 
  You MUST create a task for each of these items and complete them in order:
@@ -93,6 +93,9 @@ These thoughts mean STOP—you're rationalizing:
  | "I'll just do this one thing first" | Check BEFORE doing anything. |
  | "This feels productive" | Undisciplined action wastes time. Skills prevent this. |
  | "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. |
+ | **"This is conceptual, not implementation"** | **Brainstorming IS for conceptual design talk. "How should I design X" is the canonical case, not the exception.** |
+ | **"The user wants an explanation, not collaboration"** | **If they wrote "我要设计 / I want to design / let's build" they invited collaboration. Treat it as such.** |
+ | **"I can give the standard textbook answer from memory"** | **Standard answers without context-aware clarification are exactly what brainstorming is designed to replace.** |
 
  ## Skill Priority
 
package/bin/innies.js CHANGED
File without changes
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@zhiman_innies/innies-codex",
- "version": "0.122.33",
+ "version": "0.122.35",
  "license": "Apache-2.0",
  "bin": {
  "innies": "bin/innies.js"
@@ -23,9 +23,9 @@
  "postinstall": "node bin/innies-init.js"
  },
  "optionalDependencies": {
- "@zhiman_innies/innies-codex-darwin-x64": "0.122.33-darwin-x64",
- "@zhiman_innies/innies-codex-darwin-arm64": "0.122.33-darwin-arm64",
- "@zhiman_innies/innies-codex-win32-x64": "0.122.33-win32-x64",
- "@zhiman_innies/innies-codex-win32-arm64": "0.122.33-win32-arm64"
+ "@zhiman_innies/innies-codex-darwin-x64": "0.122.35-darwin-x64",
+ "@zhiman_innies/innies-codex-darwin-arm64": "0.122.35-darwin-arm64",
+ "@zhiman_innies/innies-codex-win32-x64": "0.122.35-win32-x64",
+ "@zhiman_innies/innies-codex-win32-arm64": "0.122.35-win32-arm64"
  }
  }
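
The `package.json` hunk above bumps the main version and all four platform pins together, and lists prebuilt packages only for `darwin` and `win32`. Below is a minimal sketch of how that invariant could be checked. It assumes the pins follow the `<version>-<platform>-<arch>` pattern visible in the diff. The `manifest` object mirrors only the fields shown there, and `findMismatchedPins` and `supportedTargets` are hypothetical helpers, not part of the package.

```javascript
// Hypothetical consistency check for the manifest fields shown in the diff.
const manifest = {
  name: "@zhiman_innies/innies-codex",
  version: "0.122.35",
  optionalDependencies: {
    "@zhiman_innies/innies-codex-darwin-x64": "0.122.35-darwin-x64",
    "@zhiman_innies/innies-codex-darwin-arm64": "0.122.35-darwin-arm64",
    "@zhiman_innies/innies-codex-win32-x64": "0.122.35-win32-x64",
    "@zhiman_innies/innies-codex-win32-arm64": "0.122.35-win32-arm64",
  },
};

// Returns the names of optional dependencies whose pin is not
// exactly `<main version>-<platform>-<arch>`.
function findMismatchedPins(pkg) {
  const prefix = `${pkg.name}-`;
  return Object.entries(pkg.optionalDependencies ?? {})
    .filter(([depName, pinned]) => {
      if (!depName.startsWith(prefix)) return true; // unexpected package name
      const target = depName.slice(prefix.length); // e.g. "darwin-arm64"
      return pinned !== `${pkg.version}-${target}`;
    })
    .map(([depName]) => depName);
}

// Lists the `<platform>-<arch>` targets that have a platform package.
function supportedTargets(pkg) {
  const prefix = `${pkg.name}-`;
  return Object.keys(pkg.optionalDependencies ?? {})
    .filter((depName) => depName.startsWith(prefix))
    .map((depName) => depName.slice(prefix.length));
}

console.log(findMismatchedPins(manifest)); // prints []
console.log(supportedTargets(manifest).includes("linux-x64")); // prints false
```

In Node.js the current target can be formed as `${process.platform}-${process.arch}` and compared against `supportedTargets(manifest)`. Whether the actual launcher performs its platform check this way is not shown in the diff.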