mimo2codex 0.1.14 → 0.1.16
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +24 -5
- package/README.md +46 -6
- package/README.zh.md +46 -6
- package/dist/admin/router.js +117 -2
- package/dist/admin/router.js.map +1 -1
- package/dist/cli.js +67 -147
- package/dist/cli.js.map +1 -1
- package/dist/config.js +16 -10
- package/dist/config.js.map +1 -1
- package/dist/db/logs.js +80 -0
- package/dist/db/logs.js.map +1 -1
- package/dist/providers/generic.js +96 -0
- package/dist/providers/generic.js.map +1 -0
- package/dist/providers/genericLoader.js +229 -0
- package/dist/providers/genericLoader.js.map +1 -0
- package/dist/providers/mimo.js +31 -0
- package/dist/providers/mimo.js.map +1 -1
- package/dist/providers/registry.js +48 -10
- package/dist/providers/registry.js.map +1 -1
- package/dist/server.js +269 -15
- package/dist/server.js.map +1 -1
- package/dist/setup/snippets.js +187 -0
- package/dist/setup/snippets.js.map +1 -0
- package/dist/translate/reqToChat.js +1 -1
- package/dist/translate/reqToChat.js.map +1 -1
- package/dist/upstream/openaiCompatClient.js +32 -11
- package/dist/upstream/openaiCompatClient.js.map +1 -1
- package/dist/web/assets/index-D19ffnSJ.css +1 -0
- package/dist/web/assets/index-DPLJprJ4.js +67 -0
- package/dist/web/index.html +2 -2
- package/doc/generic-providers.md +399 -0
- package/doc/generic-providers.zh.md +399 -0
- package/mimoskill/SKILL.md +69 -8
- package/mimoskill/references/ocr_workflow.md +216 -0
- package/mimoskill/scripts/generate_image.py +163 -0
- package/mimoskill/scripts/ocr.py +396 -0
- package/package.json +5 -4
- package/dist/web/assets/index-BoykBCnY.js +0 -67
- package/dist/web/assets/index-DAJbSznk.css +0 -1
@@ -0,0 +1,399 @@
+# Generic OpenAI-Compatible Providers · Detailed Tutorial
+
+> [English](./generic-providers.md) · 中文
+>
+> Back to: [README 中文](../README.zh.md) · [README English](../README.md)
+
+mimo2codex ships with two built-in providers, MiMo and DeepSeek. The **generic provider mechanism** lets you hook any **OpenAI Chat Completions-compatible** or **native Responses API** upstream up to the new Codex without touching code or republishing the package: Qwen, GLM, Kimi, Zhipu, OpenAI itself, local vLLM, Ollama, LM Studio... anything whose API looks like OpenAI's will work.
+
+## What it solves
+
+The new Codex hard-requires `wire_api = "responses"`, while the vast majority of third-party models only expose Chat Completions. mimo2codex does that translation for you; all you need to do is register your upstream in the config.
+
+Two upstream wire protocols are supported:
+
+| `wireApi` | Upstream protocol | When to use |
+|---|---|---|
+| `chat` (default) | OpenAI Chat Completions | 99% of third-party vendors (Qwen / GLM / DeepSeek / Kimi / Ollama / vLLM ...) |
+| `responses` | OpenAI Responses API | The upstream natively supports Responses (e.g. OpenAI itself, or vendors that adopt it later). Pass-through mode, no protocol translation |
+
+`responses` pass-through has an extra benefit: **no waiting for mimo2codex to catch up when the protocol evolves**. Whatever fields the upstream adds get forwarded as-is instead of being dropped by a stale translation layer.
+
+## Up and running in 60 seconds
+
+**Simplest path**: three env vars and go.
+
+```bash
+export GENERIC_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
+export GENERIC_API_KEY=sk-your-qwen-key
+export GENERIC_DEFAULT_MODEL=qwen3-max
+mimo2codex --model generic
+```
+
+The startup banner shows `provider: generic` and `upstream: https://dashscope...`; then `mimo2codex print-config --model generic` prints the `auth.json + config.toml` snippets. Copy them into `~/.codex/` and you're done.
+
+> ⚠️ env-only mode can configure **one** upstream only. To run several at once, use the `providers.json` below.
+
+## Config-file mode (multiple instances, recommended)
+
+Write a `providers.json` with one entry per upstream. Default path:
+
+| OS | Path |
+|---|---|
+| macOS / Linux | `~/.mimo2codex/providers.json` |
+| Windows | `%USERPROFILE%\.mimo2codex\providers.json` |
+
+You can also point at it explicitly with `MIMO2CODEX_PROVIDERS_FILE=/some/path/providers.json`.
+
+Full example:
+
+```json
+{
+  "providers": [
+    {
+      "id": "qwen",
+      "shortcut": "qwen",
+      "displayName": "Qwen (DashScope)",
+      "baseUrl": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+      "envKey": "QWEN_API_KEY",
+      "defaultModel": "qwen3-max",
+      "wireApi": "chat",
+      "models": [
+        { "id": "qwen3-max", "contextWindow": 262144 },
+        { "id": "qwen3-coder-plus", "contextWindow": 1048576 }
+      ],
+      "features": { "forceParallelToolCalls": true }
+    },
+    {
+      "id": "kimi",
+      "shortcut": "kimi",
+      "displayName": "Kimi K2",
+      "baseUrl": "https://api.moonshot.cn/v1",
+      "envKey": "KIMI_API_KEY",
+      "defaultModel": "kimi-k2-0905-preview"
+    },
+    {
+      "id": "ollama",
+      "shortcut": "ol",
+      "displayName": "Ollama (local)",
+      "baseUrl": "http://127.0.0.1:11434/v1",
+      "envKey": "OLLAMA_API_KEY",
+      "defaultModel": "qwen2.5-coder:7b"
+    },
+    {
+      "id": "openai-native",
+      "displayName": "OpenAI (native Responses)",
+      "baseUrl": "https://api.openai.com/v1",
+      "envKey": "OPENAI_API_KEY",
+      "defaultModel": "gpt-5",
+      "wireApi": "responses"
+    }
+  ]
+}
+```
+
+Then start it:
+
+```bash
+export QWEN_API_KEY=sk-...
+export KIMI_API_KEY=sk-...
+mimo2codex --model qwen   # the default provider is qwen
+```
+
+`--model` accepts either an `id` or a `shortcut` (in the example above `qwen` is both; `ollama`'s shortcut is `ol`).
+
+## Field reference
+
+| Field | Required | Default | Description |
+|---|---|---|---|
+| `id` | ✓ | — | Unique identifier. `mimo` / `deepseek` are reserved. Alphanumeric / `-` / `_` only |
+| `displayName` | — | id | Name shown in the UI and in print-config |
+| `shortcut` | — | id | Short code for `--model <shortcut>` |
+| `baseUrl` | ✓ | — | Upstream base URL (do **not** include the `/chat/completions` suffix; mimo2codex appends it) |
+| `envKey` | ✓ | — | Environment variable to read the API key from (e.g. `QWEN_API_KEY`) |
+| `defaultModel` | ✓ | — | Fallback when the client sends no model field, or an unrecognized one |
+| `wireApi` | — | `"chat"` | `"chat"` or `"responses"`, see above |
+| `models` | — | `[]` | Declared model catalog for this provider (next section) |
+| `features.forceParallelToolCalls` | — | `false` | Force `parallel_tool_calls: true` (recommended for agentic coding tasks) |
+| `features.webSearch` | — | `false` | Pass Codex's `web_search` tool through to the upstream (only meaningful if the upstream supports a builtin web_search) |
+| `docsUrl` | — | — | Link shown in the "missing API key" error message |
+
+Each entry in `models[]`:
+
+| Field | Required | Description |
+|---|---|---|
+| `id` | ✓ | The upstream's real model id |
+| `aliases` | — | Alternate names the client might send; these also count as a routing match |
+| `displayName` | — | Name shown in the UI |
+| `contextWindow` | — | Becomes `model_context_window` in print-config |
+| `maxOutputTokens` | — | Becomes `model_max_output_tokens` in print-config |
+| `supportsImages` / `supportsReasoning` / `supportsWebSearch` | — | Metadata, for UI display |
+
+## Model recognition strategies
+
+`models[]` is **optional**. Two behaviors:
+
+**1. `models[]` declared (strict mode)**
+
+Only ids (and aliases) listed in `models[]` "belong to" this provider. `byClientModel` routing matches exactly against the list. If the client sends an id not on the list:
+
+- If this provider is the **default** provider → the model is rewritten to `defaultModel` and a `rewriteNotice` is logged
+- Otherwise it is a miss and the request falls back to the default provider
+
+Good when: you know which models you use, want `model_context_window` in the print-config output, and want a clean model list in the admin UI.
+
+**2. No `models[]` (open pass-through)**
+
+Whatever model id the client sends is forwarded to the upstream verbatim. **No rewriting**, **no errors**.
+
+Good when: the upstream's model list changes fast (Ollama, aggregators like OpenRouter), or you just want a "pipe" and don't want to edit config for every new model.
+
+> An open pass-through provider is **never** auto-matched by `byClientModel`; otherwise it would "swallow" every mimo / deepseek model id. To route to it, you must make it the default provider (`--model <id>`).
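The two strategies above can be sketched as a small matcher. This is illustrative TypeScript, not the package's actual code; `ProviderSpec` and both function names are simplified stand-ins:

```typescript
// Illustrative sketch of strict vs. open-catalog model matching as described above.
// ProviderSpec / ModelSpec are simplified stand-ins, not mimo2codex's real types.
interface ModelSpec { id: string; aliases?: string[] }
interface ProviderSpec { id: string; defaultModel: string; models?: ModelSpec[] }

// Strict mode: a client model id matches only if listed (by id or alias).
// Open catalog (no models[]): never auto-matched, so this returns false.
function matchesCatalog(p: ProviderSpec, clientModel: string): boolean {
  if (!p.models || p.models.length === 0) return false; // open pass-through skips auto-routing
  return p.models.some(m => m.id === clientModel || (m.aliases ?? []).includes(clientModel));
}

// What this provider would forward upstream, assuming it handles the request:
// strict mode rewrites misses to defaultModel, open catalog forwards verbatim.
function resolveModel(p: ProviderSpec, clientModel: string): string {
  if (!p.models || p.models.length === 0) return clientModel;
  return matchesCatalog(p, clientModel) ? clientModel : p.defaultModel;
}
```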
+
+## wireApi in depth
+
+**`chat`**: mimo2codex translates Codex's Responses request into Chat Completions, sends it to `${baseUrl}/chat/completions`, and translates the upstream response back into Responses for Codex.
+
+```
+Codex ──[Responses]──> mimo2codex ──[Chat]──> upstream ──[Chat]──> mimo2codex ──[Responses]──> Codex
+```
+
+**`responses`**: mimo2codex **forwards the request directly** to `${baseUrl}/responses` with no translation at all; the upstream response is returned verbatim too.
+
+```
+Codex ──[Responses]──> mimo2codex ──[Responses raw]──> upstream ──[Responses raw]──> mimo2codex ──> Codex
+```
+
+When to use `responses`:
+
+- The upstream is OpenAI itself
+- The upstream claims "full OpenAI Responses API compatibility"
+- The upstream has fields chat completions cannot express (e.g. `reasoning.effort`, `text.verbosity`, new tool types) that the translation layer would drop
+
+Caveats:
+
+- Streaming pass-through is a **byte-level pipe**: upstream SSE frames are forwarded to Codex untouched, and Codex's SSE parser does the framing. Low overhead, but it also means mimo2codex makes no modifications in the middle
+- The admin UI's per-model token stats currently only read the top level of the `usage` field on the `responses` path; complex usage breakdowns are not parsed
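A toy illustration of the `chat`-direction shape change. This is grossly simplified and hypothetical; the real translator in `src/translate/` also handles tools, streaming, and attachments, and `toChat` is not a real function in the package:

```typescript
// Toy sketch only: Responses-style request → Chat Completions-style request.
// The Responses `instructions` field becomes a leading system message;
// `input` items map onto `messages`. All other field handling is omitted.
interface ResponsesReq { model: string; instructions?: string; input: { role: string; content: string }[] }
interface ChatReq { model: string; messages: { role: string; content: string }[] }

function toChat(req: ResponsesReq): ChatReq {
  const messages = req.instructions
    ? [{ role: "system", content: req.instructions }, ...req.input]
    : [...req.input];
  return { model: req.model, messages };
}
```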
+
+## Some real upstream configs
+
+### Alibaba Qwen (DashScope OpenAI-compatible mode)
+
+```json
+{
+  "id": "qwen",
+  "displayName": "Qwen (DashScope)",
+  "baseUrl": "https://dashscope.aliyuncs.com/compatible-mode/v1",
+  "envKey": "QWEN_API_KEY",
+  "defaultModel": "qwen3-max",
+  "models": [
+    { "id": "qwen3-max", "contextWindow": 262144 },
+    { "id": "qwen3-coder-plus", "contextWindow": 1048576, "supportsReasoning": true }
+  ],
+  "features": { "forceParallelToolCalls": true }
+}
+```
+
+### Zhipu GLM
+
+```json
+{
+  "id": "glm",
+  "displayName": "Zhipu GLM-4.6",
+  "baseUrl": "https://open.bigmodel.cn/api/paas/v4",
+  "envKey": "ZHIPU_API_KEY",
+  "defaultModel": "glm-4.6",
+  "models": [
+    { "id": "glm-4.6", "contextWindow": 200000 }
+  ]
+}
+```
+
+### Moonshot Kimi
+
+```json
+{
+  "id": "kimi",
+  "displayName": "Kimi K2",
+  "baseUrl": "https://api.moonshot.cn/v1",
+  "envKey": "KIMI_API_KEY",
+  "defaultModel": "kimi-k2-0905-preview",
+  "models": [
+    { "id": "kimi-k2-0905-preview", "contextWindow": 256000 }
+  ]
+}
+```
+
+### Local Ollama / LM Studio (open pass-through)
+
+```json
+{
+  "id": "ollama",
+  "shortcut": "ol",
+  "displayName": "Ollama (local)",
+  "baseUrl": "http://127.0.0.1:11434/v1",
+  "envKey": "OLLAMA_API_KEY",
+  "defaultModel": "qwen2.5-coder:7b"
+}
+```
+
+Ollama doesn't validate API keys, but `envKey` is required by the schema; any value will do (`OLLAMA_API_KEY=ignored`).
+
+### OpenAI native Responses (pass-through)
+
+```json
+{
+  "id": "openai-native",
+  "displayName": "OpenAI (native Responses)",
+  "baseUrl": "https://api.openai.com/v1",
+  "envKey": "OPENAI_API_KEY",
+  "defaultModel": "gpt-5",
+  "wireApi": "responses"
+}
+```
+
+## Default provider and routing rules (important)
+
+With generic providers in play, routing priority is:
+
+1. **The client's model field matches some provider's `models[]` (aliases included) and that provider has a key** → route to that provider
+2. **Matched a catalog but the provider has no key** → fall through to the default provider; the model is rewritten to `defaultModel` and `client_model_rewritten` is logged
+3. **Providers without `models[]` (open catalog)** → skipped during auto-routing (so they can't "swallow" every unknown id); they are only routed to when explicitly made the default provider via `--model <id>`
+4. **No match at all** → the default provider handles it; the model is rewritten to `defaultModel` and `client_model_rewritten` is logged
+
+How the default provider is chosen:
+
+- `--model <id-or-shortcut>` takes precedence
+- otherwise the `MIMO2CODEX_DEFAULT_PROVIDER` environment variable
+- otherwise fall back to `"mimo"`
+
+### What actually happens when a key is missing
+
+A common pitfall: you configure qwen / kimi / glm as generics in `providers.json` but only set `MIMO_API_KEY` at startup. In that case:
+
+```bash
+# client sends qwen3-max
+# → byClientModel matches the qwen catalog
+# → qwen has no key → fall through
+# → default provider mimo takes it → model rewritten to mimo-v2.5-pro
+# → MiMo actually answers, using mimo-v2.5-pro
+```
+
+**The conversation gives no hint of this.** The admin "model mapping records" table does show the `qwen3-max → mimo-v2.5-pro` mapping, and the chat log carries the `client_model_rewritten` code. But unless you actively check the admin UI, it's easy to believe you're "on qwen" when you're actually "on mimo".
+
+Two ways to avoid this silent downgrade today:
+
+1. **Verify every key before starting**: the Provider cards on the admin home page clearly show "key detected / key not detected" per provider; set every key you intend to use
+2. **Start single-provider**: if you specifically want qwen, run `--model qwen` and remove the mimo key. Then a missing qwen key fails startup loudly instead of downgrading silently
+
+> Existing mimo / deepseek users are **completely unaffected**: without a providers.json the default provider is still mimo, and all behavior is byte-for-byte identical.
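The four-step priority above can be sketched as a hypothetical routing function (illustrative only; `Provider`, `hasKey`, and `route` are simplified stand-ins for the real registry code in src/providers/registry.ts):

```typescript
// Illustrative sketch of the routing priority list above.
interface Provider {
  id: string;
  defaultModel: string;
  models?: { id: string; aliases?: string[] }[];
  hasKey: boolean; // stand-in for the real envKey lookup
}

function route(clientModel: string, providers: Provider[], def: Provider) {
  for (const p of providers) {
    if (!p.models || p.models.length === 0) continue; // rule 3: open catalogs skip auto-routing
    const hit = p.models.some(m => m.id === clientModel || (m.aliases ?? []).includes(clientModel));
    if (!hit) continue;
    if (p.hasKey) return { provider: p, model: clientModel }; // rule 1: catalog hit with key
    break; // rule 2: matched a catalog but keyless → fall through to the default
  }
  // rules 2 & 4: default provider answers, model rewritten to its defaultModel
  return { provider: def, model: def.defaultModel, rewritten: true };
}
```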
+
+## Configuring in the admin web UI (no hand-written JSON)
+
+Open `http://127.0.0.1:8788/admin/`:
+
+- **Generic Providers page** (sidebar, [`/admin/providers`](http://127.0.0.1:8788/admin/providers)): visual CRUD for generic providers
+  - The table lists every entry in `providers.json`, each with "Edit" / "Delete"
+  - "+ Add Provider" opens a form; every field has placeholder hints and live validation (id must not clash with built-ins / no spaces / baseUrl required, etc.)
+  - The model list is dynamically editable; each model can carry contextWindow / maxOutputTokens / vision / reasoning / web search metadata
+  - An "Edit raw JSON" escape hatch edits the full `providers.json`; it is only written if validation passes
+  - Saving writes `~/.mimo2codex/providers.json` and the UI prompts **"restart mimo2codex to apply"**: there is no hot reload yet, config loads once at startup
+- **Setup guide page** ([`/admin/setup`](http://127.0.0.1:8788/admin/setup)): pick a provider from the dropdown and three tabs render the `auth.json + config.toml` snippets in all three paste styles (direct edit / env-key / cc-switch), each code block with a "Copy" button
+- **Overview page**: every registered provider (generics included) appears on the Provider cards, showing whether its key is configured
+- **Logs page**: filter by provider (generic ids appear directly in the dropdown)
+
+> Note: the UI **cannot manage API keys**. Keys are never stored in the database or written to config files; they must be injected via environment variables (e.g. `QWEN_API_KEY=sk-...`). This keeps credentials off disk and out of backups and leaks. The UI owns the schema config; env owns the secrets.
+
+## CLI subcommand support for generics
+
+```bash
+mimo2codex print-config --model qwen              # qwen's auth.json + config.toml snippets
+mimo2codex print-config --model qwen --env-key    # env-key variant (Codex CLI only)
+mimo2codex print-cc-switch --model qwen           # cc-switch custom-vendor snippet
+```
+
+Naming rules for `model_provider` in the toml output:
+
+- mimo → `[model_providers.mimo]` (kept for historical compatibility)
+- deepseek → `[model_providers.mimo2codex]` (kept for historical compatibility)
+- other generics → `[model_providers.mimo2codex-<id>]` (prefixed to avoid clashing with sections already in the user's toml)
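So for the `qwen` provider above, the generated section would be named like this. Illustrative fragment only; the field values shown are assumptions, and the exact keys mimo2codex emits may differ:

```toml
# Illustrative shape of the generated section for a generic provider "qwen".
# Values below are assumed for illustration; base_url points at the local
# mimo2codex proxy, which Codex speaks Responses to.
[model_providers.mimo2codex-qwen]
name = "Qwen (DashScope)"
base_url = "http://127.0.0.1:8788/v1"
wire_api = "responses"
```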
+
+## Troubleshooting
+
+<details>
+<summary><b>Error: <code>provider id "xxx" must be alphanumeric + dash/underscore</code></b></summary>
+
+The `id` field only allows alphanumerics, `-`, and `_`; no spaces, dots, or slashes. Use something like `kimi`, `my-qwen`, `local_dev`.
+
+</details>
+
+<details>
+<summary><b>Error: <code>generic provider id "mimo" conflicts with a built-in provider</code></b></summary>
+
+`mimo` and `deepseek` are reserved ids. Use something like `mimo-custom`.
+
+</details>
+
+<details>
+<summary><b>Error: <code>missing API key for ...</code> even though the env is clearly set</b></summary>
+
+Check:
+1. The env var name matches the spec's `envKey` exactly (case-sensitive)
+2. You're in the right shell: `$env:X` set in PowerShell isn't visible in cmd, and vice versa
+3. Whether the key you set belongs to the provider named by `MIMO2CODEX_DEFAULT_PROVIDER` (the default must have a key, or startup fails)
+
+</details>
+
+<details>
+<summary><b>The startup banner doesn't show my generic provider</b></summary>
+
+- The banner only lists providers **with an API key**. Check that the `envKey` variable is set
+- Check the providers.json path: is it in `~/.mimo2codex/`, or pointed at explicitly via `MIMO2CODEX_PROVIDERS_FILE`?
+- A JSON syntax error fails startup with a printed error, never silently
+
+</details>
+
+<details>
+<summary><b>Routing isn't doing what I expect: sent qwen3-max but landed on mimo</b></summary>
+
+If your generic provider declares no `models[]`, it is **not** auto-matched by `byClientModel`. Two options:
+- Add `models: [{ "id": "qwen3-max" }]` to the spec (recommended)
+- Or make the generic the default provider: `mimo2codex --model qwen`
+
+</details>
+
+<details>
+<summary><b>Upstream returns 400 complaining about an unknown reasoning / thinking field</b></summary>
+
+Non-MiMo upstreams usually don't accept MiMo's proprietary `thinking` field. Generic providers already strip such fields by default. If it still fails, **use `--verbose` to inspect the actual forwarded body**: the Codex client may be sending something else, which is a Codex-side compatibility issue, not the proxy's.
+
+</details>
+
+<details>
+<summary><b>wireApi: "responses" upstream returns 404 / 405</b></summary>
+
+The upstream probably doesn't implement a `/v1/responses` endpoint at all. Most third-party vendors currently only offer `/v1/chat/completions`; switch `wireApi` back to `"chat"` (or delete it, chat is the default).
+
+</details>
+
+<details>
+<summary><b>The same id appears twice in providers.json</b></summary>
+
+Startup errors out and exits. Every id must be unique.
+
+</details>
+
+## Design trade-off notes
+
+- **Why is the default provider still mimo?** Backward compatibility. Existing mimo / deepseek users see zero behavior change after upgrading to a generic-capable version
+- **Why don't open-catalog generics participate in `byClientModel`?** They would "swallow" every unknown model id, including legitimate mimo / deepseek ids. Make an open-catalog generic the default provider if you want an "all-models pass-through"
+- **Why the `mimo2codex-` prefix on toml provider keys?** The user's `~/.codex/config.toml` may already contain a `[model_providers.qwen]` (an old direct-to-Qwen config); the prefix avoids clobbering it
+- **Why no visual editing in the admin UI at first?** The first cut focused on making it work; a form-based UI can be added without breaking the architecture (providers.json is an ordinary config file either way)
+
+## Related source
+
+- [src/providers/generic.ts](../src/providers/generic.ts) — factory function
+- [src/providers/genericLoader.ts](../src/providers/genericLoader.ts) — config loading + env fallback
+- [src/providers/registry.ts](../src/providers/registry.ts) — runtime registration + routing guards
+- [src/upstream/openaiCompatClient.ts](../src/upstream/openaiCompatClient.ts) — the chat / responses upstream clients
+- [src/server.ts](../src/server.ts) — the wireApi branch in `handleResponses`
+- [test/providers.generic.test.ts](../test/providers.generic.test.ts) — 18 test cases
package/mimoskill/SKILL.md
CHANGED
@@ -1,6 +1,6 @@
 ---
 name: mimoskill
-description: Use Xiaomi MiMo V2.5 (the LLM behind mimo2codex) for chat, vision, web search, TTS and ASR — and route around capabilities MiMo doesn't natively support, especially image generation
+description: Use Xiaomi MiMo V2.5 (the LLM behind mimo2codex) for chat, vision, web search, TTS and ASR — and route around capabilities MiMo doesn't natively support, especially OCR / image recognition / 识图 / 提取图片文字 / extract text from image when the current model can't see images, and image generation / 图像生成 / 生成图片 / draw a picture / 画一张 including Codex Pets `/hatch`. Trigger when the user mentions MiMo, calls into mimo2codex, asks to read text from an image, asks to describe or 识别 an image while using a non-vision model (mimo-v2.5-pro, mimo-v2-flash, …), asks to generate / hatch a Codex pet, asks for image generation while using MiMo as the chat backend, or hits a "no image generation available" / "image_gen tool unavailable" / "this model does not support image input" message inside Codex.
 ---
 
 # mimoskill — Xiaomi MiMo V2.5 + gap fillers
@@ -18,6 +18,8 @@ Trigger this skill when:
 - User asks "how do I generate a Codex pet" / "/hatch isn't working" / "image_gen tool not available"
 - User wants image generation as part of a MiMo-backed workflow
 - User pastes the Codex error: `the image generation tool (image_gen) is not available in this environment` or `the CLI fallback requires the openai Python package`
+- User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, or any third-party model without vision) — use `scripts/ocr.py` to fall back through `mimo-v2.5` without changing the chat model
+- User sees the proxy's `[N image attachment(s) omitted: this model does not support image input …]` placeholder in their transcript
 - Anything in the `mimo2codex` repo that touches a feature MiMo doesn't support
 
 ## What MiMo V2.5 does and doesn't do
@@ -35,7 +37,8 @@ Quick answer:
 | ASR (speech recog) | ✅ | `mimo-v2.5-asr` | separate endpoint |
 | Audio chat | ✅ | `mimo-v2-omni` | input only |
 | Video understanding | ✅ | `mimo-v2-omni` | input only |
-| **Image generation** | ❌ | — |
+| **Image generation** | ❌ | — | `scripts/generate_image.py` (general) or `scripts/generate_pet.py` (Codex pets) — see below |
+| OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` | `scripts/ocr.py` | always uses `mimo-v2.5` internally regardless of chat model |
 | Code interpreter / sandbox | ❌ | — | not provided |
 
 For the full capability matrix and examples, read [references/models.md](references/models.md).
@@ -43,13 +46,18 @@
 ## Decision tree: what does the user actually want?
 
 ```
-Is it
-
-
+Is it OCR / read text from image / describe / 识别 an image
+when the active chat model is non-vision?
+├── Yes → use scripts/ocr.py (always routes through mimo-v2.5 internally)
+└── No
   │
-Is it
-├── Yes → see "
-└── No
+  Is it chat / vision / search / TTS / ASR with a vision-capable model?
+  ├── Yes → use MiMo directly (see "Calling MiMo directly" below) or via mimo2codex if Codex is the client
+  └── No, they want image generation
+    │
+    Is it for a Codex pet (`/hatch`)?
+    ├── Yes → see "Generating a Codex pet" below (scripts/generate_pet.py + install_pet.sh)
+    └── No → see "General (non-pet) image generation" below (scripts/generate_image.py)
 ```
 
 ## Calling MiMo directly
@@ -68,6 +76,59 @@ The script handles all the MiMo-specific quirks — `max_completion_tokens` inst
 
 For non-trivial integrations, [references/models.md](references/models.md) and [the official MiMo OpenAI-compat doc](https://platform.xiaomimimo.com/docs/api/chat/openai-api) are the authoritative references.
 
+## OCR / image recognition (when the chat model can't see images)
+
+If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, or any third-party model without vision), invoke `scripts/ocr.py`. It always uses `mimo-v2.5` internally — the chat model stays untouched.
+
+The proxy silently drops image attachments on non-vision models (`src/translate/reqToChat.ts:48-72`) and leaves a `[N image attachment(s) omitted: …]` placeholder. **When you see that placeholder in the transcript, the right move is to run ocr.py and feed the text back into the conversation.** Don't ask the user to switch models.
+
+```bash
+export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+
+# verbatim OCR (default)
+python3 mimoskill/scripts/ocr.py path/to/image.png
+
+# 2-4 sentence description
+python3 mimoskill/scripts/ocr.py --mode describe https://example.com/x.png
+
+# structured JSON (text + regions + language + summary)
+python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
+
+# re-render as GitHub-flavored Markdown (good for forms / receipts)
+cat scan.png | python3 mimoskill/scripts/ocr.py --mode markdown
+```
+
+`ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one MiMo call. Non-vision `--model` values are auto-coerced to `mimo-v2.5` with one stderr note.
+
+See [references/ocr_workflow.md](references/ocr_workflow.md) for the full mode reference, exit codes, the JSON shape for `--mode structured`, and the `--lang` / `--prompt` knobs.
+
+## General (non-pet) image generation
+
+For arbitrary image generation, use `scripts/generate_image.py` — a thin wrapper over `generate_pet.py` with the chibi-pet prompt boilerplate removed and an optional `--style` for common looks. Same providers (`auto` / `pollinations` / `gpt-image-1` / `replicate` / `local-sd`), same env vars, same `auto` fallback to free Pollinations when you only have a MiMo key.
+
+```bash
+# free, no key
+python3 mimoskill/scripts/generate_image.py \
+  --prompt "isometric cyberpunk city at dusk" --out /tmp/out.png
+
+# with a style preset
+python3 mimoskill/scripts/generate_image.py --style pixel-art \
+  --prompt "a brave knight" --out /tmp/knight.png
+
+# multiple variants -> /tmp/img-1.png /tmp/img-2.png /tmp/img-3.png /tmp/img-4.png
+python3 mimoskill/scripts/generate_image.py --n 4 \
+  --prompt "watercolor desert sunrise" --out /tmp/img.png
+
+# best quality (needs PET_OPENAI_API_KEY — same env var as the pet flow)
+export PET_OPENAI_API_KEY=sk-real-openai-key
+python3 mimoskill/scripts/generate_image.py --provider gpt-image-1 \
+  --prompt "..." --out /tmp/out.png
+```
+
+`--style` choices: `plain` (default, no prefix), `pixel-art`, `photo`, `3d-render`, `line-art`, `watercolor`, `sticker`. `plain` sends your prompt verbatim — pick that when the user gave a fully-specified prompt.
+
+For **Codex `/hatch` pets**, keep using `generate_pet.py` + `install_pet.sh` — that flow is unchanged and tuned for the chibi sprite + 3-state bundle Codex wants.
+
 ## Generating a Codex pet (the `/hatch` alternative)
 
 **Why this needs special handling**: Codex's built-in `/hatch` pet generation requires OpenAI's image generation API (`gpt-image-1`). MiMo doesn't have an image generation endpoint, and mimo2codex can't fake one. So `/hatch` from inside Codex won't work when Codex is pointed at MiMo.
|