mimo2codex 0.1.16 → 0.1.18

@@ -18,7 +18,7 @@ Trigger this skill when:
  - User asks "how do I generate a Codex pet" / "/hatch isn't working" / "image_gen tool not available"
  - User wants image generation as part of a MiMo-backed workflow
  - User pastes the Codex error: `the image generation tool (image_gen) is not available in this environment` or `the CLI fallback requires the openai Python package`
- - User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, or any third-party model without vision) — use `scripts/ocr.py` to fall back through `mimo-v2.5` without changing the chat model
+ - User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model) — use `scripts/ocr.py`. Works with or without a MiMo key (free pollinations fallback when `MIMO_API_KEY` is unset).
  - User sees the proxy's `[N image attachment(s) omitted: this model does not support image input …]` placeholder in their transcript
  - Anything in the `mimo2codex` repo that touches a feature MiMo doesn't support
 
@@ -38,7 +38,7 @@ Quick answer:
  | Audio chat | ✅ | `mimo-v2-omni` | input only |
  | Video understanding | ✅ | `mimo-v2-omni` | input only |
  | **Image generation** | ❌ | — | `scripts/generate_image.py` (general) or `scripts/generate_pet.py` (Codex pets) — see below |
- | OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` | `scripts/ocr.py` | always uses `mimo-v2.5` internally regardless of chat model |
+ | OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` or free pollinations | `scripts/ocr.py` | `--engine auto`: mimo if `MIMO_API_KEY` set, else pollinations (no key) |
  | Code interpreter / sandbox | ❌ | — | not provided |
 
  For the full capability matrix and examples, read [references/models.md](references/models.md).
@@ -48,7 +48,7 @@ For the full capability matrix and examples, read [references/models.md](referen
  ```
  Is it OCR / read text from image / describe / 识别 an image
  when the active chat model is non-vision?
- ├── Yes → use scripts/ocr.py (always routes through mimo-v2.5 internally)
+ ├── Yes → use scripts/ocr.py (mimo-v2.5 if MIMO_API_KEY set, else free pollinations)
  └── No
 
  Is it chat / vision / search / TTS / ASR with a vision-capable model?
@@ -60,45 +60,51 @@ when the active chat model is non-vision?
  └── No → see "General (non-pet) image generation" below (scripts/generate_image.py)
  ```
 
- ## Calling MiMo directly
+ ## Calling chat directly (works without any key)
 
- Use `scripts/mimo_chat.py` to send a single chat completion (or stream):
+ Use `scripts/mimo_chat.py` for one-shot or streaming chat. There are two engines: `--engine auto` (the default) picks `mimo` if `MIMO_API_KEY` is set, else `pollinations` (free, no key) — so **the script works without any key** for text and vision.
 
  ```bash
+ # Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
+ python3 mimoskill/scripts/mimo_chat.py "your prompt here"
+ python3 mimoskill/scripts/mimo_chat.py --image https://example.com/x.png "describe this"
+
+ # Best quality + MiMo-specific features (web search, TTS, ASR)
  export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
  python3 mimoskill/scripts/mimo_chat.py "your prompt here"
- python3 mimoskill/scripts/mimo_chat.py --model mimo-v2.5 --image https://example.com/x.png "describe this"
- python3 mimoskill/scripts/mimo_chat.py --search "今天上海天气?"
+ python3 mimoskill/scripts/mimo_chat.py "今天上海天气?"   # web search auto-enabled on sk-* keys
  python3 mimoskill/scripts/mimo_chat.py --stream "tell me a story"
  ```
 
- The script handles all the MiMo-specific quirks — `max_completion_tokens` instead of `max_tokens`, the required `text` part next to `image_url`, web_search plugin invocation, `reasoning_content` round-tripping, etc.
+ When the mimo engine is active, the script handles all MiMo-specific quirks — `max_completion_tokens` instead of `max_tokens`, the required `text` part next to `image_url`, `reasoning_content` round-tripping, etc. **Web search is auto-enabled on pay-as-you-go (`sk-*`) keys** — the `web_search` builtin is always included in the tools array and the model decides when to invoke it (`tool_choice: "auto"`). Token-plan (`tp-*`) keys skip web search (that endpoint doesn't support it). The pollinations engine doesn't support web search, TTS, or ASR (those are MiMo-native features) and uses the OpenAI-compat field name `max_tokens` instead.
 
  For non-trivial integrations, [references/models.md](references/models.md) and [the official MiMo OpenAI-compat doc](https://platform.xiaomimimo.com/docs/api/chat/openai-api) are the authoritative references.
 
  ## OCR / image recognition (when the chat model can't see images)
 
- If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, or any third-party model without vision), invoke `scripts/ocr.py`. It always uses `mimo-v2.5` internally the chat model stays untouched.
+ If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, `deepseek-*`, or any third-party text-only model), invoke `scripts/ocr.py`. There are two engines; `--engine auto` (the default) picks the right one:
+
+ - **`mimo`** — needs `MIMO_API_KEY`, uses `mimo-v2.5` regardless of the chat model. Best quality.
+ - **`pollinations`** — free public vision endpoint at `text.pollinations.ai`, **no key required**. Mirrors the same no-key fallback `generate_pet.py` uses. Rate-limited but always available — covers users who only have a DeepSeek key (or no key at all).
 
  The proxy silently drops image attachments on non-vision models (`src/translate/reqToChat.ts:48-72`) and leaves a `[N image attachment(s) omitted: …]` placeholder. **When you see that placeholder in the transcript, the right move is to run ocr.py and feed the text back into the conversation.** Don't ask the user to switch models.
 
  ```bash
- export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
-
- # verbatim OCR (default)
+ # Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
  python3 mimoskill/scripts/ocr.py path/to/image.png
-
- # 2-4 sentence description
  python3 mimoskill/scripts/ocr.py --mode describe https://example.com/x.png
-
- # structured JSON (text + regions + language + summary)
  python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
-
- # re-render as GitHub-flavored Markdown (good for forms / receipts)
  cat scan.png | python3 mimoskill/scripts/ocr.py --mode markdown
+
+ # Best quality — set MiMo key, auto picks mimo
+ export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+ python3 mimoskill/scripts/ocr.py path/to/image.png
+
+ # Force the free engine even when you have a MiMo key (e.g. to save quota)
+ python3 mimoskill/scripts/ocr.py --engine pollinations form.png
  ```
 
- `ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one MiMo call. Non-vision `--model` values are auto-coerced to `mimo-v2.5` with one stderr note.
+ `ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. It magic-byte sniffs the MIME type (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one upstream call. Non-vision `--model` values are auto-coerced to `mimo-v2.5` with one stderr note (mimo engine only; on pollinations use `--pollinations-model`).
 
  See [references/ocr_workflow.md](references/ocr_workflow.md) for full mode reference, exit codes, JSON shape for `--mode structured`, and the `--lang` / `--prompt` knobs.
 
@@ -1,26 +1,32 @@
  # OCR / image recognition workflow
 
  `mimoskill/scripts/ocr.py` is the fallback path for reading or describing
- images when the surrounding chat model can't see them. It always calls
- `mimo-v2.5` (MiMo's vision-capable model) internally, regardless of which
- model the rest of the conversation is using.
+ images when the surrounding chat model can't see them. Two engines:
+
+ | Engine | Needs API key? | Quality | Notes |
+ |---|---|---|---|
+ | `mimo` | yes (`MIMO_API_KEY`) | best | Calls `mimo-v2.5` regardless of the chat model used elsewhere. |
+ | `pollinations` | **no** | decent | Free public endpoint at `text.pollinations.ai`. Rate-limited but no signup. |
+
+ `--engine auto` (default) picks `mimo` if `MIMO_API_KEY` is set, else falls
+ back to `pollinations` so users with only a DeepSeek key (or no key at all)
+ still get OCR.
 
  ## TL;DR
 
  ```bash
- export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
-
- # default mode (text) — verbatim OCR
+ # Zero-setup — uses free pollinations fallback when MIMO_API_KEY is unset
  python3 mimoskill/scripts/ocr.py path/to/image.png
-
- # describe the image in 2-4 sentences
  python3 mimoskill/scripts/ocr.py --mode describe path/to/image.png
-
- # structured JSON (text + regions + language + summary)
  python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
-
- # re-render as GitHub-flavored Markdown
  python3 mimoskill/scripts/ocr.py --mode markdown form.png
+
+ # Force the free engine even when you have a MiMo key (e.g. to save quota)
+ python3 mimoskill/scripts/ocr.py --engine pollinations form.png
+
+ # Best quality — set MiMo key
+ export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+ python3 mimoskill/scripts/ocr.py path/to/image.png   # auto -> mimo
  ```
 
  ## Why this skill exists
@@ -161,21 +167,39 @@ silently (one stderr line) rather than failing.
 
  ## When `MIMO_API_KEY` isn't set
 
- `ocr.py` exits with code `3` and this stderr message:
+ `--engine auto` (the default) silently falls back to `pollinations`:
 
  ```
- error: MIMO_API_KEY is not set; ocr.py needs MiMo V2.5 vision to read images.
-   set one at https://platform.xiaomimimo.com/#/console/api-keys
-   OR if you want fully-local OCR with no API key, install tesseract:
-     macOS:   brew install tesseract tesseract-lang
-     Ubuntu:  sudo apt install tesseract-ocr tesseract-ocr-chi-sim
-     Windows: https://github.com/UB-Mannheim/tesseract/wiki
-   then run: tesseract <image> - -l eng+chi_sim
-   (tesseract is NOT installed or invoked by this skill; this is just a pointer.)
+ [engine] auto -> pollinations (free, no key). Set MIMO_API_KEY for higher quality (mimo-v2.5).
+ [ocr] engine=pollinations mode=text model=openai images=1
+ <extracted text>
+ ```
+
+ Exit code `3` is only raised when the user explicitly passes `--engine mimo`
+ without a key (passing the flag is treated as an assertion that MiMo should
+ be used; auto-falling-back would mask the misconfiguration).
+
+ If you'd rather use **fully-local OCR** with no network at all, install
+ tesseract and shell to it directly — this skill won't auto-invoke it:
+
+ ```bash
+ macOS:   brew install tesseract tesseract-lang
+ Ubuntu:  sudo apt install tesseract-ocr tesseract-ocr-chi-sim
+ Windows: https://github.com/UB-Mannheim/tesseract/wiki
+ tesseract <image> - -l eng+chi_sim
  ```
 
- The tesseract pointer is **just a pointer** — this skill never auto-shells
- to it. Keeps the dependency surface predictable.
+ ## Pollinations specifics
+
+ - Endpoint: `https://text.pollinations.ai/openai` (OpenAI Chat Completions
+   compatible).
+ - Default model: `openai` (vision-capable). Override with
+   `--pollinations-model <name>` or `POLLINATIONS_MODEL=<name>`. Other
+   vision-capable picks include `openai-large`, `openai-fast`.
+ - No `Authorization` header is sent; the service is open. Rate limits apply
+   per-IP; if you hit them you'll see HTTP 429 in stderr — wait or retry.
+ - `reasoning_content` is normally empty for pollinations responses (the
+   underlying models don't expose chain-of-thought).
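Per the bullets above, a pollinations call is a plain OpenAI-style Chat Completions POST with no auth header. A minimal sketch (request construction only, no network; the endpoint and default model `openai` are as documented above):

```python
import json
import urllib.request

POLLINATIONS_URL = "https://text.pollinations.ai/openai"


def build_pollinations_request(prompt: str, image_url: str) -> urllib.request.Request:
    """OpenAI-compat body: note max_tokens (not max_completion_tokens)
    and the deliberate absence of an Authorization header."""
    body = {
        "model": "openai",  # free, vision-capable default
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        POLLINATIONS_URL,
        method="POST",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # no Authorization
    )
```

Sending it is then just `urllib.request.urlopen(req)`; an HTTP 429 here means the per-IP rate limit was hit.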
 
  ## Common pitfalls
 
@@ -194,9 +218,9 @@ to it. Keeps the dependency surface predictable.
  | Code | Meaning |
  |---|---|
  | 0 | Success |
- | 1 | MiMo HTTP error (error body printed to stderr) |
+ | 1 | Upstream HTTP error (MiMo or Pollinations; error body printed to stderr) |
  | 2 | argv / usage error (no image, mutually exclusive flags, etc.) |
- | 3 | `MIMO_API_KEY` not set |
+ | 3 | `--engine mimo` explicitly requested but `MIMO_API_KEY` not set |
  | 4 | Local image file not found / unreadable |
 
  ## Composing with `mimo_chat.py`
@@ -1,21 +1,29 @@
  #!/usr/bin/env python3
  """
- mimo_chat.py — single-shot or streaming chat with Xiaomi MiMo V2.5.
+ mimo_chat.py — single-shot or streaming chat. Works WITHOUT any API key.
 
- Hits MiMo's OpenAI-compatible /v1/chat/completions endpoint directly. Handles
- the MiMo-specific quirks:
+ Engines (--engine):
+   auto (default) — mimo if MIMO_API_KEY set, else pollinations
+   mimo           — Xiaomi MiMo V2.5 (best quality, needs MIMO_API_KEY)
+   pollinations   — pollinations.ai free public chat endpoint. NO KEY REQUIRED
 
+ When the mimo engine is used, handles the MiMo-specific quirks:
  - max_completion_tokens (not max_tokens)
  - vision via mimo-v2.5 / mimo-v2-omni (and the required text part next to
    image_url, otherwise MiMo 400s with "text is not set")
- - web_search builtin tool (requires Web Search Plugin activated in console)
+ - web_search builtin: auto-enabled on pay-as-you-go (sk-*) keys, skipped on
+   token-plan (tp-*) keys. Model decides when to invoke (tool_choice: auto).
+   Requires the Web Search Plugin to be activated in the MiMo console.
  - reasoning_content extraction
 
  Usage:
-   export MIMO_API_KEY=sk-xxxx
+   # Zero-setup
    python3 mimo_chat.py "your prompt"
-   python3 mimo_chat.py --model mimo-v2.5 --image https://x/y.png "describe"
-   python3 mimo_chat.py --search "今天上海天气?"
+   python3 mimo_chat.py --image https://x/y.png "describe"
+
+   # MiMo key — gets best quality + native web search (when sk-*)
+   export MIMO_API_KEY=sk-xxxx
+   python3 mimo_chat.py "今天上海天气?"
    python3 mimo_chat.py --stream "tell me a story"
 
  Only depends on the standard library — no `openai` SDK install needed.
@@ -48,51 +56,64 @@ def build_messages(prompt: str, image: str | None) -> list[dict[str, Any]]:
      ]
 
 
+ POLLINATIONS_URL = "https://text.pollinations.ai/openai"
+ POLLINATIONS_DEFAULT_MODEL = "openai"  # vision-capable, free, no key
+
+
  def build_body(
      *,
      prompt: str,
      image: str | None,
      model: str,
      stream: bool,
-     search: bool,
+     enable_web_search: bool,
      max_tokens: int,
      temperature: float,
+     engine: str,
  ) -> dict[str, Any]:
      body: dict[str, Any] = {
          "model": model,
          "messages": build_messages(prompt, image),
-         "max_completion_tokens": max_tokens,
          "temperature": temperature,
          "stream": stream,
      }
-     if search:
-         # MiMo native web_search builtin. Requires the Web Search Plugin to
-         # be activated at https://platform.xiaomimimo.com/#/console/plugin.
-         body["tools"] = [{"type": "web_search", "force_search": True}]
+     if engine == "mimo":
+         # MiMo's quirk: max_completion_tokens, not max_tokens.
+         body["max_completion_tokens"] = max_tokens
+     else:
+         body["max_tokens"] = max_tokens
+     if enable_web_search:
+         # MiMo native web_search builtin. The model decides whether to invoke
+         # it (tool_choice=auto). Requires the Web Search Plugin to be
+         # activated at https://platform.xiaomimimo.com/#/console/plugin —
+         # without that, MiMo returns 400 and the error body is printed.
+         body["tools"] = [{"type": "web_search"}]
          body["tool_choice"] = "auto"
      return body
 
 
- def post(url: str, body: dict[str, Any], api_key: str, stream: bool) -> Any:
+ def post(url: str, body: dict[str, Any], api_key: str | None, stream: bool, *, engine: str) -> Any:
+     headers = {
+         "Content-Type": "application/json",
+         "Accept": "text/event-stream" if stream else "application/json",
+         "User-Agent": "mimoskill/0.1",
+     }
+     if api_key:
+         headers["Authorization"] = f"Bearer {api_key}"
      req = urllib.request.Request(
          url,
          method="POST",
          data=json.dumps(body).encode("utf-8"),
-         headers={
-             "Content-Type": "application/json",
-             "Accept": "text/event-stream" if stream else "application/json",
-             "Authorization": f"Bearer {api_key}",
-             "User-Agent": "mimoskill/0.1",
-         },
+         headers=headers,
      )
      try:
          return urllib.request.urlopen(req, timeout=300)
      except urllib.error.HTTPError as e:
          snippet = e.read().decode("utf-8", "replace")
-         sys.stderr.write(f"MiMo returned HTTP {e.code}: {snippet}\n")
+         sys.stderr.write(f"{engine} returned HTTP {e.code}: {snippet}\n")
          sys.exit(1)
      except urllib.error.URLError as e:
-         sys.stderr.write(f"connection failed: {e}\n")
+         sys.stderr.write(f"connection failed ({engine}): {e}\n")
          sys.exit(1)
 
 
@@ -144,51 +165,99 @@ def main() -> None:
      p.add_argument("prompt", nargs="?", default="", help="user message text")
      p.add_argument("--model", default=os.environ.get("MIMO_MODEL", "mimo-v2.5-pro"))
      p.add_argument("--image", help="image URL to attach (forces vision-capable model)")
-     p.add_argument("--search", action="store_true", help="enable MiMo web_search builtin")
      p.add_argument("--stream", action="store_true", help="stream the response")
      p.add_argument("--max-tokens", type=int, default=2048)
      p.add_argument("--temperature", type=float, default=0.7)
+     p.add_argument(
+         "--engine",
+         choices=["auto", "mimo", "pollinations"],
+         default=os.environ.get("MIMO_CHAT_ENGINE", "auto"),
+         help="chat backend. auto = mimo if MIMO_API_KEY set, else pollinations "
+         "(free, no key required). default: %(default)s",
+     )
      p.add_argument(
          "--base-url",
          default=os.environ.get("MIMO_BASE_URL", "https://api.xiaomimimo.com/v1"),
-         help="set to https://token-plan-cn.xiaomimimo.com/v1 for tp-* keys",
+         help="MiMo endpoint, ignored when --engine=pollinations "
+         "(tp-* keys use https://token-plan-cn.xiaomimimo.com/v1)",
+     )
+     p.add_argument(
+         "--pollinations-model",
+         default=os.environ.get("POLLINATIONS_MODEL", POLLINATIONS_DEFAULT_MODEL),
+         help="model id when --engine=pollinations (default: %(default)s)",
      )
      args = p.parse_args()
 
      api_key = os.environ.get("MIMO_API_KEY")
-     if not api_key:
-         sys.stderr.write("error: MIMO_API_KEY not set in environment\n")
-         sys.stderr.write(
-             "  get one at https://platform.xiaomimimo.com/#/console/api-keys\n"
-         )
-         sys.exit(2)
+
+     # Resolve engine.
+     if args.engine == "mimo":
+         engine = "mimo"
+         if not api_key:
+             sys.stderr.write(
+                 "error: --engine mimo requires MIMO_API_KEY.\n"
+                 "  get one at https://platform.xiaomimimo.com/#/console/api-keys\n"
+                 "  OR drop the flag to fall back to pollinations (free, no key required):\n"
+                 "    python3 mimo_chat.py <prompt>\n"
+             )
+             sys.exit(3)
+     elif args.engine == "pollinations":
+         engine = "pollinations"
+     else:  # auto
+         engine = "mimo" if api_key else "pollinations"
+         if engine == "pollinations":
+             sys.stderr.write(
+                 "[engine] auto -> pollinations (free, no key). "
+                 "Set MIMO_API_KEY for higher quality (mimo-v2.5).\n"
+             )
 
      if not args.prompt and not args.image:
          sys.stderr.write("error: pass a prompt and/or --image\n")
          sys.exit(2)
 
-     # Auto-bump to a vision model if user passed --image with a non-vision model
-     model = args.model
-     if args.image and "omni" not in model.lower() and not model.startswith("mimo-v2.5["):
-         if model != "mimo-v2.5":
-             sys.stderr.write(
-                 f"note: --image given but model is '{model}' which doesn't see images.\n"
-                 f"      switching to mimo-v2.5 for this call.\n"
-             )
-         model = "mimo-v2.5"
+     enable_web_search = False
+     if engine == "mimo":
+         # Auto-bump to a vision model if user passed --image with a non-vision model.
+         model = args.model
+         if args.image and "omni" not in model.lower() and not model.startswith("mimo-v2.5["):
+             if model != "mimo-v2.5":
+                 sys.stderr.write(
+                     f"note: --image given but model is '{model}' which doesn't see images.\n"
+                     f"      switching to mimo-v2.5 for this call.\n"
+                 )
+             model = "mimo-v2.5"
+         url = args.base_url.rstrip("/") + "/chat/completions"
+         auth: str | None = api_key
+         # MiMo native web_search: pay-as-you-go (sk-*) supports it, token-plan
+         # (tp-*) does not. Always include the tool on sk-* and let the model
+         # decide via tool_choice=auto — no extra flag needed.
+         enable_web_search = bool(api_key and api_key.startswith("sk-"))
+     else:
+         # Pollinations: pick the configured vision-capable model. The user's
+         # --model (mimo-*) is mimo-specific so we don't honor it here unless
+         # they explicitly passed --pollinations-model.
+         model = args.pollinations_model
+         url = POLLINATIONS_URL
+         auth = None
+
+     sys.stderr.write(
+         f"[chat] engine={engine} model={model}"
+         + (" web_search=on" if enable_web_search else "")
+         + "\n"
+     )
 
      body = build_body(
          prompt=args.prompt,
          image=args.image,
          model=model,
          stream=args.stream,
-         search=args.search,
+         enable_web_search=enable_web_search,
          max_tokens=args.max_tokens,
          temperature=args.temperature,
+         engine=engine,
      )
 
-     url = args.base_url.rstrip("/") + "/chat/completions"
-     resp = post(url, body, api_key, args.stream)
+     resp = post(url, body, auth, args.stream, engine=engine)
      if args.stream:
          stream_chat(resp)
      else:
@@ -1,11 +1,14 @@
  #!/usr/bin/env python3
  """
- ocr.py — OCR / image recognition via Xiaomi MiMo V2.5 vision.
+ ocr.py — OCR / image recognition that works without any API key.
 
  Use this when the surrounding chat model can't see images (mimo-v2.5-pro,
- mimo-v2.5-pro[1m], mimo-v2-flash, or any third-party model without vision).
- ocr.py always calls mimo-v2.5 internally regardless of what the rest of the
- conversation is using.
+ mimo-v2.5-pro[1m], mimo-v2-flash, deepseek-*, or any text-only model).
+
+ Engines (--engine):
+   auto (default) — mimo if MIMO_API_KEY set, else pollinations
+   mimo           — Xiaomi MiMo V2.5 vision. Highest quality. Needs MIMO_API_KEY
+   pollinations   — pollinations.ai free public vision endpoint. NO KEY REQUIRED
 
  Modes (--mode):
    text (default)   verbatim OCR — raw text, preserves line breaks
@@ -21,9 +24,12 @@ Image inputs (positional, 0+):
    (none, stdin not a TTY)  same as `-`
 
  Usage:
-   export MIMO_API_KEY=sk-xxxx
+   # Zero-setup: free fallback, works for DeepSeek-only / no-key users
    python3 ocr.py path/to/image.png
    python3 ocr.py --mode describe https://example.com/x.png
+
+   # Best quality (needs MiMo key)
+   export MIMO_API_KEY=sk-xxxx
    python3 ocr.py --mode structured a.png b.jpg
    cat scan.png | python3 ocr.py --mode markdown
 
@@ -194,26 +200,32 @@ def build_messages(
 
  # --- HTTP -------------------------------------------------------------------
 
- def post(url: str, body: dict[str, Any], api_key: str, stream: bool) -> Any:
+ POLLINATIONS_URL = "https://text.pollinations.ai/openai"
+ POLLINATIONS_DEFAULT_MODEL = "openai"  # vision-capable, free, no key
+
+
+ def post(url: str, body: dict[str, Any], api_key: str | None, stream: bool, *, engine: str) -> Any:
+     headers = {
+         "Content-Type": "application/json",
+         "Accept": "text/event-stream" if stream else "application/json",
+         "User-Agent": "mimoskill-ocr/0.1",
+     }
+     if api_key:
+         headers["Authorization"] = f"Bearer {api_key}"
      req = urllib.request.Request(
          url,
          method="POST",
          data=json.dumps(body).encode("utf-8"),
-         headers={
-             "Content-Type": "application/json",
-             "Accept": "text/event-stream" if stream else "application/json",
-             "Authorization": f"Bearer {api_key}",
-             "User-Agent": "mimoskill-ocr/0.1",
-         },
+         headers=headers,
      )
      try:
          return urllib.request.urlopen(req, timeout=300)
      except urllib.error.HTTPError as e:
          snippet = e.read().decode("utf-8", "replace")
-         sys.stderr.write(f"MiMo returned HTTP {e.code}: {snippet}\n")
+         sys.stderr.write(f"{engine} returned HTTP {e.code}: {snippet}\n")
          sys.exit(1)
      except urllib.error.URLError as e:
-         sys.stderr.write(f"connection failed: {e}\n")
+         sys.stderr.write(f"connection failed ({engine}): {e}\n")
          sys.exit(1)
 
 
@@ -289,10 +301,23 @@ def main() -> None:
      )
      p.add_argument("--max-tokens", type=int, default=4096)
      p.add_argument("--temperature", type=float, default=0.2)
+     p.add_argument(
+         "--engine",
+         choices=["auto", "mimo", "pollinations"],
+         default=os.environ.get("MIMO_OCR_ENGINE", "auto"),
+         help="OCR backend. auto = mimo if MIMO_API_KEY set, else pollinations "
+         "(free, no key required). default: %(default)s",
+     )
      p.add_argument(
          "--base-url",
          default=os.environ.get("MIMO_BASE_URL", "https://api.xiaomimimo.com/v1"),
-         help="MiMo OpenAI-compat endpoint (default: %(default)s)",
+         help="MiMo OpenAI-compat endpoint, ignored when --engine=pollinations "
+         "(default: %(default)s)",
+     )
+     p.add_argument(
+         "--pollinations-model",
+         default=os.environ.get("POLLINATIONS_MODEL", POLLINATIONS_DEFAULT_MODEL),
+         help="model id when --engine=pollinations (default: %(default)s)",
      )
      p.add_argument(
          "--prompt",
@@ -304,18 +329,27 @@ def main() -> None:
      args = p.parse_args()
 
      api_key = os.environ.get("MIMO_API_KEY")
-     if not api_key:
-         sys.stderr.write(
-             "error: MIMO_API_KEY is not set; ocr.py needs MiMo V2.5 vision to read images.\n"
-             "  set one at https://platform.xiaomimimo.com/#/console/api-keys\n"
-             "  OR if you want fully-local OCR with no API key, install tesseract:\n"
-             "    macOS:   brew install tesseract tesseract-lang\n"
-             "    Ubuntu:  sudo apt install tesseract-ocr tesseract-ocr-chi-sim\n"
-             "    Windows: https://github.com/UB-Mannheim/tesseract/wiki\n"
-             "  then run: tesseract <image> - -l eng+chi_sim\n"
-             "  (tesseract is NOT installed or invoked by this skill; this is just a pointer.)\n"
-         )
-         sys.exit(3)
+
+     # Resolve engine.
+     if args.engine == "mimo":
+         engine = "mimo"
+         if not api_key:
+             sys.stderr.write(
+                 "error: --engine mimo requires MIMO_API_KEY.\n"
+                 "  set one at https://platform.xiaomimimo.com/#/console/api-keys\n"
+                 "  OR drop the flag to fall back to pollinations (free, no key required):\n"
+                 "    python3 ocr.py <image>\n"
+             )
+             sys.exit(3)
+     elif args.engine == "pollinations":
+         engine = "pollinations"
+     else:  # auto
+         engine = "mimo" if api_key else "pollinations"
+         if engine == "pollinations":
+             sys.stderr.write(
+                 "[engine] auto -> pollinations (free, no key). "
+                 "Set MIMO_API_KEY for higher quality (mimo-v2.5).\n"
+             )
 
      # Resolve images: explicit args, else stdin if not a TTY.
      raw_args = args.images
@@ -330,12 +364,20 @@ def main() -> None:
 
      image_urls = [resolve_image_arg(a) for a in raw_args]
 
-     model, note = pick_model(args.model)
-     if note:
-         sys.stderr.write(note)
+     if engine == "mimo":
+         model, note = pick_model(args.model)
+         if note:
+             sys.stderr.write(note)
+     else:
+         if args.model:
+             sys.stderr.write(
+                 f"note: --model is mimo-specific; ignoring on pollinations "
+                 f"(use --pollinations-model instead).\n"
+             )
+         model = args.pollinations_model
 
      sys.stderr.write(
-         f"[ocr] mode={args.mode} model={model} images={len(image_urls)}\n"
+         f"[ocr] engine={engine} mode={args.mode} model={model} images={len(image_urls)}\n"
      )
 
      messages = build_messages(
@@ -348,13 +390,20 @@ def main() -> None:
      body: dict[str, Any] = {
          "model": model,
          "messages": messages,
-         "max_completion_tokens": args.max_tokens,
          "temperature": args.temperature,
          "stream": args.stream,
      }
+     if engine == "mimo":
+         # MiMo's quirk: max_completion_tokens, not max_tokens.
+         body["max_completion_tokens"] = args.max_tokens
+         url = args.base_url.rstrip("/") + "/chat/completions"
+         auth = api_key
+     else:
+         body["max_tokens"] = args.max_tokens
+         url = POLLINATIONS_URL
+         auth = None
 
-     url = args.base_url.rstrip("/") + "/chat/completions"
-     resp = post(url, body, api_key, args.stream)
+     resp = post(url, body, auth, args.stream, engine=engine)
 
      if args.stream:
          content, reasoning = stream_chat(resp)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "mimo2codex",
-   "version": "0.1.16",
+   "version": "0.1.18",
    "description": "Local proxy that lets the latest OpenAI Codex CLI / desktop talk to Xiaomi MiMo (V2.5 Pro) via the Responses API by translating to Chat Completions on the fly.",
    "keywords": [
      "codex",