mimo2codex 0.1.16 → 0.1.18
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +24 -1
- package/README.zh.md +23 -1
- package/dist/providers/deepseek.js +2 -1
- package/dist/providers/deepseek.js.map +1 -1
- package/dist/providers/generic.js +2 -1
- package/dist/providers/generic.js.map +1 -1
- package/dist/providers/mimo.js +1 -0
- package/dist/providers/mimo.js.map +1 -1
- package/dist/server.js +1 -0
- package/dist/server.js.map +1 -1
- package/dist/translate/reqToChat.js +104 -7
- package/dist/translate/reqToChat.js.map +1 -1
- package/doc/mimoskill.md +295 -0
- package/doc/mimoskill.zh.md +295 -0
- package/mimoskill/SKILL.md +25 -19
- package/mimoskill/references/ocr_workflow.md +49 -25
- package/mimoskill/scripts/mimo_chat.py +111 -42
- package/mimoskill/scripts/ocr.py +83 -34
- package/package.json +1 -1
package/mimoskill/SKILL.md
CHANGED
@@ -18,7 +18,7 @@ Trigger this skill when:
 - User asks "how do I generate a Codex pet" / "/hatch isn't working" / "image_gen tool not available"
 - User wants image generation as part of a MiMo-backed workflow
 - User pastes the Codex error: `the image generation tool (image_gen) is not available in this environment` or `the CLI fallback requires the openai Python package`
-- User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, or any third-party model
+- User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model) — use `scripts/ocr.py`. Works with or without a MiMo key (free pollinations fallback when `MIMO_API_KEY` is unset).
 - User sees the proxy's `[N image attachment(s) omitted: this model does not support image input …]` placeholder in their transcript
 - Anything in the `mimo2codex` repo that touches a feature MiMo doesn't support
@@ -38,7 +38,7 @@ Quick answer:
 | Audio chat | ✅ | `mimo-v2-omni` | input only |
 | Video understanding | ✅ | `mimo-v2-omni` | input only |
 | **Image generation** | ❌ | — | `scripts/generate_image.py` (general) or `scripts/generate_pet.py` (Codex pets) — see below |
-| OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` | `scripts/ocr.py` |
+| OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` or free pollinations | `scripts/ocr.py` | `--engine auto`: mimo if `MIMO_API_KEY` set, else pollinations (no key) |
 | Code interpreter / sandbox | ❌ | — | not provided |
 
 For the full capability matrix and examples, read [references/models.md](references/models.md).
@@ -48,7 +48,7 @@ For the full capability matrix and examples, read [references/models.md](referen
 ```
 Is it OCR / read text from image / describe / 识别 an image
 when the active chat model is non-vision?
-├── Yes → use scripts/ocr.py (
+├── Yes → use scripts/ocr.py (mimo-v2.5 if MIMO_API_KEY set, else free pollinations)
 └── No
   │
   Is it chat / vision / search / TTS / ASR with a vision-capable model?
@@ -60,45 +60,51 @@ when the active chat model is non-vision?
   └── No → see "General (non-pet) image generation" below (scripts/generate_image.py)
 ```
 
-## Calling
+## Calling chat directly (works without any key)
 
-Use `scripts/mimo_chat.py`
+Use `scripts/mimo_chat.py` for one-shot or streaming chat. Two engines, `--engine auto` (default) picks `mimo` if `MIMO_API_KEY` is set, else `pollinations` (free, no key) — so **the script works without any key** for text and vision.
 
 ```bash
+# Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
+python3 mimoskill/scripts/mimo_chat.py "your prompt here"
+python3 mimoskill/scripts/mimo_chat.py --image https://example.com/x.png "describe this"
+
+# Best quality + MiMo-specific features (web search, TTS, ASR)
 export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
 python3 mimoskill/scripts/mimo_chat.py "your prompt here"
-python3 mimoskill/scripts/mimo_chat.py
-python3 mimoskill/scripts/mimo_chat.py --search "今天上海天气?"
+python3 mimoskill/scripts/mimo_chat.py "今天上海天气?"  # web search auto-enabled on sk-* keys
 python3 mimoskill/scripts/mimo_chat.py --stream "tell me a story"
 ```
 
-
+When the mimo engine is active the script handles all MiMo-specific quirks — `max_completion_tokens` instead of `max_tokens`, the required `text` part next to `image_url`, `reasoning_content` round-tripping, etc. **Web search is auto-enabled on pay-as-you-go (`sk-*`) keys** — the `web_search` builtin is always included in the tools array and the model decides when to invoke it (`tool_choice: "auto"`). Token-plan (`tp-*`) keys skip web search (the endpoint doesn't support it). The pollinations engine doesn't support web search, TTS, or ASR (those are MiMo native features); it auto-switches to OpenAI-compat field names (`max_tokens`).
 
 For non-trivial integrations, [references/models.md](references/models.md) and [the official MiMo OpenAI-compat doc](https://platform.xiaomimimo.com/docs/api/chat/openai-api) are the authoritative references.
 
 ## OCR / image recognition (when the chat model can't see images)
 
-If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, or any third-party model
+If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, `deepseek-*`, or any third-party text-only model), invoke `scripts/ocr.py`. Two engines, `--engine auto` (default) picks the right one:
+
+- **`mimo`** — needs `MIMO_API_KEY`, uses `mimo-v2.5` regardless of the chat model. Best quality.
+- **`pollinations`** — free public vision endpoint at `text.pollinations.ai`, **no key required**. Mirrors the same no-key fallback `generate_pet.py` uses. Rate-limited but always available — covers users who only have a DeepSeek key (or no key at all).
 
 The proxy silently drops image attachments on non-vision models (`src/translate/reqToChat.ts:48-72`) and leaves a `[N image attachment(s) omitted: …]` placeholder. **When you see that placeholder in the transcript, the right move is to run ocr.py and feed the text back into the conversation.** Don't ask the user to switch models.
 
 ```bash
-
-
-# verbatim OCR (default)
+# Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
 python3 mimoskill/scripts/ocr.py path/to/image.png
-
-# 2-4 sentence description
 python3 mimoskill/scripts/ocr.py --mode describe https://example.com/x.png
-
-# structured JSON (text + regions + language + summary)
 python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
-
-# re-render as GitHub-flavored Markdown (good for forms / receipts)
 cat scan.png | python3 mimoskill/scripts/ocr.py --mode markdown
+
+# Best quality — set MiMo key, auto picks mimo
+export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+python3 mimoskill/scripts/ocr.py path/to/image.png
+
+# Force the free engine even when you have a MiMo key (e.g. to save quota)
+python3 mimoskill/scripts/ocr.py --engine pollinations form.png
 ```
 
-`ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one
+`ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one upstream call. Non-vision `--model` values are auto-coerced to `mimo-v2.5` with one stderr note (mimo engine only; on pollinations use `--pollinations-model`).
 
 See [references/ocr_workflow.md](references/ocr_workflow.md) for full mode reference, exit codes, JSON shape for `--mode structured`, and the `--lang` / `--prompt` knobs.
package/mimoskill/references/ocr_workflow.md
CHANGED
@@ -1,26 +1,32 @@
 # OCR / image recognition workflow
 
 `mimoskill/scripts/ocr.py` is the fallback path for reading or describing
-images when the surrounding chat model can't see them.
-
-
+images when the surrounding chat model can't see them. Two engines:
+
+| Engine | Needs API key? | Quality | Notes |
+|---|---|---|---|
+| `mimo` | yes (`MIMO_API_KEY`) | best | Calls `mimo-v2.5` regardless of the chat model used elsewhere. |
+| `pollinations` | **no** | decent | Free public endpoint at `text.pollinations.ai`. Rate-limited but no signup. |
+
+`--engine auto` (default) picks `mimo` if `MIMO_API_KEY` is set, else falls
+back to `pollinations` so users with only a DeepSeek key (or no key at all)
+still get OCR.
 
 ## TL;DR
 
 ```bash
-
-
-# default mode (text) — verbatim OCR
+# Zero-setup — uses free pollinations fallback when MIMO_API_KEY is unset
 python3 mimoskill/scripts/ocr.py path/to/image.png
-
-# describe the image in 2-4 sentences
 python3 mimoskill/scripts/ocr.py --mode describe path/to/image.png
-
-# structured JSON (text + regions + language + summary)
 python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
-
-# re-render as GitHub-flavored Markdown
 python3 mimoskill/scripts/ocr.py --mode markdown form.png
+
+# Force the free engine even when you have a MiMo key (e.g. to save quota)
+python3 mimoskill/scripts/ocr.py --engine pollinations form.png
+
+# Best quality — set MiMo key
+export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+python3 mimoskill/scripts/ocr.py path/to/image.png  # auto -> mimo
 ```
 
 ## Why this skill exists
@@ -161,21 +167,39 @@ silently (one stderr line) rather than failing.
 
 ## When `MIMO_API_KEY` isn't set
 
-`
+`--engine auto` (the default) silently falls back to `pollinations`:
 
 ```
-
-
-
-
-
-
-
-
+[engine] auto -> pollinations (free, no key). Set MIMO_API_KEY for higher quality (mimo-v2.5).
+[ocr] engine=pollinations mode=text model=openai images=1
+<extracted text>
+```
+
+Exit code `3` is only raised when the user explicitly passes `--engine mimo`
+without a key (passing the flag is treated as an assertion that MiMo should
+be used; auto-falling-back would mask the misconfiguration).
+
+If you'd rather use **fully-local OCR** with no network at all, install
+tesseract and shell to it directly — this skill won't auto-invoke it:
+
+```bash
+macOS:   brew install tesseract tesseract-lang
+Ubuntu:  sudo apt install tesseract-ocr tesseract-ocr-chi-sim
+Windows: https://github.com/UB-Mannheim/tesseract/wiki
+tesseract <image> - -l eng+chi_sim
 ```
 
-
-
+## Pollinations specifics
+
+- Endpoint: `https://text.pollinations.ai/openai` (OpenAI Chat Completions
+  compatible).
+- Default model: `openai` (vision-capable). Override with
+  `--pollinations-model <name>` or `POLLINATIONS_MODEL=<name>`. Other
+  vision-capable picks include `openai-large`, `openai-fast`.
+- No `Authorization` header is sent; the service is open. Rate limits apply
+  per-IP; if you hit them you'll see HTTP 429 in stderr — wait or retry.
+- `reasoning_content` is normally empty for pollinations responses (the
+  underlying models don't expose chain-of-thought).
 
 ## Common pitfalls
@@ -194,9 +218,9 @@ to it. Keeps the dependency surface predictable.
 | Code | Meaning |
 |---|---|
 | 0 | Success |
-| 1 |
+| 1 | Upstream HTTP error (MiMo or Pollinations; error body printed to stderr) |
 | 2 | argv / usage error (no image, mutually exclusive flags, etc.) |
-| 3 | `MIMO_API_KEY` not set |
+| 3 | `--engine mimo` explicitly requested but `MIMO_API_KEY` not set |
 | 4 | Local image file not found / unreadable |
 
 ## Composing with `mimo_chat.py`
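The pollinations fallback documented above speaks the OpenAI Chat Completions shape at `https://text.pollinations.ai/openai` with no `Authorization` header and a default model id of `openai`. A minimal sketch of building such a keyless OCR request; the instruction text and token limit here are illustrative choices, not values taken from the package:

```python
import json

POLLINATIONS_URL = "https://text.pollinations.ai/openai"

def build_pollinations_ocr_request(image_url: str,
                                   instruction: str = "Transcribe all text verbatim."):
    """Return (url, headers, body_bytes) for a keyless vision call."""
    body = {
        "model": "openai",   # vision-capable default per the workflow doc
        "max_tokens": 4096,  # OpenAI-compat field name (not max_completion_tokens)
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    # Deliberately no Authorization header; the service is open (per-IP rate limits).
    headers = {"Content-Type": "application/json"}
    return POLLINATIONS_URL, headers, json.dumps(body).encode("utf-8")
```

The resulting tuple can be fed straight into `urllib.request.Request`, mirroring how the scripts in this package issue their HTTP calls.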
package/mimoskill/scripts/mimo_chat.py
CHANGED
@@ -1,21 +1,29 @@
 #!/usr/bin/env python3
 """
-mimo_chat.py — single-shot or streaming chat
+mimo_chat.py — single-shot or streaming chat. Works WITHOUT any API key.
 
-
-
+Engines (--engine):
+  auto (default) — mimo if MIMO_API_KEY set, else pollinations
+  mimo           — Xiaomi MiMo V2.5 (best quality, needs MIMO_API_KEY)
+  pollinations   — pollinations.ai free public chat endpoint. NO KEY REQUIRED
 
+When the mimo engine is used, handles the MiMo-specific quirks:
 - max_completion_tokens (not max_tokens)
 - vision via mimo-v2.5 / mimo-v2-omni (and the required text part next to
   image_url, otherwise MiMo 400s with "text is not set")
-- web_search builtin
+- web_search builtin: auto-enabled on pay-as-you-go (sk-*) keys, skipped on
+  token-plan (tp-*) keys. Model decides when to invoke (tool_choice: auto).
+  Requires the Web Search Plugin to be activated in the MiMo console.
 - reasoning_content extraction
 
 Usage:
-
+    # Zero-setup
     python3 mimo_chat.py "your prompt"
-    python3 mimo_chat.py --
-
+    python3 mimo_chat.py --image https://x/y.png "describe"
+
+    # MiMo key — gets best quality + native web search (when sk-*)
+    export MIMO_API_KEY=sk-xxxx
+    python3 mimo_chat.py "今天上海天气?"
     python3 mimo_chat.py --stream "tell me a story"
 
 Only depends on the standard library — no `openai` SDK install needed.
@@ -48,51 +56,64 @@ def build_messages(prompt: str, image: str | None) -> list[dict[str, Any]]:
     ]
 
 
+POLLINATIONS_URL = "https://text.pollinations.ai/openai"
+POLLINATIONS_DEFAULT_MODEL = "openai"  # vision-capable, free, no key
+
+
 def build_body(
     *,
     prompt: str,
     image: str | None,
     model: str,
     stream: bool,
-
+    enable_web_search: bool,
     max_tokens: int,
     temperature: float,
+    engine: str,
 ) -> dict[str, Any]:
     body: dict[str, Any] = {
         "model": model,
         "messages": build_messages(prompt, image),
-        "max_completion_tokens": max_tokens,
         "temperature": temperature,
         "stream": stream,
     }
-    if
-        # MiMo
-
-
+    if engine == "mimo":
+        # MiMo's quirk: max_completion_tokens, not max_tokens.
+        body["max_completion_tokens"] = max_tokens
+    else:
+        body["max_tokens"] = max_tokens
+    if enable_web_search:
+        # MiMo native web_search builtin. The model decides whether to invoke
+        # it (tool_choice=auto). Requires the Web Search Plugin to be
+        # activated at https://platform.xiaomimimo.com/#/console/plugin —
+        # without that, MiMo returns 400 and the error body is printed.
+        body["tools"] = [{"type": "web_search"}]
         body["tool_choice"] = "auto"
     return body
 
 
-def post(url: str, body: dict[str, Any], api_key: str, stream: bool) -> Any:
+def post(url: str, body: dict[str, Any], api_key: str | None, stream: bool, *, engine: str) -> Any:
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "text/event-stream" if stream else "application/json",
+        "User-Agent": "mimoskill/0.1",
+    }
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
     req = urllib.request.Request(
         url,
         method="POST",
         data=json.dumps(body).encode("utf-8"),
-        headers=
-            "Content-Type": "application/json",
-            "Accept": "text/event-stream" if stream else "application/json",
-            "Authorization": f"Bearer {api_key}",
-            "User-Agent": "mimoskill/0.1",
-        },
+        headers=headers,
     )
     try:
         return urllib.request.urlopen(req, timeout=300)
     except urllib.error.HTTPError as e:
         snippet = e.read().decode("utf-8", "replace")
-        sys.stderr.write(f"
+        sys.stderr.write(f"{engine} returned HTTP {e.code}: {snippet}\n")
         sys.exit(1)
     except urllib.error.URLError as e:
-        sys.stderr.write(f"connection failed: {e}\n")
+        sys.stderr.write(f"connection failed ({engine}): {e}\n")
         sys.exit(1)
 
 
@@ -144,51 +165,99 @@ def main() -> None:
     p.add_argument("prompt", nargs="?", default="", help="user message text")
     p.add_argument("--model", default=os.environ.get("MIMO_MODEL", "mimo-v2.5-pro"))
     p.add_argument("--image", help="image URL to attach (forces vision-capable model)")
-    p.add_argument("--search", action="store_true", help="enable MiMo web_search builtin")
     p.add_argument("--stream", action="store_true", help="stream the response")
     p.add_argument("--max-tokens", type=int, default=2048)
    p.add_argument("--temperature", type=float, default=0.7)
+    p.add_argument(
+        "--engine",
+        choices=["auto", "mimo", "pollinations"],
+        default=os.environ.get("MIMO_CHAT_ENGINE", "auto"),
+        help="chat backend. auto = mimo if MIMO_API_KEY set, else pollinations "
+        "(free, no key required). default: %(default)s",
+    )
     p.add_argument(
         "--base-url",
         default=os.environ.get("MIMO_BASE_URL", "https://api.xiaomimimo.com/v1"),
-        help="
+        help="MiMo endpoint, ignored when --engine=pollinations "
+        "(tp-* keys use https://token-plan-cn.xiaomimimo.com/v1)",
+    )
+    p.add_argument(
+        "--pollinations-model",
+        default=os.environ.get("POLLINATIONS_MODEL", POLLINATIONS_DEFAULT_MODEL),
+        help="model id when --engine=pollinations (default: %(default)s)",
     )
     args = p.parse_args()
 
     api_key = os.environ.get("MIMO_API_KEY")
-
-
-
-
-
-
+
+    # Resolve engine.
+    if args.engine == "mimo":
+        engine = "mimo"
+        if not api_key:
+            sys.stderr.write(
+                "error: --engine mimo requires MIMO_API_KEY.\n"
+                "  get one at https://platform.xiaomimimo.com/#/console/api-keys\n"
+                "  OR drop the flag to fall back to pollinations (free, no key required):\n"
+                "    python3 mimo_chat.py <prompt>\n"
+            )
+            sys.exit(3)
+    elif args.engine == "pollinations":
+        engine = "pollinations"
+    else:  # auto
+        engine = "mimo" if api_key else "pollinations"
+        if engine == "pollinations":
+            sys.stderr.write(
+                "[engine] auto -> pollinations (free, no key). "
+                "Set MIMO_API_KEY for higher quality (mimo-v2.5).\n"
+            )
 
     if not args.prompt and not args.image:
         sys.stderr.write("error: pass a prompt and/or --image\n")
         sys.exit(2)
 
-
-
-
-
-
-
-
-
-
+    enable_web_search = False
+    if engine == "mimo":
+        # Auto-bump to a vision model if user passed --image with a non-vision model.
+        model = args.model
+        if args.image and "omni" not in model.lower() and not model.startswith("mimo-v2.5["):
+            if model != "mimo-v2.5":
+                sys.stderr.write(
+                    f"note: --image given but model is '{model}' which doesn't see images.\n"
+                    f"      switching to mimo-v2.5 for this call.\n"
+                )
+                model = "mimo-v2.5"
+        url = args.base_url.rstrip("/") + "/chat/completions"
+        auth: str | None = api_key
+        # MiMo native web_search: pay-as-you-go (sk-*) supports it, token-plan
+        # (tp-*) does not. Always include the tool on sk-* and let the model
+        # decide via tool_choice=auto — no extra flag needed.
+        enable_web_search = bool(api_key and api_key.startswith("sk-"))
+    else:
+        # Pollinations: pick the configured vision-capable model. The user's
+        # --model (mimo-*) is mimo-specific so we don't honor it here unless
+        # they explicitly passed --pollinations-model.
+        model = args.pollinations_model
+        url = POLLINATIONS_URL
+        auth = None
+
+    sys.stderr.write(
+        f"[chat] engine={engine} model={model}"
+        + (" web_search=on" if enable_web_search else "")
+        + "\n"
+    )
 
     body = build_body(
         prompt=args.prompt,
         image=args.image,
         model=model,
         stream=args.stream,
-
+        enable_web_search=enable_web_search,
         max_tokens=args.max_tokens,
         temperature=args.temperature,
+        engine=engine,
     )
 
-
-    resp = post(url, body, api_key, args.stream)
+    resp = post(url, body, auth, args.stream, engine=engine)
     if args.stream:
         stream_chat(resp)
     else:
package/mimoskill/scripts/ocr.py
CHANGED
@@ -1,11 +1,14 @@
 #!/usr/bin/env python3
 """
-ocr.py — OCR / image recognition
+ocr.py — OCR / image recognition that works without any API key.
 
 Use this when the surrounding chat model can't see images (mimo-v2.5-pro,
-mimo-v2.5-pro[1m], mimo-v2-flash, or any
-
-
+mimo-v2.5-pro[1m], mimo-v2-flash, deepseek-*, or any text-only model).
+
+Engines (--engine):
+  auto (default) — mimo if MIMO_API_KEY set, else pollinations
+  mimo           — Xiaomi MiMo V2.5 vision. Highest quality. Needs MIMO_API_KEY
+  pollinations   — pollinations.ai free public vision endpoint. NO KEY REQUIRED
 
 Modes (--mode):
   text (default)   verbatim OCR — raw text, preserves line breaks
@@ -21,9 +24,12 @@ Image inputs (positional, 0+):
   (none, stdin not a TTY)   same as `-`
 
 Usage:
-
+    # Zero-setup: free fallback, works for DeepSeek-only / no-key users
     python3 ocr.py path/to/image.png
     python3 ocr.py --mode describe https://example.com/x.png
+
+    # Best quality (needs MiMo key)
+    export MIMO_API_KEY=sk-xxxx
     python3 ocr.py --mode structured a.png b.jpg
     cat scan.png | python3 ocr.py --mode markdown
 
@@ -194,26 +200,32 @@ def build_messages(
 
 # --- HTTP -------------------------------------------------------------------
 
-
+POLLINATIONS_URL = "https://text.pollinations.ai/openai"
+POLLINATIONS_DEFAULT_MODEL = "openai"  # vision-capable, free, no key
+
+
+def post(url: str, body: dict[str, Any], api_key: str | None, stream: bool, *, engine: str) -> Any:
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "text/event-stream" if stream else "application/json",
+        "User-Agent": "mimoskill-ocr/0.1",
+    }
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
     req = urllib.request.Request(
         url,
         method="POST",
         data=json.dumps(body).encode("utf-8"),
-        headers=
-            "Content-Type": "application/json",
-            "Accept": "text/event-stream" if stream else "application/json",
-            "Authorization": f"Bearer {api_key}",
-            "User-Agent": "mimoskill-ocr/0.1",
-        },
+        headers=headers,
     )
     try:
         return urllib.request.urlopen(req, timeout=300)
     except urllib.error.HTTPError as e:
         snippet = e.read().decode("utf-8", "replace")
-        sys.stderr.write(f"
+        sys.stderr.write(f"{engine} returned HTTP {e.code}: {snippet}\n")
         sys.exit(1)
     except urllib.error.URLError as e:
-        sys.stderr.write(f"connection failed: {e}\n")
+        sys.stderr.write(f"connection failed ({engine}): {e}\n")
         sys.exit(1)
 
 
@@ -289,10 +301,23 @@ def main() -> None:
     )
     p.add_argument("--max-tokens", type=int, default=4096)
     p.add_argument("--temperature", type=float, default=0.2)
+    p.add_argument(
+        "--engine",
+        choices=["auto", "mimo", "pollinations"],
+        default=os.environ.get("MIMO_OCR_ENGINE", "auto"),
+        help="OCR backend. auto = mimo if MIMO_API_KEY set, else pollinations "
+        "(free, no key required). default: %(default)s",
+    )
     p.add_argument(
         "--base-url",
         default=os.environ.get("MIMO_BASE_URL", "https://api.xiaomimimo.com/v1"),
-        help="MiMo OpenAI-compat endpoint
+        help="MiMo OpenAI-compat endpoint, ignored when --engine=pollinations "
+        "(default: %(default)s)",
+    )
+    p.add_argument(
+        "--pollinations-model",
+        default=os.environ.get("POLLINATIONS_MODEL", POLLINATIONS_DEFAULT_MODEL),
+        help="model id when --engine=pollinations (default: %(default)s)",
     )
     p.add_argument(
         "--prompt",
@@ -304,18 +329,27 @@ def main() -> None:
     args = p.parse_args()
 
     api_key = os.environ.get("MIMO_API_KEY")
-
-
-
-
-
-
-
-
-
-
-
-
+
+    # Resolve engine.
+    if args.engine == "mimo":
+        engine = "mimo"
+        if not api_key:
+            sys.stderr.write(
+                "error: --engine mimo requires MIMO_API_KEY.\n"
+                "  set one at https://platform.xiaomimimo.com/#/console/api-keys\n"
+                "  OR drop the flag to fall back to pollinations (free, no key required):\n"
+                "    python3 ocr.py <image>\n"
+            )
+            sys.exit(3)
+    elif args.engine == "pollinations":
+        engine = "pollinations"
+    else:  # auto
+        engine = "mimo" if api_key else "pollinations"
+        if engine == "pollinations":
+            sys.stderr.write(
+                "[engine] auto -> pollinations (free, no key). "
+                "Set MIMO_API_KEY for higher quality (mimo-v2.5).\n"
+            )
 
     # Resolve images: explicit args, else stdin if not a TTY.
     raw_args = args.images
@@ -330,12 +364,20 @@ def main() -> None:
 
     image_urls = [resolve_image_arg(a) for a in raw_args]
 
-
-
-
+    if engine == "mimo":
+        model, note = pick_model(args.model)
+        if note:
+            sys.stderr.write(note)
+    else:
+        if args.model:
+            sys.stderr.write(
+                f"note: --model is mimo-specific; ignoring on pollinations "
+                f"(use --pollinations-model instead).\n"
+            )
+        model = args.pollinations_model
 
     sys.stderr.write(
-        f"[ocr] mode={args.mode} model={model} images={len(image_urls)}\n"
+        f"[ocr] engine={engine} mode={args.mode} model={model} images={len(image_urls)}\n"
     )
 
     messages = build_messages(
@@ -348,13 +390,20 @@ def main() -> None:
     body: dict[str, Any] = {
         "model": model,
         "messages": messages,
-        "max_completion_tokens": args.max_tokens,
         "temperature": args.temperature,
         "stream": args.stream,
     }
+    if engine == "mimo":
+        # MiMo's quirk: max_completion_tokens, not max_tokens.
+        body["max_completion_tokens"] = args.max_tokens
+        url = args.base_url.rstrip("/") + "/chat/completions"
+        auth = api_key
+    else:
+        body["max_tokens"] = args.max_tokens
+        url = POLLINATIONS_URL
+        auth = None
 
-
-    resp = post(url, body, api_key, args.stream)
+    resp = post(url, body, auth, args.stream, engine=engine)
 
     if args.stream:
         content, reasoning = stream_chat(resp)
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "mimo2codex",
-  "version": "0.1.16",
+  "version": "0.1.18",
   "description": "Local proxy that lets the latest OpenAI Codex CLI / desktop talk to Xiaomi MiMo (V2.5 Pro) via the Responses API by translating to Chat Completions on the fly.",
   "keywords": [
     "codex",