mimo2codex 0.1.16 → 0.1.18
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +24 -1
- package/README.zh.md +23 -1
- package/dist/providers/deepseek.js +2 -1
- package/dist/providers/deepseek.js.map +1 -1
- package/dist/providers/generic.js +2 -1
- package/dist/providers/generic.js.map +1 -1
- package/dist/providers/mimo.js +1 -0
- package/dist/providers/mimo.js.map +1 -1
- package/dist/server.js +1 -0
- package/dist/server.js.map +1 -1
- package/dist/translate/reqToChat.js +104 -7
- package/dist/translate/reqToChat.js.map +1 -1
- package/doc/mimoskill.md +295 -0
- package/doc/mimoskill.zh.md +295 -0
- package/mimoskill/SKILL.md +25 -19
- package/mimoskill/references/ocr_workflow.md +49 -25
- package/mimoskill/scripts/mimo_chat.py +111 -42
- package/mimoskill/scripts/ocr.py +83 -34
- package/package.json +1 -1
package/mimoskill/SKILL.md
CHANGED
@@ -18,7 +18,7 @@ Trigger this skill when:
 - User asks "how do I generate a Codex pet" / "/hatch isn't working" / "image_gen tool not available"
 - User wants image generation as part of a MiMo-backed workflow
 - User pastes the Codex error: `the image generation tool (image_gen) is not available in this environment` or `the CLI fallback requires the openai Python package`
-- User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, or any third-party model
+- User wants to **OCR / read text from / describe / 识别 / 提取文字 from an image** while the active chat model is non-vision (e.g. mimo-v2.5-pro, mimo-v2-flash, deepseek-*, or any third-party text-only model) — use `scripts/ocr.py`. Works with or without a MiMo key (free pollinations fallback when `MIMO_API_KEY` is unset).
 - User sees the proxy's `[N image attachment(s) omitted: this model does not support image input …]` placeholder in their transcript
 - Anything in the `mimo2codex` repo that touches a feature MiMo doesn't support
@@ -38,7 +38,7 @@ Quick answer:
 | Audio chat | ✅ | `mimo-v2-omni` | input only |
 | Video understanding | ✅ | `mimo-v2-omni` | input only |
 | **Image generation** | ❌ | — | `scripts/generate_image.py` (general) or `scripts/generate_pet.py` (Codex pets) — see below |
-| OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` | `scripts/ocr.py` |
+| OCR / 识图 (when chat model is non-vision) | ⚠️ via `mimo-v2.5` or free pollinations | `scripts/ocr.py` | `--engine auto`: mimo if `MIMO_API_KEY` set, else pollinations (no key) |
 | Code interpreter / sandbox | ❌ | — | not provided |
 
 For the full capability matrix and examples, read [references/models.md](references/models.md).
@@ -48,7 +48,7 @@ For the full capability matrix and examples, read [references/models.md](referen
 ```
 Is it OCR / read text from image / describe / 识别 an image
 when the active chat model is non-vision?
-├── Yes → use scripts/ocr.py (
+├── Yes → use scripts/ocr.py (mimo-v2.5 if MIMO_API_KEY set, else free pollinations)
 └── No
   │
   Is it chat / vision / search / TTS / ASR with a vision-capable model?
@@ -60,45 +60,51 @@ when the active chat model is non-vision?
   └── No → see "General (non-pet) image generation" below (scripts/generate_image.py)
 ```
 
-## Calling
+## Calling chat directly (works without any key)
 
-Use `scripts/mimo_chat.py`
+Use `scripts/mimo_chat.py` for one-shot or streaming chat. Two engines, `--engine auto` (default) picks `mimo` if `MIMO_API_KEY` is set, else `pollinations` (free, no key) — so **the script works without any key** for text and vision.
 
 ```bash
+# Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
+python3 mimoskill/scripts/mimo_chat.py "your prompt here"
+python3 mimoskill/scripts/mimo_chat.py --image https://example.com/x.png "describe this"
+
+# Best quality + MiMo-specific features (web search, TTS, ASR)
 export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
 python3 mimoskill/scripts/mimo_chat.py "your prompt here"
-python3 mimoskill/scripts/mimo_chat.py
-python3 mimoskill/scripts/mimo_chat.py --search "今天上海天气?"
+python3 mimoskill/scripts/mimo_chat.py "今天上海天气?"  # web search auto-enabled on sk-* keys
 python3 mimoskill/scripts/mimo_chat.py --stream "tell me a story"
 ```
 
-
+When the mimo engine is active the script handles all MiMo-specific quirks — `max_completion_tokens` instead of `max_tokens`, the required `text` part next to `image_url`, `reasoning_content` round-tripping, etc. **Web search is auto-enabled on pay-as-you-go (`sk-*`) keys** — the `web_search` builtin is always included in the tools array and the model decides when to invoke it (`tool_choice: "auto"`). Token-plan (`tp-*`) keys skip web search (the endpoint doesn't support it). The pollinations engine doesn't support web search, TTS, or ASR (those are MiMo native features); it auto-switches to OpenAI-compat field names (`max_tokens`).
 
 For non-trivial integrations, [references/models.md](references/models.md) and [the official MiMo OpenAI-compat doc](https://platform.xiaomimimo.com/docs/api/chat/openai-api) are the authoritative references.
 
 ## OCR / image recognition (when the chat model can't see images)
 
-If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, or any third-party model
+If the user wants to **read text from an image** or **describe / 识别 an image** but the current chat model is non-vision (`mimo-v2.5-pro`, `mimo-v2.5-pro[1m]`, `mimo-v2-flash`, `deepseek-*`, or any third-party text-only model), invoke `scripts/ocr.py`. Two engines, `--engine auto` (default) picks the right one:
+
+- **`mimo`** — needs `MIMO_API_KEY`, uses `mimo-v2.5` regardless of the chat model. Best quality.
+- **`pollinations`** — free public vision endpoint at `text.pollinations.ai`, **no key required**. Mirrors the same no-key fallback `generate_pet.py` uses. Rate-limited but always available — covers users who only have a DeepSeek key (or no key at all).
 
 The proxy silently drops image attachments on non-vision models (`src/translate/reqToChat.ts:48-72`) and leaves a `[N image attachment(s) omitted: …]` placeholder. **When you see that placeholder in the transcript, the right move is to run ocr.py and feed the text back into the conversation.** Don't ask the user to switch models.
 
 ```bash
-
-
-# verbatim OCR (default)
+# Zero-setup — uses pollinations fallback when MIMO_API_KEY is unset
 python3 mimoskill/scripts/ocr.py path/to/image.png
-
-# 2-4 sentence description
 python3 mimoskill/scripts/ocr.py --mode describe https://example.com/x.png
-
-# structured JSON (text + regions + language + summary)
 python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
-
-# re-render as GitHub-flavored Markdown (good for forms / receipts)
 cat scan.png | python3 mimoskill/scripts/ocr.py --mode markdown
+
+# Best quality — set MiMo key, auto picks mimo
+export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+python3 mimoskill/scripts/ocr.py path/to/image.png
+
+# Force the free engine even when you have a MiMo key (e.g. to save quota)
+python3 mimoskill/scripts/ocr.py --engine pollinations form.png
 ```
 
-`ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one
+`ocr.py` accepts local paths, http(s) URLs, `data:` URLs, or stdin bytes. Magic-byte sniffs the MIME (PNG / JPEG / GIF / WebP / BMP). Multiple positional args are batched into one upstream call. Non-vision `--model` values are auto-coerced to `mimo-v2.5` with one stderr note (mimo engine only; on pollinations use `--pollinations-model`).
 
 See [references/ocr_workflow.md](references/ocr_workflow.md) for full mode reference, exit codes, JSON shape for `--mode structured`, and the `--lang` / `--prompt` knobs.
package/mimoskill/references/ocr_workflow.md
CHANGED
@@ -1,26 +1,32 @@
 # OCR / image recognition workflow
 
 `mimoskill/scripts/ocr.py` is the fallback path for reading or describing
-images when the surrounding chat model can't see them.
-
-
+images when the surrounding chat model can't see them. Two engines:
+
+| Engine | Needs API key? | Quality | Notes |
+|---|---|---|---|
+| `mimo` | yes (`MIMO_API_KEY`) | best | Calls `mimo-v2.5` regardless of the chat model used elsewhere. |
+| `pollinations` | **no** | decent | Free public endpoint at `text.pollinations.ai`. Rate-limited but no signup. |
+
+`--engine auto` (default) picks `mimo` if `MIMO_API_KEY` is set, else falls
+back to `pollinations` so users with only a DeepSeek key (or no key at all)
+still get OCR.
 
 ## TL;DR
 
 ```bash
-
-
-# default mode (text) — verbatim OCR
+# Zero-setup — uses free pollinations fallback when MIMO_API_KEY is unset
 python3 mimoskill/scripts/ocr.py path/to/image.png
-
-# describe the image in 2-4 sentences
 python3 mimoskill/scripts/ocr.py --mode describe path/to/image.png
-
-# structured JSON (text + regions + language + summary)
 python3 mimoskill/scripts/ocr.py --mode structured a.png b.jpg
-
-# re-render as GitHub-flavored Markdown
 python3 mimoskill/scripts/ocr.py --mode markdown form.png
+
+# Force the free engine even when you have a MiMo key (e.g. to save quota)
+python3 mimoskill/scripts/ocr.py --engine pollinations form.png
+
+# Best quality — set MiMo key
+export MIMO_API_KEY=sk-xxxxxxxxxxxxxxxx
+python3 mimoskill/scripts/ocr.py path/to/image.png  # auto -> mimo
 ```
 
 ## Why this skill exists
@@ -161,21 +167,39 @@ silently (one stderr line) rather than failing.
 
 ## When `MIMO_API_KEY` isn't set
 
-`
+`--engine auto` (the default) silently falls back to `pollinations`:
 
 ```
-
-
-
-
-
-
-
-
+[engine] auto -> pollinations (free, no key). Set MIMO_API_KEY for higher quality (mimo-v2.5).
+[ocr] engine=pollinations mode=text model=openai images=1
+<extracted text>
+```
+
+Exit code `3` is only raised when the user explicitly passes `--engine mimo`
+without a key (passing the flag is treated as an assertion that MiMo should
+be used; auto-falling-back would mask the misconfiguration).
+
+If you'd rather use **fully-local OCR** with no network at all, install
+tesseract and shell to it directly — this skill won't auto-invoke it:
+
+```bash
+macOS:   brew install tesseract tesseract-lang
+Ubuntu:  sudo apt install tesseract-ocr tesseract-ocr-chi-sim
+Windows: https://github.com/UB-Mannheim/tesseract/wiki
+tesseract <image> - -l eng+chi_sim
 ```
 
-
-
+## Pollinations specifics
+
+- Endpoint: `https://text.pollinations.ai/openai` (OpenAI Chat Completions
+  compatible).
+- Default model: `openai` (vision-capable). Override with
+  `--pollinations-model <name>` or `POLLINATIONS_MODEL=<name>`. Other
+  vision-capable picks include `openai-large`, `openai-fast`.
+- No `Authorization` header is sent; the service is open. Rate limits apply
+  per-IP; if you hit them you'll see HTTP 429 in stderr — wait or retry.
+- `reasoning_content` is normally empty for pollinations responses (the
+  underlying models don't expose chain-of-thought).
 
 ## Common pitfalls
@@ -194,9 +218,9 @@ to it. Keeps the dependency surface predictable.
 | Code | Meaning |
 |---|---|
 | 0 | Success |
-| 1 |
+| 1 | Upstream HTTP error (MiMo or Pollinations; error body printed to stderr) |
 | 2 | argv / usage error (no image, mutually exclusive flags, etc.) |
-| 3 | `MIMO_API_KEY` not set |
+| 3 | `--engine mimo` explicitly requested but `MIMO_API_KEY` not set |
 | 4 | Local image file not found / unreadable |
 
 ## Composing with `mimo_chat.py`
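The pollinations fallback documented above speaks the OpenAI Chat Completions shape at `https://text.pollinations.ai/openai` with no `Authorization` header and a default model id of `openai`. A minimal sketch of building such a keyless OCR request; the instruction text and token limit here are illustrative choices, not values taken from the package:

```python
import json

POLLINATIONS_URL = "https://text.pollinations.ai/openai"

def build_pollinations_ocr_request(image_url: str,
                                   instruction: str = "Transcribe all text verbatim."):
    """Return (url, headers, body_bytes) for a keyless vision call."""
    body = {
        "model": "openai",   # vision-capable default per the workflow doc
        "max_tokens": 4096,  # OpenAI-compat field name (not max_completion_tokens)
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    # Deliberately no Authorization header; the service is open (per-IP rate limits).
    headers = {"Content-Type": "application/json"}
    return POLLINATIONS_URL, headers, json.dumps(body).encode("utf-8")
```

The resulting tuple can be fed straight into `urllib.request.Request`, mirroring how the scripts in this package issue their HTTP calls.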
package/mimoskill/scripts/mimo_chat.py
CHANGED
@@ -1,21 +1,29 @@
 #!/usr/bin/env python3
 """
-mimo_chat.py — single-shot or streaming chat
+mimo_chat.py — single-shot or streaming chat. Works WITHOUT any API key.
 
-
-
+Engines (--engine):
+  auto (default) — mimo if MIMO_API_KEY set, else pollinations
+  mimo           — Xiaomi MiMo V2.5 (best quality, needs MIMO_API_KEY)
+  pollinations   — pollinations.ai free public chat endpoint. NO KEY REQUIRED
 
+When the mimo engine is used, handles the MiMo-specific quirks:
 - max_completion_tokens (not max_tokens)
 - vision via mimo-v2.5 / mimo-v2-omni (and the required text part next to
   image_url, otherwise MiMo 400s with "text is not set")
-- web_search builtin
+- web_search builtin: auto-enabled on pay-as-you-go (sk-*) keys, skipped on
+  token-plan (tp-*) keys. Model decides when to invoke (tool_choice: auto).
+  Requires the Web Search Plugin to be activated in the MiMo console.
 - reasoning_content extraction
 
 Usage:
-
+    # Zero-setup
     python3 mimo_chat.py "your prompt"
-    python3 mimo_chat.py --
-
+    python3 mimo_chat.py --image https://x/y.png "describe"
+
+    # MiMo key — gets best quality + native web search (when sk-*)
+    export MIMO_API_KEY=sk-xxxx
+    python3 mimo_chat.py "今天上海天气?"
     python3 mimo_chat.py --stream "tell me a story"
 
 Only depends on the standard library — no `openai` SDK install needed.
@@ -48,51 +56,64 @@ def build_messages(prompt: str, image: str | None) -> list[dict[str, Any]]:
     ]
 
 
+POLLINATIONS_URL = "https://text.pollinations.ai/openai"
+POLLINATIONS_DEFAULT_MODEL = "openai"  # vision-capable, free, no key
+
+
 def build_body(
     *,
     prompt: str,
     image: str | None,
     model: str,
     stream: bool,
-
+    enable_web_search: bool,
     max_tokens: int,
     temperature: float,
+    engine: str,
 ) -> dict[str, Any]:
     body: dict[str, Any] = {
         "model": model,
         "messages": build_messages(prompt, image),
-        "max_completion_tokens": max_tokens,
         "temperature": temperature,
         "stream": stream,
     }
-    if
-        # MiMo
-
-
+    if engine == "mimo":
+        # MiMo's quirk: max_completion_tokens, not max_tokens.
+        body["max_completion_tokens"] = max_tokens
+    else:
+        body["max_tokens"] = max_tokens
+    if enable_web_search:
+        # MiMo native web_search builtin. The model decides whether to invoke
+        # it (tool_choice=auto). Requires the Web Search Plugin to be
+        # activated at https://platform.xiaomimimo.com/#/console/plugin —
+        # without that, MiMo returns 400 and the error body is printed.
+        body["tools"] = [{"type": "web_search"}]
         body["tool_choice"] = "auto"
     return body
 
 
-def post(url: str, body: dict[str, Any], api_key: str, stream: bool) -> Any:
+def post(url: str, body: dict[str, Any], api_key: str | None, stream: bool, *, engine: str) -> Any:
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "text/event-stream" if stream else "application/json",
+        "User-Agent": "mimoskill/0.1",
+    }
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
     req = urllib.request.Request(
         url,
         method="POST",
         data=json.dumps(body).encode("utf-8"),
-        headers=
-            "Content-Type": "application/json",
-            "Accept": "text/event-stream" if stream else "application/json",
-            "Authorization": f"Bearer {api_key}",
-            "User-Agent": "mimoskill/0.1",
-        },
+        headers=headers,
     )
     try:
         return urllib.request.urlopen(req, timeout=300)
     except urllib.error.HTTPError as e:
         snippet = e.read().decode("utf-8", "replace")
-        sys.stderr.write(f"
+        sys.stderr.write(f"{engine} returned HTTP {e.code}: {snippet}\n")
         sys.exit(1)
     except urllib.error.URLError as e:
-        sys.stderr.write(f"connection failed: {e}\n")
+        sys.stderr.write(f"connection failed ({engine}): {e}\n")
         sys.exit(1)
 
 
@@ -144,51 +165,99 @@ def main() -> None:
     p.add_argument("prompt", nargs="?", default="", help="user message text")
     p.add_argument("--model", default=os.environ.get("MIMO_MODEL", "mimo-v2.5-pro"))
     p.add_argument("--image", help="image URL to attach (forces vision-capable model)")
-    p.add_argument("--search", action="store_true", help="enable MiMo web_search builtin")
     p.add_argument("--stream", action="store_true", help="stream the response")
     p.add_argument("--max-tokens", type=int, default=2048)
    p.add_argument("--temperature", type=float, default=0.7)
+    p.add_argument(
+        "--engine",
+        choices=["auto", "mimo", "pollinations"],
+        default=os.environ.get("MIMO_CHAT_ENGINE", "auto"),
+        help="chat backend. auto = mimo if MIMO_API_KEY set, else pollinations "
+        "(free, no key required). default: %(default)s",
+    )
     p.add_argument(
         "--base-url",
         default=os.environ.get("MIMO_BASE_URL", "https://api.xiaomimimo.com/v1"),
-        help="
+        help="MiMo endpoint, ignored when --engine=pollinations "
+        "(tp-* keys use https://token-plan-cn.xiaomimimo.com/v1)",
+    )
+    p.add_argument(
+        "--pollinations-model",
+        default=os.environ.get("POLLINATIONS_MODEL", POLLINATIONS_DEFAULT_MODEL),
+        help="model id when --engine=pollinations (default: %(default)s)",
     )
     args = p.parse_args()
 
     api_key = os.environ.get("MIMO_API_KEY")
-
-
-
-
-
-
+
+    # Resolve engine.
+    if args.engine == "mimo":
+        engine = "mimo"
+        if not api_key:
+            sys.stderr.write(
+                "error: --engine mimo requires MIMO_API_KEY.\n"
+                "  get one at https://platform.xiaomimimo.com/#/console/api-keys\n"
+                "  OR drop the flag to fall back to pollinations (free, no key required):\n"
+                "    python3 mimo_chat.py <prompt>\n"
+            )
+            sys.exit(3)
+    elif args.engine == "pollinations":
+        engine = "pollinations"
+    else:  # auto
+        engine = "mimo" if api_key else "pollinations"
+        if engine == "pollinations":
+            sys.stderr.write(
+                "[engine] auto -> pollinations (free, no key). "
+                "Set MIMO_API_KEY for higher quality (mimo-v2.5).\n"
+            )
 
     if not args.prompt and not args.image:
         sys.stderr.write("error: pass a prompt and/or --image\n")
         sys.exit(2)
 
-
-
-
-
-
-
-
-
-
+    enable_web_search = False
+    if engine == "mimo":
+        # Auto-bump to a vision model if user passed --image with a non-vision model.
+        model = args.model
+        if args.image and "omni" not in model.lower() and not model.startswith("mimo-v2.5["):
+            if model != "mimo-v2.5":
+                sys.stderr.write(
+                    f"note: --image given but model is '{model}' which doesn't see images.\n"
+                    f"      switching to mimo-v2.5 for this call.\n"
+                )
+                model = "mimo-v2.5"
+        url = args.base_url.rstrip("/") + "/chat/completions"
+        auth: str | None = api_key
+        # MiMo native web_search: pay-as-you-go (sk-*) supports it, token-plan
+        # (tp-*) does not. Always include the tool on sk-* and let the model
+        # decide via tool_choice=auto — no extra flag needed.
+        enable_web_search = bool(api_key and api_key.startswith("sk-"))
+    else:
+        # Pollinations: pick the configured vision-capable model. The user's
+        # --model (mimo-*) is mimo-specific so we don't honor it here unless
+        # they explicitly passed --pollinations-model.
+        model = args.pollinations_model
+        url = POLLINATIONS_URL
+        auth = None
+
+    sys.stderr.write(
+        f"[chat] engine={engine} model={model}"
+        + (" web_search=on" if enable_web_search else "")
+        + "\n"
+    )
 
     body = build_body(
         prompt=args.prompt,
         image=args.image,
         model=model,
         stream=args.stream,
-
+        enable_web_search=enable_web_search,
         max_tokens=args.max_tokens,
         temperature=args.temperature,
+        engine=engine,
     )
 
-
-    resp = post(url, body, api_key, args.stream)
+    resp = post(url, body, auth, args.stream, engine=engine)
     if args.stream:
         stream_chat(resp)
     else:
package/mimoskill/scripts/ocr.py
CHANGED
@@ -1,11 +1,14 @@
 #!/usr/bin/env python3
 """
-ocr.py — OCR / image recognition
+ocr.py — OCR / image recognition that works without any API key.
 
 Use this when the surrounding chat model can't see images (mimo-v2.5-pro,
-mimo-v2.5-pro[1m], mimo-v2-flash, or any
-
-
+mimo-v2.5-pro[1m], mimo-v2-flash, deepseek-*, or any text-only model).
+
+Engines (--engine):
+  auto (default) — mimo if MIMO_API_KEY set, else pollinations
+  mimo           — Xiaomi MiMo V2.5 vision. Highest quality. Needs MIMO_API_KEY
+  pollinations   — pollinations.ai free public vision endpoint. NO KEY REQUIRED
 
 Modes (--mode):
   text (default)   verbatim OCR — raw text, preserves line breaks
@@ -21,9 +24,12 @@ Image inputs (positional, 0+):
   (none, stdin not a TTY)   same as `-`
 
 Usage:
-
+    # Zero-setup: free fallback, works for DeepSeek-only / no-key users
     python3 ocr.py path/to/image.png
     python3 ocr.py --mode describe https://example.com/x.png
+
+    # Best quality (needs MiMo key)
+    export MIMO_API_KEY=sk-xxxx
     python3 ocr.py --mode structured a.png b.jpg
     cat scan.png | python3 ocr.py --mode markdown
 
@@ -194,26 +200,32 @@ def build_messages(
 
 # --- HTTP -------------------------------------------------------------------
 
-
+POLLINATIONS_URL = "https://text.pollinations.ai/openai"
+POLLINATIONS_DEFAULT_MODEL = "openai"  # vision-capable, free, no key
+
+
+def post(url: str, body: dict[str, Any], api_key: str | None, stream: bool, *, engine: str) -> Any:
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "text/event-stream" if stream else "application/json",
+        "User-Agent": "mimoskill-ocr/0.1",
+    }
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
     req = urllib.request.Request(
         url,
         method="POST",
         data=json.dumps(body).encode("utf-8"),
-        headers=
-            "Content-Type": "application/json",
-            "Accept": "text/event-stream" if stream else "application/json",
-            "Authorization": f"Bearer {api_key}",
-            "User-Agent": "mimoskill-ocr/0.1",
-        },
+        headers=headers,
     )
     try:
         return urllib.request.urlopen(req, timeout=300)
     except urllib.error.HTTPError as e:
         snippet = e.read().decode("utf-8", "replace")
-        sys.stderr.write(f"
+        sys.stderr.write(f"{engine} returned HTTP {e.code}: {snippet}\n")
         sys.exit(1)
     except urllib.error.URLError as e:
-        sys.stderr.write(f"connection failed: {e}\n")
+        sys.stderr.write(f"connection failed ({engine}): {e}\n")
         sys.exit(1)
 
 
@@ -289,10 +301,23 @@ def main() -> None:
     )
     p.add_argument("--max-tokens", type=int, default=4096)
     p.add_argument("--temperature", type=float, default=0.2)
+    p.add_argument(
+        "--engine",
+        choices=["auto", "mimo", "pollinations"],
+        default=os.environ.get("MIMO_OCR_ENGINE", "auto"),
+        help="OCR backend. auto = mimo if MIMO_API_KEY set, else pollinations "
+        "(free, no key required). default: %(default)s",
+    )
     p.add_argument(
         "--base-url",
         default=os.environ.get("MIMO_BASE_URL", "https://api.xiaomimimo.com/v1"),
-        help="MiMo OpenAI-compat endpoint
+        help="MiMo OpenAI-compat endpoint, ignored when --engine=pollinations "
+        "(default: %(default)s)",
+    )
+    p.add_argument(
+        "--pollinations-model",
+        default=os.environ.get("POLLINATIONS_MODEL", POLLINATIONS_DEFAULT_MODEL),
+        help="model id when --engine=pollinations (default: %(default)s)",
     )
     p.add_argument(
         "--prompt",
@@ -304,18 +329,27 @@ def main() -> None:
     args = p.parse_args()
 
     api_key = os.environ.get("MIMO_API_KEY")
-
-
-
-
-
-
-
-
-
-
-
-
+
+    # Resolve engine.
+    if args.engine == "mimo":
+        engine = "mimo"
+        if not api_key:
+            sys.stderr.write(
+                "error: --engine mimo requires MIMO_API_KEY.\n"
+                "  set one at https://platform.xiaomimimo.com/#/console/api-keys\n"
+                "  OR drop the flag to fall back to pollinations (free, no key required):\n"
+                "    python3 ocr.py <image>\n"
+            )
+            sys.exit(3)
+    elif args.engine == "pollinations":
+        engine = "pollinations"
+    else:  # auto
+        engine = "mimo" if api_key else "pollinations"
+        if engine == "pollinations":
+            sys.stderr.write(
+                "[engine] auto -> pollinations (free, no key). "
+                "Set MIMO_API_KEY for higher quality (mimo-v2.5).\n"
+            )
 
     # Resolve images: explicit args, else stdin if not a TTY.
     raw_args = args.images
@@ -330,12 +364,20 @@ def main() -> None:
 
     image_urls = [resolve_image_arg(a) for a in raw_args]
 
-
-
-
+    if engine == "mimo":
+        model, note = pick_model(args.model)
+        if note:
+            sys.stderr.write(note)
+    else:
+        if args.model:
+            sys.stderr.write(
+                f"note: --model is mimo-specific; ignoring on pollinations "
+                f"(use --pollinations-model instead).\n"
+            )
+        model = args.pollinations_model
 
     sys.stderr.write(
-        f"[ocr] mode={args.mode} model={model} images={len(image_urls)}\n"
+        f"[ocr] engine={engine} mode={args.mode} model={model} images={len(image_urls)}\n"
     )
 
     messages = build_messages(
@@ -348,13 +390,20 @@ def main() -> None:
     body: dict[str, Any] = {
         "model": model,
         "messages": messages,
-        "max_completion_tokens": args.max_tokens,
         "temperature": args.temperature,
         "stream": args.stream,
     }
+    if engine == "mimo":
+        # MiMo's quirk: max_completion_tokens, not max_tokens.
+        body["max_completion_tokens"] = args.max_tokens
+        url = args.base_url.rstrip("/") + "/chat/completions"
+        auth = api_key
+    else:
+        body["max_tokens"] = args.max_tokens
+        url = POLLINATIONS_URL
+        auth = None
 
-
-    resp = post(url, body, api_key, args.stream)
+    resp = post(url, body, auth, args.stream, engine=engine)
 
     if args.stream:
         content, reasoning = stream_chat(resp)
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "mimo2codex",
-  "version": "0.1.16",
+  "version": "0.1.18",
   "description": "Local proxy that lets the latest OpenAI Codex CLI / desktop talk to Xiaomi MiMo (V2.5 Pro) via the Responses API by translating to Chat Completions on the fly.",
   "keywords": [
     "codex",