PyPI - cembedding - Versions diffs - 0.5.0__tar.gz - Mend

cembedding 0.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

cembedding-0.5.0/.gitignore +13 -0
cembedding-0.5.0/LICENSE +21 -0
cembedding-0.5.0/PKG-INFO +138 -0
cembedding-0.5.0/README.md +111 -0
cembedding-0.5.0/cembedding/__init__.py +3 -0
cembedding-0.5.0/cembedding/__main__.py +6 -0
cembedding-0.5.0/cembedding/_vendored_mcp_common/__init__.py +14 -0
cembedding-0.5.0/cembedding/_vendored_mcp_common/mcp_utils.py +152 -0
cembedding-0.5.0/cembedding/_vendored_mcp_common/validation.py +65 -0
cembedding-0.5.0/cembedding/download_model.py +132 -0
cembedding-0.5.0/cembedding/server.py +1351 -0
cembedding-0.5.0/pyproject.toml +40 -0
cembedding-0.5.0/server.py +16 -0

cembedding-0.5.0/.gitignore ADDED

@@ -0,0 +1,13 @@
+# Downloaded ONNX/MLX models + runtime index (fetched via download_model.py)
+data/
+# Python
+__pycache__/
+*.py[cod]
+.venv/
+venv/
+*.egg-info/
+dist/
+build/
+.uv/
+uv.lock

cembedding-0.5.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025-2026 ClotoCore Project <ClotoCore@proton.me>
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

cembedding-0.5.0/PKG-INFO ADDED

@@ -0,0 +1,138 @@
+Metadata-Version: 2.4
+Name: cembedding
+Version: 0.5.0
+Summary: Local-first embedding server: vector generation + index/search over HTTP (ONNX on-device or API providers). The reference /embed server for CPersona.
+Project-URL: Homepage, https://github.com/Cloto-dev/CEmbedding
+Project-URL: Repository, https://github.com/Cloto-dev/CEmbedding
+Author-email: ClotoCore Project <ClotoCore@proton.me>
+License: MIT
+License-File: LICENSE
+Keywords: bge-m3,embedding,jina,mcp,onnx,vector-search
+Requires-Python: >=3.10
+Requires-Dist: aiohttp>=3.9.0
+Requires-Dist: aiosqlite>=0.20.0
+Requires-Dist: httpx>=0.27.0
+Requires-Dist: mcp<1.27.0,>=1.0.0
+Requires-Dist: numpy>=1.24.0
+Provides-Extra: mlx
+Requires-Dist: mlx-embeddings>=0.1.0; extra == 'mlx'
+Requires-Dist: mlx>=0.18.0; extra == 'mlx'
+Provides-Extra: onnx
+Requires-Dist: onnxruntime>=1.17.0; extra == 'onnx'
+Requires-Dist: tokenizers>=0.15.0; extra == 'onnx'
+Provides-Extra: onnx-gpu
+Requires-Dist: onnxruntime-gpu>=1.17.0; extra == 'onnx-gpu'
+Requires-Dist: tokenizers>=0.15.0; extra == 'onnx-gpu'
+Description-Content-Type: text/markdown
+<div align="center">
+# CEmbedding
+### Local-first embedding server
+Vector embeddings over a tiny HTTP contract.
+On-device ONNX or any OpenAI-compatible API. The reference `/embed` server for [CPersona](https://github.com/Cloto-dev/CPersona).
+[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
+[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)]()
+</div>
+---
+> **Standalone repository** — extracted from the (now private) `clotohub-servers` monorepo so it can be used on its own. [ClotoCore](https://github.com/Cloto-dev/ClotoCore) users get this through the in-app marketplace ([ClotoHub](https://hub.cloto.dev)); everyone else can run it directly as described below.
+## What it is
+A small server that turns text into vectors. It speaks a minimal HTTP contract so anything can call it — its primary consumer is [CPersona](https://github.com/Cloto-dev/CPersona), whose hybrid search uses it for the vector-similarity layer. It can run a model **on-device** via ONNX (no API key, no network) or proxy an **OpenAI-compatible API**.
+It also exposes an MCP (stdio) surface and an optional persistent vector index (`/index`, `/search`), but the HTTP `/embed` endpoint is all CPersona needs.
+## The `/embed` contract
+```
+POST /embed
+Request:  { "texts": ["string", ...] }                 # non-empty array, max 100 per batch
+Response: { "embeddings": [[float, ...], ...], "dimensions": <int> }
+```
+Point any client (e.g. CPersona's `CPERSONA_EMBEDDING_URL` / generic `EMBEDDING_HTTP_URL`) at `http://127.0.0.1:8401/embed`.
+## Quick Start (on-device ONNX)
+**Prerequisites:** Python 3.10+
+```bash
+# Download a model into ./data/models (jina-v5-nano is what CPersona is tuned for)
+uvx --from "cembedding[onnx]" cembedding-download-model --model jina-v5-nano
+# Run the server (reads ./data/models from the current directory)
+EMBEDDING_PROVIDER=onnx_jina_v5_nano uvx --from "cembedding[onnx]" cembedding
+```
+Or install it onto your PATH with `pip install "cembedding[onnx]"`, then run
+`cembedding-download-model --model jina-v5-nano` and `cembedding`.
+From source (development):
+```bash
+git clone https://github.com/Cloto-dev/CEmbedding.git
+cd CEmbedding
+python -m venv .venv
+source .venv/bin/activate          # Windows: .venv\Scripts\activate
+pip install ".[onnx]"
+python -m cembedding.download_model --model jina-v5-nano
+EMBEDDING_PROVIDER=onnx_jina_v5_nano python -m cembedding   # or: python server.py
+```
+You should see `HTTP embedding endpoint started on http://127.0.0.1:8401/embed`. Verify it:
+```bash
+curl -s http://127.0.0.1:8401/embed \
+  -H 'content-type: application/json' \
+  -d '{"texts":["hello world"]}' | head -c 200
+```
+## Providers
+Set `EMBEDDING_PROVIDER`:
+| Value | Model | Notes |
+|-------|-------|-------|
+| `onnx_jina_v5_nano` | jina-v5-nano (33M, 768d) | Local CPU, what CPersona is benchmarked against |
+| `onnx_bge_m3` | bge-m3 | Local CPU, larger / multilingual |
+| `onnx_miniml` | all-MiniLM-L6-v2 (22M, 384d) | Local CPU, smallest |
+| `mlx_bge_m3` | bge-m3 (MLX) | Apple Silicon only — `pip install ".[mlx]"` |
+| `api_openai` | provider's model | OpenAI-compatible API; needs `EMBEDDING_API_KEY` (+ optional `EMBEDDING_API_URL`, `EMBEDDING_MODEL`) |
+Download a local model with `cembedding-download-model --model {miniml,jina-v5-nano,bge-m3}` (or `python -m cembedding.download_model ...` from a source checkout). Models are fetched from Hugging Face into `./data/models` and are not committed to this repo.
+## Configuration
+| Env var | Default | Description |
+|---------|---------|-------------|
+| `EMBEDDING_PROVIDER` | `api_openai` | Provider (see table above) |
+| `EMBEDDING_HTTP_PORT` | `8401` | HTTP port for `/embed` |
+| `EMBEDDING_INDEX_ENABLED` | `true` | Enable the persistent vector index endpoints (`/index`, `/search`, `/remove`, `/purge`) |
+| `ONNX_MODEL_DIR` | (auto) | Override the model directory for ONNX providers |
+| `ONNX_EP_PREFERENCE` | (auto) | ONNX execution providers, comma-separated. Empty = auto (CoreML on macOS, DirectML on Windows, else CPU; CPU always ensured) |
+| `ONNX_MAX_SEQ_LEN` | `2048` | Max tokenization length (1–8192; MiniLM clamped to 512 internally) |
+| `EMBEDDING_API_KEY` | — | Required for `api_openai` |
+| `EMBEDDING_API_URL` | `https://api.openai.com/v1/embeddings` | API endpoint for `api_openai` |
+## Use with CPersona
+Run this server, then tell CPersona to use it:
+```bash
+# CPersona MCP config env
+CPERSONA_EMBEDDING_MODE=http
+CPERSONA_EMBEDDING_URL=http://127.0.0.1:8401/embed
+```
+Without an embedding server, CPersona still works (FTS5 + keyword search); adding one enables the vector-similarity layer.
+## License
+MIT — see [LICENSE](LICENSE).
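The `/embed` contract is small enough to call with only the standard library. Below is a minimal Python client sketch. It assumes the server is running on the default `http://127.0.0.1:8401/embed` and that the response carries exactly one vector per input text, in input order. It splits the input into batches of at most 100 to respect the documented limit. `embed`, `batched`, and `parse_embed_response` are hypothetical helper names and are not part of the cembedding package.

```python
import json
import urllib.request

MAX_BATCH = 100  # documented limit: "max 100 per batch"


def batched(texts: list[str], size: int = MAX_BATCH) -> list[list[str]]:
    """Split texts into consecutive chunks of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def parse_embed_response(body: bytes, expected: int) -> tuple[list[list[float]], int]:
    """Decode an /embed response body and check it against the documented shape."""
    payload = json.loads(body)
    vectors = payload["embeddings"]
    dims = payload["dimensions"]
    if len(vectors) != expected:
        raise ValueError(f"expected {expected} embeddings, got {len(vectors)}")
    if any(len(v) != dims for v in vectors):
        raise ValueError(f"a vector's length does not match dimensions={dims}")
    return vectors, dims


def embed(texts: list[str], url: str = "http://127.0.0.1:8401/embed",
          timeout: float = 30.0) -> list[list[float]]:
    """POST texts to /embed in batches and return one vector per input text."""
    if not texts:
        raise ValueError("texts must be a non-empty list")
    vectors: list[list[float]] = []
    for chunk in batched(list(texts)):
        request = urllib.request.Request(
            url,
            data=json.dumps({"texts": chunk}).encode("utf-8"),
            headers={"content-type": "application/json"},
            method="POST",
        )
        # urlopen raises urllib.error.HTTPError for non-2xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            chunk_vectors, _ = parse_embed_response(response.read(), len(chunk))
        vectors.extend(chunk_vectors)
    return vectors
```

With the server running, `embed(["hello world"])` should return a list containing a single vector whose length equals the provider's `dimensions`.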

cembedding-0.5.0/README.md ADDED

@@ -0,0 +1,111 @@
+<div align="center">
+# CEmbedding
+### Local-first embedding server
+Vector embeddings over a tiny HTTP contract.
+On-device ONNX or any OpenAI-compatible API. The reference `/embed` server for [CPersona](https://github.com/Cloto-dev/CPersona).
+[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
+[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)]()
+</div>
+---
+> **Standalone repository** — extracted from the (now private) `clotohub-servers` monorepo so it can be used on its own. [ClotoCore](https://github.com/Cloto-dev/ClotoCore) users get this through the in-app marketplace ([ClotoHub](https://hub.cloto.dev)); everyone else can run it directly as described below.
+## What it is
+A small server that turns text into vectors. It speaks a minimal HTTP contract so anything can call it — its primary consumer is [CPersona](https://github.com/Cloto-dev/CPersona), whose hybrid search uses it for the vector-similarity layer. It can run a model **on-device** via ONNX (no API key, no network) or proxy an **OpenAI-compatible API**.
+It also exposes an MCP (stdio) surface and an optional persistent vector index (`/index`, `/search`), but the HTTP `/embed` endpoint is all CPersona needs.
+## The `/embed` contract
+```
+POST /embed
+Request:  { "texts": ["string", ...] }                 # non-empty array, max 100 per batch
+Response: { "embeddings": [[float, ...], ...], "dimensions": <int> }
+```
+Point any client (e.g. CPersona's `CPERSONA_EMBEDDING_URL` / generic `EMBEDDING_HTTP_URL`) at `http://127.0.0.1:8401/embed`.
+## Quick Start (on-device ONNX)
+**Prerequisites:** Python 3.10+
+```bash
+# Download a model into ./data/models (jina-v5-nano is what CPersona is tuned for)
+uvx --from "cembedding[onnx]" cembedding-download-model --model jina-v5-nano
+# Run the server (reads ./data/models from the current directory)
+EMBEDDING_PROVIDER=onnx_jina_v5_nano uvx --from "cembedding[onnx]" cembedding
+```
+Or install it onto your PATH with `pip install "cembedding[onnx]"`, then run
+`cembedding-download-model --model jina-v5-nano` and `cembedding`.
+From source (development):
+```bash
+git clone https://github.com/Cloto-dev/CEmbedding.git
+cd CEmbedding
+python -m venv .venv
+source .venv/bin/activate          # Windows: .venv\Scripts\activate
+pip install ".[onnx]"
+python -m cembedding.download_model --model jina-v5-nano
+EMBEDDING_PROVIDER=onnx_jina_v5_nano python -m cembedding   # or: python server.py
+```
+You should see `HTTP embedding endpoint started on http://127.0.0.1:8401/embed`. Verify it:
+```bash
+curl -s http://127.0.0.1:8401/embed \
+  -H 'content-type: application/json' \
+  -d '{"texts":["hello world"]}' | head -c 200
+```
+## Providers
+Set `EMBEDDING_PROVIDER`:
+| Value | Model | Notes |
+|-------|-------|-------|
+| `onnx_jina_v5_nano` | jina-v5-nano (33M, 768d) | Local CPU, what CPersona is benchmarked against |
+| `onnx_bge_m3` | bge-m3 | Local CPU, larger / multilingual |
+| `onnx_miniml` | all-MiniLM-L6-v2 (22M, 384d) | Local CPU, smallest |
+| `mlx_bge_m3` | bge-m3 (MLX) | Apple Silicon only — `pip install ".[mlx]"` |
+| `api_openai` | provider's model | OpenAI-compatible API; needs `EMBEDDING_API_KEY` (+ optional `EMBEDDING_API_URL`, `EMBEDDING_MODEL`) |
+Download a local model with `cembedding-download-model --model {miniml,jina-v5-nano,bge-m3}` (or `python -m cembedding.download_model ...` from a source checkout). Models are fetched from Hugging Face into `./data/models` and are not committed to this repo.
+## Configuration
+| Env var | Default | Description |
+|---------|---------|-------------|
+| `EMBEDDING_PROVIDER` | `api_openai` | Provider (see table above) |
+| `EMBEDDING_HTTP_PORT` | `8401` | HTTP port for `/embed` |
+| `EMBEDDING_INDEX_ENABLED` | `true` | Enable the persistent vector index endpoints (`/index`, `/search`, `/remove`, `/purge`) |
+| `ONNX_MODEL_DIR` | (auto) | Override the model directory for ONNX providers |
+| `ONNX_EP_PREFERENCE` | (auto) | ONNX execution providers, comma-separated. Empty = auto (CoreML on macOS, DirectML on Windows, else CPU; CPU always ensured) |
+| `ONNX_MAX_SEQ_LEN` | `2048` | Max tokenization length (1–8192; MiniLM clamped to 512 internally) |
+| `EMBEDDING_API_KEY` | — | Required for `api_openai` |
+| `EMBEDDING_API_URL` | `https://api.openai.com/v1/embeddings` | API endpoint for `api_openai` |
+## Use with CPersona
+Run this server, then tell CPersona to use it:
+```bash
+# CPersona MCP config env
+CPERSONA_EMBEDDING_MODE=http
+CPERSONA_EMBEDDING_URL=http://127.0.0.1:8401/embed
+```
+Without an embedding server, CPersona still works (FTS5 + keyword search); adding one enables the vector-similarity layer.
+## License
+MIT — see [LICENSE](LICENSE).
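CPersona uses the vectors returned by `/embed` for its vector-similarity layer. The README does not say which similarity measure CPersona or the `/search` endpoint applies. The sketch below assumes cosine similarity, a common choice for text embeddings, and ranks stored vectors against a query in pure Python. `cosine` and `top_k` are illustrative names. With NumPy, which is already a cembedding dependency, the same ranking would normally be vectorized.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int = 5) -> list[tuple[int, float]]:
    """Return (index, score) for the k stored vectors most similar to the query."""
    if any(len(v) != len(query) for v in vectors):
        raise ValueError("all vectors must share the query's dimensionality")
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    # list.sort is stable, so equal scores keep their original index order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In practice `query` would be the vector from `/embed` for the search text, and `vectors` the previously embedded documents.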

cembedding-0.5.0/cembedding/__init__.py ADDED

@@ -0,0 +1,3 @@
+"""CEmbedding — local-first embedding server (the reference /embed server for CPersona)."""
+__version__ = "0.5.0"

cembedding-0.5.0/cembedding/__main__.py ADDED

@@ -0,0 +1,6 @@
+"""Enable ``python -m cembedding``."""
+from cembedding.server import run
+if __name__ == "__main__":
+    run()

cembedding-0.5.0/cembedding/_vendored_mcp_common/__init__.py ADDED

@@ -0,0 +1,14 @@
+"""Vendored subset of the MGP Python common layer.
+Ported from ``clotohub-servers/servers/common/`` so this server runs
+standalone without depending on that (now private) monorepo:
+- :mod:`_vendored_mcp_common.validation` — graceful-degradation argument
+  validators (``validate_bool`` / ``validate_str`` / ``validate_int`` /
+  ``validate_dict`` / ``validate_float`` / ``validate_list``).
+- :mod:`_vendored_mcp_common.mcp_utils` — ``ToolRegistry`` MCP tool
+  registration helper.
+Only the symbols this server actually imports are vendored; keep this
+copy in sync with the upstream common layer when it changes.
+"""

cembedding-0.5.0/cembedding/_vendored_mcp_common/mcp_utils.py ADDED

@@ -0,0 +1,152 @@
+"""
+Decorator-based MCP tool registration utility.
+Eliminates boilerplate list_tools/call_tool patterns across all servers.
+"""
+import json
+import logging
+from collections.abc import Callable
+from mcp.server import Server
+from mcp.server.stdio import stdio_server
+from mcp.types import TextContent, Tool, ToolAnnotations
+from cembedding._vendored_mcp_common.validation import validate_bool, validate_dict, validate_float, validate_int, validate_list, validate_str
+class _MgpValidationFilter(logging.Filter):
+    """Drop mcp.shared.session's bulk pydantic validation warnings.
+    The Python MCP SDK's ``ClientRequest`` union doesn't include MGP
+    extensions (``mgp/callback/respond``, ``notifications/mgp.*``). Every
+    time the kernel sends one the SDK logs a 30+ line ``Failed to validate
+    request`` warning against every known method, even though the SDK's
+    own error-response path handles it cleanly. These warnings are pure
+    noise and drown out genuine errors.
+    """
+    def filter(self, record: logging.LogRecord) -> bool:
+        msg = record.getMessage()
+        return not msg.startswith("Failed to validate request:") and not msg.startswith(
+            "Failed to validate notification:"
+        )
+_MGP_FILTER_INSTALLED = False
+def install_mgp_validation_filter() -> None:
+    """Install the MGP validation log filter on the root logger.
+    Called automatically by ``run_mcp_server``. Servers with a custom
+    main loop (e.g. ones that also serve HTTP) should call this
+    explicitly before entering ``stdio_server``.
+    """
+    global _MGP_FILTER_INSTALLED
+    if _MGP_FILTER_INSTALLED:
+        return
+    logging.getLogger().addFilter(_MgpValidationFilter())
+    _MGP_FILTER_INSTALLED = True
+_VALIDATORS: dict[type, Callable] = {
+    bool: validate_bool,
+    str: validate_str,
+    int: validate_int,
+    float: validate_float,
+    dict: validate_dict,
+    list: validate_list,
+}
+class ToolRegistry:
+    """Decorator-based MCP tool registration."""
+    def __init__(self, server_name: str):
+        self.server = Server(server_name)
+        self._tools: list[Tool] = []
+        self._handlers: dict[str, Callable] = {}
+        self._bind()
+    def tool(
+        self,
+        name: str,
+        description: str,
+        schema: dict,
+        annotations: ToolAnnotations | None = None,
+    ):
+        """Decorator: register a tool handler.
+        The decorated function receives (arguments: dict) and returns a dict.
+        JSON serialization and TextContent wrapping are handled automatically.
+        *annotations* is forwarded to the MCP Tool schema. The kernel reads
+        ``destructiveHint`` from annotations to trigger the HITL approval
+        gate for destructive tools.
+        """
+        def decorator(fn):
+            tool_kwargs = {"name": name, "description": description, "inputSchema": schema}
+            if annotations is not None:
+                tool_kwargs["annotations"] = annotations
+            self._tools.append(Tool(**tool_kwargs))
+            self._handlers[name] = fn
+            return fn
+        return decorator
+    def auto_tool(
+        self,
+        name: str,
+        description: str,
+        schema: dict,
+        handler: Callable,
+        params: list[tuple],
+        annotations: ToolAnnotations | None = None,
+    ):
+        """Register a tool with auto-validated parameter extraction.
+        Each entry in *params* is ``(key, type)`` or ``(key, type, default)``.
+        Supported types: ``bool``, ``str``, ``int``, ``float``, ``dict``, ``list``.
+        The extracted values are passed positionally to *handler*.
+        """
+        async def _handler(arguments: dict) -> dict:
+            args = []
+            for spec in params:
+                key, typ = spec[0], spec[1]
+                default = spec[2] if len(spec) > 2 else None
+                validator = _VALIDATORS[typ]
+                if default is not None:
+                    args.append(validator(arguments, key, default))
+                else:
+                    args.append(validator(arguments, key))
+            return await handler(*args)
+        self._tools.append(Tool(name=name, description=description, inputSchema=schema, annotations=annotations))
+        self._handlers[name] = _handler
+    def _bind(self):
+        registry = self
+        @self.server.list_tools()
+        async def list_tools() -> list[Tool]:
+            return registry._tools
+        @self.server.call_tool()
+        async def call_tool(name: str, arguments: dict) -> list[TextContent]:
+            handler = registry._handlers.get(name)
+            if handler is None:
+                return [TextContent(type="text", text=json.dumps({"error": f"Unknown tool: {name}"}))]
+            try:
+                result = await handler(arguments)
+                return [TextContent(type="text", text=json.dumps(result, ensure_ascii=False))]
+            except Exception as e:
+                return [TextContent(type="text", text=json.dumps({"error": str(e)}))]
+async def run_mcp_server(registry: ToolRegistry):
+    """Standard MCP server main loop."""
+    install_mgp_validation_filter()
+    async with stdio_server() as (read_stream, write_stream):
+        await registry.server.run(read_stream, write_stream, registry.server.create_initialization_options())
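`install_mgp_validation_filter` attaches its filter to the root *logger*. In Python's `logging` module, a logger's filters are consulted only for records logged directly on that logger. Records propagated up from descendant loggers bypass them. So whether this filter catches the SDK's warnings depends on how the MCP SDK emits them, which the vendored code does not show:

- A module-level `logging.warning(...)` logs on the root logger, so the filter applies.
- A named logger such as `logging.getLogger(__name__)` would bypass the filter.

The standalone demo below illustrates the difference. `PrefixDropFilter`, `ListHandler`, and `demo` are illustrative names. Attaching the filter to the root logger's handlers instead also covers propagated records.

```python
import logging


class PrefixDropFilter(logging.Filter):
    """Reject records whose rendered message starts with one of `prefixes`."""

    def __init__(self, *prefixes: str):
        super().__init__()
        self.prefixes = prefixes

    def filter(self, record: logging.LogRecord) -> bool:
        return not record.getMessage().startswith(self.prefixes)


class ListHandler(logging.Handler):
    """Keep rendered messages in memory so the demo can inspect what got through."""

    def __init__(self):
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def demo(attach_to_handler: bool) -> list[str]:
    """Log one message on root and one on a child logger; return what reached the handler."""
    root = logging.getLogger()
    handler = ListHandler()
    drop = PrefixDropFilter("Failed to validate request:")
    target = handler if attach_to_handler else root
    old_level = root.level
    root.setLevel(logging.WARNING)
    root.addHandler(handler)
    target.addFilter(drop)
    try:
        # Logged on the root logger itself: a root-logger filter sees this record.
        logging.warning("Failed to validate request: from root")
        # Logged on a named child logger: the record propagates to root's
        # handlers, but root's logger-level filters are not consulted.
        logging.getLogger("example.child").warning("Failed to validate request: from child")
    finally:
        target.removeFilter(drop)
        root.removeHandler(handler)
        root.setLevel(old_level)
    return handler.messages
```

`demo(False)` lets the child logger's message through. `demo(True)` drops both messages, because handler filters run for every record the handler receives.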

cembedding-0.5.0/cembedding/_vendored_mcp_common/validation.py ADDED

@@ -0,0 +1,65 @@
+"""Common argument validation helpers for MCP tool handlers.
+All validators return a safe default on type mismatch (graceful degradation).
+"""
+def validate_bool(arguments: dict, key: str, default: bool = False) -> bool:
+    """Extract a boolean value, returning *default* if missing or wrong type."""
+    val = arguments.get(key, default)
+    if not isinstance(val, bool):
+        return default
+    return val
+def validate_str(arguments: dict, key: str, default: str = "") -> str:
+    """Extract a string value, returning *default* if missing or wrong type."""
+    val = arguments.get(key, default)
+    if not isinstance(val, str):
+        return default
+    return val
+def validate_int(arguments: dict, key: str, default: int = 0) -> int:
+    """Extract an integer value, returning *default* if missing or wrong type.
+    ``bool`` is explicitly excluded (``isinstance(True, int)`` is ``True``).
+    """
+    val = arguments.get(key, default)
+    if isinstance(val, bool) or not isinstance(val, int):
+        return default
+    return val
+def validate_dict(arguments: dict, key: str, default: dict | None = None) -> dict:
+    """Extract a dict value, returning *default* (or ``{}``) if missing or wrong type."""
+    if default is None:
+        default = {}
+    val = arguments.get(key, default)
+    if not isinstance(val, dict):
+        return default
+    return val
+def validate_float(arguments: dict, key: str, default: float = 0.0) -> float:
+    """Extract a float value, returning *default* if missing or wrong type.
+    Accepts both ``float`` and ``int`` (JSON integers are valid float inputs).
+    ``bool`` is explicitly excluded.
+    """
+    val = arguments.get(key, default)
+    if isinstance(val, bool):
+        return default
+    if isinstance(val, (int, float)):
+        return float(val)
+    return default
+def validate_list(arguments: dict, key: str, default: list | None = None) -> list:
+    """Extract a list value, returning *default* (or ``[]``) if missing or wrong type."""
+    if default is None:
+        default = []
+    val = arguments.get(key, default)
+    if not isinstance(val, list):
+        return default
+    return val
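These validators never raise. They fall back to a default on missing or mistyped input and leave range checks to the caller. A bounded variant could follow the same graceful-degradation pattern, for example for a hypothetical `top_k` tool argument. `validate_int_range` below is an illustrative extension, not part of the vendored module.

```python
def validate_int_range(arguments: dict, key: str, default: int, lo: int, hi: int) -> int:
    """Hypothetical bounded variant of validate_int: extract an int and clamp it to [lo, hi].

    Like the vendored validators, a missing key, a non-int, or a bool yields
    *default* instead of raising.
    """
    val = arguments.get(key, default)
    # Same bool exclusion as validate_int: isinstance(True, int) is True.
    if isinstance(val, bool) or not isinstance(val, int):
        return default
    return max(lo, min(val, hi))
```

Clamping, rather than falling back to the default, keeps a caller's intent when it merely overshoots the range. Which behavior is right depends on the tool.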

cembedding-0.5.0/cembedding/download_model.py ADDED

@@ -0,0 +1,132 @@
+"""Download ONNX embedding models for local inference."""
+import argparse
+import os
+import sys
+import urllib.request
+def _hf_download(repo_id: str, repo_filename: str, dest_path: str) -> bool:
+    """Download a single file from HuggingFace Hub.
+    Uses huggingface_hub if available (handles LFS, caching, auth).
+    Falls back to direct urllib download for minimal-dependency environments.
+    """
+    if os.path.exists(dest_path):
+        print(f"  Already exists: {dest_path}")
+        return True
+    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
+    print(f"  Downloading: {repo_filename} ...")
+    try:
+        from huggingface_hub import hf_hub_download
+        cached = hf_hub_download(repo_id=repo_id, filename=repo_filename)
+        import shutil
+        shutil.copy2(cached, dest_path)
+        size_mb = os.path.getsize(dest_path) / (1024 * 1024)
+        print(f"  Saved: {dest_path} ({size_mb:.1f} MB)")
+        return True
+    except ImportError:
+        pass
+    # Fallback: direct URL
+    url = f"https://huggingface.co/{repo_id}/resolve/main/{repo_filename}"
+    try:
+        urllib.request.urlretrieve(url, dest_path)
+        size_mb = os.path.getsize(dest_path) / (1024 * 1024)
+        print(f"  Saved: {dest_path} ({size_mb:.1f} MB)")
+        return True
+    except Exception as e:
+        print(f"  Failed: {e}", file=sys.stderr)
+        if os.path.exists(dest_path):
+            os.remove(dest_path)
+        return False
+# MiniLM
+MINIML_DIR = os.environ.get("ONNX_MODEL_DIR", "data/models/all-MiniLM-L6-v2")
+MINIML_REPO = "sentence-transformers/all-MiniLM-L6-v2"
+MINIML_FILES = {
+    "model.onnx": "onnx/model.onnx",
+    "tokenizer.json": "tokenizer.json",
+}
+# jina-v5-nano (retrieval variant with merged LoRA, external data format)
+JINA_REPO = "jinaai/jina-embeddings-v5-text-nano-retrieval"
+JINA_FILES = {
+    "model.onnx": "onnx/model.onnx",
+    "model.onnx_data": "onnx/model.onnx_data",
+    "tokenizer.json": "tokenizer.json",
+}
+# bge-m3 (Xenova int8 single-file, ~542MB)
+# Xenova/bge-m3 is the canonical Transformers.js ONNX conversion maintained by HuggingFace
+BGE_M3_REPO = "Xenova/bge-m3"
+BGE_M3_FILES = {
+    "model.onnx": "onnx/model_int8.onnx",
+    "tokenizer.json": "tokenizer.json",
+    "sentencepiece.bpe.model": "sentencepiece.bpe.model",
+}
+def _download_repo_files(repo_id: str, files: dict[str, str], model_dir: str) -> bool:
+    """Download a set of local_filename→repo_filename mappings into model_dir."""
+    os.makedirs(model_dir, exist_ok=True)
+    for local_name, repo_filename in files.items():
+        dest = os.path.join(model_dir, local_name)
+        if not _hf_download(repo_id, repo_filename, dest):
+            return False
+    return True
+def download():
+    """Download MiniLM (legacy entrypoint)."""
+    print("=== Downloading all-MiniLM-L6-v2 ONNX model ===")
+    ok = _download_repo_files(MINIML_REPO, MINIML_FILES, MINIML_DIR)
+    if ok:
+        print(f"Model ready at {MINIML_DIR}")
+    return ok
+def download_jina_v5_nano(model_dir: str = "") -> bool:
+    """Download jina-embeddings-v5-text-nano-retrieval ONNX model."""
+    if not model_dir:
+        model_dir = os.environ.get("ONNX_MODEL_DIR", "data/models/jina-embeddings-v5-text-nano")
+    print("=== Downloading jina-embeddings-v5-text-nano-retrieval ===")
+    ok = _download_repo_files(JINA_REPO, JINA_FILES, model_dir)
+    if ok:
+        print(f"Model ready at {model_dir}")
+    return ok
+def download_bge_m3(model_dir: str = "") -> bool:
+    """Download BAAI/bge-m3 int8 quantized ONNX model (~542MB) via Xenova conversion."""
+    if not model_dir:
+        model_dir = os.environ.get("ONNX_MODEL_DIR", "data/models/bge-m3")
+    print("=== Downloading BAAI/bge-m3 ONNX int8 (~542 MB) from Xenova/bge-m3 ===")
+    ok = _download_repo_files(BGE_M3_REPO, BGE_M3_FILES, model_dir)
+    if ok:
+        print(f"Model ready at {model_dir}")
+    return ok
+def main():
+    """Console-script / ``python -m cembedding.download_model`` entry point."""
+    parser = argparse.ArgumentParser(description="Download ONNX embedding models")
+    parser.add_argument("--model", default="miniml", choices=["miniml", "jina-v5-nano", "bge-m3"])
+    args = parser.parse_args()
+    if args.model == "miniml":
+        success = download()
+    elif args.model == "jina-v5-nano":
+        success = download_jina_v5_nano()
+    else:
+        success = download_bge_m3()
+    sys.exit(0 if success else 1)
+if __name__ == "__main__":
+    main()
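`_hf_download` treats any existing file at `dest_path` as complete. The urllib fallback deletes a partial file when an `Exception` is raised. However, `KeyboardInterrupt` derives from `BaseException`, so an interrupted download can leave a truncated file that the next run reports as "Already exists". A common remedy is to download into a temporary file in the same directory and rename it into place only on success. The sketch below shows that pattern with the same `urllib.request.urlretrieve` call. `atomic_download` is a hypothetical name and not part of `download_model.py`.

```python
import os
import sys
import tempfile
import urllib.request


def atomic_download(url: str, dest_path: str) -> bool:
    """Download *url* to *dest_path* so dest_path only ever holds a complete file."""
    if os.path.exists(dest_path):
        return True  # same skip-if-present rule as _hf_download
    dest_dir = os.path.dirname(dest_path) or "."
    os.makedirs(dest_dir, exist_ok=True)
    # Stage in the destination directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, dest_path)  # atomic on POSIX; dest appears only when complete
        return True
    except Exception as e:
        print(f"  Failed: {e}", file=sys.stderr)
        return False
    finally:
        # Runs on success (tmp already renamed away), on failure, and on
        # KeyboardInterrupt, which `except Exception` does not catch.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

The fallback branch of `_hf_download` could call a helper like this with the same `https://huggingface.co/{repo_id}/resolve/main/{repo_filename}` URL in place of the bare `urlretrieve`.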