loreguard-cli 0.3.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- loreguard_cli-0.3.0.dist-info/METADATA +220 -0
- loreguard_cli-0.3.0.dist-info/RECORD +17 -0
- loreguard_cli-0.3.0.dist-info/WHEEL +4 -0
- loreguard_cli-0.3.0.dist-info/entry_points.txt +3 -0
- loreguard_cli-0.3.0.dist-info/licenses/LICENSE +21 -0
- src/__init__.py +3 -0
- src/cli.py +409 -0
- src/config.py +61 -0
- src/llama_server.py +377 -0
- src/llm.py +485 -0
- src/main.py +147 -0
- src/models_registry.py +130 -0
- src/npc_chat.py +468 -0
- src/steam.py +505 -0
- src/term_ui.py +719 -0
- src/tunnel.py +428 -0
- src/wizard.py +649 -0
|
@@ -0,0 +1,220 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: loreguard-cli
|
|
3
|
+
Version: 0.3.0
|
|
4
|
+
Summary: Local inference client for Loreguard NPCs
|
|
5
|
+
Project-URL: Homepage, https://loreguard.com
|
|
6
|
+
Project-URL: Documentation, https://github.com/beyond-logic-labs/loreguard-cli#readme
|
|
7
|
+
Project-URL: Repository, https://github.com/beyond-logic-labs/loreguard-cli
|
|
8
|
+
Project-URL: Issues, https://github.com/beyond-logic-labs/loreguard-cli/issues
|
|
9
|
+
License-Expression: MIT
|
|
10
|
+
License-File: LICENSE
|
|
11
|
+
Keywords: gamedev,inference,llm,loreguard,npc
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Environment :: Console
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
16
|
+
Classifier: Operating System :: OS Independent
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
20
|
+
Classifier: Topic :: Games/Entertainment
|
|
21
|
+
Requires-Python: >=3.10
|
|
22
|
+
Requires-Dist: aiofiles>=24.1.0
|
|
23
|
+
Requires-Dist: httpx>=0.26.0
|
|
24
|
+
Requires-Dist: pydantic>=2.5.0
|
|
25
|
+
Requires-Dist: websockets>=12.0
|
|
26
|
+
Provides-Extra: build
|
|
27
|
+
Requires-Dist: pyinstaller>=6.0.0; extra == 'build'
|
|
28
|
+
Provides-Extra: dev
|
|
29
|
+
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
|
|
30
|
+
Requires-Dist: pytest>=7.4.0; extra == 'dev'
|
|
31
|
+
Requires-Dist: ruff>=0.1.0; extra == 'dev'
|
|
32
|
+
Description-Content-Type: text/markdown
|
|
33
|
+
|
|
34
|
+
# Loreguard
|
|
35
|
+
|
|
36
|
+
[](https://pypi.org/project/loreguard-cli/)
|
|
37
|
+
[](https://github.com/beyond-logic-labs/loreguard-cli/actions/workflows/release.yml)
|
|
38
|
+
[](https://opensource.org/licenses/MIT)
|
|
39
|
+
[](https://www.python.org/downloads/)
|
|
40
|
+
[](https://github.com/beyond-logic-labs/loreguard-cli/releases)
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
┌────────────────────────────────────────────────────────────────────────────────┐
|
|
44
|
+
│ │
|
|
45
|
+
│ ██╗ ██████╗ ██████╗ ███████╗ ██████╗ ██╗ ██╗ █████╗ ██████╗ ██████╗ │
|
|
46
|
+
│ ██║ ██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██║ ██║██╔══██╗██╔══██╗██╔══██╗ │
|
|
47
|
+
│ ██║ ██║ ██║██████╔╝█████╗ ██║ ███╗██║ ██║███████║██████╔╝██║ ██║ │
|
|
48
|
+
│ ██║ ██║ ██║██╔══██╗██╔══╝ ██║ ██║██║ ██║██╔══██║██╔══██╗██║ ██║ │
|
|
49
|
+
│ ███████╗╚██████╔╝██║ ██║███████╗ ╚██████╔╝╚██████╔╝██║ ██║██║ ██║██████╔╝ │
|
|
50
|
+
│ ╚══════╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ │
|
|
51
|
+
│ │
|
|
52
|
+
│ Local inference for your game NPCs │
|
|
53
|
+
│ loreguard.com │
|
|
54
|
+
│ │
|
|
55
|
+
└────────────────────────────────────────────────────────────────────────────────┘
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
AI-powered NPCs using your own hardware (your servers or your players' hardware)
|
|
59
|
+
The Loreguard CLI connects local LLM inference to the Loreguard NPC system.
|
|
60
|
+
|
|
61
|
+
## How It Works
|
|
62
|
+
|
|
63
|
+
```
|
|
64
|
+
┌─────────────────┐ wss://api.loreguard.com ┌─────────────────┐
|
|
65
|
+
│ Your Game │◄────────────────────────────► │ Loreguard API │
|
|
66
|
+
│ (NPC Dialog) │ │ (Backend) │
|
|
67
|
+
└─────────────────┘ └────────┬────────┘
|
|
68
|
+
│
|
|
69
|
+
│ Routes inference
|
|
70
|
+
│ to your worker
|
|
71
|
+
▼
|
|
72
|
+
┌─────────────────┐
|
|
73
|
+
│ Loreguard CLI │◄── You run this
|
|
74
|
+
│ (This repo) │
|
|
75
|
+
└────────┬────────┘
|
|
76
|
+
│
|
|
77
|
+
│ Local inference
|
|
78
|
+
▼
|
|
79
|
+
┌─────────────────┐
|
|
80
|
+
│ llama.cpp │
|
|
81
|
+
│ (Your GPU/CPU) │
|
|
82
|
+
└─────────────────┘
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
## Installation
|
|
86
|
+
|
|
87
|
+
### Option 1: Download Binary (Recommended)
|
|
88
|
+
|
|
89
|
+
Download standalone binaries from [Releases](https://github.com/beyond-logic-labs/loreguard-cli/releases):
|
|
90
|
+
- `loreguard-linux` - Linux x64
|
|
91
|
+
- `loreguard-macos` - macOS (Intel & Apple Silicon)
|
|
92
|
+
- `loreguard-windows.exe` - Windows x64
|
|
93
|
+
|
|
94
|
+
### Option 2: Install from PyPI
|
|
95
|
+
|
|
96
|
+
```bash
|
|
97
|
+
pip install loreguard-cli
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
### Option 3: Install from Source
|
|
101
|
+
|
|
102
|
+
```bash
|
|
103
|
+
git clone https://github.com/beyond-logic-labs/loreguard-cli
|
|
104
|
+
cd loreguard-cli
|
|
105
|
+
pip install -e .
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
### Option 4: Build Your Own Binary
|
|
109
|
+
|
|
110
|
+
```bash
|
|
111
|
+
git clone https://github.com/beyond-logic-labs/loreguard-cli
|
|
112
|
+
cd loreguard-cli
|
|
113
|
+
pip install -e ".[build]"
|
|
114
|
+
python scripts/build.py
|
|
115
|
+
# Output: dist/loreguard (or dist/loreguard.exe on Windows)
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
## Quick Start
|
|
119
|
+
|
|
120
|
+
### Interactive Wizard
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
loreguard
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
The wizard guides you through:
|
|
127
|
+
1. **Authentication** - Enter your worker token
|
|
128
|
+
2. **Model Selection** - Choose or download a model
|
|
129
|
+
3. **Running** - Starts llama-server and connects to backend
|
|
130
|
+
|
|
131
|
+
### Headless CLI
|
|
132
|
+
|
|
133
|
+
```bash
|
|
134
|
+
loreguard-cli --token lg_worker_xxx --model /path/to/model.gguf
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
Or auto-download a supported model:
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
loreguard-cli --token lg_worker_xxx --model-id qwen3-4b-instruct
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
**Environment variables:**
|
|
144
|
+
```bash
|
|
145
|
+
export LOREGUARD_TOKEN=lg_worker_xxx
|
|
146
|
+
export LOREGUARD_MODEL=/path/to/model.gguf
|
|
147
|
+
loreguard-cli
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## Supported Models
|
|
151
|
+
|
|
152
|
+
| Model ID | Name | Size | Notes |
|
|
153
|
+
|----------|------|------|-------|
|
|
154
|
+
| `qwen3-4b-instruct` | Qwen3 4B Instruct | 2.8 GB | **Recommended** |
|
|
155
|
+
| `llama-3.2-3b-instruct` | Llama 3.2 3B | 2.0 GB | Fast |
|
|
156
|
+
| `qwen3-8b` | Qwen3 8B | 5.2 GB | Higher quality |
|
|
157
|
+
| `meta-llama-3-8b-instruct` | Llama 3 8B | 4.9 GB | General purpose |
|
|
158
|
+
|
|
159
|
+
Or use any `.gguf` model with `--model /path/to/model.gguf`.
|
|
160
|
+
|
|
161
|
+
## Use Cases
|
|
162
|
+
|
|
163
|
+
### For Game Developers (Testing & Development)
|
|
164
|
+
|
|
165
|
+
Use Loreguard CLI during development to test NPC dialogs with your own hardware:
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
# Start the worker
|
|
169
|
+
loreguard-cli --token $YOUR_DEV_TOKEN --model-id qwen3-4b-instruct
|
|
170
|
+
|
|
171
|
+
# Your game connects to Loreguard API
|
|
172
|
+
# NPC inference requests are routed to your local worker
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
### For Players (Coming Soon)
|
|
176
|
+
|
|
177
|
+
> **Note:** Player distribution support is in development. Currently, players would need their own Loreguard account and token.
|
|
178
|
+
|
|
179
|
+
We're working on a **Game Keys** system that will allow:
|
|
180
|
+
- Developers to register their game and get a Game API Key
|
|
181
|
+
- Players to run the CLI without needing a Loreguard account
|
|
182
|
+
- Automatic worker provisioning scoped to each game
|
|
183
|
+
|
|
184
|
+
**Interested in early access?** Contact us at [loreguard.com](https://loreguard.com)
|
|
185
|
+
|
|
186
|
+
## Requirements
|
|
187
|
+
|
|
188
|
+
- **RAM**: 8GB minimum (16GB+ for larger models)
|
|
189
|
+
- **GPU**: Optional but recommended (NVIDIA CUDA or Apple Silicon)
|
|
190
|
+
- **Disk**: 2-6GB depending on model
|
|
191
|
+
- **Python**: 3.10+ (if installing from source)
|
|
192
|
+
|
|
193
|
+
## Get Your Token
|
|
194
|
+
|
|
195
|
+
1. Go to [loreguard.com/developers](https://loreguard.com/developers)
|
|
196
|
+
2. Create a worker token
|
|
197
|
+
3. Use it with `--token` or `LOREGUARD_TOKEN`
|
|
198
|
+
|
|
199
|
+
## Development
|
|
200
|
+
|
|
201
|
+
```bash
|
|
202
|
+
git clone https://github.com/beyond-logic-labs/loreguard-cli
|
|
203
|
+
cd loreguard-cli
|
|
204
|
+
python -m venv .venv
|
|
205
|
+
source .venv/bin/activate # Windows: .venv\Scripts\activate
|
|
206
|
+
pip install -e ".[dev]"
|
|
207
|
+
|
|
208
|
+
# Run interactive wizard
|
|
209
|
+
python -m src.wizard
|
|
210
|
+
|
|
211
|
+
# Run headless CLI
|
|
212
|
+
python -m src.cli --help
|
|
213
|
+
|
|
214
|
+
# Run tests
|
|
215
|
+
pytest
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
## License
|
|
219
|
+
|
|
220
|
+
MIT
|
|
@@ -0,0 +1,17 @@
|
|
|
1
|
+
src/__init__.py,sha256=ZMoriJzdvRs-iW4dMtpdNd1FldEG5NG0TGaLoU2lvAc,88
|
|
2
|
+
src/cli.py,sha256=gndjZfWb_YtfE8hK8rnHKx8jjDMVd6mtHtMNbN1sfU0,13044
|
|
3
|
+
src/config.py,sha256=5WWbcaC5IOayHRm9AR8SquKonhouF4wZfbWf7_RqHbs,2266
|
|
4
|
+
src/llama_server.py,sha256=68ufUY6kVyqOW7bdnNCIKGTisHFBjeFUioj6fRyHqR4,12322
|
|
5
|
+
src/llm.py,sha256=zIg0fBE1hZDuvsqVPbKvApdgqE-WRCUN0yzPcv80Qpw,18113
|
|
6
|
+
src/main.py,sha256=0s-zbBZRWOesfd6utVsqNc_DMSxSiJBCl4qjxt-6btQ,4628
|
|
7
|
+
src/models_registry.py,sha256=VQNrapqbraw2B8PdfoOuqVDrJK2NBEAArNzyMf9PR6Q,4824
|
|
8
|
+
src/npc_chat.py,sha256=qpd71DIDyNt5rE6VpwKk632X8T4b2I1bvRKUnMajYrA,16310
|
|
9
|
+
src/steam.py,sha256=hdjSrdi3dNHVo7Ck0MgIo2IHuXDIJp3vFYofhOptNxc,16330
|
|
10
|
+
src/term_ui.py,sha256=gA73hO9MMeGDU5f44UhqvYP92i5-euY2TibIKWTE3F8,27197
|
|
11
|
+
src/tunnel.py,sha256=g9VakpW61BwL9ToEot6w5Hz07fAvVvH7YKkQZc15dS4,15661
|
|
12
|
+
src/wizard.py,sha256=RuQGHasR16k2Mj5ALiNRovHWhQ3cmESVrMuNl2iXP00,21258
|
|
13
|
+
loreguard_cli-0.3.0.dist-info/METADATA,sha256=-7DskTrhB1UrEUe4MOzoWsCsAYjMvb3o7Lti2lb8yQ8,9416
|
|
14
|
+
loreguard_cli-0.3.0.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
|
|
15
|
+
loreguard_cli-0.3.0.dist-info/entry_points.txt,sha256=LofpwUo6uKd-9uLxeJIbkZx8RX0DNFz3UPxM1Wokm5w,75
|
|
16
|
+
loreguard_cli-0.3.0.dist-info/licenses/LICENSE,sha256=r39SrDcO4q8PEQTShsd2OWVcA9tj07oiBgoWlq-_x9c,1074
|
|
17
|
+
loreguard_cli-0.3.0.dist-info/RECORD,,
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2024 Beyond Logic Labs
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
src/__init__.py
ADDED
src/cli.py
ADDED
|
@@ -0,0 +1,409 @@
|
|
|
1
|
+
#!/usr/bin/env python3
|
|
2
|
+
"""Loreguard CLI - Standalone headless mode for embedding in games.
|
|
3
|
+
|
|
4
|
+
Usage:
|
|
5
|
+
loreguard-cli --token lg_worker_xxx --model /path/to/model.gguf
|
|
6
|
+
loreguard-cli --token lg_worker_xxx --model-id qwen3-4b
|
|
7
|
+
|
|
8
|
+
Environment variables (alternative to args):
|
|
9
|
+
LOREGUARD_TOKEN Worker token
|
|
10
|
+
LOREGUARD_MODEL Path to model file
|
|
11
|
+
LOREGUARD_MODEL_ID Model ID to download (if not using custom model)
|
|
12
|
+
LOREGUARD_PORT Local llama-server port (default: 8080)
|
|
13
|
+
LOREGUARD_BACKEND Backend URL (default: wss://api.loreguard.com/workers)
|
|
14
|
+
"""
|
|
15
|
+
|
|
16
|
+
import argparse
|
|
17
|
+
import asyncio
|
|
18
|
+
import logging
|
|
19
|
+
import os
|
|
20
|
+
import signal
|
|
21
|
+
import sys
|
|
22
|
+
from datetime import datetime
|
|
23
|
+
from pathlib import Path
|
|
24
|
+
from typing import Optional
|
|
25
|
+
|
|
26
|
+
# Setup logging
|
|
27
|
+
logging.basicConfig(
|
|
28
|
+
level=logging.INFO,
|
|
29
|
+
format="%(asctime)s [%(levelname)s] %(message)s",
|
|
30
|
+
datefmt="%Y-%m-%d %H:%M:%S",
|
|
31
|
+
)
|
|
32
|
+
log = logging.getLogger("loreguard")
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
class LoreguardCLI:
    """Headless Loreguard client for embedding in games.

    Lifecycle (driven by :meth:`run`):
      1. Resolve a local ``.gguf`` model file (downloading by ID if needed).
      2. Start a local ``llama-server`` process on ``self.port``.
      3. Open a WebSocket tunnel to the Loreguard backend.
      4. Idle, serving inference requests, until a shutdown signal arrives.
    """

    def __init__(
        self,
        token: str,
        model_path: Optional[Path] = None,
        model_id: Optional[str] = None,
        port: int = 8080,
        backend_url: str = "wss://api.loreguard.com/workers",
    ):
        self.token = token
        self.model_path = model_path
        self.model_id = model_id
        self.port = port
        self.backend_url = backend_url

        self._llama = None   # LlamaServerProcess once started
        self._tunnel = None  # BackendTunnel once connected
        self._running = False
        # Set as soon as a shutdown signal arrives so Ctrl+C during startup
        # (e.g. mid-download) is honored; previously _shutdown() silently
        # ignored signals until _running became True.
        self._stop_requested = False
        # Strong references to shutdown tasks so the event loop does not
        # garbage-collect them before they run.
        self._shutdown_tasks: list = []

        # Metrics
        self._requests = 0
        self._tokens = 0
        self._start_time = None
        self._last_stats_minute = 0  # last minute mark already logged by _log_stats

    async def run(self) -> int:
        """Run the client until shutdown. Returns a process exit code (0/1)."""
        self._start_time = datetime.now()

        # Install signal handlers. add_signal_handler() raises
        # NotImplementedError on Windows event loops; fall back to the
        # interpreter's default KeyboardInterrupt handling there instead of
        # crashing at startup.
        loop = asyncio.get_running_loop()
        for sig in (signal.SIGINT, signal.SIGTERM):
            try:
                loop.add_signal_handler(sig, self._request_shutdown)
            except NotImplementedError:
                pass

        log.info("=" * 50)
        log.info("Loreguard CLI - Starting")
        log.info("=" * 50)

        try:
            # Startup pipeline; abort cleanly if a signal arrived mid-step.
            if not await self._resolve_model():
                return 1
            if self._stop_requested:
                return 0
            if not await self._start_llama_server():
                return 1
            if self._stop_requested:
                return 0
            if not await self._connect_backend():
                return 1

            self._running = not self._stop_requested
            log.info("=" * 50)
            log.info("Ready! Waiting for inference requests...")
            log.info("Press Ctrl+C to stop")
            log.info("=" * 50)

            # Main idle loop: inference requests are handled by the tunnel
            # in the background; we just tick once a second for stats and
            # to notice shutdown.
            while self._running:
                await asyncio.sleep(1)
                self._log_stats()

            return 0

        except Exception as e:
            log.error(f"Fatal error: {e}")
            return 1
        finally:
            await self._cleanup()

    def _request_shutdown(self) -> None:
        """Signal-handler callback: schedule a graceful shutdown task."""
        self._shutdown_tasks.append(asyncio.create_task(self._shutdown()))

    async def _resolve_model(self) -> bool:
        """Resolve ``self.model_path``, downloading the model by ID if needed.

        Returns True when ``self.model_path`` points at an existing file.
        """
        if self.model_path:
            if not self.model_path.exists():
                log.error(f"Model not found: {self.model_path}")
                return False
            log.info(f"Using model: {self.model_path}")
            return True

        if self.model_id:
            from .models_registry import SUPPORTED_MODELS
            from .llama_server import get_models_dir

            # Look up the registry entry matching the requested ID.
            model = next((m for m in SUPPORTED_MODELS if m.id == self.model_id), None)
            if not model:
                log.error(f"Unknown model ID: {self.model_id}")
                log.info("Available models:")
                for m in SUPPORTED_MODELS:
                    log.info(f"  - {m.id}: {m.name}")
                return False

            self.model_path = get_models_dir() / model.filename

            if self.model_path.exists():
                log.info(f"Model already downloaded: {self.model_path}")
                return True

            log.info(f"Downloading {model.name} ({model.size_gb:.1f} GB)...")
            try:
                await self._download_model(model, self.model_path)
                log.info(f"Download complete: {self.model_path}")
                return True
            except Exception as e:
                log.error(f"Download failed: {e}")
                # Remove any truncated partial file so the next run does not
                # mistake it for a complete model.
                self.model_path.unlink(missing_ok=True)
                return False

        log.error("No model specified. Use --model or --model-id")
        return False

    async def _download_model(self, model, dest: Path) -> None:
        """Stream-download *model* to *dest*, logging progress every 10%.

        Raises httpx errors on failure; the caller removes partial files.
        """
        import httpx

        dest.parent.mkdir(parents=True, exist_ok=True)

        async with httpx.AsyncClient(follow_redirects=True, timeout=None) as client:
            async with client.stream("GET", model.url) as response:
                response.raise_for_status()
                # Prefer the registry's known size; fall back to Content-Length.
                total = model.size_bytes or int(response.headers.get("content-length", 0))
                downloaded = 0
                last_log = 0

                # NOTE: file writes are blocking, but 1 MiB chunks keep each
                # write short enough not to starve the event loop.
                with open(dest, "wb") as f:
                    async for chunk in response.aiter_bytes(chunk_size=1024 * 1024):
                        f.write(chunk)
                        downloaded += len(chunk)

                        # Log progress every 10%
                        pct = int(downloaded / total * 100) if total else 0
                        if pct >= last_log + 10:
                            last_log = pct
                            log.info(f"  {pct}% ({downloaded // 1024 // 1024} MB)")

    async def _start_llama_server(self) -> bool:
        """Download llama-server if missing, start it, and wait until ready."""
        from .llama_server import (
            LlamaServerProcess,
            is_llama_server_installed,
            download_llama_server,
            DownloadProgress,
        )

        # Download the llama-server binary on first run.
        if not is_llama_server_installed():
            log.info("Downloading llama-server...")
            try:
                def on_progress(msg: str, progress: DownloadProgress | None):
                    # Coarse progress only (every ~20%) to keep logs quiet.
                    if progress and int(progress.percent) % 20 == 0:
                        log.info(f"  {int(progress.percent)}%")
                await download_llama_server(on_progress)
                log.info("llama-server downloaded")
            except Exception as e:
                log.error(f"Failed to download llama-server: {e}")
                return False

        # Launch the server and block until its HTTP endpoint responds.
        log.info(f"Starting llama-server on port {self.port}...")
        try:
            self._llama = LlamaServerProcess(self.model_path, port=self.port)
            self._llama.start()

            # Large models can take a while to load into (V)RAM.
            ready = await self._llama.wait_for_ready(timeout=120.0)
            if not ready:
                log.error("llama-server failed to start (timeout)")
                return False

            log.info("llama-server ready")
            return True

        except Exception as e:
            log.error(f"Failed to start llama-server: {e}")
            return False

    async def _connect_backend(self) -> bool:
        """Open the WebSocket tunnel to the Loreguard backend.

        In dev mode (token == "dev_mock_token") no tunnel is created and the
        local llama-server is exposed directly.
        """
        # Dev mode - skip backend connection
        if self.token == "dev_mock_token":
            log.info("DEV MODE: Skipping backend connection")
            log.info(f"llama-server running at http://127.0.0.1:{self.port}")
            log.info("You can send requests directly to the llama-server")
            return True

        from .tunnel import BackendTunnel
        from .llm import LLMProxy

        log.info(f"Connecting to {self.backend_url}...")

        try:
            # Extract worker ID from token (format: lg_worker_<id>_<secret>)
            parts = self.token.split("_")
            worker_id = parts[2] if len(parts) >= 3 else "worker"

            llm_proxy = LLMProxy(f"http://127.0.0.1:{self.port}")

            self._tunnel = BackendTunnel(
                backend_url=self.backend_url,
                llm_proxy=llm_proxy,
                worker_id=worker_id,
                worker_token=self.token,
                model_id=self.model_path.stem if self.model_path else "unknown",
            )

            self._tunnel.on_request_complete = self._on_request_complete

            # Start the connection in the background; the tunnel owns its
            # own reconnect logic.
            asyncio.create_task(self._tunnel.connect())

            # NOTE(review): this sleep only gives the handshake a head start;
            # it does not actually confirm the connection succeeded.
            await asyncio.sleep(2)
            log.info("Backend connection established")
            return True

        except Exception as e:
            log.error(f"Failed to connect to backend: {e}")
            return False

    def _on_request_complete(
        self, npc: str, tokens: int, ttft_ms: float, total_ms: float
    ) -> None:
        """Tunnel callback: accumulate metrics and log one request summary."""
        self._requests += 1
        self._tokens += tokens
        # Tokens-per-second over the whole request (guard against zero time).
        tps = (tokens / total_ms * 1000) if total_ms > 0 else 0

        log.info(
            f"Request #{self._requests}: {npc} | "
            f"{tokens} tokens | {ttft_ms:.0f}ms TTFT | "
            f"{total_ms/1000:.1f}s total | {tps:.1f} tk/s"
        )

    def _log_stats(self) -> None:
        """Log uptime/throughput stats once per elapsed minute.

        Tracks the last minute logged instead of testing ``elapsed % 60``,
        which could skip a minute entirely if the 1-second tick drifted past
        the exact :00 boundary.
        """
        if not self._start_time:
            return

        elapsed = (datetime.now() - self._start_time).total_seconds()
        mins = int(elapsed // 60)
        if mins > 0 and mins != self._last_stats_minute:
            self._last_stats_minute = mins
            log.info(
                f"Stats: {mins}m uptime | "
                f"{self._requests} requests | "
                f"{self._tokens:,} tokens"
            )

    async def _shutdown(self) -> None:
        """Graceful shutdown: flag the main loop to exit (idempotent)."""
        # Always record the request so a signal during startup is honored.
        self._stop_requested = True
        if self._running:
            log.info("Shutting down...")
            self._running = False

    async def _cleanup(self) -> None:
        """Best-effort teardown of the tunnel and the llama-server process."""
        if self._tunnel:
            try:
                await self._tunnel.disconnect()
            except Exception:
                pass  # best-effort: we are exiting anyway

        if self._llama:
            try:
                self._llama.stop()
            except Exception:
                pass  # best-effort: we are exiting anyway

        log.info("Goodbye!")
|
|
312
|
+
|
|
313
|
+
|
|
314
|
+
def main():
    """CLI entry point: parse arguments, validate them, and run the client."""
    parser = argparse.ArgumentParser(
        description="Loreguard CLI - Local inference for game NPCs",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  loreguard-cli --token lg_worker_xxx --model ./model.gguf
  loreguard-cli --token lg_worker_xxx --model-id qwen3-4b
  LOREGUARD_TOKEN=lg_worker_xxx loreguard-cli --model-id qwen3-4b

Available model IDs:
  qwen3-4b-instruct     Qwen3 4B Instruct (recommended, 2.8 GB)
  llama-3.2-3b          Llama 3.2 3B Instruct (2.0 GB)
  qwen3-8b              Qwen3 8B (5.2 GB)
  meta-llama-3-8b       Meta Llama 3 8B (4.9 GB)
""",
    )

    parser.add_argument(
        "--token",
        default=os.getenv("LOREGUARD_TOKEN", ""),
        help="Worker token (or set LOREGUARD_TOKEN env var)",
    )
    # BUG FIX: argparse applies `type=` only to strings parsed from the
    # command line, NOT to `default=`. Previously a model path supplied via
    # LOREGUARD_MODEL stayed a plain str and crashed later on
    # `self.model_path.exists()` with AttributeError.
    env_model = os.getenv("LOREGUARD_MODEL")
    parser.add_argument(
        "--model",
        type=Path,
        default=Path(env_model) if env_model else None,
        help="Path to .gguf model file",
    )
    parser.add_argument(
        "--model-id",
        default=os.getenv("LOREGUARD_MODEL_ID"),
        help="Model ID to download (e.g., qwen3-4b-instruct)",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=int(os.getenv("LOREGUARD_PORT", "8080")),
        help="Local llama-server port (default: 8080)",
    )
    parser.add_argument(
        "--backend",
        default=os.getenv("LOREGUARD_BACKEND", "wss://api.loreguard.com/workers"),
        help="Backend WebSocket URL",
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Enable debug logging",
    )
    parser.add_argument(
        "--dev",
        action="store_true",
        help="Dev mode - skip backend connection, just run llama-server",
    )

    args = parser.parse_args()

    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)

    # Dev mode substitutes a sentinel token that LoreguardCLI recognizes
    # and skips the backend connection for.
    if args.dev:
        args.token = "dev_mock_token"
        log.info("Running in DEV MODE - no backend connection")
    else:
        # Validate token presence and format before doing any work.
        if not args.token:
            log.error("Token required. Use --token or set LOREGUARD_TOKEN (or use --dev)")
            sys.exit(1)

        if not args.token.startswith("lg_worker_"):
            log.error("Invalid token format (must start with lg_worker_)")
            sys.exit(1)

    # A model source is mandatory in every mode, including --dev.
    if not args.model and not args.model_id:
        log.error("Model required. Use --model or --model-id")
        sys.exit(1)

    # Build the client and run it to completion; its return value becomes
    # the process exit code.
    cli = LoreguardCLI(
        token=args.token,
        model_path=args.model,
        model_id=args.model_id,
        port=args.port,
        backend_url=args.backend,
    )

    exit_code = asyncio.run(cli.run())
    sys.exit(exit_code)


if __name__ == "__main__":
    main()
|