npm - ltcai - Versions diffs - 0.1.29 → 0.1.31 - Mend

ltcai 0.1.29 → 0.1.31

Files changed (18)

package/README.md +54 -24
package/auto_setup.py +279 -55
package/docs/CHANGELOG.md +52 -0
package/docs/images/lattice-ai-demo.gif +0 -0
package/docs/images/screenshot-admin.png +0 -0
package/docs/images/screenshot-chat.png +0 -0
package/docs/images/screenshot-graph.png +0 -0
package/knowledge_graph.py +1338 -3
package/knowledge_graph_api.py +112 -0
package/llm_router.py +15 -9
package/local_knowledge_api.py +319 -0
package/mcp_registry.py +791 -0
package/package.json +5 -2
package/requirements.txt +2 -0
package/server.py +209 -965
package/static/graph.html +7 -2
package/static/lattice-reference.css +220 -0
package/static/scripts/graph.js +305 -4

package/README.md CHANGED

@@ -1,7 +1,9 @@
 <div align="center">
   <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/logo.svg" alt="Lattice AI" width="280"/>
   <br/>
-  <strong>Your personal AI workspace server — local & cloud, one stack.</strong>
+  <strong>One install. Your personal AI workspace.</strong>
+  <br/>
+  Local LLMs, cloud models, VS Code / Cursor, Telegram, MCP tools, files, admin controls, and a knowledge graph in one self-hosted stack.
   <br/><br/>
 [![PyPI](https://img.shields.io/pypi/v/ltcai?label=PyPI&color=blue)](https://pypi.org/project/ltcai/)
@@ -9,35 +11,61 @@
 [![npm](https://img.shields.io/npm/v/ltcai?label=npm)](https://www.npmjs.com/package/ltcai)
 [![VS Code](https://vsmarketplacebadges.dev/version-short/parktaesoo.ltcai.svg)](https://marketplace.visualstudio.com/items?itemName=parktaesoo.ltcai)
 [![Open VSX](https://img.shields.io/open-vsx/v/parktaesoo/ltcai?label=Open%20VSX)](https://open-vsx.org/extension/parktaesoo/ltcai)
+[![CI](https://github.com/TaeSooPark-PTS/LatticeAI/actions/workflows/ci.yml/badge.svg)](https://github.com/TaeSooPark-PTS/LatticeAI/actions/workflows/ci.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-green)](./LICENSE)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/)
+<br/>
+<img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/lattice-ai-demo.gif" alt="Lattice AI demo showing chat, knowledge graph, and admin dashboard" width="100%"/>
 </div>
 ---
 ## What is Lattice AI?
-**Lattice AI** is a self-hosted AI server that unifies local and cloud LLMs into one workspace — web chat, VS Code extension, Telegram bot, and MCP tools, all from a single `pip install`.
+**Lattice AI** is a self-hosted AI server that unifies local and cloud LLMs into one practical workspace. Install once, then use the same AI from the web UI, VS Code / Cursor, Telegram, MCP clients, files, and your personal knowledge graph.
 - 🖥️ **Web UI** — chat, file upload, admin dashboard, data graph
 - 🧩 **VS Code / Cursor extension** — edit, explain, generate commands inline
 - 📱 **Telegram bot** — access your AI from anywhere
 - 🔌 **MCP server** — use Lattice tools inside Claude Desktop / Cursor
 - 🔒 **Zero telemetry** — all data stays in `~/.ltcai/` on your machine
+- ⚡ **30-second start** — `pip install ltcai` or `npm install -g ltcai`
 ---
-## 📸 Screenshots
+## 📸 Product Preview
+Real screens from the local web app:
 <table>
 <tr>
-<td width="33%"><b>Chat UI</b><br/><img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-chat.png" alt="Lattice AI Chat" width="100%"/></td>
-<td width="33%"><b>Admin Dashboard</b><br/><img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-admin.png" alt="Admin Dashboard" width="100%"/></td>
-<td width="33%"><b>Data Graph (Graph RAG)</b><br/><img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-graph.png" alt="Knowledge Graph" width="100%"/></td>
+<td align="center" width="33%">
+  <b>💬 Workspace Chat</b><br/>
+  <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-chat.png" alt="Lattice AI workspace chat" width="100%"/>
+  <sub>Web chat with local LLM, file upload, pipeline status</sub>
+</td>
+<td align="center" width="33%">
+  <b>🛡️ Admin Dashboard</b><br/>
+  <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-admin.png" alt="Lattice AI admin dashboard" width="100%"/>
+  <sub>User management, audit log, security monitoring</sub>
+</td>
+<td align="center" width="33%">
+  <b>🕸️ Knowledge Graph</b><br/>
+  <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-graph.png" alt="Lattice AI knowledge graph" width="100%"/>
+  <sub>Auto-built Graph RAG from chats &amp; documents</sub>
+</td>
 </tr>
 </table>
+What this gives users after install:
+- A single local workspace for chat, files, models, runtime setup, and tool control
+- A graph view that turns chats and documents into searchable knowledge
+- Admin screens for users, model status, VPC settings, SSO, audit logs, and security monitoring
 ---
 ## ⚡ Quick Start (30 seconds)
@@ -45,16 +73,9 @@
 **Python / PyPI**
 ```bash
-# Install (cloud models)
 pip install ltcai
-# Install (+ Apple Silicon local models)
 pip install "ltcai[local]"
-# Verify environment
 LTCAI doctor
-# Start server
 LTCAI
 # → http://localhost:4825
 ```
@@ -95,27 +116,32 @@ Comparison is based on public product behavior as of 2026-05.
 | VS Code extension | ✅ | ❌ | ✅ | ✅ |
 | Telegram bot | ✅ | ❌ | ❌ | ❌ |
 | Graph RAG (auto knowledge graph) | ✅ | ❌ | ❌ | ❌ |
-| MCP registry & install | ✅ | ❌ | ✅ | ❌ |
+| MCP registry (browse & one-click install) | ✅ | ⚠️* | ✅ | ❌ |
 | Admin dashboard + audit log | ✅ | ✅ | ❌ | ❌ |
 | Self-hosted, zero telemetry | ✅ | ✅ | ✅ | ❌ |
 | One-command public tunnel | ✅ | ❌ | ❌ | ❌ |
 | Free | ✅ | ✅ | ✅ | ❌ |
+> ⚠️ *Open WebUI supports MCP via manual URL configuration — no registry browsing or one-click install.
 ---
 ## 🧠 Supported Models
-**Local — Apple Silicon only (MLX):**
+**Local — Apple Silicon MLX + cross-platform local servers:**
 | Model | Best for | Size |
 |-------|----------|------|
-| `mlx-community/gemma-4-26b-a4b-it-4bit` | General / coding | ~14 GB |
-| `mlx-community/Qwen2.5-Coder-32B-Instruct-4bit` | Coding | ~18 GB |
-| `mlx-community/DeepSeek-R1-0528-4bit` | Reasoning | ~38 GB |
-| `mlx-community/Phi-4-4bit` | Coding (fast) | ~8 GB |
+| `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Multimodal / low spec | ~2.7 GB |
+| `mlx-community/Qwen3-VL-8B-Instruct-4bit` | Multimodal / balanced | ~4.8 GB |
+| `mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit` | Multimodal / large | ~18 GB |
+| `mlx-community/Llama-3.1-8B-Instruct-4bit` | General | ~4.7 GB |
+| `mlx-community/Mistral-7B-Instruct-v0.3-4bit` | General / Apache | ~4.1 GB |
+| `mlx-community/Phi-4-mini-instruct-4bit` | Coding (fast) | ~2.2 GB |
+| `mlx-community/gemma-4-26b-a4b-it-4bit` | Multimodal / large | ~15.6 GB |
 **Cloud (any platform):**
-OpenAI · Groq · Together · OpenRouter · any OpenAI-compatible endpoint
+OpenAI GPT-5.5 · OpenRouter Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 · Groq · Together · any OpenAI-compatible endpoint
 ---
@@ -298,7 +324,7 @@ Or: `./start_ai.sh` (auto-restart + caffeinate)
 | VS Code Marketplace | [marketplace.visualstudio.com](https://marketplace.visualstudio.com/items?itemName=parktaesoo.ltcai) |
 | Open VSX | [open-vsx.org](https://open-vsx.org/extension/parktaesoo/ltcai) |
-Current version: **0.1.29** — [Changelog](docs/CHANGELOG.md)
+Current version: **0.1.31** — [Changelog](docs/CHANGELOG.md)
 ---
@@ -345,9 +371,13 @@ LTCAI --tunnel                       # + automatically issue a Cloudflare public URL
 | Model | Use | Size |
 |------|------|------|
-| `mlx-community/gemma-4-26b-a4b-it-4bit` | General | ~14GB |
-| `mlx-community/Qwen2.5-Coder-32B-Instruct-4bit` | Coding | ~18GB |
-| `mlx-community/DeepSeek-R1-0528-4bit` | Reasoning | ~38GB |
+| `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Multimodal / low spec | ~2.7GB |
+| `mlx-community/Qwen3-VL-8B-Instruct-4bit` | Multimodal / balanced | ~4.8GB |
+| `mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit` | Multimodal / large | ~18GB |
+| `mlx-community/Llama-3.1-8B-Instruct-4bit` | General | ~4.7GB |
+| `mlx-community/Mistral-7B-Instruct-v0.3-4bit` | General / Apache | ~4.1GB |
+| `mlx-community/Phi-4-mini-instruct-4bit` | Coding | ~2.2GB |
+| `mlx-community/gemma-4-26b-a4b-it-4bit` | Multimodal / large | ~15.6GB |
 Details: [docs/CHANGELOG.md](docs/CHANGELOG.md) · [Security](SECURITY.md) · [Contributing](CONTRIBUTING.md)

package/auto_setup.py CHANGED

@@ -38,6 +38,7 @@ import argparse
 import json
 import os
 import platform
+import re
 import shutil
 import subprocess
 import sys
@@ -68,12 +69,19 @@ class SystemProfile:
     arch: str = ""                   # x86_64 | arm64 | …
     cpu_model: str = ""
     cpu_cores: int = 0
+    cpu_logical_cores: int = 0
+    cpu_instructions: List[str] = field(default_factory=list)
     ram_mb: int = 0
     disk_free_mb: int = 0
     gpu: GPUInfo = field(default_factory=GPUInfo)
     package_manager: Optional[str] = None   # winget | brew | apt | dnf | pacman
     has_internet: bool = True
     python_version: str = ""
+    is_wsl: bool = False
+    wsl_version: str = ""
+    cuda_available: bool = False
+    cuda_version: str = ""
+    tools: Dict[str, str] = field(default_factory=dict)
     def score(self) -> int:
         """LLM suitability score (0..100). Input to RECOMMEND."""
@@ -105,13 +113,84 @@ def _run(cmd: List[str], timeout: float = 4.0) -> str:
         return ""
+def _windows_candidate_paths(binary: str) -> List[str]:
+    local_appdata = os.environ.get("LOCALAPPDATA", "")
+    program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
+    program_files_x86 = os.environ.get("ProgramFiles(x86)", r"C:\Program Files (x86)")
+    candidates = {
+        "ollama": [
+            str(Path(local_appdata) / "Programs" / "Ollama" / "ollama.exe") if local_appdata else "",
+            str(Path(program_files) / "Ollama" / "ollama.exe"),
+        ],
+        "lms": [
+            str(Path(local_appdata) / "Programs" / "LM Studio" / "resources" / "app" / ".webpack" / "lms.exe") if local_appdata else "",
+            str(Path(program_files) / "LM Studio" / "resources" / "app" / ".webpack" / "lms.exe"),
+        ],
+        "nvidia-smi": [
+            str(Path(program_files) / "NVIDIA Corporation" / "NVSMI" / "nvidia-smi.exe"),
+            str(Path(program_files_x86) / "NVIDIA Corporation" / "NVSMI" / "nvidia-smi.exe"),
+        ],
+    }
+    return [item for item in candidates.get(binary, []) if item]
+def _which(binary: str) -> Optional[str]:
+    found = shutil.which(binary)
+    if found:
+        return found
+    if platform.system() == "Windows":
+        for candidate in _windows_candidate_paths(binary):
+            if Path(candidate).exists():
+                return candidate
+    return None
+def _parse_windows_video_controllers(raw: str) -> List[Dict[str, Any]]:
+    controllers: List[Dict[str, Any]] = []
+    if not raw:
+        return controllers
+    try:
+        data = json.loads(raw)
+        if isinstance(data, dict):
+            data = [data]
+        if isinstance(data, list):
+            for item in data:
+                name = str(item.get("Name") or "").strip()
+                if not name:
+                    continue
+                try:
+                    ram_mb = int(item.get("AdapterRAM") or 0) // (1024 * 1024)
+                except Exception:
+                    ram_mb = 0
+                controllers.append({"name": name, "vram_mb": ram_mb})
+        if controllers:
+            return controllers
+    except Exception:
+        pass
+    current: Dict[str, Any] = {}
+    for line in raw.splitlines():
+        if line.startswith("Name="):
+            if current:
+                controllers.append(current)
+            current = {"name": line.split("=", 1)[-1].strip(), "vram_mb": 0}
+        elif line.startswith("AdapterRAM=") and current:
+            try:
+                current["vram_mb"] = int(line.split("=", 1)[-1].strip()) // (1024 * 1024)
+            except ValueError:
+                current["vram_mb"] = 0
+    if current:
+        controllers.append(current)
+    return controllers
 def _detect_gpu(prof_os: str, arch: str) -> GPUInfo:
     """Detect the GPU with per-OS heuristics, as far as possible without external libraries."""
     gpu = GPUInfo()
     # NVIDIA
-    if shutil.which("nvidia-smi"):
-        info = _run(["nvidia-smi", "--query-gpu=name,memory.total",
+    nvidia_smi = _which("nvidia-smi")
+    if nvidia_smi:
+        info = _run([nvidia_smi, "--query-gpu=name,memory.total",
                      "--format=csv,noheader,nounits"])
         if info.strip():
             first = info.strip().splitlines()[0]
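Reviewer sketch: the new `_which` helper above tries `PATH` first and then a short list of well-known install locations. The standalone version below shows the same lookup order under simplified assumptions; `which_with_fallback` and its `extra_candidates` parameter are hypothetical names, not part of the package.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def which_with_fallback(binary: str, extra_candidates: Iterable[str]) -> Optional[str]:
    """Resolve a binary via PATH, then via explicit candidate paths."""
    found = shutil.which(binary)
    if found:
        return found
    for candidate in extra_candidates:
        # Skip empty strings (e.g. when an environment variable was unset).
        if candidate and Path(candidate).is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Illustrative call; the candidate path is only an example location.
    print(which_with_fallback("ollama", [r"C:\Program Files\Ollama\ollama.exe"]))
```

Returning the resolved path rather than a boolean lets callers run the exact binary that was found, which is what the diff does when it passes `nvidia_smi` to `_run`.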
@@ -139,30 +218,29 @@ def _detect_gpu(prof_os: str, arch: str) -> GPUInfo:
     # Windows
     if prof_os == "windows" and gpu.vendor == "unknown":
-        info = _run(["wmic", "path", "win32_VideoController", "get",
-                     "Name,AdapterRAM", "/format:list"])
-        if info:
-            name = ""
-            ram = 0
-            for line in info.splitlines():
-                if line.startswith("Name="):
-                    name = line.split("=", 1)[-1].strip()
-                elif line.startswith("AdapterRAM="):
-                    try:
-                        ram = int(line.split("=", 1)[-1].strip()) // (1024 * 1024)
-                    except ValueError:
-                        ram = 0
-            if name:
-                gpu.model = name
-                low = name.lower()
-                if "nvidia" in low or "rtx" in low or "geforce" in low:
-                    gpu.vendor = "nvidia"; gpu.sdk.append("cuda")
-                elif "amd" in low or "radeon" in low:
-                    gpu.vendor = "amd"; gpu.sdk.extend(["directml", "vulkan"])
-                elif "intel" in low:
-                    gpu.vendor = "intel"; gpu.sdk.extend(["directml", "vulkan"])
-                if ram > 0:
-                    gpu.vram_mb = ram
+        shell = _which("powershell") or _which("pwsh")
+        info = ""
+        if shell:
+            info = _run([
+                shell, "-NoProfile", "-Command",
+                "Get-CimInstance Win32_VideoController | Select-Object Name,AdapterRAM | ConvertTo-Json -Compress",
+            ], timeout=8.0)
+        if not info:
+            info = _run(["wmic", "path", "win32_VideoController", "get",
+                         "Name,AdapterRAM", "/format:list"])
+        controllers = _parse_windows_video_controllers(info)
+        if controllers:
+            primary = max(controllers, key=lambda item: int(item.get("vram_mb") or 0))
+            name = str(primary.get("name") or "")
+            gpu.model = name
+            gpu.vram_mb = int(primary.get("vram_mb") or 0)
+            low = name.lower()
+            if "nvidia" in low or "rtx" in low or "geforce" in low:
+                gpu.vendor = "nvidia"; gpu.sdk.append("cuda")
+            elif "amd" in low or "radeon" in low:
+                gpu.vendor = "amd"; gpu.sdk.extend(["directml", "vulkan"])
+            elif "intel" in low or "arc" in low or "iris" in low:
+                gpu.vendor = "intel"; gpu.sdk.extend(["directml", "vulkan"])
     # Linux (lspci)
     if prof_os == "linux" and gpu.vendor == "unknown":
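Reviewer sketch: the Windows branch above now feeds either `ConvertTo-Json` output or `wmic /format:list` output into one parser and keeps the controller with the most VRAM. The example below is a simplified stand-in for that flow, not the package's function. `parse_video_controllers` and the sample strings are hypothetical, and the list-format fallback flushes a record whenever a key repeats so that it does not rely on a particular key order (an assumption about `wmic` output that I have not verified).

```python
import json
from typing import Any, Dict, List


def _to_mb(value: Any) -> int:
    """Convert a byte count to whole MiB, treating bad input as 0."""
    try:
        return int(value or 0) // (1024 * 1024)
    except (TypeError, ValueError):
        return 0


def parse_video_controllers(raw: str) -> List[Dict[str, Any]]:
    if not raw or not raw.strip():
        return []
    try:
        data = json.loads(raw)
    except ValueError:
        data = None
    if isinstance(data, dict):  # a single GPU is serialized as a bare object
        data = [data]
    if isinstance(data, list):
        found = [
            {"name": str(item.get("Name") or "").strip(),
             "vram_mb": _to_mb(item.get("AdapterRAM"))}
            for item in data
            if isinstance(item, dict) and str(item.get("Name") or "").strip()
        ]
        if found:
            return found
    # Fallback: key=value lines; start a new record when a key repeats.
    records: List[Dict[str, str]] = []
    record: Dict[str, str] = {}
    for line in raw.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep or key not in ("Name", "AdapterRAM"):
            continue
        if key in record:
            records.append(record)
            record = {}
        record[key] = value.strip()
    if record:
        records.append(record)
    return [{"name": r["Name"], "vram_mb": _to_mb(r.get("AdapterRAM"))}
            for r in records if r.get("Name")]


if __name__ == "__main__":
    sample = ('[{"Name":"Intel(R) UHD Graphics","AdapterRAM":1073741824},'
              '{"Name":"NVIDIA GeForce RTX 3060","AdapterRAM":4293918720}]')
    gpus = parse_video_controllers(sample)
    print(max(gpus, key=lambda g: g["vram_mb"]))
```

`AdapterRAM` is a 32-bit field in `Win32_VideoController`, so cards with more than about 4 GB may be under-reported on this path. The `nvidia-smi` query earlier in `_detect_gpu` does not go through this field.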
@@ -179,16 +257,96 @@ def _detect_gpu(prof_os: str, arch: str) -> GPUInfo:
 def _detect_package_manager(prof_os: str) -> Optional[str]:
     if prof_os == "windows":
-        return "winget" if shutil.which("winget") else None
+        return "winget" if _which("winget") else None
     if prof_os == "darwin":
-        return "brew" if shutil.which("brew") else None
+        return "brew" if _which("brew") else None
     if prof_os == "linux":
         for pm in ("apt", "dnf", "pacman", "zypper", "apk"):
-            if shutil.which(pm):
+            if _which(pm):
                 return pm
     return None
+def _detect_tools() -> Dict[str, str]:
+    tools: Dict[str, str] = {}
+    for binary in ("ollama", "lms", "nvidia-smi", "nvcc", "winget", "brew", "apt", "git", "node", "python", "python3"):
+        found = _which(binary)
+        if found:
+            tools[binary] = found
+    return tools
+def _detect_wsl(prof_os: str) -> Tuple[bool, str]:
+    if prof_os != "linux":
+        return False, ""
+    raw = _read_text("/proc/version")
+    is_wsl = "microsoft" in raw.lower() or "wsl" in raw.lower()
+    version = "2" if "microsoft-standard" in raw.lower() or "wsl2" in raw.lower() else ("1" if is_wsl else "")
+    return is_wsl, version
+def _detect_cuda() -> Tuple[bool, str]:
+    nvidia_smi = _which("nvidia-smi")
+    nvcc = _which("nvcc")
+    version = ""
+    if nvidia_smi:
+        raw = _run([nvidia_smi, "--query-gpu=driver_version", "--format=csv,noheader"], timeout=4.0)
+        version = raw.splitlines()[0].strip() if raw.splitlines() else ""
+    if nvcc:
+        raw = _run([nvcc, "--version"], timeout=4.0)
+        m = re.search(r"release\s+([\d.]+)", raw)
+        if m:
+            version = m.group(1)
+    return bool(nvidia_smi or nvcc), version
+def _detect_cpu_details(prof_os: str) -> Tuple[str, int, int, List[str]]:
+    model = platform.processor() or ""
+    physical = os.cpu_count() or 0
+    logical = os.cpu_count() or 0
+    flags: List[str] = []
+    if prof_os == "darwin":
+        model = _run(["sysctl", "-n", "machdep.cpu.brand_string"]).strip() or model
+        try:
+            physical = int((_run(["sysctl", "-n", "hw.physicalcpu"]).strip() or physical))
+            logical = int((_run(["sysctl", "-n", "hw.logicalcpu"]).strip() or logical))
+        except ValueError:
+            pass
+        flags = [item.lower() for item in _run(["sysctl", "-n", "machdep.cpu.features"]).split()]
+    elif prof_os == "linux":
+        text = _read_text("/proc/cpuinfo")
+        for line in text.splitlines():
+            if line.lower().startswith("model name") and not model:
+                model = line.split(":", 1)[-1].strip()
+            if line.lower().startswith(("flags", "features")) and not flags:
+                flags = line.split(":", 1)[-1].strip().lower().split()
+    elif prof_os == "windows":
+        raw = _run(["wmic", "cpu", "get", "Name,NumberOfCores,NumberOfLogicalProcessors", "/format:list"])
+        for line in raw.splitlines():
+            key, _, value = line.partition("=")
+            if key == "Name" and value.strip():
+                model = value.strip()
+            elif key == "NumberOfCores" and value.strip():
+                try:
+                    physical = int(value.strip())
+                except ValueError:
+                    pass
+            elif key == "NumberOfLogicalProcessors" and value.strip():
+                try:
+                    logical = int(value.strip())
+                except ValueError:
+                    pass
+        try:
+            import ctypes
+            kernel32 = ctypes.windll.kernel32
+            feature_map = {6: "sse", 10: "sse2", 13: "sse3", 19: "neon", 28: "rdrand"}
+            flags.extend(name for code, name in feature_map.items() if kernel32.IsProcessorFeaturePresent(code))
+        except Exception:
+            pass
+    interesting = {"avx", "avx2", "avx512f", "fma", "neon", "sse4_2", "sse", "sse2", "sse3", "rdrand"}
+    return model, physical, logical, sorted({flag for flag in flags if flag in interesting})
 def _has_module(name: str) -> bool:
     try:
         __import__(name)
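Reviewer sketch: `_detect_wsl` and `_detect_cuda` above are string checks on `/proc/version` and on `nvcc --version` output. The pure functions below reproduce those checks so they can be exercised without the real files or binaries. The function names and the sample strings are illustrative assumptions.

```python
import re
from typing import Tuple


def detect_wsl(proc_version: str) -> Tuple[bool, str]:
    """Classify a /proc/version string as (is_wsl, wsl_version)."""
    low = proc_version.lower()
    is_wsl = "microsoft" in low or "wsl" in low
    if "microsoft-standard" in low or "wsl2" in low:
        return is_wsl, "2"
    return is_wsl, ("1" if is_wsl else "")


def parse_nvcc_release(output: str) -> str:
    """Extract the toolkit release (e.g. '12.4') from `nvcc --version` text."""
    match = re.search(r"release\s+([\d.]+)", output)
    return match.group(1) if match else ""


if __name__ == "__main__":
    print(detect_wsl("Linux version 5.15.153.1-microsoft-standard-WSL2"))
    print(parse_nvcc_release("Cuda compilation tools, release 12.4, V12.4.131"))
```

In the diff, `cuda_version` holds the `nvidia-smi` driver version when `nvcc` is absent and the toolkit release when it is present, so consumers of that field should not assume it is always a CUDA release number.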
@@ -204,9 +362,15 @@ def probe() -> SystemProfile:
                "Linux": "linux"}.get(platform.system(), platform.system().lower())
     prof.os_version = platform.release()
     prof.arch = platform.machine().lower()
-    prof.cpu_model = platform.processor() or ""
-    prof.cpu_cores = os.cpu_count() or 0
+    cpu_model, cpu_cores, cpu_logical_cores, cpu_instructions = _detect_cpu_details(prof.os)
+    prof.cpu_model = cpu_model
+    prof.cpu_cores = cpu_cores
+    prof.cpu_logical_cores = cpu_logical_cores
+    prof.cpu_instructions = cpu_instructions
     prof.python_version = platform.python_version()
+    prof.is_wsl, prof.wsl_version = _detect_wsl(prof.os)
+    prof.cuda_available, prof.cuda_version = _detect_cuda()
+    prof.tools = _detect_tools()
     # RAM
     try:
@@ -218,7 +382,27 @@ def probe() -> SystemProfile:
         elif prof.os == "darwin":
             out = _run(["sysctl", "-n", "hw.memsize"])
             if out.strip():
-                prof.ram_mb = int(out.strip()) // (1024 * 1024)
+                try:
+                    prof.ram_mb = int(out.strip()) // (1024 * 1024)
+                except ValueError:
+                    prof.ram_mb = 0
+            if not prof.ram_mb:
+                profiler = _run(["system_profiler", "SPHardwareDataType"], timeout=8.0)
+                m = re.search(r"Memory:\s+([\d.]+)\s*(TB|GB|MB)", profiler, re.IGNORECASE)
+                if m:
+                    value = float(m.group(1))
+                    unit = m.group(2).lower()
+                    if unit == "tb":
+                        prof.ram_mb = int(value * 1024 * 1024)
+                    elif unit == "gb":
+                        prof.ram_mb = int(value * 1024)
+                    else:
+                        prof.ram_mb = int(value)
+                if not prof.ram_mb:
+                    hostinfo = _run(["hostinfo"])
+                    m = re.search(r"Primary memory available:\s+([\d.]+)\s+gigabytes", hostinfo, re.IGNORECASE)
+                    if m:
+                        prof.ram_mb = int(float(m.group(1)) * 1024)
         elif prof.os == "windows":
             out = _run(["wmic", "ComputerSystem", "get", "TotalPhysicalMemory",
                         "/format:list"])
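Reviewer sketch: the macOS fallback above converts a `Memory: <number> <unit>` line from `system_profiler` into megabytes. A table-driven version of the same conversion is shown below. `parse_memory_mb` is a hypothetical helper and the sample inputs are made up for illustration.

```python
import re

# Multipliers from the matched unit to MiB.
_UNIT_TO_MB = {"tb": 1024 * 1024, "gb": 1024, "mb": 1}


def parse_memory_mb(text: str) -> int:
    """Return RAM in MiB from a 'Memory: 16 GB' style line, or 0 if absent."""
    match = re.search(r"Memory:\s+([\d.]+)\s*(TB|GB|MB)", text, re.IGNORECASE)
    if not match:
        return 0
    try:
        value = float(match.group(1))
    except ValueError:  # e.g. a stray "1.2.3" that the character class allows
        return 0
    return int(value * _UNIT_TO_MB[match.group(2).lower()])


if __name__ == "__main__":
    print(parse_memory_mb("      Memory: 16 GB"))
```

Keeping the unit multipliers in one mapping avoids the three-way `if`/`elif` and makes the unsupported-unit case impossible, since the regex only admits the three keys.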
@@ -258,16 +442,23 @@ class Recommendation:
 # Model catalog. Kept in sync with the "Recommended model" column of PPT slide 16.
 _MODEL_CATALOG: List[Dict[str, Any]] = [
     # (min_ram_mb, min_vram_mb, model_id, quant, runtime_preference)
-    {"ram": 24 * 1024, "vram": 16 * 1024,
-     "id": "google/gemma-3-12b-it", "q": "q5_K_M"},
+    # Conservative RAM thresholds that allow for OS overhead (~4-6 GB) and KV-cache headroom
+    {"ram": 64 * 1024, "vram": 32 * 1024,
+     "id": "Qwen/Qwen3-VL-30B-A3B-Instruct", "q": "q4_K_M", "multimodal": True},
+    {"ram": 48 * 1024, "vram": 24 * 1024,
+     "id": "Qwen/Qwen3-VL-30B-A3B-Instruct", "q": "q4_K_M", "multimodal": True},
+    {"ram": 32 * 1024, "vram": 16 * 1024,
+     "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q5_K_M", "multimodal": True},
+    {"ram": 24 * 1024, "vram": 12 * 1024,
+     "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M", "multimodal": True},
     {"ram": 16 * 1024, "vram": 8 * 1024,
-     "id": "Qwen/Qwen2.5-7B-Instruct", "q": "q4_K_M"},
+     "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M", "multimodal": True},
     {"ram": 12 * 1024, "vram": 6 * 1024,
-     "id": "google/gemma-3-4b-it", "q": "q4_K_M"},
+     "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M", "multimodal": True},
     {"ram":  8 * 1024, "vram": 4 * 1024,
-     "id": "microsoft/Phi-3.5-mini-instruct", "q": "q4_K_M"},
+     "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M", "multimodal": True},
     {"ram":  4 * 1024, "vram": 0,
-     "id": "google/gemma-3-2b-it", "q": "q4_K_M"},
+     "id": "google/gemma-3-1b-it", "q": "q4_K_M", "multimodal": False},
 ]
@@ -280,34 +471,41 @@ def recommend(profile: SystemProfile) -> Recommendation:
         backend = "metal+mlx"
         runtime = "mlx" if _has_module("mlx") else "llama.cpp"
         rationale.append("Apple Silicon → Metal + MLX")
-    elif profile.gpu.vendor == "nvidia" and profile.gpu.vram_mb >= 6000:
+    elif profile.gpu.vendor == "nvidia" and profile.cuda_available and (profile.os == "linux" or profile.is_wsl):
         backend = "cuda"
-        runtime = "llama.cpp"
-        rationale.append(f"NVIDIA GPU {profile.gpu.vram_mb} MB VRAM → CUDA + llama.cpp")
+        runtime = "vllm" if profile.gpu.vram_mb >= 12 * 1024 else "llama.cpp"
+        rationale.append(f"NVIDIA GPU {profile.gpu.vram_mb} MB VRAM + CUDA → {runtime}")
+    elif profile.gpu.vendor == "nvidia":
+        backend = "cuda" if profile.cuda_available else "vulkan"
+        runtime = "lmstudio" if profile.tools.get("lms") else ("ollama" if profile.tools.get("ollama") else "llama.cpp")
+        rationale.append("Windows NVIDIA: prefer LM Studio/Ollama; vLLM is recommended on WSL/Linux")
     elif profile.os == "windows" and profile.gpu.vendor in ("amd", "intel"):
-        backend = "directml"
-        runtime = "llama.cpp"
-        rationale.append("Windows + AMD/Intel GPU → DirectML")
+        backend = "directml/vulkan"
+        runtime = "lmstudio" if profile.tools.get("lms") else ("ollama" if profile.tools.get("ollama") else "llama.cpp")
+        rationale.append("Windows + AMD/Intel GPU → DirectML/Vulkan")
     elif profile.os == "linux" and profile.gpu.vendor == "amd":
         backend = "rocm" if "rocm" in profile.gpu.sdk else "vulkan"
-        runtime = "llama.cpp"
+        runtime = "ollama" if profile.tools.get("ollama") else "llama.cpp"
         rationale.append("Linux + AMD GPU → ROCm/Vulkan")
     else:
         backend = "cpu"
-        runtime = "llama.cpp"
-        rationale.append("No GPU acceleration available or none detected → CPU inference")
+        runtime = "ollama" if profile.tools.get("ollama") else "llama.cpp"
+        instruction_hint = ", ".join(profile.cpu_instructions) or "no instructions detected"
+        rationale.append(f"No GPU acceleration available or none detected → CPU inference ({profile.cpu_logical_cores or profile.cpu_cores} threads, {instruction_hint})")
     # model size by RAM/VRAM
     pick = _MODEL_CATALOG[-1]   # default to the smallest model
     for entry in _MODEL_CATALOG:
         if profile.ram_mb >= entry["ram"] and (
-            backend == "cpu" or profile.gpu.vram_mb >= entry["vram"]
+            backend in {"cpu", "metal+mlx"} or profile.gpu.vram_mb >= entry["vram"]
         ):
             pick = entry
             break
     rationale.append(
         f"RAM {profile.ram_mb} MB · VRAM {profile.gpu.vram_mb} MB → {pick['id']}"
     )
+    if pick.get("multimodal"):
+        rationale.append("Prefer the latest multimodal model")
     # Quantization: enough VRAM → upgrade to a more precise quantization
     quant = pick["q"]
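Reviewer sketch: the selection loop above walks the catalog from largest to smallest and takes the first row whose RAM threshold is met, ignoring the VRAM threshold for the `cpu` and `metal+mlx` backends. The example below copies the thresholds and model ids from the new `_MODEL_CATALOG` in this diff. `pick_model` is a hypothetical standalone function, not the package's `recommend`.

```python
from typing import Any, Dict, List

GB = 1024  # thresholds are expressed in MiB

CATALOG: List[Dict[str, Any]] = [
    {"ram": 64 * GB, "vram": 32 * GB, "id": "Qwen/Qwen3-VL-30B-A3B-Instruct", "q": "q4_K_M"},
    {"ram": 48 * GB, "vram": 24 * GB, "id": "Qwen/Qwen3-VL-30B-A3B-Instruct", "q": "q4_K_M"},
    {"ram": 32 * GB, "vram": 16 * GB, "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q5_K_M"},
    {"ram": 24 * GB, "vram": 12 * GB, "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M"},
    {"ram": 16 * GB, "vram": 8 * GB, "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M"},
    {"ram": 12 * GB, "vram": 6 * GB, "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M"},
    {"ram": 8 * GB, "vram": 4 * GB, "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M"},
    {"ram": 4 * GB, "vram": 0, "id": "google/gemma-3-1b-it", "q": "q4_K_M"},
]

# Backends where system RAM is the only constraint checked.
RAM_ONLY_BACKENDS = {"cpu", "metal+mlx"}


def pick_model(ram_mb: int, vram_mb: int, backend: str) -> Dict[str, Any]:
    """Return the first (largest) catalog row the machine satisfies."""
    for entry in CATALOG:
        ram_ok = ram_mb >= entry["ram"]
        vram_ok = backend in RAM_ONLY_BACKENDS or vram_mb >= entry["vram"]
        if ram_ok and vram_ok:
            return entry
    return CATALOG[-1]  # smallest model as the default


if __name__ == "__main__":
    print(pick_model(32 * GB, 0, "metal+mlx")["id"])
    print(pick_model(32 * GB, 8 * GB, "cuda")["id"])
```

Adding `metal+mlx` to the RAM-only set matters because the previous condition compared `gpu.vram_mb` on Apple Silicon as well. If unified-memory machines report no dedicated VRAM, that comparison would have failed every row except the last one, which has a VRAM threshold of 0.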
@@ -402,7 +600,7 @@ def plan(profile: SystemProfile, rec: Recommendation) -> InstallPlan:
     if sys.version_info < (3, 11):
         need("python3.11+", "The Lattice AI server requires Python 3.11 or later.")
-    if not shutil.which("node"):
+    if not _which("node"):
         need("node20", "Needed for the VS Code extension / npm CLI bootstrap")
     # Per-runtime additions
@@ -411,17 +609,39 @@ def plan(profile: SystemProfile, rec: Recommendation) -> InstallPlan:
             name="mlx-lm", why="Apple Silicon LLM inference",
             command=["pip3", "install", "--upgrade", "mlx-lm"],
         ))
-    if rec.runtime == "llama.cpp" and not shutil.which("ollama"):
+    if rec.runtime in {"llama.cpp", "ollama"} and not _which("ollama"):
         need("ollama", "Easiest path for fetching llama.cpp weights")
+    if rec.runtime == "lmstudio" and not _which("lms"):
+        notes.append("The LM Studio CLI (lms) was not found. Install it from https://lmstudio.ai/download to get model downloads on Windows/macOS/Linux and automatic GPU backend detection.")
+    if rec.runtime == "vllm" and not _has_module("vllm"):
+        steps.append(InstallStep(
+            name="vllm", why="Server-style inference on NVIDIA CUDA/WSL/Linux",
+            command=["pip3", "install", "--upgrade", "vllm", "huggingface_hub"],
+        ))
+    if profile.gpu.vendor == "nvidia" and not profile.cuda_available:
+        notes.append("An NVIDIA GPU was detected, but CUDA/nvidia-smi was not found. On Windows, install the NVIDIA driver and CUDA Toolkit, then re-run the check.")
+    if profile.os == "windows" and profile.gpu.vendor == "nvidia" and not profile.is_wsl:
+        notes.append("vLLM is more stable on WSL2/Linux than on native Windows. On a Windows desktop, try the LM Studio or Ollama GPU path first.")
-    if not shutil.which("huggingface-cli"):
+    if not _which("huggingface-cli"):
         need("huggingface-cli", "For downloading the recommended model weights")
     # Pull model weights
+    model_command = ["huggingface-cli", "download", rec.model_id, "--quiet"]
+    if rec.runtime == "ollama":
+        lower = rec.model_id.lower()
+        if "qwen3-vl-8b" in lower:
+            model_command = ["ollama", "pull", "qwen3-vl:8b"]
+        elif "qwen3-vl-4b" in lower:
+            model_command = ["ollama", "pull", "qwen3-vl:4b"]
+        elif "gemma-3-1b" in lower:
+            model_command = ["ollama", "pull", "gemma3:1b"]
+    elif rec.runtime == "lmstudio":
+        model_command = ["lms", "get", rec.model_id]
     steps.append(InstallStep(
         name=f"weights:{rec.model_id}",
         why="Model weights used for inference",
-        command=["huggingface-cli", "download", rec.model_id, "--quiet"],
+        command=model_command,
     ))
     return InstallPlan(package_manager=pm, steps=steps, notes=notes)
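Reviewer sketch: the download step above now depends on the runtime. Ollama gets a short tag for three known model families, LM Studio gets `lms get`, and everything else keeps the `huggingface-cli download` command. The function below restates that mapping as a lookup table. `model_command` and `OLLAMA_TAGS` are hypothetical names, and the tags are copied from the diff rather than checked against the Ollama library.

```python
from typing import List, Tuple

# (substring of the lower-cased model id, Ollama tag), as listed in the diff.
OLLAMA_TAGS: List[Tuple[str, str]] = [
    ("qwen3-vl-8b", "qwen3-vl:8b"),
    ("qwen3-vl-4b", "qwen3-vl:4b"),
    ("gemma-3-1b", "gemma3:1b"),
]


def model_command(runtime: str, model_id: str) -> List[str]:
    """Build the argv used to fetch weights for the chosen runtime."""
    default = ["huggingface-cli", "download", model_id, "--quiet"]
    if runtime == "ollama":
        lower = model_id.lower()
        for needle, tag in OLLAMA_TAGS:
            if needle in lower:
                return ["ollama", "pull", tag]
        return default  # no known tag: fall back to the Hugging Face download
    if runtime == "lmstudio":
        return ["lms", "get", model_id]
    return default


if __name__ == "__main__":
    print(model_command("ollama", "Qwen/Qwen3-VL-8B-Instruct"))
    print(model_command("ollama", "Qwen/Qwen3-VL-30B-A3B-Instruct"))
```

With this mapping, an Ollama runtime paired with the 30B catalog entry still falls through to `huggingface-cli`, so the install plan mixes two download paths for that combination.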
@@ -463,9 +683,13 @@ def verify(profile: SystemProfile, rec: Recommendation) -> Dict[str, Any]:
     if rec.runtime == "mlx":
         add("mlx_lm import", _has_module("mlx_lm"), "Apple Silicon runtime")
-    if rec.runtime == "llama.cpp":
-        add("ollama binary", shutil.which("ollama") is not None,
-            shutil.which("ollama") or "not found")
+    if rec.runtime in {"llama.cpp", "ollama"}:
+        add("ollama binary", _which("ollama") is not None,
+            _which("ollama") or "not found")
+    if rec.runtime == "lmstudio":
+        add("LM Studio CLI", _which("lms") is not None, _which("lms") or "not found")
+    if rec.backend == "cuda":
+        add("CUDA/nvidia-smi", profile.cuda_available, profile.cuda_version or "not found")
     # Brief CPU/memory measurement
     t0 = time.perf_counter()