ltcai 0.1.29 → 0.1.31

package/README.md CHANGED
@@ -1,7 +1,9 @@
1
1
  <div align="center">
2
2
  <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/logo.svg" alt="Lattice AI" width="280"/>
3
3
  <br/>
4
- <strong>Your personal AI workspace server — local & cloud, one stack.</strong>
4
+ <strong>One install. Your personal AI workspace.</strong>
5
+ <br/>
6
+ Local LLMs, cloud models, VS Code / Cursor, Telegram, MCP tools, files, admin controls, and a knowledge graph in one self-hosted stack.
5
7
  <br/><br/>
6
8
 
7
9
  [![PyPI](https://img.shields.io/pypi/v/ltcai?label=PyPI&color=blue)](https://pypi.org/project/ltcai/)
@@ -9,35 +11,61 @@
9
11
  [![npm](https://img.shields.io/npm/v/ltcai?label=npm)](https://www.npmjs.com/package/ltcai)
10
12
  [![VS Code](https://vsmarketplacebadges.dev/version-short/parktaesoo.ltcai.svg)](https://marketplace.visualstudio.com/items?itemName=parktaesoo.ltcai)
11
13
  [![Open VSX](https://img.shields.io/open-vsx/v/parktaesoo/ltcai?label=Open%20VSX)](https://open-vsx.org/extension/parktaesoo/ltcai)
14
+ [![CI](https://github.com/TaeSooPark-PTS/LatticeAI/actions/workflows/ci.yml/badge.svg)](https://github.com/TaeSooPark-PTS/LatticeAI/actions/workflows/ci.yml)
12
15
  [![License: MIT](https://img.shields.io/badge/License-MIT-green)](./LICENSE)
13
16
  [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/)
14
17
 
18
+ <br/>
19
+
20
+ <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/lattice-ai-demo.gif" alt="Lattice AI demo showing chat, knowledge graph, and admin dashboard" width="100%"/>
21
+
15
22
  </div>
16
23
 
17
24
  ---
18
25
 
19
26
  ## What is Lattice AI?
20
27
 
21
- **Lattice AI** is a self-hosted AI server that unifies local and cloud LLMs into one workspace: web chat, VS Code extension, Telegram bot, and MCP tools, all from a single `pip install`.
28
+ **Lattice AI** is a self-hosted AI server that unifies local and cloud LLMs into one practical workspace. Install once, then use the same AI from the web UI, VS Code / Cursor, Telegram, MCP clients, files, and your personal knowledge graph.
22
29
 
23
30
  - 🖥️ **Web UI** — chat, file upload, admin dashboard, data graph
24
31
  - 🧩 **VS Code / Cursor extension** — edit, explain, generate commands inline
25
32
  - 📱 **Telegram bot** — access your AI from anywhere
26
33
  - 🔌 **MCP server** — use Lattice tools inside Claude Desktop / Cursor
27
34
  - 🔒 **Zero telemetry** — all data stays in `~/.ltcai/` on your machine
35
+ - ⚡ **30-second start** — `pip install ltcai` or `npm install -g ltcai`
28
36
 
29
37
  ---
30
38
 
31
- ## 📸 Screenshots
39
+ ## 📸 Product Preview
40
+
41
+ Real screens from the local web app:
32
42
 
33
43
  <table>
34
44
  <tr>
35
- <td width="33%"><b>Chat UI</b><br/><img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-chat.png" alt="Lattice AI Chat" width="100%"/></td>
36
- <td width="33%"><b>Admin Dashboard</b><br/><img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-admin.png" alt="Admin Dashboard" width="100%"/></td>
37
- <td width="33%"><b>Data Graph (Graph RAG)</b><br/><img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-graph.png" alt="Knowledge Graph" width="100%"/></td>
45
+ <td align="center" width="33%">
46
+ <b>💬 Workspace Chat</b><br/>
47
+ <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-chat.png" alt="Lattice AI workspace chat" width="100%"/>
48
+ <sub>Web chat with local LLM, file upload, pipeline status</sub>
49
+ </td>
50
+ <td align="center" width="33%">
51
+ <b>🛡️ Admin Dashboard</b><br/>
52
+ <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-admin.png" alt="Lattice AI admin dashboard" width="100%"/>
53
+ <sub>User management, audit log, security monitoring</sub>
54
+ </td>
55
+ <td align="center" width="33%">
56
+ <b>🕸️ Knowledge Graph</b><br/>
57
+ <img src="https://raw.githubusercontent.com/TaeSooPark-PTS/LatticeAI/main/docs/images/screenshot-graph.png" alt="Lattice AI knowledge graph" width="100%"/>
58
+ <sub>Auto-built Graph RAG from chats &amp; documents</sub>
59
+ </td>
38
60
  </tr>
39
61
  </table>
40
62
 
63
+ What this gives users after install:
64
+
65
+ - A single local workspace for chat, files, models, runtime setup, and tool control
66
+ - A graph view that turns chats and documents into searchable knowledge
67
+ - Admin screens for users, model status, VPC settings, SSO, audit logs, and security monitoring
68
+
41
69
  ---
42
70
 
43
71
  ## ⚡ Quick Start (30 seconds)
@@ -45,16 +73,9 @@
45
73
  **Python / PyPI**
46
74
 
47
75
  ```bash
48
- # Install (cloud models)
49
76
  pip install ltcai
50
-
51
- # Install (+ Apple Silicon local models)
52
77
  pip install "ltcai[local]"
53
-
54
- # Verify environment
55
78
  LTCAI doctor
56
-
57
- # Start server
58
79
  LTCAI
59
80
  # → http://localhost:4825
60
81
  ```
@@ -95,27 +116,32 @@ Comparison is based on public product behavior as of 2026-05.
95
116
  | VS Code extension | ✅ | ❌ | ✅ | ✅ |
96
117
  | Telegram bot | ✅ | ❌ | ❌ | ❌ |
97
118
  | Graph RAG (auto knowledge graph) | ✅ | ❌ | ❌ | ❌ |
98
- | MCP registry & install | ✅ | | ✅ | ❌ |
119
+ | MCP registry (browse & one-click install) | ✅ | ⚠️* | ✅ | ❌ |
99
120
  | Admin dashboard + audit log | ✅ | ✅ | ❌ | ❌ |
100
121
  | Self-hosted, zero telemetry | ✅ | ✅ | ✅ | ❌ |
101
122
  | One-command public tunnel | ✅ | ❌ | ❌ | ❌ |
102
123
  | Free | ✅ | ✅ | ✅ | ❌ |
103
124
 
125
+ > ⚠️ *Open WebUI supports MCP via manual URL configuration — no registry browsing or one-click install.
126
+
104
127
  ---
105
128
 
106
129
  ## 🧠 Supported Models
107
130
 
108
- **Local — Apple Silicon only (MLX):**
131
+ **Local — Apple Silicon MLX + cross-platform local servers:**
109
132
 
110
133
  | Model | Best for | Size |
111
134
  |-------|----------|------|
112
- | `mlx-community/gemma-4-26b-a4b-it-4bit` | General / coding | ~14 GB |
113
- | `mlx-community/Qwen2.5-Coder-32B-Instruct-4bit` | Coding | ~18 GB |
114
- | `mlx-community/DeepSeek-R1-0528-4bit` | Reasoning | ~38 GB |
115
- | `mlx-community/Phi-4-4bit` | Coding (fast) | ~8 GB |
135
+ | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Multimodal / low spec | ~2.7 GB |
136
+ | `mlx-community/Qwen3-VL-8B-Instruct-4bit` | Multimodal / balanced | ~4.8 GB |
137
+ | `mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit` | Multimodal / large | ~18 GB |
138
+ | `mlx-community/Llama-3.1-8B-Instruct-4bit` | General | ~4.7 GB |
139
+ | `mlx-community/Mistral-7B-Instruct-v0.3-4bit` | General / Apache | ~4.1 GB |
140
+ | `mlx-community/Phi-4-mini-instruct-4bit` | Coding (fast) | ~2.2 GB |
141
+ | `mlx-community/gemma-4-26b-a4b-it-4bit` | Multimodal / large | ~15.6 GB |
116
142
 
117
143
  **Cloud (any platform):**
118
- OpenAI · Groq · Together · OpenRouter · any OpenAI-compatible endpoint
144
+ OpenAI GPT-5.5 · OpenRouter Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 · Groq · Together · any OpenAI-compatible endpoint
119
145
 
120
146
  ---
121
147
 
@@ -298,7 +324,7 @@ Or: `./start_ai.sh` (auto-restart + caffeinate)
298
324
  | VS Code Marketplace | [marketplace.visualstudio.com](https://marketplace.visualstudio.com/items?itemName=parktaesoo.ltcai) |
299
325
  | Open VSX | [open-vsx.org](https://open-vsx.org/extension/parktaesoo/ltcai) |
300
326
 
301
- Current version: **0.1.29** — [Changelog](docs/CHANGELOG.md)
327
+ Current version: **0.1.31** — [Changelog](docs/CHANGELOG.md)
302
328
 
303
329
  ---
304
330
 
@@ -345,9 +371,13 @@ LTCAI --tunnel # + Cloudflare 공개 URL 자동 발급
345
371
 
346
372
  | Model | Use | Size |
347
373
  |------|------|------|
348
- | `mlx-community/gemma-4-26b-a4b-it-4bit` | General | ~14GB |
349
- | `mlx-community/Qwen2.5-Coder-32B-Instruct-4bit` | Coding | ~18GB |
350
- | `mlx-community/DeepSeek-R1-0528-4bit` | Reasoning | ~38GB |
374
+ | `mlx-community/Qwen3-VL-4B-Instruct-4bit` | Multimodal / low spec | ~2.7GB |
375
+ | `mlx-community/Qwen3-VL-8B-Instruct-4bit` | Multimodal / balanced | ~4.8GB |
376
+ | `mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit` | Multimodal / large | ~18GB |
377
+ | `mlx-community/Llama-3.1-8B-Instruct-4bit` | General | ~4.7GB |
378
+ | `mlx-community/Mistral-7B-Instruct-v0.3-4bit` | General / Apache | ~4.1GB |
379
+ | `mlx-community/Phi-4-mini-instruct-4bit` | Coding | ~2.2GB |
380
+ | `mlx-community/gemma-4-26b-a4b-it-4bit` | Multimodal / large | ~15.6GB |
351
381
  
352
382
  Details: [docs/CHANGELOG.md](docs/CHANGELOG.md) · [Security](SECURITY.md) · [Contributing](CONTRIBUTING.md)
353
383
 
package/auto_setup.py CHANGED
@@ -38,6 +38,7 @@ import argparse
38
38
  import json
39
39
  import os
40
40
  import platform
41
+ import re
41
42
  import shutil
42
43
  import subprocess
43
44
  import sys
@@ -68,12 +69,19 @@ class SystemProfile:
68
69
  arch: str = "" # x86_64 | arm64 | …
69
70
  cpu_model: str = ""
70
71
  cpu_cores: int = 0
72
+ cpu_logical_cores: int = 0
73
+ cpu_instructions: List[str] = field(default_factory=list)
71
74
  ram_mb: int = 0
72
75
  disk_free_mb: int = 0
73
76
  gpu: GPUInfo = field(default_factory=GPUInfo)
74
77
  package_manager: Optional[str] = None # winget | brew | apt | dnf | pacman
75
78
  has_internet: bool = True
76
79
  python_version: str = ""
80
+ is_wsl: bool = False
81
+ wsl_version: str = ""
82
+ cuda_available: bool = False
83
+ cuda_version: str = ""
84
+ tools: Dict[str, str] = field(default_factory=dict)
77
85
 
78
86
  def score(self) -> int:
79
87
  """LLM suitability score (0..100). Input to RECOMMEND."""
@@ -105,13 +113,84 @@ def _run(cmd: List[str], timeout: float = 4.0) -> str:
105
113
  return ""
106
114
 
107
115
 
116
+ def _windows_candidate_paths(binary: str) -> List[str]:
117
+ local_appdata = os.environ.get("LOCALAPPDATA", "")
118
+ program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
119
+ program_files_x86 = os.environ.get("ProgramFiles(x86)", r"C:\Program Files (x86)")
120
+ candidates = {
121
+ "ollama": [
122
+ str(Path(local_appdata) / "Programs" / "Ollama" / "ollama.exe") if local_appdata else "",
123
+ str(Path(program_files) / "Ollama" / "ollama.exe"),
124
+ ],
125
+ "lms": [
126
+ str(Path(local_appdata) / "Programs" / "LM Studio" / "resources" / "app" / ".webpack" / "lms.exe") if local_appdata else "",
127
+ str(Path(program_files) / "LM Studio" / "resources" / "app" / ".webpack" / "lms.exe"),
128
+ ],
129
+ "nvidia-smi": [
130
+ str(Path(program_files) / "NVIDIA Corporation" / "NVSMI" / "nvidia-smi.exe"),
131
+ str(Path(program_files_x86) / "NVIDIA Corporation" / "NVSMI" / "nvidia-smi.exe"),
132
+ ],
133
+ }
134
+ return [item for item in candidates.get(binary, []) if item]
135
+
136
+
137
+ def _which(binary: str) -> Optional[str]:
138
+ found = shutil.which(binary)
139
+ if found:
140
+ return found
141
+ if platform.system() == "Windows":
142
+ for candidate in _windows_candidate_paths(binary):
143
+ if Path(candidate).exists():
144
+ return candidate
145
+ return None
146
+
147
+
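A minimal sketch of the lookup order that `_which` appears to follow: `PATH` first, then a list of well-known install locations. The name `which_with_fallback` and the explicit `candidates` parameter are illustrative assumptions; the diff builds its candidate list from Windows environment variables and only consults it on Windows.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def which_with_fallback(binary: str, candidates: Iterable[str]) -> Optional[str]:
    """Resolve a binary via PATH, then via explicit candidate paths."""
    found = shutil.which(binary)
    if found:
        return found
    for candidate in candidates:
        # Empty strings stand for locations that could not be built
        # (for example, an unset environment variable) and are skipped.
        if candidate and Path(candidate).exists():
            return candidate
    return None
```

Checking `PATH` first keeps the behaviour identical to plain `shutil.which` whenever the tool is installed normally; the candidate list only matters for installers that do not update `PATH`.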
148
+ def _parse_windows_video_controllers(raw: str) -> List[Dict[str, Any]]:
149
+ controllers: List[Dict[str, Any]] = []
150
+ if not raw:
151
+ return controllers
152
+ try:
153
+ data = json.loads(raw)
154
+ if isinstance(data, dict):
155
+ data = [data]
156
+ if isinstance(data, list):
157
+ for item in data:
158
+ name = str(item.get("Name") or "").strip()
159
+ if not name:
160
+ continue
161
+ try:
162
+ ram_mb = int(item.get("AdapterRAM") or 0) // (1024 * 1024)
163
+ except Exception:
164
+ ram_mb = 0
165
+ controllers.append({"name": name, "vram_mb": ram_mb})
166
+ if controllers:
167
+ return controllers
168
+ except Exception:
169
+ pass
170
+ current: Dict[str, Any] = {}
171
+ for line in raw.splitlines():
172
+ if line.startswith("Name="):
173
+ if current:
174
+ controllers.append(current)
175
+ current = {"name": line.split("=", 1)[-1].strip(), "vram_mb": 0}
176
+ elif line.startswith("AdapterRAM=") and current:
177
+ try:
178
+ current["vram_mb"] = int(line.split("=", 1)[-1].strip()) // (1024 * 1024)
179
+ except ValueError:
180
+ current["vram_mb"] = 0
181
+ if current:
182
+ controllers.append(current)
183
+ return controllers
184
+
185
+
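The parser above accepts two input shapes: JSON from PowerShell's `ConvertTo-Json`, and `Key=Value` lines from `wmic /format:list`. The sketch below illustrates the same two-stage idea under stated assumptions: `parse_video_controllers` is a hypothetical name, and, unlike the diff, the text fallback does not assume that `Name` precedes `AdapterRAM` (a repeated key is treated as the start of the next adapter).

```python
import json
from typing import Any, Dict, List

MIB = 1024 * 1024


def parse_video_controllers(raw: str) -> List[Dict[str, Any]]:
    """Return [{"name": ..., "vram_mb": ...}] from JSON or Key=Value text."""
    try:
        data = json.loads(raw)
    except ValueError:
        data = None
    if isinstance(data, dict):  # a single adapter is serialized as one object
        data = [data]
    if isinstance(data, list):
        parsed = []
        for item in data:
            if not isinstance(item, dict):
                continue
            name = str(item.get("Name") or "").strip()
            ram = item.get("AdapterRAM")
            if name:
                vram_mb = ram // MIB if isinstance(ram, int) else 0
                parsed.append({"name": name, "vram_mb": vram_mb})
        if parsed:
            return parsed

    # Fallback: Key=Value lines; a repeated key marks the next adapter.
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in raw.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue
        if key in current:
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return [
        {"name": rec["Name"],
         "vram_mb": int(rec["AdapterRAM"]) // MIB if rec.get("AdapterRAM", "").isdigit() else 0}
        for rec in records if rec.get("Name")
    ]
```

The caller in the diff then picks the adapter with the largest `vram_mb` as the primary GPU, which works the same way on this output.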
108
186
  def _detect_gpu(prof_os: str, arch: str) -> GPUInfo:
109
187
  """Detect the GPU using per-OS heuristics, as far as possible without external libraries."""
110
188
  gpu = GPUInfo()
111
189
 
112
190
  # NVIDIA
113
- if shutil.which("nvidia-smi"):
114
- info = _run(["nvidia-smi", "--query-gpu=name,memory.total",
191
+ nvidia_smi = _which("nvidia-smi")
192
+ if nvidia_smi:
193
+ info = _run([nvidia_smi, "--query-gpu=name,memory.total",
115
194
  "--format=csv,noheader,nounits"])
116
195
  if info.strip():
117
196
  first = info.strip().splitlines()[0]
@@ -139,30 +218,29 @@ def _detect_gpu(prof_os: str, arch: str) -> GPUInfo:
139
218
 
140
219
  # Windows
141
220
  if prof_os == "windows" and gpu.vendor == "unknown":
142
- info = _run(["wmic", "path", "win32_VideoController", "get",
143
- "Name,AdapterRAM", "/format:list"])
144
- if info:
145
- name = ""
146
- ram = 0
147
- for line in info.splitlines():
148
- if line.startswith("Name="):
149
- name = line.split("=", 1)[-1].strip()
150
- elif line.startswith("AdapterRAM="):
151
- try:
152
- ram = int(line.split("=", 1)[-1].strip()) // (1024 * 1024)
153
- except ValueError:
154
- ram = 0
155
- if name:
156
- gpu.model = name
157
- low = name.lower()
158
- if "nvidia" in low or "rtx" in low or "geforce" in low:
159
- gpu.vendor = "nvidia"; gpu.sdk.append("cuda")
160
- elif "amd" in low or "radeon" in low:
161
- gpu.vendor = "amd"; gpu.sdk.extend(["directml", "vulkan"])
162
- elif "intel" in low:
163
- gpu.vendor = "intel"; gpu.sdk.extend(["directml", "vulkan"])
164
- if ram > 0:
165
- gpu.vram_mb = ram
221
+ shell = _which("powershell") or _which("pwsh")
222
+ info = ""
223
+ if shell:
224
+ info = _run([
225
+ shell, "-NoProfile", "-Command",
226
+ "Get-CimInstance Win32_VideoController | Select-Object Name,AdapterRAM | ConvertTo-Json -Compress",
227
+ ], timeout=8.0)
228
+ if not info:
229
+ info = _run(["wmic", "path", "win32_VideoController", "get",
230
+ "Name,AdapterRAM", "/format:list"])
231
+ controllers = _parse_windows_video_controllers(info)
232
+ if controllers:
233
+ primary = max(controllers, key=lambda item: int(item.get("vram_mb") or 0))
234
+ name = str(primary.get("name") or "")
235
+ gpu.model = name
236
+ gpu.vram_mb = int(primary.get("vram_mb") or 0)
237
+ low = name.lower()
238
+ if "nvidia" in low or "rtx" in low or "geforce" in low:
239
+ gpu.vendor = "nvidia"; gpu.sdk.append("cuda")
240
+ elif "amd" in low or "radeon" in low:
241
+ gpu.vendor = "amd"; gpu.sdk.extend(["directml", "vulkan"])
242
+ elif "intel" in low or "arc" in low or "iris" in low:
243
+ gpu.vendor = "intel"; gpu.sdk.extend(["directml", "vulkan"])
166
244
 
167
245
  # Linux (lspci)
168
246
  if prof_os == "linux" and gpu.vendor == "unknown":
@@ -179,16 +257,96 @@ def _detect_gpu(prof_os: str, arch: str) -> GPUInfo:
179
257
 
180
258
  def _detect_package_manager(prof_os: str) -> Optional[str]:
181
259
  if prof_os == "windows":
182
- return "winget" if shutil.which("winget") else None
260
+ return "winget" if _which("winget") else None
183
261
  if prof_os == "darwin":
184
- return "brew" if shutil.which("brew") else None
262
+ return "brew" if _which("brew") else None
185
263
  if prof_os == "linux":
186
264
  for pm in ("apt", "dnf", "pacman", "zypper", "apk"):
187
- if shutil.which(pm):
265
+ if _which(pm):
188
266
  return pm
189
267
  return None
190
268
 
191
269
 
270
+ def _detect_tools() -> Dict[str, str]:
271
+ tools: Dict[str, str] = {}
272
+ for binary in ("ollama", "lms", "nvidia-smi", "nvcc", "winget", "brew", "apt", "git", "node", "python", "python3"):
273
+ found = _which(binary)
274
+ if found:
275
+ tools[binary] = found
276
+ return tools
277
+
278
+
279
+ def _detect_wsl(prof_os: str) -> Tuple[bool, str]:
280
+ if prof_os != "linux":
281
+ return False, ""
282
+ raw = _read_text("/proc/version")
283
+ is_wsl = "microsoft" in raw.lower() or "wsl" in raw.lower()
284
+ version = "2" if "microsoft-standard" in raw.lower() or "wsl2" in raw.lower() else ("1" if is_wsl else "")
285
+ return is_wsl, version
286
+
287
+
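The WSL check above is a pure string test on the contents of `/proc/version`, so it can be sketched without touching the filesystem. `classify_wsl` is a hypothetical name, and the kernel strings in the usage note are illustrative samples, not output captured from the package.

```python
from typing import Tuple


def classify_wsl(proc_version: str) -> Tuple[bool, str]:
    """Return (is_wsl, wsl_version) from the text of /proc/version."""
    text = proc_version.lower()
    is_wsl = "microsoft" in text or "wsl" in text
    # Same markers as the diff: these substrings are taken to indicate WSL 2.
    if "microsoft-standard" in text or "wsl2" in text:
        return True, "2"
    return is_wsl, ("1" if is_wsl else "")
```

For example, a string containing `5.15.90.1-microsoft-standard-WSL2` is classified as WSL 2, while a plain distribution kernel string yields `(False, "")`.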
288
+ def _detect_cuda() -> Tuple[bool, str]:
289
+ nvidia_smi = _which("nvidia-smi")
290
+ nvcc = _which("nvcc")
291
+ version = ""
292
+ if nvidia_smi:
293
+ raw = _run([nvidia_smi, "--query-gpu=driver_version", "--format=csv,noheader"], timeout=4.0)
294
+ version = raw.splitlines()[0].strip() if raw.splitlines() else ""
295
+ if nvcc:
296
+ raw = _run([nvcc, "--version"], timeout=4.0)
297
+ m = re.search(r"release\s+([\d.]+)", raw)
298
+ if m:
299
+ version = m.group(1)
300
+ return bool(nvidia_smi or nvcc), version
301
+
302
+
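A small sketch of the `nvcc --version` parsing step used above. `parse_nvcc_release` is a hypothetical helper and the sample line in the usage note is illustrative. Note that the `nvidia-smi --query-gpu=driver_version` query returns the NVIDIA driver version, not a CUDA release number, so when `nvcc` is absent the value stored in `cuda_version` is a driver version.

```python
import re
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the CUDA release (e.g. "12.4") from `nvcc --version` text."""
    match = re.search(r"release\s+([\d.]+)", output)
    return match.group(1) if match else None
```

Given a line such as `Cuda compilation tools, release 12.4, V12.4.131`, the capture group stops at the comma and yields `12.4`.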
303
+ def _detect_cpu_details(prof_os: str) -> Tuple[str, int, int, List[str]]:
304
+ model = platform.processor() or ""
305
+ physical = os.cpu_count() or 0
306
+ logical = os.cpu_count() or 0
307
+ flags: List[str] = []
308
+ if prof_os == "darwin":
309
+ model = _run(["sysctl", "-n", "machdep.cpu.brand_string"]).strip() or model
310
+ try:
311
+ physical = int((_run(["sysctl", "-n", "hw.physicalcpu"]).strip() or physical))
312
+ logical = int((_run(["sysctl", "-n", "hw.logicalcpu"]).strip() or logical))
313
+ except ValueError:
314
+ pass
315
+ flags = [item.lower() for item in _run(["sysctl", "-n", "machdep.cpu.features"]).split()]
316
+ elif prof_os == "linux":
317
+ text = _read_text("/proc/cpuinfo")
318
+ for line in text.splitlines():
319
+ if line.lower().startswith("model name") and not model:
320
+ model = line.split(":", 1)[-1].strip()
321
+ if line.lower().startswith(("flags", "features")) and not flags:
322
+ flags = line.split(":", 1)[-1].strip().lower().split()
323
+ elif prof_os == "windows":
324
+ raw = _run(["wmic", "cpu", "get", "Name,NumberOfCores,NumberOfLogicalProcessors", "/format:list"])
325
+ for line in raw.splitlines():
326
+ key, _, value = line.partition("=")
327
+ if key == "Name" and value.strip():
328
+ model = value.strip()
329
+ elif key == "NumberOfCores" and value.strip():
330
+ try:
331
+ physical = int(value.strip())
332
+ except ValueError:
333
+ pass
334
+ elif key == "NumberOfLogicalProcessors" and value.strip():
335
+ try:
336
+ logical = int(value.strip())
337
+ except ValueError:
338
+ pass
339
+ try:
340
+ import ctypes
341
+ kernel32 = ctypes.windll.kernel32
342
+ feature_map = {6: "sse", 10: "sse2", 13: "sse3", 19: "neon", 28: "rdrand"}
343
+ flags.extend(name for code, name in feature_map.items() if kernel32.IsProcessorFeaturePresent(code))
344
+ except Exception:
345
+ pass
346
+ interesting = {"avx", "avx2", "avx512f", "fma", "neon", "sse4_2", "sse", "sse2", "sse3", "rdrand"}
347
+ return model, physical, logical, sorted({flag for flag in flags if flag in interesting})
348
+
349
+
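The Linux branch above reads the first `flags` or `Features` line of `/proc/cpuinfo` and keeps only a fixed set of instruction-set names. A minimal sketch of that filter follows; `cpu_flags_from_cpuinfo` is a hypothetical name and the sample text in the usage note is made up for illustration.

```python
from typing import List

# Same set of names the diff treats as interesting.
INTERESTING = {"avx", "avx2", "avx512f", "fma", "neon",
               "sse4_2", "sse", "sse2", "sse3", "rdrand"}


def cpu_flags_from_cpuinfo(text: str) -> List[str]:
    """Return the sorted, filtered flags from the first flags/Features line."""
    for line in text.splitlines():
        if line.lower().startswith(("flags", "features")):
            flags = line.split(":", 1)[-1].lower().split()
            return sorted(set(flags) & INTERESTING)
    return []
```

One caveat of a name-based filter: the kernel's spelling must match the set exactly. On 64-bit ARM Linux, Advanced SIMD is typically listed as `asimd` rather than `neon`, so this filter would return an empty list there.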
192
350
  def _has_module(name: str) -> bool:
193
351
  try:
194
352
  __import__(name)
@@ -204,9 +362,15 @@ def probe() -> SystemProfile:
204
362
  "Linux": "linux"}.get(platform.system(), platform.system().lower())
205
363
  prof.os_version = platform.release()
206
364
  prof.arch = platform.machine().lower()
207
- prof.cpu_model = platform.processor() or ""
208
- prof.cpu_cores = os.cpu_count() or 0
365
+ cpu_model, cpu_cores, cpu_logical_cores, cpu_instructions = _detect_cpu_details(prof.os)
366
+ prof.cpu_model = cpu_model
367
+ prof.cpu_cores = cpu_cores
368
+ prof.cpu_logical_cores = cpu_logical_cores
369
+ prof.cpu_instructions = cpu_instructions
209
370
  prof.python_version = platform.python_version()
371
+ prof.is_wsl, prof.wsl_version = _detect_wsl(prof.os)
372
+ prof.cuda_available, prof.cuda_version = _detect_cuda()
373
+ prof.tools = _detect_tools()
210
374
 
211
375
  # RAM
212
376
  try:
@@ -218,7 +382,27 @@ def probe() -> SystemProfile:
218
382
  elif prof.os == "darwin":
219
383
  out = _run(["sysctl", "-n", "hw.memsize"])
220
384
  if out.strip():
221
- prof.ram_mb = int(out.strip()) // (1024 * 1024)
385
+ try:
386
+ prof.ram_mb = int(out.strip()) // (1024 * 1024)
387
+ except ValueError:
388
+ prof.ram_mb = 0
389
+ if not prof.ram_mb:
390
+ profiler = _run(["system_profiler", "SPHardwareDataType"], timeout=8.0)
391
+ m = re.search(r"Memory:\s+([\d.]+)\s*(TB|GB|MB)", profiler, re.IGNORECASE)
392
+ if m:
393
+ value = float(m.group(1))
394
+ unit = m.group(2).lower()
395
+ if unit == "tb":
396
+ prof.ram_mb = int(value * 1024 * 1024)
397
+ elif unit == "gb":
398
+ prof.ram_mb = int(value * 1024)
399
+ else:
400
+ prof.ram_mb = int(value)
401
+ if not prof.ram_mb:
402
+ hostinfo = _run(["hostinfo"])
403
+ m = re.search(r"Primary memory available:\s+([\d.]+)\s+gigabytes", hostinfo, re.IGNORECASE)
404
+ if m:
405
+ prof.ram_mb = int(float(m.group(1)) * 1024)
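The `system_profiler` fallback above converts a line such as `Memory: 16 GB` into megabytes. A compact sketch of that conversion, assuming a hypothetical helper name `parse_memory_mb` and the same regular expression as the diff:

```python
import re

# Multipliers that convert each unit to megabytes.
_UNIT_TO_MB = {"tb": 1024 * 1024, "gb": 1024, "mb": 1}


def parse_memory_mb(text: str) -> int:
    """Return the size from a 'Memory: <n> <unit>' line in MB, or 0 if absent."""
    match = re.search(r"Memory:\s+([\d.]+)\s*(TB|GB|MB)", text, re.IGNORECASE)
    if not match:
        return 0
    return int(float(match.group(1)) * _UNIT_TO_MB[match.group(2).lower()])
```

Returning 0 on a miss mirrors how the diff chains its fallbacks: each later probe runs only while `ram_mb` is still falsy.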
222
406
  elif prof.os == "windows":
223
407
  out = _run(["wmic", "ComputerSystem", "get", "TotalPhysicalMemory",
224
408
  "/format:list"])
@@ -258,16 +442,23 @@ class Recommendation:
258
442
  # Model catalog. Kept in sync with the "Recommended model" column on PPT slide 16.
259
443
  _MODEL_CATALOG: List[Dict[str, Any]] = [
260
444
  # (min_ram_mb, min_vram_mb, model_id, quant, runtime_preference)
261
- {"ram": 24 * 1024, "vram": 16 * 1024,
262
- "id": "google/gemma-3-12b-it", "q": "q5_K_M"},
445
+ # Conservative RAM thresholds that allow for OS overhead (~4-6 GB) plus KV cache headroom
446
+ {"ram": 64 * 1024, "vram": 32 * 1024,
447
+ "id": "Qwen/Qwen3-VL-30B-A3B-Instruct", "q": "q4_K_M", "multimodal": True},
448
+ {"ram": 48 * 1024, "vram": 24 * 1024,
449
+ "id": "Qwen/Qwen3-VL-30B-A3B-Instruct", "q": "q4_K_M", "multimodal": True},
450
+ {"ram": 32 * 1024, "vram": 16 * 1024,
451
+ "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q5_K_M", "multimodal": True},
452
+ {"ram": 24 * 1024, "vram": 12 * 1024,
453
+ "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M", "multimodal": True},
263
454
  {"ram": 16 * 1024, "vram": 8 * 1024,
264
- "id": "Qwen/Qwen2.5-7B-Instruct", "q": "q4_K_M"},
455
+ "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M", "multimodal": True},
265
456
  {"ram": 12 * 1024, "vram": 6 * 1024,
266
- "id": "google/gemma-3-4b-it", "q": "q4_K_M"},
457
+ "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M", "multimodal": True},
267
458
  {"ram": 8 * 1024, "vram": 4 * 1024,
268
- "id": "microsoft/Phi-3.5-mini-instruct", "q": "q4_K_M"},
459
+ "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M", "multimodal": True},
269
460
  {"ram": 4 * 1024, "vram": 0,
270
- "id": "google/gemma-3-2b-it", "q": "q4_K_M"},
461
+ "id": "google/gemma-3-1b-it", "q": "q4_K_M", "multimodal": False},
271
462
  ]
272
463
 
273
464
 
@@ -280,34 +471,41 @@ def recommend(profile: SystemProfile) -> Recommendation:
280
471
  backend = "metal+mlx"
281
472
  runtime = "mlx" if _has_module("mlx") else "llama.cpp"
282
473
  rationale.append("Apple Silicon → Metal + MLX")
283
- elif profile.gpu.vendor == "nvidia" and profile.gpu.vram_mb >= 6000:
474
+ elif profile.gpu.vendor == "nvidia" and profile.cuda_available and (profile.os == "linux" or profile.is_wsl):
284
475
  backend = "cuda"
285
- runtime = "llama.cpp"
286
- rationale.append(f"NVIDIA GPU {profile.gpu.vram_mb} MB VRAM CUDA + llama.cpp")
476
+ runtime = "vllm" if profile.gpu.vram_mb >= 12 * 1024 else "llama.cpp"
477
+ rationale.append(f"NVIDIA GPU {profile.gpu.vram_mb} MB VRAM + CUDA {runtime}")
478
+ elif profile.gpu.vendor == "nvidia":
479
+ backend = "cuda" if profile.cuda_available else "vulkan"
480
+ runtime = "lmstudio" if profile.tools.get("lms") else ("ollama" if profile.tools.get("ollama") else "llama.cpp")
481
+ rationale.append("On Windows with NVIDIA, prefer LM Studio/Ollama; vLLM is recommended on WSL/Linux")
287
482
  elif profile.os == "windows" and profile.gpu.vendor in ("amd", "intel"):
288
- backend = "directml"
289
- runtime = "llama.cpp"
290
- rationale.append("Windows + AMD/Intel GPU → DirectML")
483
+ backend = "directml/vulkan"
484
+ runtime = "lmstudio" if profile.tools.get("lms") else ("ollama" if profile.tools.get("ollama") else "llama.cpp")
485
+ rationale.append("Windows + AMD/Intel GPU → DirectML/Vulkan")
291
486
  elif profile.os == "linux" and profile.gpu.vendor == "amd":
292
487
  backend = "rocm" if "rocm" in profile.gpu.sdk else "vulkan"
293
- runtime = "llama.cpp"
488
+ runtime = "ollama" if profile.tools.get("ollama") else "llama.cpp"
294
489
  rationale.append("Linux + AMD GPU → ROCm/Vulkan")
295
490
  else:
296
491
  backend = "cpu"
297
- runtime = "llama.cpp"
298
- rationale.append("No GPU acceleration, or none detected → CPU inference")
492
+ runtime = "ollama" if profile.tools.get("ollama") else "llama.cpp"
493
+ instruction_hint = ", ".join(profile.cpu_instructions) or "no instruction sets detected"
494
+ rationale.append(f"No GPU acceleration, or none detected → CPU inference ({profile.cpu_logical_cores or profile.cpu_cores} threads, {instruction_hint})")
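The new branch order in `recommend` can be read as a decision table. The sketch below restates the non-Apple branches as a standalone function; `choose_backend`, its flat parameter list, and `_desktop_runtime` are illustrative assumptions, and the Apple Silicon branch is omitted because its condition is not visible in this hunk.

```python
from typing import Dict, Sequence, Tuple


def _desktop_runtime(tools: Dict[str, str]) -> str:
    """Prefer LM Studio, then Ollama, then plain llama.cpp."""
    if tools.get("lms"):
        return "lmstudio"
    return "ollama" if tools.get("ollama") else "llama.cpp"


def choose_backend(os_name: str, vendor: str, cuda: bool, is_wsl: bool,
                   vram_mb: int, tools: Dict[str, str],
                   sdk: Sequence[str] = ()) -> Tuple[str, str]:
    """Return (backend, runtime) following the branch order in the diff."""
    cli_runtime = "ollama" if tools.get("ollama") else "llama.cpp"
    if vendor == "nvidia" and cuda and (os_name == "linux" or is_wsl):
        return "cuda", ("vllm" if vram_mb >= 12 * 1024 else "llama.cpp")
    if vendor == "nvidia":
        return ("cuda" if cuda else "vulkan"), _desktop_runtime(tools)
    if os_name == "windows" and vendor in ("amd", "intel"):
        return "directml/vulkan", _desktop_runtime(tools)
    if os_name == "linux" and vendor == "amd":
        return ("rocm" if "rocm" in sdk else "vulkan"), cli_runtime
    return "cpu", cli_runtime
```

Written this way, it is easier to see that vLLM is only ever selected on Linux or WSL with CUDA present and at least 12 GB of VRAM.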
299
495
 
300
496
  # model size by RAM/VRAM
301
497
  pick = _MODEL_CATALOG[-1] # default to the smallest model
302
498
  for entry in _MODEL_CATALOG:
303
499
  if profile.ram_mb >= entry["ram"] and (
304
- backend == "cpu" or profile.gpu.vram_mb >= entry["vram"]
500
+ backend in {"cpu", "metal+mlx"} or profile.gpu.vram_mb >= entry["vram"]
305
501
  ):
306
502
  pick = entry
307
503
  break
308
504
  rationale.append(
309
505
  f"RAM {profile.ram_mb} MB · VRAM {profile.gpu.vram_mb} MB → {pick['id']}"
310
506
  )
507
+ if pick.get("multimodal"):
508
+ rationale.append("Prefer the latest multimodal model")
311
509
  
312
510
  # Quantization: enough VRAM → upgrade to a more precise quantization
313
511
  quant = pick["q"]
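To illustrate how the catalog walk in this hunk behaves, here is a minimal sketch with a shortened catalog. `pick_model` and `CATALOG` are hypothetical names, and the four entries are a subset of the thresholds shown in the diff, not the full list.

```python
from typing import Any, Dict, List

GB = 1024  # thresholds are expressed in MB

# Ordered from most to least demanding, as in the diff.
CATALOG: List[Dict[str, Any]] = [
    {"ram": 32 * GB, "vram": 16 * GB, "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q5_K_M"},
    {"ram": 16 * GB, "vram": 8 * GB, "id": "Qwen/Qwen3-VL-8B-Instruct", "q": "q4_K_M"},
    {"ram": 8 * GB, "vram": 4 * GB, "id": "Qwen/Qwen3-VL-4B-Instruct", "q": "q4_K_M"},
    {"ram": 4 * GB, "vram": 0, "id": "google/gemma-3-1b-it", "q": "q4_K_M"},
]


def pick_model(ram_mb: int, vram_mb: int, backend: str,
               catalog: List[Dict[str, Any]] = CATALOG) -> Dict[str, Any]:
    """Return the first entry whose RAM (and, where relevant, VRAM) fits."""
    for entry in catalog:
        # CPU and unified-memory (Metal + MLX) backends skip the VRAM check.
        vram_ok = backend in {"cpu", "metal+mlx"} or vram_mb >= entry["vram"]
        if ram_mb >= entry["ram"] and vram_ok:
            return entry
    return catalog[-1]  # smallest model as the default
```

The change from `backend == "cpu"` to the two-element set matters on Apple Silicon: with unified memory the reported VRAM may be 0, and without the exemption such machines would always fall through to the smallest model.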
@@ -402,7 +600,7 @@ def plan(profile: SystemProfile, rec: Recommendation) -> InstallPlan:
402
600
 
403
601
  if sys.version_info < (3, 11):
404
602
  need("python3.11+", "The Lattice AI server requires Python 3.11 or later.")
405
- if not shutil.which("node"):
603
+ if not _which("node"):
406
604
  need("node20", "Required for the VS Code extension / npm CLI bootstrap")
407
605
 
408
606
  # Per-runtime additions
@@ -411,17 +609,39 @@ def plan(profile: SystemProfile, rec: Recommendation) -> InstallPlan:
411
609
  name="mlx-lm", why="Apple Silicon LLM inference",
412
610
  command=["pip3", "install", "--upgrade", "mlx-lm"],
413
611
  ))
414
- if rec.runtime == "llama.cpp" and not shutil.which("ollama"):
612
+ if rec.runtime in {"llama.cpp", "ollama"} and not _which("ollama"):
415
613
  need("ollama", "Easiest way to obtain llama.cpp weights")
614
+ if rec.runtime == "lmstudio" and not _which("lms"):
615
+ notes.append("LM Studio CLI (lms) was not found. Installing it from https://lmstudio.ai/download enables model downloads on Windows/macOS/Linux and automatic GPU backend detection.")
616
+ if rec.runtime == "vllm" and not _has_module("vllm"):
617
+ steps.append(InstallStep(
618
+ name="vllm", why="Server-style inference on NVIDIA CUDA/WSL/Linux",
619
+ command=["pip3", "install", "--upgrade", "vllm", "huggingface_hub"],
620
+ ))
621
+ if profile.gpu.vendor == "nvidia" and not profile.cuda_available:
622
+ notes.append("An NVIDIA GPU was detected, but CUDA/nvidia-smi was not found. On Windows, install the NVIDIA driver and CUDA Toolkit, then run the check again.")
623
+ if profile.os == "windows" and profile.gpu.vendor == "nvidia" and not profile.is_wsl:
624
+ notes.append("vLLM is more stable on WSL2/Linux than on native Windows. On Windows desktops, try the LM Studio or Ollama GPU path first.")
416
625
 
417
- if not shutil.which("huggingface-cli"):
626
+ if not _which("huggingface-cli"):
418
627
  need("huggingface-cli", "Used to download the recommended model weights")
419
628
  
420
629
  # Pull model weights
630
+ model_command = ["huggingface-cli", "download", rec.model_id, "--quiet"]
631
+ if rec.runtime == "ollama":
632
+ lower = rec.model_id.lower()
633
+ if "qwen3-vl-8b" in lower:
634
+ model_command = ["ollama", "pull", "qwen3-vl:8b"]
635
+ elif "qwen3-vl-4b" in lower:
636
+ model_command = ["ollama", "pull", "qwen3-vl:4b"]
637
+ elif "gemma-3-1b" in lower:
638
+ model_command = ["ollama", "pull", "gemma3:1b"]
639
+ elif rec.runtime == "lmstudio":
640
+ model_command = ["lms", "get", rec.model_id]
421
641
  steps.append(InstallStep(
422
642
  name=f"weights:{rec.model_id}",
423
643
  why="Model weights used for inference",
424
- command=["huggingface-cli", "download", rec.model_id, "--quiet"],
644
+ command=model_command,
425
645
  ))
426
646
 
427
647
  return InstallPlan(package_manager=pm, steps=steps, notes=notes)
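The download command in this hunk now depends on the runtime. A minimal sketch of that mapping, assuming a hypothetical helper `model_download_command`; the Ollama tags and CLI arguments are the ones that appear in the diff.

```python
from typing import Dict, List

# Substring of the Hugging Face model id -> Ollama tag, as listed in the diff.
OLLAMA_TAGS: Dict[str, str] = {
    "qwen3-vl-8b": "qwen3-vl:8b",
    "qwen3-vl-4b": "qwen3-vl:4b",
    "gemma-3-1b": "gemma3:1b",
}


def model_download_command(runtime: str, model_id: str) -> List[str]:
    """Return the argv used to fetch weights for the given runtime."""
    if runtime == "ollama":
        lower = model_id.lower()
        for needle, tag in OLLAMA_TAGS.items():
            if needle in lower:
                return ["ollama", "pull", tag]
    elif runtime == "lmstudio":
        return ["lms", "get", model_id]
    # Default, also used when no Ollama tag is known for the model.
    return ["huggingface-cli", "download", model_id, "--quiet"]
```

As in the diff, an Ollama runtime paired with a model that has no mapped tag (for example the 30B entry) falls back to the `huggingface-cli` download.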
@@ -463,9 +683,13 @@ def verify(profile: SystemProfile, rec: Recommendation) -> Dict[str, Any]:
463
683
 
464
684
  if rec.runtime == "mlx":
465
685
  add("mlx_lm import", _has_module("mlx_lm"), "Apple Silicon runtime")
466
- if rec.runtime == "llama.cpp":
467
- add("ollama binary", shutil.which("ollama") is not None,
468
- shutil.which("ollama") or "not found")
686
+ if rec.runtime in {"llama.cpp", "ollama"}:
687
+ add("ollama binary", _which("ollama") is not None,
688
+ _which("ollama") or "not found")
689
+ if rec.runtime == "lmstudio":
690
+ add("LM Studio CLI", _which("lms") is not None, _which("lms") or "not found")
691
+ if rec.backend == "cuda":
692
+ add("CUDA/nvidia-smi", profile.cuda_available, profile.cuda_version or "not found")
469
693
 
470
694
  # Brief CPU/memory measurement
471
695
  t0 = time.perf_counter()