npm - ltcai - Versions diffs - 0.1.9 → 0.1.16 - Mend

ltcai 0.1.9 → 0.1.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43)

package/README.md +174 -305
package/docs/CHANGELOG.md +307 -0
package/docs/architecture.md +121 -0
package/docs/mcp-tools.md +116 -0
package/docs/privacy.md +74 -0
package/docs/public-deploy.md +137 -0
package/docs/security-model.md +121 -0
package/knowledge_graph.py +123 -15
package/llm_router.py +100 -28
package/ltcai_cli.py +138 -5
package/package.json +14 -2
package/server.py +1756 -329
package/skills/SKILL_TEMPLATE.md +61 -29
package/skills/code_review/SKILL.md +28 -0
package/skills/code_review/examples.md +59 -0
package/skills/code_review/risk.json +9 -0
package/skills/code_review/schema.json +65 -0
package/skills/data_analysis/SKILL.md +28 -0
package/skills/data_analysis/examples.md +62 -0
package/skills/data_analysis/risk.json +9 -0
package/skills/data_analysis/schema.json +61 -0
package/skills/file_edit/SKILL.md +33 -0
package/skills/file_edit/examples.md +45 -0
package/skills/file_edit/risk.json +9 -0
package/skills/file_edit/schema.json +60 -0
package/skills/summarize_document/SKILL.md +68 -0
package/skills/summarize_document/examples.md +65 -0
package/skills/summarize_document/risk.json +9 -0
package/skills/summarize_document/schema.json +71 -0
package/skills/web_search/SKILL.md +28 -0
package/skills/web_search/examples.md +61 -0
package/skills/web_search/risk.json +9 -0
package/skills/web_search/schema.json +62 -0
package/static/account.html +53 -51
package/static/admin.html +50 -46
package/static/chat.html +124 -96
package/static/graph.html +1231 -337
package/static/manifest.json +2 -2
package/tests/integration/__pycache__/__init__.cpython-314.pyc +0 -0
package/tests/integration/__pycache__/test_api.cpython-314-pytest-9.0.3.pyc +0 -0
package/tests/unit/__pycache__/test_tools.cpython-314-pytest-9.0.3.pyc +0 -0
package/tests/unit/test_tools.py +194 -1
package/tools.py +264 -4

package/docs/public-deploy.md ADDED

@@ -0,0 +1,137 @@
+# Public Deployment Guide
+This guide covers deploying Lattice AI to an external server such as Render, Fly.io, Railway, or a VPS.
+## Environment variables
+```bash
+# Required
+LATTICEAI_MODE=public
+LATTICEAI_INVITE_CODE=my-secret-invite-code   # Invite code required at sign-up
+# Cloud models (at least one)
+OPENAI_API_KEY=sk-...
+# GROQ_API_KEY=gsk_...
+# OPENROUTER_API_KEY=sk-or-...
+LATTICEAI_PUBLIC_MODEL=openai:gpt-4o-mini     # Default public model
+# Security
+LATTICEAI_ALLOW_LOCAL_MODELS=false            # Disable MLX (not needed on a server)
+LATTICEAI_ENABLE_TELEGRAM=false               # Disable the Telegram bot
+# Optional
+LATTICEAI_ENABLE_GRAPH=false                  # Disable Data Graph
+LATTICEAI_DATA_DIR=/data                      # Data directory
+LATTICEAI_ADMIN_EMAILS=you@example.com        # Pin the admin emails
+```
+## Docker
+```dockerfile
+# A Dockerfile is already included
+docker build -t lattice-ai .
+```
+```bash
+docker run --rm \
+  -p 4825:4825 \
+  -e LATTICEAI_MODE=public \
+  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
+  -e LATTICEAI_INVITE_CODE="my-secret-code" \
+  -v "$PWD/.data:/data" \
+  lattice-ai
+```
+## Deploying to Render
+1. New Web Service → connect your GitHub repository
+2. Environment: `Python 3`
+3. Build Command: `pip install ltcai`
+4. Start Command: `LTCAI`
+5. Enter the environment variables above in the Environment Variables tab
+6. Add a Disk: `/data` (for persistent storage)
+## Deploying to Fly.io
+```bash
+fly launch
+fly secrets set LATTICEAI_MODE=public OPENAI_API_KEY=sk-... LATTICEAI_INVITE_CODE=secret
+fly volumes create ltcai_data --size 1
+fly deploy
+```
+`fly.toml`:
+```toml
+[build]
+  dockerfile = "Dockerfile"
+[[mounts]]
+  source = "ltcai_data"
+  destination = "/data"
+[env]
+  LATTICEAI_DATA_DIR = "/data"
+```
+## nginx reverse proxy
+```nginx
+server {
+    listen 80;
+    server_name yourdomain.com;
+    return 301 https://$host$request_uri;
+}
+server {
+    listen 443 ssl http2;
+    server_name yourdomain.com;
+    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
+    location / {
+        proxy_pass http://127.0.0.1:4825;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        # SSE streaming support
+        proxy_buffering off;
+        proxy_cache off;
+        proxy_read_timeout 300s;
+        chunked_transfer_encoding on;
+    }
+}
+```
+## Caddy reverse proxy
+```caddyfile
+yourdomain.com {
+    reverse_proxy localhost:4825
+}
+```
+## Public deployment checklist
+- [ ] Set `LATTICEAI_MODE=public`
+- [ ] Set `LATTICEAI_INVITE_CODE` to a private random value
+- [ ] Configure an HTTPS reverse proxy (nginx / Caddy)
+- [ ] Mount a persistent volume (`/data` or `LATTICEAI_DATA_DIR`)
+- [ ] Block direct exposure of port 4825 at the firewall
+- [ ] `LATTICEAI_ALLOW_LOCAL_MODELS=false`
+- [ ] Set at least one cloud API key
+- [ ] After the first sign-up, verify the admin account (`http://yourdomain.com/admin`)
+## Supported cloud model prefixes
+```
+openai:gpt-4o-mini
+openai:gpt-4o
+openrouter:openai/gpt-4o-mini
+groq:llama-3.1-8b-instant
+groq:llama-3.3-70b-versatile
+together:meta-llama/Llama-3.3-70B-Instruct-Turbo
+```
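
Reviewer note: the checklist in public-deploy.md could be verified mechanically before the server starts. The sketch below is a hypothetical helper (`check_public_env` is not part of ltcai); it inspects only the environment variable names that the guide lists and assumes nothing about how the server itself validates them.

```python
import os
from typing import List, Mapping, Optional

# Cloud API key variables named in the guide; at least one should be set.
CLOUD_KEY_VARS = ("OPENAI_API_KEY", "GROQ_API_KEY", "OPENROUTER_API_KEY")


def check_public_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    if env.get("LATTICEAI_MODE") != "public":
        problems.append("LATTICEAI_MODE should be 'public'")
    if not env.get("LATTICEAI_INVITE_CODE"):
        problems.append("LATTICEAI_INVITE_CODE is not set")
    if not any(env.get(name) for name in CLOUD_KEY_VARS):
        problems.append("no cloud API key is set")
    if env.get("LATTICEAI_ALLOW_LOCAL_MODELS", "").lower() != "false":
        problems.append("LATTICEAI_ALLOW_LOCAL_MODELS should be 'false'")
    return problems
```

Calling `check_public_env()` with no argument reads the process environment; passing a plain dict makes the check easy to exercise in a test.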

package/docs/security-model.md ADDED

@@ -0,0 +1,121 @@
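+# Lattice AI: Security Model
+## Design principles
+Lattice AI is designed as a **personal AI workspace**. Defaults are as safe as possible, and network exposure is allowed only through explicit opt-in.
+## Network binding
+| Setting | Binding | Purpose |
+|------|--------|------|
+| Default | `127.0.0.1:4825` | Local only, no external access |
+| `LATTICEAI_HOST=0.0.0.0` | `0.0.0.0:4825` | Allows access from devices on the same Wi-Fi |
+| Public deployment | Behind nginx/Caddy | HTTPS termination + reverse proxy |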
+## Authentication
+### Passwords
+- scrypt hashing (`hashlib.scrypt`, N=2^14, r=8, p=1)
+- Stored in `users.json` as `{"hash": "<scrypt hex>"}`
+- Plaintext passwords are not retained, not even in memory
+### Sessions
+- UUID tokens, stored in the file `~/.ltcai/sessions.json`
+- TTL: 24 hours + sliding refresh (extended automatically on activity, written to disk at 15-minute intervals)
+- Cookie: `HttpOnly; SameSite=Lax; Path=/`
+- Persist across server restarts (file-based)
+### SSO (optional)
+- Entra ID / Okta OIDC (`OIDC_DISCOVERY_URL`, `OIDC_CLIENT_ID`, `OIDC_CLIENT_SECRET`)
+- Converted to an internal session token after the callback
+- Admin handoff: a single read from `sessionStorage` (avoids exposure in URL parameters)
+## API key security
+- Stored in the OS keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service)
+- Plaintext storage on disk only when `LATTICEAI_ALLOW_PLAINTEXT_API_KEYS=true` is set explicitly
+- API key/token/password patterns are masked automatically before chat history is saved
+## CORS
+```python
+CORS_ALLOWED_ORIGINS = ["http://localhost:4825", "http://127.0.0.1:4825"]
+```
+- Default: only localhost is allowed
+- `LATTICEAI_CORS_ALLOW_NETWORK=true`: allows devices on the same Wi-Fi
+- Public deployment: allowing only the reverse-proxy domain is recommended
+## Rate Limiting
+Token bucket algorithm, per user:
+| Endpoint | burst | sustained |
+|-----------|-------|------|
+| `/chat` | 30 | 30/min |
+| `/agent` | 10 | 6/min |
+| `/upload` | 20 | 12/min |
+Set `LATTICEAI_RATE_LIMIT=0` to disable (for development environments).
+## File uploads
+```python
+MAGIC_NUMBERS = {
+    ".pdf":  b"%PDF",
+    ".docx": b"PK\x03\x04",
+    ".xlsx": b"PK\x03\x04",
+    ".pptx": b"PK\x03\x04",
+    ".png":  b"\x89PNG",
+    ".jpg":  b"\xff\xd8\xff",
+    ".zip":  b"PK\x03\x04",
+}
+```
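+- On upload, the file's leading bytes are checked against the extension mapping
+- A mismatch returns a 400 error
+## Agent tool sandbox
+### `run_command()` dangerous-pattern blocking
+Commands containing any of the following patterns are refused:
+- `rm -rf`, `sudo`, `chmod 777`, `curl | bash`, `wget | sh`
+- `> /dev/sda`, `dd if=`, `mkfs`
+### `edit_file()` uniqueness check
+- Succeeds only if `old_string` occurs exactly once in the file
+- `replace_all=true` allows replacing every occurrence
+- Access to paths outside the workspace is blocked (e.g. `../../../etc/passwd`)
+### `grep()` binary directory exclusion
+`node_modules`, `.git`, `venv`, `dist`, and `__pycache__` are excluded automatically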
+## Audit log
+- Admin session handoff events are logged
+- Plaintext password migration event: `password_migrated_from_plaintext`
+- All requests are recorded in the `server.log` file
+## Telemetry
+**None.** All data is stored locally only. No usage data of any kind is sent to an external server.
+Exception: prompts sent to cloud APIs that the user configured (OpenAI, Groq, etc.) are subject to that provider's policies.
+## Public deployment checklist
+- [ ] `LATTICEAI_MODE=public`
+- [ ] Set `LATTICEAI_INVITE_CODE` to a private value
+- [ ] HTTPS reverse proxy (nginx/Caddy)
+- [ ] `LATTICEAI_ENABLE_GRAPH=false` (if needed)
+- [ ] Mount a persistent volume at `/data`
+- [ ] `LATTICEAI_ALLOW_LOCAL_MODELS=false`
+- [ ] Block direct exposure of port 4825 at the firewall (reverse proxy only)
+For details: [public-deploy.md](public-deploy.md)
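
Reviewer note: security-model.md names a per-user token bucket with a burst size and a sustained rate but does not show the mechanism. The sketch below is one minimal way such a limiter could work; the class name `TokenBucket` and its fields are illustrative assumptions, not the implementation in `server.py`.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    capacity: float        # burst size, e.g. 10 for /agent
    refill_per_sec: float  # sustained rate, e.g. 6 / 60 for "6/min"
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so a new user can spend the whole burst immediately.
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A per-user limiter would keep one bucket per (user, endpoint) pair, for example in a dict, and reject the request when `allow()` returns `False`.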

package/knowledge_graph.py CHANGED

@@ -9,6 +9,7 @@ the ingestion contract.
 import hashlib
 import json
 import logging
+import math
 import re
 import shutil
 import sqlite3
@@ -25,6 +26,25 @@ def _now() -> str:
     return datetime.now().isoformat()
+def _parse_iso(raw: Optional[str]) -> Optional[datetime]:
+    if not raw:
+        return None
+    try:
+        return datetime.fromisoformat(str(raw))
+    except (TypeError, ValueError):
+        return None
+def _recency_score(updated_at: Optional[str], *, now: Optional[datetime] = None, half_life_days: float = 14.0) -> float:
+    stamp = _parse_iso(updated_at)
+    if not stamp:
+        return 0.0
+    now = now or datetime.now()
+    age_days = max(0.0, (now - stamp).total_seconds() / 86400.0)
+    decay = math.log(2) / max(0.1, half_life_days)
+    return math.exp(-decay * age_days)
 def _json(data: Optional[Dict[str, Any]]) -> str:
     return json.dumps(data or {}, ensure_ascii=False, sort_keys=True)
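
Reviewer note: `_recency_score` in the hunk above is an exponential decay with a half-life, so the score equals 2^(-age_days / half_life_days). The standalone sketch below repeats the same arithmetic on `datetime` inputs to make the scale concrete; the name `recency_score` and the sample dates are illustrative.

```python
import math
from datetime import datetime, timedelta


def recency_score(updated_at: datetime, now: datetime, half_life_days: float = 14.0) -> float:
    """Exponential decay that halves every `half_life_days`."""
    age_days = max(0.0, (now - updated_at).total_seconds() / 86400.0)
    decay = math.log(2) / max(0.1, half_life_days)
    return math.exp(-decay * age_days)


now = datetime(2025, 1, 29)
for age in (0, 14, 28):
    # Scores: 1.0 for a fresh node, 0.5 after one half-life, 0.25 after two.
    print(age, round(recency_score(now - timedelta(days=age), now), 4))
```

Timestamps in the future clamp to an age of zero and score 1.0. Both operands here are naive datetimes, matching `_now()`, which writes `datetime.now().isoformat()`; subtracting an offset-aware stamp from a naive `now` would raise `TypeError`.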
@@ -587,28 +607,115 @@ class KnowledgeGraphStore:
                     "title": row["title"],
                     "summary": row["summary"],
                     "metadata": _safe_loads(row["metadata_json"]),
+                    "updated_at": row["updated_at"],
                 }
                 for row in conn.execute(
-                    "SELECT id, type, title, summary, metadata_json FROM nodes WHERE type != 'Chunk' ORDER BY updated_at DESC LIMIT ?",
+                    "SELECT id, type, title, summary, metadata_json, updated_at FROM nodes WHERE type != 'Chunk' ORDER BY updated_at DESC LIMIT ?",
                     (limit,),
                 )
             ]
             node_ids = {node["id"] for node in nodes}
-            edges = [
-                {
-                    "id": row["id"],
-                    "from": row["from_node"],
-                    "to": row["to_node"],
-                    "type": row["type"],
-                    "weight": row["weight"],
-                    "metadata": _safe_loads(row["metadata_json"]),
-                }
-                for row in conn.execute(
-                    "SELECT id, from_node, to_node, type, weight, metadata_json FROM edges ORDER BY created_at DESC LIMIT ?",
-                    (limit * 3,),
+            edges: List[Dict[str, Any]] = []
+            if node_ids:
+                edge_rows = conn.execute(
+                    """
+                    SELECT id, from_node, to_node, type, weight, metadata_json
+                    FROM edges
+                    WHERE from_node IN (
+                        SELECT id
+                        FROM nodes
+                        WHERE type != 'Chunk'
+                        ORDER BY updated_at DESC
+                        LIMIT ?
+                    )
+                    AND to_node IN (
+                        SELECT id
+                        FROM nodes
+                        WHERE type != 'Chunk'
+                        ORDER BY updated_at DESC
+                        LIMIT ?
+                    )
+                    ORDER BY created_at DESC
+                    """,
+                    (limit, limit),
+                ).fetchall()
+                edges = [
+                    {
+                        "id": row["id"],
+                        "from": row["from_node"],
+                        "to": row["to_node"],
+                        "type": row["type"],
+                        "weight": row["weight"],
+                        "metadata": _safe_loads(row["metadata_json"]),
+                    }
+                    for row in edge_rows
+                ]
+        degree_map: Dict[str, int] = {}
+        now = datetime.now()
+        node_by_id = {node["id"]: node for node in nodes}
+        topic_metrics: Dict[str, Dict[str, Any]] = {}
+        for edge in edges:
+            degree_map[edge["from"]] = degree_map.get(edge["from"], 0) + 1
+            degree_map[edge["to"]] = degree_map.get(edge["to"], 0) + 1
+            from_node = node_by_id.get(edge["from"])
+            to_node = node_by_id.get(edge["to"])
+            if not from_node or not to_node:
+                continue
+            for topic_node, other_node in ((from_node, to_node), (to_node, from_node)):
+                if topic_node["type"] != "Topic":
+                    continue
+                metrics = topic_metrics.setdefault(topic_node["id"], {
+                    "mention_count": 0.0,
+                    "conversation_ids": set(),
+                })
+                if edge["type"] in {"mentions", "discusses"}:
+                    metrics["mention_count"] += max(0.5, float(edge.get("weight") or 1.0))
+                other_meta = other_node.get("metadata") or {}
+                conversation_id = other_meta.get("conversation_id")
+                if other_node["type"] == "Conversation":
+                    conversation_id = other_node["id"]
+                if conversation_id:
+                    metrics["conversation_ids"].add(str(conversation_id))
+        type_max_raw: Dict[str, float] = {}
+        for node in nodes:
+            degree = degree_map.get(node["id"], 0)
+            recency = _recency_score(node.get("updated_at"), now=now)
+            metrics = {
+                "degree": degree,
+                "recency_score": round(recency, 4),
+            }
+            if node["type"] == "Topic":
+                topic_stat = topic_metrics.get(node["id"], {})
+                mention_count = float(topic_stat.get("mention_count") or 0.0)
+                conversation_count = len(topic_stat.get("conversation_ids") or ())
+                raw_importance = (
+                    math.log1p(mention_count) * 2.8
+                    + math.log1p(conversation_count) * 2.2
+                    + recency * 1.4
+                    + math.sqrt(max(0, degree)) * 0.45
                 )
-                if row["from_node"] in node_ids and row["to_node"] in node_ids
-            ]
+                metrics.update({
+                    "mention_count": round(mention_count, 2),
+                    "conversation_count": conversation_count,
+                })
+            else:
+                raw_importance = math.log1p(max(0, degree)) * 1.4 + recency * 0.9
+            metrics["importance_raw"] = round(raw_importance, 4)
+            node["importance"] = round(raw_importance, 4)
+            node["_raw_importance"] = raw_importance
+            node["metadata"] = {**(node.get("metadata") or {}), "graph_metrics": metrics}
+            type_max_raw[node["type"]] = max(type_max_raw.get(node["type"], 0.0), raw_importance)
+        for node in nodes:
+            max_raw = max(type_max_raw.get(node["type"], 0.0), 0.0001)
+            importance_norm = min(1.0, (node.get("_raw_importance") or 0.0) / max_raw)
+            node["importance_norm"] = round(importance_norm, 4)
+            node["metadata"]["graph_metrics"]["importance_norm"] = node["importance_norm"]
+            node.pop("_raw_importance", None)
         return {"nodes": nodes, "edges": edges}
     def search(self, query: str, limit: int = 30) -> Dict[str, Any]:
@@ -669,6 +776,7 @@ class KnowledgeGraphStore:
                     "title": row["title"],
                     "summary": row["summary"],
                     "metadata": _safe_loads(row["metadata_json"]),
+                    "updated_at": row["updated_at"],
                 }
                 for row in rows
             ],
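
Reviewer note: the new scoring in `knowledge_graph.py` computes a raw importance per node and then divides by the largest raw value within the same node type. The sketch below isolates those two steps using the weights from the Topic branch of the diff; the function names and the sample inputs are illustrative, not part of the package.

```python
import math
from typing import Any, Dict, List


def topic_importance(mention_count: float, conversation_count: int, recency: float, degree: int) -> float:
    # Same weights as the Topic branch in the diff; log and sqrt damp large counts.
    return (
        math.log1p(mention_count) * 2.8
        + math.log1p(conversation_count) * 2.2
        + recency * 1.4
        + math.sqrt(max(0, degree)) * 0.45
    )


def normalize_by_type(nodes: List[Dict[str, Any]]) -> None:
    """Scale each node's raw importance by the maximum within its own type."""
    type_max: Dict[str, float] = {}
    for node in nodes:
        type_max[node["type"]] = max(type_max.get(node["type"], 0.0), node["importance"])
    for node in nodes:
        max_raw = max(type_max[node["type"]], 0.0001)  # guard against division by zero
        node["importance_norm"] = round(min(1.0, node["importance"] / max_raw), 4)


nodes = [
    {"type": "Topic", "importance": topic_importance(3, 2, 0.5, 4)},
    {"type": "Topic", "importance": topic_importance(0, 0, 0.0, 0)},
    {"type": "Document", "importance": 2.0},
]
normalize_by_type(nodes)
```

Because the maximum is taken per type, the strongest Topic and the strongest Document both end up at 1.0, so the two scales are never compared against each other.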

package/llm_router.py CHANGED

@@ -10,6 +10,7 @@ import os
 import re
 import time
 from dataclasses import dataclass
+from pathlib import Path
 # Set MLX_VLM_DRAFT_KIND to 'mtp' to enable the Gemma 4 assistant MTP drafter.
 os.environ["MLX_VLM_DRAFT_KIND"] = "mtp"
@@ -167,10 +168,59 @@ def parse_model_ref(model_id: str) -> tuple[str, str]:
         provider, model = model_id.split(":", 1)
         if provider in OPENAI_COMPATIBLE_PROVIDERS:
             return provider, model
+        if provider in {"local_mlx", "mlx"}:
+            return "local_mlx", model
     if model_id.startswith("local_mlx:"):
         return "local_mlx", model_id.split(":", 1)[1]
     return "local_mlx", model_id
+HF_MODELS_ROOT = Path.home() / ".latticeai" / "hf-models"
+def hf_model_dir(repo_id: str) -> Path:
+    return HF_MODELS_ROOT / repo_id.replace("/", "__")
+def _looks_like_hf_model_dir(path: Path) -> bool:
+    if not path.exists() or not path.is_dir():
+        return False
+    has_config = (path / "config.json").exists()
+    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
+    has_tokenizer = (
+        (path / "tokenizer.json").exists()
+        or (path / "tokenizer.model").exists()
+        or (path / "tokenizer_config.json").exists()
+    )
+    return has_config and has_weights and has_tokenizer
+def _resolve_local_hf_model(model_id: str) -> str:
+    explicit_path = Path(model_id).expanduser()
+    if explicit_path.exists():
+        return str(explicit_path)
+    local_dir = hf_model_dir(model_id)
+    if _looks_like_hf_model_dir(local_dir):
+        return str(local_dir)
+    return model_id
+def ensure_mlx_runtime() -> None:
+    global mx, lm_load, vlm_load, VLM_AVAILABLE
+    if mx is not None and lm_load is not None:
+        return
+    try:
+        import mlx.core as mlx_core
+        from mlx_lm import load as mlx_lm_load
+        mx = mlx_core
+        lm_load = mlx_lm_load
+        try:
+            from mlx_vlm import load as mlx_vlm_load
+            vlm_load = mlx_vlm_load
+            VLM_AVAILABLE = True
+        except Exception:
+            vlm_load = None
+            VLM_AVAILABLE = False
+        mx.set_default_device(mx.gpu)
+    except Exception as e:
+        raise RuntimeError(f"MLX runtime is not available after install: {e}") from e
 class LLMRouter:
     def __init__(self):
         self._cache: Dict[str, Tuple] = {}
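
Reviewer note: the hunk above resolves a model id in three steps: an existing filesystem path wins, then a complete directory under the local model root, and otherwise the id is passed through unchanged. The sketch below reproduces that order against a temporary root so it can run anywhere; the `root` parameter and the repo id `example-org/example-model` are assumptions for illustration (the diff uses the fixed `HF_MODELS_ROOT`).

```python
import tempfile
from pathlib import Path


def hf_model_dir(root: Path, repo_id: str) -> Path:
    # "org/name" becomes "org__name", so each repo maps to one directory level.
    return root / repo_id.replace("/", "__")


def looks_like_hf_model_dir(path: Path) -> bool:
    if not path.is_dir():
        return False
    has_config = (path / "config.json").exists()
    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
    has_tokenizer = any(
        (path / name).exists()
        for name in ("tokenizer.json", "tokenizer.model", "tokenizer_config.json")
    )
    return has_config and has_weights and has_tokenizer


def resolve(root: Path, model_id: str) -> str:
    explicit = Path(model_id).expanduser()
    if explicit.exists():
        return str(explicit)
    local = hf_model_dir(root, model_id)
    return str(local) if looks_like_hf_model_dir(local) else model_id


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    repo = "example-org/example-model"  # hypothetical repo id
    before = resolve(root, repo)        # nothing on disk yet: id passes through
    target = hf_model_dir(root, repo)
    target.mkdir(parents=True)
    for name in ("config.json", "model.safetensors", "tokenizer.json"):
        (target / name).write_text("{}")
    after = resolve(root, repo)         # now resolves to the local directory
    target_str = str(target)
```

One consequence of the first step is that the explicit-path check is relative to the current working directory, so a repo id that happens to match an existing relative path would resolve to that path.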
@@ -262,6 +312,7 @@ class LLMRouter:
         if provider != "local_mlx":
             return self._load_cloud_model(provider, provider_model, api_key_override=api_key_override, owner=owner)
+        ensure_mlx_runtime()
         if mx is None or lm_load is None:
             raise RuntimeError("MLX is not available in this process. Run on Apple Silicon with Metal access.")
@@ -274,6 +325,8 @@ class LLMRouter:
         self._enforce_local_model_limit(cache_key)
         print(f"⏳ Loading Gemma 4 Stack: {cache_key}...")
         loop = asyncio.get_event_loop()
+        target_model_id = _resolve_local_hf_model(model_id)
+        target_draft_model_id = _resolve_local_hf_model(draft_model_id) if draft_model_id else None
         def _load():
             mx.set_default_device(mx.gpu)
@@ -281,20 +334,20 @@ class LLMRouter:
             # 1. Load the target (Gemma 4 always uses vlm_load)
             if is_gemma4 and VLM_AVAILABLE:
-                print(f"🔄 Loading Target (VLM Mode): {model_id}...")
-                model, tokenizer = vlm_load(model_id)
+                print(f"🔄 Loading Target (VLM Mode): {target_model_id}...")
+                model, tokenizer = vlm_load(target_model_id)
             else:
-                print(f"🔄 Loading Target (LM Mode): {model_id}...")
-                model, tokenizer = lm_load(model_id)
+                print(f"🔄 Loading Target (LM Mode): {target_model_id}...")
+                model, tokenizer = lm_load(target_model_id)
             # 2. Load the draft (Gemma 4 always uses vlm_load)
             draft_model = None
-            if draft_model_id:
-                print(f"🔄 Loading Assistant (VLM Mode): {draft_model_id}...")
+            if target_draft_model_id:
+                print(f"🔄 Loading Assistant (VLM Mode): {target_draft_model_id}...")
                 if is_gemma4 and VLM_AVAILABLE:
-                    draft_model, _ = vlm_load(draft_model_id)
+                    draft_model, _ = vlm_load(target_draft_model_id)
                 else:
-                    draft_model, _ = lm_load(draft_model_id)
+                    draft_model, _ = lm_load(target_draft_model_id)
                 print(f"✅ Assistant Ready.")
             return model, tokenizer, draft_model
@@ -374,6 +427,18 @@ class LLMRouter:
     def _is_cloud_current(self) -> bool:
         return bool(self._current and isinstance(self._cache.get(self._current), CloudModel))
+    def _local_server_error_hint(self, cloud: CloudModel, error: Exception) -> str:
+        raw = str(error)
+        if cloud.provider == "lmstudio":
+            base_url = os.getenv("LMSTUDIO_BASE_URL") or OPENAI_COMPATIBLE_PROVIDERS["lmstudio"]["base_url"]
+            return (
+                f"LM Studio connection failed: {raw}\n\n"
+                f"- Check that LM Studio's Developer/Local Server is running and a model is loaded.\n"
+                f"- Lattice is pointed at {base_url}. If the port differs, set LMSTUDIO_BASE_URL to match.\n"
+                f"- The model picker lists only models detected via LM Studio /v1/models."
+            )
+        return raw
     def _build_prompt(self, message: str, context: Optional[str], tokenizer) -> str:
         system = SYSTEM_PROMPT
         context = normalize_branding(context)
@@ -382,7 +447,7 @@ class LLMRouter:
             try:
                 msgs = [{"role": "system", "content": system}, {"role": "user", "content": message}]
                 return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
-            except: pass
+            except Exception: pass
         return f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n"
     def _build_vlm_prompt(self, model, processor, message: str, context: Optional[str], num_images: int) -> str:
@@ -445,15 +510,18 @@ class LLMRouter:
         context = normalize_branding(context)
         if context:
             system += f"\n\nContext:\n{context}"
-        response = await cloud.client.chat.completions.create(
-            model=cloud.model,
-            messages=[
-                {"role": "system", "content": system},
-                {"role": "user", "content": message},
-            ],
-            max_tokens=max_tokens,
-            temperature=temperature,
-        )
+        try:
+            response = await cloud.client.chat.completions.create(
+                model=cloud.model,
+                messages=[
+                    {"role": "system", "content": system},
+                    {"role": "user", "content": message},
+                ],
+                max_tokens=max_tokens,
+                temperature=temperature,
+            )
+        except Exception as e:
+            raise RuntimeError(self._local_server_error_hint(cloud, e)) from e
         return normalize_branding(response.choices[0].message.content or "")
     async def stream_generate(self, message: str, context: Optional[str] = None, max_tokens: int = 4096, temperature: float = 0.2, image_data: Optional[str] = None) -> AsyncIterator[str]:
@@ -508,16 +576,20 @@ class LLMRouter:
         context = normalize_branding(context)
         if context:
             system += f"\n\nContext:\n{context}"
-        stream = await cloud.client.chat.completions.create(
-            model=cloud.model,
-            messages=[
-                {"role": "system", "content": system},
-                {"role": "user", "content": message},
-            ],
-            max_tokens=max_tokens,
-            temperature=temperature,
-            stream=True,
-        )
+        try:
+            stream = await cloud.client.chat.completions.create(
+                model=cloud.model,
+                messages=[
+                    {"role": "system", "content": system},
+                    {"role": "user", "content": message},
+                ],
+                max_tokens=max_tokens,
+                temperature=temperature,
+                stream=True,
+            )
+        except Exception as e:
+            yield f"⚠️ {self._local_server_error_hint(cloud, e)}"
+            return
         async for event in stream:
             if not event.choices:
                 continue
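
Reviewer note: the two cloud paths now handle a failed request differently. The non-streaming path raises `RuntimeError`, while the streaming path yields one warning chunk and returns. The sketch below shows the streaming pattern in isolation with stand-in callables; `stream_with_hint`, `_failing_open`, and `_ok_open` are hypothetical names, and no real client library is involved.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def stream_with_hint(
    open_stream: Callable[[], Awaitable[AsyncIterator[str]]],
    hint: Callable[[Exception], str],
) -> AsyncIterator[str]:
    try:
        stream = await open_stream()
    except Exception as e:
        # Surface the failure as an ordinary chunk so the consumer shows a message.
        yield f"⚠️ {hint(e)}"
        return
    # As in the diff, only opening the stream is guarded; errors raised while
    # iterating still propagate to the caller.
    async for chunk in stream:
        yield chunk


async def _failing_open() -> AsyncIterator[str]:
    raise ConnectionError("connection refused")


async def _ok_open() -> AsyncIterator[str]:
    async def gen() -> AsyncIterator[str]:
        yield "a"
        yield "b"
    return gen()


async def _collect(open_stream) -> List[str]:
    return [c async for c in stream_with_hint(open_stream, lambda e: f"backend unreachable: {e}")]


chunks_fail = asyncio.run(_collect(_failing_open))
chunks_ok = asyncio.run(_collect(_ok_open))
```

Yielding the hint keeps an SSE response well-formed for the client, at the cost that callers cannot distinguish an error chunk from model output without inspecting the text.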