groove-dev 0.27.130 → 0.27.131
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/model-workspace/LAB-ASSISTANT-BUILD-PLAN.md +341 -0
- package/node_modules/@groove-dev/cli/package.json +1 -1
- package/node_modules/@groove-dev/daemon/package.json +1 -1
- package/node_modules/@groove-dev/daemon/src/api.js +25 -0
- package/node_modules/@groove-dev/daemon/src/index.js +15 -3
- package/node_modules/@groove-dev/daemon/src/llama-server.js +1 -1
- package/node_modules/@groove-dev/daemon/src/process.js +6 -0
- package/node_modules/@groove-dev/daemon/src/state.js +18 -2
- package/node_modules/@groove-dev/daemon/src/teams.js +26 -7
- package/node_modules/@groove-dev/daemon/templates/tgi-setup.json +12 -0
- package/node_modules/@groove-dev/daemon/templates/vllm-setup.json +12 -0
- package/node_modules/@groove-dev/gui/dist/assets/{index-6zTBb9ik.js → index-BiB9oY9U.js} +558 -557
- package/node_modules/@groove-dev/gui/dist/assets/index-CeyDFVub.css +1 -0
- package/node_modules/@groove-dev/gui/dist/index.html +2 -2
- package/node_modules/@groove-dev/gui/package.json +1 -1
- package/node_modules/@groove-dev/gui/src/components/lab/lab-assistant.jsx +199 -0
- package/node_modules/@groove-dev/gui/src/components/lab/runtime-config.jsx +33 -10
- package/node_modules/@groove-dev/gui/src/stores/groove.js +35 -0
- package/node_modules/@groove-dev/gui/src/views/agents.jsx +7 -1
- package/node_modules/@groove-dev/gui/src/views/model-lab.jsx +31 -3
- package/package.json +1 -1
- package/packages/cli/package.json +1 -1
- package/packages/daemon/package.json +1 -1
- package/packages/daemon/src/api.js +25 -0
- package/packages/daemon/src/index.js +15 -3
- package/packages/daemon/src/llama-server.js +1 -1
- package/packages/daemon/src/process.js +6 -0
- package/packages/daemon/src/state.js +18 -2
- package/packages/daemon/src/teams.js +26 -7
- package/packages/daemon/templates/tgi-setup.json +12 -0
- package/packages/daemon/templates/vllm-setup.json +12 -0
- package/packages/gui/dist/assets/{index-6zTBb9ik.js → index-BiB9oY9U.js} +558 -557
- package/packages/gui/dist/assets/index-CeyDFVub.css +1 -0
- package/packages/gui/dist/index.html +2 -2
- package/packages/gui/package.json +1 -1
- package/packages/gui/src/components/lab/lab-assistant.jsx +199 -0
- package/packages/gui/src/components/lab/runtime-config.jsx +33 -10
- package/packages/gui/src/stores/groove.js +35 -0
- package/packages/gui/src/views/agents.jsx +7 -1
- package/packages/gui/src/views/model-lab.jsx +31 -3
- package/local-models/daemon-bridge.js +0 -87
- package/node_modules/@groove-dev/gui/dist/assets/index-Ch9Mlf9w.css +0 -1
- package/packages/gui/dist/assets/index-Ch9Mlf9w.css +0 -1
@@ -0,0 +1,341 @@
+# Lab Assistant — Build Plan
+
+## Overview
+
+Add an embedded AI assistant to the Model Lab that helps users set up inference runtimes (vLLM, TGI) without leaving the Lab. One agent, one chat, inline in the center panel. The agent handles system recon, installation, configuration, server startup, and auto-creates the Lab runtime when done.
+
+## User Flow
+
+1. User selects a model + picks vLLM or TGI backend in the Launch Model section
+2. Clicks **"Setup vLLM with Assistant"** button
+3. Center panel switches from Playground to Assistant tab — agent chat appears
+4. Agent: "Let me check your system..." → runs nvidia-smi, checks CUDA, Python, Docker, VRAM
+5. Agent: "You have an RTX 4090 with 24GB VRAM. I'll set up vLLM via Docker with Qwen3-8B."
+6. Agent installs, configures, starts the server, validates with a health check
+7. Agent calls `POST http://localhost:31415/api/lab/runtimes` to register the runtime
+8. Runtime appears in the Lab's left panel automatically (via WebSocket event)
+9. Agent: "Done! Switch to the Playground tab to start chatting."
+10. User clicks "Switch to Playground" — model is ready to use
+
+## Architecture
+
+```
+User clicks "Setup vLLM"
+  → store.launchLabAssistant('vllm')
+  → POST /api/lab/assistant { backend: 'vllm' }
+  → daemon reads templates/vllm-setup.json
+  → daemon.processes.spawn({ role: 'lab-assistant', prompt: ... })
+  → returns { agentId }
+  → store sets labAssistantAgentId, switches to Assistant tab
+  → agent output streams via WebSocket → chatHistory[agentId]
+
+Agent runs system commands, installs vLLM, starts server
+  → agent calls: curl POST /api/lab/runtimes (on localhost)
+  → daemon creates runtime, broadcasts lab:runtime:added
+  → store.fetchLabRuntimes() fires → left panel updates
+```
+
+## Implementation
+
+### 1. Team Templates (new files)
+
+**`packages/daemon/templates/vllm-setup.json`**
+
+Single-agent template. Role is `lab-assistant` (avoids the `planner` role which is restricted to planning-only in `ROLE_PROMPTS`).
+
+```json
+{
+  "name": "vllm-setup",
+  "description": "Lab Assistant for vLLM installation and configuration",
+  "agents": [
+    {
+      "role": "lab-assistant",
+      "scope": [],
+      "provider": "claude-code",
+      "prompt": "SEE PROMPT SPEC BELOW"
+    }
+  ]
+}
+```
+
+**`packages/daemon/templates/tgi-setup.json`** — same structure, TGI-specific prompt.
+
+#### Lab Assistant Prompt Spec (vLLM)
+
+The prompt is the hardest part. It must cover:
+
+**Identity**: "You are a GROOVE Lab Assistant. You help the user set up a vLLM inference server. Be conversational, report progress clearly."
+
+**System Recon** (run these commands, report findings):
+- `nvidia-smi` — GPU model, VRAM, driver version
+- `nvcc --version` — CUDA toolkit version
+- `python3 --version` and `pip3 --version`
+- `docker --version`
+- `free -h` — available RAM
+- `df -h /` — disk space
+
+**Decision Matrix**:
+- Docker available + NVIDIA GPU → Docker path (simplest)
+- No Docker, Python 3.8+ and CUDA → pip path
+- No GPU → warn user, suggest llama.cpp/Ollama instead
+- VRAM sizing: <8GB → 1-3B models, 8-16GB → 7B, 16-24GB → 13B, 24-48GB → 30-70B quantized, 48GB+ → 70B+
+
+**Installation**:
+- Docker: `docker run --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model <model>`
+- Pip: `pip install vllm && vllm serve <model> --host 0.0.0.0 --port 8000`
+- Must use `nohup` or `docker run -d` so server persists after agent exits
+
+**Validation**: `curl http://localhost:8000/v1/models` — confirm JSON response
+
+**Runtime Registration**:
+```bash
+PORT=$(cat ~/.groove/daemon.port 2>/dev/null || echo 31415)
+curl -s -X POST http://localhost:$PORT/api/lab/runtimes \
+  -H 'Content-Type: application/json' \
+  -d '{"name":"vLLM - <model>","type":"vllm","endpoint":"http://localhost:8000"}'
+```
+
+**Completion**: Tell user to switch to Playground tab.
+
+**Error Handling**: Explain errors clearly, suggest fixes, offer retry. Common issues: CUDA mismatch, insufficient VRAM, Docker not running, missing nvidia-container-toolkit.
+
+#### TGI Prompt Spec
+
+Same structure, but:
+- Docker: `ghcr.io/huggingface/text-generation-inference`
+- Default port: 8080
+- Model loading with `--model-id`
+- Runtime type: `"tgi"`
+
+---
+
+### 2. Daemon Endpoint
+
+**File**: `packages/daemon/src/api.js`
+**Location**: After the existing lab endpoints (after `GET /api/lab/sessions/:id`, around line 6760)
+
+**`POST /api/lab/assistant`**
+
+```
+Request: { "backend": "vllm" | "tgi" }
+Response: { "agentId": "abc123", "backend": "vllm" }
+```
+
+Implementation:
+1. Validate `backend` is "vllm" or "tgi"
+2. Read template: `readFileSync(resolve(__dirname, '../templates/${backend}-setup.json'), 'utf8')`
+3. Parse JSON, extract the agent config (first entry in `agents` array)
+4. Spawn agent via `daemon.processes.spawn()` with:
+   - `role`: "lab-assistant"
+   - `provider`: agent config provider or daemon default
+   - `prompt`: agent config prompt
+   - `teamId`: default team ID
+   - `metadata`: `{ labAssistant: true, backend }`
+5. Return `201` with `{ agentId: agent.id, backend }`
+
+**Import needed**: `readFileSync` from `fs`, `resolve` from `path` (likely already imported in api.js)
+
+---
+
+### 3. Store State & Actions
+
+**File**: `packages/gui/src/stores/groove.js`
+
+**New state** (add after `labLaunchError`):
+```js
+labAssistantAgentId: null,
+labAssistantMode: false,
+labAssistantBackend: null,
+```
+
+**New actions** (add after the existing lab actions block):
+
+```js
+async launchLabAssistant(backend) {
+  const existing = get().labAssistantAgentId;
+  if (existing) {
+    // If assistant already running, just switch to its tab
+    const agent = get().agents.find((a) => a.id === existing);
+    if (agent && agent.status === 'running') {
+      set({ labAssistantMode: true });
+      return;
+    }
+  }
+  try {
+    const data = await api.post('/lab/assistant', { backend });
+    set({ labAssistantAgentId: data.agentId, labAssistantMode: true, labAssistantBackend: backend });
+    get().addToast('info', `Lab Assistant started for ${backend}`);
+  } catch (err) {
+    get().addToast('error', 'Failed to start assistant', err.message);
+  }
+},
+
+dismissLabAssistant() {
+  set({ labAssistantMode: false });
+},
+
+clearLabAssistant() {
+  const id = get().labAssistantAgentId;
+  if (id) api.delete(`/agents/${id}`).catch(() => {});
+  set({ labAssistantAgentId: null, labAssistantMode: false, labAssistantBackend: null });
+},
+```
+
+**No WebSocket changes needed** — `agent:output` already populates `chatHistory[agentId]`, and `lab:runtime:added` already triggers `fetchLabRuntimes()`.
+
+---
+
+### 4. Lab Assistant Component (new file)
+
+**File**: `packages/gui/src/components/lab/lab-assistant.jsx`
+
+A focused chat component for the lab assistant agent. Reads from `chatHistory[labAssistantAgentId]`, sends via the existing agent instruction mechanism.
+
+**Structure**:
+```
+LabAssistant
+├── Header: backend badge, agent status, dismiss "X" button
+├── ScrollArea: message list
+│   ├── Agent messages (left-aligned, border-l accent)
+│   └── User messages (right-aligned, accent/10 bg)
+├── Thinking indicator (when agent is processing)
+├── Completion banner: "Setup complete — Switch to Playground" button
+└── Input: textarea + send button (Enter to send, Shift+Enter newline)
+```
+
+**Key store selectors**:
+- `labAssistantAgentId` — which agent to display
+- `chatHistory[agentId]` — message history
+- `agents.find(a => a.id === agentId)` — agent status (running/completed/error)
+- `dismissLabAssistant` — switch back to playground
+
+**Sending messages**: Use the existing `instructAgent(agentId, message)` store action (check if this exists — it should be the mechanism used by AgentChat). If not, use `api.post('/agents/${id}/instruct', { message })`.
+
+**Styling**: Match `chat-playground.jsx` patterns exactly — same message bubble styles, same input area, same auto-scroll logic. License header on line 1.
+
+**Completion detection**: When `agent.status !== 'running'` and the last message exists, show the "Switch to Playground" button.
+
+---
+
+### 5. Center Panel Mode Switch
+
+**File**: `packages/gui/src/views/model-lab.jsx`
+
+**New import**: `import { LabAssistant } from '../components/lab/lab-assistant';`
+
+**New store selectors** in `ModelLabView`:
+```js
+const labAssistantAgentId = useGrooveStore((s) => s.labAssistantAgentId);
+const labAssistantMode = useGrooveStore((s) => s.labAssistantMode);
+```
+
+**Replace the center panel** (currently `<div className="flex-1 min-w-0 p-3"><ChatPlayground /></div>`) with:
+
+```jsx
+<div className="flex-1 min-w-0 flex flex-col">
+  {/* Tab bar — only visible when assistant exists */}
+  {labAssistantAgentId && (
+    <div className="flex-shrink-0 flex items-center gap-1 px-3 pt-2">
+      <button
+        onClick={() => set({ labAssistantMode: false })}
+        className={cn(
+          'px-3 py-1.5 text-xs font-sans font-medium rounded-t-md transition-colors cursor-pointer',
+          !labAssistantMode ? 'text-accent bg-accent/10' : 'text-text-3 hover:text-text-1',
+        )}
+      >
+        Playground
+      </button>
+      <button
+        onClick={() => set({ labAssistantMode: true })}
+        className={cn(
+          'px-3 py-1.5 text-xs font-sans font-medium rounded-t-md transition-colors cursor-pointer',
+          labAssistantMode ? 'text-accent bg-accent/10' : 'text-text-3 hover:text-text-1',
+        )}
+      >
+        Assistant
+      </button>
+    </div>
+  )}
+  <div className="flex-1 min-h-0 p-3">
+    {labAssistantMode && labAssistantAgentId ? <LabAssistant /> : <ChatPlayground />}
+  </div>
+</div>
+```
+
+Note: The tab buttons need to call store's `set` function — either expose `setLabAssistantMode(bool)` as a store action, or use `dismissLabAssistant()` and `set({ labAssistantMode: true })`. Cleanest approach: add a `setLabAssistantMode(mode)` action to the store.
+
+---
+
+### 6. LaunchModel Button
+
+**File**: `packages/gui/src/components/lab/runtime-config.jsx`
+
+**In the `LaunchModel` component**, replace the static guidance box for vLLM/TGI:
+
+Change from:
+```jsx
+{!currentBackend?.autoLaunch && (
+  <div>...manual setup guidance...</div>
+)}
+```
+
+To:
+```jsx
+{!currentBackend?.autoLaunch && (
+  <div className="space-y-2">
+    <Button variant="primary" size="sm" className="w-full" onClick={handleLaunchAssistant} disabled={assistantLaunching}>
+      {assistantLaunching
+        ? <><Loader2 size={12} className="animate-spin mr-1.5" /> Starting Assistant...</>
+        : <><Wrench size={12} className="mr-1.5" /> Setup {currentBackend?.label} with Assistant</>
+      }
+    </Button>
+    <p className="text-2xs text-text-4 font-sans px-1">
+      An AI assistant will check your system and handle the installation, or start your server manually and add it as a Runtime below.
+    </p>
+  </div>
+)}
+```
+
+**New imports**: `Wrench` from lucide-react, `launchLabAssistant` from store.
+
+**Update BACKENDS array** subtitle for vLLM/TGI:
+```js
+{ id: 'vllm', label: 'vLLM', subtitle: 'GPU-optimized, guided setup', autoLaunch: false },
+{ id: 'tgi', label: 'TGI', subtitle: 'HuggingFace, guided setup', autoLaunch: false },
+```
+
+---
+
+## File Summary
+
+| File | Action | Description |
+|------|--------|-------------|
+| `packages/daemon/templates/vllm-setup.json` | CREATE | vLLM setup agent template + prompt |
+| `packages/daemon/templates/tgi-setup.json` | CREATE | TGI setup agent template + prompt |
+| `packages/daemon/src/api.js` | MODIFY | Add `POST /api/lab/assistant` endpoint |
+| `packages/gui/src/stores/groove.js` | MODIFY | Add labAssistant state + actions |
+| `packages/gui/src/components/lab/lab-assistant.jsx` | CREATE | Assistant chat component |
+| `packages/gui/src/views/model-lab.jsx` | MODIFY | Center panel tab switch |
+| `packages/gui/src/components/lab/runtime-config.jsx` | MODIFY | "Setup with Assistant" button |
+
+## Build Order
+
+1. Team templates (`vllm-setup.json`, `tgi-setup.json`)
+2. Daemon endpoint (`POST /api/lab/assistant` in `api.js`)
+3. Store state and actions (`groove.js`)
+4. Lab Assistant component (`lab-assistant.jsx`)
+5. Center panel mode switch (`model-lab.jsx`)
+6. LaunchModel button (`runtime-config.jsx`)
+7. `npm run build` from `packages/gui/`
+8. End-to-end test
+
+## Key Constraints
+
+- **Role must be `lab-assistant`** — not `planner` (which is restricted to planning-only in `ROLE_PROMPTS`)
+- **Agent sends messages via WebSocket** — the `chatHistory[agentId]` pattern already handles this
+- **Runtime creation via curl** — the agent calls the daemon REST API directly. The daemon is on localhost:31415 (or read from `~/.groove/daemon.port`)
+- **Server must persist** — use `nohup`, `docker run -d`, or background processes so the inference server outlives the agent
+- **No inline styles** — Tailwind CSS only
+- **License header** — `// FSL-1.1-Apache-2.0 — see LICENSE` on every new source file
+- **ESM imports** — all files use `import`/`export`
+- **Do NOT restart the daemon** — verify with `npm run build` only
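Section 4 of the plan describes `lab-assistant.jsx` only as an outline, and the shipped 199-line component is not reproduced in this diff. As a rough illustration of the shape the plan implies — the store selector names, the `instructAgent` action, and the `{ role, text }` message shape are assumptions here, not verified against the published file — a minimal sketch could look like:

```jsx
// FSL-1.1-Apache-2.0 — see LICENSE
// Illustrative sketch only — not the shipped lab-assistant.jsx.
import { useEffect, useRef, useState } from 'react';
import { useGrooveStore } from '../../stores/groove';

export function LabAssistant() {
  const agentId = useGrooveStore((s) => s.labAssistantAgentId);
  const backend = useGrooveStore((s) => s.labAssistantBackend);
  const agent = useGrooveStore((s) => s.agents.find((a) => a.id === agentId));
  const messages = useGrooveStore((s) => s.chatHistory[agentId]) || [];
  const instructAgent = useGrooveStore((s) => s.instructAgent); // assumed to exist (see section 4)
  const dismissLabAssistant = useGrooveStore((s) => s.dismissLabAssistant);
  const [draft, setDraft] = useState('');
  const endRef = useRef(null);

  // Auto-scroll as new agent output streams in over the WebSocket
  useEffect(() => { endRef.current?.scrollIntoView({ behavior: 'smooth' }); }, [messages.length]);

  const send = () => {
    const text = draft.trim();
    if (!text || !agentId) return;
    instructAgent(agentId, text);
    setDraft('');
  };

  const finished = agent && agent.status !== 'running' && messages.length > 0;

  return (
    <div className="h-full flex flex-col">
      <div className="flex items-center justify-between px-3 py-2">
        <span className="text-xs font-sans">{backend} setup — {agent?.status ?? 'starting'}</span>
        <button onClick={dismissLabAssistant} aria-label="Dismiss">×</button>
      </div>
      <div className="flex-1 min-h-0 overflow-y-auto px-3 space-y-2">
        {messages.map((m, i) => (
          <div key={i} className={m.role === 'user' ? 'text-right bg-accent/10' : 'border-l pl-2'}>
            {m.text}
          </div>
        ))}
        <div ref={endRef} />
      </div>
      {finished && (
        <button className="mx-3 my-2" onClick={dismissLabAssistant}>Switch to Playground</button>
      )}
      <textarea
        className="m-3"
        rows={2}
        value={draft}
        placeholder="Reply to the assistant…"
        onChange={(e) => setDraft(e.target.value)}
        onKeyDown={(e) => { if (e.key === 'Enter' && !e.shiftKey) { e.preventDefault(); send(); } }}
      />
    </div>
  );
}
```

The tab buttons from section 5 would then toggle the mode through a small store action such as `setLabAssistantMode(mode) { set({ labAssistantMode: mode }); }`, as the plan's own note suggests, rather than calling `set` directly from the view.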
@@ -6764,6 +6764,31 @@ Keep responses concise. Help them think, don't lecture them about the system the
   res.json(session);
 });
 
+app.post('/api/lab/assistant', async (req, res) => {
+  try {
+    const { backend } = req.body || {};
+    if (!backend || !['vllm', 'tgi'].includes(backend)) {
+      return res.status(400).json({ error: 'backend must be "vllm" or "tgi"' });
+    }
+    const templatePath = resolve(__dirname, `../templates/${backend}-setup.json`);
+    const template = JSON.parse(readFileSync(templatePath, 'utf8'));
+    const agentConfig = template.agents[0];
+    const config = {
+      role: 'lab-assistant',
+      scope: agentConfig.scope || [],
+      provider: agentConfig.provider || daemon.config.defaultProvider,
+      prompt: agentConfig.prompt,
+      metadata: { labAssistant: true, backend },
+    };
+    if (!config.provider) config.provider = daemon.config.defaultProvider;
+    const agent = await daemon.processes.spawn(config);
+    daemon.audit.log('lab.assistant.spawn', { id: agent.id, backend });
+    res.status(201).json({ agentId: agent.id, backend });
+  } catch (err) {
+    res.status(400).json({ error: err.message });
+  }
+});
+
 // --- Wallet & earnings stubs (Base L2 — wired to real data post-mainnet) ---
 
 app.get('/api/network/wallet', networkGate, (req, res) => {
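A quick way to exercise the endpoint added above — assuming the daemon is reachable on its default port 31415 (the build plan's stated default) and the route is not behind an auth gate locally:

```js
// Hypothetical local smoke test for POST /api/lab/assistant (run with a recent Node, ESM).
const res = await fetch('http://localhost:31415/api/lab/assistant', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ backend: 'vllm' }),
});
console.log(res.status);        // 201 on success, 400 for anything other than "vllm" / "tgi"
console.log(await res.json());  // e.g. { agentId: '...', backend: 'vllm' }
```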
@@ -719,10 +719,23 @@ export class Daemon {
 
     try {
       // Build set of agent names still in the registry — never remove their logs
-      const
+      const allAgents = this.registry.getAll();
+      const activeNames = new Set(allAgents.map((a) => a.name));
 
-      //
+      // Safety: if registry is empty but log files exist, state may have been
+      // lost (corrupt JSON, partial write). Skip log cleanup to prevent
+      // destroying agent history that could still be recovered.
       const logsDir = resolve(grooveDir, 'logs');
+      const agentLogsDir = resolve(this.projectDir, 'GROOVE_AGENT_LOGS');
+      const hasOrphanedLogs = (existsSync(logsDir) && readdirSync(logsDir).length > 0) ||
+        (existsSync(agentLogsDir) && readdirSync(agentLogsDir).length > 0);
+
+      if (allAgents.length === 0 && hasOrphanedLogs) {
+        console.log('[Groove:GC] Registry empty but log files exist — skipping cleanup to prevent data loss');
+        return;
+      }
+
+      // 1. Clean raw log files for agents no longer in the registry
       if (existsSync(logsDir)) {
         for (const file of readdirSync(logsDir)) {
           const agentName = file.replace(/\.log$/, '');
@@ -732,7 +745,6 @@ export class Daemon {
       }
 
       // 2. Clean GROOVE_AGENT_LOGS/ for agents no longer in the registry
-      const agentLogsDir = resolve(this.projectDir, 'GROOVE_AGENT_LOGS');
       if (existsSync(agentLogsDir)) {
         for (const dir of readdirSync(agentLogsDir, { withFileTypes: true })) {
           if (!dir.isDirectory()) continue;
@@ -71,7 +71,7 @@ export class LlamaServerManager {
 
     // Flash attention for better memory efficiency (if supported)
     if (options.flashAttention !== false) {
-      args.push('--flash-attn');
+      args.push('--flash-attn', 'auto');
     }
 
     const proc = spawn('llama-server', args, {
@@ -1953,6 +1953,9 @@ For normal file edits within your scope, proceed without review.
       this.daemon.trajectoryCapture.onAgentSpawn(
         newAgent.id, config.provider, config.model || null, config.role, teamSize, config.prompt
       ).catch(() => {});
+      if (message && typeof message === 'string' && message.trim()) {
+        this.daemon.trajectoryCapture.onUserMessage(newAgent.id, message, 'user');
+      }
     } catch (e) { /* fail silent */ }
   }
 
@@ -2095,6 +2098,9 @@ For normal file edits within your scope, proceed without review.
       this.daemon.trajectoryCapture.onAgentSpawn(
         newAgent.id, config.provider, loopConfig.model || config.model || null, config.role, teamSize, config.prompt
       ).catch(() => {});
+      if (message && typeof message === 'string' && message.trim()) {
+        this.daemon.trajectoryCapture.onUserMessage(newAgent.id, message, 'user');
+      }
     } catch (e) { /* fail silent */ }
   }
 
@@ -1,7 +1,7 @@
 // GROOVE — State Persistence
 // FSL-1.1-Apache-2.0 — see LICENSE
 
-import { readFileSync, existsSync, readdirSync, unlinkSync } from 'fs';
+import { readFileSync, writeFileSync, existsSync, readdirSync, unlinkSync, renameSync, copyFileSync } from 'fs';
 import { writeFile } from 'node:fs/promises';
 import { resolve } from 'path';
 
@@ -9,6 +9,7 @@ export class StateManager {
   constructor(grooveDir) {
     this.grooveDir = grooveDir;
     this.path = resolve(grooveDir, 'state.json');
+    this.backupPath = resolve(grooveDir, 'state.json.bak');
     this.data = {};
   }
 
@@ -16,13 +17,28 @@ export class StateManager {
     if (existsSync(this.path)) {
       try {
         this.data = JSON.parse(readFileSync(this.path, 'utf8'));
+        return;
       } catch {
-
+        console.error('[Groove:State] state.json corrupt — trying backup');
       }
     }
+    if (existsSync(this.backupPath)) {
+      try {
+        this.data = JSON.parse(readFileSync(this.backupPath, 'utf8'));
+        writeFileSync(this.path, readFileSync(this.backupPath, 'utf8'));
+        console.log('[Groove:State] Restored from state.json.bak');
+        return;
+      } catch {
+        console.error('[Groove:State] Backup also corrupt — starting fresh');
+      }
+    }
+    this.data = {};
   }
 
   async save() {
+    if (existsSync(this.path)) {
+      try { copyFileSync(this.path, this.backupPath); } catch { /* non-fatal */ }
+    }
     await writeFile(this.path, JSON.stringify(this.data, null, 2));
   }
 
@@ -1,7 +1,7 @@
 // GROOVE — Teams (Live Agent Groups)
 // FSL-1.1-Apache-2.0 — see LICENSE
 
-import { readFileSync, writeFileSync, existsSync, mkdirSync, renameSync, rmSync, readdirSync, cpSync } from 'fs';
+import { readFileSync, writeFileSync, copyFileSync, existsSync, mkdirSync, renameSync, rmSync, readdirSync, cpSync } from 'fs';
 import { resolve } from 'path';
 import { randomUUID } from 'crypto';
 import { validateTeamName, validateTeamMode } from './validate.js';
@@ -14,22 +14,41 @@ export class Teams {
   constructor(daemon) {
     this.daemon = daemon;
     this.filePath = resolve(daemon.grooveDir, 'teams.json');
+    this.backupPath = resolve(daemon.grooveDir, 'teams.json.bak');
     this.teams = new Map();
     this._load();
     this._ensureDefault();
   }
 
   _load() {
-    if (
-
-
-
-
+    if (existsSync(this.filePath)) {
+      try {
+        const data = JSON.parse(readFileSync(this.filePath, 'utf8'));
+        if (Array.isArray(data)) {
+          for (const team of data) this.teams.set(team.id, team);
+          return;
+        }
+      } catch {
+        console.error('[Groove:Teams] teams.json corrupt — trying backup');
+      }
+    }
+    if (existsSync(this.backupPath)) {
+      try {
+        const data = JSON.parse(readFileSync(this.backupPath, 'utf8'));
+        if (Array.isArray(data)) {
+          for (const team of data) this.teams.set(team.id, team);
+          writeFileSync(this.filePath, readFileSync(this.backupPath, 'utf8'));
+          console.log('[Groove:Teams] Restored from teams.json.bak');
+          return;
+        }
+      } catch {
+        console.error('[Groove:Teams] Backup also corrupt');
       }
-    }
+    }
   }
 
   _save() {
+    try { copyFileSync(this.filePath, this.backupPath); } catch { /* may not exist yet */ }
     writeFileSync(this.filePath, JSON.stringify([...this.teams.values()], null, 2));
   }
 
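`state.js` and `teams.js` now apply the same recovery idea: copy the JSON file to a `.bak` sibling before each write, and fall back to that copy when the primary file fails to parse. A condensed sketch of the pattern (the helper names here are illustrative, not part of the package):

```js
import { readFileSync, writeFileSync, copyFileSync, existsSync } from 'fs';

// Illustrative helpers condensing the pattern the two hunks above apply:
// read the primary JSON file, fall back to its .bak copy if parsing fails
// (restoring the primary from it), and refresh the .bak before every write.
export function loadJsonWithBackup(path, backupPath, fallback = {}) {
  for (const candidate of [path, backupPath]) {
    if (!existsSync(candidate)) continue;
    try {
      const data = JSON.parse(readFileSync(candidate, 'utf8'));
      if (candidate === backupPath) writeFileSync(path, readFileSync(backupPath, 'utf8'));
      return data;
    } catch { /* corrupt — try the next candidate */ }
  }
  return fallback;
}

export function saveJsonWithBackup(path, backupPath, data) {
  if (existsSync(path)) {
    try { copyFileSync(path, backupPath); } catch { /* non-fatal */ }
  }
  writeFileSync(path, JSON.stringify(data, null, 2));
}
```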
@@ -0,0 +1,12 @@
+{
+  "name": "tgi-setup",
+  "description": "Lab Assistant for TGI installation and configuration",
+  "agents": [
+    {
+      "role": "lab-assistant",
+      "scope": [],
+      "provider": "claude-code",
+      "prompt": "You are a GROOVE Lab Assistant. Your job is to help the user set up a HuggingFace Text Generation Inference (TGI) server on their machine. Be conversational, report progress clearly, and explain each step.\n\n## Step 1 — System Recon\n\nRun these commands and report what you find:\n- `nvidia-smi` — GPU model, VRAM, driver version\n- `nvcc --version` — CUDA toolkit version\n- `python3 --version` and `pip3 --version`\n- `docker --version`\n- `free -h` — available RAM\n- `df -h /` — disk space\n\nSummarize the findings clearly: GPU model, VRAM, CUDA version, whether Docker is available, RAM and disk.\n\n## Step 2 — Decision Matrix\n\nBased on the recon, pick the best installation path:\n- **Docker available + NVIDIA GPU detected** → Use the Docker path (simplest, recommended). TGI is primarily distributed via Docker.\n- **No Docker, but Python 3.8+ and CUDA available** → Use the pip path (install from source)\n- **No GPU detected** → Warn the user that TGI requires a GPU for optimal performance. Suggest llama.cpp or Ollama as CPU-friendly alternatives instead.\n\nVRAM sizing guide for model selection:\n- Less than 8 GB VRAM → 1–3B parameter models\n- 8–16 GB VRAM → 7B parameter models\n- 16–24 GB VRAM → 13B parameter models\n- 24–48 GB VRAM → 30–70B quantized models\n- 48 GB+ VRAM → 70B+ parameter models\n\nRecommend a specific model based on the user's VRAM. Default to a popular model like Qwen/Qwen3-8B for 16–24 GB setups.\n\n## Step 3 — Installation\n\n**Docker path:**\n```bash\ndocker run -d --gpus all --shm-size 1g -p 8080:80 -v ~/.cache/huggingface:/data ghcr.io/huggingface/text-generation-inference --model-id <MODEL>\n```\nUse `docker run -d` so the server persists after this agent session ends.\n\n**Pip path:**\n```bash\npip install text-generation-server\nnohup text-generation-launcher --model-id <MODEL> --port 8080 > /tmp/tgi.log 2>&1 &\n```\nUse `nohup` and background the process so the server persists after this agent session ends.\n\nReplace `<MODEL>` with the recommended model from Step 2.\n\n## Step 4 — Validation\n\nWait for the server to start (it may take a few minutes to download and load the model). Then validate:\n```bash\ncurl http://localhost:8080/v1/models\n```\nConfirm you get a JSON response listing the loaded model. TGI also supports a health endpoint at `http://localhost:8080/health`.\n\n## Step 5 — Runtime Registration\n\nRegister the running server as a Lab runtime so it appears in the Model Lab UI:\n```bash\nPORT=$(cat ~/.groove/daemon.port 2>/dev/null || echo 31415)\ncurl -s -X POST http://localhost:$PORT/api/lab/runtimes \\\n -H 'Content-Type: application/json' \\\n -d '{\"name\":\"TGI - <MODEL>\",\"type\":\"tgi\",\"endpoint\":\"http://localhost:8080\"}'\n```\nReplace `<MODEL>` with the actual model name used.\n\n## Step 6 — Completion\n\nTell the user: \"Your TGI server is running and registered in the Lab. Switch to the Playground tab to start chatting with your model!\"\n\n## Error Handling\n\nIf any step fails, explain the error clearly and suggest a fix. Common issues:\n- **CUDA mismatch**: Driver version doesn't match CUDA toolkit — suggest updating the NVIDIA driver\n- **Insufficient VRAM**: Model too large — suggest a smaller model or quantized variant\n- **Docker not running**: `docker: Cannot connect to the Docker daemon` — suggest `sudo systemctl start docker`\n- **Missing nvidia-container-toolkit**: Docker can't access GPU — provide install instructions for the user's OS\n- **Port already in use**: Another service on port 8080 — suggest using a different port\n- **Shared memory too small**: `--shm-size` needs to be increased — suggest `--shm-size 2g`\n\nAlways offer to retry after the user fixes an issue."
+    }
+  ]
+}
@@ -0,0 +1,12 @@
+{
+  "name": "vllm-setup",
+  "description": "Lab Assistant for vLLM installation and configuration",
+  "agents": [
+    {
+      "role": "lab-assistant",
+      "scope": [],
+      "provider": "claude-code",
+      "prompt": "You are a GROOVE Lab Assistant. Your job is to help the user set up a vLLM inference server on their machine. Be conversational, report progress clearly, and explain each step.\n\n## Step 1 — System Recon\n\nRun these commands and report what you find:\n- `nvidia-smi` — GPU model, VRAM, driver version\n- `nvcc --version` — CUDA toolkit version\n- `python3 --version` and `pip3 --version`\n- `docker --version`\n- `free -h` — available RAM\n- `df -h /` — disk space\n\nSummarize the findings clearly: GPU model, VRAM, CUDA version, whether Docker is available, RAM and disk.\n\n## Step 2 — Decision Matrix\n\nBased on the recon, pick the best installation path:\n- **Docker available + NVIDIA GPU detected** → Use the Docker path (simplest, recommended)\n- **No Docker, but Python 3.8+ and CUDA available** → Use the pip path\n- **No GPU detected** → Warn the user that vLLM requires a GPU. Suggest llama.cpp or Ollama as CPU-friendly alternatives instead.\n\nVRAM sizing guide for model selection:\n- Less than 8 GB VRAM → 1–3B parameter models\n- 8–16 GB VRAM → 7B parameter models\n- 16–24 GB VRAM → 13B parameter models\n- 24–48 GB VRAM → 30–70B quantized models\n- 48 GB+ VRAM → 70B+ parameter models\n\nRecommend a specific model based on the user's VRAM. Default to a popular model like Qwen/Qwen3-8B for 16–24 GB setups.\n\n## Step 3 — Installation\n\n**Docker path:**\n```bash\ndocker run -d --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model <MODEL>\n```\nUse `docker run -d` so the server persists after this agent session ends.\n\n**Pip path:**\n```bash\npip install vllm\nnohup vllm serve <MODEL> --host 0.0.0.0 --port 8000 > /tmp/vllm.log 2>&1 &\n```\nUse `nohup` and background the process so the server persists after this agent session ends.\n\nReplace `<MODEL>` with the recommended model from Step 2.\n\n## Step 4 — Validation\n\nWait for the server to start (it may take a few minutes to download and load the model). Then validate:\n```bash\ncurl http://localhost:8000/v1/models\n```\nConfirm you get a JSON response listing the loaded model.\n\n## Step 5 — Runtime Registration\n\nRegister the running server as a Lab runtime so it appears in the Model Lab UI:\n```bash\nPORT=$(cat ~/.groove/daemon.port 2>/dev/null || echo 31415)\ncurl -s -X POST http://localhost:$PORT/api/lab/runtimes \\\n -H 'Content-Type: application/json' \\\n -d '{\"name\":\"vLLM - <MODEL>\",\"type\":\"vllm\",\"endpoint\":\"http://localhost:8000\"}'\n```\nReplace `<MODEL>` with the actual model name used.\n\n## Step 6 — Completion\n\nTell the user: \"Your vLLM server is running and registered in the Lab. Switch to the Playground tab to start chatting with your model!\"\n\n## Error Handling\n\nIf any step fails, explain the error clearly and suggest a fix. Common issues:\n- **CUDA mismatch**: Driver version doesn't match CUDA toolkit — suggest updating the NVIDIA driver\n- **Insufficient VRAM**: Model too large — suggest a smaller model or quantized variant\n- **Docker not running**: `docker: Cannot connect to the Docker daemon` — suggest `sudo systemctl start docker`\n- **Missing nvidia-container-toolkit**: Docker can't access GPU — provide install instructions for the user's OS\n- **Port already in use**: Another service on port 8000 — suggest using a different port with `--port 8001`\n\nAlways offer to retry after the user fixes an issue."
+    }
+  ]
+}
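Once either setup prompt has finished and registered the runtime, the launched server exposes an OpenAI-compatible API on the port the template uses. A minimal post-setup check against a vLLM instance on its default port — the model name and port depend on what the assistant actually launched, so both are placeholders here:

```js
// Hypothetical post-setup check: ask the freshly launched server for a completion.
const resp = await fetch('http://localhost:8000/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'Qwen/Qwen3-8B', // substitute whatever /v1/models reports
    messages: [{ role: 'user', content: 'Say hello in one short sentence.' }],
    max_tokens: 32,
  }),
});
const body = await resp.json();
console.log(body.choices?.[0]?.message?.content);
```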