@appkit/llamacpp-cli 1.11.0 → 1.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (126)
  1. package/README.md +572 -170
  2. package/dist/cli.js +99 -0
  3. package/dist/cli.js.map +1 -1
  4. package/dist/commands/admin/config.d.ts +10 -0
  5. package/dist/commands/admin/config.d.ts.map +1 -0
  6. package/dist/commands/admin/config.js +100 -0
  7. package/dist/commands/admin/config.js.map +1 -0
  8. package/dist/commands/admin/logs.d.ts +10 -0
  9. package/dist/commands/admin/logs.d.ts.map +1 -0
  10. package/dist/commands/admin/logs.js +114 -0
  11. package/dist/commands/admin/logs.js.map +1 -0
  12. package/dist/commands/admin/restart.d.ts +2 -0
  13. package/dist/commands/admin/restart.d.ts.map +1 -0
  14. package/dist/commands/admin/restart.js +29 -0
  15. package/dist/commands/admin/restart.js.map +1 -0
  16. package/dist/commands/admin/start.d.ts +2 -0
  17. package/dist/commands/admin/start.d.ts.map +1 -0
  18. package/dist/commands/admin/start.js +30 -0
  19. package/dist/commands/admin/start.js.map +1 -0
  20. package/dist/commands/admin/status.d.ts +2 -0
  21. package/dist/commands/admin/status.d.ts.map +1 -0
  22. package/dist/commands/admin/status.js +82 -0
  23. package/dist/commands/admin/status.js.map +1 -0
  24. package/dist/commands/admin/stop.d.ts +2 -0
  25. package/dist/commands/admin/stop.d.ts.map +1 -0
  26. package/dist/commands/admin/stop.js +21 -0
  27. package/dist/commands/admin/stop.js.map +1 -0
  28. package/dist/commands/logs.d.ts +1 -0
  29. package/dist/commands/logs.d.ts.map +1 -1
  30. package/dist/commands/logs.js +22 -0
  31. package/dist/commands/logs.js.map +1 -1
  32. package/dist/lib/admin-manager.d.ts +111 -0
  33. package/dist/lib/admin-manager.d.ts.map +1 -0
  34. package/dist/lib/admin-manager.js +413 -0
  35. package/dist/lib/admin-manager.js.map +1 -0
  36. package/dist/lib/admin-server.d.ts +148 -0
  37. package/dist/lib/admin-server.d.ts.map +1 -0
  38. package/dist/lib/admin-server.js +1161 -0
  39. package/dist/lib/admin-server.js.map +1 -0
  40. package/dist/lib/download-job-manager.d.ts +64 -0
  41. package/dist/lib/download-job-manager.d.ts.map +1 -0
  42. package/dist/lib/download-job-manager.js +164 -0
  43. package/dist/lib/download-job-manager.js.map +1 -0
  44. package/dist/tui/MultiServerMonitorApp.js +1 -1
  45. package/dist/types/admin-config.d.ts +19 -0
  46. package/dist/types/admin-config.d.ts.map +1 -0
  47. package/dist/types/admin-config.js +3 -0
  48. package/dist/types/admin-config.js.map +1 -0
  49. package/dist/utils/log-parser.d.ts +9 -0
  50. package/dist/utils/log-parser.d.ts.map +1 -1
  51. package/dist/utils/log-parser.js +11 -0
  52. package/dist/utils/log-parser.js.map +1 -1
  53. package/package.json +10 -2
  54. package/web/README.md +429 -0
  55. package/web/dist/assets/index-Bin89Lwr.css +1 -0
  56. package/web/dist/assets/index-CVmonw3T.js +17 -0
  57. package/web/dist/index.html +14 -0
  58. package/web/dist/vite.svg +1 -0
  59. package/.versionrc.json +0 -16
  60. package/CHANGELOG.md +0 -203
  61. package/MONITORING-ACCURACY-FIX.md +0 -199
  62. package/PER-PROCESS-METRICS.md +0 -190
  63. package/docs/images/.gitkeep +0 -1
  64. package/src/cli.ts +0 -423
  65. package/src/commands/config-global.ts +0 -38
  66. package/src/commands/config.ts +0 -323
  67. package/src/commands/create.ts +0 -183
  68. package/src/commands/delete.ts +0 -74
  69. package/src/commands/list.ts +0 -37
  70. package/src/commands/logs-all.ts +0 -251
  71. package/src/commands/logs.ts +0 -321
  72. package/src/commands/monitor.ts +0 -110
  73. package/src/commands/ps.ts +0 -84
  74. package/src/commands/pull.ts +0 -44
  75. package/src/commands/rm.ts +0 -107
  76. package/src/commands/router/config.ts +0 -116
  77. package/src/commands/router/logs.ts +0 -256
  78. package/src/commands/router/restart.ts +0 -36
  79. package/src/commands/router/start.ts +0 -60
  80. package/src/commands/router/status.ts +0 -119
  81. package/src/commands/router/stop.ts +0 -33
  82. package/src/commands/run.ts +0 -233
  83. package/src/commands/search.ts +0 -107
  84. package/src/commands/server-show.ts +0 -161
  85. package/src/commands/show.ts +0 -207
  86. package/src/commands/start.ts +0 -101
  87. package/src/commands/stop.ts +0 -39
  88. package/src/commands/tui.ts +0 -25
  89. package/src/lib/config-generator.ts +0 -130
  90. package/src/lib/history-manager.ts +0 -172
  91. package/src/lib/launchctl-manager.ts +0 -225
  92. package/src/lib/metrics-aggregator.ts +0 -257
  93. package/src/lib/model-downloader.ts +0 -328
  94. package/src/lib/model-scanner.ts +0 -157
  95. package/src/lib/model-search.ts +0 -114
  96. package/src/lib/models-dir-setup.ts +0 -46
  97. package/src/lib/port-manager.ts +0 -80
  98. package/src/lib/router-logger.ts +0 -201
  99. package/src/lib/router-manager.ts +0 -414
  100. package/src/lib/router-server.ts +0 -538
  101. package/src/lib/state-manager.ts +0 -206
  102. package/src/lib/status-checker.ts +0 -113
  103. package/src/lib/system-collector.ts +0 -315
  104. package/src/tui/ConfigApp.ts +0 -1085
  105. package/src/tui/HistoricalMonitorApp.ts +0 -587
  106. package/src/tui/ModelsApp.ts +0 -368
  107. package/src/tui/MonitorApp.ts +0 -386
  108. package/src/tui/MultiServerMonitorApp.ts +0 -1833
  109. package/src/tui/RootNavigator.ts +0 -74
  110. package/src/tui/SearchApp.ts +0 -511
  111. package/src/tui/SplashScreen.ts +0 -149
  112. package/src/types/global-config.ts +0 -26
  113. package/src/types/history-types.ts +0 -39
  114. package/src/types/model-info.ts +0 -8
  115. package/src/types/monitor-types.ts +0 -162
  116. package/src/types/router-config.ts +0 -25
  117. package/src/types/server-config.ts +0 -46
  118. package/src/utils/downsample-utils.ts +0 -128
  119. package/src/utils/file-utils.ts +0 -146
  120. package/src/utils/format-utils.ts +0 -98
  121. package/src/utils/log-parser.ts +0 -271
  122. package/src/utils/log-utils.ts +0 -178
  123. package/src/utils/process-utils.ts +0 -316
  124. package/src/utils/prompt-utils.ts +0 -47
  125. package/test-load.sh +0 -100
  126. package/tsconfig.json +0 -20
package/README.md CHANGED
@@ -13,6 +13,7 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
 
  - 🚀 **Easy server management** - Start, stop, and monitor llama.cpp servers
  - 🔀 **Unified router** - Single OpenAI-compatible endpoint for all models with automatic routing and request logging
+ - 🌐 **Admin Interface** - REST API + modern web UI for remote management and automation
  - 🤖 **Model downloads** - Pull GGUF models from Hugging Face
  - 📦 **Models Management TUI** - Browse, search, and delete models without leaving the TUI. Search HuggingFace, download with progress tracking, manage local models
  - ⚙️ **Smart defaults** - Auto-configure threads, context size, and GPU layers based on model size
@@ -47,6 +48,21 @@ Ollama is great, but it adds a wrapper layer that introduces latency. llamacpp-c
 
  If you need raw speed and full control, llamacpp-cli is the better choice.
 
+ ### Management Options
+
+ llamacpp-cli offers three ways to manage your servers:
+
+ | Interface | Best For | Access | Key Features |
+ |-----------|----------|--------|--------------|
+ | **CLI** | Local development, automation scripts | Terminal | Full control, shell scripting, fastest for local tasks |
+ | **Router** | Single endpoint for all models | Any OpenAI client | Model-based routing, streaming, zero config |
+ | **Admin** | Remote management, team access | REST API + Web browser | Full CRUD, web UI, API automation, remote control |
+
+ **When to use each:**
+ - **CLI** - Local development, scripting, full terminal control
+ - **Router** - Using with LLM frameworks (LangChain, LlamaIndex), multi-model apps
+ - **Admin** - Remote access, team collaboration, browser-based management, CI/CD pipelines
+
  ## Installation
 
  ```bash
@@ -100,6 +116,10 @@ llamacpp server start llama-3.2-3b
 
  # View logs
  llamacpp server logs llama-3.2-3b -f
+
+ # Start admin interface (REST API + Web UI)
+ llamacpp admin start
+ # Access web UI at http://localhost:9200
  ```
 
  ## Using Your Server
@@ -140,6 +160,221 @@ curl http://localhost:9000/health
 
  The server is fully compatible with OpenAI's API format, so you can use it with any OpenAI-compatible client library.
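+
+ For example, a minimal TypeScript sketch using the official `openai` npm client against a local server (the port and model name here are placeholders, not values the CLI guarantees):
+
+ ```ts
+ import OpenAI from "openai";
+
+ // Point the client at the llama.cpp server instead of api.openai.com.
+ // llama.cpp ignores the API key, but the client library requires one.
+ const client = new OpenAI({
+   baseURL: "http://localhost:9000/v1",
+   apiKey: "not-needed",
+ });
+
+ const response = await client.chat.completions.create({
+   model: "llama-3.2-3b", // placeholder model id
+   messages: [{ role: "user", content: "Hello!" }],
+ });
+
+ console.log(response.choices[0].message.content);
+ ```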
 
+ ## Interactive TUI
+
+ The primary way to manage and monitor your llama.cpp servers is through the interactive TUI dashboard. Launch it by running `llamacpp` with no arguments.
+
+ ```bash
+ llamacpp
+ ```
+
+ ![Server Monitoring TUI](https://raw.githubusercontent.com/appkitstudio/llamacpp-cli/main/docs/images/monitor-detail.png)
+
+ ### Overview
+
+ The TUI provides a comprehensive interface for:
+ - **Monitoring** - Real-time metrics for all servers (GPU, CPU, memory, token generation)
+ - **Server Management** - Create, start, stop, remove, and configure servers
+ - **Model Management** - Browse, search, download, and delete models
+ - **Historical Metrics** - View time-series charts of past performance
+
+ ### Multi-Server Dashboard
+
+ The main view shows all your servers at a glance:
+
+ ```
+ ┌─────────────────────────────────────────────────────────┐
+ │ System Resources │
+ │ GPU: [████░░░] 65% CPU: [███░░░] 38% Memory: 58% │
+ ├─────────────────────────────────────────────────────────┤
+ │ Servers (3 running, 0 stopped) │
+ │ │ Server ID │ Port │ Status │ Slots │ tok/s │
+ │───┼────────────────┼──────┼────────┼───────┼──────────┤
+ │ ► │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │ (highlighted)
+ │ │ qwen2-7b │ 9001 │ ● RUN │ 1/4 │ 198 │
+ │ │ llama-3-1-8b │ 9002 │ ○ IDLE │ 0/4 │ - │
+ └─────────────────────────────────────────────────────────┘
+ ↑/↓ Navigate | Enter for details | [N]ew [M]odels [H]istory [Q]uit
+ ```
+
+ **Features:**
+ - System resource overview (GPU, CPU, memory)
+ - List of all servers (running and stopped)
+ - Real-time status updates every 2 seconds
+ - Color-coded status indicators
+ - Navigate with arrow keys or vim keys (k/j)
+
+ ### Single-Server Detail View
+
+ Press `Enter` on any server to see detailed information:
+
+ **Running servers show:**
+ - Server information (status, uptime, model name, endpoint)
+ - Request metrics (active/idle slots, prompt speed, generation speed)
+ - Active slots detail (per-slot token generation rates)
+ - System resources (GPU/CPU/ANE utilization, memory usage)
+
+ **Stopped servers show:**
+ - Server configuration (threads, context, GPU layers)
+ - Last activity timestamps
+ - Quick action commands (start, config, logs)
+
+ ### Models Management
+
+ Press `M` from the main view to access Models Management.
+
+ **Features:**
+ - Browse all installed models with size and modified date
+ - View which servers are using each model
+ - Delete models with cascade option (removes associated servers)
+ - Search HuggingFace for new models
+ - Download models with real-time progress tracking
+
+ **Models View:**
+ - View all GGUF files in scrollable table
+ - Color-coded server usage (green = safe to delete, yellow = in use)
+ - Delete selected model with `Enter` or `D` key
+ - Confirmation dialog with cascade warning
+
+ **Search View** (press `S` from Models view):
+ - Search HuggingFace models by text input
+ - Browse results with downloads, likes, and file counts
+ - Expand model to show available GGUF files
+ - Download with real-time progress, speed, and ETA
+ - Cancel download with `ESC` (cleans up partial files)
+
+ ### Server Operations
+
+ **Create Server** (press `N` from main view):
+ 1. Select model from list (shows existing servers per model)
+ 2. Edit configuration (threads, context size, GPU layers, port)
+ 3. Review smart defaults based on model size
+ 4. Create and automatically start server
+ 5. Return to main view with new server visible
+
+ **Start/Stop Server** (press `S` from detail view):
+ - Toggle server state with progress modal
+ - Stays in detail view after operation
+ - Shows updated status immediately
+
+ **Remove Server** (press `R` from detail view):
+ - Confirmation dialog with option to delete model file
+ - Warns if other servers use the same model
+ - Cascade deletion removes all associated data
+ - Returns to main view after deletion
+
+ **Configure Server** (press `C` from detail view):
+ - Edit all server parameters inline
+ - Modal dialogs for different field types
+ - Model migration support (handles server ID changes)
+ - Automatic restart prompts for running servers
+ - Port conflict detection and validation
+
+ ### Historical Monitoring
+
+ Press `H` from any view to see historical time-series charts.
+
+ **Single-Server Historical View:**
+ - Token generation speed over time
+ - GPU usage (%) with avg/max/min stats
+ - CPU usage (%) with avg/max/min
+ - Memory usage (%) with avg/max/min
+ - Auto-refresh every 3 seconds
+
+ **Multi-Server Historical View:**
+ - Aggregated metrics across all servers
+ - Total token generation speed (sum)
+ - System GPU usage (average)
+ - Total CPU usage (sum of per-process)
+ - Total memory usage (sum in GB)
+
+ **View Modes** (toggle with `H` key):
+
+ - **Recent View (default):**
+   - Shows last 40-80 samples (~1-3 minutes)
+   - Raw data with no downsampling - perfect accuracy
+   - Best for: "What's happening right now?"
+
+ - **Hour View:**
+   - Shows all ~1,800 samples from the last hour
+   - Absolute time-aligned downsampling (30:1 ratio)
+   - Bucket max for GPU/CPU/token speed (preserves peaks)
+   - Bucket mean for memory (shows average)
+   - Chart stays perfectly stable as data streams in
+   - Best for: "What happened over the last hour?"
+
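+ To make the bucketing concrete, here is a minimal TypeScript sketch of the absolute time-aligned downsampling idea described above (the `Sample` shape and constants are illustrative assumptions, not the package's actual code):
+
+ ```ts
+ interface Sample {
+   timestamp: number; // Unix epoch ms
+   value: number;
+ }
+
+ // 60s buckets give the 30:1 ratio at a 2s sample interval.
+ const BUCKET_MS = 60_000;
+
+ function downsample(samples: Sample[], reduce: "max" | "mean"): Sample[] {
+   const buckets = new Map<number, number[]>();
+   for (const s of samples) {
+     // Align to absolute minute boundaries (independent of "now"),
+     // so a new sample only affects its own bucket.
+     const key = Math.floor(s.timestamp / BUCKET_MS) * BUCKET_MS;
+     const values = buckets.get(key) ?? [];
+     values.push(s.value);
+     buckets.set(key, values);
+   }
+   return [...buckets.entries()]
+     .sort(([a], [b]) => a - b)
+     .map(([timestamp, values]) => ({
+       timestamp,
+       value:
+         reduce === "max"
+           ? Math.max(...values) // GPU/CPU/token speed: preserve peaks
+           : values.reduce((a, b) => a + b, 0) / values.length, // memory: average
+     }));
+ }
+ ```
+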
+ **Data Collection:**
+ - Automatic during monitoring (piggyback on polling loop)
+ - Stored in `~/.llamacpp/history/<server-id>.json` per server
+ - Retention: Last 24 hours (circular buffer, auto-prune)
+ - File size: ~21 MB per server for 24h @ 2s interval
+
+ ### Keyboard Shortcuts
+
+ **List View (Multi-Server):**
+ - `↑/↓` or `k/j` - Navigate server list
+ - `Enter` - View details for selected server
+ - `N` - Create new server
+ - `M` - Switch to Models Management
+ - `H` - View historical metrics (all servers)
+ - `ESC` - Exit TUI
+ - `Q` - Quit immediately
+
+ **Detail View (Single-Server):**
+ - `S` - Start/Stop server (toggles based on status)
+ - `C` - Open configuration screen
+ - `R` - Remove server (with confirmation)
+ - `H` - View historical metrics (this server)
+ - `ESC` - Back to list view
+ - `Q` - Quit immediately
+
+ **Models View:**
+ - `↑/↓` or `k/j` - Navigate model list
+ - `Enter` or `D` - Delete selected model
+ - `S` - Open search view
+ - `R` - Refresh model list
+ - `ESC` - Back to main view
+ - `Q` - Quit immediately
+
+ **Search View:**
+ - `/` or `I` - Focus search input
+ - `Enter` (in input) - Execute search
+ - `↑/↓` or `k/j` - Navigate results or files
+ - `Enter` (on result) - Show GGUF files for model
+ - `Enter` (on file) - Download/install model
+ - `R` - Refresh results (re-execute search)
+ - `ESC` - Back to models view (or results list if viewing files)
+ - `Q` - Quit immediately
+
+ **Historical View:**
+ - `H` - Toggle between Recent/Hour view
+ - `ESC` - Return to live monitoring
+ - `Q` - Quit immediately
+
+ **Configuration Screen:**
+ - `↑/↓` or `k/j` - Navigate fields
+ - `Enter` - Open modal for selected field
+ - `S` - Save changes (prompts for restart if running)
+ - `ESC` - Cancel (prompts if unsaved changes)
+ - `Q` - Quit immediately
+
+ ### Optional: GPU/CPU Metrics
+
+ For GPU and CPU utilization metrics, install macmon:
+ ```bash
+ brew install vladkens/tap/macmon
+ ```
+
+ Without macmon, the TUI still shows:
+ - ✅ Server status and uptime
+ - ✅ Active slots and token generation speeds
+ - ✅ Memory usage (via built-in vm_stat)
+ - ❌ GPU/CPU/ANE utilization (requires macmon)
+
+ ### Deprecated: `llamacpp server monitor`
+
+ The `llamacpp server monitor` command is deprecated. Use `llamacpp` instead to launch the TUI dashboard.
+
  ## Router (Unified Endpoint)
 
  The router provides a single OpenAI-compatible endpoint that automatically routes requests to the correct backend server based on the model name. This is perfect for LLM clients that don't support multiple endpoints.
@@ -300,8 +535,275 @@ llamacpp router logs --stderr
 
  If the requested model's server is not running, the router returns a 503 error with a helpful message.
 
+ ## Admin Interface (REST API + Web UI)
+
+ The admin interface provides full remote management of llama.cpp servers through both a REST API and a modern web UI. Perfect for programmatic control, automation, and browser-based management.
+
+ ### Quick Start
+
+ ```bash
+ # Start the admin service (generates API key automatically)
+ llamacpp admin start
+
+ # View status and API key
+ llamacpp admin status
+
+ # Access web UI
+ open http://localhost:9200
+ ```
+
+ ### Commands
+
+ ```bash
+ llamacpp admin start     # Start admin service
+ llamacpp admin stop      # Stop admin service
+ llamacpp admin status    # Show status and API key
+ llamacpp admin restart   # Restart service
+ llamacpp admin config    # Update settings (--port, --host, --regenerate-key, --verbose)
+ llamacpp admin logs      # View admin logs (with --follow, --clear, --rotate options)
+ ```
+
+ ### REST API
+
+ The Admin API provides full CRUD operations for servers and models via HTTP.
+
+ **Base URL:** `http://localhost:9200`
+
+ **Authentication:** Bearer token (API key auto-generated on first start)
+
+ #### Server Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | GET | `/api/servers` | List all servers with status |
+ | GET | `/api/servers/:id` | Get server details |
+ | POST | `/api/servers` | Create new server |
+ | PATCH | `/api/servers/:id` | Update server config |
+ | DELETE | `/api/servers/:id` | Remove server |
+ | POST | `/api/servers/:id/start` | Start stopped server |
+ | POST | `/api/servers/:id/stop` | Stop running server |
+ | POST | `/api/servers/:id/restart` | Restart server |
+ | GET | `/api/servers/:id/logs?type=stdout\|stderr&lines=100` | Get server logs |
+
+ #### Model Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | GET | `/api/models` | List available models |
+ | GET | `/api/models/:name` | Get model details |
+ | DELETE | `/api/models/:name?cascade=true` | Delete model (cascade removes servers) |
+ | GET | `/api/models/search?q=query` | Search HuggingFace |
+ | POST | `/api/models/download` | Download model from HF |
+
+ #### System Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | GET | `/health` | Health check (no auth) |
+ | GET | `/api/status` | System status |
+
+ #### Example Usage
+
+ **Create a server:**
+ ```bash
+ curl -X POST http://localhost:9200/api/servers \
+   -H "Authorization: Bearer YOUR_API_KEY" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "llama-3.2-3b-instruct-q4_k_m.gguf",
+     "port": 9001,
+     "threads": 8,
+     "ctxSize": 8192
+   }'
+ ```
+
+ **Start a server:**
+ ```bash
+ curl -X POST http://localhost:9200/api/servers/llama-3-2-3b/start \
+   -H "Authorization: Bearer YOUR_API_KEY"
+ ```
+
+ **List all servers:**
+ ```bash
+ curl http://localhost:9200/api/servers \
+   -H "Authorization: Bearer YOUR_API_KEY"
+ ```
+
+ **Delete model with cascade:**
+ ```bash
+ curl -X DELETE "http://localhost:9200/api/models/llama-3.2-3b-instruct-q4_k_m.gguf?cascade=true" \
+   -H "Authorization: Bearer YOUR_API_KEY"
+ ```
+
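+ The same endpoints work from any HTTP client. A small TypeScript sketch (the response field names here are assumptions; check the actual API responses):
+
+ ```ts
+ const ADMIN_URL = "http://localhost:9200";
+ const API_KEY = process.env.LLAMACPP_ADMIN_KEY!; // copy from `llamacpp admin status`
+
+ // Thin wrapper that adds the Bearer token and JSON headers.
+ async function admin<T>(path: string, init: RequestInit = {}): Promise<T> {
+   const res = await fetch(`${ADMIN_URL}${path}`, {
+     ...init,
+     headers: {
+       Authorization: `Bearer ${API_KEY}`,
+       "Content-Type": "application/json",
+     },
+   });
+   if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
+   return (await res.json()) as T;
+ }
+
+ // List servers and start any that are stopped (field names assumed).
+ const servers = await admin<{ id: string; status: string }[]>("/api/servers");
+ for (const s of servers) {
+   if (s.status !== "running") {
+     await admin(`/api/servers/${s.id}/start`, { method: "POST" });
+   }
+ }
+ ```
+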
+ ### Web UI
+
+ The web UI provides a modern, browser-based interface for managing servers and models.
+
+ ![Web UI - Servers Page](https://raw.githubusercontent.com/appkitstudio/llamacpp-cli/main/docs/images/web-ui-servers.png)
+
+ **Access:** `http://localhost:9200` (same port as API)
+
+ **Features:**
+ - **Dashboard** - System overview with stats and running servers
+ - **Servers Page** - Full CRUD operations (create, start, stop, restart, delete)
+ - **Models Page** - Browse models, view usage, delete with cascade
+ - **Real-time updates** - Auto-refresh every 5 seconds
+ - **Dark theme** - Modern, clean interface
+
+ **Pages:**
+
+ | Page | Path | Description |
+ |------|------|-------------|
+ | Dashboard | `/dashboard` | System overview and quick stats |
+ | Servers | `/servers` | Manage all servers (list, start/stop, configure) |
+ | Models | `/models` | Browse models, view server usage, delete |
+
+ **Building Web UI:**
+
+ The web UI is built with React + Vite + TypeScript. To build:
+
+ ```bash
+ cd web
+ npm install
+ npm run build
+ ```
+
+ This generates static files in `web/dist/` which are automatically served by the admin service.
+
+ **Development:**
+
+ ```bash
+ cd web
+ npm install
+ npm run dev # Starts dev server on localhost:5173 with API proxy
+ ```
+
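+ The dev proxy lives in `web/vite.config.ts`; a representative sketch (assumed, not the file's verbatim contents):
+
+ ```ts
+ import { defineConfig } from "vite";
+ import react from "@vitejs/plugin-react";
+
+ export default defineConfig({
+   plugins: [react()],
+   server: {
+     // Forward API and health-check calls from the dev server (port 5173)
+     // to the admin service on port 9200.
+     proxy: {
+       "/api": "http://localhost:9200",
+       "/health": "http://localhost:9200",
+     },
+   },
+ });
+ ```
+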
+ See `web/README.md` for detailed web development documentation.
+
+ ### Configuration
+
+ Configure the admin service:
+
+ ```bash
+ # Change port
+ llamacpp admin config --port 9300 --restart
+
+ # Enable remote access (WARNING: security implications)
+ llamacpp admin config --host 0.0.0.0 --restart
+
+ # Regenerate API key (invalidates old key)
+ llamacpp admin config --regenerate-key --restart
+
+ # Enable verbose logging
+ llamacpp admin config --verbose true --restart
+ ```
+
+ **Note:** Changes require a restart to take effect. Use the `--restart` flag to apply them immediately.
+
+ ### Security
+
+ **Default Security Posture:**
+ - **Host:** `127.0.0.1` (localhost only - secure by default)
+ - **API Key:** Auto-generated 32-character hex string
+ - **Storage:** API key stored in `~/.llamacpp/admin.json` (file permissions 600)
+
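+ The stored file is small; its shape is roughly the following TypeScript interface (field names are illustrative assumptions, not the package's actual type):
+
+ ```ts
+ // Assumed shape of ~/.llamacpp/admin.json
+ interface AdminConfig {
+   port: number;    // default 9200
+   host: string;    // default "127.0.0.1"
+   apiKey: string;  // 32 hex chars, e.g. crypto.randomBytes(16).toString("hex")
+   verbose: boolean;
+ }
+ ```
+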
+ **Remote Access:**
+
+ ⚠️ **Warning:** Changing the host to `0.0.0.0` allows remote access from your network and potentially the internet.
+
+ If you need remote access:
+
+ ```bash
+ # Enable remote access
+ llamacpp admin config --host 0.0.0.0 --restart
+
+ # Ensure you use a strong, freshly generated API key
+ llamacpp admin config --regenerate-key --restart
+ ```
+
+ **Best Practices:**
+ - Keep the default `127.0.0.1` for local development
+ - Use an HTTPS reverse proxy (nginx/Caddy) for remote access
+ - Rotate API keys regularly if exposed
+ - Monitor admin logs for suspicious activity
+
+ ### Logging
+
+ The admin service maintains separate log streams:
+
+ | Log File | Purpose | Content |
+ |----------|---------|---------|
+ | `admin.stdout` | Request activity | Endpoint, status, duration |
+ | `admin.stderr` | System messages | Startup, shutdown, errors |
+
+ **View logs:**
+ ```bash
+ # Show activity logs (default - stdout)
+ llamacpp admin logs
+
+ # Show system logs (errors, startup)
+ llamacpp admin logs --stderr
+
+ # Follow in real-time
+ llamacpp admin logs --follow
+
+ # Clear all logs
+ llamacpp admin logs --clear
+
+ # Rotate logs with timestamp
+ llamacpp admin logs --rotate
+ ```
+
  ### Example Output
 
+ **Starting the admin service:**
+ ```
+ $ llamacpp admin start
+
+ ✓ Admin service started successfully!
+
+ Port: 9200
+ Host: 127.0.0.1
+ API Key: a1b2c3d4e5f67890a1b2c3d4e5f67890
+
+ API: http://localhost:9200/api
+ Web UI: http://localhost:9200
+ Health: http://localhost:9200/health
+
+ Quick Commands:
+   llamacpp admin status        # View status
+   llamacpp admin logs -f       # Follow logs
+   llamacpp admin config --help # Configure options
+ ```
+
+ **Admin status:**
+ ```
+ $ llamacpp admin status
+
+ Admin Service Status
+ ────────────────────
+
+ Status: ✅ RUNNING
+ PID: 98765
+ Uptime: 2h 15m
+ Port: 9200
+ Host: 127.0.0.1
+
+ API Key: a1b2c3d4e5f67890a1b2c3d4e5f67890
+ API: http://localhost:9200/api
+ Web UI: http://localhost:9200
+
+ Configuration:
+   Config: ~/.llamacpp/admin.json
+   Plist: ~/Library/LaunchAgents/com.llama.admin.plist
+   Logs: ~/.llamacpp/logs/admin.{stdout,stderr}
+
+ Quick Commands:
+   llamacpp admin stop     # Stop service
+   llamacpp admin restart  # Restart service
+   llamacpp admin logs -f  # Follow logs
+ ```
+
  Creating a server:
  ```
  $ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf
@@ -356,7 +858,7 @@ Launch the interactive TUI dashboard for monitoring and managing servers.
  llamacpp
  ```
 
- See [Interactive TUI Dashboard](#interactive-tui-dashboard) for full details.
+ See [Interactive TUI](#interactive-tui) for full details.
 
  ### `llamacpp ls`
  List all GGUF models in ~/models directory.
@@ -476,47 +978,6 @@ llamacpp logs --rotate
 
  **Use case:** Quickly see which servers are accumulating large logs, or clean up all logs at once.
 
- ## Models Management TUI
-
- The Models Management TUI is accessible by pressing `M` from the `llamacpp` list view. It provides a full-featured interface for managing local models and searching/downloading new ones.
-
- **Features:**
- - **Browse local models** - View all GGUF files with size, modification date, and server usage
- - **Delete models** - Remove models with automatic cleanup of associated servers
- - **Search HuggingFace** - Find and browse models from Hugging Face repository
- - **Download with progress** - Real-time progress tracking for model downloads
- - **Seamless navigation** - Switch between monitoring and models management
-
- **Quick Access:**
- ```bash
- # Launch TUI and press 'M' to open Models Management
- llamacpp
- ```
-
- **Models View:**
- - View all installed models in scrollable table
- - See which servers are using each model
- - Color-coded status (green = safe to delete, yellow/gray = servers using)
- - Delete models with Enter or D key
- - Cascade deletion: automatically removes associated servers
-
- **Search View (press 'S' from Models view):**
- - Search HuggingFace models by name
- - Browse search results with download counts and likes
- - Expand models to show available GGUF files
- - Download files with real-time progress tracking
- - Cancel downloads with ESC (cleans up partial files)
-
- **Keyboard Controls:**
- - **M** - Switch to Models view (from TUI list view)
- - **↑/↓** or **k/j** - Navigate lists
- - **Enter** - Select/download/delete
- - **S** - Open search view (from models view)
- - **/** or **I** - Focus search input (in search view)
- - **R** - Refresh view
- - **ESC** - Back/cancel
- - **Q** - Quit
-
  ## Server Management
 
  ### `llamacpp server create <model> [options]`
@@ -733,131 +1194,6 @@ The compact format shows one line per HTTP request and includes:
 
  Use `--http` to see full request/response JSON, or `--verbose` option to see all internal server logs.
 
- ## Interactive TUI Dashboard
-
- The main way to monitor and manage servers is through the interactive TUI dashboard, launched by running `llamacpp` with no arguments.
-
- ```bash
- llamacpp
- ```
-
- ![Server Monitoring TUI](https://raw.githubusercontent.com/dweaver/llamacpp-cli/main/docs/images/monitor-detail.png)
-
- **Features:**
- - Multi-server dashboard with real-time metrics
- - Drill-down to single-server detail view
- - Create, start, stop, and remove servers without leaving the TUI
- - Edit server configuration inline
- - Access Models Management (press `M`)
- - Historical metrics with time-series charts
-
- **Multi-Server Dashboard:**
- ```
- ┌─────────────────────────────────────────────────────────┐
- │ System Resources │
- │ GPU: [████░░░] 65% CPU: [███░░░] 38% Memory: 58% │
- ├─────────────────────────────────────────────────────────┤
- │ Servers (3 running, 0 stopped) │
- │ │ Server ID │ Port │ Status │ Slots │ tok/s │
- │───┼────────────────┼──────┼────────┼───────┼──────────┤
- │ ► │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │ (highlighted)
- │ │ qwen2-7b │ 9001 │ ● RUN │ 1/4 │ 198 │
- │ │ llama-3-1-8b │ 9002 │ ○ IDLE │ 0/4 │ - │
- └─────────────────────────────────────────────────────────┘
- ↑/↓ Navigate | Enter for details | [H]istory [R]efresh [Q] Quit
- ```
-
- **Single-Server View:**
- - **Server Information** - Status, uptime, model name, endpoint, slot counts
- - **Request Metrics** - Active/idle slots, prompt speed, generation speed
- - **Active Slots** - Per-slot token generation rates and progress
- - **System Resources** - GPU/CPU/ANE utilization, memory usage, temperature
-
- **Keyboard Shortcuts:**
- - **List View (Multi-Server):**
- - `↑/↓` or `k/j` - Navigate server list
- - `Enter` - View details for selected server
- - `N` - Create new server
- - `M` - Switch to Models Management
- - `H` - View historical metrics (all servers)
- - `ESC` - Exit TUI
- - `Q` - Quit immediately
- - **Detail View (Single-Server):**
- - `S` - Start/Stop server (toggles based on status)
- - `C` - Open configuration screen
- - `R` - Remove server (with confirmation)
- - `H` - View historical metrics (this server)
- - `ESC` - Back to list view
- - `Q` - Quit immediately
- - **Historical View:**
- - `H` - Toggle Hour View (Recent ↔ Hour)
- - `ESC` - Back to live monitoring
- - `Q` - Quit
-
- **Historical Monitoring:**
-
- Press `H` from any live monitoring view to see historical time-series charts. The historical view shows:
-
- - **Token generation speed** over time with statistics (avg, max, stddev)
- - **GPU usage** over time with min/max/avg
- - **CPU usage** over time with min/max/avg
- - **Memory usage** over time with min/max/avg
-
- **View Modes (Toggle with `H` key):**
-
- - **Recent View (default):**
- - Shows last 40-80 samples (~1-3 minutes)
- - Raw data with no downsampling - perfect accuracy
- - Best for: "What's happening right now?"
-
- - **Hour View:**
- - Shows all ~1,800 samples from last hour
- - **Absolute time-aligned downsampling** (30:1 ratio) - chart stays perfectly stable
- - Bucket boundaries never shift (aligned to round minutes)
- - New samples only affect their own bucket, not the entire chart
- - **Bucket max** for GPU/CPU/token speed (preserves peaks)
- - **Bucket mean** for memory (shows average)
- - Chart labels indicate "Peak per bucket" or "Average per bucket"
- - Best for: "What happened over the last hour?"
-
- **Note:** The `H` key has two functions:
- - From **live monitoring** → Enter historical view (Recent mode)
- - Within **historical view** → Toggle between Recent and Hour views
-
- **Data Collection:**
-
- Historical data is automatically collected whenever you run the TUI (`llamacpp`). Data is retained for 24 hours in `~/.llamacpp/history/<server-id>.json` files, then automatically pruned.
-
- **Multi-Server Historical View:**
-
- From the multi-server dashboard, press `H` to see a summary table comparing average metrics across all servers for the last hour.
-
- **Features:**
- - **Multi-server dashboard** - Monitor all servers at once
- - **Real-time updates** - Metrics refresh every 2 seconds (adjustable)
- - **Historical monitoring** - View time-series charts of past metrics (press `H` from monitor view)
- - **Token-per-second calculation** - Shows actual generation speed per slot
- - **Progress bars** - Visual representation of GPU/CPU/memory usage
- - **Error recovery** - Shows stale data with warnings if connection lost
- - **Graceful degradation** - Works without GPU metrics (uses memory-only mode)
-
- **Optional: GPU/CPU Metrics**
-
- For GPU and CPU utilization metrics, install macmon:
- ```bash
- brew install vladkens/tap/macmon
- ```
-
- Without macmon, the TUI still shows:
- - ✅ Server status and uptime
- - ✅ Active slots and token generation speeds
- - ✅ Memory usage (via built-in vm_stat)
- - ❌ GPU/CPU/ANE utilization (requires macmon)
-
- ### Deprecated: `llamacpp server monitor`
-
- The `llamacpp server monitor` command is deprecated. Use `llamacpp` instead to launch the TUI dashboard.
-
  ## Configuration
 
  llamacpp-cli stores its configuration in `~/.llamacpp/`:
@@ -865,11 +1201,17 @@ llamacpp-cli stores its configuration in `~/.llamacpp/`:
  ```
  ~/.llamacpp/
  ├── config.json # Global settings
+ ├── router.json # Router configuration
+ ├── admin.json # Admin service configuration (includes API key)
  ├── servers/ # Server configurations
  │   └── <server-id>.json
- └── logs/ # Server logs
- ├── <server-id>.stdout
- └── <server-id>.stderr
+ ├── logs/ # Server logs
+ │   ├── <server-id>.stdout
+ │   ├── <server-id>.stderr
+ │   ├── router.{stdout,stderr,log}
+ │   └── admin.{stdout,stderr}
+ └── history/ # Historical metrics (TUI)
+     └── <server-id>.json
  ```
 
  ## Smart Defaults
@@ -901,6 +1243,11 @@ Services are named `com.llama.<model-id>`.
  - When you **stop** a server, it's unloaded from launchd and stays stopped (no auto-restart)
  - Crashed servers will automatically restart (when loaded)
 
+ **Router and Admin Services:**
+ - The **Router** (`com.llama.router`) provides a unified OpenAI-compatible endpoint for all models
+ - The **Admin** (`com.llama.admin`) provides a REST API + web UI for remote management
+ - Both run as launchctl services, just like individual model servers
+
  ## Known Limitations
 
  - **macOS only** - Relies on launchctl for service management (Linux/Windows support planned)
@@ -935,8 +1282,34 @@ Check the logs for errors:
  llamacpp server logs <identifier> --errors
  ```
 
+ ### Admin web UI not loading
+ Check that the static files are built:
+ ```bash
+ cd web
+ npm install
+ npm run build
+ ```
+
+ Then restart the admin service:
+ ```bash
+ llamacpp admin restart
+ ```
+
+ ### API authentication failing
+ Get your current API key:
+ ```bash
+ llamacpp admin status # Shows API key
+ ```
+
+ Or generate a new one:
+ ```bash
+ llamacpp admin config --regenerate-key --restart
+ ```
+
  ## Development
 
+ ### CLI Development
+
  ```bash
  # Install dependencies
  npm install
@@ -953,6 +1326,27 @@ npm run build
  npm run clean
  ```
 
+ ### Web UI Development
+
+ ```bash
+ # Navigate to web directory
+ cd web
+
+ # Install dependencies
+ npm install
+
+ # Run dev server (with API proxy to localhost:9200)
+ npm run dev
+
+ # Build for production
+ npm run build
+
+ # Clean build artifacts
+ rm -rf dist
+ ```
+
+ The web UI dev server runs on `http://localhost:5173` with automatic API proxying to the admin service. See `web/README.md` for detailed documentation.
+
  ### Releasing
 
  This project uses [commit-and-tag-version](https://github.com/absolute-version/commit-and-tag-version) for automated releases based on conventional commits.
@@ -1024,12 +1418,20 @@ Contributions are welcome! If you'd like to contribute:
 
  ### Development Tips
 
+ **CLI Development:**
  - Use `npm run dev -- <command>` to test commands without building
  - Check logs with `llamacpp server logs <server> --errors` when debugging
  - Test launchctl integration with `launchctl list | grep com.llama`
  - All server configs are in `~/.llamacpp/servers/`
  - Test interactive chat with `npm run dev -- server run <model>`
 
+ **Web UI Development:**
+ - Navigate to `web/` directory and run `npm run dev` for hot reload
+ - API proxy automatically configured for `localhost:9200`
+ - Update types in `web/src/types/api.ts` when API changes
+ - Build with `npm run build` and test with admin service
+ - See `web/README.md` for detailed web development guide
+
  ## Acknowledgments
 
  Built on top of the excellent [llama.cpp](https://github.com/ggerganov/llama.cpp) project by Georgi Gerganov and contributors.