@appkit/llamacpp-cli 1.14.1 → 2.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +276 -280
- package/dist/cli.js +133 -23
- package/dist/cli.js.map +1 -1
- package/dist/commands/admin/config.d.ts +1 -1
- package/dist/commands/admin/config.js +5 -5
- package/dist/commands/admin/config.js.map +1 -1
- package/dist/commands/admin/log-config.d.ts +11 -0
- package/dist/commands/admin/log-config.d.ts.map +1 -0
- package/dist/commands/admin/log-config.js +159 -0
- package/dist/commands/admin/log-config.js.map +1 -0
- package/dist/commands/admin/logs.d.ts +2 -3
- package/dist/commands/admin/logs.d.ts.map +1 -1
- package/dist/commands/admin/logs.js +6 -48
- package/dist/commands/admin/logs.js.map +1 -1
- package/dist/commands/admin/status.d.ts.map +1 -1
- package/dist/commands/admin/status.js +1 -0
- package/dist/commands/admin/status.js.map +1 -1
- package/dist/commands/config.d.ts +1 -0
- package/dist/commands/config.d.ts.map +1 -1
- package/dist/commands/config.js +63 -196
- package/dist/commands/config.js.map +1 -1
- package/dist/commands/create.d.ts +3 -2
- package/dist/commands/create.d.ts.map +1 -1
- package/dist/commands/create.js +24 -97
- package/dist/commands/create.js.map +1 -1
- package/dist/commands/delete.d.ts.map +1 -1
- package/dist/commands/delete.js +7 -24
- package/dist/commands/delete.js.map +1 -1
- package/dist/commands/internal/server-wrapper.d.ts +15 -0
- package/dist/commands/internal/server-wrapper.d.ts.map +1 -0
- package/dist/commands/internal/server-wrapper.js +126 -0
- package/dist/commands/internal/server-wrapper.js.map +1 -0
- package/dist/commands/logs-all.d.ts +0 -2
- package/dist/commands/logs-all.d.ts.map +1 -1
- package/dist/commands/logs-all.js +1 -61
- package/dist/commands/logs-all.js.map +1 -1
- package/dist/commands/logs.d.ts +2 -5
- package/dist/commands/logs.d.ts.map +1 -1
- package/dist/commands/logs.js +104 -120
- package/dist/commands/logs.js.map +1 -1
- package/dist/commands/migrate-labels.d.ts +12 -0
- package/dist/commands/migrate-labels.d.ts.map +1 -0
- package/dist/commands/migrate-labels.js +160 -0
- package/dist/commands/migrate-labels.js.map +1 -0
- package/dist/commands/ps.d.ts.map +1 -1
- package/dist/commands/ps.js +2 -1
- package/dist/commands/ps.js.map +1 -1
- package/dist/commands/rm.d.ts.map +1 -1
- package/dist/commands/rm.js +22 -48
- package/dist/commands/rm.js.map +1 -1
- package/dist/commands/router/config.d.ts +1 -1
- package/dist/commands/router/config.js +6 -6
- package/dist/commands/router/config.js.map +1 -1
- package/dist/commands/router/logs.d.ts +2 -4
- package/dist/commands/router/logs.d.ts.map +1 -1
- package/dist/commands/router/logs.js +34 -189
- package/dist/commands/router/logs.js.map +1 -1
- package/dist/commands/router/status.d.ts.map +1 -1
- package/dist/commands/router/status.js +1 -0
- package/dist/commands/router/status.js.map +1 -1
- package/dist/commands/server-show.d.ts.map +1 -1
- package/dist/commands/server-show.js +3 -0
- package/dist/commands/server-show.js.map +1 -1
- package/dist/commands/start.d.ts.map +1 -1
- package/dist/commands/start.js +21 -72
- package/dist/commands/start.js.map +1 -1
- package/dist/commands/stop.d.ts.map +1 -1
- package/dist/commands/stop.js +10 -26
- package/dist/commands/stop.js.map +1 -1
- package/dist/launchers/llamacpp-admin +8 -0
- package/dist/launchers/llamacpp-router +8 -0
- package/dist/launchers/llamacpp-server +8 -0
- package/dist/lib/admin-manager.d.ts +4 -0
- package/dist/lib/admin-manager.d.ts.map +1 -1
- package/dist/lib/admin-manager.js +42 -18
- package/dist/lib/admin-manager.js.map +1 -1
- package/dist/lib/admin-server.d.ts +48 -1
- package/dist/lib/admin-server.d.ts.map +1 -1
- package/dist/lib/admin-server.js +632 -238
- package/dist/lib/admin-server.js.map +1 -1
- package/dist/lib/config-generator.d.ts +1 -0
- package/dist/lib/config-generator.d.ts.map +1 -1
- package/dist/lib/config-generator.js +12 -5
- package/dist/lib/config-generator.js.map +1 -1
- package/dist/lib/keyboard-manager.d.ts +162 -0
- package/dist/lib/keyboard-manager.d.ts.map +1 -0
- package/dist/lib/keyboard-manager.js +247 -0
- package/dist/lib/keyboard-manager.js.map +1 -0
- package/dist/lib/label-migration.d.ts +65 -0
- package/dist/lib/label-migration.d.ts.map +1 -0
- package/dist/lib/label-migration.js +458 -0
- package/dist/lib/label-migration.js.map +1 -0
- package/dist/lib/launchctl-manager.d.ts +9 -0
- package/dist/lib/launchctl-manager.d.ts.map +1 -1
- package/dist/lib/launchctl-manager.js +65 -19
- package/dist/lib/launchctl-manager.js.map +1 -1
- package/dist/lib/log-management-service.d.ts +51 -0
- package/dist/lib/log-management-service.d.ts.map +1 -0
- package/dist/lib/log-management-service.js +124 -0
- package/dist/lib/log-management-service.js.map +1 -0
- package/dist/lib/log-workers.d.ts +70 -0
- package/dist/lib/log-workers.d.ts.map +1 -0
- package/dist/lib/log-workers.js +217 -0
- package/dist/lib/log-workers.js.map +1 -0
- package/dist/lib/model-downloader.d.ts +9 -1
- package/dist/lib/model-downloader.d.ts.map +1 -1
- package/dist/lib/model-downloader.js +98 -1
- package/dist/lib/model-downloader.js.map +1 -1
- package/dist/lib/model-management-service.d.ts +60 -0
- package/dist/lib/model-management-service.d.ts.map +1 -0
- package/dist/lib/model-management-service.js +246 -0
- package/dist/lib/model-management-service.js.map +1 -0
- package/dist/lib/model-management-service.test.d.ts +2 -0
- package/dist/lib/model-management-service.test.d.ts.map +1 -0
- package/dist/lib/model-management-service.test.js.map +1 -0
- package/dist/lib/model-scanner.d.ts +15 -3
- package/dist/lib/model-scanner.d.ts.map +1 -1
- package/dist/lib/model-scanner.js +174 -17
- package/dist/lib/model-scanner.js.map +1 -1
- package/dist/lib/openapi-spec.d.ts +1335 -0
- package/dist/lib/openapi-spec.d.ts.map +1 -0
- package/dist/lib/openapi-spec.js +1017 -0
- package/dist/lib/openapi-spec.js.map +1 -0
- package/dist/lib/router-logger.d.ts +1 -1
- package/dist/lib/router-logger.d.ts.map +1 -1
- package/dist/lib/router-logger.js +13 -11
- package/dist/lib/router-logger.js.map +1 -1
- package/dist/lib/router-manager.d.ts +4 -0
- package/dist/lib/router-manager.d.ts.map +1 -1
- package/dist/lib/router-manager.js +30 -18
- package/dist/lib/router-manager.js.map +1 -1
- package/dist/lib/router-server.d.ts +4 -7
- package/dist/lib/router-server.d.ts.map +1 -1
- package/dist/lib/router-server.js +71 -182
- package/dist/lib/router-server.js.map +1 -1
- package/dist/lib/server-config-service.d.ts +51 -0
- package/dist/lib/server-config-service.d.ts.map +1 -0
- package/dist/lib/server-config-service.js +310 -0
- package/dist/lib/server-config-service.js.map +1 -0
- package/dist/lib/server-config-service.test.d.ts +2 -0
- package/dist/lib/server-config-service.test.d.ts.map +1 -0
- package/dist/lib/server-config-service.test.js.map +1 -0
- package/dist/lib/server-lifecycle-service.d.ts +172 -0
- package/dist/lib/server-lifecycle-service.d.ts.map +1 -0
- package/dist/lib/server-lifecycle-service.js +619 -0
- package/dist/lib/server-lifecycle-service.js.map +1 -0
- package/dist/lib/state-manager.d.ts +18 -1
- package/dist/lib/state-manager.d.ts.map +1 -1
- package/dist/lib/state-manager.js +51 -2
- package/dist/lib/state-manager.js.map +1 -1
- package/dist/lib/status-checker.d.ts +11 -4
- package/dist/lib/status-checker.d.ts.map +1 -1
- package/dist/lib/status-checker.js +34 -1
- package/dist/lib/status-checker.js.map +1 -1
- package/dist/lib/validation-service.d.ts +43 -0
- package/dist/lib/validation-service.d.ts.map +1 -0
- package/dist/lib/validation-service.js +112 -0
- package/dist/lib/validation-service.js.map +1 -0
- package/dist/lib/validation-service.test.d.ts +2 -0
- package/dist/lib/validation-service.test.d.ts.map +1 -0
- package/dist/lib/validation-service.test.js.map +1 -0
- package/dist/scripts/http-log-filter.sh +8 -0
- package/dist/tui/ConfigApp.d.ts.map +1 -1
- package/dist/tui/ConfigApp.js +222 -184
- package/dist/tui/ConfigApp.js.map +1 -1
- package/dist/tui/HistoricalMonitorApp.d.ts.map +1 -1
- package/dist/tui/HistoricalMonitorApp.js +12 -0
- package/dist/tui/HistoricalMonitorApp.js.map +1 -1
- package/dist/tui/ModelsApp.d.ts.map +1 -1
- package/dist/tui/ModelsApp.js +93 -17
- package/dist/tui/ModelsApp.js.map +1 -1
- package/dist/tui/MonitorApp.d.ts.map +1 -1
- package/dist/tui/MonitorApp.js +1 -3
- package/dist/tui/MonitorApp.js.map +1 -1
- package/dist/tui/MultiServerMonitorApp.d.ts +3 -3
- package/dist/tui/MultiServerMonitorApp.d.ts.map +1 -1
- package/dist/tui/MultiServerMonitorApp.js +724 -508
- package/dist/tui/MultiServerMonitorApp.js.map +1 -1
- package/dist/tui/RootNavigator.d.ts.map +1 -1
- package/dist/tui/RootNavigator.js +17 -1
- package/dist/tui/RootNavigator.js.map +1 -1
- package/dist/tui/RouterApp.d.ts +6 -0
- package/dist/tui/RouterApp.d.ts.map +1 -0
- package/dist/tui/RouterApp.js +928 -0
- package/dist/tui/RouterApp.js.map +1 -0
- package/dist/tui/SearchApp.d.ts.map +1 -1
- package/dist/tui/SearchApp.js +27 -6
- package/dist/tui/SearchApp.js.map +1 -1
- package/dist/tui/shared/modal-controller.d.ts +65 -0
- package/dist/tui/shared/modal-controller.d.ts.map +1 -0
- package/dist/tui/shared/modal-controller.js +625 -0
- package/dist/tui/shared/modal-controller.js.map +1 -0
- package/dist/tui/shared/overlay-utils.d.ts +7 -0
- package/dist/tui/shared/overlay-utils.d.ts.map +1 -0
- package/dist/tui/shared/overlay-utils.js +54 -0
- package/dist/tui/shared/overlay-utils.js.map +1 -0
- package/dist/types/admin-config.d.ts +15 -2
- package/dist/types/admin-config.d.ts.map +1 -1
- package/dist/types/model-info.d.ts +5 -0
- package/dist/types/model-info.d.ts.map +1 -1
- package/dist/types/router-config.d.ts +2 -2
- package/dist/types/router-config.d.ts.map +1 -1
- package/dist/types/server-config.d.ts +8 -0
- package/dist/types/server-config.d.ts.map +1 -1
- package/dist/types/server-config.js +25 -0
- package/dist/types/server-config.js.map +1 -1
- package/dist/utils/http-log-filter.d.ts +10 -0
- package/dist/utils/http-log-filter.d.ts.map +1 -0
- package/dist/utils/http-log-filter.js +84 -0
- package/dist/utils/http-log-filter.js.map +1 -0
- package/dist/utils/log-parser.d.ts.map +1 -1
- package/dist/utils/log-parser.js +7 -4
- package/dist/utils/log-parser.js.map +1 -1
- package/dist/utils/log-utils.d.ts +59 -4
- package/dist/utils/log-utils.d.ts.map +1 -1
- package/dist/utils/log-utils.js +150 -11
- package/dist/utils/log-utils.js.map +1 -1
- package/dist/utils/shard-utils.d.ts +72 -0
- package/dist/utils/shard-utils.d.ts.map +1 -0
- package/dist/utils/shard-utils.js +168 -0
- package/dist/utils/shard-utils.js.map +1 -0
- package/package.json +18 -4
- package/src/launchers/llamacpp-admin +8 -0
- package/src/launchers/llamacpp-router +8 -0
- package/src/launchers/llamacpp-server +8 -0
- package/web/dist/assets/index-Byhoy86V.css +1 -0
- package/web/dist/assets/index-HSrgvray.js +50 -0
- package/web/dist/index.html +2 -2
- package/web/dist/assets/index-Bin89Lwr.css +0 -1
- package/web/dist/assets/index-CVmonw3T.js +0 -17
package/README.md
CHANGED
@@ -1,5 +1,7 @@
 # llamacpp-cli
 
+> **Note:** llamacpp-cli only works on **macOS** and requires [llama.cpp](https://github.com/ggerganov/llama.cpp) to be installed.
+
 > Manage llama.cpp servers like Ollama—but faster. Full control over llama-server with macOS launchctl integration.
 
 CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like experience for managing GGUF models and llama-server instances, with **significantly faster response times** than Ollama.
@@ -12,6 +14,7 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
 ## Features
 
 - 🚀 **Easy server management** - Start, stop, and monitor llama.cpp servers
+- 🏷️ **Server aliases** - Friendly, stable identifiers that persist across model changes
 - 🔀 **Unified router** - Single OpenAI-compatible endpoint for all models with automatic routing and request logging
 - 🌐 **Admin Interface** - REST API + modern web UI for remote management and automation
 - 🤖 **Model downloads** - Pull GGUF models from Hugging Face
@@ -19,7 +22,7 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
 - ⚙️ **Smart defaults** - Auto-configure threads, context size, and GPU layers based on model size
 - 🔌 **Auto port assignment** - Automatically find available ports (9000-9999)
 - 📊 **Real-time monitoring TUI** - Multi-server dashboard with drill-down details, live GPU/CPU/memory metrics, token generation speeds, and animated loading states
-- 🪵 **
+- 🪵 **Unified logging** - Activity logs (HTTP requests) and System logs (diagnostics) for all services
 - ⚡️ **Optimized metrics** - Batch collection and caching prevent CPU spikes (10x fewer processes)
 
 ## Why llamacpp-cli?
@@ -170,17 +173,21 @@ llamacpp
 
 ![TUI Dashboard](docs/images/tui-dashboard.png)
 
-###
+### Main Features
+
+**Dashboard** - Monitor all servers at a glance with real-time metrics (GPU, CPU, memory, token speed)
+
+**Server Management** - Create, start, stop, configure, and remove servers with inline editors
+
+**Model Management** (press `M`) - Browse local models, search/download from HuggingFace, delete with cascade
 
-
-- **Monitoring** - Real-time metrics for all servers (GPU, CPU, memory, token generation)
-- **Server Management** - Create, start, stop, remove, and configure servers
-- **Model Management** - Browse, search, download, and delete models
-- **Historical Metrics** - View time-series charts of past performance
+**Router Management** (press `R`) - Control router service, view configuration, access activity/system logs
 
-
+**Historical Charts** (press `H`) - View time-series graphs with Recent (1-3min) or Hour (60min) views
 
-
+**Logs** (press `L`) - Toggle between Activity (HTTP) and System (diagnostics) logs with auto-refresh
+
+### Dashboard View
 
 ```
 ┌─────────────────────────────────────────────────────────┐
@@ -190,173 +197,14 @@ The main view shows all your servers at a glance:
 │ Servers (3 running, 0 stopped) │
 │ │ Server ID │ Port │ Status │ Slots │ tok/s │
 │───┼────────────────┼──────┼────────┼───────┼──────────┤
-│ ► │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │
+│ ► │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │
 │ │ qwen2-7b │ 9001 │ ● RUN │ 1/4 │ 198 │
 │ │ llama-3-1-8b │ 9002 │ ○ IDLE │ 0/4 │ - │
 └─────────────────────────────────────────────────────────┘
-↑/↓ Navigate | Enter for details | [N]ew [M]odels [H]istory [Q]uit
+↑/↓ Navigate | Enter for details | [N]ew [M]odels [R]outer [H]istory [Q]uit
 ```
 
-
-- System resource overview (GPU, CPU, memory)
-- List of all servers (running and stopped)
-- Real-time status updates every 2 seconds
-- Color-coded status indicators
-- Navigate with arrow keys or vim keys (k/j)
-
-### Single-Server Detail View
-
-Press `Enter` on any server to see detailed information:
-
-**Running servers show:**
-- Server information (status, uptime, model name, endpoint)
-- Request metrics (active/idle slots, prompt speed, generation speed)
-- Active slots detail (per-slot token generation rates)
-- System resources (GPU/CPU/ANE utilization, memory usage)
-
-**Stopped servers show:**
-- Server configuration (threads, context, GPU layers)
-- Last activity timestamps
-- Quick action commands (start, config, logs)
-
-### Models Management
-
-Press `M` from the main view to access Models Management.
-
-**Features:**
-- Browse all installed models with size and modified date
-- View which servers are using each model
-- Delete models with cascade option (removes associated servers)
-- Search HuggingFace for new models
-- Download models with real-time progress tracking
-
-**Models View:**
-- View all GGUF files in scrollable table
-- Color-coded server usage (green = safe to delete, yellow = in use)
-- Delete selected model with `Enter` or `D` key
-- Confirmation dialog with cascade warning
-
-**Search View** (press `S` from Models view):
-- Search HuggingFace models by text input
-- Browse results with downloads, likes, and file counts
-- Expand model to show available GGUF files
-- Download with real-time progress, speed, and ETA
-- Cancel download with `ESC` (cleans up partial files)
-
-### Server Operations
-
-**Create Server** (press `N` from main view):
-1. Select model from list (shows existing servers per model)
-2. Edit configuration (threads, context size, GPU layers, port)
-3. Review smart defaults based on model size
-4. Create and automatically start server
-5. Return to main view with new server visible
-
-**Start/Stop Server** (press `S` from detail view):
-- Toggle server state with progress modal
-- Stays in detail view after operation
-- Shows updated status immediately
-
-**Remove Server** (press `R` from detail view):
-- Confirmation dialog with option to delete model file
-- Warns if other servers use the same model
-- Cascade deletion removes all associated data
-- Returns to main view after deletion
-
-**Configure Server** (press `C` from detail view):
-- Edit all server parameters inline
-- Modal dialogs for different field types
-- Model migration support (handles server ID changes)
-- Automatic restart prompts for running servers
-- Port conflict detection and validation
-
-### Historical Monitoring
-
-Press `H` from any view to see historical time-series charts.
-
-**Single-Server Historical View:**
-- Token generation speed over time
-- GPU usage (%) with avg/max/min stats
-- CPU usage (%) with avg/max/min
-- Memory usage (%) with avg/max/min
-- Auto-refresh every 3 seconds
-
-**Multi-Server Historical View:**
-- Aggregated metrics across all servers
-- Total token generation speed (sum)
-- System GPU usage (average)
-- Total CPU usage (sum of per-process)
-- Total memory usage (sum in GB)
-
-**View Modes** (toggle with `H` key):
-
-- **Recent View (default):**
-  - Shows last 40-80 samples (~1-3 minutes)
-  - Raw data with no downsampling - perfect accuracy
-  - Best for: "What's happening right now?"
-
-- **Hour View:**
-  - Shows all ~1,800 samples from last hour
-  - Absolute time-aligned downsampling (30:1 ratio)
-  - Bucket max for GPU/CPU/token speed (preserves peaks)
-  - Bucket mean for memory (shows average)
-  - Chart stays perfectly stable as data streams in
-  - Best for: "What happened over the last hour?"
-
-**Data Collection:**
-- Automatic during monitoring (piggyback on polling loop)
-- Stored in `~/.llamacpp/history/<server-id>.json` per server
-- Retention: Last 24 hours (circular buffer, auto-prune)
-- File size: ~21 MB per server for 24h @ 2s interval
-
-### Keyboard Shortcuts
-
-**List View (Multi-Server):**
-- `↑/↓` or `k/j` - Navigate server list
-- `Enter` - View details for selected server
-- `N` - Create new server
-- `M` - Switch to Models Management
-- `H` - View historical metrics (all servers)
-- `ESC` - Exit TUI
-- `Q` - Quit immediately
-
-**Detail View (Single-Server):**
-- `S` - Start/Stop server (toggles based on status)
-- `C` - Open configuration screen
-- `R` - Remove server (with confirmation)
-- `H` - View historical metrics (this server)
-- `ESC` - Back to list view
-- `Q` - Quit immediately
-
-**Models View:**
-- `↑/↓` or `k/j` - Navigate model list
-- `Enter` or `D` - Delete selected model
-- `S` - Open search view
-- `R` - Refresh model list
-- `ESC` - Back to main view
-- `Q` - Quit immediately
-
-**Search View:**
-- `/` or `I` - Focus search input
-- `Enter` (in input) - Execute search
-- `↑/↓` or `k/j` - Navigate results or files
-- `Enter` (on result) - Show GGUF files for model
-- `Enter` (on file) - Download/install model
-- `R` - Refresh results (re-execute search)
-- `ESC` - Back to models view (or results list if viewing files)
-- `Q` - Quit immediately
-
-**Historical View:**
-- `H` - Toggle between Recent/Hour view
-- `ESC` - Return to live monitoring
-- `Q` - Quit immediately
-
-**Configuration Screen:**
-- `↑/↓` or `k/j` - Navigate fields
-- `Enter` - Open modal for selected field
-- `S` - Save changes (prompts for restart if running)
-- `ESC` - Cancel (prompts if unsaved changes)
-- `Q` - Quit immediately
+Navigate with arrow keys or vim keys (k/j). Press `Enter` on any server to see detailed metrics, active slots, and resource usage. All keyboard shortcuts are shown in the footer of each view.
 
 ### Optional: GPU/CPU Metrics
 
@@ -377,7 +225,7 @@ The `llamacpp server monitor` command is deprecated. Use `llamacpp` instead to l
 
 ## Router (Unified Endpoint)
 
-The router provides a single
+The router provides a single unified endpoint that automatically routes requests to the correct backend server based on the model name. Supports both OpenAI and Anthropic API formats. Perfect for LLM clients that don't support multiple endpoints.
 
 ### Quick Start
 
@@ -396,8 +244,8 @@ llamacpp router start # Start the router service
 llamacpp router stop # Stop the router service
 llamacpp router status # Show router status and available models
 llamacpp router restart # Restart the router
-llamacpp router config # Update router settings (--port, --host, --timeout, --health-interval
-llamacpp router logs # View router logs (with --follow, --
+llamacpp router config # Update router settings (--port, --host, --timeout, --health-interval)
+llamacpp router logs # View router logs (with --follow, --activity, --system, --clear options)
 ```
 
 ### Usage Example
@@ -417,8 +265,22 @@ response = client.chat.completions.create(
     model="llama-3.2-3b-instruct-q4_k_m.gguf",
     messages=[{"role": "user", "content": "Hello!"}]
 )
+
+# Or use server aliases for cleaner code
+response = client.chat.completions.create(
+    model="thinking",  # Routes to server with alias "thinking"
+    messages=[{"role": "user", "content": "Hello!"}]
+)
 ```
 
+**Model Name Resolution:**
+The router accepts model names in multiple formats:
+- Full model filename: `llama-3.2-3b-instruct-q4_k_m.gguf`
+- Server alias: `thinking` (set with `--alias` flag)
+- Partial model name: `llama-3.2-3b` (fuzzy match)
+
+Aliases provide a stable, friendly identifier that persists across model changes.
+
 ### Supported Endpoints
 
 **OpenAI-Compatible:**
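The model-name resolution order the README describes (exact filename, then server alias, then fuzzy partial match) can be sketched in a few lines. This is an illustrative approximation only, assuming a simple first-match lookup; `resolve_model` and the server-record shape are invented for the example and are not part of llamacpp-cli.

```python
# Hypothetical sketch of the router's documented resolution order:
# exact filename -> server alias -> partial (fuzzy) match.
def resolve_model(requested, servers):
    """Return the first server record matching the requested model name."""
    # 1. Exact model filename match
    for server in servers:
        if server["model"] == requested:
            return server
    # 2. Server alias match (set with --alias)
    for server in servers:
        if server.get("alias") == requested:
            return server
    # 3. Partial model name (fuzzy) match
    for server in servers:
        if requested in server["model"]:
            return server
    return None  # the router would answer 404: No server found

servers = [
    {"model": "llama-3.2-3b-instruct-q4_k_m.gguf", "alias": "thinking", "port": 9001},
    {"model": "qwen2-7b-instruct-q4_k_m.gguf", "alias": "coder", "port": 9002},
]
assert resolve_model("thinking", servers)["port"] == 9001       # alias
assert resolve_model("llama-3.2-3b", servers)["port"] == 9001   # fuzzy match
```

The real router may order or weight matches differently; the point is only that aliases and partial names resolve to the same backend as the full filename.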
@@ -427,8 +289,8 @@ response = client.chat.completions.create(
 - `GET /v1/models` - List all available models from running servers
 
 **Anthropic-Compatible:**
-- `POST /v1/messages` - Anthropic Messages API (with tool calling support)
-- `POST /v1/messages/count_tokens` - Token counting
+- `POST /v1/messages` - Anthropic Messages API (with streaming and tool calling support)
+- `POST /v1/messages/count_tokens` - Token counting (estimated)
 - `GET /v1/models/{model}` - Retrieve specific model info
 
 **System:**
@@ -451,34 +313,28 @@ llamacpp router config --health-interval 3000 --restart
 # Change bind address (for remote access)
 llamacpp router config --host 0.0.0.0 --restart
 
-# Enable verbose logging (saves detailed JSON logs)
-llamacpp router config --verbose true --restart
-
-# Disable verbose logging
-llamacpp router config --verbose false --restart
 ```
 
 **Note:** Changes require a restart to take effect. Use `--restart` flag to apply immediately.
 
 ### Logging
 
-The router
+The router provides two log types:
 
-| Log
-
-|
-|
-| `router.log` | Structured JSON | Detailed entries for programmatic parsing (verbose mode) |
+| Log Type | CLI Flag | Content |
+|----------|----------|---------|
+| **Activity** | (default) | Request routing, status codes, timing, backend selection |
+| **System** | `--system` | Startup, shutdown, errors, diagnostic messages |
 
-**View
+**View logs:**
 ```bash
-#
+# Activity logs (default) - router request routing
 llamacpp router logs
 
-#
-llamacpp router logs --
+# System logs - diagnostics and errors
+llamacpp router logs --system
 
-# Follow
+# Follow logs in real-time
 llamacpp router logs --follow
 
 # Show last 10 lines
@@ -487,50 +343,38 @@ llamacpp router logs --lines 10
 
 **Log formats:**
 
-Activity logs
+Activity logs:
 ```
 200 POST /v1/chat/completions → llama-3.2-3b-instruct-q4_k_m.gguf (127.0.0.1:9001) 1234ms | "What is..."
 404 POST /v1/chat/completions → unknown-model 3ms | "test" | Error: No server found
 ```
 
-System logs
+System logs:
 ```
 [Router] Listening on http://127.0.0.1:9100
 [Router] PID: 12345
 [Router] Proxy request failed: ECONNREFUSED
 ```
 
-Verbose JSON logs (router.log) - enable with `--verbose true`:
-```bash
-llamacpp router logs --verbose
-```
-
 **Log management:**
 ```bash
-# Clear activity
+# Clear current log file (activity or system)
 llamacpp router logs --clear
 
-# Clear all router logs (
+# Clear all router logs (both activity and system)
 llamacpp router logs --clear-all
 
 # Rotate log files with timestamp
 llamacpp router logs --rotate
-
-# View system logs instead of activity
-llamacpp router logs --stderr
 ```
 
-**What's logged
-- ✅ Model name
-- ✅ HTTP status
+**What's logged:**
+- ✅ Model name and routing decisions
+- ✅ HTTP status codes (color-coded)
 - ✅ Request duration (ms)
-- ✅ Backend server (host:port)
+- ✅ Backend server selection (host:port)
 - ✅ First 50 chars of prompt
-- ✅ Error messages
-
-**Verbose mode benefits:**
-- Detailed JSON logs for LLM/script parsing
-- Stored in `~/.llamacpp/logs/router.log`
+- ✅ Error messages and diagnostics
 - Automatic rotation when exceeding 100MB
 - Machine-readable format with timestamps
 
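For scripting against the activity log, a small parser over the line format shown in the README's samples can be useful. This is a hypothetical sketch: the regex and field names are inferred from the two example lines only, not from a documented format, so treat it as an assumption about the log shape.

```python
import re

# Assumed line shape (from the README samples):
#   STATUS METHOD PATH → MODEL [(HOST:PORT)] DURATIONms | "PROMPT..."
LINE_RE = re.compile(
    r'^(?P<status>\d{3}) (?P<method>\S+) (?P<path>\S+) → (?P<model>\S+)'
    r'(?: \((?P<backend>[^)]+)\))? (?P<ms>\d+)ms \| "(?P<prompt>.*?)"'
)

def parse_activity_line(line):
    """Split one activity log line into named fields, or return None."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

line = ('200 POST /v1/chat/completions → llama-3.2-3b-instruct-q4_k_m.gguf '
        '(127.0.0.1:9001) 1234ms | "What is..."')
entry = parse_activity_line(line)
assert entry["status"] == "200"
assert entry["backend"] == "127.0.0.1:9001"
assert entry["ms"] == "1234"
```

Error lines such as the 404 sample also match: the backend group is simply absent and the trailing `| Error: ...` tail is ignored.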
@@ -674,8 +518,8 @@ llamacpp admin start # Start admin service
 llamacpp admin stop # Stop admin service
 llamacpp admin status # Show status and API key
 llamacpp admin restart # Restart service
-llamacpp admin config # Update settings (--port, --host, --regenerate-key
-llamacpp admin logs # View admin logs (with --follow, --
+llamacpp admin config # Update settings (--port, --host, --regenerate-key)
+llamacpp admin logs # View admin logs (with --follow, --activity, --system, --clear options)
 ```
 
 ### REST API
@@ -686,6 +530,8 @@ The Admin API provides full CRUD operations for servers and models via HTTP.
 
 **Authentication:** Bearer token (API key auto-generated on first start)
 
+**API Documentation:** Interactive Swagger UI available at `http://localhost:9200/api-docs`
+
 #### Server Endpoints
 
 | Method | Endpoint | Description |
@@ -698,7 +544,7 @@ The Admin API provides full CRUD operations for servers and models via HTTP.
 | POST | `/api/servers/:id/start` | Start stopped server |
 | POST | `/api/servers/:id/stop` | Stop running server |
 | POST | `/api/servers/:id/restart` | Restart server |
-| GET | `/api/servers/:id/logs?type=
+| GET | `/api/servers/:id/logs?type=activity\|system\|all&lines=100` | Get server logs (activity=HTTP, system=diagnostics) |
 
 #### Model Endpoints
 
@@ -710,6 +556,17 @@ The Admin API provides full CRUD operations for servers and models via HTTP.
 | GET | `/api/models/search?q=query` | Search HuggingFace |
 | POST | `/api/models/download` | Download model from HF |
 
+#### Router Endpoints
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/router` | Get router status and config |
+| POST | `/api/router/start` | Start router service |
+| POST | `/api/router/stop` | Stop router service |
+| POST | `/api/router/restart` | Restart router service |
+| PATCH | `/api/router` | Update router config |
+| GET | `/api/router/logs?type=activity\|system&lines=100` | Get router logs (Activity from stdout, System from stderr) |
+
 #### System Endpoints
 
 | Method | Endpoint | Description |
@@ -750,6 +607,28 @@ curl -X DELETE "http://localhost:9200/api/models/llama-3.2-3b-instruct-q4_k_m.gg
   -H "Authorization: Bearer YOUR_API_KEY"
 ```
 
+**Get server logs:**
+```bash
+# Activity logs (HTTP requests) - default
+curl "http://localhost:9200/api/servers/llama-3-2-3b/logs?type=activity&lines=50" \
+  -H "Authorization: Bearer YOUR_API_KEY"
+
+# System logs (diagnostics)
+curl "http://localhost:9200/api/servers/llama-3-2-3b/logs?type=system&lines=100" \
+  -H "Authorization: Bearer YOUR_API_KEY"
+```
+
+**Get router logs:**
+```bash
+# Activity logs (router requests)
+curl "http://localhost:9200/api/router/logs?type=activity&lines=50" \
+  -H "Authorization: Bearer YOUR_API_KEY"
+
+# System logs (diagnostics)
+curl "http://localhost:9200/api/router/logs?type=system&lines=100" \
+  -H "Authorization: Bearer YOUR_API_KEY"
+```
+
 ### Web UI
 
 The web UI provides a modern, browser-based interface for managing servers and models.
@@ -809,8 +688,8 @@ llamacpp admin config --host 0.0.0.0 --restart
 # Regenerate API key (invalidates old key)
 llamacpp admin config --regenerate-key --restart
 
-# Enable
-llamacpp admin config --
+# Enable logging
+llamacpp admin config --logging true --restart
 ```
 
 **Note:** Changes require a restart to take effect. Use `--restart` flag to apply immediately.
@@ -844,29 +723,31 @@ llamacpp admin config --regenerate-key --restart
 
 ### Logging
 
-The admin service
+The admin service provides two log types:
+
+| Log Type | CLI Flag | Content |
+|----------|----------|---------|
+| **Activity** | `--activity` | HTTP API requests (endpoint, status, duration) |
+| **System** | `--system` | Startup, shutdown, errors, diagnostic messages |
 
-
-|----------|---------|---------|
-| `admin.stdout` | Request activity | Endpoint, status, duration |
-| `admin.stderr` | System messages | Startup, shutdown, errors |
+**Default:** Shows both Activity and System logs (useful for debugging).
 
 **View logs:**
 ```bash
-#
+# Both activity and system logs (default)
 llamacpp admin logs
 
-#
-llamacpp admin logs --
+# Activity logs only (HTTP API requests)
+llamacpp admin logs --activity
+
+# System logs only (diagnostics and errors)
+llamacpp admin logs --system
 
 # Follow in real-time
 llamacpp admin logs --follow
 
 # Clear all logs
 llamacpp admin logs --clear
-
-# Rotate logs with timestamp
-llamacpp admin logs --rotate
 ```
 
 ### Example Output
@@ -910,8 +791,9 @@ Web UI: http://localhost:9200
 
 Configuration:
   Config: ~/.llamacpp/admin.json
-  Plist: ~/Library/LaunchAgents/
-  Logs: ~/.llamacpp/logs/admin.
+  Plist: ~/Library/LaunchAgents/studio.appkit.llamacpp-cli.admin.plist
+  Logs: ~/.llamacpp/logs/admin.stdout # Activity logs
+        ~/.llamacpp/logs/admin.stderr # System logs
 
 Quick Commands:
   llamacpp admin stop # Stop service
@@ -1079,8 +961,8 @@ llamacpp logs --rotate
 ```
 
 **Displays:**
--
--
+- Activity logs (.http) size per server
+- System logs (.stderr, .stdout) size per server
 - Archived logs size and count
 - Total log usage per server
 - Grand total across all servers
@@ -1093,6 +975,64 @@ llamacpp logs --rotate
 
 **Use case:** Quickly see which servers are accumulating large logs, or clean up all logs at once.
 
+## Server Aliases
+
+Server aliases provide stable, user-friendly identifiers for your servers that persist across model changes. Instead of using auto-generated IDs like `llama-3-2-3b-instruct-q4-k-m`, you can use memorable names like `thinking`, `coder`, or `gpt-oss`.
+
+### Why Use Aliases?
+
+**Stability:** When you change a server's model, the server ID changes (because it's derived from the model name). Aliases stay the same, preventing broken references in scripts and workflows.
+
+**Convenience:** Shorter, more memorable names are easier to type and read.
+
+**Router Integration:** Aliases work with the router, allowing cleaner API requests.
+
+### Usage Examples
+
+```bash
+# Create server with alias
+llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --alias thinking
+
+# Use alias in all commands
+llamacpp server start thinking
+llamacpp server stop thinking
+llamacpp server logs thinking
+llamacpp ps thinking
+
+# Update alias
+llamacpp server config thinking --alias smart-model
+
+# Remove alias
+llamacpp server config thinking --alias ""
+
+# Alias persists across model changes
+llamacpp server config thinking --model mistral-7b.gguf --restart
+llamacpp server start thinking # Still works with new model!
+
+# Use alias in router requests
+curl -X POST http://localhost:9100/v1/messages \
+  -H "Content-Type: application/json" \
+  -d '{"model": "thinking", "max_tokens": 100, "messages": [{"role": "user", "content": "Hello"}]}'
+```
+
+### Validation Rules
+
+- **Format:** Alphanumeric characters, hyphens, and underscores only
+- **Length:** 1-64 characters
+- **Uniqueness:** Case-insensitive (can't have both "Thinking" and "thinking")
+- **Reserved names:** Cannot use "router", "admin", or "server"
+- **Storage:** Case-sensitive (preserves your input)
+
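The validation rules above can be approximated with a small shell check. This is an illustrative sketch of the documented rules only — the `is_valid_alias` helper is hypothetical, and the real validation happens inside the CLI:

```bash
# Hypothetical helper mirroring the documented alias rules;
# the CLI performs this validation internally.
is_valid_alias() {
  local alias="$1" lower
  # Format + length: alphanumerics, hyphens, underscores, 1-64 chars
  [[ "$alias" =~ ^[A-Za-z0-9_-]{1,64}$ ]] || return 1
  # Reserved names are rejected case-insensitively
  lower=$(printf '%s' "$alias" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    router|admin|server) return 1 ;;
  esac
}

is_valid_alias "thinking" && echo "thinking is valid"
is_valid_alias "bad name" || echo "bad name is rejected"
```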
+### Lookup Priority
+
+When you reference a server, the CLI checks identifiers in this order:
+1. **Alias** (exact match, case-sensitive)
+2. **Port** (if identifier is numeric)
+3. **Server ID** (exact match)
+4. **Model name** (fuzzy match)
+
+This means aliases always take precedence, providing predictable behavior.
+
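As a rough sketch of the precedence above — using plain variables as stand-ins for the CLI's real config store, with `resolve_server` and all names being illustrative only:

```bash
# Illustrative stand-ins for one server's stored config
ALIAS="thinking"
PORT="9000"
SERVER_ID="llama-3-2-3b"
MODEL="llama-3.2-3b-instruct-q4_k_m.gguf"

# Hypothetical resolver following the documented order:
# alias -> port -> server ID -> fuzzy model name
resolve_server() {
  local id="$1"
  if [ "$id" = "$ALIAS" ]; then
    echo "matched by alias"
  elif [ "$id" = "$PORT" ]; then
    echo "matched by port"
  elif [ "$id" = "$SERVER_ID" ]; then
    echo "matched by server ID"
  else
    case "$MODEL" in
      *"$id"*) echo "matched by model name" ;;
      *) echo "not found"; return 1 ;;
    esac
  fi
}

resolve_server thinking   # matched by alias
resolve_server 9000       # matched by port
resolve_server instruct   # matched by model name
```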
 ## Server Management
 
 ### `llamacpp server create <model> [options]`
@@ -1102,11 +1042,21 @@ Create and start a new llama-server instance.
 llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf
 llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --port 8080 --ctx-size 16384 --verbose
 
+# Create with a friendly alias
+llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --alias thinking
+
+# Create multiple servers with the same model (different configurations)
+llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --ctx-size 8192 --alias short-context
+llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --ctx-size 32768 --alias long-context
+
 # Enable remote access (WARNING: security implications)
 llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --host 0.0.0.0
 ```
 
+**Note:** You can create multiple servers using the same model file with different configurations (context size, GPU layers, etc.). Each server gets a unique ID automatically.
+
 **Options:**
+- `-a, --alias <name>` - Friendly alias for the server (alphanumeric, hyphens, underscores, 1-64 chars)
 - `-p, --port <number>` - Port number (default: auto-assign from 9000)
 - `-h, --host <address>` - Bind address (default: `127.0.0.1` for localhost only, use `0.0.0.0` for remote access)
 - `-t, --threads <number>` - Thread count (default: half of CPU cores)
@@ -1122,11 +1072,12 @@ Show detailed configuration and status information for a server.
 ```bash
 llamacpp server show llama-3.2-3b # By partial name
 llamacpp server show 9000 # By port
+llamacpp server show thinking # By alias
 llamacpp server show llama-3-2-3b # By server ID
 ```
 
 **Displays:**
-- Server ID, model name, and path
+- Server ID, alias (if set), model name, and path
 - Current status (running/stopped/crashed)
 - Host and port
 - PID (process ID)
@@ -1136,7 +1087,7 @@ llamacpp server show llama-3-2-3b # By server ID
 - System paths (plist file, log files)
 - Quick commands for common next actions
 
-**Identifiers:**
+**Identifiers:** Alias, port number, server ID, partial model name
 
 ### `llamacpp server config <identifier> [options]`
 Update server configuration parameters without recreating the server.
@@ -1145,6 +1096,12 @@ Update server configuration parameters without recreating the server.
 # Change model while keeping all other settings
 llamacpp server config llama-3.2-3b --model llama-3.2-1b-instruct-q4_k_m.gguf --restart
 
+# Add or update alias
+llamacpp server config llama-3.2-3b --alias thinking
+
+# Remove alias (use empty string)
+llamacpp server config thinking --alias ""
+
 # Update context size and restart
 llamacpp server config llama-3.2-3b --ctx-size 8192 --restart
 
@@ -1162,6 +1119,7 @@ llamacpp server config llama-3.2-3b --threads 8 --ctx-size 16384 --gpu-layers 40
 ```
 
 **Options:**
+- `-a, --alias <name>` - Set or update alias (use empty string `""` to remove)
 - `-m, --model <filename>` - Update model (filename or path)
 - `-h, --host <address>` - Update bind address (`127.0.0.1` for localhost, `0.0.0.0` for remote access)
 - `-t, --threads <number>` - Update thread count
@@ -1171,22 +1129,23 @@ llamacpp server config llama-3.2-3b --threads 8 --ctx-size 16384 --gpu-layers 40
 - `--no-verbose` - Disable verbose logging
 - `-r, --restart` - Automatically restart server if running
 
-**Note:** Changes require a server restart to take effect. Use `--restart` to automatically stop and start the server with the new configuration.
+**Note:** Changes require a server restart to take effect. Use `--restart` to automatically stop and start the server with the new configuration. Aliases persist across model changes, providing a stable identifier for your server.
 
 **⚠️ Security Warning:** Using `--host 0.0.0.0` binds the server to all network interfaces, allowing remote access. Only use this if you understand the security implications.
 
-**Identifiers:**
+**Identifiers:** Alias, port number, server ID, partial model name
 
 ### `llamacpp server start <identifier>`
 Start an existing stopped server.
 
 ```bash
+llamacpp server start thinking # By alias
 llamacpp server start llama-3.2-3b # By partial name
 llamacpp server start 9000 # By port
 llamacpp server start llama-3-2-3b # By server ID
 ```
 
-**Identifiers:**
+**Identifiers:** Alias, port number, server ID, partial model name, or model filename
 
 ### `llamacpp server run <identifier> [options]`
 Run an interactive chat session with a model, or send a single message.
@@ -1226,41 +1185,44 @@ llamacpp server rm 9000
 ```
 
 ### `llamacpp server logs <identifier> [options]`
-View server logs with smart filtering.
 
-
-
-
-
-
+View server logs with flexible filtering.
+
+**Log Types:**
+- **Activity logs** (default): HTTP request/response logs in compact format
+- **System logs** (`--system`): Server diagnostic output (stderr + stdout)
 
-**
+**Basic usage:**
 ```bash
+# Activity logs (default) - HTTP requests
 llamacpp server logs llama-3.2-3b
-# Output:
-```
-
-**More examples:**
+# Output: 2025-12-09 18:02:23 POST /v1/chat/completions 127.0.0.1 200 "What is..." 305 22 1036
 
-#
-llamacpp server logs llama-3.2-3b --
+# System logs - diagnostics and errors
+llamacpp server logs llama-3.2-3b --system
 
 # Follow logs in real-time
 llamacpp server logs llama-3.2-3b --follow
 
-# Last 100
+# Last 100 lines
 llamacpp server logs llama-3.2-3b --lines 100
+```
 
-
-
+**Advanced filtering:**
+```bash
+# System logs with errors only
+llamacpp server logs llama-3.2-3b --system --errors
 
-#
-llamacpp server logs llama-3.2-3b --
+# Custom grep pattern
+llamacpp server logs llama-3.2-3b --system --filter "error|warning"
 
-#
-llamacpp server logs llama-3.2-3b --
+# Include health check requests (filtered by default)
+llamacpp server logs llama-3.2-3b --include-health
+```
 
-
+**Log management:**
+```bash
+# Clear current log file (truncate to zero bytes)
 llamacpp server logs llama-3.2-3b --clear
 
 # Delete only archived logs (preserves current)
@@ -1276,15 +1238,15 @@ llamacpp server logs llama-3.2-3b --rotate
 **Options:**
 - `-f, --follow` - Follow log output in real-time
 - `-n, --lines <number>` - Number of lines to show (default: 50)
-- `--
-- `--
-- `--
+- `--activity` - Show HTTP activity logs (default)
+- `--system` - Show system logs (all server output)
+- `--errors` - Filter system logs for errors only
 - `--filter <pattern>` - Custom grep pattern for filtering
-- `--
+- `--include-health` - Include health check requests (/health, /slots, /props)
 - `--clear` - Clear (truncate) log file to zero bytes
 - `--clear-archived` - Delete only archived logs (preserves current logs)
 - `--clear-all` - Clear current logs AND delete all archived logs (frees most space)
-- `--rotate` - Rotate log file with timestamp (e.g., `server.2026-01-22-19-30-00.
+- `--rotate` - Rotate log file with timestamp (e.g., `server.2026-01-22-19-30-00.http`)
 
 **Automatic Log Rotation:**
 Logs are automatically rotated when they exceed 100MB during:
@@ -1293,9 +1255,7 @@ Logs are automatically rotated when they exceed 100MB during:
 
 Rotated logs are saved with timestamps in the same directory: `~/.llamacpp/logs/`
 
-**
-
-Default compact format:
+**Activity Log Format:**
 ```
 TIMESTAMP METHOD ENDPOINT IP STATUS "MESSAGE..." TOKENS_IN TOKENS_OUT TIME_MS
 ```
@@ -1304,10 +1264,7 @@ The compact format shows one line per HTTP request and includes:
 - User's message (first 50 characters)
 - Token counts (prompt tokens in, completion tokens out)
 - Total response time in milliseconds
-
-**Note:** Verbose logging is now enabled by default. HTTP request logs are available by default.
-
-Use `--http` to see full request/response JSON, or `--verbose` option to see all internal server logs.
+- Health checks filtered by default (use `--include-health` to show)
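Because the compact format is whitespace-delimited with the response time in the last field, standard tools can slice it; a sketch (the sample line and the 500 ms threshold are illustrative):

```bash
# A sample line in the compact activity format documented above
line='2025-12-09 18:02:23 POST /v1/chat/completions 127.0.0.1 200 "What is..." 305 22 1036'

# Field 4 is the endpoint and the last field is total time in ms;
# flag any request slower than 500 ms
echo "$line" | awk '$NF > 500 { print "slow:", $4, $NF "ms" }'
# → slow: /v1/chat/completions 1036ms
```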
 
 ## Configuration
 
@@ -1320,11 +1277,14 @@ llamacpp-cli stores its configuration in `~/.llamacpp/`:
 ├── admin.json # Admin service configuration (includes API key)
 ├── servers/ # Server configurations
 │   └── <server-id>.json
-├── logs/ #
-│   ├── <server-id>.
-│   ├── <server-id>.stderr
-│   ├──
-│
+├── logs/ # All service logs
+│   ├── <server-id>.http # Activity: HTTP request logs
+│   ├── <server-id>.stderr # System: diagnostics
+│   ├── <server-id>.stdout # System: additional output
+│   ├── router.stdout # Router activity logs
+│   ├── router.stderr # Router system logs
+│   ├── admin.stdout # Admin activity logs
+│   └── admin.stderr # Admin system logs
 └── history/ # Historical metrics (TUI)
     └── <server-id>.json
 ```
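Given this layout, per-type log usage can be summed with standard tools. The snippet below is a sketch that uses a temporary directory (with fabricated sample files) as a stand-in for `~/.llamacpp/logs`:

```bash
# Stand-in for ~/.llamacpp/logs with the file extensions shown above
logs=$(mktemp -d)
printf 'GET /health\n' > "$logs/llama-3-2-3b.http"
printf 'boot ok\n'     > "$logs/llama-3-2-3b.stderr"
: > "$logs/llama-3-2-3b.stdout"

# Total bytes per log type (activity = .http, system = .stderr/.stdout)
for ext in http stderr stdout; do
  bytes=$(cat "$logs"/*."$ext" 2>/dev/null | wc -c)
  echo "$ext: $(echo $bytes) bytes"   # inner echo strips wc's padding
done
```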
@@ -1342,6 +1302,12 @@ llamacpp-cli automatically configures optimal settings based on model size:
 
 All servers include `--embeddings` and `--jinja` flags by default.
 
+**GPU Layers explained:**
+- **Default: 60** - Conservative value that works reliably on all Apple Silicon devices
+- **-1 (all)** - Maximum performance, uses all available GPU layers. May cause OOM on very large models with limited VRAM.
+- **0 (CPU only)** - Useful for testing or when GPU is busy with other tasks
+- **Specific number** - Fine-tune based on your GPU memory and model size
+
 ## How It Works
 
 llamacpp-cli uses macOS launchctl to manage llama-server processes:
@@ -1351,7 +1317,7 @@ llamacpp-cli uses macOS launchctl to manage llama-server processes:
 3. Starts the server with `launchctl start`
 4. Monitors status via `launchctl list` and `lsof`
 
-Services are named `
+Services are named `studio.appkit.llamacpp-cli.<model-id>`.
 
 **Auto-Restart Behavior:**
 - When you **start** a server, it's registered with launchd and will auto-restart on crash
@@ -1359,8 +1325,8 @@ Services are named `com.llama.<model-id>`.
 - Crashed servers will automatically restart (when loaded)
 
 **Router and Admin Services:**
-- The **Router** (`
-- The **Admin** (`
+- The **Router** (`studio.appkit.llamacpp-cli.router`) provides a unified OpenAI-compatible endpoint for all models
+- The **Admin** (`studio.appkit.llamacpp-cli.admin`) provides REST API + web UI for remote management
 - Both run as launchctl services similar to individual model servers
 
 ## Known Limitations
@@ -1421,6 +1387,36 @@ Or regenerate a new one:
 llamacpp admin config --regenerate-key --restart
 ```
 
+### `llamacpp migrate-labels`
+Migrate service labels from old format (`com.llama.*`) to new format (`studio.appkit.llamacpp-cli.*`).
+
+> **Note:** This command is automatically triggered on first run after upgrading from versions prior to v2.1.0.
+
+```bash
+# Show what would be migrated without making changes
+llamacpp migrate-labels --dry-run
+
+# Perform migration (with confirmation prompt)
+llamacpp migrate-labels
+
+# Skip confirmation prompt
+llamacpp migrate-labels --force
+```
+
+**What it does:**
+1. Creates a backup of all current configurations
+2. Stops running services
+3. Updates service labels and plist files
+4. Restarts services that were running
+5. Creates a marker file to prevent re-migration
+
+**Troubleshooting:**
+If migration fails, configurations are automatically rolled back. You can also manually rollback:
+
+```bash
+llamacpp rollback-labels
+```
+
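The plist-rename part of the migration amounts to moving files from the old prefix to the new one. A safe, self-contained sketch in a temporary directory (the real command also rewrites the labels inside each plist and drives launchctl; the filenames here are illustrative):

```bash
# Temporary stand-in for ~/Library/LaunchAgents
agents=$(mktemp -d)
touch "$agents/com.llama.llama-3-2-3b.plist" "$agents/com.llama.router.plist"

# Rename old-format plists to the new label prefix
for f in "$agents"/com.llama.*.plist; do
  mv "$f" "${f/com.llama./studio.appkit.llamacpp-cli.}"
done

ls "$agents"
```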
 ## Development
 
 ### CLI Development
@@ -1536,7 +1532,7 @@ Contributions are welcome! If you'd like to contribute:
 **CLI Development:**
 - Use `npm run dev -- <command>` to test commands without building
 - Check logs with `llamacpp server logs <server> --errors` when debugging
-- Test launchctl integration with `launchctl list | grep
+- Test launchctl integration with `launchctl list | grep studio.appkit.llamacpp-cli`
 - All server configs are in `~/.llamacpp/servers/`
 - Test interactive chat with `npm run dev -- server run <model>`
 