@appkit/llamacpp-cli 1.14.1 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (230)
  1. package/README.md +276 -280
  2. package/dist/cli.js +133 -23
  3. package/dist/cli.js.map +1 -1
  4. package/dist/commands/admin/config.d.ts +1 -1
  5. package/dist/commands/admin/config.js +5 -5
  6. package/dist/commands/admin/config.js.map +1 -1
  7. package/dist/commands/admin/log-config.d.ts +11 -0
  8. package/dist/commands/admin/log-config.d.ts.map +1 -0
  9. package/dist/commands/admin/log-config.js +159 -0
  10. package/dist/commands/admin/log-config.js.map +1 -0
  11. package/dist/commands/admin/logs.d.ts +2 -3
  12. package/dist/commands/admin/logs.d.ts.map +1 -1
  13. package/dist/commands/admin/logs.js +6 -48
  14. package/dist/commands/admin/logs.js.map +1 -1
  15. package/dist/commands/admin/status.d.ts.map +1 -1
  16. package/dist/commands/admin/status.js +1 -0
  17. package/dist/commands/admin/status.js.map +1 -1
  18. package/dist/commands/config.d.ts +1 -0
  19. package/dist/commands/config.d.ts.map +1 -1
  20. package/dist/commands/config.js +63 -196
  21. package/dist/commands/config.js.map +1 -1
  22. package/dist/commands/create.d.ts +3 -2
  23. package/dist/commands/create.d.ts.map +1 -1
  24. package/dist/commands/create.js +24 -97
  25. package/dist/commands/create.js.map +1 -1
  26. package/dist/commands/delete.d.ts.map +1 -1
  27. package/dist/commands/delete.js +7 -24
  28. package/dist/commands/delete.js.map +1 -1
  29. package/dist/commands/internal/server-wrapper.d.ts +15 -0
  30. package/dist/commands/internal/server-wrapper.d.ts.map +1 -0
  31. package/dist/commands/internal/server-wrapper.js +126 -0
  32. package/dist/commands/internal/server-wrapper.js.map +1 -0
  33. package/dist/commands/logs-all.d.ts +0 -2
  34. package/dist/commands/logs-all.d.ts.map +1 -1
  35. package/dist/commands/logs-all.js +1 -61
  36. package/dist/commands/logs-all.js.map +1 -1
  37. package/dist/commands/logs.d.ts +2 -5
  38. package/dist/commands/logs.d.ts.map +1 -1
  39. package/dist/commands/logs.js +104 -120
  40. package/dist/commands/logs.js.map +1 -1
  41. package/dist/commands/migrate-labels.d.ts +12 -0
  42. package/dist/commands/migrate-labels.d.ts.map +1 -0
  43. package/dist/commands/migrate-labels.js +160 -0
  44. package/dist/commands/migrate-labels.js.map +1 -0
  45. package/dist/commands/ps.d.ts.map +1 -1
  46. package/dist/commands/ps.js +2 -1
  47. package/dist/commands/ps.js.map +1 -1
  48. package/dist/commands/rm.d.ts.map +1 -1
  49. package/dist/commands/rm.js +22 -48
  50. package/dist/commands/rm.js.map +1 -1
  51. package/dist/commands/router/config.d.ts +1 -1
  52. package/dist/commands/router/config.js +6 -6
  53. package/dist/commands/router/config.js.map +1 -1
  54. package/dist/commands/router/logs.d.ts +2 -4
  55. package/dist/commands/router/logs.d.ts.map +1 -1
  56. package/dist/commands/router/logs.js +34 -189
  57. package/dist/commands/router/logs.js.map +1 -1
  58. package/dist/commands/router/status.d.ts.map +1 -1
  59. package/dist/commands/router/status.js +1 -0
  60. package/dist/commands/router/status.js.map +1 -1
  61. package/dist/commands/server-show.d.ts.map +1 -1
  62. package/dist/commands/server-show.js +3 -0
  63. package/dist/commands/server-show.js.map +1 -1
  64. package/dist/commands/start.d.ts.map +1 -1
  65. package/dist/commands/start.js +21 -72
  66. package/dist/commands/start.js.map +1 -1
  67. package/dist/commands/stop.d.ts.map +1 -1
  68. package/dist/commands/stop.js +10 -26
  69. package/dist/commands/stop.js.map +1 -1
  70. package/dist/launchers/llamacpp-admin +8 -0
  71. package/dist/launchers/llamacpp-router +8 -0
  72. package/dist/launchers/llamacpp-server +8 -0
  73. package/dist/lib/admin-manager.d.ts +4 -0
  74. package/dist/lib/admin-manager.d.ts.map +1 -1
  75. package/dist/lib/admin-manager.js +42 -18
  76. package/dist/lib/admin-manager.js.map +1 -1
  77. package/dist/lib/admin-server.d.ts +48 -1
  78. package/dist/lib/admin-server.d.ts.map +1 -1
  79. package/dist/lib/admin-server.js +632 -238
  80. package/dist/lib/admin-server.js.map +1 -1
  81. package/dist/lib/config-generator.d.ts +1 -0
  82. package/dist/lib/config-generator.d.ts.map +1 -1
  83. package/dist/lib/config-generator.js +12 -5
  84. package/dist/lib/config-generator.js.map +1 -1
  85. package/dist/lib/keyboard-manager.d.ts +162 -0
  86. package/dist/lib/keyboard-manager.d.ts.map +1 -0
  87. package/dist/lib/keyboard-manager.js +247 -0
  88. package/dist/lib/keyboard-manager.js.map +1 -0
  89. package/dist/lib/label-migration.d.ts +65 -0
  90. package/dist/lib/label-migration.d.ts.map +1 -0
  91. package/dist/lib/label-migration.js +458 -0
  92. package/dist/lib/label-migration.js.map +1 -0
  93. package/dist/lib/launchctl-manager.d.ts +9 -0
  94. package/dist/lib/launchctl-manager.d.ts.map +1 -1
  95. package/dist/lib/launchctl-manager.js +65 -19
  96. package/dist/lib/launchctl-manager.js.map +1 -1
  97. package/dist/lib/log-management-service.d.ts +51 -0
  98. package/dist/lib/log-management-service.d.ts.map +1 -0
  99. package/dist/lib/log-management-service.js +124 -0
  100. package/dist/lib/log-management-service.js.map +1 -0
  101. package/dist/lib/log-workers.d.ts +70 -0
  102. package/dist/lib/log-workers.d.ts.map +1 -0
  103. package/dist/lib/log-workers.js +217 -0
  104. package/dist/lib/log-workers.js.map +1 -0
  105. package/dist/lib/model-downloader.d.ts +9 -1
  106. package/dist/lib/model-downloader.d.ts.map +1 -1
  107. package/dist/lib/model-downloader.js +98 -1
  108. package/dist/lib/model-downloader.js.map +1 -1
  109. package/dist/lib/model-management-service.d.ts +60 -0
  110. package/dist/lib/model-management-service.d.ts.map +1 -0
  111. package/dist/lib/model-management-service.js +246 -0
  112. package/dist/lib/model-management-service.js.map +1 -0
  113. package/dist/lib/model-management-service.test.d.ts +2 -0
  114. package/dist/lib/model-management-service.test.d.ts.map +1 -0
  115. package/dist/lib/model-management-service.test.js.map +1 -0
  116. package/dist/lib/model-scanner.d.ts +15 -3
  117. package/dist/lib/model-scanner.d.ts.map +1 -1
  118. package/dist/lib/model-scanner.js +174 -17
  119. package/dist/lib/model-scanner.js.map +1 -1
  120. package/dist/lib/openapi-spec.d.ts +1335 -0
  121. package/dist/lib/openapi-spec.d.ts.map +1 -0
  122. package/dist/lib/openapi-spec.js +1017 -0
  123. package/dist/lib/openapi-spec.js.map +1 -0
  124. package/dist/lib/router-logger.d.ts +1 -1
  125. package/dist/lib/router-logger.d.ts.map +1 -1
  126. package/dist/lib/router-logger.js +13 -11
  127. package/dist/lib/router-logger.js.map +1 -1
  128. package/dist/lib/router-manager.d.ts +4 -0
  129. package/dist/lib/router-manager.d.ts.map +1 -1
  130. package/dist/lib/router-manager.js +30 -18
  131. package/dist/lib/router-manager.js.map +1 -1
  132. package/dist/lib/router-server.d.ts +4 -7
  133. package/dist/lib/router-server.d.ts.map +1 -1
  134. package/dist/lib/router-server.js +71 -182
  135. package/dist/lib/router-server.js.map +1 -1
  136. package/dist/lib/server-config-service.d.ts +51 -0
  137. package/dist/lib/server-config-service.d.ts.map +1 -0
  138. package/dist/lib/server-config-service.js +310 -0
  139. package/dist/lib/server-config-service.js.map +1 -0
  140. package/dist/lib/server-config-service.test.d.ts +2 -0
  141. package/dist/lib/server-config-service.test.d.ts.map +1 -0
  142. package/dist/lib/server-config-service.test.js.map +1 -0
  143. package/dist/lib/server-lifecycle-service.d.ts +172 -0
  144. package/dist/lib/server-lifecycle-service.d.ts.map +1 -0
  145. package/dist/lib/server-lifecycle-service.js +619 -0
  146. package/dist/lib/server-lifecycle-service.js.map +1 -0
  147. package/dist/lib/state-manager.d.ts +18 -1
  148. package/dist/lib/state-manager.d.ts.map +1 -1
  149. package/dist/lib/state-manager.js +51 -2
  150. package/dist/lib/state-manager.js.map +1 -1
  151. package/dist/lib/status-checker.d.ts +11 -4
  152. package/dist/lib/status-checker.d.ts.map +1 -1
  153. package/dist/lib/status-checker.js +34 -1
  154. package/dist/lib/status-checker.js.map +1 -1
  155. package/dist/lib/validation-service.d.ts +43 -0
  156. package/dist/lib/validation-service.d.ts.map +1 -0
  157. package/dist/lib/validation-service.js +112 -0
  158. package/dist/lib/validation-service.js.map +1 -0
  159. package/dist/lib/validation-service.test.d.ts +2 -0
  160. package/dist/lib/validation-service.test.d.ts.map +1 -0
  161. package/dist/lib/validation-service.test.js.map +1 -0
  162. package/dist/scripts/http-log-filter.sh +8 -0
  163. package/dist/tui/ConfigApp.d.ts.map +1 -1
  164. package/dist/tui/ConfigApp.js +222 -184
  165. package/dist/tui/ConfigApp.js.map +1 -1
  166. package/dist/tui/HistoricalMonitorApp.d.ts.map +1 -1
  167. package/dist/tui/HistoricalMonitorApp.js +12 -0
  168. package/dist/tui/HistoricalMonitorApp.js.map +1 -1
  169. package/dist/tui/ModelsApp.d.ts.map +1 -1
  170. package/dist/tui/ModelsApp.js +93 -17
  171. package/dist/tui/ModelsApp.js.map +1 -1
  172. package/dist/tui/MonitorApp.d.ts.map +1 -1
  173. package/dist/tui/MonitorApp.js +1 -3
  174. package/dist/tui/MonitorApp.js.map +1 -1
  175. package/dist/tui/MultiServerMonitorApp.d.ts +3 -3
  176. package/dist/tui/MultiServerMonitorApp.d.ts.map +1 -1
  177. package/dist/tui/MultiServerMonitorApp.js +724 -508
  178. package/dist/tui/MultiServerMonitorApp.js.map +1 -1
  179. package/dist/tui/RootNavigator.d.ts.map +1 -1
  180. package/dist/tui/RootNavigator.js +17 -1
  181. package/dist/tui/RootNavigator.js.map +1 -1
  182. package/dist/tui/RouterApp.d.ts +6 -0
  183. package/dist/tui/RouterApp.d.ts.map +1 -0
  184. package/dist/tui/RouterApp.js +928 -0
  185. package/dist/tui/RouterApp.js.map +1 -0
  186. package/dist/tui/SearchApp.d.ts.map +1 -1
  187. package/dist/tui/SearchApp.js +27 -6
  188. package/dist/tui/SearchApp.js.map +1 -1
  189. package/dist/tui/shared/modal-controller.d.ts +65 -0
  190. package/dist/tui/shared/modal-controller.d.ts.map +1 -0
  191. package/dist/tui/shared/modal-controller.js +625 -0
  192. package/dist/tui/shared/modal-controller.js.map +1 -0
  193. package/dist/tui/shared/overlay-utils.d.ts +7 -0
  194. package/dist/tui/shared/overlay-utils.d.ts.map +1 -0
  195. package/dist/tui/shared/overlay-utils.js +54 -0
  196. package/dist/tui/shared/overlay-utils.js.map +1 -0
  197. package/dist/types/admin-config.d.ts +15 -2
  198. package/dist/types/admin-config.d.ts.map +1 -1
  199. package/dist/types/model-info.d.ts +5 -0
  200. package/dist/types/model-info.d.ts.map +1 -1
  201. package/dist/types/router-config.d.ts +2 -2
  202. package/dist/types/router-config.d.ts.map +1 -1
  203. package/dist/types/server-config.d.ts +8 -0
  204. package/dist/types/server-config.d.ts.map +1 -1
  205. package/dist/types/server-config.js +25 -0
  206. package/dist/types/server-config.js.map +1 -1
  207. package/dist/utils/http-log-filter.d.ts +10 -0
  208. package/dist/utils/http-log-filter.d.ts.map +1 -0
  209. package/dist/utils/http-log-filter.js +84 -0
  210. package/dist/utils/http-log-filter.js.map +1 -0
  211. package/dist/utils/log-parser.d.ts.map +1 -1
  212. package/dist/utils/log-parser.js +7 -4
  213. package/dist/utils/log-parser.js.map +1 -1
  214. package/dist/utils/log-utils.d.ts +59 -4
  215. package/dist/utils/log-utils.d.ts.map +1 -1
  216. package/dist/utils/log-utils.js +150 -11
  217. package/dist/utils/log-utils.js.map +1 -1
  218. package/dist/utils/shard-utils.d.ts +72 -0
  219. package/dist/utils/shard-utils.d.ts.map +1 -0
  220. package/dist/utils/shard-utils.js +168 -0
  221. package/dist/utils/shard-utils.js.map +1 -0
  222. package/package.json +18 -4
  223. package/src/launchers/llamacpp-admin +8 -0
  224. package/src/launchers/llamacpp-router +8 -0
  225. package/src/launchers/llamacpp-server +8 -0
  226. package/web/dist/assets/index-Byhoy86V.css +1 -0
  227. package/web/dist/assets/index-HSrgvray.js +50 -0
  228. package/web/dist/index.html +2 -2
  229. package/web/dist/assets/index-Bin89Lwr.css +0 -1
  230. package/web/dist/assets/index-CVmonw3T.js +0 -17
package/README.md CHANGED
@@ -1,5 +1,7 @@
1
1
  # llamacpp-cli
2
2
 
3
+ > **Note:** llamacpp-cli only works on **macOS** and requires [llama.cpp](https://github.com/ggerganov/llama.cpp) to be installed.
4
+
3
5
  > Manage llama.cpp servers like Ollama—but faster. Full control over llama-server with macOS launchctl integration.
4
6
 
5
7
  CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like experience for managing GGUF models and llama-server instances, with **significantly faster response times** than Ollama.
@@ -12,6 +14,7 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
12
14
  ## Features
13
15
 
14
16
  - 🚀 **Easy server management** - Start, stop, and monitor llama.cpp servers
17
+ - 🏷️ **Server aliases** - Friendly, stable identifiers that persist across model changes
15
18
  - 🔀 **Unified router** - Single OpenAI-compatible endpoint for all models with automatic routing and request logging
16
19
  - 🌐 **Admin Interface** - REST API + modern web UI for remote management and automation
17
20
  - 🤖 **Model downloads** - Pull GGUF models from Hugging Face
@@ -19,7 +22,7 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
19
22
  - ⚙️ **Smart defaults** - Auto-configure threads, context size, and GPU layers based on model size
20
23
  - 🔌 **Auto port assignment** - Automatically find available ports (9000-9999)
21
24
  - 📊 **Real-time monitoring TUI** - Multi-server dashboard with drill-down details, live GPU/CPU/memory metrics, token generation speeds, and animated loading states
22
- - 🪵 **Smart logging** - Compact one-line request format with optional full JSON details
25
+ - 🪵 **Unified logging** - Activity logs (HTTP requests) and System logs (diagnostics) for all services
23
26
  - ⚡️ **Optimized metrics** - Batch collection and caching prevent CPU spikes (10x fewer processes)
24
27
 
25
28
  ## Why llamacpp-cli?
@@ -170,17 +173,21 @@ llamacpp
170
173
 
171
174
  ![Server Monitoring TUI](https://raw.githubusercontent.com/appkitstudio/llamacpp-cli/main/docs/images/monitor-detail.png)
172
175
 
173
- ### Overview
176
+ ### Main Features
177
+
178
+ **Dashboard** - Monitor all servers at a glance with real-time metrics (GPU, CPU, memory, token speed)
179
+
180
+ **Server Management** - Create, start, stop, configure, and remove servers with inline editors
181
+
182
+ **Model Management** (press `M`) - Browse local models, search/download from HuggingFace, delete with cascade
174
183
 
175
- The TUI provides a comprehensive interface for:
176
- - **Monitoring** - Real-time metrics for all servers (GPU, CPU, memory, token generation)
177
- - **Server Management** - Create, start, stop, remove, and configure servers
178
- - **Model Management** - Browse, search, download, and delete models
179
- - **Historical Metrics** - View time-series charts of past performance
184
+ **Router Management** (press `R`) - Control router service, view configuration, access activity/system logs
180
185
 
181
- ### Multi-Server Dashboard
186
+ **Historical Charts** (press `H`) - View time-series graphs with Recent (1-3min) or Hour (60min) views
182
187
 
183
- The main view shows all your servers at a glance:
188
+ **Logs** (press `L`) - Toggle between Activity (HTTP) and System (diagnostics) logs with auto-refresh
189
+
190
+ ### Dashboard View
184
191
 
185
192
  ```
186
193
  ┌─────────────────────────────────────────────────────────┐
@@ -190,173 +197,14 @@ The main view shows all your servers at a glance:
190
197
  │ Servers (3 running, 0 stopped) │
191
198
  │ │ Server ID │ Port │ Status │ Slots │ tok/s │
192
199
  │───┼────────────────┼──────┼────────┼───────┼──────────┤
193
- │ ► │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │ (highlighted)
200
+ │ ► │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │
194
201
  │ │ qwen2-7b │ 9001 │ ● RUN │ 1/4 │ 198 │
195
202
  │ │ llama-3-1-8b │ 9002 │ ○ IDLE │ 0/4 │ - │
196
203
  └─────────────────────────────────────────────────────────┘
197
- ↑/↓ Navigate | Enter for details | [N]ew [M]odels [H]istory [Q]uit
204
+ ↑/↓ Navigate | Enter for details | [N]ew [M]odels [R]outer [H]istory [Q]uit
198
205
  ```
199
206
 
200
- **Features:**
201
- - System resource overview (GPU, CPU, memory)
202
- - List of all servers (running and stopped)
203
- - Real-time status updates every 2 seconds
204
- - Color-coded status indicators
205
- - Navigate with arrow keys or vim keys (k/j)
206
-
207
- ### Single-Server Detail View
208
-
209
- Press `Enter` on any server to see detailed information:
210
-
211
- **Running servers show:**
212
- - Server information (status, uptime, model name, endpoint)
213
- - Request metrics (active/idle slots, prompt speed, generation speed)
214
- - Active slots detail (per-slot token generation rates)
215
- - System resources (GPU/CPU/ANE utilization, memory usage)
216
-
217
- **Stopped servers show:**
218
- - Server configuration (threads, context, GPU layers)
219
- - Last activity timestamps
220
- - Quick action commands (start, config, logs)
221
-
222
- ### Models Management
223
-
224
- Press `M` from the main view to access Models Management.
225
-
226
- **Features:**
227
- - Browse all installed models with size and modified date
228
- - View which servers are using each model
229
- - Delete models with cascade option (removes associated servers)
230
- - Search HuggingFace for new models
231
- - Download models with real-time progress tracking
232
-
233
- **Models View:**
234
- - View all GGUF files in scrollable table
235
- - Color-coded server usage (green = safe to delete, yellow = in use)
236
- - Delete selected model with `Enter` or `D` key
237
- - Confirmation dialog with cascade warning
238
-
239
- **Search View** (press `S` from Models view):
240
- - Search HuggingFace models by text input
241
- - Browse results with downloads, likes, and file counts
242
- - Expand model to show available GGUF files
243
- - Download with real-time progress, speed, and ETA
244
- - Cancel download with `ESC` (cleans up partial files)
245
-
246
- ### Server Operations
247
-
248
- **Create Server** (press `N` from main view):
249
- 1. Select model from list (shows existing servers per model)
250
- 2. Edit configuration (threads, context size, GPU layers, port)
251
- 3. Review smart defaults based on model size
252
- 4. Create and automatically start server
253
- 5. Return to main view with new server visible
254
-
255
- **Start/Stop Server** (press `S` from detail view):
256
- - Toggle server state with progress modal
257
- - Stays in detail view after operation
258
- - Shows updated status immediately
259
-
260
- **Remove Server** (press `R` from detail view):
261
- - Confirmation dialog with option to delete model file
262
- - Warns if other servers use the same model
263
- - Cascade deletion removes all associated data
264
- - Returns to main view after deletion
265
-
266
- **Configure Server** (press `C` from detail view):
267
- - Edit all server parameters inline
268
- - Modal dialogs for different field types
269
- - Model migration support (handles server ID changes)
270
- - Automatic restart prompts for running servers
271
- - Port conflict detection and validation
272
-
273
- ### Historical Monitoring
274
-
275
- Press `H` from any view to see historical time-series charts.
276
-
277
- **Single-Server Historical View:**
278
- - Token generation speed over time
279
- - GPU usage (%) with avg/max/min stats
280
- - CPU usage (%) with avg/max/min
281
- - Memory usage (%) with avg/max/min
282
- - Auto-refresh every 3 seconds
283
-
284
- **Multi-Server Historical View:**
285
- - Aggregated metrics across all servers
286
- - Total token generation speed (sum)
287
- - System GPU usage (average)
288
- - Total CPU usage (sum of per-process)
289
- - Total memory usage (sum in GB)
290
-
291
- **View Modes** (toggle with `H` key):
292
-
293
- - **Recent View (default):**
294
- - Shows last 40-80 samples (~1-3 minutes)
295
- - Raw data with no downsampling - perfect accuracy
296
- - Best for: "What's happening right now?"
297
-
298
- - **Hour View:**
299
- - Shows all ~1,800 samples from last hour
300
- - Absolute time-aligned downsampling (30:1 ratio)
301
- - Bucket max for GPU/CPU/token speed (preserves peaks)
302
- - Bucket mean for memory (shows average)
303
- - Chart stays perfectly stable as data streams in
304
- - Best for: "What happened over the last hour?"
305
-
306
- **Data Collection:**
307
- - Automatic during monitoring (piggyback on polling loop)
308
- - Stored in `~/.llamacpp/history/<server-id>.json` per server
309
- - Retention: Last 24 hours (circular buffer, auto-prune)
310
- - File size: ~21 MB per server for 24h @ 2s interval
311
-
312
- ### Keyboard Shortcuts
313
-
314
- **List View (Multi-Server):**
315
- - `↑/↓` or `k/j` - Navigate server list
316
- - `Enter` - View details for selected server
317
- - `N` - Create new server
318
- - `M` - Switch to Models Management
319
- - `H` - View historical metrics (all servers)
320
- - `ESC` - Exit TUI
321
- - `Q` - Quit immediately
322
-
323
- **Detail View (Single-Server):**
324
- - `S` - Start/Stop server (toggles based on status)
325
- - `C` - Open configuration screen
326
- - `R` - Remove server (with confirmation)
327
- - `H` - View historical metrics (this server)
328
- - `ESC` - Back to list view
329
- - `Q` - Quit immediately
330
-
331
- **Models View:**
332
- - `↑/↓` or `k/j` - Navigate model list
333
- - `Enter` or `D` - Delete selected model
334
- - `S` - Open search view
335
- - `R` - Refresh model list
336
- - `ESC` - Back to main view
337
- - `Q` - Quit immediately
338
-
339
- **Search View:**
340
- - `/` or `I` - Focus search input
341
- - `Enter` (in input) - Execute search
342
- - `↑/↓` or `k/j` - Navigate results or files
343
- - `Enter` (on result) - Show GGUF files for model
344
- - `Enter` (on file) - Download/install model
345
- - `R` - Refresh results (re-execute search)
346
- - `ESC` - Back to models view (or results list if viewing files)
347
- - `Q` - Quit immediately
348
-
349
- **Historical View:**
350
- - `H` - Toggle between Recent/Hour view
351
- - `ESC` - Return to live monitoring
352
- - `Q` - Quit immediately
353
-
354
- **Configuration Screen:**
355
- - `↑/↓` or `k/j` - Navigate fields
356
- - `Enter` - Open modal for selected field
357
- - `S` - Save changes (prompts for restart if running)
358
- - `ESC` - Cancel (prompts if unsaved changes)
359
- - `Q` - Quit immediately
207
+ Navigate with arrow keys or vim keys (k/j). Press `Enter` on any server to see detailed metrics, active slots, and resource usage. All keyboard shortcuts are shown in the footer of each view.
360
208
 
361
209
  ### Optional: GPU/CPU Metrics
362
210
 
@@ -377,7 +225,7 @@ The `llamacpp server monitor` command is deprecated. Use `llamacpp` instead to l
377
225
 
378
226
  ## Router (Unified Endpoint)
379
227
 
380
- The router provides a single OpenAI-compatible endpoint that automatically routes requests to the correct backend server based on the model name. This is perfect for LLM clients that don't support multiple endpoints.
228
+ The router provides a single unified endpoint that automatically routes requests to the correct backend server based on the model name. Supports both OpenAI and Anthropic API formats. Perfect for LLM clients that don't support multiple endpoints.
381
229
 
382
230
  ### Quick Start
383
231
 
@@ -396,8 +244,8 @@ llamacpp router start # Start the router service
396
244
  llamacpp router stop # Stop the router service
397
245
  llamacpp router status # Show router status and available models
398
246
  llamacpp router restart # Restart the router
399
- llamacpp router config # Update router settings (--port, --host, --timeout, --health-interval, --verbose)
400
- llamacpp router logs # View router logs (with --follow, --verbose, --clear options)
247
+ llamacpp router config # Update router settings (--port, --host, --timeout, --health-interval)
248
+ llamacpp router logs # View router logs (with --follow, --activity, --system, --clear options)
401
249
  ```
402
250
 
403
251
  ### Usage Example
@@ -417,8 +265,22 @@ response = client.chat.completions.create(
417
265
  model="llama-3.2-3b-instruct-q4_k_m.gguf",
418
266
  messages=[{"role": "user", "content": "Hello!"}]
419
267
  )
268
+
269
+ # Or use server aliases for cleaner code
270
+ response = client.chat.completions.create(
271
+ model="thinking", # Routes to server with alias "thinking"
272
+ messages=[{"role": "user", "content": "Hello!"}]
273
+ )
420
274
  ```
421
275
 
276
+ **Model Name Resolution:**
277
+ The router accepts model names in multiple formats:
278
+ - Full model filename: `llama-3.2-3b-instruct-q4_k_m.gguf`
279
+ - Server alias: `thinking` (set with `--alias` flag)
280
+ - Partial model name: `llama-3.2-3b` (fuzzy match)
281
+
282
+ Aliases provide a stable, friendly identifier that persists across model changes.
283
+
422
284
  ### Supported Endpoints
423
285
 
424
286
  **OpenAI-Compatible:**
@@ -427,8 +289,8 @@ response = client.chat.completions.create(
427
289
  - `GET /v1/models` - List all available models from running servers
428
290
 
429
291
  **Anthropic-Compatible:**
430
- - `POST /v1/messages` - Anthropic Messages API (with tool calling support)
431
- - `POST /v1/messages/count_tokens` - Token counting
292
+ - `POST /v1/messages` - Anthropic Messages API (with streaming and tool calling support)
293
+ - `POST /v1/messages/count_tokens` - Token counting (estimated)
432
294
  - `GET /v1/models/{model}` - Retrieve specific model info
433
295
 
434
296
  **System:**
@@ -451,34 +313,28 @@ llamacpp router config --health-interval 3000 --restart
451
313
  # Change bind address (for remote access)
452
314
  llamacpp router config --host 0.0.0.0 --restart
453
315
 
454
- # Enable verbose logging (saves detailed JSON logs)
455
- llamacpp router config --verbose true --restart
456
-
457
- # Disable verbose logging
458
- llamacpp router config --verbose false --restart
459
316
  ```
460
317
 
461
318
  **Note:** Changes require a restart to take effect. Use `--restart` flag to apply immediately.
462
319
 
463
320
  ### Logging
464
321
 
465
- The router uses separate log streams for different purposes (nginx-style):
322
+ The router provides two log types:
466
323
 
467
- | Log File | Purpose | Content |
468
- |----------|---------|---------|
469
- | `router.stdout` | Request activity | Model routing, status codes, timing, prompts |
470
- | `router.stderr` | System messages | Startup, shutdown, errors, proxy failures |
471
- | `router.log` | Structured JSON | Detailed entries for programmatic parsing (verbose mode) |
324
+ | Log Type | CLI Flag | Content |
325
+ |----------|----------|---------|
326
+ | **Activity** | (default) | Request routing, status codes, timing, backend selection |
327
+ | **System** | `--system` | Startup, shutdown, errors, diagnostic messages |
472
328
 
473
- **View recent logs:**
329
+ **View logs:**
474
330
  ```bash
475
- # Show activity logs (default - stdout)
331
+ # Activity logs (default) - router request routing
476
332
  llamacpp router logs
477
333
 
478
- # Show system logs (errors, startup messages)
479
- llamacpp router logs --stderr
334
+ # System logs - diagnostics and errors
335
+ llamacpp router logs --system
480
336
 
481
- # Follow activity in real-time
337
+ # Follow logs in real-time
482
338
  llamacpp router logs --follow
483
339
 
484
340
  # Show last 10 lines
@@ -487,50 +343,38 @@ llamacpp router logs --lines 10
487
343
 
488
344
  **Log formats:**
489
345
 
490
- Activity logs (stdout):
346
+ Activity logs:
491
347
  ```
492
348
  200 POST /v1/chat/completions → llama-3.2-3b-instruct-q4_k_m.gguf (127.0.0.1:9001) 1234ms | "What is..."
493
349
  404 POST /v1/chat/completions → unknown-model 3ms | "test" | Error: No server found
494
350
  ```
495
351
 
496
- System logs (stderr):
352
+ System logs:
497
353
  ```
498
354
  [Router] Listening on http://127.0.0.1:9100
499
355
  [Router] PID: 12345
500
356
  [Router] Proxy request failed: ECONNREFUSED
501
357
  ```
502
358
 
503
- Verbose JSON logs (router.log) - enable with `--verbose true`:
504
- ```bash
505
- llamacpp router logs --verbose
506
- ```
507
-
508
359
  **Log management:**
509
360
  ```bash
510
- # Clear activity log
361
+ # Clear current log file (activity or system)
511
362
  llamacpp router logs --clear
512
363
 
513
- # Clear all router logs (stdout, stderr, verbose)
364
+ # Clear all router logs (both activity and system)
514
365
  llamacpp router logs --clear-all
515
366
 
516
367
  # Rotate log files with timestamp
517
368
  llamacpp router logs --rotate
518
-
519
- # View system logs instead of activity
520
- llamacpp router logs --stderr
521
369
  ```
522
370
 
523
- **What's logged (activity):**
524
- - ✅ Model name used
525
- - ✅ HTTP status code (color-coded)
371
+ **What's logged:**
372
+ - ✅ Model name and routing decisions
373
+ - ✅ HTTP status codes (color-coded)
526
374
  - ✅ Request duration (ms)
527
- - ✅ Backend server (host:port)
375
+ - ✅ Backend server selection (host:port)
528
376
  - ✅ First 50 chars of prompt
529
- - ✅ Error messages (if failed)
530
-
531
- **Verbose mode benefits:**
532
- - Detailed JSON logs for LLM/script parsing
533
- - Stored in `~/.llamacpp/logs/router.log`
377
+ - ✅ Error messages and diagnostics
534
378
  - Automatic rotation when exceeding 100MB
535
379
  - Machine-readable format with timestamps
536
380
 
@@ -674,8 +518,8 @@ llamacpp admin start # Start admin service
674
518
  llamacpp admin stop # Stop admin service
675
519
  llamacpp admin status # Show status and API key
676
520
  llamacpp admin restart # Restart service
677
- llamacpp admin config # Update settings (--port, --host, --regenerate-key, --verbose)
678
- llamacpp admin logs # View admin logs (with --follow, --clear, --rotate options)
521
+ llamacpp admin config # Update settings (--port, --host, --regenerate-key)
522
+ llamacpp admin logs # View admin logs (with --follow, --activity, --system, --clear options)
679
523
  ```
680
524
 
681
525
  ### REST API
@@ -686,6 +530,8 @@ The Admin API provides full CRUD operations for servers and models via HTTP.
686
530
 
687
531
  **Authentication:** Bearer token (API key auto-generated on first start)
688
532
 
533
+ **API Documentation:** Interactive Swagger UI available at `http://localhost:9200/api-docs`
534
+
689
535
  #### Server Endpoints
690
536
 
691
537
  | Method | Endpoint | Description |
@@ -698,7 +544,7 @@ The Admin API provides full CRUD operations for servers and models via HTTP.
698
544
  | POST | `/api/servers/:id/start` | Start stopped server |
699
545
  | POST | `/api/servers/:id/stop` | Stop running server |
700
546
  | POST | `/api/servers/:id/restart` | Restart server |
701
- | GET | `/api/servers/:id/logs?type=stdout\|stderr&lines=100` | Get server logs |
547
+ | GET | `/api/servers/:id/logs?type=activity\|system\|all&lines=100` | Get server logs (activity=HTTP, system=diagnostics) |
702
548
 
703
549
  #### Model Endpoints
704
550
 
@@ -710,6 +556,17 @@ The Admin API provides full CRUD operations for servers and models via HTTP.
  | GET | `/api/models/search?q=query` | Search HuggingFace |
  | POST | `/api/models/download` | Download model from HF |
 
+ #### Router Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | GET | `/api/router` | Get router status and config |
+ | POST | `/api/router/start` | Start router service |
+ | POST | `/api/router/stop` | Stop router service |
+ | POST | `/api/router/restart` | Restart router service |
+ | PATCH | `/api/router` | Update router config |
+ | GET | `/api/router/logs?type=activity\|system&lines=100` | Get router logs (Activity from stdout, System from stderr) |
+
  #### System Endpoints
 
  | Method | Endpoint | Description |
@@ -750,6 +607,28 @@ curl -X DELETE "http://localhost:9200/api/models/llama-3.2-3b-instruct-q4_k_m.gg
  -H "Authorization: Bearer YOUR_API_KEY"
  ```
 
+ **Get server logs:**
+ ```bash
+ # Activity logs (HTTP requests) - default
+ curl "http://localhost:9200/api/servers/llama-3-2-3b/logs?type=activity&lines=50" \
+ -H "Authorization: Bearer YOUR_API_KEY"
+
+ # System logs (diagnostics)
+ curl "http://localhost:9200/api/servers/llama-3-2-3b/logs?type=system&lines=100" \
+ -H "Authorization: Bearer YOUR_API_KEY"
+ ```
+
+ **Get router logs:**
+ ```bash
+ # Activity logs (router requests)
+ curl "http://localhost:9200/api/router/logs?type=activity&lines=50" \
+ -H "Authorization: Bearer YOUR_API_KEY"
+
+ # System logs (diagnostics)
+ curl "http://localhost:9200/api/router/logs?type=system&lines=100" \
+ -H "Authorization: Bearer YOUR_API_KEY"
+ ```
+
  ### Web UI
 
  The web UI provides a modern, browser-based interface for managing servers and models.
@@ -809,8 +688,8 @@ llamacpp admin config --host 0.0.0.0 --restart
  # Regenerate API key (invalidates old key)
  llamacpp admin config --regenerate-key --restart
 
- # Enable verbose logging
- llamacpp admin config --verbose true --restart
+ # Enable logging
+ llamacpp admin config --logging true --restart
  ```
 
  **Note:** Changes require a restart to take effect. Use `--restart` flag to apply immediately.
@@ -844,29 +723,31 @@ llamacpp admin config --regenerate-key --restart
 
  ### Logging
 
- The admin service maintains separate log streams:
+ The admin service provides two log types:
+
+ | Log Type | CLI Flag | Content |
+ |----------|----------|---------|
+ | **Activity** | `--activity` | HTTP API requests (endpoint, status, duration) |
+ | **System** | `--system` | Startup, shutdown, errors, diagnostic messages |
 
- | Log File | Purpose | Content |
- |----------|---------|---------|
- | `admin.stdout` | Request activity | Endpoint, status, duration |
- | `admin.stderr` | System messages | Startup, shutdown, errors |
+ **Default:** Shows both Activity and System logs (useful for debugging).
 
  **View logs:**
  ```bash
- # Show activity logs (default - stdout)
+ # Both activity and system logs (default)
  llamacpp admin logs
 
- # Show system logs (errors, startup)
- llamacpp admin logs --stderr
+ # Activity logs only (HTTP API requests)
+ llamacpp admin logs --activity
+
+ # System logs only (diagnostics and errors)
+ llamacpp admin logs --system
 
  # Follow in real-time
  llamacpp admin logs --follow
 
  # Clear all logs
  llamacpp admin logs --clear
-
- # Rotate logs with timestamp
- llamacpp admin logs --rotate
  ```
 
  ### Example Output
@@ -910,8 +791,9 @@ Web UI: http://localhost:9200
 
  Configuration:
  Config: ~/.llamacpp/admin.json
- Plist: ~/Library/LaunchAgents/com.llama.admin.plist
- Logs: ~/.llamacpp/logs/admin.{stdout,stderr}
+ Plist: ~/Library/LaunchAgents/studio.appkit.llamacpp-cli.admin.plist
+ Logs: ~/.llamacpp/logs/admin.stdout # Activity logs
+ ~/.llamacpp/logs/admin.stderr # System logs
 
  Quick Commands:
  llamacpp admin stop # Stop service
@@ -1079,8 +961,8 @@ llamacpp logs --rotate
  ```
 
  **Displays:**
- - Current stderr size per server
- - Current stdout size per server
+ - Activity logs (.http) size per server
+ - System logs (.stderr, .stdout) size per server
  - Archived logs size and count
  - Total log usage per server
  - Grand total across all servers
@@ -1093,6 +975,64 @@ llamacpp logs --rotate
 
  **Use case:** Quickly see which servers are accumulating large logs, or clean up all logs at once.
 
+ ## Server Aliases
+
+ Server aliases provide stable, user-friendly identifiers for your servers that persist across model changes. Instead of using auto-generated IDs like `llama-3-2-3b-instruct-q4-k-m`, you can use memorable names like `thinking`, `coder`, or `gpt-oss`.
+
+ ### Why Use Aliases?
+
+ **Stability:** When you change a server's model, the server ID changes (because it's derived from the model name). Aliases stay the same, preventing broken references in scripts and workflows.
+
+ **Convenience:** Shorter, more memorable names are easier to type and read.
+
+ **Router Integration:** Aliases work with the router, allowing cleaner API requests.
+
990
+ ### Usage Examples
991
+
992
+ ```bash
993
+ # Create server with alias
994
+ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --alias thinking
995
+
996
+ # Use alias in all commands
997
+ llamacpp server start thinking
998
+ llamacpp server stop thinking
999
+ llamacpp server logs thinking
1000
+ llamacpp ps thinking
1001
+
1002
+ # Update alias
1003
+ llamacpp server config thinking --alias smart-model
1004
+
1005
+ # Remove alias
1006
+ llamacpp server config thinking --alias ""
1007
+
1008
+ # Alias persists across model changes
1009
+ llamacpp server config thinking --model mistral-7b.gguf --restart
1010
+ llamacpp server start thinking # Still works with new model!
1011
+
1012
+ # Use alias in router requests
1013
+ curl -X POST http://localhost:9100/v1/messages \
1014
+ -H "Content-Type: application/json" \
1015
+ -d '{"model": "thinking", "max_tokens": 100, "messages": [{"role": "user", "content": "Hello"}]}'
1016
+ ```
+
+ ### Validation Rules
+
+ - **Format:** Alphanumeric characters, hyphens, and underscores only
+ - **Length:** 1-64 characters
+ - **Uniqueness:** Case-insensitive (can't have both "Thinking" and "thinking")
+ - **Reserved names:** Cannot use "router", "admin", or "server"
+ - **Storage:** Case-sensitive (preserves your input)
+
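The rules above can be sketched as a small client-side pre-check. This is only an illustration of the documented constraints, not the CLI's actual implementation; the function name `is_valid_alias` is made up:

```bash
# Hypothetical pre-check mirroring the documented alias rules.
is_valid_alias() {
  local a="$1"
  # Length: 1-64 characters
  [ "${#a}" -ge 1 ] && [ "${#a}" -le 64 ] || return 1
  # Format: alphanumeric characters, hyphens, and underscores only
  case "$a" in (*[!A-Za-z0-9_-]*) return 1 ;; esac
  # Reserved names, compared case-insensitively
  case "$(printf '%s' "$a" | tr '[:upper:]' '[:lower:]')" in
    (router|admin|server) return 1 ;;
  esac
  return 0
}

is_valid_alias "thinking" && echo "thinking: ok"
is_valid_alias "Router" || echo "Router: reserved"
```

Note the reserved-name comparison is lowercased first, matching the case-insensitive uniqueness rule above.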
+ ### Lookup Priority
+
+ When you reference a server, the CLI checks identifiers in this order:
+ 1. **Alias** (exact match, case-sensitive)
+ 2. **Port** (if identifier is numeric)
+ 3. **Server ID** (exact match)
+ 4. **Model name** (fuzzy match)
+
+ This means aliases always take precedence, providing predictable behavior.
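A minimal sketch of that resolution order (illustrative only: the alias and server-ID tables below are stand-ins, and the model-name fuzzy match is reduced to a fallback label):

```bash
ALIASES="thinking coder"       # assumed alias table
SERVER_IDS="llama-3-2-3b"      # assumed server-ID table

resolve_server() {
  local id="$1" a s
  for a in $ALIASES; do              # 1. alias, exact match
    [ "$a" = "$id" ] && { echo "alias:$a"; return; }
  done
  case "$id" in                      # 2. all-digits -> treat as port
    (''|*[!0-9]*) ;;
    (*) echo "port:$id"; return ;;
  esac
  for s in $SERVER_IDS; do           # 3. server ID, exact match
    [ "$s" = "$id" ] && { echo "id:$s"; return; }
  done
  echo "fuzzy:$id"                   # 4. fall through to model-name fuzzy match
}

resolve_server thinking        # -> alias:thinking
resolve_server 9000            # -> port:9000
resolve_server llama-3-2-3b    # -> id:llama-3-2-3b
```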
+
  ## Server Management
 
  ### `llamacpp server create <model> [options]`
@@ -1102,11 +1042,21 @@ Create and start a new llama-server instance.
  llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf
  llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --port 8080 --ctx-size 16384 --verbose
 
+ # Create with a friendly alias
+ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --alias thinking
+
+ # Create multiple servers with the same model (different configurations)
+ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --ctx-size 8192 --alias short-context
+ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --ctx-size 32768 --alias long-context
+
  # Enable remote access (WARNING: security implications)
  llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf --host 0.0.0.0
  ```
 
+ **Note:** You can create multiple servers using the same model file with different configurations (context size, GPU layers, etc.). Each server gets a unique ID automatically.
+
  **Options:**
+ - `-a, --alias <name>` - Friendly alias for the server (alphanumeric, hyphens, underscores, 1-64 chars)
  - `-p, --port <number>` - Port number (default: auto-assign from 9000)
  - `-h, --host <address>` - Bind address (default: `127.0.0.1` for localhost only, use `0.0.0.0` for remote access)
  - `-t, --threads <number>` - Thread count (default: half of CPU cores)
@@ -1122,11 +1072,12 @@ Show detailed configuration and status information for a server.
  ```bash
  llamacpp server show llama-3.2-3b # By partial name
  llamacpp server show 9000 # By port
+ llamacpp server show thinking # By alias
  llamacpp server show llama-3-2-3b # By server ID
  ```
 
  **Displays:**
- - Server ID, model name, and path
+ - Server ID, alias (if set), model name, and path
  - Current status (running/stopped/crashed)
  - Host and port
  - PID (process ID)
@@ -1136,7 +1087,7 @@ llamacpp server show llama-3-2-3b # By server ID
  - System paths (plist file, log files)
  - Quick commands for common next actions
 
- **Identifiers:** Port number, server ID, partial model name
+ **Identifiers:** Alias, port number, server ID, partial model name
 
  ### `llamacpp server config <identifier> [options]`
  Update server configuration parameters without recreating the server.
@@ -1145,6 +1096,12 @@ Update server configuration parameters without recreating the server.
  # Change model while keeping all other settings
  llamacpp server config llama-3.2-3b --model llama-3.2-1b-instruct-q4_k_m.gguf --restart
 
+ # Add or update alias
+ llamacpp server config llama-3.2-3b --alias thinking
+
+ # Remove alias (use empty string)
+ llamacpp server config thinking --alias ""
+
  # Update context size and restart
  llamacpp server config llama-3.2-3b --ctx-size 8192 --restart
 
@@ -1162,6 +1119,7 @@ llamacpp server config llama-3.2-3b --threads 8 --ctx-size 16384 --gpu-layers 40
  ```
 
  **Options:**
+ - `-a, --alias <name>` - Set or update alias (use empty string `""` to remove)
  - `-m, --model <filename>` - Update model (filename or path)
  - `-h, --host <address>` - Update bind address (`127.0.0.1` for localhost, `0.0.0.0` for remote access)
  - `-t, --threads <number>` - Update thread count
@@ -1171,22 +1129,23 @@ llamacpp server config llama-3.2-3b --threads 8 --ctx-size 16384 --gpu-layers 40
  - `--no-verbose` - Disable verbose logging
  - `-r, --restart` - Automatically restart server if running
 
- **Note:** Changes require a server restart to take effect. Use `--restart` to automatically stop and start the server with the new configuration.
+ **Note:** Changes require a server restart to take effect. Use `--restart` to automatically stop and start the server with the new configuration. Aliases persist across model changes, providing a stable identifier for your server.
 
  **⚠️ Security Warning:** Using `--host 0.0.0.0` binds the server to all network interfaces, allowing remote access. Only use this if you understand the security implications.
 
- **Identifiers:** Port number, server ID, partial model name
+ **Identifiers:** Alias, port number, server ID, partial model name
 
  ### `llamacpp server start <identifier>`
  Start an existing stopped server.
 
  ```bash
+ llamacpp server start thinking # By alias
  llamacpp server start llama-3.2-3b # By partial name
  llamacpp server start 9000 # By port
  llamacpp server start llama-3-2-3b # By server ID
  ```
 
- **Identifiers:** Port number, server ID, partial model name, or model filename
+ **Identifiers:** Alias, port number, server ID, partial model name, or model filename
 
  ### `llamacpp server run <identifier> [options]`
  Run an interactive chat session with a model, or send a single message.
@@ -1226,41 +1185,44 @@ llamacpp server rm 9000
  ```
 
  ### `llamacpp server logs <identifier> [options]`
- View server logs with smart filtering.
 
- **Default (verbose enabled):**
- ```bash
- llamacpp server logs llama-3.2-3b
- # Output: 2025-12-09 18:02:23 POST /v1/chat/completions 127.0.0.1 200 "What is..." 305 22 1036
- ```
+ View server logs with flexible filtering.
+
+ **Log Types:**
+ - **Activity logs** (default): HTTP request/response logs in compact format
+ - **System logs** (`--system`): Server diagnostic output (stderr + stdout)
 
- **Without `--verbose` on server:**
+ **Basic usage:**
  ```bash
+ # Activity logs (default) - HTTP requests
  llamacpp server logs llama-3.2-3b
- # Output: Only internal server logs (cache, slots) - no HTTP request logs
- ```
-
- **More examples:**
+ # Output: 2025-12-09 18:02:23 POST /v1/chat/completions 127.0.0.1 200 "What is..." 305 22 1036
 
- # Full HTTP JSON request/response
- llamacpp server logs llama-3.2-3b --http
+ # System logs - diagnostics and errors
+ llamacpp server logs llama-3.2-3b --system
 
  # Follow logs in real-time
  llamacpp server logs llama-3.2-3b --follow
 
- # Last 100 requests
+ # Last 100 lines
  llamacpp server logs llama-3.2-3b --lines 100
+ ```
 
- # Show only errors
- llamacpp server logs llama-3.2-3b --errors
+ **Advanced filtering:**
+ ```bash
+ # System logs with errors only
+ llamacpp server logs llama-3.2-3b --system --errors
 
- # Show all messages (including debug internals)
- llamacpp server logs llama-3.2-3b --verbose
+ # Custom grep pattern
+ llamacpp server logs llama-3.2-3b --system --filter "error|warning"
 
- # Custom filter pattern
- llamacpp server logs llama-3.2-3b --filter "error|warning"
+ # Include health check requests (filtered by default)
+ llamacpp server logs llama-3.2-3b --include-health
+ ```
 
- # Clear log file (truncate to zero bytes)
+ **Log management:**
+ ```bash
+ # Clear current log file (truncate to zero bytes)
  llamacpp server logs llama-3.2-3b --clear
 
  # Delete only archived logs (preserves current)
@@ -1276,15 +1238,15 @@ llamacpp server logs llama-3.2-3b --rotate
  **Options:**
  - `-f, --follow` - Follow log output in real-time
  - `-n, --lines <number>` - Number of lines to show (default: 50)
- - `--http` - Show full HTTP JSON request/response logs
- - `--errors` - Show only error messages
- - `--verbose` - Show all messages including debug internals
+ - `--activity` - Show HTTP activity logs (default)
+ - `--system` - Show system logs (all server output)
+ - `--errors` - Filter system logs for errors only
  - `--filter <pattern>` - Custom grep pattern for filtering
- - `--stdout` - Show stdout instead of stderr (rarely needed)
+ - `--include-health` - Include health check requests (/health, /slots, /props)
  - `--clear` - Clear (truncate) log file to zero bytes
  - `--clear-archived` - Delete only archived logs (preserves current logs)
  - `--clear-all` - Clear current logs AND delete all archived logs (frees most space)
- - `--rotate` - Rotate log file with timestamp (e.g., `server.2026-01-22-19-30-00.stderr`)
+ - `--rotate` - Rotate log file with timestamp (e.g., `server.2026-01-22-19-30-00.http`)
 
  **Automatic Log Rotation:**
  Logs are automatically rotated when they exceed 100MB during:
@@ -1293,9 +1255,7 @@ Logs are automatically rotated when they exceed 100MB during:
 
  Rotated logs are saved with timestamps in the same directory: `~/.llamacpp/logs/`
 
- **Output Formats:**
-
- Default compact format:
+ **Activity Log Format:**
  ```
  TIMESTAMP METHOD ENDPOINT IP STATUS "MESSAGE..." TOKENS_IN TOKENS_OUT TIME_MS
  ```
@@ -1304,10 +1264,7 @@ The compact format shows one line per HTTP request and includes:
  - User's message (first 50 characters)
  - Token counts (prompt tokens in, completion tokens out)
  - Total response time in milliseconds
-
- **Note:** Verbose logging is now enabled by default. HTTP request logs are available by default.
-
- Use `--http` to see full request/response JSON, or `--verbose` option to see all internal server logs.
+ - Health checks filtered by default (use `--include-health` to show)
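Because each activity line is a single space-separated record, it is easy to post-process with standard tools. A small sketch (the sample line is the one from the docs above; the field positions are an assumption based on the documented format, and the token/time fields are read from the right since the quoted message can contain spaces):

```bash
# Pull the status code and token counts out of a compact activity log line.
line='2025-12-09 18:02:23 POST /v1/chat/completions 127.0.0.1 200 "What is..." 305 22 1036'

echo "$line" | awk '{
  # 6th field: HTTP status; last three fields: tokens in, tokens out, ms
  printf "status=%s in=%s out=%s ms=%s\n", $6, $(NF-2), $(NF-1), $NF
}'
# -> status=200 in=305 out=22 ms=1036
```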
 
  ## Configuration
 
@@ -1320,11 +1277,14 @@ llamacpp-cli stores its configuration in `~/.llamacpp/`:
  ├── admin.json # Admin service configuration (includes API key)
  ├── servers/ # Server configurations
  │ └── <server-id>.json
- ├── logs/ # Server logs
- │ ├── <server-id>.stdout
- │ ├── <server-id>.stderr
- │ ├── router.{stdout,stderr,log}
- └── admin.{stdout,stderr}
+ ├── logs/ # All service logs
+ │ ├── <server-id>.http # Activity: HTTP request logs
+ │ ├── <server-id>.stderr # System: diagnostics
+ │ ├── <server-id>.stdout # System: additional output
+ │ ├── router.stdout # Router activity logs
+ │ ├── router.stderr # Router system logs
+ │ ├── admin.stdout # Admin activity logs
+ │ └── admin.stderr # Admin system logs
  └── history/ # Historical metrics (TUI)
  └── <server-id>.json
  ```
@@ -1342,6 +1302,12 @@ llamacpp-cli automatically configures optimal settings based on model size:
 
  All servers include `--embeddings` and `--jinja` flags by default.
 
+ **GPU Layers explained:**
+ - **Default: 60** - Conservative value that works reliably on all Apple Silicon devices
+ - **-1 (all)** - Maximum performance, uses all available GPU layers. May cause OOM on very large models with limited VRAM.
+ - **0 (CPU only)** - Useful for testing or when GPU is busy with other tasks
+ - **Specific number** - Fine-tune based on your GPU memory and model size
+
  ## How It Works
 
  llamacpp-cli uses macOS launchctl to manage llama-server processes:
@@ -1351,7 +1317,7 @@ llamacpp-cli uses macOS launchctl to manage llama-server processes:
  3. Starts the server with `launchctl start`
  4. Monitors status via `launchctl list` and `lsof`
 
- Services are named `com.llama.<model-id>`.
+ Services are named `studio.appkit.llamacpp-cli.<model-id>`.
 
  **Auto-Restart Behavior:**
  - When you **start** a server, it's registered with launchd and will auto-restart on crash
@@ -1359,8 +1325,8 @@ Services are named `com.llama.<model-id>`.
  - Crashed servers will automatically restart (when loaded)
 
  **Router and Admin Services:**
- - The **Router** (`com.llama.router`) provides a unified OpenAI-compatible endpoint for all models
- - The **Admin** (`com.llama.admin`) provides REST API + web UI for remote management
+ - The **Router** (`studio.appkit.llamacpp-cli.router`) provides a unified OpenAI-compatible endpoint for all models
+ - The **Admin** (`studio.appkit.llamacpp-cli.admin`) provides REST API + web UI for remote management
  - Both run as launchctl services similar to individual model servers
 
  ## Known Limitations
@@ -1421,6 +1387,36 @@ Or regenerate a new one:
  llamacpp admin config --regenerate-key --restart
  ```
 
+ ### `llamacpp migrate-labels`
+ Migrate service labels from old format (`com.llama.*`) to new format (`studio.appkit.llamacpp-cli.*`).
+
+ > **Note:** This command is automatically triggered on first run after upgrading from versions prior to v2.1.0.
+
+ ```bash
+ # Show what would be migrated without making changes
+ llamacpp migrate-labels --dry-run
+
+ # Perform migration (with confirmation prompt)
+ llamacpp migrate-labels
+
+ # Skip confirmation prompt
+ llamacpp migrate-labels --force
+ ```
+
+ **What it does:**
+ 1. Creates a backup of all current configurations
+ 2. Stops running services
+ 3. Updates service labels and plist files
+ 4. Restarts services that were running
+ 5. Creates a marker file to prevent re-migration
+
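The label rewrite in step 3 amounts to swapping the reverse-DNS prefix. Sketched below for illustration only (both label formats are documented above; the variable names are made up):

```bash
# Map an old-format launchd label to the new format.
old_label="com.llama.llama-3-2-3b"
new_label="studio.appkit.llamacpp-cli.${old_label#com.llama.}"
echo "$new_label"   # studio.appkit.llamacpp-cli.llama-3-2-3b
```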
+ **Troubleshooting:**
+ If migration fails, configurations are automatically rolled back. You can also roll back manually:
+
+ ```bash
+ llamacpp rollback-labels
+ ```
+
  ## Development
 
  ### CLI Development
@@ -1536,7 +1532,7 @@ Contributions are welcome! If you'd like to contribute:
  **CLI Development:**
  - Use `npm run dev -- <command>` to test commands without building
  - Check logs with `llamacpp server logs <server> --errors` when debugging
- - Test launchctl integration with `launchctl list | grep com.llama`
+ - Test launchctl integration with `launchctl list | grep studio.appkit.llamacpp-cli`
  - All server configs are in `~/.llamacpp/servers/`
  - Test interactive chat with `npm run dev -- server run <model>`