@appkit/llamacpp-cli 1.5.0 → 1.7.0

Files changed (124)
  1. package/CHANGELOG.md +20 -0
  2. package/MONITORING-ACCURACY-FIX.md +199 -0
  3. package/PER-PROCESS-METRICS.md +190 -0
  4. package/README.md +124 -9
  5. package/dist/cli.js +32 -7
  6. package/dist/cli.js.map +1 -1
  7. package/dist/commands/config.d.ts.map +1 -1
  8. package/dist/commands/config.js +15 -1
  9. package/dist/commands/config.js.map +1 -1
  10. package/dist/commands/create.d.ts.map +1 -1
  11. package/dist/commands/create.js +12 -4
  12. package/dist/commands/create.js.map +1 -1
  13. package/dist/commands/delete.js +12 -10
  14. package/dist/commands/delete.js.map +1 -1
  15. package/dist/commands/logs-all.d.ts +9 -0
  16. package/dist/commands/logs-all.d.ts.map +1 -0
  17. package/dist/commands/logs-all.js +209 -0
  18. package/dist/commands/logs-all.js.map +1 -0
  19. package/dist/commands/logs.d.ts +4 -0
  20. package/dist/commands/logs.d.ts.map +1 -1
  21. package/dist/commands/logs.js +108 -2
  22. package/dist/commands/logs.js.map +1 -1
  23. package/dist/commands/monitor.d.ts.map +1 -1
  24. package/dist/commands/monitor.js +51 -1
  25. package/dist/commands/monitor.js.map +1 -1
  26. package/dist/commands/ps.d.ts +3 -1
  27. package/dist/commands/ps.d.ts.map +1 -1
  28. package/dist/commands/ps.js +75 -5
  29. package/dist/commands/ps.js.map +1 -1
  30. package/dist/commands/rm.d.ts.map +1 -1
  31. package/dist/commands/rm.js +5 -12
  32. package/dist/commands/rm.js.map +1 -1
  33. package/dist/commands/server-show.d.ts.map +1 -1
  34. package/dist/commands/server-show.js +30 -3
  35. package/dist/commands/server-show.js.map +1 -1
  36. package/dist/commands/start.d.ts.map +1 -1
  37. package/dist/commands/start.js +34 -7
  38. package/dist/commands/start.js.map +1 -1
  39. package/dist/commands/stop.js +3 -3
  40. package/dist/commands/stop.js.map +1 -1
  41. package/dist/lib/history-manager.d.ts +46 -0
  42. package/dist/lib/history-manager.d.ts.map +1 -0
  43. package/dist/lib/history-manager.js +157 -0
  44. package/dist/lib/history-manager.js.map +1 -0
  45. package/dist/lib/metrics-aggregator.d.ts +2 -1
  46. package/dist/lib/metrics-aggregator.d.ts.map +1 -1
  47. package/dist/lib/metrics-aggregator.js +15 -4
  48. package/dist/lib/metrics-aggregator.js.map +1 -1
  49. package/dist/lib/system-collector.d.ts +9 -4
  50. package/dist/lib/system-collector.d.ts.map +1 -1
  51. package/dist/lib/system-collector.js +29 -28
  52. package/dist/lib/system-collector.js.map +1 -1
  53. package/dist/tui/HistoricalMonitorApp.d.ts +5 -0
  54. package/dist/tui/HistoricalMonitorApp.d.ts.map +1 -0
  55. package/dist/tui/HistoricalMonitorApp.js +490 -0
  56. package/dist/tui/HistoricalMonitorApp.js.map +1 -0
  57. package/dist/tui/MonitorApp.d.ts.map +1 -1
  58. package/dist/tui/MonitorApp.js +84 -62
  59. package/dist/tui/MonitorApp.js.map +1 -1
  60. package/dist/tui/MultiServerMonitorApp.d.ts +1 -1
  61. package/dist/tui/MultiServerMonitorApp.d.ts.map +1 -1
  62. package/dist/tui/MultiServerMonitorApp.js +293 -77
  63. package/dist/tui/MultiServerMonitorApp.js.map +1 -1
  64. package/dist/types/history-types.d.ts +30 -0
  65. package/dist/types/history-types.d.ts.map +1 -0
  66. package/dist/types/history-types.js +11 -0
  67. package/dist/types/history-types.js.map +1 -0
  68. package/dist/types/monitor-types.d.ts +1 -0
  69. package/dist/types/monitor-types.d.ts.map +1 -1
  70. package/dist/types/server-config.d.ts +1 -0
  71. package/dist/types/server-config.d.ts.map +1 -1
  72. package/dist/types/server-config.js.map +1 -1
  73. package/dist/utils/downsample-utils.d.ts +35 -0
  74. package/dist/utils/downsample-utils.d.ts.map +1 -0
  75. package/dist/utils/downsample-utils.js +107 -0
  76. package/dist/utils/downsample-utils.js.map +1 -0
  77. package/dist/utils/file-utils.d.ts +6 -0
  78. package/dist/utils/file-utils.d.ts.map +1 -1
  79. package/dist/utils/file-utils.js +38 -0
  80. package/dist/utils/file-utils.js.map +1 -1
  81. package/dist/utils/log-utils.d.ts +43 -0
  82. package/dist/utils/log-utils.d.ts.map +1 -0
  83. package/dist/utils/log-utils.js +190 -0
  84. package/dist/utils/log-utils.js.map +1 -0
  85. package/dist/utils/process-utils.d.ts +19 -1
  86. package/dist/utils/process-utils.d.ts.map +1 -1
  87. package/dist/utils/process-utils.js +79 -1
  88. package/dist/utils/process-utils.js.map +1 -1
  89. package/docs/images/.gitkeep +1 -0
  90. package/package.json +3 -1
  91. package/src/cli.ts +32 -7
  92. package/src/commands/config.ts +15 -1
  93. package/src/commands/create.ts +14 -5
  94. package/src/commands/delete.ts +10 -10
  95. package/src/commands/logs-all.ts +251 -0
  96. package/src/commands/logs.ts +138 -2
  97. package/src/commands/monitor.ts +21 -1
  98. package/src/commands/ps.ts +88 -5
  99. package/src/commands/rm.ts +5 -12
  100. package/src/commands/server-show.ts +35 -3
  101. package/src/commands/start.ts +35 -7
  102. package/src/commands/stop.ts +3 -3
  103. package/src/lib/history-manager.ts +172 -0
  104. package/src/lib/metrics-aggregator.ts +18 -5
  105. package/src/lib/system-collector.ts +31 -28
  106. package/src/tui/HistoricalMonitorApp.ts +548 -0
  107. package/src/tui/MonitorApp.ts +89 -64
  108. package/src/tui/MultiServerMonitorApp.ts +348 -103
  109. package/src/types/history-types.ts +39 -0
  110. package/src/types/monitor-types.ts +1 -0
  111. package/src/types/server-config.ts +1 -0
  112. package/src/utils/downsample-utils.ts +128 -0
  113. package/src/utils/file-utils.ts +40 -0
  114. package/src/utils/log-utils.ts +178 -0
  115. package/src/utils/process-utils.ts +85 -1
  116. package/test-load.sh +100 -0
  117. package/dist/tui/components/ErrorState.d.ts +0 -8
  118. package/dist/tui/components/ErrorState.d.ts.map +0 -1
  119. package/dist/tui/components/ErrorState.js +0 -22
  120. package/dist/tui/components/ErrorState.js.map +0 -1
  121. package/dist/tui/components/LoadingState.d.ts +0 -8
  122. package/dist/tui/components/LoadingState.d.ts.map +0 -1
  123. package/dist/tui/components/LoadingState.js +0 -21
  124. package/dist/tui/components/LoadingState.js.map +0 -1
package/CHANGELOG.md CHANGED
@@ -2,6 +2,26 @@
2
2
 
3
3
  All notable changes to this project will be documented in this file. See [commit-and-tag-version](https://github.com/absolute-version/commit-and-tag-version) for commit guidelines.
4
4
 
5
+ ## [1.7.0](https://github.com/appkitstudio/llamacpp-cli/compare/v1.6.0...v1.7.0) (2026-01-23)
6
+
7
+
8
+ ### Features
9
+
10
+ * add log management commands and auto-rotation for server logs ([e670a53](https://github.com/appkitstudio/llamacpp-cli/commit/e670a53a712d04267f06327af730dc2429e4ab43))
11
+
12
+ ## [1.6.0](https://github.com/appkitstudio/llamacpp-cli/compare/v1.5.0...v1.6.0) (2026-01-17)
13
+
14
+
15
+ ### Features
16
+
17
+ * add full-hour downsampling functions and enhance multi-server monitor UI with dynamic server ID width ([ae2862a](https://github.com/appkitstudio/llamacpp-cli/commit/ae2862acba905cddf60f0e7c30f6a7867391a5e2))
18
+ * add GPU memory tracking to server monitoring ([bc59c6a](https://github.com/appkitstudio/llamacpp-cli/commit/bc59c6a74580e428ab674167146caea47d8a32c1))
19
+ * enhance monitoring functionality with server status updates and improved resource tracking ([45fb833](https://github.com/appkitstudio/llamacpp-cli/commit/45fb833da5efe023a2271e7bd12d780a71474629))
20
+ * enhance multi-server monitor UI with improved navigation and selection indicators ([9e57cfb](https://github.com/appkitstudio/llamacpp-cli/commit/9e57cfb8ce93a2c561981598cf75f0e4ff1a477d))
21
+ * enhance server monitoring with interactive dashboard and improved metrics display ([fba8d79](https://github.com/appkitstudio/llamacpp-cli/commit/fba8d79ee58ecd7ccfe02e319ae7bf5474b591df))
22
+ * implement per-process metrics for historical monitoring accuracy ([cc59df0](https://github.com/appkitstudio/llamacpp-cli/commit/cc59df069775031de1bfacdeb3a462a17610e4eb))
23
+ * improve historical monitoring UI with faster refresh rate and enhanced display elements ([e0ce04b](https://github.com/appkitstudio/llamacpp-cli/commit/e0ce04ba258f6d945a977c39f056ba22cb324c70))
24
+
5
25
  ## [1.5.0](https://github.com/appkitstudio/llamacpp-cli/compare/v1.4.1...v1.5.0) (2026-01-13)
6
26
 
7
27
 
package/MONITORING-ACCURACY-FIX.md ADDED
@@ -0,0 +1,199 @@
1
+ # Historical Monitoring Accuracy Fix
2
+
3
+ **STATUS:** This document describes the initial fix attempt for memory calculations. However, the root issue was that historical monitoring was showing **system-wide metrics** instead of **per-process metrics**. See `PER-PROCESS-METRICS.md` for the correct implementation.
4
+
5
+ ## Issue Summary
6
+
7
+ Comparison between our historical monitoring and macmon revealed discrepancies in memory usage calculations.
8
+
9
+ ## Issues Identified
10
+
11
+ ### 1. Memory Total Calculation (CRITICAL)
12
+
13
+ **Problem:** Total memory was calculated by summing all vm_stat page counts, which doesn't equal the actual installed RAM.
14
+
15
+ **Evidence:**
16
+ - Historical monitor showed: ~60% memory usage
17
+ - macmon showed: 26.86 / 32.0 GB = ~84% memory usage
18
+ - The denominator (32.0 GB installed RAM) was being calculated incorrectly
19
+
20
+ **Root Cause:**
21
+ ```typescript
22
+ // OLD CODE (INCORRECT)
23
+ const totalPages = pagesActive + pagesWired + pagesCompressed +
24
+   pagesFree + pagesInactive + pagesSpeculative;
25
+ const memoryTotal = totalPages * pageSize;
26
+ ```
27
+
28
+ This approach has fundamental flaws:
29
+ - vm_stat doesn't report all memory categories (kernel reserved, etc.)
30
+ - Page counts don't sum to actual installed RAM
31
+ - Results in artificially inflated "total" value
32
+ - Makes memory usage appear lower than reality
33
+
34
+ **Fix:**
35
+ ```typescript
36
+ // NEW CODE (CORRECT)
37
+ // Get total installed RAM from sysctl (accurate)
38
+ const memoryTotal = await execCommand('sysctl -n hw.memsize 2>/dev/null');
39
+ ```
40
+
41
+ Use `sysctl hw.memsize` to get actual installed RAM size in bytes. This matches what Activity Monitor and macmon report.
42
+
43
+ ### 2. Memory Used Calculation (VERIFIED CORRECT)
44
+
45
+ **Current approach:**
46
+ ```typescript
47
+ // Used = Active + Wired + Compressed
48
+ const usedPages = pagesActive + pagesWired + pagesCompressed;
49
+ const memoryUsed = usedPages * pageSize;
50
+ ```
51
+
52
+ This formula is **correct** and matches what Activity Monitor and macmon report as "used memory".
53
+
54
+ - **Active:** Recently used memory
55
+ - **Wired:** Kernel memory that can't be paged out
56
+ - **Compressed:** Compressed pages in RAM
57
+
58
+ We removed the calculation of unused page types (free, inactive, speculative) since they're not needed.
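A minimal sketch of the used-memory formula above, assuming the usual `vm_stat` text layout (a `page size of N bytes` header followed by `Pages active:`, `Pages wired down:`, and `Pages occupied by compressor:` counters). The helper names and the sample text are illustrative only; this is not the package's actual `parseVmStatOutput`.

```typescript
// Hypothetical helper: read one page counter such as "Pages active:   123456."
function readPages(output: string, label: string): number {
  const match = output.match(new RegExp(`${label}:\\s+(\\d+)`));
  return parseInt(match?.[1] ?? '0', 10);
}

// Used = (active + wired + compressed) pages * page size, in bytes.
function usedMemoryBytes(vmStatOutput: string): number {
  const pageSizeMatch = vmStatOutput.match(/page size of (\d+) bytes/);
  // Assumption: fall back to 16 KB pages (Apple Silicon) if the header is missing.
  const pageSize = parseInt(pageSizeMatch?.[1] ?? '16384', 10);
  const usedPages =
    readPages(vmStatOutput, 'Pages active') +
    readPages(vmStatOutput, 'Pages wired down') +
    readPages(vmStatOutput, 'Pages occupied by compressor');
  return usedPages * pageSize;
}

// Percentage of installed RAM; totalBytes would come from `sysctl -n hw.memsize`.
function memoryPercent(usedBytes: number, totalBytes: number): number {
  return totalBytes > 0 ? (usedBytes / totalBytes) * 100 : 0;
}

// Made-up sample text, for illustration only.
const sample = [
  'Mach Virtual Memory Statistics: (page size of 16384 bytes)',
  'Pages free:                               10000.',
  'Pages active:                            500000.',
  'Pages wired down:                        200000.',
  'Pages occupied by compressor:            100000.',
].join('\n');

const used = usedMemoryBytes(sample); // 800000 pages * 16384 bytes
console.log(memoryPercent(used, 32 * 1024 ** 3).toFixed(1) + '%');
```

Keeping the denominator as a separate input mirrors the fix described in this document: the numerator comes from `vm_stat`, the denominator from `hw.memsize`.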
59
+
60
+ ### 3. CPU Calculation (VERIFIED CORRECT)
61
+
62
+ **Formula:**
63
+ ```typescript
64
+ cpuUsage = ((pcpuUsage * pCoreCount) + (ecpuUsage * eCoreCount)) / totalCores * 100
65
+ ```
66
+
67
+ This weighted average is mathematically correct:
68
+ - macmon reports per-core-type averages (P-CPU: 25%, E-CPU: 36%)
69
+ - Formula computes overall system average: `(25% × 6 + 36% × 4) / 10 = 29.4%`
70
+ - Historical average of 33% is reasonable given fluctuations over time
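The weighted average can be checked with a short sketch. It assumes the per-cluster inputs are fractions between 0 and 1 (which is what the `* 100` in the formula suggests); the function name and the zero-core guard are additions for illustration.

```typescript
// Weighted average of per-cluster CPU usage, returned as a percentage.
// Inputs are fractions in [0, 1], matching the `* 100` in the formula above.
function overallCpuPercent(
  pcpuUsage: number,
  pCoreCount: number,
  ecpuUsage: number,
  eCoreCount: number
): number {
  const totalCores = pCoreCount + eCoreCount;
  if (totalCores === 0) return 0; // guard added for this sketch
  return ((pcpuUsage * pCoreCount + ecpuUsage * eCoreCount) / totalCores) * 100;
}

// The example from the text: 6 P-cores at 25%, 4 E-cores at 36%.
console.log(overallCpuPercent(0.25, 6, 0.36, 4).toFixed(1)); // "29.4"
```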
71
+
72
+ ### 4. GPU Calculation (VERIFIED CORRECT)
73
+
74
+ **Observation:**
75
+ - Historical: Avg: 1.8%, Max: 4.0%, Min: 0.6%
76
+ - macmon snapshot: GPU 4%
77
+
78
+ This is **expected behavior**:
79
+ - GPU is mostly idle (0-2%) between inference requests
80
+ - Spikes to 4% during active token generation
81
+ - Average of 1.8% correctly reflects mostly-idle state
82
+ - Max of 4.0% matches macmon's instantaneous reading
83
+
84
+ ## Changes Made
85
+
86
+ ### `src/lib/system-collector.ts`
87
+
88
+ **1. Removed total memory calculation from vm_stat parsing:**
89
+ ```typescript
90
+ // Now only returns memoryUsed
91
+ private parseVmStatOutput(output: string): { memoryUsed: number }
92
+ ```
93
+
94
+ **2. Added method to get actual installed RAM:**
95
+ ```typescript
96
+ private async getTotalMemory(): Promise<number> {
97
+   const output = await execCommand('sysctl -n hw.memsize 2>/dev/null');
98
+   return parseInt(output.trim(), 10) || 0;
99
+ }
100
+ ```
101
+
102
+ **3. Combined both sources in new method:**
103
+ ```typescript
104
+ private async getMemoryMetrics(): Promise<{
105
+   memoryUsed: number;
106
+   memoryTotal: number;
107
+ }> {
108
+   // Get used memory from vm_stat (active + wired + compressed)
109
+   const vmStatOutput = await execCommand('vm_stat 2>/dev/null');
110
+   const { memoryUsed } = this.parseVmStatOutput(vmStatOutput);
111
+
112
+   // Get total installed RAM from sysctl (accurate)
113
+   const memoryTotal = await this.getTotalMemory();
114
+
115
+   return { memoryUsed, memoryTotal };
116
+ }
117
+ ```
118
+
119
+ **4. Updated collector to use new method:**
120
+ ```typescript
121
+ // Always get memory from vm_stat + sysctl (accurate total from sysctl)
122
+ const memoryMetrics = await this.getMemoryMetrics();
123
+ ```
124
+
125
+ ## Verification
126
+
127
+ After these changes, memory usage should now accurately match macmon and Activity Monitor:
128
+
129
+ **Before:**
130
+ - Total: Calculated from page sum (~40 GB equivalent)
131
+ - Used: 26.86 GB
132
+ - **Percentage: ~60% (WRONG)**
133
+
134
+ **After:**
135
+ - Total: 32.0 GB (from `sysctl hw.memsize`)
136
+ - Used: 26.86 GB (from vm_stat)
137
+ - **Percentage: ~84% (CORRECT)**
138
+
139
+ ## Testing Recommendations
140
+
141
+ 1. **Compare with macmon:**
142
+ ```bash
143
+ # Terminal 1: Run macmon
144
+ macmon
145
+
146
+ # Terminal 2: Monitor server
147
+ npm run dev -- server monitor <server-id>
148
+ ```
149
+
150
+ Memory percentages should now match within 1-2%.
151
+
152
+ 2. **Compare with Activity Monitor:**
153
+ - Open Activity Monitor → Memory tab
154
+ - Check "Memory Used" value
155
+ - Should match historical monitor's memory calculation
156
+
157
+ 3. **Verify historical data:**
158
+ ```bash
159
+ # View historical metrics (press H in monitor)
160
+ npm run dev -- server monitor <server-id>
161
+ # Press 'H' to toggle historical view
162
+ ```
163
+
164
+ Memory usage should now show realistic values (~80-90% on actively used system).
165
+
166
+ 4. **Check edge cases:**
167
+ - Fresh boot (low memory usage ~30-40%)
168
+ - Under load (high memory usage ~85-95%)
169
+ - Multiple servers running (memory should increase proportionally)
170
+
171
+ ## Impact on Historical Data
172
+
173
+ **Note:** Existing historical data was collected with the old (incorrect) calculation.
174
+
175
+ **Options:**
176
+
177
+ 1. **Keep old data as-is** (recommended for now)
178
+ - Historical charts will show old incorrect baseline
179
+ - New data will be accurate going forward
180
+ - Natural transition over 24 hours as old data ages out
181
+
182
+ 2. **Clear history and start fresh:**
183
+ ```bash
184
+ rm ~/.llamacpp/history/*.json
185
+ ```
186
+ - Immediate accuracy
187
+ - Lose historical context
188
+
189
+ ## Related Files
190
+
191
+ - `src/lib/system-collector.ts` - System metrics collection (MODIFIED)
192
+ - `src/lib/history-manager.ts` - History persistence (unchanged)
193
+ - `src/tui/HistoricalMonitorApp.ts` - Historical UI (unchanged)
194
+
195
+ ## References
196
+
197
+ - macOS `vm_stat` documentation: Reports memory in pages (16KB on Apple Silicon)
198
+ - macOS `sysctl` documentation: `hw.memsize` reports installed RAM in bytes
199
+ - Activity Monitor algorithm: Uses active + wired + compressed for "Memory Used"
package/PER-PROCESS-METRICS.md ADDED
@@ -0,0 +1,190 @@
1
+ # Per-Process Metrics Implementation
2
+
3
+ ## Overview
4
+
5
+ Historical monitoring now shows **per-process metrics** for the specific llama-server being monitored, rather than system-wide metrics. This provides accurate resource usage for each model.
6
+
7
+ ## What Changed
8
+
9
+ ### Before (System-Wide)
10
+ - **GPU Usage:** All processes combined
11
+ - **CPU Usage:** All processes combined
12
+ - **Memory Usage:** All processes combined (% of total RAM)
13
+
14
+ ### After (Per-Process)
15
+ - **GPU Usage:** System-wide (unchanged - can't isolate per-process on macOS)
16
+ - **CPU Usage:** Just the llama-server process (from `ps`)
17
+ - **Memory Usage:** Just the llama-server process in GB (from `top`)
18
+
19
+ ## Implementation Details
20
+
21
+ ### 1. Process Metrics Collection
22
+
23
+ **Added CPU collection (`src/utils/process-utils.ts`):**
24
+ ```typescript
25
+ // Batch collection for efficiency
26
+ export async function getBatchProcessCpu(pids: number[]): Promise<Map<number, number | null>>
27
+
28
+ // Single process collection
29
+ export async function getProcessCpu(pid: number): Promise<number | null>
30
+ ```
31
+
32
+ **Features:**
33
+ - Uses `ps -p <pid> -o %cpu` to get per-process CPU percentage
34
+ - 3-second cache to prevent excessive process spawning
35
+ - Batch collection for multi-server monitoring
36
+ - Returns percentage (0-100+, can exceed 100% on multi-core)
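As a rough illustration of the batch path, the sketch below parses `ps` output for several PIDs at once. It assumes the command prints one `<pid> <cpu>` pair per line (for example `ps -p 101,202 -o pid= -o %cpu=`); `parseBatchCpu` is a hypothetical name and not the package's `getBatchProcessCpu`, which also spawns the process and caches results.

```typescript
// Parse output assumed to look like one "<pid> <cpu>" pair per line.
// Lines that do not match (such as a header) are skipped.
function parseBatchCpu(psOutput: string, pids: number[]): Map<number, number | null> {
  // Start with null for every requested pid, so exited processes stay null.
  const result = new Map<number, number | null>();
  for (const pid of pids) result.set(pid, null);

  for (const line of psOutput.split('\n')) {
    const match = line.trim().match(/^(\d+)\s+(\d+(?:\.\d+)?)$/);
    if (!match) continue;
    const pid = parseInt(match[1], 10);
    if (result.has(pid)) result.set(pid, parseFloat(match[2]));
  }
  return result;
}

// PID 303 is missing from the output, so it stays null.
const cpu = parseBatchCpu('  101  45.2\n  202 120.5\n', [101, 202, 303]);
console.log(cpu.get(101), cpu.get(202), cpu.get(303));
```

Pre-filling the map with `null` keeps the "process not running" case explicit, and values above 100 pass through unchanged for multi-core workloads.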
37
+
38
+ ### 2. Type Updates
39
+
40
+ **ServerMetrics interface (`src/types/monitor-types.ts`):**
41
+ ```typescript
42
+ export interface ServerMetrics {
43
+   // ... existing fields
44
+   processMemory?: number; // Already existed
45
+   processCpuUsage?: number; // NEW: Per-process CPU %
46
+ }
47
+ ```
48
+
49
+ **HistorySnapshot interface (`src/types/history-types.ts`):**
50
+ ```typescript
51
+ export interface HistorySnapshot {
52
+   server: {
53
+     // ... existing fields
54
+     processMemory?: number; // Already existed
55
+     processCpuUsage?: number; // NEW: Per-process CPU %
56
+   };
57
+   system?: {
58
+     // ... system-wide metrics (kept for live monitoring)
59
+   };
60
+ }
61
+ ```
62
+
63
+ ### 3. Metrics Collection
64
+
65
+ **MetricsAggregator (`src/lib/metrics-aggregator.ts`):**
66
+ - Added `processCpuUsage` parameter to `collectServerMetrics()`
67
+ - Collects CPU in parallel with other metrics
68
+ - Supports batch collection for multi-server scenarios
69
+
70
+ **HistoryManager (`src/lib/history-manager.ts`):**
71
+ - Saves `processCpuUsage` in snapshots
72
+ - Maintains backward compatibility (optional field)
73
+
74
+ ### 4. Historical Monitor UI
75
+
76
+ **HistoricalMonitorApp (`src/tui/HistoricalMonitorApp.ts`):**
77
+
78
+ **Chart Changes:**
79
+
80
+ **GPU Usage:**
81
+ - **Unchanged:** Still system-wide
82
+ - **Reason:** macOS doesn't provide per-process GPU metrics easily
83
+ - **Label:** "GPU Usage (%)"
84
+
85
+ **CPU Usage:**
86
+ - **Before:** `snapshot.system.cpuUsage` (system-wide)
87
+ - **After:** `snapshot.server.processCpuUsage` (per-process)
88
+ - **Label:** "Process CPU Usage (%)"
89
+ - **Range:** Not forced to 0-100% (can show >100% for multi-threaded workloads)
90
+
91
+ **Memory Usage:**
92
+ - **Before:** `(system.memoryUsed / system.memoryTotal) * 100` (system-wide %)
93
+ - **After:** `processMemory / (1024 * 1024 * 1024)` (per-process GB)
94
+ - **Label:** "Process Memory Usage (GB)"
95
+ - **Format:** Shows 2 decimal places (e.g., "3.45 GB")
96
+ - **Statistics:** Avg, Max, Min in GB
97
+
98
+ **Multi-Server Comparison:**
99
+ - Table also updated to show per-process CPU and memory
100
+ - Memory column now shows GB instead of %
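The bytes-to-GB conversion and the Avg/Max/Min statistics described above could look roughly like the following. This is a sketch under assumptions: `memoryStatsGb` is a hypothetical helper, missing readings are skipped rather than counted as zero, and the spread is a population standard deviation (the document does not say which variant the UI uses).

```typescript
interface Stats { avg: number; max: number; min: number; stddev: number }

const BYTES_PER_GB = 1024 * 1024 * 1024;

// Convert optional byte readings to GB and summarize them.
// Missing readings (old snapshots, stopped process) are skipped, not treated as 0.
function memoryStatsGb(readings: Array<number | null | undefined>): Stats | null {
  const gb = readings
    .filter((v): v is number => typeof v === 'number')
    .map((bytes) => bytes / BYTES_PER_GB);
  if (gb.length === 0) return null;

  const avg = gb.reduce((sum, v) => sum + v, 0) / gb.length;
  const variance = gb.reduce((sum, v) => sum + (v - avg) ** 2, 0) / gb.length;
  return { avg, max: Math.max(...gb), min: Math.min(...gb), stddev: Math.sqrt(variance) };
}

const stats = memoryStatsGb([3 * BYTES_PER_GB, undefined, 4 * BYTES_PER_GB, null]);
if (stats) {
  console.log(
    `Avg: ${stats.avg.toFixed(2)} GB  Max: ${stats.max.toFixed(2)} GB  Min: ${stats.min.toFixed(2)} GB`
  );
}
```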
101
+
102
+ ## Benefits
103
+
104
+ 1. **Accurate Attribution:** See exactly what each model is using
105
+ 2. **Multi-Server Clarity:** Compare resource usage across different models
106
+ 3. **Debugging:** Identify which specific model is consuming resources
107
+ 4. **Capacity Planning:** Understand per-model requirements
108
+
109
+ ## Example Output
110
+
111
+ **Before (System-Wide):**
112
+ ```
113
+ CPU Usage (%)
114
+ Avg: 33.0% (±17.4) Max: 86.6% Min: 12.0%
115
+
116
+ Memory Usage (%)
117
+ Avg: 31.0% (±0.6) Max: 31.9% Min: 29.9%
118
+ ```
119
+
120
+ **After (Per-Process):**
121
+ ```
122
+ Process CPU Usage (%)
123
+ Avg: 45.2% (±12.3) Max: 120.5% Min: 8.1%
124
+
125
+ Process Memory Usage (GB)
126
+ Avg: 3.45 GB (±0.12) Max: 3.67 GB Min: 3.21 GB
127
+ ```
128
+
129
+ ## Edge Cases Handled
130
+
131
+ 1. **Missing Data:** Fields are optional, gracefully handles old snapshots
132
+ 2. **Process Not Running:** Returns null, charts skip those data points
133
+ 3. **Multi-Core:** CPU can exceed 100% (expected behavior)
134
+ 4. **Cache Expiry:** 3-second TTL prevents stale data
135
+ 5. **Batch Collection:** Efficient when monitoring multiple servers
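The 3-second cache expiry can be pictured with a small TTL cache. The class below is a hypothetical sketch, not the package's implementation; its clock is injectable so that expiry can be exercised without waiting.

```typescript
// Minimal TTL cache. Entries are treated as expired once `ttlMs` has elapsed.
class TtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop the entry and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage with a fake clock and the 3-second TTL mentioned above.
let fakeNow = 0;
const cpuCache = new TtlCache<number, number>(3000, () => fakeNow);
cpuCache.set(101, 45.2);
fakeNow = 2999;
console.log(cpuCache.get(101)); // 45.2 (still fresh)
fakeNow = 3000;
console.log(cpuCache.get(101)); // undefined (expired)
```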
136
+
137
+ ## Testing Recommendations
138
+
139
+ 1. **Single Server:**
140
+ ```bash
141
+ npm run dev -- server monitor <server-id>
142
+ # Press 'H' to view historical data
143
+ ```
144
+ - Verify CPU shows reasonable per-process values (not system-wide)
145
+ - Verify memory shows model size in GB (not total RAM %)
146
+
147
+ 2. **Multi-Server:**
148
+ ```bash
149
+ npm run dev -- server monitor
150
+ # Press 'H' to view comparison table
151
+ ```
152
+ - Verify each server shows different CPU/memory values
153
+ - Verify table shows GB for memory column
154
+
155
+ 3. **Compare with Activity Monitor:**
156
+ - Open Activity Monitor
157
+ - Filter for `llama-server` process
158
+ - CPU % should match within 5-10%
159
+ - Memory should match within 0.1 GB
160
+
161
+ 4. **Compare with `ps`:**
162
+ ```bash
163
+ ps -p <pid> -o %cpu,rss
164
+ ```
165
+ - CPU % should match
166
+ - RSS (memory) should match when converted to GB
167
+
168
+ ## Backward Compatibility
169
+
170
+ - Old history files still work (missing fields treated as undefined)
171
+ - System-wide metrics still collected for live monitoring
172
+ - Live monitoring TUI unchanged (still shows system-wide for context)
173
+ - Only historical view changed to per-process
174
+
175
+ ## Related Files
176
+
177
+ - `src/utils/process-utils.ts` - Added CPU collection functions
178
+ - `src/types/monitor-types.ts` - Added processCpuUsage field
179
+ - `src/types/history-types.ts` - Added processCpuUsage to snapshots
180
+ - `src/lib/metrics-aggregator.ts` - Collects CPU metrics
181
+ - `src/lib/history-manager.ts` - Saves CPU metrics
182
+ - `src/tui/HistoricalMonitorApp.ts` - Displays per-process charts
183
+
184
+ ## Future Improvements
185
+
186
+ 1. **Per-Process GPU:** Investigate Metal API for GPU attribution
187
+ 2. **Network I/O:** Track per-process network usage
188
+ 3. **Disk I/O:** Track per-process disk reads/writes
189
+ 4. **Thread Count:** Show number of threads used by process
190
+ 5. **Context Switches:** Show voluntary/involuntary context switches
package/README.md CHANGED
@@ -15,8 +15,9 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
15
15
  - 🤖 **Model downloads** - Pull GGUF models from Hugging Face
16
16
  - ⚙️ **Smart defaults** - Auto-configure threads, context size, and GPU layers based on model size
17
17
  - 🔌 **Auto port assignment** - Automatically find available ports (9000-9999)
18
- - 📊 **Real-time monitoring TUI** - Live server metrics with GPU/CPU usage, token generation speeds, and active slots
18
+ - 📊 **Real-time monitoring TUI** - Multi-server dashboard with drill-down details, live GPU/CPU/memory metrics, token generation speeds, and animated loading states
19
19
  - 🪵 **Smart logging** - Compact one-line request format with optional full JSON details
20
+ - ⚡️ **Optimized metrics** - Batch collection and caching prevent CPU spikes (10x fewer processes)
20
21
 
21
22
  ## Why llamacpp-cli?
22
23
 
@@ -76,6 +77,9 @@ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf
76
77
  # View running servers
77
78
  llamacpp ps
78
79
 
80
+ # View log sizes for all servers
81
+ llamacpp logs
82
+
79
83
  # Monitor all servers (multi-server dashboard)
80
84
  llamacpp server monitor
81
85
 
@@ -268,6 +272,41 @@ Shows:
268
272
  - Memory usage (RAM consumption)
269
273
  - Uptime (how long server has been running)
270
274
 
275
+ ### `llamacpp logs [options]`
276
+ View log sizes for all servers and perform batch log operations.
277
+
278
+ ```bash
279
+ # Show table of log sizes for all servers
280
+ llamacpp logs
281
+
282
+ # Clear current logs for ALL servers (preserves archives)
283
+ llamacpp logs --clear
284
+
285
+ # Delete only archived logs for ALL servers (preserves current)
286
+ llamacpp logs --clear-archived
287
+
288
+ # Clear current + delete ALL archived logs (maximum cleanup)
289
+ llamacpp logs --clear-all
290
+
291
+ # Rotate ALL server logs with timestamps
292
+ llamacpp logs --rotate
293
+ ```
294
+
295
+ **Displays:**
296
+ - Current stderr size per server
297
+ - Current stdout size per server
298
+ - Archived logs size and count
299
+ - Total log usage per server
300
+ - Grand total across all servers
301
+
302
+ **Batch Operations:**
303
+ - `--clear` - Truncates all current logs to 0 bytes (archives preserved)
304
+ - `--clear-archived` - Deletes only archived logs (current logs preserved)
305
+ - `--clear-all` - Clears current AND deletes all archives (frees maximum space)
306
+ - `--rotate` - Archives all current logs with timestamps
307
+
308
+ **Use case:** Quickly see which servers are accumulating large logs, or clean up all logs at once.
309
+
271
310
  ## Server Management
272
311
 
273
312
  ### `llamacpp server create <model> [options]`
@@ -430,6 +469,18 @@ llamacpp server logs llama-3.2-3b --verbose
430
469
 
431
470
  # Custom filter pattern
432
471
  llamacpp server logs llama-3.2-3b --filter "error|warning"
472
+
473
+ # Clear log file (truncate to zero bytes)
474
+ llamacpp server logs llama-3.2-3b --clear
475
+
476
+ # Delete only archived logs (preserves current)
477
+ llamacpp server logs llama-3.2-3b --clear-archived
478
+
479
+ # Clear current AND delete all archived logs
480
+ llamacpp server logs llama-3.2-3b --clear-all
481
+
482
+ # Rotate log file with timestamp (preserves old logs)
483
+ llamacpp server logs llama-3.2-3b --rotate
433
484
  ```
434
485
 
435
486
  **Options:**
@@ -440,6 +491,17 @@ llamacpp server logs llama-3.2-3b --filter "error|warning"
440
491
  - `--verbose` - Show all messages including debug internals
441
492
  - `--filter <pattern>` - Custom grep pattern for filtering
442
493
  - `--stdout` - Show stdout instead of stderr (rarely needed)
494
+ - `--clear` - Clear (truncate) log file to zero bytes
495
+ - `--clear-archived` - Delete only archived logs (preserves current logs)
496
+ - `--clear-all` - Clear current logs AND delete all archived logs (frees most space)
497
+ - `--rotate` - Rotate log file with timestamp (e.g., `server.2026-01-22-19-30-00.stderr`)
498
+
499
+ **Automatic Log Rotation:**
500
+ Logs are automatically rotated when they exceed 100MB during:
501
+ - `llamacpp server start <identifier>` - Rotates before starting
502
+ - `llamacpp server config <identifier> --restart` - Rotates before restarting
503
+
504
+ Rotated logs are saved with timestamps in the same directory: `~/.llamacpp/logs/`
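A sketch of the rotation rule and the timestamped name format follows. It rests on two assumptions that the README does not settle: "100MB" is read as 100 * 1024 * 1024 bytes, and the timestamp is built from UTC fields (the real tool may well use local time). Both function names are hypothetical.

```typescript
// Assumption: "100MB" means 100 * 1024 * 1024 bytes.
const ROTATE_THRESHOLD_BYTES = 100 * 1024 * 1024;

function shouldRotate(sizeBytes: number): boolean {
  return sizeBytes > ROTATE_THRESHOLD_BYTES;
}

// Build a name like "server.2026-01-22-19-30-00.stderr".
// Assumption: UTC fields are used so the result is deterministic.
function rotatedLogName(base: string, stream: 'stdout' | 'stderr', when: Date): string {
  const pad = (n: number): string => String(n).padStart(2, '0');
  const stamp = [
    String(when.getUTCFullYear()),
    pad(when.getUTCMonth() + 1),
    pad(when.getUTCDate()),
    pad(when.getUTCHours()),
    pad(when.getUTCMinutes()),
    pad(when.getUTCSeconds()),
  ].join('-');
  return `${base}.${stamp}.${stream}`;
}

console.log(rotatedLogName('server', 'stderr', new Date(Date.UTC(2026, 0, 22, 19, 30, 0))));
// server.2026-01-22-19-30-00.stderr
```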
443
505
 
444
506
  **Output Formats:**
445
507
 
@@ -460,13 +522,15 @@ Use `--http` to see full request/response JSON, or `--verbose` option to see all
460
522
  ### `llamacpp server monitor [identifier]`
461
523
  Real-time monitoring TUI showing server metrics, GPU/CPU usage, and active inference slots.
462
524
 
525
+ ![Server Monitoring TUI](https://raw.githubusercontent.com/dweaver/llamacpp-cli/main/docs/images/monitor-detail.png)
526
+
463
527
  **Two Modes:**
464
528
 
465
529
  **1. Multi-Server Dashboard (no identifier):**
466
530
  ```bash
467
531
  llamacpp server monitor
468
532
  ```
469
- Shows overview of all servers with system resources. Press 1-9 to drill down into individual server details.
533
+ Shows overview of all servers with system resources. Use arrow keys (↑/↓) or vim keys (k/j) to navigate, then press Enter to view server details.
470
534
 
471
535
  **2. Single-Server Monitor (with identifier):**
472
536
  ```bash
@@ -487,13 +551,13 @@ llamacpp server monitor llama-3-2-3b
487
551
  │ GPU: [████░░░] 65% CPU: [███░░░] 38% Memory: 58% │
488
552
  ├─────────────────────────────────────────────────────────┤
489
553
  │ Servers (3 running, 0 stopped) │
490
- # │ Server ID │ Port │ Status │ Slots │ tok/s │
554
+ │ Server ID │ Port │ Status │ Slots │ tok/s │
491
555
  │───┼────────────────┼──────┼────────┼───────┼──────────┤
492
- 1 │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │
493
- 2 │ qwen2-7b │ 9001 │ ● RUN │ 1/4 │ 198 │
494
- 3 │ llama-3-1-8b │ 9002 │ ○ IDLE │ 0/4 │ - │
556
+ │ llama-3-2-3b │ 9000 │ ● RUN │ 2/4 │ 245 │ (highlighted)
557
+ │ qwen2-7b │ 9001 │ ● RUN │ 1/4 │ 198 │
558
+ │ llama-3-1-8b │ 9002 │ ○ IDLE │ 0/4 │ - │
495
559
  └─────────────────────────────────────────────────────────┘
496
- Press 1-9 for details | [Q] Quit
560
+ ↑/↓ Navigate | Enter for details | [H]istory [R]efresh [Q] Quit
497
561
  ```
498
562
 
499
563
  **Single-Server View:**
@@ -504,19 +568,65 @@ Press 1-9 for details | [Q] Quit
504
568
 
505
569
  **Keyboard Shortcuts:**
506
570
  - **Multi-Server Mode:**
507
- - `1-9` - View details for server #N
571
+ - `↑/↓` or `k/j` - Navigate server list
572
+ - `Enter` - View details for selected server
508
573
  - `ESC` - Back to list (from detail view)
574
+ - `H` - View historical metrics
509
575
  - `R` - Force refresh now
510
576
  - `+/-` - Adjust update speed
511
577
  - `Q` - Quit
512
578
  - **Single-Server Mode:**
579
+ - `H` - View historical metrics
513
580
  - `R` - Force refresh now
514
581
  - `+/-` - Adjust update speed
515
582
  - `Q` - Quit
583
+ - **Historical View:**
584
+ - `H` - Toggle Hour View (Recent ↔ Hour)
585
+ - `ESC` - Back to live monitoring
586
+ - `Q` - Quit
587
+
588
+ **Historical Monitoring:**
589
+
590
+ Press `H` from any live monitoring view to see historical time-series charts. The historical view shows:
591
+
592
+ - **Token generation speed** over time with statistics (avg, max, stddev)
593
+ - **GPU usage** over time with min/max/avg
594
+ - **CPU usage** over time with min/max/avg
595
+ - **Memory usage** over time with min/max/avg
596
+
597
+ **View Modes (Toggle with `H` key):**
598
+
599
+ - **Recent View (default):**
600
+ - Shows last 40-80 samples (~1-3 minutes)
601
+ - Raw data with no downsampling - perfect accuracy
602
+ - Best for: "What's happening right now?"
603
+
604
+ - **Hour View:**
605
+ - Shows all ~1,800 samples from last hour
606
+ - **Absolute time-aligned downsampling** (30:1 ratio) - chart stays perfectly stable
607
+ - Bucket boundaries never shift (aligned to round minutes)
608
+ - New samples only affect their own bucket, not the entire chart
609
+ - **Bucket max** for GPU/CPU/token speed (preserves peaks)
610
+ - **Bucket mean** for memory (shows average)
611
+ - Chart labels indicate "Peak per bucket" or "Average per bucket"
612
+ - Best for: "What happened over the last hour?"
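The idea behind time-aligned buckets can be sketched as below. This is an illustration, not the code in `downsample-utils`: the `Sample` shape and the `downsample` signature are assumptions, and the 60-second bucket is inferred from the 30:1 ratio at one sample every 2 seconds.

```typescript
interface Sample { timestamp: number; value: number } // timestamp in ms since epoch

type Reducer = 'max' | 'mean';

// Group samples into fixed buckets whose boundaries are multiples of bucketMs
// (absolute time), so a new sample only ever changes its own bucket.
function downsample(samples: Sample[], bucketMs: number, reducer: Reducer): Sample[] {
  const buckets = new Map<number, number[]>();
  for (const s of samples) {
    const start = Math.floor(s.timestamp / bucketMs) * bucketMs;
    const values = buckets.get(start);
    if (values) values.push(s.value);
    else buckets.set(start, [s.value]);
  }
  return Array.from(buckets.entries())
    .sort(([a], [b]) => a - b)
    .map(([start, values]) => ({
      timestamp: start,
      value:
        reducer === 'max'
          ? Math.max(...values)
          : values.reduce((sum, v) => sum + v, 0) / values.length,
    }));
}

// Three samples straddling a minute boundary fall into two buckets (0 and 60000).
const samples: Sample[] = [
  { timestamp: 58000, value: 10 },
  { timestamp: 60000, value: 40 },
  { timestamp: 62000, value: 20 },
];
console.log(downsample(samples, 60000, 'max'));  // peaks preserved (GPU/CPU/token speed)
console.log(downsample(samples, 60000, 'mean')); // averages (memory)
```

Because bucket starts depend only on the timestamp and not on where the window begins, appending a sample never moves older buckets, which is what keeps the chart stable.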
613
+
614
+ **Note:** The `H` key has two functions:
615
+ - From **live monitoring** → Enter historical view (Recent mode)
616
+ - Within **historical view** → Toggle between Recent and Hour views
617
+
618
+ **Data Collection:**
619
+
620
+ Historical data is automatically collected whenever you run the monitor command. Data is retained for 24 hours in `~/.llamacpp/history/<server-id>.json` files, then automatically pruned.
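The 24-hour retention rule amounts to a timestamp filter. A minimal sketch, assuming each stored snapshot carries a millisecond `timestamp` (the `Snapshot` shape and `pruneOld` name are hypothetical, and whether the boundary is inclusive is a choice made here):

```typescript
interface Snapshot { timestamp: number } // ms since epoch; other fields omitted

const RETENTION_MS = 24 * 60 * 60 * 1000; // 24 hours, per the text above

// Keep only snapshots that fall inside the retention window.
function pruneOld<T extends Snapshot>(snapshots: T[], now: number = Date.now()): T[] {
  const cutoff = now - RETENTION_MS;
  return snapshots.filter((s) => s.timestamp >= cutoff);
}

const referenceTime = Date.UTC(2026, 0, 23, 12, 0, 0);
const kept = pruneOld(
  [
    { timestamp: referenceTime - RETENTION_MS - 1 }, // just past 24 h: dropped
    { timestamp: referenceTime - RETENTION_MS },     // exactly 24 h: kept in this sketch
    { timestamp: referenceTime },
  ],
  referenceTime
);
console.log(kept.length); // 2
```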
621
+
622
+ **Multi-Server Historical View:**
623
+
624
+ From the multi-server dashboard, press `H` to see a summary table comparing average metrics across all servers for the last hour.
516
625
 
517
626
  **Features:**
518
627
  - **Multi-server dashboard** - Monitor all servers at once
519
628
  - **Real-time updates** - Metrics refresh every 2 seconds (adjustable)
629
+ - **Historical monitoring** - View time-series charts of past metrics (press `H` from monitor view)
520
630
  - **Token-per-second calculation** - Shows actual generation speed per slot
521
631
  - **Progress bars** - Visual representation of GPU/CPU/memory usage
522
632
  - **Error recovery** - Shows stale data with warnings if connection lost
@@ -573,7 +683,12 @@ llamacpp-cli uses macOS launchctl to manage llama-server processes:
573
683
  3. Starts the server with `launchctl start`
574
684
  4. Monitors status via `launchctl list` and `lsof`
575
685
 
576
- Services are named `com.llama.<model-id>` and persist across reboots.
686
+ Services are named `com.llama.<model-id>`.
687
+
688
+ **Auto-Restart Behavior:**
689
+ - When you **start** a server, it's registered with launchd and will auto-restart on crash
690
+ - When you **stop** a server, it's unloaded from launchd and stays stopped (no auto-restart)
691
+ - Crashed servers will automatically restart (when loaded)
577
692
 
578
693
  ## Known Limitations
579
694