npm - @appkit/llamacpp-cli - Versions diffs - 1.5.0 → 1.7.0 - Mend

@appkit/llamacpp-cli 1.5.0 → 1.7.0


Files changed (124)

package/CHANGELOG.md +20 -0
package/MONITORING-ACCURACY-FIX.md +199 -0
package/PER-PROCESS-METRICS.md +190 -0
package/README.md +124 -9
package/dist/cli.js +32 -7
package/dist/cli.js.map +1 -1
package/dist/commands/config.d.ts.map +1 -1
package/dist/commands/config.js +15 -1
package/dist/commands/config.js.map +1 -1
package/dist/commands/create.d.ts.map +1 -1
package/dist/commands/create.js +12 -4
package/dist/commands/create.js.map +1 -1
package/dist/commands/delete.js +12 -10
package/dist/commands/delete.js.map +1 -1
package/dist/commands/logs-all.d.ts +9 -0
package/dist/commands/logs-all.d.ts.map +1 -0
package/dist/commands/logs-all.js +209 -0
package/dist/commands/logs-all.js.map +1 -0
package/dist/commands/logs.d.ts +4 -0
package/dist/commands/logs.d.ts.map +1 -1
package/dist/commands/logs.js +108 -2
package/dist/commands/logs.js.map +1 -1
package/dist/commands/monitor.d.ts.map +1 -1
package/dist/commands/monitor.js +51 -1
package/dist/commands/monitor.js.map +1 -1
package/dist/commands/ps.d.ts +3 -1
package/dist/commands/ps.d.ts.map +1 -1
package/dist/commands/ps.js +75 -5
package/dist/commands/ps.js.map +1 -1
package/dist/commands/rm.d.ts.map +1 -1
package/dist/commands/rm.js +5 -12
package/dist/commands/rm.js.map +1 -1
package/dist/commands/server-show.d.ts.map +1 -1
package/dist/commands/server-show.js +30 -3
package/dist/commands/server-show.js.map +1 -1
package/dist/commands/start.d.ts.map +1 -1
package/dist/commands/start.js +34 -7
package/dist/commands/start.js.map +1 -1
package/dist/commands/stop.js +3 -3
package/dist/commands/stop.js.map +1 -1
package/dist/lib/history-manager.d.ts +46 -0
package/dist/lib/history-manager.d.ts.map +1 -0
package/dist/lib/history-manager.js +157 -0
package/dist/lib/history-manager.js.map +1 -0
package/dist/lib/metrics-aggregator.d.ts +2 -1
package/dist/lib/metrics-aggregator.d.ts.map +1 -1
package/dist/lib/metrics-aggregator.js +15 -4
package/dist/lib/metrics-aggregator.js.map +1 -1
package/dist/lib/system-collector.d.ts +9 -4
package/dist/lib/system-collector.d.ts.map +1 -1
package/dist/lib/system-collector.js +29 -28
package/dist/lib/system-collector.js.map +1 -1
package/dist/tui/HistoricalMonitorApp.d.ts +5 -0
package/dist/tui/HistoricalMonitorApp.d.ts.map +1 -0
package/dist/tui/HistoricalMonitorApp.js +490 -0
package/dist/tui/HistoricalMonitorApp.js.map +1 -0
package/dist/tui/MonitorApp.d.ts.map +1 -1
package/dist/tui/MonitorApp.js +84 -62
package/dist/tui/MonitorApp.js.map +1 -1
package/dist/tui/MultiServerMonitorApp.d.ts +1 -1
package/dist/tui/MultiServerMonitorApp.d.ts.map +1 -1
package/dist/tui/MultiServerMonitorApp.js +293 -77
package/dist/tui/MultiServerMonitorApp.js.map +1 -1
package/dist/types/history-types.d.ts +30 -0
package/dist/types/history-types.d.ts.map +1 -0
package/dist/types/history-types.js +11 -0
package/dist/types/history-types.js.map +1 -0
package/dist/types/monitor-types.d.ts +1 -0
package/dist/types/monitor-types.d.ts.map +1 -1
package/dist/types/server-config.d.ts +1 -0
package/dist/types/server-config.d.ts.map +1 -1
package/dist/types/server-config.js.map +1 -1
package/dist/utils/downsample-utils.d.ts +35 -0
package/dist/utils/downsample-utils.d.ts.map +1 -0
package/dist/utils/downsample-utils.js +107 -0
package/dist/utils/downsample-utils.js.map +1 -0
package/dist/utils/file-utils.d.ts +6 -0
package/dist/utils/file-utils.d.ts.map +1 -1
package/dist/utils/file-utils.js +38 -0
package/dist/utils/file-utils.js.map +1 -1
package/dist/utils/log-utils.d.ts +43 -0
package/dist/utils/log-utils.d.ts.map +1 -0
package/dist/utils/log-utils.js +190 -0
package/dist/utils/log-utils.js.map +1 -0
package/dist/utils/process-utils.d.ts +19 -1
package/dist/utils/process-utils.d.ts.map +1 -1
package/dist/utils/process-utils.js +79 -1
package/dist/utils/process-utils.js.map +1 -1
package/docs/images/.gitkeep +1 -0
package/package.json +3 -1
package/src/cli.ts +32 -7
package/src/commands/config.ts +15 -1
package/src/commands/create.ts +14 -5
package/src/commands/delete.ts +10 -10
package/src/commands/logs-all.ts +251 -0
package/src/commands/logs.ts +138 -2
package/src/commands/monitor.ts +21 -1
package/src/commands/ps.ts +88 -5
package/src/commands/rm.ts +5 -12
package/src/commands/server-show.ts +35 -3
package/src/commands/start.ts +35 -7
package/src/commands/stop.ts +3 -3
package/src/lib/history-manager.ts +172 -0
package/src/lib/metrics-aggregator.ts +18 -5
package/src/lib/system-collector.ts +31 -28
package/src/tui/HistoricalMonitorApp.ts +548 -0
package/src/tui/MonitorApp.ts +89 -64
package/src/tui/MultiServerMonitorApp.ts +348 -103
package/src/types/history-types.ts +39 -0
package/src/types/monitor-types.ts +1 -0
package/src/types/server-config.ts +1 -0
package/src/utils/downsample-utils.ts +128 -0
package/src/utils/file-utils.ts +40 -0
package/src/utils/log-utils.ts +178 -0
package/src/utils/process-utils.ts +85 -1
package/test-load.sh +100 -0
package/dist/tui/components/ErrorState.d.ts +0 -8
package/dist/tui/components/ErrorState.d.ts.map +0 -1
package/dist/tui/components/ErrorState.js +0 -22
package/dist/tui/components/ErrorState.js.map +0 -1
package/dist/tui/components/LoadingState.d.ts +0 -8
package/dist/tui/components/LoadingState.d.ts.map +0 -1
package/dist/tui/components/LoadingState.js +0 -21
package/dist/tui/components/LoadingState.js.map +0 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,26 @@
 All notable changes to this project will be documented in this file. See [commit-and-tag-version](https://github.com/absolute-version/commit-and-tag-version) for commit guidelines.
+## [1.7.0](https://github.com/appkitstudio/llamacpp-cli/compare/v1.6.0...v1.7.0) (2026-01-23)
+### Features
+* add log management commands and auto-rotation for server logs ([e670a53](https://github.com/appkitstudio/llamacpp-cli/commit/e670a53a712d04267f06327af730dc2429e4ab43))
+## [1.6.0](https://github.com/appkitstudio/llamacpp-cli/compare/v1.5.0...v1.6.0) (2026-01-17)
+### Features
+* add full-hour downsampling functions and enhance multi-server monitor UI with dynamic server ID width ([ae2862a](https://github.com/appkitstudio/llamacpp-cli/commit/ae2862acba905cddf60f0e7c30f6a7867391a5e2))
+* add GPU memory tracking to server monitoring ([bc59c6a](https://github.com/appkitstudio/llamacpp-cli/commit/bc59c6a74580e428ab674167146caea47d8a32c1))
+* enhance monitoring functionality with server status updates and improved resource tracking ([45fb833](https://github.com/appkitstudio/llamacpp-cli/commit/45fb833da5efe023a2271e7bd12d780a71474629))
+* enhance multi-server monitor UI with improved navigation and selection indicators ([9e57cfb](https://github.com/appkitstudio/llamacpp-cli/commit/9e57cfb8ce93a2c561981598cf75f0e4ff1a477d))
+* enhance server monitoring with interactive dashboard and improved metrics display ([fba8d79](https://github.com/appkitstudio/llamacpp-cli/commit/fba8d79ee58ecd7ccfe02e319ae7bf5474b591df))
+* implement per-process metrics for historical monitoring accuracy ([cc59df0](https://github.com/appkitstudio/llamacpp-cli/commit/cc59df069775031de1bfacdeb3a462a17610e4eb))
+* improve historical monitoring UI with faster refresh rate and enhanced display elements ([e0ce04b](https://github.com/appkitstudio/llamacpp-cli/commit/e0ce04ba258f6d945a977c39f056ba22cb324c70))
 ## [1.5.0](https://github.com/appkitstudio/llamacpp-cli/compare/v1.4.1...v1.5.0) (2026-01-13)

package/MONITORING-ACCURACY-FIX.md ADDED

@@ -0,0 +1,199 @@
+# Historical Monitoring Accuracy Fix
+**STATUS:** This document describes the initial fix attempt for memory calculations. However, the root issue was that historical monitoring was showing **system-wide metrics** instead of **per-process metrics**. See `PER-PROCESS-METRICS.md` for the correct implementation.
+## Issue Summary
+Comparison between our historical monitoring and macmon revealed discrepancies in memory usage calculations.
+## Issues Identified
+### 1. Memory Total Calculation (CRITICAL)
+**Problem:** Total memory was calculated by summing all vm_stat page counts, which doesn't equal the actual installed RAM.
+**Evidence:**
+- Historical monitor showed: ~60% memory usage
+- macmon showed: 26.86 / 32.0 GB = ~84% memory usage
+- The denominator (32.0 GB installed RAM) was being calculated incorrectly
+**Root Cause:**
+```typescript
+// OLD CODE (INCORRECT)
+const totalPages = pagesActive + pagesWired + pagesCompressed +
+                   pagesFree + pagesInactive + pagesSpeculative;
+const memoryTotal = totalPages * pageSize;
+```
+This approach has fundamental flaws:
+- vm_stat doesn't report all memory categories (kernel reserved, etc.)
+- Page counts don't sum to actual installed RAM
+- Results in artificially inflated "total" value
+- Makes memory usage appear lower than reality
+**Fix:**
+```typescript
+// NEW CODE (CORRECT)
+// Get total installed RAM from sysctl (accurate)
+const output = await execCommand('sysctl -n hw.memsize 2>/dev/null');
+const memoryTotal = parseInt(output.trim(), 10) || 0; // bytes; 0 if the command fails
+```
+Use `sysctl hw.memsize` to get actual installed RAM size in bytes. This matches what Activity Monitor and macmon report.
+### 2. Memory Used Calculation (VERIFIED CORRECT)
+**Current approach:**
+```typescript
+// Used = Active + Wired + Compressed
+const usedPages = pagesActive + pagesWired + pagesCompressed;
+const memoryUsed = usedPages * pageSize;
+```
+This formula is **correct** and matches what Activity Monitor and macmon report as "used memory".
+- **Active:** Recently used memory
+- **Wired:** Kernel memory that can't be paged out
+- **Compressed:** Compressed pages in RAM
+We removed the calculation of unused page types (free, inactive, speculative) since they're not needed.
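As an illustration of the "used memory" formula above, here is a minimal sketch of how the relevant `vm_stat` lines could be parsed. The function names `parseUsedMemoryBytes` and `memoryPercent` are hypothetical and are not taken from the package; the 4096-byte fallback page size is likewise an assumption for the case where the header line is missing.

```typescript
// Hypothetical helper: sums active + wired + compressed pages from `vm_stat` text.
function parseUsedMemoryBytes(vmStatOutput: string): number {
  // The header line carries the page size, e.g. "(page size of 16384 bytes)".
  const pageSizeMatch = vmStatOutput.match(/page size of (\d+) bytes/);
  const pageSize = pageSizeMatch ? Number(pageSizeMatch[1]) : 4096; // assumed fallback

  // Read one "<label>:   <count>." line; a missing label counts as 0 pages.
  const pages = (label: string): number => {
    const m = vmStatOutput.match(new RegExp(`^${label}:\\s+(\\d+)\\.?\\s*$`, "m"));
    return m ? Number(m[1]) : 0;
  };

  const usedPages =
    pages("Pages active") +
    pages("Pages wired down") +
    pages("Pages occupied by compressor");
  return usedPages * pageSize;
}

// Share of installed RAM (the `sysctl -n hw.memsize` value) that is in use.
function memoryPercent(usedBytes: number, totalBytes: number): number {
  return totalBytes > 0 ? (usedBytes / totalBytes) * 100 : 0;
}
```

With the figures quoted in this document, `memoryPercent(26.86, 32)` comes out at roughly 84%, which is the value macmon reported.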
+### 3. CPU Calculation (VERIFIED CORRECT)
+**Formula:**
+```typescript
+cpuUsage = ((pcpuUsage * pCoreCount) + (ecpuUsage * eCoreCount)) / totalCores * 100
+```
+This weighted average is mathematically correct:
+- macmon reports per-core-type averages (P-CPU: 25%, E-CPU: 36%)
+- Formula computes overall system average: `(25% × 6 + 36% × 4) / 10 = 29.4%`
+- Historical average of 33% is reasonable given fluctuations over time
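The weighted average above can be checked with a short sketch. The helper name `weightedCpuPercent` is hypothetical; it takes the per-cluster values already expressed in percent, so the final `* 100` from the formula above (which would apply to fractional inputs) is not needed here.

```typescript
// Hypothetical helper: combines per-cluster averages (already in percent)
// into one system-wide percentage, weighted by core count.
function weightedCpuPercent(
  pCpuPercent: number,
  pCoreCount: number,
  eCpuPercent: number,
  eCoreCount: number,
): number {
  const totalCores = pCoreCount + eCoreCount;
  if (totalCores === 0) return 0; // avoid dividing by zero on unknown topology
  return (pCpuPercent * pCoreCount + eCpuPercent * eCoreCount) / totalCores;
}

// Figures from the example above: 6 P-cores at 25%, 4 E-cores at 36%.
const overall = weightedCpuPercent(25, 6, 36, 4);
console.log(overall.toFixed(1)); // prints "29.4"
```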
+### 4. GPU Calculation (VERIFIED CORRECT)
+**Observation:**
+- Historical: Avg: 1.8%, Max: 4.0%, Min: 0.6%
+- macmon snapshot: GPU 4%
+This is **expected behavior**:
+- GPU is mostly idle (0-2%) between inference requests
+- Spikes to 4% during active token generation
+- Average of 1.8% correctly reflects mostly-idle state
+- Max of 4.0% matches macmon's instantaneous reading
+## Changes Made
+### `src/lib/system-collector.ts`
+**1. Removed total memory calculation from vm_stat parsing:**
+```typescript
+// Now only returns memoryUsed
+private parseVmStatOutput(output: string): { memoryUsed: number }
+```
+**2. Added method to get actual installed RAM:**
+```typescript
+private async getTotalMemory(): Promise<number> {
+  const output = await execCommand('sysctl -n hw.memsize 2>/dev/null');
+  return parseInt(output.trim(), 10) || 0;
+}
+```
+**3. Combined both sources in new method:**
+```typescript
+private async getMemoryMetrics(): Promise<{
+  memoryUsed: number;
+  memoryTotal: number;
+}> {
+  // Get used memory from vm_stat (active + wired + compressed)
+  const vmStatOutput = await execCommand('vm_stat 2>/dev/null');
+  const { memoryUsed } = this.parseVmStatOutput(vmStatOutput);
+  // Get total installed RAM from sysctl (accurate)
+  const memoryTotal = await this.getTotalMemory();
+  return { memoryUsed, memoryTotal };
+}
+```
+**4. Updated collector to use new method:**
+```typescript
+// Always get memory from vm_stat + sysctl (accurate total from sysctl)
+const memoryMetrics = await this.getMemoryMetrics();
+```
+## Verification
+After these changes, memory usage should now accurately match macmon and Activity Monitor:
+**Before:**
+- Total: Calculated from page sum (~40 GB equivalent)
+- Used: 26.86 GB
+- **Percentage: ~60% (WRONG)**
+**After:**
+- Total: 32.0 GB (from `sysctl hw.memsize`)
+- Used: 26.86 GB (from vm_stat)
+- **Percentage: ~84% (CORRECT)**
+## Testing Recommendations
+1. **Compare with macmon:**
+   ```bash
+   # Terminal 1: Run macmon
+   macmon
+   # Terminal 2: Monitor server
+   npm run dev -- server monitor <server-id>
+   ```
+   Memory percentages should now match within 1-2%.
+2. **Compare with Activity Monitor:**
+   - Open Activity Monitor → Memory tab
+   - Check "Memory Used" value
+   - Should match historical monitor's memory calculation
+3. **Verify historical data:**
+   ```bash
+   # View historical metrics (press H in monitor)
+   npm run dev -- server monitor <server-id>
+   # Press 'H' to toggle historical view
+   ```
+   Memory usage should now show realistic values (~80-90% on actively used system).
+4. **Check edge cases:**
+   - Fresh boot (low memory usage ~30-40%)
+   - Under load (high memory usage ~85-95%)
+   - Multiple servers running (memory should increase proportionally)
+## Impact on Historical Data
+**Note:** Existing historical data was collected with the old (incorrect) calculation.
+**Options:**
+1. **Keep old data as-is** (recommended for now)
+   - Historical charts will show old incorrect baseline
+   - New data will be accurate going forward
+   - Natural transition over 24 hours as old data ages out
+2. **Clear history and start fresh:**
+   ```bash
+   rm ~/.llamacpp/history/*.json
+   ```
+   - Immediate accuracy
+   - Lose historical context
+## Related Files
+- `src/lib/system-collector.ts` - System metrics collection (MODIFIED)
+- `src/lib/history-manager.ts` - History persistence (unchanged)
+- `src/tui/HistoricalMonitorApp.ts` - Historical UI (unchanged)
+## References
+- macOS `vm_stat` documentation: Reports memory in pages (16KB on Apple Silicon)
+- macOS `sysctl` documentation: `hw.memsize` reports installed RAM in bytes
+- Activity Monitor algorithm: Uses active + wired + compressed for "Memory Used"

package/PER-PROCESS-METRICS.md ADDED


@@ -0,0 +1,190 @@
+# Per-Process Metrics Implementation
+## Overview
+Historical monitoring now shows **per-process metrics** for the specific llama-server being monitored, rather than system-wide metrics. This provides accurate resource usage for each model.
+## What Changed
+### Before (System-Wide)
+- **GPU Usage:** All processes combined
+- **CPU Usage:** All processes combined
+- **Memory Usage:** All processes combined (% of total RAM)
+### After (Per-Process)
+- **GPU Usage:** System-wide (unchanged - can't isolate per-process on macOS)
+- **CPU Usage:** Just the llama-server process (from `ps`)
+- **Memory Usage:** Just the llama-server process in GB (from `top`)
+## Implementation Details
+### 1. Process Metrics Collection
+**Added CPU collection (`src/utils/process-utils.ts`):**
+```typescript
+// Batch collection for efficiency
+export async function getBatchProcessCpu(pids: number[]): Promise<Map<number, number | null>>
+// Single process collection
+export async function getProcessCpu(pid: number): Promise<number | null>
+```
+**Features:**
+- Uses `ps -p <pid> -o %cpu` to get per-process CPU percentage
+- 3-second cache to prevent excessive process spawning
+- Batch collection for multi-server monitoring
+- Returns percentage (0-100+, can exceed 100% on multi-core)
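To make the batch collection and caching ideas above concrete, here is a rough sketch. It does not reproduce the package's `getBatchProcessCpu`; `parsePsCpu` and `TtlCache` are hypothetical names, and the sketch assumes the batch command is `ps -p <pid,pid,...> -o pid=,%cpu=`, where the trailing `=` suppresses the header row.

```typescript
// Hypothetical parser for the output of `ps -p <pid,pid,...> -o pid=,%cpu=`.
// PIDs missing from the output (e.g. exited processes) map to null.
function parsePsCpu(output: string, pids: number[]): Map<number, number | null> {
  const result = new Map<number, number | null>();
  for (const pid of pids) result.set(pid, null);
  for (const line of output.split("\n")) {
    const [pidText, cpuText] = line.trim().split(/\s+/);
    const pid = Number(pidText);
    const cpu = Number(cpuText);
    if (cpuText !== undefined && result.has(pid) && Number.isFinite(cpu)) {
      result.set(pid, cpu);
    }
  }
  return result;
}

// Small time-to-live cache keyed by PID; the clock is injectable for testing.
class TtlCache<V> {
  private entries = new Map<number, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined once the entry has expired.
  get(key: number): V | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) return undefined;
    return entry.value;
  }

  set(key: number, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A caller would consult a `new TtlCache<number | null>(3000)` first and spawn a single `ps` for all uncached PIDs only on a miss, which is one way to keep process spawning low when several servers are monitored.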
+### 2. Type Updates
+**ServerMetrics interface (`src/types/monitor-types.ts`):**
+```typescript
+export interface ServerMetrics {
+  // ... existing fields
+  processMemory?: number;     // Already existed
+  processCpuUsage?: number;   // NEW: Per-process CPU %
+}
+```
+**HistorySnapshot interface (`src/types/history-types.ts`):**
+```typescript
+export interface HistorySnapshot {
+  server: {
+    // ... existing fields
+    processMemory?: number;      // Already existed
+    processCpuUsage?: number;    // NEW: Per-process CPU %
+  };
+  system?: {
+    // ... system-wide metrics (kept for live monitoring)
+  };
+}
+```
+### 3. Metrics Collection
+**MetricsAggregator (`src/lib/metrics-aggregator.ts`):**
+- Added `processCpuUsage` parameter to `collectServerMetrics()`
+- Collects CPU in parallel with other metrics
+- Supports batch collection for multi-server scenarios
+**HistoryManager (`src/lib/history-manager.ts`):**
+- Saves `processCpuUsage` in snapshots
+- Maintains backward compatibility (optional field)
+### 4. Historical Monitor UI
+**HistoricalMonitorApp (`src/tui/HistoricalMonitorApp.ts`):**
+**Chart Changes:**
+**GPU Usage:**
+- **Unchanged:** Still system-wide
+- **Reason:** macOS doesn't provide per-process GPU metrics easily
+- **Label:** "GPU Usage (%)"
+**CPU Usage:**
+- **Before:** `snapshot.system.cpuUsage` (system-wide)
+- **After:** `snapshot.server.processCpuUsage` (per-process)
+- **Label:** "Process CPU Usage (%)"
+- **Range:** Not forced to 0-100% (can show >100% for multi-threaded workloads)
+**Memory Usage:**
+- **Before:** `(system.memoryUsed / system.memoryTotal) * 100` (system-wide %)
+- **After:** `processMemory / (1024 * 1024 * 1024)` (per-process GB)
+- **Label:** "Process Memory Usage (GB)"
+- **Format:** Shows 2 decimal places (e.g., "3.45 GB")
+- **Statistics:** Avg, Max, Min in GB
+**Multi-Server Comparison:**
+- Table also updated to show per-process CPU and memory
+- Memory column now shows GB instead of %
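The memory chart change described above (bytes to GB, two decimal places, Avg/Max/Min) could look roughly like the following. This is a sketch under assumptions: `SnapshotLike`, `memorySeriesGb`, `summarize`, and `formatGb` are hypothetical names, and only the `processMemory` field and the division by `1024 * 1024 * 1024` come from this document.

```typescript
// Minimal snapshot shape for this sketch; only the field used below.
interface SnapshotLike {
  server: { processMemory?: number }; // bytes, absent in older history files
}

const BYTES_PER_GB = 1024 * 1024 * 1024;

// Convert to GB and skip snapshots that predate the per-process field.
function memorySeriesGb(snapshots: SnapshotLike[]): number[] {
  return snapshots
    .map((s) => s.server.processMemory)
    .filter((bytes): bytes is number => typeof bytes === "number")
    .map((bytes) => bytes / BYTES_PER_GB);
}

interface Stats {
  avg: number;
  max: number;
  min: number;
  stddev: number;
}

// Avg/max/min plus population standard deviation; null for an empty series.
function summarize(values: number[]): Stats | null {
  if (values.length === 0) return null;
  const avg = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - avg) ** 2, 0) / values.length;
  return {
    avg,
    max: Math.max(...values),
    min: Math.min(...values),
    stddev: Math.sqrt(variance),
  };
}

// Two decimal places, matching the "3.45 GB" style shown in this document.
function formatGb(value: number): string {
  return `${value.toFixed(2)} GB`;
}
```

Filtering out missing values before computing statistics is what lets old history files, which lack the per-process fields, load without skewing the averages toward zero.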
+## Benefits
+1. **Accurate Attribution:** See exactly what each model is using
+2. **Multi-Server Clarity:** Compare resource usage across different models
+3. **Debugging:** Identify which specific model is consuming resources
+4. **Capacity Planning:** Understand per-model requirements
+## Example Output
+**Before (System-Wide):**
+```
+CPU Usage (%)
+  Avg: 33.0% (±17.4)  Max: 86.6%  Min: 12.0%
+Memory Usage (%)
+  Avg: 31.0% (±0.6)  Max: 31.9%  Min: 29.9%
+```
+**After (Per-Process):**
+```
+Process CPU Usage (%)
+  Avg: 45.2% (±12.3)  Max: 120.5%  Min: 8.1%
+Process Memory Usage (GB)
+  Avg: 3.45 GB (±0.12)  Max: 3.67 GB  Min: 3.21 GB
+```
+## Edge Cases Handled
+1. **Missing Data:** Fields are optional, gracefully handles old snapshots
+2. **Process Not Running:** Returns null, charts skip those data points
+3. **Multi-Core:** CPU can exceed 100% (expected behavior)
+4. **Cache Expiry:** 3-second TTL prevents stale data
+5. **Batch Collection:** Efficient when monitoring multiple servers
+## Testing Recommendations
+1. **Single Server:**
+   ```bash
+   npm run dev -- server monitor <server-id>
+   # Press 'H' to view historical data
+   ```
+   - Verify CPU shows reasonable per-process values (not system-wide)
+   - Verify memory shows model size in GB (not total RAM %)
+2. **Multi-Server:**
+   ```bash
+   npm run dev -- server monitor
+   # Press 'H' to view comparison table
+   ```
+   - Verify each server shows different CPU/memory values
+   - Verify table shows GB for memory column
+3. **Compare with Activity Monitor:**
+   - Open Activity Monitor
+   - Filter for `llama-server` process
+   - CPU % should match within 5-10%
+   - Memory should match within 0.1 GB
+4. **Compare with `ps`:**
+   ```bash
+   ps -p <pid> -o %cpu,rss
+   ```
+   - CPU % should match
+   - RSS (memory) should match when converted to GB
+## Backward Compatibility
+- Old history files still work (missing fields treated as undefined)
+- System-wide metrics still collected for live monitoring
+- Live monitoring TUI unchanged (still shows system-wide for context)
+- Only historical view changed to per-process
+## Related Files
+- `src/utils/process-utils.ts` - Added CPU collection functions
+- `src/types/monitor-types.ts` - Added processCpuUsage field
+- `src/types/history-types.ts` - Added processCpuUsage to snapshots
+- `src/lib/metrics-aggregator.ts` - Collects CPU metrics
+- `src/lib/history-manager.ts` - Saves CPU metrics
+- `src/tui/HistoricalMonitorApp.ts` - Displays per-process charts
+## Future Improvements
+1. **Per-Process GPU:** Investigate Metal API for GPU attribution
+2. **Network I/O:** Track per-process network usage
+3. **Disk I/O:** Track per-process disk reads/writes
+4. **Thread Count:** Show number of threads used by process
+5. **Context Switches:** Show voluntary/involuntary context switches

package/README.md CHANGED

@@ -15,8 +15,9 @@ CLI tool to manage local llama.cpp servers on macOS. Provides an Ollama-like exp
 - 🤖 **Model downloads** - Pull GGUF models from Hugging Face
 - ⚙️ **Smart defaults** - Auto-configure threads, context size, and GPU layers based on model size
 - 🔌 **Auto port assignment** - Automatically find available ports (9000-9999)
-- 📊 **Real-time monitoring TUI** - Live server metrics with GPU/CPU usage, token generation speeds, and active slots
+- 📊 **Real-time monitoring TUI** - Multi-server dashboard with drill-down details, live GPU/CPU/memory metrics, token generation speeds, and animated loading states
 - 🪵 **Smart logging** - Compact one-line request format with optional full JSON details
+- ⚡️ **Optimized metrics** - Batch collection and caching prevent CPU spikes (10x fewer processes)
 ## Why llamacpp-cli?
@@ -76,6 +77,9 @@ llamacpp server create llama-3.2-3b-instruct-q4_k_m.gguf
 # View running servers
 llamacpp ps
+# View log sizes for all servers
+llamacpp logs
 # Monitor all servers (multi-server dashboard)
 llamacpp server monitor
@@ -268,6 +272,41 @@ Shows:
 - Memory usage (RAM consumption)
 - Uptime (how long server has been running)
+### `llamacpp logs [options]`
+View log sizes for all servers and perform batch log operations.
+```bash
+# Show table of log sizes for all servers
+llamacpp logs
+# Clear current logs for ALL servers (preserves archives)
+llamacpp logs --clear
+# Delete only archived logs for ALL servers (preserves current)
+llamacpp logs --clear-archived
+# Clear current + delete ALL archived logs (maximum cleanup)
+llamacpp logs --clear-all
+# Rotate ALL server logs with timestamps
+llamacpp logs --rotate
+```
+**Displays:**
+- Current stderr size per server
+- Current stdout size per server
+- Archived logs size and count
+- Total log usage per server
+- Grand total across all servers
+**Batch Operations:**
+- `--clear` - Truncates all current logs to 0 bytes (archives preserved)
+- `--clear-archived` - Deletes only archived logs (current logs preserved)
+- `--clear-all` - Clears current AND deletes all archives (frees maximum space)
+- `--rotate` - Archives all current logs with timestamps
+**Use case:** Quickly see which servers are accumulating large logs, or clean up all logs at once.
 ## Server Management
 ### `llamacpp server create <model> [options]`
@@ -430,6 +469,18 @@ llamacpp server logs llama-3.2-3b --verbose
 # Custom filter pattern
 llamacpp server logs llama-3.2-3b --filter "error|warning"
+# Clear log file (truncate to zero bytes)
+llamacpp server logs llama-3.2-3b --clear
+# Delete only archived logs (preserves current)
+llamacpp server logs llama-3.2-3b --clear-archived
+# Clear current AND delete all archived logs
+llamacpp server logs llama-3.2-3b --clear-all
+# Rotate log file with timestamp (preserves old logs)
+llamacpp server logs llama-3.2-3b --rotate
 ```
 **Options:**
@@ -440,6 +491,17 @@ llamacpp server logs llama-3.2-3b --filter "error|warning"
 - `--verbose` - Show all messages including debug internals
 - `--filter <pattern>` - Custom grep pattern for filtering
 - `--stdout` - Show stdout instead of stderr (rarely needed)
+- `--clear` - Clear (truncate) log file to zero bytes
+- `--clear-archived` - Delete only archived logs (preserves current logs)
+- `--clear-all` - Clear current logs AND delete all archived logs (frees most space)
+- `--rotate` - Rotate log file with timestamp (e.g., `server.2026-01-22-19-30-00.stderr`)
+**Automatic Log Rotation:**
+Logs are automatically rotated when they exceed 100MB during:
+- `llamacpp server start <identifier>` - Rotates before starting
+- `llamacpp server config <identifier> --restart` - Rotates before restarting
+Rotated logs are saved with timestamps in the same directory: `~/.llamacpp/logs/`
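For illustration, the size check and timestamped naming described above might be sketched as follows. `shouldRotate` and `rotatedLogName` are hypothetical names, not the package's API. Two further assumptions: "100MB" is read as 100 MiB, and the timestamp is formatted in UTC (the README does not say which time zone is used).

```typescript
const ROTATE_THRESHOLD_BYTES = 100 * 1024 * 1024; // "100MB" read as 100 MiB (assumption)

// True once the log has grown past the threshold.
function shouldRotate(sizeBytes: number): boolean {
  return sizeBytes > ROTATE_THRESHOLD_BYTES;
}

// Builds a name such as "server.2026-01-22-19-30-00.stderr".
// UTC is an assumption made for this sketch.
function rotatedLogName(baseName: string, stream: "stdout" | "stderr", at: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const stamp = [
    String(at.getUTCFullYear()),
    pad(at.getUTCMonth() + 1),
    pad(at.getUTCDate()),
    pad(at.getUTCHours()),
    pad(at.getUTCMinutes()),
    pad(at.getUTCSeconds()),
  ].join("-");
  return `${baseName}.${stamp}.${stream}`;
}
```

Keeping the stream suffix last means a glob such as `*.stderr` matches both the current log and its archives, while the zero-padded timestamp makes archive names sort chronologically.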
 **Output Formats:**
@@ -460,13 +522,15 @@ Use `--http` to see full request/response JSON, or `--verbose` option to see all
 ### `llamacpp server monitor [identifier]`
 Real-time monitoring TUI showing server metrics, GPU/CPU usage, and active inference slots.
+![Server Monitoring TUI](https://raw.githubusercontent.com/dweaver/llamacpp-cli/main/docs/images/monitor-detail.png)
 **Two Modes:**
 **1. Multi-Server Dashboard (no identifier):**
 ```bash
 llamacpp server monitor
 ```
-Shows overview of all servers with system resources. Press 1-9 to drill down into individual server details.
+Shows overview of all servers with system resources. Use arrow keys (↑/↓) or vim keys (k/j) to navigate, then press Enter to view server details.
 **2. Single-Server Monitor (with identifier):**
 ```bash
@@ -487,13 +551,13 @@ llamacpp server monitor llama-3-2-3b
 │ GPU: [████░░░] 65%  CPU: [███░░░] 38%  Memory: 58%     │
 ├─────────────────────────────────────────────────────────┤
 │ Servers (3 running, 0 stopped)                          │
-│ # │ Server ID      │ Port │ Status │ Slots │ tok/s    │
+│   │ Server ID      │ Port │ Status │ Slots │ tok/s    │
 │───┼────────────────┼──────┼────────┼───────┼──────────┤
-│ 1 │ llama-3-2-3b   │ 9000 │ ● RUN  │ 2/4   │ 245      │
-│ 2 │ qwen2-7b       │ 9001 │ ● RUN  │ 1/4   │ 198      │
-│ 3 │ llama-3-1-8b   │ 9002 │ ○ IDLE │ 0/4   │ -        │
+│ ► │ llama-3-2-3b   │ 9000 │ ● RUN  │ 2/4   │ 245      │  (highlighted)
+│   │ qwen2-7b       │ 9001 │ ● RUN  │ 1/4   │ 198      │
+│   │ llama-3-1-8b   │ 9002 │ ○ IDLE │ 0/4   │ -        │
 └─────────────────────────────────────────────────────────┘
-Press 1-9 for details | [Q] Quit
+↑/↓ Navigate | Enter for details | [H]istory [R]efresh [Q] Quit
 ```
 **Single-Server View:**
@@ -504,19 +568,65 @@ Press 1-9 for details | [Q] Quit
 **Keyboard Shortcuts:**
 - **Multi-Server Mode:**
-  - `1-9` - View details for server #N
+  - `↑/↓` or `k/j` - Navigate server list
+  - `Enter` - View details for selected server
   - `ESC` - Back to list (from detail view)
+  - `H` - View historical metrics
   - `R` - Force refresh now
   - `+/-` - Adjust update speed
   - `Q` - Quit
 - **Single-Server Mode:**
+  - `H` - View historical metrics
   - `R` - Force refresh now
   - `+/-` - Adjust update speed
   - `Q` - Quit
+- **Historical View:**
+  - `H` - Toggle Hour View (Recent ↔ Hour)
+  - `ESC` - Back to live monitoring
+  - `Q` - Quit
+**Historical Monitoring:**
+Press `H` from any live monitoring view to see historical time-series charts. The historical view shows:
+- **Token generation speed** over time with statistics (avg, max, stddev)
+- **GPU usage** over time with min/max/avg
+- **CPU usage** over time with min/max/avg
+- **Memory usage** over time with min/max/avg
+**View Modes (Toggle with `H` key):**
+- **Recent View (default):**
+  - Shows last 40-80 samples (~1-3 minutes)
+  - Raw data with no downsampling - perfect accuracy
+  - Best for: "What's happening right now?"
+- **Hour View:**
+  - Shows all ~1,800 samples from last hour
+  - **Absolute time-aligned downsampling** (30:1 ratio) - chart stays perfectly stable
+  - Bucket boundaries never shift (aligned to round minutes)
+  - New samples only affect their own bucket, not the entire chart
+  - **Bucket max** for GPU/CPU/token speed (preserves peaks)
+  - **Bucket mean** for memory (shows average)
+  - Chart labels indicate "Peak per bucket" or "Average per bucket"
+  - Best for: "What happened over the last hour?"
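The Hour View's time-aligned downsampling can be sketched as below. This is not the package's `downsample-utils` code; `downsampleAligned` is a hypothetical name, and the 60-second bucket width in the usage note is inferred from the 30:1 ratio at the default 2-second refresh interval.

```typescript
interface Sample {
  timestamp: number; // ms since epoch
  value: number;
}

type Aggregator = "max" | "mean";

// Groups samples into fixed buckets whose boundaries are multiples of
// bucketMs since the epoch, so a new sample only changes its own bucket.
function downsampleAligned(samples: Sample[], bucketMs: number, agg: Aggregator): Sample[] {
  const buckets = new Map<number, number[]>();
  for (const { timestamp, value } of samples) {
    const start = Math.floor(timestamp / bucketMs) * bucketMs;
    const values = buckets.get(start);
    if (values) values.push(value);
    else buckets.set(start, [value]);
  }
  return Array.from(buckets.entries())
    .sort(([a], [b]) => a - b)
    .map(([start, values]) => ({
      timestamp: start,
      value:
        agg === "max"
          ? Math.max(...values) // preserves peaks (GPU/CPU/token speed)
          : values.reduce((sum, v) => sum + v, 0) / values.length, // average (memory)
    }));
}
```

Calling `downsampleAligned(samples, 60_000, "max")` would give one point per round minute. Because bucket starts depend only on the absolute timestamp, not on where the window currently begins, earlier buckets keep their values as new samples arrive.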
+**Note:** The `H` key has two functions:
+- From **live monitoring** → Enter historical view (Recent mode)
+- Within **historical view** → Toggle between Recent and Hour views
+**Data Collection:**
+Historical data is automatically collected whenever you run the monitor command. Data is retained for 24 hours in `~/.llamacpp/history/<server-id>.json` files, then automatically pruned.
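The 24-hour retention rule above amounts to a cutoff filter. A minimal sketch, assuming each snapshot carries a millisecond `timestamp` field (the field name and the `pruneOld` helper are hypothetical):

```typescript
const RETENTION_MS = 24 * 60 * 60 * 1000; // 24 hours

interface Timestamped {
  timestamp: number; // ms since epoch (assumed field name)
}

// Keeps only snapshots newer than the retention window.
function pruneOld<T extends Timestamped>(
  snapshots: T[],
  now: number,
  retentionMs: number = RETENTION_MS,
): T[] {
  const cutoff = now - retentionMs;
  return snapshots.filter((s) => s.timestamp >= cutoff);
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the pruning step deterministic and easy to test.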
+**Multi-Server Historical View:**
+From the multi-server dashboard, press `H` to see a summary table comparing average metrics across all servers for the last hour.
 **Features:**
 - **Multi-server dashboard** - Monitor all servers at once
 - **Real-time updates** - Metrics refresh every 2 seconds (adjustable)
+- **Historical monitoring** - View time-series charts of past metrics (press `H` from monitor view)
 - **Token-per-second calculation** - Shows actual generation speed per slot
 - **Progress bars** - Visual representation of GPU/CPU/memory usage
 - **Error recovery** - Shows stale data with warnings if connection lost
@@ -573,7 +683,12 @@ llamacpp-cli uses macOS launchctl to manage llama-server processes:
 3. Starts the server with `launchctl start`
 4. Monitors status via `launchctl list` and `lsof`
-Services are named `com.llama.<model-id>` and persist across reboots.
+Services are named `com.llama.<model-id>`.
+**Auto-Restart Behavior:**
+- When you **start** a server, it's registered with launchd and will auto-restart on crash
+- When you **stop** a server, it's unloaded from launchd and stays stopped (no auto-restart)
+- Crashed servers will automatically restart (when loaded)
 ## Known Limitations