agentgui 1.0.274 → 1.0.276

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (70)
  1. package/CLAUDE.md +280 -280
  2. package/IPFS_DOWNLOADER.md +277 -277
  3. package/TASK_2C_COMPLETION.md +334 -334
  4. package/agentgui.ico +0 -0
  5. package/bin/gmgui.cjs +54 -54
  6. package/build-portable.js +13 -42
  7. package/database.js +1422 -1406
  8. package/lib/claude-runner.js +1130 -1130
  9. package/lib/ipfs-downloader.js +459 -459
  10. package/lib/speech.js +159 -152
  11. package/package.json +1 -1
  12. package/readme.md +76 -76
  13. package/server.js +3787 -3794
  14. package/setup-npm-token.sh +68 -68
  15. package/static/app.js +773 -773
  16. package/static/event-rendering-showcase.html +708 -708
  17. package/static/index.html +3178 -3180
  18. package/static/js/agent-auth.js +298 -298
  19. package/static/js/audio-recorder-processor.js +18 -18
  20. package/static/js/client.js +2656 -2656
  21. package/static/js/conversations.js +583 -583
  22. package/static/js/dialogs.js +267 -267
  23. package/static/js/event-consolidator.js +101 -101
  24. package/static/js/event-filter.js +311 -311
  25. package/static/js/event-processor.js +452 -452
  26. package/static/js/features.js +413 -413
  27. package/static/js/kalman-filter.js +67 -67
  28. package/static/js/progress-dialog.js +130 -130
  29. package/static/js/script-runner.js +219 -219
  30. package/static/js/streaming-renderer.js +2123 -2120
  31. package/static/js/syntax-highlighter.js +269 -269
  32. package/static/js/tts-websocket-handler.js +152 -152
  33. package/static/js/ui-components.js +431 -431
  34. package/static/js/voice.js +849 -849
  35. package/static/js/websocket-manager.js +596 -596
  36. package/static/templates/INDEX.html +465 -465
  37. package/static/templates/README.md +190 -190
  38. package/static/templates/agent-capabilities.html +56 -56
  39. package/static/templates/agent-metadata-panel.html +44 -44
  40. package/static/templates/agent-status-badge.html +30 -30
  41. package/static/templates/code-annotation-panel.html +155 -155
  42. package/static/templates/code-suggestion-panel.html +184 -184
  43. package/static/templates/command-header.html +77 -77
  44. package/static/templates/command-output-scrollable.html +118 -118
  45. package/static/templates/elapsed-time.html +54 -54
  46. package/static/templates/error-alert.html +106 -106
  47. package/static/templates/error-history-timeline.html +160 -160
  48. package/static/templates/error-recovery-options.html +109 -109
  49. package/static/templates/error-stack-trace.html +95 -95
  50. package/static/templates/error-summary.html +80 -80
  51. package/static/templates/event-counter.html +48 -48
  52. package/static/templates/execution-actions.html +97 -97
  53. package/static/templates/execution-progress-bar.html +80 -80
  54. package/static/templates/execution-stepper.html +120 -120
  55. package/static/templates/file-breadcrumb.html +118 -118
  56. package/static/templates/file-diff-viewer.html +121 -121
  57. package/static/templates/file-metadata.html +133 -133
  58. package/static/templates/file-read-panel.html +66 -66
  59. package/static/templates/file-write-panel.html +120 -120
  60. package/static/templates/git-branch-remote.html +107 -107
  61. package/static/templates/git-diff-list.html +101 -101
  62. package/static/templates/git-log-visualization.html +153 -153
  63. package/static/templates/git-status-panel.html +115 -115
  64. package/static/templates/quality-metrics-display.html +170 -170
  65. package/static/templates/terminal-output-panel.html +87 -87
  66. package/static/templates/test-results-display.html +144 -144
  67. package/static/theme.js +72 -72
  68. package/test-download-progress.js +223 -223
  69. package/test-websocket-broadcast.js +147 -147
  70. package/tests/ipfs-downloader.test.js +370 -370
package/CLAUDE.md CHANGED
# AgentGUI

Multi-agent GUI client for AI coding agents (Claude Code, Gemini CLI, OpenCode, Goose, etc.) with real-time streaming, WebSocket sync, and SQLite persistence.

## Running

```bash
npm install
npm run dev   # node server.js --watch
```

The server starts on `http://localhost:3000` and redirects to `/gm/`.

## Architecture

```
server.js                        HTTP server + WebSocket + all API routes (raw http.createServer)
database.js                      SQLite setup (WAL mode), schema, query functions
lib/claude-runner.js             Agent framework - spawns CLI processes, parses stream-json output
lib/speech.js                    Speech-to-text and text-to-speech via @huggingface/transformers
bin/gmgui.cjs                    CLI entry point (npx agentgui / bunx agentgui)
static/index.html                Main HTML shell
static/app.js                    App initialization
static/theme.js                  Theme switching
static/js/client.js              Main client logic
static/js/conversations.js       Conversation management
static/js/streaming-renderer.js  Renders Claude streaming events as HTML
static/js/event-processor.js     Processes incoming events
static/js/event-filter.js        Filters events by type
static/js/websocket-manager.js   WebSocket connection handling
static/js/ui-components.js       UI component helpers
static/js/syntax-highlighter.js  Code syntax highlighting
static/js/voice.js               Voice input/output
static/js/features.js            Feature flags
static/templates/                31 HTML template fragments for event rendering
```

## Key Details

- Express is used only for file upload (`/api/upload/:conversationId`) and the fsbrowse file browser (`/files/:conversationId`). All other routes use raw `http.createServer` with manual routing.
- Agent discovery scans PATH for known CLI binaries (claude, opencode, gemini, goose, etc.) at startup.
- Database lives at `~/.gmgui/data.db`. Tables: conversations, messages, events, sessions, stream chunks.
- WebSocket endpoint is at `BASE_URL + /sync`. Supports subscribe/unsubscribe by sessionId or conversationId, and ping.

## Environment Variables

- `PORT` - Server port (default: 3000)
- `BASE_URL` - URL prefix (default: `/gm`)
- `STARTUP_CWD` - Working directory passed to agents
- `HOT_RELOAD` - Set to `"false"` to disable watch mode

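For example, to serve on a different port and prefix with watch mode off (the port and prefix values here are arbitrary):

```shell
# Hypothetical invocation: port 8080, URL prefix /agents, no watch mode
PORT=8080 BASE_URL=/agents HOT_RELOAD=false npx agentgui
```
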
## REST API

All routes are prefixed with `BASE_URL` (default `/gm`).

- `GET /api/conversations` - List conversations
- `POST /api/conversations` - Create conversation (body: agentId, title, workingDirectory)
- `GET /api/conversations/:id` - Get conversation with streaming status
- `POST /api/conversations/:id` - Update conversation
- `DELETE /api/conversations/:id` - Delete conversation
- `GET /api/conversations/:id/messages` - Get messages (query: limit, offset)
- `POST /api/conversations/:id/messages` - Send message (body: content, agentId)
- `POST /api/conversations/:id/stream` - Start streaming execution
- `GET /api/conversations/:id/full` - Full conversation load with chunks
- `GET /api/conversations/:id/chunks` - Get stream chunks (query: since)
- `GET /api/conversations/:id/sessions/latest` - Get latest session
- `GET /api/sessions/:id` - Get session
- `GET /api/sessions/:id/chunks` - Get session chunks (query: since)
- `GET /api/sessions/:id/execution` - Get execution events (query: limit, offset, filterType)
- `GET /api/agents` - List discovered agents
- `GET /api/home` - Get home directory
- `POST /api/stt` - Speech-to-text (raw audio body)
- `POST /api/tts` - Text-to-speech (body: text)
- `GET /api/speech-status` - Speech model loading status
- `POST /api/folders` - Create folder

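Typical usage of the routes above might look like the following sketch. The request bodies use the field names from the list; the response shape (an object with an `id`) and the helper names are assumptions for illustration:

```javascript
// Build a URL for an API route, assuming the default BASE_URL of /gm.
function apiUrl(route, base = '/gm', origin = 'http://localhost:3000') {
  return `${origin}${base}${route}`;
}

// Create a conversation, then send a message to it.
async function createAndSend(agentId, content) {
  const res = await fetch(apiUrl('/api/conversations'), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ agentId, title: 'demo', workingDirectory: process.cwd() }),
  });
  const conversation = await res.json(); // shape assumed: { id, ... }
  await fetch(apiUrl(`/api/conversations/${conversation.id}/messages`), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content, agentId }),
  });
}
```
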
## WebSocket Protocol

Endpoint: `BASE_URL + /sync`

Client sends:
- `{ type: "subscribe", sessionId }` or `{ type: "subscribe", conversationId }`
- `{ type: "unsubscribe", sessionId }`
- `{ type: "ping" }`

Server broadcasts:
- `streaming_start` - Agent execution started
- `streaming_progress` - New event/chunk from agent
- `streaming_complete` - Execution finished
- `streaming_error` - Execution failed
- `conversation_created`, `conversation_updated`, `conversation_deleted`
- `tts_setup_progress` - Windows pocket-tts setup progress (step, status, message)

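A minimal browser-side client for this protocol might look like the sketch below. Only the `type` field and the subscribe/broadcast names come from the protocol above; the host, handler behavior, and any other message fields are assumptions:

```javascript
// Build the subscribe message for a conversation.
function makeSubscribeMessage(conversationId) {
  return JSON.stringify({ type: 'subscribe', conversationId });
}

// Connect to the sync endpoint and log streaming broadcasts.
function connectSync(baseUrl, conversationId) {
  const ws = new WebSocket(`ws://localhost:3000${baseUrl}/sync`);
  ws.addEventListener('open', () => ws.send(makeSubscribeMessage(conversationId)));
  ws.addEventListener('message', (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'streaming_progress') console.log('chunk', msg);
    if (msg.type === 'streaming_complete') console.log('done');
  });
  return ws;
}
```
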
## Pocket-TTS Windows Setup (Reliability for Slow/Bad Internet)

On Windows, text-to-speech uses pocket-tts, which requires Python and a pip install. The setup process is now resilient to slow/unreliable connections:

### Features
- **Extended timeouts**: 120s for pip install (accommodates slow connections)
- **Retry logic**: 3 attempts with exponential backoff (1s, 2s delays)
- **Progress reporting**: Real-time updates via WebSocket to the UI
- **Partial install cleanup**: Failed venvs are removed to allow retry
- **Installation verification**: Binary validation via `--version` check
- **Concurrent waiting**: Multiple simultaneous requests wait for a single setup (600s timeout)

### Configuration (lib/windows-pocket-tts-setup.js)
```javascript
const CONFIG = {
  PIP_TIMEOUT: 120000,          // 2 minutes
  VENV_CREATION_TIMEOUT: 30000, // 30 seconds
  MAX_RETRIES: 3,               // 3 attempts
  RETRY_DELAY_MS: 1000,         // 1 second initial
  RETRY_BACKOFF_MULTIPLIER: 2,  // 2x exponential
};
```

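With those constants, the delay before retry n is `RETRY_DELAY_MS * RETRY_BACKOFF_MULTIPLIER^(n-1)`, i.e. 1s then 2s across the 3 attempts. A sketch of the retry loop (the `CONFIG` values mirror the block above; `withRetries` and its `run` callback are illustrative names, not the actual setup code):

```javascript
const CONFIG = {
  MAX_RETRIES: 3,
  RETRY_DELAY_MS: 1000,
  RETRY_BACKOFF_MULTIPLIER: 2,
};

// Delays slept between attempts: for the config above, [1000, 2000].
function retryDelays({ MAX_RETRIES, RETRY_DELAY_MS, RETRY_BACKOFF_MULTIPLIER }) {
  return Array.from({ length: MAX_RETRIES - 1 },
    (_, i) => RETRY_DELAY_MS * RETRY_BACKOFF_MULTIPLIER ** i);
}

// Run an async step (e.g. the pip install) up to MAX_RETRIES times.
async function withRetries(run, config = CONFIG) {
  const delays = retryDelays(config);
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      if (attempt >= delays.length) throw err; // all attempts exhausted
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}
```
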
### Network Requirements
- **Minimum**: 50 kbps sustained, < 5s latency, < 10% packet loss
- **Recommended**: 256+ kbps, < 2s latency, < 1% packet loss
- **Expected time on slow connection**: 2-6 minutes with retries

### Progress Messages
During TTS setup on first use, WebSocket broadcasts:
```json
{
  "type": "tts_setup_progress",
  "step": "detecting-python|creating-venv|installing|verifying",
  "status": "in-progress|success|error",
  "message": "descriptive status message with retry count if applicable"
}
```

### Recovery Behavior
1. Network timeout → auto-retry with backoff
2. Partial venv → auto-cleanup before retry
3. Failed verification → auto-cleanup and error
4. Concurrent requests → first starts setup, others wait up to 600s
5. Interrupted setup → cleanup allows fresh retry

### Testing
Setup validates by running the pocket-tts binary with the `--version` flag, confirming a functional installation rather than mere file existence.

## Model Download Fallback Chain Architecture (Task 1C)

Three-layer resilient fallback for speech models (280MB whisper-base + 197MB TTS). Designed to eliminate single points of failure while maintaining backward compatibility.

### Layer 1: IPFS Gateway (Primary)

Decentralized distribution across three gateways with automatic failover:

```
Cloudflare IPFS  https://cloudflare-ipfs.com/ipfs/   Priority 1 (99.9% reliable)
dweb.link        https://dweb.link/ipfs/             Priority 2 (99% reliable)
Pinata           https://gateway.pinata.cloud/ipfs/  Priority 3 (99.5% reliable)
```

**Model Distribution**:
- Whisper Base (280MB): `TBD_WHISPER_HASH` → encoder (78.6MB) + decoder (198.9MB) + configs
- TTS Models (197MB): `TBD_TTS_HASH` → mimi_encoder (73MB) + decoders + text_conditioner + flow_lm

**Characteristics**: 30s timeout per gateway, 2 retries before fallback, SHA-256 per-file verification against the IPFS-stored manifest

### Layer 2: HuggingFace (Secondary)

The current working implementation, via the webtalk package. Proven reliable, with region-dependent latency.

```
Whisper  https://huggingface.co/onnx-community/whisper-base/resolve/main/
TTS      https://huggingface.co/datasets/AnEntrypoint/sttttsmodels/resolve/main/tts/
```

**Characteristics**: 3 retries with exponential backoff (2^attempt seconds), 30s timeout, file size validation (minBytes thresholds: encoder ≥40MB, decoder ≥100MB, TTS files in the 18-61MB range)

**Implementation Location**: webtalk/whisper-models.js, webtalk/tts-models.js (unchanged, wrapped by the fallback logic)

### Layer 3: Local Cache + Fallbacks

**Primary Cache**: `~/.gmgui/models/` with manifest at `~/.gmgui/models/.manifests.json`

**Verification Algorithm**:
1. Size check (minBytes threshold) → corrupted: delete & retry
2. SHA-256 hash against manifest → mismatch: delete & re-download
3. ONNX format validation (header check) → invalid: delete & escalate to primary

**Bundled Models** (future): `agentgui/bundled-models.tar.gz` (~50-80MB) for offline-first deployments

**Peer-to-Peer** (future): mDNS discovery for LAN sharing across multiple AgentGUI instances

### Download Decision Logic

```
1. Check local cache validity → RETURN if valid, record cache_hit metric
2. TRY PRIMARY (IPFS): attempt 3 gateways sequentially, 2 retries each
   - VERIFY size + sha256 → ON SUCCESS: record primary_success, return
3. TRY SECONDARY (HuggingFace): 3 attempts with exponential backoff
   - VERIFY file size → ON SUCCESS: record secondary_success, return
4. TRY TERTIARY (Bundled): extract tarball if present
   - VERIFY extraction → ON SUCCESS: record tertiary_bundled_success, return
5. TRY TERTIARY (Peer): query mDNS if enabled, fetch from peer
   - VERIFY checksum → ON SUCCESS: record tertiary_peer_success, return
6. FAILURE: record all_layers_exhausted metric, throw error (optional: activate degraded mode)
```

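The decision logic above reduces to "try layers in order, verify, record a metric, return on first success". A shape sketch, assuming each layer is wrapped as an async function that downloads and verifies (layer names and the `recordMetric` callback are placeholders for the real downloaders):

```javascript
// layers: array of [metricName, asyncDownloadAndVerifyFn]; first success wins.
async function downloadWithFallback(layers, recordMetric = () => {}) {
  for (const [name, download] of layers) {
    try {
      const data = await download();
      recordMetric(`${name}_success`);
      return { layer: name, data };
    } catch {
      recordMetric(`${name}_failure`); // fall through to the next layer
    }
  }
  recordMetric('all_layers_exhausted');
  throw new Error('all download layers exhausted');
}
```
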
### Metrics Collection

**Storage**: `~/.gmgui/models/.metrics.json` (append-only, rotated daily)

**Per-Download Fields**: timestamp, modelType, layer, gateway, status, latency_ms, bytes_downloaded/total, error_type/message

**Aggregations**: per-layer success rate, per-gateway success rate, avg latency per layer, cache effectiveness

**Dashboard Endpoints**:
- `GET /api/metrics/downloads` - all metrics
- `GET /api/metrics/downloads/summary` - aggregated stats
- `GET /api/metrics/downloads/health` - per-layer health
- `POST /api/metrics/downloads/reset` - clear history

### Cache Invalidation Strategy

**Version Manifest** (`~/.gmgui/models/.manifests.json`):
```json
{
  "whisper-base": {
    "currentVersion": "1.0.0",
    "ipfsHash": "QmXXXX...",
    "huggingfaceTag": "revision-hash",
    "downloadedAt": "ISO8601",
    "sha256": { "file": "hash...", ... }
  },
  "tts-models": { ... }
}
```

**Version Mismatch Detection** (on startup + periodic background check):
- Query HuggingFace API HEAD for latest revision
- Query IPFS gateway for latest dag-json manifest
- If new version: log warning, set flag in `/api/status`, prompt user (not auto-download)
- If corrupted: quarantine to `.bak`, mark invalid, trigger auto-download from primary on next request

**Stale Cache Handling**:
- Max age: 90 days → background check queries IPFS for new hash
- Stale window: 7 days after max age → serve stale if live fetch fails
- Offline degradation: serve even if 365 days old when network down

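Those three rules can be read as a single decision over cache age and connectivity. A sketch with the thresholds from the bullets above (the function and its return labels are made up for illustration):

```javascript
// ageDays: age of the cached model; online: whether a live fetch can succeed.
function cacheDecision(ageDays, online) {
  if (ageDays <= 90) return 'fresh';                  // within max age
  if (ageDays <= 97) return 'stale-while-revalidate'; // 7-day stale window
  if (!online) return ageDays <= 365 ? 'serve-degraded' : 'expired';
  return 'redownload';                                // past the window, online
}
```
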
**Cleanup Policy**:
- Backup retention: 1 previous version (`.bak`) for 7 days
- Failed downloads: delete `*.tmp` after 1 hour idle
- Old versions: delete if > 90 days old
- Disk threshold: warn if `~/.gmgui/models` exceeds 2GB

### Design Rationale

**Why Three Layers?** IPFS (decentralized, no SPoF) + HuggingFace (proven, existing) + Local (offline-ready, LAN-resilient)

**Why Metrics First?** Enables data-driven gateway selection, identifies reliability in production, guides timeout/retry tuning

**Why No Auto-Upgrade?** User controls timing, allows staged rollout, supports version pinning, reduces surprise breakage

**Why Bundled Models?** Enables air-gapped deployments, reduces network load, supports edge environments with poor connectivity

### Implementation Roadmap

| Phase | Description | Priority |
|-------|-------------|----------|
| 1 | Integrate IPFS gateway discovery (default configurable) | HIGH |
| 2 | Refactor `ensureModelsDownloaded()` to use fallback chain | HIGH |
| 3 | Add metrics collection to download layer | HIGH |
| 4 | Implement manifest-based version tracking | MEDIUM |
| 5 | Add stale-while-revalidate background checks | MEDIUM |
| 6 | Integrate bundled models option | LOW |
| 7 | Add peer-to-peer discovery | LOW |

### Critical TODOs Before Implementation

1. Publish whisper-base to IPFS → obtain ipfsHash
2. Publish TTS models to IPFS → obtain ipfsHash
3. Create manifest templates for both models
4. Design metrics storage schema (SQLite vs JSON)
5. Plan background check scheduler
6. Define dashboard UI for metrics visualization