agentgui 1.0.274 → 1.0.275
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +280 -280
- package/IPFS_DOWNLOADER.md +277 -277
- package/TASK_2C_COMPLETION.md +334 -334
- package/bin/gmgui.cjs +54 -54
- package/build-portable.js +3 -42
- package/database.js +1422 -1406
- package/lib/claude-runner.js +1130 -1130
- package/lib/ipfs-downloader.js +459 -459
- package/lib/speech.js +152 -152
- package/package.json +1 -1
- package/readme.md +76 -76
- package/server.js +3787 -3794
- package/setup-npm-token.sh +68 -68
- package/static/app.js +773 -773
- package/static/event-rendering-showcase.html +708 -708
- package/static/index.html +3178 -3180
- package/static/js/agent-auth.js +298 -298
- package/static/js/audio-recorder-processor.js +18 -18
- package/static/js/client.js +2656 -2656
- package/static/js/conversations.js +583 -583
- package/static/js/dialogs.js +267 -267
- package/static/js/event-consolidator.js +101 -101
- package/static/js/event-filter.js +311 -311
- package/static/js/event-processor.js +452 -452
- package/static/js/features.js +413 -413
- package/static/js/kalman-filter.js +67 -67
- package/static/js/progress-dialog.js +130 -130
- package/static/js/script-runner.js +219 -219
- package/static/js/streaming-renderer.js +2123 -2120
- package/static/js/syntax-highlighter.js +269 -269
- package/static/js/tts-websocket-handler.js +152 -152
- package/static/js/ui-components.js +431 -431
- package/static/js/voice.js +849 -849
- package/static/js/websocket-manager.js +596 -596
- package/static/templates/INDEX.html +465 -465
- package/static/templates/README.md +190 -190
- package/static/templates/agent-capabilities.html +56 -56
- package/static/templates/agent-metadata-panel.html +44 -44
- package/static/templates/agent-status-badge.html +30 -30
- package/static/templates/code-annotation-panel.html +155 -155
- package/static/templates/code-suggestion-panel.html +184 -184
- package/static/templates/command-header.html +77 -77
- package/static/templates/command-output-scrollable.html +118 -118
- package/static/templates/elapsed-time.html +54 -54
- package/static/templates/error-alert.html +106 -106
- package/static/templates/error-history-timeline.html +160 -160
- package/static/templates/error-recovery-options.html +109 -109
- package/static/templates/error-stack-trace.html +95 -95
- package/static/templates/error-summary.html +80 -80
- package/static/templates/event-counter.html +48 -48
- package/static/templates/execution-actions.html +97 -97
- package/static/templates/execution-progress-bar.html +80 -80
- package/static/templates/execution-stepper.html +120 -120
- package/static/templates/file-breadcrumb.html +118 -118
- package/static/templates/file-diff-viewer.html +121 -121
- package/static/templates/file-metadata.html +133 -133
- package/static/templates/file-read-panel.html +66 -66
- package/static/templates/file-write-panel.html +120 -120
- package/static/templates/git-branch-remote.html +107 -107
- package/static/templates/git-diff-list.html +101 -101
- package/static/templates/git-log-visualization.html +153 -153
- package/static/templates/git-status-panel.html +115 -115
- package/static/templates/quality-metrics-display.html +170 -170
- package/static/templates/terminal-output-panel.html +87 -87
- package/static/templates/test-results-display.html +144 -144
- package/static/theme.js +72 -72
- package/test-download-progress.js +223 -223
- package/test-websocket-broadcast.js +147 -147
- package/tests/ipfs-downloader.test.js +370 -370
package/CLAUDE.md
CHANGED

# AgentGUI

Multi-agent GUI client for AI coding agents (Claude Code, Gemini CLI, OpenCode, Goose, etc.) with real-time streaming, WebSocket sync, and SQLite persistence.

## Running

```bash
npm install
npm run dev # node server.js --watch
```

Server starts on `http://localhost:3000`, redirects to `/gm/`.

## Architecture

```
server.js                        HTTP server + WebSocket + all API routes (raw http.createServer)
database.js                      SQLite setup (WAL mode), schema, query functions
lib/claude-runner.js             Agent framework - spawns CLI processes, parses stream-json output
lib/speech.js                    Speech-to-text and text-to-speech via @huggingface/transformers
bin/gmgui.cjs                    CLI entry point (npx agentgui / bunx agentgui)
static/index.html                Main HTML shell
static/app.js                    App initialization
static/theme.js                  Theme switching
static/js/client.js              Main client logic
static/js/conversations.js       Conversation management
static/js/streaming-renderer.js  Renders Claude streaming events as HTML
static/js/event-processor.js     Processes incoming events
static/js/event-filter.js        Filters events by type
static/js/websocket-manager.js   WebSocket connection handling
static/js/ui-components.js       UI component helpers
static/js/syntax-highlighter.js  Code syntax highlighting
static/js/voice.js               Voice input/output
static/js/features.js            Feature flags
static/templates/                31 HTML template fragments for event rendering
```

## Key Details

- Express is used only for file upload (`/api/upload/:conversationId`) and the fsbrowse file browser (`/files/:conversationId`). All other routes use raw `http.createServer` with manual routing.
- Agent discovery scans PATH for known CLI binaries (claude, opencode, gemini, goose, etc.) at startup.
- The database lives at `~/.gmgui/data.db`. Tables: conversations, messages, events, sessions, and stream chunks.
- The WebSocket endpoint is `BASE_URL + /sync`. It supports subscribe/unsubscribe by sessionId or conversationId, and ping.

## Environment Variables

- `PORT` - Server port (default: 3000)
- `BASE_URL` - URL prefix (default: /gm)
- `STARTUP_CWD` - Working directory passed to agents
- `HOT_RELOAD` - Set to "false" to disable watch mode

## REST API

All routes are prefixed with `BASE_URL` (default `/gm`).

- `GET /api/conversations` - List conversations
- `POST /api/conversations` - Create conversation (body: agentId, title, workingDirectory)
- `GET /api/conversations/:id` - Get conversation with streaming status
- `POST /api/conversations/:id` - Update conversation
- `DELETE /api/conversations/:id` - Delete conversation
- `GET /api/conversations/:id/messages` - Get messages (query: limit, offset)
- `POST /api/conversations/:id/messages` - Send message (body: content, agentId)
- `POST /api/conversations/:id/stream` - Start streaming execution
- `GET /api/conversations/:id/full` - Full conversation load with chunks
- `GET /api/conversations/:id/chunks` - Get stream chunks (query: since)
- `GET /api/conversations/:id/sessions/latest` - Get latest session
- `GET /api/sessions/:id` - Get session
- `GET /api/sessions/:id/chunks` - Get session chunks (query: since)
- `GET /api/sessions/:id/execution` - Get execution events (query: limit, offset, filterType)
- `GET /api/agents` - List discovered agents
- `GET /api/home` - Get home directory
- `POST /api/stt` - Speech-to-text (raw audio body)
- `POST /api/tts` - Text-to-speech (body: text)
- `GET /api/speech-status` - Speech model loading status
- `POST /api/folders` - Create folder
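
The routes above can be called from Node 18+ with the built-in `fetch`. The sketch below keeps building a request separate from sending it, so the request shapes can be checked offline. The helper names (`apiUrl`, `createConversationRequest`, `sendMessageRequest`, `send`) are hypothetical. Only the paths and body fields come from the list above. Response shapes are not documented here, so the sketch assumes JSON and returns the parsed body untyped.

```javascript
// Hypothetical client helpers for the AgentGUI REST API (Node 18+, global fetch).
// Paths and body fields follow the route list; JSON responses are an assumption.
const DEFAULT_ORIGIN = "http://localhost:3000";
const DEFAULT_BASE_URL = "/gm";

// Join origin + BASE_URL + route path without doubling slashes.
function apiUrl(path, { origin = DEFAULT_ORIGIN, baseUrl = DEFAULT_BASE_URL } = {}) {
  const base = baseUrl.replace(/\/+$/, "");
  const route = path.startsWith("/") ? path : `/${path}`;
  return `${origin}${base}${route}`;
}

// Build (but do not send) POST /api/conversations.
function createConversationRequest({ agentId, title, workingDirectory }, opts) {
  return {
    url: apiUrl("/api/conversations", opts),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ agentId, title, workingDirectory }),
    },
  };
}

// Build POST /api/conversations/:id/messages.
function sendMessageRequest(conversationId, { content, agentId }, opts) {
  return {
    url: apiUrl(`/api/conversations/${encodeURIComponent(conversationId)}/messages`, opts),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ content, agentId }),
    },
  };
}

// Send a prepared request; throw on non-2xx, otherwise parse the JSON body.
async function send({ url, init }) {
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`${init.method} ${url} failed: HTTP ${res.status}`);
  return res.json();
}
```

For example, `await send(createConversationRequest({ agentId, title: "Demo", workingDirectory: process.cwd() }))` creates a conversation. In that call, `agentId` would be one of the ids returned by `GET /api/agents`.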

## WebSocket Protocol

Endpoint: `BASE_URL + /sync`

Client sends:
- `{ type: "subscribe", sessionId }` or `{ type: "subscribe", conversationId }`
- `{ type: "unsubscribe", sessionId }`
- `{ type: "ping" }`

Server broadcasts:
- `streaming_start` - Agent execution started
- `streaming_progress` - New event/chunk from agent
- `streaming_complete` - Execution finished
- `streaming_error` - Execution failed
- `conversation_created`, `conversation_updated`, `conversation_deleted`
- `tts_setup_progress` - Windows pocket-tts setup progress (step, status, message)
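
A sketch of the client side of this protocol follows. It builds the three client message types listed above and routes incoming frames by `type`. The handler-table design and all function names are assumptions. Payload fields beyond `type` are not specified here, so each handler receives the whole parsed message.

```javascript
// Hypothetical helpers for the BASE_URL + /sync WebSocket protocol.

// Client -> server. A subscribe carries exactly one of sessionId / conversationId.
function subscribeMessage({ sessionId, conversationId }) {
  if (Boolean(sessionId) === Boolean(conversationId)) {
    throw new Error("subscribe needs exactly one of sessionId or conversationId");
  }
  return JSON.stringify(sessionId ? { type: "subscribe", sessionId } : { type: "subscribe", conversationId });
}
const unsubscribeMessage = (sessionId) => JSON.stringify({ type: "unsubscribe", sessionId });
const pingMessage = () => JSON.stringify({ type: "ping" });

// Server -> client broadcast types listed in this document.
const BROADCAST_TYPES = new Set([
  "streaming_start", "streaming_progress", "streaming_complete", "streaming_error",
  "conversation_created", "conversation_updated", "conversation_deleted",
  "tts_setup_progress",
]);

// Parse one raw frame and call handlers[type]; unknown or unhandled types go to onUnknown.
// Returns the message type, or null if the frame is not a JSON object.
function dispatchFrame(raw, handlers, onUnknown = () => {}) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (msg === null || typeof msg !== "object") return null;
  const handler = handlers[msg.type];
  if (BROADCAST_TYPES.has(msg.type) && typeof handler === "function") {
    handler(msg);
  } else {
    onUnknown(msg);
  }
  return msg.type ?? null;
}

// Browser wiring (illustrative, not executed here):
// const ws = new WebSocket(`ws://${location.host}/gm/sync`);
// ws.onopen = () => ws.send(subscribeMessage({ conversationId }));
// ws.onmessage = (ev) => dispatchFrame(ev.data, { streaming_progress: render });
```

Keeping parsing and dispatch in a pure function lets the routing be tested without a live socket. The socket itself only forwards `ev.data`.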

## Pocket-TTS Windows Setup (Reliability for Slow/Bad Internet)

On Windows, text-to-speech uses pocket-tts, which requires Python and a pip install. The setup process is resilient to slow or unreliable connections:

### Features
- **Extended timeouts**: 120s for pip install (accommodates slow connections)
- **Retry logic**: 3 attempts with exponential backoff (1s, then 2s between attempts)
- **Progress reporting**: Real-time updates to the UI via WebSocket
- **Partial install cleanup**: Failed venvs are removed so setup can be retried
- **Installation verification**: The installed binary is validated with a `--version` check
- **Concurrent waiting**: Simultaneous requests share a single setup run and wait up to 600s for it

### Configuration (lib/windows-pocket-tts-setup.js)
```javascript
const CONFIG = {
  PIP_TIMEOUT: 120000,            // 2 minutes
  VENV_CREATION_TIMEOUT: 30000,   // 30 seconds
  MAX_RETRIES: 3,                 // 3 attempts
  RETRY_DELAY_MS: 1000,           // 1 second initial
  RETRY_BACKOFF_MULTIPLIER: 2,    // 2x exponential
};
```
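
The retry schedule implied by these values can be sketched as a generic helper. This is an illustration, not the code in `lib/windows-pocket-tts-setup.js`. It assumes `MAX_RETRIES` counts total attempts, matching "3 attempts" above, so the waits between attempts are 1 s and then 2 s. The `onRetry` hook (where progress could be broadcast) and the injectable `wait` parameter are hypothetical.

```javascript
// Sketch (not the shipped code): retry an async step with exponential backoff.
// Assumes MAX_RETRIES counts total attempts and the n-th wait is RETRY_DELAY_MS * MULTIPLIER^(n-1).
const RETRY_CONFIG = {
  MAX_RETRIES: 3,
  RETRY_DELAY_MS: 1000,
  RETRY_BACKOFF_MULTIPLIER: 2,
};

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Waits (ms) between consecutive attempts: [1000, 2000] for the values above.
function backoffDelays(cfg = RETRY_CONFIG) {
  const delays = [];
  for (let i = 0; i < cfg.MAX_RETRIES - 1; i++) {
    delays.push(cfg.RETRY_DELAY_MS * cfg.RETRY_BACKOFF_MULTIPLIER ** i);
  }
  return delays;
}

// Run operation(attempt) until it resolves or attempts run out; rethrow the last error.
async function withRetry(operation, { cfg = RETRY_CONFIG, onRetry = () => {}, wait = sleep } = {}) {
  const delays = backoffDelays(cfg);
  let lastError;
  for (let attempt = 1; attempt <= cfg.MAX_RETRIES; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === cfg.MAX_RETRIES) break;
      const delay = delays[attempt - 1];
      onRetry({ attempt, delay, error: err }); // e.g. broadcast tts_setup_progress with the retry count
      await wait(delay);
    }
  }
  throw lastError;
}
```

In the setup flow, `operation` would wrap the pip install child process, with that process enforcing its own `PIP_TIMEOUT`. A failed attempt would remove the partial venv before the next try.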

### Network Requirements
- **Minimum**: 50 kbps sustained, < 5s latency, < 10% packet loss
- **Recommended**: 256+ kbps, < 2s latency, < 1% packet loss
- **Expected time on slow connection**: 2-6 minutes with retries

### Progress Messages
During first-use TTS setup, the server broadcasts progress over WebSocket:
```json
{
  "type": "tts_setup_progress",
  "step": "detecting-python|creating-venv|installing|verifying",
  "status": "in-progress|success|error",
  "message": "descriptive status message with retry count if applicable"
}
```
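
A client might validate and render these messages as follows. The allowed `step` and `status` values are read off the pipe-separated placeholders in the JSON above. The label text and the function name `formatTtsProgress` are illustrative assumptions.

```javascript
// Sketch: validate a tts_setup_progress message and turn it into a UI status line.
const TTS_STEPS = ["detecting-python", "creating-venv", "installing", "verifying"];
const TTS_STATUSES = ["in-progress", "success", "error"];

// Returns null for other message types; throws on an unexpected step/status.
function formatTtsProgress(msg) {
  if (!msg || msg.type !== "tts_setup_progress") return null;
  if (!TTS_STEPS.includes(msg.step) || !TTS_STATUSES.includes(msg.status)) {
    throw new Error(`unexpected tts_setup_progress payload: ${JSON.stringify(msg)}`);
  }
  const position = TTS_STEPS.indexOf(msg.step) + 1; // 1-based step index
  const label = { "in-progress": "running", success: "done", error: "failed" }[msg.status];
  const detail = msg.message ? `: ${msg.message}` : "";
  return `[${position}/${TTS_STEPS.length}] ${msg.step} (${label})${detail}`;
}
```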

### Recovery Behavior
1. Network timeout → auto-retry with backoff
2. Partial venv → auto-cleanup before retry
3. Failed verification → auto-cleanup and error
4. Concurrent requests → first starts setup, others wait up to 600s
5. Interrupted setup → cleanup allows fresh retry
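
Behavior 4 is a single-flight pattern: the first caller starts setup, and later callers await the same promise, each with its own timeout. Below is a minimal sketch, assuming a `runSetup` async function stands in for the real installer; the name `createSingleFlight` is hypothetical. A waiter that times out does not cancel the shared setup. The in-flight slot is cleared once setup settles, so a failed run can be retried fresh (behaviors 2 and 5), while a successful result is cached.

```javascript
// Sketch: share one in-flight setup among concurrent callers, each waiting at most waitTimeoutMs.
function createSingleFlight(runSetup, waitTimeoutMs = 600_000) {
  let inFlight = null;
  let done = false;
  let result;

  return function ensureSetup() {
    if (done) return Promise.resolve(result);
    if (!inFlight) {
      // First caller starts setup; later callers reuse the same promise.
      inFlight = Promise.resolve()
        .then(runSetup)
        .then((value) => {
          done = true;
          result = value;
          return value;
        })
        .finally(() => {
          inFlight = null; // on failure done stays false, so the next call retries
        });
    }
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`setup wait exceeded ${waitTimeoutMs} ms`)), waitTimeoutMs);
    });
    // Whichever settles first wins; always clear the timer.
    return Promise.race([inFlight, timeout]).finally(() => clearTimeout(timer));
  };
}
```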

### Testing
Setup validates the installation by running the pocket-tts binary with the `--version` flag, confirming a functional install rather than mere file existence.

## Model Download Fallback Chain Architecture (Task 1C)

Three-layer resilient fallback for speech models (280MB whisper-base + 197MB TTS), designed to eliminate single points of failure while maintaining backward compatibility.

### Layer 1: IPFS Gateway (Primary)

Decentralized distribution across three gateways with automatic failover:

```
Cloudflare IPFS   https://cloudflare-ipfs.com/ipfs/    Priority 1 (99.9% reliable)
dweb.link         https://dweb.link/ipfs/              Priority 2 (99% reliable)
Pinata            https://gateway.pinata.cloud/ipfs/   Priority 3 (99.5% reliable)
```

**Model Distribution**:
- Whisper Base (280MB): `TBD_WHISPER_HASH` → encoder (78.6MB) + decoder (198.9MB) + configs
- TTS Models (197MB): `TBD_TTS_HASH` → mimi_encoder (73MB) + decoders + text_conditioner + flow_lm

**Characteristics**: 30s timeout per gateway, 2 retries per gateway before moving to the next, SHA-256 per-file verification against an IPFS-stored manifest
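
Gateway failover can be sketched as an ordered list of candidate URLs for one content identifier, tried in turn. Each request gets a timeout via `AbortSignal.timeout` (Node 18+). The gateway table mirrors the one above. The CID and file path are placeholders, since the real hashes are still `TBD`. The sketch reads "2 retries" as one attempt plus two retries per gateway and applies the 30 s timeout to each request. Both readings are assumptions. It also buffers the whole body in memory, whereas a real downloader for 200MB files would likely stream to a temp file.

```javascript
// Sketch: ordered IPFS gateway failover for one file.
const GATEWAYS = [
  { name: "Cloudflare IPFS", base: "https://cloudflare-ipfs.com/ipfs/", priority: 1 },
  { name: "dweb.link", base: "https://dweb.link/ipfs/", priority: 2 },
  { name: "Pinata", base: "https://gateway.pinata.cloud/ipfs/", priority: 3 },
];

// Candidate URLs for `${cid}/${path}`, in priority order.
function gatewayUrls(cid, path = "", gateways = GATEWAYS) {
  const suffix = path ? `/${path.replace(/^\/+/, "")}` : "";
  return [...gateways]
    .sort((a, b) => a.priority - b.priority)
    .map((g) => ({ gateway: g.name, url: `${g.base}${cid}${suffix}` }));
}

// Try each gateway (1 attempt + `retries` retries); return the first successful body.
async function fetchFromGateways(cid, path, { retries = 2, timeoutMs = 30_000 } = {}) {
  const errors = [];
  for (const { gateway, url } of gatewayUrls(cid, path)) {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        return { gateway, bytes: Buffer.from(await res.arrayBuffer()) };
      } catch (err) {
        errors.push(`${gateway} attempt ${attempt + 1}: ${err.message}`);
      }
    }
  }
  throw new Error(`all IPFS gateways failed:\n${errors.join("\n")}`);
}
```

The downloaded bytes would then go through the per-file SHA-256 check against the manifest before being moved into the cache.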

### Layer 2: HuggingFace (Secondary)

The current working implementation, provided by the webtalk package. It has proven reliable, with region-dependent latency.

```
Whisper  https://huggingface.co/onnx-community/whisper-base/resolve/main/
TTS      https://huggingface.co/datasets/AnEntrypoint/sttttsmodels/resolve/main/tts/
```

**Characteristics**: 3 retries with exponential backoff (2^attempt seconds), 30s timeout, file size validation (minBytes thresholds: encoder ≥40MB, decoder ≥100MB, TTS file thresholds ranging from 18MB to 61MB)

**Implementation Location**: webtalk/whisper-models.js, webtalk/tts-models.js (unchanged, wrapped by fallback logic)
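
The HuggingFace layer's retry and size rules can be sketched as follows. The sketch makes up to 3 attempts (matching "3 attempts" in the decision logic below), waiting 2^attempt seconds with `attempt` counted from 0, so 1 s then 2 s. The encoder and decoder `minBytes` values come from the thresholds above. The file names are hypothetical placeholders; the real names and paths live in webtalk's `whisper-models.js`. The 30 s timeout here covers the whole request, including the body, which is a simplification.

```javascript
// Sketch: HuggingFace download with 2^attempt-second backoff and a minimum-size check.
const fs = require("node:fs");

const HF_WHISPER_BASE = "https://huggingface.co/onnx-community/whisper-base/resolve/main/";
const MB = 1024 * 1024;

// Hypothetical file names; only the thresholds come from the document.
const MIN_BYTES = {
  "encoder.onnx": 40 * MB,
  "decoder.onnx": 100 * MB,
};

// Wait before the next try after failed attempt `attempt` (0-based): 1 s, 2 s, 4 s, ...
const backoffMs = (attempt) => 2 ** attempt * 1000;

// True if the file exists and meets its threshold (files without one just need 1+ bytes).
function passesSizeCheck(filePath, fileName) {
  if (!fs.existsSync(filePath)) return false;
  return fs.statSync(filePath).size >= (MIN_BYTES[fileName] ?? 1);
}

async function downloadWithBackoff(fileName, destPath, { attempts = 3, timeoutMs = 30_000 } = {}) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const res = await fetch(HF_WHISPER_BASE + fileName, { signal: AbortSignal.timeout(timeoutMs) });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      fs.writeFileSync(destPath, Buffer.from(await res.arrayBuffer()));
      if (passesSizeCheck(destPath, fileName)) return destPath;
      fs.rmSync(destPath, { force: true }); // too small: treat as corrupt and retry
      throw new Error("file below minBytes threshold");
    } catch (err) {
      if (attempt === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffMs(attempt)));
    }
  }
}
```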

### Layer 3: Local Cache + Fallbacks

**Primary Cache**: `~/.gmgui/models/` with manifest at `~/.gmgui/models/.manifests.json`

**Verification Algorithms**:
1. Size check (minBytes threshold) → corrupted: delete & retry
2. SHA-256 hash against manifest → mismatch: delete & re-download
3. ONNX format validation (header check) → invalid: delete & escalate to primary
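
Steps 1 and 2 of this verification can be sketched with `node:crypto`. The file is streamed so that large models are never loaded into memory at once. The expected digest corresponds to one entry of the `sha256` map in the version manifest described under Cache Invalidation Strategy. The function names and return labels are hypothetical. Step 3 is omitted: ONNX files are protobuf-encoded and have no fixed magic number, so the exact header check used here is not something this sketch can reproduce.

```javascript
// Sketch: verify a cached model file by minimum size, then by SHA-256 against the manifest.
const fs = require("node:fs");
const crypto = require("node:crypto");

// Stream the file through SHA-256 and resolve with the hex digest.
function sha256File(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("sha256");
    fs.createReadStream(filePath)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Returns "ok", or the reason the file should be deleted and fetched again.
async function verifyCachedFile(filePath, { minBytes = 1, expectedSha256 } = {}) {
  let size;
  try {
    size = fs.statSync(filePath).size;
  } catch {
    return "missing";
  }
  if (size < minBytes) return "too-small"; // step 1: partial or corrupted download
  if (expectedSha256) {
    const actual = await sha256File(filePath);
    if (actual !== expectedSha256.toLowerCase()) return "hash-mismatch"; // step 2
  }
  return "ok";
}
```

The cheap size check runs first, so obviously truncated files are rejected without hashing hundreds of megabytes.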

**Bundled Models** (future): `agentgui/bundled-models.tar.gz` (~50-80MB) for offline-first deployments

**Peer-to-Peer** (future): mDNS discovery for LAN sharing across multiple AgentGUI instances

### Download Decision Logic

```
1. Check local cache validity → RETURN if valid, record cache_hit metric
2. TRY PRIMARY (IPFS): attempt 3 gateways sequentially, 2 retries each
   - VERIFY size + sha256 → ON SUCCESS: record primary_success, return
3. TRY SECONDARY (HuggingFace): 3 attempts with exponential backoff
   - VERIFY file size → ON SUCCESS: record secondary_success, return
4. TRY TERTIARY (Bundled): extract tarball if present
   - VERIFY extraction → ON SUCCESS: record tertiary_bundled_success, return
5. TRY TERTIARY (Peer): query mDNS if enabled, fetch from peer
   - VERIFY checksum → ON SUCCESS: record tertiary_peer_success, return
6. FAILURE: record all_layers_exhausted metric, throw error (optional: activate degraded mode)
```
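
The decision logic above reduces to an ordered list of layers, tried until one succeeds, with a metric recorded for each outcome. Below is a sketch under stated assumptions. Each layer is a hypothetical `{ name, metric, run }` object. Its `run` resolves to a result, or to `null` when the layer does not apply (cache miss, no bundled tarball, peers disabled). `recordMetric` stands in for whatever appends to `.metrics.json`. The success metric names come from the list above; the `"failure"` status is an assumption.

```javascript
// Sketch of the fallback chain: cache -> IPFS -> HuggingFace -> bundled -> peer.
class AllLayersExhaustedError extends Error {
  constructor(modelType, failures) {
    super(`all download layers failed for ${modelType}`);
    this.name = "AllLayersExhaustedError";
    this.failures = failures; // [{ layer, error }]
  }
}

async function ensureModel(modelType, layers, recordMetric) {
  const failures = [];
  for (const layer of layers) {
    const started = Date.now();
    try {
      const result = await layer.run(modelType);
      if (result == null) continue; // layer not applicable: move on without a metric
      recordMetric({ modelType, layer: layer.name, status: layer.metric, latency_ms: Date.now() - started });
      return result;
    } catch (error) {
      failures.push({ layer: layer.name, error });
      recordMetric({
        modelType,
        layer: layer.name,
        status: "failure",
        latency_ms: Date.now() - started,
        error_message: error.message,
      });
    }
  }
  recordMetric({ modelType, layer: null, status: "all_layers_exhausted" });
  throw new AllLayersExhaustedError(modelType, failures);
}

// Example layer table (the run functions are hypothetical placeholders):
// const layers = [
//   { name: "cache", metric: "cache_hit", run: checkCache },
//   { name: "ipfs", metric: "primary_success", run: fetchFromIpfs },
//   { name: "huggingface", metric: "secondary_success", run: fetchFromHf },
//   { name: "bundled", metric: "tertiary_bundled_success", run: extractBundled },
//   { name: "peer", metric: "tertiary_peer_success", run: fetchFromPeer },
// ];
```

Representing layers as data keeps the order configurable, which fits roadmap phase 1's "default configurable" gateway discovery.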

### Metrics Collection

**Storage**: `~/.gmgui/models/.metrics.json` (append-only, rotated daily)

**Per-Download Fields**: timestamp, modelType, layer, gateway, status, latency_ms, bytes_downloaded/total, error_type/message

**Aggregations**: per-layer success rate, per-gateway success rate, avg latency per layer, cache effectiveness

**Dashboard Endpoints**:
- `GET /api/metrics/downloads` - all metrics
- `GET /api/metrics/downloads/summary` - aggregated stats
- `GET /api/metrics/downloads/health` - per-layer health
- `POST /api/metrics/downloads/reset` - clear history
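
The per-layer aggregations could be computed from stored records roughly as follows. The sketch assumes each record carries the `layer`, `status`, and `latency_ms` fields listed above. It also assumes that statuses ending in `_success`, or equal to `cache_hit`, count as successes. The output shape is illustrative, not the actual response of `/api/metrics/downloads/summary`.

```javascript
// Sketch: aggregate download metric records into per-layer success rate and average latency.
const isSuccess = (status) => status === "cache_hit" || status.endsWith("_success");

function summarizeByLayer(records) {
  const byLayer = {};
  for (const r of records) {
    if (!r.layer) continue; // skip chain-level records such as all_layers_exhausted
    const s = (byLayer[r.layer] ??= { attempts: 0, successes: 0, totalLatencyMs: 0 });
    s.attempts += 1;
    if (isSuccess(r.status)) s.successes += 1;
    s.totalLatencyMs += r.latency_ms ?? 0;
  }
  const summary = {};
  for (const [layer, s] of Object.entries(byLayer)) {
    summary[layer] = {
      attempts: s.attempts,
      successRate: s.successes / s.attempts,
      avgLatencyMs: Math.round(s.totalLatencyMs / s.attempts),
    };
  }
  return summary;
}
```

The per-gateway rate would follow the same pattern, keyed on `gateway` instead of `layer`.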

### Cache Invalidation Strategy

**Version Manifest** (`~/.gmgui/models/.manifests.json`):
```json
{
  "whisper-base": {
    "currentVersion": "1.0.0",
    "ipfsHash": "QmXXXX...",
    "huggingfaceTag": "revision-hash",
    "downloadedAt": "ISO8601",
    "sha256": { "file": "hash...", ... }
  },
  "tts-models": { ... }
}
```

**Version Mismatch Detection** (on startup + periodic background check):
- Send a HEAD request to the HuggingFace API for the latest revision
- Query an IPFS gateway for the latest dag-json manifest
- If a new version exists: log a warning, set a flag in `/api/status`, and prompt the user (no auto-download)
- If the cache is corrupted: quarantine it to `.bak`, mark it invalid, and trigger an auto-download from the primary layer on the next request

**Stale Cache Handling**:
- Max age: 90 days → background check queries IPFS for new hash
- Stale window: 7 days after max age → serve stale if live fetch fails
- Offline degradation: serve even if 365 days old when network down
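
One way to read these three thresholds is as a single decision function. The interpretation is an assumption. Up to 90 days the cache is fresh. Between 90 and 97 days a live fetch is attempted, with the stale copy served if it fails. Past 97 days the model must be re-downloaded. When the network is down, anything up to 365 days old is served. The function name and its return labels are hypothetical.

```javascript
// Sketch: classify a cached model by age and connectivity.
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_AGE_DAYS = 90;
const STALE_WINDOW_DAYS = 7;
const OFFLINE_MAX_DAYS = 365;

// Returns one of:
//   "fresh"          serve from cache
//   "revalidate"     try a live fetch; serve the stale copy if it fails
//   "refetch"        must download; do not serve the cached copy
//   "serve-offline"  network down: serve the old copy anyway
//   "unusable"       network down and the copy is past the offline limit
function cacheDecision(downloadedAt, { now = Date.now(), online = true } = {}) {
  const ageDays = (now - new Date(downloadedAt).getTime()) / DAY_MS;
  if (ageDays <= MAX_AGE_DAYS) return "fresh";
  if (!online) return ageDays <= OFFLINE_MAX_DAYS ? "serve-offline" : "unusable";
  if (ageDays <= MAX_AGE_DAYS + STALE_WINDOW_DAYS) return "revalidate";
  return "refetch";
}
```

`downloadedAt` is the ISO 8601 timestamp stored in the version manifest.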

**Cleanup Policy**:
- Backup retention: 1 previous version (`.bak`) for 7 days
- Failed downloads: delete `*.tmp` after 1 hour idle
- Old versions: delete if > 90 days old
- Disk threshold: warn if `~/.gmgui/models` exceeds 2GB

### Design Rationale

**Why Three Layers?** IPFS (decentralized, no single point of failure) + HuggingFace (proven, existing) + Local (offline-ready, LAN-resilient)

**Why Metrics First?** Enables data-driven gateway selection, identifies reliability in production, guides timeout/retry tuning

**Why No Auto-Upgrade?** User controls timing, allows staged rollout, supports version pinning, reduces surprise breakage

**Why Bundled Models?** Enables air-gapped deployments, reduces network load, supports edge environments with poor connectivity

### Implementation Roadmap

| Phase | Description | Priority |
|-------|-------------|----------|
| 1 | Integrate IPFS gateway discovery (default configurable) | HIGH |
| 2 | Refactor `ensureModelsDownloaded()` to use fallback chain | HIGH |
| 3 | Add metrics collection to download layer | HIGH |
| 4 | Implement manifest-based version tracking | MEDIUM |
| 5 | Add stale-while-revalidate background checks | MEDIUM |
| 6 | Integrate bundled models option | LOW |
| 7 | Add peer-to-peer discovery | LOW |

### Critical TODOs Before Implementation

1. Publish whisper-base to IPFS → obtain ipfsHash
2. Publish TTS models to IPFS → obtain ipfsHash
3. Create manifest templates for both models
4. Design metrics storage schema (SQLite vs JSON)
5. Plan background check scheduler
6. Define dashboard UI for metrics visualization