the-grid-cc 1.7.13 → 1.7.15
- package/02-SUMMARY.md +156 -0
- package/DAEMON_VALIDATION.md +354 -0
- package/README.md +6 -6
- package/agents/grid-accountant.md +519 -0
- package/agents/grid-git-operator.md +661 -0
- package/agents/grid-researcher.md +421 -0
- package/agents/grid-scout.md +376 -0
- package/commands/grid/VERSION +1 -1
- package/commands/grid/branch.md +567 -0
- package/commands/grid/budget.md +438 -0
- package/commands/grid/daemon.md +637 -0
- package/commands/grid/help.md +29 -0
- package/commands/grid/init.md +409 -18
- package/commands/grid/mc.md +163 -1111
- package/commands/grid/resume.md +656 -0
- package/docs/BUDGET_SYSTEM.md +745 -0
- package/docs/CONFIG_SCHEMA.md +479 -0
- package/docs/DAEMON_ARCHITECTURE.md +780 -0
- package/docs/GIT_AUTONOMY.md +981 -0
- package/docs/GIT_AUTONOMY_INTEGRATION.md +343 -0
- package/docs/MC_OPTIMIZATION.md +181 -0
- package/docs/MC_PROTOCOLS.md +950 -0
- package/docs/PERSISTENCE.md +962 -0
- package/docs/PERSISTENCE_IMPLEMENTATION.md +361 -0
- package/docs/PERSISTENCE_QUICKSTART.md +283 -0
- package/docs/RESEARCH_CONFIG.md +511 -0
- package/docs/RESEARCH_FIRST.md +591 -0
- package/docs/WIRING_VERIFICATION.md +389 -0
- package/package.json +1 -1
- package/templates/daemon-checkpoint.json +51 -0
- package/templates/daemon-config.json +28 -0
- package/templates/git-config.json +65 -0
- package/templates/grid-state/.gitignore-entry +3 -0
- package/templates/grid-state/BLOCK-SUMMARY.md +66 -0
- package/templates/grid-state/BLOCKERS.md +31 -0
- package/templates/grid-state/CHECKPOINT.md +59 -0
- package/templates/grid-state/DECISIONS.md +30 -0
- package/templates/grid-state/README.md +138 -0
- package/templates/grid-state/SCRATCHPAD.md +29 -0
- package/templates/grid-state/STATE.md +47 -0
- package/templates/grid-state/WARMTH.md +48 -0
- package/templates/grid-state/config.json +24 -0
@@ -0,0 +1,780 @@
# The Grid - Daemon Mode Architecture

Technical design document for long-running autonomous execution in The Grid.

---

## Executive Summary

Daemon Mode enables The Grid to execute complex, multi-hour tasks without requiring an active user session. Users can "fire and forget" large projects, check progress asynchronously, and receive notifications when work completes or requires attention.

**Key Capabilities:**
- Long-running autonomous execution (hours to days)
- Process survival across terminal closures
- Progress monitoring and status checks
- Graceful pause, resume, and cancellation
- Crash recovery and checkpoint-based resumption

---

## Current State Analysis

### What The Grid Has Today

| Feature | Current State |
|---------|---------------|
| **State Persistence** | `.grid/STATE.md` - survives terminal close |
| **Checkpoint Protocol** | Programs return structured checkpoints |
| **Warmth Transfer** | `lessons_learned` in SUMMARY.md |
| **Scratchpad** | Live discovery sharing via `.grid/SCRATCHPAD.md` |
| **Session Resume** | Manual via `/grid` checking STATE.md |

### Current Limitations

1. **Session-Bound Execution**: Work stops when user closes terminal
2. **No Background Mode**: Can't run while user does other work
3. **Manual Resume Required**: User must explicitly restart after pause
4. **No Notifications**: No way to alert user when work completes
5. **Context Window Bound**: Single-session context limits (200k tokens)

---

## Architecture Overview

### Three-Layer Design

```
┌─────────────────────────────────────────────────────────────────────┐
│                          DAEMON CONTROLLER                          │
│  Lightweight process manager that survives terminal disconnection   │
│  - Spawns/monitors Claude Code processes                            │
│  - Manages checkpoint persistence                                   │
│  - Handles notifications                                            │
└─────────────────────────────────────────────────────────────────────┘
                                   ↓
┌─────────────────────────────────────────────────────────────────────┐
│                        SESSION ORCHESTRATOR                         │
│  Master Control instance managing execution waves                   │
│  - Coordinates parallel Programs                                    │
│  - Handles inter-session state transfer                             │
│  - Implements durable execution patterns                            │
└─────────────────────────────────────────────────────────────────────┘
                                   ↓
┌─────────────────────────────────────────────────────────────────────┐
│                           WORKER PROGRAMS                           │
│  Fresh Claude Code instances for actual work                        │
│  - Planners, Executors, Recognizers, etc.                           │
│  - Each gets fresh 200k context window                              │
│  - Reports progress to Orchestrator                                 │
└─────────────────────────────────────────────────────────────────────┘
```

### Daemon Controller (Process Layer)

The Daemon Controller is a lightweight Node.js process that:

1. **Spawns Claude Code sessions** via CLI
2. **Monitors process health** with heartbeats
3. **Persists state** to disk on every checkpoint
4. **Survives disconnection** via `nohup` or similar
5. **Sends notifications** via system notifications, webhooks, or email

```javascript
// Conceptual: daemon-controller.js
const { spawn } = require('child_process');

class DaemonController {
  constructor(config) {
    this.stateFile = '.grid/daemon/state.json';
    this.logFile = '.grid/daemon/daemon.log';
    this.notificationHandler = new NotificationHandler(config);
  }

  async spawn(taskDescription) {
    // 1. Create daemon state
    const daemonId = generateId();
    await this.persistState({ id: daemonId, status: 'starting' });

    // 2. Spawn Claude Code in headless mode
    const proc = spawn('claude', [
      '--agent', 'grid-daemon-orchestrator',
      '--input', taskDescription,
      '--output', `.grid/daemon/${daemonId}/output.md`
    ], {
      detached: true,
      stdio: 'ignore' // a detached child must not share the parent's stdio
    });

    // 3. Detach from terminal
    proc.unref();

    return daemonId;
  }

  async checkStatus(daemonId) {
    const state = await this.loadState(daemonId);
    return {
      status: state.status,
      progress: state.progress,
      lastCheckpoint: state.lastCheckpoint,
      logs: await this.tailLogs(daemonId, 20)
    };
  }
}
```

### Session Orchestrator (Coordination Layer)

An enhanced Master Control that implements **durable execution**:

```
SESSION LIFECYCLE
─────────────────

1. INITIALIZE
   ├── Load daemon state from disk
   ├── Verify last checkpoint integrity
   └── Determine resume point

2. EXECUTE WAVE
   ├── Spawn Programs (parallel within wave)
   ├── Monitor via scratchpad polling
   ├── Checkpoint after each Program completes
   └── Persist full state to disk

3. CHECKPOINT (after every wave)
   ├── Serialize: conversation history, plan state, warmth
   ├── Write to `.grid/daemon/{id}/checkpoint.json`
   ├── Update progress in daemon state
   └── Notify controller of progress

4. HANDLE INTERRUPTION
   ├── If graceful: complete current Program, checkpoint, exit
   ├── If crash: controller detects via heartbeat timeout
   └── Resume from last checkpoint on restart

5. COMPLETE
   ├── Final checkpoint with COMPLETE status
   ├── Generate summary report
   ├── Notify user
   └── Clean up or archive daemon state
```

### State Persistence Model

```jsonc
// .grid/daemon/{daemon-id}/checkpoint.json
{
  "version": "1.0",
  "daemon_id": "20260123-143000-build-auth-api",
  "created": "2026-01-23T14:30:00Z",
  "updated": "2026-01-23T16:45:00Z",

  "status": "executing",  // starting | executing | paused | checkpoint | complete | failed

  "task": {
    "description": "Build REST API with user authentication",
    "mode": "autopilot",
    "original_prompt": "..."
  },

  "progress": {
    "current_wave": 2,
    "total_waves": 4,
    "completed_blocks": ["01", "02", "03"],
    "current_block": "04",
    "percent": 65
  },

  "execution_state": {
    "plan_data": { /* Full Planner output */ },
    "completed_summaries": { /* Block SUMMARY.md contents */ },
    "warmth": { /* Accumulated lessons_learned */ },
    "scratchpad_archive": [ /* All scratchpad entries */ ]
  },

  "checkpoint_stack": [
    {
      "type": "human-verify",
      "block": "04",
      "details": { /* Checkpoint data */ },
      "created": "2026-01-23T16:45:00Z"
    }
  ],

  "metrics": {
    "start_time": "2026-01-23T14:30:00Z",
    "elapsed_seconds": 8100,
    "programs_spawned": 12,
    "commits_made": 8,
    "estimated_remaining_seconds": 4500
  }
}
```
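
Because the checkpoint is rewritten after every Program and every wave, a crash in the middle of a write could leave `checkpoint.json` truncated. The following is a minimal Python sketch of one way to guard against that, assuming a write-to-temp-then-rename approach. The helper name `save_checkpoint_atomic` and the `.tmp` suffix are illustrative and not part of The Grid.

```python
import json
import os
import tempfile


def save_checkpoint_atomic(checkpoint: dict, path: str) -> None:
    """Write checkpoint JSON so readers never observe a partial file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)

    # The temp file lives in the target directory so the final rename
    # stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(checkpoint, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before renaming
        os.replace(tmp_path, path)  # rename over the old checkpoint
    except BaseException:
        # Leave the previous checkpoint untouched and drop the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```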

---

## Durable Execution Implementation

### Checkpoint Protocol

Based on research into durable execution patterns, checkpointing occurs at these boundaries:

| Event | Checkpoint Contents | Recovery Action |
|-------|---------------------|-----------------|
| **Wave Complete** | Full state, all summaries | Resume next wave |
| **Program Complete** | Program output, warmth | Resume wave |
| **User Checkpoint** | Checkpoint data, pause reason | Wait for user |
| **Crash** | Last known state | Verify + resume |
| **Graceful Stop** | Full state + stop reason | Resume on restart |

### Heartbeat & Health Monitoring

```
HEARTBEAT PROTOCOL
──────────────────

Orchestrator writes heartbeat every 30 seconds:
  .grid/daemon/{id}/heartbeat.json
  {
    "timestamp": "2026-01-23T16:45:30Z",
    "status": "executing",
    "current_action": "Spawning executor-03"
  }

Controller considers Orchestrator dead if:
  - No heartbeat for 2 minutes
  - Process not found in system

Recovery:
  1. Controller reads last checkpoint
  2. Spawns new Orchestrator with checkpoint
  3. Orchestrator verifies git state
  4. Resumes from checkpoint
```
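
The two "dead" conditions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the controller's actual code: the function names are hypothetical, the process probe is POSIX-only, and treating a missing or unreadable heartbeat file as "dead" is a choice made here for simplicity.

```python
import json
import os
from datetime import datetime, timezone

HEARTBEAT_TIMEOUT_SECONDS = 120  # "No heartbeat for 2 minutes"


def heartbeat_age_seconds(heartbeat: dict, now: datetime) -> float:
    """Seconds elapsed since the heartbeat's UTC timestamp."""
    stamp = datetime.strptime(heartbeat["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return (now - stamp.replace(tzinfo=timezone.utc)).total_seconds()


def process_exists(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def orchestrator_is_dead(heartbeat_path: str, pid: int, now: datetime) -> bool:
    """Apply both conditions: stale heartbeat OR missing process."""
    try:
        with open(heartbeat_path, encoding="utf-8") as handle:
            heartbeat = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return True  # assumption: no readable heartbeat counts as dead
    stale = heartbeat_age_seconds(heartbeat, now) > HEARTBEAT_TIMEOUT_SECONDS
    return stale or not process_exists(pid)
```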

### Crash Recovery

```python
def recover_from_crash(daemon_id):
    """Recovery protocol after unexpected termination."""

    # 1. Load last checkpoint
    checkpoint = load_checkpoint(daemon_id)

    # 2. Verify git state matches checkpoint
    #    (start_time is stored under "metrics" in checkpoint.json)
    actual_commits = get_git_commits_since(checkpoint['metrics']['start_time'])
    expected_commits = checkpoint['execution_state']['completed_summaries']

    if commits_match(actual_commits, expected_commits):
        # Clean recovery - resume from checkpoint
        return spawn_orchestrator(checkpoint, mode='resume')
    else:
        # Dirty state - need reconciliation
        return spawn_orchestrator(checkpoint, mode='reconcile')


def reconcile_state(checkpoint, actual_git_state):
    """Reconcile checkpoint with actual git state."""

    # Find divergence point
    last_matching_commit = find_last_matching(checkpoint, actual_git_state)

    # Option A: Trust git, update checkpoint
    # Option B: Trust checkpoint, revert git (dangerous)
    # Default: Trust git, log discrepancy

    updated_checkpoint = rebuild_from_git(last_matching_commit)
    return updated_checkpoint
```
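
The "find divergence point" step is left abstract above. As a rough sketch, assuming both the checkpoint and git can be reduced to ordered lists of commit hashes (an assumption; the checkpoint format shown earlier does not spell this out), the last point of agreement is the end of their common prefix. The signature differs from the pseudocode's `find_last_matching` for that reason.

```python
from typing import List, Optional


def find_last_matching(expected: List[str], actual: List[str]) -> Optional[str]:
    """Return the last commit hash at which both histories still agree.

    Returns None when the histories differ from the very first commit.
    """
    last = None
    for want, got in zip(expected, actual):
        if want != got:
            break
        last = want
    return last
```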

---

## Multi-Session Context Management

### The Context Window Problem

A single Claude Code session is limited to ~200k tokens. Complex projects exceed this. Solution: **session chaining with warmth transfer**.

```
SESSION 1 (Waves 1-2)          SESSION 2 (Waves 3-4)
┌─────────────────────┐        ┌─────────────────────┐
│ Fresh 200k context  │        │ Fresh 200k context  │
│                     │        │                     │
│ - Execute Wave 1    │        │ - Load checkpoint   │
│ - Execute Wave 2    │   →    │ - Apply warmth      │
│ - Checkpoint        │        │ - Execute Wave 3    │
│ - Extract warmth    │        │ - Execute Wave 4    │
│ - Terminate         │        │ - Complete          │
└─────────────────────┘        └─────────────────────┘
         ↓                             ↑
   checkpoint.json ────────────────────┘
```

### Session Handoff Protocol

```python
def handoff_to_new_session(current_checkpoint):
    """Hand off to fresh session when context exhausted."""

    # 1. Save current state
    save_checkpoint(current_checkpoint)

    # 2. Extract warmth (compressed learnings)
    warmth = extract_warmth(current_checkpoint)

    # 3. Terminate current session gracefully
    terminate_session()

    # 4. Spawn fresh session with minimal context
    new_session = spawn_orchestrator({
        'checkpoint_path': current_checkpoint.path,
        'warmth': warmth,
        'mode': 'continue'
    })

    return new_session
```
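
The protocol above does not say when "context exhausted" is declared. One plausible trigger, sketched here under assumptions, is to hand off before the window is actually full, once current usage plus the estimated cost of the next wave crosses a threshold. The 80% threshold and both names are hypothetical; only the ~200k limit comes from this document.

```python
CONTEXT_LIMIT_TOKENS = 200_000  # per-session limit cited in this document
HANDOFF_THRESHOLD = 0.8         # hypothetical safety margin


def should_hand_off(tokens_used: int, next_wave_estimate: int = 0) -> bool:
    """True when the next wave would push usage past the safety margin."""
    budget = CONTEXT_LIMIT_TOKENS * HANDOFF_THRESHOLD
    return tokens_used + next_wave_estimate >= budget
```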

### Warmth Compression

To fit learnings into new context windows, warmth is compressed:

```yaml
# Full warmth (too large for handoff)
lessons_learned:
  codebase_patterns:
    - "Uses barrel exports in src/index.ts"
    - "API routes in src/app/api/*/route.ts"
    - "Uses Zod for validation everywhere"
    - "Prisma client in src/lib/db.ts"
    - ... (50 more patterns)

# Compressed warmth (fits in context)
warmth_compressed:
  patterns: "barrel exports, Zod validation, Prisma in lib/db"
  gotchas: "auth middleware runs first, timestamps UTC"
  decisions: "chose JWT over sessions for statelessness"
  critical_files: ["src/lib/auth.ts", "prisma/schema.prisma"]
```
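
How the compression is performed is not specified here; the example output above reads like a summary, which suggests a model-written digest. As a deliberately naive stand-in, the sketch below only caps the number and length of entries per category and joins them into one string. The function name and both limits are hypothetical.

```python
from typing import Dict, List


def compress_warmth(
    lessons: Dict[str, List[str]],
    max_items: int = 3,
    max_chars: int = 80,
) -> Dict[str, str]:
    """Keep the first few entries per category, each truncated, as one string."""
    compressed = {}
    for category, entries in lessons.items():
        kept = [entry[:max_chars] for entry in entries[:max_items]]
        compressed[category] = "; ".join(kept)
    return compressed
```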

---

## User Interaction Patterns

### Fire-and-Forget Launch

```bash
# User launches daemon
/grid:daemon "Build complete e-commerce platform with Stripe integration"

# Grid responds
DAEMON SPAWNED
══════════════

ID:      20260123-143000-ecommerce
Task:    Build complete e-commerce platform with Stripe integration
Mode:    Autopilot

Status:  Planning phase
Monitor: /grid:daemon status
Stop:    /grid:daemon stop

You can close this terminal. Work continues in background.

End of Line.
```

### Status Checking

```bash
# From any terminal
/grid:daemon status

# Output
DAEMON STATUS
═════════════

ID:       20260123-143000-ecommerce
Runtime:  2h 15m
Status:   Executing (Wave 3 of 5)

Progress: [████████████░░░░░░░░] 60%

Current:  Block 07 - Payment Integration
  ├─ Thread 7.1: Stripe SDK setup      ✓
  ├─ Thread 7.2: Checkout flow         ⚡ In Progress
  └─ Thread 7.3: Webhook handlers      ○ Pending

Recent Activity:
  16:42 - executor-07: Implementing checkout session creation
  16:38 - executor-06: Completed product catalog API
  16:30 - recognizer: Verified Wave 2 artifacts ✓

Commits:        14 made
Est. Remaining: ~1h 30m

End of Line.
```

### Checkpoints (User Attention Required)

```bash
# User gets notification (system notification, webhook, etc.)
# "Grid Daemon needs your attention"

/grid:daemon status

# Output
DAEMON CHECKPOINT
═════════════════

ID:     20260123-143000-ecommerce
Status: AWAITING USER

Checkpoint Type: human-verify
Block:           08 - Stripe Webhooks

What was built:
  - Stripe webhook endpoint at /api/webhooks/stripe
  - Event handlers for payment_intent.succeeded, .failed
  - Signature verification middleware

How to verify:
  1. Run: stripe listen --forward-to localhost:3000/api/webhooks/stripe
  2. In another terminal: stripe trigger payment_intent.succeeded
  3. Check logs show "Payment succeeded" event processed

Resume: /grid:daemon resume "approved"
Or:     /grid:daemon resume "Issue: webhook not receiving events"

End of Line.
```

### Resume After Checkpoint

```bash
/grid:daemon resume "approved"

# Output
DAEMON RESUMED
══════════════

Checkpoint cleared. Continuing execution...

Current: Block 09 - Order Management
Status:  Executing

End of Line.
```

---

## Notification System

### Notification Triggers

| Event | Default Notification | Configurable |
|-------|---------------------|--------------|
| **Daemon Started** | Log only | Yes |
| **Wave Complete** | None | Yes |
| **Checkpoint Reached** | System notification | Yes |
| **Error/Failure** | System notification + sound | Yes |
| **Daemon Complete** | System notification | Yes |
| **Stall Detected** | Notification after 30 min of inactivity | Yes |

### Notification Channels

```jsonc
// .grid/config.json
{
  "daemon": {
    "notifications": {
      "system": true,    // macOS/Windows native notifications
      "sound": true,     // Audio alert on checkpoint/complete
      "webhook": null,   // POST to URL on events
      "email": null,     // Email notifications (requires setup)
      "slack": null      // Slack webhook URL
    },
    "notify_on": {
      "checkpoint": true,
      "complete": true,
      "error": true,
      "wave_complete": false,
      "stall": true
    },
    "stall_threshold_minutes": 30
  }
}
```
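
To make the configuration's semantics concrete, here is a small Python sketch of how a controller might interpret the `daemon` section: which events notify, which channels are enabled, and when a stall is declared. The function names are hypothetical, and reading `null` or `false` as "channel disabled" is an assumption.

```python
from typing import List

DAEMON_CONFIG = {
    "notifications": {"system": True, "sound": True,
                      "webhook": None, "email": None, "slack": None},
    "notify_on": {"checkpoint": True, "complete": True, "error": True,
                  "wave_complete": False, "stall": True},
    "stall_threshold_minutes": 30,
}


def should_notify(event_type: str, daemon_config: dict) -> bool:
    """Events missing from notify_on are treated as silent."""
    return bool(daemon_config.get("notify_on", {}).get(event_type, False))


def active_channels(daemon_config: dict) -> List[str]:
    """Channels whose value is truthy (true, or a configured URL/address)."""
    channels = daemon_config.get("notifications", {})
    return [name for name, value in channels.items() if value]


def is_stalled(seconds_since_activity: float, daemon_config: dict) -> bool:
    """True once inactivity reaches stall_threshold_minutes."""
    threshold = daemon_config.get("stall_threshold_minutes", 30) * 60
    return seconds_since_activity >= threshold
```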

### System Notification Implementation

```javascript
// Using node-notifier for cross-platform notifications
const path = require('path');
const notifier = require('node-notifier');

function notifyUser(event) {
  notifier.notify({
    title: 'The Grid',
    message: formatEventMessage(event),
    icon: path.join(__dirname, 'grid-icon.png'),
    sound: event.type === 'checkpoint' || event.type === 'complete',
    wait: event.type === 'checkpoint'  // Keep notification until dismissed
  });
}
```

---

## Implementation Phases

### Phase 1: Foundation (Current Claude Code Capabilities)

**What's possible today:**

1. **Manual daemon pattern** using `nohup claude ... &`
2. **State persistence** via existing `.grid/STATE.md`
3. **Checkpoint-based resume** via `/grid` reading STATE.md
4. **Background agents** via Claude Code v2.0.60+ `Ctrl+B`

**Implementation:**
- Enhance STATE.md with daemon-specific fields
- Create `/grid:daemon` command that sets up state and runs in background
- Use Claude Code's native background agent support where available

### Phase 2: Process Management (Requires External Tooling)

**Needs:**
- Daemon controller process (Node.js or shell script)
- Process monitoring and heartbeat
- Crash recovery automation

**Implementation:**
```bash
#!/bin/bash
# grid-daemon-launcher.sh
DAEMON_ID=$(date +%Y%m%d-%H%M%S)-$(echo "$1" | tr ' ' '-' | head -c 20)
DAEMON_DIR=".grid/daemon/$DAEMON_ID"
mkdir -p "$DAEMON_DIR"

# Save task
echo "$1" > "$DAEMON_DIR/task.txt"

# Launch Claude Code in background
# NOTE: this flag skips permission prompts; the Security Considerations
# section advises against using it in daemon mode.
nohup claude --print --dangerously-skip-permissions \
  -p "$(cat ~/.claude/commands/grid/daemon-executor.md)" \
  --input "$1" \
  > "$DAEMON_DIR/output.log" 2>&1 &

echo $! > "$DAEMON_DIR/pid"
echo "Daemon $DAEMON_ID started"
```

### Phase 3: Full Daemon Mode (Requires Claude Code Changes)

**Would need from Claude Code:**
- Native daemon/service mode
- IPC for status queries
- Built-in notification system
- Multi-session orchestration

**Proposal for Claude Code team:**
```
Feature Request: Daemon Mode for Claude Code

Use Case: Long-running autonomous development tasks

Requested Capabilities:
1. `claude daemon start "task"` - Launch headless session
2. `claude daemon status <id>` - Query running daemon
3. `claude daemon stop <id>` - Graceful termination
4. `claude daemon list` - Show all running daemons
5. Automatic checkpoint/resume on crash
6. Native system notifications
```

---

## Security Considerations

### Sandboxing

Daemon mode inherits Claude Code's permission model:
- `--dangerously-skip-permissions` should NOT be used in daemon mode
- Each operation still requires appropriate permissions
- File system access limited to project directory

### Resource Limits

```yaml
# Daemon resource configuration
daemon:
  max_runtime_hours: 24         # Hard limit on execution time
  max_programs_parallel: 5      # Limit concurrent Programs
  max_commits_per_hour: 20      # Rate limit commits
  max_file_modifications: 100   # Safety limit on file changes
  require_approval_after: 50    # Force checkpoint after N commits
```
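
A sketch of how these limits might be checked before each new action, assuming the controller tracks the relevant counters itself. The `DaemonLimits` class, the `check_limits` function, and the exact comparison rules are illustrative choices, not behavior defined by this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DaemonLimits:
    max_runtime_hours: int = 24
    max_programs_parallel: int = 5
    max_commits_per_hour: int = 20
    max_file_modifications: int = 100
    require_approval_after: int = 50


def check_limits(
    limits: DaemonLimits,
    runtime_seconds: float,
    running_programs: int,
    commits_last_hour: int,
    files_modified: int,
    commits_since_approval: int,
) -> List[str]:
    """Return the names of every limit the current counters violate."""
    violations = []
    if runtime_seconds > limits.max_runtime_hours * 3600:
        violations.append("max_runtime_hours")
    if running_programs > limits.max_programs_parallel:
        violations.append("max_programs_parallel")
    if commits_last_hour > limits.max_commits_per_hour:
        violations.append("max_commits_per_hour")
    if files_modified > limits.max_file_modifications:
        violations.append("max_file_modifications")
    # Reaching N commits forces a checkpoint, so this comparison is inclusive.
    if commits_since_approval >= limits.require_approval_after:
        violations.append("require_approval_after")
    return violations
```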

### Audit Trail

All daemon activity logged to `.grid/daemon/{id}/audit.log`:
```
2026-01-23T14:30:00Z | START  | Task: "Build e-commerce platform"
2026-01-23T14:32:15Z | SPAWN  | Planner (model: opus)
2026-01-23T14:35:42Z | PLAN   | 12 blocks, 5 waves
2026-01-23T14:36:00Z | SPAWN  | Executor-01 (block: 01)
2026-01-23T14:42:18Z | COMMIT | abc123 "feat(01): Initialize project"
...
```

---

## Failure Modes & Mitigations

| Failure Mode | Detection | Mitigation |
|--------------|-----------|------------|
| **Claude Code crash** | Heartbeat timeout | Auto-restart from checkpoint |
| **System reboot** | Daemon controller starts on boot | Resume from checkpoint |
| **Context exhaustion** | Token count monitoring | Session handoff |
| **API rate limit** | 429 response | Exponential backoff |
| **Git conflict** | Merge failure | Checkpoint, alert user |
| **Infinite loop** | Stall detection | Alert user, pause |
| **Permission denied** | Operation failure | Checkpoint, alert user |

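
The "Exponential backoff" mitigation for rate limits can be sketched as below. This is a generic illustration: `RateLimitError` is a hypothetical stand-in for however a 429 response surfaces, and the base delay, cap, attempt count, and jitter are assumed values.

```python
import random
import time
from typing import Callable, List


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 5) -> List[float]:
    """Delays double on each attempt and are clamped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def call_with_backoff(
    operation: Callable[[], object],
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry `operation` on RateLimitError, re-raising after the last attempt."""
    for attempt, delay in enumerate(backoff_delays(base, cap, attempts)):
        try:
            return operation()
        except RateLimitError:
            if attempt == attempts - 1:
                raise
            # Small random jitter so parallel Programs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```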

---

## Metrics & Observability

### Daemon Metrics

```yaml
# Exposed via /grid:daemon metrics
metrics:
  runtime_seconds: 8100
  programs_spawned: 12
  programs_failed: 0
  commits_made: 8
  files_created: 24
  files_modified: 15
  lines_written: 2847
  checkpoints_hit: 3
  checkpoint_wait_seconds: 120
  context_resets: 1
  warmth_transfers: 1
```

### Health Dashboard (Future)

```
DAEMON HEALTH
═════════════

Active Daemons: 2

┌─────────────────────────────────────────────────────────────────────┐
│ ID: ecommerce-build     Status: EXECUTING     Health: ●●●●○         │
│ Runtime: 2h 15m         Progress: 60%         Est: 1h 30m           │
├─────────────────────────────────────────────────────────────────────┤
│ ID: api-refactor        Status: CHECKPOINT    Health: ●●●●●         │
│ Runtime: 45m            Progress: 80%         Waiting: human-verify │
└─────────────────────────────────────────────────────────────────────┘

System Resources:
  CPU:    12% (Claude Code processes)
  Memory: 2.1 GB
  Disk:   142 MB (.grid/ state)
```

---

## Future Enhancements

### Distributed Execution

Multiple machines contributing to single daemon:
- Shared state via cloud storage
- Work distribution via queue
- Merge reconciliation

### Learning Across Daemons

Global warmth database:
- Patterns learned across all daemons
- Shared gotchas and best practices
- Per-project and global layers

### Scheduled Daemons

Cron-like scheduling:
```bash
/grid:daemon schedule "daily at 3am" "Run test suite and fix failures"
```

### Daemon Chaining

Sequential daemon execution:
```bash
/grid:daemon chain \
  "Build feature X" \
  "Write tests for feature X" \
  "Update documentation"
```

---

## Appendix A: State File Schemas

### checkpoint.json Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["version", "daemon_id", "status", "task", "progress"],
  "properties": {
    "version": { "type": "string" },
    "daemon_id": { "type": "string" },
    "created": { "type": "string", "format": "date-time" },
    "updated": { "type": "string", "format": "date-time" },
    "status": {
      "type": "string",
      "enum": ["starting", "executing", "paused", "checkpoint", "complete", "failed"]
    },
    "task": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "mode": { "type": "string" },
        "original_prompt": { "type": "string" }
      }
    },
    "progress": {
      "type": "object",
      "properties": {
        "current_wave": { "type": "integer" },
        "total_waves": { "type": "integer" },
        "completed_blocks": { "type": "array", "items": { "type": "string" } },
        "current_block": { "type": "string" },
        "percent": { "type": "integer" }
      }
    }
  }
}
```
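
For the "verify last checkpoint integrity" step, a loaded checkpoint could be checked against the core of this schema before resuming. The sketch below hand-codes only the `required` list, the `status` enum, and the type of `progress.percent`, using the standard library. It is not a full JSON Schema validator, and the function name is hypothetical.

```python
from typing import List

REQUIRED_FIELDS = ("version", "daemon_id", "status", "task", "progress")
VALID_STATUSES = {"starting", "executing", "paused", "checkpoint", "complete", "failed"}


def validate_checkpoint(checkpoint: dict) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if name not in checkpoint]

    status = checkpoint.get("status")
    if status is not None and status not in VALID_STATUSES:
        errors.append(f"invalid status: {status}")

    progress = checkpoint.get("progress")
    if isinstance(progress, dict):
        percent = progress.get("percent")
        if percent is not None and not isinstance(percent, int):
            errors.append("progress.percent must be an integer")
    elif progress is not None:
        errors.append("progress must be an object")
    return errors
```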

---

## Appendix B: Claude Code Feature Requests

For full daemon mode capability, The Grid would benefit from these Claude Code enhancements:

1. **Native daemon mode**: `claude daemon` subcommand
2. **IPC channel**: Query running sessions without terminal
3. **Notification API**: Hook into system notifications
4. **Session serialization**: Export/import session state
5. **Headless operation**: Run without TTY requirement
6. **Multi-session coordination**: Built-in session chaining

---

*Document Version: 1.0*
*Last Updated: 2026-01-23*
*Author: Grid Program 1 (Daemon Architecture Specialist)*

End of Line.