groove-dev 0.27.27 → 0.27.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. package/.groove-staging/state.json +3 -0
  2. package/.groove-staging/timeline.json +13 -0
  3. package/DECENTRALIZED_NET_WP_V1.md +871 -0
  4. package/README.md +28 -0
  5. package/decentralized-net/ACTION_PLAN.md +422 -0
  6. package/node_modules/@groove-dev/cli/package.json +1 -1
  7. package/node_modules/@groove-dev/daemon/package.json +1 -1
  8. package/node_modules/@groove-dev/daemon/src/api.js +99 -0
  9. package/node_modules/@groove-dev/daemon/src/process.js +12 -0
  10. package/node_modules/@groove-dev/daemon/src/providers/claude-code.js +26 -1
  11. package/node_modules/@groove-dev/gui/dist/assets/{index-DieCV-v1.js → index-Ch1N9G4Z.js} +1728 -1728
  12. package/node_modules/@groove-dev/gui/dist/index.html +1 -1
  13. package/node_modules/@groove-dev/gui/package.json +1 -1
  14. package/node_modules/@groove-dev/gui/src/components/agents/agent-config.jsx +147 -21
  15. package/node_modules/@groove-dev/gui/src/components/agents/spawn-wizard.jsx +206 -44
  16. package/node_modules/@groove-dev/gui/src/components/marketplace/integration-wizard.jsx +11 -24
  17. package/node_modules/@groove-dev/gui/src/components/marketplace/marketplace-card.jsx +1 -36
  18. package/node_modules/@groove-dev/gui/src/lib/integration-logos.js +39 -0
  19. package/package.json +1 -1
  20. package/packages/cli/package.json +1 -1
  21. package/packages/daemon/package.json +1 -1
  22. package/packages/daemon/src/api.js +99 -0
  23. package/packages/daemon/src/process.js +12 -0
  24. package/packages/daemon/src/providers/claude-code.js +26 -1
  25. package/packages/gui/dist/assets/{index-DieCV-v1.js → index-Ch1N9G4Z.js} +1728 -1728
  26. package/packages/gui/dist/index.html +1 -1
  27. package/packages/gui/package.json +1 -1
  28. package/packages/gui/src/components/agents/agent-config.jsx +147 -21
  29. package/packages/gui/src/components/agents/spawn-wizard.jsx +206 -44
  30. package/packages/gui/src/components/marketplace/integration-wizard.jsx +11 -24
  31. package/packages/gui/src/components/marketplace/marketplace-card.jsx +1 -36
  32. package/packages/gui/src/lib/integration-logos.js +39 -0
@@ -0,0 +1,871 @@
The Groove Protocol
Decentralized Intelligence Network
v1.0 Whitepaper — 2026
GrooveDev.ai


================================================================================
EXECUTIVE SUMMARY
================================================================================

The Groove Protocol is an open-source, peer-to-peer infrastructure designed to
decentralize intelligence. It combines a 7-layer agentic orchestration harness
with a decentralized compute marketplace to enable "Savant" models — highly
specialized, efficient LLMs trained on real agent execution traces.

Users contribute compute and data to earn $GROOVE tokens. Developers and
enterprises access frontier-level intelligence at a fraction of centralized
cloud costs. The model improves in direct proportion to community usage,
creating a Circular Intelligence Economy.

The core technical innovation is a three-layer inference stack — pipeline
parallelism, speculative decoding with local draft models, and geographically
aware routing — that solves the latency problem that killed earlier
decentralized compute platforms. Consumer devices are active participants in
inference, not dumb terminals, enabling production-grade response times across
a globally distributed GPU network.


================================================================================
1. THE PROBLEM
================================================================================

AI compute is centralized, expensive, and extractive.

Today, running frontier-level models requires either:
(a) Paying per-token API costs to OpenAI, Anthropic, or Google — costs that
    scale unpredictably and create vendor lock-in.
(b) Purchasing enterprise GPU infrastructure — $200K+ for a single high-end
    rig, with ongoing power, cooling, and maintenance costs.

Meanwhile, millions of consumer GPUs sit idle. Gaming rigs, workstations,
and M-series Macs collectively represent more compute capacity than any
single cloud provider. This capacity is stranded — there is no efficient
marketplace to connect people who need compute with people who have it.

Previous attempts at decentralized inference (Petals, early Together.ai)
demonstrated the concept but failed at production UX. The root cause: naive
layer-by-layer network hopping introduces latency that entirely swallows the
speed advantage of GPU compute. A 200-token response routed naively through
3 nodes accumulates 10+ seconds of pure network delay before any compute
time is counted.

Groove solves this with an inference architecture that converts network delay
into useful local computation, achieving production-grade throughput on a
fully decentralized network.


================================================================================
2. ARCHITECTURE OVERVIEW
================================================================================

The Groove Protocol consists of five interconnected layers:

Layer 1 — Inference Pipeline
    Model sharding, speculative decoding, and pipeline parallelism across
    distributed GPU nodes. This is the engine that makes decentralized
    inference fast enough for production use.

Layer 2 — Relay Network
    Lightweight coordination nodes that maintain routing tables, assemble
    optimal inference pipelines, handle failover, and manage session state.
    No GPU required — any machine can run a relay node.

Layer 3 — Proof & Settlement
    Cryptographic proof-of-compute generation, batch settlement on a
    high-throughput L2 (Base), and escrow contracts for consumer-provider
    payment flows.

Layer 4 — Data & Training
    The Savant Loop — execution traces from the orchestration platform become
    training data via federated learning. The network's models improve as a
    direct function of usage.

Layer 5 — Economic
    The $GROOVE token economy — compute marketplace, data staking, enterprise
    sandboxes, and protocol-level fee capture that funds open-source
    development and training infrastructure.


================================================================================
3. THE P2P COMPUTE LAYER
================================================================================

3.1 Weight Slicing & Model Sharding

To serve large models (32B+ parameters) on consumer hardware with 8-16GB
VRAM, Groove employs layer-wise sharding:

A 32B model with 64 transformer layers is sliced into blocks:
    Node B (RTX 4090, 24GB): Layers 1-21
    Node C (RTX 3090, 24GB): Layers 22-42
    Node D (Mac M2 Ultra):   Layers 43-64

Each node loads only its assigned layer range into VRAM, plus the KV cache
for active sessions. A 32B model quantized to NF4 requires ~18-22GB total;
split across 3 nodes, each holds 6-8GB of weights — well within consumer
GPU capacity.

Nodes discover each other and advertise their hosted layer ranges via a
Distributed Hash Table (DHT) built on libp2p. The DHT tracks:
    - Which model shards each node hosts
    - Geographic region and measured RTT to neighboring nodes
    - Current load (active sessions, queue depth)
    - Uptime reliability score (rolling 24-hour window)
    - GPU specifications and available VRAM

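The assignment above can be sketched as a small planner that hands out contiguous layer ranges in proportion to each node's free VRAM. This is an illustrative sketch, not protocol code: the function name `plan_shards`, the node list, and the proportional-to-VRAM heuristic are assumptions for demonstration.

```python
def plan_shards(total_layers, nodes):
    """Split a model's layers into contiguous ranges sized roughly
    proportionally to each node's free VRAM (in GB)."""
    total_vram = sum(vram for _, vram in nodes)
    plan, start = [], 1
    for i, (name, vram) in enumerate(nodes):
        if i == len(nodes) - 1:
            count = total_layers - start + 1   # last node takes the remainder
        else:
            count = round(total_layers * vram / total_vram)
        plan.append((name, start, start + count - 1))
        start += count
    return plan

nodes = [("B (RTX 4090)", 24), ("C (RTX 3090)", 24), ("D (M2 Ultra)", 24)]
print(plan_shards(64, nodes))
# With three equal 24GB nodes this reproduces the 21/21/22 split above.
```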
3.2 Quantization

Groove uses advanced quantization to maximize intelligence-per-watt:

4-bit / 5-bit NF4 Quantization:
    Reduces a 32B model's VRAM footprint to ~18-22GB, enabling performant
    inference on dual-GPU consumer rigs or high-end Apple Silicon.

GGUF Format Support:
    Nodes run shards via llama.cpp or vLLM, supporting the full range of
    community-quantized models. Operators choose their quantization level
    based on their hardware — the relay node factors quant quality into
    pipeline assembly.

3.3 KV Cache Distribution

Each node in the pipeline maintains the KV cache for its assigned layers
for the duration of a session. This enables:

Massive Context Windows:
    A single consumer GPU might support 8K context. Three nodes collectively
    support 24K+, with each node managing memory for its layer subset. The
    context window scales with the number of nodes in the pipeline.

Distributed Conversation Memory:
    Long conversations that would exhaust a single machine's memory are
    naturally distributed across the pipeline. Each node only holds the KV
    cache for its layers — memory pressure is divided, not multiplied.


================================================================================
4. THE THREE-LAYER INFERENCE STACK
================================================================================

This is the core technical innovation that makes decentralized inference
viable. Three techniques stack to convert a network-latency-dominated
system into a compute-dominated one.

4.1 Pipeline Parallelism

Naive approach: Generate token 1 through all 64 layers, then start token 2.
Each token pays the full cost of every network hop.

Pipeline approach: Overlap computation across nodes for different tokens.

    Time -->
    Node B: [tok1 L1-21]   [tok2 L1-21]   [tok3 L1-21]   [tok4 L1-21] ...
    Node C:                [tok1 L22-42]  [tok2 L22-42]  [tok3 L22-42] ...
    Node D:                               [tok1 L43-64]  [tok2 L43-64] ...

After the pipeline fills (3 tokens deep), steady-state throughput is bounded
by the slowest single stage rather than by the sum of all hop latencies. This
alone improves throughput from ~1 tok/s to ~5-8 tok/s.

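The latency arithmetic can be made concrete with a toy model. The figures used here (3 stages, 60ms of compute per stage, 100ms hop RTT) are assumptions for illustration, not measurements from the network.

```python
def naive_tok_s(stages, stage_ms, hop_ms):
    """Sequential decoding: every token pays all stages plus all hops."""
    return 1000 / (stages * (stage_ms + hop_ms))

def pipelined_tok_s(stage_ms, hop_ms):
    """Filled pipeline: a token completes every max(stage, hop) interval,
    set by the slowest single element rather than the sum."""
    return 1000 / max(stage_ms, hop_ms)

print(round(naive_tok_s(3, 60, 100), 1))    # ~2.1 tok/s
print(round(pipelined_tok_s(60, 100), 1))   # 10.0 tok/s
```

The gap widens as hop latency grows, which is exactly the regime that defeated naive layer-by-layer hopping.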
4.2 Speculative Decoding with Local Draft Models

This is the architectural decision that transforms the phone from a dumb
terminal into an active inference participant.

The consumer device (phone, laptop, tablet) runs a small 1B-3B quantized
"draft" model locally. This model speculatively generates a window of
candidate tokens (typically 8). The full distributed pipeline then verifies
all candidates in a single forward pass instead of generating them one by
one.

The verification math:
    - Draft model predicts 8 tokens
    - Full 32B model accepts the first N that match its own distribution
    - Typical acceptance rate: 60-80% (higher for code, lower for creative)
    - Result: 5-6 tokens from 1 network round-trip instead of 1 token

This converts 70%+ of network RTT overhead into useful local computation.
The phone isn't waiting — it's actively drafting the next window while the
pipeline verifies the current one.

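The verification math reduces to a one-line estimate. This is a deliberately simplified model (the accepted fraction of the window, plus the one token the full model emits at the divergence point); in practice acceptance is a random variable per window.

```python
def tokens_per_round_trip(window, accept_rate):
    """Accepted prefix of the draft window plus the full model's own
    corrected token at the first mismatch (simplified estimate)."""
    return int(window * accept_rate) + 1

print(tokens_per_round_trip(8, 0.6))  # 5
print(tokens_per_round_trip(8, 0.8))  # 7
```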
Adaptive Window Sizing:
The consumer device dynamically adjusts the speculative window based on
acceptance rate:
    High acceptance (>80%):      Expand window to 12 tokens
    Normal acceptance (50-80%):  Standard window of 8 tokens
    Low acceptance (<50%):       Shrink window to 4 tokens

This ensures the system optimizes for throughput when the draft model is
accurate and minimizes wasted compute when it's struggling.

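The thresholds above map directly onto a small controller; the function below is a literal transcription of that table.

```python
def window_size(accept_rate):
    """Map the observed acceptance rate to the next speculative window."""
    if accept_rate > 0.80:
        return 12   # high acceptance: expand
    if accept_rate >= 0.50:
        return 8    # normal acceptance: standard window
    return 4        # low acceptance: shrink
```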
Domain-Tuned Draft Models:
The Savant training loop (Section 7) produces domain-specific draft models
alongside the full models. A code-focused Savant draft model achieves
80%+ acceptance on programming tasks. The training flywheel directly
improves inference speed, not just quality.

4.3 Geographic Routing via DHT

The Relay Node doesn't route to ANY node hosting the required layers — it
routes to the OPTIMAL node based on:
    - Network RTT to adjacent pipeline nodes (<10ms preferred)
    - Current queue depth (prefer idle nodes)
    - GPU capability (match quant requirements to hardware)
    - Uptime reliability (>95% preferred for primary, >90% for standby)

Ideal pipeline assembly: all nodes in the same metropolitan area, where hop
latency drops to 2-5ms instead of 50ms intercontinental. The routing layer
explicitly optimizes for geographic locality.

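A sketch of how a relay might rank candidates against these criteria. The weights, the 200-point uptime penalty, and the candidate records are invented for illustration; the protocol's actual scoring is not specified here.

```python
def route_score(node):
    """Lower is better. Weights are illustrative assumptions."""
    score = node["rtt_ms"]              # prefer <10ms to adjacent nodes
    score += node["queue_depth"] * 20   # prefer idle nodes
    if node["uptime"] < 0.95:
        score += 200                    # below the primary-node threshold
    return score

candidates = [
    {"id": "C",  "rtt_ms": 4,  "queue_depth": 0, "uptime": 0.99},
    {"id": "C2", "rtt_ms": 48, "queue_depth": 1, "uptime": 0.97},
]
best = min(candidates, key=route_score)
print(best["id"])  # the nearby idle node wins
```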

================================================================================
5. THE RELAY NETWORK
================================================================================

Relay Nodes are the coordination layer of the Groove Protocol. They are
lightweight — no GPU required, minimal CPU and bandwidth — and earn a small
coordination fee for every session they route.

5.1 Responsibilities

Pipeline Assembly:
    When a consumer initiates a session, the relay node queries the DHT,
    evaluates candidates, and assembles an optimal pipeline of compute nodes
    plus standby backups for each position.

Session Management:
    The relay maintains session state — which nodes are in the pipeline,
    heartbeat monitoring, KV cache checkpoint status, billing accumulation.

Failover Orchestration:
    When a compute node drops (heartbeat timeout at 100ms), the relay
    instantly promotes the standby node and updates routing for all other
    pipeline members. Target failover time: <150ms.

Proof Batching:
    The relay collects proof-of-compute attestations from each node and
    submits them in batches to the L2 settlement contract, reducing gas
    costs.

5.2 Why a Separate Coordination Layer

Previous decentralized compute systems (Petals, Bittensor) relied on compute
nodes to self-organize. This creates a scheduling bottleneck under
heterogeneous load — nodes optimizing for their own utilization make
globally suboptimal routing decisions.

The relay node acts as a neutral scheduler. It has no stake in which compute
node handles which request — only in assembling the fastest possible
pipeline. Relay operators earn fees proportional to session quality (measured
by achieved tok/s vs. estimated tok/s), aligning incentives with good
routing decisions.

5.3 Network Transport: QUIC (HTTP/3)

All node-to-node communication uses QUIC for:

0-RTT Connection Resumption:
    Critical for failover. When Node C drops and C-Prime is promoted, the
    existing QUIC session resumes without a full handshake. Pipeline
    continuity is maintained in milliseconds.

Multiplexed Streams:
    Multiple inference requests can share a single connection. Pipeline
    nodes can process requests for different sessions concurrently without
    head-of-line blocking.

Built-in Congestion Control:
    QUIC's congestion control is tuned for low-latency interactive traffic,
    unlike TCP's throughput-optimized defaults. Hidden state transfers
    (~13KB per token for a 32B model) complete faster.

NAT Traversal:
    Consumer hardware sits behind NAT routers. QUIC over UDP is more
    NAT-friendly than TCP, and combined with libp2p's STUN/TURN hole
    punching, enables direct node-to-node connections without relay
    mediation for data transfer.


================================================================================
6. THE INFERENCE REQUEST LIFECYCLE
================================================================================

A complete step-by-step protocol for what happens when User A on a phone
sends a prompt to a Groove Savant model.

PHASE 0 — Session Establishment (once per conversation)

Phone --> Relay Node: SESSION_INIT
    model: "groove-savant-32b"
    quant: "nf4"
    max_context: 32768
    region_preference: "us-west"
    budget_cap: 150 $GROOVE

Relay Node:
    DHT lookup for nodes hosting savant-32b shards
    Filter: region proximity, uptime > 0.95, queue depth < 3
    Assemble primary pipeline: [B: L1-21, C: L22-42, D: L43-64]
    Assign standby pipeline:   [B': L1-21, C': L22-42, D': L43-64]

Relay Node --> Nodes B, C, D: PIPELINE_RESERVE
    session_id: "abc123"
    ttl: 300s (renewed on activity)
    consumer_stake: 50 $GROOVE (locked in L2 escrow)

Relay Node --> Phone: SESSION_READY
    pipeline: [B, C, D]
    estimated_tok_s: 18
    price_per_1k_tokens: 0.3 $GROOVE

The relay pre-reserves compute AND standbys before the first token is
generated. The consumer's stake goes into L2 escrow — preventing griefing
and guaranteeing provider payment even if the consumer disconnects.

PHASE 1 — Prompt Encoding (parallel)

Phone: Tokenize prompt locally
Phone: Run draft model on prompt to pre-generate 8 speculative tokens
Phone --> Node B: PROMPT_ENCODE
    tokens: [prompt_tokens]
    speculative: [8 draft tokens]
    session_id: "abc123"

The phone sends both the prompt AND the first speculative window in a
single message. While the pipeline processes the prefill, speculative
tokens are already queued. Zero wasted time.

PHASE 2 — Prefill Pass (sequential, once per turn)

Node B: Process prompt through L1-21, generate KV cache
Node B --> Node C: ACTIVATIONS { hidden_states, kv_metadata }
Node B --> B' (standby): KV_DELTA { incremental cache update }

Node C: Process through L22-42, generate KV cache
Node C --> Node D: ACTIVATIONS { hidden_states, kv_metadata }
Node C --> C' (standby): KV_DELTA { incremental cache update }

Node D: Process through L43-64, produce logits
Node D --> D' (standby): KV_DELTA { incremental cache update }

Prefill is the slowest step — the entire prompt flows through all layers.
It happens once per conversation turn. For a 500-token prompt across
3 nodes: 200-400ms total.

Each node simultaneously streams KV deltas to its standby. By the time
prefill finishes, all three standbys are warm and ready for instant
failover.

PHASE 3 — Speculative Verification

Node D has logits for the prompt AND 8 speculative tokens.
Node D verifies against actual model distribution:
    Tokens 1-5: MATCH (draft model correct)
    Token 6:    MISMATCH (draft: "async", model: "Promise")
    Tokens 7-8: DISCARDED (after first mismatch)

Node D --> Phone: VERIFY_RESULT
    accepted: ["const", " ", "result", " ", "="]
    correction: "Promise"
    total_generated: 6

One network round-trip, 6 tokens. The phone displays all 6 tokens
immediately while the draft model begins the next speculative window
starting from the corrected token.

PHASE 4 — Steady-State Generation (pipelined + speculative)

LOOP:
    Phone: Generate next N speculative tokens via draft model (~15ms)
    Phone --> Node B: SPEC_WINDOW { tokens: [N candidates], turn: T }

    Pipeline processes while phone drafts next window:
        B: L1-21 --> C: L22-42 --> D: L43-64

    D --> Phone: VERIFY_RESULT { accepted: 5-7, correction: 1 }

    Phone: Display accepted tokens to user immediately
    Phone: Adjust window size based on acceptance rate

UNTIL: EOS token or context limit reached

PHASE 5 — Proof Generation & Settlement

Each node, per speculative window processed:
    PROOF_OF_COMPUTE
        session_id, node_id, layers_processed, tokens_verified,
        compute_time_ms, gpu_utilization, nonce,
        signature: sign(private_key, hash(fields))

Relay Node batches proofs per session window:
    --> L2 Contract: SETTLE_BATCH
        session_id: "abc123"
        proofs: [B_proof, C_proof, D_proof]
        total_tokens: 6
        consumer: User_A_address

L2 Contract:
    Verify signatures
    Release $GROOVE from escrow proportional to layers hosted:
        B: 33% (21/64 layers)
        C: 33% (21/64 layers)
        D: 34% (22/64 layers)
    Deduct from consumer's staked balance

Settlement is batched per verification window, not per token. A full
response generates 30-50 settlement events, keeping L2 gas costs
manageable.

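The pro-rata escrow release can be checked numerically. `split_payment` is a hypothetical helper; the 0.3 $GROOVE figure reuses the per-1K-token price quoted in Phase 0 as an example amount.

```python
def split_payment(total, layer_ranges, total_layers=64):
    """Split a settlement among pipeline nodes by hosted layer count."""
    return {node: total * (hi - lo + 1) / total_layers
            for node, (lo, hi) in layer_ranges.items()}

shares = split_payment(0.3, {"B": (1, 21), "C": (22, 42), "D": (43, 64)})
# B and C each host 21/64 of the layers (~33%); D hosts 22/64 (~34%).
print(shares)
```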
PHASE 6 — Failover

Relay Node: Detects Node C heartbeat timeout (100ms threshold)
Relay Node --> C' (standby): PROMOTE
    session_id: "abc123"
    role: primary
    kv_cache: already warm (continuous delta streaming)

Relay Node: Update routing: B --> C' --> D
Relay Node --> B, D: PIPELINE_UPDATE { new_hop: C' }

Total failover time: ~150ms
User perceives: brief stutter, at most 1 extra second on current response
Conversation context: fully preserved, no re-prefill required


================================================================================
7. THE SAVANT LOOP — DATA STRATEGY
================================================================================

Data is the capital of the Groove ecosystem. Instead of generic web-scraping,
Groove captures Vertical Intelligence — the actual reasoning traces of agents
performing real work.

7.1 Execution Traces as Training Data

The 7-layer orchestration platform captures high-fidelity "Agentic Traces":
    - Reasoning steps and chain-of-thought
    - Tool calls and their results
    - Error encounters and corrections
    - Task decomposition decisions
    - Code generation and iteration patterns

Users who opt in to share anonymized traces receive a Data Staking
Multiplier — earning more $GROOVE per compute hour contributed.

7.2 Federated Fine-Tuning

Global Savant models are updated via federated learning:

1. Individual nodes calculate gradient updates based on local usage data
2. Only gradient updates (not raw data) are transmitted to Aggregator nodes
3. Aggregator nodes merge updates into new model versions using secure
   aggregation — no single aggregator ever sees raw private data
4. Updated model weights are distributed back to the network

This creates a privacy-preserving training pipeline where the model improves
continuously without centralizing user data.

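The merge in step 3 is, at its core, a weighted average of updates. A minimal FedAvg-style sketch under that assumption, with the secure-aggregation machinery abstracted away:

```python
def federated_average(updates, weights=None):
    """Merge per-node gradient updates (equal-length float lists).
    In the protocol this runs under secure aggregation, so no single
    aggregator sees an individual node's update in the clear."""
    n = len(updates)
    weights = weights or [1.0 / n] * n
    return [sum(w * u[i] for w, u in zip(weights, updates))
            for i in range(len(updates[0]))]

print(federated_average([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Non-uniform `weights` would let aggregators favor nodes contributing more (or higher-quality) traces, consistent with the Data Staking Multiplier.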
7.3 The Draft Model Flywheel

The Savant training loop produces two outputs:
1. The full Savant model (32B+) — served by the distributed pipeline
2. A distilled Savant Draft model (1B-3B) — runs locally on consumer devices

The draft model is specifically trained to predict the full model's outputs.
As the full model improves from execution traces, the draft model improves
in lockstep. This means:
    - Better Savant model = better draft predictions
    - Better draft predictions = higher speculative acceptance rate
    - Higher acceptance rate = faster inference throughput
    - Faster inference = more usage = more training data

The training flywheel directly improves both quality AND speed.


================================================================================
8. KV CACHE MANAGEMENT & FAILOVER
================================================================================

The KV cache is the conversation's memory. Losing it means re-processing
the entire conversation history — a UX-destroying event. Groove treats KV
cache persistence as a first-class protocol concern.

8.1 Asynchronous Checkpoint Streaming

Each primary compute node continuously streams KV cache deltas to its
assigned standby node over QUIC:

    - Deltas are incremental (per-token additions), not full snapshots
    - Streaming happens in parallel with inference computation
    - Bandwidth overhead is marginal: ~13KB per token per layer boundary
    - Standby nodes maintain a hot mirror of the primary's cache state

This means promotion of a standby is a routing table swap, not a
recomputation. The standby already has the full KV cache — it just starts
processing the next activation it receives.

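At the ~13KB-per-token figure above, the delta stream stays cheap relative to inference traffic. A back-of-envelope check, reusing the 18 tok/s estimate from Phase 0 as an assumed decode rate:

```python
def standby_stream_kb_s(tok_s, delta_kb_per_token=13):
    """KB/s a primary streams to its standby at a given decode rate."""
    return tok_s * delta_kb_per_token

print(standby_stream_kb_s(18))  # 234 KB/s at 18 tok/s
```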
8.2 Session Recovery Tiers

Tier 1 — Hot Standby (default):
    Continuous delta streaming. Failover in <150ms. No tokens lost.

Tier 2 — Warm Recovery:
    Periodic checkpoint snapshots (every 30s). Failover requires re-prefill
    of tokens generated since last checkpoint. Recovery in 1-3s.

Tier 3 — Cold Recovery:
    No standby assigned (budget-constrained sessions). Full re-prefill from
    conversation history on a newly assigned node. Recovery in 5-15s
    depending on conversation length.

Consumers choose their reliability tier at session initialization. Higher
tiers cost more $GROOVE (standby nodes must be compensated for reserved
capacity) but provide seamless failover.


================================================================================
9. PROOF OF COMPUTE & SETTLEMENT
================================================================================

Unlike traditional mining, Groove nodes perform Useful Work. Every inference
operation generates cryptographic proof that is verified on-chain to trigger
$GROOVE token releases.

9.1 Proof Structure

Each compute node generates a proof per speculative verification window:

PROOF_OF_COMPUTE:
    session_id       — unique session identifier
    node_id          — node's public key / identity
    layers_processed — layer range executed (e.g., [1, 21])
    tokens_verified  — number of tokens in the verification batch
    compute_time_ms  — wall clock time for computation
    gpu_utilization  — average GPU utilization during compute
    input_hash       — hash of received activations
    output_hash      — hash of produced activations
    nonce            — random value for uniqueness
    signature        — Ed25519 signature over hash of all fields

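A sketch of how the signed fields might be canonicalized and hashed. The field names follow the structure above, but canonical-JSON-plus-SHA-256 is an assumption of this sketch, and the Ed25519 signing step is omitted to keep it dependency-free (in the protocol, the node would sign this digest with its private key).

```python
import hashlib
import json

def proof_digest(proof: dict) -> str:
    """Hash the proof fields deterministically: sorted keys, no
    whitespace, then SHA-256 over the UTF-8 bytes."""
    canonical = json.dumps(proof, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

proof = {
    "session_id": "abc123",
    "node_id": "node-b-pubkey",
    "layers_processed": [1, 21],
    "tokens_verified": 6,
    "compute_time_ms": 42,
    "gpu_utilization": 0.83,
    "nonce": "4f3a9c01",
}
print(proof_digest(proof))
```

Sorting the keys matters: two honest implementations must produce the same digest for the same fields, or signature verification on the L2 contract would fail spuriously.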
9.2 Verification Mechanisms

Three layers of verification prevent dishonest claims:

Relay Timing Verification:
    The relay node independently timestamps when activations are sent to
    and received from each compute node. Claimed compute times that
    significantly deviate from relay-observed wall clock time are flagged.

Challenge-Response Spot Checks:
    The L2 contract randomly selects a small percentage of proofs for
    re-execution. A trusted verifier node re-processes the same inputs
    and compares output hashes. Failed challenges result in slashing of
    the node's staked $GROOVE.

Reputation-Weighted Trust:
    Nodes build reputation scores over time based on:
        - Proof accuracy on spot checks
        - Uptime consistency
        - Latency SLA compliance
        - Session completion rate
    High-reputation nodes face fewer spot checks (reducing overhead).
    New nodes face higher scrutiny until they build trust.

9.3 Batch Settlement

Proofs are batched to minimize L2 gas costs:
    - Relay nodes collect proofs per session per verification window
    - Batches are submitted to the L2 contract at configurable intervals
    - The contract verifies signatures, checks relay timing data, and
      releases escrowed $GROOVE proportional to work performed
    - Payment is split among pipeline nodes based on layer count:
      a node hosting 21 of 64 layers receives 33% of the inference payment

9.4 Anti-Gaming Measures

Cherry-Pick Prevention:
    Nodes cannot selectively accept easy/short requests and drop hard ones.
    The protocol tracks session completion rate — dropping below 90%
    triggers reputation penalties and reduced routing priority.

Latency SLA Compliance:
    At session establishment, the relay estimates tok/s. Nodes that
    consistently underperform their estimated throughput by >30% face
    reduced routing priority. This prevents nodes from over-advertising
    their capabilities.

Stake Slashing:
    Compute nodes stake $GROOVE to participate. Verified dishonest behavior
    (failed spot checks, fabricated proofs) results in partial stake
    slashing. The slashed amount is burned, creating deflationary pressure.

================================================================================
10. THE ECONOMIC FLYWHEEL
================================================================================

User creates Data --> Network trains Model --> Model improves Agents
--> Agents attract Users --> Users create Data

The $GROOVE token is the oxygen of the network:

10.1 Compute Marketplace

Consumers:
    Purchase inference by staking $GROOVE into session escrow. Pricing is
    per-1K-tokens, dynamically adjusted based on network supply/demand.
    Consumers set budget caps — sessions terminate gracefully when the cap
    is approached.

Providers (GPU Operators):
    Earn $GROOVE for every verified inference computation. Revenue scales
    with GPU capability, uptime, and geographic desirability. A well-
    positioned RTX 4090 in a high-demand metro area earns more than an
    identical card in a low-demand region.

Relay Operators:
    Earn a coordination fee (small percentage of session value) for routing
    and session management. Low hardware requirements — any stable internet
    connection qualifies.

10.2 Data Staking

Users who opt in to share anonymized execution traces receive a Data Staking
Multiplier on their token earnings. The multiplier scales with:
    - Volume of traces contributed
    - Quality of traces (measured by downstream training signal)
    - Rarity of the task domain (underrepresented domains earn more)

This creates an incentive to use Groove for diverse, complex tasks —
exactly the data that makes Savant models better.

10.3 Enterprise Sandboxes

Large firms pay $GROOVE for private, audited model instances:
    - Dedicated compute pipelines (no shared nodes)
    - Trusted Execution Environment (TEE) guarantees
    - Compliance-auditable inference logs
    - Custom Savant models fine-tuned on enterprise data

Enterprise demand creates sustained buy pressure on $GROOVE, providing
price stability for the broader network.

10.4 Protocol Fee & Treasury

A small percentage of every transaction is captured by the protocol:
    - 50% funds open-source development
    - 30% funds Savant training infrastructure (Foundation Fleet GPUs)
    - 20% funds a stability reserve for network incentive smoothing

10.5 Cost Advantage

The decentralized model fundamentally undercuts centralized pricing:
    - No data center leases, power contracts, or cooling infrastructure
    - GPU operators use existing hardware during idle time — marginal cost
      approaches electricity only
    - Competition among providers drives prices to near-marginal-cost
    - Enterprise users escape unpredictable API billing and recurring
      cloud infrastructure budget alerts

672
+ ================================================================================
673
+ 11. PRIVACY & ENTERPRISE READINESS
674
+ ================================================================================
675
+
11.1 Intermediate Activation Privacy

When Node B sends hidden states to Node C, those activations contain
information about the original input. Research has demonstrated partial
input reconstruction from intermediate activations. Groove addresses this
at three levels:

Standard Tier (Consumer):
Activations are transmitted in the clear between pipeline nodes.
Privacy is protected by the ephemerality of the data (activations
exist only in transit and are never stored) and the contractual
obligations of node operators.

Enhanced Tier:
Differential privacy noise is injected into the activations at each
pipeline boundary. This mathematically bounds the information leakage
while introducing minimal quality degradation (the noise is calibrated
to a target privacy budget epsilon).

Enterprise Tier:
Compute nodes run inside Trusted Execution Environments (TEEs) —
Intel SGX, AMD SEV, or ARM CCA. The relay node routes enterprise
sessions only to TEE-attested nodes, and activations are processed
in encrypted memory enclaves. Combined with dedicated pipelines, this
provides end-to-end verifiable privacy.
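
The Enhanced Tier mechanism can be sketched in a few lines: clip the
activation vector to bound its sensitivity, then add zero-mean Gaussian
noise. The clipping bound and noise scale below are illustrative; the
calibration to a specific epsilon is not specified in this paper:

```python
import math
import random

def privatize_activations(activations, clip_norm=10.0, noise_std=0.1):
    """Noise an activation vector at a pipeline boundary (illustrative)."""
    # Clip to a fixed L2 norm so the added noise corresponds to a bounded
    # sensitivity, the precondition for a differential privacy guarantee.
    norm = math.sqrt(sum(a * a for a in activations))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [a * scale for a in activations]
    # Zero-mean Gaussian noise on every component.
    return [a + random.gauss(0.0, noise_std) for a in clipped]
```

Node B would apply this to hidden states just before handing them to Node C;
a larger noise_std strengthens the privacy bound at the cost of output
quality.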

11.2 Data Privacy in Training

The federated learning pipeline ensures raw user data never leaves the
local node:
- Only gradient updates are transmitted
- Secure aggregation protocols prevent the aggregator from
  reconstructing individual contributions
- Differential privacy is applied at the gradient level before
  transmission
- Users can audit exactly what data they contribute via the opt-in
  dashboard
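
Secure aggregation is the subtle bullet. A two-client toy version shows the
idea: paired clients add equal-and-opposite random masks, so the aggregator
sees only masked updates, yet the masks cancel exactly in the sum. Real
protocols also handle many clients and dropouts; this is an illustrative
sketch, not the Groove wire protocol:

```python
import random

def mask_pair(update_a, update_b):
    """Mask two gradient updates with a shared pairwise secret."""
    masks = [random.uniform(-1e3, 1e3) for _ in update_a]
    masked_a = [u + m for u, m in zip(update_a, masks)]
    masked_b = [u - m for u, m in zip(update_b, masks)]
    return masked_a, masked_b

grad_a = [0.10, -0.40, 0.25]   # never leaves node A unmasked
grad_b = [-0.20, 0.30, 0.05]
masked_a, masked_b = mask_pair(grad_a, grad_b)

# The aggregator sums the masked updates; individual gradients stay hidden,
# but the pairwise masks cancel in the total.
aggregate = [a + b for a, b in zip(masked_a, masked_b)]
```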


================================================================================
12. NETWORK BOOTSTRAP — THE FOUNDATION FLEET
================================================================================

A fully decentralized network cannot deliver reliable SLAs on day one with
ten nodes. Groove addresses this cold-start problem with a progressive
decentralization strategy.

Phase 1 — Foundation Fleet (Launch):
Groove operates a core set of high-performance GPU nodes that guarantee
baseline model coverage and latency SLAs. These nodes are operated at
cost — they earn $GROOVE like any other node but are committed to 99.9%
uptime. The Foundation Fleet ensures every model shard has redundant
coverage from day one.

Phase 2 — Hybrid Network (Months 1-6):
Community nodes begin joining. The relay network preferentially routes
to community nodes when they meet latency and reliability thresholds, but
falls back to Foundation Fleet nodes when community coverage is thin.
Foundation Fleet nodes handle the long tail — rare model shards,
low-demand regions, and peak load overflow.

Phase 3 — Community Majority (Months 6-18):
As network density grows, community nodes handle the majority of
inference. Foundation Fleet nodes scale down to standby and overflow
duty only. Protocol governance begins transitioning to token-weighted
community voting.

Phase 4 — Full Decentralization (18+ Months):
The Foundation Fleet is fully retired or converted to community-operated
nodes. The protocol runs entirely on community infrastructure, with
economic incentives maintaining reliability.


================================================================================
13. PERFORMANCE EXPECTATIONS
================================================================================

Realistic throughput projections for a 32B Savant model, NF4 quantized,
running on a 3-node pipeline with speculative decoding:

Best Case (nodes in the same metro, good GPUs):
  Time to first token:   500ms - 1s
  Sustained throughput:  15-25 tok/s
  User experience:       comparable to a standard Claude or GPT response

Average Case (nodes across the same continent):
  Time to first token:   1-3s
  Sustained throughput:  5-10 tok/s
  User experience:       noticeable latency, but acceptable for most tasks

Worst Case (global distribution, unstable nodes):
  Time to first token:   3-8s
  Sustained throughput:  2-4 tok/s
  User experience:       suitable for background agent tasks, not
                         interactive chat

For Groove's primary use case — agentic coding tasks running in the
background — even the average case provides excellent UX. Agents don't
need 50 tok/s interactive speed. A builder agent processing a refactoring
task is perfectly served by 10 tok/s sustained throughput.
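
A quick sanity check of these figures: wall-clock time for a response is
roughly time-to-first-token plus tokens divided by sustained throughput. The
2,000-token response size below is an arbitrary example:

```python
def response_seconds(ttft_s: float, tokens: int, tok_per_s: float) -> float:
    """Approximate end-to-end latency: first-token delay + streaming time."""
    return ttft_s + tokens / tok_per_s

# A 2,000-token patch in the average case (2s TTFT, 8 tok/s) takes about
# four minutes: fine for a background agent, sluggish for interactive chat.
average = response_seconds(2.0, 2000, 8.0)    # 252.0 seconds
best    = response_seconds(0.75, 2000, 20.0)  # 100.75 seconds
```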


================================================================================
14. TECHNICAL IMPLEMENTATION NOTES
================================================================================

14.1 Node-to-Node NAT Traversal

Compute nodes on consumer hardware sit behind NAT routers. Direct P2P
connections for hidden state transfer require:
- STUN-assisted hole punching over libp2p's QUIC transport
- Fallback to relay-mediated forwarding when a direct connection fails
- NAT negotiation adds 100-200ms to initial pipeline setup (a one-time
  cost per session)
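
The connection flow can be sketched as a try-direct-then-relay ladder. The
callables here are placeholders, not a real libp2p API:

```python
def open_channel(peer, dial_direct, dial_via_relay, attempts=3):
    """Return ('direct', conn) when hole punching works, else ('relayed', conn).

    dial_direct:    attempts a STUN-assisted hole punch + QUIC dial;
                    returns a connection or None (placeholder callable)
    dial_via_relay: always-available relay-mediated fallback
    """
    for _ in range(attempts):
        conn = dial_direct(peer)
        if conn is not None:
            return ("direct", conn)
    # E.g. symmetric NAT on both ends: fall back to relay forwarding.
    return ("relayed", dial_via_relay(peer))
```

The session pays the hole-punching cost once at setup; after that, tokens of
hidden state flow over whichever channel was established.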

14.2 Clock Synchronization

Proof of Compute includes compute_time_ms. To prevent inflated claims:
- Relay nodes independently timestamp request dispatch and result receipt
- NTP synchronization is required for node participation
- Statistical outlier detection flags nodes with systematically inflated
  timing claims
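
One plausible shape for that outlier check: compare each node's claimed times
against the relay's independent measurements, then flag nodes whose average
bias is a strong upward z-score outlier. The threshold is illustrative:

```python
from statistics import mean, stdev

def flagged_nodes(claims, z_threshold=2.5):
    """claims maps node_id -> list of (claimed_ms - relay_measured_ms).

    Returns node ids whose average bias is an upward outlier across the
    population; candidates for slashing or de-prioritized routing.
    """
    bias = {node: mean(deltas) for node, deltas in claims.items()}
    values = list(bias.values())
    if len(values) < 2:
        return set()
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return set()
    return {node for node, b in bias.items() if (b - mu) / sd > z_threshold}
```

A production version would track bias over a rolling window and combine it
with the reputation system rather than acting on a single session.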

14.3 Model Distribution

New Savant model versions must propagate across the network:
- Model weights are distributed via BitTorrent-style chunked transfer
- Nodes download only the layer ranges they host
- Rolling upgrades: the relay network routes around nodes that are
  updating, ensuring zero downtime during model transitions
- Incentive: nodes running the latest model version receive a routing
  priority boost for 48 hours post-release
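
Layer-scoped download is the key bandwidth saver. Assuming a chunk manifest
that maps each content-addressed chunk to a half-open layer range (an
illustrative format, not an actual manifest schema), a node fetches only the
chunks overlapping the layers it hosts:

```python
def chunks_for_layers(manifest, first_layer, last_layer):
    """Select chunk ids overlapping the inclusive range [first_layer, last_layer].

    manifest: list of {"cid": str, "layers": (start, end)} with end exclusive.
    """
    needed = []
    for chunk in manifest:
        start, end = chunk["layers"]
        if start <= last_layer and end > first_layer:  # ranges overlap
            needed.append(chunk["cid"])
    return needed

manifest = [
    {"cid": "chunk-a", "layers": (0, 12)},
    {"cid": "chunk-b", "layers": (12, 24)},
    {"cid": "chunk-c", "layers": (24, 36)},
]
# A node hosting layers 10-20 needs only the first two chunks.
needed = chunks_for_layers(manifest, 10, 20)
```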


================================================================================
15. ROADMAP
================================================================================

T1 — Foundation (Q3 2026):
- Core inference pipeline with 3-node layer sharding
- Relay node v1 with DHT routing and session management
- Speculative decoding protocol with local draft model
- $GROOVE token contract deployment on Base L2
- Foundation Fleet launch with initial model coverage
- Proof of Compute v1 with relay timing verification

T2 — Network Growth (Q4 2026):
- Community node onboarding with staking and reputation system
- KV cache async checkpointing and hot standby failover
- Enterprise tier with TEE-attested compute nodes
- Savant v1 model trained on initial execution trace dataset
- Adaptive geographic routing with real-time RTT optimization
- Mobile client with integrated draft model

T3 — Intelligence Flywheel (Q1 2027):
- Federated fine-tuning pipeline live on community nodes
- Domain-tuned draft models (code, reasoning, creative)
- Challenge-response spot check system for proof verification
- Enterprise sandbox product launch
- Cross-region pipeline optimization
- Data staking multiplier system

T4 — Full Decentralization (Q2-Q3 2027):
- Foundation Fleet retirement / community transition
- Protocol governance via token-weighted voting
- Multi-model support (multiple Savant variants on one network)
- Advanced privacy (differential privacy on activations)
- Network scaling to 10K+ active compute nodes
- Developer SDK for third-party applications


================================================================================
CONCLUSION
================================================================================

The Groove Protocol transforms stranded consumer GPU capacity into a global
intelligence network. By stacking pipeline parallelism, speculative decoding,
and geographic routing, it solves the latency problem that made previous
decentralized compute platforms impractical for production use.

The economic model aligns incentives across all participants: GPU operators
earn for idle compute, users get frontier-quality inference at marginal cost,
and the Savant training loop ensures the entire network gets smarter with
every task executed.

This is not a theoretical architecture. Each component — model sharding,
speculative decoding, federated learning, QUIC transport, L2 settlement —
exists in production systems today. Groove's innovation is the integration:
a coherent protocol that makes these pieces work together as a decentralized
intelligence marketplace.

The future of AI is not a handful of companies renting you access to their
GPUs. It's a network where everyone contributes and everyone benefits.


================================================================================
GrooveDev.ai — Open Source Agentic Orchestration — v1.0 Whitepaper (2026)
================================================================================