groove-dev 0.27.28 → 0.27.30
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/node_modules/@groove-dev/cli/package.json +1 -1
- package/node_modules/@groove-dev/daemon/package.json +1 -1
- package/node_modules/@groove-dev/daemon/src/journalist.js +103 -45
- package/node_modules/@groove-dev/daemon/test/journalist.test.js +1 -1
- package/node_modules/@groove-dev/gui/dist/assets/{index-CjjmUhoW.css → index-BwNjgBny.css} +1 -1
- package/node_modules/@groove-dev/gui/dist/assets/{index-Ch1N9G4Z.js → index-PxWmJjcJ.js} +290 -290
- package/node_modules/@groove-dev/gui/dist/index.html +2 -2
- package/node_modules/@groove-dev/gui/package.json +1 -1
- package/node_modules/@groove-dev/gui/src/app.css +10 -0
- package/node_modules/@groove-dev/gui/src/components/agents/agent-config.jsx +2 -2
- package/node_modules/@groove-dev/gui/src/components/agents/spawn-wizard.jsx +4 -4
- package/node_modules/@groove-dev/gui/src/components/editor/terminal.jsx +21 -13
- package/node_modules/@groove-dev/gui/src/components/layout/terminal-panel.jsx +1 -1
- package/node_modules/@groove-dev/gui/src/views/settings.jsx +4 -4
- package/package.json +1 -1
- package/packages/cli/package.json +1 -1
- package/packages/daemon/package.json +1 -1
- package/packages/daemon/src/journalist.js +103 -45
- package/packages/gui/dist/assets/{index-CjjmUhoW.css → index-BwNjgBny.css} +1 -1
- package/packages/gui/dist/assets/{index-Ch1N9G4Z.js → index-PxWmJjcJ.js} +290 -290
- package/packages/gui/dist/index.html +2 -2
- package/packages/gui/package.json +1 -1
- package/packages/gui/src/app.css +10 -0
- package/packages/gui/src/components/agents/agent-config.jsx +2 -2
- package/packages/gui/src/components/agents/spawn-wizard.jsx +4 -4
- package/packages/gui/src/components/editor/terminal.jsx +21 -13
- package/packages/gui/src/components/layout/terminal-panel.jsx +1 -1
- package/packages/gui/src/views/settings.jsx +4 -4
- package/.groove-staging/state.json +0 -3
- package/.groove-staging/timeline.json +0 -13
- package/DECENTRALIZED_NET_WP_V1.md +0 -871
- package/decentralized-net/ACTION_PLAN.md +0 -422
@@ -1,871 +0,0 @@

The Groove Protocol
Decentralized Intelligence Network
v1.0 Whitepaper — 2026
GrooveDev.ai


================================================================================
EXECUTIVE SUMMARY
================================================================================

The Groove Protocol is an open-source, peer-to-peer infrastructure designed to
decentralize intelligence. It combines a 7-layer agentic orchestration harness
with a decentralized compute marketplace to enable "Savant" models — highly
specialized, efficient LLMs trained on real agent execution traces.

Users contribute compute and data to earn $GROOVE tokens. Developers and
enterprises access frontier-level intelligence at a fraction of centralized
cloud costs. The model improves in direct proportion to community usage,
creating a Circular Intelligence Economy.

The core technical innovation is a three-layer inference stack — pipeline
parallelism, speculative decoding with local draft models, and geographically
aware routing — that solves the latency problem that killed earlier
decentralized compute platforms. Consumer devices are active participants in
inference, not dumb terminals, enabling production-grade response times across
a globally distributed GPU network.

================================================================================
1. THE PROBLEM
================================================================================

AI compute is centralized, expensive, and extractive.

Today, running frontier-level models requires either:
  (a) Paying per-token API costs to OpenAI, Anthropic, or Google — costs that
      scale unpredictably and create vendor lock-in.
  (b) Purchasing enterprise GPU infrastructure — $200K+ for a single high-end
      rig, with ongoing power, cooling, and maintenance costs.

Meanwhile, millions of consumer GPUs sit idle. Gaming rigs, workstations,
and M-series Macs collectively represent more compute capacity than any
single cloud provider. This capacity is stranded — there is no efficient
marketplace to connect people who need compute with people who have it.

Previous attempts at decentralized inference (Petals, early Together.ai)
demonstrated the concept but failed at production UX. The root cause: naive
layer-by-layer network hopping introduces latency that entirely swallows the
speed advantage of GPU compute. A 200-token response routed naively through
3 nodes accumulates 10+ seconds of pure network delay before any compute
time is counted.

Groove solves this with an inference architecture that converts network delay
into useful local computation, achieving production-grade throughput on a
fully decentralized network.

================================================================================
2. ARCHITECTURE OVERVIEW
================================================================================

The Groove Protocol consists of five interconnected layers:

Layer 1 — Inference Pipeline
  Model sharding, speculative decoding, and pipeline parallelism across
  distributed GPU nodes. This is the engine that makes decentralized inference
  fast enough for production use.

Layer 2 — Relay Network
  Lightweight coordination nodes that maintain routing tables, assemble
  optimal inference pipelines, handle failover, and manage session state.
  No GPU required — any machine can run a relay node.

Layer 3 — Proof & Settlement
  Cryptographic proof-of-compute generation, batch settlement on a
  high-throughput L2 (Base), and escrow contracts for consumer-provider
  payment flows.

Layer 4 — Data & Training
  The Savant Loop — execution traces from the orchestration platform become
  training data via federated learning. The network's models improve as a
  direct function of usage.

Layer 5 — Economic
  The $GROOVE token economy — compute marketplace, data staking, enterprise
  sandboxes, and protocol-level fee capture that funds open-source
  development and training infrastructure.

================================================================================
3. THE P2P COMPUTE LAYER
================================================================================

3.1 Weight Slicing & Model Sharding

To serve large models (32B+ parameters) on consumer hardware with 8-16GB
VRAM, Groove employs layer-wise sharding:

A 32B model with 64 transformer layers is sliced into blocks:
  Node B (RTX 4090, 24GB): Layers 1-21
  Node C (RTX 3090, 24GB): Layers 22-42
  Node D (Mac M2 Ultra):   Layers 43-64

Each node loads only its assigned layer range into VRAM, plus the KV cache
for active sessions. A 32B model quantized to NF4 requires ~18-22GB total;
split across 3 nodes, each holds 6-8GB of weights — well within consumer
GPU capacity.

Nodes discover each other and advertise their hosted layer ranges via a
Distributed Hash Table (DHT) built on libp2p. The DHT tracks:
  - Which model shards each node hosts
  - Geographic region and measured RTT to neighboring nodes
  - Current load (active sessions, queue depth)
  - Uptime reliability score (rolling 24-hour window)
  - GPU specifications and available VRAM

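A minimal sketch of the record a compute node might publish to that DHT. The
field names and shapes are illustrative assumptions for this whitepaper, not
a normative wire format:

  // Illustrative DHT advertisement record, keyed by (model, layerRange)
  // so relays can query candidates per shard.
  interface ShardAdvertisement {
    nodeId: string;                            // libp2p peer ID
    model: string;                             // e.g. "groove-savant-32b"
    quant: string;                             // e.g. "nf4"
    layerRange: [start: number, end: number];  // inclusive, e.g. [1, 21]
    region: string;                            // e.g. "us-west"
    rttToNeighborsMs: Record<string, number>;  // peerId -> measured RTT
    activeSessions: number;                    // current load
    queueDepth: number;
    uptime24h: number;                         // rolling 24h score, 0..1
    vramFreeGb: number;
  }

A node would republish its record whenever load or measured RTT changes
materially, so relay-side routing decisions stay current.
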
3.2 Quantization

Groove uses advanced quantization to maximize intelligence-per-watt:

4-bit / 5-bit NF4 Quantization:
  Reduces a 32B model's VRAM footprint to ~18-22GB, enabling performant
  inference on dual-GPU consumer rigs or high-end Apple Silicon.

GGUF Format Support:
  Nodes run shards via llama.cpp or vLLM, supporting the full range of
  community-quantized models. Operators choose their quantization level
  based on their hardware — the relay node factors quant quality into
  pipeline assembly.

3.3 KV Cache Distribution

Each node in the pipeline maintains the KV cache for its assigned layers
for the duration of a session. This enables:

Massive Context Windows:
  A single consumer GPU might support 8K context. Three nodes collectively
  support 24K+, with each node managing memory for its layer subset. The
  context window scales with the number of nodes in the pipeline.

Distributed Conversation Memory:
  Long conversations that would exhaust a single machine's memory are
  naturally distributed across the pipeline. Each node only holds the KV
  cache for its layers — memory pressure is divided, not multiplied.

================================================================================
4. THE THREE-LAYER INFERENCE STACK
================================================================================

This is the core technical innovation that makes decentralized inference
viable. Three techniques stack to convert a network-latency-dominated
system into a compute-dominated one.

4.1 Pipeline Parallelism

Naive approach: generate token 1 through all 64 layers, then start token 2.
Each token pays the full cost of every network hop.

Pipeline approach: overlap computation across nodes for different tokens.

  Time -->
  Node B: [tok1 L1-21]  [tok2 L1-21]  [tok3 L1-21]  [tok4 L1-21] ...
  Node C:               [tok1 L22-42] [tok2 L22-42] [tok3 L22-42] ...
  Node D:                             [tok1 L43-64] [tok2 L43-64] ...

After the pipeline fills (3 tokens deep), throughput is bounded by the
slowest single stage, not by the sum of all per-node latencies. This alone
improves throughput from ~1 tok/s to ~5-8 tok/s.

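The arithmetic behind those numbers, as a back-of-envelope sketch (the stage
and hop times below are illustrative assumptions, not measurements):

  // Naive: every token serially pays all stage compute plus every hop.
  function naiveTokS(stages: number, stageMs: number, hopMs: number): number {
    const perTokenMs = stages * stageMs + (stages + 1) * hopMs;
    return 1000 / perTokenMs;
  }

  // Pipelined: once the pipeline is full, one token completes per
  // bottleneck interval (slowest stage plus the hop that feeds it).
  function pipelinedTokS(stages: number, stageMs: number, hopMs: number): number {
    return 1000 / (stageMs + hopMs);
  }

  // Example with 3 stages, ~120ms compute per stage, ~50ms per hop:
  //   naiveTokS(3, 120, 50)     = 1000 / 560 ≈ 1.8 tok/s
  //   pipelinedTokS(3, 120, 50) = 1000 / 170 ≈ 5.9 tok/s
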
4.2 Speculative Decoding with Local Draft Models

This is the architectural decision that transforms the phone from a dumb
terminal into an active inference participant.

The consumer device (phone, laptop, tablet) runs a small 1B-3B quantized
"draft" model locally. This model speculatively generates a window of
candidate tokens (typically 8). The full distributed pipeline then verifies
all candidates in a single forward pass instead of generating them one by
one.

The verification math:
  - Draft model predicts 8 tokens
  - Full 32B model accepts the first N that match its own distribution
  - Typical acceptance rate: 60-80% (higher for code, lower for creative)
  - Result: 5-6 tokens from 1 network round-trip instead of 1 token

This converts 70%+ of network RTT overhead into useful local computation.
The phone isn't waiting — it's actively drafting the next window while the
pipeline verifies the current one.

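The acceptance rule, sketched in its simplest greedy form (production
speculative sampling accepts probabilistically against the full model's
distribution; the greedy argmax-match variant below is shown for clarity):

  // Accept draft tokens until the first disagreement with the full
  // model, then emit the full model's correction for that position.
  // fullChoice has one extra entry: the model's own next token after
  // the window, used when every draft token is accepted.
  function verifyWindow(
    draft: number[],        // candidate tokens from the phone's draft model
    fullChoice: number[],   // pipeline's chosen token at each position
  ): { accepted: number[]; correction: number } {
    const accepted: number[] = [];
    for (let i = 0; i < draft.length; i++) {
      if (draft[i] !== fullChoice[i]) {
        return { accepted, correction: fullChoice[i] };  // first mismatch
      }
      accepted.push(draft[i]);
    }
    return { accepted, correction: fullChoice[draft.length] };  // all matched
  }

Either way the consumer receives at least one token from the full model per
round-trip, and usually five or six.
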
Adaptive Window Sizing:
  The consumer device dynamically adjusts the speculative window based on
  acceptance rate:
    High acceptance (>80%):     expand window to 12 tokens
    Normal acceptance (50-80%): standard window of 8 tokens
    Low acceptance (<50%):      shrink window to 4 tokens

  This ensures the system optimizes for throughput when the draft model is
  accurate and minimizes wasted compute when it's struggling.

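As a sketch, the policy above plus a smoothed acceptance estimate (the
thresholds come from the text; the smoothing factor is an assumption):

  // Window policy straight from the thresholds above.
  function nextWindowSize(acceptanceEma: number): number {
    if (acceptanceEma > 0.8) return 12;   // high acceptance: expand
    if (acceptanceEma >= 0.5) return 8;   // normal: standard window
    return 4;                             // low acceptance: shrink
  }

  // An exponential moving average keeps the window from oscillating on
  // a single unlucky verification (alpha = 0.3 is an assumed constant).
  function updateEma(ema: number, accepted: number, windowSize: number): number {
    const alpha = 0.3;
    return (1 - alpha) * ema + alpha * (accepted / windowSize);
  }
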
Domain-Tuned Draft Models:
  The Savant training loop (Section 7) produces domain-specific draft models
  alongside the full models. A code-focused Savant draft model achieves
  80%+ acceptance on programming tasks. The training flywheel directly
  improves inference speed, not just quality.

4.3 Geographic Routing via DHT

The Relay Node doesn't route to ANY node hosting the required layers — it
routes to the OPTIMAL node based on:
  - Network RTT to adjacent pipeline nodes (<10ms preferred)
  - Current queue depth (prefer idle nodes)
  - GPU capability (match quant requirements to hardware)
  - Uptime reliability (>95% preferred for primary, >90% for standby)

Ideal pipeline assembly puts all nodes in the same metropolitan area, where
hop latency drops to 2-5ms instead of the 50ms+ typical of intercontinental
hops. The routing layer explicitly optimizes for geographic locality.

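One way a relay might fold those criteria into a single candidate score; the
weights here are illustrative assumptions, not protocol constants:

  interface Candidate {
    rttMs: number;        // RTT to the adjacent pipeline node
    queueDepth: number;   // pending work on the node
    uptime24h: number;    // rolling reliability score, 0..1
    meetsQuant: boolean;  // hardware satisfies the session's quant level
  }

  function scoreCandidate(c: Candidate, minUptime: number): number {
    if (!c.meetsQuant || c.uptime24h < minUptime) return -Infinity;  // hard filters
    return -1.0 * c.rttMs         // locality dominates: every hop pays this
         - 5.0 * c.queueDepth     // prefer idle nodes
         + 50.0 * c.uptime24h;    // reward reliability above the floor
  }

  // Primaries are picked with minUptime = 0.95, then the search repeats
  // with minUptime = 0.90 to assign a standby per layer range.
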
================================================================================
5. THE RELAY NETWORK
================================================================================

Relay Nodes are the coordination layer of the Groove Protocol. They are
lightweight — no GPU required, minimal CPU and bandwidth — and earn a small
coordination fee for every session they route.

5.1 Responsibilities

Pipeline Assembly:
  When a consumer initiates a session, the relay node queries the DHT,
  evaluates candidates, and assembles an optimal pipeline of compute nodes
  plus standby backups for each position.

Session Management:
  The relay maintains session state — which nodes are in the pipeline,
  heartbeat monitoring, KV cache checkpoint status, billing accumulation.

Failover Orchestration:
  When a compute node drops (heartbeat timeout at 100ms), the relay
  instantly promotes the standby node and updates routing for all other
  pipeline members. Target failover time: <150ms.

Proof Batching:
  The relay collects proof-of-compute attestations from each node and
  submits them in batches to the L2 settlement contract, reducing gas
  costs.

5.2 Why a Separate Coordination Layer

Previous decentralized compute systems (Petals, Bittensor) relied on compute
nodes to self-organize. This creates a scheduling bottleneck under
heterogeneous load — nodes optimizing for their own utilization make
globally suboptimal routing decisions.

The relay node acts as a neutral scheduler. It has no stake in which compute
node handles which request — only in assembling the fastest possible
pipeline. Relay operators earn fees proportional to session quality (measured
by achieved tok/s vs. estimated tok/s), aligning incentives with good
routing decisions.

5.3 Network Transport: QUIC (HTTP/3)

All node-to-node communication uses QUIC for:

0-RTT Connection Resumption:
  Critical for failover. When Node C drops and C-Prime is promoted, the
  existing QUIC session resumes without a full handshake. Pipeline
  continuity is maintained in milliseconds.

Multiplexed Streams:
  Multiple inference requests can share a single connection. Pipeline
  nodes can process requests for different sessions concurrently without
  head-of-line blocking.

Built-in Congestion Control:
  QUIC's congestion control is tuned for low-latency interactive traffic,
  unlike TCP's throughput-optimized defaults. Hidden state transfers
  (~13KB per token for a 32B model) complete faster.

NAT Traversal:
  Consumer hardware sits behind NAT routers. QUIC over UDP is more
  NAT-friendly than TCP, and combined with libp2p's STUN/TURN hole
  punching, enables direct node-to-node connections without relay
  mediation for data transfer.

================================================================================
6. THE INFERENCE REQUEST LIFECYCLE
================================================================================

A complete step-by-step protocol for what happens when User A on a phone
sends a prompt to a Groove Savant model.

PHASE 0 — Session Establishment (once per conversation)

  Phone --> Relay Node: SESSION_INIT
    model: "groove-savant-32b"
    quant: "nf4"
    max_context: 32768
    region_preference: "us-west"
    budget_cap: 150 $GROOVE

  Relay Node:
    DHT lookup for nodes hosting savant-32b shards
    Filter: region proximity, uptime > 0.95, queue depth < 3
    Assemble primary pipeline: [B: L1-21, C: L22-42, D: L43-64]
    Assign standby pipeline:   [B': L1-21, C': L22-42, D': L43-64]

  Relay Node --> Nodes B, C, D: PIPELINE_RESERVE
    session_id: "abc123"
    ttl: 300s (renewed on activity)
    consumer_stake: 50 $GROOVE (locked in L2 escrow)

  Relay Node --> Phone: SESSION_READY
    pipeline: [B, C, D]
    estimated_tok_s: 18
    price_per_1k_tokens: 0.3 $GROOVE

The relay pre-reserves compute AND standbys before the first token is
generated. The consumer's stake goes into L2 escrow — preventing griefing
and guaranteeing provider payment even if the consumer disconnects.

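The Phase 0 messages, sketched as TypeScript types (shapes mirror the fields
above; they are illustrative, not a normative schema):

  interface SessionInit {
    model: string;              // "groove-savant-32b"
    quant: string;              // "nf4"
    maxContext: number;         // 32768
    regionPreference: string;   // "us-west"
    budgetCapGroove: number;    // 150
  }

  interface SessionReady {
    sessionId: string;               // "abc123"
    pipeline: string[];              // ["B", "C", "D"]
    estimatedTokS: number;           // 18
    pricePer1kTokensGroove: number;  // 0.3
  }
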
PHASE 1 — Prompt Encoding (parallel)

  Phone: Tokenize prompt locally
  Phone: Run draft model on prompt to pre-generate 8 speculative tokens
  Phone --> Node B: PROMPT_ENCODE
    tokens: [prompt_tokens]
    speculative: [8 draft tokens]
    session_id: "abc123"

The phone sends both the prompt AND the first speculative window in a
single message. While the pipeline processes the prefill, speculative
tokens are already queued. Zero wasted time.

PHASE 2 — Prefill Pass (sequential, once per turn)

  Node B: Process prompt through L1-21, generate KV cache
  Node B --> Node C: ACTIVATIONS { hidden_states, kv_metadata }
  Node B --> B' (standby): KV_DELTA { incremental cache update }

  Node C: Process through L22-42, generate KV cache
  Node C --> Node D: ACTIVATIONS { hidden_states, kv_metadata }
  Node C --> C' (standby): KV_DELTA { incremental cache update }

  Node D: Process through L43-64, produce logits
  Node D --> D' (standby): KV_DELTA { incremental cache update }

Prefill is the slowest step — the entire prompt flows through all layers.
It happens once per conversation turn. For a 500-token prompt across
3 nodes: 200-400ms total.

Each node simultaneously streams KV deltas to its standby. By the time
prefill finishes, all three standbys are warm and ready for instant
failover.

PHASE 3 — Speculative Verification

  Node D has logits for the prompt AND 8 speculative tokens.
  Node D verifies against actual model distribution:
    Tokens 1-5: MATCH (draft model correct)
    Token 6:    MISMATCH (draft: "async", model: "Promise")
    Tokens 7-8: DISCARDED (after first mismatch)

  Node D --> Phone: VERIFY_RESULT
    accepted: ["const", " ", "result", " ", "="]
    correction: "Promise"
    total_generated: 6

One network round-trip, 6 tokens. The phone displays all 6 tokens
immediately while the draft model begins the next speculative window
starting from the corrected token.

PHASE 4 — Steady-State Generation (pipelined + speculative)

  LOOP:
    Phone: Generate next N speculative tokens via draft model (~15ms)
    Phone --> Node B: SPEC_WINDOW { tokens: [N candidates], turn: T }

    Pipeline processes while phone drafts next window:
      B: L1-21 --> C: L22-42 --> D: L43-64

    D --> Phone: VERIFY_RESULT { accepted: 5-7, correction: 1 }

    Phone: Display accepted tokens to user immediately
    Phone: Adjust window size based on acceptance rate

  UNTIL: EOS token or context limit reached

PHASE 5 — Proof Generation & Settlement

  Each node, per speculative window processed:
    PROOF_OF_COMPUTE
      session_id, node_id, layers_processed, tokens_verified,
      compute_time_ms, gpu_utilization, nonce,
      signature: sign(private_key, hash(fields))

  Relay Node batches proofs per session window:
    --> L2 Contract: SETTLE_BATCH
      session_id: "abc123"
      proofs: [B_proof, C_proof, D_proof]
      total_tokens: 6
      consumer: User_A_address

  L2 Contract:
    Verify signatures
    Release $GROOVE from escrow proportional to layers hosted:
      B: 33% (21/64 layers)
      C: 33% (21/64 layers)
      D: 34% (22/64 layers)
    Deduct from consumer's staked balance

Settlement is batched per verification window, not per token. A full
response generates 30-50 settlement events, keeping L2 gas costs
manageable.

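The escrow split above is just layer count over total layers (21/64 ≈ 33%,
22/64 ≈ 34%). A minimal sketch:

  // Each pipeline node's share of a settled payment, proportional to
  // the number of layers it hosted for the window.
  function splitPayment(layerCounts: number[], totalLayers: number, amountGroove: number): number[] {
    return layerCounts.map((n) => (amountGroove * n) / totalLayers);
  }

  // splitPayment([21, 21, 22], 64, 0.3)
  //   --> [0.0984, 0.0984, 0.1031]  (B, C, D shares of 0.3 $GROOVE)
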
PHASE 6 — Failover

  Relay Node: Detects Node C heartbeat timeout (100ms threshold)
  Relay Node --> C' (standby): PROMOTE
    session_id: "abc123"
    role: primary
    kv_cache: already warm (continuous delta streaming)

  Relay Node: Update routing: B --> C' --> D
  Relay Node --> B, D: PIPELINE_UPDATE { new_hop: C' }

  Total failover time: ~150ms
  User perceives: brief stutter, at most 1 extra second on current response
  Conversation context: fully preserved, no re-prefill required


================================================================================
7. THE SAVANT LOOP — DATA STRATEGY
================================================================================

Data is the capital of the Groove ecosystem. Instead of generic web-scraping,
Groove captures Vertical Intelligence — the actual reasoning traces of agents
performing real work.

7.1 Execution Traces as Training Data

The 7-layer orchestration platform captures high-fidelity "Agentic Traces":
  - Reasoning steps and chain-of-thought
  - Tool calls and their results
  - Error encounters and corrections
  - Task decomposition decisions
  - Code generation and iteration patterns

Users who opt in to share anonymized traces receive a Data Staking
Multiplier — earning more $GROOVE per compute hour contributed.

7.2 Federated Fine-Tuning

Global Savant models are updated via federated learning:

  1. Individual nodes calculate gradient updates based on local usage data
  2. Only gradient updates (not raw data) are transmitted to Aggregator nodes
  3. Aggregator nodes merge updates into new model versions using secure
     aggregation — no single aggregator ever sees raw private data
  4. Updated model weights are distributed back to the network

This creates a privacy-preserving training pipeline where the model improves
continuously without centralizing user data.

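Step 3's merge, sketched as plain federated averaging weighted by local
sample count (real secure aggregation adds pairwise masking so no single
aggregator sees any individual update; that machinery is omitted here):

  // Merge per-node gradient updates into one update, weighting each
  // node by how much local data produced its gradient.
  function federatedAverage(updates: { grad: Float32Array; samples: number }[]): Float32Array {
    const merged = new Float32Array(updates[0].grad.length);
    const totalSamples = updates.reduce((sum, u) => sum + u.samples, 0);
    for (const u of updates) {
      const weight = u.samples / totalSamples;
      for (let i = 0; i < merged.length; i++) merged[i] += weight * u.grad[i];
    }
    return merged;
  }
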
7.3 The Draft Model Flywheel

The Savant training loop produces two outputs:
  1. The full Savant model (32B+) — served by the distributed pipeline
  2. A distilled Savant Draft model (1B-3B) — runs locally on consumer devices

The draft model is specifically trained to predict the full model's outputs.
As the full model improves from execution traces, the draft model improves
in lockstep. This means:
  - Better Savant model = better draft predictions
  - Better draft predictions = higher speculative acceptance rate
  - Higher acceptance rate = faster inference throughput
  - Faster inference = more usage = more training data

The training flywheel directly improves both quality AND speed.


================================================================================
8. KV CACHE MANAGEMENT & FAILOVER
================================================================================

The KV cache is the conversation's memory. Losing it means re-processing
the entire conversation history — a UX-destroying event. Groove treats KV
cache persistence as a first-class protocol concern.

8.1 Asynchronous Checkpoint Streaming

Each primary compute node continuously streams KV cache deltas to its
assigned standby node over QUIC:

  - Deltas are incremental (per-token additions), not full snapshots
  - Streaming happens in parallel with inference computation
  - Bandwidth overhead is marginal: ~13KB per token per layer boundary
  - Standby nodes maintain a hot mirror of the primary's cache state

This means promotion of a standby is a routing table swap, not a
recomputation. The standby already has the full KV cache — it just starts
processing the next activation it receives.

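A sketch of that delta stream, with an assumed message shape (a real
implementation would pin exact tensor encodings):

  // One per-token cache delta for the layer range this node hosts.
  interface KvDelta {
    sessionId: string;
    tokenIndex: number;              // position this delta extends
    layerRange: [number, number];    // e.g. [22, 42]
    keys: ArrayBuffer;               // ~13KB/token at this boundary
    values: ArrayBuffer;
  }

  // Deltas may arrive out of order on separate QUIC streams, so the
  // standby indexes its mirror by token position rather than appending.
  function applyDelta(mirror: Map<number, KvDelta>, delta: KvDelta): void {
    mirror.set(delta.tokenIndex, delta);
  }
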
8.2 Session Recovery Tiers

Tier 1 — Hot Standby (default):
  Continuous delta streaming. Failover in <150ms. No tokens lost.

Tier 2 — Warm Recovery:
  Periodic checkpoint snapshots (every 30s). Failover requires re-prefill
  of tokens generated since last checkpoint. Recovery in 1-3s.

Tier 3 — Cold Recovery:
  No standby assigned (budget-constrained sessions). Full re-prefill from
  conversation history on a newly assigned node. Recovery in 5-15s
  depending on conversation length.

Consumers choose their reliability tier at session initialization. Higher
tiers cost more $GROOVE (standby nodes must be compensated for reserved
capacity) but provide seamless failover.


================================================================================
9. PROOF OF COMPUTE & SETTLEMENT
================================================================================

Unlike traditional mining, Groove nodes perform Useful Work. Every inference
operation generates cryptographic proof that is verified on-chain to trigger
$GROOVE token releases.

9.1 Proof Structure

Each compute node generates a proof per speculative verification window:

  PROOF_OF_COMPUTE:
    session_id       — unique session identifier
    node_id          — node's public key / identity
    layers_processed — layer range executed (e.g., [1, 21])
    tokens_verified  — number of tokens in the verification batch
    compute_time_ms  — wall clock time for computation
    gpu_utilization  — average GPU utilization during compute
    input_hash       — hash of received activations
    output_hash      — hash of produced activations
    nonce            — random value for uniqueness
    signature        — Ed25519 signature over hash of all fields

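Proof construction, sketched with Node's built-in Ed25519 support. The JSON
serialization is a placeholder; a real implementation would pin a canonical
field encoding before hashing:

  import { createHash, sign, KeyObject } from "node:crypto";

  interface ProofFields {
    sessionId: string;
    nodeId: string;
    layersProcessed: [number, number];
    tokensVerified: number;
    computeTimeMs: number;
    gpuUtilization: number;
    inputHash: string;
    outputHash: string;
    nonce: string;
  }

  function signProof(fields: ProofFields, privateKey: KeyObject): Buffer {
    // Hash all fields, then sign the digest. For Ed25519, Node's sign()
    // takes null as the algorithm argument.
    const digest = createHash("sha256").update(JSON.stringify(fields)).digest();
    return sign(null, digest, privateKey);
  }
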
9.2 Verification Mechanisms

Three layers of verification prevent dishonest claims:

Relay Timing Verification:
  The relay node independently timestamps when activations are sent to
  and received from each compute node. Claimed compute times that
  significantly deviate from relay-observed wall clock time are flagged.

Challenge-Response Spot Checks:
  The L2 contract randomly selects a small percentage of proofs for
  re-execution. A trusted verifier node re-processes the same inputs
  and compares output hashes. Failed challenges result in slashing of
  the node's staked $GROOVE.

Reputation-Weighted Trust:
  Nodes build reputation scores over time based on:
    - Proof accuracy on spot checks
    - Uptime consistency
    - Latency SLA compliance
    - Session completion rate
  High-reputation nodes face fewer spot checks (reducing overhead).
  New nodes face higher scrutiny until they build trust.

9.3 Batch Settlement

Proofs are batched to minimize L2 gas costs:
  - Relay nodes collect proofs per session per verification window
  - Batches are submitted to the L2 contract at configurable intervals
  - The contract verifies signatures, checks relay timing data, and
    releases escrowed $GROOVE proportional to work performed
  - Payment is split among pipeline nodes based on layer count:
    a node hosting 21 of 64 layers receives 33% of the inference payment

9.4 Anti-Gaming Measures

Cherry-Pick Prevention:
  Nodes cannot selectively accept easy/short requests and drop hard ones.
  The protocol tracks session completion rate — dropping below 90%
  triggers reputation penalties and reduced routing priority.

Latency SLA Compliance:
  At session establishment, the relay estimates tok/s. Nodes that
  consistently underperform their estimated throughput by >30% face
  reduced routing priority. This prevents nodes from over-advertising
  their capabilities.

Stake Slashing:
  Compute nodes stake $GROOVE to participate. Verified dishonest behavior
  (failed spot checks, fabricated proofs) results in partial stake
  slashing. The slashed amount is burned, creating deflationary pressure.

================================================================================
10. THE ECONOMIC FLYWHEEL
================================================================================

  User creates Data --> Network trains Model --> Model improves Agents
    --> Agents attract Users --> Users create Data

The $GROOVE token is the oxygen of the network:

10.1 Compute Marketplace

Consumers:
  Purchase inference by staking $GROOVE into session escrow. Pricing is
  per-1K-tokens, dynamically adjusted based on network supply/demand.
  Consumers set budget caps — sessions terminate gracefully when the cap
  is approached.

Providers (GPU Operators):
  Earn $GROOVE for every verified inference computation. Revenue scales
  with GPU capability, uptime, and geographic desirability. A well-
  positioned RTX 4090 in a high-demand metro area earns more than an
  identical card in a low-demand region.

Relay Operators:
  Earn a coordination fee (small percentage of session value) for routing
  and session management. Low hardware requirements — any stable internet
  connection qualifies.

10.2 Data Staking

Users who opt in to share anonymized execution traces receive a Data Staking
Multiplier on their token earnings. The multiplier scales with:
  - Volume of traces contributed
  - Quality of traces (measured by downstream training signal)
  - Rarity of the task domain (underrepresented domains earn more)

This creates an incentive to use Groove for diverse, complex tasks —
exactly the data that makes Savant models better.

10.3 Enterprise Sandboxes

Large firms pay $GROOVE for private, audited model instances:
  - Dedicated compute pipelines (no shared nodes)
  - Trusted Execution Environment (TEE) guarantees
  - Compliance-auditable inference logs
  - Custom Savant models fine-tuned on enterprise data

Enterprise demand creates sustained buy pressure on $GROOVE, providing
price stability for the broader network.

10.4 Protocol Fee & Treasury

A small percentage of every transaction is captured by the protocol:
  - 50% funds open-source development
  - 30% funds Savant training infrastructure (Foundation Fleet GPUs)
  - 20% funds a stability reserve for network incentive smoothing

10.5 Cost Advantage

The decentralized model fundamentally undercuts centralized pricing:
  - No data center leases, power contracts, or cooling infrastructure
  - GPU operators use existing hardware during idle time — marginal cost
    approaches electricity only
  - Competition among providers drives prices to near-marginal-cost
  - Enterprise users escape unpredictable API billing and recurring
    cloud infrastructure budget alerts

================================================================================
11. PRIVACY & ENTERPRISE READINESS
================================================================================

11.1 Intermediate Activation Privacy

When Node B sends hidden states to Node C, those activations contain
information about the original input. Research has demonstrated partial
input reconstruction from intermediate activations. Groove addresses this
at three levels:

Standard Tier (Consumer):
  Activations are transmitted in the clear between pipeline nodes.
  Privacy is protected by the ephemerality of the data (activations
  exist only in transit, not stored) and the contractual obligations
  of node operators.

Enhanced Tier:
  Differential privacy noise is injected into activations at each
  pipeline boundary. This mathematically bounds the information leakage
  while introducing minimal quality degradation (calibrated epsilon).

Enterprise Tier:
  Compute nodes run inside Trusted Execution Environments (TEEs) —
  Intel SGX, AMD SEV, or ARM CCA. The relay node only routes to
  TEE-attested nodes for enterprise sessions. Activations are processed
  in encrypted memory enclaves. Combined with dedicated pipelines, this
  provides end-to-end verifiable privacy.

11.2 Data Privacy in Training

The federated learning pipeline ensures raw user data never leaves the
local node:
  - Only gradient updates are transmitted
  - Secure aggregation protocols prevent the aggregator from
    reconstructing individual contributions
  - Differential privacy is applied at the gradient level before
    transmission
  - Users can audit exactly what data they contribute via the opt-in
    dashboard

================================================================================
12. NETWORK BOOTSTRAP — THE FOUNDATION FLEET
================================================================================

A fully decentralized network cannot deliver reliable SLAs on Day 1 with
10 nodes. Groove addresses the cold start problem with a progressive
decentralization strategy.

Phase 1 — Foundation Fleet (Launch):
  Groove operates a core set of high-performance GPU nodes that guarantee
  baseline model coverage and latency SLAs. These nodes are operated at
  cost — they earn $GROOVE like any other node but are committed to 99.9%
  uptime. The Foundation Fleet ensures every model shard has redundant
  coverage from day one.

Phase 2 — Hybrid Network (Months 1-6):
  Community nodes begin joining. The relay network preferentially routes
  to community nodes when they meet latency/reliability thresholds, but
  falls back to Foundation Fleet nodes when community coverage is thin.
  Foundation Fleet nodes handle the long tail — rare model shards,
  low-demand regions, peak load overflow.

Phase 3 — Community Majority (Months 6-18):
  As network density grows, community nodes handle the majority of
  inference. Foundation Fleet nodes scale down to standby/overflow only.
  Protocol governance begins transitioning to token-weighted community
  voting.

Phase 4 — Full Decentralization (18+ Months):
  The Foundation Fleet is fully retired or converted to community-operated
  nodes. The protocol runs entirely on community infrastructure with
  economic incentives maintaining reliability.

================================================================================
13. PERFORMANCE EXPECTATIONS
================================================================================

Realistic throughput projections for a 32B Savant model, NF4 quantized,
3-node pipeline with speculative decoding:

Best Case (nodes in same metro, good GPUs):
  Time to first token:  500ms - 1s
  Sustained throughput: 15-25 tok/s
  User experience:      comparable to a standard Claude or GPT response

Average Case (nodes across same continent):
  Time to first token:  1-3s
  Sustained throughput: 5-10 tok/s
  User experience:      noticeable but acceptable for most tasks

Worst Case (global distribution, unstable nodes):
  Time to first token:  3-8s
  Sustained throughput: 2-4 tok/s
  User experience:      suitable for background agent tasks, not interactive chat

For Groove's primary use case — agentic coding tasks running in the
background — even the average case provides excellent UX. Agents don't
need 50 tok/s interactive speed. A builder agent processing a refactoring
task is perfectly served by 10 tok/s sustained throughput.

================================================================================
14. TECHNICAL IMPLEMENTATION NOTES
================================================================================

14.1 Node-to-Node NAT Traversal

Compute nodes on consumer hardware sit behind NAT routers. Direct P2P
connections for hidden state transfer require:
  - STUN/TURN hole punching via libp2p's QUIC transport
  - Fallback to relay-mediated forwarding when direct connection fails
  - NAT negotiation adds 100-200ms to initial pipeline setup (one-time
    cost per session)

14.2 Clock Synchronization

Proof of Compute includes compute_time_ms. To prevent inflated claims:
  - Relay nodes independently timestamp request dispatch and result receipt
  - NTP synchronization is required for node participation
  - Statistical outlier detection flags nodes with systematically inflated
    timing claims

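One simple form that outlier check could take: test whether a node's claimed
compute times carry a systematic positive bias relative to relay-observed
wall clock deltas (the 3-sigma threshold is an illustrative assumption):

  // Flags a node whose claims exceed observations by more than noise
  // would explain, using a one-sided test on the mean difference.
  function flagsTimingOutlier(claimedMs: number[], observedMs: number[]): boolean {
    const diffs = claimedMs.map((c, i) => c - observedMs[i]);
    const mean = diffs.reduce((a, b) => a + b, 0) / diffs.length;
    const variance = diffs.reduce((a, d) => a + (d - mean) ** 2, 0) / diffs.length;
    const stdError = Math.sqrt(variance / diffs.length);
    return mean > 3 * stdError;  // systematic inflation, not random jitter
  }
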
14.3 Model Distribution

New Savant model versions must propagate across the network:
  - Model weights are distributed via BitTorrent-style chunked transfer
  - Nodes download only the layer ranges they host
  - Rolling upgrades: the relay network routes around nodes that are
    updating, ensuring zero downtime during model transitions
  - Incentive: nodes running the latest model version receive a routing
    priority boost for 48 hours post-release


================================================================================
15. ROADMAP
================================================================================

T1 — Foundation (Q3 2026):
  - Core inference pipeline with 3-node layer sharding
  - Relay node v1 with DHT routing and session management
  - Speculative decoding protocol with local draft model
  - $GROOVE token contract deployment on Base L2
  - Foundation Fleet launch with initial model coverage
  - Proof of Compute v1 with relay timing verification

T2 — Network Growth (Q4 2026):
  - Community node onboarding with staking and reputation system
  - KV cache async checkpointing and hot standby failover
  - Enterprise tier with TEE-attested compute nodes
  - Savant v1 model trained on initial execution trace dataset
  - Adaptive geographic routing with real-time RTT optimization
  - Mobile client with integrated draft model

T3 — Intelligence Flywheel (Q1 2027):
  - Federated fine-tuning pipeline live on community nodes
  - Domain-tuned draft models (code, reasoning, creative)
  - Challenge-response spot check system for proof verification
  - Enterprise sandbox product launch
  - Cross-region pipeline optimization
  - Data staking multiplier system

T4 — Full Decentralization (Q2-Q3 2027):
  - Foundation Fleet retirement / community transition
  - Protocol governance via token-weighted voting
  - Multi-model support (multiple Savant variants on one network)
  - Advanced privacy (differential privacy on activations)
  - Network scaling to 10K+ active compute nodes
  - Developer SDK for third-party applications


================================================================================
CONCLUSION
================================================================================

The Groove Protocol transforms stranded consumer GPU capacity into a global
intelligence network. By stacking pipeline parallelism, speculative decoding,
and geographic routing, it solves the latency problem that made previous
decentralized compute platforms impractical for production use.

The economic model aligns incentives across all participants: GPU operators
earn for idle compute, users get frontier-quality inference at marginal cost,
and the Savant training loop ensures the entire network gets smarter with
every task executed.

This is not a theoretical architecture. Each component — model sharding,
speculative decoding, federated learning, QUIC transport, L2 settlement —
exists in production systems today. Groove's innovation is the integration:
a coherent protocol that makes these pieces work together as a decentralized
intelligence marketplace.

The future of AI is not a handful of companies renting you access to their
GPUs. It's a network where everyone contributes and everyone benefits.


================================================================================
GrooveDev.ai — Open Source Agentic Orchestration — v1.0 Whitepaper (2026)
================================================================================