groove-dev 0.27.77 → 0.27.78

Files changed (79)
  1. package/CLAUDE.md +0 -7
  2. package/MOE_TRAINING_PIPELINE.md +216 -12
  3. package/moe-training/DEPLOY_CENTRAL_COMMAND.md +413 -0
  4. package/moe-training/client/consent.js +96 -0
  5. package/moe-training/client/envelope-builder.js +56 -0
  6. package/moe-training/client/index.js +10 -0
  7. package/moe-training/client/parsers/claude-code.js +110 -0
  8. package/moe-training/client/parsers/codex.js +80 -0
  9. package/moe-training/client/parsers/gemini.js +80 -0
  10. package/moe-training/client/parsers/grok.js +16 -0
  11. package/moe-training/client/parsers/index.js +20 -0
  12. package/moe-training/client/scrubber.js +126 -0
  13. package/moe-training/client/session-attestation.js +114 -0
  14. package/moe-training/client/step-classifier.js +51 -0
  15. package/moe-training/client/trajectory-capture.js +227 -0
  16. package/moe-training/client/transmission-queue.js +93 -0
  17. package/moe-training/package-lock.json +1266 -0
  18. package/moe-training/package.json +20 -0
  19. package/moe-training/server/enrichment.js +24 -0
  20. package/moe-training/server/index.js +119 -0
  21. package/moe-training/server/ledger.js +110 -0
  22. package/moe-training/server/routes/ingest.js +96 -0
  23. package/moe-training/server/routes/sessions.js +43 -0
  24. package/moe-training/server/routes/stats.js +31 -0
  25. package/moe-training/server/scoring.js +63 -0
  26. package/moe-training/server/session-registry.js +156 -0
  27. package/moe-training/server/stats.js +129 -0
  28. package/moe-training/server/stitcher.js +69 -0
  29. package/moe-training/server/storage.js +147 -0
  30. package/moe-training/server/verifier.js +102 -0
  31. package/moe-training/shared/constants.js +30 -0
  32. package/moe-training/shared/crypto.js +45 -0
  33. package/moe-training/shared/envelope-schema.js +220 -0
  34. package/moe-training/test/client/consent.test.js +121 -0
  35. package/moe-training/test/client/envelope-builder.test.js +107 -0
  36. package/moe-training/test/client/parsers/claude-code.test.js +119 -0
  37. package/moe-training/test/client/parsers/codex.test.js +83 -0
  38. package/moe-training/test/client/parsers/gemini.test.js +99 -0
  39. package/moe-training/test/client/scrubber.test.js +133 -0
  40. package/moe-training/test/client/session-attestation-security.test.js +95 -0
  41. package/moe-training/test/client/step-classifier.test.js +88 -0
  42. package/moe-training/test/integration/handshake.test.js +260 -0
  43. package/moe-training/test/server/ingest-security.test.js +166 -0
  44. package/moe-training/test/server/ledger.test.js +131 -0
  45. package/moe-training/test/server/scoring.test.js +242 -0
  46. package/moe-training/test/server/session-registry.test.js +125 -0
  47. package/moe-training/test/server/stitcher.test.js +157 -0
  48. package/moe-training/test/server/verifier.test.js +232 -0
  49. package/moe-training/test/shared/crypto.test.js +87 -0
  50. package/moe-training/test/shared/envelope-schema.test.js +351 -0
  51. package/node_modules/@groove-dev/cli/package.json +1 -1
  52. package/node_modules/@groove-dev/daemon/package.json +1 -1
  53. package/node_modules/@groove-dev/daemon/src/agent-loop.js +48 -5
  54. package/node_modules/@groove-dev/daemon/src/api.js +77 -0
  55. package/node_modules/@groove-dev/daemon/src/index.js +61 -0
  56. package/node_modules/@groove-dev/daemon/src/journalist.js +64 -21
  57. package/node_modules/@groove-dev/daemon/src/process.js +199 -0
  58. package/node_modules/@groove-dev/daemon/src/providers/grok.js +15 -0
  59. package/node_modules/@groove-dev/daemon/src/state.js +20 -1
  60. package/node_modules/@groove-dev/gui/dist/assets/{index-BbmPDhuW.js → index-BJgEJ9lZ.js} +1677 -1677
  61. package/node_modules/@groove-dev/gui/dist/index.html +1 -1
  62. package/node_modules/@groove-dev/gui/package.json +1 -1
  63. package/node_modules/@groove-dev/gui/src/stores/groove.js +32 -0
  64. package/node_modules/@groove-dev/gui/src/views/settings.jsx +167 -1
  65. package/package.json +1 -1
  66. package/packages/cli/package.json +1 -1
  67. package/packages/daemon/package.json +1 -1
  68. package/packages/daemon/src/agent-loop.js +48 -5
  69. package/packages/daemon/src/api.js +77 -0
  70. package/packages/daemon/src/index.js +61 -0
  71. package/packages/daemon/src/journalist.js +64 -21
  72. package/packages/daemon/src/process.js +199 -0
  73. package/packages/daemon/src/providers/grok.js +15 -0
  74. package/packages/daemon/src/state.js +20 -1
  75. package/packages/gui/dist/assets/{index-BbmPDhuW.js → index-BJgEJ9lZ.js} +1677 -1677
  76. package/packages/gui/dist/index.html +1 -1
  77. package/packages/gui/package.json +1 -1
  78. package/packages/gui/src/stores/groove.js +32 -0
  79. package/packages/gui/src/views/settings.jsx +167 -1
package/CLAUDE.md CHANGED
@@ -263,10 +263,3 @@ Audit-driven release. Multi-agent orchestration system with 7 coordination layer
263
263
  - Dashboard: routing donut, cache panel, context health gauges
264
264
  - Monitor/QC agent mode (stay active, loop)
265
265
  - Distribution: demo video, HN launch, Twitter content
266
-
267
- <!-- GROOVE:START -->
268
- ## GROOVE Orchestration (auto-injected)
269
- Active agents: 0
270
- See AGENTS_REGISTRY.md for full agent state.
271
- **Memory policy:** GROOVE manages project memory automatically. Do not read or write MEMORY.md or .groove/memory/ files directly.
272
- <!-- GROOVE:END -->
package/MOE_TRAINING_PIPELINE.md CHANGED
@@ -706,15 +706,219 @@ the network team to add patterns to `scrubber.py`.
706
706
 
707
707
  ---
708
708
 
709
- ## 10. File Reference
710
-
711
- | File | Description |
712
- |------|-------------|
713
- | `moe-team/src/training/__init__.py` | Package exports: CaptureSession, ConsentManager, ConsentRecord, CorpusStats, DomainTagger, PIIScrubber, TrainingCorpus, TrainingRecord |
714
- | `moe-team/src/training/intake.py` | TrainingDataIntake simple submit/batch/delete API |
715
- | `moe-team/src/training/consent.py` | ConsentManager — SQLite consent storage, versioned consent |
716
- | `moe-team/src/training/scrubber.py` | PIIScrubber 13 compiled regex patterns for PII removal |
717
- | `moe-team/src/training/domain_tagger.py` | DomainTagger — keyword-based domain classification |
718
- | `moe-team/src/training/corpus.py` | TrainingCorpus JSONL storage with daily rotation |
719
- | `moe-team/src/training/capture.py` | CaptureSession session-oriented capture with lifecycle |
720
- | `moe-team/src/training/stats.py` | CorpusStats — summary, daily growth, domain breakdown |
709
+ ## 10. Build plan
710
+
711
+ 1.
712
+ TrajectoryCapture is a standalone component that runs in parallel with the Journalist, behind the opt-in toggle; it is not embedded in the Journalist. When the user hasn't opted in, the module is never loaded, so it adds zero overhead. When they flip the switch, it hooks into the same stdout stream the Journalist reads but operates independently, using pure pattern matching. No AI cost on the user's machine.
713
+ 2.
714
+ AI enrichment (cognitive_target classification, quality scoring, model fingerprinting via LLM-as-a-Judge) happens at Central Command on your AWS instance. The user's machine sends raw structured trajectories, and Central Command does the expensive analysis. This means you control the enrichment model, can upgrade it without shipping app updates, and users pay nothing extra.
715
+ 3.
716
+ Both sides of every coordination exchange get captured. When agent A knocks on agent B's lock, you capture agent A's request envelope AND agent B's response envelope, both carrying the same coordination_id, which links them. This gives the MoE training data the full picture: how to ask for help AND how to respond to requests. This is the data that teaches Groove agents to be team players. No other dataset on earth has this.
717
+ 4.
718
+ Streaming chunked envelopes. Send sub-envelopes of up to 200 steps during the session, each signed with an HMAC keyed by the ECDH-derived session secret and carrying an incrementing sequence number. When the session ends, send a final SESSION_CLOSE envelope with the outcome data (success/failure, user intervention count, total tokens, duration). Central Command stitches the chunks into the full trajectory using session_id + sequence numbers. This way you don't lose data on crashes, and you can do real-time quality monitoring on incoming data.
719
+ THE FULL ARCHITECTURE: TrajectoryCapture System
720
+
721
+ Component: TrajectoryCapture (new file in packages/daemon/src/)
722
+
723
+ Lifecycle:
724
+
725
+ -
726
+ Daemon starts: checks opt-in flag (is_capture_enabled). If false, does nothing. Zero imports, zero overhead.
727
+ -
728
+ User opts in: daemon loads TrajectoryCapture, it registers as a listener on ProcessManager's stdout stream
729
+ -
730
+ Agent spawns: TrajectoryCapture opens a new capture context for that agent, performs ECDH handshake with Central Command, gets session attestation
731
+ -
732
+ During session: parses stdout into typed steps, buffers until 200 steps or 5 minutes (whichever comes first), signs and transmits chunk
733
+ -
734
+ Agent completes/crashes/killed: sends SESSION_CLOSE with outcome data
735
+ -
736
+ User opts out: TrajectoryCapture detaches from stdout stream, flushes any buffered data, tears down
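The buffering rule in the lifecycle above (flush at 200 steps or after 5 minutes, whichever comes first) could be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `ChunkBuffer` class, its method names, and the choice to start the 5-minute clock at the first buffered step are all assumptions.

```javascript
// Hypothetical sketch: ChunkBuffer and its API are illustrative names only.
const MAX_STEPS = 200;              // step threshold from the lifecycle list
const MAX_AGE_MS = 5 * 60 * 1000;   // five-minute threshold

class ChunkBuffer {
  constructor(onFlush, now = Date.now) {
    this.onFlush = onFlush;   // called with (steps, chunkSequence)
    this.now = now;           // injectable clock, useful in tests
    this.steps = [];
    this.openedAt = null;     // time the first buffered step arrived (assumption)
    this.chunkSequence = 0;
  }

  // Add one parsed step; flush immediately once the step limit is reached.
  push(step) {
    if (this.steps.length === 0) this.openedAt = this.now();
    this.steps.push(step);
    if (this.steps.length >= MAX_STEPS) this.flush();
  }

  // Call periodically (for example from a timer) to enforce the age limit.
  tick() {
    if (this.steps.length > 0 && this.now() - this.openedAt >= MAX_AGE_MS) {
      this.flush();
    }
  }

  // Hand the buffered steps to the caller and start a new chunk.
  flush() {
    if (this.steps.length === 0) return;
    const steps = this.steps;
    this.steps = [];
    this.openedAt = null;
    this.chunkSequence += 1;
    this.onFlush(steps, this.chunkSequence);
  }
}
```

On opt-out or session end, calling `flush()` once more would drain whatever is still buffered before teardown.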
737
+ Step Classification (deterministic, pattern-based):
738
+
739
+ Each provider's parseOutput() already knows the output format. TrajectoryCapture extends this with step type detection:
740
+
741
+ -
742
+ "thought": Lines inside thinking/reasoning blocks. Claude Code wraps these clearly. Codex has internal reasoning markers. Gemini has thinking blocks.
743
+ -
744
+ "action": Tool calls — Read, Edit, Write, Bash, Grep, Glob. Parse the tool name and arguments. For Bash, capture the command string.
745
+ -
746
+ "observation": Tool results — the output that comes back after an action. Truncate intelligently: first 50 + last 20 lines for long outputs, full content for short ones.
747
+ -
748
+ "correction": User message that arrives AFTER the agent has taken at least one action. This is the RLHF gold. The ProcessManager can signal when user input arrives mid-session.
749
+ -
750
+ "resolution": The agent's final summary or commit message. Detect by position (last substantive output before session end) and content patterns (commit, done, implemented, fixed).
751
+ -
752
+ "error": Build failures, test failures, permission errors, crashes. Pattern match on common error signatures (Error:, FAIL, exit code, stack traces).
753
+ -
754
+ "coordination": Lock requests, QC submissions, knock protocol messages. These come through LockManager and Supervisor events, not stdout. TrajectoryCapture also listens to these EventEmitters.
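Two of the rules above are concrete enough to sketch: the observation truncation (first 50 + last 20 lines) and the error signature matching. The function names, the wording of the omission marker, and the exact regular expressions below are assumptions for illustration; the real step-classifier may match differently.

```javascript
// Hypothetical sketch of two step-classification helpers.
const HEAD_LINES = 50;
const TAIL_LINES = 20;

// Keep short outputs whole; for long ones keep the first 50 and last 20 lines.
function truncateObservation(text) {
  const lines = text.split('\n');
  if (lines.length <= HEAD_LINES + TAIL_LINES) {
    return { content: text, truncated: false };
  }
  const omitted = lines.length - HEAD_LINES - TAIL_LINES;
  const content = [
    ...lines.slice(0, HEAD_LINES),
    `[... ${omitted} lines omitted ...]`,   // marker format is an assumption
    ...lines.slice(-TAIL_LINES),
  ].join('\n');
  return { content, truncated: true };
}

// Illustrative patterns for the signatures named above:
// "Error:", "FAIL", a non-zero exit code, and a JavaScript-style stack frame.
const ERROR_PATTERNS = [
  /Error:/,
  /\bFAIL\b/,
  /exit code [1-9]\d*/i,
  /^\s+at .+:\d+:\d+\)?$/m,
];

function looksLikeError(text) {
  return ERROR_PATTERNS.some((re) => re.test(text));
}
```

The `truncated` flag mirrors the field of the same name on observation steps in the envelope.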
755
+ Envelope Structure (refined):
756
+
757
+ {
758
+ "envelope_id": "env_<uuid>",
759
+ "session_id": "sess_<uuid>",
760
+ "chunk_sequence": 3,
761
+ "contributor_id": "<anonymous_install_uuid>",
762
+ "attestation": {
763
+ "session_hmac": "<HMAC-SHA256>",
764
+ "sequence": 47,
765
+ "app_version_hash": "<sha256 of groove binary>"
766
+ },
767
+ "metadata": {
768
+ "model_engine": "claude-opus-4.6",
769
+ "provider": "claude-code",
770
+ "agent_role": "frontend",
771
+ "agent_id": "frontend-1",
772
+ "task_complexity": "heavy",
773
+ "team_size": 4,
774
+ "session_quality": 82,
775
+ "groove_version": "0.27.77"
776
+ },
777
+ "trajectory_log": [
778
+ {
779
+ "step": 1,
780
+ "type": "thought",
781
+ "timestamp": 1745625600.0,
782
+ "content": "I need to fix the WebSocket reconnection logic...",
783
+ "token_count": 142
784
+ },
785
+ {
786
+ "step": 2,
787
+ "type": "action",
788
+ "timestamp": 1745625602.3,
789
+ "tool": "Grep",
790
+ "arguments": {"pattern": "WebSocket", "path": "src/"},
791
+ "token_count": 28
792
+ },
793
+ {
794
+ "step": 3,
795
+ "type": "observation",
796
+ "timestamp": 1745625603.1,
797
+ "content": "Found 4 matches in src/ws-client.js...",
798
+ "token_count": 89,
799
+ "truncated": false
800
+ },
801
+ {
802
+ "step": 4,
803
+ "type": "correction",
804
+ "timestamp": 1745625680.0,
805
+ "content": "No, use exponential backoff instead of fixed retry",
806
+ "source": "user",
807
+ "token_count": 12
808
+ },
809
+ {
810
+ "step": 5,
811
+ "type": "coordination",
812
+ "timestamp": 1745625700.0,
813
+ "coordination_id": "coord_<uuid>",
814
+ "direction": "outbound",
815
+ "target_agent": "backend-1",
816
+ "protocol": "knock",
817
+ "content": "Requesting lock on src/api/ws-handler.js"
818
+ }
819
+ ]
820
+ }
821
+ The SESSION_CLOSE envelope:
822
+
823
+ {
824
+ "envelope_id": "env_<uuid>",
825
+ "session_id": "sess_<uuid>",
826
+ "type": "SESSION_CLOSE",
827
+ "attestation": { ... },
828
+ "outcome": {
829
+ "status": "SUCCESS",
830
+ "user_interventions": 1,
831
+ "total_steps": 847,
832
+ "total_chunks": 5,
833
+ "total_tokens": 48200,
834
+ "duration_seconds": 1820,
835
+ "files_modified": 6,
836
+ "errors_encountered": 2,
837
+ "errors_recovered": 2,
838
+ "coordination_events": 3
839
+ }
840
+ }
841
+ CENTRAL COMMAND (AWS) — what it receives and does:
842
+
843
+ Endpoint: POST https://api.groovedev.ai/v1/training/ingest
844
+
845
+ Receives chunked envelopes in real time. For each:
846
+
847
+ 1.
848
+ Verify HMAC against the session's shared secret
849
+ 2.
850
+ Verify sequence number is monotonically increasing
851
+ 3.
852
+ Store raw envelope in S3 (or local JSONL as staging)
853
+ 4.
854
+ On SESSION_CLOSE: stitch all chunks into full trajectory, run AI enrichment:
855
+ -
856
+ Cognitive target classification (syntactic/semantic/strategic/corrective/coordinative)
857
+ -
858
+ Model fingerprint verification (LLM-as-a-Judge)
859
+ -
860
+ Latency profile analysis
861
+ -
862
+ Quality score
863
+ -
864
+ Multiplier calculation (model tier x correction count x complexity x coordination uniqueness)
865
+ 5.
866
+ Update the off-chain ledger with contributor points
867
+ 6.
868
+ Every 24h: compute Merkle root, publish to Base L2 (future)
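The stitching step (step 4 above) can be illustrated with a small sketch. It assumes that `chunk_sequence` starts at 1, that `total_chunks` in the SESSION_CLOSE outcome counts only data chunks, and that HMAC verification has already happened; `stitchSession` is a hypothetical name, not necessarily what `server/stitcher.js` exports.

```javascript
// Hypothetical sketch: rebuild one trajectory from already-verified envelopes.
function stitchSession(envelopes) {
  const close = envelopes.find((e) => e.type === 'SESSION_CLOSE');
  if (!close) throw new Error('no SESSION_CLOSE envelope');

  // Collect the data chunks for this session and order them by sequence.
  const chunks = envelopes
    .filter((e) => e.type !== 'SESSION_CLOSE' && e.session_id === close.session_id)
    .sort((a, b) => a.chunk_sequence - b.chunk_sequence);

  // Assumption: sequences are 1, 2, 3, ... with no gaps or duplicates.
  chunks.forEach((chunk, i) => {
    if (chunk.chunk_sequence !== i + 1) {
      throw new Error(`missing or duplicate chunk at position ${i + 1}`);
    }
  });
  if (chunks.length !== close.outcome.total_chunks) {
    throw new Error('chunk count does not match SESSION_CLOSE outcome');
  }

  return {
    session_id: close.session_id,
    outcome: close.outcome,
    trajectory_log: chunks.flatMap((chunk) => chunk.trajectory_log),
  };
}
```

Rejecting sessions with gaps is one possible policy; a server could instead keep partial trajectories and flag them for lower scoring.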
869
+ WHAT MAKES THIS DATASET UNIQUE:
870
+
871
+ The five data types no one else has:
872
+
873
+ 1.
874
+ Multi-agent coordination trajectories — how agents divide work, negotiate locks, hand off context
875
+ 2.
876
+ Role-specific reasoning chains: a planner reasons differently from QC, which reasons differently from a frontend builder
877
+ 3.
878
+ Context rotation recovery — how agent N+1 picks up where agent N left off using the handoff brief
879
+ 4.
880
+ User corrections in context — not just "this is wrong," but the full trajectory showing what the agent tried, where it went wrong, and how the human redirected
881
+ 5.
882
+ Error-recovery arcs — agent hits wall, tries alternatives, eventually succeeds (or doesn't) — the full struggle
883
+ This is training data for agents that work in teams, recover from mistakes, and learn from human feedback. That's the product.
884
+
885
+ THE BUILD PLAN:
886
+
887
+ This breaks into two workstreams:
888
+
889
+ Workstream 1 — Client side (in the Groove daemon):
890
+
891
+ -
892
+ TrajectoryCapture component (new daemon module)
893
+ -
894
+ Opt-in toggle wiring (consent flow from the earlier spec)
895
+ -
896
+ ECDH session handshake with Central Command
897
+ -
898
+ Provider-specific stdout parsers for step classification
899
+ -
900
+ Chunked envelope packaging and HMAC signing
901
+ -
902
+ Transmission queue (non-blocking, fail-silent)
903
+ -
904
+ EventEmitter hooks for coordination events
905
+ Workstream 2 — Server side (on AWS/Central Command):
906
+
907
+ -
908
+ Ingest endpoint (POST /v1/training/ingest)
909
+ -
910
+ Session registry (ECDH key management)
911
+ -
912
+ HMAC verification + sequence validation
913
+ -
914
+ Raw envelope storage
915
+ -
916
+ Stitcher (chunks to full trajectory on SESSION_CLOSE)
917
+ -
918
+ AI enrichment pipeline (runs after stitching)
919
+ -
920
+ Off-chain ledger
921
+ -
922
+ Contributor trust scoring
923
+ -
924
+ Stats/monitoring dashboard
package/moe-training/DEPLOY_CENTRAL_COMMAND.md ADDED
@@ -0,0 +1,413 @@
1
+ # Central Command — AWS Deployment Guide
2
+
3
+ Build guide for deploying the MoE Training Data ingest server on the groovedev.ai AWS instance.
4
+
5
+ Last updated: 2026-04-23
6
+
7
+ ---
8
+
9
+ ## 1. What This Server Does
10
+
11
+ Central Command receives Trajectory Envelopes from Groove desktop clients that have opted into training data sharing. It verifies cryptographic attestation (ECDH + HMAC), stores verified envelopes, stitches multi-chunk sessions, scores trajectories using a multiplier matrix, and credits contributors with points.
12
+
13
+ All source code is in: `moe-training/server/`
14
+ Shared utilities: `moe-training/shared/`
15
+
16
+ ---
17
+
18
+ ## 2. System Requirements
19
+
20
+ - Node.js 20+ LTS (ES modules, native fetch)
21
+ - build-essential + python3 (for compiling better-sqlite3 native bindings)
22
+ - nginx (reverse proxy, SSL termination)
23
+ - certbot (Let's Encrypt SSL for api.groovedev.ai)
24
+ - PM2 (process manager) or systemd
25
+ - 1GB+ RAM, 20GB+ disk (envelopes grow over time)
26
+
27
+ ---
28
+
29
+ ## 3. Environment Variables
30
+
31
+ | Variable | Default | Description |
32
+ |---|---|---|
33
+ | GROOVE_CENTRAL_PORT | 8443 | Port the Express server listens on (behind nginx) |
34
+ | NODE_ENV | production | Set to production for security defaults |
35
+
36
+ The server does NOT need any API keys. It only receives data — it does not call any external APIs. The enrichment pipeline (LLM-as-a-Judge) is currently a stub and will need API keys when activated post-launch.
37
+
38
+ ---
39
+
40
+ ## 4. Directory Structure on Server
41
+
42
+ ```
43
+ /opt/groove-central/
44
+ moe-training/
45
+ package.json
46
+ package-lock.json
47
+ server/ <-- the server code
48
+ shared/ <-- shared crypto/schema/constants
49
+ client/ <-- NOT needed on server, but harmless to include
50
+ data/ <-- created automatically on first run
51
+ sessions.db <-- SQLite: ECDH session state, rate limiting
52
+ ledger.db <-- SQLite: contributor points and balances
53
+ envelopes/ <-- JSONL envelope storage (daily rotation)
54
+ 2026-04-26.jsonl
55
+ 2026-04-27.jsonl
56
+ ```
57
+
58
+ The data/ directory is created automatically by the server components on first request. SQLite databases use WAL mode for crash safety. File permissions: directories 0o700, files 0o600.
59
+
60
+ ---
61
+
62
+ ## 5. Dependencies
63
+
64
+ ```json
65
+ {
66
+ "better-sqlite3": "^11.0.0",
67
+ "uuid": "^9.0.0",
68
+ "express": "^4.18.0"
69
+ }
70
+ ```
71
+
72
+ better-sqlite3 is a native C++ addon — it needs compilation tools:
73
+
74
+ ```bash
75
+ sudo apt update
76
+ sudo apt install -y build-essential python3 git
77
+ ```
78
+
79
+ ---
80
+
81
+ ## 6. Deployment Steps
82
+
83
+ ### 6a. Clone and Install
84
+
85
+ ```bash
86
+ # Clone the repo (or scp the moe-training directory)
87
+ cd /opt/groove-central
88
+ git clone https://github.com/grooveai-dev/groove.git
89
+ cd groove/moe-training
90
+
91
+ # Install dependencies
92
+ npm install --omit=dev
93
+ ```
94
+
95
+ ### 6b. Test the Server Locally
96
+
97
+ ```bash
98
+ # Start the server
99
+ GROOVE_CENTRAL_PORT=8443 node server/index.js
100
+
101
+ # In another terminal, verify health
102
+ curl http://localhost:8443/health
103
+ # Expected: {"status":"ok","uptime":...}
104
+
105
+ # Verify session endpoint
106
+ curl -X POST http://localhost:8443/v1/sessions/open \
107
+ -H "Content-Type: application/json" \
108
+ -d '{"session_id":"test-1","public_key":"dGVzdA==","provider":"claude-code","model":"claude-opus-4-6","machine_fingerprint":"test","app_version_hash":"abc","groove_version":"0.27.77"}'
109
+ # Expected: {"server_public_key":"..."}
110
+
111
+ # Verify stats endpoint
112
+ curl http://localhost:8443/v1/stats/summary
113
+ # Expected: {"totalEnvelopes":0,...}
114
+
115
+ # Kill the test server (Ctrl+C)
116
+ ```
117
+
118
+ ### 6c. PM2 Process Manager
119
+
120
+ ```bash
121
+ # Install PM2 globally
122
+ sudo npm install -g pm2
123
+
124
+ # Create ecosystem config
125
+ cat > /opt/groove-central/groove/moe-training/ecosystem.config.cjs << 'EOF'
126
+ module.exports = {
127
+ apps: [{
128
+ name: 'groove-central',
129
+ script: 'server/index.js',
130
+ cwd: '/opt/groove-central/groove/moe-training',
131
+ env: {
132
+ NODE_ENV: 'production',
133
+ GROOVE_CENTRAL_PORT: 8443,
134
+ },
135
+ instances: 1,
136
+ max_memory_restart: '500M',
137
+ log_date_format: 'YYYY-MM-DD HH:mm:ss',
138
+ error_file: '/var/log/groove-central/error.log',
139
+ out_file: '/var/log/groove-central/access.log',
140
+ merge_logs: true,
141
+ }],
142
+ };
143
+ EOF
144
+
145
+ # Create log directory
146
+ sudo mkdir -p /var/log/groove-central
147
+ sudo chown $USER:$USER /var/log/groove-central
148
+
149
+ # Start with PM2
150
+ pm2 start ecosystem.config.cjs
151
+
152
+ # Save PM2 config for auto-restart on reboot
153
+ pm2 save
154
+ pm2 startup
155
+ # (follow the instructions PM2 prints to enable startup hook)
156
+ ```
157
+
158
+ ### 6d. Nginx Reverse Proxy + SSL
159
+
160
+ ```nginx
161
+ # /etc/nginx/sites-available/groove-central
162
+
163
+ server {
164
+ listen 80;
165
+ server_name api.groovedev.ai;
166
+
167
+ # Redirect all HTTP traffic to HTTPS
168
+ location / {
169
+ return 301 https://$host$request_uri;
170
+ }
171
+ }
172
+
173
+ server {
174
+ listen 443 ssl http2;
175
+ server_name api.groovedev.ai;
176
+
177
+ # SSL certs (managed by certbot)
178
+ ssl_certificate /etc/letsencrypt/live/api.groovedev.ai/fullchain.pem;
179
+ ssl_certificate_key /etc/letsencrypt/live/api.groovedev.ai/privkey.pem;
180
+ ssl_protocols TLSv1.2 TLSv1.3;
181
+ ssl_ciphers HIGH:!aNULL:!MD5;
182
+
183
+ # Proxy to Node.js server
184
+ location /v1/ {
185
+ proxy_pass http://127.0.0.1:8443;
186
+ proxy_http_version 1.1;
187
+ proxy_set_header Host $host;
188
+ proxy_set_header X-Real-IP $remote_addr;
189
+ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
190
+ proxy_set_header X-Forwarded-Proto $scheme;
191
+
192
+ # Large envelopes (up to 5MB)
193
+ client_max_body_size 5m;
194
+
195
+ # Timeouts for stitching/scoring on SESSION_CLOSE
196
+ proxy_read_timeout 30s;
197
+ proxy_send_timeout 30s;
198
+ }
199
+
200
+ location /health {
201
+ proxy_pass http://127.0.0.1:8443;
202
+ proxy_http_version 1.1;
203
+ }
204
+
205
+ # Block everything else
206
+ location / {
207
+ return 404;
208
+ }
209
+ }
210
+ ```
211
+
212
+ ```bash
213
+ # Get the SSL certificate first: nginx -t fails while the cert files referenced in the 443 block do not exist
214
+ sudo certbot certonly --nginx -d api.groovedev.ai
215
+
216
+ # Enable the site
217
+ sudo ln -s /etc/nginx/sites-available/groove-central /etc/nginx/sites-enabled/
218
+ sudo nginx -t
219
+ sudo systemctl reload nginx
220
+ ```
221
+
222
+ ### 6e. DNS
223
+
224
+ Point api.groovedev.ai to the AWS instance's public IP:
225
+ - Type: A record
226
+ - Name: api
227
+ - Value: <AWS instance public IP>
228
+ - TTL: 300
229
+
230
+ If api.groovedev.ai already points to this instance for other services, add the /v1/ location block to the existing nginx config instead of creating a new server block.
231
+
232
+ ---
233
+
234
+ ## 7. API Endpoints
235
+
236
+ All endpoints are under /v1/ (proxied through nginx).
237
+
238
+ ### Session Management
239
+
240
+ POST /v1/sessions/open
241
+ Body: { session_id, public_key (base64), provider, model, machine_fingerprint, app_version_hash, groove_version }
242
+ Returns: { server_public_key (base64) }
243
+ Errors: 400 (missing fields), 429 (rate limited: 20 sessions/hour per fingerprint)
244
+
245
+ POST /v1/sessions/close
246
+ Body: { session_id }
247
+ Returns: { closed: true }
248
+ Errors: 404 (unknown session)
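The 429 rule on /v1/sessions/open (20 sessions per hour per fingerprint) could look roughly like the sliding-window check below. This sketch keeps state in memory for brevity, whereas the guide says rate-limiting state lives in sessions.db; `SessionRateLimiter` and `tryOpen` are hypothetical names.

```javascript
// Hypothetical in-memory sketch of the per-fingerprint session limit.
const WINDOW_MS = 60 * 60 * 1000;  // one hour
const MAX_SESSIONS = 20;           // limit stated for /v1/sessions/open

class SessionRateLimiter {
  constructor(now = Date.now) {
    this.now = now;          // injectable clock, useful in tests
    this.opens = new Map();  // fingerprint -> timestamps of recent opens
  }

  // Returns true if a new session may be opened, false if the caller should get 429.
  tryOpen(fingerprint) {
    const cutoff = this.now() - WINDOW_MS;
    const recent = (this.opens.get(fingerprint) || []).filter((t) => t > cutoff);
    const allowed = recent.length < MAX_SESSIONS;
    if (allowed) recent.push(this.now());
    this.opens.set(fingerprint, recent);
    return allowed;
  }
}
```

A route handler would call `tryOpen(machine_fingerprint)` before generating the server keypair and respond with 429 when it returns false.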
249
+
250
+ ### Data Ingestion
251
+
252
+ POST /v1/training/ingest
253
+ Body: Full Trajectory Envelope JSON (up to 5MB)
254
+ Returns: { accepted: true, envelope_id }
255
+ Errors: { accepted: false, reason: "..." }
256
+ Special: SESSION_CLOSE envelopes trigger stitching, scoring, and ledger credit
257
+
258
+ ### Statistics
259
+
260
+ GET /v1/stats/summary
261
+ Returns: { totalEnvelopes, totalSteps, totalSessions, activeSessions, uniqueContributors, storageSizeMb, totalPointsAwarded }
262
+
263
+ GET /v1/stats/daily?days=7
264
+ Returns: [{ date, envelopes, steps, sessions, points }, ...]
265
+
266
+ GET /v1/stats/models
267
+ Returns: { "claude-opus-4-6": { sessions, steps, points, percentage }, ... }
268
+
269
+ GET /v1/stats/providers
270
+ Returns: { "claude-code": { sessions, steps, points }, ... }
271
+
272
+ GET /v1/stats/leaderboard?limit=10
273
+ Returns: [{ contributor_id (truncated), total_points, total_sessions }, ...]
274
+
275
+ ### Health
276
+
277
+ GET /health
278
+ Returns: { status: "ok", uptime: seconds }
279
+
280
+ ---
281
+
282
+ ## 8. ECDH Handshake Flow
283
+
284
+ This is how client (Groove daemon) and server authenticate:
285
+
286
+ 1. Client generates ephemeral ECDH keypair (prime256v1 curve)
287
+ 2. Client POSTs public key to /v1/sessions/open
288
+ 3. Server generates its own ECDH keypair, derives shared secret, stores session
289
+ 4. Server returns its public key
290
+ 5. Client derives the same shared secret (ECDH math)
291
+ 6. Every envelope is HMAC-SHA256 signed: HMAC(shared_secret, JSON(envelope) + sequence_number)
292
+ 7. Server verifies HMAC and sequence number on each ingest
293
+ 8. Sequence numbers are monotonically increasing — prevents replay attacks
294
+
295
+ The shared secret NEVER crosses the wire. Both sides derive it independently from the key exchange. To forge an envelope, an attacker would need that shared secret, which means compromising the ephemeral private key of either the client or the server.
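The handshake and signing steps above can be sketched with Node's built-in `crypto` module. This is a minimal illustration under stated assumptions: `signEnvelope` and `verifyEnvelope` are hypothetical helpers, the raw ECDH output is used directly as the HMAC key, and the envelope is signed without its own attestation field. The shipped `shared/crypto.js` may differ on any of these points.

```javascript
const crypto = require('node:crypto');

// Steps 1-5: each side generates an ephemeral prime256v1 keypair and
// derives the same secret from the other side's public key.
const client = crypto.createECDH('prime256v1');
const clientPublicKey = client.generateKeys();
const server = crypto.createECDH('prime256v1');
const serverPublicKey = server.generateKeys();

const clientSecret = client.computeSecret(serverPublicKey);
const serverSecret = server.computeSecret(clientPublicKey);

// Step 6: HMAC(shared_secret, JSON(envelope) + sequence_number).
function signEnvelope(secret, envelope, sequence) {
  return crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(envelope) + String(sequence))
    .digest('hex');
}

// Steps 7-8: reject non-increasing sequences, then compare MACs in constant time.
function verifyEnvelope(secret, envelope, sequence, lastSequence, hmacHex) {
  if (sequence <= lastSequence) return false;   // replayed or out of order
  const expected = Buffer.from(signEnvelope(secret, envelope, sequence), 'hex');
  const given = Buffer.from(hmacHex, 'hex');
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

Because the MAC covers `JSON.stringify` output, both sides have to serialize the envelope identically (same key order, same number formatting), or verification will fail on otherwise valid data.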
296
+
297
+ ---
298
+
299
+ ## 9. Data Storage
300
+
301
+ ### SQLite Databases
302
+
303
+ sessions.db — Active and closed session records
304
+ Columns: session_id, server_private_key, server_public_key, shared_secret, client_public_key, provider, model, machine_fingerprint, app_version_hash, groove_version, expected_sequence, status, created_at, closed_at
305
+ WAL mode enabled for concurrent reads
306
+
307
+ ledger.db — Contributor points
308
+ Table credits: id, contributor_id, session_id, points, base_points, multiplier_breakdown (JSON), created_at
309
+ Table balances: contributor_id (PK), total_points, total_sessions, last_credit_at, trust_score
310
+ WAL mode enabled
311
+
312
+ ### JSONL Envelope Storage
313
+
314
+ Location: ./data/envelopes/YYYY-MM-DD.jsonl
315
+ One JSON object per line, daily rotation.
316
+ Each line is a complete verified envelope (chunk or SESSION_CLOSE).
317
+
318
+ Storage will grow linearly with usage. Estimate: ~1KB per envelope chunk, 5-10 chunks per session, so roughly 5-10KB per session.
319
+ At 100 sessions/day that is ~0.5-1MB/day. At 10,000 sessions/day it is ~50-100MB/day.
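The storage estimate is simple multiplication, and a small helper makes it easy to rerun with different inputs. `estimateDailyBytes` is a hypothetical helper; the 1KB-per-chunk default is the guide's own estimate, not a measured value.

```javascript
// Hypothetical helper: daily storage = sessions x chunks per session x bytes per chunk.
function estimateDailyBytes(sessionsPerDay, chunksPerSession, bytesPerChunk = 1024) {
  return sessionsPerDay * chunksPerSession * bytesPerChunk;
}

// Upper bound (10 chunks per session) for the two volumes quoted above.
const lowVolume = estimateDailyBytes(100, 10);      // 1,024,000 bytes, about 1MB
const highVolume = estimateDailyBytes(10000, 10);   // 102,400,000 bytes, about 100MB
```

With 5 chunks per session instead of 10, both figures halve.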
320
+
321
+ ---
322
+
323
+ ## 10. Scoring / Multiplier Matrix
324
+
325
+ When a SESSION_CLOSE envelope arrives, the server stitches all chunks and scores:
326
+
327
+ Base points: 1 per trajectory step
328
+
329
+ Model multiplier (applied to all steps):
330
+ claude-opus-4-6, claude-opus-4-7: 5x
331
+ claude-sonnet-4-6: 3x
332
+ gpt-4.5, o3: 5x
333
+ o4-mini: 2x
334
+ gemini-2.5-pro: 3x
335
+ gemini-2.5-flash: 1.5x
336
+
337
+ Quality multipliers (applied to relevant steps):
338
+ User corrections present: 10x on correction steps
339
+ Coordination events: 5x on coordination steps
340
+ Error recovery arcs: 3x on error+resolution steps
341
+ Heavy task complexity: 2x base
342
+ Session quality >= 80: 1.5x total
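One way the multiplier matrix above might combine is sketched below. The guide does not spell out the order of operations, so the following are assumptions: per-step points are base x model multiplier x step-type multiplier, the error-recovery bonus applies only when the session contains both an error and a resolution step, unknown models score 1x, and the complexity and quality multipliers apply to the session total. `scoreSession` is a hypothetical name.

```javascript
// Hypothetical scoring sketch; multiplier values are taken from the matrix above.
const MODEL_MULTIPLIERS = {
  'claude-opus-4-6': 5, 'claude-opus-4-7': 5, 'claude-sonnet-4-6': 3,
  'gpt-4.5': 5, 'o3': 5, 'o4-mini': 2,
  'gemini-2.5-pro': 3, 'gemini-2.5-flash': 1.5,
};

function scoreSession(trajectoryLog, metadata) {
  const modelMultiplier = MODEL_MULTIPLIERS[metadata.model_engine] ?? 1; // assumption: 1x default
  const types = new Set(trajectoryLog.map((step) => step.type));
  const hasRecoveryArc = types.has('error') && types.has('resolution');

  let total = 0;
  for (const step of trajectoryLog) {
    let stepMultiplier = 1;
    if (step.type === 'correction') stepMultiplier = 10;
    else if (step.type === 'coordination') stepMultiplier = 5;
    else if (hasRecoveryArc && (step.type === 'error' || step.type === 'resolution')) {
      stepMultiplier = 3;
    }
    total += 1 * modelMultiplier * stepMultiplier;   // base: 1 point per step
  }

  if (metadata.task_complexity === 'heavy') total *= 2;
  if (metadata.session_quality >= 80) total *= 1.5;
  return total;
}
```

For example, a three-step session (thought, action, correction) on claude-sonnet-4-6 with heavy complexity and quality 82 would score (3 + 3 + 30) x 2 x 1.5 = 108 under these assumptions.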
343
+
344
+ ---
345
+
346
+ ## 11. Monitoring
347
+
348
+ Check server health:
349
+ curl https://api.groovedev.ai/health
350
+
351
+ Check corpus stats:
352
+ curl https://api.groovedev.ai/v1/stats/summary
353
+
354
+ Check daily growth:
355
+ curl 'https://api.groovedev.ai/v1/stats/daily?days=7'
356
+
357
+ Check active sessions:
358
+ curl https://api.groovedev.ai/v1/stats/summary | jq .activeSessions
359
+
360
+ PM2 monitoring:
361
+ pm2 status
362
+ pm2 logs groove-central
363
+ pm2 monit
364
+
365
+ ---
366
+
367
+ ## 12. Backup Strategy
368
+
369
+ Critical data to back up:
370
+ ./data/sessions.db — session keys (active sessions need this)
371
+ ./data/ledger.db — contributor points (this is money)
372
+ ./data/envelopes/ — raw training data (the product)
373
+
374
+ Recommended: a daily cron job that copies data/ to S3 (aws s3 sync) or to another volume (rsync).
375
+
376
+ ```bash
377
+ # Example cron (add to crontab -e)
378
+ 0 3 * * * aws s3 sync /opt/groove-central/groove/moe-training/data/ s3://groove-training-backup/$(date +\%Y-\%m-\%d)/ --quiet
379
+ ```
380
+
381
+ ---
382
+
383
+ ## 13. Security Notes
384
+
385
+ - The server accepts connections from any origin (CORS: *) because Groove clients connect from user machines worldwide
386
+ - Rate limiting: max 20 sessions per machine fingerprint per hour (prevents session spam)
387
+ - HMAC verification on every envelope prevents forged data
388
+ - Sequence numbers prevent replay attacks
389
+ - SQLite databases contain ECDH private keys — protect with filesystem permissions (0o600)
390
+ - nginx should only expose /v1/ and /health paths — everything else returns 404
391
+ - No authentication on stats endpoints (they show aggregate data only, no PII)
392
+
393
+ ---
394
+
395
+ ## 14. Future Additions (Not in This Build)
396
+
397
+ - Enrichment pipeline: LLM-as-a-Judge for model fingerprinting and cognitive target classification (server/enrichment.js is a stub)
398
+ - S3 migration: move envelope storage from local JSONL to S3 for durability
399
+ - PostgreSQL migration: move from SQLite to PostgreSQL if concurrent write pressure increases
400
+ - Merkle tree: daily hash of ledger for on-chain publication (Base L2)
401
+ - Admin dashboard: web UI for monitoring corpus health, contributor activity, data quality
402
+
403
+ ---
404
+
405
+ ## 15. Quick Reference
406
+
407
+ Start: pm2 start ecosystem.config.cjs
408
+ Stop: pm2 stop groove-central
409
+ Restart: pm2 restart groove-central
410
+ Logs: pm2 logs groove-central
411
+ Status: pm2 status
412
+ Health: curl https://api.groovedev.ai/health
413
+ Stats: curl https://api.groovedev.ai/v1/stats/summary