groove-dev 0.27.134 → 0.27.136

Files changed (85)
  1. package/moe-training/client/domain-tagger.js +1 -1
  2. package/moe-training/scripts/retag-delegate-yield.js +303 -0
  3. package/moe-training/test/shared/envelope-schema.test.js +3 -3
  4. package/node_modules/@groove-dev/cli/package.json +1 -1
  5. package/node_modules/@groove-dev/daemon/package.json +1 -1
  6. package/node_modules/@groove-dev/daemon/src/adaptive.js +77 -0
  7. package/node_modules/@groove-dev/daemon/src/api.js +35 -5
  8. package/node_modules/@groove-dev/daemon/src/journalist.js +28 -12
  9. package/node_modules/@groove-dev/daemon/src/model-lab.js +53 -76
  10. package/node_modules/@groove-dev/daemon/src/process.js +91 -2
  11. package/node_modules/@groove-dev/daemon/src/rotator.js +45 -3
  12. package/node_modules/@groove-dev/gui/dist/assets/{index-Dozp69tK.js → index-BrZHF7pK.js} +1770 -1766
  13. package/node_modules/@groove-dev/gui/dist/assets/index-DIfiwdKl.css +1 -0
  14. package/node_modules/@groove-dev/gui/dist/index.html +2 -2
  15. package/node_modules/@groove-dev/gui/package.json +1 -1
  16. package/node_modules/@groove-dev/gui/src/components/agents/agent-chat.jsx +60 -18
  17. package/node_modules/@groove-dev/gui/src/components/agents/agent-feed.jsx +42 -20
  18. package/node_modules/@groove-dev/gui/src/components/agents/agent-file-tree.jsx +1 -1
  19. package/node_modules/@groove-dev/gui/src/components/agents/workspace-mode.jsx +1 -1
  20. package/node_modules/@groove-dev/gui/src/components/chat/chat-messages.jsx +2 -22
  21. package/node_modules/@groove-dev/gui/src/components/editor/code-editor.jsx +9 -9
  22. package/node_modules/@groove-dev/gui/src/components/editor/file-tree.jsx +1 -1
  23. package/node_modules/@groove-dev/gui/src/components/editor/terminal.jsx +7 -0
  24. package/node_modules/@groove-dev/gui/src/components/lab/chat-playground.jsx +59 -51
  25. package/node_modules/@groove-dev/gui/src/components/lab/lab-assistant.jsx +48 -48
  26. package/node_modules/@groove-dev/gui/src/components/lab/metrics-panel.jsx +39 -38
  27. package/node_modules/@groove-dev/gui/src/components/lab/parameter-panel.jsx +4 -5
  28. package/node_modules/@groove-dev/gui/src/components/lab/preset-manager.jsx +11 -11
  29. package/node_modules/@groove-dev/gui/src/components/lab/runtime-config.jsx +66 -62
  30. package/node_modules/@groove-dev/gui/src/components/lab/system-prompt-editor.jsx +13 -13
  31. package/node_modules/@groove-dev/gui/src/components/layout/breadcrumb-bar.jsx +1 -1
  32. package/node_modules/@groove-dev/gui/src/components/preview/preview-workspace.jsx +62 -22
  33. package/node_modules/@groove-dev/gui/src/components/ui/slider.jsx +16 -17
  34. package/node_modules/@groove-dev/gui/src/components/ui/table-tree.jsx +38 -0
  35. package/node_modules/@groove-dev/gui/src/stores/groove.js +23 -9
  36. package/node_modules/@groove-dev/gui/src/views/editor.jsx +1 -1
  37. package/node_modules/@groove-dev/gui/src/views/model-lab.jsx +101 -87
  38. package/node_modules/moe-training/client/domain-tagger.js +1 -1
  39. package/node_modules/moe-training/scripts/retag-delegate-yield.js +303 -0
  40. package/node_modules/moe-training/test/shared/envelope-schema.test.js +3 -3
  41. package/package.json +1 -1
  42. package/packages/cli/package.json +1 -1
  43. package/packages/daemon/package.json +1 -1
  44. package/packages/daemon/src/adaptive.js +77 -0
  45. package/packages/daemon/src/api.js +35 -5
  46. package/packages/daemon/src/journalist.js +28 -12
  47. package/packages/daemon/src/model-lab.js +53 -76
  48. package/packages/daemon/src/process.js +91 -2
  49. package/packages/daemon/src/rotator.js +45 -3
  50. package/packages/gui/dist/assets/{index-Dozp69tK.js → index-BrZHF7pK.js} +1770 -1766
  51. package/packages/gui/dist/assets/index-DIfiwdKl.css +1 -0
  52. package/packages/gui/dist/index.html +2 -2
  53. package/packages/gui/package.json +1 -1
  54. package/packages/gui/src/components/agents/agent-chat.jsx +60 -18
  55. package/packages/gui/src/components/agents/agent-feed.jsx +42 -20
  56. package/packages/gui/src/components/agents/agent-file-tree.jsx +1 -1
  57. package/packages/gui/src/components/agents/workspace-mode.jsx +1 -1
  58. package/packages/gui/src/components/chat/chat-messages.jsx +2 -22
  59. package/packages/gui/src/components/editor/code-editor.jsx +9 -9
  60. package/packages/gui/src/components/editor/file-tree.jsx +1 -1
  61. package/packages/gui/src/components/editor/terminal.jsx +7 -0
  62. package/packages/gui/src/components/lab/chat-playground.jsx +59 -51
  63. package/packages/gui/src/components/lab/lab-assistant.jsx +48 -48
  64. package/packages/gui/src/components/lab/metrics-panel.jsx +39 -38
  65. package/packages/gui/src/components/lab/parameter-panel.jsx +4 -5
  66. package/packages/gui/src/components/lab/preset-manager.jsx +11 -11
  67. package/packages/gui/src/components/lab/runtime-config.jsx +66 -62
  68. package/packages/gui/src/components/lab/system-prompt-editor.jsx +13 -13
  69. package/packages/gui/src/components/layout/breadcrumb-bar.jsx +1 -1
  70. package/packages/gui/src/components/preview/preview-workspace.jsx +62 -22
  71. package/packages/gui/src/components/ui/slider.jsx +16 -17
  72. package/packages/gui/src/components/ui/table-tree.jsx +38 -0
  73. package/packages/gui/src/stores/groove.js +23 -9
  74. package/packages/gui/src/views/editor.jsx +1 -1
  75. package/packages/gui/src/views/model-lab.jsx +101 -87
  76. package/plan_files/DELEGATE_YIELD_TRAINING_TAGS.md +135 -0
  77. package/plan_files/session-quality-rotation-fixes.md +218 -0
  78. package/test.py +571 -0
  79. package/node_modules/@groove-dev/gui/dist/assets/index-BgQL4bNl.css +0 -1
  80. package/packages/gui/dist/assets/index-BgQL4bNl.css +0 -1
  81. /package/{AGENT_ORCHESTRATION.md → plan_files/AGENT_ORCHESTRATION.md} +0 -0
  82. /package/{DYNAMIC_LEAF_ARCH.md → plan_files/DYNAMIC_LEAF_ARCH.md} +0 -0
  83. /package/{EMBEDDING_DIAGNOSTIC.md → plan_files/EMBEDDING_DIAGNOSTIC.md} +0 -0
  84. /package/{EMBEDDING_SERVICE_BUILD_PLAN.md → plan_files/EMBEDDING_SERVICE_BUILD_PLAN.md} +0 -0
  85. /package/{MOE_TRAINING_PIPELINE.md → plan_files/MOE_TRAINING_PIPELINE.md} +0 -0
@@ -0,0 +1,218 @@
+ Session Quality & Preemptive Rotation Fixes — Build Doc
+ ========================================================
+
+ Problem
+ -------
+ Long-running Claude sessions (30+ hours) degrade: freezing mid-sentence, losing context, output quality dropping. GROOVE's rotation system triggers AFTER failure instead of BEFORE.
+
+ Root Cause
+ ----------
+ Claude Code provider is flagged managesOwnContext = true (packages/daemon/src/providers/claude-code.js:40). This causes rotator.js line 183 to SKIP all context threshold, token ceiling, and hard ceiling checks. The only rotation path left is quality-based scoring, and it misses the specific degradation patterns.
+
+ Evidence from fullstack-10 (training team, 11-day session):
+ - 2.5M tokens, 247% context usage, ZERO quality rotations triggered
+ - All 7 rotations were natural_compaction or manual
+ - Output collapsed to 1-8 tokens per response (messages 1484-1510)
+ - Thinking blocks truncated mid-sentence at 8 tokens
+ - 5 consecutive identical python3 commands (agent stuck in loop)
+ - Cache read tokens dropped from 150K to 0 twice (context resets)
+ - 80 rate limit events
+ - Tool-call-only responses with zero explanatory text in late session
+
+
+ Files Modified
+ --------------
+
+ 1. packages/daemon/src/process.js
+ - Change 1: Incomplete response detection in _handleAgentOutput
+ - Change 5: Cache reset detection (cacheReadTokens drop tracking)
+ - Change 7: Intro context size warning after buildFullPrompt()
+
+ 2. packages/daemon/src/rotator.js
+ - Change 3: Compaction-aware rotation (compaction count tracking, age decay extension)
+ - Change 4: Truncation-triggered immediate rotation
+ - Change 8: Fix duplicate rotation history entries from recordNaturalCompaction()
+
+ 3. packages/daemon/src/adaptive.js
+ - Change 2: New signals in extractSignals() and scoreSession()
+
+ 4. packages/daemon/src/journalist.js
+ - Change 6: Handoff brief compression and deduplication
+
+
+ Change Details
+ --------------
+
+ CHANGE 1 — Incomplete Response Detection (process.js)
+
+ What: Detect mid-sentence truncation and abandoned tool calls.
+
+ Where: _handleAgentOutput method, triggered on output.type === 'activity' && output.subtype === 'assistant'.
+
+ Logic:
+ - Check if last text block ends without terminal punctuation (. ? ! or closing code fence/brace)
+ - Track if tool_use was emitted without a matching tool_result in next turn
+ - Set registry fields: truncationSuspected (bool), consecutiveTruncations (int)
+ - 2+ consecutive truncations = hard degradation signal
+ - Emit classifier event: { type: 'error', subtype: 'truncated_response' }
+ - Reduce stall threshold from 5min to 2min for truncation-flagged agents
+
+ Edge cases to watch:
+ - Short valid responses like "OK" or "Done" often end without terminal punctuation; they should not trigger
+ - Tool-use-only responses (no text block at all) need special handling — check if it's a normal tool call or a degraded one by looking at whether the tool_use has meaningful input
+ - Must not trigger on the very first response of a session
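A minimal sketch of the text-ending check and the consecutive counter from Change 1, assuming the assistant text arrives as a plain string. The helper names (looksTruncated, updateTruncationState), the hardDegradation field, and the two-word cutoff for short replies are hypothetical; only truncationSuspected and consecutiveTruncations come from the plan. The tool_use/tool_result pairing is not covered here.

```javascript
// Sketch only: helper names and the two-word cutoff are assumptions, not GROOVE code.
const FENCE = '`'.repeat(3); // closing code fence, built indirectly so this snippet stays fence-safe

function looksTruncated(text) {
  const trimmed = (text ?? '').trim();
  // No text block at all: tool-use-only turns need their own handling.
  if (trimmed.length === 0) return false;
  // Very short replies such as "OK" or "Done" are valid without punctuation.
  if (trimmed.split(/\s+/).length <= 2) return false;
  const endsCleanly = /[.?!}]$/.test(trimmed) || trimmed.endsWith(FENCE);
  return !endsCleanly;
}

function updateTruncationState(prev, text, isFirstResponse = false) {
  // Never flag the very first response of a session.
  const truncated = !isFirstResponse && looksTruncated(text);
  const consecutiveTruncations = truncated ? prev.consecutiveTruncations + 1 : 0;
  return {
    truncationSuspected: truncated,
    consecutiveTruncations,
    // Two or more in a row is the hard degradation signal.
    hardDegradation: consecutiveTruncations >= 2,
  };
}
```

Returning a fresh state object instead of mutating the registry entry keeps the sketch easy to test; the real change would write these fields to the agent registry.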
+
+
+ CHANGE 2 — Output Velocity Decay Signals (adaptive.js)
+
+ What: Four new quality signals in extractSignals() with corresponding scoring in scoreSession().
+
+ New signals:
+ - outputLengthDecay: Compare avg output tokens in last 5 turns vs first 5. If <50%, signal=1. Score: -10 pts.
+ - toolOutputVolume: Sum tool result character lengths. >300KB = signal 1 (-5 pts). >600KB = signal 2 (-10 pts).
+ - turnLatencyTrend: Compare avg timestamp gaps in last 10 vs first 10 entries. If >2x, signal=1. Score: -5 pts.
+ - bashRepetition: Count consecutive identical Bash commands. 3+ identical = signal 1 (-8 pts).
+
+ Where in extractSignals(): After the existing fileChurn and errorTrend calculations.
+
+ Where in scoreSession(): After the existing errorTrend scoring block.
+
+ Edge cases to watch:
+ - outputLengthDecay needs enough turns to compare (require at least 10 entries with output tokens)
+ - bashRepetition should normalize commands (trim whitespace) before comparing
+ - toolOutputVolume counts cumulative session total, not per-turn
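The four signals could be computed roughly as follows. This is a sketch under stated assumptions: inputs are plain arrays (the real extractSignals() input shape is not shown in the plan), 1 KB is taken as 1024 characters, turnLatencyTrend requires at least 20 timestamps, and scorePenalty is a hypothetical helper that only sums the point values listed above.

```javascript
// Sketch only: input shapes and helper names are assumptions, not the adaptive.js API.
const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const gaps = (ts) => ts.slice(1).map((t, i) => t - ts[i]);

function outputLengthDecay(outputTokens) {
  if (outputTokens.length < 10) return 0; // not enough turns to compare
  const first = avg(outputTokens.slice(0, 5));
  const last = avg(outputTokens.slice(-5));
  return first > 0 && last < first * 0.5 ? 1 : 0;
}

function toolOutputVolume(totalChars) {
  // totalChars is the cumulative session total, not a per-turn figure.
  if (totalChars > 600 * 1024) return 2;
  if (totalChars > 300 * 1024) return 1;
  return 0;
}

function turnLatencyTrend(timestamps) {
  if (timestamps.length < 20) return 0; // assumed minimum so the windows do not overlap
  const early = avg(gaps(timestamps.slice(0, 10)));
  const late = avg(gaps(timestamps.slice(-10)));
  return early > 0 && late > early * 2 ? 1 : 0;
}

function bashRepetition(commands) {
  let longest = 0;
  let run = 0;
  let prev = null;
  for (const raw of commands) {
    const cmd = raw.trim(); // normalize whitespace before comparing
    run = cmd === prev ? run + 1 : 1;
    prev = cmd;
    longest = Math.max(longest, run);
  }
  return longest >= 3 ? 1 : 0;
}

function scorePenalty(signals) {
  let penalty = 0;
  if (signals.outputLengthDecay) penalty += 10;
  penalty += signals.toolOutputVolume * 5; // signal 1 costs 5 pts, signal 2 costs 10 pts
  if (signals.turnLatencyTrend) penalty += 5;
  if (signals.bashRepetition) penalty += 8;
  return penalty;
}
```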
+
+
+ CHANGE 3 — Compaction-Aware Rotation (rotator.js)
+
+ What: Track compaction events per agent and use them to progressively tighten rotation criteria.
+
+ Where:
+ - recordNaturalCompaction(): increment this.compactionCounts Map
+ - check(): new block BEFORE quality rotation (around line 240), applies to ALL providers
+ - scoreLiveSession(): extended age decay
+
+ Logic:
+ - this.compactionCounts = new Map() in constructor
+ - recordNaturalCompaction increments count for the agent
+ - In check(): if compactionCounts >= 5, force rotate with reason 'compaction_ceiling'
+ - In check(): if compactionCounts >= 3, use effectiveQualityThreshold = 55 instead of 40
+ - scoreLiveSession() extended age decay: >7200s=-15, >14400s=-20, >28800s=-25
+
+ Edge cases to watch:
+ - Compaction count must reset when agent is rotated (new agent ID = fresh count)
+ - Don't increment count for the duplicate "unknown" provider entries (see Change 8)
+ - The compaction ceiling check should respect MIN_AGE_SEC to avoid killing brand-new agents
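One way to picture the counting and the two thresholds is a small standalone tracker. The class name CompactionTracker, the decide() and reset() methods, and the MIN_AGE_SEC value of 120 are placeholders for illustration; the real constant and the check() flow live in rotator.js and are not shown in the plan. The age-decay helper returns 0 below the first tier, which is also an assumption about how the existing tiers behave.

```javascript
// Sketch only: class shape and MIN_AGE_SEC value are assumptions, not rotator.js code.
const MIN_AGE_SEC = 120; // placeholder; the real value is defined in rotator.js

class CompactionTracker {
  constructor() {
    this.compactionCounts = new Map(); // agentId -> number of natural compactions
  }

  recordNaturalCompaction(agentId) {
    const next = (this.compactionCounts.get(agentId) ?? 0) + 1;
    this.compactionCounts.set(agentId, next);
    return next;
  }

  // Call when the agent is rotated so the replacement starts from zero.
  reset(agentId) {
    this.compactionCounts.delete(agentId);
  }

  decide(agentId, ageSec) {
    const count = this.compactionCounts.get(agentId) ?? 0;
    if (count >= 5 && ageSec >= MIN_AGE_SEC) {
      return { rotate: true, reason: 'compaction_ceiling' };
    }
    return { rotate: false, effectiveQualityThreshold: count >= 3 ? 55 : 40 };
  }
}

// Extended age decay tiers from the plan; 0 below 7200s is an assumption.
function extendedAgeDecay(ageSec) {
  if (ageSec > 28800) return -25;
  if (ageSec > 14400) return -20;
  if (ageSec > 7200) return -15;
  return 0;
}
```

Keying the Map by agent ID means a rotated agent gets a fresh count automatically, provided the old entry is deleted so the Map does not grow without bound.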
+
+
+ CHANGE 4 — Truncation-Triggered Rotation (rotator.js)
+
+ What: Immediate rotation on consecutive truncated responses.
+
+ Where: check() method, BEFORE the quality rotation block.
+
+ Logic:
+ - Read agent.truncationSuspected and agent.consecutiveTruncations from registry
+ - consecutiveTruncations >= 2: force rotate, reason 'incomplete_response', bypass cooldown
+ - truncationSuspected true (single): raise effective quality threshold from 40 to 55
+
+ Edge cases to watch:
+ - Must still respect the singleTask skip for Codex
+ - Must still respect the idle check (don't rotate while agent is actively producing output)
+ - Clear truncation flags on the NEW agent after rotation
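The ordering of the guards and the two truncation outcomes might look like the sketch below. The function names, the agent.singleTask and agent.idle fields, and the shape of the returned object are hypothetical; the plan only names truncationSuspected, consecutiveTruncations, the 'incomplete_response' reason, and the 40 and 55 thresholds.

```javascript
// Sketch only: field names singleTask/idle and the return shape are assumptions.
function truncationDecision(agent, baseThreshold = 40) {
  // Existing guards still apply before any truncation logic runs.
  if (agent.singleTask) return { rotate: false, skipped: 'single_task' };
  if (!agent.idle) return { rotate: false, skipped: 'active' };

  if ((agent.consecutiveTruncations ?? 0) >= 2) {
    return { rotate: true, reason: 'incomplete_response', bypassCooldown: true };
  }
  // A single suspected truncation only makes the quality check stricter.
  return {
    rotate: false,
    effectiveQualityThreshold: agent.truncationSuspected ? 55 : baseThreshold,
  };
}

// Flags for the replacement agent, so stale truncation state is not inherited.
function freshTruncationFlags() {
  return { truncationSuspected: false, consecutiveTruncations: 0 };
}
```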
+
+
+ CHANGE 5 — Cache Reset Detection (process.js)
+
+ What: Detect sudden drops in cache read tokens as a context quality signal.
+
+ Where: _handleAgentOutput, near the existing contextUsage tracking block (around line 1383).
+
+ Logic:
+ - Add Map: this._prevCacheRead (agentId -> last cacheReadTokens value)
+ - On each output with cacheReadTokens: compare to previous value
+ - If previous > 50,000 AND current < previous * 0.5: cache reset detected
+ - Set registry field: cacheResetDetected = true
+ - Emit classifier event: { type: 'error', subtype: 'cache_reset' }
+ - Rotator treats cacheResetDetected same as truncationSuspected
+
+ Edge cases to watch:
+ - Cache builds up from 0 at session start — must require previous > 50K minimum
+ - Natural compaction causes a cache drop too, but recordNaturalCompaction already handles that — don't double-count
+ - If the provider doesn't report cacheReadTokens (non-Claude providers), skip this check
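The comparison against the previous reading can be sketched as a small detector. The class name CacheResetDetector, the observe() method, and the compacted option are hypothetical; the _prevCacheRead Map, the 50,000 floor, and the 0.5 factor come from the plan. How the real code learns that a drop was caused by natural compaction is not specified, so the flag here is only a stand-in.

```javascript
// Sketch only: class and method names are assumptions, not process.js code.
class CacheResetDetector {
  constructor() {
    this._prevCacheRead = new Map(); // agentId -> last cacheReadTokens value
  }

  // Returns true when this reading looks like a cache reset.
  observe(agentId, cacheReadTokens, { compacted = false } = {}) {
    // Providers that do not report cacheReadTokens are skipped entirely.
    if (typeof cacheReadTokens !== 'number') return false;

    const previous = this._prevCacheRead.get(agentId);
    this._prevCacheRead.set(agentId, cacheReadTokens);

    // A drop caused by natural compaction is already counted elsewhere.
    if (compacted) return false;

    // The 50K floor avoids false positives while the cache builds up from 0.
    return previous !== undefined && previous > 50000 && cacheReadTokens < previous * 0.5;
  }
}
```

On a true result the caller would set cacheResetDetected on the registry entry and emit the cache_reset classifier event.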
+
+
+ CHANGE 6 — Handoff Brief Compression (journalist.js)
+
+ What: Cap brief size and deduplicate content across rotations.
+
+ Where: generateHandoffBrief() method (line 851+) and buildRotationSynthesisPrompt() (line 436+).
+
+ Changes:
+ - Hard cap: 5000 chars total for the assembled brief. Truncate from the bottom (rotation history and original task get cut first).
+ - Replace inline discoveries with pointer to .groove/memory/agent-discoveries.jsonl
+ - Cap original task section to 500 chars
+ - Drop 'Recent User Messages' for quality_degradation rotations
+ - Reduce rotation history from 3 entries to 1
+ - buildRotationSynthesisPrompt: reduce input cap from 30K to 15K, response limit from 2000 to 1500
+ - Dedup: hash discovery entries and skip any that appear in the rotation history section
+
+ Priority order when truncating to fit 5000 char cap:
+ 1. Unresolved Errors (keep)
+ 2. User Constraints (keep)
+ 3. Last 5 Tool Calls (keep)
+ 4. Session Summary (truncate to 1500 chars if needed)
+ 5. Rotation History (1 entry, truncate to 500 chars if needed)
+ 6. Original Task (truncate to 500 chars)
+ 7. Everything else (drop)
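The priority order lends itself to a single pass that appends sections until the cap is reached. The sketch below is an illustration, not the generateHandoffBrief() implementation: the section keys, the plain-text title format, and the assembleBrief name are hypothetical, and as a simplification it always applies the per-section caps instead of applying them only when the brief overflows.

```javascript
// Sketch only: section keys and output format are assumptions, not journalist.js code.
const BRIEF_CAP = 5000;

// Highest priority first; anything not listed here is dropped.
const SECTION_PLAN = [
  { key: 'unresolvedErrors', title: 'Unresolved Errors', maxChars: Infinity },
  { key: 'userConstraints', title: 'User Constraints', maxChars: Infinity },
  { key: 'lastToolCalls', title: 'Last 5 Tool Calls', maxChars: Infinity },
  { key: 'sessionSummary', title: 'Session Summary', maxChars: 1500 },
  { key: 'rotationHistory', title: 'Rotation History', maxChars: 500 },
  { key: 'originalTask', title: 'Original Task', maxChars: 500 },
];

function assembleBrief(sections, cap = BRIEF_CAP) {
  const parts = [];
  let used = 0;
  for (const { key, title, maxChars } of SECTION_PLAN) {
    const remaining = cap - used;
    if (remaining <= 0) break; // lower-priority sections are cut first
    const body = sections[key];
    if (!body) continue;
    const block = `${title}\n${body.slice(0, maxChars)}\n`;
    const clipped = block.slice(0, remaining);
    parts.push(clipped);
    used += clipped.length;
  }
  return parts.join('');
}
```

Because sections are appended in priority order, overflow always lands on the bottom of the list, which matches the "truncate from the bottom" rule.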
+
+
+ CHANGE 7 — Intro Context Size Warning (process.js or introducer.js)
+
+ What: Warn when injected context is too large, optionally truncate.
+
+ Where: After buildFullPrompt() in the spawn flow, or in the introducer where intro context is assembled.
+
+ Logic:
+ - Measure Buffer.byteLength(fullPrompt) (this counts bytes, which equals the character count only for ASCII text)
+ - If > 8000 chars: console.warn with agent name and size
+ - Config field: maxIntroContextChars (default 10000)
+ - If exceeded: truncate agent.introContext (not the task prompt) to fit
+ - Truncation should preserve the first and last sections, cutting middle content
+
+ Edge cases to watch:
+ - Don't truncate the task prompt — only the GROOVE-injected intro context
+ - The warning should include which agent and what size, so the user can debug
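Middle-out truncation that keeps the head and tail could be sketched as below. The function names, the marker text, and the reading that maxIntroContextChars caps the intro context on its own (rather than the full prompt) are assumptions; the sketch measures string length in characters to match the stated thresholds.

```javascript
// Sketch only: names, marker text, and the cap interpretation are assumptions.
const TRUNCATION_MARKER = '\n[... intro context truncated ...]\n';

function truncateMiddle(text, maxChars, marker = TRUNCATION_MARKER) {
  if (text.length <= maxChars) return text;
  if (maxChars <= marker.length) return text.slice(0, maxChars);
  const keep = maxChars - marker.length;
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  // slice(-0) would return the whole string, so guard the zero-tail case.
  return text.slice(0, head) + marker + (tail > 0 ? text.slice(-tail) : '');
}

function fitIntroContext(agentName, introContext, taskPrompt, maxIntroContextChars = 10000) {
  const fullLength = introContext.length + taskPrompt.length;
  if (fullLength > 8000) {
    console.warn(`[${agentName}] full prompt is ${fullLength} chars`);
  }
  // Only the injected intro context is shortened; the task prompt is returned untouched.
  return {
    introContext: truncateMiddle(introContext, maxIntroContextChars),
    taskPrompt,
  };
}
```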
+
+
+ CHANGE 8 — Fix Duplicate Rotation History Entries (rotator.js)
+
+ What: recordNaturalCompaction() produces two entries per event — one real, one "unknown" with 0 tokens.
+
+ Where: recordNaturalCompaction() method (line 484+).
+
+ Evidence: rotation-history.json shows paired entries like:
+ { agentName: "planner-9", provider: "claude-code", oldTokens: 4358, ... }
+ { agentName: "planner-9", provider: "unknown", oldTokens: 0, ... }
+
+ Root cause: Likely the broadcast from recordNaturalCompaction triggers a listener that calls it again, or the provider lookup fails on the second invocation. Debug by checking what calls recordNaturalCompaction — it's called from process.js _handleAgentOutput (line 1392). Check if the same stdout event produces two parseOutput results.
+
+ Fix: Add a dedup guard — if the last entry in rotationHistory has the same agentId and a timestamp within 1 second, skip. Or fix the upstream double-call.
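The dedup guard described in the fix is small enough to sketch in full. The appendRotationEntry name and the assumption that timestamps are epoch milliseconds are hypothetical; the same-agentId and within-1-second rule comes from the plan.

```javascript
// Sketch only: function name and millisecond timestamps are assumptions.
function appendRotationEntry(rotationHistory, entry, windowMs = 1000) {
  const last = rotationHistory[rotationHistory.length - 1];
  const isDuplicate =
    last !== undefined &&
    last.agentId === entry.agentId &&
    Math.abs(entry.timestamp - last.timestamp) <= windowMs;
  if (isDuplicate) return false; // skip the second entry of the pair
  rotationHistory.push(entry);
  return true;
}
```

The guard keeps whichever entry arrives first, so it only produces the right record if the real entry precedes the "unknown" one, as in the evidence above; fixing the upstream double-call avoids that ordering dependency.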
+
+
+ Testing
+ -------
+ - npm test from packages/daemon/ must pass with all changes
+ - npm run build from packages/gui/ must compile
+ - Manual test: spawn a long-running agent and verify compaction counting works
+ - Manual test: send a large file read and verify toolOutputVolume signal fires
+ - Verify rotation-history.json no longer produces duplicate entries
+
+ Rollback
+ --------
+ All changes are in 4 files in packages/daemon/src/. If any change causes issues:
+ - Revert the specific file to its pre-change state via git checkout
+ - Changes are independent enough that any single change can be reverted without affecting the others, EXCEPT:
+ - Change 4 depends on Change 1 (truncation detection feeds truncation rotation)
+ - Change 3 depends on recordNaturalCompaction (Change 8 touches the same method)
+ - If reverting Change 8, also verify Change 3's compaction counting still works