npm - groove-dev - Versions diffs - 0.27.131 → 0.27.134 - Mend

groove-dev 0.27.131 → 0.27.134

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

package/AGENT_ORCHESTRATION.md ADDED

@@ -0,0 +1,375 @@
+# Agent Coordination Protocol
+# How Nano Agents Communicate, Chain, and Scale
+---
+## 1. The Problem
+A single nano agent firing on the always-hot chassis handles one focused task in ~500ms. But real work requires multiple agents: a recall agent retrieves context, a domain agent writes code, a QC agent verifies it. These agents are independent firings — born, fired, killed. They share nothing by default.
+The coordination protocol defines how firings communicate artifacts, chain sequentially, and scale across mesh nodes — without hardcoding any leaf-specific behavior. The protocol is universal: it works the same whether the leaf is Python, automotive diagnostics, drone navigation, or something a user created that the platform has never seen.
+---
+## 2. Design Principles
+**Leaves are smart, the platform is dumb.** The chassis does not know what any leaf does. It does not have if-statements for leaf types. It executes a universal protocol. All intelligence, behavior, and domain knowledge lives in the leaf's trained weights. Adding a new leaf requires zero platform changes.
+**Nano means nano.** Each firing receives one task, produces one output, and dies. It does not have full context. It does not see the whole picture. It sees its task and the previous phase's output. That's it. Complex tasks are handled by chaining many small firings, not by making one firing smarter.
+**The filesystem is the extended context window.** The 2048-token context window constrains what the model holds in its head at one time. It does not constrain what the system can work with. Large artifacts live on disk. Agents access them via tools (Read, Glob, Grep). The workspace is unlimited. The context window is a scratchpad.
+**Decompose until each piece fits.** If a task is too complex for one firing, break it into pieces. If a piece is still too complex, break it again. Each piece should be solvable by a single focused firing within the context budget. The chain length is self-limiting — if a task decomposes into 20 phases, the decomposition is wrong.
+---
+## 3. Universal Tag Protocol
+Every leaf, regardless of domain, speaks the same tag language. These tags are the only interface between a leaf's output and the chassis's coordination logic. The chassis parses tags. It does not parse natural language intent.
+### Existing Tags (ReAct Loop)
+| Tag | Purpose | Chassis Action |
+|-----|---------|---------------|
+| `<thought>` | Internal reasoning | Emit as trace (hidden from user, shown in thinking dropdown) |
+| `<action tool="X">args</action>` | Tool call | Execute tool, return `<observation>` |
+| `<observation>` | Tool result | Injected by chassis, not generated by leaf |
+| `<resolution>` | Final answer | Emit to user, end firing |
+### New Tags (Coordination)
+| Tag | Purpose | Chassis Action |
+|-----|---------|---------------|
+| `<delegate>rewritten task</delegate>` | "I can't handle this, re-route" | Strip current leaf, re-embed the rewritten task, route to best match, fire new agent |
+| `<yield path="workspace/file.ext">summary</yield>` | "I've produced my part, here's my artifact" | Log artifact path + summary to workspace manifest, continue pipeline |
+### Protocol Rules
+1. A firing MUST end with exactly one of: `<resolution>`, `<delegate>`, or `<yield>`.
+2. `<delegate>` triggers a re-route. The delegating leaf does not choose the target — the router does. This prevents leaves from needing knowledge of other leaves.
+3. `<yield>` writes an artifact and signals "my work is done, next agent can proceed." The summary (the tag body; `path` is the only attribute) is a one-line description (~10-20 tokens) that can fit in the next agent's prompt without consuming significant context.
+4. `<resolution>` ends the pipeline and returns output to the user/caller.
+5. A firing may produce multiple `<action>` / `<observation>` cycles before its terminal tag. The ReAct loop is unchanged.
+### Why Only 6 Tags
+The protocol must be trainable. Every tag the model needs to produce must appear in training data. Six tags is learnable by a 0.6B model. A larger vocabulary of special tags would require more training data and risk confusion. The tags cover all coordination needs:
+- Single-step response: `<thought>` → `<resolution>`
+- Multi-step with tools: `<thought>` → `<action>` → `<observation>` → `<resolution>`
+- Delegation: `<thought>` → `<delegate>`
+- Artifact production in a chain: `<thought>` → `<action>` → `<yield>`
+---
+## 4. Workspace Architecture
+### Per-Task Workspace
+Every multi-agent pipeline creates a workspace directory:
+```
+~/.hummingbird/workspace/{task_id}/
+```
+Each firing writes its artifacts here. The workspace persists until the pipeline completes, then is cleaned up (or archived for debugging).
+### Phase Directories
+For chained pipelines, each phase writes to its own subdirectory:
+```
+workspace/{task_id}/
+  phase_01/           ← first agent's output
+    config.yaml
+  phase_02/           ← second agent reads phase_01/, writes here
+    server.py
+  phase_03/           ← third agent reads phase_02/, writes here
+    test_results.log
+```
+Each agent sees only the previous phase's directory. It does not see the full history. This bounds context growth regardless of chain length.
+### Discovery Over Manifests
+Agents do not receive pre-loaded manifests in their prompts. They receive:
+1. **Their task** (one sentence, ~50 tokens)
+2. **The workspace path** (~10 tokens)
+Total prompt overhead: ~60 tokens + system prompt. The remaining ~1700 tokens are available for generation and tool observations.
+The agent's first ReAct step is discovery: `Glob workspace/phase_02/*` or `Read workspace/phase_01/config.yaml`. The workspace file naming convention IS the manifest. Well-named files tell the agent what exists without consuming context tokens.
+### Why Not a JSON Manifest
+A manifest scales linearly with artifact count. At 10 artifacts with summaries, it consumes ~200 tokens — 10% of the context window, before the agent has done anything. At 50 artifacts, it's unworkable. Discovery via tools costs one ReAct step (~400ms) but uses zero prompt tokens. The tradeoff is clear: spend 400ms to save 200+ tokens of context budget.
+---
+## 5. Chaining Model
+### Sequential Chains
+The simplest coordination pattern. Agent A yields, agent B picks up.
+```
+[Router] → detect domain
+    │
+[Agent A] python leaf fires
+    THOUGHT → ACTION (Write code) → YIELD path="phase_01/api.py"
+    Dies.
+    │
+[Agent B] react leaf fires
+    Receives: "Build the frontend component" + workspace path
+    THOUGHT → ACTION (Glob phase_01/) → ACTION (Read phase_01/api.py) → ACTION (Write phase_02/App.tsx) → YIELD
+    Dies.
+    │
+[Agent C] QC leaf fires
+    Receives: "Verify integration" + workspace path
+    THOUGHT → ACTION (Read phase_01/api.py) → ACTION (Read phase_02/App.tsx) → RESOLUTION
+    Dies. Pipeline complete.
+```
+Each agent is independent. Each gets a clean context. The workspace is the only shared state.
+### Delegation Chains
+When an agent encounters work outside its domain:
+```
+[Agent A] python leaf fires
+    THOUGHT: "This requires database schema work, not Python code."
+    DELEGATE: "Create a PostgreSQL schema for user authentication with sessions table"
+    Dies.
+    │
+[Router] re-embeds the delegate text → routes to databases leaf
+    │
+[Agent B] databases leaf fires
+    THOUGHT → ACTION (Write schema.sql) → RESOLUTION
+    Dies. Pipeline complete.
+```
+The delegating leaf does not know which leaf will handle it. It just rewrites the task in domain-appropriate language and hands it back to the router. The router's cosine similarity handles selection.
+### Recursive Decomposition
+When a task is too complex for one firing:
+```
+[Planner] decomposer leaf fires
+    THOUGHT: "This is three independent sub-tasks."
+    YIELD path="plan.json": "3 sub-tasks: CUDA setup, model download, vLLM config"
+    Dies.
+    │
+[Chassis] reads plan.json, dispatches sub-tasks:
+    │
+    ├─ [Agent A] fires with sub-task 1 → YIELD
+    ├─ [Agent B] fires with sub-task 2 → YIELD  (sequential on single device)
+    └─ [Agent C] fires with sub-task 3 (depends on A+B) → reads A+B outputs → RESOLUTION
+```
+The planner yields a structured plan. The chassis parses it and dispatches. Each sub-task is a normal firing that knows nothing about the overall plan — it just does its job and yields or resolves.
+### Chain Depth Limiting
+Maximum chain depth: configurable, default 5. If a chain exceeds this depth, the final agent MUST resolve (no more yields or delegates). This prevents runaway decomposition and guarantees termination.
+---
+## 6. Context Budget
+### Per-Firing Budget (2048 tokens)
+| Component | Tokens | Notes |
+|-----------|--------|-------|
+| System prompt | ~80 | Leaf-specific, from training |
+| Think block prefix | ~10 | `<think>\n\n</think>\n\n` |
+| Task instruction | ~50-100 | One sentence from yield/delegate/user |
+| Workspace path | ~10 | `workspace/task_abc/` |
+| **Available for generation** | **~1750-1900** | ReAct steps, tool observations, output |
+### Tool Observation Budget
+Tool results (Read, Grep, Glob) enter context as `<observation>` tags during the ReAct loop. These are already truncated by the existing `_truncate_observations` function when context exceeds the budget. Large file reads are naturally bounded by the `max_lines` parameter on the Read tool (default 100 lines).
+### Why 2048 Is Enough
+A nano agent does ONE thing. It doesn't need to understand the full history of a 20-step pipeline. It needs to understand its specific task and the output of the previous phase. One task sentence + one file read = well within budget.
+If a task requires understanding more context than fits in 2048 tokens, the task is too broad for a single nano agent. Decompose it.
+---
+## 7. Mesh Parallelism
+### Single Device: Sequential
+One chassis, one leaf at a time. Firings execute in order. A 5-phase chain takes ~2.5 seconds.
+```
+Node A: [Phase 1] → [Phase 2] → [Phase 3] → [Phase 4] → [Phase 5]
+        500ms       500ms       500ms       500ms       500ms = 2.5s
+```
+### Multi-Node: Parallel on Dependency Graph
+Each mesh node is its own chassis. Independent phases scatter across nodes. Dependent phases wait for their inputs.
+```
+Node A: [Phase 1: CUDA setup]           → idle → [Phase 4: vLLM config]
+Node B: [Phase 2: Model download]       → idle → idle
+Node C: idle → idle → [Phase 3: depends on 1+2] → [Phase 5: test]
+Wall clock: ~1.5s (critical path length, not total work)
+```
+The planner's `plan.json` encodes dependencies. The mesh dispatcher reads the dependency graph and assigns phases to available nodes. Each node is always-hot — zero cold start, instant acceptance of a firing.
+### Five Nodes, Five Chassis
+Each node runs its own independent chassis instance. Not 5x inference on one model — five separate brains, each with their own GGUF loaded.
+- Cost per node: ~1.2GB VRAM (or 400MB quantized)
+- Five nodes: ~6GB total — a gaming laptop could run the full mesh locally
+- Each node maintains its own leaf cache optimized for its typical workload
+- The router becomes a network-level dispatcher: it knows which nodes have which leaves cached and minimizes leaf swap overhead
+### Artifact Sync
+When phases run on different nodes, artifacts must sync. The workspace directory is local to each node. The mesh gossip protocol handles artifact transfer:
+1. Agent A on Node 1 yields → writes artifact to local workspace
+2. Mesh dispatcher detects Agent B (on Node 2) depends on Agent A's output
+3. Gossip protocol transfers the artifact file to Node 2's workspace
+4. Agent B fires on Node 2 with the artifact available locally
+Transfer overhead: negligible for small artifacts (configs, schemas). For large artifacts (model weights, datasets), the transfer time may exceed generation time — the dispatcher accounts for this when assigning nodes.
+---
+## 8. Leaf-Agnostic Design
+### What the Chassis Knows
+The chassis knows:
+- How to parse 6 tags
+- How to execute tools
+- How to manage workspaces
+- How to route via cosine similarity
+- How to dispatch firings sequentially or across mesh
+### What the Chassis Does NOT Know
+The chassis does not know:
+- What any leaf does
+- What domain a leaf covers
+- Whether a leaf should delegate or resolve
+- How complex a task is
+- What tools a leaf should use
+All of this comes from the leaf's training. The chassis is a runtime. The leaves are the intelligence.
+### Adding a New Leaf
+A user creates an "automotive diagnostics" leaf:
+1. Collect training data (diagnostic conversations, repair procedures, sensor interpretation)
+2. Format as ReAct trajectories with the universal tag protocol
+3. Train LoRA adapter on the chassis
+4. Compute centroid from training data embeddings
+5. Deploy: upload adapter + centroid to the network
+The platform requires zero changes. The router routes to it via centroid similarity. The chassis executes it via the same tag protocol. The workspace handles its artifacts the same way. A leaf that diagnoses engine problems chains the same way a leaf that writes Python code chains.
+### Leaf Metadata
+Each leaf carries minimal metadata (computed at training time, not hardcoded):
+```json
+{
+  "id": "automotive_diagnostics",
+  "centroid": [0.12, -0.34, ...],  // 384-dim, for routing
+  "has_tools": true,               // enables ReAct loop vs direct stream
+  "max_chain_depth": 3             // optional, leaf-specific depth limit
+}
+```
+No description of what the leaf does. No list of capabilities. No routing rules. The centroid IS the routing rule — it encodes the leaf's domain in embedding space. The training data IS the capability definition.
+---
+## 9. Error Handling
+### Firing Failures
+If a firing errors (model produces malformed output, tool call fails, timeout):
+1. The chassis logs the error
+2. The firing is marked failed
+3. The pipeline continues with a fallback: skip the failed phase and let the next agent handle the gap, OR retry once with a clean context
+No dynamic replanning. No planner agent re-evaluating the situation. The pipeline is simple: fire, yield, fire, yield, resolve. If a firing fails, the chain handles it the same way a function call handles an exception — catch and continue, or propagate and fail.
+### Why Not Dynamic Replanning
+Dynamic replanning requires a planning agent that can reason about failures and adapt. This is the hardest cognitive task for any model, let alone a 0.6B. The 122 planner sessions in the training data cover task decomposition, not failure recovery.
+The simpler approach: if a task fails, the user retries. The nano agent model is cheap — a failed 500ms firing costs almost nothing. Retry is faster than replanning.
+Dynamic replanning can be added later as a capability of the decomposer leaf, once sufficient training data exists for failure-recovery patterns. The tag protocol supports it without changes — the decomposer would just yield a revised plan.
+---
+## 10. Perception-Action Mode (Autonomous Operation)
+### Beyond Request-Response
+The coordination protocol is not limited to user-initiated requests. In autonomous mode, the chassis runs a continuous perception-action loop:
+```
+while alive:
+    stimulus = perceive(environment)    # read sensors, APIs, file events, mesh signals
+    if needs_action(stimulus):          # router evaluates stimulus
+        task = formulate(stimulus)      # embed stimulus, determine task
+        fire(task)                      # standard firing pipeline
+    else:
+        idle()                          # gossip, prune, evolve
+```
+The same tag protocol applies. The same workspace model applies. The same chaining model applies. The only difference is the stimulus source: a user prompt vs. an environmental event.
+### Examples
+- **Drone**: Perceives obstacle sensor data → fires navigation leaf → `<action>` adjusts course → `<resolution>` logs new heading
+- **Salesforce integration**: Perceives deal status change → fires sales leaf → `<thought>` evaluates opportunity → `<resolution>` suggests action to rep
+- **Server monitoring**: Perceives CPU spike → fires DevOps leaf → `<action tool="Bash">kubectl get pods</action>` → `<thought>` identifies failing pod → `<yield>` produces diagnostic report → fires Kubernetes leaf → `<resolution>` applies fix
+- **Mesh consciousness**: Perceives new leaf from gossip → fires evaluation leaf → benchmarks against local test set → `<resolution>` accepts or rejects the leaf update
+In all cases, the coordination protocol is identical. Stimulus → route → fire → tag-driven execution → resolve/yield/delegate. The leaf determines the behavior. The protocol determines the flow.
+---
+## 11. Relationship to Existing Architecture
+This protocol extends, rather than replaces, the existing nano agent architecture:
+| Component | Existing | Added by This Protocol |
+|-----------|----------|----------------------|
+| Tag format | `thought`, `action`, `observation`, `resolution` | `delegate`, `yield` |
+| Firing lifecycle | Fire → ReAct → resolve | Fire → ReAct → resolve/yield/delegate |
+| Inter-agent comms | `artifacts: dict[str, str]` in memory | Filesystem workspace with phase directories |
+| Orchestration | Hardcoded pipeline in `_run_pipeline` | Tag-driven: chassis reacts to yield/delegate |
+| Context strategy | Memory artifact injected into prompt | Discovery via tools, no manifest |
+| Mesh coordination | Not implemented | Dependency graph dispatch, artifact sync via gossip |
+| Autonomous mode | Idle = gossip only | Perception-action loop with standard firing pipeline |
+### Implementation Path
+1. **Add `<delegate>` and `<yield>` tags** to the chassis parser (`_execute_direct_stream` and `_execute_react_loop` in `server.py`)
+2. **Add workspace management** — create/cleanup per-task directories
+3. **Add chain controller** — parse yield/delegate outputs, dispatch next firing
+4. **Train tags into leaves** — add delegate/yield examples to training data for chassis SFT and leaf training
+5. **Add mesh dispatch** — extend `_run_pipeline` to distribute independent phases across nodes (requires gossip protocol for artifact sync)
+Steps 1-3 are ~300-500 lines of Python. Step 4 is a training data update. Step 5 depends on mesh infrastructure maturity.
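
Steps 1 and 3 of the Implementation Path in AGENT_ORCHESTRATION.md (parsing the terminal tag, then deciding what the chain controller does next) might look roughly like the sketch below. It is written in JavaScript to match the rest of this diff, although the document places the real parser in `server.py`. The names `parseTerminalTag`, `nextStep`, and `MAX_CHAIN_DEPTH` are hypothetical. Only the tag names, the exactly-one-terminal-tag rule, and the default depth of 5 come from the document. Treating a non-resolving firing at the depth limit as a failure is an assumption about how "MUST resolve" would be enforced.

```javascript
// Hypothetical sketch; not the package's parser.
const MAX_CHAIN_DEPTH = 5; // default depth stated in the document

// Patterns for the three terminal tags named in the protocol rules.
const TERMINAL_PATTERNS = {
  resolution: /<resolution>([\s\S]*?)<\/resolution>/g,
  delegate: /<delegate>([\s\S]*?)<\/delegate>/g,
  yield: /<yield\s+path="([^"]*)">([\s\S]*?)<\/yield>/g,
};

// Returns the single terminal tag found in a firing's output, or throws
// if there are zero or several. It counts tags only; it does not check
// that the tag is the last thing in the output.
function parseTerminalTag(output) {
  const found = [];
  for (const [kind, pattern] of Object.entries(TERMINAL_PATTERNS)) {
    for (const m of output.matchAll(pattern)) {
      if (kind === 'yield') {
        found.push({ kind, path: m[1], body: m[2].trim() });
      } else {
        found.push({ kind, body: m[1].trim() });
      }
    }
  }
  if (found.length !== 1) {
    throw new Error(`expected exactly one terminal tag, found ${found.length}`);
  }
  return found[0];
}

// Decides what a chain controller would do after a firing at `depth` (1-based).
function nextStep(tag, depth) {
  if (tag.kind === 'resolution') return { action: 'finish', output: tag.body };
  if (depth >= MAX_CHAIN_DEPTH) {
    // Assumption: at the limit, anything other than a resolution is a failure.
    return { action: 'fail', reason: 'chain depth limit reached without resolution' };
  }
  if (tag.kind === 'delegate') return { action: 'reroute', task: tag.body };
  return { action: 'continue', artifact: tag.path, summary: tag.body };
}
```

A caller would run `parseTerminalTag` on each firing's full output and pass the result to `nextStep` along with the current depth. The regular expressions are a shortcut for illustration and would not cope with nested or escaped tags.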

package/moe-training/shared/envelope-schema.js CHANGED

@@ -2,7 +2,7 @@
 import { SUPPORTED_PROVIDERS, MODEL_TIERS, TRAINING_EXCLUSION_REASONS } from './constants.js';
-export const STEP_TYPES = ['thought', 'action', 'observation', 'correction', 'resolution', 'error', 'coordination', 'edit', 'instruction', 'clarification', 'approval'];
+export const STEP_TYPES = ['thought', 'action', 'observation', 'correction', 'resolution', 'error', 'coordination', 'edit', 'instruction', 'clarification', 'approval', 'delegate', 'yield'];
 const VALID_QUALITY_TIERS = ['TIER_A', 'TIER_B', 'TIER_C'];
 const VALID_FEEDBACK_SIGNALS = ['accepted', 'modified', 'rejected', 'iterated'];

package/node_modules/@groove-dev/cli/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@groove-dev/cli",
-  "version": "0.27.131",
+  "version": "0.27.134",
   "description": "GROOVE CLI — manage AI coding agents from your terminal",
   "license": "FSL-1.1-Apache-2.0",
   "type": "module",

package/node_modules/@groove-dev/daemon/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@groove-dev/daemon",
-  "version": "0.27.131",
+  "version": "0.27.134",
   "description": "GROOVE daemon — agent orchestration engine",
   "license": "FSL-1.1-Apache-2.0",
   "type": "module",

package/node_modules/@groove-dev/daemon/src/index.js CHANGED

@@ -373,7 +373,9 @@ export class Daemon {
               if (msg.rows !== undefined && (typeof msg.rows !== 'number' || msg.rows < 1 || msg.rows > 200)) break;
               try {
                 const id = this.terminalManager.spawn(ws, { cwd: msg.cwd, cols: msg.cols, rows: msg.rows });
-                ws.send(JSON.stringify({ type: 'terminal:spawned', id }));
+                const spawned = { type: 'terminal:spawned', id };
+                if (msg.requestId) spawned.requestId = msg.requestId;
+                ws.send(JSON.stringify(spawned));
               } catch (err) {
                 console.error('[terminal] spawn error:', err);
                 ws.send(JSON.stringify({ type: 'terminal:error', message: err.message }));
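
The change above echoes an optional `requestId` back in the `terminal:spawned` message. A client could use it to match each reply to the spawn request that caused it, which matters when several spawns are in flight on one socket. The sketch below is a hypothetical client-side helper, not code from the package: `SpawnTracker`, the `terminal:spawn` request type, and the `send` callback are assumptions. Only the reply shape (`type`, `id`, optional `requestId`) comes from the diff.

```javascript
// Hypothetical client-side helper; not part of the package.
class SpawnTracker {
  constructor(send) {
    this.send = send;          // function that sends a string over the socket
    this.pending = new Map();  // requestId -> callback awaiting the terminal id
    this.nextId = 1;
  }

  // Sends a spawn request tagged with a fresh requestId and returns that id.
  spawn(options, onSpawned) {
    const requestId = `req-${this.nextId++}`;
    this.pending.set(requestId, onSpawned);
    // 'terminal:spawn' is an assumed request type; the diff shows only the reply.
    this.send(JSON.stringify({ type: 'terminal:spawn', requestId, ...options }));
    return requestId;
  }

  // Feed every incoming message here. Returns true if it settled a pending spawn.
  handleMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.type !== 'terminal:spawned' || !msg.requestId) return false;
    const callback = this.pending.get(msg.requestId);
    if (!callback) return false;
    this.pending.delete(msg.requestId);
    callback(msg.id);
    return true;
  }
}
```

Because the daemon only adds `requestId` when the request carried one, replies without it fall through `handleMessage` untouched, so older clients that never send the field would see the same message as before.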

package/node_modules/@groove-dev/daemon/src/introducer.js CHANGED

@@ -1,8 +1,8 @@
 // GROOVE — Introduction Protocol
 // FSL-1.1-Apache-2.0 — see LICENSE
-import { writeFileSync, readFileSync, existsSync } from 'fs';
-import { resolve } from 'path';
+import { writeFileSync, readFileSync, existsSync, readdirSync, statSync } from 'fs';
+import { resolve, dirname, basename } from 'path';
 import { escapeMd } from './validate.js';
 const GROOVE_SECTION_START = '<!-- GROOVE:START -->';
@@ -28,7 +28,50 @@ export class Introducer {
     ];
     if (newAgent.workingDir) {
-      lines.push(`Your working directory: \`${newAgent.workingDir}\` — you are spawned inside this subdirectory. Stay within it unless coordination requires otherwise.`);
+      lines.push(`Your working directory: \`${newAgent.workingDir}\` — this is the team orchestration directory (.groove/, coordination files). Do NOT create source code or project files here.`);
+      // Inject parent directory context so agents know the root layout
+      const parentDir = dirname(newAgent.workingDir);
+      const teamDirName = basename(newAgent.workingDir);
+      lines.push(`Your project root: \`${parentDir}\` — all source code, features, and builds go here (one level up from team dir).`);
+      lines.push('');
+      lines.push('## Project Root Structure');
+      lines.push('');
+      lines.push(`Team dir: \`${teamDirName}/\` (orchestration only — do NOT build here)`);
+      lines.push(`Project root: \`${parentDir}\``);
+      lines.push('');
+      try {
+        const entries = readdirSync(parentDir, { withFileTypes: true });
+        const dirs = [];
+        const files = [];
+        for (const entry of entries) {
+          if (entry.name.startsWith('.') || entry.name === 'node_modules') continue;
+          if (entry.name === teamDirName) continue;
+          if (entry.isDirectory()) {
+            dirs.push(entry.name + '/');
+          } else {
+            files.push(entry.name);
+          }
+        }
+        if (dirs.length > 0) {
+          lines.push('Directories:');
+          for (const d of dirs.slice(0, 30)) {
+            lines.push(`  ${d}`);
+          }
+          if (dirs.length > 30) lines.push(`  (+${dirs.length - 30} more)`);
+        }
+        if (files.length > 0) {
+          lines.push('Files:');
+          for (const f of files.slice(0, 20)) {
+            lines.push(`  ${f}`);
+          }
+          if (files.length > 20) lines.push(`  (+${files.length - 20} more)`);
+        }
+        lines.push('');
+        lines.push('When creating or modifying project files, use "../" paths relative to the team dir (e.g., "../demo/src/app.js"). The team directory is ephemeral and may be deleted — never put project work inside it.');
+      } catch {
+        // Parent dir not readable — skip
+      }
     }
     if (newAgent.scope && newAgent.scope.length > 0) {
@@ -185,7 +228,8 @@ export class Introducer {
     lines.push('');
     lines.push(`CRITICAL: NEVER delete files you did not create in this session. Do NOT remove files from other projects, previous work, or unrelated directories.`);
     if (newAgent.workingDir) {
-      lines.push(`Your working directory is \`${newAgent.workingDir}\`. Stay inside it. Do NOT modify or delete files outside this directory.`);
+      const parentDir = dirname(newAgent.workingDir);
+      lines.push(`Your team directory is \`${newAgent.workingDir}\` (orchestration only). Build all project files in the project root: \`${parentDir}\`.`);
     }
     lines.push(`If you see files that seem unrelated to your task, leave them alone — they belong to another project or agent.`);
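
The first introducer.js hunk builds a capped listing of the project root: hidden entries, `node_modules`, and the team directory are skipped, then up to 30 directories and 20 files are printed, with a `(+N more)` line for the overflow. The same logic can be restated as a pure function, which is easier to test without touching the filesystem. This is a sketch, not the package's code: `formatRootListing` and the `{ name, isDir }` entry shape are hypothetical stand-ins for the `Dirent` objects that `readdirSync` returns with `withFileTypes: true`.

```javascript
// Hypothetical pure-function restatement of the listing logic in the hunk above.
function formatRootListing(entries, teamDirName, maxDirs = 30, maxFiles = 20) {
  const dirs = [];
  const files = [];
  for (const entry of entries) {
    // Same filters as the diff: hidden entries, node_modules, the team dir itself.
    if (entry.name.startsWith('.') || entry.name === 'node_modules') continue;
    if (entry.name === teamDirName) continue;
    if (entry.isDir) dirs.push(entry.name + '/');
    else files.push(entry.name);
  }

  const lines = [];
  // Appends a titled section, capped at `cap` names plus an overflow line.
  const appendSection = (title, names, cap) => {
    if (names.length === 0) return;
    lines.push(title);
    for (const name of names.slice(0, cap)) lines.push(`  ${name}`);
    if (names.length > cap) lines.push(`  (+${names.length - cap} more)`);
  };
  appendSection('Directories:', dirs, maxDirs);
  appendSection('Files:', files, maxFiles);
  return lines;
}
```

The caps keep the injected prompt bounded no matter how large the project root is, which is presumably why the diff truncates rather than listing everything.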

package/node_modules/@groove-dev/daemon/src/llama-server.js CHANGED

@@ -42,7 +42,7 @@ export class LlamaServerManager {
       const server = this.servers.get(modelPath);
       server.users++;
       server.lastUsed = Date.now();
-      return `http://127.0.0.1:${server.port}/v1`;
+      return `http://localhost:${server.port}`;
     }
     // Check capacity
@@ -120,7 +120,7 @@ export class LlamaServerManager {
         data: { modelPath, port },
       });
-      return `http://127.0.0.1:${port}/v1`;
+      return `http://localhost:${port}`;
     } catch (err) {
       // Server failed to start
       await this.stopServer(modelPath);
@@ -187,7 +187,7 @@ export class LlamaServerManager {
     const start = Date.now();
     while (Date.now() - start < HEALTH_TIMEOUT) {
       try {
-        const res = await fetch(`http://127.0.0.1:${port}/health`, {
+        const res = await fetch(`http://localhost:${port}/health`, {
           signal: AbortSignal.timeout(2000),
         });
         if (res.ok) {
@@ -209,7 +209,7 @@ export class LlamaServerManager {
     if (!server) return { running: false };
     try {
-      const res = await fetch(`http://127.0.0.1:${server.port}/health`, {
+      const res = await fetch(`http://localhost:${server.port}/health`, {
         signal: AbortSignal.timeout(3000),
       });
       const data = await res.json().catch(() => ({}));

package/node_modules/@groove-dev/daemon/src/model-lab.js CHANGED

@@ -274,6 +274,14 @@ export class ModelLab {
           try {
             const chunk = JSON.parse(payload);
             const delta = chunk.choices?.[0]?.delta;
+            if (delta?.reasoning_content) {
+              if (ttft === null) {
+                ttft = Date.now() - requestStart;
+                generationStart = Date.now();
+              }
+              completionTokens++;
+              yield { type: 'reasoning', content: delta.reasoning_content };
+            }
             if (delta?.content) {
               if (ttft === null) {
                 ttft = Date.now() - requestStart;