npm - bloby-bot - Versions diffs - 0.20.6 → 0.20.8 - Mend

bloby-bot 0.20.6 → 0.20.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/package.json +1 -1
package/supervisor/agents/ARCHITECTURE.md +333 -0
package/supervisor/channels/manager.ts +1 -1
package/supervisor/chat/src/components/Chat/TypingIndicator.tsx +4 -0
package/supervisor/index.ts +2 -2

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "bloby-bot",
-  "version": "0.20.6",
+  "version": "0.20.8",
   "releaseNotes": [
     "1. react router implemented",
     "2. new workspace design",

package/supervisor/agents/ARCHITECTURE.md ADDED

@@ -0,0 +1,333 @@
+# Agent Architecture — Long-lived Query Model
+## Current State (v1)
+Each user message spawns a new `query()` call with `maxTurns: 8`. The orchestrator responds, optionally delegates coding tasks to background sub-agents via the SDK's native `agents` config, and the query ends. The next user message starts a fresh query.
+### What works
+- Orchestrator delegates coding tasks to background `coder` agent
+- Quick tasks (memory writes, config edits) handled directly by orchestrator
+- User never sees the multi-agent abstraction — feels like one entity
+- Sub-agent progress/completion events streamed via WebSocket
+- Scalable agent definitions in `supervisor/agents/`
+### What doesn't
+- **maxTurns budget**: 8 turns gets consumed fast (memory write + delegation + response). Complex messages with multiple intents hit the limit.
+- **Blocking**: While the orchestrator runs its 8 turns, the user can't send another message. The chat locks for 5-15 seconds.
+- **No proactive reporting**: When a sub-agent finishes, the orchestrator can't report back — it's dead. The user only learns about completion on their NEXT message.
+- **Race conditions**: If the user sends a message while a query is running, a second `query()` spawns. Two queries run in parallel with interleaved responses and orphaned abort controllers.
+- **Stateless context**: Each query re-reads memory files, re-assembles the system prompt, re-loads agent definitions. Wasteful and slow.
+---
+## Target Architecture (v2) — Long-lived Query
+### Core Concept
+One `query()` per conversation that stays alive for the duration of the session. User messages are pushed into an async input queue. The agent processes them naturally — responding, delegating, and reporting — all within a single continuous stream.
+```
+┌──────────────┐         ┌─────────────────────┐
+│  User sends   │────────▶│   Async Input Queue  │
+│  messages     │         │  (push at any time)  │
+└──────────────┘         └────────┬────────────┘
+                                  │
+                                  ▼
+                         ┌─────────────────────┐
+                         │   Single long-lived  │
+                         │     query() loop     │◀── background agents report back
+                         │                      │
+                         └────────┬────────────┘
+                                  │
+                                  ▼
+                         ┌─────────────────────┐
+                         │  Stream SDKMessages  │──▶ tokens, tools, responses
+                         │  back to WebSocket   │    all fluid, no blocking
+                         └─────────────────────┘
+```
+### SDK Features That Enable This
+**1. AsyncIterable prompt input**
+`query()` accepts `AsyncIterable<SDKUserMessage>` instead of a plain string. This lets us create an async queue that the SDK consumes as messages arrive:
+```typescript
+// Create an async queue the SDK reads from
+const inputQueue = createAsyncQueue<SDKUserMessage>();
+// Start the query with the queue as input
+const handle = query({
+  prompt: inputQueue,
+  options: { ... }
+});
+// Push messages at any time — the SDK picks them up
+inputQueue.push({
+  type: 'user',
+  message: { role: 'user', content: 'Build me a page' },
+});
+```
+**2. SDKUserMessage.priority**
+Controls when injected messages are processed:
+- `'now'` — interrupt the agent immediately, inject this message into the current turn
+- `'next'` — process after the current turn finishes (default for most user messages)
+- `'later'` — low priority queue (good for system notifications)
+**3. streamInput()**
+Method on the query handle to pipe additional message streams into a running query. Alternative to the queue approach.
+**4. resume / sessionId**
+Session continuity across query restarts. The SDK persists conversation history to JSONL transcripts and can resume from where it left off. This means the long-lived query can survive supervisor restarts.
+**5. stopTask(taskId)**
+Stop specific background agents without killing the whole query. Already integrated in our current implementation.
+---
+## Architecture Details
+### Lifecycle
+```
+User opens chat
+  │
+  ├── First message arrives
+  │     └── Start long-lived query(asyncQueue)
+  │           ├── System prompt + memory + config injected once
+  │           ├── Agents config (coder, researcher) loaded once
+  │           └── for-await loop begins processing events
+  │
+  ├── User sends more messages
+  │     └── Push to asyncQueue (priority: 'next')
+  │           └── Agent sees them naturally, responds in sequence
+  │
+  ├── User sends message while agent is working
+  │     └── Push to asyncQueue (priority: 'next')
+  │           └── Agent processes it after current turn
+  │           └── OR use priority: 'now' for urgent interrupts
+  │
+  ├── Background agent completes
+  │     └── SDK notifies the orchestrator via task_notification
+  │           └── Orchestrator responds naturally: "Done! Here's what I built."
+  │           └── No user message needed — orchestrator is alive
+  │
+  ├── User disconnects / reconnects
+  │     └── Query keeps running (no WebSocket needed for SDK)
+  │     └── Reconnecting client gets caught up via chat:state
+  │
+  ├── Conversation cleared
+  │     └── End the query (abort controller)
+  │     └── Next message starts a new long-lived query
+  │
+  └── Supervisor restarts
+        └── Resume from sessionId (SDK persists to JSONL)
+```
+### What Changes from v1
+| Component | v1 (current) | v2 (long-lived) |
+|---|---|---|
+| `bloby-agent.ts` | `startBlobyAgentQuery()` called per message, creates new `query()` each time | `startConversation()` creates one `query()`, `pushMessage()` adds to queue |
+| `index.ts` WS handler | `user:message` → call `startBlobyAgentQuery()` | `user:message` → call `pushMessage()` on existing conversation |
+| maxTurns | 8 (orchestrator), 50 (sub-agents) | None for orchestrator (runs indefinitely), 50 for sub-agents |
+| System prompt | Re-assembled every message | Assembled once on conversation start |
+| Memory files | Re-read every message | Read once, updated via tool use (agent reads/writes them naturally) |
+| Agent definitions | `buildAgents()` called every message | Built once on conversation start |
+| `agentQueryActive` | Boolean flag, true during query | Always true while conversation exists |
+| Backend restart deferral | Deferred until bot:done | Need new mechanism — query never ends. Defer until agent is between turns (no active tool use). |
+| Sub-agent completion | User learns on next message | Orchestrator reports immediately — it's alive |
+| Error recovery | Query fails → error message → next message starts fresh | Query fails → attempt resume from sessionId → fallback to new query |
+### Key Implementation Details
+#### 1. Async Input Queue
+```typescript
+function createAsyncQueue<T>(): AsyncIterable<T> & { push: (item: T) => void; end: () => void } {
+  const pending: T[] = [];
+  let resolve: ((value: IteratorResult<T>) => void) | null = null;
+  let done = false;
+  return {
+    push(item: T) {
+      if (resolve) {
+        resolve({ value: item, done: false });
+        resolve = null;
+      } else {
+        pending.push(item);
+      }
+    },
+    end() {
+      done = true;
+      if (resolve) resolve({ value: undefined as any, done: true });
+    },
+    [Symbol.asyncIterator]() {
+      return {
+        next(): Promise<IteratorResult<T>> {
+          if (pending.length > 0) {
+            return Promise.resolve({ value: pending.shift()!, done: false });
+          }
+          if (done) return Promise.resolve({ value: undefined as any, done: true });
+          return new Promise((r) => { resolve = r; });
+        },
+      };
+    },
+  };
+}
+```
+#### 2. Conversation Manager
+Replace the current per-message model with a conversation-scoped manager:
+```typescript
+interface LiveConversation {
+  id: string;
+  inputQueue: AsyncQueue<SDKUserMessage>;
+  queryHandle: QueryHandle;
+  abortController: AbortController;
+}
+const conversations = new Map<string, LiveConversation>();
+// Start a conversation (called once, on first message)
+function startConversation(convId: string, options: ConvOptions): LiveConversation
+// Push a user message into an existing conversation
+function pushMessage(convId: string, content: string, attachments?: any[]): void
+// End a conversation (clear context, abort)
+function endConversation(convId: string): void
+```
+#### 3. WebSocket Handler Changes
+```typescript
+// Current (v1):
+ws.on('user:message') → startBlobyAgentQuery(convId, content, ...)
+// New (v2):
+ws.on('user:message') → {
+  let conv = conversations.get(convId);
+  if (!conv) {
+    conv = startConversation(convId, { model, names, ... });
+  }
+  pushMessage(convId, content, attachments);
+}
+```
+#### 4. Backend Restart Timing
+Currently, backend restart is deferred until `bot:done` (query end). With a long-lived query, there's no `bot:done` until the conversation ends. New approach:
+- Track whether the agent is currently in a tool-use turn (has active tool calls)
+- When file changes detected (Write/Edit used), schedule restart for when no tool is active
+- The `bot:response` event (agent finished responding) is a safe restart point
+- Sub-agent `bot:task-done` events are also safe restart points
+#### 5. Session Persistence / Resume
+The SDK writes conversation history to JSONL. On supervisor restart:
+```typescript
+// Try to resume existing conversation
+const conv = query({
+  prompt: inputQueue,
+  options: {
+    ...opts,
+    resume: true,
+    sessionId: savedSessionId,
+  },
+});
+```
+If resume fails (corrupt session, SDK version mismatch), fall back to a fresh query with conversation history from the database.
+#### 6. Memory / Prompt Refresh
+With a long-lived query, the system prompt is set once. If memory files change (agent writes to them), the agent already knows — it made the change. No need to re-inject.
+However, external changes (config changes, channel updates) need a mechanism to notify the agent. Options:
+- Push a system message into the queue: `{ role: 'user', content: '[System] Channel config updated: ...' }`
+- End and restart the conversation (heavyweight, loses context)
+- Accept that external config changes take effect on next conversation
+### Channel Manager (WhatsApp Admin)
+Same pattern — `handleAdminMessage` pushes to the shared conversation queue instead of starting a new query. Admin messages from WhatsApp and chat messages from the UI flow into the same conversation.
+### Scheduler (Pulse / Cron)
+Pulse and cron messages can be pushed into the active conversation queue:
+```typescript
+// Instead of triggerAgent() starting a new query:
+const conv = conversations.get(activeConvId);
+if (conv) {
+  pushMessage(activeConvId, '<PULSE/>');
+} else {
+  // No active conversation — start one for the pulse
+  const conv = startConversation('pulse-' + Date.now(), opts);
+  pushMessage(conv.id, '<PULSE/>');
+}
+```
+This means pulse/cron actions happen in the same conversation context as the user's chat. The agent can reference recent conversation when deciding what to do on pulse.
+---
+## Migration Path
+### Phase 1: Ship v1 (current)
+- maxTurns: 8 orchestrator with SDK native background agents
+- Request-response model (one query per message)
+- Works, has known limitations with turn budget and blocking
+### Phase 2: Long-lived Query Refactor
+1. Implement `createAsyncQueue` utility
+2. Refactor `bloby-agent.ts` → conversation manager with `startConversation` / `pushMessage` / `endConversation`
+3. Refactor `index.ts` WebSocket handler to push to queue instead of starting new queries
+4. Refactor `manager.ts` admin handler same way
+5. Update backend restart logic (trigger on response/task-done, not query-end)
+6. Add session resume for supervisor restarts
+7. Refactor scheduler to push into active conversation
+### Phase 3: Enhancements
+- Session persistence across restarts (resume from sessionId)
+- Priority-based message injection (urgent interrupts vs normal flow)
+- External config change notifications via system messages
+- Conversation timeout / cleanup (end long-idle conversations)
+---
+## Files Affected
+| File | Change |
+|---|---|
+| `supervisor/bloby-agent.ts` | Complete rewrite — conversation manager replacing per-message queries |
+| `supervisor/index.ts` | WebSocket handler refactored to push/pull from conversation |
+| `supervisor/channels/manager.ts` | Admin handler pushes to conversation queue |
+| `supervisor/scheduler.ts` | Pulse/cron push into active conversation |
+| `supervisor/agents/index.ts` | No change — agent definitions loaded once per conversation start |
+| `supervisor/agents/prompts/*` | No change |
+| `worker/prompts/bloby-system-prompt.txt` | Remove maxTurns-related language. The "How You Work" section can be simplified since the orchestrator can now use tools freely without turn budget concerns. |
+---
+## Open Questions
+1. **Conversation lifetime**: When does a long-lived query end? Options: explicit clear, timeout after N hours of inactivity, or truly infinite (until supervisor restart).
+2. **Multiple conversations**: The current system tracks one active conversation via `/api/context/current`. With long-lived queries, do we support multiple concurrent conversations? Or always one?
+3. **Token accumulation**: A long-lived query accumulates context over time. The SDK handles context window management (sliding window, summarization?), but we need to understand the cost implications.
+4. **WhatsApp admin + chat convergence**: If WhatsApp admin messages and chat UI messages both feed into the same conversation queue, the agent sees both sources seamlessly. But the response routing needs to know which channel to reply on.
+5. **Customer agents**: Business mode customer agents are already stateless (per-customer buffers). Should they also become long-lived? Or keep them request-response since customer conversations are shorter?
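
Note on the `createAsyncQueue` listing in ARCHITECTURE.md above: it keeps a single `resolve` slot, so if a consumer ever calls `next()` twice before a push, the first waiter would be overwritten and never settle. The listing also references an `AsyncQueue` type that it does not define. The following is a minimal sketch of one way to cover both points, not the package's implementation. The `AsyncQueue` interface and the `waiters` array are assumptions introduced here for illustration.

```typescript
// Hypothetical variant of createAsyncQueue; AsyncQueue and waiters are illustrative names.
interface AsyncQueue<T> extends AsyncIterable<T> {
  push(item: T): void;
  end(): void;
}

function createAsyncQueue<T>(): AsyncQueue<T> {
  const pending: T[] = [];
  // One resolver per outstanding next() call, so concurrent readers are not dropped.
  const waiters: Array<(result: IteratorResult<T>) => void> = [];
  let done = false;

  return {
    push(item: T): void {
      if (done) return; // pushes after end() are ignored
      const waiter = waiters.shift();
      if (waiter) {
        waiter({ value: item, done: false });
      } else {
        pending.push(item);
      }
    },
    end(): void {
      done = true;
      // Release every reader that is still waiting.
      while (waiters.length > 0) {
        waiters.shift()!({ value: undefined, done: true });
      }
    },
    [Symbol.asyncIterator](): AsyncIterator<T> {
      return {
        next(): Promise<IteratorResult<T>> {
          // Items buffered before end() are still delivered.
          if (pending.length > 0) {
            return Promise.resolve<IteratorResult<T>>({ value: pending.shift()!, done: false });
          }
          if (done) {
            return Promise.resolve<IteratorResult<T>>({ value: undefined, done: true });
          }
          return new Promise<IteratorResult<T>>((resolve) => {
            waiters.push(resolve);
          });
        },
      };
    },
  };
}
```

In this sketch, items pushed before `end()` are drained first, and anything pushed after `end()` is silently dropped. Whether dropping or throwing is the better behavior for late pushes is a design choice the document leaves open.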

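Note on "Backend Restart Timing" in ARCHITECTURE.md above: the document proposes tracking active tool calls and restarting only at safe points such as `bot:response` or `bot:task-done`. A minimal sketch of that bookkeeping follows, assuming the supervisor can observe tool start and finish events. `RestartGate` and all of its method names are hypothetical and do not appear in the package.

```typescript
// Hypothetical helper; the class and its method names are assumptions for illustration.
class RestartGate {
  private activeTools = 0;
  private restartPending = false;
  private readonly restart: () => void;

  constructor(restart: () => void) {
    this.restart = restart;
  }

  // Call when the agent begins a tool call.
  toolStarted(): void {
    this.activeTools += 1;
  }

  // Call when a tool call finishes; never lets the counter go negative.
  toolFinished(): void {
    this.activeTools = Math.max(0, this.activeTools - 1);
  }

  // Call when a Write/Edit tool reports a file change.
  requestRestart(): void {
    this.restartPending = true;
  }

  // Call at safe points (for example bot:response or bot:task-done).
  // Restarts only if a restart was requested and no tool is active.
  safePoint(): void {
    if (this.restartPending && this.activeTools === 0) {
      this.restartPending = false;
      this.restart();
    }
  }
}

// Usage: a restart requested mid-tool is held until the next safe point.
let restartCount = 0;
const gate = new RestartGate(() => {
  restartCount += 1;
});
gate.toolStarted();
gate.requestRestart();
gate.safePoint(); // tool still active, so nothing happens
gate.toolFinished();
gate.safePoint(); // restart runs here
```

The sketch deliberately does not restart from `toolFinished()`, so a turn that writes several files in sequence triggers one restart at the end instead of one per file.
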
package/supervisor/channels/manager.ts CHANGED

@@ -445,7 +445,7 @@ export class ChannelManager {
       { botName, humanName },
       recentMessages,
       undefined, // no supportPrompt
-      5,         // maxTurns: orchestrator mode
+      8,         // maxTurns: orchestrator mode
     );
   }

package/supervisor/chat/src/components/Chat/TypingIndicator.tsx CHANGED

@@ -33,12 +33,14 @@ export default function TypingIndicator({ text, toolName }: Props) {
             <span className="w-1.5 h-1.5 rounded-full bg-muted-foreground/60 animate-bounce" style={{ animationDelay: '150ms' }} />
             <span className="w-1.5 h-1.5 rounded-full bg-muted-foreground/60 animate-bounce" style={{ animationDelay: '300ms' }} />
           </span>
+          {/* Tool activity label — hidden for now (will be used for skill UI later)
           {toolName && (
             <div className="flex items-center gap-1.5 mt-1 text-xs text-muted-foreground">
               <div className="w-2.5 h-2.5 border-[1.5px] border-muted-foreground/30 border-t-primary rounded-full animate-spin" />
               <span>{toolLabel(toolName)}...</span>
             </div>
           )}
+          */}
         </div>
       </div>
     );
@@ -59,12 +61,14 @@ export default function TypingIndicator({ text, toolName }: Props) {
         >
           {text}
         </Streamdown>
+        {/* Tool activity label — hidden for now (will be used for skill UI later)
         {toolName && (
           <div className="flex items-center gap-1.5 mt-1 text-xs text-muted-foreground">
             <div className="w-2.5 h-2.5 border-[1.5px] border-muted-foreground/30 border-t-primary rounded-full animate-spin" />
             <span>{toolLabel(toolName)}...</span>
           </div>
         )}
+        */}
       </div>
     </div>
   );

package/supervisor/index.ts CHANGED

@@ -1132,7 +1132,7 @@ ${!connected ? '<script>setTimeout(()=>location.reload(),4000)</script>' : ''}
             const waMirrorJid = waStatus?.connected ? waStatus.info?.phoneNumber : null;
             let waChunkBuf = '';
-            // Start orchestrator query (maxTurns: 5 — fast, delegates heavy work)
+            // Start orchestrator query (maxTurns: 8 — quick tasks direct, coding delegated)
             log.info(`[orchestrator] ──── USER MESSAGE ────`);
             log.info(`[orchestrator] Content: "${content.slice(0, 100)}..."`);
             log.info(`[orchestrator] Model: ${freshConfig.ai.model}`);
@@ -1203,7 +1203,7 @@ ${!connected ? '<script>setTimeout(()=>location.reload(),4000)</script>' : ''}
               broadcastBloby(type, eventData);
             }, data.attachments, savedFiles, { botName, humanName }, recentMessages,
             undefined, // no supportPrompt
-            5,         // maxTurns: orchestrator mode
+            8,         // maxTurns: orchestrator mode
             );
           })();
           return;