@sesamespace/hivemind 0.10.0 → 0.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (52)
  1. package/.pnpmrc.json +1 -0
  2. package/AUTO-DEBUG-DESIGN.md +267 -0
  3. package/AUTOMATIC-MEMORY-MANAGEMENT.md +109 -0
  4. package/DASHBOARD-PLAN.md +206 -0
  5. package/MEMORY-ENHANCEMENT-PLAN.md +211 -0
  6. package/TOOL-USE-DESIGN.md +173 -0
  7. package/dist/{chunk-FBQBBAPZ.js → chunk-4C6B2AMB.js} +2 -2
  8. package/dist/{chunk-FK6WYXRM.js → chunk-4YXOQGQC.js} +2 -2
  9. package/dist/{chunk-IXBIAX76.js → chunk-K6KL2VD6.js} +2 -2
  10. package/dist/{chunk-IJRAVHQC.js → chunk-LWJCKTQP.js} +51 -11
  11. package/dist/chunk-LWJCKTQP.js.map +1 -0
  12. package/dist/{chunk-BHCDOHSK.js → chunk-LYL5GG2F.js} +3 -3
  13. package/dist/{chunk-M3A2WRXM.js → chunk-OB6OXLPC.js} +430 -2
  14. package/dist/chunk-OB6OXLPC.js.map +1 -0
  15. package/dist/{chunk-DPLCEMEC.js → chunk-ZA4NWNS6.js} +2 -2
  16. package/dist/commands/fleet.js +3 -3
  17. package/dist/commands/init.js +3 -3
  18. package/dist/commands/service.js +1 -1
  19. package/dist/commands/start.js +3 -3
  20. package/dist/commands/watchdog.js +3 -3
  21. package/dist/dashboard.html +100 -60
  22. package/dist/index.js +2 -2
  23. package/dist/main.js +7 -7
  24. package/dist/start.js +1 -1
  25. package/docs/TOOL-PARITY-PLAN.md +191 -0
  26. package/package.json +23 -24
  27. package/src/memory/dashboard-integration.ts +295 -0
  28. package/src/memory/index.ts +187 -0
  29. package/src/memory/performance-test.ts +208 -0
  30. package/src/memory/processors/agent-sync.ts +312 -0
  31. package/src/memory/processors/command-learner.ts +298 -0
  32. package/src/memory/processors/memory-api-client.ts +105 -0
  33. package/src/memory/processors/message-flow-integration.ts +168 -0
  34. package/src/memory/processors/research-digester.ts +204 -0
  35. package/test-caitlin-access.md +11 -0
  36. package/dist/chunk-IJRAVHQC.js.map +0 -1
  37. package/dist/chunk-M3A2WRXM.js.map +0 -1
  38. package/install.sh +0 -162
  39. package/packages/memory/Cargo.lock +0 -6480
  40. package/packages/memory/Cargo.toml +0 -21
  41. package/packages/memory/src/src/context.rs +0 -179
  42. package/packages/memory/src/src/embeddings.rs +0 -51
  43. package/packages/memory/src/src/main.rs +0 -887
  44. package/packages/memory/src/src/promotion.rs +0 -808
  45. package/packages/memory/src/src/scoring.rs +0 -142
  46. package/packages/memory/src/src/store.rs +0 -460
  47. package/packages/memory/src/src/tasks.rs +0 -321
  48. /package/dist/{chunk-FBQBBAPZ.js.map → chunk-4C6B2AMB.js.map} +0 -0
  49. /package/dist/{chunk-FK6WYXRM.js.map → chunk-4YXOQGQC.js.map} +0 -0
  50. /package/dist/{chunk-IXBIAX76.js.map → chunk-K6KL2VD6.js.map} +0 -0
  51. /package/dist/{chunk-BHCDOHSK.js.map → chunk-LYL5GG2F.js.map} +0 -0
  52. /package/dist/{chunk-DPLCEMEC.js.map → chunk-ZA4NWNS6.js.map} +0 -0
package/.pnpmrc.json ADDED
@@ -0,0 +1 @@
1
+ {"onlyBuiltDependencies":["esbuild","better-sqlite3"]}
@@ -0,0 +1,267 @@
1
+ # Auto-Debug Agent — Design Doc
2
+
3
+ ## Vision
4
+
5
+ A Hivemind background processor that watches agent logs, detects errors, diagnoses root causes, and submits fixes as PRs — creating a self-healing codebase. Any fleet agent (e.g. Caitlin) can run this processor to continuously improve the system while using it.
6
+
7
+ ## How It Works
8
+
9
+ ```
10
+ Logs → Detect → Deduplicate → Diagnose → Fix → PR → Verify
11
+ ```
12
+
13
+ ### 1. Log Watcher (`log-watcher` processor)
14
+
15
+ Tails all Hivemind log files:
16
+ - `/tmp/hivemind-agent.log`
17
+ - `/tmp/hivemind-error.log`
18
+ - `/tmp/hivemind-watchdog.log`
19
+ - `/tmp/hivemind-memory.log`
20
+ - `/tmp/hivemind-memory-error.log`
21
+
22
+ **Detection rules:**
23
+ - Stack traces (multiline, starting with `Error:` or `at ...`)
24
+ - Uncaught exceptions / unhandled rejections
25
+ - Repeated warnings (same message 3+ times in 5 min)
26
+ - Process crashes (exit codes ≠ 0)
27
+ - Health check failures logged by watchdog
28
+ - Memory daemon errors
29
+ - Sesame connection failures
30
+
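The repeated-warning rule above (same message 3+ times in 5 minutes) can be sketched as a small sliding-window counter. This is illustrative only; `WarningWindow` and its defaults are not the shipped implementation:

```typescript
// Sliding-window repeat detector: fires once the same message has
// appeared `threshold` times within `windowMs`.
class WarningWindow {
  private seen = new Map<string, number[]>(); // message -> timestamps (ms)

  constructor(
    private threshold = 3,
    private windowMs = 5 * 60 * 1000,
  ) {}

  // Returns true when this occurrence crosses the repeat threshold.
  record(message: string, now: number = Date.now()): boolean {
    const times = (this.seen.get(message) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    times.push(now);
    this.seen.set(message, times);
    return times.length >= this.threshold;
  }
}
```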
31
+ **Output:** `ErrorEvent` objects with:
32
+ ```typescript
33
+ interface ErrorEvent {
34
+ id: string; // hash of normalized stack trace
35
+ timestamp: Date;
36
+ source: "agent" | "watchdog" | "memory";
37
+ level: "error" | "crash" | "repeated-warning";
38
+ message: string;
39
+ stackTrace?: string;
40
+ logFile: string;
41
+ lineNumber: number;
42
+ occurrences: number; // count within dedup window
43
+ context: string[]; // surrounding log lines (±10)
44
+ }
45
+ ```
46
+
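One plausible reading of the `id` field's "hash of normalized stack trace": strip line/column numbers and directory prefixes so the same bug hashes identically across deploys and machines. The exact normalization scheme is an assumption here, not the shipped algorithm:

```typescript
import { createHash } from "node:crypto";

// Sketch: normalize a stack trace, then hash it into a stable error id.
// Normalization drops line:column suffixes and directory prefixes, so
// two occurrences of the same bug from different paths collide.
function errorId(stackTrace: string): string {
  const normalized = stackTrace
    .split("\n")
    .map((line) =>
      line
        .replace(/:\d+(?::\d+)?/g, "") // strip :line:col
        .replace(/(\/[\w.-]+)+\//g, "") // strip directory prefixes
        .trim(),
    )
    .join("\n");
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
```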
47
+ ### 2. Deduplication & Prioritization
48
+
49
+ Not every error deserves a PR. The processor maintains a local error registry:
50
+
51
+ ```typescript
52
+ interface ErrorRegistry {
53
+ // key: error id (normalized stack hash)
54
+ errors: Map<string, {
55
+ firstSeen: Date;
56
+ lastSeen: Date;
57
+ totalOccurrences: number;
58
+ status: "new" | "investigating" | "fix-submitted" | "resolved" | "wont-fix";
59
+ prUrl?: string;
60
+ severity: number; // 0-10, auto-calculated
61
+ }>;
62
+ }
63
+ ```
64
+
65
+ **Severity scoring:**
66
+ - Crash/uncaught exception: +5
67
+ - Occurs on every startup: +3
68
+ - Frequency (>10/hour): +2
69
+ - Affects Sesame connectivity: +2
70
+ - Memory daemon related: +1
71
+ - Already has a PR open: -10 (skip)
72
+ - Marked wont-fix: -10 (skip)
73
+
74
+ **Threshold:** Only investigate errors with severity ≥ 3.
75
+
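The scoring rules above reduce to a small additive function. A sketch, with illustrative field names on `ErrorSignals` (the real registry may shape this differently):

```typescript
interface ErrorSignals {
  isCrash: boolean;        // crash or uncaught exception
  onEveryStartup: boolean;
  perHour: number;         // observed frequency
  affectsSesame: boolean;
  memoryRelated: boolean;
  hasOpenPr: boolean;
  wontFix: boolean;
}

// Additive severity per the table above, clamped to 0-10.
// Open-PR and wont-fix errors are skipped outright.
function scoreSeverity(s: ErrorSignals): number {
  if (s.hasOpenPr || s.wontFix) return 0;
  let score = 0;
  if (s.isCrash) score += 5;
  if (s.onEveryStartup) score += 3;
  if (s.perHour > 10) score += 2;
  if (s.affectsSesame) score += 2;
  if (s.memoryRelated) score += 1;
  return Math.min(score, 10);
}

const SEVERITY_THRESHOLD = 3; // only investigate at or above this
```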
76
+ ### 3. Diagnosis Engine
77
+
78
+ When an error crosses the threshold, the processor:
79
+
80
+ 1. **Locate source:** Parse stack trace → map to source files in the repo
81
+ 2. **Gather context:** Use `code-indexer` data to understand the file, function, and dependencies
82
+ 3. **Check git blame:** Was this recently changed? By whom?
83
+ 4. **Search for related:** Query `task-tracker` for related tasks, check if a fix is already in progress
84
+ 5. **Build diagnosis prompt:** Assemble all context into a structured prompt for the agent's LLM
85
+
86
+ The diagnosis is done by the agent itself (not a separate LLM call) — it feeds into the agent's normal message processing as an internal task.
87
+
88
+ ### 4. Fix Generation
89
+
90
+ Once diagnosed, the agent:
91
+
92
+ 1. Creates a feature branch: `auto-fix/{error-id-short}`
93
+ 2. Makes the code change
94
+ 3. Runs `tsup` to verify build
95
+ 4. Runs tests if available
96
+ 5. Commits with a structured message:
97
+ ```
98
+ fix(auto-debug): {short description}
99
+
100
+ Error: {original error message}
101
+ Occurrences: {count} over {timespan}
102
+ Source: {log file}
103
+
104
+ Root cause: {diagnosis}
105
+
106
+ Auto-generated by hivemind auto-debug processor
107
+ ```
108
+ 6. Pushes branch and opens PR
109
+
110
+ ### 5. PR Template
111
+
112
+ ```markdown
113
+ ## 🤖 Auto-Debug Fix
114
+
115
+ **Error:** `{error message}`
116
+ **Source:** `{file}:{line}`
117
+ **Frequency:** {count} occurrences over {timespan}
118
+ **Severity:** {score}/10
119
+
120
+ ### Diagnosis
121
+ {LLM-generated root cause analysis}
122
+
123
+ ### Fix
124
+ {Description of what changed and why}
125
+
126
+ ### Log Sample
127
+ ```
128
+ {relevant log lines}
129
+ ```
130
+
131
+ ### Verification
132
+ - [ ] Build passes (`tsup`)
133
+ - [ ] Error no longer reproduces (monitored for 30 min post-deploy)
134
+
135
+ ---
136
+ *Auto-generated by hivemind auto-debug processor. Review before merging.*
137
+ ```
138
+
139
+ ### 6. Post-Fix Monitoring
140
+
141
+ After a PR is merged and deployed:
142
+ - Watch for the same error ID in logs
143
+ - If it doesn't recur within 1 hour → mark as `resolved`
144
+ - If it recurs → reopen with additional context, bump severity
145
+
146
+ ## Architecture
147
+
148
+ ### Processor Class
149
+
150
+ ```typescript
151
+ class LogWatcher extends BackgroundProcess {
152
+ name = "log-watcher";
153
+ interval = 10_000; // check every 10s
154
+
155
+ private tailPositions: Map<string, number>; // track file offsets
156
+ private errorRegistry: ErrorRegistry;
157
+ private activeBranches: Set<string>;
158
+
159
+ async run(context: ProcessContext): Promise<ProcessResult> {
160
+ // 1. Read new log lines since last position
161
+ const newLines = this.readNewLines();
162
+
163
+ // 2. Detect errors
164
+ const errors = this.detectErrors(newLines);
165
+
166
+ // 3. Deduplicate and score
167
+ const actionable = this.processErrors(errors);
168
+
169
+ // 4. For high-severity new errors, queue investigation
170
+ for (const error of actionable) {
171
+ await this.queueInvestigation(error, context);
172
+ }
173
+
174
+ return { itemsProcessed: newLines.length, errors: [] };
175
+ }
176
+ }
177
+ ```
178
+
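The `tailPositions` / `readNewLines()` pair above amounts to per-file byte-offset tracking. A minimal sketch, with rotation handling and partial-line buffering simplified (a production tail would hold back the trailing partial line):

```typescript
import {
  openSync, readSync, fstatSync, closeSync,
  writeFileSync, appendFileSync, mkdtempSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Remember a byte offset per file; each tick, read only what was
// appended since the last read. A shrinking file means rotation, so
// the offset resets to 0.
class LogTail {
  private offsets = new Map<string, number>();

  readNew(path: string): string[] {
    const fd = openSync(path, "r");
    try {
      const size = fstatSync(fd).size;
      let pos = this.offsets.get(path) ?? 0;
      if (size < pos) pos = 0; // truncated/rotated
      if (size === pos) return [];
      const buf = Buffer.alloc(size - pos);
      readSync(fd, buf, 0, buf.length, pos);
      this.offsets.set(path, size);
      return buf.toString("utf8").split("\n").filter((l) => l.length > 0);
    } finally {
      closeSync(fd);
    }
  }
}

// Demo against a throwaway file:
const file = join(mkdtempSync(join(tmpdir(), "tail-")), "agent.log");
writeFileSync(file, "line 1\n");
const tail = new LogTail();
tail.readNew(file); // -> ["line 1"]
appendFileSync(file, "line 2\n");
tail.readNew(file); // -> ["line 2"] only, not "line 1" again
```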
179
+ ### Integration Points
180
+
181
+ - **code-indexer:** Provides file/function context for diagnosis
182
+ - **task-tracker:** Tracks fix progress, prevents duplicate work
183
+ - **agent-sync:** In fleet mode, coordinates so multiple agents don't fix the same bug
184
+ - **Sesame:** Posts status updates to a designated channel (e.g. `#hivemind-debug`)
185
+ - **GitHub:** Opens PRs via `gh` CLI
186
+
187
+ ### Fleet Coordination
188
+
189
+ When multiple agents run this processor:
190
+ 1. Error registry is shared via Sesame vault or a shared memory namespace
191
+ 2. Lock mechanism: first agent to claim an error ID owns the investigation
192
+ 3. Other agents provide additional log samples if they see the same error
193
+ 4. Any agent can review/approve another agent's PR
194
+
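The first-claimant-wins lock in step 2 can be sketched as a compare-and-set over the shared registry. This in-memory version only shows the claim semantics; the real coordination would go through the Sesame vault or shared memory namespace:

```typescript
// First agent to claim an error id owns the investigation.
// Claims are idempotent for the existing owner.
class ErrorClaims {
  private owners = new Map<string, string>(); // errorId -> agent name

  // Returns true if `agent` now owns (or already owned) the error.
  claim(errorId: string, agent: string): boolean {
    const owner = this.owners.get(errorId);
    if (owner === undefined) {
      this.owners.set(errorId, agent);
      return true;
    }
    return owner === agent;
  }

  release(errorId: string, agent: string): void {
    if (this.owners.get(errorId) === agent) this.owners.delete(errorId);
  }
}
```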
195
+ ## Configuration
196
+
197
+ ```toml
198
+ [processors.log-watcher]
199
+ enabled = true
200
+ interval_ms = 10_000
201
+ severity_threshold = 3
202
+ log_files = [
203
+ "/tmp/hivemind-agent.log",
204
+ "/tmp/hivemind-error.log",
205
+ "/tmp/hivemind-watchdog.log",
206
+ "/tmp/hivemind-memory.log",
207
+ ]
208
+ auto_pr = true # false = diagnose only, don't submit PRs
209
+ repo = "baileydavis2026/hivemind"
210
+ branch_prefix = "auto-fix"
211
+ notify_channel = "hivemind-debug" # Sesame channel for status updates
212
+ max_concurrent_fixes = 2 # don't overwhelm with PRs
213
+ cooldown_minutes = 30 # min time between PRs for same error class
214
+ ```
215
+
216
+ ## Safety & Guardrails
217
+
218
+ 1. **Human review required:** PRs are never auto-merged. A human must approve.
219
+ 2. **Scope limits:** Auto-debug only touches files in the Hivemind repo, never system files or other projects.
220
+ 3. **Rate limiting:** Max 2 concurrent fix branches, 30-min cooldown per error class.
221
+ 4. **Severity gating:** Only acts on errors above threshold — ignores transient/cosmetic issues.
222
+ 5. **Rollback awareness:** If a fix introduces new errors, the processor detects this and comments on the PR.
223
+ 6. **No secrets in PRs:** Log context is sanitized (API keys, tokens stripped) before inclusion in PR descriptions.
224
+
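Guardrail 6 (secret stripping) can be sketched as a redaction pass over log lines before they reach a PR body. The patterns below are illustrative; a real list would need to cover every secret format the fleet actually handles:

```typescript
// Illustrative secret patterns: key=value assignments, bearer tokens,
// and OpenAI-style keys. Extend per deployment.
const SECRET_PATTERNS: RegExp[] = [
  /\b(?:api[_-]?key|token|secret|password)\s*[=:]\s*\S+/gi,
  /\bBearer\s+[A-Za-z0-9._-]+/g,
  /\bsk-[A-Za-z0-9]{16,}\b/g,
];

// Replace anything matching a secret pattern before the text is
// embedded in a PR description.
function sanitize(logLines: string[]): string[] {
  return logLines.map((line) =>
    SECRET_PATTERNS.reduce((acc, re) => acc.replace(re, "[REDACTED]"), line),
  );
}
```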
225
+ ## Implementation Plan
226
+
227
+ ### Phase 1: Log Watcher (detection only)
228
+ - Implement `LogWatcher` processor
229
+ - Error detection, deduplication, registry
230
+ - Sesame channel notifications for new errors
231
+ - Dashboard integration (error list view)
232
+
233
+ ### Phase 2: Diagnosis
234
+ - Stack trace → source mapping
235
+ - Integration with code-indexer for context
236
+ - LLM-powered root cause analysis
237
+ - Diagnosis reports posted to Sesame
238
+
239
+ ### Phase 3: Auto-Fix PRs
240
+ - Branch creation, code changes, PR submission
241
+ - Build verification
242
+ - Post-merge monitoring
243
+ - Fleet coordination (error locking)
244
+
245
+ ### Phase 4: Learning
246
+ - Track fix success rate per error pattern
247
+ - Build pattern library (common fixes for common errors)
248
+ - Skip LLM for known patterns → direct fix
249
+ - Feed insights back to MEMORY-ENHANCEMENT-PLAN
250
+
251
+ ## Success Metrics
252
+
253
+ - **Detection rate:** % of real errors caught vs. total errors in logs
254
+ - **False positive rate:** % of investigations that led to wont-fix
255
+ - **Fix success rate:** % of merged PRs that resolved the error
256
+ - **Time to fix:** From first error occurrence to PR merged
257
+ - **Regression rate:** % of fixes that introduced new errors
258
+
259
+ ## Why This Works
260
+
261
+ The fastest path to a bug-free system is having the system's own users (agents) fix bugs as they encounter them. Every agent running Hivemind becomes a contributor to Hivemind's stability. The more agents deployed, the faster bugs are found and fixed. It's a flywheel:
262
+
263
+ ```
264
+ More agents → More log coverage → More bugs found → More fixes → More stable → More agents
265
+ ```
266
+
267
+ This is the software equivalent of an immune system — the codebase develops antibodies to its own failure modes.
@@ -0,0 +1,109 @@
1
+ # Automatic Memory Management in Hivemind
2
+
3
+ ## Overview
4
+
5
+ Hivemind v0.8.13+ includes automatic memory management that runs in the background without requiring any cognitive load from agents. This system enriches every agent message with relevant context automatically.
6
+
7
+ ## Features
8
+
9
+ ### 1. Background Processors
10
+ - **Code Indexer** - Tracks file access patterns and maintains a "working set"
11
+ - **Task Tracker** - Monitors conversations for task patterns and progress
12
+ - **Research Digester** - Extracts key points from web pages and documents
13
+ - **Command Learner** - Tracks successful command patterns
14
+ - **Agent Sync** - Enables knowledge sharing between agents
15
+
16
+ ### 2. Auto-Debug System
17
+ - **Log Watcher** - Monitors agent/memory/watchdog logs for errors
18
+ - **Auto Debugger** - Automatically creates fixes for detected errors
19
+ - **Error Registry** - Tracks error patterns and prevents duplicate fixes
20
+
21
+ ### 3. Zero Configuration
22
+ All memory management features are automatically enabled when you install a new agent. No setup required.
23
+
24
+ ## How It Works
25
+
26
+ 1. **Background Processing** - Uses Mac Mini's local compute, not expensive LLM calls
27
+ 2. **Continuous Updates** - Memory is updated in real-time as the agent works
28
+ 3. **Context Building** - Relevant information is automatically included in each LLM request
29
+ 4. **Token Management** - Efficiently manages context size to stay within limits
30
+
31
+ ## Installation
32
+
33
+ The standard one-line installer sets up everything:
34
+
35
+ ```bash
36
+ npm install -g @sesamespace/hivemind
37
+ hivemind init <sesame-api-key>
38
+ hivemind service install
39
+ ```
40
+
41
+ This automatically:
42
+ - Installs the memory daemon with background processors
43
+ - Sets up the watchdog to monitor both agent and memory health
44
+ - Enables auto-debug features if configured
45
+ - Starts all services with proper launchd configuration
46
+
47
+ ## Configuration
48
+
49
+ Optional configuration in `config.toml`:
50
+
51
+ ```toml
52
+ [auto_debug]
53
+ enabled = true
54
+ log_files = ["/tmp/hivemind-agent.log", "/tmp/hivemind-memory.log"]
55
+ severity_threshold = "error"
56
+ auto_pr = true
57
+ repo = "https://github.com/baileydavis2026/hivemind"
58
+ notify_channel = "your-sesame-channel-id"
59
+ ```
60
+
61
+ ## Architecture
62
+
63
+ ```
64
+ ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
65
+ │ Agent Process │────▶│ Memory Daemon │◀────│   Watchdog    │
66
+ └───────────────┘     └───────────────┘     └───────────────┘
67
+                               │
68
+                               ▼
69
+                   ┌───────────────────────┐
70
+                   │ Background Processors │
71
+                   ├───────────────────────┤
72
+                   │ • Code Indexer        │
73
+                   │ • Task Tracker        │
74
+                   │ • Research Digester   │
75
+                   │ • Command Learner     │
76
+                   │ • Agent Sync          │
77
+                   │ • Log Watcher         │
78
+                   │ • Auto Debugger       │
79
+                   └───────────────────────┘
80
+ ```
81
+
82
+ ## For Developers
83
+
84
+ To extend the memory management system:
85
+
86
+ 1. Create a new processor extending the `BackgroundProcessor` base class
87
+ 2. Register it in the `ProcessManager`
88
+ 3. The processor will automatically run in the background
89
+
90
+ Example:
91
+ ```typescript
92
+ export class MyProcessor extends BackgroundProcessor {
93
+ async initialize(): Promise<void> {
94
+ // Setup code
95
+ }
96
+
97
+ async process(): Promise<void> {
98
+ // Main processing logic (called periodically)
99
+ }
100
+ }
101
+ ```
102
+
103
+ ## Benefits
104
+
105
+ - **Zero Cognitive Load** - Agents don't need to think about memory
106
+ - **Better Context** - Relevant information is always available
107
+ - **Multi-Agent Collaboration** - Agents can share knowledge seamlessly
108
+ - **Self-Healing** - Auto-debug detects errors and proposes fixes automatically (PRs still require human review)
109
+ - **Efficient** - Uses local compute, not expensive API calls
@@ -0,0 +1,206 @@
1
+ # Hivemind Dashboard — Implementation Plan
2
+
3
+ **Goal:** Local web dashboard for debugging memory, context routing, and LLM request formation.
4
+ **Access:** `http://localhost:9485` on the Mac mini (local access only for now).
5
+ **Priority:** LLM Request Inspector first, then Memory Browser, then Context Overview.
6
+
7
+ ---
8
+
9
+ ## Phase 1: LLM Request Logger + Inspector UI
10
+
11
+ ### Backend: Request Logging
12
+
13
+ **Where:** Instrument `buildMessages()` in `prompt.ts` and `processMessage()` in `agent.ts`.
14
+
15
+ Each logged request captures:
16
+ ```typescript
17
+ interface RequestLog {
18
+ id: string; // uuid
19
+ timestamp: string; // ISO-8601
20
+ // Routing
21
+ context: string; // which context was used
22
+ contextSwitched: boolean; // explicit switch?
23
+ routingReason: string; // "pattern_match:X" | "inferred:X" | "active:X"
24
+ // Sender
25
+ channelId: string;
26
+ channelKind: "dm" | "group";
27
+ senderHandle: string;
28
+ rawMessage: string; // as received (with prefix)
29
+ // Prompt components (broken out for UI)
30
+ systemPrompt: {
31
+ identity: string; // workspace files section
32
+ l3Knowledge: string[]; // individual L3 entries
33
+ l2Episodes: Array<{
34
+ id: string;
35
+ content: string;
36
+ score: number;
37
+ timestamp: string;
38
+ context_name: string;
39
+ role: string;
40
+ }>;
41
+ contextInfo: string; // active context section
42
+ fullText: string; // complete system prompt as sent
43
+ };
44
+ conversationHistory: Array<{ role: string; content: string }>; // L1 turns included
45
+ userMessage: string; // final user message
46
+ // Response
47
+ response: {
48
+ content: string;
49
+ model: string;
50
+ latencyMs: number;
51
+ skipped: boolean; // was it __SKIP__?
52
+ };
53
+ // Config snapshot
54
+ config: {
55
+ topK: number;
56
+ model: string;
57
+ maxTokens: number;
58
+ temperature: number;
59
+ };
60
+ // Approximate token counts (char-based estimate: chars/4)
61
+ tokenEstimates: {
62
+ systemPrompt: number;
63
+ conversationHistory: number;
64
+ userMessage: number;
65
+ total: number;
66
+ };
67
+ }
68
+ ```
69
+
70
+ **Storage:** SQLite database at `data/dashboard.db`.
71
+ - Single `request_logs` table with JSON columns for complex fields.
72
+ - Auto-prune: keep the last 7 days of entries, capped at 10,000 rows (whichever limit is hit first).
73
+ - Why SQLite over ring buffer: survives restarts, queryable, minimal overhead.
74
+
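The auto-prune policy reduces to two filters: drop rows older than 7 days, then cap at the 10,000 most recent. A pure-array sketch for illustration; the real store would express this as two SQL `DELETE`s against `request_logs`:

```typescript
interface LogRow {
  id: string;
  timestamp: number; // epoch ms
}

// Keep rows no older than maxAgeMs, most recent first, at most maxRows.
function prune(
  rows: LogRow[],
  now: number,
  maxRows = 10_000,
  maxAgeMs = 7 * 24 * 3600 * 1000,
): LogRow[] {
  return rows
    .filter((r) => now - r.timestamp <= maxAgeMs)
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, maxRows);
}
```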
75
+ **Token estimation:** Use the chars/4 approximation. Good enough for relative sizing, and it avoids a tokenizer dependency.
76
+
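The chars/4 heuristic is a one-liner; fine for comparing section sizes, not for exact budget enforcement:

```typescript
// Rough token count: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
```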
77
+ **Logging approach:** Eager logging. Serialize at request time. The overhead is minimal (~1ms for JSON.stringify) compared to LLM latency (~1-10s). Capturing the exact state at request time is more valuable than lazy reconstruction.
78
+
79
+ ### Backend: Dashboard HTTP Server
80
+
81
+ **Where:** New file `packages/runtime/src/dashboard.ts`.
82
+
83
+ Extend the existing health server (or create a sibling on port 9485):
84
+ - `GET /` — serve the SPA (single HTML file)
85
+ - `GET /api/requests` — list recent requests (paginated, filterable)
86
+ - `GET /api/requests/:id` — single request detail
87
+ - `GET /api/contexts` — proxy to memory daemon's context list
88
+ - `GET /api/contexts/:name/episodes` — proxy L2 episodes
89
+ - `GET /api/contexts/:name/l3` — proxy L3 knowledge
90
+ - `GET /api/stats` — memory stats (episode counts, last promotion, etc.)
91
+ - `DELETE /api/l3/:id` — delete a bad L3 entry (write op from day 1)
92
+ - `POST /api/l3/:id/edit` — edit L3 entry content
93
+
94
+ Bind to `127.0.0.1:9485` only.
95
+
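Express-free routing for the endpoints above needs only a small pattern matcher over `:param` segments. A sketch; the matcher, handler wiring, and response shape are illustrative, not the shipped code:

```typescript
import { createServer } from "node:http";

// Match "/api/requests/:id" style patterns; returns extracted params,
// or null when the path doesn't fit.
function matchRoute(
  pattern: string,
  path: string,
): Record<string, string> | null {
  const p = pattern.split("/");
  const u = path.split("/");
  if (p.length !== u.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < p.length; i++) {
    if (p[i].startsWith(":")) params[p[i].slice(1)] = decodeURIComponent(u[i]);
    else if (p[i] !== u[i]) return null;
  }
  return params;
}

const server = createServer((req, res) => {
  const url = new URL(req.url ?? "/", "http://127.0.0.1:9485");
  const detail = matchRoute("/api/requests/:id", url.pathname);
  if (req.method === "GET" && detail) {
    res.setHeader("content-type", "application/json");
    res.end(JSON.stringify({ id: detail.id })); // look up the real record here
    return;
  }
  res.statusCode = 404;
  res.end("not found");
});
// server.listen(9485, "127.0.0.1"); // loopback only, per the plan
```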
96
+ ### Frontend: Single-File SPA
97
+
98
+ **Why single file:** No build step, no React, no dependencies. Ship as one HTML file with embedded CSS/JS. Can always upgrade later.
99
+
100
+ **Layout:**
101
+ - Left sidebar: navigation (Requests, Memory, Contexts)
102
+ - Main area: content
103
+
104
+ **Request Inspector view:**
105
+ - Reverse-chronological list of requests
106
+ - Each row: timestamp, sender, context, model, latency, token estimate
107
+ - Click to expand → shows all sections:
108
+ - **Identity files** (collapsible, usually not interesting)
109
+ - **L3 Knowledge** (list of entries with metadata)
110
+ - **L2 Episodes** (with similarity scores, timestamps, source context)
111
+ - **L1 History** (conversation turns)
112
+ - **User Message** (raw with prefix)
113
+ - **Response** (with model, latency)
114
+ - **Config** (top_k, model, temperature)
115
+ - **Token breakdown** (bar chart showing proportion per section)
116
+ - Filters: by context, by sender, by time range
117
+ - Search: full-text search across messages
118
+
119
+ **Memory Browser view (Phase 2):**
120
+ - L2: searchable episode list, filterable by context/role/time
121
+ - L3: per-context knowledge entries with edit/delete buttons
122
+ - Promotion log (if we add logging for it)
123
+
124
+ **Context Overview (Phase 2):**
125
+ - List of contexts with episode counts, last active
126
+ - Active context highlighted
127
+ - Click to drill into episodes/L3
128
+
129
+ ---
130
+
131
+ ## Phase 2: Memory Browser + Context Overview
132
+
133
+ After Phase 1 is working and useful, add:
134
+ - Full L2 browsing with semantic search UI
135
+ - L3 management (view, edit, delete)
136
+ - Context explorer with stats
137
+ - Promotion history logging
138
+
139
+ ---
140
+
141
+ ## Implementation Steps (Phase 1)
142
+
143
+ ### Step 1: Request logging infrastructure
144
+ - [ ] Create `packages/runtime/src/request-logger.ts`
145
+ - SQLite setup (using better-sqlite3)
146
+ - `logRequest()` method
147
+ - `getRequests()` with pagination/filters
148
+ - `getRequest(id)` for detail view
149
+ - Auto-pruning on startup
150
+ - [ ] Add better-sqlite3 dependency
151
+
152
+ ### Step 2: Instrument the pipeline
153
+ - [ ] Modify `agent.ts` `processMessage()` to capture routing decision + timing
154
+ - [ ] Modify `prompt.ts` `buildSystemPrompt()` to return structured components (not just string)
155
+ - [ ] Log each request after LLM response arrives
156
+ - [ ] Capture config snapshot with each log entry
157
+
158
+ ### Step 3: Dashboard HTTP server
159
+ - [ ] Create `packages/runtime/src/dashboard.ts`
160
+ - Express-free: use Node's built-in `http` module (like health server)
161
+ - Serve SPA at `/`
162
+ - JSON APIs for request logs and memory proxy
163
+ - [ ] Wire into `pipeline.ts` startup
164
+
165
+ ### Step 4: Frontend SPA
166
+ - [ ] Single HTML file at `packages/runtime/src/dashboard.html`
167
+ - Vanilla JS, no framework
168
+ - CSS grid layout
169
+ - Fetch-based API calls
170
+ - Expandable request cards
171
+ - Token breakdown visualization
172
+ - Basic filtering
173
+
174
+ ### Step 5: Memory proxy + write ops
175
+ - [ ] Proxy endpoints to memory daemon for L2/L3 browsing
176
+ - [ ] DELETE/PATCH endpoints for L3 management
177
+
178
+ ---
179
+
180
+ ## Design Decisions
181
+
182
+ | Question | Decision | Rationale |
183
+ |----------|----------|-----------|
184
+ | Storage | SQLite | Survives restarts, queryable, lightweight |
185
+ | Token counting | chars/4 estimate | Good enough, no tokenizer dep |
186
+ | Logging | Eager | Captures exact state, overhead negligible vs LLM latency |
187
+ | Bind address | 127.0.0.1 only | Local access, no auth needed |
188
+ | Framework | None (vanilla) | Single HTML file, no build step |
189
+ | Read-only or read-write? | Read-write from start | Ryan will want to delete bad L3 entries immediately |
190
+ | Persist request logs? | Yes, 7 days | Need to compare across memory config changes |
191
+ | Multi-agent? | Single agent for now | Don't over-engineer, but use agent name in logs |
192
+ | Port | 9485 | Next to health port (9484), easy to remember |
193
+
194
+ ---
195
+
196
+ ## Sesame Command Fix (Bonus)
197
+
198
+ While we're in the code, fix the sender prefix issue:
199
+ - In `pipeline.ts` `startSesameLoop()`, before calling `agent.processMessage()`, strip the sender prefix for command parsing
200
+ - Or better: in `agent.ts` `handleSpecialCommand()`, strip known prefix patterns before regex matching
201
+ - This unblocks context switching, task commands, and cross-context search over Sesame
202
+
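The prefix strip can be a single regex applied before `handleSpecialCommand()` sees the message. The `[sender]: ` format below is an assumption about what Sesame prepends; adjust the pattern to the real prefix:

```typescript
// Strip a leading "[sender]: " style prefix (assumed format) so
// command parsing sees the bare message.
function stripSenderPrefix(raw: string): string {
  return raw.replace(/^\s*\[[^\]]+\]:\s*/, "");
}
```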
203
+ ---
204
+
205
+ *Created: 2026-02-28*
206
+ *Status: Ready to implement*